|
Keep These Databases Running.
(Achieving the highest level of database business
continuity)
|
|
Content
Introduction
Overview
The Challenge
Database Administration Functions
  Monitoring, Detection and Alerting
  Containment and Resolution
  Root Cause and Performance/Scalability Analysis
Applying DBA InfoPower, Inc. Products to the Process
  Monitoring, Detection and Alerting
  Containment and Resolution
  Root Cause and Performance/Scalability Analysis
Conclusion
About the Author
About DBA InfoPower, Inc.
|
Introduction |
Achieving high database business continuity in a cost-effective way
can resemble a catch-22: with current technology dominated by
hardware-rich cluster solutions, companies must spend heavily to reach
high database uptime while trying to limit spending in order to stay
cost-effective.
One common mistake made in an effort to increase database business continuity
is putting 100% of the effort into creating high-availability, fault-tolerant
physical architectures.
While Gartner reports that a high percentage of database availability
is lost due to physical causes (hardware failures), at least 15% of
it is attributed to logical unavailability caused by the human factor.
For active business environments that constantly require new database
code deployment and database re-architecture (e.g., online retailers,
banks involved in securities trading, telecommunications companies, etc.),
the cost of the human factor is significantly higher. Therefore, knowledge
of and control over the human factor and the changes it introduces
can bring huge benefits to the company and to the stability of its
operational environment.
|
| Overview |
| |
What does achieving high database business continuity actually mean? First,
we need to understand our goal: what exactly are we trying to accomplish?
Based on the industry’s many publications, best practices and expert
opinions, we can define the following set of goals:
| 1) |
Identify database related problems impacting business functions |
| 2) |
Proactively contain and resolve these issues |
| 3) |
Meet business response time and availability SLAs |
| 4) |
Have full information on planned and on-going business changes
and understanding of their impact on databases |
| 5) |
Have early identification and ability to forecast scalability and
capacity needs |
| 6) |
Avoid unnecessary spending on hardware, software licenses and human
labor. |
|
| The Challenge |
| |
Oracle and DB2 databases are two of the most reliable and configurable
database engines used by businesses and governments these days.
On the positive side, both databases can support any required business
model and with the right architecture can scale to satisfy extremely
fast expansion of business activity.
On the negative side, application growth, upgrades and patches, constant
growth in the number of supported databases, a changing user base and changing
business needs place a heavy load on database support personnel as they
work to achieve high business continuity.
In addition, the underlying hardware and software infrastructures are
constantly changing with the introduction of new hardware servers, front-ends,
and application servers only adding to the above challenge.
An analyst from the Gartner Group writes, “At the core of business
data for most production applications is a relational database management
system (RDBMS). RDBMS monitoring and administration has evolved into
a highly specialized market…” This market is highly specialized
because providing and sustaining high database continuity is not
an easy task. It cannot and should not be approached without an adequate
strategy, supporting procedures and products in place.
The mission of this paper is to outline a successful methodology and
strategy for achieving superior database business continuity, supported
by software that is written by domain experts and has been applied with
great success across multiple clients and industries.
|
| Database Administration Functions |
The following sections describe important database administration functions
aimed at reducing critical application downtime caused by catastrophic
database events and at improving database performance. They also cover
the key requirements for a product capable of aiding a DBA in performing
those functions.
|
| Monitoring, Detection and Alerting |
The first task in monitoring is to identify the right set of performance
characteristics to monitor. While a number of characteristics are generic
and can be used across databases, each database has specifics related
to the application and business functions it supports.
In order to identify these database-specific performance characteristics,
it is critical to use the data obtained during root cause analysis of database
outages and during performance analysis, and to include the leading
performance characteristics in the monitored set.
The list of monitored metrics can include a combination of database and OS
metrics (including kernel statistics) and derivative metrics generated by
custom SQL and custom scripts.
Another important task is the creation of a database performance baseline.
While an absolute baseline is helpful on very stable databases
(which is seldom the case in real production environments), its value quickly
diminishes when the application profile, business functions or database
usage change. As a result, baselines need to be frequently re-evaluated and
re-established.
During the period when the old baseline is no longer valid and a new
baseline is not established yet, judgment on database load and performance
can be very subjective.
The way out of this situation is to introduce automatic baseline generation
into database monitoring. The baseline is then constantly and
automatically adjusted according to database behavior and is valid
100% of the time. As an additional benefit, it can be normalized to
fit a range of, say, 0 to 1, so that any significant change in the baseline
is spotted as a drop from 1 to 0.5 or 0, or a rise from 0 to 0.5 or
1.
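One possible way to build such a self-adjusting, normalized baseline is sketched below. It is an illustration only, not the algorithm of any particular product: it assumes a simple min-max baseline over a sliding window of recent samples, and the class name, window size and sample values are hypothetical.

```python
from collections import deque


class RollingBaseline:
    """Min-max baseline over a sliding window, rescaled to the range 0..1.

    Assumption: the baseline is simply the spread of the last `window`
    samples. Real products may use more elaborate statistics.
    """

    def __init__(self, window=288):
        # deque with maxlen drops the oldest sample automatically,
        # so the baseline keeps adjusting to recent behavior
        self.samples = deque(maxlen=window)

    def update(self, value):
        """Add a sample and return its position within the current baseline."""
        self.samples.append(value)
        low, high = min(self.samples), max(self.samples)
        if high == low:
            return 0.5  # flat baseline: no spread yet, report the midpoint
        return (value - low) / (high - low)


# Hypothetical metric readings, e.g. active sessions sampled every 5 minutes
baseline = RollingBaseline(window=5)
readings = [100, 110, 105, 120, 115, 240]
normalized = [round(baseline.update(r), 2) for r in readings]
print(normalized)  # [0.5, 1.0, 0.5, 1.0, 0.75, 1.0]
```

Because old samples fall out of the window, the baseline never has to be re-established by hand; the trade-off is that the window size determines how quickly a new level of load becomes the "normal" one.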
In addition to the automatic metric baseline, real-time statistical processing
can be used to identify the accumulation of small changes in system or SQL
metric behavior and to proactively alert on issues that could
affect database business continuity. Real-time alerting allows the
DBA to take the necessary actions to prevent imminent database problems.
It also enables automatic or semi-automatic containment and resolution
of such problems.
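The paper does not name a specific statistical technique. As one well-known example of detecting an accumulation of small changes, the sketch below uses a one-sided CUSUM (cumulative sum) detector. The function name, the parameters `target`, `slack` and `threshold`, and the sample values are assumptions chosen for illustration.

```python
def cusum_alerts(values, target, slack, threshold):
    """Return the indexes at which a one-sided upper CUSUM raises an alert.

    target    -- expected (baseline) level of the metric
    slack     -- deviation that is tolerated without accumulating
    threshold -- accumulated excess that triggers an alert
    """
    alerts = []
    excess = 0.0
    for index, value in enumerate(values):
        # Accumulate only the part of the deviation above target + slack;
        # the sum never goes below zero.
        excess = max(0.0, excess + (value - target - slack))
        if excess > threshold:
            alerts.append(index)
            excess = 0.0  # restart accumulation after raising an alert
    return alerts


# Hypothetical response-time samples drifting slowly upward from 10 ms.
# No single sample looks alarming, but the drift accumulates.
samples = [10, 10, 11, 11, 12, 12, 12, 13]
print(cusum_alerts(samples, target=10, slack=0.5, threshold=5))  # [6]
```

A plain per-sample threshold of, say, 15 ms would never fire on this series, whereas the accumulated excess crosses the alert level at the seventh sample.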
Other important components of software supporting these key DBA functions
are listed below:
• Trend clarification using smoothing filters with moving averages
• Critical condition alerts
• Voice alert notification for crowded operations rooms and busy DBAs
• Secured connections to databases
• Easy deployment across databases
• Visual features: mixing and overlapping of metrics, such as overlap and
  comparison of RAC nodes or DB2 EEE nodes
• Ability to mix and visually correlate databases in a heterogeneous
  environment
• Use of a monitoring dashboard to consolidate and monitor many databases
  at once
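Of the components listed above, trend clarification with a moving average is the easiest to show in code. The sketch below is a minimal trailing simple moving average. The function name and sample values are hypothetical, and real smoothing filters may be weighted or tunable in other ways.

```python
def moving_average(values, window):
    """Trailing simple moving average.

    The first window-1 outputs average over the samples seen so far,
    so the result has the same length as the input.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    smoothed = []
    running = 0.0
    for index, value in enumerate(values):
        running += value
        if index >= window:
            running -= values[index - window]  # drop the sample leaving the window
        smoothed.append(running / min(index + 1, window))
    return smoothed


# Hypothetical noisy metric: the smoothed series makes the upward trend visible
print(moving_average([2, 4, 6, 8, 10], window=3))  # [2.0, 3.0, 4.0, 6.0, 8.0]
```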
|
| Containment and Resolution |
When a critical situation is identified, the DBA in many cases has only
two to five minutes to take corrective action before the database
becomes unresponsive and has to be restarted or failed over.
In these situations, typing SQL commands or running scripts to find the “culprit” is
in no way an effective means of restoring database availability to the business.
Unfortunately, this is how such situations are usually dealt with today. Sometimes
third-party GUI tools are used, but only to terminate sessions explicitly
specified by a DBA.
As a sound alternative, expert-level software needs to be in place that
can contain the issue literally in seconds and resolve it automatically (or
semi-automatically) in no more than three steps (containment, action
request and resolution). In many cases this can be the automatic resolution
of blocking locks, or the termination of resource-consuming unauthorized SQL,
bad SQL with a suddenly changed execution plan, etc.
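To illustrate what the identification step behind automatic resolution of blocking locks might involve, the sketch below finds the sessions at the head of each blocking chain. It is not a description of any specific product. The input format (a mapping from waiting session id to blocking session id) and the session ids are hypothetical; in practice such pairs would come from the database's own lock or session views.

```python
def root_blockers(waits):
    """Find sessions that block others while not waiting themselves.

    waits -- dict mapping a waiting session id to the session id blocking it
    Returns a dict: root blocker id -> number of sessions stuck behind it.
    Chains that loop back on themselves (deadlocks) have no root and are skipped.
    """
    stuck = {}
    for waiter in waits:
        seen = {waiter}
        current = waits[waiter]
        # Follow the chain of blockers until we reach a session that is
        # not waiting on anyone, or until we detect a cycle.
        while current in waits and current not in seen:
            seen.add(current)
            current = waits[current]
        if current not in waits:
            stuck[current] = stuck.get(current, 0) + 1
    return stuck


# Hypothetical snapshot: 102 waits on 101, which waits on 55; 103 waits on 55
snapshot = {101: 55, 102: 101, 103: 55, 200: 201}
print(root_blockers(snapshot))  # {55: 3, 201: 1}
```

Ranking root blockers by the number of sessions stuck behind them gives a natural order for an action request: terminating session 55 in this example would release three waiters at once.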
In some cases a very “small” DBA or user action can have a
significant impact on database continuity (e.g., collecting statistics
on a very busy table). Such factors are usually overlooked when a general
resolution approach is used.
An immediate corrective action would be the group elimination of database
connections or a logical “mark down” of the business component
affecting overall database continuity.
|
| Root Cause and Performance/Scalability Analysis |
|
After the immediate danger to the database environment has been
eliminated, the next step is to perform root cause analysis to determine
what exactly changed and to understand the magnitude and timing of the change.
This information is vital for deep understanding and for the ability to correlate
database-level events with potential changes in business activity,
software and/or user behavior.
Change capture should cover all instrumented database performance areas.
For Oracle, these include wait events, system statistics, latches, I/O, UNDO,
individual SQL, changed SQL plans, etc. For DB2, they include numerous
instance- and database-level metrics and I/O metrics at the buffer pool,
table space and table levels.
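A minimal sketch of change capture between two periods is shown below. It assumes that each period has already been reduced to a dictionary of metric totals. The function name, the `min_ratio` cutoff and the sample values are hypothetical, and a real tool would cover far more metric classes.

```python
import math


def capture_changes(before, after, min_ratio=2.0):
    """Compare two metric snapshots and report the largest relative changes.

    before, after -- dicts mapping metric name to a positive total
    min_ratio     -- report metrics that grew or shrank by at least this factor
    Returns a list of (metric, after/before ratio), largest change first.
    """
    changes = []
    for name in sorted(before.keys() & after.keys()):
        old, new = before[name], after[name]
        if old <= 0 or new <= 0:
            continue  # ratios are undefined; skipped in this simple sketch
        ratio = new / old
        # Use the log of the ratio so that x4 and /4 count as equally large
        if abs(math.log(ratio)) >= math.log(min_ratio):
            changes.append((name, ratio))
    changes.sort(key=lambda item: abs(math.log(item[1])), reverse=True)
    return changes


# Hypothetical wait-time totals (seconds) for two comparable periods
last_week = {"db file sequential read": 1200.0, "log file sync": 300.0, "latch free": 40.0}
today = {"db file sequential read": 1300.0, "log file sync": 2400.0, "latch free": 10.0}
print(capture_changes(last_week, today))
# [('log file sync', 8.0), ('latch free', 0.25)]
```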
Other important components of root cause analysis and performance/scalability
analysis are:
|
• Seasonality identification - Ability to understand patterns of change
  in database behavior, for example to identify whether a given level of
  database resource consumption occurs regularly or started only a few
  days ago.
• Combined metrics - Ability to combine multiple classes of database metrics
  for visual correlation.
• Cross-database combined metrics - Ability to combine database metrics
  across databases for visual correlation. This ability is extremely important
  for both “share nothing” database systems (DB2 EEE or replicated
  databases) and “share everything” systems (Oracle RAC/OPS).
• Smoothing capability - Ability to apply smoothing filters to database
  metrics for clear trend identification. Combined with linear regression,
  smoothing gives the DBA a reliable indication of changes in database resource
  consumption and their potential impact on database business continuity.
• Automated reporting - Ability to automatically generate change capture
  reports and combined database metrics reports across all required
  databases.
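The use of linear regression on a metric series can be sketched as follows. This is an ordinary least-squares fit over equally spaced samples, offered as an illustration under that assumption. The function names, the capacity limit and the sample values are hypothetical.

```python
def linear_trend(values):
    """Least-squares slope and intercept for equally spaced samples."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two samples to fit a trend")
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    sxy = sum((i - mean_x) * (v - mean_y) for i, v in enumerate(values))
    sxx = sum((i - mean_x) ** 2 for i in range(n))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def samples_until(values, limit):
    """Estimate how many more samples pass before the trend line reaches limit.

    Returns None when the trend is flat or falling.
    """
    slope, intercept = linear_trend(values)
    if slope <= 0:
        return None
    return (limit - intercept) / slope - (len(values) - 1)


# Hypothetical daily CPU utilization (%), ideally smoothed beforehand
cpu = [50, 52, 54, 56, 58]
print(linear_trend(cpu))       # (2.0, 50.0)
print(samples_until(cpu, 80))  # 11.0
```

Fitting the line to smoothed data rather than raw samples makes the slope less sensitive to isolated spikes, which is why the two techniques are usually combined.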
|
|
DBA InfoPower, Inc. Products
|
|
To support the best strategy for achieving high database business continuity,
DBA InfoPower, Inc. created a line of products intended to help every DBA
successfully apply the methods listed above.
DBA InfoPower, Inc. offers three products that assist clients in the daily
task of achieving high database business continuity: DBA Heartbeat (for
Oracle, DB2 and MySQL), a real-time database monitoring and proactive
alerting component; DBAct (GA for Oracle, beta release for DB2), a real-time
problem containment and resolution product; and DBA Performance Explorer-I
(GA for Oracle, beta release for DB2), a root cause analysis and
performance/scalability analysis product.
A major benefit of DBA InfoPower products is that DBA Heartbeat and
Performance Explorer are heterogeneous cross-database products that allow
a DBA to work with maximum efficiency when supporting complex multi-database,
multi-vendor environments.
|
|
Applying DBA InfoPower, Inc. Products to the Process |
|
This section shows how the strategy for achieving high database business
continuity can be applied using DBA InfoPower, Inc. products.
|
| Monitoring, Detection and Alerting |
DBA PHB (PROactive Heartbeat) gives the DBA the power to proactively
identify, and stay alerted to, issues that can seriously impact business
continuity.
The DBA PHB setup and workflow is as follows:
| Step 1: |
Identify the set of metrics to monitor. This can
be done by:
| a) |
Selecting a prepackaged set of metrics prepared
by DBA InfoPower experts |
| b) |
Creating a custom metric set aligned with the enterprise
business usage patterns |
| c) |
Utilizing the DBA Performance Explorer-I product to identify
performance metrics that are directly associated with events
threatening database continuity and are therefore good candidates
for monitoring. |
|
| Step 2: |
Once the metrics to monitor have been identified, the DBA Heartbeat™ Agent
Builder™ module is used to design and create the monitoring agent
definition. |
| Step 3: |
DBA Heartbeat™ Alert Builder™ is used to create and
set alert conditions. Alert Builder™ sets visual and voice
alerts per agent metric, its moving average and an automatic metric
baseline. |
| Step 4: |
DBA Heartbeat™ Connection Manager™ is used to deploy
agents across database servers. Mass agent deployment and agent activation/deactivation
are accomplished from a single point of control. |
| Step 5: |
DBA Heartbeat Console™ connects to the agents and begins
real-time monitoring and alerting |
DBA Heartbeat™ also provides the DBA with
the following advanced real-time monitoring features:
• Simple and effective Agent Builder supporting database metrics, OS
  metrics, custom script metrics and custom SQL metrics
• Alert Builder with custom message and voice alert capabilities
• Easy-to-identify alert message panel
• Effective Connection Manager acting as a single point of control
  for connection configuration and agent deployment, activation and
  deactivation
• Secured communication with agents over a customized SSH/SSL protocol
• Automated metric baseline coupled with a proactive problem identification
  algorithm
• Smoothing filters for clear trend identification
• Ability to mix monitored metrics across databases and database
  platforms on the same monitoring panel
• Ability to consolidate and monitor multiple database servers as
  a logical cluster
• Ability to consolidate and monitor multiple database servers on
  a dashboard alert panel with instant visualization of the alerted metric
• Portability and support for many hardware platforms (written in Java)
|
| Containment and Resolution |
DBAct gives the DBA the power to contain and instantly resolve issues
threatening database business continuity. This command-line, cross-platform
module arms the DBA with over 80 actionable functions needed to identify
and automatically or semi-automatically resolve cases of extreme contention
and resource consumption.
Examples:
“dbact killblock now” - eliminate blocking locks
“dbact cleanswap now” - eliminate idle/unused sessions consuming swap space
|
| Root Cause and Performance/Scalability Analysis |
DBA PEi (Performance Explorer-I) is a complete root cause analysis tool
that enables lightning-fast discovery of the causes of database faults
and overload, replacing difficult and time-consuming manual performance
analysis and report generation.
Performance Explorer-I features:
| 1) |
Automatic period comparison and change capture
- Allows quick identification of the root cause of any changes threatening
database
continuity and performance characteristics. It eliminates the need for
eyeballing across dozens of reports to find a problem that needs to be
fixed NOW! |
| 2) |
High performance visualization - Tens of thousands
of time points are rendered in seconds. A whole quarter of data can be visualized
with 5-minute snapshot granularity blazingly FAST! |
| 3) |
Overlapping view of current and historic data – Quickly
identifies if database behavior is normal or anomalous. |
| 4) |
Performance prognosis - Calculating and visualizing
regression trends reduces complex data view to a manageable and understandable
form that can be used to generate clear headroom and capacity prognoses. |
| 5) |
Data smoothing - Sophisticated, tunable filtering smooths
data to clarify performance trends. |
| 6) |
SQL-level detail - Drill down to the level of individual
SQL queries to determine how a change in different SQL characteristics
is affecting the system. Similar queries can be aggregated to display their
overall system impact. |
| 7) |
Full performance management – All statistics,
wait events, latches, SQL and I/O performance data collected |
| 8) |
Full batch capability – Run analysis in batch
and in parallel across hundreds of databases. Instantly generate HTML and
graphical reports. |
| 9) |
Portability and support of many platforms - Written
in Java |
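The aggregation of similar queries mentioned in feature 6 can be illustrated with a short sketch. It is not the product's implementation: it assumes that "similar" means "identical once literal values are replaced by placeholders", and the function names, regular expressions and sample statements are hypothetical.

```python
import re
from collections import defaultdict

# Quoted string literals, including doubled quotes inside them
_STRING = re.compile(r"'(?:[^']|'')*'")
# Standalone numeric literals (digits inside identifiers such as t1 are left alone)
_NUMBER = re.compile(r"\b\d+(?:\.\d+)?\b")
_SPACE = re.compile(r"\s+")


def sql_signature(sql):
    """Reduce a SQL statement to a signature shared by similar statements."""
    text = _STRING.sub("?", sql)
    text = _NUMBER.sub("?", text)
    return _SPACE.sub(" ", text).strip().lower()


def aggregate_by_signature(statements):
    """Sum elapsed time per signature.

    statements -- iterable of (sql_text, elapsed_seconds) pairs
    """
    totals = defaultdict(float)
    for sql, elapsed in statements:
        totals[sql_signature(sql)] += elapsed
    return dict(totals)


# Hypothetical captured statements with their elapsed times in seconds
captured = [
    ("SELECT * FROM orders WHERE id = 42", 1.5),
    ("select *  from orders where id = 7", 2.5),
    ("SELECT name FROM t1 WHERE code = 'A1'", 1.0),
]
print(aggregate_by_signature(captured))
# {'select * from orders where id = ?': 4.0, 'select name from t1 where code = ?': 1.0}
```

Grouping by signature shows the combined cost of an application that issues the same statement with different literal values, which individual per-statement figures tend to hide.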
|
| Conclusion |
Providing high database business continuity is not an easy task
and requires thorough knowledge of database internals and of the specifics
of the business environment. To help manage this task, DBA InfoPower,
Inc. provides a complete line of high-quality components that automate
monitoring and alert identification, provide a powerful problem
containment and resolution module, and offer an automated solution for root
cause analysis and performance/scalability analysis. DBA InfoPower, Inc.
provides the technology you need to feel confident in the hard task of securing
high database business continuity and availability.
|
| About the Author |
Ron Warshawsky is a principal technologist and the founder of DBA InfoPower,
Inc. He has over 12 years of technical experience working with Oracle and DB2
databases, progressing from entry-level to principal positions in database and
high availability architecture. He has designed and implemented numerous database
high availability and fault-tolerant/fault-preventive solutions that
serve Fortune 100 and leading e-commerce companies.
|
| About DBA InfoPower, Inc. |
DBA InfoPower, Inc. is an emerging leading provider of database
business continuity solutions. Our products give our clients a significant
boost in business continuity of critical database-centric systems, driving
up business availability numbers and significantly reducing management
and maintenance costs related to identification, prevention, containment,
and root cause analysis of database related problems. Founded in 2001 and
based in Santa Clara, California, DBA InfoPower has offices in Boston,
MA and Yardley, PA. Currently in an expansion stage, we work closely with
500+ trial business clients who are using our technology.