Monitoring Databases to Prevent Performance & Availability Outages
Submitted by Bhupinder Singh (@bhupinder) on Friday, 9 March 2018
This presentation covers OS/Infrastructure monitoring but nobody monitors their OS manually.
Most deployments ue Nagios etc. Then why another talk on monitoring , something which everyone knows about. Usually what a sysadmin sees is probaly an isolated event , but somone needs to understand the whole picture ,
what would be the effect of IO slowdown on the database , what all can you expect.
Based on my work in providing Database Support, the idea is to talk with attendees about what we see most commonly in the Service Requests. How they can avoid some of those failure scenarios or at least detect them and minimize the downtime.
Key takeaway is “what are the signs to look for to prevent or minimize outages”.
Most of the content is applicable to databases in genral , but I would also talk about things which are Mysql specific.The last part of this presentation would have some details from MySQL Performance Schema and MySQL Enterprise Monitoring.
A. Why monitor :
a. Database is essential to most application
b. Application server can be redundant, can be spawned any time
c. Database is a limited resource , non-redundant most of the times
d. What can effective monitoring achieve
B. Outage Scenarios:
a. Performance Outage Scenarios: Database is alive
i. Unexpected load ii. Spiked latency iii. runaway queries iv. Unexpected Locking
b. Availability Outage Scenarios: Database has crashed/shutdown
i. Disk is full ii. Network is down iii. Seg(6//11) Crash/ Bugs iv. Out of Memory aka OOM v. Server Code Bugs
C. What to Monitor ?:
CPU & Threads : OS threads vs SQL sessions/connections Number of threads Thread states Running vs blocked CPU Load
IO : Load Usage Latency Disk Errors, multi-path failures
File System: %use of different mount points esp “/tmp” FS errors
OS/SYSTEMS: dmesg (-T if available) /var/log /proc/meminfo /proc/<pid of mysqld>/status (VM*) Paging/swapping activity Background flushing
stack traces in mysqld error log file , lock status
socket status dropped packets/collisions
D. Monitoring for customers:
SQL: long running queries untimely queries locking duration/timeouts/deadlocks/pending transactions/lock waits data volume chnages/rows examined vs returned concurrency temp disk usage/On-disk sorting semaphore waits/pending flushes/fsync connections/network disconnects/aborted connects/aborted clients/max_connections backups & archive logging (binary logging)
MySQL Specific: performance_schema, MEM
The speaker is a DBA/Support Engineer with experience in support mission critical databases. As a MySQL Support Engineer for Oracle he is involved in troubleshooting availability and performance issues for large clients especially in the telecom domain.