Rootconf proposals for round the year in 2018

On DevOps, security, cloud and IT infrastructure

Monitoring Databases to Prevent Performance & Availability Outages

Submitted by Bhupinder Singh (@bhupinder) on Friday, 9 March 2018

videocam_off

Technical level

Intermediate

Section

Full talk

Status

Submitted

Vote on this proposal

Login to vote

Total votes:  +1

Abstract

This presentation covers OS/Infrastructure monitoring but nobody monitors their OS manually.
Most deployments ue Nagios etc. Then why another talk on monitoring , something which everyone knows about. Usually what a sysadmin sees is probaly an isolated event , but somone needs to understand the whole picture ,
what would be the effect of IO slowdown on the database , what all can you expect.

Based on my work in providing Database Support, the idea is to talk with attendees about what we see most commonly in the Service Requests. How they can avoid some of those failure scenarios or at least detect them and minimize the downtime.

Key takeaway is “what are the signs to look for to prevent or minimize outages”.

Most of the content is applicable to databases in genral , but I would also talk about things which are Mysql specific.The last part of this presentation would have some details from MySQL Performance Schema and MySQL Enterprise Monitoring.

Outline

A. Why monitor :

a. Database is essential to most application
b. Application server can be redundant, can be spawned any time
c. Database is a limited resource , non-redundant most of the times
d. What can effective monitoring achieve

B. Outage Scenarios:

a. Performance Outage Scenarios: Database is alive
i. Unexpected load ii. Spiked latency iii. runaway queries iv. Unexpected Locking

b. Availability Outage Scenarios: Database has crashed/shutdown
i. Disk is full ii. Network is down iii. Seg(6//11) Crash/ Bugs iv. Out of Memory aka OOM v. Server Code Bugs

C. What to Monitor ?:

Internal infrastructure:
CPU & Threads : OS threads vs SQL sessions/connections Number of threads Thread states Running vs blocked CPU Load

IO : Load Usage Latency Disk Errors, multi-path failures

File System: %use of different mount points esp “/tmp” FS errors

OS/SYSTEMS: dmesg (-T if available) /var/log /proc/meminfo /proc/<pid of mysqld>/status (VM*) Paging/swapping activity Background flushing

Process:
stack traces in mysqld error log file , lock status

Network:
socket status dropped packets/collisions

D. Monitoring for customers:

SQL: long running queries untimely queries locking duration/timeouts/deadlocks/pending transactions/lock waits data volume chnages/rows examined vs returned concurrency temp disk usage/On-disk sorting semaphore waits/pending flushes/fsync connections/network disconnects/aborted connects/aborted clients/max_connections backups & archive logging (binary logging)

MySQL Specific: performance_schema, MEM

Speaker bio

The speaker is a DBA/Support Engineer with experience in support mission critical databases. As a MySQL Support Engineer for Oracle he is involved in troubleshooting availability and performance issues for large clients especially in the telecom domain.

Slides

https://www.dropbox.com/s/eqqbktcsfcno2v9/Monitoring_Alerts_Databases.pdf?dl=0

Comments

Login with Twitter or Google to leave a comment