GoDaddy: MySQL Telemetry Deep Dive and Performance Optimizations
Role: Sr Dir of SRE (Observability & ITSM)
Overview: Tasked with stabilizing our MySQL fleet according to best practices.
Situation: GoDaddy’s large and diverse MySQL fleet suffered from inconsistent configurations, lack of standardized telemetry, and outdated management practices. This led to performance issues, operational inefficiencies, and unnecessary hardware and operational costs.
Task: Review the entire MySQL fleet and restore it to health: improve reliability, normalize configurations according to best practices, bring all hosts under standardized automated deployment and fleet management tooling, identify and retire underutilized or aged, high-risk hosts, and deploy comprehensive telemetry packages to improve visibility and enable proactive management.
Action:
- Conducted a comprehensive audit of the MySQL fleet to identify inconsistencies and areas for improvement.
- Normalized configurations and brought the fleet under standardized automated deployment and fleet management tooling (e.g., Rundeck).
- Identified underutilized and aged, high-risk MySQL hosts for retirement, resulting in significant hardware and operational cost savings.
- Designed and deployed standard telemetry packages across the fleet to greatly increase visibility into performance and health.
- Integrated with ServiceNow CMDB for accurate inventory and configuration tracking.
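As an illustration of the audit and normalization steps above, the following is a minimal sketch of how configuration drift across a fleet might be detected. It is not the tooling used in this project. It assumes each host's settings have already been collected (for example from `SHOW GLOBAL VARIABLES`) into a dictionary, and it treats the most common value of each variable as the baseline, which is a simplifying assumption. The function name `find_config_drift` and the host names are hypothetical.

```python
from collections import Counter


def find_config_drift(fleet_vars, watched):
    """Report watched variables whose value differs between hosts.

    fleet_vars: {host: {variable: value}}, e.g. collected per host from
                SHOW GLOBAL VARIABLES (collection is out of scope here).
    watched:    iterable of variable names to compare.
    Returns {variable: {"baseline": value, "outliers": {host: value}}}.
    The baseline is simply the most common value in the fleet (an assumption).
    """
    report = {}
    if not fleet_vars:
        return report
    for var in watched:
        # A host that does not report the variable is recorded as None.
        values = {host: cfg.get(var) for host, cfg in fleet_vars.items()}
        baseline, _ = Counter(values.values()).most_common(1)[0]
        outliers = {h: v for h, v in sorted(values.items()) if v != baseline}
        if outliers:
            report[var] = {"baseline": baseline, "outliers": outliers}
    return report


# Hypothetical sample data: three hosts, one of which has drifted.
fleet = {
    "db-01": {"innodb_flush_log_at_trx_commit": "1", "sync_binlog": "1", "max_connections": "500"},
    "db-02": {"innodb_flush_log_at_trx_commit": "1", "sync_binlog": "1", "max_connections": "500"},
    "db-03": {"innodb_flush_log_at_trx_commit": "2", "sync_binlog": "0", "max_connections": "500"},
}

for var, info in find_config_drift(fleet, ["innodb_flush_log_at_trx_commit", "sync_binlog", "max_connections"]).items():
    print(var, "baseline:", info["baseline"], "outliers:", info["outliers"])
```

In practice the baseline would more likely come from a reviewed standard than from a majority vote, and the resulting outlier list could feed whatever remediation tooling is in place.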
Tech Stack Used: Rundeck, ServiceNow CMDB, MySQL, standard telemetry tools (e.g., Percona Monitoring and Management or similar).
Result: Audited and normalized the MySQL fleet and brought its hosts under standardized automated deployment and fleet management tooling; identified hosts for retirement (~2M in combined hardware and operational waste); and deployed standard telemetry packages that significantly increased visibility into the fleet’s health and performance.
Context: Optimizing the MySQL fleet was critical for GoDaddy’s operational efficiency and cost management, as MySQL underpinned many of its core services. This project improved database reliability, reduced operational toil, and freed up significant resources.
Visuals: