GoDaddy: Tardis On-Call Automation
Role: Director of Global NOC
Overview: Developed an automation framework that integrated several internal tools to streamline escalations and make the NOC more efficient.
Situation: The NOC’s process for escalating incidents and sending on-call notifications was often manual and required staff to work across multiple disparate internal tools. This caused delays, created room for miscommunication, and added to the NOC staff’s workload during critical incidents.
Task: To develop an automation framework (named “Tardis”) that would integrate key internal tools (including an SMS/Robocall tool) to provide a single-pane-of-glass interface for escalations, thereby simplifying and accelerating the process for NOC staff.
Action:
- Designed and developed the “Tardis” automation framework.
- Integrated Tardis with several internal tools, including MIR3 (an SMS/Robocall tool), MySQL for data storage, and Redis for caching/session management.
- Built a Bootstrap-based user interface that gave NOC staff a single pane of glass for managing escalations.
- Initially supported manual escalation triggers via the interface, with later enhancements to enable automated escalations (“no human in the middle”).
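The escalation flow described above can be pictured as a tiered policy that is walked in order until someone acknowledges. The following is a minimal sketch in JavaScript (matching the Node.js stack), not the actual Tardis code: the policy shape, contact names, incident ID, and the notify/checkAck callbacks are hypothetical stand-ins for the MIR3 integration and for incident state kept in MySQL/Redis. Waiting out each acknowledgment window is abstracted into checkAck so the logic stays synchronous and easy to test.

```javascript
// Hypothetical escalation policy: each tier lists who to page and how long
// to wait for an acknowledgment before moving on. Names and timings are illustrative.
const escalationPolicy = [
  { tier: 1, contacts: ["primary-oncall"], ackWindowMinutes: 5 },
  { tier: 2, contacts: ["secondary-oncall"], ackWindowMinutes: 10 },
  { tier: 3, contacts: ["team-manager", "noc-director"], ackWindowMinutes: 15 },
];

// Walk the tiers in order, paging every contact in a tier, until someone acknowledges.
// notify(contact, incidentId): hypothetical stand-in for the SMS/robocall integration.
// checkAck(incidentId, windowMinutes): hypothetical stand-in for waiting out the
// ack window and reading incident state; returns the acknowledging contact or null.
function escalate(incidentId, policy, notify, checkAck) {
  const notified = [];
  for (const step of policy) {
    for (const contact of step.contacts) {
      notify(contact, incidentId);
      notified.push({ tier: step.tier, contact });
    }
    const ackedBy = checkAck(incidentId, step.ackWindowMinutes);
    if (ackedBy !== null) {
      return { acknowledged: true, ackedBy, tier: step.tier, notified };
    }
  }
  // Every tier was exhausted without an acknowledgment.
  return { acknowledged: false, ackedBy: null, tier: null, notified };
}

// Example: nobody answers tier 1, and the secondary acknowledges during tier 2.
const sent = [];
const result = escalate(
  "INC-1234",
  escalationPolicy,
  (contact, id) => sent.push(`${id} -> ${contact}`),
  (id, windowMinutes) => (windowMinutes === 10 ? "secondary-oncall" : null),
);
console.log(result.tier, result.ackedBy, sent);
```

In a design like this, moving from manual triggers to "no human in the middle" mostly changes who calls escalate(): an operator clicking a button in the UI, or an alert handler invoking it directly when an alert fires.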
Tech Stack Used: Node.js, MIR3 (SMS/Robocall tool), MySQL, Redis, Bootstrap.
Result: Substantially reduced the time NOC staff spent on each escalation by consolidating the workflow into a single-pane-of-glass interface and enabling escalation calls to be triggered, first manually and later with no human in the middle. The Tardis framework also made on-call escalations more efficient and consistent.
Context: This project was a significant step in modernizing the GoDaddy NOC’s tooling and operational procedures. By automating and simplifying escalations, Tardis helped reduce Mean Time To Acknowledge/Engage (MTTA/MTTE) for incidents, improving overall incident management effectiveness.
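MTTA is the mean, over acknowledged incidents, of the time between an incident being triggered and someone acknowledging it; MTTE follows the same pattern with an "engaged" timestamp instead. The sketch below assumes each incident record carries trigger and acknowledgment timestamps; the field names and sample records are hypothetical and illustrative only, not GoDaddy data.

```javascript
const MS_PER_MINUTE = 60 * 1000;

// Hypothetical incident records with ISO-8601 timestamps; acknowledgedAt is null
// for incidents nobody has acknowledged yet.
const incidents = [
  { id: "INC-1", triggeredAt: "2020-01-01T00:00:00Z", acknowledgedAt: "2020-01-01T00:04:00Z" },
  { id: "INC-2", triggeredAt: "2020-01-01T01:00:00Z", acknowledgedAt: "2020-01-01T01:10:00Z" },
  { id: "INC-3", triggeredAt: "2020-01-01T02:00:00Z", acknowledgedAt: null },
];

// Mean time to acknowledge, in minutes, over acknowledged incidents only.
// Returns null when there is nothing to average.
function meanTimeToAcknowledgeMinutes(records) {
  const durations = records
    .filter((r) => r.acknowledgedAt !== null)
    .map((r) => (Date.parse(r.acknowledgedAt) - Date.parse(r.triggeredAt)) / MS_PER_MINUTE);
  if (durations.length === 0) return null;
  return durations.reduce((sum, d) => sum + d, 0) / durations.length;
}

console.log(meanTimeToAcknowledgeMinutes(incidents)); // 7 (mean of 4 and 10 minutes)
```

Tracking this figure before and after a tooling change is one way to check whether automated escalation is actually shortening acknowledgment times.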