The Public Utility Commission of Texas (PUC) has implemented a requirement that utilities provide redundancy for Advanced Metering System (AMS) and Market Operations applications in case the utility’s primary data center becomes inoperable. In response, a Fortune 500 utility company established an enterprise disaster recovery (DR) implementation program to deliver a solution that would limit downtime to less than 59 minutes.
The company quickly realized that its existing data center was significantly underprepared for integration into a multi-data center environment. A second data center had to be built to mirror the first. The first also had to be completely upgraded to support new capabilities, including storage, network, security, and platform improvements.
Adding a second data center introduced further challenges for the IT organization as a whole. First, the company would have to staff and maintain two data centers. Second, it would need to be able to shift all operations to the secondary data center in the event of a disaster. Meeting these challenges meant introducing new technologies, processes, and responsibilities to the IT team.
The company engaged Sendero to manage the entire life cycle of this program, from building the second data center facility to delivering the final product, which enabled applications to be orchestrated and moved between data centers. The solution had to meet a Recovery Time Objective (RTO) of 59 minutes. Working with the company’s stakeholders to manage the software, hardware, and infrastructure technology, Sendero actively led project cutovers, documentation gathering efforts, and strategic planning sessions to ensure a high-quality, effective solution.
Sendero led many change management efforts, including training, demonstrations, knowledge transfer sessions, and documentation buildout. To give the IT organization a lasting reference for understanding the disaster recovery solution, Sendero created a single document that detailed the step-by-step processes to follow in the event of a disaster, organized by role within the IT organization. The document also covered the application and infrastructure steps required to minimize downtime during failover. This playbook showed each IT group how the groups depended on one another and referenced detailed vendor and project documentation.
Sendero drove this project to completion with minimal downtime and few critical issues:
- The DR solution, which enabled AMS applications to be migrated successfully between the two data centers, was implemented by the original commitment date.
- The IT Operations organization was aligned with the project management office (PMO) methodology to ensure quality delivery, vendor accountability, and consistent communication across IT Operations for all future IT projects.
- The Operational Readiness and Assurance effort completed and handed over the Geographical High Availability Playbook, a detailed, cross-functional operations manual for executing the DR solution, written for audiences ranging from directors to system administrators.
In addition to meeting the 59-minute recovery time requirement, the solution provided the following benefits:
- Reduced backup and recovery times through the rollout of a new backup solution
- Integrated operating systems with Active Directory for centralized management of all accounts
- Automated application support through scripts that start, stop, monitor, and clean up each application
All DR-enabled systems were validated for end-to-end integration, regardless of which data center hosted each application. Server automation was also implemented for future projects, using the virtualization capabilities at both data centers.