The Public Utility Commission (PUC) of Texas has implemented a requirement that utility providers maintain redundancy for Advanced Metering System (AMS) and Market Operations applications in case the utility’s primary data center becomes inoperable. In response, a Fortune 500 utility company established an Enterprise Disaster Recovery program committed to delivering a solution that would ensure less than 59 minutes of downtime.
The company quickly realized that its existing data center was significantly underprepared for integration into a multi-data-center environment. Not only did a second data center need to be built to mirror the first, but the first also needed to be completely upgraded to support new capabilities, including storage, network, security, and platform improvements.
Adding a second data center created further challenges for the IT organization as a whole. First, the company now had to staff and maintain two data centers; second, it needed to be capable of shifting all operations to the secondary data center in the event of a disaster. Meeting these challenges meant introducing new technologies, new processes, and new responsibilities to the IT team.
The company engaged Sendero to manage the entire life cycle of this program, from building the second data center facility to delivering the final product that enabled applications to be orchestrated and moved between data centers. The solution had to meet the Recovery Time Objective (RTO) of 59 minutes. Working with the company’s stakeholders to manage the software, hardware, and infrastructure technology, Sendero’s team actively led project cutovers, documentation gathering efforts, and strategic planning sessions to ensure a high-quality, effective solution.
Sendero led many change management efforts, including training, demonstrations, knowledge transfer sessions, and documentation buildout. To give the IT organization a lasting artifact it could reference to understand the Disaster Recovery solution, Sendero created a single document that detailed the step-by-step processes to follow in the event of a disaster, specific to the roles within the IT organization. The documentation also covered the application and infrastructure steps required to ensure minimal downtime during the failover process. This ‘Playbook’ gave each IT group insight into how the groups were integrated with one another and referenced detailed vendor and project documentation.
Sendero drove this project to completion with minimal downtime and few critical issues:
- Implementation of the DR solution, which allowed for the successful migration of AMS applications between two data centers, was completed by the original commitment date.
- The IT Operations organization was aligned with the PMO methodology to ensure quality delivery, vendor accountability, and consistent communication across the IT Operations organization for all future IT projects.
- The Operational Readiness and Assurance effort successfully completed and handed over the Geographical High Availability Playbook, which served as a detailed, cross-functional operations manual for executing the DR solution, written for audiences ranging from directors to system administrators.
In addition to achieving the 59-minute recovery time requirement, the solution provided the following benefits:
- Reduced backup and recovery times (with the rollout of a new backup solution)
- Integrated operating systems with Active Directory for centralized management of all accounts
- Automated support with scripts to start, stop, monitor, and clean each application
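The case study does not describe how the automation scripts were written. As a purely illustrative sketch, the start, stop, monitor, and clean actions could sit behind one uniform interface, so that operators run the same commands for every application at either data center. The names `AppControl` and `run_action` are hypothetical, and the in-memory state stands in for real service and file-system calls.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class AppControl:
    """Hypothetical wrapper exposing the same four actions for every application."""

    name: str
    running: bool = False
    # Stand-in for temporary files or stale locks a real clean script would remove.
    temp_files: List[str] = field(default_factory=list)

    def start(self) -> str:
        self.running = True
        return f"{self.name}: started"

    def stop(self) -> str:
        self.running = False
        return f"{self.name}: stopped"

    def monitor(self) -> str:
        state = "running" if self.running else "stopped"
        return f"{self.name}: {state}"

    def clean(self) -> str:
        removed = len(self.temp_files)
        self.temp_files.clear()
        return f"{self.name}: cleaned {removed} temporary file(s)"


def run_action(app: AppControl, action: str) -> str:
    """Dispatch one of the four supported actions by name."""
    actions: Dict[str, Callable[[], str]] = {
        "start": app.start,
        "stop": app.stop,
        "monitor": app.monitor,
        "clean": app.clean,
    }
    if action not in actions:
        raise ValueError(f"unknown action: {action}")
    return actions[action]()


if __name__ == "__main__":
    # Example application name is an assumption, not taken from the project.
    app = AppControl(name="example-app", temp_files=["session.tmp"])
    for step in ("start", "monitor", "stop", "clean"):
        print(run_action(app, step))
```

Keeping the action names identical across applications means a failover runbook can list one command pattern per step instead of a separate procedure for each system.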
All DR-enabled systems were validated for end-to-end integration (regardless of which data center the application resided in), and server automation was implemented for future projects using virtualization capabilities at both data centers.