An automated proxy server configuration update did not complete, which left an incomplete IP list in the career website hosting platform backend. Because of this incomplete list, a security process began incorrectly flagging legitimate admin users as spam users: their legitimate source IPs could not be validated against the list. As a result, any content or sites created by those users were set to inactive, which effectively took those sites down for the duration of the incident. Some of the affected admin users were core senior staff who had set up customer sites, so customer sites were impacted.
The security process inactivated the content and sites associated with the flagged users, which resulted in the unplanned inactivation of 15 customer career sites.
Automated monitoring immediately began alerting operations staff. Staff identified the root cause, reactivated the affected sites, corrected the IP configuration, and restored the admin users who had been marked as spam.
10/29/2018 - 12:45pm US EDT - Initial configuration update failed.
10/29/2018 - 12:59pm US EDT - Security processes began inactivating sites and monitoring alerts began triggering.
10/29/2018 - 1:25pm US EDT - Root cause identified and remediation began.
10/29/2018 - 1:37pm US EDT - Affected sites reactivated and caching tier invalidations began.
10/29/2018 - 2:19pm US EDT - All caching tiers verified as refreshed.
Career Websites (X-Cloud Candidate) - Approximately 15 customer career sites
The root cause was the failure of the automated proxy server configuration update process. That process has been disabled until a thorough root cause analysis (RCA) determines why it ran only partially, and until a fix has been made and verified with appropriate testing.
Additional quality and verification checks are being considered for the proxy IP update process.
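As an illustration only, one kind of verification check the proxy IP update process could adopt is sketched below. The report does not describe how the IP list is stored or applied, so this sketch assumes a plain file with one IP address or CIDR range per line. The function names `validate_ip_list` and `apply_ip_list` and the 90% minimum-size threshold are hypothetical. The idea is to reject a candidate list that is empty, contains unparseable entries, or is much smaller than the current list (the symptom of a partial run), and to write an accepted list atomically so that readers never see a half-written file.

```python
import ipaddress
import os
import tempfile


def validate_ip_list(candidate: list[str], current: list[str], min_ratio: float = 0.9) -> list[str]:
    """Return a list of problems; an empty list means the candidate looks safe to apply.

    min_ratio is a hypothetical threshold: a candidate with fewer than
    min_ratio * len(current) entries is treated as a likely partial update.
    """
    problems = []
    if not candidate:
        problems.append("candidate list is empty")
    for entry in candidate:
        try:
            # Accept single addresses and CIDR ranges (IPv4 or IPv6).
            ipaddress.ip_network(entry.strip(), strict=False)
        except ValueError:
            problems.append(f"unparseable entry: {entry!r}")
    if current and len(candidate) < min_ratio * len(current):
        problems.append(
            f"candidate has {len(candidate)} entries, expected at least "
            f"{min_ratio:.0%} of the current {len(current)}"
        )
    return problems


def apply_ip_list(path: str, candidate: list[str], current: list[str]) -> None:
    """Validate the candidate list, then replace the file at `path` atomically."""
    problems = validate_ip_list(candidate, current)
    if problems:
        # Leave the existing list untouched if any check fails.
        raise ValueError("refusing to apply IP list: " + "; ".join(problems))
    directory = os.path.dirname(os.path.abspath(path))
    # Write to a temporary file in the same directory, then swap it in with
    # os.replace, which is atomic on the same filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "w") as fh:
            fh.write("\n".join(candidate) + "\n")
        os.replace(tmp_path, path)
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

Because the checks run before anything is written, a truncated or malformed list fails loudly and the previous list stays in place. The size-ratio check is a heuristic. A legitimate large reduction in proxy IPs would need a manual override, which would be a deliberate trade-off for this kind of safeguard.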
Security tooling is being reviewed to determine whether site and content inactivation can be decoupled from the user inactivation workflow.