Root Cause Analysis
This incident was caused by human error: a database migration was started on our production database while the daily backup was still running. The two operations overlapped, and the migration process conflicted with the backup operation.
Timeline
11:08 UTC - We began investigating reports of issues.
11:11 UTC - We received confirmation that users were no longer receiving server errors.
Resolution
To fix this issue, our team rescheduled the database migration to an earlier time so that it no longer coincides with the backup operation. We also updated our runbook for database migrations to include a step that checks whether a backup is running before proceeding.
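The backup check described above could also be enforced in tooling rather than left as a manual step. The following is a minimal sketch of such a guard, not our actual implementation. It assumes the backup job creates a lock file when it starts and removes it when it finishes. The lock-file convention and the names `lock_file_check`, `run_migration_if_safe`, and `BackupInProgressError` are hypothetical and chosen only for illustration.

```python
import os
from typing import Callable, TypeVar

T = TypeVar("T")


class BackupInProgressError(RuntimeError):
    """Raised when a migration is attempted while a backup is running."""


def lock_file_check(lock_path: str) -> Callable[[], bool]:
    """Build a check that reports a backup as running while lock_path exists.

    Assumption: the backup job creates this file on start and deletes it
    on completion. Replace this with whatever signal the backup system
    actually exposes.
    """
    return lambda: os.path.exists(lock_path)


def run_migration_if_safe(
    backup_is_running: Callable[[], bool],
    migrate: Callable[[], T],
) -> T:
    """Run migrate() only if no backup is currently in progress."""
    if backup_is_running():
        # Refuse to start instead of waiting, so the operator decides
        # whether to retry later or reschedule.
        raise BackupInProgressError(
            "A backup is running; the migration was not started."
        )
    return migrate()
```

An operator script would pass the real migration entry point as `migrate`, for example `run_migration_if_safe(lock_file_check("/path/to/backup.lock"), apply_migration)`, where both the path and `apply_migration` are placeholders. A guard like this only checks at start time; it does not prevent a backup from starting after the migration has begun, which is why the schedule change remains necessary.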
Remediation Items
- Reschedule database migration to an earlier time [Completed]
- Update runbook to include a check for running backups before initiating a migration [Completed]