Services in US1 region are unavailable
Incident Report for Scanii.com
Postmortem

Postmortem on 2022-03-29 Outage

What happened

Last night (EDT), Scanii.com suffered an outage of our US1 region, taking down our API endpoints api-us1.scanii.com and api.scanii.com as well as our management UI at https://www.scanii.com. The outage lasted 1 hour and 13 minutes.

The outage was caused by a mistake in a Terraform infrastructure-as-code file that triggered an unnecessary replacement of our services in the AWS us-east-1 region. Unfortunately, once the services were destroyed, recreating them failed because of an inconsistency between Terraform's view of the infrastructure and how that infrastructure actually existed in the region.

Once the cause of the failed redeploy was identified, the team was able to correct the issue, complete the redeploy and bring our services back online.

What we’re doing to prevent this from happening again

We’re revamping our process for Terraform changes to minimize reliance on an engineer spotting a potentially dangerous resource destruction during review. That approach was effective when we had hundreds of resources, but it is impractical at the size of our infrastructure today, with thousands of resources across multiple regions and two cloud providers.
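One way to take the human out of that loop is to have CI reject any plan that would destroy resources unless it is explicitly approved. The following is a minimal sketch of that idea, not a description of our actual tooling. It assumes the plan is exported with `terraform plan -out=tfplan` followed by `terraform show -json tfplan > plan.json`, and it reads the `resource_changes[].change.actions` field of Terraform's JSON plan format, where a replacement appears as `["delete", "create"]` or `["create", "delete"]`. The script name `check_plan.py` and the functions `destructive_changes` and `report` are hypothetical.

```python
import json
import sys


def destructive_changes(plan: dict) -> list[tuple[str, list[str]]]:
    """Return (address, actions) for every resource the plan would destroy.

    In Terraform's JSON plan format, each entry of `resource_changes` has
    `change.actions`; a replacement is ["delete", "create"] or
    ["create", "delete"], and a plain destroy is ["delete"].
    """
    flagged = []
    for rc in plan.get("resource_changes", []):
        actions = rc.get("change", {}).get("actions", [])
        if "delete" in actions:
            flagged.append((rc["address"], actions))
    return flagged


def report(plan: dict) -> int:
    """Print each destructive change and return a CI exit code (1 = block)."""
    flagged = destructive_changes(plan)
    for address, actions in flagged:
        kind = "replaced" if len(actions) > 1 else "deleted"
        print(f"BLOCKED: {address} would be {kind} ({', '.join(actions)})")
    return 1 if flagged else 0


if __name__ == "__main__":
    # Hypothetical CI usage: python check_plan.py plan.json
    if len(sys.argv) == 2:
        with open(sys.argv[1]) as f:
            sys.exit(report(json.load(f)))
    print("usage: check_plan.py PLAN_JSON", file=sys.stderr)
```

Gating on any `delete` action, rather than asking a reviewer to scan the full plan output, means an unintended replacement fails the pipeline by default; an intentional destroy would then go through a separate, explicit approval step, which this sketch leaves out.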

As always, we apologize to all of our customers for the trouble this incident may have caused. If you have any questions or concerns, please don’t hesitate to reach out to us at support@uvasoftware.com.

Posted Mar 30, 2022 - 19:52 EDT

Resolved
Issue resolved; all services are back to normal. We will keep monitoring things for a bit and work on a postmortem to be published later this week.
Posted Mar 29, 2022 - 22:40 EDT
Investigating
We are currently investigating this issue.
Posted Mar 29, 2022 - 21:37 EDT
This incident affected: Management Portal (scanii.com) and API Endpoints (api-us1.scanii.com).