Services in US1 region are unavailable

Incident Report for Scanii.com

Postmortem

Postmortem on 2022-03-29 Outage

What happened

On the night of 2022-03-29 (EDT), Scanii.com suffered an outage of our US1 region that took down api-us1.scanii.com, api.scanii.com and our management UI at https://www.scanii.com. The outage lasted 1 hour and 13 minutes.

The outage was caused by a mistake in a Terraform infrastructure-as-code file that led to our services in the AWS us-east-1 region being replaced unnecessarily. Once the services were destroyed, recreating them failed because Terraform’s view of the infrastructure did not match the infrastructure that actually existed in that region.

Once the cause of the failed redeploy was identified, the team was able to correct the issue, complete the redeploy and bring our services back online.


What we’re doing to prevent this from happening again

We’re revamping our process for Terraform changes to minimize the reliance on an engineer spotting a potentially dangerous resource destruction. Manual review was effective when we had hundreds of resources, but it is impractical at the size of our infrastructure today, with thousands of resources across multiple regions and 2 cloud providers.
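
As an illustration only, and not a description of the tooling we are building, an automated guard of this kind could read the machine-readable plan that `terraform show -json <planfile>` produces and flag every resource whose planned actions include a delete. The function names and the `approved` allow-list below are hypothetical, chosen for this sketch.

```python
import json

def find_destructive_changes(plan: dict) -> list:
    """Return (address, kind) for each resource the plan would destroy or replace.

    `plan` is the parsed output of `terraform show -json <planfile>`. Each entry
    in `resource_changes` carries `change.actions`, for example ["update"],
    ["delete"], or ["delete", "create"] for a replacement.
    """
    flagged = []
    for rc in plan.get("resource_changes", []):
        actions = rc.get("change", {}).get("actions", [])
        if "delete" not in actions:
            continue
        # A delete paired with a create means the resource is being replaced.
        kind = "replace" if "create" in actions else "destroy"
        flagged.append((rc["address"], kind))
    return flagged

def unapproved_destruction(plan_json: str, approved=frozenset()) -> list:
    """Return destructive changes whose address is not in the allow-list.

    `approved` is a hypothetical set of resource addresses that a reviewer
    has explicitly signed off on destroying or replacing.
    """
    plan = json.loads(plan_json)
    return [(addr, kind) for addr, kind in find_destructive_changes(plan)
            if addr not in approved]
```

A pipeline step could call `unapproved_destruction` on the plan and refuse to apply when the returned list is non-empty, so that a destruction or replacement has to be approved by name rather than noticed in a long diff.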

As always, we apologize to all of our customers for the trouble this incident may have caused. If you have any questions or concerns, please don’t hesitate to reach out to us at support@uvasoftware.com.

Posted Mar 30, 2022 - 19:52 EDT

Resolved

Issue resolved; all services are back to normal. We will keep monitoring for a while and will publish a postmortem later this week.

Posted Mar 29, 2022 - 22:40 EDT

Investigating

We are currently investigating this issue.

Posted Mar 29, 2022 - 21:37 EDT

This incident affected: Administrative (Management Portal (scanii.com)) and API Endpoints (api-us1.scanii.com).