Server outage

Ben

When & what was affected?
From 3pm to 6pm GMT, the server matrix.acter.global, and with it the Acter App, was unavailable.

What happened
Due to a problem with Apple Push Notifications, we had to update the server infrastructure. While doing so, we noticed that some database upgrades were due. On the staging instance they ran without problems and the server came back up quickly, so we decided to run them on the main server as well, underestimating how much larger the upgrade would be there. We put the server into maintenance mode and started the upgrade. About an hour in, without much visible progress, we decided to cancel it and postpone it to a better time. Unfortunately, restoring the previous state took almost another hour, so the main server remained down until everything was restored.

What was compromised/lost?
Nothing was compromised or lost. All data could be restored without problems.

Measures taken to prevent this in the future?
For the time being, we will only run database upgrades in pre-scheduled time frames with low activity and no longer do them "on the go".
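
One way such a rule could be enforced is a small guard that an upgrade script calls before it starts. The following is only a sketch under assumptions: the window hours (02:00 to 05:00 GMT) and the function name `in_maintenance_window` are hypothetical and not taken from our actual setup.

```python
from datetime import datetime, time, timezone

# Hypothetical low-activity window in GMT/UTC; the real hours are not
# stated in this report and would have to be chosen from usage data.
WINDOW_START = time(2, 0)
WINDOW_END = time(5, 0)


def in_maintenance_window(
    now: datetime,
    start: time = WINDOW_START,
    end: time = WINDOW_END,
) -> bool:
    """Return True if the timezone-aware `now` falls inside the window (UTC)."""
    current = now.astimezone(timezone.utc).time()
    if start <= end:
        # Ordinary window within a single day, end exclusive.
        return start <= current < end
    # Window that wraps past midnight, e.g. 23:00 to 02:00.
    return current >= start or current < end


if __name__ == "__main__":
    if in_maintenance_window(datetime.now(timezone.utc)):
        print("Inside the maintenance window, upgrade may proceed.")
    else:
        print("Outside the maintenance window, refusing to start the upgrade.")
```

A check like this does not shorten an upgrade, but it would stop one from being started in the middle of the afternoon.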