December 19, 2022

Cloud storage migration with zero downtime


Curai’s mission is to “provide the world’s best healthcare to everyone”. Our platform allows patients to chat with providers about both urgent care and primary care concerns.

When a patient uploads a photo to their doctor during a chat, we save the photo to AWS’s Simple Storage Service, commonly referred to as S3. Because much of the codebase and infrastructure that originally powered Curai Health was inherited (see blog post here), the S3 bucket where these photos were stored was located in a legacy AWS account. Nearly all resources in this AWS account had been either deprecated or migrated to our primary product engineering AWS account before I joined Curai. This bucket was the last lingering resource, and migrating it would let us permanently close the legacy account. The photos were also encrypted prior to upload using custom code, which meant we had to maintain an encryption key by hand.

We wanted to move the existing photos over to our current AWS account, remove the custom encryption, and rely on S3’s built-in functionality to encrypt files at rest. The migration could not cause any downtime: if a doctor wanted to view a photo from their chat with a patient, it had to work regardless of where the photo was stored.

First, we created a new S3 bucket in the non-legacy AWS account. At Curai, we use Terraform to provision resources, and leverage Terraform workspaces and variables to create one bucket per environment. This allows us to develop and test against non-production data in dev and staging environments.

To ship this quickly, we wanted to minimize changes to application code. The existing legacy code established connections to AWS using user credentials saved as environment variables (in particular, AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY). We extended this pattern by renaming our existing credentials and adding a second set: one set to access the new account, alongside the existing set that accesses the legacy account. These credentials were stored in Terraform Cloud and AWS Secrets Manager. Another concern of ours was right-sizing permissions: the legacy account credentials were overly broad and allowed access to a whole host of unrelated AWS resources. To address this, we created a user with read, write, list, and delete access to this single S3 bucket. We created an additional user specifically for local development. This user has access only to the S3 bucket in the development environment, and its credentials are shared with the development team.

With the credentialing and permissions concerns out of the way, we prepared to make actual changes to our application code. The chat photos implementation uses two endpoints: one to retrieve photos and another to upload photos.
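To make the right-sized permissions concrete, here is a minimal sketch of an IAM policy document that grants read, write, list, and delete on a single bucket and nothing else. It is an illustration, not the policy we actually deployed: the function name and the bucket name are placeholders, and although our provisioning went through Terraform, the sketch uses Python purely to show the shape of the JSON.

```python
import json


def photo_bucket_policy(bucket_name: str) -> dict:
    """Build an IAM policy document scoped to one S3 bucket.

    The function name and bucket name are hypothetical; only the IAM
    action names and ARN formats are standard AWS.
    """
    bucket_arn = f"arn:aws:s3:::{bucket_name}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Listing is a bucket-level action, so it targets the bucket ARN.
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": [bucket_arn],
            },
            {
                # Read, write, and delete are object-level actions,
                # so they target the objects under the bucket.
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
                "Resource": [f"{bucket_arn}/*"],
            },
        ],
    }


if __name__ == "__main__":
    # "example-chat-photos-dev" is a placeholder bucket name.
    print(json.dumps(photo_bucket_policy("example-chat-photos-dev"), indent=2))
```

Splitting the statement in two matters because `s3:ListBucket` applies to the bucket itself, while the get, put, and delete actions apply to the objects inside it.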

To start, we modified the retrieve endpoint handler to attempt to retrieve photos from the environment-specific Curai S3 bucket. If the object was not found, we then looked in the legacy bucket. After releasing this fallback, we updated the upload endpoint handler to write only to the new data store (the environment-specific Curai S3 bucket).
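The read-fallback and write-to-new pattern described above can be sketched roughly as follows. This is not our production handler code: `InMemoryStore` is a dict-backed stand-in for an S3 bucket so the example runs without AWS access, and the function names, the logger name, and the `decrypt_legacy` callable (standing in for the custom decryption) are all assumptions made for illustration.

```python
import logging
from typing import Callable, Dict, Optional

logger = logging.getLogger("chat_photos")  # hypothetical logger name


class InMemoryStore:
    """Dict-backed stand-in for an S3 bucket so the sketch runs without AWS."""

    def __init__(self) -> None:
        self.objects: Dict[str, bytes] = {}

    def get(self, key: str) -> Optional[bytes]:
        # Returns None for a missing key instead of raising.
        return self.objects.get(key)

    def put(self, key: str, data: bytes) -> None:
        self.objects[key] = data


def retrieve_photo(
    key: str,
    new_store: InMemoryStore,
    legacy_store: InMemoryStore,
    decrypt_legacy: Callable[[bytes], bytes],
) -> Optional[bytes]:
    """Look in the new bucket first, then fall back to the legacy bucket."""
    data = new_store.get(key)
    if data is not None:
        return data

    encrypted = legacy_store.get(key)
    if encrypted is None:
        return None  # the photo exists in neither bucket

    # Log every fallback so we can tell when the legacy path goes quiet.
    logger.warning("legacy_fallback key=%s", key)
    # Legacy objects carry the custom encryption, so decrypt before returning.
    return decrypt_legacy(encrypted)


def upload_photo(key: str, data: bytes, new_store: InMemoryStore) -> None:
    """Write only to the new bucket; the legacy bucket gets no new objects."""
    new_store.put(key, data)
```

Shipping the read fallback before switching writes is what keeps the cutover free of downtime: by the time a photo lands only in the new bucket, every reader already knows to look there first.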

Once new uploads were being written to and retrieved from the new data store, we migrated the existing content from the old data store. For each of these custom-encrypted files, the process was to download the file from the legacy S3 bucket, decrypt it, and upload the decrypted file to the new S3 bucket. We ran this script as a task against our various clusters and monitored closely for any corrupt files.
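A minimal sketch of that download, decrypt, and upload loop is below. It is not the script we ran: plain dictionaries stand in for the two buckets, `decrypt` is a hypothetical callable in place of the custom decryption code, and skipping keys that already exist in the new bucket is an assumption added here so the sketch is safe to rerun.

```python
from typing import Callable, Iterable, List, Mapping, MutableMapping


def migrate_photos(
    keys: Iterable[str],
    legacy: Mapping[str, bytes],
    new: MutableMapping[str, bytes],
    decrypt: Callable[[bytes], bytes],
) -> List[str]:
    """Copy photos from the legacy store to the new store, decrypting each one.

    Returns the keys that could not be migrated, so they can be inspected
    by hand rather than silently dropped.
    """
    failed: List[str] = []
    for key in keys:
        if key in new:
            # Already present (migrated earlier or uploaded after cutover):
            # never overwrite it.
            continue
        try:
            new[key] = decrypt(legacy[key])
        except Exception:
            # A corrupt or undecryptable file is recorded, not fatal,
            # so one bad object does not abort the whole run.
            failed.append(key)
    return failed
```

Because S3 encrypts at rest on the server side, the object is written to the new bucket as plain decrypted bytes; nothing in the loop needs to re-encrypt it.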

The proof, as they say, is in the pudding, and we wanted to make sure that this migration was actually successful. We had instrumented our application code with logging that recorded every request that fell back to the legacy bucket. After running the migration, we kept an eye on these logs to make sure no requests were taking the fallback path, which we confirmed using CloudWatch log aggregations.
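The check itself amounts to counting fallback log events over time and waiting for the count to reach zero. We did this with CloudWatch log aggregations; the sketch below is only a local stand-in for the same idea, and it assumes a hypothetical log format in which each line starts with an ISO-8601 timestamp and fallback events contain the marker `legacy_fallback`.

```python
from collections import Counter
from typing import Iterable

FALLBACK_MARKER = "legacy_fallback"  # hypothetical marker written by the app


def fallback_counts_by_day(log_lines: Iterable[str]) -> Counter:
    """Count fallback events per calendar day.

    Assumes each line begins with an ISO-8601 timestamp, so the first
    ten characters are the date (YYYY-MM-DD).
    """
    counts: Counter = Counter()
    for line in log_lines:
        if FALLBACK_MARKER in line:
            counts[line[:10]] += 1
    return counts
```

An empty result over a long enough window is the signal that the fallback path, and the legacy bucket behind it, can be retired.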

Next came the best part of any project: deleting dead code! We removed the fallback path, as well as the home-grown encrypt/decrypt functionality. We were also able to clean up the stored API keys that had been used to connect to legacy S3. Finally, after one last audit, we deleted all resources, closed down the legacy AWS account, and celebrated this grand occasion with ice cream.

Want to join the Curai team? We’re hiring for several roles — check out our careers page here: https://www.curaihealth.com/careers



