Hitesh Jawa
May 12, 2021

2020–21 Internships at Curai

We haven’t talked about our engineering internships in a while (see our previous post here). A lot has changed since then! The engineering team has grown from 15 to 25 people, Curai Health as an organization from 30 to 51 people, and we are now a remote-first company with employees all around the country. Our processes have evolved for this remote-first reality with more focus on documentation, smoother onboarding, and clear and consistent expectations for a successful internship. We accept interns year-round on all 4 engineering teams (Applications, Platform and Infrastructure, AI Engineering, and AI Research). Each internship typically lasts 12 weeks and each intern is assigned a dedicated mentor. A successful internship at Curai means 2 things: 1) leveraging your strengths to launch impactful features for Curai, working collaboratively with the rest of the engineering team, and 2) learning something new that will be relevant to your career.

Over the last 18 months, our teams hosted 8 interns, each of whom got to launch features and to learn quite a bit along the way. 3 of them joined us as full-time engineers after the completion of their internship. When gathering post-internship feedback, all of them consistently highlighted: 1) Curai’s mission to ‘provide the world’s best healthcare to everyone’, 2) all the wonderful people of Curai who made them feel at home, and 3) the valuable ongoing mentorship and guidance they received from their mentors.

Intern Scavenger Hunt Extravaganza at Curai Health

Enjoy reading about their work below, and if you are interested in helping us with our mission, whether through an internship or one of our other open roles, please read more about Curai culture and the open positions here.

1. Automated Patient History Gathering

Shreya Ravi, Stanford

My name is Shreya, and I am a rising junior studying computer science at Stanford. This summer I had the opportunity to design, build, test, and deploy an automated patient history gathering system under the guidance of my mentor, Peter Lu. To drive down the cost of virtual visits for patients, it is important to reduce the amount of time a provider (a clinical associate or a doctor) spends with each patient while maintaining quality of care. My objective, therefore, was to gather basic information about the patient up front and store it in the patient’s medical record, which the provider reviews when providing care.

Starting from a Figma visual design mock of the question and its associated chat message, I built the component with React Storybook and integrated it into the app. To structure the history the patient provided, we limited the options from which the patient could choose (e.g. allergies could only come from a list of known allergies we provided). This covers most cases, and providers can follow up about anything the patient was not able to enter. I made one API call to the backend to get the appropriate list of medical conditions, allergies, etc. for each history gathering question, and another to get the patient’s known conditions, allergies, etc. Together, this data gives the patient the tools to easily inform their care provider about their medical history. Finally, I updated the patient’s medical chart using the FHIR specification, a standard for storing and sharing healthcare information electronically. I created FHIR resources for each condition, allergy, etc. and committed them to the existing database that stores patients’ medical charts. This database is queried on each visit to display the patient’s chart to the care provider, so these changes persist across visits.
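To give a sense of what such a resource looks like, here is a hand-written sketch of a FHIR R4 AllergyIntolerance resource as a plain Python dict. The patient reference and the penicillin example are illustrative only, not Curai’s actual code or data:

```python
# Sketch of a FHIR R4 AllergyIntolerance resource built as a plain dict.
# The patient reference and SNOMED CT coding below are illustrative examples.
allergy = {
    "resourceType": "AllergyIntolerance",
    "clinicalStatus": {
        "coding": [{
            "system": "http://terminology.hl7.org/CodeSystem/allergyintolerance-clinical",
            "code": "active",
        }]
    },
    "code": {
        "coding": [{
            "system": "http://snomed.info/sct",
            "code": "91936005",  # SNOMED CT concept for allergy to penicillin
            "display": "Allergy to penicillin",
        }],
        "text": "Penicillin allergy",
    },
    # Reference to the patient whose chart this resource belongs to
    "patient": {"reference": "Patient/example"},
}
```

Because every condition and allergy is a standalone resource like this, committing them to the chart database keeps the record interoperable with any other FHIR-aware system.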

Past medical history gathering

After putting these parts together, the patient history gathering feature was ready to deploy. There were many exciting technical aspects to this project, but what made the experience unique was that I was responsible for every aspect of the feature, from design through deployment. I worked closely with others on the Curai team to achieve this, which both accelerated my learning and made my contribution to the product feel more impactful.

Working at Curai this summer was an incredibly insightful and rewarding experience. Despite working from home due to the pandemic, I was able to connect with many teammates, whether through smaller group meetings, happy hour games on Fridays, or 1-on-1 meetings with my mentor Peter, manager Hitesh, and many others on the Curai team. What was especially comforting was that there was space at all-hands meetings on Fridays to ask questions about the direction of the company, why decisions were made, and what steps were being taken to promote diversity and inclusion. I’m excited to see Curai’s platform grow!

2. Cloud ML Infrastructure

Luke Qin, Duke University

Hi, I’m Luke. I am studying computer science at Duke University. This past summer I worked on moving Curai’s machine learning model training, experimentation, and deployment to the cloud under the guidance of my mentor, Matt Willian.

I investigated two leading cloud providers, Microsoft Azure and Amazon AWS, evaluating both against our primary use case: an end-to-end model retraining pipeline. After some quick experimentation with both, I settled on AWS, primarily because I found it more mature for our use cases. The next step was to design the pipeline, specifying stages that would be pluggable with the existing codebase and desired functionality.

The primary components of the pipeline were: (1) generating an updated training dataset, (2) retraining the existing model, (3) evaluating model performance, and (4) deploying the updated model. I decided to use the AWS Step Functions Data Science Python SDK to orchestrate the first 3 stages of the pipeline, and used this notebook as inspiration for how our Step Functions pipeline would be implemented.

  1. Generate Dataset: The first step of our training pipeline loads data from our production database in BigQuery into S3 buckets. I realized we could leverage AWS Glue functions and our existing pre-processing Python scripts to accomplish this.
  2. Train Model: We then pass the dataset as an input to our training step and run a custom, model-specific training script. The resulting updated model is saved to an S3 bucket and deployed as an endpoint in a pre-existing AWS PyTorch container. Because our use cases require additional logic on top of a simple prediction output (e.g. applying the softmax function, or taking the top k ranked entities for an entity linking model), we take advantage of the input and output functions SageMaker provides for PyTorch models and add the additional logic to our output function.
  3. Evaluation: We create another Glue function that runs a custom evaluation script and publishes the evaluation metric results to a CloudWatch dashboard, which lets ML engineers determine whether the retrained model should be shipped to production.
  4. Deployment: We then deploy the retrained model as an inference pipeline. We used this notebook to guide our implementation.
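The extra post-processing described in the training step (softmax plus top-k ranking) is the kind of logic that would live in the endpoint’s output function. Here is a pure-Python sketch of that math; it is illustrative only, not Curai’s actual SageMaker inference code:

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of raw model scores."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def top_k(entities, logits, k=3):
    """Return the k highest-probability (entity, probability) pairs,
    as an output function might do before serializing the response."""
    probs = softmax(logits)
    ranked = sorted(zip(entities, probs), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]

# e.g. an entity linking model scoring candidate concepts;
# highest-probability entities come first
print(top_k(["asthma", "cough", "fever"], [2.0, 0.5, 1.0], k=2))
```

Keeping this logic server-side in the endpoint means callers receive ready-to-use ranked entities rather than raw logits.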

Training and inference scripts stored in a SageMaker model object

Using the above design, I executed multiple successful experiments using AWS SageMaker. I adapted 2 existing models (Summarization and Entity Linking) to this cloud ML paradigm, spending significant time refactoring their monolithic training code along the way. This also gave me the opportunity to work with 2 other ML engineers along with my mentor Matt. This work helped chart a path for solving the infrastructure and scale challenges the ML team was facing. Finally, I took another model (Word Embeddings), converted it into a service, and launched it to production to solve a performance bottleneck on the API server, reducing boot-up time by more than 50%.

3. Medications and Allergies Workflow

Winnie

Hi, I’m Winnie. Prior to Curai, I spent several years working in pharmacy before making the switch to software development through a coding bootcamp. Like many others who join the healthcare field, I wanted to help people in some way. One of the things we are building at Curai is a modern electronic health record (EHR) that incorporates machine learning and intuitive functionality to assist providers with charting and diagnosis. A big problem providers face is that charting is essential but time consuming. By harnessing AI, Curai’s EHR can auto-chart findings and reduce the burden on providers.

Throughout my internship, I got to touch many areas of our EHR system with the goal of improving efficiency and product usability, so that our providers can spend more time on patient care rather than filling out a patient record. My main internship project was to update the medications and allergies workflow. My solution reduced the number of steps needed for a provider to view or update a patient’s allergies or medications by completely removing the use of modals in this process.

A few of the highlights of my internship:

  1. Developed a feature fully from conception to production. I provided feedback on the design and coded both the frontend (React components) and the backend (Flask).
  2. Shipped numerous usability improvements to minimize the time and steps our providers need to accomplish certain actions. One change reduced an extremely manual multi-click process to just 2 clicks, so providers can spend less time charting and more time on patient care.
  3. Designed for the long term. For example, although the allergy and medication components in our EHR follow a similar workflow, separating them makes the code much simpler to maintain and test in the future.

One of my biggest learnings during my internship was how to code in the context of a team. The features I built needed to work with all the features my teammates were working on. I wasn’t working in a silo, and I needed to ensure that my code didn’t break other things when it ran. This ingrained in me the importance of testing and of writing code that others can easily test, understand, and maintain.

Inline allergy panel that replaced multiple modal dialogs

At the beginning of my internship, I was concerned about the challenge of getting to know the team in a fully remote environment. It turns out I had nothing to worry about. One of my favorite parts of the internship was getting to know everyone; there was no shortage of jokes, memes, and witty comments. The team made it apparent from day one that questions were encouraged, which made the herculean task of onboarding into my very first software engineering role much less intimidating. I’m excited to report that at the end of my internship I was offered a full-time role, which I accepted, so I can continue to learn while contributing to a mission that I believe in.

4. Net Promoter Score (NPS)

Melissa Rauch, Hackbright Academy

Hi, I am Melissa. Last fall I took an internship with Curai Health on the Product Engineering team. As a recent graduate of Hackbright Academy, I was both thrilled and quite nervous, having just made a pretty huge career change from midwifery to software engineering. During my internship, I was eager to work on a project where I could build on the foundational full-stack engineering knowledge I had gained during bootcamp and make a meaningful contribution to the Curai Health app. Lucky for me, there was just such a project waiting for me to dive into: building an in-app Net Promoter Score tool to help us expand into enterprise partnerships.

Net Promoter Score

Net Promoter Score (NPS) is an industry-standard metric for capturing customer satisfaction with a product or service. Adding NPS to our app was an important milestone for showing the impact of our work, and I was excited about the opportunity to own something so impactful during my internship.

Some of the highlights of this project:

  1. Writing a technical solution proposal complete with detailed workflows and timelines.
  2. Leveraging FHIR (Fast Healthcare Interoperability Resources) to support data interoperability when creating the new NPS questionnaire.
  3. Configuring a React Native Slider to capture the user’s NPS score, and adding it to our Design System to make it available for use in other components.
  4. Factoring user behavior into how and when we show the NPS survey.
  5. Executing a full-stack project from start to finish as the lead engineer.

Beyond the NPS project, I had time for a few side projects: contributing to engineering onboarding documentation, refactoring and revamping the Curai Health support page, QAing exciting new projects prior to their launch, fixing bugs, and reviewing PRs. This assortment of side projects rounded out my internship by teaching me the importance of work that supports your teammates, of being on the ball to improve processes, and of actively participating in the bigger picture.

To be quite honest, my favorite part of this internship wasn’t necessarily the project I worked on or the new skills I learned; it was the people I met and the place that Curai is. While I started my internship fully remote, only seeing my coworkers on Google Meet and never stepping into an office, I felt like I had found my place in the tech world, surrounded by some of the most brilliant minds with the biggest hearts. Lucky for me, I was offered a conversion to full time, so I didn’t have to say goodbye at the end of my internship.
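For context, NPS is conventionally computed as the percentage of promoters (scores 9–10) minus the percentage of detractors (scores 0–6), on a scale from -100 to 100. A minimal sketch of that arithmetic, not Curai’s actual implementation:

```python
def nps(scores):
    """Compute Net Promoter Score from a list of 0-10 survey responses.

    NPS = %promoters (9-10) - %detractors (0-6); passives (7-8) only
    dilute the percentages. Result ranges from -100 to 100.
    """
    if not scores:
        raise ValueError("no survey responses")
    promoters = sum(1 for s in scores if s >= 9)
    detractors = sum(1 for s in scores if s <= 6)
    return round(100 * (promoters - detractors) / len(scores))

# Example: 5 promoters, 3 passives, 2 detractors out of 10 responses -> 30
print(nps([10] * 5 + [8] * 3 + [3] * 2))
```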
I am now on the Product Engineering team as a full-time Software Engineer.

5. Adding an Additional Question Type

Laura Miller, Hackbright Academy

Hey! I’m Laura, and for the last 12 weeks I’ve had the pleasure of working as an Engineering Intern on the automation team! The goal of automation at Curai is to automate parts of the workflow between health care provider and patient. One way we’ve worked toward that goal is by building a service called Question Serving (QS), which allows our providers to easily send batches of questions to patients. With QS, we can automatically chart a patient’s answers to their health record. The newly charted findings are then sent through our AI models to generate follow-up questions and suggested diagnoses! This process reduces the time a provider needs to spend asking questions, searching for diagnoses, and charting responses. We are continuously improving the question serving flow to make it more accurate, faster, and easier to use.

When I started my internship, we could send two different types of questions: FreeText and SingleTernaryChoice. FreeText is our default type and allows patients to respond to a question by typing directly in the chat. SingleTernaryChoice gives the patient yes, no, and unsure response options; we can then use the response to automatically chart symptoms or findings as present, absent, or unknown. My internship project was to build a third question type, SingleSelect, which would allow us to present multiple options to our patients. With this new type, we can chart more specific symptoms. For example, instead of the earlier “Do you smoke? (Y/N)”, we could send this question along with four potential response options:

“Which best describes your current situation regarding smoking or vaping?”
  “I smoke every day”
  “I smoke some days”
  “I quit smoking”
  “No smoking history”

Building SingleSelect came with its own set of challenges.
First, we needed to refactor our existing question serving infrastructure both to support this new type and to let us add additional types more easily in the future. We also needed to determine which questions would leverage the new SingleSelect type and craft response options for each of them. There was also a new UX requirement: we needed to display up to six possible options for a question while keeping the existing question types cohesive with the new system. By the end of the project, I had touched almost every aspect of our code base and had a usable feature ready to launch to production!
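The three question types can be modeled roughly as follows. All names and fields here are a hypothetical sketch, not Curai’s actual Question Serving schema:

```python
from dataclasses import dataclass

# Hypothetical sketch of the three question types described above;
# names and fields are illustrative, not the real QS data model.

@dataclass
class FreeText:
    text: str  # default type: patient answers by typing in the chat

@dataclass
class SingleTernaryChoice:
    text: str
    # yes/no/unsure responses map onto charting a finding as
    # present/absent/unknown
    CHART = {"yes": "present", "no": "absent", "unsure": "unknown"}

@dataclass
class SingleSelect:
    text: str
    options: list  # up to six response options shown to the patient

smoking = SingleSelect(
    text="Which best describes your current situation regarding smoking or vaping?",
    options=[
        "I smoke every day",
        "I smoke some days",
        "I quit smoking",
        "No smoking history",
    ],
)
```

Modeling each type as its own class is one way to make adding a fourth type later a matter of defining a new class rather than threading flags through the existing ones.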

Multiple question types

The internship experience at Curai is unlike any I’ve had before! I felt like a full team member almost immediately, which, in my opinion, is a particularly challenging feat for an intern in this newly remote world. I had the opportunity (and support!) to explore and do a little bit of everything, which has been the best part of working at a startup. Curai has been such a positive environment in which to start my software engineering career. I’m very grateful to have spent time at a company with such a meaningful mission, alongside talented teammates who are passionate about their work!

6. AWS SageMaker Inference

Tom Joshi, Columbia Engineering

As a Machine Learning Engineer Intern, my projects focused on the intersection of machine learning, data infrastructure, and natural language processing. As starter tasks, I incorporated Semgrep into our CI/CD pipeline and improved Curai’s language engine to remove duplicated output; these helped me get comfortable with the dev and cloud environments.

My main project was deploying one of Curai’s classification models to AWS SageMaker. In the old implementation, the model weights and architecture were checked into GitHub with the rest of our server software so that they could be imported into the backend server. However, we want to start decoupling model deployments from backend server deployments, so that each can move at its own pace. A SageMaker endpoint offers an isolated environment with preinstalled machine learning libraries, and from such an endpoint, learning loops and A/B testing can eventually be implemented. By porting the model to an AWS SageMaker endpoint, we decouple model deployments from backend server deployments.

SageMaker Inference for Curai models

There were a number of considerations, including no longer having local access to our knowledge base, text preprocessing files, and embeddings endpoint. Major refactoring of the main components in our data pipeline was therefore needed to decouple the model’s SageMaker environment from the rest of the code base.

The final solution includes a programmatic process for uploading and deploying the model code, depicted below. We have two buckets: one contains our model, knowledge base data, and any other required files; the other contains data, the endpoint definition, and text preprocessing files. You tar these files together and use the upload script to push them to an AWS S3 folder. Then you run our deploy script, which takes the files you defined in Amazon S3 and creates a SageMaker endpoint.

This internship was an incredible learning experience. I gained experience in both product-focused machine learning and natural language processing, and felt I was having real impact on Curai users. Finally, machine learning infrastructure was at the core of this project, and I produced an ML design document so that everyone on the automation team could see how I was thinking about the problems. Curai Health is a phenomenal place to learn and grow as a young engineer!

7. Knowledge Base Metrics

Vivi Nguyen, University of California, Berkeley

Hi! I’m Vivi, and this spring I was an intern on the Machine Learning team. My main project was to equip Curai’s medical knowledge base (KB) with metrics, and to develop visualizations that make those metrics understandable. The KB is a critical part of Curai’s technology: it powers our machine learning models and the questions generated for patients during chat, and it is responsible for autocompletion in the electronic health record (EHR). Alongside the KB is Xray, the browser and editor for the KB, built using SQLAlchemy and Flask-Admin.
It’s both a database and a web app that lets people on Curai’s medical team change the contents of the KB.

Metrics give us ways to measure the coverage and adequacy of the KB content; it’s important to have data about the KB to understand how it is actually doing! A few months ago, there wasn’t much visibility here: if someone needed data about the KB, they had to manually write scripts to parse through JSON data (not a great situation).

In my first iteration of the Xray dashboard, I focused on querying general metrics about the concepts in the KB, to answer questions like: How many concepts are there? When was the KB last modified? These metrics provided a base layer of information about the KB. Then I explored additional metrics about the concepts themselves, to answer questions like: How many concepts are there of each type (Finding, Disorder, etc.)? How many synonyms does each concept have? I also created visualizations using Chart.js to make the metrics understandable, and added a dropdown menu to filter by different criteria.
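The aggregations behind these metrics are simple once the data is queryable. A minimal sketch with a hypothetical in-memory concept list, standing in for Xray’s actual SQLAlchemy queries:

```python
from collections import Counter

# Hypothetical concept records; in Xray the real KB is queried via SQLAlchemy.
concepts = [
    {"name": "cough", "type": "Finding", "synonyms": ["coughing"]},
    {"name": "asthma", "type": "Disorder",
     "synonyms": ["bronchial asthma", "reactive airway disease"]},
    {"name": "fever", "type": "Finding", "synonyms": []},
]

def concepts_by_type(concepts):
    """Count how many concepts exist for each concept type."""
    return Counter(c["type"] for c in concepts)

def avg_synonyms(concepts):
    """Average number of synonyms per concept."""
    return sum(len(c["synonyms"]) for c in concepts) / len(concepts)

print(concepts_by_type(concepts))  # Counter({'Finding': 2, 'Disorder': 1})
print(avg_synonyms(concepts))      # 1.0
```

Counts like these feed the dashboard charts directly, e.g. one bar per concept type.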

Knowledge Base metrics

Finally, I made the metrics actionable by adding click functionality to the visualization charts. When a user clicks on a graph, it pulls up a filtered Flask-Admin list view of all the concepts that fit the graph’s criteria, and from there the user can edit the concepts as needed. This lets Xray users not only identify “bugs” in the KB content, but also correct them. Aside from my main internship project, I also added is_a relationship hierarchies to Xray, so Xray users can edit the parent/child is_a relationships of concepts.

Overall, I had a great internship experience! Even though we were all working remotely, I was still able to get to know people and felt part of a community. I liked that I was able to take ownership of a meaningful project. I’m so grateful to my manager, Francois, and my mentor, Jo-Jo (and so many others on the team!), for all the work they put into making my internship a memorable and valuable experience.

Stay Informed!

Sign up for our newsletter to stay up to date on all Curai news and information.