As one of the largest cancer research and treatment organizations in the United States, City of Hope’s mission is to transform cancer care. Advancing this mission requires an array of cutting-edge technologies to fuel innovative treatments and services tailored for patients’ specific needs.
City of Hope deploys artificial intelligence (AI)-powered models to predict specific events that are possible during a patient’s treatment. It has built many in-house AI models, including one designed to predict and prevent the risk of sepsis for the cancer center’s most at-risk population: bone marrow transplant patients whose immune systems have been temporarily compromised for treatment.
City of Hope’s sepsis prediction model reduces sepsis and ICU escalation rates for its bone marrow transplant patients.
When using predictive models in a clinical application, the data and results need to be as close to real time as possible to ensure timeliness and accuracy. This calls for technologies like data streaming, which serves as the backbone of predictive models.
Having performed nearly 19,000 stem cell and bone marrow transplants, City of Hope has one of the largest and most successful programs of its kind in the United States. Avoiding treatment complications is central to that work, which makes patient safety of utmost importance.
Sepsis, a life-threatening infection, can rapidly progress without warning, leading to severe organ damage or death. Bone marrow transplant patients are particularly vulnerable to sepsis as a result of chemotherapy or radiation received before the transplant and immune system suppression afterward. Between 5% and 10% of transplant patients will develop sepsis, and the risk of negative outcomes in these patients is very high.
While there are out-of-the-box sepsis prediction models available, City of Hope needed to create a specialized model that could be trained and evaluated to work on bone marrow transplant patients. This brought its medical team together with the applied AI and data science teams to create a specialized predictive model, which City of Hope has used for almost three years.
One of City of Hope’s key goals is to help clinicians make data-driven clinical decisions using real-time data.
When training predictive models, City of Hope uses as much data as possible for the model’s scope and application. For the sepsis prediction model, it primarily focused on using data relevant to the treatment of inpatient bone marrow transplant patients. This includes data on patient vitals, medications, lab results, and admission, discharge, and transfer (ADT) events.
However, because sepsis can rapidly progress in severity or to septic shock, City of Hope needed a timely application that could predict the risk of sepsis using real-time information. Relying on its data warehouse for that data was no longer sufficient: traditional data warehouses are designed for batch processing, which is often run overnight. Additionally, to ensure the most accurate prediction results, City of Hope needed the model to generate predictions only after the bone marrow transplant had occurred and the cell product had been administered to the patient. A one-day lag from the data warehouse was incompatible with real-time sepsis prediction.
To address these requirements, the team built real-time infrastructure using Apache Kafka®. This infrastructure needed to be scalable and future-proof so they could add different data streams and generate predictions for acute care in real time or pseudo-real time, depending on the application.
They designed and implemented a Kafka environment to preprocess real-time data streams from medical records and to run this data through their predictive models. These models then generate predictions that are fed back into the Epic electronic health record system and trigger different clinical workflows and notifications for a doctor or a nurse if a patient needs an escalation of care.
City of Hope decided to use Kafka to create a system that was designed for data streaming and real-time data.
It uses Epic as the source of data for the sepsis prediction model. Epic has built-in trigger events that generate outbound Health Level 7 (HL7) messages in real time whenever something changes, for example, when a new lab result comes in or when a new blood pressure measurement is recorded in Epic.
After doctors and nurses input this information into Epic, the data is sent to the data integration engine Corepoint. It converts those HL7 messages to JSON, and from there the data goes into the cloud and ultimately into Kafka topics. City of Hope has five different Kafka topics: vitals, lab results, medication orders, ADT, and substance administration.
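The conversion and routing step can be illustrated with a minimal Python sketch. It is not City of Hope's or Corepoint's implementation. It assumes a pipe-delimited HL7 v2 message, and the topic names, the `VITAL_CODES` set, and the rule for splitting observation messages between vitals and labs are all hypothetical.

```python
import json

# Hypothetical topic names: the article lists the five topics but not their exact names.
TOPIC_BY_TYPE = {
    "ADT": "adt",
    "RDE": "medication-orders",
    "RAS": "substance-administration",
}
# Hypothetical observation codes treated as vitals; other ORU messages go to labs.
VITAL_CODES = {"BP_SYS", "BP_DIA", "HR", "TEMP"}


def hl7_to_record(message: str) -> dict:
    """Flatten a pipe-delimited HL7 v2 message into a JSON-friendly dict."""
    record = {"message_type": None, "patient_id": None, "observations": []}
    for line in message.strip().splitlines():
        fields = line.split("|")
        if fields[0] == "MSH":
            # MSH-9 is the message type (e.g. "ORU^R01"); it sits at index 8 after the split.
            record["message_type"] = fields[8].split("^")[0]
        elif fields[0] == "PID":
            # PID-3 is the patient identifier list; keep only the first component.
            record["patient_id"] = fields[3].split("^")[0]
        elif fields[0] == "OBX":
            record["observations"].append({
                "code": fields[3].split("^")[0],  # OBX-3 observation identifier
                "value": fields[5],               # OBX-5 observation value
                "units": fields[6],               # OBX-6 units
            })
    return record


def choose_topic(record: dict) -> str:
    """Pick a destination topic from the message type (illustrative rule only)."""
    if record["message_type"] == "ORU":
        codes = {obs["code"] for obs in record["observations"]}
        return "vitals" if codes & VITAL_CODES else "lab-results"
    return TOPIC_BY_TYPE[record["message_type"]]


# HL7 v2 segments are separated by carriage returns.
sample = "\r".join([
    "MSH|^~\\&|EPIC|COH|||20240101120000||ORU^R01|123|P|2.5",
    "PID|1||12345^^^MRN",
    "OBX|1|NM|WBC^White blood cell count||4.2|10*3/uL",
])
record = hl7_to_record(sample)
payload = json.dumps(record)   # the JSON body that would be produced to Kafka
topic = choose_topic(record)
```

In a real deployment the payload would be handed to a Kafka producer. That call is omitted here so the sketch runs without a broker.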
City of Hope’s sepsis prediction model operates on a subset of patients: those undergoing bone marrow transplant procedures. The team tracks two events: one indicating that the patient has been admitted, and another indicating that the patient has been administered a cell product during the transplant. They then use a series of Kafka topics and microservices to check those conditions, store information, and trigger the model to generate predictions. Kafka also serves as the backup solution. If the prediction model crashes, it retrieves the necessary state from the designated Kafka topic. Because Kafka's log is immutable and can be replayed, and because infinite retention keeps the full history available, processing resumes where it left off.
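The two-event gate and the recovery-by-replay idea can be sketched in plain Python. This is an illustration under stated assumptions, not the production microservice: the event names, the `EligibilityTracker` class, and the in-memory list standing in for a Kafka topic are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class EligibilityTracker:
    """Tracks which patients have satisfied both gating events."""
    admitted: set = field(default_factory=set)
    infused: set = field(default_factory=set)

    def apply(self, event: dict) -> None:
        # Event type names are hypothetical stand-ins for the two tracked events.
        if event["type"] == "admitted":
            self.admitted.add(event["patient_id"])
        elif event["type"] == "cell_product_administered":
            self.infused.add(event["patient_id"])

    def is_eligible(self, patient_id: str) -> bool:
        # Predictions are generated only once both events have been seen.
        return patient_id in self.admitted and patient_id in self.infused


def rebuild(log: list) -> EligibilityTracker:
    """Recreate state after a crash by replaying the log from the beginning."""
    tracker = EligibilityTracker()
    for event in log:
        tracker.apply(event)
    return tracker


# An append-only list stands in for the Kafka topic that stores the events.
event_log = [
    {"type": "admitted", "patient_id": "p1"},
    {"type": "admitted", "patient_id": "p2"},
    {"type": "cell_product_administered", "patient_id": "p1"},
]

live = EligibilityTracker()
for event in event_log:
    live.apply(event)

recovered = rebuild(event_log)  # simulates a restart after a crash
```

Because the state is a pure function of the ordered events, replaying the same log always yields the same tracker. This is the property that makes a retained, immutable log usable as a recovery mechanism.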
For inpatients undergoing procedures, the data is filtered, preprocessed, and combined before being sent back into a predictive model that completes the processing.
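As a rough illustration of the filter-and-combine step, the sketch below keeps only events for eligible patients and merges the most recent value of each measurement into one feature dictionary per patient. The field names (`patient_id`, `feature`, `ts`, `value`) and the latest-value rule are assumptions. The article does not describe the model's actual preprocessing.

```python
from collections import defaultdict


def build_features(events, eligible_patients):
    """Filter to eligible patients and keep the latest value of each feature."""
    latest = defaultdict(dict)  # patient_id -> {feature: (timestamp, value)}
    for event in events:
        pid = event["patient_id"]
        if pid not in eligible_patients:
            continue  # drop patients who are not in the model's scope
        previous = latest[pid].get(event["feature"])
        if previous is None or event["ts"] >= previous[0]:
            latest[pid][event["feature"]] = (event["ts"], event["value"])
    # Strip timestamps so each patient maps to a flat feature dictionary.
    return {
        pid: {name: value for name, (_, value) in feats.items()}
        for pid, feats in latest.items()
    }


# Events as they might arrive from the vitals and lab topics (illustrative values).
events = [
    {"patient_id": "p1", "feature": "heart_rate", "ts": 1, "value": 88},
    {"patient_id": "p1", "feature": "heart_rate", "ts": 2, "value": 104},
    {"patient_id": "p1", "feature": "wbc", "ts": 1, "value": 0.3},
    {"patient_id": "p2", "feature": "heart_rate", "ts": 1, "value": 70},
]
features = build_features(events, eligible_patients={"p1"})
```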
Next, the predictions go into a Kafka topic, and a microservice sends predictions back into Epic. These trigger best practice advisories within Epic for physicians or nurses to see and take action.
Confluent Cloud on Microsoft Azure enables City of Hope to better manage its real-time infrastructure. Confluent equips the organization with tools that give it visibility into the system, so the team can understand what’s going on behind the scenes and quickly fix problems. With Confluent’s rich ecosystem of connectors, it was able to connect the data to Kafka in only a couple of hours.
For a national organization of its size, even small changes can have a big impact. For example, there are different data sources for labs that process a patient’s blood culture, and each lab uses its own identifiers in its system. In spot-checking data topics using ksqlDB, the team noticed that the naming convention of white blood cell count from one of the lab sources had changed. If left unnoticed, this could have thrown off the predictive model and led to inaccurate results for patients.
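The team did this check by spot-checking topics with ksqlDB. The same idea can be expressed as a small Python sketch that counts lab records whose identifier is not in a known set, so that a renamed code surfaces instead of silently dropping out of the model's inputs. The `KNOWN_LAB_CODES` set and the record layout are hypothetical.

```python
from collections import Counter

# Hypothetical set of lab identifiers the model expects to see.
KNOWN_LAB_CODES = {"WBC", "HGB", "PLT"}


def unknown_codes(records, known=KNOWN_LAB_CODES):
    """Count records per identifier that is not in the expected set."""
    return Counter(r["code"] for r in records if r["code"] not in known)


# A lab source starts sending white blood cell count under a new name.
recent_records = [
    {"source": "lab_a", "code": "WBC", "value": 4.1},
    {"source": "lab_b", "code": "WBC_COUNT", "value": 3.9},
    {"source": "lab_b", "code": "WBC_COUNT", "value": 0.4},
    {"source": "lab_a", "code": "PLT", "value": 150},
]
drift = unknown_codes(recent_records)
```

A non-empty result is a signal to update the mapping or alert the team before the predictions are affected.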
While there’s no one-size-fits-all advice when it comes to building and deploying AI-powered systems, it’s important to build models that can easily scale and support future application development.
The key is to embrace a research-first approach to ensure the right implementation. This will also ensure you have the right internal stakeholders on your side who will back up your vision for building for the future. It’s important to nurture those relationships with various stakeholders across the organization who will be your allies and can help drive your company’s long-term growth.
There are a lot of technical aspects to be mindful of as well, especially with the scalability, reliability, and availability of the system you are building. This is especially true in a healthcare setting when patient care is on the line. Deploying Confluent has made sepsis predictions easier and has helped keep patients monitored and healthy during their cancer treatment processes.
This article was originally published on The New Stack on May 7, 2024.
Check out Confluent's AI resource hub to learn more about how you can build next-generation, data-intensive AI applications with a next-generation data streaming platform.