15. 1. 2018

Processing medical records: turning archives into actionable insights

Have you ever thought about the amount of data produced every day in a single hospital? Think of all the anamneses, daily diagnoses, laboratory reports, nursing records, X-ray reports, patient education protocols and consent forms, discharge summaries and much more. Most of these documents are written as plain text and filed away in archives, never to be read again once the treatment is over. However, that is about to change: over the past year, we have been working intensively on our platform for processing medical records.

Fortunately, most of these documents are nowadays typed on computers (even though some doctors in Czechia still prefer to write by hand). The share of typed documents will probably keep growing as governments push physicians to adopt information technologies (e.g., through the introduction of electronic prescriptions).

Going through all medical records manually is extremely demanding because hospitals produce enormous volumes of text.

We found that a hospital department with 4,000 hospitalizations per year can produce more than 100,000 standard pages of routine medical records describing these hospitalizations every year. Just one department! Suppose you wanted to read through all of these records and prepare an analysis. Even as a fast reader (50 pages an hour, 8 hours a day, 225 days a year, i.e., 90,000 pages a year), you would need more than a year just to read the texts. And this is only one department. For an entire medium-sized hospital with 20,000 hospitalizations per year, it would take more than five people a full year just to read the documentation produced. Not to mention that every further analysis of this kind would cost another five-plus person-years.
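The reading-time arithmetic above can be checked with a few lines of Python. This is a back-of-the-envelope sketch: it assumes that the page count scales linearly with the number of hospitalizations (25 pages each, derived from the 100,000 pages per 4,000 hospitalizations quoted above). Because that page count is a lower bound, the results are lower bounds too.

```python
# Reading-speed assumptions taken from the text above.
PAGES_PER_HOUR = 50
HOURS_PER_DAY = 8
WORKING_DAYS_PER_YEAR = 225

# Pages one person can read in a working year: 50 * 8 * 225 = 90,000.
PAGES_PER_PERSON_YEAR = PAGES_PER_HOUR * HOURS_PER_DAY * WORKING_DAYS_PER_YEAR

# Assumed linear scaling: 100,000 pages / 4,000 hospitalizations = 25 pages each.
PAGES_PER_HOSPITALIZATION = 100_000 / 4_000


def person_years_to_read(hospitalizations_per_year: int) -> float:
    """Person-years needed just to read one year's worth of records."""
    pages = hospitalizations_per_year * PAGES_PER_HOSPITALIZATION
    return pages / PAGES_PER_PERSON_YEAR


if __name__ == "__main__":
    for label, hosp in [("one department", 4_000), ("medium-sized hospital", 20_000)]:
        print(f"{label}: {person_years_to_read(hosp):.2f} person-years")
```

Running it prints 1.11 person-years for one department and 5.56 person-years for the whole hospital.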

Therefore, we have applied natural language processing (NLP) methods so that our platform can process medical records and extract relevant information in a fraction of the time a human would need (hours, not years). Our solution extracts details of patient history (diagnoses, symptoms, medication, etc.) directly from unstructured medical records, connects them into concise timelines and makes them available in a database or in an information discovery platform, where searching and browsing is not limited to the extracted information. We further analyze these data to identify high-risk patients or to classify hospitalizations.
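To make the idea of extracting facts and merging them into a timeline more concrete, here is a deliberately simplified Python sketch. It is not our platform's NLP pipeline: the keyword lexicon, the `Event` structure and the function names are hypothetical, and plain keyword matching stands in for the NLP models a real system would need.

```python
import re
from dataclasses import dataclass
from datetime import date

# Hypothetical toy lexicon; a real system would rely on NLP models and
# medical terminologies rather than a hand-written keyword list.
LEXICON = {
    "diagnosis": ["pneumonia", "urinary tract infection", "diabetes"],
    "symptom": ["fever", "cough", "dyspnea"],
    "medication": ["amoxicillin", "insulin", "paracetamol"],
}


@dataclass(frozen=True)
class Event:
    day: date
    category: str
    term: str
    source: str  # which record the mention came from


def extract_events(record_date: date, source: str, text: str) -> list[Event]:
    """Find lexicon terms in one free-text record (case-insensitive, whole words)."""
    lowered = text.lower()
    events = []
    for category, terms in LEXICON.items():
        for term in terms:
            if re.search(r"\b" + re.escape(term) + r"\b", lowered):
                events.append(Event(record_date, category, term, source))
    return events


def build_timeline(records: list[tuple[date, str, str]]) -> list[Event]:
    """Merge events from all records of one patient into a chronological timeline."""
    events = [e for d, src, txt in records for e in extract_events(d, src, txt)]
    return sorted(events, key=lambda e: (e.day, e.category, e.term))


if __name__ == "__main__":
    records = [
        (date(2017, 3, 2), "admission", "Admitted with cough and fever."),
        (date(2017, 3, 5), "daily note", "Pneumonia confirmed, started amoxicillin."),
    ]
    for e in build_timeline(records):
        print(e.day, e.category, e.term, f"({e.source})")
```

Running it lists the two symptoms from the admission note, followed by the diagnosis and the medication from the later note. Keyword matching ignores negation and context, so a sentence such as "no fever" would still yield a fever event; handling cases like this is why real clinical texts call for proper NLP methods.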


We are now able to detect potential healthcare-associated infections (HAI). These are infections that patients acquire while receiving treatment in a healthcare facility (e.g., ventilator-associated pneumonia or catheter-associated urinary tract infection). Such infections can be life-threatening for patients and are certainly costly for hospitals. We also look for risk factors that are likely to contribute to HAI, in order to improve preventive measures.

We can also help with the design of medical research studies: the extracted data can be browsed in our information discovery application, which makes it easy to define groups of patients with specific combinations of conditions and laboratory results. We are currently working on another use case: using information extracted from medical records to support the classification of hospitalizations into diagnosis-related groups (DRG). This can improve coding quality and increase reimbursements under DRG-based payment systems.
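Defining a group of patients from extracted data comes down to filtering on combinations of conditions and laboratory results. The Python sketch below assumes a hypothetical, already extracted per-patient summary (a set of conditions and one value per lab test). The condition names, the CRP test and the 100 mg/L threshold are illustrative only. They are neither clinical criteria nor the query interface of our application.

```python
from dataclasses import dataclass, field


@dataclass
class Patient:
    """Hypothetical per-patient summary built from extracted data."""
    patient_id: str
    conditions: set[str] = field(default_factory=set)
    labs: dict[str, float] = field(default_factory=dict)  # e.g. {"CRP": 120.0}


def define_cohort(patients, required_conditions, lab_minimums):
    """Return IDs of patients who have every required condition and whose
    lab values reach every (test -> minimum) threshold.

    A lab test missing from a patient's record counts as not meeting its threshold.
    """
    cohort = []
    for p in patients:
        # Every required condition must be present (subset test).
        if not set(required_conditions) <= p.conditions:
            continue
        # Every listed lab test must be recorded and reach its minimum value.
        if all(p.labs.get(test, float("-inf")) >= minimum
               for test, minimum in lab_minimums.items()):
            cohort.append(p.patient_id)
    return cohort


if __name__ == "__main__":
    patients = [
        Patient("P1", {"mechanical ventilation", "pneumonia"}, {"CRP": 150.0}),
        Patient("P2", {"mechanical ventilation"}, {"CRP": 200.0}),
        Patient("P3", {"mechanical ventilation", "pneumonia"}, {"CRP": 20.0}),
        Patient("P4", {"mechanical ventilation", "pneumonia"}),
    ]
    print(define_cohort(patients, {"mechanical ventilation", "pneumonia"}, {"CRP": 100.0}))
```

Only `P1` satisfies both the conditions and the lab threshold, so the script prints `['P1']`. Treating a missing lab value as a failed criterion is a conservative choice. A study design might instead report such patients separately.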