Applying machine learning techniques to electronic health records could revolutionise healthcare and insurance, say Atreyee Bhattacharyya and Kanishka Jindal
In recent years, the healthcare industry has witnessed significant growth in the use of electronic health records (EHRs); in the US, the proportion of non-federal acute care hospitals using at least a basic EHR system increased from 9.4% to 83.8% between 2008 and 2015. Government agencies are also recognising the value of EHRs: in 2014, the Indian government introduced a uniform system for the maintenance of EHRs by hospitals and healthcare providers across the nation, and the Indian insurance regulator has expressed its hopes that EHRs will benefit the industry.
EHRs provide an integrated, digitised view of patients’ medical records, capturing patients’ health conditions over time and enabling the secure exchange of information between authorised providers. Good EHRs are expected to contain patient demographics, allergies, clinical notes and diagnoses, diagnostic imaging reports and radiology images, lab and test results, administrative and billing data, regulatory compliance and legal permissions. Most importantly, they should allow data to be compiled for research and analysis, and should be time-efficient to use.
EHRs help to improve the quality of care provided by offering quick access to patients’ records and improving patient-physician communication. They can also improve co-ordination and resource planning, for example by avoiding the ordering of duplicate tests – saving costs for hospitals, patients and insurance companies.
EHR systems have also been shown to improve patient satisfaction. The biggest benefit here is experienced by patients with complex, long-term or chronic conditions. A Norwegian survey (bit.ly/NorwayEHRs) on patient-accessible EHRs has shown that most respondents think EHR data enhances knowledge of their health condition, makes it easier to control their health status, and enables better self-care, greater empowerment and easier communication with healthcare providers.
EHR data analytics also enhances disease management. Predictive modelling using EHR data can help estimate clinical outcomes such as mortality and re-admission probability. It can identify higher risk lives and hence enable preventative care for such lives. Population studies based on EHR data can help to predict disease progression by the type of condition, and thus aid disease management.
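As a rough illustration of how a predictive model might flag higher-risk lives, the sketch below scores re-admission risk with a simple logistic model. The coefficients, feature names and the 0.3 threshold are hypothetical values chosen for this example; a real model would estimate them from EHR data.

```python
import math

# Hypothetical coefficients, for illustration only. In practice these would
# be fitted to historical EHR data rather than chosen by hand.
COEFFICIENTS = {
    "intercept": -3.0,
    "age_over_65": 0.8,
    "prior_admissions": 0.5,
    "has_chronic_condition": 0.9,
}


def readmission_probability(patient: dict) -> float:
    """Return the logistic-model probability of re-admission for one patient."""
    score = COEFFICIENTS["intercept"]
    for name, weight in COEFFICIENTS.items():
        if name != "intercept":
            # Features missing from the record are treated as zero.
            score += weight * patient.get(name, 0)
    return 1.0 / (1.0 + math.exp(-score))


def flag_high_risk(patients: list, threshold: float = 0.3) -> list:
    """Return the ids of patients whose predicted risk meets the threshold."""
    return [p["id"] for p in patients if readmission_probability(p) >= threshold]


patients = [
    {"id": "A", "age_over_65": 1, "prior_admissions": 3, "has_chronic_condition": 1},
    {"id": "B", "age_over_65": 0, "prior_admissions": 0, "has_chronic_condition": 0},
]
high_risk = flag_high_risk(patients)
print(high_risk)
```

Lives flagged in this way could then be prioritised for preventative care, although the choice of threshold would depend on the cost and capacity of the intervention.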
EHRs, artificial intelligence and machine learning
The broad adoption of EHRs presents a huge opportunity for the application of artificial intelligence (AI) and machine learning (ML) techniques. Meaningful analysis of clinical notes can be performed with the help of natural language processing techniques, which analyse unstructured data and obtain meaningful information from the notes. Deep neural networks such as convolutional neural networks (CNNs) can be used in radiology image classification.
Structured and unstructured data
MIMIC-III is a large, freely available database of de-identified health-related data relating to patients who were admitted to critical care units at a large tertiary care hospital between 2001 and 2012. It contains both static and longitudinal structured data and sequential unstructured data.
Static structured data includes demographics such as gender and marital status. Temporal data includes:
- Structured: vital signs such as pulse and systolic and diastolic blood pressure, and many clinical test results
- Unstructured: sequential clinical notes.
In a 2020 study, structured and unstructured data from MIMIC-III were combined directly through multi-modal deep neural networks to learn patient representations (bit.ly/CombiningDL). The resulting models were used to predict patient outcomes such as mortality, length of stay and re-admission. The models were based on fusion CNN and fusion long short-term memory (LSTM) architectures, which can be applied more broadly without requiring domain knowledge.
The results showed that by combining unstructured clinical notes with structured data, the proposed models outperformed models that use either unstructured notes or structured data alone.
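The idea of combining the two kinds of data can be sketched in a much simpler form than the study's fusion networks. The example below is not the study's method: it swaps in a plain bag-of-words count for the note encoder and simply concatenates it with the structured measurements. The vocabulary and field names are assumptions made for this illustration.

```python
import re

# Hypothetical vocabulary of terms to look for in clinical notes (an
# assumption for this sketch; the study learned its note features instead).
VOCABULARY = ["sepsis", "fever", "stable", "ventilator"]


def note_features(note: str) -> list:
    """Count how often each vocabulary term appears in a free-text note."""
    tokens = re.findall(r"[a-z]+", note.lower())
    return [float(tokens.count(term)) for term in VOCABULARY]


def structured_features(record: dict) -> list:
    """Extract the structured measurements as a numeric vector."""
    return [
        float(record["pulse"]),
        float(record["systolic_bp"]),
        float(record["diastolic_bp"]),
    ]


def fused_representation(record: dict) -> list:
    """Concatenate structured and note-derived features into one vector."""
    return structured_features(record) + note_features(record["note"])


record = {
    "pulse": 112,
    "systolic_bp": 95,
    "diastolic_bp": 60,
    "note": "Patient has fever and suspected sepsis, on ventilator",
}
vector = fused_representation(record)
print(vector)
```

A downstream model would then be trained on the fused vector, so that signals present only in the notes and signals present only in the measurements can both inform the prediction.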
Machine learning versus logistic regression
Rajkomar et al. performed a study that incorporated the University of California’s entire EHR from 2012 to 2016 and the University of Chicago Medicine’s EHR from 2009 to 2016 (go.nature.com/3nlQWqR). Both datasets contained structured data, and one also included free-text medical notes. Three models were built – the first based on recurrent neural networks, the second on a tri-attention neural network, and the third on a neural network with boosted time-based decision stumps. The predictions from these models were combined by ensembling. For prediction of inpatient mortality, the area under the receiver operating characteristic curve (AUROC) of the ensemble was significantly higher than the AUROC of a traditional predictive model (in this case, a 28-factor logistic regression model).
The models also outperformed existing traditional models for predicting re-admission and length of stay. Crucially, the datasets had tens of thousands of potential predictor variables for each patient, including clinical notes, but the ML models were able to identify which predictors to include for a particular prediction without hand-selection of variables by an expert.
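For readers less familiar with the metric, the sketch below shows one way AUROC can be computed and how an ensemble of models can be compared on it. The labels and scores are made-up toy values, and simple averaging is assumed as the ensembling rule purely for illustration; it is not necessarily the rule used in the study.

```python
def auroc(labels: list, scores: list) -> float:
    """AUROC as the chance that a random positive case is scored above a
    random negative case, counting ties as half a win."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUROC needs at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def ensemble(*model_scores: list) -> list:
    """Average the predicted scores of several models, case by case."""
    return [sum(column) / len(column) for column in zip(*model_scores)]


# Toy data: 1 = died in hospital, 0 = survived (illustrative values only).
labels = [1, 0, 1, 0]
model_a = [0.9, 0.2, 0.4, 0.6]
model_b = [0.5, 0.7, 0.8, 0.1]

combined = ensemble(model_a, model_b)
print(auroc(labels, model_a), auroc(labels, model_b), auroc(labels, combined))
```

In this toy case each model ranks one positive case below a negative one, but they err on different cases, so the averaged scores rank every case correctly. An AUROC of 0.5 corresponds to random ranking and 1.0 to perfect ranking.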
Lauritsen et al. developed xAI-EWS – an explainable AI model with an early warning score system that predicts acute critical illness using EHRs (bit.ly/ExpAI_EHR). The model comprised a temporal convolutional network prediction module and a deep Taylor decomposition explanation module, tailored to temporal explanations. xAI-EWS was able to explain how the prediction outcomes are driven by specific input variables, which helped clinicians to understand the reasoning behind the predictions.
What’s in it for actuaries?
In life and health underwriting, EHRs provide a quicker and easier process than obtaining an attending physician’s statement, resulting in lower underwriting costs. Data analytics and predictive modelling can be used to produce a risk score on which underwriting loadings could be based. Automated underwriting through a rules engine should allow automatic acceptance of the healthiest lives and automatic decline of the unhealthiest.
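A rules engine of this kind might look something like the sketch below. The thresholds, loading bands and the 'refer' outcome for the middle ground are all hypothetical; in practice they would be set by the insurer's underwriting philosophy and calibrated to experience.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical thresholds on a risk score between 0 and 1.
ACCEPT_BELOW = 0.10   # very healthy lives: accept at standard rates
DECLINE_ABOVE = 0.80  # very unhealthy lives: decline automatically

# Hypothetical bands: (upper bound of risk score, loading as a proportion
# of the standard premium).
LOADING_BANDS = [(0.30, 0.25), (0.50, 0.50), (0.80, 1.00)]


@dataclass
class Decision:
    outcome: str               # "accept", "refer" or "decline"
    loading: Optional[float]   # None when no terms are offered


def indicative_loading(risk_score: float) -> float:
    """Look up the loading for the first band that contains the score."""
    for upper_bound, loading in LOADING_BANDS:
        if risk_score <= upper_bound:
            return loading
    raise ValueError("risk score is above the highest loading band")


def underwrite(risk_score: float) -> Decision:
    """Apply the automatic rules and refer everything in between."""
    if not 0.0 <= risk_score <= 1.0:
        raise ValueError("risk score must lie between 0 and 1")
    if risk_score < ACCEPT_BELOW:
        return Decision("accept", 0.0)
    if risk_score > DECLINE_ABOVE:
        return Decision("decline", None)
    # Middle cases go to an underwriter with an indicative loading attached.
    return Decision("refer", indicative_loading(risk_score))


decisions = [underwrite(score) for score in (0.05, 0.40, 0.90)]
print(decisions)
```

Only the two extremes are decided automatically here, which mirrors the idea that the clearest cases can bypass manual underwriting while borderline lives still receive human review.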
When it comes to pricing, EHR data can accurately predict hospitalisation and thus the expected cost of future claims for health insurance. This is particularly important where past claims data is sparse or not available – for example for new business. Researchers at Atrius Health found that, using patients’ demographics, past use of health facilities, medical diagnoses and medications, they could accurately predict hospitalisation in the next six months. Models using EHR-only and claims-only data had similar predictive power.
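The link from a predicted hospitalisation probability to an expected claims cost can be sketched as a simple frequency-times-severity calculation. The probabilities and the average cost per admission below are illustrative assumptions, not figures from the Atrius Health research.

```python
def expected_claim_cost(p_hospitalisation: float, average_cost: float) -> float:
    """Expected cost for one life: probability of admission times average cost."""
    if not 0.0 <= p_hospitalisation <= 1.0:
        raise ValueError("probability must lie between 0 and 1")
    return p_hospitalisation * average_cost


def portfolio_expected_cost(probabilities: list, average_cost: float) -> float:
    """Sum the expected cost over every life in the portfolio."""
    return sum(expected_claim_cost(p, average_cost) for p in probabilities)


# Hypothetical six-month hospitalisation probabilities from an EHR-based model.
predicted_probabilities = [0.05, 0.20]
# Hypothetical average cost per admission, in any single currency unit.
AVERAGE_COST_PER_ADMISSION = 10_000.0

total = portfolio_expected_cost(predicted_probabilities, AVERAGE_COST_PER_ADMISSION)
print(round(total, 2))
```

A fuller pricing basis would also allow for multiple admissions, variation in severity, expenses and margins, which this sketch deliberately leaves out.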
Additionally, longitudinal population studies based on EHR data can improve the existing understanding of the progression of particular health conditions. This can help to improve predictions of expected costs of future claims for both health and life insurers, and thus lead to better claims management. Understanding of disease progression should help in the development of disease management tools, which could reduce claims cost.
Challenges and considerations
EHRs could revolutionise insurance underwriting and healthcare analytics, but critical obstacles remain relating to their privacy and security. There is continued demand for assurance that patients’ records are securely protected. Due to the data’s sensitive nature, EHRs are governed by strict government privacy laws such as the Health Insurance Portability and Accountability Act in the US and the Data Protection Act in the UK. These have important implications for actuarial access.
EHRs also present implementation challenges. There is currently no single format for EHR data, which reduces its ease of use for actuaries. The cost to insurers of piloting and implementing an EHR system is also significant. Insurers can access EHR data through vendors (who obtain data by logging into a patient portal), or through aggregators (which allow acquisition of applicant-authorised EHRs from multiple vendors). However, the information available to insurers on these portals is often less than the full EHR. Current hit rates on EHRs for insurance applicants are still low, although they are expected to improve. Wide access to EHR data for actuarial or underwriting purposes, however, depends on challenges around data security and standardisation being overcome. Although work is ongoing in these areas, uncertainties remain, and widespread access should only be expected in the near to mid-term future.
As use of EHRs matures across the globe, they will be very useful for patients, healthcare professionals and insurers. The future lies in the development of flexible factor-based architectures that can operate effortlessly within the workflow of a healthcare environment. EHR data combined with the emergence of genomic data is opening up the potential for precision healthcare and personalised medicine. However, the challenges around ease of access, interoperability, low hit rates and so on need to be addressed before the full benefits can be realised.
Atreyee Bhattacharyya is an associate director at Willis Towers Watson in London, and chair of the AI and Automation Working Party
Kanishka Jindal is an actuarial manager at Munich Re in Mumbai, and a member of the AI and Automation Working Party