
Michelle Wu is the founder and CEO of Nyquist Data. [Photo courtesy of Nyquist Data]

By Qiang Kou and Michelle Wu, Nyquist Data

A recent study shows that the number of death events in the FDA’s MAUDE (Manufacturer and User Facility Device Experience) database has been vastly underestimated because many are not reported as deaths.

Lalani et al. manually reviewed 290,141 MAUDE reports and found that around 17% of the death events had been misclassified. That means the patient died, but the event was labeled as having “no consequences or impact to patient.”

Manual review requires expertise across medical specialties and is far too time-consuming to keep pace with the millions of reports added to MAUDE. The task can instead be framed as a binary classification problem, which a fine-tuned BERT model can solve.

What is BERT?

BERT stands for Bidirectional Encoder Representations from Transformers. Recent progress in pre-trained neural language models has significantly improved the performance of many natural language processing (NLP) tasks, including machine translation, next-sentence prediction, question answering and sentiment analysis. Among all the new methods, BERT is one of the most critical.

“Bidirectional” and “transformers” in BERT’s name point to the model’s defining features. Historically, NLP models could only read text sequentially, either left-to-right or right-to-left, but not both at once. In 2017, Google introduced the Transformer architecture, which does not require sequences to be processed in a fixed order. This enabled training on larger datasets than was practical with traditional convolutional neural networks (CNNs) and recurrent neural networks (RNNs). Built on Transformers, BERT reads in both directions simultaneously, a property known as bidirectionality.
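The difference between sequential and bidirectional reading comes down to the attention mask. The toy NumPy sketch below (not BERT itself; scores are uniform, purely for illustration) shows that a causal mask hides future tokens, while the BERT-style unmasked case lets every token attend to every position:

```python
import numpy as np

def attention_weights(n_tokens, causal=False):
    """Softmax attention weights over uniform scores; only the mask differs."""
    scores = np.zeros((n_tokens, n_tokens))
    if causal:
        # Left-to-right models hide future positions from each token.
        scores[np.triu_indices(n_tokens, k=1)] = -np.inf
    e = np.exp(scores - scores.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

bidirectional = attention_weights(4)        # BERT-style: each token sees all 4 positions
causal = attention_weights(4, causal=True)  # sequential: token i sees positions 0..i only
```

In the bidirectional case every row of weights is spread over all positions; in the causal case token 1 puts zero weight on tokens 2 and 3.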

Another distinguishing feature of BERT is that it can be fine-tuned, with a single additional output layer and modest resources, on smaller datasets to create state-of-the-art models for a wide range of tasks. Many researchers adapt the BERT architecture to specific domains by pre-training it on domain-specific text. Some examples are:

  • patentBERT for patent classification;
  • SciBERT for scientific texts;
  • BioBERT for biomedical text mining;
  • ChemBERTa for molecular property prediction;
  • and G-BERT for medication recommendation.

Fine-tuning BERT

The fine-tuning process starts with a new dataset. In our case, we need to construct a training set of adverse event reports labeled as death-related or death-irrelevant. In this study, we avoided manual labeling by assuming that a false death label for an adverse event is rare. It is uncommon for a doctor or manufacturer to report an irrelevant event as death. Thus, we assumed all death-related reports had the correct labels. All MAUDE reports with event type “death” or patient problems “death/death, intrauterine fetal/sudden cardiac death” are labeled as death-related.

Using this standard, out of 2,749,878 MAUDE reports in 2022, 8,673 are labeled as death-related, and 8,666 of those have detailed event descriptions. Building the death-irrelevant part of the training dataset is more challenging because of those misclassified death events. To guarantee reports are death-irrelevant, we used MAUDE reports with patient problems “no consequences or impact to patient/no patient involvement/no known impact or consequence to patient/no clinical signs, symptoms or conditions.” 1,451,441 MAUDE reports met this standard, and from them we randomly sampled the same number of death-irrelevant reports as death-related ones.
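The labeling rule above can be expressed as a small filter. This is an illustrative sketch, not the actual pipeline; the field names (`event_type`, `patient_problems`) are hypothetical stand-ins for the corresponding MAUDE export columns:

```python
DEATH_PROBLEMS = {"death", "death, intrauterine fetal", "sudden cardiac death"}
NO_IMPACT_PROBLEMS = {
    "no consequences or impact to patient",
    "no patient involvement",
    "no known impact or consequence to patient",
    "no clinical signs, symptoms or conditions",
}

def label_report(report):
    """Return 1 (death-related), 0 (death-irrelevant), or None (excluded)."""
    problems = {p.lower() for p in report["patient_problems"]}
    if report["event_type"].lower() == "death" or problems & DEATH_PROBLEMS:
        return 1
    if problems & NO_IMPACT_PROBLEMS:
        return 0
    return None  # ambiguous reports stay out of the training set
```

For example, `label_report({"event_type": "Malfunction", "patient_problems": ["No patient involvement"]})` returns 0, while a report whose patient problems include neither set is excluded entirely.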

An example of a MAUDE entry that lists a deceased patient’s problem as low oxygen saturation instead of death.

We used the Transformers library to fine-tune the BERT model and achieved an F1 score of 98.8%. This result illustrates why BERT has replaced traditional NLP models in many fields: fine-tuning the original model is enough to produce a state-of-the-art classifier. We then applied the fine-tuned model to the 21,702 MAUDE reports under product code LWS (implantable cardioverter defibrillator without CRT). For each report, the model outputs the probability that it is death-related.
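Mechanically, fine-tuning for this task means stacking a two-label classification head on BERT and training end to end. The sketch below uses a deliberately tiny, randomly initialized `BertConfig` so it runs without downloading pretrained weights; an actual run would instead load `AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)` and train on the tokenized report texts:

```python
import torch
from transformers import BertConfig, BertForSequenceClassification

# Tiny random-weight BERT so the sketch runs offline; swap in the pretrained
# "bert-base-uncased" checkpoint for a real fine-tuning run.
config = BertConfig(vocab_size=100, hidden_size=32, num_hidden_layers=2,
                    num_attention_heads=2, intermediate_size=64, num_labels=2)
model = BertForSequenceClassification(config)

input_ids = torch.randint(0, 100, (4, 16))  # 4 toy "reports", 16 token ids each
labels = torch.tensor([1, 0, 1, 0])         # 1 = death-related
out = model(input_ids=input_ids, labels=labels)
out.loss.backward()                         # one gradient step of fine-tuning
probs = out.logits.softmax(dim=-1)[:, 1]    # per-report death probability
```

The classification head is just a linear layer over BERT’s pooled output, which is why a single additional output layer suffices.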

Setting the decision threshold for a binary classifier is tricky. Our case differs from standard classification because we assumed the existing death-related labels were correct. We therefore ranked all predicted probabilities and used the minimum probability among reports already labeled death-related as the threshold. The model flagged twelve missed death events, and all were validated by human review (Table 1, below).
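The threshold rule is simple enough to state in a few lines. A sketch with made-up probabilities (the real inputs are the model’s scores over all 21,702 LWS reports):

```python
def pick_threshold(probs, labels):
    # Lowest model probability among reports already labeled death-related.
    return min(p for p, y in zip(probs, labels) if y == 1)

probs  = [0.99, 0.95, 0.97, 0.10, 0.40]   # model scores (made up)
labels = [1,    1,    0,    0,    0]      # existing MAUDE labels
threshold = pick_threshold(probs, labels)  # 0.95
missed = [i for i, (p, y) in enumerate(zip(probs, labels))
          if y == 0 and p >= threshold]    # [2]: a likely misclassified death event
```

Any unlabeled report scoring at or above the least-confident confirmed death becomes a candidate for human review.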

| Report Number | Event Type | Reported Death-related Event Description |
| --- | --- | --- |
| 2649622-2022-00662 | Injury | The RV and atrial leads were removed and during the removal of the pacing RV lead, the heart was perforated and the patient subsequently passed away. |
| 2124215-2022-08176 | Malfunction | It was reported that the patient implanted with this implantable cardioverter defibrillator (ICD) expired due to non-device related reasons. |
| 2124215-2022-11115 | Malfunction | Of the eight, one patient recovered after cardiopulmonary resuscitation (CPR) and external shocks, and one patient died. |
| 2124215-2022-10614 | Injury | Due to deterioration in his overall clinical condition, S-ICD therapies were disabled 2 weeks later, and the patient passed away soon thereafter. |
| 2649622-2022-10143 | Malfunction | As a part of Medtronic’s follow-up on the event, the patient’s clinic confirmed that patient is deceased. |
| 9614453-2022-01673 | Malfunction | It was reported that the patient is deceased. |
| 2182208-2022-02038 | Injury | The article reports patient deaths. |
| 2017865-2022-19405 | Injury | The patient passed away due to the unrelated event. |
| 2017865-2022-40753 | Injury | The patient passed away due to Steinert disease. |
| 2182208-2022-03497 | Injury | The article reports sudden deaths which occurred during the follow up period. |
| 2182208-2022-03499 | Injury | The article reports sudden deaths which occurred during the follow up period. |
| 2017865-2022-45956 | Injury | It was also noted that the patient had expired due to an unrelated condition. |

The BERT model has achieved state-of-the-art results across a series of NLP tasks, and researchers keep finding new applications; topic modeling is a recent one. In general, topic modeling is a clustering process: it groups a collection of documents into topics and surfaces the main keywords of each.

Traditional models, such as latent Dirichlet allocation (LDA) and non-negative matrix factorization (NMF), describe a document as a bag of words and a mixture of latent topics. Their main limitation is that they disregard semantic relationships among words and ignore each word’s context within the sentence. The BERT model, by contrast, generates contextual text representations with semantic properties, which led to the embedding-and-clustering approach to topic modeling pioneered by Top2Vec and BERTopic.

We applied the BERTopic method over the LWS MAUDE reports, and a series of topics were extracted (the top 10 topics are shown in Table 2 below). This process was automated, and no text pre-processing was required, which is an advantage of the BERT-based method. The BERTopic model reported “shock impedance” as the most frequent topic (frequency 0.01091). The importance of “shock impedance” in ICDs has been validated by the literature.

| Topic | Count | Words |
| --- | --- | --- |
| 0 | 291 | shock impedance, shock, ohms, range |
| 1 | 236 | shock impedance, ohms, shock, impedance |
| 2 | 213 | atp, pacing atp, antitachycardia, antitachycardia pacing |
| 3 | 171 | integrity, integrity counter, sic, counter sic |
| 4 | 166 | pacing, pacing impedance, impedance the, high pacing |
| 5 | 161 | infection there, revision due, of system, explanted it |
| 6 | 154 | icd was, infection there, revision due, of system |
| 7 | 136 | noise, noise the, exhibited noise, ventricular lead |
| 8 | 136 | inappropriate, twave, inappropriate shocks, twave over sensing |
| 9 | 131 | svc, cava svc, vena cava, cava |

Using AI models to improve product safety and design

The previous examples showed how AI models can decode hidden insights from adverse event reports. The application of machine learning and artificial intelligence techniques to the mining of adverse events is still in its infancy. With millions of adverse events reported, humans cannot review each report. Machine learning models are in high demand to analyze vast amounts of adverse event data. Government agencies need such models to monitor medical products and improve product safety. Medical device companies need them to monitor their products and improve product design.

Even using the original model, without additional pre-training on biomedical text, we still achieved an F1 score of 98.8%. This shows that the BERT model successfully represents text with semantic properties, which opens many potential applications; topic modeling is one of them. The BERTopic method automatically extracted the common topics from a collection of MAUDE reports under the sample product code. This can help manufacturers prioritize their post-market surveillance tasks and serve as a guide when expanding into new fields.


Qiang Kou is the tech co-founder of Nyquist Data. [Photo courtesy of Nyquist Data]

Michelle Wu is the founder and CEO of Nyquist Data. She received her MBA from Stanford University and has over a decade of experience in pharmaceuticals, medical technology and digital innovation.

Qiang Kou is the tech co-founder of Nyquist Data. He holds a Ph.D. in bioinformatics.


The opinions expressed in this blog post are the author’s only and do not necessarily reflect those of Medical Design & Outsourcing or its employees.