Predicting clinical events based on raw text: from bag of words to attention-based transformers

Dmitri Roussinov, Andrew Conkie, Andrew Patterson, Christopher Sainsbury

Research output: Contribution to conference › Paper › peer-review


Predicting hospital readmission or patient fatality can often save both resources and lives, which makes it a very important and challenging task for NLP/IR and machine learning applications in the e-health domain. While many successful approaches exist for predicting such clinical events from categorical and numerical variables, most health records consist of raw-text clinical notes. However, models that take advantage of the free-form natural language in those notes rarely reach a level of accuracy acceptable to clinicians. In addition, despite their success in other domains, the superiority of deep neural approaches over classical bag-of-words models has not yet been convincingly demonstrated for this task. Using a publicly available dataset of clinical notes, we explored several text classification models for predicting patient readmission or fatality and established that: 1) our deep neural models outperform those based on bag of words by several percentage points; 2) this makes it possible to reach the accuracy that clinicians typically consider acceptable for practical use (area under the ROC curve above .70); 3) our model based on averaging n-gram embeddings performs best, outperforming more specialized architectures such as recurrent, convolutional, and attention-based transformer models; 4) the modifications to the attention-based transformer model that we propose here to overcome its input-size limit are crucial for achieving top performance.
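The abstract reports results as area under the ROC curve and names a model based on averaging n-gram embeddings as the best performer. The sketch below illustrates that general idea in plain Python (in the spirit of fastText-style classifiers): each note becomes the average of its hashed word n-gram embeddings, which feeds a single logistic output trained with SGD, and AUC is computed as the probability that a random positive note is scored above a random negative one. It is not the authors' implementation; the class name `AveragedNgramClassifier`, the crc32 hashing into 4096 buckets, the embedding size, learning rate, epoch count, and the toy notes are all hypothetical choices made for illustration.

```python
import math
import random
import zlib


def ngrams(text, n_max=2):
    """Lower-case, split on whitespace, and return all word n-grams for n = 1..n_max."""
    tokens = text.lower().split()
    return [" ".join(tokens[i:i + n])
            for n in range(1, n_max + 1)
            for i in range(len(tokens) - n + 1)]


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def roc_auc(labels, scores):
    """AUC as the probability that a random positive outranks a random negative (ties count 1/2)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


class AveragedNgramClassifier:
    """Hypothetical sketch: mean of hashed n-gram embeddings fed to a logistic output unit."""

    def __init__(self, dim=16, buckets=4096, n_max=2, seed=0):
        rng = random.Random(seed)
        self.n_max, self.buckets = n_max, buckets
        self.emb = [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(buckets)]
        self.w = [0.0] * dim
        self.b = 0.0

    def _ids(self, text):
        # crc32 gives the same bucket on every run, unlike Python's salted hash().
        return [zlib.crc32(g.encode("utf-8")) % self.buckets for g in ngrams(text, self.n_max)]

    def _forward(self, ids):
        # Document vector = average of the embeddings of all its n-grams.
        v = [sum(self.emb[i][d] for i in ids) / len(ids) for d in range(len(self.w))]
        return v, sigmoid(sum(wd * vd for wd, vd in zip(self.w, v)) + self.b)

    def predict_proba(self, text):
        ids = self._ids(text)
        return self._forward(ids)[1] if ids else sigmoid(self.b)

    def fit(self, texts, labels, epochs=200, lr=0.5):
        """Plain SGD on the log loss, updating the output layer and the embeddings."""
        for _ in range(epochs):
            for text, y in zip(texts, labels):
                ids = self._ids(text)
                if not ids:
                    continue
                v, p = self._forward(ids)
                g = p - y  # d(log loss)/d(logit)
                # Each n-gram occurrence receives 1/len(ids) of the gradient w.r.t. v.
                emb_grad = [g * wd / len(ids) for wd in self.w]
                self.w = [wd - lr * g * vd for wd, vd in zip(self.w, v)]
                self.b -= lr * g
                for i in ids:
                    self.emb[i] = [e - lr * ge for e, ge in zip(self.emb[i], emb_grad)]
        return self


# Toy notes invented for illustration only (1 = adverse event, 0 = none).
notes = [
    "patient with sepsis transferred to icu",
    "sepsis worsening overnight",
    "icu admission for severe sepsis",
    "sepsis and hypotension despite fluids",
    "routine follow up discharged home",
    "stable and discharged home",
    "routine review no concerns",
    "patient stable on the ward",
]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

model = AveragedNgramClassifier().fit(notes, labels)
train_auc = roc_auc(labels, [model.predict_proba(t) for t in notes])
print(f"training AUC on toy notes: {train_auc:.2f}")
```

The AUC printed here is measured on the same eight toy notes used for training, so it only shows that the pieces fit together; a meaningful comparison against the .70 threshold mentioned in the abstract would require held-out patients from a real clinical dataset. Averaging keeps the model cheap and independent of note length, which is one plausible reason such a simple architecture can compete with recurrent or transformer models on long clinical notes.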
Original language: English
Number of pages: 10
Publication status: Published - 18 Jun 2021
Event: UK Healthcare Text Analytics Conference 2021 - Virtual, London, United Kingdom
Duration: 17 Jun 2021 to 18 Jun 2021


Conference: UK Healthcare Text Analytics Conference 2021
Abbreviated title: HealTAC 2021
Country/Territory: United Kingdom


  • predicting
  • prediction
  • clinical events
  • raw text
  • bag of words
  • attention-based transformers


