The 2nd Clinical Natural Language Processing Workshop

At NAACL 2019. Minneapolis, USA. June 7, 2019.

Workshop Program

Friday, June 7, 2019 in *Nicollet D1*

09:00–09:15 Opening Remarks
09:15–10:30 Invited Speaker (slides)
Heng Ji (University of Illinois at Urbana-Champaign)
10:30–11:20 Coffee Break
11:20–12:30 Oral Session 1
11:20–11:40 Effective Feature Representation for Clinical Text Concept Extraction
Yifeng Tao, Bruno Godefroy, Guillaume Genthial and Christopher Potts [pdf]
11:40–11:55 An Analysis of Attention over Clinical Notes for Predictive Tasks
Sarthak Jain, Ramin Mohammadi and Byron C. Wallace [pdf]
11:55–12:10 Extracting Adverse Drug Event Information with Minimal Engineering
Timothy Miller, Alon Geva and Dmitriy Dligach [pdf]
12:10–12:25 Hierarchical Nested Named Entity Recognition
Zita Marinho, Afonso Mendes, Sebastião Miranda and David Nogueira [pdf]
12:30–14:00 Lunch
14:00–15:30 Oral Session 2
14:00–14:20 Towards Automatic Generation of Shareable Synthetic Clinical Notes Using Neural Language Models
Oren Melamud and Chaitanya Shivade [pdf]
14:20–14:40 A Novel System for Extractive Clinical Note Summarization using EHR Data
Jennifer Liang, Ching-Huei Tsou and Ananya Poddar [pdf]
14:40–14:55 Study of lexical aspect in the French medical language. Development of a lexical resource
Agathe Pierson and Cédrick Fairon [pdf]
14:55–15:10 A BERT-based Universal Model for Both Within- and Cross-sentence Clinical Temporal Relation Extraction
Chen Lin, Timothy Miller, Dmitriy Dligach, Steven Bethard and Guergana Savova [pdf]
15:10–15:25 Publicly Available Clinical BERT Embeddings
Emily Alsentzer, John Murphy, William Boag, Wei-Hung Weng, Di Jindi, Tristan Naumann and Matthew McDermott [pdf]
15:30–16:00 Coffee Break
16:00–16:45 Poster Session
A General-Purpose Annotation Model for Knowledge Discovery: Case Study in Spanish Clinical Text
Alejandro Piad-Morffis, Yoan Gutiérrez, Suilan Estevez-Velarde and Rafael Muñoz [pdf]
Predicting ICU transfers using text messages between nurses and doctors
Faiza Khan Khattak, Chloe Pou-Prom, Robert Wu and Frank Rudzicz [pdf]
Medical Entity Linking using Triplet Network
Ishani Mondal, Sukannya Purkayastha, Sudeshna Sarkar, Pawan Goyal, Jitesh Pillai, Amitava Bhattacharyya and Mahanandeeshwar Gattu [pdf]
Annotating and Characterizing Clinical Sentences with Explicit Why-QA Cues
Jungwei Fan [pdf]
Extracting Factual Min/Max Age Information from Clinical Trial Studies
Yufang Hou, Debasis Ganguly, Lea Deleris and Francesca Bonin [pdf]
Distinguishing Clinical Sentiment: The Importance of Domain Adaptation in Psychiatric Patient Health Records
Eben Holderness, Philip Cawkwell, Kirsten Bolton, James Pustejovsky and Mei-Hua Hall [pdf]
Medical Word Embeddings for Spanish: Development and Evaluation
Felipe Soares, Marta Villegas, Aitor Gonzalez-Agirre, Martin Krallinger and Jordi Armengol-Estapé [pdf]
Attention Neural Model for Temporal Relation Extraction
Sijia Liu, Liwei Wang, Vipin Chaudhary and Hongfang Liu [pdf]
Automatically Generating Psychiatric Case Notes From Digital Transcripts of Doctor-Patient Conversations
Nazmul Kazi and Indika Kahanda [pdf]
Clinical Data Classification using Conditional Random Fields and Neural Parsing for Morphologically Rich Languages
Razieh Ehsani, Tyko Niemi, Gaurav Khullar and Tiina Leivo [pdf]
16:45–17:30 Panel Discussion
Hongfang Liu (Mayo Clinic)
Piet de Groen (University of Minnesota)
Elmer Bernstam (University of Texas Health Science Center, Houston)
Alistair Johnson (MIT Laboratory for Computational Physiology)

Invited Talk

Heng Ji (University of Illinois at Urbana-Champaign)
Enhancing Quality and Robustness of Biomedical Information Extraction (slides)

Abstract

Extracting information from unstructured text has a big impact on the biomedical domain and can potentially help tackle problems ranging from disease diagnosis and drug discovery to precision medicine. In this talk I'll present our recent progress on improving the quality and robustness of biomedical information extraction (IE).

Our first goal is to improve the quality of extraction from a formal genre: scientific literature. IE for the biomedical domain is generally more challenging than IE in the general news domain, since it requires broader acquisition of domain-specific knowledge and deeper understanding of complex contexts. To better encode contextual information and external background knowledge, we propose a novel knowledge base (KB)-driven tree-structured long short-term memory network (Tree-LSTM) framework and a graph convolutional network model, incorporating two new types of features: (1) dependency structures to capture wide contexts; (2) entity properties (types and category descriptions) from external ontologies via entity linking. This framework achieves state-of-the-art performance on Drug-Drug Interaction Relation Extraction and the BioNLP shared task on Event Extraction with the Genia dataset [Li et al., NAACL2019].

However, none of these current supervised deep learning models is robust when moved to a new genre. We observe that the performance of our framework degrades significantly when we move to new informal genres such as clinical notes from the i2b2 task. In fact, we face similar challenges whenever we move an IE system to a new genre, domain, topic, scenario, or language. One major reason lies in the improper use of word embeddings in the DNN model. The quality of word embeddings is not consistent throughout the vocabulary because of the long-tail distribution of word frequency. Without sufficient contexts, rare word embeddings are usually less reliable than those of common words. This issue is particularly important for clinical notes, which are often full of abbreviations and informal variants of entity mentions. To address this issue, we guide the model to dynamically select and compose features using explicit reliability signals (including word frequency) that inform the model of the quality of each word embedding [Lin et al., ACL2019].

Speaker Bio

Heng Ji is Edward P. Hamilton Development Chair Professor at Rensselaer Polytechnic Institute. She will join the Computer Science Department of the University of Illinois at Urbana-Champaign as a tenured full professor in August 2019. She received her B.A. and M.A. in Computational Linguistics from Tsinghua University, and her M.S. and Ph.D. in Computer Science from New York University. Her research interests focus on Natural Language Processing, especially on Information Extraction (IE) and Knowledge Base Population. She was selected as a "Young Scientist" and a member of the Global Future Council on the Future of Computing by the World Economic Forum in 2016 and 2017. She received the "AI's 10 to Watch" Award from IEEE Intelligent Systems in 2013, an NSF CAREER award in 2009, faculty awards from Google, IBM, Bosch and Tencent, PACLIC2012 Best Paper Runner-up, ACL2019 Best Demo Nomination, "Best of SDM2013" paper, and "Best of ICDM2013" paper. She has coordinated the NIST TAC Knowledge Base Population task since 2010, and served as the Program Committee Co-Chair of NAACL-HLT2018. She is an associate editor for IEEE/ACM Transactions on Audio, Speech, and Language Processing.

Presenter Information

Poster boards are 8ft wide by 4ft high and will be set up in the *Hyatt Exhibit Hall*. Posters should be put up about half an hour before the scheduled poster session and cannot be left up all day.

Screens will have an aspect ratio of 16:9.