AUTHOR(S): Ignacio Martinez Soriano, Juan Luis Castro Peña
ABSTRACT Hospital Information Systems currently represent clinical information from Electronic Health Records (EHR) in a wide range of forms, and most of the information contained in clinical reports is written as natural-language free text. In this context, we study the problem of automatic clinical named entity recognition in free-text clinical reports. We use Snomed-CT (Systematized Nomenclature of Medicine – Clinical Terms) as a dictionary to identify all kinds of clinical concepts, so the problem we consider is mapping each clinical entity named in a free-text report to its unique Snomed-CT ID. More generally, we have developed a new approach to the named entity recognition (NER) problem in specific domains and have applied it to recognize clinical concepts in free-text clinical reports. Our approach combines two types of NER, dictionary-based and machine learning-based. We use a domain-specific dictionary-based gazetteer (using Snomed-CT to obtain the standard clinical code for each clinical concept), and the main technique we introduce uses an unsupervised shallow neural network, word2vec from Mikolov et al., to represent words as vectors and then performs recognition based on the distance between candidates and dictionary terms. We have applied our approach to a dataset of 318,585 clinical reports in Spanish from the emergency service of the Hospital “Rafael Méndez” in Lorca (Murcia), Spain, and preliminary results are encouraging.
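The distance-based recognition step described in the abstract can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the three-dimensional toy vectors stand in for trained word2vec embeddings, the identifiers `ID-0001` to `ID-0003` are placeholders rather than real Snomed-CT codes, averaging word vectors to represent a multi-word term is one common choice among several, and the names `phrase_vector`, `cosine`, `match_candidate`, and the `threshold` value are hypothetical.

```python
import math

# Toy word vectors (hypothetical values; a real system would train
# word2vec on the clinical report corpus and use its embeddings).
VECTORS = {
    "dolor": [0.9, 0.1, 0.0],
    "cefalea": [0.1, 0.9, 0.1],
    "jaqueca": [0.0, 0.8, 0.2],
    "fiebre": [0.1, 0.0, 0.9],
    "torácico": [0.8, 0.0, 0.3],
}

# Toy gazetteer: dictionary term -> placeholder concept ID
# (NOT real Snomed-CT identifiers).
GAZETTEER = {
    "cefalea": "ID-0001",
    "fiebre": "ID-0002",
    "dolor torácico": "ID-0003",
}


def phrase_vector(phrase, vectors):
    """Average the vectors of the known words in a phrase (assumed strategy)."""
    words = [w for w in phrase.lower().split() if w in vectors]
    if not words:
        return None
    dim = len(vectors[words[0]])
    return [sum(vectors[w][i] for w in words) / len(words) for i in range(dim)]


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def match_candidate(candidate, gazetteer, vectors, threshold=0.7):
    """Return (term, concept_id, score) for the closest dictionary term,
    or None if no term reaches the similarity threshold."""
    cand_vec = phrase_vector(candidate, vectors)
    if cand_vec is None:
        return None
    best = None
    for term, concept_id in gazetteer.items():
        term_vec = phrase_vector(term, vectors)
        if term_vec is None:
            continue
        score = cosine(cand_vec, term_vec)
        if score >= threshold and (best is None or score > best[2]):
            best = (term, concept_id, score)
    return best
```

With these toy vectors, `match_candidate("jaqueca", GAZETTEER, VECTORS)` selects the dictionary term `cefalea` even though "jaqueca" is not itself in the gazetteer, which is the kind of match a purely string-based dictionary lookup would miss. A candidate with no known words returns `None`.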
KEYWORDS Snomed-CT, word2vec, doc2vec, clinical information extraction, skipgram, medical terminologies, semantic search, named entity recognition, NER, medical entity recognition
Cite this paper: Ignacio Martinez Soriano, Juan Luis Castro Peña (2017). Automatic Medical Concept Extraction from Free Text Clinical Reports, a New Named Entity Recognition Approach. International Journal of Computers, 2, 38-46.