Abstract: Actually in the Hospital Information Systems, there is a wide range of clinical information representation from the Electronic Health Records (EHR), and most of the information contained in clinical reports is written in natural language free text. In this context, we are researching the problem of automatic clinical named entities recognition from free text clinical reports. We are using Snomed-CT (Systematized Nomenclature of Medicine – Clinical Terms) as dictionary to identify all kind of clinical concepts, and thus the problem we are considering is to map each clinical entity named in a free text report with its Snomed-CT unique ID. More in general, we are developed a new approach for the named entity recognition (NER) problem in specific domains, and we have applied it to recognize clinical concepts in free text clinical reports. In our approach we apply two types of NER approaches, dictionary-based and machine learning-based. We use a specific domain dictionary-based gazetteer (using Snomed-CT to get the standard clinical code for the clinical concept), and the main approach that we introduce is using a unsupervised shallow learning neural network, word2vec from Mikolov et al., to represent words as vectors, and then making the recognition based on the distance between candidates and dictionary terms. We have applied our approach on a Dataset with 318.585 clinical reports in Spanish from the emergency service of the Hospital “Rafael Méndez” from Lorca (Murcia) Spain, and preliminary results are encouraging.
Keywords: Snomed-CT, word2vec, doc2vec, clinical information extraction, skipgram, medical terminologies, search semantic, named entity recognition, ner, medical entity recognition
Cite this paper
Ignacio Martinez Soriano, Juan Luis Castro Peña. (2017) Automatic Medical Concept Extraction from Free Text Clinical Reports, a New Named Entity Recognition Approach. International Journal of Computers, 2 , 38-46

Copyright © 2017 Author(s) retain the copyright of this article. This article is published under the terms of the Creative Commons Attribution License 4.0


