ICD intelligent coding method based on deep learning and knowledge graph

Document No.: 1952747 | Publication date: 2021-12-10

Abstract: This technology, a method for intelligent ICD coding based on deep learning and a knowledge graph, was designed by 张友书, 肖尚华, 程岚 and 祝伟 on 2021-09-29. The invention provides a method for intelligent ICD coding based on deep learning and a knowledge graph, comprising: acquiring electronic medical record data and medical order item data; standardizing the electronic medical record data and the medical order item data to obtain standardized data; constructing and training a BERT + BiLSTM + CRF diagnosis name recognition model, and using the model to recognize the diagnosis names in the standardized data; calculating the final ICD code of each diagnosis name based on a BERT model; merging the ICD codes of the diagnosis names; and, based on a disease-charge item knowledge graph and the charge orders of the current medical record, calculating the diagnosis that consumed the most medical resources in the current encounter and taking it as the primary diagnosis.

1. A method for intelligent ICD coding based on deep learning and a knowledge graph, characterized by comprising the following steps:

step S1, acquiring electronic medical record data and medical order item data;

step S2, standardizing the electronic medical record data and the medical order item data to obtain standardized data;

step S3, constructing and training a BERT + BiLSTM + CRF diagnosis name recognition model, and using the model to recognize the diagnosis names in the standardized data;

step S4, calculating the final ICD code of each diagnosis name based on a BERT model, including the following steps:

step S41, constructing a training set;

step S42, training the model, comprising: fine-tuning BERT-base on the training set to obtain the final BERT model;

step S43, coding with the trained BERT model, including:

for each recognized diagnosis name C_i, calculating LCS(ICD name_j, C_i);

for each ICD name with LCS > 1, looking up the corresponding ICD code in the ICD dictionary, constructing Pair_i = <diagnosis name, ICD code>, and inputting it into the trained BERT model to obtain the probability P_i;

selecting the Pair with the maximum probability, the ICD code in that Pair being the final ICD code of the diagnosis name;

step S5, merging the ICD codes of the diagnosis names;

and step S6, based on the disease-charge item knowledge graph and the charge orders of the current medical record, calculating the diagnosis that consumed the most medical resources in the current encounter, and taking that diagnosis as the primary diagnosis.

2. The method for intelligent ICD coding based on deep learning and a knowledge graph according to claim 1, wherein in the step S2, standardizing the electronic medical record data and the medical order item data comprises: standardizing medical record document names, standardizing medical record field names, and standardizing medical order charge items.

3. The method for intelligent ICD coding based on deep learning and a knowledge graph according to claim 1, wherein the step S3 comprises:

step S31, constructing a training set;

step S32, training the BERT + BiLSTM + CRF diagnosis name recognition model on the training set;

step S33, performing diagnosis name recognition with the trained model to recognize the diagnosis names in the text of the standardized data.

4. The method for intelligent ICD coding based on deep learning and a knowledge graph according to claim 3, wherein in the step S32, training the BERT + BiLSTM + CRF diagnosis name recognition model comprises the following steps:

using the pre-trained language model bert-base to produce an embedding representation of the input text;

feeding the embedding representation into the BiLSTM, which outputs, for each character, the probability of each BIOE label;

and feeding the per-character BIOE probabilities into the CRF, which outputs a BIOE label for each character.

5. The method for intelligent ICD coding based on deep learning and a knowledge graph according to claim 1, wherein in the step S5, merging the ICD codes of the diagnosis names comprises the steps of:

step S51, establishing a diagnostic code merging rule set S = {s_1, s_2, s_3, …, s_n}, wherein each rule is a ternary expression code1 + code2 -> code3, and code1, code2 and code3 are diagnostic codes;

step S52, establishing a diagnosis code inverted list;

step S53, traversing all current diagnoses C and, for each C_i, finding the relevant merging rule set S_i from the diagnosis code inverted list;

step S54, traversing each rule s = c_i + c_i1 -> c_i2 in S_i and checking whether c_i1 of the ternary expression is present in the current diagnoses C;

if so, adding c_i2 to the diagnoses C, deleting C_i and c_i1 from C, and repeating step S53; if not, moving on to the next rule until the traversal is complete.

6. The method for intelligent ICD coding based on deep learning and a knowledge graph according to claim 1, wherein the step S6 comprises:

traversing all diagnoses C and, for the current diagnosis C_i:

finding the charge items K_i corresponding to C_i in the disease-charge item knowledge graph;

intersecting all medical orders F with K_i to obtain F_i;

traversing the amounts in F_i to obtain the total medical resource consumption amount M_i corresponding to C_i;

sorting the diagnoses C_i by M_i in descending order;

and taking the first-ranked diagnosis as the primary diagnosis.

Technical Field

The invention relates to the technical field of intelligent coding, and in particular to a method for intelligent ICD coding based on deep learning and a knowledge graph.

Background

There are currently three main technical solutions for computer-aided coding:

The first is a keyword search prompting scheme, similar to keyword suggestions in Baidu search: based on the diagnosis keywords entered by a doctor, it searches all ICD code names, suggests matching ICD names and codes, and guides the coder step by step to the final code.

The second is a rule-based coding system, which defines coding logic rules, triggers a rule when its conditions are met, and prompts the correct code.

The third is an AI-based intelligent coding scheme, which applies natural language processing and deep learning models to generate ICD codes automatically from medical record information, without manual intervention.

For the third scheme, the prior patent "a real-time intelligent auxiliary ICD coding system and method based on medical records" discloses a computer-assisted ICD coding method, but that method has the following problems:

1. Cost information is not considered when selecting the primary diagnosis. In principle, the diagnosis that consumes the most medical resources should be selected as the primary diagnosis.

2. There is no diagnosis merging module, so the combined coding problem is not solved, as shown in Table 1.

Table 1. Combined coding cases

3. The TextCNN model it uses does not perform well enough.

Disclosure of Invention

The object of the present invention is to solve at least one of the technical drawbacks mentioned.

Therefore, the invention aims to provide a method for ICD intelligent coding based on deep learning and knowledge graph.

In order to achieve the above object, an embodiment of the present invention provides a method for ICD intelligent coding based on deep learning and knowledge graph, including the following steps:

step S1, acquiring electronic medical record data and medical order item data;

step S2, standardizing the electronic medical record data and the medical order item data to obtain standardized data;

step S3, constructing and training a BERT + BiLSTM + CRF diagnosis name recognition model, and using the model to recognize the diagnosis names in the standardized data;

step S4, calculating the final ICD code of each diagnosis name based on a BERT model, including the following steps:

step S41, constructing a training set;

step S42, training the model, comprising: fine-tuning BERT-base on the training set to obtain the final BERT model;

step S43, coding with the trained BERT model, including:

for each recognized diagnosis name C_i, calculating LCS(ICD name_j, C_i);

for each ICD name with LCS > 1, looking up the corresponding ICD code in the ICD dictionary, constructing Pair_i = <diagnosis name, ICD code>, and inputting it into the trained BERT model to obtain the probability P_i;

selecting the Pair with the maximum probability, the ICD code in that Pair being the final ICD code of the diagnosis name;

step S5, merging the ICD codes of the diagnosis names;

and step S6, based on the disease-charge item knowledge graph and the charge orders of the current medical record, calculating the diagnosis that consumed the most medical resources in the current encounter, and taking that diagnosis as the primary diagnosis.

Further, in the step S2, standardizing the electronic medical record data and the medical order item data comprises: standardizing medical record document names, standardizing medical record field names, and standardizing medical order charge items.

Further, the step S3 comprises:

step S31, constructing a training set;

step S32, training the BERT + BiLSTM + CRF diagnosis name recognition model on the training set;

step S33, performing diagnosis name recognition with the trained model to recognize the diagnosis names in the text of the standardized data.

Further, in the step S32, training the BERT + BiLSTM + CRF diagnosis name recognition model comprises the following steps:

using the pre-trained language model bert-base to produce an embedding representation of the input text;

feeding the embedding representation into the BiLSTM, which outputs, for each character, the probability of each BIOE label;

and feeding the per-character BIOE probabilities into the CRF, which outputs a BIOE label for each character.

Further, in the step S5, merging the ICD codes of the diagnosis names comprises the following steps:

step S51, establishing a diagnostic code merging rule set S = {s_1, s_2, s_3, …, s_n}, wherein each rule is a ternary expression code1 + code2 -> code3, and code1, code2 and code3 are diagnostic codes;

step S52, establishing a diagnosis code inverted list;

step S53, traversing all current diagnoses C and, for each C_i, finding the relevant merging rule set S_i from the diagnosis code inverted list;

step S54, traversing each rule s = c_i + c_i1 -> c_i2 in S_i and checking whether c_i1 of the ternary expression is present in the current diagnoses C;

if so, adding c_i2 to the diagnoses C, deleting C_i and c_i1 from C, and repeating step S53; if not, moving on to the next rule until the traversal is complete.

Further, the step S6 comprises:

traversing all diagnoses C and, for the current diagnosis C_i:

finding the charge items K_i corresponding to C_i in the disease-charge item knowledge graph;

intersecting all medical orders F with K_i to obtain F_i;

traversing the amounts in F_i to obtain the total medical resource consumption amount M_i corresponding to C_i;

sorting the diagnoses C_i by M_i in descending order;

and taking the first-ranked diagnosis as the primary diagnosis.

The method for intelligent ICD coding based on deep learning and a knowledge graph according to the embodiments of the present invention introduces a disease-charge item knowledge graph to select the primary diagnosis accurately, introduces code merging rules to improve coding accuracy, and applies the deep learning model BERT, a natural language processing model currently recognized as effective, to map diagnosis terms accurately to standard ICD codes. The invention codes automatically without manual intervention, which greatly reduces the workload of doctors and coders. Coding is moved forward in the workflow, so that a doctor can use the system to code directly from the patient's condition; this markedly reduces the number of exchanges between doctors and coders and improves both coding efficiency and coding accuracy.

Additional aspects and advantages of the invention will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the invention.

Drawings

The above and/or additional aspects and advantages of the present invention will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings of which:

Fig. 1 is a flowchart of a method for intelligent ICD coding based on deep learning and a knowledge graph according to an embodiment of the present invention.

Detailed Description

Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein the same or similar reference numerals refer throughout to the same or similar elements, or to elements having the same or similar function. The embodiments described below with reference to the drawings are exemplary, are intended to explain the invention, and are not to be construed as limiting it.

Several terms of art to which the present invention relates are explained below.

DRG: the Diagnosis Related Group (DRG) is an important tool for measuring the quality and efficiency of medical services and for making medical insurance payments. DRG is essentially a case-mix classification scheme: a system that manages patients by grouping them into diagnosis-related groups based on factors such as age, disease diagnosis, complications, treatment modality, severity of the condition, outcome, and resource consumption.

First page of medical record: the first page of the inpatient medical record is a summary of case data, formed by condensing the relevant information from the patient's hospitalization into a specific table using text, symbols, codes, numbers and the like. It comprises the patient's basic information, hospitalization process information, diagnosis and treatment information, and expense information. The ICD codes for diagnoses and operations are the most important factors in determining the DRG grouping.

ICD coding: the International Classification of Diseases (ICD) classifies diseases by rule according to certain of their characteristics and represents them with codes. ICD rules specify that when two disease diagnoses occur together, or one disease diagnosis is accompanied by an associated clinical manifestation, a combination code is used to reflect the overall condition of the disease.

Primary diagnosis selection: the primary diagnosis is generally the reason for the patient's hospitalization. In principle, the diagnosis that consumes the most medical resources, does the most harm to the patient's health, and accounts for the longest hospitalization time should be selected. Primary diagnosis selection has always been an important and difficult part of filling in the first page of the medical record, and it directly affects the accuracy of ICD coding and DRG grouping.

As shown in Fig. 1, the method for intelligent ICD coding based on deep learning and a knowledge graph according to an embodiment of the present invention includes the following steps:

Step S1, acquiring electronic medical record data and medical order item data.

Specifically, key information such as admission records, operation records, discharge records, death records, and charge orders is obtained from the electronic medical record system and the medical order system.

Step S2, standardizing the electronic medical record data and the medical order item data to obtain standardized data.

Specifically, the electronic medical record documents of different hospitals are standardized to facilitate unified processing and recognition in later steps. The standardization comprises: standardizing medical record document names, standardizing medical record field names, and standardizing medical order charge items.
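Field name standardization can be pictured as a lookup from hospital-specific names to one canonical vocabulary. The Python sketch below illustrates this for medical record field names only; the alias table, the canonical names, and the function `standardize_fields` are hypothetical and are not taken from the invention.

```python
# Hypothetical alias table: hospital-specific field names -> canonical names.
FIELD_ALIASES = {
    "discharge dx": "discharge_diagnosis",
    "diagnosis at discharge": "discharge_diagnosis",
    "admission dx": "admission_diagnosis",
}


def standardize_fields(record, aliases=FIELD_ALIASES):
    """Return a copy of `record` whose keys are mapped to canonical names.

    Keys are trimmed and lower-cased before lookup; keys without an alias
    are kept in their trimmed, lower-cased form.
    """
    standardized = {}
    for key, value in record.items():
        normalized = key.strip().lower()
        standardized[aliases.get(normalized, normalized)] = value
    return standardized


record = {" Discharge Dx ": "gastric ulcer", "Ward": "GI"}
print(standardize_fields(record))
```

The same lookup pattern could in principle be applied to document names and charge items, each with its own alias table.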

Step S3, constructing and training a BERT + BiLSTM + CRF diagnosis name recognition model, and using the model to recognize the diagnosis names in the standardized data.

Step S31, constructing a training set. Based on the electronic medical records of the past three months, a diagnosis name recognition training set S = {s_1, s_2, s_3, …, s_n} is constructed, in which each s_i is a training sample. For example, for the diagnosis text "1. upper gastrointestinal hemorrhage 2. gastric ulcer 3. coronary atherosclerotic heart failure", the diagnosis names are upper gastrointestinal hemorrhage, gastric ulcer, coronary atherosclerosis, and heart failure.

Step S32, training the diagnosis name recognition model with BERT + BiLSTM + CRF on the training set.

Step S321, using the pre-trained language model bert-base to produce an embedding representation of the input text.

Step S322, feeding the embedding representation into the BiLSTM, which outputs, for each character, the probability of each BIOE label.

Step S323, feeding the per-character BIOE probabilities into the CRF, which outputs a BIOE label for each character.

Step S33, performing diagnosis name recognition with the trained model to obtain the diagnosis names in a piece of text.
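Once the CRF has assigned one BIOE label to each character, the diagnosis names still have to be read off the label sequence. The text does not define the four labels or describe this decoding step, so the Python sketch below assumes the common reading (B = begin, I = inside, E = end, O = outside); the function `decode_bioe` is hypothetical.

```python
def decode_bioe(text, tags):
    """Collect the substrings of `text` covered by B ... E label runs.

    Assumes one label per character. A B that is never closed by an E
    (for example, a run interrupted by O) is discarded.
    """
    spans = []
    start = None
    for i, tag in enumerate(tags):
        if tag == "B":
            start = i                      # open a new candidate span
        elif tag == "E" and start is not None:
            spans.append(text[start:i + 1])
            start = None                   # span closed
        elif tag == "O":
            start = None                   # outside any diagnosis name
    return spans


# "1.胃溃疡2.心力衰竭" = "1. gastric ulcer 2. heart failure"
text = "1.胃溃疡2.心力衰竭"
tags = ["O", "O", "B", "I", "E", "O", "O", "B", "I", "I", "E"]
print(decode_bioe(text, tags))
```

Under this labeling assumption a single-character name cannot be represented, which is a limitation of the sketch rather than a statement about the invention.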

Step S4, calculating the final ICD code of each diagnosis name based on a BERT model, including the following steps:

Step S41, constructing a training set.

Based on the electronic medical records of the past three months, a diagnosis name coding training set S = {s_1, s_2, s_3, …, s_n} is constructed, in which each s_i is a pair <diagnosis name, ICD code>, such as <coronary heart disease, I25.102>. Negative examples are generated randomly.
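The text says only that negative examples are generated randomly. One plausible reading, sketched below in Python, pairs each diagnosis name with a randomly chosen wrong code from the ICD dictionary. The one-negative-per-positive ratio, the function `build_pairs`, and every code other than I25.102 are assumptions made for illustration.

```python
import random


def build_pairs(positives, icd_codes, seed=0):
    """Return (diagnosis name, ICD code, label) triples.

    Each positive pair gets label 1 and is followed by one negative pair
    (label 0) whose code is drawn at random from the other codes.
    """
    rng = random.Random(seed)              # seeded for reproducibility
    examples = []
    for name, code in positives:
        examples.append((name, code, 1))
        wrong_codes = [c for c in icd_codes if c != code]
        examples.append((name, rng.choice(wrong_codes), 0))
    return examples


positives = [("coronary heart disease", "I25.102")]
icd_codes = ["I25.102", "K25.9", "I50.9"]  # illustrative dictionary subset
for example in build_pairs(positives, icd_codes):
    print(example)
```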

Step S42, training the model, comprising: fine-tuning bert-base on the training set in the next sentence prediction manner to obtain the final BERT model.

Step S43, coding with the trained BERT model, including:

For each recognized diagnosis name C_i, calculating LCS(ICD name_j, C_i).

For each ICD name with LCS > 1, looking up the corresponding ICD code in the ICD dictionary, constructing Pair_i = <diagnosis name, ICD code>, and inputting it into the trained BERT model to obtain the probability P_i.

Selecting the Pair with the highest probability; the ICD code in that Pair is the final ICD code of the diagnosis name.
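The candidate filtering and selection in step S43 can be sketched in Python as follows. The text does not say whether LCS means longest common subsequence or longest common substring; this sketch assumes the subsequence length. The `score` callable is a stand-in for the fine-tuned BERT model, and the toy probabilities, the dictionary entries, and the function names are all hypothetical.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of strings a and b."""
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def pick_code(diagnosis, icd_dict, score):
    """Keep ICD names with LCS > 1, then return the best-scoring code.

    `icd_dict` maps ICD name -> ICD code; `score(diagnosis, code)` returns
    a probability. Returns None when no ICD name passes the LCS filter.
    """
    candidates = [code for name, code in icd_dict.items()
                  if lcs_length(name, diagnosis) > 1]
    if not candidates:
        return None
    return max(candidates, key=lambda code: score(diagnosis, code))


# Illustrative dictionary: gastric ulcer, duodenal ulcer, heart failure.
icd_dict = {"胃溃疡": "K25.9", "十二指肠溃疡": "K26.9", "心力衰竭": "I50.9"}
toy_probs = {"K25.9": 0.9, "K26.9": 0.3, "I50.9": 0.1}  # made-up scores
print(pick_code("胃溃疡伴出血", icd_dict, lambda dx, code: toy_probs[code]))
```

The LCS filter keeps the classifier from scoring every entry in the ICD dictionary; only names that share more than one character with the diagnosis reach the model.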

Step S5, merging the ICD codes of the diagnosis names.

Step S51, establishing a diagnostic code merging rule set S = {s_1, s_2, s_3, …, s_n}. Each rule is a ternary expression code1 + code2 -> code3, where code1, code2, and code3 are diagnostic codes. When code1 and code2 occur together, code3 is generated and code1 and code2 are removed.

Step S52, building a diagnosis code inverted list to speed up lookup, reducing the average lookup time complexity from O(n) to O(1). An example of the inverted list:

code1: s1, s2

code2: s1, s3

code3: s4

Step S53, traversing all current diagnoses C and, for each C_i, finding the relevant merging rule set S_i from the diagnosis code inverted list.

Step S54, traversing each rule s = c_i + c_i1 -> c_i2 in S_i and checking whether c_i1 of the ternary expression is present in the current diagnoses C. If so, c_i2 is added to C, C_i and c_i1 are deleted from C, and step S53 is repeated. If not, the next rule is examined, until the traversal is complete.
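Steps S51 to S54 can be sketched in Python as follows. Rules are modeled as (code1, code2, code3) tuples and the inverted list as a dictionary from each code to the rules that mention it. The placeholder codes "A", "B", "C" and the function names are hypothetical; real merging rules would come from ICD coding guidance.

```python
from collections import defaultdict


def build_index(rules):
    """Inverted list: code -> rules in which it appears as code1 or code2."""
    index = defaultdict(list)
    for rule in rules:
        code1, code2, _ = rule
        index[code1].append(rule)
        index[code2].append(rule)
    return index


def merge_codes(codes, rules):
    """Apply merging rules until no rule matches the current code list."""
    index = build_index(rules)
    current = list(codes)
    changed = True
    while changed:                         # restart the scan after each merge
        changed = False
        for code in current:
            for code1, code2, merged in index.get(code, []):
                partner = code2 if code == code1 else code1
                if partner != code and partner in current:
                    # Replace the matched pair with the combined code.
                    current = [c for c in current if c not in (code, partner)]
                    if merged not in current:
                        current.append(merged)
                    changed = True
                    break
            if changed:
                break
    return current


rules = [("A", "B", "AB"), ("AB", "C", "ABC")]  # placeholder rules
print(merge_codes(["A", "C", "B"], rules))
```

Restarting the scan after each merge mirrors the "repeat step S53" instruction, so a newly produced combined code can itself take part in a later rule.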

Step S6, based on the disease-charge item knowledge graph and the charge orders of the current medical record, calculating the diagnosis that consumed the most medical resources in the current encounter, and taking that diagnosis as the primary diagnosis.

Specifically: traverse all diagnoses C and, for the current diagnosis C_i, find the charge items K_i corresponding to C_i in the disease-charge item knowledge graph; intersect all medical orders F with K_i to obtain F_i; traverse the amounts in F_i to obtain the total medical resource consumption amount M_i corresponding to C_i; sort the diagnoses C_i by M_i in descending order; the first-ranked diagnosis is the primary diagnosis.
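The ranking in step S6 can be sketched in Python as follows. The knowledge graph is reduced here to a dictionary from diagnosis code to a set of charge item names, and the orders to (item, amount) tuples. The data, the item names, and the function `rank_diagnoses` are hypothetical illustrations, not the invention's data model.

```python
def rank_diagnoses(diagnoses, knowledge_graph, orders):
    """Sort diagnoses by total cost of the orders linked to each one.

    For each diagnosis C_i: K_i = its charge items in the graph,
    F_i = orders whose item is in K_i, M_i = sum of the amounts in F_i.
    """
    totals = {}
    for dx in diagnoses:
        items = knowledge_graph.get(dx, set())          # K_i
        totals[dx] = sum(amount for item, amount in orders
                         if item in items)              # M_i over F_i
    return sorted(diagnoses, key=lambda dx: totals[dx], reverse=True)


knowledge_graph = {
    "K25.9": {"gastroscopy", "omeprazole"},
    "I50.9": {"echocardiogram"},
}
orders = [
    ("gastroscopy", 300.0),
    ("omeprazole", 50.0),
    ("echocardiogram", 200.0),
    ("blood test", 30.0),   # not linked to either diagnosis in this toy graph
]
ranked = rank_diagnoses(["I50.9", "K25.9"], knowledge_graph, orders)
print(ranked[0])  # the first-ranked diagnosis is taken as primary
```

An order linked to more than one diagnosis is counted toward each of them in this sketch; the text does not say how such shared items should be apportioned.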

The method for intelligent ICD coding based on deep learning and a knowledge graph according to the embodiments of the present invention introduces a disease-charge item knowledge graph to select the primary diagnosis accurately, introduces code merging rules to improve coding accuracy, and applies the deep learning model BERT, a natural language processing model currently recognized as effective, to map diagnosis terms accurately to standard ICD codes. The invention codes automatically without manual intervention, which greatly reduces the workload of doctors and coders. Coding is moved forward in the workflow, so that a doctor can use the system to code directly from the patient's condition; this markedly reduces the number of exchanges between doctors and coders and improves both coding efficiency and coding accuracy.

In the description herein, reference to the terms "one embodiment," "some embodiments," "an example," "a specific example," or "some examples" means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, such references do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.

Although embodiments of the present invention have been shown and described above, it is understood that the above embodiments are exemplary and should not be construed as limiting the present invention, and that variations, modifications, substitutions and alterations can be made in the above embodiments by those of ordinary skill in the art without departing from the principle and spirit of the present invention. The scope of the invention is defined by the appended claims and equivalents thereof.
