Medical record encoding method and device, storage medium and electronic equipment

Document No.: 1230297 Publication date: 2020-09-08

Note: this technology, "Medical record encoding method and device, storage medium and electronic equipment", was created by 焦增涛 on 2020-05-27. Abstract: The embodiments of the disclosure provide a medical record encoding method and device, a storage medium and electronic equipment. The method comprises: extracting a plurality of target entities from a target medical record; determining one-hot codes corresponding to the plurality of target entities based on the number of entities in a preset knowledge graph; determining a plurality of first vectors corresponding to the plurality of target entities based on the first vector of each entity in the preset knowledge graph; determining a vector of the target medical record based on the plurality of first vectors and the one-hot codes corresponding to the plurality of target entities; and determining, through a deep learning model, the code corresponding to the vector of the target medical record. Determining the vector of the target medical record from the knowledge graph yields an accurate vector representation of the record; combining it with a deep learning model enables automatic vector-based coding and improves the efficiency and accuracy of medical record coding.

1. A medical record encoding method, the method comprising:

extracting a plurality of target entities from a target medical record;

determining the one-hot codes corresponding to the target entities based on the number of the entities in the preset knowledge graph;

determining a plurality of first vectors corresponding to the target entities based on a first vector of each entity in a preset knowledge graph;

determining a vector of the target medical record based on a plurality of first vectors corresponding to the target entities and the one-hot codes corresponding to the target entities;

and determining the code corresponding to the vector of the target medical record through a deep learning model.

2. The method of claim 1, wherein determining the vector of the target medical record based on a plurality of first vectors corresponding to the plurality of target entities and the one-hot codes corresponding to the plurality of target entities comprises:

replacing, in the one-hot codes corresponding to the plurality of target entities, the values of the dimensions corresponding to the target entities with the plurality of first vectors corresponding to the target entities, to determine the vector of the target medical record.

3. The method of claim 1, wherein determining the one-hot codes corresponding to the plurality of target entities based on a number of entities in a preset knowledge graph comprises:

generating a one-hot code whose number of dimensions equals the number of entities in the preset knowledge graph, and generating the value of each dimension of the one-hot code according to a result of comparing the target entities with the entities in the knowledge graph.

4. The method of claim 1, wherein before determining the first vectors corresponding to the target entities based on the first vectors of the entities in the preset knowledge-graph, the method further comprises: acquiring a first vector of each entity in the knowledge graph;

obtaining a first vector for each entity in the knowledge-graph, comprising:

determining a second vector for each entity in the knowledge-graph based on a community discovery algorithm;

determining a third vector for each entity in the knowledge-graph based on a translation vector algorithm;

determining a first vector for each entity in the knowledge-graph based on the second vector and the third vector for each entity in the knowledge-graph; wherein the number of dimensions of the first vector of each entity is equal to the sum of the number of dimensions of the second vector and the number of dimensions of the third vector.

5. The method of claim 4, wherein determining a first vector for each entity in the knowledge-graph based on the second vector and the third vector for each entity in the knowledge-graph comprises:

respectively normalizing the second vector and the third vector of each entity;

determining weight coefficients of the second vector and the third vector;

determining a first vector of each entity in the knowledge-graph based on the normalized result of the second vector and the third vector of each entity and the weight coefficients of the second vector and the third vector.

6. The method of claim 1, wherein extracting a plurality of target entities from a target medical record comprises:

obtaining a plurality of record tables from the target medical record;

and extracting, from the target field of each record table, the target entities of the corresponding category based on a correspondence between target fields of record tables and categories of target entities.

7. The method of claim 1, wherein prior to determining, via a deep learning model, the encoding to which the vector of the target medical record corresponds, the method further comprises: acquiring a deep learning model;

obtaining a deep learning model comprising:

taking a first vector of an entity in the knowledge graph as sample data, and acquiring a real code of the sample data;

determining predictive coding of the sample data based on the initial deep learning model;

determining a loss function of the initial deep learning model based on the real coding and the predictive coding of each sample data;

determining parameters of a deep learning model based on the loss function;

and determining the deep learning model based on the parameters.

8. An apparatus for encoding medical records, the apparatus comprising:

an entity extraction module configured to extract a plurality of target entities from a target medical record;

the first determining module is configured to determine the one-hot codes corresponding to the target entities based on the number of the entities in the preset knowledge graph;

the second determining module is configured to determine a plurality of first vectors corresponding to the target entities based on the first vectors of the entities in the preset knowledge graph;

a third determining module configured to determine a vector of the target medical record based on a plurality of first vectors corresponding to the plurality of target entities and the one-hot codes corresponding to the plurality of target entities;

and the fourth determining module is configured to determine, through a deep learning model, a code corresponding to the vector of the target medical record.

9. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the method according to any one of claims 1 to 7.

10. An electronic device, comprising:

one or more processors;

storage means for storing one or more programs;

wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of any one of claims 1-7.

Technical Field

The present disclosure relates to the field of computer technology and information processing technology, and in particular, to a medical record encoding method, device, storage medium, and electronic device.

Background

The International Classification of Diseases (ICD) classifies diseases into an ordered system according to their etiology, pathology, clinical manifestations and anatomical location, and represents them with codes. Currently, the 10th revision, the International Statistical Classification of Diseases and Related Health Problems, retains the abbreviation ICD and is called ICD-10.

Currently, the medical record coding generally adopts the following modes:

1. purely manual coding

This approach has high labor cost and low efficiency, and different coders may understand the coding rules inconsistently, so that subsequent statistical analyses based on medical record codes cannot be carried out or produce wrong results.

2. Encoding based on keyword search

This approach depends heavily on keywords: the accuracy of keyword extraction is difficult to guarantee, and it is difficult to rank the correct code near the top of the recalled search results.

3. Coding recommendation directly using multi-classification models

Since the various versions of ICD contain tens of thousands of categories, a multi-classification model has to classify directly into too many categories, and the model's performance is difficult to guarantee given the scale and distribution of the available corpus.

Therefore, a new medical record encoding method, device, storage medium and electronic device are needed to realize efficient and accurate medical record encoding.

It is to be noted that the information disclosed in the above background section is only for enhancement of understanding of the background of the present disclosure, and thus may include information that does not constitute prior art known to those of ordinary skill in the art.

Disclosure of Invention

The embodiment of the disclosure provides a medical record encoding method, a medical record encoding device, a storage medium and electronic equipment, and realizes efficient and accurate medical record encoding.

Additional features and advantages of the disclosure will be set forth in the detailed description which follows, or in part will be obvious from the description, or may be learned by practice of the disclosure.

According to an aspect of the embodiments of the present disclosure, there is provided a medical record encoding method, wherein the method includes: extracting a plurality of target entities from a target medical record; determining the one-hot codes corresponding to the target entities based on the number of the entities in the preset knowledge graph; determining a plurality of first vectors corresponding to the target entities based on a first vector of each entity in a preset knowledge graph; determining a vector of the target medical record based on a plurality of first vectors corresponding to the target entities and the one-hot codes corresponding to the target entities; and determining the code corresponding to the vector of the target medical record through a deep learning model.

In some exemplary embodiments of the present disclosure, based on the foregoing scheme, determining a vector of the target medical record based on a plurality of first vectors corresponding to the plurality of target entities and the one-hot codes corresponding to the plurality of target entities includes: replacing, in the one-hot codes corresponding to the plurality of target entities, the values of the dimensions corresponding to the target entities with the plurality of first vectors corresponding to the target entities, to determine the vector of the target medical record.

In some exemplary embodiments of the present disclosure, based on the foregoing scheme, determining the one-hot codes corresponding to the plurality of target entities based on the number of entities in the preset knowledge graph includes: generating a one-hot code whose number of dimensions equals the number of entities in the preset knowledge graph, and generating the value of each dimension of the one-hot code according to a result of comparing the target entities with the entities in the knowledge graph.

In some exemplary embodiments of the present disclosure, before determining, based on the foregoing scheme, a plurality of first vectors corresponding to the target entities based on a first vector of each entity in a preset knowledge graph, the method further includes: acquiring a first vector of each entity in the knowledge graph; obtaining a first vector for each entity in the knowledge-graph, comprising: determining a second vector for each entity in the knowledge-graph based on a community discovery algorithm; determining a third vector for each entity in the knowledge-graph based on a translation vector algorithm; determining a first vector for each entity in the knowledge-graph based on the second vector and the third vector for each entity in the knowledge-graph; wherein the number of dimensions of the first vector of each entity is equal to the sum of the number of dimensions of the second vector and the number of dimensions of the third vector.

In some exemplary embodiments of the present disclosure, determining the first vector of each entity in the knowledge-graph based on the second vector and the third vector of each entity in the knowledge-graph based on the foregoing scheme comprises: respectively normalizing the second vector and the third vector of each entity; determining weight coefficients of the second vector and the third vector; determining a first vector of each entity in the knowledge-graph based on the normalized result of the second vector and the third vector of each entity and the weight coefficients of the second vector and the third vector.

In some exemplary embodiments of the present disclosure, based on the foregoing scheme, extracting a plurality of target entities from a target medical record includes: obtaining a plurality of record tables from the target medical record; and extracting, from the target field of each record table, the target entities of the corresponding category based on a correspondence between target fields of record tables and categories of target entities.

In some exemplary embodiments of the present disclosure, based on the foregoing scheme, before determining, through the deep learning model, the code corresponding to the vector of the target medical record, the method further includes: acquiring the deep learning model. Acquiring the deep learning model includes: taking the first vectors of entities in the knowledge graph as sample data, and acquiring the real code of each sample; determining a predicted code of each sample based on an initial deep learning model; determining a loss function of the initial deep learning model based on the real code and the predicted code of each sample; determining parameters of the deep learning model based on the loss function; and determining the deep learning model based on the parameters.
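The training procedure described above (predict a code for each sample, compare it with the real code through a loss function, and update the parameters) can be sketched as follows. This is only an illustrative sketch under stated assumptions: a single softmax layer stands in for the deep learning model, cross-entropy is assumed as the loss function, plain gradient descent is assumed as the update rule, and the two toy samples and their labels are hypothetical. None of these choices is specified by the disclosure.

```python
import math
from typing import List, Tuple

def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def predict(weights: List[List[float]], x: List[float]) -> List[float]:
    """Predicted probability of each code (one weight row per code)."""
    return softmax([sum(w * xi for w, xi in zip(row, x)) for row in weights])

def train(samples: List[List[float]], labels: List[int], num_codes: int,
          epochs: int = 200, lr: float = 0.5) -> Tuple[List[List[float]], List[float]]:
    """Fit the stand-in model; returns the parameters and the mean loss per epoch."""
    dim = len(samples[0])
    weights = [[0.0] * dim for _ in range(num_codes)]  # initial model
    losses = []
    for _ in range(epochs):
        epoch_loss = 0.0
        for x, y in zip(samples, labels):
            probs = predict(weights, x)          # predicted code distribution
            epoch_loss -= math.log(probs[y])     # cross-entropy vs. real code
            for c in range(num_codes):
                grad = probs[c] - (1.0 if c == y else 0.0)
                for j in range(dim):
                    weights[c][j] -= lr * grad * x[j]  # gradient step
        losses.append(epoch_loss / len(samples))
    return weights, losses

# Hypothetical sample data: two entity vectors and the index of their real codes.
samples = [[1.0, 0.0], [0.0, 1.0]]
labels = [0, 1]
weights, losses = train(samples, labels, num_codes=2)
predicted = [max(range(2), key=lambda c: predict(weights, x)[c]) for x in samples]
```

In a real system the single layer would be replaced by a deeper network and the toy vectors by the first vectors of knowledge graph entities.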

According to an aspect of the embodiments of the present disclosure, there is provided a medical record encoding apparatus, wherein the apparatus includes: an entity extraction module configured to extract a plurality of target entities from a target medical record; the first determining module is configured to determine the one-hot codes corresponding to the target entities based on the number of the entities in the preset knowledge graph; the second determining module is configured to determine a plurality of first vectors corresponding to the target entities based on the first vectors of the entities in the preset knowledge graph; a third determining module configured to determine a vector of the target medical record based on a plurality of first vectors corresponding to the plurality of target entities and the one-hot codes corresponding to the plurality of target entities; and the fourth determining module is configured to determine, through a deep learning model, a code corresponding to the vector of the target medical record.

In some exemplary embodiments of the present disclosure, based on the foregoing solution, the third determining module is configured to replace the values of the dimensions corresponding to the plurality of target entities in the one-hot codes corresponding to the plurality of target entities with the plurality of first vectors corresponding to the plurality of target entities to determine the vector of the target medical record.

In some exemplary embodiments of the present disclosure, based on the foregoing scheme, the first determining module is configured to generate a one-hot code whose number of dimensions equals the number of entities in the preset knowledge graph, and to generate the value of each dimension of the one-hot code according to a result of comparing the target entities with the entities in the knowledge graph.

In some exemplary embodiments of the present disclosure, based on the foregoing solution, the apparatus further includes a vector obtaining module configured to obtain a first vector of each entity in the knowledge-graph; the vector acquisition module comprises: a first determination unit configured to determine a second vector for each entity in the knowledge-graph based on a community discovery algorithm; a second determination unit configured to determine a third vector for each entity in the knowledge-graph based on a translation vector algorithm; a third determination unit configured to determine a first vector for each entity in the knowledge-graph based on the second vector and the third vector for each entity in the knowledge-graph; wherein the number of dimensions of the first vector of each entity is equal to the sum of the number of dimensions of the second vector and the number of dimensions of the third vector.

In some exemplary embodiments of the present disclosure, based on the foregoing scheme, the third determination unit is configured to: normalize the second vector and the third vector of each entity respectively; determine weight coefficients of the second vector and the third vector; and determine the first vector of each entity in the knowledge graph based on the normalized second vector and third vector of each entity and the weight coefficients of the second vector and the third vector.

In some exemplary embodiments of the present disclosure, based on the foregoing solution, the entity extraction module is configured to obtain a plurality of record tables from the target medical record; and extracting the target entity of the corresponding category from the target field of each record table based on the corresponding relation between the target field of the record table and the category of the target entity.

In some exemplary embodiments of the present disclosure, based on the foregoing, the apparatus further includes a model obtaining module configured to obtain a deep learning model; the model acquisition module is further configured to acquire a real code of sample data by taking a first vector of an entity in the knowledge graph as the sample data; determining predictive coding of the sample data based on the initial deep learning model; determining a loss function of the initial deep learning model based on the real coding and the predictive coding of each sample data; determining parameters of a deep learning model based on the loss function; a deep learning model is determined based on the parameters.

According to an aspect of the embodiments of the present disclosure, there is provided a computer-readable storage medium storing a computer program, wherein the computer program is configured to implement the method as described in the above embodiments when executed by a processor.

According to an aspect of an embodiment of the present disclosure, there is provided an electronic device including: one or more processors; storage means for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the method as described in the embodiments above.

According to the embodiment of the invention, a plurality of target entities are extracted from a target medical record; one-hot codes corresponding to the target entities are determined based on the number of entities in the preset knowledge graph; a plurality of first vectors corresponding to the target entities are determined based on the first vector of each entity in the preset knowledge graph; a vector of the target medical record is determined based on the plurality of first vectors and the one-hot codes corresponding to the target entities; and the code corresponding to the vector of the target medical record is determined through a deep learning model. Determining the vector of the target medical record from the knowledge graph yields an accurate vector representation of the record; combining it with a deep learning model enables automatic vector-based coding and improves the efficiency and accuracy of medical record coding.

It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.

Drawings

The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and together with the description, serve to explain the principles of the disclosure. It is to be understood that the drawings in the following description are merely exemplary of the disclosure, and that other drawings may be derived from those drawings by one of ordinary skill in the art without the exercise of inventive faculty.

In the drawings:

FIG. 1 schematically illustrates a flow diagram of a medical record encoding method according to one embodiment of the present disclosure;

FIG. 2 schematically illustrates a flow diagram of a method for obtaining a first vector of entities in a knowledge graph according to one embodiment of the present disclosure;

FIG. 3 schematically illustrates a block diagram of a medical record encoding apparatus according to an embodiment of the disclosure;

FIG. 4 illustrates a schematic structural diagram of a computer system suitable for use in implementing the electronic device of an embodiment of the present disclosure.

Detailed Description

Example embodiments will now be described more fully with reference to the accompanying drawings. Example embodiments may, however, be embodied in many different forms and should not be construed as limited to the examples set forth herein; rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concept of example embodiments to those skilled in the art.

Furthermore, the described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. In the following description, numerous specific details are provided to give a thorough understanding of embodiments of the disclosure. One skilled in the relevant art will recognize, however, that the subject matter of the present disclosure can be practiced without one or more of the specific details, or with other methods, components, devices, steps, and so forth. In other instances, well-known methods, devices, implementations, or operations have not been shown or described in detail to avoid obscuring aspects of the disclosure.

The block diagrams shown in the figures are functional entities only and do not necessarily correspond to physically separate entities. I.e. these functional entities may be implemented in the form of software, or in one or more hardware modules or integrated circuits, or in different networks and/or processor means and/or microcontroller means.

The flow charts shown in the drawings are merely illustrative and do not necessarily include all of the contents and operations/steps, nor do they necessarily have to be performed in the order described. For example, some operations/steps may be decomposed, and some operations/steps may be combined or partially combined, so that the actual execution sequence may be changed according to the actual situation.

FIG. 1 schematically illustrates a flow diagram of a medical record encoding method according to one embodiment of the present disclosure. The method provided by the embodiments of the present disclosure may be executed by any electronic device with computing capability, for example a server or a terminal device. The following embodiments take a server as an example, but the present disclosure is not limited thereto.

As shown in FIG. 1, the medical record encoding method provided by an embodiment of the present disclosure may include the following steps:

In step S110, a plurality of target entities are extracted from the target medical record.

In the embodiment of the present disclosure, the target medical records may be electronic medical records provided by hospitals, and each target medical record may include a plurality of record tables, such as an admission record table, a discharge record table, an inspection record table, an operation record table, a medical order record table, and the like. A plurality of target entities can be extracted from each target medical record.

In the embodiment of the present disclosure, a correspondence between target fields of record tables and categories of target entities is preset. After the record tables are obtained from the target medical record, target entities of the corresponding categories are extracted from each record table based on this correspondence.

It is noted that different fields in a record table may correspond to different categories of target entities, so multiple categories of target entities may be extracted from one record table, and each extracted category may include multiple target entities.

Table 1 is a correspondence between a partial record table and a category of a target entity provided in the embodiment of the present invention.

Category of target entity	Target field of record table
Symptoms and signs	Admission record table > Chief complaint
Diagnosis	Discharge record table > Discharge diagnosis (first in order)
Examination	Inspection record table > Examination item (abnormal findings)
Surgery	Operation record table > Name of operation
Drug	Medical order record table > Generic name of drug
Smoking status	Admission record table > Personal history

TABLE 1

As shown in Table 1, different fields of the admission record table (chief complaint and personal history) correspond to different categories of target entities (symptoms and signs, and smoking status), and a single field, such as the generic drug name field of the medical order record table, may yield multiple target entities (multiple drugs).

It should be noted that the categories of target entities can be freely extended, for example by a user. When a new category is added, the target field of the record table corresponding to it needs to be chosen based on the reliability, accuracy, and update timeliness of the data, and the new correspondence is added to the preset correspondence between target fields of record tables and categories of target entities.

As shown in Table 1, the diagnosis category is extracted from the first-listed diagnosis in the discharge diagnosis field of the discharge record table. This is because recorded data differ in reliability, accuracy, and timeliness. For example, the discharge diagnosis field may contain multiple diagnoses, and the first-listed one is usually the most important, so the first entry of the discharge diagnosis field is set as the source for extracting the medical term of the diagnosis.

In the embodiment of the invention, the target field of the record table used for extracting each category of target entity is set based on the reliability, accuracy, and update timeliness of the data. This improves the reliability, accuracy, and timeliness of the extracted target entities, provides accurate source data for subsequent medical record coding, and improves the accuracy of medical record coding.
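The field-to-category extraction of step S110 can be sketched as follows. This is a minimal sketch, assuming the medical record has already been parsed into nested dictionaries; the table names, field names, and the `extract_entities` helper are hypothetical and only mirror a few rows of Table 1.

```python
from typing import Dict, List, Tuple

# Hypothetical preset correspondence: (record table, target field) -> category.
FIELD_TO_CATEGORY: Dict[Tuple[str, str], str] = {
    ("admission", "chief_complaint"): "symptom",
    ("discharge", "discharge_diagnosis"): "diagnosis",
    ("operation", "operation_name"): "surgery",
    ("medical_order", "drug_generic_name"): "drug",
}

def extract_entities(record: Dict[str, Dict[str, List[str]]]) -> Dict[str, List[str]]:
    """Collect target entities per category from the configured target fields."""
    entities: Dict[str, List[str]] = {}
    for (table, field), category in FIELD_TO_CATEGORY.items():
        values = record.get(table, {}).get(field, [])
        if category == "diagnosis":
            # Only the first-listed discharge diagnosis is used, as in Table 1.
            values = values[:1]
        entities.setdefault(category, []).extend(values)
    return entities

# Hypothetical parsed medical record (no operation record table present).
record = {
    "admission": {"chief_complaint": ["cough", "fever"]},
    "discharge": {"discharge_diagnosis": ["pneumonia", "hypertension"]},
    "medical_order": {"drug_generic_name": ["amoxicillin", "ibuprofen"]},
}
entities = extract_entities(record)
```

Adding a new category then amounts to adding one entry to the mapping, which matches the extensibility described above.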

In step S120, one-hot codes corresponding to the plurality of target entities are determined based on the number of entities in the preset knowledge graph.

In the embodiment of the disclosure, a knowledge graph can be preset based on a large amount of medical record data and an algorithm, vertices in the knowledge graph can comprise entities and attributes, and edges connected between the vertices represent the relationship between the two vertices.

In the embodiment of the disclosure, once the knowledge graph has been preset, the one-hot codes corresponding to the plurality of target entities extracted from the target medical record can be determined based on the number of entities in the knowledge graph.

In the embodiment of the invention, a one-hot code whose number of dimensions equals the number of entities in the preset knowledge graph can be generated, and the value of each dimension of the one-hot code is generated according to the result of comparing the target entities with the entities in the knowledge graph.

It should be noted that the number of entities in the knowledge graph determines the number of dimensions of the one-hot code of the plurality of target entities in the target medical record. For example, if the knowledge graph contains 10,000 entities, the one-hot code vector W of the plurality of target entities has 10,000 dimensions. The one-hot code uses 1 or 0 to represent the result of comparing the target entities with the entities in the knowledge graph: if a target entity is the same as the knowledge graph entity corresponding to a dimension, the value of that dimension is 1; otherwise it is 0.

For example, suppose the target entities extracted from the target medical record are A, B, C, D, and E, and the knowledge graph contains 10 entities, namely A, B, C, D, E, F, G, H, I, and J. The one-hot code of the target entities in the target medical record is then 1111100000.
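The example above can be reproduced with a short sketch. The `one_hot_code` helper is a hypothetical name, and exact string equality is assumed as the comparison between target entities and knowledge graph entities; the disclosure does not fix how the comparison is done.

```python
from typing import List, Sequence

def one_hot_code(target_entities: Sequence[str], kg_entities: Sequence[str]) -> List[int]:
    """One position per knowledge graph entity: 1 if it appears among the targets, else 0."""
    targets = set(target_entities)
    return [1 if entity in targets else 0 for entity in kg_entities]

kg_entities = list("ABCDEFGHIJ")  # the 10 entities of the example
code = one_hot_code(["A", "B", "C", "D", "E"], kg_entities)
code_string = "".join(str(bit) for bit in code)  # "1111100000"
```

Because several positions can be 1 at once, the resulting code is in effect a multi-hot vector over the knowledge graph entities.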

It should be noted that the one-hot code is used to form a vector representation of the plurality of target entities of the target medical record; in the embodiment of the present invention, a more complex expert method may also be used to form this vector representation.

In step S130, a plurality of first vectors corresponding to the target entities are determined based on the first vectors of the entities in the preset knowledge graph.

In the embodiment of the invention, after the knowledge graph is preset, the first vectors of all the entities in the knowledge graph can be further obtained, and then a plurality of first vectors of a plurality of target entities are determined.

In the embodiment of the present invention, the first vector of each target entity may be determined based on the comparison result between the target entity and each entity in the knowledge graph. For example, if the target entity is the same as an entity in the knowledge-graph, a first vector for the target entity may be determined based on the first vector for the entity in the knowledge-graph.
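A minimal sketch of this lookup, assuming the first vectors of the knowledge graph entities are stored in a dictionary keyed by entity name. The `lookup_first_vectors` helper, the 3-dimensional toy vectors, and the choice to skip target entities that have no match in the knowledge graph are all assumptions made for illustration.

```python
from typing import Dict, List, Sequence

def lookup_first_vectors(
    target_entities: Sequence[str],
    kg_first_vectors: Dict[str, List[float]],
) -> Dict[str, List[float]]:
    """Return the first vector of every target entity that matches a knowledge graph entity."""
    return {
        entity: kg_first_vectors[entity]
        for entity in target_entities
        if entity in kg_first_vectors  # unmatched entities are skipped (assumption)
    }

# Hypothetical first vectors for three knowledge graph entities.
kg_first_vectors = {
    "A": [0.1, 0.2, 0.3],
    "B": [0.4, 0.5, 0.6],
    "C": [0.7, 0.8, 0.9],
}
matched = lookup_first_vectors(["A", "C", "Z"], kg_first_vectors)
```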

It should be noted that the present invention is not limited to executing S130 after the execution of step S120 is finished, and S130 may be executed first and then S120 may be executed.

In step S140, a vector of the target medical record is determined based on a plurality of first vectors corresponding to the target entities and the unique codes corresponding to the target entities.

In the embodiment of the present invention, after the first vectors and the one-hot codes of the plurality of target entities are determined, the first vectors corresponding to the plurality of target entities are used to replace the values of the dimensions corresponding to the plurality of target entities in the one-hot codes corresponding to the plurality of target entities, so as to determine the vectors of the target medical records.

It should be noted that the number of dimensions of the vector of the target medical record is the same as the number of dimensions of the one-hot code, and the dimension value of the vector of the target medical record corresponding to the target entity with the dimension value of 1 in the one-hot code is the first vector of the target entity in the knowledge graph.

For example, suppose the first vector of each entity in the knowledge graph has 512 dimensions. If the value of the current dimension of the one-hot code is 1, that value is replaced with the first vector (512 dimensions) of the corresponding entity in the knowledge graph; if the value is 0, it is replaced with a 512-dimensional zero vector. Assuming that the knowledge graph contains 10,000 entities, the one-hot code has 10,000 dimensions, and since each of these dimensions is replaced with a 512-dimensional vector, the resulting vector of the target medical record has 512 x 10,000 dimensions.
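The replacement described above may be sketched as follows, using a toy knowledge graph of 4 entities with 2-dimensional first vectors instead of 10,000 entities with 512 dimensions. The names `record_vector` and `flatten`, and the sample vector values, are hypothetical and serve only this illustration.

```python
def record_vector(one_hot, graph_entities, first_vectors, dim):
    """Replace each bit with a dim-length block: the entity's first vector
    where the bit is 1, and a zero vector where the bit is 0."""
    blocks = []
    for bit, entity in zip(one_hot, graph_entities):
        if bit == 1:
            blocks.append(list(first_vectors[entity]))
        else:
            blocks.append([0.0] * dim)
    return blocks


def flatten(blocks):
    """Lay the per-entity blocks end to end as one long vector."""
    return [value for block in blocks for value in block]


graph_entities = ["A", "B", "C", "D"]
first_vectors = {"A": [0.6, 0.8], "B": [1.0, 0.0], "C": [0.0, 1.0], "D": [0.8, 0.6]}
one_hot = [1, 0, 1, 0]  # the record mentions A and C only

blocks = record_vector(one_hot, graph_entities, first_vectors, dim=2)
flat = flatten(blocks)
print(len(flat))  # 4 entities x 2 dimensions = 8 values
```

With the figures of the example, the same construction would give 512 x 10,000 = 5,120,000 values, almost all of which are zero for a typical record.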

In the embodiment of the invention, the first vector of the entity in the knowledge-graph can be determined based on a second vector and a third vector of the entity, wherein the second vector represents the vector of the entity in the knowledge-graph determined based on the community discovery algorithm, the third vector represents the vector of the entity in the knowledge-graph determined based on the translation vector algorithm, weight coefficients are respectively set for the second vector and the third vector, and the first vector is determined by using the second vector, the third vector and the respective weight coefficients.

In step S150, a code corresponding to the vector of the target medical record is determined through a deep learning model.

In the embodiment of the invention, the deep learning model may be acquired first. When the deep learning model is acquired, the first vectors of the entities in the knowledge graph may be used as sample data, and the real code of each piece of sample data may be obtained; the real code may also be referred to as the label of the sample data. An initial deep learning model is constructed, and the predicted code of each piece of sample data is determined based on the initial deep learning model. A loss function of the initial deep learning model is then determined based on the real code (label) and the predicted code of each piece of sample data, the parameters of the deep learning model are determined based on the loss function, and the deep learning model is determined based on the parameters.
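The training procedure above (predict, compare with the label through a loss function, adjust the parameters) may be illustrated by the following sketch. It is deliberately not a deep model: a single linear layer with softmax and cross-entropy loss stands in for the deep learning model, and the toy samples, the learning rate, and all function names are assumptions made for this illustration.

```python
import math


def softmax(scores):
    """Turn raw scores into probabilities (shifted by the max for stability)."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict(weights, x):
    """Return one probability per code class for input vector x."""
    scores = [sum(w * xi for w, xi in zip(row, x)) for row in weights]
    return softmax(scores)


def cross_entropy(weights, samples, labels):
    """Mean negative log-probability assigned to the real code of each sample."""
    losses = [-math.log(predict(weights, x)[y]) for x, y in zip(samples, labels)]
    return sum(losses) / len(losses)


def train(samples, labels, num_classes, epochs=200, lr=0.5):
    """Fit the weights by stochastic gradient descent on the cross-entropy loss."""
    dim = len(samples[0])
    weights = [[0.0] * dim for _ in range(num_classes)]
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            probs = predict(weights, x)
            for c in range(num_classes):
                # Gradient of the loss w.r.t. the score of class c.
                grad = probs[c] - (1.0 if c == y else 0.0)
                for i in range(dim):
                    weights[c][i] -= lr * grad * x[i]
    return weights


samples = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]  # toy sample vectors
labels = [0, 0, 1, 1]                                       # real codes (labels)

initial_loss = cross_entropy([[0.0, 0.0], [0.0, 0.0]], samples, labels)
weights = train(samples, labels, num_classes=2)
final_loss = cross_entropy(weights, samples, labels)
predicted = [max(range(2), key=lambda c: predict(weights, x)[c]) for x in samples]
```

In practice a multi-layer network from a deep learning framework would replace `predict` and `train`; the loss-driven parameter update has the same shape.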

It should be noted that in the embodiment of the present invention, a large amount of sample data covering multiple classes of codes may be used to train the deep learning model, so as to improve the accuracy with which the deep learning model determines codes.

In the embodiment of the invention, after the deep learning model is obtained, the determined vector of the target medical record is input into the deep learning model, so that the code of the target medical record is output.

It should be noted that the present invention is not limited to determining the code of the target medical record by using the deep learning model, and other classification algorithms may also be used.

The medical record encoding method provided by the invention is further described below with reference to specific embodiments.

FIG. 2 schematically shows a flow diagram of a method of determining a first vector of entities in a knowledge graph according to one embodiment of the present disclosure. As shown in FIG. 2, the method may include, but is not limited to, the following steps:

In S210, a second vector of entities in the knowledge graph is determined based on a community discovery algorithm.

It should be noted that a community discovery algorithm generally detects "block" clusters or "communities" in a network, that is, it discovers the community structure of the network. It may also be regarded as a clustering algorithm, and it originated in research on social network topology graphs.

In the embodiment of the invention, the knowledge graph can be logically regarded as a topological network graph. The COPRA community discovery algorithm can be used to search for community structures in the knowledge graph, so that the medical entities on the knowledge graph are gathered into a plurality of community clusters; to some extent, the entities of the same community, such as diagnoses, symptoms, and medicines, form a strongly related group. The algorithm supports overlapping community discovery, i.e., one entity may belong to multiple communities.

For example, 256 communities are set on the knowledge graph and each community is assigned an ID. Every entity on the knowledge graph can then be represented by a 256-dimensional vector; this vector is the second vector and is denoted V1. If the entity belongs to a community, the dimension of V1 corresponding to that community is set to 1, and the other dimensions are set to 0.
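A minimal sketch of building such a second vector is given below. It assumes, for illustration only, that community IDs are the integers 0 to 255 and that the community memberships of an entity have already been produced by the community discovery step; the function name `community_vector` is hypothetical.

```python
def community_vector(entity_communities, num_communities=256):
    """Return a 0/1 vector with a 1 at each community ID the entity belongs to."""
    vector = [0] * num_communities
    for community_id in entity_communities:
        vector[community_id] = 1
    return vector


# An entity that belongs to two overlapping communities, IDs 3 and 17.
v1 = community_vector({3, 17})
print(len(v1), sum(v1))  # prints 256 2
```

Because overlapping communities are supported, V1 may contain more than one 1.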

It should be noted that the second vector of the entity in the knowledge graph may also be determined by spectral bisection, modularity-based methods, random walks, statistical inference, or the like.

In S220, a third vector of entities in the knowledge-graph is determined based on a translation vector algorithm.

In the embodiment of the invention, the translation vector algorithm TransE is an important algorithm in the field of knowledge graphs and can generate distributed vector representations of the entities and relations in the graph. Specifically, the relation in each triple instance is regarded as a translation from the head entity to the tail entity, and h, r, and t (the vectors of the head, the translation, and the tail) are continuously adjusted so that h + r is as close to t as possible, that is, h + r ≈ t.
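The idea of adjusting h, r, and t so that h + r approaches t can be sketched as follows. This is a simplified illustration rather than the full TransE training procedure: it takes one gradient step on the squared distance for a single triple and omits the comparison against corrupted (negative) triples used by the complete algorithm. The function names and the learning rate are assumptions.

```python
import math


def transe_distance(h, r, t):
    """L2 distance between h + r and t; smaller means the triple fits better."""
    return math.sqrt(sum((hi + ri - ti) ** 2 for hi, ri, ti in zip(h, r, t)))


def sgd_step(h, r, t, lr=0.1):
    """One gradient step on the squared distance ||h + r - t||^2."""
    diff = [hi + ri - ti for hi, ri, ti in zip(h, r, t)]
    new_h = [hi - lr * 2 * d for hi, d in zip(h, diff)]
    new_r = [ri - lr * 2 * d for ri, d in zip(r, diff)]
    new_t = [ti + lr * 2 * d for ti, d in zip(t, diff)]
    return new_h, new_r, new_t


h, r, t = [1.0, 0.0], [0.0, 1.0], [0.0, 0.0]
before = transe_distance(h, r, t)
h, r, t = sgd_step(h, r, t)
after = transe_distance(h, r, t)
print(after < before)  # prints True
```

Each step moves h and r toward t and t toward h + r, so the residual h + r - t shrinks by a constant factor that depends on the learning rate.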

In the embodiment of the invention, a distributed vector of each medical entity can be learned from the knowledge graph through a TransE algorithm, the vector is a third vector, the third vector is represented by V2, and the length of the V2 vector is 256 dimensions.

In S230, a first vector of entities in the knowledge-graph is determined based on the second vector and the third vector of entities in the knowledge-graph.

In this embodiment of the present invention, after a second vector and a third vector of an entity are obtained, the second vector and the third vector of the entity in the knowledge graph may be normalized respectively, and a first vector of the entity in the knowledge graph may be determined based on a normalization result of the second vector and the third vector of the entity in the knowledge graph and a weight coefficient of the second vector and the third vector.

In the embodiment of the present invention, when the second vector (or the third vector) is normalized, the square root a of the sum of the squares of all the dimension values may be calculated, and all the dimension values are then divided by a to obtain the normalized value of each dimension.
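This normalization is the usual L2 normalization and may be sketched as follows. The handling of an all-zero vector, which the text does not address, is an assumption of this sketch, as is the function name `l2_normalize`.

```python
import math


def l2_normalize(vector):
    """Divide every dimension value by the root of the sum of squares."""
    a = math.sqrt(sum(x * x for x in vector))
    if a == 0:
        return list(vector)  # assumption: an all-zero vector is left unchanged
    return [x / a for x in vector]


print(l2_normalize([3.0, 4.0]))  # prints [0.6, 0.8]
```

After normalization the sum of the squares of the dimension values is 1, which puts the second and third vectors on a comparable scale before they are weighted.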

In the embodiment of the present invention, weighting coefficients may be assigned to the second vector and the third vector, and the weighting coefficients may be empirical values or may be self-defined. The invention provides a specific weight coefficient: the weight coefficient of the second vector is 0.7 and the weight coefficient of the third vector is 0.3.

In the embodiment of the present invention, the first vector of the entity in the knowledge-graph is represented by V, and can be obtained by the following formula:

V = V1*N1 + V2*N2 (1)

where V denotes a first vector, V1 denotes a second vector, V2 denotes a third vector, and N1 and N2 denote weight coefficients of the second vector and the third vector, respectively.

It should be noted that after the first vector is obtained, the first vector needs to be normalized, and the normalization method refers to the normalization method of the second vector (third vector).

It should be noted that the number of dimensions of the first vector of each entity is equal to the sum of the number of dimensions of the second vector and the number of dimensions of the third vector, and the number of dimensions of the second vector and the third vector of each entity may be the same or different.
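The steps of S230 may be sketched as follows. Because the number of dimensions of the first vector is stated to equal the sum of the dimensions of the second and third vectors, this sketch assumes that the "+" in formula (1) joins the two weighted vectors end to end rather than adding them element by element; that reading, the function names, and the toy 4-dimensional inputs are assumptions made for illustration.

```python
import math


def l2_normalize(vector):
    """Divide every dimension value by the root of the sum of squares."""
    a = math.sqrt(sum(x * x for x in vector))
    if a == 0:
        return list(vector)  # assumption: an all-zero vector is left unchanged
    return [x / a for x in vector]


def first_vector(v1, v2, n1=0.7, n2=0.3):
    """Normalize V1 and V2, weight them by N1 and N2, join them, normalize again."""
    weighted_v1 = [n1 * x for x in l2_normalize(v1)]
    weighted_v2 = [n2 * x for x in l2_normalize(v2)]
    return l2_normalize(weighted_v1 + weighted_v2)  # list "+" concatenates


v1 = [1, 0, 1, 0]             # toy second vector (community membership)
v2 = [0.5, -0.5, 0.5, 0.5]    # toy third vector (TransE embedding)
v = first_vector(v1, v2)
print(len(v))  # prints 8
```

With a 256-dimensional second vector and a 256-dimensional third vector, the same construction gives the 512-dimensional first vector used in the earlier example.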

In the embodiment of the invention, a second vector of an entity in the knowledge graph is determined based on a community discovery algorithm; determining a third vector of an entity in the knowledge-graph based on a translation vector algorithm; determining a first vector of entities in the knowledge-graph based on the second vector and the third vector of entities in the knowledge-graph. The vector of the entity in the knowledge graph is determined through the community discovery algorithm and the translation vector algorithm, and compared with the method that the vector of the entity in the knowledge graph is determined only according to the second vector or the third vector, the accuracy of the entity vector is improved.

The following describes embodiments of the apparatus of the present disclosure, which can be used to perform the medical record encoding method of the present disclosure. For details not disclosed in the embodiments of the apparatus of the present disclosure, please refer to the embodiments of the medical record encoding method described above in the present disclosure.

Fig. 3 schematically illustrates a block diagram of a medical record encoding apparatus according to an embodiment of the present disclosure. Referring to fig. 3, a medical record encoding apparatus 300 according to an embodiment of the present disclosure may include: an entity extraction module 310, a first determination module 320, a second determination module 330, a third determination module 340, and a fourth determination module 350.

An entity extraction module 310 configured to extract a plurality of target entities from a target medical record.

The first determining module 320 is configured to determine the one-hot codes corresponding to the target entities based on the number of entities in the preset knowledge graph.

The second determining module 330 is configured to determine a plurality of first vectors corresponding to the target entities based on the first vectors of the entities in the preset knowledge graph.

A third determining module 340 configured to determine a vector of the target medical record based on a plurality of first vectors corresponding to the plurality of target entities and the one-hot codes corresponding to the plurality of target entities.

The fourth determining module 350 is configured to determine, through a deep learning model, a code corresponding to the vector of the target medical record.

FIG. 4 illustrates a schematic structural diagram of a computer system suitable for use in implementing the electronic device of an embodiment of the present disclosure. It should be noted that the computer system 400 of the electronic device shown in fig. 4 is only an example, and should not bring any limitation to the functions and the scope of the application of the embodiments of the present disclosure.

As shown in fig. 4, the computer system 400 includes a Central Processing Unit (CPU)401 that can perform various appropriate actions and processes in accordance with a program stored in a Read Only Memory (ROM)402 or a program loaded from a storage section 408 into a Random Access Memory (RAM) 403. In the RAM 403, various programs and data necessary for system operation are also stored. The CPU401, ROM402, and RAM 403 are connected to each other via a bus 404. An input/output (I/O) interface 405 is also connected to bus 404.

The following components are connected to the I/O interface 405: an input section 406 including a keyboard, a mouse, and the like; an output section 407 including a display device such as a Cathode Ray Tube (CRT), a Liquid Crystal Display (LCD), and the like, and a speaker; a storage section 408 including a hard disk and the like; and a communication section 409 including a network interface card such as a LAN card, a modem, or the like. The communication section 409 performs communication processing via a network such as the Internet. A drive 410 is also connected to the I/O interface 405 as needed. A removable medium 411 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like is mounted on the drive 410 as necessary, so that a computer program read out therefrom is installed into the storage section 408 as necessary.

In particular, the processes described below with reference to the flowcharts may be implemented as computer software programs, according to embodiments of the present disclosure. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated in the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network through the communication section 409, and/or installed from the removable medium 411. The computer program executes various functions defined in the system of the present application when executed by a Central Processing Unit (CPU) 401.

It should be noted that the computer readable media shown in the present disclosure may be computer readable signal media or computer readable storage media or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present disclosure, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In contrast, in the present disclosure, a computer-readable signal medium may include a propagated data signal with computer-readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.

The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.

The modules and/or units described in the embodiments of the present disclosure may be implemented by software, or may be implemented by hardware, and the described modules and/or units may also be disposed in a processor. Wherein the names of such modules and/or units do not in some way constitute a limitation on the modules and/or units themselves.

As another aspect, the present application also provides a computer-readable medium, which may be contained in the electronic device described in the above embodiments, or may exist separately without being assembled into the electronic device. The computer-readable medium carries one or more programs which, when executed by an electronic device, cause the electronic device to implement the method described in the above embodiments. For example, the electronic device may implement the steps shown in FIG. 1 or FIG. 2.

It should be noted that although several modules or units of the device for action execution are mentioned in the above detailed description, such a division is not mandatory. Indeed, according to embodiments of the present disclosure, the features and functions of two or more modules or units described above may be embodied in one module or unit. Conversely, the features and functions of one module or unit described above may be further divided so as to be embodied by a plurality of modules or units.

Through the above description of the embodiments, those skilled in the art will readily understand that the exemplary embodiments described herein may be implemented by software, or by software in combination with necessary hardware. Therefore, the technical solution according to the embodiments of the present disclosure may be embodied in the form of a software product, which may be stored in a non-volatile storage medium (which may be a CD-ROM, a USB disk, a removable hard disk, etc.) or on a network, and which includes several instructions to enable a computing device (which may be a personal computer, a server, a touch terminal, a network device, etc.) to execute the method according to the embodiments of the present disclosure.

Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This application is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.

It will be understood that the present disclosure is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the present disclosure is limited only by the appended claims.
