Tissue sample classification method, device, equipment and storage medium

Document No.: 1923591  Publication date: 2021-12-03

Abstract: This technology, "Tissue sample classification method, device, equipment and storage medium" (组织样本的分类方法、装置、设备和存储介质), was designed by 蔡德, 叶虎, 马兆轩, 肖凯文 and 韩骁 on 2021-09-01. The application discloses a method, a device, equipment and a storage medium for classifying tissue samples, and belongs to the field of computer technologies. The method comprises: acquiring image data of a target tissue sample; determining feature vectors of a plurality of suspicious positive cells and a score value corresponding to the feature vector of each suspicious positive cell based on the image data of the target tissue sample and a suspicious positive cell detection model; obtaining, from the feature vectors of the plurality of suspicious positive cells, a plurality of reference feature vectors that meet a preset score value condition; and determining a target sample type of the target tissue sample based on the plurality of reference feature vectors and a sample classification model. The application thereby provides a sample classification method that computer equipment can carry out automatically, gives doctors a reference basis for determining the sample classification, and improves the accuracy of determining the target sample type.

1. A method of classifying a tissue sample, the method comprising:

acquiring image data of a target tissue sample;

determining a plurality of feature vectors of suspicious positive cells and a score value corresponding to the feature vector of each suspicious positive cell based on the image data of the target tissue sample and a suspicious positive cell detection model, wherein the score value is used for indicating the classification confidence of the classification result of the feature vector of the suspicious positive cell corresponding to the score value;

obtaining a plurality of reference feature vectors meeting a preset score value condition from the feature vectors of the plurality of suspicious positive cells;

determining a target sample type for the target tissue sample based on the plurality of reference feature vectors and a sample classification model.

2. The method according to claim 1, wherein the obtaining a plurality of reference feature vectors satisfying a preset score condition among the feature vectors of the plurality of suspected positive cells comprises:

arranging the feature vectors of the plurality of suspicious positive cells in descending order of the corresponding score values, and determining a first preset number of top-ranked feature vectors as the plurality of reference feature vectors; or,

acquiring, from the feature vectors of the plurality of suspicious positive cells, the feature vectors whose corresponding score values are greater than a preset score threshold value, and determining them as the plurality of reference feature vectors.

3. The method of claim 1, wherein determining the target sample type for the target tissue sample based on the plurality of reference feature vectors and a sample classification model comprises:

determining a plurality of reference feature vector sets based on the plurality of reference feature vectors;

for each reference feature vector set, inputting each reference feature vector in the reference feature vector set into the sample classification model to obtain a probability value of each sample type corresponding to the reference feature vector set;

for each sample type, calculating an average value of probability values of the sample types corresponding to the multiple reference feature vector sets to obtain an average probability value corresponding to each sample type;

and determining the sample type corresponding to the maximum average probability value as the target sample type of the target tissue sample.

4. The method of claim 3, wherein determining a plurality of sets of reference feature vectors based on the plurality of reference feature vectors comprises:

performing Monte Carlo sampling multiple times on the plurality of reference feature vectors to obtain the plurality of reference feature vector sets, wherein each reference feature vector set comprises a second preset number of reference feature vectors.

5. The method of claim 3, further comprising:

and determining the uncertainty of the target sample type based on the probability value of each sample type corresponding to each reference feature vector set and the average probability value corresponding to each sample type.

6. The method of claim 5, wherein the determining the uncertainty of the target sample type based on the probability value of each sample type corresponding to each reference feature vector set and the average probability value corresponding to each sample type comprises:

respectively calculating relative entropies between the probability values of the multiple sample types corresponding to each reference feature vector set and the average probability values corresponding to the multiple sample types to obtain the relative entropy corresponding to each reference feature vector set;

and determining the average value of the relative entropies corresponding to all the reference feature vector sets as the uncertainty of the target sample type.

7. The method of claim 1, wherein determining the target sample type for the target tissue sample based on the plurality of reference feature vectors and a sample classification model comprises:

inputting the plurality of reference feature vectors into the sample classification model to obtain a probability value of each sample type;

and determining the sample type corresponding to the maximum probability value as the target sample type of the target tissue sample.

8. The method of claim 1, further comprising:

acquiring image data of a training tissue sample and a sample type of the training tissue sample;

determining probability sequence data as reference output data based on the sample types of the training tissue samples, wherein the probability sequence data is sequence data composed of probability values of a plurality of sample types arranged in a preset order, in the probability sequence data, the probability value of the sample type of the training tissue sample is 1, and the probability values of other sample types except the sample type of the training tissue sample are 0;

determining a feature vector of each suspicious positive cell in a plurality of suspicious positive cells corresponding to the training tissue sample and a score value corresponding to the feature vector of each suspicious positive cell based on the image data of the training tissue sample and the suspicious positive cell detection model;

arranging the feature vectors of the plurality of suspicious positive cells in descending order of the corresponding score values, obtaining a first preset number of top-ranked feature vectors, and determining them as a plurality of sample feature vectors;

performing Monte-Carlo sampling for multiple times in a plurality of sample feature vectors to obtain a sample feature vector set, wherein the sample feature vector set comprises a second preset number of sample feature vectors;

inputting each sample feature vector in the sample feature vector set into a sample classification model to be trained to obtain actual output data;

and training the sample classification model to be trained based on the actual output data and the reference output data to obtain the trained sample classification model.

9. A device for classifying a tissue sample, the device comprising:

a first acquisition module for acquiring image data of a target tissue sample;

a first determination module, configured to determine, based on the image data of the target tissue sample and a suspected positive cell detection model, feature vectors of a plurality of suspected positive cells and a score value corresponding to the feature vector of each suspected positive cell, where the score value is used to indicate a classification confidence of a classification result of the feature vector of the suspected positive cell corresponding to the score value;

a second obtaining module, configured to obtain, from the feature vectors of the suspicious positive cells, a plurality of reference feature vectors that satisfy a preset score value condition;

a second determination module to determine a target sample type for the target tissue sample based on the plurality of reference feature vectors and a sample classification model.

10. A computer device comprising a processor and a memory, the memory having stored therein at least one instruction that is loaded and executed by the processor to perform operations performed by the method of classifying a tissue sample according to any one of claims 1 to 8.

Technical Field

The present application relates to the field of computer technologies, and in particular, to a method, an apparatus, a device, and a storage medium for classifying a tissue sample.

Background

At present, cancers and other tumor diseases can be effectively prevented through early screening and timely treatment. Cervical cancer, for example, is one of the common malignant tumors in women, and early screening and timely treatment can effectively prevent it. Exfoliative cytology, as a well-established screening tool, plays an important role in early screening: a tissue sample on a patient's exfoliative cytology slide is examined to determine which sample type the tissue sample belongs to, that is, whether it is negative or positive and, if positive, which positive type or stage.

In current exfoliative cytology examination, a smear is usually prepared on a slide, and a doctor then observes the morphology of each cell on the slide under a microscope or similar instrument to determine which sample type the tissue sample belongs to.

However, a doctor's work is subject to various sources of instability, which may affect the accuracy of the sample type finally obtained and thus reduce the accuracy of the classification result.

Disclosure of Invention

The embodiment of the application provides a method for classifying tissue samples, which can solve the problem that the accuracy of sample types obtained in the prior art is relatively low.

In a first aspect, a method for classifying a tissue sample is provided, the method comprising:

acquiring image data of a target tissue sample;

determining a plurality of feature vectors of suspicious positive cells and a score value corresponding to the feature vector of each suspicious positive cell based on the image data of the target tissue sample and a suspicious positive cell detection model, wherein the score value is used for indicating the classification confidence of the classification result of the feature vector of the suspicious positive cell corresponding to the score value;

obtaining a plurality of reference feature vectors meeting a preset score value condition from the feature vectors of the plurality of suspicious positive cells;

determining a target sample type for the target tissue sample based on the plurality of reference feature vectors and a sample classification model.

In a possible implementation manner, the obtaining, among the feature vectors of the suspected positive cells, a plurality of reference feature vectors that satisfy a preset score value condition includes:

arranging the feature vectors of the plurality of suspicious positive cells in descending order of the corresponding score values, and determining a first preset number of top-ranked feature vectors as the plurality of reference feature vectors; or,

acquiring, from the feature vectors of the plurality of suspicious positive cells, the feature vectors whose corresponding score values are greater than a preset score threshold value, and determining them as the plurality of reference feature vectors.
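
The two selection rules above can be illustrated with a minimal Python sketch. It is not taken from the application: the function name `select_reference_vectors` and the parameters `top_k` (standing for the first preset number) and `threshold` (standing for the preset score threshold value) are hypothetical, and feature vectors are represented as plain lists of floats.

```python
from typing import List, Optional, Sequence

Vector = Sequence[float]


def select_reference_vectors(
    vectors: Sequence[Vector],
    scores: Sequence[float],
    top_k: Optional[int] = None,        # hypothetical name for the first preset number
    threshold: Optional[float] = None,  # hypothetical name for the preset score threshold
) -> List[Vector]:
    """Select reference feature vectors by exactly one of the two rules."""
    if len(vectors) != len(scores):
        raise ValueError("vectors and scores must have the same length")
    if (top_k is None) == (threshold is None):
        raise ValueError("specify exactly one of top_k or threshold")
    if top_k is not None:
        # Rule 1: order by score value, largest first, and keep the first top_k.
        order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
        return [vectors[i] for i in order[:top_k]]
    # Rule 2: keep every vector whose score value is greater than the threshold.
    return [v for v, s in zip(vectors, scores) if s > threshold]
```

Either rule returns a subset of the detector's feature vectors; which rule and which preset values to use is left open by the text above.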

In one possible implementation, the determining a target sample type of the target tissue sample based on the plurality of reference feature vectors and a sample classification model includes:

determining a plurality of reference feature vector sets based on the plurality of reference feature vectors;

for each reference feature vector set, inputting each reference feature vector in the reference feature vector set into the sample classification model to obtain a probability value of each sample type corresponding to the reference feature vector set;

for each sample type, calculating an average value of probability values of the sample types corresponding to the multiple reference feature vector sets to obtain an average probability value corresponding to each sample type;

and determining the sample type corresponding to the maximum average probability value as the target sample type of the target tissue sample.
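
As an illustration of the averaging and selection steps above, the following Python sketch assumes that the sample classification model has already produced one probability list per reference feature vector set. The function name `average_and_pick` and the argument layout are assumptions made for this example, not part of the application.

```python
from typing import List, Sequence, Tuple


def average_and_pick(
    per_set_probs: Sequence[Sequence[float]],  # one probability list per reference set
    sample_types: Sequence[str],               # sample types in the same fixed order
) -> Tuple[str, List[float]]:
    """Average the per-set probabilities per sample type and pick the largest."""
    if not per_set_probs:
        raise ValueError("at least one reference feature vector set is required")
    n_sets = len(per_set_probs)
    n_types = len(sample_types)
    # Average probability value for each sample type across all sets.
    mean = [sum(p[t] for p in per_set_probs) / n_sets for t in range(n_types)]
    # The sample type with the maximum average probability is the target type.
    best = max(range(n_types), key=lambda t: mean[t])
    return sample_types[best], mean
```

The function also returns the averaged probabilities so that a caller can inspect how close the competing sample types were.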

In one possible implementation, the determining a plurality of reference feature vector sets based on the plurality of reference feature vectors includes:

performing Monte Carlo sampling multiple times on the plurality of reference feature vectors to obtain the plurality of reference feature vector sets, wherein each reference feature vector set comprises a second preset number of reference feature vectors.
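
A minimal sketch of this sampling step is given below. The text does not state whether vectors are drawn with or without replacement; this example assumes sampling without replacement inside each set, with sets drawn independently of one another. The names `monte_carlo_sets`, `num_sets`, `set_size` (standing for the second preset number) and `seed` are hypothetical.

```python
import random
from typing import List, Optional, Sequence

Vector = Sequence[float]


def monte_carlo_sets(
    reference_vectors: Sequence[Vector],
    num_sets: int,
    set_size: int,               # hypothetical name for the second preset number
    seed: Optional[int] = None,  # fixed seed only for reproducible experiments
) -> List[List[Vector]]:
    """Draw num_sets random subsets, each containing set_size reference vectors."""
    if set_size > len(reference_vectors):
        raise ValueError("set_size cannot exceed the number of reference vectors")
    rng = random.Random(seed)
    pool = list(reference_vectors)
    # Assumption: no repeats within a set; different sets may overlap.
    return [rng.sample(pool, set_size) for _ in range(num_sets)]
```

If sampling with replacement were intended instead, `rng.choices(pool, k=set_size)` would be the corresponding standard-library call.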

In one possible implementation, the method further includes:

and determining the uncertainty of the target sample type based on the probability value of each sample type corresponding to each reference feature vector set and the average probability value corresponding to each sample type.

In one possible implementation manner, the determining an uncertainty of the target sample type based on the probability value of each sample type corresponding to each reference feature vector set and the average probability value corresponding to each sample type includes:

respectively calculating relative entropies between the probability values of the multiple sample types corresponding to each reference feature vector set and the average probability values corresponding to the multiple sample types to obtain the relative entropy corresponding to each reference feature vector set;

and determining the average value of the relative entropies corresponding to all the reference feature vector sets as the uncertainty of the target sample type.
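
The uncertainty computation described above can be sketched in Python as follows. The direction of the relative entropy (each set's distribution relative to the average distribution) and the use of the natural logarithm are assumptions of this example, since the text does not fix them; the function names are hypothetical.

```python
import math
from typing import Sequence


def relative_entropy(p: Sequence[float], q: Sequence[float]) -> float:
    """Relative entropy (KL divergence) of p with respect to q, in nats.

    Terms with p_i == 0 contribute nothing; q_i must be positive wherever
    p_i is positive, which holds when q is an average that includes p.
    """
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def uncertainty(per_set_probs: Sequence[Sequence[float]]) -> float:
    """Mean relative entropy between each set's probabilities and their average."""
    n_sets = len(per_set_probs)
    n_types = len(per_set_probs[0])
    # Average probability value of each sample type across the sets.
    mean = [sum(p[t] for p in per_set_probs) / n_sets for t in range(n_types)]
    # Average the per-set relative entropies to get one uncertainty value.
    return sum(relative_entropy(p, mean) for p in per_set_probs) / n_sets
```

Under these assumptions the value is 0 when every reference feature vector set yields the same probabilities and grows as the sets disagree.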

In one possible implementation, the determining a target sample type of the target tissue sample based on the plurality of reference feature vectors and a sample classification model includes:

inputting the plurality of reference feature vectors into the sample classification model to obtain a probability value of each sample type;

and determining the sample type corresponding to the maximum probability value as the target sample type of the target tissue sample.

In one possible implementation, the method further includes:

acquiring image data of a training tissue sample and a sample type of the training tissue sample;

determining probability sequence data as reference output data based on the sample types of the training tissue samples, wherein the probability sequence data is sequence data composed of probability values of a plurality of sample types arranged in a preset order, in the probability sequence data, the probability value of the sample type of the training tissue sample is 1, and the probability values of other sample types except the sample type of the training tissue sample are 0;

determining a feature vector of each suspicious positive cell in a plurality of suspicious positive cells corresponding to the training tissue sample and a score value corresponding to the feature vector of each suspicious positive cell based on the image data of the training tissue sample and the suspicious positive cell detection model;

arranging the feature vectors of the plurality of suspicious positive cells in descending order of the corresponding score values, obtaining a first preset number of top-ranked feature vectors, and determining them as a plurality of sample feature vectors;

performing Monte-Carlo sampling for multiple times in a plurality of sample feature vectors to obtain a sample feature vector set, wherein the sample feature vector set comprises a second preset number of sample feature vectors;

inputting each sample feature vector in the sample feature vector set into a sample classification model to be trained to obtain actual output data;

and training the sample classification model to be trained based on the actual output data and the reference output data to obtain the trained sample classification model.
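
The reference output data described in the training steps above is a one-hot probability sequence. The sketch below shows only that construction; the function name `reference_output` is hypothetical, and the preset order of sample types used in the usage lines is an assumption chosen for illustration.

```python
from typing import List, Sequence


def reference_output(sample_type: str, ordered_types: Sequence[str]) -> List[float]:
    """Build the probability sequence used as reference output data.

    The labelled sample type gets probability 1 and every other type gets 0,
    with positions following the preset order given by ordered_types.
    """
    if sample_type not in ordered_types:
        raise ValueError(f"unknown sample type: {sample_type}")
    return [1.0 if t == sample_type else 0.0 for t in ordered_types]


# Assumed preset order, for illustration only.
ORDER = ["NILM", "ASC-US", "LSIL", "ASC-H", "HSIL"]
target = reference_output("LSIL", ORDER)
```

The model's actual output data would then be compared against `target` by whatever training objective is chosen; the text above does not name a specific loss function.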

In a second aspect, there is provided a device for classifying a tissue sample, the device comprising:

a first acquisition module for acquiring image data of a target tissue sample;

a first determination module, configured to determine, based on the image data of the target tissue sample and a suspected positive cell detection model, feature vectors of a plurality of suspected positive cells and a score value corresponding to the feature vector of each suspected positive cell, where the score value is used to indicate a classification confidence of a classification result of the feature vector of the suspected positive cell corresponding to the score value;

a second obtaining module, configured to obtain, from the feature vectors of the suspicious positive cells, a plurality of reference feature vectors that satisfy a preset score value condition;

a second determination module to determine a target sample type for the target tissue sample based on the plurality of reference feature vectors and a sample classification model.

In a possible implementation manner, the second obtaining module is configured to:

obtaining, from the feature vectors of the suspicious positive cells, a first preset number of feature vectors with the largest corresponding score values, and determining them as the plurality of reference feature vectors; or,

and acquiring the feature vectors of which the corresponding score values are greater than a preset score threshold value from the feature vectors of the suspicious positive cells, and determining the feature vectors as the plurality of reference feature vectors.

In a possible implementation manner, the second determining module is configured to:

determining a plurality of reference feature vector sets based on the plurality of reference feature vectors;

for each reference feature vector set, inputting each reference feature vector in the reference feature vector set into the sample classification model to obtain a probability value of each sample type corresponding to the reference feature vector set;

for each sample type, calculating an average value of probability values of the sample types corresponding to the multiple reference feature vector sets to obtain an average probability value corresponding to each sample type;

and determining the sample type corresponding to the maximum average probability value as the target sample type of the target tissue sample.

In a possible implementation manner, the second determining module is configured to:

and performing Monte-Carlo sampling for multiple times in the plurality of reference feature vectors to obtain a plurality of reference feature vector sets, wherein each reference feature vector set comprises a second preset number of reference feature vectors.

In one possible implementation manner, the apparatus further includes a third determining module configured to:

and determining the uncertainty of the target sample type based on the probability value of each sample type corresponding to each reference feature vector set and the average probability value corresponding to each sample type.

In a possible implementation manner, the third determining module is configured to:

respectively calculating relative entropies between the probability values of the multiple sample types corresponding to each reference feature vector set and the average probability values corresponding to the multiple sample types to obtain the relative entropy corresponding to each reference feature vector set;

and determining the average value of the relative entropies corresponding to all the reference feature vector sets as the uncertainty of the target sample type.

In a possible implementation manner, the second determining module is configured to:

inputting the plurality of reference feature vectors into the sample classification model to obtain a probability value of each sample type;

and determining the sample type corresponding to the maximum probability value as the target sample type of the target tissue sample.

In one possible implementation, the apparatus further includes a training module configured to:

acquiring image data of a training tissue sample and a sample type of the training tissue sample;

determining probability sequence data as reference output data based on the sample types of the training tissue samples, wherein the probability sequence data is sequence data composed of probability values of a plurality of sample types arranged in a preset order, in the probability sequence data, the probability value of the sample type of the training tissue sample is 1, and the probability values of other sample types except the sample type of the training tissue sample are 0;

determining a feature vector of each suspicious positive cell in a plurality of suspicious positive cells corresponding to the training tissue sample and a score value corresponding to the feature vector of each suspicious positive cell based on the image data of the training tissue sample and the suspicious positive cell detection model;

arranging the feature vectors of the plurality of suspicious positive cells in descending order of the corresponding score values, obtaining a first preset number of top-ranked feature vectors, and determining them as a plurality of sample feature vectors;

performing Monte-Carlo sampling for multiple times in a plurality of sample feature vectors to obtain a sample feature vector set, wherein the sample feature vector set comprises a second preset number of sample feature vectors;

inputting each sample feature vector in the sample feature vector set into a sample classification model to be trained to obtain actual output data;

and training the sample classification model to be trained based on the actual output data and the reference output data to obtain the trained sample classification model.

In a third aspect, a computer device is provided that includes a processor and a memory having stored therein at least one instruction that is loaded and executed by the processor to perform operations performed by a method of classifying a tissue sample.

In a fourth aspect, a computer-readable storage medium is provided that has at least one instruction stored therein, the instruction being loaded and executed by a processor to perform operations performed by a method for classifying a tissue sample.

The technical scheme provided by the embodiment of the application has the following beneficial effects: according to the scheme provided by the embodiment of the application, the feature vectors of a plurality of suspicious positive cells existing in the target tissue sample and the score value corresponding to the feature vector of each suspicious positive cell are determined based on the image data of the target tissue sample and the suspicious positive cell detection model, then a plurality of reference feature vectors meeting the preset score value condition are selected from the feature vectors of the plurality of suspicious positive cells, and the type of the target sample of the target tissue sample is determined based on the plurality of reference feature vectors and the sample classification model. The sample classification method capable of automatically performing classification processing through computer equipment provides a reference basis for determining sample classification for doctors, and accuracy of determining the type of a target sample is improved.

Drawings

In order to illustrate the technical solutions in the embodiments of the present application more clearly, the drawings needed in the description of the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present application, and those skilled in the art can obtain other drawings from them without creative effort.

FIG. 1 is a flowchart of a method for classifying tissue samples according to an embodiment of the present application;

FIG. 2 is a schematic view of a tissue sample provided by an embodiment of the present application;

FIG. 3 is a schematic diagram of a suspected positive cell provided by an embodiment of the present application;

FIG. 4 is a flowchart of a method for classifying tissue samples according to an embodiment of the present disclosure;

FIG. 5 is a flow chart of a sample classification model process provided by an embodiment of the present application;

FIG. 6 is a flow chart illustrating a process of a sample classification model according to an embodiment of the present disclosure;

FIG. 7 is a flowchart of a method for training a sample classification model according to an embodiment of the present disclosure;

FIG. 8 is a schematic illustration of a sample type and uncertainty display provided by an embodiment of the present application;

FIG. 9 is a schematic structural diagram of a tissue sample classifying device according to an embodiment of the present disclosure;

FIG. 10 is a block diagram of a server according to an embodiment of the present disclosure.

Detailed Description

To make the objects, technical solutions and advantages of the present application more clear, embodiments of the present application will be described in further detail below with reference to the accompanying drawings.

The embodiment of the application provides a method for classifying tissue samples, which can be realized by a server. The server may be a single server or may be a server cluster composed of a plurality of servers.

The server may comprise a processor, a memory, a communication component, etc., and the processor is connected to the memory and to the communication component respectively.

The processor may be a Central Processing Unit (CPU). The processor may be configured to read the instructions and process the data, such as obtaining image data of the target tissue sample, determining a plurality of feature vectors of the suspect positive cells and a corresponding score value for each of the feature vectors of the suspect positive cells, obtaining a plurality of reference feature vectors, determining a target sample type for the target tissue sample, and so forth.

The memory may include a ROM (Read-Only Memory), a RAM (Random Access Memory), a CD-ROM (Compact Disc Read-Only Memory), a magnetic disk, an optical data storage device, and the like. The memory may be used for data storage, such as storing the acquired image data of the target tissue sample, storing intermediate data generated when determining the feature vectors of a plurality of suspicious positive cells and the score value corresponding to the feature vector of each suspicious positive cell, storing the acquired plurality of reference feature vectors, storing intermediate data generated when determining the target sample type of the target tissue sample, and so forth.

The communication component may be a wired network connector, a WiFi (Wireless Fidelity) module, a Bluetooth module, a cellular network communication module, etc. The communication component may be used to receive and transmit signals, for example to receive information when image data of a target tissue sample is acquired, or to transmit the target sample type of the target tissue sample to the terminal that needs it once the type has been obtained.

The suspicious positive cell detection model and the sample classification model in the embodiment of the application both belong to the field of Machine Learning (ML). Machine learning is an interdisciplinary subject that draws on probability theory, statistics, approximation theory, convex analysis, algorithm complexity theory and other fields. It specializes in studying how a computer can simulate or realize human learning behavior so as to acquire new knowledge or skills and reorganize its existing knowledge structure to continuously improve its own performance. Machine learning is the core of artificial intelligence and the fundamental way to give computers intelligence, and it is applied across all fields of artificial intelligence. Machine learning and deep learning generally include techniques such as artificial neural networks, belief networks, reinforcement learning, transfer learning, inductive learning, and learning from demonstration.

In the embodiment of the present application, the target tissue sample may be any exfoliated cell tissue sample, for example, a cervical exfoliated cytology tissue sample, and the like.

FIG. 1 is a flowchart of a method for classifying a tissue sample according to an embodiment of the present disclosure. Referring to FIG. 1, the embodiment includes:

101. Image data of a target tissue sample is acquired.

In practice, a target tissue sample of a patient, which may be a tissue sample on a cytology slide, may be obtained first, as shown in FIG. 2. Image data of the target tissue sample is then acquired.

102. The feature vectors of a plurality of suspicious positive cells and the score value corresponding to the feature vector of each suspicious positive cell are determined based on the image data of the target tissue sample and the suspicious positive cell detection model.

Wherein the score value is used for indicating the classification confidence of the classification result of the feature vector of the suspicious positive cell corresponding to the score value.

In implementation, the image data of the target tissue sample may be input into a suspicious positive cell detection model (which may also be referred to as a suspicious positive cell detector). The suspicious positive cell detection model includes a feature extraction module. When the image data of the target tissue sample is input into the model, it is passed to the feature extraction module, which performs feature extraction on the image data of each cell in the target tissue sample to obtain a feature vector for each cell. The feature vectors are then input into the other modules of the suspicious positive cell detection model for further processing, so that the model outputs a classification result for each suspicious positive cell (i.e., the type of the suspicious positive cell) and a score value corresponding to each classification result. A suspicious positive cell is a cell that the other modules of the suspicious positive cell detection model predict, from the feature vector of the cell, to be possibly positive. The score value corresponding to the classification result of a suspicious positive cell characterizes the classification confidence of the predicted classification result. For example, if the classification result of a suspicious positive cell is type A and the corresponding score value is 0.8, the suspicious positive cell is predicted to belong to type A with a probability of 80%.

Optionally, the number of suspicious positive cells of each type may be calculated based on the type of each suspicious positive cell output by the suspicious positive cell detection model, and the number of all cells of the target tissue sample may be obtained. A first ratio of the number of suspicious positive cells of each type to the number of all cells of the target tissue sample and a second ratio of the number of all suspicious positive cells to the number of all cells of the target tissue sample may then be calculated. The first ratio may be used to characterize the severity value of each type, and the second ratio may be used to characterize the positive severity value. The first ratio of each type and the second ratio may then be displayed to a doctor or a patient in a histogram as reference data.
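As an illustration only, the two ratios described above may be computed as in the following minimal Python sketch. The function name `positive_ratios`, the list-of-labels input, and the example counts are assumptions introduced for this sketch and are not part of the embodiment.

```python
from collections import Counter


def positive_ratios(cell_types, total_cells):
    """Return (first ratios per type, second ratio over all suspicious cells).

    cell_types: one type label per detected suspicious positive cell (assumed input).
    total_cells: number of all cells in the target tissue sample.
    """
    if total_cells <= 0:
        raise ValueError("total_cells must be positive")
    counts = Counter(cell_types)
    # First ratio: suspicious positive cells of each type / all cells.
    first_ratios = {cell_type: n / total_cells for cell_type, n in counts.items()}
    # Second ratio: all suspicious positive cells / all cells.
    second_ratio = len(cell_types) / total_cells
    return first_ratios, second_ratio


# Hypothetical detector output for a sample containing 200 cells.
detected = ["ASC-US", "LSIL", "ASC-US", "HSIL"]
first_ratios, second_ratio = positive_ratios(detected, total_cells=200)
print(first_ratios, second_ratio)
```

The per-type dictionary maps directly onto the bars of the histogram mentioned above, with the second ratio as an additional overall bar.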

There are various types of suspicious positive cells. For example, if the target tissue sample is an exfoliated cervical tissue sample, the types of suspicious positive cells may include four types: ASC-US (atypical squamous cells of undetermined significance), LSIL (low-grade squamous intraepithelial lesion), ASC-H (atypical squamous cells, cannot exclude high-grade squamous intraepithelial lesion), and HSIL (high-grade squamous intraepithelial lesion). For reference, fig. 3 is a schematic diagram of a plurality of suspicious positive cells in an exfoliated cervical tissue sample. Similarly, the sample types of the target tissue sample include the above four types and a negative sample type, which is also called NILM (negative for intraepithelial lesion or malignancy).

The score value corresponding to each suspicious positive cell represents a likelihood value that the suspicious positive cell detection model judges that the cell is of the type, namely, the score value is used for indicating the classification confidence of classifying the suspicious positive cell based on the feature vector in the suspicious positive cell detection model.

Of course, in the embodiment of the present application, the subsequent processing only needs to obtain the intermediate output of the suspected positive cell detection model (i.e., the feature vector of the suspected positive cell) and the score value corresponding to the feature vector of each suspected positive cell.

103. Acquire a plurality of reference feature vectors meeting a preset score value condition from the feature vectors of the plurality of suspicious positive cells.

In implementation, after the feature vectors of a plurality of suspicious positive cells are obtained, the feature vector of the suspicious positive cell with a relatively clear classification feature is selected as a reference feature vector for subsequent classification judgment of the target tissue sample.

Optionally, there may be multiple methods for determining the reference feature vector, two of which are as follows:

First method

Arrange the feature vectors of the plurality of suspicious positive cells in descending order of the corresponding score values, and determine the first preset number of feature vectors at the front of the order as the plurality of reference feature vectors.

In implementation, the feature vectors of the suspicious positive cells may be arranged in descending order of score value, and the feature vectors corresponding to the first preset number of highest score values are then obtained and determined as the reference feature vectors.

Alternatively, the first preset number may be any reasonable number, for example, 15, or 20, and the like, which is not limited in the embodiment of the present application.

Second method

Acquire, from the feature vectors of the plurality of suspicious positive cells, the feature vectors whose corresponding score values are greater than a preset score threshold, and determine them as the reference feature vectors.

In implementation, a preset score threshold may be preset, and the feature vector corresponding to the score value greater than the preset score threshold is determined as the reference feature vector.

Optionally, the preset score threshold may be any reasonable value, and if the score value is a value in the range of [0, 1], the preset score threshold may be 0.5 or the like, or may be another value, which is not limited in this embodiment of the present application.
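The two selection methods above may be sketched as follows. This is a minimal illustration in Python: the function names, the plain-list representation of feature vectors, and the example scores are assumptions made for the sketch.

```python
def select_top_k(features, scores, k):
    """First method: keep the k feature vectors with the highest score values."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return [features[i] for i in order[:k]]


def select_above_threshold(features, scores, threshold):
    """Second method: keep feature vectors whose score value exceeds the threshold."""
    return [f for f, s in zip(features, scores) if s > threshold]


# Hypothetical feature vectors and score values for four suspicious positive cells.
features = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6], [0.7, 0.8]]
scores = [0.9, 0.4, 0.75, 0.55]

print(select_top_k(features, scores, k=2))
print(select_above_threshold(features, scores, threshold=0.5))
```

The first method always yields a fixed number of reference feature vectors, whereas the number returned by the second method depends on how many cells exceed the threshold.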

104. A target sample type of the target tissue sample is determined based on the plurality of reference feature vectors and the sample classification model.

In implementation, after a plurality of reference feature vectors are determined, since the reference feature vectors are all feature vectors with clear classification features (i.e., with high score values), the type of the target tissue sample can be predicted according to the plurality of reference feature vectors and the trained sample classification model, so as to determine the target sample type of the target tissue sample.

If the target tissue sample is a cervical exfoliated tissue sample, the sample type of the target tissue sample may be one of negative (NILM), ASC-US, LSIL, ASC-H, and HSIL.

Alternatively, there are many ways to determine the target sample type of the target tissue sample based on the plurality of reference feature vectors and the sample classification model. Two of them are as follows:

The first method may be:

Input the plurality of reference feature vectors into the sample classification model to obtain the probability value of each sample type, and determine the sample type corresponding to the maximum probability value as the target sample type of the target tissue sample.

In implementation, the determined reference feature vectors are directly input into a trained sample classification model, so as to obtain a probability value of each sample type, and the probability value of one sample type is used for indicating the possibility that suspicious positive cells corresponding to the reference feature vectors are of the sample type, namely, for indicating the possibility that the target tissue sample is of the sample type. It will be appreciated that the sum of the probability values for all sample types output is 1.

The processing flow of the second method can be as shown in fig. 4, which corresponds to the following:

1041. Determine a plurality of reference feature vector sets based on the plurality of reference feature vectors.

In this embodiment of the present application, the process of determining a plurality of reference feature vector sets may be:

Perform Monte Carlo sampling multiple times on the plurality of reference feature vectors to obtain a plurality of reference feature vector sets, wherein each reference feature vector set includes a second preset number of reference feature vectors.

Optionally, the second preset number may be any reasonable number. In the embodiment of the present application, if the first method is used to determine the plurality of reference feature vectors, the second preset number and the first preset number may have a proportional relationship; for example, the ratio of the second preset number to the first preset number may be 1:1.5.

In the embodiment of the present application, the process of determining the plurality of reference feature vectors and the plurality of reference feature vector sets is modeled on the slide-reading experience of doctors. Generally, after obtaining the image data of a target tissue sample, a doctor focuses on the image data of the cells that are most clearly positive; accordingly, the feature vectors of the suspicious positive cells with high score values are selected as the reference feature vectors in the embodiment of the present application. The doctor then randomly observes some of the suspicious positive cells with clear classification features to judge the sample type of the target tissue sample; correspondingly, the embodiment of the present application performs Monte Carlo sampling on the plurality of reference feature vectors to obtain the plurality of reference feature vector sets.
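The construction of the reference feature vector sets may be sketched as follows. The embodiment does not state whether sampling within one set is with or without replacement; this sketch assumes sampling without replacement inside each set, and the function name, the `seed` parameter, and the example sizes (15 reference feature vectors, sets of 10, matching the 1:1.5 ratio) are likewise assumptions.

```python
import random


def monte_carlo_sets(reference_vectors, set_size, num_sets, seed=None):
    """Draw num_sets random subsets, each holding set_size reference feature vectors.

    Assumption: vectors are drawn without replacement within a set, while
    different sets are drawn independently and may overlap.
    """
    if set_size > len(reference_vectors):
        raise ValueError("set_size cannot exceed the number of reference vectors")
    rng = random.Random(seed)
    return [rng.sample(reference_vectors, set_size) for _ in range(num_sets)]


# Hypothetical input: 15 one-dimensional reference feature vectors.
reference_vectors = [[float(i)] for i in range(15)]
sets = monte_carlo_sets(reference_vectors, set_size=10, num_sets=5, seed=0)
print(len(sets), [len(s) for s in sets])
```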

1042. For each reference feature vector set, input each reference feature vector in the reference feature vector set into the sample classification model to obtain the probability value of each sample type corresponding to the reference feature vector set.

In implementation, for each set of reference feature vectors, the following process may be performed: inputting each reference feature vector in the reference feature vector set into a trained sample classification model, wherein the sample classification model can output a probability value of each sample type, and the higher the probability value is, the more likely the sample classification model judges that the target tissue sample is the sample type.

As shown in fig. 5, after all the reference feature vectors in one reference feature vector set are input into the sample classification model, each reference feature vector is input into the feature network for nonlinear mapping. Several different nonlinear mappings are performed on each reference feature vector to obtain a vector k, a vector q, and a vector v corresponding to that reference feature vector. The vector k corresponding to each reference feature vector is multiplied by the vector q to obtain a first vector corresponding to the reference feature vector, the first vector is input into the Attention network to calculate a weight value, and the weight value is multiplied by the vector v of the corresponding reference feature vector to obtain a second vector corresponding to the reference feature vector. The second vector of each reference feature vector is then input into the CNN network to obtain the probability values of the plurality of sample types corresponding to each reference feature vector, and these probability values are input into the fully connected network to obtain the probability values of the plurality of sample types corresponding to the reference feature vector set.

By processing all the reference feature vector sets in this way, the probability value of each sample type corresponding to each reference feature vector set can be obtained. For example, if the number of reference feature vector sets is 5 and the number of sample types is 6, then inputting each reference feature vector set into the sample classification model yields 6 probability values (one per sample type) for that set, that is, a total of 5 × 6 probability values.

1043. For each sample type, calculate the average value of the probability values of the sample type corresponding to the plurality of reference feature vector sets to obtain the average probability value corresponding to each sample type.

In implementation, for each sample type, the average value of the probability values of the sample type corresponding to the plurality of reference feature vector sets is calculated to obtain the average probability value corresponding to the sample type. For example, suppose the number of reference feature vector sets is 3 and the number of sample types is 2, the probability values of all the sample types corresponding to the first reference feature vector set are (0.2, 0.8), those corresponding to the second reference feature vector set are (0.1, 0.9), and those corresponding to the third reference feature vector set are (0.3, 0.7). Then the average probability value of the first sample type is (0.2+0.1+0.3)/3 = 0.2, and the average probability value of the second sample type is (0.8+0.9+0.7)/3 = 0.8.

1044. Determine the sample type corresponding to the maximum average probability value as the target sample type of the target tissue sample.

In implementation, the sample type with the largest average probability value in all the sample types is determined as the target sample type of the target tissue sample. For example, if the number of sample types is 2, the average probability value of the first sample type is 0.2, and the average probability value of the second sample type is 0.8, the second sample type may be determined as the target sample type of the target tissue sample.
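Steps 1043 and 1044 may be illustrated with the following minimal Python sketch, which reuses the numbers from the example above. The function names and the placeholder type names "type 1" and "type 2" are assumptions for the sketch; the per-set probability values would in practice come from the sample classification model.

```python
def average_probabilities(per_set_probs):
    """Average the probability value of each sample type over all sets."""
    n = len(per_set_probs)
    return [sum(column) / n for column in zip(*per_set_probs)]


def predict_type(per_set_probs, type_names):
    """Return the sample type with the largest average probability value."""
    averages = average_probabilities(per_set_probs)
    best = max(range(len(averages)), key=averages.__getitem__)
    return type_names[best], averages


# Probability values of 2 sample types for 3 reference feature vector sets.
per_set_probs = [(0.2, 0.8), (0.1, 0.9), (0.3, 0.7)]
target_type, averages = predict_type(per_set_probs, ["type 1", "type 2"])
print(target_type, averages)  # averages are approximately (0.2, 0.8)
```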

Optionally, the sample classification model may also be an integrated (ensemble) model, as shown in fig. 6. All the reference feature vectors may be input into the sample classification model. When they enter a feature layer in the sample classification model, Monte Carlo dropout is performed on all the reference feature vectors multiple times, and multiple sets of vector data are output (the sets may differ considerably, because the result of each Monte Carlo dropout is not necessarily the same), where each set of vector data includes a second preset number of reference feature vectors. The multiple sets of vector data are respectively input into different submodels in the integrated model for subsequent processing, so as to obtain the probability value of each sample type corresponding to each set of vector data. The probability values of each sample type corresponding to the multiple sets of vector data are then input into a subsequent fully connected layer to obtain the probability value of each sample type corresponding to all the reference feature vectors, and the sample type with the largest probability value may be determined as the target sample type of the target tissue sample.

The technical solution provided by the embodiment of the present application has the following beneficial effects: the feature vectors of a plurality of suspicious positive cells existing in the target tissue sample and the score value corresponding to the feature vector of each suspicious positive cell are determined based on the image data of the target tissue sample and the suspicious positive cell detection model; a plurality of reference feature vectors meeting the preset score value condition are then selected from the feature vectors of the plurality of suspicious positive cells; and the target sample type of the target tissue sample is determined based on the plurality of reference feature vectors and the sample classification model. The present application thus provides a sample classification method that can be performed automatically by a computer device, provides doctors with a reference basis for determining the sample classification, and improves the accuracy of determining the target sample type.

The sample classification model is a trained sample classification model, and an embodiment of the present application further provides a training method for a sample classification model, as shown in fig. 7, which corresponds to the following:

701. Acquire image data of a training tissue sample and the sample type of the training tissue sample.

In an implementation, a training tissue sample of a determined sample type may be obtained, and then image data and a corresponding sample type of the training tissue sample are obtained.

702. Based on the sample type of the training tissue sample, probability sequence data is determined as reference output data.

The probability sequence data is sequence data composed of probability values of a plurality of sample types arranged according to a preset sequence, in the probability sequence data, the probability value of the sample type of the training tissue sample is 1, and the probability values of other sample types except the sample type of the training tissue sample are 0.

In implementation, the probability value of the sample type of the training tissue sample is determined to be 1, the probability values of other sample types except the sample type of the training tissue sample are determined to be 0, then the probability values of the sample types are arranged according to a preset sequence, sequence data consisting of the probability values of a plurality of sample types, namely, probability sequence data, is obtained, and the probability sequence data is determined to be reference output data. The preset sequence is a preset sequence according to the type of the sample.

For example, if the training tissue sample is a cervical exfoliated cell tissue sample, and the sample type of the cervical exfoliated cell tissue sample is known as ASC-US, the probability value of the sample type of ASC-US is determined to be 1, the probability values of the four sample types of LSIL, ASC-H, HSIL, and NILM are determined to be 0, and if the preset order is HSIL, LSIL, ASC-H, ASC-US, and NILM, the obtained reference output data is 00010.
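The construction of the reference output data may be sketched as follows, reproducing the example above. The function name `reference_output` and the list representation of the probability sequence data are assumptions for this sketch.

```python
def reference_output(sample_type, preset_order):
    """Build probability sequence data: 1 for the known sample type, 0 elsewhere."""
    if sample_type not in preset_order:
        raise ValueError(f"unknown sample type: {sample_type}")
    return [1 if t == sample_type else 0 for t in preset_order]


# Preset order taken from the example in the text.
preset_order = ["HSIL", "LSIL", "ASC-H", "ASC-US", "NILM"]
print(reference_output("ASC-US", preset_order))  # [0, 0, 0, 1, 0]
```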

703. Determine the feature vector of each suspicious positive cell in a plurality of suspicious positive cells corresponding to the training tissue sample and the score value corresponding to the feature vector of each suspicious positive cell based on the image data of the training tissue sample and the suspicious positive cell detection model.

In implementation, the image data of the training tissue sample is input into the trained suspected positive cell detection model, so as to obtain an intermediate output (i.e., the predicted feature vector of the suspected positive cells existing in the training tissue sample) and a final output (i.e., the classification result of each suspected positive cell and the score value corresponding to the classification result) of the suspected positive cell detection model, and obtain the feature vector of each suspected positive cell and the score value corresponding to the feature vector of each suspected positive cell.

704. Arrange the feature vectors of the plurality of suspicious positive cells in descending order of the corresponding score values, acquire the first preset number of feature vectors at the front of the order, and determine them as a plurality of sample feature vectors.

In implementation, the obtained feature vectors of a plurality of suspicious positive cells are arranged according to the sequence of the corresponding score values from large to small, then the first preset number of feature vectors are obtained, and the feature vectors are determined as sample feature vectors.

705. Perform Monte Carlo sampling multiple times on the plurality of sample feature vectors to obtain a sample feature vector set.

The sample feature vector set includes a second preset number of sample feature vectors.

In implementation, Monte Carlo sampling is performed a second preset number of times on the plurality of sample feature vectors to obtain a second preset number of feature vectors, which form the sample feature vector set.

706. Input each sample feature vector in the sample feature vector set into the sample classification model to be trained to obtain actual output data.

In implementation, each sample feature vector in the sample feature vector set is input into a sample classification model to be trained, and the sample classification model outputs actual output data.

707. Train the sample classification model to be trained based on the actual output data and the reference output data to obtain the trained sample classification model.

In implementation, the sample classification model to be trained may be trained according to the actual output data and the reference output data.

Image data of a plurality of different training tissue samples and the sample types of the training tissue samples are acquired, and the sample classification model to be trained is trained on each of them by using steps 701 to 707. The training is stopped once the obtained loss value is smaller than a preset loss value threshold, and the sample classification model obtained at that point is the trained sample classification model.

Optionally, in addition to predicting the target sample type of a target tissue sample, the tissue sample classification method provided in the embodiment of the present application may calculate the uncertainty of the prediction result based on Bayesian inference to provide a reference for a doctor or a patient. The corresponding processing procedure may be as follows:

Determine the uncertainty of the target sample type based on the probability value of each sample type corresponding to each reference feature vector set and the average probability value corresponding to each sample type.

In implementation, the uncertainty of the sample classification process performed by the sample classification model may be calculated based on the probability value of each sample type corresponding to each reference feature vector set and the average probability value corresponding to each sample type, and the uncertainty is determined as the uncertainty of the target sample type.

The embodiment of the present application provides one such method, in which Bayesian inference is introduced to calculate a relative entropy and the relative entropy is used as the uncertainty. The corresponding processing procedure may be:

Calculate the relative entropy between the probability values of the plurality of sample types corresponding to each reference feature vector set and the average probability values corresponding to the plurality of sample types, to obtain the relative entropy corresponding to each reference feature vector set. Determine the average value of the relative entropies corresponding to all the reference feature vector sets as the uncertainty of the target sample type.

In implementations, the relative entropy can be used to characterize the uncertainty of the sample classification process performed by the sample classification model.

First, for each reference feature vector set, a relative entropy between a probability distribution of probability values of a plurality of sample types corresponding to the reference feature vector set and a probability distribution of an average probability value corresponding to the plurality of sample types may be calculated, and a corresponding formula may be as follows:

$$D_{KL}\left(P_j \,\|\, \bar{P}\right) = \sum_{i} P_{ji} \log \frac{P_{ji}}{\bar{P}_i}$$

wherein j is the sequence number of the reference feature vector set, i is the sequence number of the sample type, $P_j$ is the probability distribution of the probability values of the plurality of sample types corresponding to the jth reference feature vector set, $\bar{P}$ is the probability distribution of the average probability values of the plurality of sample types, $D_{KL}(P_j \,\|\, \bar{P})$ is the relative entropy between the probability distribution of the probability values of the plurality of sample types corresponding to the jth reference feature vector set and the probability distribution of the average probability values of the plurality of sample types (i.e., the relative entropy corresponding to the jth reference feature vector set), $P_{ji}$ is the probability value of the ith sample type corresponding to the jth reference feature vector set, and $\bar{P}_i$ is the average probability value corresponding to the ith sample type.

After the relative entropy corresponding to each reference feature vector set is calculated according to the above formula, an average value of the relative entropies can be calculated, the obtained average value is the uncertainty of the sample classification processing performed by the sample classification model, and the uncertainty can be determined as the uncertainty of the target sample type.
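The uncertainty calculation described above may be sketched in Python as follows. The sketch assumes the standard definition of relative entropy (Kullback-Leibler divergence) with the natural logarithm, taken from each set's distribution to the average distribution; the function names and the example probability values are illustrative assumptions.

```python
import math


def relative_entropy(p, q):
    """Relative entropy D(p || q) = sum_i p_i * log(p_i / q_i).

    Terms with p_i == 0 contribute 0 by convention. The natural logarithm
    is an assumption; the text does not fix the base.
    """
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def uncertainty(per_set_probs):
    """Mean relative entropy between each set's distribution and the average."""
    n = len(per_set_probs)
    mean = [sum(column) / n for column in zip(*per_set_probs)]
    return sum(relative_entropy(p, mean) for p in per_set_probs) / n


# Hypothetical probability values of 2 sample types for 3 sets.
per_set_probs = [(0.2, 0.8), (0.1, 0.9), (0.3, 0.7)]
print(uncertainty(per_set_probs))
```

Under this definition the uncertainty is zero when every reference feature vector set yields the same distribution, and it grows as the per-set predictions diverge from their average.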

Optionally, the tissue sample classification method provided in the embodiment of the present application may be applied to cytological tissue samples on cytological slides prepared by various slide production methods (sedimentation type, membrane type, etc.), and an apparatus for implementing the tissue sample classification method may be deployed in a local server of a hospital. Alternatively, the apparatus may be deployed in a cloud server so that it can be called remotely by the local server of the hospital. In this case, related network equipment needs to be provided so that the local server can transmit the image data of the target tissue sample to the cloud server, the cloud server can transmit the determined result back to the local server, and the local server can display the result to a doctor.

After the image data of the target tissue sample is acquired by the local server of the hospital, the target sample type of the target tissue sample and the corresponding uncertainty may be determined by the tissue sample classification method and then displayed to a doctor or a patient for viewing.

Alternatively, the average probability value of each sample type and the uncertainty of the sample classification processing performed by the sample classification model may be displayed to a doctor or a patient, so that the doctor can combine them with other materials to determine the target sample type of the target tissue sample. By combining the probability value of each sample type predicted by the sample classification model with the calculated uncertainty, the present application can better assist a doctor in the interpretation and analysis of tissue samples in cytological examination.

For the image data of tissue samples acquired in the same time period in a hospital, each tissue sample may be determined as a target tissue sample and predicted by using the tissue sample classification method, so as to obtain the sample type and the corresponding uncertainty of each tissue sample, and all the sample types and the corresponding uncertainties are displayed to a doctor for comparison and observation. For example, fig. 8 is a schematic diagram showing the uncertainties corresponding to five tissue samples after normalization with respect to the first tissue sample (i.e., the tissue sample whose sample type is NILM). It can be seen that the uncertainties of the two sample types ASC-US and ASC-H are relatively large. In practice, ASC-US and ASC-H are both atypical sample types, and according to the definition of TBS (The Bethesda System, a descriptive diagnosis reporting system), "atypical" itself denotes an uncertain state.

All the above optional technical solutions may be combined arbitrarily to form optional embodiments of the present application, and are not described herein again.

An embodiment of the present application provides a tissue sample classification apparatus, which may be a computer device in the foregoing embodiment, as shown in fig. 9, where the apparatus includes:

a first acquiring module 910, configured to acquire image data of a target tissue sample;

a first determining module 920, configured to determine, based on the image data of the target tissue sample and a suspected positive cell detection model, a score value corresponding to a feature vector of each suspected positive cell and a feature vector of a plurality of suspected positive cells, where the score value is used to indicate a classification confidence of a classification result of the feature vector of the suspected positive cell corresponding to the score value;

a second obtaining module 930, configured to obtain, from the feature vectors of the suspicious positive cells, a plurality of reference feature vectors that satisfy a preset score condition;

a second determining module 940 for determining a target sample type of the target tissue sample based on the plurality of reference feature vectors and a sample classification model.

In a possible implementation manner, the second obtaining module 930 is configured to:

arranging the feature vectors of the plurality of suspicious positive cells in descending order of the corresponding score values, and determining the first preset number of feature vectors at the front of the order as the plurality of reference feature vectors; or,

and acquiring the feature vectors of which the corresponding score values are greater than a preset score threshold value from the feature vectors of the suspicious positive cells, and determining the feature vectors as the plurality of reference feature vectors.

In a possible implementation manner, the second determining module 940 is configured to:

determining a plurality of reference feature vector sets based on the plurality of reference feature vectors;

for each reference feature vector set, inputting each reference feature vector in the reference feature vector set into the sample classification model to obtain a probability value of each sample type corresponding to the reference feature vector set;

for each sample type, calculating an average value of probability values of the sample types corresponding to the multiple reference feature vector sets to obtain an average probability value corresponding to each sample type;

and determining the sample type corresponding to the maximum average probability value as the target sample type of the target tissue sample.

In a possible implementation manner, the second determining module 940 is configured to:

performing Monte Carlo sampling multiple times on the plurality of reference feature vectors to obtain a plurality of reference feature vector sets, wherein each reference feature vector set comprises a second preset number of reference feature vectors.

In one possible implementation manner, the apparatus further includes a third determining module configured to:

and determining the uncertainty of the target sample type based on the probability value of each sample type corresponding to each reference feature vector set and the average probability value corresponding to each sample type.

In a possible implementation manner, the third determining module is configured to:

respectively calculating relative entropies between the probability values of the multiple sample types corresponding to each reference feature vector set and the average probability values corresponding to the multiple sample types to obtain the relative entropy corresponding to each reference feature vector set;

and determining the average value of the relative entropies corresponding to all the reference feature vector sets as the uncertainty of the target sample type.

In a possible implementation manner, the second determining module 940 is configured to:

inputting the plurality of reference feature vectors into the sample classification model to obtain a probability value of each sample type;

and determining the sample type corresponding to the maximum probability value as the target sample type of the target tissue sample.

In one possible implementation, the apparatus further includes a training module configured to:

acquiring image data of a training tissue sample and a sample type of the training tissue sample;

determining probability sequence data as reference output data based on the sample types of the training tissue samples, wherein the probability sequence data is sequence data composed of probability values of a plurality of sample types arranged in a preset order, in the probability sequence data, the probability value of the sample type of the training tissue sample is 1, and the probability values of the sample types except the sample type of the training tissue sample are 0;

determining a feature vector of each suspicious positive cell in a plurality of suspicious positive cells corresponding to the training tissue sample and a score value corresponding to the feature vector of each suspicious positive cell based on the image data of the training tissue sample and the suspicious positive cell detection model;

arranging the feature vectors of the plurality of suspicious positive cells in descending order of their corresponding score values, taking the first preset number of feature vectors, and determining them as a plurality of sample feature vectors;

performing Monte Carlo sampling multiple times on the plurality of sample feature vectors to obtain a sample feature vector set, wherein the sample feature vector set comprises a second preset number of sample feature vectors;

inputting each sample feature vector in the sample feature vector set into a sample classification model to be trained to obtain actual output data;

and training the sample classification model to be trained based on the actual output data and the reference output data to obtain the trained sample classification model.
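The data preparation steps of the training module can be sketched as follows. This is a sketch under stated assumptions, not the training procedure of the application: the helper names `one_hot`, `top_k_by_score` and `sample_feature_set` are hypothetical, and Monte Carlo sampling is assumed here to mean uniform random sampling without replacement, which the text does not specify. The model and its loss function are left out.

```python
import random


def one_hot(sample_type, ordered_types):
    """Reference output: 1 for the training sample's type, 0 for the others."""
    return [1.0 if t == sample_type else 0.0 for t in ordered_types]


def top_k_by_score(features, scores, k):
    """Keep the k feature vectors with the largest score values."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return [features[i] for i in order[:k]]


def sample_feature_set(sample_features, m, rng):
    """Draw m sample feature vectors at random (assumed: without replacement)."""
    return rng.sample(sample_features, m)
```

Here `k` stands for the first preset number and `m` for the second preset number. Calling `sample_feature_set` repeatedly with the same `random.Random` instance yields a different sample feature vector set for each training step.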

It should be noted that the division into functional modules described for the tissue sample classification device in the above embodiment is only an example. In practical applications, the functions may be allocated to different functional modules as needed; that is, the internal structure of the device may be divided into different functional modules to perform all or part of the functions described above. In addition, the tissue sample classification device and the tissue sample classification method provided in the above embodiments belong to the same concept; the specific implementation process is described in the method embodiments and is not repeated here.

Fig. 10 is a schematic structural diagram of a server according to an embodiment of the present application. The server 1000 may vary considerably depending on its configuration or performance, and may include one or more processors (central processing units, CPUs) 1001 and one or more memories 1002, where the memory 1002 stores at least one instruction that is loaded and executed by the processor 1001 to implement the methods provided by the foregoing method embodiments. The server may also have components such as a wired or wireless network interface, a keyboard, and an input/output interface for input and output, and may include other components for implementing the functions of the device, which are not described here.

In an exemplary embodiment, a computer-readable storage medium, such as a memory, is also provided that includes instructions executable by a processor in a terminal to perform the method of classifying a tissue sample in the above-described embodiments. The computer readable storage medium may be non-transitory. For example, the computer-readable storage medium may be a ROM (read-only memory), a RAM (random access memory), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, and the like.

In an exemplary embodiment, a computer program product or computer program is also provided, the computer program product or computer program comprising computer instructions stored in a computer-readable storage medium. A processor of a computer device reads the computer instructions from the computer-readable storage medium and executes them, causing the computer device to perform the method of classifying a tissue sample in the above embodiments.

Those skilled in the art will understand that all or part of the steps of the above embodiments may be implemented by hardware, or by a program instructing the relevant hardware. The program may be stored in a computer-readable storage medium, and the storage medium may be a read-only memory, a magnetic disk, an optical disk, or the like.

The above description covers only exemplary embodiments of the present application and is not intended to limit it. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present application shall fall within its protection scope.
