Method for establishing traditional Chinese medicine pulse condition recognition model based on time sequence convolution network

Document No.: 1896157. Publication date: 2021-11-30.

Reading note: This technology, "Method for establishing traditional Chinese medicine pulse condition recognition model based on time sequence convolution network" (一种基于时序卷积网络的中医脉象识别模型的建立方法), was created by Guo Rui (郭睿), Yan Jianjun (颜建军), Yan Haixia (燕海霞), Wang Yiqin (王忆勤) and Zhu Guangyao (朱光耀) on 2021-09-08. Abstract: The invention discloses a method for establishing a traditional Chinese medicine pulse condition recognition model based on a time sequence convolution network. Pressure pulse wave signals and volume pulse wave signals at the radial artery of the wrist are collected by a detection bracelet, and volume pulse wave signals at the fingertip are collected by a detection finger clip, giving the original pulse wave signals, from which single-cycle waveforms are obtained. The lengths of the single-cycle waveforms are normalized to obtain a data set of pulse condition time series of equal length. The data set is then divided into a training set, a validation set and a test set; time sequence convolution network computation is carried out on the training set and the validation set, together with validation and hyper-parameter tuning, finally yielding the pulse condition recognition model. The test set is then fed into the pulse condition recognition model for recognition verification to obtain the predicted pulse condition diagnosis results. The proposed pulse condition recognition method has relatively high accuracy and can provide a reference for diagnosing the health state of the human cardiovascular system.

1. A method for establishing a traditional Chinese medicine pulse condition recognition model based on a time sequence convolution network is characterized by comprising the following steps:

step 1: acquiring original pulse wave signals, namely acquiring pressure pulse wave signals and volume pulse wave signals at the radial artery of the wrist through a detection bracelet, acquiring volume pulse wave signals at the finger end through a detection finger clip to obtain the original pulse wave signals, and respectively acquiring single-period waveforms of the original pulse wave signals;

step 2: normalizing the length of the monocycle waveform to obtain a data set of pulse condition time series with the same length;

step 3: dividing the data set into a training set, a validation set and a test set, carrying out time sequence convolution network computation on the training set and the validation set, and carrying out validation and hyper-parameter tuning to finally obtain a pulse condition recognition model;

step 4: importing the test set into the pulse condition recognition model for recognition verification to obtain a prediction result of pulse condition diagnosis and recognition.

2. The method for building a pulse recognition model of traditional Chinese medicine based on time series convolution network as claimed in claim 1, wherein said normalizing the length of monocycle waveform in step 2 is,

step 1: setting a uniform pulse sequence length that can accommodate at least one single-cycle pulse wave;

step 2: appending, at the end of the sequence, single-cycle pulse waves segmented from the same sample until the set sequence length is reached.

3. The method for establishing a traditional Chinese medicine pulse condition recognition model based on a time sequence convolution network, characterized in that the time sequence convolution network computation on the training set and the validation set is carried out by connecting a plurality of residual block structures in series to form the time sequence convolution network, each residual block structure having two dilated causal convolution layers with the same parameters; after the input one-dimensional pulse condition time series is convolved by a dilated causal convolution layer, weight normalization is performed, a rectified linear unit (ReLU) is then used as the activation function, and finally regularization is applied, this sequence of steps being carried out twice; in the residual (identity-mapping) connection of the residual block, a 1 × 1 convolution is used so that the dimensions of the input tensor and the output tensor are consistent.

4. The method for establishing a traditional Chinese medicine pulse condition recognition model based on a time sequence convolution network as claimed in claim 3, wherein the dilated causal convolution is as follows:

if the input pulse condition time series is defined as layer 0 and $RF_{(m,n)}$ denotes the size of the receptive field that a node in layer $m$ has on layer $n$, then, with a stride of 1, the receptive field of each layer of the dilated convolution is derived as:

$$RF_{(i,0)} = RF_{(i-1,0)} + (k_i - 1)\,d_i, \qquad RF_{(0,0)} = 1$$

where $k_i$ and $d_i$ are respectively the convolution kernel size and the dilation rate of the dilated convolution in layer $i$; if the kernel size is the same value $k$ in every layer, the receptive field on layer 0 of a node in convolution layer $i$ simplifies to:

$$RF_{(i,0)} = 1 + (k - 1)\sum_{j=1}^{i} d_j$$

Technical Field

The invention relates to the technical field of pulse condition identification, in particular to a method for establishing a traditional Chinese medicine pulse condition identification model based on a time sequence convolution network.

Background

The pulse condition signals of the human body are directly related to the beating of the heart, the patency of the vessels and the excess or deficiency of qi and blood, and they are time-varying and nonlinear. Different types of pulse condition signals can show clearly distinct morphological differences.

In research on the analysis and recognition of pulse condition signals, features are generally extracted from the pulse wave first, and a pulse condition signal classification model is then established. Extracting physiologically meaningful time-domain features with the feature-point method is intuitive and widely applicable, but it usually reflects only part of the pulse wave information. Frequency-domain analysis, as used in the literature, obtains the frequency-domain characteristics of different pulse conditions from a statistical point of view, but lacks information on morphological change along the time dimension. Time-frequency analysis combines time and spectral information, but tends to describe only the local state of the pulse condition. Moreover, after feature extraction, all of these analysis methods still need an algorithm such as machine learning to learn from the feature data set and build the classification model, yet the extracted features can hardly reflect the full morphological change of the pulse condition signal in the time domain. Part of the detail information may therefore be lost, which reduces the recognition accuracy of the model.

Therefore, an improvement is made, and a method for establishing a traditional Chinese medicine pulse condition recognition model based on a time sequence convolution network is proposed.

Disclosure of Invention

In order to solve the technical problems, the invention provides the following technical scheme:

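The invention relates to a method for establishing a traditional Chinese medicine pulse condition recognition model based on a time sequence convolution network, which comprises the following steps:

step 1: acquiring original pulse wave signals, namely acquiring pressure pulse wave signals and volume pulse wave signals at the radial artery of the wrist through a detection bracelet, acquiring volume pulse wave signals at the finger end through a detection finger clip to obtain the original pulse wave signals, and respectively acquiring single-cycle waveforms of the original pulse wave signals;

step 2: normalizing the length of the single-cycle waveform to obtain a data set of pulse condition time series with the same length;

step 3: dividing the data set into a training set, a validation set and a test set, carrying out time sequence convolution network computation on the training set and the validation set, and carrying out validation and hyper-parameter tuning to finally obtain a pulse condition recognition model;

step 4: importing the test set into the pulse condition recognition model for recognition verification to obtain a prediction result of pulse condition diagnosis and recognition.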

In a preferred embodiment of the present invention, the length of the single-cycle waveform in step 2 is normalized as follows:

step 1: setting a uniform pulse sequence length that can accommodate at least one single-cycle pulse wave;

step 2: appending, at the end of the sequence, single-cycle pulse waves segmented from the same sample until the set sequence length is reached.

As a preferred technical scheme of the invention, the time sequence convolution network computation on the training set and the validation set is carried out by connecting a plurality of residual block structures in series to form the time sequence convolution network. Each residual block structure has two dilated causal convolution layers with the same parameters; after the input one-dimensional pulse condition time series is convolved by a dilated causal convolution layer, weight normalization is performed, a rectified linear unit (ReLU) is then used as the activation function, and finally regularization is applied, this sequence of steps being carried out twice. In the residual (identity-mapping) connection of the residual block, a 1 × 1 convolution is used so that the dimensions of the input tensor and the output tensor are consistent.

In a preferred embodiment of the present invention, the dilated causal convolution is as follows:

if the input pulse condition time series is defined as layer 0 and $RF_{(m,n)}$ denotes the size of the receptive field that a node in layer $m$ has on layer $n$, then, with a stride of 1, the receptive field of each layer of the dilated convolution is derived as:

$$RF_{(i,0)} = RF_{(i-1,0)} + (k_i - 1)\,d_i, \qquad RF_{(0,0)} = 1$$

where $k_i$ and $d_i$ are respectively the convolution kernel size and the dilation rate of the dilated convolution in layer $i$; if the kernel size is the same value $k$ in every layer, the receptive field on layer 0 of a node in convolution layer $i$ simplifies to:

$$RF_{(i,0)} = 1 + (k - 1)\sum_{j=1}^{i} d_j$$

The beneficial effects of the invention are as follows. In the method for establishing a traditional Chinese medicine pulse condition recognition model based on a time sequence convolution network, original pulse wave signals are acquired: pressure pulse wave signals and volume pulse wave signals at the radial artery of the wrist are collected by a detection bracelet, and volume pulse wave signals at the finger end are collected by a detection finger clip, and single-cycle waveforms of the original pulse wave signals are obtained. The length of the single-cycle waveform is normalized to obtain a data set of pulse condition time series with the same length. The data set is then divided into a training set, a validation set and a test set; time sequence convolution network computation is carried out on the training set and the validation set, together with validation and hyper-parameter tuning, to finally obtain the pulse condition recognition model. The test set is then imported into the pulse condition recognition model for recognition verification to obtain a prediction result of pulse condition diagnosis and recognition. The method can extract features with significant differences from the pulse condition time series and better retains the morphological information in the pulse condition signal. During TCN feature self-learning, dilated causal convolution lets the network enlarge its receptive field on the pulse condition time series within a limited number of layers and better capture the detail information and morphological change characteristics of the pulse condition signal, so that the pulse condition recognition model achieves good classification performance. The pulse condition recognition method provided by the invention has relatively high accuracy and can provide a reference for diagnosing the health state of the human cardiovascular system.

Drawings

The accompanying drawings, which are included to provide a further understanding of the invention and constitute a part of this specification, illustrate embodiments of the invention and, together with the description, serve to explain the principles of the invention without limiting it. In the drawings:

FIG. 1 is a schematic flow chart of the method for establishing a Chinese medicine pulse condition recognition model based on a time sequence convolution network according to the invention;

FIG. 2 is a schematic structural diagram of a residual block structure according to the present invention;

FIG. 3 is a schematic diagram of the structure of the dilated causal convolution of the present invention;

FIG. 4 is a schematic structural diagram of a TCN-based pulse recognition model network structure;

FIG. 5 shows some of the characteristic curves extracted by the front-stage TCN layer;

FIG. 6 is a comparison of the variation curves of the 64 feature values extracted by the later-stage TCN layer;

FIG. 7 shows the number of features that are significantly different between seven groups of pulse time series.

Detailed Description

The preferred embodiments of the present invention will be described in conjunction with the accompanying drawings, and it will be understood that they are described herein for the purpose of illustration and explanation and not limitation.

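Example: as shown in FIG. 1, the invention relates to a method for establishing a traditional Chinese medicine pulse condition recognition model based on a time sequence convolution network, which comprises the following steps:

step 1: acquiring original pulse wave signals, namely acquiring pressure pulse wave signals and volume pulse wave signals at the radial artery of the wrist through a detection bracelet, acquiring volume pulse wave signals at the finger end through a detection finger clip to obtain the original pulse wave signals, and respectively acquiring single-cycle waveforms of the original pulse wave signals;

step 2: normalizing the length of the single-cycle waveform to obtain a data set of pulse condition time series with the same length;

step 3: dividing the data set into a training set, a validation set and a test set, carrying out time sequence convolution network computation on the training set and the validation set, and carrying out validation and hyper-parameter tuning to finally obtain a pulse condition recognition model;

step 4: importing the test set into the pulse condition recognition model for recognition verification to obtain a prediction result of pulse condition diagnosis and recognition.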

The invention can extract features with significant differences from the pulse condition time series and better retains the morphological information in the pulse condition signal. During TCN feature self-learning, dilated causal convolution lets the network enlarge its receptive field on the pulse condition time series within a limited number of layers and better capture the detail information and morphological change characteristics of the pulse condition signal, so that the pulse condition recognition model achieves good classification performance. The pulse condition recognition method provided by the invention has relatively high accuracy and can provide a reference for diagnosing the health state of the human cardiovascular system.

Because the single-cycle waveform lengths of different individuals are not the same, and the TCN requires input time series of equal length, the time series must be normalized in length. Commonly used approaches are zero-padding at the tail and multi-rate resampling. The former leaves a large amount of meaningless information in sequences with short pulse periods, while the latter may lose important detail changes. Therefore, to preserve the original shape and the information integrity of the pulse condition time series, the invention adopts the following normalization procedure.

The length of the single-cycle waveform in step 2 is normalized as follows:

step 1: setting a uniform pulse sequence length that can accommodate at least one single-cycle pulse wave;

step 2: appending, at the end of the sequence, single-cycle pulse waves segmented from the same sample until the set sequence length is reached.
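The two normalization steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `normalize_length` is hypothetical, the default length of 800 is taken from the input dimension quoted later in this description, and the choice to cycle through the available single-cycle waveforms of one sample in order (wrapping around if needed, truncating the last one) is an assumed reading of "supplementing at the end of the sequence".

```python
from itertools import cycle
from typing import List, Sequence


def normalize_length(cycles: Sequence[Sequence[float]],
                     target_len: int = 800) -> List[float]:
    """Build one fixed-length sequence from the single-cycle waveforms of ONE sample.

    The first cycle is kept whole; further cycles cut from the same sample are
    appended at the end until target_len points are available, and the result
    is truncated to exactly target_len points.
    """
    if not cycles or len(cycles[0]) == 0:
        raise ValueError("need at least one non-empty single-cycle waveform")
    if len(cycles[0]) > target_len:
        raise ValueError("target_len must accommodate at least one full cycle")

    out: List[float] = []
    for waveform in cycle(cycles):      # wrap around if the sample has few cycles
        if len(out) >= target_len:
            break
        out.extend(waveform)
    return out[:target_len]


# Toy sample with two short "cycles" and a target length of 7 points.
padded = normalize_length([[1.0, 2.0, 3.0], [4.0, 5.0]], target_len=7)
print(padded)
```

Unlike zero-padding, every point in the padded sequence is real waveform data from the same subject, which is the motivation given in the text.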

The time sequence convolution network computation on the training set and the validation set is carried out by connecting a plurality of residual block structures in series to form the time sequence convolution network. Each residual block structure has two dilated causal convolution layers with the same parameters; after the input one-dimensional pulse condition time series is convolved by a dilated causal convolution layer, weight normalization is performed, a rectified linear unit (ReLU) is then used as the activation function, and finally regularization is applied, this sequence of steps being carried out twice. In the residual (identity-mapping) connection of the residual block, a 1 × 1 convolution is used so that the dimensions of the input tensor and the output tensor are consistent.

The residual block structure is shown in FIG. 2. The TCN is a network designed for time-series problems. For a long pulse condition time series, a conventional CNN, limited by the size of its convolution kernel, cannot capture the dependencies in the sequence well within a limited number of layers. The TCN adopts a different network structure that overcomes this shortcoming of the CNN in processing time series.

To capture more morphological change information when a pulse condition time series is convolved, the receptive field (RF) of a neural node must be enlarged. In a standard convolution network, the receptive field is usually enlarged by increasing the network depth, increasing the kernel size, pooling, increasing the stride, and so on, but these measures can greatly increase the amount of computation and lose important feature information in the pulse condition time series. Dilated convolution, by inserting holes into a standard convolution kernel, achieves exponential growth of the receptive field while keeping the pulse condition feature information. On the basis of causal convolution, holes are added to the convolution kernel according to the dilation rate (d) to expand the kernel. As can be seen in FIG. 3, when d = 1 the convolution kernel is identical to a standard 2 × 1 kernel and the receptive field in the one-dimensional convolution is 2; when d = 2 the node's receptive field on the pulse condition time series expands to 4; and when d = 4 the receptive field of the node on the sequence doubles again.
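The doubling of the receptive field described for FIG. 3 can be checked with a small pure-Python sketch of a one-dimensional dilated causal convolution. The function name `dilated_causal_conv1d`, the all-ones kernel of size 2 and the impulse input are illustrative assumptions, not part of the patent; the sketch feeds a unit impulse through three stacked layers with d = 1, 2, 4 and counts how many output positions the impulse reaches, which equals the receptive field.

```python
from typing import List, Sequence


def dilated_causal_conv1d(x: Sequence[float],
                          kernel: Sequence[float],
                          dilation: int = 1) -> List[float]:
    """Causal 1-D convolution with holes: y[t] depends only on x[t], x[t-d], x[t-2d], ...

    Positions before the start of the sequence are treated as zeros
    (implicit left padding), so the output has the same length as the input.
    """
    k = len(kernel)
    y: List[float] = []
    for t in range(len(x)):
        acc = 0.0
        for j, w in enumerate(kernel):
            idx = t - (k - 1 - j) * dilation   # tap j looks (k-1-j)*d steps back
            if idx >= 0:
                acc += w * x[idx]
        y.append(acc)
    return y


# Unit impulse at t = 0; the width of its response after each layer
# equals the receptive field of a node in that layer.
h = [1.0] + [0.0] * 11
spans = []
for d in (1, 2, 4):
    h = dilated_causal_conv1d(h, [1.0, 1.0], dilation=d)
    spans.append(sum(1 for v in h if v != 0.0))
print(spans)
```

Because each tap only looks backwards in time, changing a later input sample never alters an earlier output, which is the causal property the text relies on.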

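The dilated causal convolution is as follows: if the input pulse condition time series is defined as layer 0 and $RF_{(m,n)}$ denotes the size of the receptive field that a node in layer $m$ has on layer $n$, then, with a stride of 1, the receptive field of each layer of the dilated convolution is derived as:

$$RF_{(i,0)} = RF_{(i-1,0)} + (k_i - 1)\,d_i, \qquad RF_{(0,0)} = 1$$

where $k_i$ and $d_i$ are respectively the convolution kernel size and the dilation rate of the dilated convolution in layer $i$; if the kernel size is the same value $k$ in every layer, the receptive field on layer 0 of a node in convolution layer $i$ simplifies to:

$$RF_{(i,0)} = 1 + (k - 1)\sum_{j=1}^{i} d_j$$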

The invention establishes a TCN-based pulse condition signal classification model; the model network is shown in FIG. 4. In the network structure, two stages of TCNs are stacked, which makes the receptive field larger and the network more stable. The input pulse condition time series has dimension 800 × 1. Based on experiments, the convolution layers use 5 × 1 filters with a stride of 1, and the numbers of filters in the front and later stages are set to 32 and 64 respectively. The dilation rates in the TCN residual blocks are set to [1, 2, 4, 8, 16, 32, 64] and are repeated across the two TCN stages in a sawtooth pattern. The front stage has a receptive field of 509 on the pulse condition time series, whereas with two TCN stages the receptive field increases to 1017. The stacked TCN can thus achieve a large receptive field with only a few network layers, greatly reduces the computational cost compared with a conventional CNN and similar networks, and captures feature information well for long pulse condition time series.
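The two receptive-field figures quoted above can be reproduced with a short calculation. The sketch below assumes the usual stride-1 relation for stacked dilated convolutions with a constant kernel size, RF = 1 + (k - 1) · Σd, and counts one convolution per dilation rate; both are assumptions made here because they reproduce the stated values, and the function name `receptive_field` is hypothetical.

```python
from typing import Iterable


def receptive_field(kernel_size: int, dilations: Iterable[int]) -> int:
    """Receptive field on the input of stacked stride-1 dilated convolutions.

    Each layer with dilation d widens the field by (kernel_size - 1) * d,
    starting from a field of 1 (a single input point).
    """
    return 1 + (kernel_size - 1) * sum(dilations)


DILATIONS = [1, 2, 4, 8, 16, 32, 64]   # dilation rates given in the text
KERNEL_SIZE = 5                         # 5 x 1 filters given in the text

front_stage = receptive_field(KERNEL_SIZE, DILATIONS)        # one TCN stage
two_stages = receptive_field(KERNEL_SIZE, DILATIONS * 2)     # rates repeated in stage 2
print(front_stage, two_stages)   # 509 1017
```

With two stages the receptive field (1017) exceeds the 800-point input length, so a node in the last layer can see the whole pulse condition time series.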

Building a deep learning network model requires a large amount of sample data, but because of time constraints, the social environment and other reasons, the number of pulse condition samples that can be collected clinically is limited. The invention therefore uses the pulse condition data provided by the four-diagnosis information comprehensive research laboratory of Shanghai medical university. The sampling frequency of the data is 720 Hz, and there are 1812 samples in seven pulse condition classes: 221 slippery pulses, 96 flat (normal) pulses, 92 thready pulses, 657 chordal (wiry) pulses, 202 thready-slippery pulses, 325 thready-chordal pulses and 219 chordal-slippery pulses. Before the time sequence convolution network is trained, the pressure pulse wave signals are amplitude-normalized, so that the classification performance of the model on pulse condition data acquired by the pulse-taking bracelet is not affected by amplitude differences arising from the different signal amplification factors of different acquisition devices.

To prevent the classification model from overfitting because of the imbalanced class sizes, which would reduce classification performance, the invention uses a minority-class oversampling technique to balance the seven classes of pulse condition samples, giving 4355 samples in total. The proportions of the seven classes are 14.68%, 14.83%, 13.95%, 14.39%, 13.51%, 14.10% and 14.54% respectively. In the experiment, the ratio of the training set, the validation set and the test set is set to 6:2:2.
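A 6:2:2 partition of a labelled data set can be sketched as follows. The text does not say how samples were assigned to the three sets, so the per-class (stratified) shuffling, the fixed random seed and the function name `stratified_split` are all assumptions made for illustration; the function returns index lists rather than copying the data.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def stratified_split(labels: Sequence[int],
                     ratios: Tuple[float, float, float] = (0.6, 0.2, 0.2),
                     seed: int = 0) -> Tuple[List[int], List[int], List[int]]:
    """Split sample indices into train/validation/test, class by class."""
    rng = random.Random(seed)
    by_class: Dict[int, List[int]] = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train: List[int] = []
    val: List[int] = []
    test: List[int] = []
    for label in sorted(by_class):
        idxs = by_class[label]
        rng.shuffle(idxs)
        n_train = round(len(idxs) * ratios[0])
        n_val = round(len(idxs) * ratios[1])
        train += idxs[:n_train]
        val += idxs[n_train:n_train + n_val]
        test += idxs[n_train + n_val:]       # remainder goes to the test set
    return train, val, test


# Toy data: seven classes with ten samples each.
toy_labels = [c for c in range(7) for _ in range(10)]
train_idx, val_idx, test_idx = stratified_split(toy_labels)
print(len(train_idx), len(val_idx), len(test_idx))   # 42 14 14
```

Splitting per class keeps the class proportions of the balanced data set roughly equal in all three subsets, which matters when per-class precision and recall are reported.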

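The invention uses accuracy, precision and recall, which are commonly used in classification model research, to evaluate the overall performance of the model. Accuracy, Precision and Recall are defined as follows, where for a given class TP, FP, TN and FN denote the numbers of true positives, false positives, true negatives and false negatives:

$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}, \qquad \mathrm{Precision} = \frac{TP}{TP + FP}, \qquad \mathrm{Recall} = \frac{TP}{TP + FN}$$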

Among the three evaluation indexes, accuracy represents the overall prediction accuracy, precision represents how accurate the predictions assigned to a given class are, and recall represents how accurately the samples of a given class are predicted. In addition, a confusion matrix is used to show the distribution of the prediction results for the various pulse condition samples.
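The three indexes can be read directly off a confusion matrix, as in the minimal sketch below. The convention that `cm[i][j]` counts samples of true class `i` predicted as class `j`, the function names and the 2 × 2 example matrix are illustrative assumptions; the numbers are not results from the patent.

```python
from typing import List

Matrix = List[List[int]]   # cm[i][j]: samples of true class i predicted as class j


def accuracy(cm: Matrix) -> float:
    """Fraction of all samples that lie on the diagonal (correct predictions)."""
    total = sum(sum(row) for row in cm)
    return sum(cm[i][i] for i in range(len(cm))) / total


def precision(cm: Matrix, c: int) -> float:
    """Of the samples predicted as class c (column c), the fraction that are truly c."""
    predicted = sum(row[c] for row in cm)
    return cm[c][c] / predicted if predicted else 0.0


def recall(cm: Matrix, c: int) -> float:
    """Of the samples that are truly class c (row c), the fraction predicted as c."""
    actual = sum(cm[c])
    return cm[c][c] / actual if actual else 0.0


# Made-up two-class example: 10 samples per class.
example_cm = [[8, 2],
              [1, 9]]
print(accuracy(example_cm), precision(example_cm, 0), recall(example_cm, 0))
```

For the seven-class problem the same functions apply unchanged to a 7 × 7 matrix, with precision and recall evaluated once per pulse condition class.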

The TCN model established by the invention extracts features from the time series through convolution. FIG. 5 shows the 32 groups of characteristic curves (only some of the curves are shown in the figure) extracted after the seven classes of pulse condition time series are convolved by the 32 filters of the front-stage TCN layer. The figure shows that the extracted characteristic curves correlate strongly with the morphological changes of the pulse condition time series. The first group basically keeps the main trend of the pulse waveform. The second group of characteristic curves varies only slightly and mainly reflects small changes on the waveform, such as the central depression and the dicrotic wave. The third group is very sensitive to the jumping point of the pulse wave, and an obvious peak appears after the beginning of each pulse period. The fourth group shows different degrees of waveform change for different pulse condition time series: for example, the chordal pulse and its concurrent pulses change sharply, while the thready pulse and its concurrent pulses change less, so this group of features can be considered to retain deeper information in the pulse condition time series. In the front-stage TCN layer established by the invention, the 32 groups of characteristic curves extracted from the same pulse condition time series therefore differ from one another and each has its own character. It can thus be concluded that, during feature self-learning, the TCN retains the morphological characteristics of the pulse condition time series from different angles and that the features complement one another.

In the later-stage TCN layer, the 32 groups of characteristic curve sequences are convolved by 64 filters to output 64 features. As shown in FIG. 6, three groups of feature extraction results are randomly selected from each class of pulse condition time series for comparison.

Within the 64 features of pulse condition time series of the same class, the feature value curves contain clearly similar intervals (marked with black frames in the figure). These similar intervals reflect common points among pulse condition time series of the same class and indicate that they carry similar or identical physiological and pathological information. Partially similar intervals (marked with red frames in the figure) also exist between a concurrent pulse and its component pulse. For example, between the 25th and 45th feature points of the thready pulse and the thready-slippery pulse, the curves show 2 to 3 main peaks, and the positions and amplitudes of the peak and trough feature points are basically the same, which indicates that the thready-slippery pulse time series also carries morphological information of the thready pulse time series. In short, the 64 features show similar value changes for pulse condition samples of the same class or of the same nature, and these commonalities provide an important basis for subsequent pulse condition classification and recognition.

To further study how the pulse condition time series features extracted by the deep learning network TCN differ among the seven groups of pulse conditions, a statistical analysis of the 64 features of all test set samples was performed with SPSS 24.0. The 64 features of the seven groups of samples were first tested for normality and homogeneity of variance, and the results did not meet the requirements for parametric tests. The Kruskal-Wallis H test, a non-parametric test, was therefore chosen for the statistical analysis of the seven independent groups of pulse condition samples, with a significance level of 0.05. The results show that the seven groups of samples differ significantly (P < 0.05) on all 64 extracted features; that is, for each feature, at least two of the seven groups of pulse condition time series differ significantly on that feature. Post-hoc tests were then performed on the pairwise differences between the groups, with the significance level adjusted to 0.00238 by Bonferroni correction (0.05 divided by the 21 pairwise comparisons among seven groups). FIG. 7 shows the number of features with statistically significant differences (P < 0.00238) in each pairwise comparison among the seven groups of pulse condition time series.
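The adjusted significance level quoted above follows from dividing the overall level by the number of pairwise comparisons. The short sketch below makes that arithmetic explicit; the function name `bonferroni_alpha` is hypothetical, and the assumption is that every pair of the seven groups was compared once in the post-hoc tests.

```python
from math import comb


def bonferroni_alpha(alpha: float, n_groups: int) -> float:
    """Per-comparison significance level when all pairs of groups are compared."""
    n_comparisons = comb(n_groups, 2)    # number of unordered pairs of groups
    return alpha / n_comparisons


n_pairs = comb(7, 2)                       # seven pulse condition groups
adjusted = bonferroni_alpha(0.05, 7)
print(n_pairs, round(adjusted, 5))         # 21 0.00238
```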

Between any two groups of pulse conditions, at least more than half of the 64 features differ significantly. The number of features with significant differences is largest between the chordal and flat pulses, at 58, and smallest between the thready and slippery pulses, at 38. In addition, table 5.2 also shows that the number of features with significant differences between single pulses (slippery, flat, thready, chordal) is relatively large, with a minimum of 45 between the thready and flat pulses; the number between concurrent pulses (thready-slippery, thready-chordal, chordal-slippery) is relatively small, with a maximum of 42 between the thready and slippery pulse; and the number between a concurrent pulse and its corresponding single pulse (such as the thready and slippery pulses) is also relatively small, with a maximum of 48 and a minimum of 38, both between the thready and slippery pulses. Therefore, the 64 features extracted from the pulse condition time series by TCN self-learning retain the key information on pulse condition morphological change well, so that many features differ markedly between single pulses. Because a concurrent pulse combines the characteristics of several pulse conditions, the features extracted from it also retain part of the information of its component pulses, so that the number of features with significant differences between a concurrent pulse and its component single pulse is slightly smaller than that between single pulses.

The qualitative and quantitative analysis of the pulse condition time series features extracted by the TCN shows that features extracted directly from the pulse condition time series by time sequence convolution reflect the time-domain morphological changes of different pulse conditions well, that detail information and deeper morphological information can be adequately mined and retained, and that a large number of pulse condition features with significant differences between different pulse condition types are obtained.

Finally, it should be noted that: although the present invention has been described in detail with reference to the foregoing embodiments, it will be apparent to those skilled in the art that changes may be made in the embodiments and/or equivalents thereof without departing from the spirit and scope of the invention. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.
