Topic segmentation method and device for subtitle dialog flow

Document No.: 1556936    Publication date: 2020-01-21

Reading note: this technique, "Topic segmentation method and device for subtitle dialog flow" (字幕对话流的主题分割方法及装置), was designed and created by 周强 (Zhou Qiang) and 张镭镧 (Zhang Leilan) on 2019-09-24. Its main content is as follows: The invention provides a method and a device for topic segmentation of a subtitle dialog stream, wherein the method comprises the following steps: extracting semantic features of each sentence in a sentence sequence of the subtitle dialog stream based on BERT, and representing each sentence as a vector according to the semantic features of each sentence; inputting the vectors of all the sentences into a TCN, and outputting a tag sequence corresponding to the sentence sequence, wherein sentences in the sentence sequence correspond one to one to tags in the tag sequence; and performing topic segmentation on the subtitle dialog stream according to the tag sequence. The invention automatically completes topic segmentation of the subtitle dialog stream and effectively improves segmentation efficiency and accuracy.

1. A method for topic segmentation of a subtitle dialog stream, comprising:

extracting semantic features of each sentence in a sentence sequence of the subtitle dialog stream based on BERT, and representing each sentence as a vector according to the semantic features of each sentence;

inputting the vectors of all the sentences into the TCN, and outputting tag sequences corresponding to the sentence sequences; wherein sentences in the sentence sequence correspond to tags in the tag sequence one by one;

and performing topic segmentation on the subtitle dialog stream according to the tag sequence.

2. The method for topic segmentation of a subtitle dialog stream according to claim 1, wherein the step of extracting semantic features of each sentence in the sentence sequence of the subtitle dialog stream based on BERT specifically comprises:

when the subtitle dialog flow is Chinese, dividing each sentence into N characters;

processing each character based on the BERT to obtain a feature vector of size H corresponding to each character;

each sentence is represented as a matrix of N × H.

3. The method for topic segmentation of a subtitle dialog stream according to claim 1, wherein the step of extracting semantic features of each sentence in the sentence sequence of the subtitle dialog stream based on BERT specifically comprises:

adding self attention to the output of the previous layer of each layer of the BERT, and outputting a matrix of N x H; wherein N is the length of the sentence input into the BERT, and H is the size of the hidden layer in the BERT;

and taking the matrix output by the second last layer of the BERT as the semantic feature of each sentence.

4. The method of claim 2, wherein the step of representing each sentence as a vector according to semantic features of each sentence specifically comprises:

average-pooling the semantic features of each sentence into a feature vector of size H.

5. The method for topic segmentation of subtitle dialog streams of any one of claims 1-4, wherein the sequence of tags includes tag 0 and tag 1;

wherein the tag 1 represents a theme transition flag of the subtitle dialog stream.

6. The method for topic segmentation of a subtitle dialog stream according to any one of claims 1-4, wherein the step of inputting the vectors of all the sentences into the TCN and outputting the tag sequence corresponding to the sentence sequence further comprises:

training the TCN using the Focal Loss function.

7. An apparatus for topic segmentation of a subtitle dialog stream, comprising:

a sentence representation module for extracting semantic features of each sentence in a sentence sequence of the subtitle dialog stream based on BERT and representing each sentence as a vector according to the semantic features of each sentence;

the theme detection module is used for inputting the vectors of all the sentences into the TCN and outputting the tag sequences corresponding to the sentence sequences; wherein sentences in the sentence sequence correspond to tags in the tag sequence one by one;

and the topic segmentation module is used for performing topic segmentation on the subtitle dialog stream according to the tag sequence.

8. The apparatus for topic segmentation of subtitle dialog streams of claim 7, wherein the sentence representation module is specifically configured to:

when the subtitle dialog flow is Chinese, dividing each sentence into N characters;

processing each character based on the BERT to obtain a feature vector of size H corresponding to each character;

each sentence is represented as a matrix of N × H.

9. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor when executing the program performs the steps of the method for topic segmentation of a subtitle dialog stream according to any one of claims 1 to 6.

10. A non-transitory computer readable storage medium having stored thereon a computer program, which when executed by a processor, performs the steps of the method for topic segmentation of a subtitle dialog stream according to any one of claims 1 to 6.

Technical Field

The invention belongs to the technical field of topic segmentation of text streams, and particularly relates to a topic segmentation method and device of a subtitle dialog stream.

Background

Topic segmentation technology divides a text into several segments by topic: within each segment the topic remains continuous, and a topic shift occurs across each segmentation point.

This process is illustrated in Fig. 1. The left side of Fig. 1 shows the original dialog stream before segmentation: the dialog contains multiple topics, the conversation may shift from one topic to a new one as it proceeds, and the boundaries between the dialogs covered by different topics are not explicitly marked. The right side of Fig. 1 shows the segmented dialog segments: each small square represents one dialog segment, the utterances within a segment refer to the same topic, different segments usually have different topics (shown with different fills), and the boundaries between segments are explicitly marked.

Topic segmentation can support tasks such as text summarization, information extraction, dialog analysis, and question answering. For example, a long text stream such as a meeting record or a subtitle file typically contains segments on different topics. Without topic segmentation, a keyword search can find the sentence containing a sought-after detail, but locating the beginning of the topic segment it belongs to is difficult. Once the text stream is segmented by topic and organized accordingly, summarizing and retrieving topic segments becomes much easier.

The huge number of movie and television subtitles on the internet provides rich data resources for natural language processing. A movie subtitle file is a typical text stream: it records the utterances of each character in chronological order, but no utterance is annotated with the speaker's identity, no transition marks between scenes are given, and dialog segments with different topics are usually joined together. The transition points between topic segments therefore typically need to be determined by manually added transition markers.

Existing traditional methods generally use statistical characteristics of sentences as the basis for judging topic shifts, and are usually applied to monologue texts such as news, encyclopedias, and textbooks, or to texts such as meeting records. Such texts are formal, with long sentences, many topic-bearing terms, and high topic cohesion. In contrast, dialog-stream text such as movie subtitles is generally short, colloquial, and low in topic cohesion, so applying the traditional methods to it rarely achieves good results.

Disclosure of Invention

In order to overcome the problem that the existing topic segmentation method is poor in segmentation effect when applied to a subtitle dialog stream or at least partially solve the problem, embodiments of the present invention provide a topic segmentation method and apparatus for a subtitle dialog stream.

According to a first aspect of the embodiments of the present invention, there is provided a method for topic segmentation of a subtitle dialog stream, including:

extracting semantic features of each sentence in a sentence sequence of the subtitle dialog stream based on BERT, and representing each sentence as a vector according to the semantic features of each sentence;

inputting vectors of all sentences into a TCN (Temporal Convolutional Network), and outputting a tag sequence corresponding to the sentence sequence; wherein sentences in the sentence sequence correspond to tags in the tag sequence one by one;

and performing theme segmentation on the subtitle dialog flow according to the label sequence.

According to a second aspect of the embodiments of the present invention, there is provided a topic segmentation apparatus for a subtitle dialog stream, including:

a sentence representation module for extracting semantic features of each sentence in a sentence sequence of the subtitle dialog stream based on BERT and representing each sentence as a vector according to the semantic features of each sentence;

the theme detection module is used for inputting the vectors of all the sentences into the TCN and outputting the tag sequences corresponding to the sentence sequences; wherein sentences in the sentence sequence correspond to tags in the tag sequence one by one;

and the theme segmentation module is used for carrying out theme segmentation on the subtitle dialog flow according to the label sequence.

According to a third aspect of the embodiments of the present invention, there is also provided an electronic device, including a memory, a processor, and a computer program stored on the memory and executable on the processor, where the processor calls the program instruction to execute the topic segmentation method for the subtitle dialog stream provided in any one of the various possible implementations of the first aspect.

According to a fourth aspect of embodiments of the present invention, there is also provided a non-transitory computer-readable storage medium storing computer instructions for causing a computer to perform the method for topic segmentation of a subtitle dialog stream provided in any one of the various possible implementations of the first aspect.

The embodiments of the invention provide a method and an apparatus for topic segmentation of a subtitle dialog stream. Starting from the sequential nature of the subtitle dialog stream, the method encodes the semantics of the sentences in the stream with the pre-trained language model BERT and represents each sentence as a vector. It then uses a sentence-level sequence-labeling architecture to automatically detect topic shifts in the dialog stream with a temporal convolutional network: considering semantic shifts between sentences from the perspective of the whole dialog, it outputs a tag sequence of the same length as the sentence sequence, effectively improving segmentation efficiency and accuracy.

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and it is obvious for those skilled in the art that other drawings can be obtained according to these drawings without creative efforts.

FIG. 1 is a diagram illustrating a segmentation of a conversation flow topic in the prior art;

fig. 2 is a schematic flowchart of a topic segmentation method for a subtitle dialog flow according to an embodiment of the present invention;

fig. 3 is a schematic overall architecture diagram of a topic segmentation method for a subtitle dialog flow according to an embodiment of the present invention;

fig. 4 is a schematic flow chart illustrating a semantic expression vector of a sentence extracted based on BERT in the topic segmentation method for a subtitle dialog flow according to the embodiment of the present invention;

fig. 5 is a schematic structural diagram of a TCN in the topic segmentation method for a subtitle dialog stream according to an embodiment of the present invention;

fig. 6 is a schematic structural diagram of a topic segmentation apparatus for a subtitle dialog stream according to an embodiment of the present invention;

fig. 7 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.

Detailed Description


In an embodiment of the present invention, a method for topic segmentation of a subtitle dialog stream is provided. Fig. 2 is a schematic flowchart of the method, which includes:

S201, extracting semantic features of each sentence in a sentence sequence of the subtitle dialog stream based on BERT, and representing each sentence as a vector according to the semantic features of each sentence;

The BERT (Bidirectional Encoder Representations from Transformers) model is a pre-trained language model. This embodiment adopts BERT as the sentence encoder, used to extract semantic features from sentences and represent the sentences as vectors.

Since the sentences of human language are composed of discrete written symbols, a computer cannot compute with them directly. To let a computer process natural language, the characters must be digitized: a sentence is encoded into a set of vectors by some encoding method, and that set of vectors is then used to represent the sentence and is fed to the TCN at the back end for computation. Different encoding schemes differ in how well they express sentence semantics.

In general, in a natural language processing task, a sentence is first segmented into words, each word is converted into a corresponding word vector, and the sentence is then represented as a matrix X = (x_1, x_2, …, x_n), where x_i is the d-dimensional word vector of the i-th word. Common models for sentence encoding include word2vec, ELMo (Embeddings from Language Models), GPT (Generative Pre-trained Transformer), BERT, and so on.
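The sentence-as-matrix representation just described can be sketched in a few lines of Python. The tiny vocabulary and the dimension d = 3 are made-up illustrative values, not anything specified in the patent:

```python
# Each word maps to a d-dimensional vector x_i; a sentence of n words
# becomes the matrix X = (x_1, ..., x_n), here a list of row vectors.
word_vectors = {
    "the": [0.1, 0.2, 0.3],
    "cat": [0.4, 0.5, 0.6],
    "sat": [0.7, 0.8, 0.9],
}

def encode(sentence):
    """Map a tokenized sentence to an n x d matrix (list of word vectors)."""
    return [word_vectors[w] for w in sentence]

X = encode(["the", "cat", "sat"])
print(len(X), len(X[0]))  # 3 3: n = 3 words, each a d = 3 vector
```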

Among these, word2vec focuses on learning semantic relationships between words, but it can only map a word statically to a fixed vector and cannot adjust the word's vector according to context, so it cannot distinguish the different senses of an ambiguous word. ELMo adds forward and backward two-layer LSTM (Long Short-Term Memory) networks on top of the word vectors, using different layers to encode the syntactic and the semantic features of sentences separately; after pre-training, when a new sentence is to be encoded, the bottom-layer word vectors and the vectors of the other two layers representing syntactic and semantic features are weighted and summed, so the word vectors are no longer fixed and can be adjusted dynamically according to context. To make up for the limited feature-extraction capability of the LSTM in ELMo, GPT adopts the Transformer model as its feature extractor. However, GPT is trained with a unidirectional language model only, so its predictions can use only the preceding context. BERT improves on this structure: on top of multiple Transformer layers, it is pre-trained with a bidirectional language model, enabling the model to predict using both the preceding and the following context.

In this embodiment, to better represent sentence semantics, the pre-trained model BERT, which has the strongest language representation capability, is adopted for sentence representation. BERT completes its pre-training by performing two tasks on massive corpora: the masked language model and next-sentence prediction. BERT is a multi-layer bidirectional structure in which each layer is a Transformer encoder that extracts features from the output of the layer below. After tokenization, the input sentence is preliminarily encoded into a sequence of token encodings, each composed of three features: the token embedding, the position embedding, and the segment embedding.

S202, inputting the vectors of all sentences into the TCN, and outputting tag sequences corresponding to the sentence sequences; wherein sentences in the sentence sequence correspond to tags in the tag sequence one by one;

In this embodiment, a temporal convolutional network (TCN) is used to perform topic detection on the sentences; topic segment boundaries, i.e., topic shift points, are detected on the basis of examining the global information of the conversation. Topic detection adopts a sentence-level sequence-labeling formulation: for each sentence in the dialog stream, a 0-1 tag is output to indicate whether the sentence begins a new topic segment. Tag 0 indicates that the sentence does not begin a new topic segment, and tag 1 indicates that it does, i.e., tag 1 marks the boundary of a topic segment. The tag sequence has the same length as the sentence sequence of the dialog stream.

The conventional model for sequence labeling is generally a recurrent neural network (RNN) structure, usually with long short-term memory (LSTM) units. However, the LSTM has limited memory and suffers from long-distance information decay; for longer documents, it cannot grasp the global structure well. In addition, the LSTM does not support parallel computation, and its convergence is slow. Therefore, in this embodiment a temporal convolutional network (TCN) is selected for topic detection. TCNs are commonly used for sequence modeling tasks: for sequence data {x_1, x_2, …, x_n} of size n, a TCN can generate a prediction sequence {y_1, y_2, …, y_n} of the same length.

And S203, performing theme segmentation on the subtitle dialog flow according to the label sequence.

According to the tag of each sentence in the tag sequence, each sentence whose tag is 1 starts a new topic segment: from that sentence up to (but not including) the next sentence tagged 1, the sentences form one dialog segment with the same topic. The original subtitle dialog stream is thereby divided into a plurality of dialog segments.
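The segmentation rule described above can be sketched directly; the sentences and tags below are invented example data:

```python
def split_by_tags(sentences, tags):
    """Split a dialog stream into topic segments.

    A tag of 1 marks a sentence that begins a new topic segment;
    each segment runs until the next sentence tagged 1.
    """
    segments, current = [], []
    for sent, tag in zip(sentences, tags):
        if tag == 1 and current:   # a new segment starts: flush the old one
            segments.append(current)
            current = []
        current.append(sent)
    if current:
        segments.append(current)   # flush the final segment
    return segments

sents = ["A1", "A2", "B1", "B2", "B3", "C1"]
tags  = [1,    0,    1,    0,    0,    1]
print(split_by_tags(sents, tags))
# [['A1', 'A2'], ['B1', 'B2', 'B3'], ['C1']]
```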

Fig. 3 shows the overall architecture of this embodiment. For the sentence sequence {S_1, S_2, …, S_M}, BERT is used to extract the semantic features {E_1, E_2, …, E_M} of each sentence; after average pooling, the semantic features are fed into the temporal convolutional network TCN, which outputs a 0-1 tag sequence of the same length as the sentence sequence.

This embodiment starts from the sequential nature of the subtitle dialog stream: the semantics of the sentences in the stream are encoded with the pre-trained language model BERT and each sentence is represented as a vector; then, with a sentence-level sequence-labeling architecture, the detection of topic shifts in the dialog stream is completed automatically by a temporal convolutional network. Semantic shifts between sentences are considered from the perspective of the whole dialog, and a tag sequence of the same length as the sentence sequence is output, effectively improving segmentation efficiency and accuracy.

On the basis of the foregoing embodiment, in this embodiment the step of extracting the semantic features of each sentence in the sentence sequence of the subtitle dialog stream based on BERT specifically includes: when the subtitle dialog stream is Chinese, dividing each sentence into N characters; processing each character based on BERT to obtain a feature vector of size H corresponding to each character; and representing each sentence as an N × H matrix.

Specifically, the current BERT does not apply word segmentation to Chinese but takes the single character as the basic unit of a sentence. Therefore, for a sentence containing N characters, BERT outputs a feature vector of size H for each character, and the whole sentence is represented as an N × H matrix.
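The character-level N × H shape can be illustrated with a stand-in encoder. H = 768 matches the hidden size of BERT-base; the per-character vectors produced below are placeholders, not real BERT activations:

```python
# A Chinese sentence is split into single characters (no word segmentation),
# and each character is mapped to an H-dimensional feature vector, so a
# sentence of N characters becomes an N x H matrix.
H = 768

def mock_bert_encode(sentence):
    chars = list(sentence)                            # character as basic unit
    return [[float(ord(c) % 7)] * H for c in chars]   # N rows of size H

matrix = mock_bert_encode("今天天气很好")
print(len(matrix), len(matrix[0]))  # 6 768: N = 6 characters, H = 768
```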

On the basis of the foregoing embodiment, in this embodiment the step of extracting the semantic features of each sentence in the sentence sequence of the subtitle dialog stream based on BERT specifically includes: applying self-attention to the output of the previous layer at each layer of BERT and outputting an N × H matrix, where N is the length of the sentence input into the pre-trained language model BERT and H is the size of the hidden layer in BERT; and taking the matrix output by a layer a preset number of positions from the end of BERT as the semantic feature of each sentence.

As shown in Fig. 4, BERT has 12 layers; each layer applies self-attention to the output of the previous layer and outputs a matrix of shape [N, H], where N and H are the length of the input sentence and the size of the hidden layer, respectively. In general, the weight parameters of the higher network layers carry information related to the original training tasks. Since BERT is pre-trained on its two original tasks, the masked language model and next-sentence prediction, the closer a layer is to the last one, the more its weight parameters are biased toward those two objectives, which reduces the generality of the extracted semantic vectors. Conversely, the weight parameters closer to the bottom layers stay close to the raw token embeddings and lack high-level semantic information. Therefore, from the perspective of semantic expressiveness, the output matrix of a layer a preset number of positions from the end of BERT is extracted as the semantic feature of the sentence. The preset number can be set to 2, i.e., the output matrix of the penultimate layer is used as the semantic feature of the sentence. The output matrix of the penultimate layer is average-pooled, and the vector (e_1, e_2, …, e_H) of the sentence is output.
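Picking the penultimate layer and mean-pooling it can be shown with stand-in data; the 12 "layer outputs" below are synthetic (N = 4 positions, H = 5 for readability), not real BERT activations:

```python
# hidden_states[layer] is the N x H output matrix of that layer.
# Row t of layer L holds the constant value L + t, so results are easy to check.
N, H, LAYERS = 4, 5, 12
hidden_states = [
    [[float(layer + t)] * H for t in range(N)]
    for layer in range(LAYERS)
]

semantic = hidden_states[-2]  # penultimate layer: an N x H matrix

# Average pooling over the N positions yields one H-dimensional sentence vector.
sentence_vec = [sum(row[j] for row in semantic) / N for j in range(H)]
print(sentence_vec)  # [11.5, 11.5, 11.5, 11.5, 11.5]
```

With the real model, the same selection is available via the `output_hidden_states=True` option of Hugging Face's `BertModel`, which returns all layer outputs.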

On the basis of the above embodiment, in this embodiment the step of representing each sentence as a vector according to its semantic features specifically includes: average-pooling the semantic features of each sentence into a feature vector of dimension H.

Specifically, to improve computational efficiency, the semantic features of each sentence are not fed directly into the back-end TCN network; instead they are average-pooled into a vector of dimension H, which serves as the feature representation of the sentence input to the TCN network.

In this embodiment, the convolutional layers of the TCN use dilated convolution and causal convolution, and residual connections are added to the TCN. The structure of the TCN is shown in Fig. 5. Its most notable feature is dilated convolution: the closer to the top layer, the larger the convolution window. Dilated convolution allows each layer of the TCN to be as long as the input sequence and gives it a larger receptive field than an ordinary convolutional network with the same number of layers. Causal convolution is used in the TCN's convolutional layers to ensure that the prediction at time step t uses only the information at time step t and before, a property well suited to a dialog stream whose sentences appear in chronological order. Residual connections are added to the TCN's structure so that low-level features can be passed directly to higher layers, which improves the TCN's performance and lets it better learn the global characteristics of the sequence. Furthermore, the convolution operations of a TCN can be computed in parallel, which greatly shortens training and prediction time compared with an RNN.
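The causal dilated convolution at the heart of this structure can be illustrated minimally. This is a hand-rolled sketch of the operation, not the patent's actual TCN implementation:

```python
# Minimal causal dilated 1-D convolution: the output at time t depends only
# on inputs at time t and earlier, and a dilation of d skips d - 1 positions
# between kernel taps, which is how the receptive field grows with depth.
def causal_dilated_conv(x, kernel, dilation):
    k = len(kernel)
    pad = (k - 1) * dilation
    xp = [0.0] * pad + list(x)   # left-pad only, so the output stays causal
    return [
        sum(kernel[j] * xp[t + pad - j * dilation] for j in range(k))
        for t in range(len(x))
    ]

x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = causal_dilated_conv(x, [1.0, 1.0], dilation=2)  # y[t] = x[t] + x[t-2]
print(y)  # [1.0, 2.0, 4.0, 6.0, 8.0, 10.0]
```

Stacking such layers with dilations 1, 2, 4, … gives a receptive field that grows exponentially with depth while each output still sees only the past.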

On the basis of the above embodiment, in this embodiment the step of inputting the vectors of all the sentences into the TCN and outputting the tag sequence corresponding to the sentence sequence further includes: training the TCN using the Focal Loss function.

Specifically, since segmentation boundaries are very sparse in the topic segmentation task, the distribution of 0s and 1s in the tag sequence is highly imbalanced, which easily skews the learning of the TCN network. To solve this class-imbalance problem, Focal Loss is used as the loss function in the TCN. It is calculated as:

    FL(p, y) = -(1 - p)^γ · log(p)     if y = 1
    FL(p, y) = -p^γ · log(1 - p)       if y = 0

where p is the predicted probability that the tag is 1 and γ is an integer, generally set to 2. Focal Loss addresses class imbalance by adjusting how much attention is paid to the different classes. For example, when a data set contains far more negative samples (y = 0) than positive samples (y = 1), the TCN network model tends to declare samples negative. For a negative sample (y = 0) already predicted with a small p, the modulating factor p^γ makes the loss very small, so the model need not adjust much on such samples; for a positive sample (y = 1) predicted with a small p, the factor (1 - p)^γ remains large and so does the loss, so the model's parameters are adjusted to a greater degree on these samples. In this way, the originally scarce positive samples have a larger influence on the model, and the class-imbalance problem is well alleviated.
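A minimal sketch of the binary Focal Loss formula above with γ = 2; `focal_loss` is an illustrative helper, not code from the patent:

```python
import math

def focal_loss(p, y, gamma=2):
    """Binary Focal Loss. p: predicted probability of tag 1; y: true tag."""
    if y == 1:
        return -((1 - p) ** gamma) * math.log(p)
    return -(p ** gamma) * math.log(1 - p)

# An easy, correctly classified negative (small p, y = 0) contributes almost
# nothing, while a misclassified positive (small p, y = 1) stays costly,
# so the scarce boundary sentences dominate the gradient.
easy_negative = focal_loss(0.1, 0)   # p^gamma down-weights it
hard_positive = focal_loss(0.1, 1)   # (1 - p)^gamma keeps it large
print(easy_negative < hard_positive)  # True
```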

In another embodiment of the present invention, an apparatus for topic segmentation of a subtitle dialog stream is provided, which is used to implement the methods of the foregoing embodiments. The descriptions and definitions in the embodiments of the topic segmentation method above may therefore be used to understand the execution modules of this embodiment. Fig. 6 is a schematic structural diagram of the topic segmentation apparatus for a subtitle dialog stream provided in an embodiment of the present invention. The apparatus includes a sentence representation module 601, a topic detection module 602, and a topic segmentation module 603, where:

the sentence representation module 601 is configured to extract a semantic feature of each sentence in a sentence sequence of the subtitle dialog stream based on BERT, and represent each sentence as a vector according to the semantic feature of each sentence;

the BERT model is a pre-training language model, and the sentence expression module 601 generates a code for a sentence by BERT, and is configured to extract semantic features from the sentence and express the sentence as a vector.

The topic detection module 602 is configured to input vectors of all the sentences into the TCN, and output a tag sequence corresponding to the sentence sequence; wherein sentences in the sentence sequence correspond to tags in the tag sequence one by one;

the topic detection module 602 performs topic detection on the sentence by using the time-series convolutional network TCN, and detects a topic segment boundary, i.e., a topic transformation point, on the basis of examining the global information of the dialog. When the topic is detected, a structure of sentence-level sequence labeling is adopted, and for each sentence in the conversation flow, a 0-1 label is output to indicate whether the sentence is the beginning of a new topic fragment. Thus, the tag sequence is the same length as the sentence sequence in the dialog flow.

The topic segmentation module 603 is configured to perform topic segmentation on the subtitle dialog stream according to the tag sequence.

According to the tag of each sentence in the tag sequence, the topic segmentation module 603 treats each sentence tagged 1 as the start of a new topic segment: from that sentence up to the next sentence tagged 1, the sentences form one dialog segment with the same topic, so the original subtitle dialog stream is divided into a plurality of dialog segments.

This embodiment likewise starts from the sequential nature of the subtitle dialog stream: the semantics of the sentences in the stream are encoded with the pre-trained language model BERT and each sentence is represented as a vector; then, with a sentence-level sequence-labeling architecture, the detection of topic shifts in the dialog stream is completed automatically by a temporal convolutional network. Semantic shifts between sentences are considered from the perspective of the whole dialog, and a tag sequence of the same length as the sentence sequence is output, effectively improving segmentation efficiency and accuracy.

On the basis of the foregoing embodiment, the sentence representation module in this embodiment is specifically configured to: when the subtitle dialog stream is Chinese, divide each sentence into N characters; process each character based on BERT to obtain a feature vector of size H corresponding to each character; and represent each sentence as an N × H matrix.

On the basis of the foregoing embodiment, the sentence expression module in this embodiment is specifically configured to: adding self attention to the output of the previous layer of each layer of the BERT, and outputting a matrix of N x H; wherein N is the length of the sentence input into the BERT, and H is the size of the hidden layer in the BERT; and taking a matrix output by the reciprocal preset layer of the BERT as the semantic feature of each sentence.

On the basis of the foregoing embodiment, the sentence representation module in this embodiment is specifically configured to: average-pool the semantic features of each sentence into a feature vector of size H.

On the basis of the above embodiments, in this embodiment the tag sequence consists of tags 0 and 1, where tag 1 marks a topic transition in the subtitle dialog stream.

On the basis of the foregoing embodiments, this embodiment further includes a training module configured to train the TCN using the Focal Loss function.
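Since sentences tagged 1 (topic boundaries) are far rarer than sentences tagged 0, Focal Loss down-weights easy, well-classified examples so training focuses on the hard boundary cases. A sketch of the binary form in NumPy (the alpha and gamma defaults follow common practice for Focal Loss, not values specified in the patent):

```python
import numpy as np

def focal_loss(probs, targets, alpha=0.25, gamma=2.0):
    """Binary focal loss over per-sentence boundary predictions.

    probs:   predicted probability of tag 1 for each sentence.
    targets: gold tags (0 or 1).
    """
    probs = np.clip(probs, 1e-7, 1 - 1e-7)
    # p_t is the probability assigned to the correct class.
    p_t = np.where(targets == 1, probs, 1 - probs)
    alpha_t = np.where(targets == 1, alpha, 1 - alpha)
    # (1 - p_t)^gamma shrinks the loss of confident correct predictions.
    return float(np.mean(-alpha_t * (1 - p_t) ** gamma * np.log(p_t)))

# A confident correct prediction contributes far less than a wrong one.
easy = focal_loss(np.array([0.9]), np.array([1]))
hard = focal_loss(np.array([0.1]), np.array([1]))
print(easy < hard)  # True
```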

This embodiment provides an electronic device, and fig. 7 is a schematic structural diagram of an electronic device provided in the embodiment of the present invention, where the electronic device includes: at least one processor 701, at least one memory 702, and a bus 703, wherein:

the processor 701 and the memory 702 communicate with each other via a bus 703;

the memory 702 stores program instructions executable by the processor 701, and the processor calls the program instructions to perform the methods provided by the method embodiments, for example, the methods include: extracting semantic features of each sentence in a sentence sequence of the subtitle dialog stream based on BERT, and representing each sentence as a vector according to the semantic features of each sentence; inputting the vectors of all the sentences into the TCN, and outputting tag sequences corresponding to the sentence sequences; wherein sentences in the sentence sequence correspond to tags in the tag sequence one by one; and performing theme segmentation on the subtitle dialog flow according to the label sequence.

This embodiment provides a non-transitory computer-readable storage medium storing computer instructions that cause a computer to perform the methods provided by the above method embodiments, for example, including: extracting semantic features of each sentence in a sentence sequence of the subtitle dialog stream based on BERT, and representing each sentence as a vector according to the semantic features of each sentence; inputting the vectors of all the sentences into the TCN, and outputting tag sequences corresponding to the sentence sequences; wherein sentences in the sentence sequence correspond to tags in the tag sequence one by one; and performing theme segmentation on the subtitle dialog flow according to the label sequence.

Those of ordinary skill in the art will understand that all or part of the steps for implementing the method embodiments may be accomplished by program instructions running on relevant hardware. The program may be stored in a computer-readable storage medium and, when executed, performs the steps of the method embodiments; the aforementioned storage medium includes various media that can store program code, such as ROM, RAM, magnetic disks, or optical disks.

The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.

Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware. With this understanding in mind, the above-described technical solutions may be embodied in the form of a software product, which can be stored in a computer-readable storage medium such as ROM/RAM, magnetic disk, optical disk, etc., and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the methods described in the embodiments or some parts of the embodiments.

Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.
