Text processing method, system, device and medium

Document No.: 1905184    Published: 2021-11-30

Reading note: The technology "Text processing method, system, device and medium" was created by Li Chao, Zhu Yujin and Xu Liang on 2021-08-31. Its main content is as follows. The invention provides a text processing method, system, device and medium in the technical field of artificial intelligence. Acquired audio data is recognized to generate original recognition texts. The original recognition texts are merged, the characters of the merged text are given sequence numbers, and the starting and ending sequence numbers of each original recognition text in the merged text are recorded as a sequence number interval. All sequence number intervals are associated to form a target dictionary. Target text recognition is performed on an intercepted text, the interval positions of the intercepted text's starting and ending sequence numbers in the target dictionary are determined, and the number of text paragraphs (original recognition texts) spanned by the intercepted text is determined. Finally, this number is compared with a preset threshold, and the intercepted text is segmented into paragraphs according to the comparison result. The resulting paragraph segmentation does not differ from the original recognition texts, and only a single segmentation pass is performed, without re-confirming the segmentation result.

1. A method of text processing, the method comprising the steps of:

acquiring audio data formed by a first target object and a second target object, and recognizing the audio data to generate a plurality of original recognition texts;

merging the original recognition texts to form a merged text, marking a sequence number for each character in the merged text, and recording the starting sequence number and the ending sequence number of each original recognition text in the merged text in interval form to obtain a sequence number interval for each original recognition text;

associating the sequence number intervals of all original recognition texts to form a target dictionary;

randomly intercepting part or all of the merged text for target text recognition; after the target text recognition is completed, determining the interval positions of the starting sequence number and the ending sequence number of the intercepted text in the target dictionary, and determining, according to these interval positions, the number of text paragraphs (original recognition texts) spanned by the intercepted text;

and comparing the number of text paragraphs with a preset threshold, and segmenting the intercepted text into paragraphs according to the comparison result to obtain a corresponding paragraph segmentation result.

2. The method of claim 1, wherein segmenting the intercepted text into paragraphs according to the comparison result to obtain a corresponding paragraph segmentation result comprises:

if the number of text paragraphs is greater than or equal to a first threshold, segmenting the intercepted text according to the complete original recognition texts it contains, and taking the resulting text paragraphs as the paragraph segmentation result;

if the number of text paragraphs is equal to a second threshold, adding a marker to the intercepted text, segmenting the intercepted text at the added marker, and taking the resulting text paragraphs as the paragraph segmentation result;

if the number of text paragraphs is equal to a third threshold, leaving the intercepted text unsegmented and taking it directly as the paragraph segmentation result;

wherein the first threshold is greater than the second threshold, and the second threshold is greater than the third threshold.

3. The text processing method of claim 1, wherein performing target text recognition on the intercepted text comprises:

acquiring a reference text from the same scene as the merged text;

performing part-of-speech tagging on the reference text to obtain the nouns and pronouns in the reference text, and performing part-of-speech tagging on the intercepted text to obtain the nouns and pronouns in the intercepted text;

extracting, by dependency parsing, entities from the nouns and pronouns of the reference text as reference entities; and extracting, by dependency parsing, entities from the nouns and pronouns of the intercepted text as entities to be compared;

calculating the similarity between the reference entities and the entities to be compared, and comparing the calculated similarity with a preset similarity threshold; if the similarity is greater than or equal to the preset similarity threshold, determining that target text exists in the intercepted text; and if the similarity is less than the preset similarity threshold, determining that no target text exists in the intercepted text.

4. The method of claim 2, wherein, when the number of text paragraphs is equal to the second threshold, segmenting the intercepted text comprises:

if the number of text paragraphs is equal to the second threshold, connecting the original recognition texts spanned by the intercepted text with a preset marker to obtain a connection text;

adding the marker after each character of the intercepted text in turn, and recording each text with the added marker as a marked text, wherein each marked text contains at least one marker;

once the marker has been added after the second-to-last character of the intercepted text, determining whether any of the marked texts is a substring of the connection text; and if a marked text is a substring of the connection text, segmenting the intercepted text at its marker and taking the resulting text paragraphs as the paragraph segmentation result.

5. The text processing method of claim 1, wherein recognizing the audio data to generate a plurality of original recognition texts comprises:

performing feature extraction on the audio data;

decoding the extracted audio feature data with a pre-trained acoustic model and a language model to obtain a corresponding recognition text;

and dividing the recognition text into a plurality of original recognition texts according to the pauses of the first target object and/or the second target object during the conversation.

6. The text processing method according to any one of claims 1 to 5, wherein the target text comprises violation text that does not meet the requirements of a target scene.

7. A text processing system, comprising:

the audio acquisition module is used for acquiring audio data formed by the first target object and the second target object;

the audio recognition module is used for recognizing the audio data to generate a plurality of original recognition texts;

the text merging module is used for merging the original recognition texts to form a merged text;

the text marking module is used for marking the sequence number of each character in the merged text, and recording the starting sequence number and the ending sequence number of each original recognition text in the merged text in interval form to obtain the sequence number interval of each original recognition text;

the sequence number interval association module is used for associating the sequence number intervals of all original recognition texts to form a target dictionary;

the text recognition module is used for randomly intercepting part or all of the merged text and performing target text recognition on it;

the text paragraph module is used for determining, after the target text recognition is completed, the interval positions of the starting sequence number and the ending sequence number of the intercepted text in the target dictionary, and determining from these interval positions the number of text paragraphs (original recognition texts) spanned by the intercepted text;

and the paragraph segmentation module is used for comparing the number of text paragraphs with a preset threshold, and segmenting the intercepted text into paragraphs according to the comparison result to obtain a corresponding paragraph segmentation result.

8. The text processing system of claim 7, wherein the paragraph segmentation module comprises a first segmentation unit, a second segmentation unit and a third segmentation unit;

the first segmentation unit is used for segmenting the intercepted text according to the complete original recognition texts it contains when the number of text paragraphs is greater than or equal to a first threshold, and taking the resulting text paragraphs as the paragraph segmentation result;

the second segmentation unit is used for adding a marker to the intercepted text when the number of text paragraphs is equal to a second threshold, segmenting the intercepted text at the added marker, and taking the resulting text paragraphs as the paragraph segmentation result;

the third segmentation unit is used for taking the intercepted text directly as the paragraph segmentation result when the number of text paragraphs is equal to a third threshold;

wherein the first threshold is greater than the second threshold, and the second threshold is greater than the third threshold.

9. A computer device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor, when executing the computer program, implements the steps of the method of any one of claims 1 to 6.

10. A computer-readable storage medium storing a computer program which, when executed by a processor, carries out the steps of the method according to any one of claims 1 to 6.

Technical Field

The present invention relates to the field of artificial intelligence technology, and in particular, to a text processing method, system, device, and medium.

Background

In some consultation-type projects, an algorithm model or a rule engine is currently used to determine whether a conversation between customer service (e.g., human customer service or intelligent customer service) and a customer contains violation statements. Such consultations are generally conducted through voice or text dialogue between the customer service and the customer. For voice dialogue, the conventional approach is to convert the speech of the conversation into text and then recognize that text to determine whether the conversation contains violation statements. However, because the speech of the customer or the customer service is discontinuous, the converted text may be irregularly divided into a plurality of original recognition texts. Performing text recognition on these original recognition texts separately may not truly reflect whether the conversation contains a violation statement. Therefore, in practice, the original recognition texts are merged into a whole dialogue, which is used as the input for violation text recognition. When the violation recognition result is output, the paragraph information of the original recognition texts is added back into the result, so that a faithful violation recognition result is obtained.

However, the inventors found that prior-art paragraph segmentation of the violation recognition result may be wrong or inefficient, for example in the following situations: (1) when the violation recognition result contains several partially overlapping original recognition texts, segmenting the result directly according to those partially overlapping texts may produce paragraphs that differ from the original recognition texts; (2) when a substring relationship exists and the violation recognition result is segmented according to only the original recognition text with the longest substring match, the segmented paragraphs may again differ from the original recognition texts; (3) when the paragraph numbers of the original recognition texts are recorded and the violation recognition result is segmented according to the paragraphs with the longest run of consecutive numbers, the correct original recognition texts can be obtained, but when repeated paragraphs or texts exist, the specific numbers must be confirmed again against the original recognition texts. This adds a loop when the numbers are recorded and extra computation when the longest consecutive run is determined later, so overall efficiency is low.

Disclosure of Invention

In view of the above shortcomings of the prior art, an object of the present invention is to provide a text processing method, system, device and medium that solve the prior-art problem of incorrectly segmenting violation recognition results.

To achieve the above and other related objects, the present invention provides a text processing method, comprising:

acquiring audio data formed by a first target object and a second target object, and recognizing the audio data to generate a plurality of original recognition texts;

merging the original recognition texts to form a merged text, marking a sequence number for each character in the merged text, and recording the starting sequence number and the ending sequence number of each original recognition text in the merged text in interval form to obtain a sequence number interval for each original recognition text;

associating the sequence number intervals of all original recognition texts to form a target dictionary;

randomly intercepting part or all of the merged text for target text recognition; after the target text recognition is completed, determining the interval positions of the starting sequence number and the ending sequence number of the intercepted text in the target dictionary, and determining, according to these interval positions, the number of text paragraphs (original recognition texts) spanned by the intercepted text;

and comparing the number of text paragraphs with a preset threshold, and segmenting the intercepted text into paragraphs according to the comparison result to obtain a corresponding paragraph segmentation result.

Optionally, segmenting the intercepted text into paragraphs according to the comparison result to obtain a corresponding paragraph segmentation result includes:

if the number of text paragraphs is greater than or equal to a first threshold, segmenting the intercepted text according to the complete original recognition texts it contains, and taking the resulting text paragraphs as the paragraph segmentation result;

if the number of text paragraphs is equal to a second threshold, adding a marker to the intercepted text, segmenting the intercepted text at the added marker, and taking the resulting text paragraphs as the paragraph segmentation result;

if the number of text paragraphs is equal to a third threshold, leaving the intercepted text unsegmented and taking it directly as the paragraph segmentation result;

wherein the first threshold is greater than the second threshold, and the second threshold is greater than the third threshold.
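The three-way comparison above can be sketched in Python as a small dispatch function. The source only requires that the first threshold exceed the second and the second exceed the third; the concrete values 3, 2 and 1 below are an assumption, chosen so that an intercepted text inside one original recognition text is kept whole, one crossing a single boundary is split with a marker, and one spanning three or more texts is split by complete texts. The `Strategy` enum and `choose_strategy` name are hypothetical.

```python
from enum import Enum

class Strategy(Enum):
    SPLIT_BY_COMPLETE_TEXTS = "split by the complete original recognition texts"
    SPLIT_BY_MARKER = "split at an inserted marker"
    NO_SPLIT = "keep the intercepted text whole"

# Assumed threshold values; the source only requires FIRST > SECOND > THIRD.
FIRST_THRESHOLD = 3
SECOND_THRESHOLD = 2
THIRD_THRESHOLD = 1

def choose_strategy(paragraph_count: int) -> Strategy:
    """Map the number of spanned paragraphs to a segmentation strategy."""
    if paragraph_count >= FIRST_THRESHOLD:
        return Strategy.SPLIT_BY_COMPLETE_TEXTS
    if paragraph_count == SECOND_THRESHOLD:
        return Strategy.SPLIT_BY_MARKER
    if paragraph_count == THIRD_THRESHOLD:
        return Strategy.NO_SPLIT
    # Only reachable for counts below the third threshold (e.g. 0).
    raise ValueError(f"unexpected paragraph count: {paragraph_count}")

print(choose_strategy(3).name)  # SPLIT_BY_COMPLETE_TEXTS
```

With these assumed values every positive paragraph count maps to exactly one branch, so no intercepted text falls through the comparison.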

Optionally, performing target text recognition on the intercepted text includes:

acquiring a reference text from the same scene as the merged text;

performing part-of-speech tagging on the reference text to obtain the nouns and pronouns in the reference text, and performing part-of-speech tagging on the intercepted text to obtain the nouns and pronouns in the intercepted text;

extracting, by dependency parsing, entities from the nouns and pronouns of the reference text as reference entities; and extracting, by dependency parsing, entities from the nouns and pronouns of the intercepted text as entities to be compared;

calculating the similarity between the reference entities and the entities to be compared, and comparing the calculated similarity with a preset similarity threshold; if the similarity is greater than or equal to the preset similarity threshold, determining that target text exists in the intercepted text; and if the similarity is less than the preset similarity threshold, determining that no target text exists in the intercepted text.
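The final decision step can be sketched as follows, assuming the reference entities and the entities to be compared have already been extracted by part-of-speech tagging and dependency parsing. The similarity measure (the standard library's `difflib.SequenceMatcher.ratio`), the rule of taking the maximum pairwise similarity, the 0.8 threshold and the example entities are all assumptions for illustration; the source does not specify any of them.

```python
from difflib import SequenceMatcher
from typing import List

def contains_target_text(reference_entities: List[str],
                         entities_to_compare: List[str],
                         threshold: float = 0.8) -> bool:
    """Return True if any entity pair is at least `threshold` similar.

    Similarity is difflib's character-level ratio (2 * matches / total length),
    used here as a stand-in for whatever measure an implementation adopts.
    """
    best = 0.0
    for ref in reference_entities:
        for cand in entities_to_compare:
            best = max(best, SequenceMatcher(None, ref, cand).ratio())
    return best >= threshold

# Hypothetical entities from a life insurance scene.
print(contains_target_text(["guaranteed return"], ["guaranteed returns"]))  # True
print(contains_target_text(["guaranteed return"], ["weather"]))             # False
```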

Optionally, when the number of text paragraphs is equal to the second threshold, segmenting the intercepted text includes:

if the number of text paragraphs is equal to the second threshold, connecting the original recognition texts spanned by the intercepted text with a preset marker to obtain a connection text;

adding the marker after each character of the intercepted text in turn, and recording each text with the added marker as a marked text, wherein each marked text contains at least one marker;

once the marker has been added after the second-to-last character of the intercepted text, determining whether any of the marked texts is a substring of the connection text; and if a marked text is a substring of the connection text, segmenting the intercepted text at its marker and taking the resulting text paragraphs as the paragraph segmentation result.
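A minimal sketch of the marker-based split for an intercepted text spanning two original recognition texts. It assumes the preset marker is `"|"` and that this character never occurs in the recognized texts; the function name and the first-match rule are also assumptions. The Chinese strings are illustrative only.

```python
from typing import List, Optional

MARKER = "|"  # assumed preset marker; must not occur in the recognized texts

def split_with_marker(intercepted: str, spanned: List[str],
                      marker: str = MARKER) -> Optional[List[str]]:
    """Split an intercepted text that spans two original recognition texts.

    1. Join the spanned original recognition texts with the marker to form
       the connection text.
    2. Insert the marker after each character up to the second-to-last one,
       giving one marked text per candidate split position.
    3. Split at the first marked text that is a substring of the connection
       text; return None if no candidate matches.
    """
    connection = marker.join(spanned)
    marked_texts = [intercepted[:k] + marker + intercepted[k:]
                    for k in range(1, len(intercepted))]
    for marked in marked_texts:
        if marked in connection:
            return marked.split(marker)
    return None

print(split_with_marker("好我是", ["你好", "我是ABC"]))  # ['好', '我是']
```

Because this sketch uses a plain substring test, highly repetitive texts (for example, two spanned texts that are both "ABAB") can let more than one marked text match; an implementation could additionally check that the match starts at the offset implied by the starting sequence number N1, which the target dictionary already provides.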

Optionally, recognizing the audio data to generate a plurality of original recognition texts includes:

performing feature extraction on the audio data;

decoding the extracted audio feature data with a pre-trained acoustic model and a language model to obtain a corresponding recognition text;

and dividing the recognition text into a plurality of original recognition texts according to the pauses of the first target object and/or the second target object during the conversation.

Optionally, the target text includes violation text that does not meet the requirements of a target scene.

The invention also provides a text processing system, which comprises:

the audio acquisition module is used for acquiring audio data formed by the first target object and the second target object;

the audio recognition module is used for recognizing the audio data to generate a plurality of original recognition texts;

the text merging module is used for merging the original recognition texts to form a merged text;

the text marking module is used for marking the sequence number of each character in the merged text, and recording the starting sequence number and the ending sequence number of each original recognition text in the merged text in interval form to obtain the sequence number interval of each original recognition text;

the sequence number interval association module is used for associating the sequence number intervals of all original recognition texts to form a target dictionary;

the text recognition module is used for randomly intercepting part or all of the merged text and performing target text recognition on it;

the text paragraph module is used for determining, after the target text recognition is completed, the interval positions of the starting sequence number and the ending sequence number of the intercepted text in the target dictionary, and determining from these interval positions the number of text paragraphs (original recognition texts) spanned by the intercepted text;

and the paragraph segmentation module is used for comparing the number of text paragraphs with a preset threshold, and segmenting the intercepted text into paragraphs according to the comparison result to obtain a corresponding paragraph segmentation result.

Optionally, the paragraph segmentation module includes a first segmentation unit, a second segmentation unit and a third segmentation unit;

the first segmentation unit is used for segmenting the intercepted text according to the complete original recognition texts it contains when the number of text paragraphs is greater than or equal to a first threshold, and taking the resulting text paragraphs as the paragraph segmentation result;

the second segmentation unit is used for adding a marker to the intercepted text when the number of text paragraphs is equal to a second threshold, segmenting the intercepted text at the added marker, and taking the resulting text paragraphs as the paragraph segmentation result;

the third segmentation unit is used for taking the intercepted text directly as the paragraph segmentation result when the number of text paragraphs is equal to a third threshold;

wherein the first threshold is greater than the second threshold, and the second threshold is greater than the third threshold.

The invention also provides a computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the steps of any of the above methods when executing the computer program.

The invention also provides a computer-readable storage medium storing a computer program which, when executed by a processor, carries out the steps of any one of the above methods.

As described above, the text processing method, system, device and medium of the present invention have the following advantages. The paragraph information of the original recognition texts is recorded efficiently in a target dictionary, and the paragraph information of an intercepted text is restored accurately from its starting and ending sequence numbers. While restoring this information, the number of paragraphs spanned by the intercepted text is recorded, and a different processing mode is applied automatically depending on that number to obtain the final paragraph segmentation result. Segmenting the intercepted text in this way keeps the resulting paragraphs consistent with the original recognition texts, and even when the intercepted text contains repeated paragraphs or texts, only one segmentation pass is performed, with no need to re-confirm the result against the original recognition texts afterwards. Compared with prior-art methods that check the original paragraphs in a loop, the invention greatly improves efficiency with high accuracy, and segmentation is not disrupted by unexpected situations such as repeated segments shared by the intercepted text and the original recognition texts.
Meanwhile, the overall technical solution is simple and clear and requires fewer processing steps than the prior art, so the invention achieves more technical functions and effects with fewer steps.

Drawings

FIG. 1 is a schematic flowchart of a text processing method according to an embodiment;

FIG. 2 is a schematic flowchart of a text processing method according to another embodiment;

FIG. 3 is a schematic flowchart of performing target text recognition on an intercepted text according to an embodiment;

FIG. 4 is a schematic diagram of the hardware structure of a text processing system according to an embodiment;

FIG. 5 is a schematic diagram of the hardware structure of a paragraph segmentation module according to an embodiment;

FIG. 6 is a schematic diagram of the hardware structure of a text processing device according to an embodiment.

Detailed Description

The implementation of the present invention is described below by way of specific examples, and those skilled in the art can readily understand other advantages and effects of the invention from the disclosure of this specification. The invention can also be implemented or applied through other, different embodiments, and its details can be modified in various respects, all without departing from the spirit and scope of the present invention. It should be noted that, where no conflict arises, the following embodiments and the features in them may be combined with one another.

It should be noted that the drawings provided in the following embodiments only illustrate the basic idea of the present invention schematically: they show only the components related to the present invention, rather than the number, shape and size of the components in an actual implementation. In an actual implementation, the type, number and proportion of the components may vary freely, and the component layout may be more complex.

Referring to FIG. 1, the present invention provides a text processing method comprising the following steps:

S10. Acquire audio data formed by the first target object and the second target object. As an example, when a customer consults about a life insurance project, the conversation speech formed when the customer talks with human customer service or intelligent customer service may be acquired; the first target object may be the customer, and the second target object may be the human or intelligent customer service.

S20. Recognize the audio data to generate a plurality of original recognition texts. As an example, obtaining the original recognition texts may include: extracting features from the audio data formed by the conversation; decoding the extracted audio feature data with a pre-trained acoustic model and a language model to obtain a corresponding recognition text; and dividing the recognition text into a plurality of original recognition texts according to the pauses of the first target object and/or the second target object during the conversation. For example, Mel-frequency cepstral coefficient (MFCC) or perceptual linear prediction (PLP) features may be extracted from the dialogue speech, and the extracted feature data may be decoded using a Gaussian mixture model-hidden Markov model (GMM-HMM) as the pre-trained acoustic model to obtain the recognition text corresponding to the dialogue speech; the recognition text is then divided into a plurality of original recognition texts according to the pauses of the customer and/or the human or intelligent customer service during the conversation. In this embodiment, before speech recognition is performed, noise in the dialogue speech may be detected by voice activity detection (VAD); once detected, the noisy speech data may be denoised with a least mean squares (LMS) adaptive filter, so that a recognition text free of noise is obtained.
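The pause-based division in step S20 can be sketched as follows. The sketch assumes the decoder emits each recognized character with start and end timestamps in seconds; the `Token` structure, the `split_by_pause` name and the 0.5 s `max_pause` threshold are hypothetical, since the source specifies none of them. The example characters are illustrative and were chosen only so that the three resulting texts have 2, 5 and 4 characters, like P1, P2 and P3 in step S40.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Token:
    """One recognized character with timestamps in seconds (assumed format)."""
    text: str
    start: float
    end: float

def split_by_pause(tokens: List[Token], max_pause: float = 0.5) -> List[str]:
    """Divide a recognized token stream into original recognition texts.

    A new text begins whenever the silence between the end of one token and
    the start of the next is longer than `max_pause`.
    """
    texts: List[str] = []
    current: List[str] = []
    prev_end: Optional[float] = None
    for tok in tokens:
        # A long enough silence closes the current original recognition text.
        if current and tok.start - prev_end > max_pause:
            texts.append("".join(current))
            current = []
        current.append(tok.text)
        prev_end = tok.end
    if current:
        texts.append("".join(current))
    return texts

tokens = [Token("你", 0.0, 0.2), Token("好", 0.2, 0.4),
          Token("我", 1.2, 1.4), Token("是", 1.4, 1.6), Token("A", 1.6, 1.7),
          Token("B", 1.7, 1.8), Token("C", 1.8, 1.9),
          Token("请", 3.0, 3.2), Token("问", 3.2, 3.4), Token("你", 3.4, 3.6),
          Token("是", 3.6, 3.8)]
print(split_by_pause(tokens))  # ['你好', '我是ABC', '请问你是']
```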

S30. Merge the original recognition texts to form a merged text. As an example, the original recognition texts may be merged by direct concatenation, that is, joined end to end into a single merged text. For example, after a dialogue speech is recognized, original recognition texts P1, P2, P3 …, PN are obtained, and merging them yields a merged text P. Specifically, if there are three original recognition texts, P1: "hello", P2: "I am ABC" and P3: "may I ask, are you", the merged text P formed by concatenating them directly reads "hello I am ABC may I ask are you".

S40. Mark the sequence number of each character in the merged text, and record the starting sequence number and the ending sequence number of each original recognition text in the merged text in interval form to obtain the sequence number interval of each original recognition text. Sequence numbers start from zero. Taking the original recognition texts P1, P2 and P3 of step S30 as an example (the sequence numbers count characters of the original Chinese text, in which P1, P2 and P3 contain 2, 5 and 4 characters respectively): P1 ("hello") has starting sequence number 0 and ending sequence number 1 in the merged text P; P2 ("I am ABC") has starting sequence number 2 and ending sequence number 6; and P3 ("may I ask, are you") has starting sequence number 7 and ending sequence number 10. Recorded in interval form, the sequence number interval of P1 is [0, 1], that of P2 is [2, 6], and that of P3 is [7, 10].

S50. Associate the sequence number intervals of all original recognition texts to form a target dictionary. Taking the sequence number intervals of P1, P2 and P3 from step S40 as an example, the target dictionary D is {'P1': [0, 1], 'P2': [2, 6], 'P3': [7, 10]}.
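Steps S30 to S50 can be sketched together in Python: concatenate the texts, compute each zero-based inclusive interval as it is appended, and collect the intervals in a dictionary keyed like the example. The helper name `build_target_dictionary` is hypothetical, and the Chinese strings are illustrative texts chosen only to match the character counts implied by the intervals above (2, 5 and 4 characters); they are not taken from the source.

```python
from typing import Dict, List, Tuple

def build_target_dictionary(texts: List[str]) -> Tuple[str, Dict[str, List[int]]]:
    """Merge original recognition texts and record each one's interval.

    Sequence numbers start at zero and intervals are inclusive [start, end].
    Texts are assumed non-empty (an empty text would give end < start).
    """
    merged = "".join(texts)  # S30: direct concatenation, no separator
    target: Dict[str, List[int]] = {}
    start = 0
    for i, text in enumerate(texts, start=1):
        end = start + len(text) - 1     # S40: inclusive ending sequence number
        target[f"P{i}"] = [start, end]  # S50: associate all intervals
        start = end + 1
    return merged, target

merged, target = build_target_dictionary(["你好", "我是ABC", "请问你是"])
print(merged)  # 你好我是ABC请问你是
print(target)  # {'P1': [0, 1], 'P2': [2, 6], 'P3': [7, 10]}
```

Concatenating without a separator keeps the sequence numbers of consecutive texts contiguous, which is what lets a single position in the merged text be mapped back to exactly one original recognition text.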

S60, randomly intercepting part of text or all text from the combined text for target text recognition, respectively determining the interval positions of the starting sequence number and the ending sequence number of the intercepted text in the target dictionary after the target text recognition is completed, and determining the number of text paragraphs of the intercepted text crossing the original recognized text according to the interval positions of the intercepted text. As an example, for example, a text segment may be randomly intercepted from the merged text P to obtain a corresponding intercepted text T; and carrying out target text recognition on the intercepted text T, and simultaneously recording a starting sequence number N1 and an ending sequence number N2 of the intercepted text T in the combined text P. If the intercepted text T at the current moment is: if "good me is ABC asking you yes", the starting sequence number N1 of the intercepted text T at the current time (i.e., "good me is ABC asking you yes") in the merged text P is 1, the ending sequence number N2 is 9, and the number of paragraphs spanned by the intercepted text T at the current time is 3, i.e., the original texts P1, P2 and P3 are spanned. In this embodiment, the target text includes, but is not limited to, an illegal text that does not meet the target scene requirements; the target scenes include, but are not limited to, life insurance, car insurance, and the like. When the life insurance project consultation is carried out, the corresponding target scene is the life insurance scene; when the vehicle insurance project consultation is carried out, the corresponding target scene is the vehicle insurance scene. 
In the life risk scenario, the violation text may be a text violating laws and regulations, a text violating ethical regulations, a text with an abuse property, and the like, and the violation text in the method may be determined manually in advance according to the target scenario, which is not described herein again.
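The interval lookup in S60 can be illustrated with a short Python sketch, assuming the target dictionary maps each key to an inclusive [start, end] list as in step S50; `locate` and `spanned_paragraphs` are hypothetical names. Because the intervals are contiguous, every original text between the one containing N1 and the one containing N2 is spanned, so N is the length of the returned list.

```python
from typing import Dict, List


def locate(target_dict: Dict[str, List[int]], pos: int) -> str:
    """Return the key of the original recognition text whose interval contains pos."""
    for key, (start, end) in target_dict.items():
        if start <= pos <= end:
            return key
    raise ValueError(f"sequence number {pos} is outside every interval")


def spanned_paragraphs(target_dict: Dict[str, List[int]], n1: int, n2: int) -> List[str]:
    """Keys of all original texts spanned by an intercepted text [n1, n2]."""
    if n1 > n2:
        raise ValueError("starting sequence number must not exceed the ending one")
    # Order keys by interval start so that "between N1 and N2" is well defined.
    keys = sorted(target_dict, key=lambda k: target_dict[k][0])
    first = keys.index(locate(target_dict, n1))
    last = keys.index(locate(target_dict, n2))
    return keys[first:last + 1]


D = {'P1': [0, 1], 'P2': [2, 6], 'P3': [7, 10]}
print(spanned_paragraphs(D, 1, 9))  # ['P1', 'P2', 'P3'], so N = 3
```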

S70: compare the number of text paragraphs with preset thresholds, and segment the intercepted text into paragraphs according to the comparison result to obtain the corresponding paragraph segmentation result.

By recording the paragraph information of the original recognition texts in a target dictionary, the method can accurately restore the paragraph information of the intercepted text from its starting and ending sequence numbers. While doing so, it records how many original recognition texts the intercepted text spans and automatically applies a different processing mode depending on that number to obtain the final paragraph segmentation result. Because each case is handled by the segmentation method described here, the result does not deviate from the original recognition texts, and even when the intercepted text contains repeated paragraphs or text, the method segments only once, without re-checking the result against the original recognition texts afterwards. Compared with prior-art methods that repeatedly check against the original paragraphs in a loop, the method is therefore considerably more efficient and accurate, and unexpected situations, such as the intercepted text repeating several segments of the original recognition texts, do not interfere with segmentation.

According to the above description, in an exemplary embodiment, the process of performing paragraph segmentation on the intercepted text according to the comparison result and obtaining the corresponding paragraph segmentation result includes: and if the number of the text paragraphs is larger than or equal to a first threshold value, segmenting the intercepted text according to the complete original identification text contained in the intercepted text, and taking the segmented text paragraphs as corresponding paragraph segmentation results. And if the number of the text paragraphs is equal to a second threshold value, adding a marker to the intercepted text, segmenting the intercepted text by using the added marker, and taking the segmented text paragraphs as corresponding paragraph segmentation results. And if the number of the text paragraphs is equal to a third threshold value, not segmenting the intercepted text, and directly taking the intercepted text as a paragraph segmentation result. Wherein the first threshold is greater than the second threshold, which is greater than the third threshold. By way of example, as shown in FIG. 2, the original text P1 (i.e., "hello"), P2 (i.e., "I am ABC"), P3 (i.e., "ask you be"), the merged text P (i.e., "hello i am ABC ask you be"), the text T is intercepted (i.e., "hello am ABC ask you be"), the text T is intercepted at the beginning sequence number N1 and the end sequence number N2 of the merged text P, and the corresponding target dictionary D (i.e., "{ 'P1': 0, 1], 'P2': 2, 6], 'P3': 7, 10) }") described above are taken as examples. 
Since the start sequence number N1 of the clipped text T in the merged text P at this time is 1 and the end sequence number N2 is 9, it can be known that the start sequence number N1 of the clipped text T at this time is 1 in the interval [0, 1] in the target dictionary D, and the end sequence number N2 of the clipped text T at this time is 9 in the interval [7, 10] in the target dictionary D; that is, the beginning sequence number N1 of the intercepted text T at this time is located in the interval of the original recognized text P1 in the target dictionary D, and the ending sequence number N2 of the intercepted text T at this time is located in the interval of the original recognized text P3 in the target dictionary D; the number N of text paragraphs spanned by the clipped text T at this time is 3, that is, the clipped text T at this time spans the original recognized texts P1, P2 and P3. In this embodiment, the first threshold, the second threshold, and the third threshold may be set according to actual situations, and the present application does not limit the values thereof. As an example, the present embodiment may select to set the first threshold to 3, the second threshold to 2, and the third threshold to 1.
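The three-way comparison can be written as a small dispatch function. This is a sketch assuming the example thresholds 3, 2 and 1; the returned strategy labels are hypothetical names for the three processing modes described above.

```python
def choose_strategy(n: int, first: int = 3, second: int = 2, third: int = 1) -> str:
    """Map the number N of spanned text paragraphs to a processing mode."""
    if not first > second > third:
        raise ValueError("thresholds must satisfy first > second > third")
    if n >= first:
        return "split_on_complete_texts"  # cut at complete original texts in T
    if n == second:
        return "split_with_markers"       # insert a marker and test substrings
    if n == third:
        return "no_split"                 # T lies inside one original text
    raise ValueError(f"no rule covers {n} spanned paragraphs")


for n in (1, 2, 3, 5):
    print(n, choose_strategy(n))
```

With the example thresholds every N of at least 1 is covered. With other settings (say a first threshold of 5), values strictly between the second and first thresholds fall through to the error, a case the text leaves open.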

In an exemplary embodiment, as shown in FIG. 3, performing target text recognition on the intercepted text in step S60 comprises:

S610: acquire a reference text from the same scene as the merged text. For example, a life insurance consultation corresponds to the life insurance scene, and a car insurance consultation to the car insurance scene. The reference text consists of the reference words and sentences against which the intercepted text is compared during target text recognition; in a life insurance scene, for instance, the reference text may include "jumping off a building", "suicide", "robbery" and the like. In a health consultation scene, the target text may also be keywords from electronic medical records, medical literature, medical device names or medical institution names, for example keywords from electronic medical records such as respiratory tract, respiratory tract infection, respiratory membrane, arterial partial pressure of oxygen, bronchitis, asthma, respiratory failure and pneumonia.

S620: perform part-of-speech tagging on the reference text to obtain its nouns and pronouns, and use dependency parsing to extract, from those nouns and pronouns, the entities of the reference text as reference entities. Specifically, the open-source libraries Pyltp and HanLP are used to tag parts of speech and parse dependencies in the reference text, and the nouns and pronouns are extracted from the results. Pyltp (the Python binding of LTP, released by Harbin Institute of Technology) and HanLP (released by hankcs) are basic natural language processing libraries, used here for part-of-speech tagging and entity extraction. The implementation steps are: 1. Split the reference text into fragments: break sentences at sentence-ending punctuation and set the fragment length. 2. Call the part-of-speech tagging (POS) and dependency parsing (DP) modules of both Pyltp and HanLP on the reference text, and return the tagging and extraction results as JSON. For part-of-speech tagging, only words whose tag contains n are kept, i.e., the various kinds of nouns (n for general nouns, ni for organization names, nl for location nouns, ns for geographical names, nt for temporal nouns), together with pronouns (tagged p). For example, "I consult life insurance" is tagged (I, p), (life insurance, n). Dependency parsing uses the subject-predicate-object (SBV) relations to label the words of each sentence in the reference text; for example, "I consult life insurance" is labeled (I, Subject), (consult, Predicate), (life insurance, Object). The extracted nouns are matched to the subject and object components, and nouns that are neither subject nor object in the sentence are deleted.
Both Pyltp and HanLP are used so that entities missed by one library can be recovered by the other; combining the two results improves the accuracy of entity recognition and extraction. Dependency parsing was first proposed by the French linguist L. Tesnière; it analyzes a sentence into a dependency tree that describes the dependency relations between words, i.e., the syntactic collocations between words, which are tied to semantics. In the invention, entities can therefore be extracted from text by dependency parsing.
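The filtering rule in S620 (keep noun and pronoun tokens that act as subject or object) can be sketched independently of any particular parser. The sketch assumes the tagger and parser output has already been merged into hypothetical (word, pos_tag, role) triples; it does not call Pyltp or HanLP. Pronouns are kept through the `pronoun_tags` parameter, which defaults to the `p` label used in the text; in LTP's own tag set pronouns are tagged `r` (and `p` marks prepositions), so the set may need adjusting for a real pipeline.

```python
from typing import Iterable, List, Tuple

Token = Tuple[str, str, str]  # (word, part-of-speech tag, syntactic role)


def extract_entities(
    tokens: Iterable[Token],
    pronoun_tags: frozenset = frozenset({"p"}),
    keep_roles: frozenset = frozenset({"Subject", "Object"}),
) -> List[str]:
    """Keep nouns (any tag starting with 'n') and pronouns that are subjects or objects."""
    entities = []
    for word, tag, role in tokens:
        is_noun_or_pronoun = tag.startswith("n") or tag in pronoun_tags
        if is_noun_or_pronoun and role in keep_roles:
            entities.append(word)
    return entities


# "I consult life insurance", tagged by hand for illustration.
tokens = [("I", "p", "Subject"), ("consult", "v", "Predicate"),
          ("life insurance", "n", "Object")]
print(extract_entities(tokens))  # ['I', 'life insurance']
```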

S630: perform part-of-speech tagging on the intercepted text to obtain its nouns and pronouns, and use dependency parsing to extract, from those nouns and pronouns, the entities of the intercepted text as entities to be compared. Part-of-speech tagging and entity extraction for the intercepted text follow the same procedure as for the reference text and are not repeated here.

S640: calculate the similarity between the reference entity and the entity to be compared, and compare it with a preset similarity threshold. If the similarity is greater than or equal to the threshold, the intercepted text is determined to contain target text; if it is less than the threshold, the intercepted text is determined not to contain target text. Specifically, the similarity between the entity to be compared and the reference entity is calculated as:

SimSha(S1,S2)=Count(S1∩S2)/(Count(S1)+Count(S2));

where S1 is the entity to be compared, S2 is the reference entity, S1 ∩ S2 denotes the words shared by S1 and S2, Count(S1) is the number of words in S1, Count(S2) is the number of words in S2, and SimSha(S1, S2) is the similarity between S1 and S2. If SimSha(S1, S2) is greater than or equal to the preset similarity threshold, the intercepted text is determined to contain target text; otherwise it is determined not to contain target text. The preset similarity threshold may be set according to the recognition accuracy required in the target scene, for example 75%, 80% or another value. In some embodiments, the target text is the violation text.
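The SimSha formula and the threshold test can be sketched as follows, assuming entities are lists of words and Count(S1 ∩ S2) counts distinct shared words (the text does not say how repeated words are handled). One arithmetic property of the formula as written is worth noting: the numerator is at most min(Count(S1), Count(S2)), so SimSha never exceeds 0.5, reached when S1 and S2 are identical sets, and a threshold such as 75% can never be met. The Dice coefficient, 2 × Count(S1 ∩ S2) / (Count(S1) + Count(S2)), ranges from 0 to 1 and is included only for comparison; it is a swapped-in alternative, not part of the described method.

```python
from typing import Sequence


def sim_sha(s1: Sequence[str], s2: Sequence[str]) -> float:
    """SimSha(S1, S2) = Count(S1 ∩ S2) / (Count(S1) + Count(S2))."""
    total = len(s1) + len(s2)
    if total == 0:
        return 0.0
    shared = len(set(s1) & set(s2))  # distinct words common to both entities
    return shared / total


def dice(s1: Sequence[str], s2: Sequence[str]) -> float:
    """Dice coefficient, shown only for comparison (ranges over [0, 1])."""
    return 2 * sim_sha(s1, s2)


def contains_target(s1: Sequence[str], s2: Sequence[str], threshold: float) -> bool:
    """Target text is deemed present when SimSha reaches the preset threshold."""
    return sim_sha(s1, s2) >= threshold


same = ["bronchitis", "asthma"]
print(sim_sha(same, same))                # 0.5
print(contains_target(same, same, 0.75))  # False
```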

In an exemplary embodiment, original text P1 (i.e., "hello"), P2 (i.e., "I am ABC"), P3 (i.e., "ask you yes"), merged text P (i.e., "hello i am ABC ask you yes"), text T is intercepted at the beginning and ending sequence numbers N1, N2 of the merged text P, target dictionary D (i.e., "{ 'P1': 0, 1 ',' P2 ': 2, 6', 'P3': 7, 10) }"), and the number of text segments N spanned are taken as examples. The clipped text T at this time spans 3 text paragraphs N, i.e., the clipped text T at this time spans the original recognized texts P1, P2 and P3 and contains the complete original recognized text P2. When the first threshold is 3, and the number N of text paragraphs spanned by the intercepted text T at this time is equal to the first threshold, the process of segmenting the intercepted text T into a plurality of text paragraphs according to the complete original recognized text included in the intercepted text T at this time and obtaining a corresponding paragraph segmentation result R may be: the intercepted text T at this time is re-divided into 3 text paragraphs, i.e., "good", "i.e.," ask you ", according to the original recognized text P2; taking the cut 3 text paragraphs as corresponding paragraph segmentation results, the paragraph segmentation result R at this time is: "good", "I am ABC" and "ask you". As another example, if at least 5 original recognition texts are formed after the conversion of the dialog speech, and the intercepted text T in the state spans 5 of the original recognition texts, the intercepted text T in the state at least includes 3 complete original recognition texts, at this time, the intercepted text T in the state may be re-segmented into 5 text paragraphs by using the 3 complete original recognition texts, and then the corresponding paragraph segmentation result R is obtained.

In an exemplary embodiment, segmenting the intercepted text when the number of text paragraphs equals the second threshold comprises: connecting the original recognition texts spanned by the intercepted text with a preset marker to obtain a connected text; inserting the marker after a character of the intercepted text, one position at a time, and recording each resulting text as a marked text, so that each marked text contains one marker; once markers have been inserted up to the penultimate character of the intercepted text, determining whether any marked text is a substring of the connected text; and if a marked text is a substring of the connected text, segmenting the intercepted text at the marker and taking the resulting text paragraphs as the paragraph segmentation result.

By way of example, take the original text P1 (i.e., "hello"), P2 (i.e., "I am ABC"), P3 (i.e., "ask you be"), merged text P (i.e., "hello i am ABC ask you be"), target dictionary D (i.e., { 'P1': 0, 1 ',' P2 ': 2, 6', 'P3': 7, 10) }). If the intercepted text T at this time is: if "good me is AB", it can be known from the above embodiment that the starting sequence number N1 of the intercepted text T (i.e., "good me is AB") in the merged text P is 1, and the ending sequence number N2 is 5; therefore, the start sequence number N1 of the clipped text T (i.e., "good i is AB") at this time is 1 in the interval [0, 1], and the end sequence number N2 of the clipped text T at this time is 5 in the interval [2, 6 ]; that is, the beginning sequence number N1 of the intercepted text T (i.e., "good me is AB") at this time is located in the interval of the original recognized text P1 in the target dictionary D, and the ending sequence number N2 is located in the interval of the original recognized text P2 in the target dictionary D; the number N of text paragraphs that the truncated text T corresponding to this time spans is 2, that is, the truncated text T at this time spans the original recognized text P1 and the original recognized text P2. If the second threshold in this embodiment is 2, the original recognized text P1 and the original recognized text P2 are connected by a preset marker (e.g., line break "\ n"), so as to obtain a connected text NP: "hello \ n I is ABC"; starting with the first text character of the truncated text T, a corresponding marker (e.g., line break "\ n") is added after each text character in the truncated text T at that time, respectively, until the penultimate text character to the truncated text T. 
In the present embodiment, the text after the corresponding characters in the intercepted text T are added with the markers is recorded as the marker text NT, so that in the present embodiment, after the addition of the markers is completed for the characters in the intercepted text T, a plurality of marker texts NT can be obtained. When adding a marker to the intercepted text T, only one marker is added at a time, that is, only one marker is included in each marked text NT. For example, with the intercepted text T at this time: "good me is AB" as an example, when a marker is added to the good me, the corresponding marker text NT is obtained as: "good \ n is AB", and when a marker is added to the good \ n, the corresponding marker text NT is obtained as follows: "good me \ n is AB", when a marker is added to the last but one character of the intercepted text at this time, the corresponding marked text NT is obtained as follows: "good I is A \ nB". And sequentially judging whether each mark text NT after the addition of the marker is a substring of the connecting text NP, namely judging whether the mark text NT has the same text in the connecting text NP. And if a certain mark text NT is a substring of the connecting text, segmenting the mark text NT through the corresponding mark symbol to obtain a corresponding paragraph segmentation result R. In this embodiment, since the intercepted text T is a substring of the merged text P, and the merged text P is merged from the original recognition texts P1 and P2, there is inevitably a substring of the concatenated text NP of the tagged text NT. That is, in the present embodiment, there is a mark text NT: "good \ n i AB" is the connecting text NP: "hello \ n i am a substring of ABC". Therefore, after segmenting the tagged text NT according to the tag "\ n", the paragraph segmentation result R at this time can be obtained as follows: "good", "I am AB".

In an exemplary embodiment, if the number of text paragraphs is equal to a third threshold, the truncated text is not cut, and the truncated text is directly used as a paragraph cutting result. By way of example, take the original text P1 (i.e., "hello"), P2 (i.e., "I am ABC"), P3 (i.e., "ask you be"), merged text P (i.e., "hello i am ABC ask you be"), target dictionary D (i.e., { 'P1': 0, 1 ',' P2 ': 2, 6', 'P3': 7, 10) }). If the intercepted text T at this time is: "ABC" is obtained from the above embodiment, where the starting sequence number N1 of the truncated text T (i.e., "ABC") is 4, and the ending sequence number N2 is 6. Therefore, the start sequence number N1-4 and the end sequence number N2-6 of the clipped text T (i.e., "ABC") at this time are both located in the interval [2, 6 ]; that is, the beginning sequence number N1 of the clipped text T (i.e., "ABC") at this time is located in the interval of the original recognized text P2 in the target dictionary D, the ending sequence number N2 is also located in the interval of the original recognized text P2 in the target dictionary D, the number N of text paragraphs spanned by the clipped text T at this time is 1, that is, the clipped text T at this time is located only in the original recognized text P2. If the third threshold is 1, the intercepted text T at this time does not need to be segmented again, and the corresponding paragraph segmentation result R is the text corresponding to the intercepted text T at this time, that is, the paragraph segmentation result R at this time is: "ABC".

In summary, the present invention provides a text processing method that records the paragraph information of the original recognition texts in a target dictionary and accurately restores the paragraph information of an intercepted text from its starting and ending sequence numbers. While restoring this information, it records how many paragraphs the intercepted text spans and automatically applies a different processing mode depending on that number to obtain the final paragraph segmentation result. Because each case is handled by the segmentation method described here, the result does not deviate from the original recognition texts, and even when the intercepted text contains repeated paragraphs or text, the method segments only once, without re-checking the result against the original recognition texts afterwards. Compared with prior-art methods that repeatedly check against the original paragraphs in a loop, the method is considerably more efficient and accurate, and unexpected situations, such as the intercepted text repeating several segments of the original recognition texts, do not interfere with segmentation. The overall scheme is also simple and clear and involves fewer steps than the prior art, so it achieves more technical functions and effects with fewer steps.

As shown in FIG. 4, the present invention further provides a text processing system, comprising:

An audio acquisition module M10, configured to acquire audio data formed by the first target object and the second target object. For example, when a customer consults about a life insurance product, the dialogue speech between the customer and a human or intelligent customer service agent may be acquired; the first target object may be the customer, and the second target object the human or intelligent customer service agent.

An audio recognition module M20, configured to recognize the audio data and generate a plurality of original recognition texts. For example, obtaining the original recognition texts may comprise: extracting features from the dialogue audio; decoding the extracted features with a pre-trained acoustic model and language model to obtain the recognition text; and dividing the recognition text into a plurality of original recognition texts according to the pauses of the first target object and/or the second target object during the dialogue. Specifically, Mel-frequency cepstral coefficient (MFCC) or perceptual linear prediction (PLP) features may be extracted from the dialogue speech and decoded with a Gaussian mixture model-hidden Markov model (GMM-HMM) as the pre-trained acoustic model to obtain the recognition text of the dialogue; the recognition text is then divided into original recognition texts according to the pauses of the customer and/or the human or intelligent customer service agent. In this embodiment, before speech recognition, noisy speech in the dialogue may be detected by voice activity detection (VAD) and then denoised with a least mean squares (LMS) adaptive filter, so that the recognition text is not affected by the noise.
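Dividing the recognition text into original recognition texts by pause length can be sketched as below. The sketch assumes the recognizer outputs word-level timestamps as hypothetical (token, start_seconds, end_seconds) tuples and that a silence longer than `max_pause` seconds starts a new original recognition text; the text specifies neither the timestamp format nor the pause threshold. Tokens are joined without spaces, which suits Chinese output.

```python
from typing import List, Optional, Sequence, Tuple

Word = Tuple[str, float, float]  # (token, start time in s, end time in s)


def split_by_pause(words: Sequence[Word], max_pause: float) -> List[str]:
    """Start a new original recognition text whenever the silence exceeds max_pause."""
    segments: List[str] = []
    current: List[str] = []
    prev_end: Optional[float] = None
    for token, start, end in words:
        # prev_end is always set once current is non-empty.
        if current and start - prev_end > max_pause:
            segments.append("".join(current))
            current = []
        current.append(token)
        prev_end = end
    if current:
        segments.append("".join(current))
    return segments


words = [("你", 0.0, 0.2), ("好", 0.2, 0.4),
         ("我", 1.2, 1.3), ("是", 1.3, 1.4), ("ABC", 1.4, 1.9),
         ("请", 2.8, 2.9), ("问", 2.9, 3.0), ("你", 3.0, 3.1), ("是", 3.1, 3.2)]
print(len(split_by_pause(words, 0.5)))  # 3 original recognition texts
```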

And the text merging module M30 is configured to merge the multiple original recognized texts to form a merged text. As an example, the manner of merging the multiple original recognized texts in the present system may be direct combination, that is, combining the multiple original recognized texts directly to form one merged text. For example, after a certain dialogue speech is recognized, a plurality of original recognition texts are obtained: p1, P2, P3 …, PN; the plurality of original recognition texts P1, P2, P3 …, PN are merged to obtain a merged text P. Specifically, if there are 3 original recognition texts P1: "hello", P2: "I am ABC" and P3: "ask you for yes", the merged text P formed by directly combining the 3 original recognized texts is: "hello i me ABC asks you yes".

The text marking module M40 is used for marking the serial number of each text character in the combined text, and recording the starting serial number and the ending serial number of each original identification text in the combined text in an interval form to obtain the serial number interval of each original identification text; when the serial number of each text character in the combined text is marked, the serial number is recorded by starting with zero. As an example, taking the original recognized texts P1, P2 and P3 in step S30 as an example, the original recognized text P1 (i.e., "hello") has a start sequence number of 0 and an end sequence number of 1 in the merged text P; the original recognition text P2 (i.e., "i am ABC") has a start sequence number of 2 and an end sequence number of 6 in the merged text P; the original identification text P3 (i.e., "ask you yes") has a start sequence number of 7 and an end sequence number of 10 in the merged text P; recording the start sequence numbers and the end sequence numbers of the original recognition texts P1, P2 and P3 in the form of intervals, the sequence number interval of the original recognition text P1 is [0, 1], the sequence number interval of the original recognition text P2 is [2, 6], and the sequence number interval of the original recognition text P2 is [7, 10 ].

A sequence number interval association module M50, configured to associate the sequence number intervals of all original recognition texts to form a target dictionary. Taking the sequence number intervals of the original recognition texts P1, P2 and P3 from the text marking module M40 as an example, the target dictionary D is {'P1': [0, 1], 'P2': [2, 6], 'P3': [7, 10]}.

A text recognition module M60, configured to randomly intercept part or all of the merged text for target text recognition. For example, a segment may be randomly intercepted from the merged text P to obtain an intercepted text T, on which target text recognition is then performed. In this embodiment, the target text includes, but is not limited to, violation text that does not meet the requirements of the target scene; target scenes include, but are not limited to, life insurance and car insurance. A life insurance consultation corresponds to the life insurance scene, and a car insurance consultation to the car insurance scene. In the life insurance scene, violation text may be text that breaks laws and regulations, text that violates ethical norms, abusive text and the like; the violation text used by the system can be defined manually in advance for each target scene, which is not described further here.

And the text paragraph module M70 is configured to, after the target text recognition is completed, determine the interval positions of the start sequence number and the end sequence number of the intercepted text in the target dictionary, and determine the number of text paragraphs of the intercepted text crossing the original recognized text according to the interval positions of the intercepted text. As an example, the intercepted text T at the current time is: if "good me is ABC asking you yes", the starting sequence number N1 of the intercepted text T at the current time (i.e., "good me is ABC asking you yes") in the merged text P is 1, the ending sequence number N2 is 9, and the number of paragraphs spanned by the intercepted text T at the current time is 3, i.e., the original texts P1, P2 and P3 are spanned.

A paragraph segmentation module M80, configured to compare the number of text paragraphs with preset thresholds and segment the intercepted text into paragraphs according to the comparison result to obtain the corresponding paragraph segmentation result.

By recording the paragraph information of the original recognition texts in a target dictionary, the system can accurately restore the paragraph information of the intercepted text from its starting and ending sequence numbers. While doing so, it records how many original recognition texts the intercepted text spans and automatically applies a different processing mode depending on that number to obtain the final paragraph segmentation result. Because each case is handled by the segmentation method described here, the result does not deviate from the original recognition texts, and even when the intercepted text contains repeated paragraphs or text, the system segments only once, without re-checking the result against the original recognition texts afterwards. Compared with prior-art systems that repeatedly check against the original paragraphs in a loop, the system is therefore considerably more efficient and accurate, and unexpected situations, such as the intercepted text repeating several segments of the original recognition texts, do not interfere with segmentation.

According to the above description, in an exemplary embodiment, as shown in fig. 5, the paragraph splitting module M80 includes a first splitting unit D100, a second splitting unit D200, and a third splitting unit D300. The first segmentation unit D100 is configured to segment the truncated text according to a complete original identification text included in the truncated text when the number of text paragraphs is greater than or equal to a first threshold, and use a plurality of segmented text paragraphs as corresponding paragraph segmentation results. The second segmentation unit D200 is configured to add a marker to the intercepted text when the number of text paragraphs is equal to a second threshold, segment the intercepted text by using the added marker, and use a plurality of segmented text paragraphs as corresponding paragraph segmentation results. The third segmentation unit D300 is configured to directly use the truncated text as a paragraph segmentation result when the number of text paragraphs is equal to a third threshold. Wherein the first threshold is greater than the second threshold, which is greater than the third threshold. By way of example, as shown in FIG. 2, the original text P1 (i.e., "hello"), P2 (i.e., "I am ABC"), P3 (i.e., "ask you be"), the merged text P (i.e., "hello i am ABC ask you be"), the text T is intercepted (i.e., "hello am ABC ask you be"), the text T is intercepted at the beginning sequence number N1 and the end sequence number N2 of the merged text P, and the corresponding target dictionary D (i.e., "{ 'P1': 0, 1], 'P2': 2, 6], 'P3': 7, 10) }") described above are taken as examples. 
Since the start sequence number N1 of the clipped text T in the merged text P at this time is 1 and the end sequence number N2 is 9, it can be known that the start sequence number N1 of the clipped text T at this time is 1 in the interval [0, 1] in the target dictionary D, and the end sequence number N2 of the clipped text T at this time is 9 in the interval [7, 10] in the target dictionary D; that is, the beginning sequence number N1 of the intercepted text T at this time is located in the interval of the original recognized text P1 in the target dictionary D, and the ending sequence number N2 of the intercepted text T at this time is located in the interval of the original recognized text P3 in the target dictionary D; the number N of text paragraphs spanned by the clipped text T at this time is 3, that is, the clipped text T at this time spans the original recognized texts P1, P2 and P3. In this embodiment, the first threshold, the second threshold, and the third threshold may be set according to actual situations, and the present application does not limit the values thereof. As an example, the present embodiment may select to set the first threshold to 3, the second threshold to 2, and the third threshold to 1.

In an exemplary embodiment, the process of the text recognition module performing target text recognition on the intercepted text comprises the following steps:

Acquire a reference text from the same scenario as the merged text. For example, a life insurance consultation corresponds to the life insurance scenario, and a vehicle insurance consultation corresponds to the vehicle insurance scenario. The reference text consists of the reference words and sentences against which the intercepted text is compared during target text recognition; in a life insurance scenario, for instance, the reference text may include "jumping from a building", "suicide", "robbery" and the like. In a health consultation scenario, the target text may also be keywords from electronic medical records, medical literature, medical device names and medical institution names, for example electronic medical record keywords such as respiratory tract, respiratory tract infection, respiratory membrane, arterial partial pressure of oxygen, bronchitis, asthma, respiratory failure and pneumonia.

Extract entities from the nouns and pronouns of the reference text with a dependency parsing system, as reference entities. Specifically, the nouns and pronouns of the reference text are obtained through part-of-speech tagging and dependency parsing with the open-source libraries Pyltp and HanLP, basic natural language processing libraries released by Harbin Institute of Technology and by hankcs respectively, which are used here for part-of-speech tagging and entity extraction. The steps are as follows: 1. Fragment the reference text: break it into sentences at sentence-ending punctuation and set a fragment length. 2. Call the part-of-speech tagging (POS) and dependency parsing (DP) modules of both Pyltp and HanLP on the reference text, and return the tagging and extraction results in JSON form. For part-of-speech tagging, only words whose tag contains n, i.e., the various kinds of nouns (n: general noun, ni: organization name, nl: place noun, ns: geographic name, nt: time noun), and pronouns (tag p) are retained. For example, "I consult about life insurance" is labeled (I, p), (life insurance, n). For dependency parsing, the subject-predicate-object relations (SBV) are used to label the corresponding words in each sentence of the reference text; for example, "I consult about life insurance" is labeled (I, Subject), (consult, Predicate), (life insurance, Object). The extracted nouns are then matched to the subject and object components, and nouns that are neither the subject nor the object of the sentence are deleted.
Both Pyltp and HanLP are used so that entities missed by one library can still be recognized by the other; combining the two results improves the accuracy of entity recognition and extraction. Dependency parsing was first proposed by the French linguist L. Tesnière. It analyzes a sentence into a dependency tree that describes the dependency relations between words, i.e., the syntactic collocations between words, which are closely tied to semantics. In the invention, entities in the training text can be extracted by dependency parsing.
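Because Pyltp and HanLP each have their own APIs and output formats, the sketch below deliberately does not call either library. It assumes, hypothetically, that each library's output has already been converted into (word, POS tag, syntactic role) triples like the "I consult about life insurance" example, and it shows only the filtering and merging logic described above. Treating noun tags as those beginning with "n" and using the pronoun tag "p" follow the text; tag names differ between tagsets, so they should be checked against the library actually used.

```python
from typing import Iterable, List, Tuple

Token = Tuple[str, str, str]  # (word, part-of-speech tag, syntactic role)

PRONOUN_TAGS = {"p"}  # pronoun tag as written above; check the library's tagset
KEEP_ROLES = {"Subject", "Object"}  # role names as in the example above


def is_noun_or_pronoun(pos: str) -> bool:
    """Noun tags are taken to be those starting with "n" (n, ni, nl, ns, nt)."""
    return pos.startswith("n") or pos in PRONOUN_TAGS


def extract_entities(tokens: Iterable[Token]) -> List[str]:
    """Keep nouns/pronouns that act as the subject or object of the sentence."""
    return [word for word, pos, role in tokens
            if is_noun_or_pronoun(pos) and role in KEEP_ROLES]


def merge_entities(*results: List[str]) -> List[str]:
    """Union of the entity lists produced by several libraries, order kept."""
    merged: List[str] = []
    for entities in results:
        for entity in entities:
            if entity not in merged:
                merged.append(entity)
    return merged


# Hypothetical, already-converted outputs for "I consult about life insurance".
ltp_like = [("I", "p", "Subject"), ("consult", "v", "Predicate"),
            ("life insurance", "n", "Object")]
hanlp_like = [("life insurance", "n", "Object")]
print(merge_entities(extract_entities(ltp_like), extract_entities(hanlp_like)))
# ['I', 'life insurance']
```

The same two functions would apply unchanged to the intercepted text when extracting the entities to be compared.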

Extract entities from the nouns and pronouns of the intercepted text with the dependency parsing system, as entities to be compared. Part-of-speech tagging of the intercepted text and extraction of the entities to be compared follow the same procedure as for the reference text and are not repeated here.

Calculate the similarity between the reference entity and the entity to be compared, and compare it with a preset similarity threshold. If the calculated similarity is greater than or equal to the preset similarity threshold, the target text is determined to exist in the intercepted text; if it is smaller, the target text is determined not to exist in the intercepted text. Specifically, the similarity between the entity to be compared and the reference entity is calculated as follows:

SimSha(S1,S2)=Count(S1∩S2)/(Count(S1)+Count(S2));

where S1 is the entity to be compared, S2 is the reference entity, S1 ∩ S2 denotes the words shared by S1 and S2, Count(S1) is the number of words in the entity to be compared, Count(S2) is the number of words in the reference entity, and SimSha(S1, S2) is the similarity between S1 and S2. If SimSha(S1, S2) is greater than or equal to the preset similarity threshold, the target text is determined to exist in the intercepted text; if SimSha(S1, S2) is smaller than the threshold, the intercepted text is considered to contain no target text. As an example, the preset similarity threshold in this embodiment may be set according to the recognition accuracy required in the target scenario, for example 75%, 80% or another value. In some embodiments, the target text in the present system is non-compliant (violating) text.
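The similarity test can be sketched as below. Two points are assumptions of this sketch rather than statements of the text: Count is taken as the number of distinct words in an entity, and the target text is deemed present when any pair of an entity to be compared and a reference entity reaches the threshold. Note also that, as the formula is written, SimSha is at most 0.5 (reached when both word sets are identical), because shared words are counted once in the numerator while both sets are counted in the denominator, so the threshold has to be chosen on that scale.

```python
from typing import Iterable, Sequence


def sim_sha(s1: Iterable[str], s2: Iterable[str]) -> float:
    """SimSha(S1, S2) = Count(S1 ∩ S2) / (Count(S1) + Count(S2))."""
    a, b = set(s1), set(s2)
    total = len(a) + len(b)
    return len(a & b) / total if total else 0.0  # two empty entities score 0


def has_target_text(candidates: Sequence[Sequence[str]],
                    references: Sequence[Sequence[str]],
                    threshold: float) -> bool:
    """True if any entity to be compared is close enough to any reference entity."""
    return any(sim_sha(c, r) >= threshold for c in candidates for r in references)


print(sim_sha(["respiratory", "tract"], ["respiratory", "tract", "infection"]))  # 0.4
```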

In an exemplary embodiment, take the original recognized texts P1 ("hello"), P2 ("I am ABC") and P3 ("may I ask, are you"), the merged text P ("hello I am ABC may I ask, are you"), the intercepted text T ("good I am ABC may I ask you") with start and end sequence numbers N1 and N2 in P, the target dictionary D = {'P1': [0, 1], 'P2': [2, 6], 'P3': [7, 10]}, and the number N of spanned text paragraphs as an example. Here T spans N = 3 text paragraphs, i.e., T spans the original recognized texts P1, P2 and P3 and contains the complete original recognized text P2. With the first threshold set to 3, N equals the first threshold, so the first segmentation unit D100 segments T into several text paragraphs according to the complete original recognized text it contains, obtaining the paragraph segmentation result R as follows: using P2 as the reference, T is re-divided into 3 text paragraphs, "good", "I am ABC" and "may I ask you", and these 3 paragraphs form the paragraph segmentation result R. As another example, if the dialogue speech is converted into at least 5 original recognized texts and the intercepted text T spans 5 of them, T contains at least 3 complete original recognized texts; T can then be re-segmented into 5 text paragraphs using these 3 complete texts to obtain the corresponding paragraph segmentation result R.
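When the sequence numbers N1 and N2 of the intercepted text are known, one way to sketch the first segmentation unit is to intersect [N1, N2] with each interval of the target dictionary: the cut points then fall exactly at the boundaries of the complete original recognized texts that T contains. This interval-based formulation and the function name are illustrative assumptions, not the patent's stated procedure, and the sample merged text is an assumed reconstruction of the Chinese original, chosen so that its intervals match D.

```python
from typing import Dict, List, Tuple


def split_by_dictionary(merged: str,
                        dictionary: Dict[str, Tuple[int, int]],
                        n1: int, n2: int) -> List[str]:
    """Cut merged[n1:n2+1] at the paragraph boundaries recorded in dictionary.

    Intervals are inclusive and assumed to be listed in text order.
    """
    pieces = []
    for start, end in dictionary.values():
        lo, hi = max(start, n1), min(end, n2)
        if lo <= hi:  # this original recognized text overlaps T
            pieces.append(merged[lo:hi + 1])
    return pieces


D = {"P1": (0, 1), "P2": (2, 6), "P3": (7, 10)}
merged = "你好我是ABC请问你是"  # assumed reconstruction of P
print(split_by_dictionary(merged, D, 1, 9))  # ['好', '我是ABC', '请问你']
```

In this reconstruction the three pieces correspond to "good", "I am ABC" and "may I ask you".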

In an exemplary embodiment, if the number of text paragraphs is equal to the second threshold, the second segmentation unit segments the intercepted text as follows: connect the original recognized texts spanned by the intercepted text with a preset marker to obtain a connected text; insert the marker after a character of the intercepted text, one position at a time, and record each resulting text as a marked text, so that each marked text contains a single marker; once the marker has been inserted after the penultimate character of the intercepted text, determine whether any marked text is a substring of the connected text; if so, segment the intercepted text at the marker of that marked text and use the resulting text paragraphs as the paragraph segmentation result.

By way of example, take the original recognized texts P1 ("hello"), P2 ("I am ABC") and P3 ("may I ask, are you"), the merged text P ("hello I am ABC may I ask, are you") and the target dictionary D = {'P1': [0, 1], 'P2': [2, 6], 'P3': [7, 10]}. Suppose the intercepted text T is "good I am AB". As in the above embodiment, its start sequence number in P is N1 = 1 and its end sequence number is N2 = 5, so N1 falls in the interval [0, 1] and N2 falls in the interval [2, 6]. That is, T begins within the original recognized text P1 and ends within the original recognized text P2, so T spans N = 2 text paragraphs, namely P1 and P2. With the second threshold set to 2 in this embodiment, P1 and P2 are connected with a preset marker (e.g., the line break "\n") to obtain the connected text NP: "hello\nI am ABC". Then, starting from the first character of T, the marker (e.g., "\n") is inserted after each character of T in turn, up to and including the penultimate character.
In this embodiment, each text obtained by inserting the marker after a character of T is recorded as a marked text NT, so once insertion is complete a plurality of marked texts NT are obtained. Only one marker is inserted at a time, i.e., each marked text NT contains exactly one marker. Taking T = "good I am AB" as an example, inserting the marker after "good" gives the marked text "good\nI am AB"; inserting it after "I" gives "good I\nam AB"; and inserting it after the penultimate character gives "good I am A\nB". Each marked text NT is then checked in turn for being a substring of the connected text NP, i.e., for whether the same text occurs in NP. If a marked text NT is a substring of NP, it is split at its marker to obtain the paragraph segmentation result R. Because T is a substring of the merged text P, and the portion of P that T covers comes from the original recognized texts P1 and P2, at least one marked text NT is necessarily a substring of NP. Here the marked text "good\nI am AB" is a substring of NP "hello\nI am ABC" (in the original Chinese, the character glossed "good" is the last character of "hello"). Splitting this marked text at "\n" therefore gives the paragraph segmentation result R: "good", "I am AB".
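The marker procedure of the second segmentation unit can be sketched as follows, assuming the marker ("\n" here) never occurs inside the recognized texts themselves. The function name and the fallback of returning T unsplit when no marked text matches are assumptions of this sketch. Markers are inserted one position at a time, from after the first character up to after the penultimate character, and the first marked text found inside the connected text is split at its marker.

```python
from typing import List, Sequence


def split_with_marker(intercepted: str,
                      spanned_texts: Sequence[str],
                      marker: str = "\n") -> List[str]:
    """Second-unit segmentation for an intercepted text spanning two texts."""
    connected = marker.join(spanned_texts)  # connected text NP
    # i = 1 .. len-1 puts the marker after characters 0 .. len-2,
    # i.e. up to and including the penultimate character.
    for i in range(1, len(intercepted)):
        marked = intercepted[:i] + marker + intercepted[i:]  # marked text NT
        if marked in connected:  # NT is a substring of NP
            return marked.split(marker)
    return [intercepted]  # assumed fallback: no split point found


print(split_with_marker("好我是AB", ["你好", "我是ABC"]))  # ['好', '我是AB']
```

The Chinese strings are an assumed reconstruction of "good I am AB", "hello" and "I am ABC"; the English glosses cannot be used directly because the substring test operates on the original characters.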

In an exemplary embodiment, if the number of text paragraphs is equal to the third threshold, the third segmentation unit D300 does not segment the intercepted text and uses it directly as the paragraph segmentation result. By way of example, take the original recognized texts P1 ("hello"), P2 ("I am ABC") and P3 ("may I ask, are you"), the merged text P ("hello I am ABC may I ask, are you") and the target dictionary D = {'P1': [0, 1], 'P2': [2, 6], 'P3': [7, 10]}. Suppose the intercepted text T is "ABC". As in the above embodiment, its start sequence number in P is N1 = 4 and its end sequence number is N2 = 6, so both lie in the interval [2, 6]. That is, T begins and ends within the original recognized text P2, so T spans N = 1 text paragraph and lies entirely within P2. With the third threshold set to 1, T does not need to be segmented again, and the paragraph segmentation result R is T itself: "ABC".

In summary, the present invention provides a text processing system that records the paragraph information of the original recognized texts in a target dictionary and accurately restores the paragraph information of the intercepted text from its start and end sequence numbers. While restoring this information, the system records the number of paragraphs spanned by the intercepted text and automatically applies a different processing mode for each span count to obtain the final paragraph segmentation result. Because the intercepted text is segmented by the methods recorded in the system, the resulting paragraphs do not differ from the original recognized texts, and even when the intercepted text contains repeated paragraphs or text, the system segments only once and does not need to re-check the result against the original recognized texts afterwards. Compared with prior-art systems that judge the original paragraphs in a loop, the system is therefore considerably more efficient while remaining accurate, and segmentation is not disturbed by unexpected cases such as multiple segments repeated between the intercepted text and the original recognized texts. In addition, the overall technical solution is simple and clear and involves fewer steps than the prior art, so the system achieves more technical functions and effects in fewer steps.

An embodiment of the present application further provides a computer device, for example, a text processing device, where the computer device includes: memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the steps of the method as described in fig. 1 when executing the computer program. Fig. 6 is a schematic diagram showing a structure of a text processing apparatus 1000, and referring to fig. 6, the text processing apparatus 1000 includes: a processor 1010, a memory 1020, a power source 1030, a display unit 1040, an input unit 1060.

The processor 1010 is the control center of the text processing apparatus 1000. It connects the components through various interfaces and lines, and performs the functions of the text processing apparatus 1000 by running or executing software programs and/or data stored in the memory 1020, thereby monitoring the apparatus as a whole. In the embodiment of the present application, the processor 1010 executes the method described in Fig. 1 when it calls the computer program stored in the memory 1020. Optionally, the processor 1010 may include one or more processing units; preferably, it may integrate an application processor, which mainly handles the operating system, user interfaces and applications, and a modem processor, which mainly handles wireless communications. In some embodiments, the processor and the memory may be implemented on a single chip; in other embodiments, they may be implemented on separate chips. In some embodiments, the text processing device may be an independent server, or a cloud server that provides basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, a content delivery network (CDN), and big data and artificial intelligence platforms.

The memory 1020 may mainly include a program storage area and a data storage area. The program storage area may store an operating system, various applications and the like; the data storage area may store data created through use of the text processing apparatus 1000 and the like. In addition, the memory 1020 may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid-state storage device.

The text processing device 1000 also includes a power supply 1030 (e.g., a battery) that provides power to the various components, which may be logically coupled to the processor 1010 via a power management system to manage charging, discharging, and power consumption via the power management system.

The display unit 1040 may be used to display information input by a user or information provided to the user, and various menus of the text processing apparatus 1000, and the like. The display unit 1040 may include a display panel 1050. The Display panel 1050 may be configured in the form of a Liquid Crystal Display (LCD), an Organic Light-Emitting Diode (OLED), or the like.

The input unit 1060 may be used to receive information such as numbers or characters input by a user. The input unit 1060 may include a touch panel 1070 and other input devices 1080. The touch panel 1070, also referred to as a touch screen, may collect touch operations by a user (e.g., operations by a user on the touch panel 1070 or near the touch panel 1070 using a finger, a stylus, or any other suitable object or attachment).

Specifically, the touch panel 1070 can detect a touch operation of a user, detect signals generated by the touch operation, convert the signals into touch point coordinates, transmit the touch point coordinates to the processor 1010, and receive and execute a command transmitted from the processor 1010. In addition, the touch panel 1070 may be implemented using various types such as a resistive type, a capacitive type, an infrared ray, and a surface acoustic wave. Other input devices 1080 may include, but are not limited to, one or more of a physical keyboard, function keys (such as volume control keys, power on/off keys, etc.), a trackball, a mouse, a joystick, and the like.

Of course, the touch panel 1070 may cover the display panel 1050, and when the touch panel 1070 detects a touch operation on or near the touch panel 1070, the touch operation is transmitted to the processor 1010 to determine the type of the touch event, and then the processor 1010 provides a corresponding visual output on the display panel 1050 according to the type of the touch event. Although in fig. 6 the touch panel 1070 and the display panel 1050 are implemented as two separate components to implement the input and output functions of the text processing device 1000, in some embodiments the touch panel 1070 and the display panel 1050 may be integrated to implement the input and output functions of the text processing device 1000.

The text processing device 1000 may also include one or more sensors, such as pressure sensors, gravitational acceleration sensors, proximity light sensors, and the like. Of course, the text processing apparatus 1000 may also include other components such as a camera, etc., as desired in a particular application.

Embodiments of the present application also provide a computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the steps of the method as described in fig. 1. The application is operational with numerous general purpose or special purpose computing system environments or configurations. For example: personal computers, server computers, hand-held or portable devices, tablet-type devices, multiprocessor systems, microprocessor-based systems, set top boxes, programmable consumer electronics, network PCs, minicomputers, mainframe computers, distributed computing environments that include any of the above systems or devices, and the like. The application may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. The application may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media including memory storage devices.

It will be appreciated by those skilled in the art that FIG. 6 is merely an example of a text processing device and is not intended to limit the device, which may include more or fewer components than illustrated, or some components may be combined, or different components. For convenience of description, the above parts are separately described as modules (or units) according to functional division. Of course, the functionality of the various modules (or units) may be implemented in the same one or more pieces of software or hardware when implementing the present application.

Those skilled in the art will appreciate that the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) containing computer-usable program code. The present application is described with reference to flowcharts and/or block diagrams of methods, apparatus (systems) and computer program products according to its embodiments. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams. These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

According to the above description, the embodiment of the present application may acquire and process related data based on an artificial intelligence technology. Among them, Artificial Intelligence (AI) is a theory, method, technique and application system that simulates, extends and expands human Intelligence using a digital computer or a machine controlled by a digital computer, senses the environment, acquires knowledge and uses the knowledge to obtain the best result. The artificial intelligence infrastructure generally includes technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing technologies, operation/interaction systems, mechatronics, and the like. The artificial intelligence software technology mainly comprises a computer vision technology, a robot technology, a biological recognition technology, a voice processing technology, a natural language processing technology, machine learning/deep learning and the like.

It should be understood that although the terms first, second, third, etc. may be used to describe preset ranges, etc. in embodiments of the present invention, these preset ranges should not be limited to these terms. These terms are only used to distinguish preset ranges from each other. For example, the first preset range may also be referred to as a second preset range, and similarly, the second preset range may also be referred to as the first preset range, without departing from the scope of the embodiments of the present invention.

The foregoing embodiments are merely illustrative of the principles and utility of the present invention and are not intended to limit the invention. Any person skilled in the art can modify or change the above embodiments without departing from the spirit and scope of the present invention. Accordingly, all equivalent modifications or changes made by those skilled in the art without departing from the spirit and technical concept disclosed by the present invention shall be covered by the claims of the present invention.
