Text correspondence construction method and related devices

Document No.: 272539    Publication date: 2021-11-19

Reading note: this technique, a text correspondence construction method and related devices (一种文本对应关系构建方法及其相关设备), was created on 2021-08-19 by 龚笠 and 杨晶生. Main content: the application discloses a text correspondence construction method and related devices, the method comprising: after a voice recognition text and the standard voice text corresponding to it are obtained, determining at least one segmentation point pair according to at least one first segmentation point of the voice recognition text and at least one second segmentation point of the standard voice text; then segmenting the voice recognition text and the standard voice text at the point pair to be used among the at least one segmentation point pair to obtain at least one text pair; and finally determining, according to the at least one text pair, the text correspondence between the voice recognition text and the standard voice text. In this way, the text correspondence between a voice recognition text and its corresponding standard voice text can be constructed automatically, which effectively avoids the adverse effects of manual alignment and helps improve the processing effect of voice data.

1. A text correspondence construction method is characterized by comprising the following steps:

after a voice recognition text and a standard voice text corresponding to the voice recognition text are obtained, determining at least one segmentation point pair according to at least one first segmentation point of the voice recognition text and at least one second segmentation point of the standard voice text; wherein the pair of split points comprises one of the first split points and one of the second split points;

segmenting the voice recognition text and the standard voice text by using a point pair to be used in the at least one segmentation point pair to obtain at least one text pair;

and determining a text corresponding relation between the voice recognition text and the standard voice text according to the at least one text pair.

2. The method of claim 1, wherein determining at least one segmentation point pair from at least one first segmentation point of the speech recognition text and at least one second segmentation point of the standard speech text comprises:

respectively determining a first text to be segmented and a second text to be segmented according to the voice recognition text and the standard voice text; determining at least one segmentation point pair according to at least one third segmentation point of the first text to be segmented and at least one fourth segmentation point of the second text to be segmented; wherein the pair of split points includes one of the third split points and one of the fourth split points;

the segmenting the speech recognition text and the standard speech text by using the point pairs to be used in the at least one segmentation point pair to obtain at least one text pair includes:

segmenting the first text to be segmented and the second text to be segmented by using the point pairs to be used in the at least one segmentation point pair to obtain at least one text pair;

determining a text correspondence between the speech recognition text and the standard speech text according to the at least one text pair, including:

updating the text corresponding relation between the voice recognition text and the standard voice text according to the at least one text pair;

the method further comprises the following steps:

and updating the first text to be segmented and the second text to be segmented according to the at least one text pair, and returning to the step of determining at least one segmentation point pair according to at least one third segmentation point of the first text to be segmented and at least one fourth segmentation point of the second text to be segmented, until a preset stop condition is reached.
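The iterative procedure of claims 1 and 2 can be sketched as follows. This is only an illustration of the control flow; `score_fn` and `split_points_fn` are hypothetical parameters standing in for the scoring and candidate-enumeration steps described in the later claims, and splitting at raw string indices is a simplification.

```python
def build_correspondence(rec_text, std_text, score_fn, split_points_fn,
                         min_len=2, max_depth=10):
    """Recursively split (rec_text, std_text) at the best-scoring
    split-point pair and collect the aligned text pairs.

    score_fn(rec, std, (i, j)) scores a candidate split-point pair;
    split_points_fn(text) enumerates candidate split positions.
    Both are placeholders for the claimed scoring/enumeration steps.
    """
    pairs = []

    def recurse(rec, std, depth):
        candidates = [(i, j) for i in split_points_fn(rec)
                      for j in split_points_fn(std)]
        # Preset stop condition: depth limit, short text, or no candidates.
        if depth >= max_depth or len(rec) < min_len or not candidates:
            pairs.append((rec, std))
            return
        i, j = max(candidates, key=lambda p: score_fn(rec, std, p))
        recurse(rec[:i], std[:j], depth + 1)   # front-segment texts
        recurse(rec[i:], std[j:], depth + 1)   # back-segment texts

    recurse(rec_text, std_text, 0)
    return pairs
```

With a trivial scorer and midpoint candidates, the recursion bottoms out once the candidate list is empty.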

3. The method according to claim 1 or 2, wherein the determination process of the point pair to be used comprises:

respectively determining the segmentation score of each segmentation point pair according to the to-be-segmented recognized text object, the to-be-segmented standard text object and each segmentation point pair;

and searching the segmentation point pairs meeting preset searching conditions from the at least one segmentation point pair according to the segmentation score of the at least one segmentation point pair to obtain the point pairs to be used.

4. The method of claim 3, wherein the at least one segmentation point pair comprises a point pair to be scored;

the process for determining the segmentation score of the point pair to be scored comprises the following steps:

carrying out segmentation processing on the to-be-segmented recognized text object and the to-be-segmented standard text object by using the to-be-scored point pairs to obtain at least one to-be-compared text pair;

and determining the segmentation score of the point pair to be scored according to the text comparison result of the at least one text pair to be compared.

5. The method according to claim 4, wherein the at least one text pair to be compared comprises a text segment pair to be used, and the text segment pair to be used comprises a recognition text segment to be used and a standard text segment to be used;

the text comparison result of the text segment pair to be used comprises a content comparison score between the recognition text segment to be used and the standard text segment to be used and/or a length comparison score between the recognition text segment to be used and the standard text segment to be used.

6. The method according to claim 5, wherein the determination process of the content comparison score between the to-be-used identification text segment and the to-be-used standard text segment comprises:

and determining a content comparison score between the recognition text segment to be used and the standard text segment to be used according to the content coverage of the recognition text segment to be used to the standard text segment to be used, the content coverage of the to-be-segmented recognized text object to the standard text segment to be used, the content coverage of the standard text segment to be used to the recognition text segment to be used, and the content coverage of the to-be-segmented standard text object to the recognition text segment to be used.

7. The method according to claim 5, wherein the determination process of the content comparison score between the to-be-used identification text segment and the to-be-used standard text segment comprises:

determining a content coverage score corresponding to the standard text segment to be used according to the ratio of the content coverage of the recognition text segment to be used to the standard text segment to be used, to the content coverage of the to-be-segmented recognized text object to the standard text segment to be used;

determining a content coverage score corresponding to the recognition text segment to be used according to the ratio of the content coverage of the standard text segment to be used to the recognition text segment to be used, to the content coverage of the to-be-segmented standard text object to the recognition text segment to be used;

and determining a content comparison score between the recognition text segment to be used and the standard text segment to be used according to the product of the content coverage score corresponding to the standard text segment to be used and the content coverage score corresponding to the recognition text segment to be used.
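The arithmetic of claim 7 can be written as a small helper. This is only a sketch of the stated computation; the parameter names are hypothetical labels for the four coverage values named in the claim, and the zero-denominator handling is an added assumption.

```python
def content_score(cov_std_seg_by_rec_seg, cov_std_seg_by_rec_obj,
                  cov_rec_seg_by_std_seg, cov_rec_seg_by_std_obj):
    """Content comparison score per claim 7: each side's coverage score is
    the coverage achieved by the matched segment divided by the coverage
    achieved by the whole unsplit object, and the two scores are multiplied."""
    std_side = (cov_std_seg_by_rec_seg / cov_std_seg_by_rec_obj
                if cov_std_seg_by_rec_obj else 0.0)
    rec_side = (cov_rec_seg_by_std_seg / cov_rec_seg_by_std_obj
                if cov_rec_seg_by_std_obj else 0.0)
    return std_side * rec_side
```

Dividing the segment's coverage by the whole object's coverage rewards splits whose segments explain their counterparts at least as well as the unsplit texts did.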

8. The method according to claim 6 or 7, wherein the determining of the content coverage comprises:

respectively performing unit division processing on a first object and a second object according to an ith division mode to obtain an ith unit set of the first object and an ith unit set of the second object; wherein the ith division mode takes i words as the division unit; i is a positive integer, i is not more than I, I is a positive integer, and I represents the number of the division modes;

determining the content coverage corresponding to the ith division mode according to the intersection between the ith unit set of the first object and the ith unit set of the second object;

and determining the content coverage of the first object to the second object according to the average of the content coverages corresponding to the 1st through Ith division modes.

9. The method according to claim 8, wherein the determination process of the content coverage corresponding to the ith division mode comprises:

counting the units in the intersection between the ith unit set of the first object and the ith unit set of the second object to obtain the number of intersection units corresponding to the ith division mode;

counting the units in the ith unit set of the second object to obtain the number of to-be-compared units corresponding to the ith division mode;

and determining the content coverage corresponding to the ith division mode according to the ratio of the number of intersection units corresponding to the ith division mode to the number of to-be-compared units corresponding to the ith division mode.
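Claims 8 and 9 together describe an n-gram coverage measure: for each unit size i, the fraction of the second object's i-word units that also occur in the first object, averaged over the unit sizes. A minimal sketch, where the number of division modes I (here `max_n`) and the pre-tokenization into word lists are assumptions:

```python
def ngram_set(words, i):
    """All contiguous i-word units of a token list (the 'ith unit set')."""
    return {tuple(words[k:k + i]) for k in range(len(words) - i + 1)}

def content_coverage(first, second, max_n=2):
    """Coverage of `second` by `first`: for each unit size i = 1..max_n,
    the share of second's i-grams that also appear in first, averaged
    over all unit sizes that second is long enough to support."""
    scores = []
    for i in range(1, max_n + 1):
        second_units = ngram_set(second, i)
        if not second_units:
            continue  # second is shorter than i words; skip this mode
        inter = ngram_set(first, i) & second_units
        scores.append(len(inter) / len(second_units))
    return sum(scores) / len(scores) if scores else 0.0
```

Note the measure is asymmetric, which is why the claims use both directions of coverage.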

10. The method according to claim 5, wherein the determination process of the length comparison score between the to-be-used identification text segment and the to-be-used standard text segment comprises:

determining a first ratio to be used according to the ratio of the text length of the recognition text segment to be used to the text length of the standard text segment to be used;

determining a second ratio to be used according to the ratio of the text length of the standard text segment to be used to the text length of the recognition text segment to be used;

and performing preset data processing on the first ratio to be used and the second ratio to be used to obtain a length comparison score between the recognition text segment to be used and the standard text segment to be used.
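The length comparison of claim 10 takes the two mutual length ratios and applies an unspecified "preset data processing" step. A minimal sketch, where taking the smaller of the two ratios is an assumed choice (it is 1.0 for equal lengths and decays symmetrically as lengths diverge):

```python
def length_score(rec_segment, std_segment):
    """Length comparison score: min of the two mutual length ratios.
    The claim leaves the combining step open; min() is one plausible
    symmetric choice, not the claimed implementation."""
    a, b = len(rec_segment), len(std_segment)
    if a == 0 or b == 0:
        return 0.0  # an empty segment cannot match anything
    return min(a / b, b / a)
```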

11. The method according to claim 4, wherein the determining the segmentation score of the point pair to be scored according to the text comparison result of the at least one text pair to be compared comprises:

and determining the segmentation score of the point pair to be scored according to the text comparison result of the at least one text pair to be compared and the adjacent vocabulary comparison result of the point pair to be scored.

12. The method of claim 11, wherein the point pair to be scored comprises a first recognition cut point and a first standard cut point, and the adjacent vocabulary comparison result of the point pair to be scored is determined according to the content comparison score between at least one adjacent vocabulary of the first recognition cut point and at least one adjacent vocabulary of the first standard cut point.

13. The method of claim 12, wherein the at least one adjacent vocabulary comprises at least one left adjacent vocabulary and at least one right adjacent vocabulary, and the determination process of the content comparison score between the at least one adjacent vocabulary of the first recognition cut point and the at least one adjacent vocabulary of the first standard cut point comprises:

determining a left adjacent vocabulary comparison score according to the content coverage of the at least one left adjacent vocabulary of the first recognition cut point to the at least one left adjacent vocabulary of the first standard cut point, the content coverage of the at least one left adjacent vocabulary of the first standard cut point to the at least one left adjacent vocabulary of the first recognition cut point, and the content coverage of the at least one adjacent vocabulary of the first standard cut point to the at least one left adjacent vocabulary of the first recognition cut point;

determining a right adjacent vocabulary comparison score according to the content coverage of the at least one right adjacent vocabulary of the first recognition cut point to the at least one right adjacent vocabulary of the first standard cut point, the content coverage of the at least one right adjacent vocabulary of the first standard cut point to the at least one right adjacent vocabulary of the first recognition cut point, and the content coverage of the at least one adjacent vocabulary of the first standard cut point to the at least one right adjacent vocabulary of the first recognition cut point;

and determining the content comparison score between the at least one adjacent vocabulary of the first recognition cut point and the at least one adjacent vocabulary of the first standard cut point according to the average of the left adjacent vocabulary comparison score and the right adjacent vocabulary comparison score.

14. The method of claim 4, further comprising:

combining the at least one text pair to be compared according to a preset combination mode to obtain at least one combination to be processed; wherein, the combination to be processed comprises at least one text pair to be compared;

the determining the segmentation score of the point pair to be scored according to the text comparison result of the at least one text pair to be compared includes:

determining a combination score of the at least one combination to be processed according to a text comparison result of the at least one text pair to be compared;

and carrying out preset statistical analysis processing on the combined score of the at least one combination to be processed to obtain the segmentation score of the point pair to be scored.

15. The method according to claim 14, wherein when the to-be-processed combination includes a first to-be-processed text pair and a second to-be-processed text pair, the determining of the combined score of the to-be-processed combination comprises:

and determining a combined score of the combination to be processed according to the text comparison result of the first text pair to be processed and the text comparison result of the second text pair to be processed.

16. The method of claim 2, wherein the at least one text pair comprises a target text pair, and wherein the target text pair comprises a target recognition text segment and a target standard text segment;

the updating process of the text corresponding relation comprises the following steps:

establishing a corresponding relation between the target identification text segment and the target standard text segment;

and adding the corresponding relation between the target identification text segment and the target standard text segment to the text corresponding relation.

17. The method of claim 16, wherein adding the correspondence between the target recognized text segment and the target standard text segment to the text correspondence comprises:

if the corresponding relation between the first text to be segmented and the second text to be segmented exists in the text corresponding relation, deleting the corresponding relation between the first text to be segmented and the second text to be segmented from the text corresponding relation, and adding the corresponding relation between the target identification text segment and the target standard text segment to the text corresponding relation;

and if the corresponding relation between the first text to be segmented and the second text to be segmented does not exist in the text corresponding relation, adding the corresponding relation between the target identification text segment and the target standard text segment to the text corresponding relation.

18. The method of claim 2, wherein the at least one text pair comprises a target text pair, and wherein the target text pair comprises a target recognition text segment and a target standard text segment;

the updating process of the first text to be segmented and the second text to be segmented comprises the following steps:

and determining the first text to be segmented according to the target identification text segment, and determining a second text to be segmented according to the target standard text segment.

19. The method according to claim 1 or 2, wherein the pair of points to be used comprises a second identification cut point and a second standard cut point;

when the at least one text pair comprises a first text pair and a second text pair, the determining of the at least one text pair comprises:

segmenting the first text to be segmented by using the second identification cut point to obtain a front-segment identification text and a back-segment identification text;

segmenting the second text to be segmented by using the second standard cut point to obtain a front-segment standard text and a back-segment standard text;

determining the first text pair according to the front-segment identification text and the front-segment standard text;

and determining the second text pair according to the back-segment identification text and the back-segment standard text.
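The splitting step of claim 19 is mechanical and can be sketched directly. Representing a split-point pair as a pair of character offsets `(i, j)` is an assumption for illustration; the claims leave the point representation open.

```python
def split_at_pair(rec_text, std_text, pair):
    """Split both texts at a split-point pair: `pair` = (i, j) where i
    indexes into rec_text and j into std_text, yielding a front text
    pair and a back text pair (the first and second text pairs)."""
    i, j = pair
    first_pair = (rec_text[:i], std_text[:j])    # front-segment texts
    second_pair = (rec_text[i:], std_text[j:])   # back-segment texts
    return first_pair, second_pair
```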

20. The method of claim 1, wherein determining a text correspondence between the speech recognition text and the standard speech text from the at least one text pair comprises:

combining the at least one text pair according to a preset combination mode to determine at least one candidate combination; wherein the candidate combination comprises at least one of the text pairs;

and determining the text corresponding relation between the speech recognition text and the standard speech text by utilizing the combination to be used in the at least one candidate combination.

21. The method of claim 1, further comprising:

acquiring a voice to be processed;

performing voice recognition processing on the voice to be processed to obtain the voice recognition text;

and performing preset processing on the voice recognition text by using the text corresponding relation to obtain a to-be-used voice text corresponding to the to-be-processed voice.

22. A text correspondence relationship construction device, characterized by comprising:

the point pair determining unit is used for determining at least one segmentation point pair according to at least one first segmentation point of the voice recognition text and at least one second segmentation point of the standard voice text after the voice recognition text and the standard voice text corresponding to the voice recognition text are obtained; wherein the pair of split points comprises one of the first split points and one of the second split points;

the text segmentation unit is used for segmenting the voice recognition text and the standard voice text by using a point pair to be used in the at least one segmentation point pair to obtain at least one text pair;

and the relation determining unit is used for determining the text corresponding relation between the voice recognition text and the standard voice text according to the at least one text pair.

23. An apparatus, comprising a processor and a memory:

the memory is used for storing a computer program;

the processor is configured to perform the method of any one of claims 1-21 in accordance with the computer program.

24. A computer-readable storage medium, characterized in that the computer-readable storage medium is used to store a computer program for performing the method of any of claims 1-21.

25. A computer program product, characterized in that the computer program product, when run on a terminal device, causes the terminal device to perform the method of any of claims 1-21.

Technical Field

The present application relates to the field of data processing technologies, and in particular, to a text correspondence relationship construction method and related devices.

Background

For some speech processing scenarios, after performing speech recognition processing on one piece of speech data, in order to further improve the recognized text of the speech data, a relevant technician may first perform text correspondence construction processing on the recognized text of the speech data and a standard speech text of the speech data in a manual manner to obtain a text correspondence between the recognized text and the standard speech text, so that the text correspondence is used to represent a correspondence between at least one text segment in the recognized text and at least one text segment in the standard speech text; and then, according to the text corresponding relation, performing other processing operations (for example, text error correction and other operations) on the recognized text of the voice data.

However, the manual construction method described above has drawbacks, so the text correspondence obtained from it is also flawed; as a result, the improvement of the recognized text is limited, and the processing effect of the voice data is poor.

Disclosure of Invention

In order to solve the technical problem, the present application provides a text correspondence relationship construction method and related devices thereof, which can effectively avoid adverse effects caused by a manual construction mode, thereby being beneficial to improving the processing effect of voice data.

In order to achieve the above purpose, the technical solutions provided in the embodiments of the present application are as follows:

the embodiment of the application provides a text corresponding relation construction method, which comprises the following steps: after a voice recognition text and a standard voice text corresponding to the voice recognition text are obtained, determining at least one segmentation point pair according to at least one first segmentation point of the voice recognition text and at least one second segmentation point of the standard voice text; wherein the pair of split points comprises one of the first split points and one of the second split points; segmenting the voice recognition text and the standard voice text by using a point pair to be used in the at least one segmentation point pair to obtain at least one text pair; and determining a text corresponding relation between the voice recognition text and the standard voice text according to the at least one text pair.

The embodiment of the present application further provides a text correspondence relationship building apparatus, including: the point pair determining unit is used for determining at least one segmentation point pair according to at least one first segmentation point of the voice recognition text and at least one second segmentation point of the standard voice text after the voice recognition text and the standard voice text corresponding to the voice recognition text are obtained; wherein the pair of split points comprises one of the first split points and one of the second split points; the text segmentation unit is used for segmenting the voice recognition text and the standard voice text by using a point pair to be used in the at least one segmentation point pair to obtain at least one text pair; and the relation determining unit is used for determining the text corresponding relation between the voice recognition text and the standard voice text according to the at least one text pair.

An embodiment of the present application further provides an apparatus, where the apparatus includes a processor and a memory:

the memory is used for storing a computer program;

the processor is configured to execute any implementation manner of the text correspondence relationship construction method provided by the embodiment of the application according to the computer program.

The embodiment of the present application further provides a computer-readable storage medium, where the computer-readable storage medium is used for storing a computer program, and the computer program is used for executing any implementation manner of the text correspondence relationship construction method provided in the embodiment of the present application.

The embodiment of the present application further provides a computer program product, and when the computer program product runs on a terminal device, the terminal device is enabled to execute any implementation manner of the text correspondence relationship construction method provided in the embodiment of the present application.

Compared with the prior art, the embodiment of the application has at least the following advantages:

in the technical scheme provided by the embodiment of the application, after a voice recognition text and a standard voice text corresponding to the voice recognition text are obtained, at least one segmentation point pair is determined according to at least one first segmentation point of the voice recognition text and at least one second segmentation point of the standard voice text, so that each segmentation point pair comprises a first segmentation point and a second segmentation point; segmenting the voice recognition text and the standard voice text by utilizing the point pairs to be used in the at least one segmentation point pair to obtain at least one text pair so that the at least one text pair can accurately represent the corresponding relation between at least one first segmentation text segment in the voice recognition text and at least one second segmentation text segment in the standard voice text; finally, according to the at least one text pair, determining a text corresponding relation between the voice recognition text and the standard voice text, so that automatic construction processing can be carried out on the text corresponding relation between the voice recognition text and the standard voice text corresponding to the voice recognition text, adverse effects caused by a manual alignment mode can be effectively avoided, and the processing effect of voice data can be improved.

Drawings

In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. It is obvious that the drawings in the following description show only some embodiments of the present application, and other drawings can be derived from them by those skilled in the art without creative effort.

Fig. 1 is a flowchart of a text correspondence relationship construction method provided in an embodiment of the present application;

fig. 2 is a schematic diagram of a text correspondence relationship building process provided in an embodiment of the present application;

fig. 3 is a schematic structural diagram of a text correspondence relationship constructing apparatus according to an embodiment of the present application.

Detailed Description

In research on the voice processing process, the inventors found that after the recognition text of a piece of voice data is acquired, the recognition text can be further improved as follows: firstly, constructing, in a manual manner, a text corresponding relation between the recognition text and a standard voice text of the voice data, so that the text corresponding relation represents the correspondence between at least one text segment in the recognition text and at least one text segment in the standard voice text; and then, according to the text corresponding relation, performing other processing operations (for example, text error correction) on the recognition text of the voice data. However, the manual construction method has drawbacks (e.g., construction takes a long time, and subjective errors are prone to occur), so the text correspondence obtained in this way also has drawbacks (e.g., inaccuracy); as a result, the improvement of the recognized text is limited, and the processing effect of the voice data is poor.

Based on the above findings, in order to solve the technical problems of the background art, an embodiment of the present application provides a text correspondence relationship construction method, including: after a voice recognition text and a standard voice text corresponding to the voice recognition text are obtained, determining at least one segmentation point pair according to at least one first segmentation point of the voice recognition text and at least one second segmentation point of the standard voice text, so that each segmentation point pair comprises a first segmentation point and a second segmentation point; segmenting the voice recognition text and the standard voice text by utilizing the point pairs to be used in the at least one segmentation point pair to obtain at least one text pair so that the at least one text pair can accurately represent the corresponding relation between at least one first segmentation text segment in the voice recognition text and at least one second segmentation text segment in the standard voice text; finally, according to the at least one text pair, determining a text corresponding relation between the voice recognition text and the standard voice text, so that automatic construction processing can be carried out on the text corresponding relation between the voice recognition text and the standard voice text corresponding to the voice recognition text, adverse effects caused by a manual alignment mode can be effectively avoided, and the processing effect of voice data can be improved.

In addition, the embodiment of the present application does not limit the execution subject of the text correspondence relationship construction method, and for example, the text correspondence relationship construction method provided in the embodiment of the present application may be applied to a data processing device such as a terminal device or a server. The terminal device may be a voice processing terminal, a smart phone, a computer, a Personal Digital Assistant (PDA), a tablet computer, or the like. The server may be a stand-alone server, a cluster server, or a cloud server.

In order to make the technical solutions of the present application better understood, the technical solutions in the embodiments of the present application will be described clearly and completely below with reference to the drawings in the embodiments. It is obvious that the described embodiments are only a part of the embodiments of the present application, not all of them. All other embodiments obtained by a person skilled in the art from the embodiments given herein without creative effort shall fall within the protection scope of the present application.

In order to facilitate understanding of the present application, a text correspondence relationship construction method provided in the embodiments of the present application is described below with reference to the accompanying drawings.

Referring to fig. 1, the figure is a flowchart of a text correspondence relationship construction method provided in an embodiment of the present application.

The text correspondence construction method provided by the embodiment of the present application includes the following steps S1 to S4:

s1: and acquiring a voice recognition text and a standard voice text corresponding to the voice recognition text.

The "voice recognition text" is obtained after voice recognition processing is performed on a piece of voice data (e.g., a voice to be processed below), so that the "voice recognition text" can represent voice information carried by the voice data; the embodiment of the present application does not limit the manner of acquiring the "speech recognition text", and may be implemented, for example, by performing speech recognition processing on one piece of speech data.

The standard voice text corresponding to the voice recognition text is used for representing the voice information actually carried by the voice data corresponding to the voice recognition text; moreover, the embodiment of the present application does not limit the manner of acquiring the "standard speech text corresponding to the speech recognition text", for example, when the "speech recognition text" represents a recognition text of speech data, the labeled text of the speech data may be determined as the "standard speech text corresponding to the speech recognition text". Here, the "recognition text of the voice data" is used to indicate a result of the voice recognition processing performed on the voice data. The "annotation text of the voice data" refers to a text obtained by performing voice information annotation on the voice data by a preset annotation manner (e.g., a manual annotation manner).

In addition, the implementation of S1 is not limited in the embodiments of the present application; for example, it may be realized by the acquisition process shown in S11 to S13 below.

S2: Determine at least one segmentation point pair according to at least one first segmentation point of the speech recognition text and at least one second segmentation point of the standard speech text, so that each segmentation point pair includes one first segmentation point and one second segmentation point.

Wherein, the first cut point is used for representing a cut position which can be used for cutting processing aiming at the voice recognition text; moreover, the determination manner of the "at least one first cut point" is not limited in the embodiments of the present application, and for example, it may be determined according to the position of each punctuation mark in the speech recognition text and/or the position of each space (for example, when the speech recognition text includes at least one punctuation mark and at least one space, the position of each punctuation mark in the speech recognition text and the position of each space may be determined as the first cut point).

The "second segmentation point" is used to indicate a segmentation position that can be used for performing segmentation processing on the standard speech text; moreover, the determination manner of the "at least one second segmentation point" is not limited in the embodiments of the present application, for example, it may be determined according to the position of each punctuation mark in the standard speech text (for example, when the standard speech text includes at least one punctuation mark, the position of each punctuation mark in the standard speech text may be determined as the second segmentation point).
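For ease of understanding, the punctuation-and-space based determination of segmentation points described above may be sketched as follows; the function name and the punctuation set are illustrative assumptions and are not fixed by the embodiment:

```python
# Hypothetical punctuation set (ASCII and full-width); the embodiment
# does not prescribe a particular one.
SPLIT_CHARS = set(",.;!?\u3002\uff0c\uff1b\uff01\uff1f")

def find_split_points(text, include_spaces=True):
    """Return the indices of candidate segmentation positions in `text`.

    A split point at index i means the text may be cut between
    text[:i + 1] and text[i + 1:].  Spaces are optionally treated as
    split points, as described for the speech recognition text above.
    """
    points = []
    for i, ch in enumerate(text):
        if ch in SPLIT_CHARS or (include_spaces and ch == " "):
            points.append(i)
    return points
```

For a standard speech text in which only punctuation marks are used as segmentation points, `include_spaces=False` reproduces the second variant described above.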

The "pair of split points" is composed of a first split point and a second split point such that the "pair of split points" is used to indicate a split position for the speech recognition text and a split position for the standard speech text.

In addition, the embodiment of the present application does not limit the determination process of the "at least one split point pair", for example, each first split point of the speech recognition text may be randomly combined with each second split point of the standard speech text to obtain at least one split point pair. For another example, each first segmentation point of the speech recognition text and each second segmentation point of the standard speech text may be collocated according to a preset collocation rule to obtain at least one segmentation point pair. The "preset collocation rule" may be preset.
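The combination of each first segmentation point with each second segmentation point mentioned above may be sketched as the full cross product; any preset collocation rule can then be applied as a filter over this candidate set:

```python
from itertools import product

def candidate_split_point_pairs(first_points, second_points):
    """Combine each first split point with each second split point.

    Each returned tuple (i, j) is one segmentation point pair: i is a
    cut position in the recognized text, j one in the standard text.
    """
    return list(product(first_points, second_points))
```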

Based on the related content of S2, after the speech recognition text and the standard speech text corresponding to the speech recognition text are obtained, at least one first segmentation point of the speech recognition text and at least one second segmentation point of the standard speech text may be combined to obtain at least one segmentation point pair, so that each segmentation point pair includes one first segmentation point and one second segmentation point, and each segmentation point pair can indicate a segmentation position for the speech recognition text and a segmentation position for the standard speech text, so that a segmentation position for performing segmentation processing on the speech recognition text and the standard speech text can be subsequently screened from the segmentation point pairs.

S3: and segmenting the voice recognition text and the standard voice text by utilizing the point pairs to be used in the at least one segmentation point pair to obtain at least one text pair.

The "point pair to be used" is used to indicate a segmentation position to be used when performing segmentation processing on the speech recognition text and the standard speech text.

In addition, the determination process of the "point pair to be used" is not limited in the embodiments of the present application. For example, the determination process may include: randomly screening one segmentation point pair from the at least one segmentation point pair, and determining the screened segmentation point pair as the point pair to be used. As another example, the determination process may include: searching the at least one segmentation point pair for a segmentation point pair meeting a preset use condition, and determining the segmentation point pair meeting the preset use condition as the point pair to be used.

In addition, the "preset use condition" may be preset, and the embodiment of the present application does not limit it. For example, the "preset use condition" may specifically be: the segmentation score is the largest (i.e., the one segmentation point pair having the largest segmentation score is searched for from the at least one segmentation point pair). As another example, the "preset use condition" may specifically be: the segmentation score reaches a preset score threshold (i.e., one segmentation point pair whose segmentation score reaches the preset score threshold is searched for from the at least one segmentation point pair). As still another example, the "preset use condition" may specifically be: the segmentation score is the largest and reaches the preset score threshold (i.e., the one segmentation point pair having the largest segmentation score, with that score also reaching the preset score threshold, is searched for from the at least one segmentation point pair). It should be noted that the relevant content of the "segmentation score" can be found below.
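A minimal sketch of screening the point pair to be used under the "largest segmentation score" condition is given below; `score_fn` is a stand-in for the segmentation score discussed later and is an assumption of this sketch:

```python
def choose_point_pair(pairs, score_fn, score_threshold=None):
    """Pick the segmentation point pair with the largest score.

    `score_fn(pair) -> float` stands in for the segmentation score.
    When `score_threshold` is given, the best pair is returned only if
    its score reaches the threshold (the combined condition above);
    otherwise None is returned.
    """
    if not pairs:
        return None
    best = max(pairs, key=score_fn)
    if score_threshold is not None and score_fn(best) < score_threshold:
        return None
    return best
```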

The "at least one text pair" is obtained by combining at least one first cut text segment of the speech recognition text and at least one second cut text segment of the standard speech text, so that each text pair comprises one first cut text segment and one second cut text segment. The "first cut text segment" refers to a text segment obtained after a cutting process is performed on a speech recognition text. The "second segmented text segment" refers to a text segment obtained after a segmentation process is performed on a standard speech text.

In addition, the embodiment of the present application does not limit the manner of obtaining the "at least one first cut text segment of the speech recognition text and the at least one second cut text segment of the standard speech text". For example, when the "point pair to be used" includes a third recognition cut point and a third standard cut point, the speech recognition text is cut according to the third recognition cut point to obtain the at least one first cut text segment of the speech recognition text, and the standard speech text is cut according to the third standard cut point to obtain the at least one second cut text segment of the standard speech text.

Based on the related content of S3, after the "at least one split point pair" is obtained, a split point may be selected from the at least one split point pair to serve as a to-be-used point pair; then, segmenting the voice recognition text and the standard voice text by utilizing the point pair to be used to obtain at least one first segmentation text segment of the voice recognition text and at least one second segmentation text segment of the standard voice text; and finally, combining the at least one first cut text segment and the at least one second cut text segment to obtain at least one text pair, so that each text pair comprises one first cut text segment and one second cut text segment, and the 'at least one text pair' can represent the corresponding relation between the at least one first cut text segment in the voice recognition text and the at least one second cut text segment in the standard voice text.
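Cutting both texts at the chosen point pair and combining the resulting segments into text pairs may be sketched as follows (a single cut yielding two text pairs; the embodiment also covers cuts that yield more segments):

```python
def split_into_text_pairs(recognized, standard, point_pair):
    """Cut both texts at the point pair (i, j) and zip the halves.

    The recognized text is cut after index i and the standard text
    after index j, yielding two (first segment, second segment) pairs.
    """
    i, j = point_pair
    first_segments = [recognized[:i + 1], recognized[i + 1:]]
    second_segments = [standard[:j + 1], standard[j + 1:]]
    return list(zip(first_segments, second_segments))
```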

S4: and determining a text corresponding relation between the voice recognition text and the standard voice text according to at least one text pair.

The "text correspondence between the speech recognition text and the standard speech text" is used to describe a correspondence between text contents (e.g., a text segment to be segmented, etc.) in the speech recognition text and text contents (e.g., a text segment, a sentence, etc.) in the standard speech text.

Based on the related contents of S1 to S4, it can be known that, with respect to the text correspondence relationship construction method provided in the embodiment of the present application, after a speech recognition text and a standard speech text corresponding to the speech recognition text are obtained, at least one split point pair is determined according to at least one first split point of the speech recognition text and at least one second split point of the standard speech text, so that each split point pair includes one first split point and one second split point; segmenting the voice recognition text and the standard voice text by utilizing the point pairs to be used in the at least one segmentation point pair to obtain at least one text pair so that the at least one text pair can accurately represent the corresponding relation between at least one first segmentation text segment in the voice recognition text and at least one second segmentation text segment in the standard voice text; finally, according to the at least one text pair, determining a text corresponding relation between the voice recognition text and the standard voice text, so that automatic construction processing can be carried out on the text corresponding relation between the voice recognition text and the standard voice text corresponding to the voice recognition text, adverse effects caused by a manual alignment mode can be effectively avoided, and the processing effect of voice data can be improved.

In addition, the text correspondence construction method provided by the embodiment of the present application can be applied to various application scenarios (for example, application scenarios for processing voice data, etc.) that require content alignment processing for two texts. For ease of understanding, the following description is made with reference to examples.

As an example, when the text correspondence relationship construction method provided in the embodiment of the present application is applied to speech data processing, S1 may specifically include S11-S13:

s11: and acquiring the voice to be processed.

The "voice to be processed" refers to voice data collected by a voice collection device (e.g., a sound pickup apparatus); moreover, the embodiment of the present application does not limit the "voice to be processed", which may, for example, refer to any voice data. As another example, to improve the real-time performance of voice data processing, the voice to be processed may refer to voice data collected by the sound pickup apparatus in real time.

S12: and performing voice recognition processing on the voice to be processed to obtain the voice recognition text.

In the embodiment of the application, after the voice to be processed is obtained, the voice to be processed can be subjected to voice recognition processing to obtain a recognition text of the voice to be processed, so that the recognition text can represent voice information carried by the voice to be processed; and determining the recognition text of the speech to be processed as the "speech recognition text" so as to perform text content alignment processing on the recognition text of the speech to be processed and the speech annotation text of the speech to be processed by the steps shown in S2-S4.

S13: and determining a standard voice text corresponding to the voice recognition text according to the voice marking text of the voice to be processed.

The "voice labeling text of the voice to be processed" refers to a text obtained by performing voice information labeling on the voice to be processed in a preset labeling manner (e.g., a manual labeling manner).

Based on the relevant contents of S11 to S13, for some application scenarios, after the to-be-processed speech is acquired, the to-be-processed speech may be subjected to speech recognition processing to obtain a speech recognition text, and the speech tagging text of the to-be-processed speech is determined to determine the standard speech text corresponding to the speech recognition text, so that the text correspondence between the recognition text of the to-be-processed speech and the speech tagging text of the to-be-processed speech can be determined by means of the construction process of "text correspondence between speech recognition text and standard speech text", so that the text content alignment processing process performed on the recognition text of the to-be-processed speech and the speech tagging text of the to-be-processed speech can be implemented.

In addition, in order to further improve the recognized text of the voice data, the above-mentioned "text correspondence between the recognized text of the voice to be processed and the voice markup text of the voice to be processed" may be used to perform other processing on the recognized text of the voice to be processed. Based on this, the embodiment of the present application further provides another possible implementation manner of the text correspondence relationship construction method, in this implementation manner, in addition to the above-mentioned S11-S13 and S2-S4, the text correspondence relationship construction method further includes S5:

s5: and presetting the voice recognition text by using the text corresponding relation between the voice recognition text and the standard voice text to obtain the voice text to be used corresponding to the voice to be processed.

The "preset processing" may be set according to an application scenario; for example, the "preset processing" may include error correction processing.

The 'to-be-used voice text corresponding to the to-be-processed voice' is obtained after the recognition text of the to-be-processed voice is subjected to preset processing; and the to-be-used voice text corresponding to the to-be-processed voice can better represent the voice information carried by the to-be-processed voice.

Based on the related contents of the above S11-S13 and S2-S5, for some application scenarios, the text correspondence between the recognition text of the speech to be processed and the voice annotation text of the speech to be processed may be determined by first using the recognition text of the speech to be processed and the voice annotation text of the speech to be processed; and then, by means of the text corresponding relation between the recognition text of the voice to be processed and the voice marking text of the voice to be processed, the voice recognition text is subjected to preset processing to obtain the voice text to be used corresponding to the voice to be processed, so that the voice text to be used can better represent the voice information carried by the voice to be processed, and the improvement of the recognition text of the voice data is facilitated.
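One hedged realization of the error-correction variant of the "preset processing" is sketched below: each recognized segment is replaced by its aligned standard segment only when the two are similar enough to be the same utterance. The similarity measure and threshold are assumptions of this sketch, not part of the embodiment:

```python
from difflib import SequenceMatcher

def correct_recognized_text(correspondence, min_similarity=0.5):
    """Correct the recognized text by means of the text correspondence.

    `correspondence` is a list of (recognized segment, standard segment)
    pairs.  A recognized segment is replaced by its aligned standard
    segment when the pair is sufficiently similar; otherwise it is kept
    unchanged.
    """
    out = []
    for recognized_seg, standard_seg in correspondence:
        ratio = SequenceMatcher(None, recognized_seg, standard_seg).ratio()
        out.append(standard_seg if ratio >= min_similarity else recognized_seg)
    return "".join(out)
```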

In addition, in order to further improve the accuracy of the "text correspondence", the text correspondence between the speech recognition text and the standard speech text may be constructed in an iterative update manner. Based on this, the embodiment of the present application further provides another possible implementation manner of the text correspondence relationship construction method, and in this implementation manner, the text correspondence relationship construction method includes steps 11 to 18:

step 11: and acquiring a voice recognition text and a standard voice text corresponding to the voice recognition text.

It should be noted that, the relevant content of step 11 can be referred to as the relevant content of S1 above.

Step 12: and respectively determining a first text to be segmented and a second text to be segmented according to the voice recognition text and the standard voice text.

The "first text to be segmented" is used for representing the text data, related to the voice recognition text, that needs to be segmented in each round of correspondence construction. Specifically, in the first round of correspondence construction, the "first text to be segmented" refers to the above-mentioned "speech recognition text"; in a non-first round of correspondence construction, the "first text to be segmented" refers to a part of the text content in the "speech recognition text".

The "second text to be segmented" is used for representing the text data, related to the standard voice text, that needs to be segmented in each round of correspondence construction. Specifically, in the first round of correspondence construction, the "second text to be segmented" refers to the above-mentioned "standard voice text"; in a non-first round of correspondence construction, the "second text to be segmented" refers to a part of the text content in the "standard speech text".

Based on the above related content of step 12, for the iterative update process of the text correspondence, after the speech recognition text and its corresponding standard speech text are obtained, they may be used to initialize the first text to be segmented and the second text to be segmented (for example, the speech recognition text is determined as the first text to be segmented, and the standard speech text is determined as the second text to be segmented). The first round of iterative updating of the text correspondence can then be performed on the basis of the initialized texts, so that the text correspondence between the speech recognition text and the standard speech text can subsequently be continuously updated and optimized through multiple rounds of iterative updating.

Step 13: and determining at least one segmentation point pair according to at least one third segmentation point of the first text to be segmented and at least one fourth segmentation point of the second text to be segmented, so that each segmentation point pair comprises one third segmentation point and one fourth segmentation point.

The "third segmentation point" is used to represent a segmentation position that can be used for performing segmentation processing on the first text to be segmented; moreover, the embodiment of the present application is not limited to the determination manner of the "at least one third segmentation point", for example, it may be determined according to the position of each punctuation mark in the first text to be segmented and/or the position of each space (for example, when the first text to be segmented includes at least one punctuation mark and at least one space, the position of each punctuation mark in the first text to be segmented and the position of each space may be determined as the third segmentation point).

The "fourth cutting point" is used for indicating a cutting position which can be used for performing cutting processing on the second text to be cut; moreover, the determination manner of the "at least one fourth dividing point" is not limited in the embodiments of the present application, for example, it may be determined according to the position of each punctuation mark in the second text to be divided and/or the position of each space (for example, when the second text to be divided includes at least one punctuation mark and at least one space, the position of each punctuation mark in the second text to be divided and the position of each space may be determined as the fourth dividing point).

The 'splitting point pair' is composed of a third splitting point and a fourth splitting point, so that the 'splitting point pair' is used for showing the splitting position of the first text to be split and the splitting position of the second text to be split.

In addition, the determination process of the "at least one split point pair" is not limited in the embodiment of the present application, for example, each third split point of the first text to be split and each fourth split point of the second text to be split may be randomly combined to obtain at least one split point pair. For another example, each third segmentation point of the first text to be segmented and each fourth segmentation point of the second text to be segmented may be collocated according to a preset collocation rule to obtain at least one segmentation point pair. The "preset collocation rule" may be preset.

Based on the related content in step 13, for the current round of iterative update process for the corresponding relationship between the texts, after the first to-be-segmented text and the second to-be-segmented text corresponding to the first to-be-segmented text are obtained, at least one third segmentation point of the first to-be-segmented text and at least one fourth segmentation point of the second to-be-segmented text may be combined to obtain at least one segmentation point pair, so that each segmentation point pair includes one third segmentation point and one fourth segmentation point, and thus each segmentation point pair can indicate the segmentation position for the first to-be-segmented text and the segmentation position for the second to-be-segmented text, so that the segmentation positions for performing segmentation processing on the first to-be-segmented text and the second to-be-segmented text can be screened out from the segmentation point pairs in the subsequent round.

Step 14: and segmenting the first text to be segmented and the second text to be segmented by using the point pairs to be used in the at least one segmentation point pair to obtain at least one text pair.

It should be noted that the relevant content of step 14 is similar to the relevant content of S3 above; it suffices to replace the "speech recognition text" in S3 with the "first text to be segmented" and the "standard speech text" with the "second text to be segmented".

Step 15: and updating the text corresponding relation between the voice recognition text and the standard voice text according to at least one text pair.

In the embodiment of the present application, for the current round of iterative updating of the text correspondence, after at least one text pair is obtained, the text correspondence between the speech recognition text and the standard speech text may be updated by means of the at least one text pair, so that the text correspondence can record the at least one correspondence represented in the "at least one text pair". In this way, the text correspondence between the speech recognition text and the standard speech text can more accurately represent the correspondence between the text content carried by the speech recognition text and the text content carried by the standard speech text, which is advantageous for improving its accuracy.

Step 16: judging whether a preset stop condition is reached, if so, executing a step 18; if not, go to step 17.

The preset stopping condition is a preset condition for judging whether to finish the iterative updating process aiming at the corresponding relation of the text or not; the preset stop condition is not limited in the embodiment of the present application, and for example, a text segment composed of one sentence may exist in each text pair. For ease of understanding, the following examples are incorporated below.

As an example, suppose the above-mentioned "first text to be segmented" is X, the above-mentioned "second text to be segmented" is Y, and the above-mentioned "at least one text pair" includes a first text pair (A, B) and a second text pair (C, D). When it is determined that A, B, C, and D each include at least two sentences, it may be determined that the preset stop condition is not reached. If it is determined that at least one text segment consisting of one sentence exists in A and B (for example, A is a single sentence; or B is a single sentence; or A and B are both single sentences), and it is determined that at least one text segment consisting of one sentence exists in C and D (for example, C is a single sentence; or D is a single sentence; or C and D are both single sentences), it may be determined that the preset stop condition is reached, so the iterative update process may be ended, and the "text correspondence between the speech recognition text and the standard speech text" obtained in the current round of updating is stored, so that subsequent operations (e.g., the "preset processing" above) can be performed using it.
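A sketch of checking this single-sentence stop condition follows; sentence boundaries are approximated by terminal punctuation, which is an assumption of this sketch rather than part of the embodiment:

```python
import re

# Hypothetical sentence terminators (ASCII and full-width).
SENTENCE_END = re.compile(r"[.!?\u3002\uff01\uff1f]")

def is_single_sentence(segment):
    """True when the segment contains at most one sentence terminator."""
    return len(SENTENCE_END.findall(segment)) <= 1

def stop_condition_met(text_pairs):
    """True when every text pair contains a single-sentence segment."""
    return all(
        is_single_sentence(a) or is_single_sentence(b)
        for a, b in text_pairs
    )
```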

It should be noted that the embodiment of the present application is not limited to the implementation of step 16, for example, if step 16 is executed after step 15 is executed, step 16 may specifically include: judging whether a preset stop condition is reached, if so, executing a step 18; if not, go to step 17. For another example, if step 16 is executed before step 15 is executed, step 16 may specifically include: judging whether a preset stop condition is reached, if so, executing the step 15 and the step 18 in sequence; if not, step 15 and step 17 are executed in sequence.

Step 17: Update the first text to be segmented and the second text to be segmented according to the at least one text pair, and return to execute step 13.

The embodiment of the present application does not limit the implementation of "updating" in step 17. For example, when the "at least one text pair" includes a first text pair (A, B) and a second text pair (C, D), the first text pair (A, B) may be used to update the first text to be segmented and the second text to be segmented (for example, A is determined as the first text to be segmented, and B is determined as the second text to be segmented), so that a new round of iterative updating for the text correspondence can subsequently be performed for A and B. Alternatively, the second text pair (C, D) may be used to update the first text to be segmented and the second text to be segmented (for example, C is determined as the first text to be segmented, and D is determined as the second text to be segmented), so that a new round of iterative updating for the text correspondence can subsequently be performed for C and D.

In some cases, in order to further improve the construction effect (e.g., construction efficiency or construction accuracy) of the text correspondence, the first text to be segmented and the second text to be segmented may be updated simultaneously by using a plurality of text pairs, so that multiple groups of texts can subsequently be segmented in parallel. Based on this, when the "at least one text pair" includes a first text pair (A, B) and a second text pair (C, D), the first text pair (A, B) may be used to update the first text to be segmented and the second text to be segmented (for example, A is determined as the first text to be segmented, and B is determined as the second text to be segmented) to obtain a first group of texts to be segmented, and the second text pair (C, D) may be used to update the first text to be segmented and the second text to be segmented (for example, C is determined as the first text to be segmented, and D is determined as the second text to be segmented) to obtain a second group of texts to be segmented. The multiple groups of texts to be segmented can then be processed in parallel by means of step 13 and its subsequent steps, which is favorable for improving the updating effect (e.g., updating efficiency) of a new round of iterative updating for the text correspondence, and thus for improving the construction effect (e.g., construction efficiency or construction accuracy) of the text correspondence.

Based on the related content in step 17, after it is determined that the preset stop condition is not reached, it may be determined that the iterative update process for the text correspondence is still not completed, so that the first text to be segmented and the second text to be segmented may be updated according to at least one text pair, so that step 13 and subsequent steps may be continuously performed based on the updated first text to be segmented and the updated second text to be segmented, so as to implement a new round of iterative update process for the text correspondence.
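The overall iterative loop of steps 12 to 17 may be sketched as follows; the helper functions are injected because the embodiment leaves their concrete form open, so every name below is an illustrative assumption:

```python
def build_correspondence(recognized, standard,
                         find_points, pair_points, choose_pair, split_pair, stop):
    """Iteratively build (recognized segment, standard segment) pairs.

    A worklist holds the texts still to be segmented (step 12 puts the
    full texts there); each round determines candidate point pairs
    (step 13), cuts at a chosen pair (step 14), records finished pairs
    (step 15), and re-queues unfinished ones (step 17) until no work
    remains.
    """
    worklist = [(recognized, standard)]
    correspondence = []
    while worklist:
        first, second = worklist.pop()
        pairs = pair_points(find_points(first), find_points(second))
        chosen = choose_pair(pairs, first, second)
        if chosen is None:  # condition 2: no usable segmentation point pair
            correspondence.append((first, second))
            continue
        for a, b in split_pair(first, second, chosen):
            if stop(a, b):  # condition 1: a single-sentence segment exists
                correspondence.append((a, b))
            else:
                worklist.append((a, b))
    return correspondence
```

With the helpers of the earlier sketches plugged in, each pop-and-split iteration corresponds to one round of the correspondence construction described above.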

Step 18: the "iterative update process for the text correspondence" is ended.

In the embodiment of the present application, after it is determined that the preset stop condition is reached, the iterative update process for the text correspondence may be ended, so that the "text correspondence between the speech recognition text and the standard speech text" updated in the current round can subsequently be saved or used.

Based on the above-mentioned relevant contents of steps 11 to 18, in some cases, the "text correspondence between the speech recognition text and the standard speech text" may be constructed by means of an iterative update process for the text correspondence, so that the constructed "text correspondence between the speech recognition text and the standard speech text" can represent correspondence existing between the speech recognition text and the standard speech text at a granularity of sentences as much as possible, which is beneficial to improving the accuracy of the "text correspondence between the speech recognition text and the standard speech text".

In addition, for the above "iterative update process for the text correspondence", in order to further improve the construction effect (e.g., construction accuracy) of the text correspondence, the embodiment of the present application further provides another possible implementation manner of the "preset stop condition", which may specifically include at least one of the following two conditions:

condition 1: for the above-mentioned "at least one text pair", a text segment composed of one sentence exists in each text pair. It should be noted that the relevant content of condition 1 can be referred to as step 16 above.

Condition 2: when a segmentation point pair satisfying the preset search condition is searched for among the at least one segmentation point pair, no segmentation point pair satisfying the preset search condition exists in the "at least one segmentation point pair". It should be noted that, for the relevant content of the "preset search condition", reference may be made to step 24 below.

Based on the above two conditions, when the "preset stop condition" includes condition 1 and condition 2, if it is determined that condition 1 is satisfied, it may be determined that the "preset stop condition" is reached; likewise, if it is determined that condition 2 is satisfied, it may be determined that the "preset stop condition" is reached.

In order to facilitate understanding of the above-described "preset stop condition", the following description is made with reference to an example.

As an example, when the preset stop condition includes the above-described "condition 1" and the above-described "condition 2", the text correspondence relationship construction method includes steps 21 to 30:

step 21: and acquiring a voice recognition text and a standard voice text corresponding to the voice recognition text.

Step 22: and respectively determining a first text to be segmented and a second text to be segmented according to the voice recognition text and the standard voice text.

Step 23: and determining at least one segmentation point pair according to at least one third segmentation point of the first text to be segmented and at least one fourth segmentation point of the second text to be segmented, so that each segmentation point pair comprises one third segmentation point and one fourth segmentation point.

It should be noted that, for the related contents of the above steps 21 to 23, refer to the above steps 11 to 13, respectively.

Step 24: and searching the segmentation point pairs meeting the preset searching condition from at least one segmentation point pair to obtain a point pair searching result.

The "preset search condition" may be preset, and the embodiment of the present application does not limit the "preset search condition". For example, the "preset search condition" may specifically be: the segmentation score reaches a preset score threshold (i.e., a segmentation point pair whose segmentation score reaches the preset score threshold is searched for from the at least one segmentation point pair). For another example, the "preset search condition" may specifically be: the segmentation score is the largest and reaches the preset score threshold (i.e., the segmentation point pair that has the largest segmentation score, where that score also reaches the preset score threshold, is searched for from the at least one segmentation point pair). Note that, for the relevant content of the "segmentation score", refer to the description below.
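As a sketch of the second example above (selecting the segmentation point pair whose segmentation score is both the largest and at least the preset score threshold), the search might be written as follows; the names `candidates` and `SCORE_THRESHOLD`, and the representation of each candidate as a `(point_pair, score)` tuple, are illustrative assumptions rather than details from the source:

```python
# Hypothetical sketch of the "preset search condition": pick the segmentation
# point pair whose segmentation score is the largest AND reaches the threshold.
SCORE_THRESHOLD = 0.6  # assumed preset score threshold

def search_point_pair(candidates):
    """candidates: list of (point_pair, segmentation_score) tuples.
    Returns the matching point pair, or None when no pair satisfies
    the preset search condition."""
    if not candidates:
        return None
    best_pair, best_score = max(candidates, key=lambda c: c[1])
    return best_pair if best_score >= SCORE_THRESHOLD else None
```

A `None` result corresponds to the branch of step 25 in which no segmentation point pair satisfying the preset search condition exists.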

The "point pair search result" is used to indicate whether there is a segmentation point pair satisfying the preset search condition in the at least one segmentation point pair and, if so, which segmentation point pair(s) satisfy the preset search condition.

Step 25: determining whether at least one segmentation point pair has a segmentation point pair meeting a preset search condition according to a point pair search result; if yes, go to step 26; if not, go to step 30.

In the embodiment of the application, after the point pair search result is obtained, if the point pair search result indicates that at least one segmentation point pair has a segmentation point pair meeting a preset search condition, the point pair to be used may be determined by referring to the 'segmentation point pair meeting the preset search condition', so that subsequent text segmentation processing and text corresponding relation updating processing can be performed based on the 'point pair to be used'; if the point pair search result indicates that at least one segmentation point pair does not have a segmentation point pair meeting the preset search condition, it can be determined that an available segmentation point pair cannot be determined in the current iteration updating process aiming at the corresponding relation of the text, so that the iteration updating process aiming at the corresponding relation of the text can be determined to be ended.

Step 26: and determining a point pair to be used according to the point pair searching result, and segmenting the first text to be segmented and the second text to be segmented by using the point pair to be used to obtain at least one text pair.

It should be noted that, for the relevant content of "segmenting the first text to be segmented and the second text to be segmented by using the point pair to be used to obtain at least one text pair", reference may be made to the relevant content in step 14 above.

In addition, the embodiment of the present application does not limit the determination process of the "point pair to be used" in step 26. For example, if the "point pair search result" indicates that there are a plurality of segmentation point pairs satisfying the preset search condition in the at least one segmentation point pair, one segmentation point pair (e.g., the segmentation point pair having the largest segmentation score) may be selected from the plurality of segmentation point pairs satisfying the preset search condition as the point pair to be used. For another example, if the "point pair search result" indicates that only one segmentation point pair satisfying the preset search condition exists in the at least one segmentation point pair, that "segmentation point pair satisfying the preset search condition" may be directly determined as the point pair to be used.

Based on the related content in step 26, for the current iteration update process for the text correspondence, after determining that the "point pair search result" indicates that at least one segmentation point pair satisfying the preset search condition exists in at least one segmentation point pair, a point pair to be used may be first searched for from the "at least one segmentation point pair satisfying the preset search condition"; and then, segmenting the first text to be segmented and the second text to be segmented by utilizing the point pairs to be used to obtain at least one text pair, so that the corresponding relation of the texts can be updated based on the at least one text pair in the following process.

Step 27: and updating the text corresponding relation between the voice recognition text and the standard voice text according to at least one text pair.

It should be noted that the relevant content of step 27 refers to the relevant content of step 15 above.

Step 28: judging whether the condition 1 is reached, if so, executing the step 30; if not, go to step 29.

It should be noted that, the relevant content of step 28 is similar to the relevant content of step 16, and it is only necessary to replace "preset stop condition" with "condition 1", "step 18" with "step 30", "step 17" with "step 29", "step 15" with "step 27", and "step 16" with "step 28" in the relevant content of step 16.

Step 29: and updating the first text to be cut and the second text to be cut according to the at least one text pair, and returning to execute the step 23.

Step 30: the "iterative update process for the text correspondence" is ended.

It should be noted that, for the relevant contents of steps 29 to 30, refer to the relevant contents of steps 17 to 18 above, respectively.

Based on the related contents of the foregoing steps 21 to 30, in some cases, the iterative update process for the text correspondence may construct the "text correspondence between the speech recognition text and the standard speech text" according to the foregoing condition 1 and/or condition 2, so that the constructed "text correspondence between the speech recognition text and the standard speech text" can better represent the correspondence existing between the speech recognition text and the standard speech text at the granularity of sentences, which is beneficial to improving the accuracy of the "text correspondence between the speech recognition text and the standard speech text".

In addition, in order to further improve the accuracy of the text correspondence, the embodiment of the present application further provides a possible implementation manner for determining the "point pair to be used", which may specifically include steps 31 to 32:

step 31: and respectively determining the segmentation score of each segmentation point pair according to the identification text object to be segmented, the standard text object to be segmented and each segmentation point pair in the 'at least one segmentation point pair' above.

The text object to be segmented is used for representing text data which needs to be segmented and is related to the voice recognition text; moreover, the text object to be segmented is not limited in the embodiment of the application, and may be, for example, the above "speech recognition text" or the above "first text to be segmented".

The standard text object to be cut is used for representing text data which needs to be cut and is related to the standard voice text; in addition, the standard text object to be divided is not limited in the embodiment of the application, for example, the standard text object to be divided may be the above "standard voice text" or the above "second text to be divided".

"segmentation score" is used to indicate the likelihood that a segmentation point pair is used for subsequent segmentation processing; the embodiment of the present application does not limit the determination process of the "segmentation score".

Step 32: and searching the segmentation point pairs meeting the preset searching condition from the at least one segmentation point pair according to the segmentation score of the at least one segmentation point pair to obtain the point pairs to be used.

The relevant content of the "preset search condition" may refer to the relevant content of the "preset search condition" in step 24 above.

In addition, the embodiment of the present application does not limit the determination process of the "point pair to be used" in step 32, and for example, the determination process of the "point pair to be used" shown in step 26 above may be adopted to perform.

Based on the related contents of the above steps 31 to 32, after at least one segmentation point pair is obtained (i.e., after the above S2 is executed or after the above step 23 is executed), the segmentation score of each segmentation point pair in the at least one segmentation point pair may be calculated first; and searching the segmentation point pairs meeting the preset searching condition from the at least one segmentation point pair by referring to the segmentation scores of the segmentation point pairs to obtain the point pairs to be used, so that the point pairs to be used can more accurately represent the most reasonable segmentation position when segmentation processing is carried out on the text object to be segmented and the standard text object to be segmented, and the accuracy of the text corresponding relation between the speech recognition text and the standard speech text is favorably improved.

In addition, in order to improve the accuracy of the "segmentation score", the embodiment of the present application further provides an implementation manner of determining the "segmentation score", and for facilitating understanding, the following description is made in conjunction with "a determination process of the segmentation score of a point pair to be scored".

As an example, if the "at least one segmentation point pair" includes a point pair to be scored, the process of determining the segmentation score of the point pair to be scored may include steps 41 to 42:

Step 41: and segmenting the recognition text object to be segmented and the standard text object to be segmented by using the point pair to be scored to obtain at least one text pair to be compared.

The "point pair to be scored" is used to indicate any one segmentation point pair in the above "at least one segmentation point pair".

The text pair to be compared is obtained by combining at least one third segmented text segment of the recognition text object to be segmented and at least one fourth segmented text segment of the standard text object to be segmented, so that each text pair to be compared includes one third segmented text segment and one fourth segmented text segment. The third segmented text segment refers to a text segment obtained after segmentation processing is performed on the recognition text object to be segmented. The fourth segmented text segment refers to a text segment obtained after segmentation processing is performed on the standard text object to be segmented.

In addition, the embodiment of the present application does not limit the construction manner of the "at least one text pair to be compared", for example, at least one third segmented text segment of the to-be-segmented recognized text object and at least one fourth segmented text segment of the to-be-segmented standard text object may be combined according to the first combination rule to obtain the "at least one text pair to be compared". Wherein, the "first combination rule" can be preset; for example, if the text object to be segmented is segmented into 2 third segmented text segments and the standard text object to be segmented is segmented into 2 fourth segmented text segments, the "first combination rule" may specifically be to combine a third segmented text segment positioned earlier in the "2 third segmented text segments" with a fourth segmented text segment positioned earlier in the "2 fourth segmented text segments" to obtain a text pair to be compared; and combining the third segmented text segment with the later position in the 2 third segmented text segments with the fourth segmented text segment with the later position in the 2 fourth segmented text segments to obtain another text pair to be compared.
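The "first combination rule" above can be sketched as positional pairing: after both texts are cut at a point pair, the earlier recognition segment is combined with the earlier standard segment, and the later with the later. The function name and the representation of a point pair as a pair of character offsets `(i, j)` are assumptions made for illustration:

```python
def build_text_pairs_to_compare(rec_text, std_text, point_pair):
    """Cut rec_text at offset i and std_text at offset j (the point pair
    to be scored), then combine the resulting segments by position
    (the assumed 'first combination rule')."""
    i, j = point_pair
    rec_segments = [rec_text[:i], rec_text[i:]]   # third segmented text segments
    std_segments = [std_text[:j], std_text[j:]]   # fourth segmented text segments
    return list(zip(rec_segments, std_segments))  # earlier-with-earlier, later-with-later
```

Each resulting tuple is one "text pair to be compared" as described in step 41.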

In addition, the embodiment of the present application does not limit the implementation of the "segmentation processing" in step 41; for example, it may be implemented by any of the segmentation processing manners shown in S3 above.

Step 42: and determining the segmentation score of the point pair to be scored according to the text comparison result of at least one text pair to be compared.

The text comparison result of the text pair to be compared refers to a result obtained by performing text comparison processing on two text data included in the text pair to be compared, so that the text comparison result of the text pair to be compared is used for describing the similarity between the two text data included in the text pair to be compared.

The text comparison processing is used for comparing the text similarity of the two text data; in addition, the embodiment of the present application is not limited to the above "text comparison processing", and for example, any existing or future text comparison method may be used for implementation.

In addition, the embodiment of the present application does not limit the "text comparison result"; for example, it may include a text content comparison result (e.g., the "content comparison score" hereinafter) and/or a text length comparison result (e.g., the "length comparison score" hereinafter). The "text content comparison result" is used to indicate the content similarity between two text data. The "text length comparison result" is used to indicate the length similarity between two text data.

In addition, the embodiment of step 42 is not limited in this application. For example, step 42 may specifically include: firstly, determining the text comparison score of each text pair to be compared according to the text comparison result of each text pair to be compared; and then performing average calculation on the text comparison scores of the at least one text pair to be compared to obtain the segmentation score of the point pair to be scored.

The text comparison score of the text pair to be compared is used for representing the degree of similarity between the two text data included in the text pair to be compared. Moreover, the embodiment of the present application does not limit the determination of the "text comparison score of the text pair to be compared"; for example, it may specifically be obtained by performing weighted summation processing on the text content comparison result of the text pair to be compared and the text length comparison result of the text pair to be compared.
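The two example computations above (a weighted sum per text pair, then an average over all pairs) might be sketched as follows; the weight values and the concrete length comparison formula are assumptions, since the source leaves both unspecified:

```python
def length_comparison_score(a, b):
    # Assumed length comparison result: ratio of the shorter length
    # to the longer one (1.0 means equal lengths).
    if not a and not b:
        return 1.0
    return min(len(a), len(b)) / max(len(a), len(b))

def text_comparison_score(content_score, length_score,
                          w_content=0.7, w_length=0.3):
    # Weighted summation of the content and length comparison results;
    # the weights are illustrative assumptions.
    return w_content * content_score + w_length * length_score

def segmentation_score(text_comparison_scores):
    # Average of the text comparison scores of all text pairs to be compared.
    return sum(text_comparison_scores) / len(text_comparison_scores)
```

With these helpers, the segmentation score of a point pair to be scored is `segmentation_score([...])` over the per-pair weighted sums.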

Based on the related contents of the above steps 41 to 42, after the point pair to be scored is obtained, the point pair to be scored may be utilized to perform segmentation processing on the text object to be segmented and the standard text object to be segmented, so as to obtain at least one text pair to be compared; and then referring to the text comparison result of each text pair to be compared, and determining the segmentation score of the point pair to be scored, so that the segmentation score can accurately show the reasonable degree achieved when segmentation processing is carried out on the text object to be segmented and the standard text object to be segmented according to the point pair to be scored, and the accuracy of the corresponding relation of the texts is favorably improved.

In addition, in order to further improve the accuracy of the text comparison result, the embodiment of the present application also provides a possible implementation manner of the "text comparison result", and for convenience of understanding, the following description is made with reference to an example.

As an example, when the above-mentioned "at least one text pair to be compared" includes a text segment pair to be used, and the text segment pair to be used includes an identification text segment to be used and a standard text segment to be used, the text comparison result of the text segment pair to be used may include a content comparison score between the identification text segment to be used and the standard text segment to be used, and/or a length comparison score between the identification text segment to be used and the standard text segment to be used.

The "text segment pair to be used" is used to represent any one text pair to be compared in the "at least one text pair to be compared".

The "identification text segment to be used" refers to a text segment obtained by cutting the recognition text object to be segmented by using the "point pair to be scored".

The "standard text segment to be used" refers to a text segment obtained by cutting the standard text object to be segmented by using the "point pair to be scored".

The "content comparison score between the identification text segment to be used and the standard text segment to be used" is used for representing the content similarity degree between the identification text segment to be used and the standard text segment to be used.

The length comparison score between the recognition text segment to be used and the standard text segment to be used is used for representing the similarity degree of the lengths between the recognition text segment to be used and the standard text segment to be used.

Based on the related contents of the above example, for the text segment pair to be used that includes the identification text segment to be used and the standard text segment to be used, the text comparison result of the text segment pair to be used may include the content comparison score between the identification text segment to be used and the standard text segment to be used, the length comparison score between the identification text segment to be used and the standard text segment to be used, or both.

In addition, in order to improve the accuracy of the content comparison score, the embodiment of the present application further provides a possible implementation manner of determining the content comparison score between the identification text segment to be used and the standard text segment to be used, which may specifically include: determining the content comparison score between the identification text segment to be used and the standard text segment to be used according to the content coverage of the standard text segment to be used by the identification text segment to be used, the content coverage of the standard text segment to be used by the recognition text object to be segmented, the content coverage of the identification text segment to be used by the standard text segment to be used, and the content coverage of the identification text segment to be used by the standard text object to be segmented.

The content coverage of the standard text segment to be used by the identification text segment to be used is used for representing the coverage degree of the text information carried by the identification text segment to be used for the text information carried by the standard text segment to be used.

The content coverage of the standard text segment to be used by the text object to be segmented is used for representing the coverage degree of the text information carried by the text object to be segmented aiming at the text information carried by the standard text segment to be used.

The content coverage degree of the standard text segment to be used to the recognized text segment is used for representing the coverage degree of the text information carried by the standard text segment to be used aiming at the text information carried by the recognized text segment to be used.

The content coverage of the standard text object to be divided to the text segment to be recognized is used for representing the coverage degree of the text information carried by the standard text object to be divided to the text information carried by the text segment to be recognized.

The determination process of the "content comparison score between the identification text segment to be used and the standard text segment to be used" is not limited in the embodiment of the present application. For example, the determination process may specifically include: performing first statistical processing (for example, adding processing, averaging processing, or maximum value processing) on the content coverage of the standard text segment to be used by the identification text segment to be used, the content coverage of the standard text segment to be used by the recognition text object to be segmented, the content coverage of the identification text segment to be used by the standard text segment to be used, and the content coverage of the identification text segment to be used by the standard text object to be segmented, so as to obtain the content comparison score between the identification text segment to be used and the standard text segment to be used.
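A minimal sketch of the "first statistical processing" above, covering the three example aggregations (adding, averaging, maximum) over the four content coverage values; the `mode` parameter is an illustrative assumption:

```python
def first_statistical_processing(coverages, mode="average"):
    """Aggregate the four content coverage values into a content
    comparison score by adding, averaging, or taking the maximum."""
    if mode == "add":
        return sum(coverages)
    if mode == "max":
        return max(coverages)
    return sum(coverages) / len(coverages)
```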

In addition, in order to further improve the accuracy of the content comparison score, the embodiment of the present application further provides a possible implementation manner of determining the content comparison score between the identification text segment to be used and the standard text segment to be used, which may specifically include steps 51 to 53:

step 51: and determining the coverage score of the content corresponding to the standard text segment to be used according to the ratio of the content coverage of the standard text segment to be used by the identification text segment to be used to the content coverage of the standard text segment to be used by the identification text object to be cut.

Step 52: and determining the coverage score of the content corresponding to the identification text segment to be used according to the ratio of the content coverage of the identification text segment to be used by the standard text segment to be used to the content coverage of the identification text segment to be used by the standard text object to be cut.

Step 53: and determining the content comparison score between the recognition text segment to be used and the standard text segment to be used according to the product of the content coverage score corresponding to the standard text segment to be used and the content coverage score corresponding to the recognition text segment to be used.

The present embodiment does not limit the implementation manner of step 53, and may be implemented by using formula (1), for example.

text_score(TS_ide, TS_nor) = [cover_score(TS_ide, TS_nor) / cover_score(T_ide, TS_nor)] × [cover_score(TS_nor, TS_ide) / cover_score(T_nor, TS_ide)]    (1)

In the formula, text_score(TS_ide, TS_nor) represents the content comparison score between the identification text segment to be used and the standard text segment to be used; TS_ide represents the identification text segment to be used; TS_nor represents the standard text segment to be used; cover_score(TS_ide, TS_nor) represents the content coverage of the standard text segment to be used by the identification text segment to be used; cover_score(T_ide, TS_nor) represents the content coverage of the standard text segment to be used by the recognition text object to be segmented, where T_ide represents the recognition text object to be segmented, which is segmented by the above "point pair to be scored" into the two text segments TS_ide and TS'_ide; cover_score(TS_nor, TS_ide) represents the content coverage of the identification text segment to be used by the standard text segment to be used; cover_score(T_nor, TS_ide) represents the content coverage of the identification text segment to be used by the standard text object to be segmented, where T_nor represents the standard text object to be segmented, which is segmented by the above "point pair to be scored" into the two text segments TS_nor and TS'_nor.

Based on the related contents of the above steps 51 to 53, in some cases, the content comparison score between the identification text segment to be used and the standard text segment to be used may be determined (as shown in the above formula (1)) by referring to the ratio between the content coverage of the standard text segment to be used by the identification text segment to be used and the content coverage of the standard text segment to be used by the recognition text object to be segmented, together with the ratio between the content coverage of the identification text segment to be used by the standard text segment to be used and the content coverage of the identification text segment to be used by the standard text object to be segmented, so that the "content comparison score between the identification text segment to be used and the standard text segment to be used" can more accurately represent the content similarity between the identification text segment to be used and the standard text segment to be used.
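Formula (1) can be sketched directly from the four coverage values; the argument names are illustrative, and the guard against zero denominators is an added assumption (the source does not discuss degenerate inputs):

```python
def content_comparison_score(cov_seg_to_std, cov_obj_to_std,
                             cov_std_to_seg, cov_stdobj_to_seg):
    """Product of two ratios, as in formula (1): (coverage of the standard
    segment by the recognition segment / by the whole recognition object)
    times (coverage of the recognition segment by the standard segment /
    by the whole standard object)."""
    if cov_obj_to_std == 0 or cov_stdobj_to_seg == 0:
        return 0.0  # assumed behavior for degenerate coverage values
    return (cov_seg_to_std / cov_obj_to_std) * (cov_std_to_seg / cov_stdobj_to_seg)
```

Each ratio is at most 1 when a segment can cover no more content than the whole object it was cut from, so the score stays in [0, 1] under that assumption.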

In addition, in order to improve the accuracy of the content coverage, the embodiment of the present application also provides a possible implementation manner of determining the "content coverage", and for convenience of description, the following description is given with reference to an example.

As an example, the process of determining the content coverage of the second object by the first object may specifically include steps 61 to 63:

Step 61: and respectively carrying out unit division processing on the first object and the second object according to the i-th division mode to obtain the i-th unit set of the first object and the i-th unit set of the second object. Wherein i is a positive integer, i is not more than I, I is a positive integer, and I represents the number of division modes.

The "i-th division mode" is used to indicate the i-th unit division processing mode (for example, the division mode shown by the i-gram); that is, division processing is performed with i vocabularies as one division unit. Wherein i is a positive integer, i is not more than I, I is a positive integer, and I represents the number of division modes.

The "first object" and the "second object" are not limited in the embodiment of the present application. For example, the "first object" may refer to the "identification text segment to be used" in the above "content coverage of the standard text segment to be used by the identification text segment to be used", and the "second object" may refer to the "standard text segment to be used" therein. For another example, the "first object" may refer to the "recognition text object to be segmented" in the above "content coverage of the standard text segment to be used by the recognition text object to be segmented", and the "second object" may refer to the "standard text segment to be used" therein. As another example, the "first object" may refer to the "standard text segment to be used" in the above "content coverage of the identification text segment to be used by the standard text segment to be used", and the "second object" may refer to the "identification text segment to be used" therein. For yet another example, the "first object" may refer to the "standard text object to be segmented" in the above "content coverage of the identification text segment to be used by the standard text object to be segmented", and the "second object" may refer to the "identification text segment to be used" therein.

The "ith unit set of the first object" is used for recording all the dividing units obtained after the first object is divided by taking i vocabularies as one dividing unit, so that each dividing unit included in the "ith unit set of the first object" comprises i vocabularies.

The "ith unit set of the second object" is used for recording all the dividing units obtained after the second object is divided by taking i vocabularies as one dividing unit, so that each dividing unit included in the "ith unit set of the second object" comprises i vocabularies.

Step 62: and determining the content coverage corresponding to the i-th unit mode according to the intersection between the i-th unit set of the first object and the i-th unit set of the second object. Wherein i is a positive integer, i is not more than I, I is a positive integer, and I represents the number of division modes.

As an example, step 62 may specifically include steps 621 to 623:

step 621: and performing vocabulary number statistics on the intersection between the ith unit set of the first object and the ith unit set of the second object to obtain the number of intersection vocabularies corresponding to the ith unit mode, so that the number of intersection vocabularies corresponding to the ith unit mode is used for expressing the number of vocabularies shared between the ith unit set of the first object and the ith unit set of the second object.

Step 622: and counting the number of the vocabularies of the ith unit set of the second object to obtain the number of the vocabularies to be compared corresponding to the ith unit mode, so that the number of the vocabularies to be compared corresponding to the ith unit mode is used for representing the number of the vocabularies included in the ith unit set of the second object.

Step 623: and determining the content coverage corresponding to the ith unit mode according to the ratio of the number of the intersection vocabularies corresponding to the ith unit mode to the number of the vocabularies to be compared corresponding to the ith unit mode.

In this embodiment of the application, after the number of intersection vocabularies corresponding to the ith unit mode and the number of vocabularies to be compared corresponding to the ith unit mode are obtained, the content coverage corresponding to the ith unit mode may be determined according to the ratio between the two (for example, that ratio may directly be determined as the content coverage corresponding to the ith unit mode).

Based on the related content in step 62, after the ith cell set of the first object and the ith cell set of the second object are obtained, an intersection between the ith cell set of the first object and the ith cell set of the second object may be determined; and then, referring to the intersection and the ith unit set of the second object, and determining the content coverage corresponding to the ith unit mode.

Step 63: determine the content coverage of the first object with respect to the second object according to the average value of the content coverages corresponding to the 1st unit mode through the Ith unit mode.

In this embodiment of the application, after the content coverages corresponding to the 1st unit mode through the Ith unit mode are obtained, the content coverage of the first object with respect to the second object may be determined according to the average value of these I content coverages, as shown in formula (2):

Coverscore(Obj1, Obj2) = (1/I) × Σ_{i=1}^{I} |Gramsi(Obj1) ∩ Gramsi(Obj2)| / |Gramsi(Obj2)|    (2)

In the formula, Coverscore(Obj1, Obj2) represents the content coverage of the second object by the first object; Obj1 represents the first object; Obj2 represents the second object; Gramsi(Obj1) represents the ith unit set of the first object; Gramsi(Obj2) represents the ith unit set of the second object; i is a positive integer, i is not greater than I, I is a positive integer, and I represents the number of division modes.

Based on the related contents of the above steps 61 to 63, in the embodiment of the present application, the text content of the first object can be compared against that of the second object under each of the I division modes, so as to obtain the content coverage of the first object with respect to the second object. The resulting content coverage can therefore more accurately represent the degree to which the text information carried by the first object covers the text information carried by the second object, which is favorable for improving the accuracy of the content coverage.
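The coverage computation of steps 61 to 63 can be sketched as follows. This is a minimal illustration, assuming each object is already tokenized into a word list and treating each ith unit set as a deduplicated set of i-word tuples; the function names gram_set and cover_score are placeholders, not from the original:

```python
def gram_set(words, i):
    """The "ith unit set": all contiguous units of i words in a tokenized text."""
    return {tuple(words[k:k + i]) for k in range(len(words) - i + 1)}

def cover_score(obj1, obj2, num_modes=3):
    """Content coverage of obj2 by obj1, averaged over I division modes (formula (2))."""
    scores = []
    for i in range(1, num_modes + 1):
        g1, g2 = gram_set(obj1, i), gram_set(obj2, i)
        if not g2:  # obj2 too short for this mode; skip it (an added assumption)
            continue
        scores.append(len(g1 & g2) / len(g2))
    return sum(scores) / len(scores) if scores else 0.0
```

With identical word lists every mode scores 1.0, so the average coverage is 1.0; disjoint texts score 0.0.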

In addition, in order to further improve the accuracy of the above "length comparison score", the embodiment of the present application further provides a possible implementation manner of determining the "length comparison score between the recognition text segment to be used and the standard text segment to be used", which may specifically include steps 71 to 73:

step 71: and determining a first ratio to be used according to the ratio of the text length of the identification text segment to be used to the text length of the standard text segment to be used.

The embodiment of step 71 is not limited in the examples of the present application, and for example, it may specifically be: and determining the ratio of the text length of the identification text segment to be used to the text length of the standard text segment to be used as the first ratio to be used.

Step 72: and determining a second ratio to be used according to the ratio of the text length of the standard text segment to be used to the text length of the recognized text segment to be used.

The embodiment of step 72 is not limited in the examples of the present application, and for example, it may specifically be: determining the ratio of the text length of the standard text segment to be used to the text length of the recognized text segment to be used as the second ratio to be used.

Step 73: and carrying out preset data processing on the first ratio to be used and the second ratio to be used to obtain a length comparison score between the identification text segment to be used and the standard text segment to be used.

Wherein, the 'preset data processing' can be preset; in addition, the embodiment of the present application is not limited to the "preset data processing", and may specifically be minimum value processing, maximum value processing, or average value processing.

In addition, the embodiment of step 73 is not limited in the examples of the present application, and for example, it may specifically be: and determining the minimum value of the first ratio to be used and the second ratio to be used as the length comparison score between the identification text section to be used and the standard text section to be used.

Based on the above-mentioned related contents of step 71 to step 73, in some cases, the length comparison score between the recognition text segment to be used and the standard text segment to be used may be determined by jointly referring to the two reciprocal length ratios between the two text segments, so that the length comparison score can better represent how similar the lengths of the recognition text segment to be used and the standard text segment to be used are.
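A minimal sketch of steps 71 to 73, assuming the minimum-value variant of the "preset data processing" and measuring text length in characters; the zero-length guard is an added assumption:

```python
def length_score(seg_a: str, seg_b: str) -> float:
    """Length comparison score (steps 71-73): the minimum of the two
    reciprocal length ratios.  Equals 1.0 when both segments have the
    same length and decays toward 0 as their lengths diverge."""
    len_a, len_b = len(seg_a), len(seg_b)
    if len_a == 0 or len_b == 0:
        return 0.0  # degenerate empty segment; not specified by the text
    return min(len_a / len_b, len_b / len_a)
```

Taking the minimum makes the score symmetric, so neither segment being longer is penalized differently from the other.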

In addition, in order to further improve the accuracy of the segmentation score, in addition to the "text comparison result", the local information similarity between the two segmentation points included in the "pair of points to be scored" may be referred to. Based on this, the present application provides another possible implementation manner of the foregoing "step 42", which may specifically include: and determining the segmentation score of the point pair to be scored according to the text comparison result of at least one text pair to be compared and the adjacent vocabulary comparison result of the point pair to be scored.

The "adjacent vocabulary comparison result of the point pair to be scored" is used to indicate the local information similarity between the two segmentation points included in the point pair to be scored.

In addition, the embodiment of the present application does not limit the determination process of the "neighboring vocabulary comparison result of the point pair to be scored", for example, when the point pair to be scored includes a first recognition cut point and a first standard cut point, the neighboring vocabulary comparison result of the point pair to be scored may be determined according to a content comparison score between at least one neighboring vocabulary of the first recognition cut point and at least one neighboring vocabulary of the first standard cut point.

The "first recognition segmentation point" is used for indicating a segmentation position that needs to be referred to when the recognized text object to be segmented is segmented.

The "first standard segmentation point" is used for indicating a segmentation position that needs to be referred to when the standard text object to be segmented is segmented.

"at least one adjacent word of the first recognition cut point" means at least one word closer to the first recognition cut point in the recognized text object to be cut; moreover, the embodiment of the present application is not limited to "at least one neighboring vocabulary of the first recognition segmentation point", for example, it may specifically include R left neighboring vocabularies of the first recognition segmentation point and R right neighboring vocabularies of the first recognition segmentation point. The left adjacent vocabulary refers to the vocabulary which is positioned at the left side of the first recognition segmentation point and is closer to the first recognition segmentation point in the text object to be segmented and recognized. The term "right adjacent word" refers to a word which is positioned at the right side of the first recognition cut point and is closer to the first recognition cut point in the recognized text object to be cut. Wherein R is a positive integer.

"at least one adjacent vocabulary of the first standard segmentation point" refers to at least one vocabulary closer to the first standard segmentation point in the standard text object to be segmented; moreover, the embodiment of the present application does not limit "at least one neighboring vocabulary of the first standard segmentation point", for example, it may specifically include R left neighboring vocabularies of the first standard segmentation point and R right neighboring vocabularies of the first standard segmentation point. The left adjacent vocabulary refers to the vocabulary which is positioned at the left side of the first standard dividing point and is closer to the first standard dividing point in the standard text object to be divided. The term "right adjacent word" refers to a word which is positioned to the right of and closer to the first standard cut point in the standard text object to be cut. Wherein R is a positive integer.

The "content comparison score between at least one neighboring vocabulary of the first recognition segmentation point and at least one neighboring vocabulary of the first standard segmentation point" is used to represent the content similarity degree between the "at least one neighboring vocabulary of the first recognition segmentation point" and the "at least one neighboring vocabulary of the first standard segmentation point".

In addition, the embodiment of the present application does not limit the determination process of the content comparison score between at least one neighboring vocabulary of the first recognition segmentation point and at least one neighboring vocabulary of the first standard segmentation point, and the determination process is similar to the determination process of the content comparison score between the recognition text segment to be used and the standard text segment to be used, which is shown above.

For ease of understanding, the following description is made with reference to examples.

As an example, when the "at least one neighboring vocabulary" includes at least one left neighboring vocabulary and at least one right neighboring vocabulary, the determination process of "content comparison score between at least one neighboring vocabulary of the first recognition segmentation point and at least one neighboring vocabulary of the first criterion segmentation point" may specifically include steps 81-83:

step 81: a left neighbor vocabulary comparison score is determined based on the content coverage of the at least one left neighbor vocabulary of the first identified segmentation point to the at least one left neighbor vocabulary of the first standard segmentation point, the content coverage of the at least one left neighbor vocabulary of the first standard segmentation point to the at least one left neighbor vocabulary of the first identified segmentation point, the content coverage of the at least one neighbor vocabulary of the first standard segmentation point to the at least one left neighbor vocabulary of the first identified segmentation point.

The left adjacent vocabulary comparison score is used for representing the content similarity between at least one left adjacent vocabulary of the first identification segmentation point and at least one left adjacent vocabulary of the first standard segmentation point.

It should be noted that the implementation manner of step 81 is similar to the determination process of "content comparison score between the text segment to be used and the standard text segment to be used".

Step 82: determining a right neighbor vocabulary comparison score based on the content coverage of the at least one right neighbor vocabulary of the first recognition cut point to the at least one right neighbor vocabulary of the first standard cut point, the content coverage of the at least one right neighbor vocabulary of the first standard cut point to the at least one right neighbor vocabulary of the first recognition cut point, the content coverage of the at least one neighbor vocabulary of the first standard cut point to the at least one right neighbor vocabulary of the first recognition cut point.

Wherein, the "right neighboring vocabulary comparison score" is used for representing the content similarity between at least one right neighboring vocabulary of the first identification segmentation point and at least one right neighboring vocabulary of the first standard segmentation point.

It should be noted that the implementation manner of step 82 is similar to the determination process of "content comparison score between the text segment to be used and the standard text segment to be used".

Step 83: and determining the content comparison score between at least one adjacent word of the first identification segmentation point and at least one adjacent word of the first standard segmentation point according to the average value between the comparison score of the left adjacent word and the comparison score of the right adjacent word.

In the embodiment of the present application, after the left neighboring vocabulary comparison score and the right neighboring vocabulary comparison score are obtained, the "content comparison score between at least one neighboring vocabulary of the first recognition segmentation point and at least one neighboring vocabulary of the first standard segmentation point" may be determined based on the average value of the two scores (for example, that average value may be directly determined as the content comparison score). In this way, the content comparison score can more accurately represent the degree of content similarity between the "at least one neighboring vocabulary of the first recognition segmentation point" and the "at least one neighboring vocabulary of the first standard segmentation point".

Based on the related content of the "comparison result of adjacent vocabularies of the point pair to be scored", for the point pair to be scored including the first recognition cut point and the first standard cut point, the comparison result of adjacent vocabularies of the point pair to be scored may be determined according to the content comparison score between at least one adjacent vocabulary of the first recognition cut point and at least one adjacent vocabulary of the first standard cut point, so that the comparison result of adjacent vocabularies may accurately represent the local information similarity between the two cut points included in the point pair to be scored.
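The left/right averaging of steps 81 to 83 can be sketched as follows. Here content_score is a simplified stand-in for the document's content comparison score (a symmetric average of two 1-gram coverages rather than the full multi-mode computation), and all function names are illustrative:

```python
def coverage(a, b):
    """Fraction of the words of b that also appear in a (1-gram coverage only)."""
    return len(set(a) & set(b)) / len(set(b)) if b else 0.0

def content_score(a, b):
    """Simplified symmetric content comparison: average of the two directed coverages."""
    return (coverage(a, b) + coverage(b, a)) / 2

def neighbor_score(left_iden, left_norl, right_iden, right_norl):
    """Steps 81-83: compare the left neighbors and the right neighbors of the
    two segmentation points separately, then average the two scores."""
    left = content_score(left_iden, left_norl)    # step 81
    right = content_score(right_iden, right_norl)  # step 82
    return (left + right) / 2                      # step 83
```

Each argument is the word list of R neighboring vocabularies on one side of one segmentation point.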

In addition, the embodiment of the present application does not limit the determination process of the "segmentation score of the point pair to be scored", and for convenience of understanding, the following description is made with reference to an example.

As an example, suppose the above "at least one text pair to be compared" includes a first text pair (A, B) and a second text pair (A′, B′), and the text comparison result of the at least one text pair to be compared includes a content comparison score between A and B, a length comparison score between A and B, a content comparison score between A′ and B′, and a length comparison score between A′ and B′. The determining process of the segmentation score of the point pair to be scored may then specifically include steps 91 to 94:

Step 91: average the "content comparison score between A and B" and the "content comparison score between A′ and B′" to obtain the segmentation text association score of the point pair to be scored, as shown in formula (3):

Relevancescore(pointiden, pointnorl) = (Textscore(A, B) + Textscore(A′, B′)) / 2    (3)

In the formula, Relevancescore(pointiden, pointnorl) represents the segmentation text association score of the point pair to be scored; (pointiden, pointnorl) represents the point pair to be scored; pointiden represents the above "first recognition segmentation point"; pointnorl represents the above "first standard segmentation point"; Textscore(A, B) represents the above "content comparison score between A and B"; Textscore(A′, B′) represents the above "content comparison score between A′ and B′"; A and A′ refer to the two text segments obtained by segmenting the recognized text object to be segmented at pointiden; B and B′ refer to the two text segments obtained by segmenting the standard text object to be segmented at pointnorl.

Step 92: average the "length comparison score between A and B" and the "length comparison score between A′ and B′" to obtain the segmentation length penalty value of the point pair to be scored, as shown in formulas (4) to (6):

Lenscore(pointiden, pointnorl) = (Lengthscore(A, B) + Lengthscore(A′, B′)) / 2    (4)

Lengthscore(A, B) = min(Len(A) / Len(B), Len(B) / Len(A))    (5)

Lengthscore(A′, B′) = min(Len(A′) / Len(B′), Len(B′) / Len(A′))    (6)

In the formula, Lenscore(pointiden, pointnorl) represents the segmentation length penalty value of the point pair to be scored; (pointiden, pointnorl) represents the point pair to be scored; pointiden represents the above "first recognition segmentation point"; pointnorl represents the above "first standard segmentation point"; Lengthscore(A, B) represents the above "length comparison score between A and B"; Lengthscore(A′, B′) represents the above "length comparison score between A′ and B′"; Len(A) represents the text length of A; Len(B) represents the text length of B; Len(A′) represents the text length of A′; Len(B′) represents the text length of B′; A and A′ refer to the two text segments obtained by segmenting the recognized text object to be segmented at pointiden; B and B′ refer to the two text segments obtained by segmenting the standard text object to be segmented at pointnorl.

Step 93: determine the adjacent vocabulary comparison result of the point pair to be scored as the local similarity score of the point pair to be scored, as shown in formula (7):

localscore(pointiden, pointnorl) = (leftscore(Liden, Lnorl) + rightscore(Riden, Rnorl)) / 2    (7)

In the formula, localscore(pointiden, pointnorl) represents the local similarity score of the point pair to be scored (i.e., the adjacent vocabulary comparison result of the point pair to be scored); (pointiden, pointnorl) represents the point pair to be scored; pointiden represents the above "first recognition segmentation point"; pointnorl represents the above "first standard segmentation point"; leftscore(Liden, Lnorl) represents the above "left neighboring vocabulary comparison score"; rightscore(Riden, Rnorl) represents the above "right neighboring vocabulary comparison score"; Liden represents the at least one left neighboring vocabulary of the first recognition segmentation point; Lnorl represents the at least one left neighboring vocabulary of the first standard segmentation point; Riden represents the at least one right neighboring vocabulary of the first recognition segmentation point; Rnorl represents the at least one right neighboring vocabulary of the first standard segmentation point.

Step 94: and carrying out weighted summation on the segmentation text association score of the point pair to be scored, the segmentation length penalty value of the point pair to be scored and the local similarity score of the point pair to be scored to obtain the segmentation score of the point pair to be scored (as shown in a formula (8)).

Score(pointiden,pointnorl)=α×Relevancescore(pointiden,pointnorl)+β×Lenscore(pointiden,pointnorl)+γ×localscore(pointiden,pointnorl) (8)

In the formula, Score (point)iden,pointnorl) Representing the segmentation score of the point pair to be scored; relevancescore(pointiden,pointnorl) Representing the segmentation text association score of the point pair to be scored; alpha represents the weighting weight corresponding to the segmentation text association score of the point pair to be scored; lenscore(pointiden,pointnorl) Representing the segmentation length penalty value of the point pair to be scored; beta represents the weighting weight corresponding to the segmentation length penalty value of the point pair to be scored; localscore(pointiden,pointnorl) Representing local similarity scores of the point pairs to be scored; gamma represents the weighting corresponding to the local similarity score of the point pair to be scored. Note that α, β, and γ may be set in advance according to the application scenario.

Based on the related contents of the above steps 91 to 94, in some cases, after the point pair to be scored is used to segment the recognized text object to be segmented and the standard text object to be segmented into at least one text pair to be compared, the text comparison result of the at least one text pair to be compared and the adjacent vocabulary comparison result of the point pair to be scored may both be referred to in determining the segmentation score of the point pair to be scored. In this way, the segmentation score can accurately indicate the possibility that the point pair to be scored should be used for the subsequent segmentation processing.
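The weighted combination of formula (8) can be sketched as follows, with the per-pair averaging of formulas (3) and (4) folded in. The weights alpha, beta, and gamma are arbitrary placeholders, since the text only states that they are preset per application scenario:

```python
def segmentation_score(text_scores, length_scores, local_score,
                       alpha=0.4, beta=0.3, gamma=0.3):
    """Formula (8): weighted sum of the segmentation text association score,
    the segmentation length penalty value, and the local similarity score.

    text_scores / length_scores hold one content (resp. length) comparison
    score per text pair to be compared, e.g. for (A, B) and (A', B')."""
    relevance = sum(text_scores) / len(text_scores)        # formula (3)
    len_penalty = sum(length_scores) / len(length_scores)  # formula (4)
    return alpha * relevance + beta * len_penalty + gamma * local_score
```

With all component scores at their maximum of 1.0 and weights summing to 1, the segmentation score is also 1.0.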

In fact, in order to further improve the accuracy of segmentation score, various possible combinations between at least one third segmented text segment of the to-be-segmented recognized text object and at least one fourth segmented text segment of the to-be-segmented standard text object may be traversed to obtain the above "at least one text pair to be compared". Based on this, the embodiment of the present application further provides a possible implementation manner of determining a "segmentation score of a point pair to be scored", which may specifically include steps 101 to 105:

step 101: and utilizing the point pairs to be scored to segment the text object to be segmented and the standard text object to be segmented to obtain at least one third segmented text segment of the text object to be segmented and at least one fourth segmented text segment of the standard text object to be segmented.

It should be noted that the relevant contents of the "at least one third segmented text segment of the recognized text object to be segmented" and the "at least one fourth segmented text segment of the standard text object to be segmented" refer to the relevant contents of step 41 above.

Step 102: and combining at least one third segmented text segment of the to-be-segmented recognized text object and at least one fourth segmented text segment of the to-be-segmented standard text object according to a second combination rule to obtain at least one text pair to be compared.

Wherein, the "second combination rule" can be preset; and the "second combination rule" may be to traverse various possible combinations between the "at least one third segmented text segment of the recognized text object to be segmented" and the "at least one fourth segmented text segment of the standard text object to be segmented". For ease of understanding, the following description is made with reference to examples.

As an example, if the recognized text object to be segmented is segmented into 2 third segmented text segments, and the standard text object to be segmented is segmented into 2 fourth segmented text segments, the "second combination rule" may specifically be: combining the earlier-positioned third segmented text segment with the earlier-positioned fourth segmented text segment to obtain a first text pair to be compared; combining the earlier-positioned third segmented text segment with the later-positioned fourth segmented text segment to obtain a second text pair to be compared; combining the later-positioned third segmented text segment with the later-positioned fourth segmented text segment to obtain a third text pair to be compared; and combining the later-positioned third segmented text segment with the earlier-positioned fourth segmented text segment to obtain a fourth text pair to be compared.
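The exhaustive pairing described by the "second combination rule" amounts to a Cartesian product of the two segment lists. A minimal sketch (function name illustrative):

```python
from itertools import product

def candidate_text_pairs(recognized_segments, standard_segments):
    """Second combination rule: traverse every possible combination of one
    segment from the recognized text object and one from the standard one."""
    return list(product(recognized_segments, standard_segments))
```

With 2 third segmented text segments and 2 fourth segmented text segments, this yields exactly the four text pairs to be compared enumerated above.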

Step 103: and combining the at least one text pair to be compared according to a preset combination mode to obtain at least one combination to be processed. Wherein, the combination to be processed comprises at least one text pair to be compared.

Wherein, the 'preset combination mode' can be preset; the embodiment of the present application does not limit the 'preset combination mode'. For example, it may specifically be configured to ensure that each combination to be processed covers both the text information carried by the recognized text object to be segmented and the text information carried by the standard text object to be segmented. For ease of understanding, the following description is made with reference to examples.

As an example, when the recognized text object to be segmented is segmented into A and A′, the standard text object to be segmented is segmented into B and B′, and the above "at least one text pair to be compared" includes a first text pair (A, B), a second text pair (A′, B′), a third text pair (A, B′), and a fourth text pair (A′, B), the first text pair (A, B) and the second text pair (A′, B′) may be combined to obtain one combination to be processed; and the third text pair (A, B′) and the fourth text pair (A′, B) may be combined to obtain another combination to be processed.

Based on the related content in step 103, after at least one text pair to be compared is obtained, the text pairs to be compared may be combined in a preset combination manner to obtain at least one combination to be processed, so that each combination to be processed includes at least one text pair to be compared, and the "segmentation score of the point pair to be scored" may be determined based on the combinations to be processed in the following.

Step 104: and determining a combined score of at least one combination to be processed according to the text comparison result of at least one text pair to be compared.

Wherein, the "combination score of the combination to be processed" is used to indicate the possibility that the pair of points to be scored presented by the combination to be processed is used for the subsequent segmentation processing.

In addition, the determination process of the "combination score of the combination to be processed" is not limited in the embodiment of the present application, and it is similar to the determination process of the "segmentation score of the point pair to be scored" shown above. For ease of understanding, the following description is made in conjunction with two examples.

example 1, when the to-be-processed combination includes the first to-be-processed text pair and the second to-be-processed text pair, the determining process of the combination score of the to-be-processed combination may specifically include: and determining a combined score of the combination to be processed according to the text comparison result of the first text pair to be processed and the text comparison result of the second text pair to be processed.

It should be noted that the "determination process of the combined score of the combinations to be processed" shown in example 1 is similar to the process of "determining the segmentation score of the point pair to be scored according to the text comparison result of at least one text pair to be compared" shown above.

Example 2, when the to-be-processed combination includes the first to-be-processed text pair and the second to-be-processed text pair, the determining process of the combination score of the to-be-processed combination may specifically include: and determining the combined score of the combination to be processed according to the text comparison result of the first text pair to be processed, the text comparison result of the second text pair to be processed and the adjacent vocabulary comparison result of the point pair to be scored.

It should be noted that the "determination process of the combination score of the combination to be processed" shown in example 2 is similar to the "determination process of the segmentation score of the point pair to be scored according to the text comparison result of at least one text pair to be compared and the adjacent vocabulary comparison result of the point pair to be scored" shown above.

Step 105: and carrying out preset statistical analysis processing on the combined score of at least one combination to be processed to obtain the segmentation score of the point pair to be scored.

Wherein, the "preset statistical analysis processing" can be preset; the present embodiment is not limited to the "preset statistical analysis processing", and may be, for example, a summation processing, an averaging processing, a maximum value processing, or a minimum value processing.

Based on the related contents of the above steps 101 to 105, in some cases, the segmentation score of the point pair to be scored may be calculated by traversing the various possible combinations between the at least one third segmented text segment of the recognized text object to be segmented and the at least one fourth segmented text segment of the standard text object to be segmented, so that the "segmentation score of the point pair to be scored" can more accurately represent the possibility that the point pair to be scored should be used for the subsequent segmentation processing.
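Step 105's "preset statistical analysis processing" can be sketched as a configurable reduction over the per-combination scores. Defaulting to the maximum is an assumption; the text equally allows summation, averaging, or taking the minimum:

```python
def point_pair_score(combination_scores, reduce_mode="max"):
    """Step 105: collapse the combination scores of the combinations to be
    processed into the segmentation score of the point pair to be scored."""
    ops = {
        "max": max,                              # best achievable combination
        "min": min,                              # worst-case combination
        "sum": sum,                              # total over all combinations
        "mean": lambda s: sum(s) / len(s),       # average combination quality
    }
    return ops[reduce_mode](combination_scores)
```

Choosing max favors a point pair as soon as any one covering combination aligns well, while mean requires all combinations to align.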

In addition, in order to improve the construction effect of the text correspondence, the embodiment of the present application further provides a possible implementation manner of updating the text correspondence between the speech recognition text and the standard speech text, and for facilitating understanding, the following description is made with reference to an example.

As an example, when the "at least one text pair" includes a target text pair and the target text pair includes a target recognition text segment and a target standard text segment, the update process of the "text correspondence between the speech recognition text and the standard speech text" may include steps 111 to 112:

step 111: and establishing a corresponding relation between the target identification text segment and the target standard text segment.

Step 112: the correspondence between the target recognition text segment and the target standard text segment is added to the above-mentioned "text correspondence between the voice recognition text and the standard voice text".

Based on the related contents of the above steps 111 to 112, after at least one text pair (e.g., the first text pair (A, B) and the second text pair (A′, B′)) is obtained, a correspondence may be established between the text data included in each text pair, and these correspondences may be added to the above "text correspondence between the speech recognition text and the standard speech text", so that the text correspondence can record the correspondence between the text data represented by these text pairs.

In addition, in order to improve the expression effect of the text correspondence, the embodiment of the present application further provides a possible implementation manner of step 112, which may specifically include steps 121 to 122:

step 121: if the correspondence between the first text to be segmented and the second text to be segmented exists in the text correspondence between the voice recognition text and the standard voice text, deleting the correspondence between the first text to be segmented and the second text to be segmented from the text correspondence between the voice recognition text and the standard voice text, and adding the correspondence between the target recognition text segment and the target standard text segment to the text correspondence between the voice recognition text and the standard voice text.

Step 122: and if the correspondence between the first text to be segmented and the second text to be segmented does not exist in the text correspondence between the voice recognition text and the standard voice text, adding the correspondence between the target recognition text segment and the target standard text segment to the text correspondence between the voice recognition text and the standard voice text.

Based on the relevant content of the above steps 121 to 122, since the "at least one text pair" is obtained by segmenting the first text to be segmented and the second text to be segmented, the text data correspondences recorded by the "at least one text pair" can represent the correspondence between the first text to be segmented and the second text to be segmented more accurately. Therefore, after the text data correspondence recorded by the target text pair is obtained, the text data correspondences recorded by the "at least one text pair" are used to replace the "correspondence between the first text to be segmented and the second text to be segmented" existing in the "text correspondence between the speech recognition text and the standard speech text". In this way, the text correspondence can be simplified while keeping its accuracy relatively high, thereby improving the expression effect of the text correspondence.
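Steps 121 to 122 can be sketched as a single replace-or-append operation (again assuming the list-of-tuples representation; all names here are illustrative):

```python
def replace_parent_pair(correspondence, parent_pair, child_pairs):
    """Steps 121-122: if the coarse (first, second) to-be-segmented pair
    is already recorded, drop it before recording the finer child pairs;
    otherwise just record the child pairs."""
    if parent_pair in correspondence:
        correspondence.remove(parent_pair)
    correspondence.extend(child_pairs)
    return correspondence

corr = [("ab cd", "ab cd ef")]
replace_parent_pair(corr, ("ab cd", "ab cd ef"), [("ab", "ab"), ("cd", "cd ef")])
```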

In addition, in order to improve the construction effect of the text correspondence, the embodiment of the present application further provides a possible implementation manner of updating the above "first text to be segmented and second text to be segmented", and for convenience of understanding, the following description is made with reference to an example.

As an example, when the "at least one text pair" includes a target text pair, and the target text pair includes a target recognition text segment and a target standard text segment, the updating process of the "first text to be segmented and the second text to be segmented" may specifically include: and determining a first text to be segmented according to the target identification text segment, and determining a second text to be segmented according to the target standard text segment.

That is, after the target text pair is obtained, the target recognition text segment in the target text pair may be determined as a first text to be segmented, and the target standard text segment in the target text pair may be determined as a second text to be segmented, so that the segmentation process for the target recognition text segment and the target standard text segment and the construction process for the text correspondence relationship may be performed subsequently.

It can be seen that, in some cases, for the above "at least one text pair" (e.g., the first text pair (A, B) and the second text pair (A′, B′)), the first text to be segmented and the second text to be segmented may be updated using the text data included in each text pair, so that the segmentation processing of the text data included in each text pair and the construction of the text correspondence may be performed subsequently, thereby effectively improving the completeness of the text correspondence.

In addition, in order to improve the accuracy of the text pairs, the embodiment of the present application further provides a possible implementation manner of determining the above "at least one text pair", and for facilitating understanding, the following description is provided with reference to an example.

As an example, when the "point pair to be used" includes the second recognition segmentation point and the second standard segmentation point, and the "at least one text pair" includes the first text pair (A, B) and the second text pair (A′, B′), the determination process of the "at least one text pair" includes steps 131 to 134:

Step 131: segment the first text to be segmented at the second recognition segmentation point to obtain a front-segment recognition text A and a back-segment recognition text A′.

Step 132: utilizing a second standard segmentation point to segment a second text to be segmented to obtain a front section standard text B and a rear section standard text

Step 133: a first text pair (A, B) is determined based on the preceding paragraph recognition text A and the preceding paragraph standard text B.

Step 134: recognizing text from back-endAnd back paragraph standard textDetermining a second text pair

As can be understood from the related contents of the above steps 131 to 134, in some cases, after the speech recognition text and the standard speech text are segmented by using the point pair to be used to obtain at least one first segmented text segment of the speech recognition text and at least one second segmented text segment of the standard speech text, the at least one first segmented text segment and the at least one second segmented text segment may be combined according to the above "first combination rule" to obtain the above "at least one text pair".
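Steps 131 to 134 can be sketched as follows, treating the segmentation points as character offsets (an assumption for illustration; the application does not fix the representation of a segmentation point):

```python
def split_into_pairs(recognized, standard, rec_cut, std_cut):
    """Steps 131-134 ('first combination rule'): cut both texts at the
    point pair to be used and pair front with front, back with back."""
    front_rec, back_rec = recognized[:rec_cut], recognized[rec_cut:]  # A, A'
    front_std, back_std = standard[:std_cut], standard[std_cut:]      # B, B'
    return [(front_rec, front_std), (back_rec, back_std)]

demo = split_into_pairs("abcd", "wxyz", 2, 3)
```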

In addition, in order to improve the diversity of the text pairs, the embodiment of the present application further provides a possible implementation manner of determining the above "at least one text pair", and for facilitating understanding, the following description is made with reference to an example.

As an example, when the "point pair to be used" includes the second recognition segmentation point and the second standard segmentation point, and the "at least one text pair" includes the first text pair (A, B), the second text pair (A′, B′), the third text pair (A′, B), and the fourth text pair (A, B′), the determination process of the "at least one text pair" includes steps 141 to 146:

Step 141: segment the first text to be segmented at the second recognition segmentation point to obtain a front-segment recognition text A and a back-segment recognition text A′.

Step 142: utilizing a second standard segmentation point to segment a second text to be segmented to obtain a front section standard text B and a rear section standard text

Step 143: a first text pair (A, B) is determined based on the preceding paragraph recognition text A and the preceding paragraph standard text B.

Step 144: recognizing text from back-endAnd back paragraph standard textDetermining a second text pair

Step 145: recognizing text from back-endAnd the standard text B in the previous stage, and determining a third text pair

Step 146: recognizing text A and back segment standard text according to front segmentDetermining a fourth text pair

As can be understood from the related contents of the foregoing steps 141 to 146, in some cases, after the speech recognition text and the standard speech text are segmented by using the point pair to be used to obtain at least one first segmented text segment of the speech recognition text and at least one second segmented text segment of the standard speech text, the at least one first segmented text segment and the at least one second segmented text segment may be combined according to the foregoing "second combination rule" to obtain the foregoing "at least one text pair", so that the "at least one text pair" can traverse the various possible combinations between the "at least one first segmented text segment of the speech recognition text" and the "at least one second segmented text segment of the standard speech text".
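Under the same character-offset assumption, the "second combination rule" of steps 141 to 146 additionally pairs front with back and back with front:

```python
def split_into_four_pairs(recognized, standard, rec_cut, std_cut):
    """Steps 141-146 ('second combination rule'): all four front/back
    combinations of the two cut texts."""
    a, a2 = recognized[:rec_cut], recognized[rec_cut:]  # A, A' (step 141)
    b, b2 = standard[:std_cut], standard[std_cut:]      # B, B' (step 142)
    # first (A, B), second (A', B'), third (A', B), fourth (A, B')
    return [(a, b), (a2, b2), (a2, b), (a, b2)]

demo = split_into_four_pairs("abcd", "wxyz", 2, 2)
```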

In addition, in order to further improve the accuracy of the text correspondence, the embodiment of the present application further provides a possible implementation manner of determining the text correspondence between the speech recognition text and the standard speech text, which may specifically include steps 151 to 152:

Step 151: combine the at least one text pair according to a preset combination mode to determine at least one candidate combination; wherein each candidate combination includes at least one text pair.

It should be noted that the determination process of the "at least one candidate combination" is similar to the determination process of the above "at least one combination to be processed".

Step 152: and determining the text corresponding relation between the speech recognition text and the standard speech text by utilizing the combination to be used in the at least one candidate combination.

The combination to be used refers to the text data that needs to be used to describe the text correspondence between the speech recognition text and the standard speech text more accurately. The embodiments of the present application do not limit the determination process of the "combination to be used"; for example, it may specifically include: randomly selecting one candidate combination from the "at least one candidate combination" as the combination to be used. As another example, it may specifically include: selecting the candidate combination with the highest combination score from the "at least one candidate combination" as the combination to be used.

It should be noted that the determination process of the combination score of the candidate combination is similar to the determination process of the "combination score of the combination to be processed" above.

Based on the above-mentioned related contents of steps 151 to 152, if the above-mentioned "at least one text pair" can traverse various possible combinations between the above-mentioned "at least one first segmented text segment of the speech recognition text" and the above-mentioned "at least one second segmented text segment of the standard speech text", a group of text pairs with the highest combination score can be searched from these text pairs, so that the text correspondence between the speech recognition text and the standard speech text can be determined by using the text data correspondence recorded by the group of text pairs.
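The selection of the highest-scoring candidate combination in step 152 can be sketched as a simple argmax (the scoring function itself is left abstract here, since the combination score is computed as described in the earlier steps; the names are illustrative):

```python
def pick_combination(candidates, combination_score):
    """Step 152: use the candidate combination with the highest
    combination score as the combination to be used."""
    return max(candidates, key=combination_score)

# toy demo: each candidate carries a precomputed score in position 1
demo = pick_combination([("combo1", 0.4), ("combo2", 0.9)],
                        combination_score=lambda c: c[1])
```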

In order to facilitate understanding of the text correspondence relationship construction method provided in the embodiment of the present application, a text correspondence relationship construction process shown in fig. 2 is described as an example. Fig. 2 is a schematic diagram of a text correspondence relationship construction process provided in an embodiment of the present application.

As an example, in Fig. 2, "TD" represents the above "speech recognition text", "TS" represents the above "standard speech text corresponding to the speech recognition text", "TC1" represents text pair 1, "TC2" represents text pair 2, "TC3" represents text pair 3, "TC4" represents text pair 4, "TC5" represents text pair 5, and "TC6" represents text pair 6. In Fig. 2, each black rectangle appearing in "TD" indicates a segmentation point at which "TD" can be segmented.

As shown in Fig. 2, after the speech recognition text TD and the standard speech text TS corresponding to the speech recognition text are obtained, a first round of the iterative update process for the text correspondence may be performed using TD and TS to obtain TC1 and TC2. Then, a round of the iterative update process for the text correspondence is performed on the two text segments included in TC1 to obtain TC3 and TC4, and likewise on the two text segments included in TC2 to obtain TC5 and TC6. Since single-sentence text segments exist in TC3, TC4, and TC6, it can be determined that the above "preset stop condition" is reached, so the iterative update process for the text correspondence ends; at this time, the text correspondence constructed for TD and TS (i.e., the text correspondence shown by TC3, TC4, TC5, and TC6) is obtained.
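The overall iterative process shown in Fig. 2 can be sketched as a recursion that keeps splitting each text pair until the stop condition holds (the midpoint cut and the "segment of at most two characters" stop rule in the demo are toy assumptions; the real method chooses the cut by segmentation score):

```python
def build_correspondence(recognized, standard, find_best_cut, stop):
    """Recursive sketch of Fig. 2: keep splitting each text pair until
    the preset stop condition holds for either segment."""
    if stop(recognized) or stop(standard):
        return [(recognized, standard)]
    rec_cut, std_cut = find_best_cut(recognized, standard)
    pairs = build_correspondence(recognized[:rec_cut], standard[:std_cut],
                                 find_best_cut, stop)
    pairs += build_correspondence(recognized[rec_cut:], standard[std_cut:],
                                  find_best_cut, stop)
    return pairs

# toy demo: cut at the midpoint, stop at segments of at most 2 characters
demo = build_correspondence("abcd", "wxyz",
                            lambda r, s: (len(r) // 2, len(s) // 2),
                            lambda t: len(t) <= 2)
```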

Based on the text correspondence construction method provided by the above method embodiment, the embodiment of the present application further provides a text correspondence construction apparatus, which is described below with reference to the accompanying drawings. For the technical details of the text correspondence construction apparatus provided in the apparatus embodiment, please refer to the above method embodiment.

Referring to fig. 3, the figure is a schematic structural diagram of a text correspondence relationship building apparatus provided in the embodiment of the present application.

The apparatus 300 for building a text correspondence provided in the embodiment of the present application includes:

a point pair determining unit 301, configured to determine at least one segmentation point pair according to at least one first segmentation point of a speech recognition text and at least one second segmentation point of a standard speech text after obtaining the speech recognition text and the standard speech text corresponding to the speech recognition text; wherein the pair of split points comprises one of the first split points and one of the second split points;

a text segmentation unit 302, configured to segment the speech recognition text and the standard speech text by using a point pair to be used in the at least one segmentation point pair to obtain at least one text pair;

a relationship determining unit 303, configured to determine a text correspondence between the speech recognition text and the standard speech text according to the at least one text pair.

In a possible implementation manner, the point pair determining unit 301 is specifically configured to: after a voice recognition text and a standard voice text corresponding to the voice recognition text are obtained, respectively determining a first text to be segmented and a second text to be segmented according to the voice recognition text and the standard voice text; determining at least one segmentation point pair according to at least one third segmentation point of the first text to be segmented and at least one fourth segmentation point of the second text to be segmented; wherein the pair of split points includes one of the third split points and one of the fourth split points;

the text segmentation unit 302 is specifically configured to: segmenting the first text to be segmented and the second text to be segmented by using the point pairs to be used in the at least one segmentation point pair to obtain at least one text pair;

the relationship determining unit 303 is specifically configured to: updating the text corresponding relation between the voice recognition text and the standard voice text according to the at least one text pair; according to the at least one text pair, the first text to be segmented and the second text to be segmented are updated, and the point pair determining unit 301 is returned to continue to execute the step of determining at least one segmentation point pair according to the at least one third segmentation point of the first text to be segmented and the at least one fourth segmentation point of the second text to be segmented until a preset stop condition is reached.

In a possible implementation manner, the determining process of the point pair to be used includes: respectively determining the segmentation score of each segmentation point pair according to the to-be-segmented recognized text object, the to-be-segmented standard text object and each segmentation point pair; and searching the segmentation point pairs meeting preset searching conditions from the at least one segmentation point pair according to the segmentation score of the at least one segmentation point pair to obtain the point pairs to be used.

In one possible embodiment, the at least one segmentation point pair comprises a point pair to be scored; the process for determining the segmentation score of the point pair to be scored comprises the following steps: carrying out segmentation processing on the to-be-segmented recognized text object and the to-be-segmented standard text object by using the to-be-scored point pairs to obtain at least one to-be-compared text pair; and determining the segmentation score of the point pair to be scored according to the text comparison result of the at least one text pair to be compared.

In a possible implementation manner, the at least one text pair to be compared includes a text segment pair to be used, and the text segment pair to be used includes a recognition text segment to be used and a standard text segment to be used; the text comparison result of the text segment pair to be used includes a content comparison score between the recognition text segment to be used and the standard text segment to be used and/or a length comparison score between the recognition text segment to be used and the standard text segment to be used.

In a possible embodiment, the determination process of the content comparison score between the to-be-used identification text segment and the to-be-used standard text segment includes: and determining a content comparison score between the identification text segment to be used and the standard text segment to be used according to the content coverage of the identification text segment to be used to the standard text segment to be used, the content coverage of the identification text object to be cut to the standard text segment to be used, the content coverage of the standard text segment to be used to the identification text segment to be used and the content coverage of the standard text object to be cut to the identification text segment to be used.

In a possible embodiment, the determination process of the content comparison score between the to-be-used identification text segment and the to-be-used standard text segment includes: determining a content coverage score corresponding to the standard text segment to be used according to a ratio of the content coverage of the standard text segment to be used by the identification text segment to be used to the content coverage of the standard text segment to be used by the identification text object to be cut; determining a content coverage score corresponding to the to-be-used identification text segment according to a ratio of the content coverage of the to-be-used identification text segment by the to-be-used standard text segment to the content coverage of the to-be-divided standard text object to the to-be-used identification text segment; and determining a content comparison score between the recognition text segment to be used and the standard text segment to be used according to the product of the content coverage score corresponding to the standard text segment to be used and the content coverage score corresponding to the recognition text segment to be used.

In one possible embodiment, the content coverage determination process includes: performing unit division processing on a first object and a second object respectively according to an i-th division mode to obtain an i-th unit set of the first object and an i-th unit set of the second object, where the i-th division mode performs division by taking i vocabularies as one division unit, i is a positive integer, i ≤ I, I is a positive integer, and I represents the number of division modes; determining the content coverage corresponding to the i-th division mode according to the intersection between the i-th unit set of the first object and the i-th unit set of the second object; and determining the content coverage of the first object to the second object according to the average value of the content coverages corresponding to the 1st division mode through the I-th division mode.

In a possible implementation manner, the process of determining the content coverage corresponding to the i-th division mode includes: counting the vocabularies in the intersection between the i-th unit set of the first object and the i-th unit set of the second object to obtain the number of intersection vocabularies corresponding to the i-th division mode; counting the vocabularies in the i-th unit set of the second object to obtain the number of vocabularies to be compared corresponding to the i-th division mode; and determining the content coverage corresponding to the i-th division mode according to the ratio of the number of intersection vocabularies corresponding to the i-th division mode to the number of vocabularies to be compared corresponding to the i-th division mode.
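The i-unit coverage described above can be sketched as n-gram set intersection (word lists and Python sets are illustrative choices; the application only specifies i-vocabulary units, intersection counting, and averaging over the I division modes):

```python
def ngrams(words, i):
    """i-th division mode: units of i consecutive vocabularies."""
    return {tuple(words[k:k + i]) for k in range(len(words) - i + 1)}

def content_coverage(first, second, max_i=2):
    """Coverage of the second object by the first: for each unit size i,
    |intersection of i-unit sets| / |i-unit set of the second object|,
    averaged over the max_i division modes."""
    scores = []
    for i in range(1, max_i + 1):
        second_set = ngrams(second, i)
        if not second_set:
            continue  # second object shorter than i vocabularies
        scores.append(len(ngrams(first, i) & second_set) / len(second_set))
    return sum(scores) / len(scores) if scores else 0.0

full = content_coverage(["a", "b", "c"], ["a", "b", "c"])
half = content_coverage(["a", "b"], ["a", "c"], max_i=1)
```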

In a possible embodiment, the determination process of the length comparison score between the to-be-used recognized text segment and the to-be-used standard text segment includes: determining a first ratio to be used according to the ratio of the text length of the identification text segment to be used to the text length of the standard text segment to be used; determining a second ratio to be used according to the ratio of the text length of the standard text segment to be used to the text length of the recognized text segment to be used; and carrying out preset data processing on the first ratio to be used and the second ratio to be used to obtain a length comparison score between the identification text segment to be used and the standard text segment to be used.
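For the length comparison score, one possible "preset data processing" of the two ratios is to take the smaller of them, giving a symmetric score in (0, 1] (this particular choice is an assumption; the application leaves the preset data processing open):

```python
def length_comparison_score(recognized, standard):
    """Length comparison: the smaller of the two length ratios (an
    assumed instance of the 'preset data processing')."""
    if not recognized or not standard:
        return 0.0
    return min(len(recognized) / len(standard),
               len(standard) / len(recognized))

demo = length_comparison_score("abcd", "ab")
```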

In a possible implementation manner, the process of determining the segmentation score of the point pair to be scored includes: and determining the segmentation score of the point pair to be scored according to the text comparison result of the at least one text pair to be compared and the adjacent vocabulary comparison result of the point pair to be scored.

In a possible embodiment, the pair of points to be scored includes a first recognition cut point and a first standard cut point, and the result of comparing the adjacent vocabularies of the pair of points to be scored is determined according to the content comparison score between at least one adjacent vocabulary of the first recognition cut point and at least one adjacent vocabulary of the first standard cut point.

In one possible embodiment, the determination of the content comparison score between the at least one neighboring vocabulary of the first recognition cut point and the at least one neighboring vocabulary of the first criterion cut point comprises: determining a left neighbor vocabulary comparison score based on the content coverage of the at least one left neighbor vocabulary of the first recognition cut point to the at least one left neighbor vocabulary of the first standard cut point, the content coverage of the at least one left neighbor vocabulary of the first standard cut point to the at least one left neighbor vocabulary of the first recognition cut point, the content coverage of the at least one neighbor vocabulary of the first standard cut point to the at least one left neighbor vocabulary of the first recognition cut point; determining a right neighbor vocabulary comparison score based on the content coverage of the at least one right neighbor vocabulary of the first recognition cut point to the at least one right neighbor vocabulary of the first standard cut point, the content coverage of the at least one right neighbor vocabulary of the first standard cut point to the at least one right neighbor vocabulary of the first recognition cut point, the content coverage of the at least one neighbor vocabulary of the first standard cut point to the at least one right neighbor vocabulary of the first recognition cut point; and determining a content comparison score between at least one adjacent word of the first identification segmentation point and at least one adjacent word of the first standard segmentation point according to an average value between the comparison score of the left adjacent word and the comparison score of the right adjacent word.

In a possible implementation manner, the determining process of the segmentation score of the point pair to be scored includes: carrying out segmentation processing on the to-be-segmented recognized text object and the to-be-segmented standard text object by using the to-be-scored point pairs to obtain at least one to-be-compared text pair; combining the at least one text pair to be compared according to a preset combination mode to obtain at least one combination to be processed; wherein, the combination to be processed comprises at least one text pair to be compared; determining a combination score of the at least one combination to be processed according to a text comparison result of the at least one text pair to be compared; and carrying out preset statistical analysis processing on the combined score of the at least one combination to be processed to obtain the segmentation score of the point pair to be scored.

In a possible embodiment, when the to-be-processed combination includes a first to-be-processed text pair and a second to-be-processed text pair, the determining of the combined score of the to-be-processed combination includes: and determining a combined score of the combination to be processed according to the text comparison result of the first text pair to be processed and the text comparison result of the second text pair to be processed.

In one possible implementation, the at least one text pair includes a target text pair, and the target text pair includes a target recognition text segment and a target standard text segment; and the updating process of the text corresponding relation comprises the following steps: establishing a corresponding relation between the target identification text segment and the target standard text segment; and adding the corresponding relation between the target identification text segment and the target standard text segment to the text corresponding relation.

In one possible embodiment, the adding the correspondence between the target recognized text segment and the target standard text segment to the text correspondence includes: if the corresponding relation between the first text to be segmented and the second text to be segmented exists in the text corresponding relation, deleting the corresponding relation between the first text to be segmented and the second text to be segmented from the text corresponding relation, and adding the corresponding relation between the target identification text segment and the target standard text segment to the text corresponding relation; and if the corresponding relation between the first text to be segmented and the second text to be segmented does not exist in the text corresponding relation, adding the corresponding relation between the target identification text segment and the target standard text segment to the text corresponding relation.

In one possible embodiment, the at least one text pair includes a target text pair, and the target text pair includes a target recognition text segment and a target standard text segment; and the updating process of the first text to be cut and the second text to be cut comprises the following steps: and determining the first text to be segmented according to the target identification text segment, and determining a second text to be segmented according to the target standard text segment.

In a possible embodiment, the pair of points to be used comprises a second identification cut point and a second standard cut point; and when the at least one text pair comprises a first text pair and a second text pair, the determining of the at least one text pair comprises: segmenting the first text to be segmented by using the second identification segmentation point to obtain a front-segment identification text and a rear-segment identification text; segmenting the second text to be segmented by using the second standard segmentation point to obtain a front section standard text and a rear section standard text; determining the first text pair according to the front section identification text and the front section standard text; and determining the second text pair according to the back segment identification text and the back segment standard text.

In one possible embodiment, the process of determining the text correspondence between the speech recognition text and the standard speech text includes: combining the at least one text pair according to a preset combination mode to determine at least one candidate combination; wherein the candidate combination comprises at least one of the text pairs; and determining the text corresponding relation between the speech recognition text and the standard speech text by utilizing the combination to be used in the at least one candidate combination.

In a possible implementation manner, the text correspondence constructing apparatus 300 further includes:

the information acquisition unit is used for acquiring the voice to be processed; performing voice recognition processing on the voice to be processed to obtain the voice recognition text;

and the text processing unit is used for performing preset processing on the voice recognition text by utilizing the text corresponding relation to obtain a to-be-used voice text corresponding to the to-be-processed voice.

Based on the related content of the text correspondence relationship construction device 300, for the text correspondence relationship construction device 300, after the speech recognition text and the standard speech text corresponding to the speech recognition text are obtained, at least one segmentation point pair is determined according to at least one first segmentation point of the speech recognition text and at least one second segmentation point of the standard speech text, so that each segmentation point pair includes one first segmentation point and one second segmentation point; segmenting the voice recognition text and the standard voice text by utilizing the point pairs to be used in the at least one segmentation point pair to obtain at least one text pair so that the at least one text pair can accurately represent the corresponding relation between at least one first segmentation text segment in the voice recognition text and at least one second segmentation text segment in the standard voice text; finally, according to the at least one text pair, determining a text corresponding relation between the voice recognition text and the standard voice text, so that automatic construction processing can be carried out on the text corresponding relation between the voice recognition text and the standard voice text corresponding to the voice recognition text, adverse effects caused by a manual alignment mode can be effectively avoided, and the processing effect of voice data can be improved.

Further, an embodiment of the present application provides a device, where the device includes a processor and a memory:

the memory is configured to store a computer program;

the processor is configured to execute, according to the computer program, any implementation of the text correspondence construction method provided in the embodiments of the present application.

Further, an embodiment of the present application provides a computer-readable storage medium configured to store a computer program, where the computer program is used to execute any implementation of the text correspondence construction method provided in the embodiments of the present application.

Further, an embodiment of the present application provides a computer program product which, when run on a terminal device, causes the terminal device to execute any implementation of the text correspondence construction method provided in the embodiments of the present application.

It should be understood that, in the present application, "at least one" means one or more, and "a plurality" means two or more. "And/or" describes an association between associated objects and indicates that three relationships may exist; for example, "A and/or B" may indicate that only A exists, that only B exists, or that both A and B exist, where A and B may each be singular or plural. The character "/" generally indicates that the associated objects before and after it are in an "or" relationship. "At least one of the following" or similar expressions refer to any combination of the listed items, including any combination of a single item or of multiple items. For example, "at least one of a, b, or c" may represent: a; b; c; a and b; a and c; b and c; or a, b, and c, where a, b, and c may each be singular or plural.

The foregoing is merely a preferred embodiment of the present invention and is not intended to limit the invention in any way. Although the invention has been described with reference to preferred embodiments, it is not limited thereto. Those skilled in the art may, without departing from the scope of the technical solution of the invention, use the methods and technical content disclosed above to make numerous possible variations and modifications to the technical solution, or to modify it into equivalent embodiments with equivalent variations. Therefore, any simple amendment, equivalent change, or modification made to the above embodiments in accordance with the technical essence of the present invention, without departing from the content of the technical solution of the invention, still falls within the protection scope of the technical solution of the invention.
