Keyword correction method and device, computer equipment and storage medium

Document No.: 1905180    Published: 2021-11-30

Note: This technology, "Keyword correction method and device, computer equipment and storage medium", was designed by 贾亚龙杨洋李锋张琛万化 on 2021-08-11. Main content: The application relates to a keyword correction method, a keyword correction device, computer equipment and a storage medium. The method comprises: screening, from a memory, a plurality of reference texts related to keywords of the text to be processed, and determining the keywords to be corrected in the text to be processed according to the plurality of reference texts; determining candidate reference texts from the plurality of reference texts according to the similarity between the keywords to be corrected and the plurality of reference texts; determining a target reference text according to the phoneme similarity between the keywords to be corrected and the candidate reference texts; and finally correcting the keywords to be corrected based on the target reference text. Working at phoneme granularity, the method determines the target reference text from the differing phoneme similarities between the candidate reference texts and the keywords to be corrected. This ensures that a keyword to be corrected with high similarity to a candidate reference text has a smaller editing distance to it than one with low similarity, and can thereby improve the accuracy of keyword correction.

1. A keyword correction method, the method comprising:

determining keywords to be corrected in the text to be processed according to a plurality of reference texts, wherein the plurality of reference texts are texts related to the text to be processed;

determining candidate reference texts from the plurality of reference texts through first similarities of the keywords to be corrected and the plurality of reference texts;

determining a target reference text according to the phoneme similarity of the keywords to be corrected and the candidate reference text;

and correcting the keywords to be corrected based on the target reference text.

2. The method according to claim 1, wherein the determining candidate reference texts from the plurality of reference texts by the first similarity between the keyword to be corrected and the plurality of reference texts comprises:

respectively comparing the keywords to be corrected with the reference texts to obtain a plurality of comparison results, and determining first similarity between the keywords to be corrected and the reference texts based on the comparison results;

and determining, as the candidate reference texts, the reference texts among the plurality of reference texts whose first similarity with the keywords to be corrected is greater than or equal to a preset first threshold value.

3. The method according to claim 2, wherein the first similarity is a character similarity, the comparing the keyword to be corrected with the reference texts respectively to obtain a plurality of comparison results, and determining the first similarity between the keyword to be corrected and the reference texts based on the comparison results comprises:

comparing, with the character as the comparison unit and following the arrangement sequence of the characters, the characters in the keywords to be corrected with the characters in each of the reference texts to obtain a plurality of editing distances, wherein an editing distance is the number of adjustments required to change the characters in the keywords to be corrected into the characters in the reference text;

and determining the character similarity between the keywords to be corrected and the plurality of reference texts according to the editing distance between the keywords to be corrected and the reference texts.

4. The method according to claim 2, wherein the first similarity is syllable similarity, the comparing the keyword to be corrected with the reference texts respectively obtains a plurality of comparison results, and determining the first similarity between the keyword to be corrected and the reference texts based on the comparison results comprises:

comparing, with the syllable as the comparison unit and following the arrangement sequence of the syllables, each syllable in the keyword to be corrected with each syllable in each of the reference texts to obtain a plurality of editing distances, wherein an editing distance is the number of adjustments required to change each syllable in the keyword to be corrected into each syllable in the reference text;

and determining syllable similarity between the keywords to be corrected and the plurality of reference texts according to the editing distance between the keywords to be corrected and the reference texts.

5. The method according to claim 1, wherein the determining a target reference text according to the phoneme similarity between the keyword to be corrected and the candidate reference text comprises:

respectively inputting each phoneme in the syllables of the keyword to be corrected and each phoneme in the syllables of the candidate reference text into a preset language model for vectorization processing to obtain a plurality of first word vectors of the keyword to be corrected and a plurality of second word vectors of the candidate reference text;

determining phoneme similarity of the keywords to be corrected and the candidate reference texts according to the hyper-parameters, Euclidean distance between the first word vector and the second word vector corresponding to the first word vector and the maximum Euclidean distance;

and determining, as the target reference text, each candidate reference text whose phoneme similarity with the keywords to be corrected is greater than or equal to a preset second threshold value.

6. The method of claim 5, wherein the determining the phoneme similarity of the keyword to be corrected and the candidate reference text according to the hyper-parameter, the Euclidean distance between the first word vector and the second word vector corresponding to the first word vector, and the maximum Euclidean distance comprises:

determining the editing distance between the keyword to be corrected and the candidate reference text according to a formula [formula omitted in source], wherein α is a hyper-parameter, S is the maximum Euclidean distance, n is the dimension of the vector, x_i is a first word vector, and y_i is a second word vector;

and determining the phoneme similarity of the keywords to be corrected and the candidate reference texts according to the editing distance between the keywords to be corrected and the candidate reference texts.

7. The method of claim 6, further comprising:

processing the editing distance between the keyword to be corrected and the candidate reference text according to the editing distance between the keyword to be corrected and the candidate reference text and the syllable length of the keyword to be corrected;

and determining the phoneme similarity of the keywords to be corrected and the candidate reference text according to the editing distance between the processed keywords to be corrected and the target reference text.

8. A keyword correction apparatus, characterized in that the apparatus comprises:

the first determining module is used for determining keywords to be corrected in the text to be processed according to a plurality of reference texts, wherein the plurality of reference texts are texts related to the text to be processed;

a second determining module, configured to determine candidate reference texts from the multiple reference texts according to first similarities between the keywords to be corrected and the multiple reference texts;

a third determining module, configured to determine a target reference text according to the phoneme similarity between the keyword to be corrected and the candidate reference text;

and the correction module is used for correcting the keywords to be corrected based on the target reference text.

9. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor, when executing the computer program, implements the steps of the method of any of claims 1 to 7.

10. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method of any one of claims 1 to 7.

Technical Field

The present application relates to the field of speech recognition technologies, and in particular, to a keyword correction method, apparatus, computer device, and storage medium.

Background

Speech recognition technology converts audio into text through intelligent recognition equipment. With its remarkable progress, it has gradually entered fields such as automotive electronics, medical treatment, finance and consumer electronics. However, because of intrinsic factors such as each speaker's place of birth and pronunciation habits, and extrinsic factors such as signal interference and poor network conditions, the accuracy of speech recognition in actual use is low, which greatly affects the business and work that depend on it.

Existing technology focuses mainly on optimizing speech recognition itself, upgrading the recognition algorithm to achieve higher recognition capability. Little attention has been paid to secondary correction after speech recognition, and existing technology corrects only homophones. In many cases, however, the recognition capability is not what is lacking: existing technology already recognizes standard Mandarin almost perfectly, and the recognition deviations are caused by differences in human pronunciation, environmental interference and the like. These problems are difficult to overcome by improving the recognition capability, or the room for such improvement is extremely limited. Existing methods are therefore of little help in solving the problem of low speech recognition accuracy.

Disclosure of Invention

The application provides a method and a device for correcting keywords, computer equipment and a storage medium, which can improve the accuracy of correcting the keywords.

A first aspect of the present application provides a keyword correction method, including:

determining keywords to be corrected in the text to be processed according to a plurality of reference texts, wherein the plurality of reference texts are texts related to the text to be processed;

determining candidate reference texts from the multiple reference texts through first similarity of the keywords to be corrected and the multiple reference texts;

determining a target reference text according to the phoneme similarity of the keywords to be corrected and the candidate reference text;

and correcting the keywords to be corrected based on the target reference text.

A second aspect of the present application provides a keyword correction apparatus, the apparatus including:

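the first determining module is used for determining keywords to be corrected in the text to be processed according to a plurality of reference texts, wherein the plurality of reference texts are texts related to the text to be processed;

the second determining module is used for determining candidate reference texts from the multiple reference texts according to the first similarity of the keywords to be corrected and the multiple reference texts;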

the third determining module is used for determining a target reference text according to the phoneme similarity of the keywords to be corrected and the candidate reference text;

and the correction module is used for correcting the keywords to be corrected based on the target reference text.

A third aspect of the application provides a computer device comprising a memory storing a computer program and a processor implementing the method steps of any of the above when the processor executes the computer program.

A fourth aspect of the application provides a computer-readable storage medium having stored thereon a computer program which, when being executed by a processor, carries out the method steps of any of the above.

The application provides a keyword correction method, a keyword correction device, computer equipment and a storage medium. The method comprises: screening, from a memory, a plurality of reference texts related to keywords of the text to be processed, and determining the keywords to be corrected in the text to be processed according to the plurality of reference texts; determining candidate reference texts from the plurality of reference texts according to the similarity between the keywords to be corrected and the plurality of reference texts; determining a target reference text according to the phoneme similarity between the keywords to be corrected and the candidate reference texts; and finally correcting the keywords to be corrected based on the target reference text. Because the target reference text is determined from the phoneme similarity between the candidate reference texts and the keywords to be corrected, a keyword to be corrected that is highly similar to a candidate reference text is guaranteed a smaller editing distance to it than one with low similarity. This makes the calculation of the editing distance between the keywords to be corrected and the candidate reference texts more fine-grained and further improves the accuracy of keyword correction.

Drawings

FIG. 1 is a diagram of an application environment of the keyword correction method in one embodiment;

FIG. 2 is a flowchart illustrating a keyword correction method according to an embodiment;

FIG. 3 is a flowchart illustrating a keyword correction method according to another embodiment;

FIG. 4 is a flowchart illustrating a keyword correction method according to another embodiment;

FIG. 5 is a flowchart illustrating a keyword correction method according to another embodiment;

FIG. 6 is a flowchart illustrating a keyword correction method according to another embodiment;

FIG. 7 is a flowchart illustrating a keyword correction method according to another embodiment;

FIG. 8 is a block diagram showing the structure of a keyword correction apparatus according to an embodiment;

FIG. 9 is an internal configuration diagram of a terminal device in one embodiment.

Detailed Description

In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.

The keyword correction method provided by the application can be applied to the application environment shown in FIG. 1. The terminal 102 recognizes the user's voice and converts it into text. Before the recognized text is output, the converted initial text undergoes keyword correction: the category of the text is determined by recognizing the initial text, and a plurality of reference texts related to the keywords in the initial text are retrieved from a pre-stored reference text library. The keyword to be corrected is compared with the plurality of reference texts, and the reference text with the highest similarity to the keyword to be corrected is determined as the candidate reference text. A target editing distance algorithm is then selected according to the similarity between the candidate reference text and the keyword to be corrected, the editing distance between the keyword to be corrected and the candidate reference text is calculated with that algorithm, and whether to correct the keyword based on the candidate reference text is judged from the editing distance. If correction is needed, the keyword to be corrected is replaced with the candidate reference text and the recognized text is output; if not, the initial text is output. The terminal 102 may be, but is not limited to, various personal computers, notebook computers, smart phones, tablet computers, and portable wearable devices.

In an embodiment, as shown in fig. 2, a keyword correction method is provided, which is described by taking the method as an example applied to the terminal in fig. 1, and includes the following steps:

step S202, determining keywords to be corrected in the text to be processed according to a plurality of reference texts, wherein the plurality of reference texts are texts related to the text to be processed.

The reference text can be selected in advance and stored at a memory address in the terminal. The common key characters, keywords and key sentences of a field can be classified in advance by application scenario, so that a common word bank is established for each application scenario; for example, in the financial field, a loan-related thesaurus, a fund-related thesaurus, a savings-related thesaurus, a credit-card-related thesaurus, and the like are set up according to the different application scenarios.

The reference text is correlated with the keyword to be corrected. For example, if identifying several keywords in the text to be processed shows that the text relates to funds in the financial field, the reference texts may be the words, sentences and so on in the fund-related word bank. When the terminal recognizes speech, most recognition errors fall on the key characters or keywords of a sentence, and a user's or terminal's understanding of the text to be processed rests mainly on one or more key characters or keywords in it. Accurate recognition of key characters and keywords is therefore important for guiding subsequent operations.

For example, suppose a user speaks to the terminal and the initial text recognized by the terminal is "I want to buy the Yifangda yue xian ("month line") mixed-type fund". The terminal can determine a target thesaurus from its plurality of thesauruses according to at least one of the keywords Yifangda, yue xian, fund and mixed type. Here the target thesaurus is the fund-related thesaurus, and the characters, words or sentences in it are the plurality of reference texts related to the text to be processed, for example: Yifangda, yue xiang, yue mei, yue bao, robust type, high-risk type, low-risk type, shou yi (income) and pei fu (payout). The keywords to be corrected in the text to be processed can then be determined according to the plurality of reference texts, for example: Yifangda, yue xian and mixed type.
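The thesaurus-selection step in this example can be sketched as follows. This is a minimal illustration, assuming the thesauruses are held in a dictionary keyed by scene and that the recognized text has already been segmented into keywords. The names `LEXICONS` and `select_lexicon`, the trigger sets, and the contents of the loan thesaurus are hypothetical and not part of the application; the fund entries use the romanized names from the example.

```python
# Hypothetical scene thesauruses (illustrative contents only).
LEXICONS = {
    "fund": {
        "triggers": {"fund", "mixed type", "Yifangda"},
        "references": ["Yifangda", "yue xiang", "yue mei", "yue bao",
                       "robust type", "high-risk type", "low-risk type",
                       "shou yi", "pei fu"],
    },
    "loan": {
        "triggers": {"loan", "interest rate", "repayment"},
        "references": ["repayment", "interest rate"],
    },
}


def select_lexicon(keywords):
    """Return (scene, reference_texts) for the scene whose trigger words
    overlap most with the recognized keywords, or (None, []) if none match."""
    best_scene, best_hits = None, 0
    for scene, lexicon in LEXICONS.items():
        hits = len(lexicon["triggers"] & set(keywords))
        if hits > best_hits:
            best_scene, best_hits = scene, hits
    if best_scene is None:
        return None, []
    return best_scene, LEXICONS[best_scene]["references"]


scene, references = select_lexicon(["Yifangda", "yue xian", "fund", "mixed type"])
```

With the keywords of the example, the fund thesaurus is selected and its entries become the reference texts.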

Step S204, determining candidate reference texts from the multiple reference texts according to the first similarity between the keywords to be corrected and the multiple reference texts.

The first similarity between the keyword to be corrected and a reference text reflects how much the two differ: the higher the similarity, the smaller the difference. It can be determined in several ways. For example, the keyword to be corrected and the reference texts can be converted into audio, audio features extracted from each, and the features of the keyword compared with those of each reference text to obtain a feature similarity, from which the first similarity is determined. Alternatively, the editing distance between the keyword to be corrected and each reference text can be calculated. The editing distance can be the number of adjustments required to change the characters of the keyword to be corrected into the characters of the reference text, the number required to change its syllables into the syllables of the reference text, or the number required to change its phonemes into the phonemes of the reference text, which is not limited in the present application.

Illustratively, when yue xian ("month line") is determined as a keyword to be corrected, it is compared with each of the reference texts (Yifangda, yue xiang, yue mei, yue bao, robust type, high-risk type, low-risk type, shou yi and pei fu). The edit distance between yue xian and Yifangda is 3, between yue xian and yue xiang is 2, between yue xian and yue bao is 2, between yue xian and robust type is 3, between yue xian and high-risk type is 4, between yue xian and low-risk type is 4, between yue xian and shou yi is 2, and between yue xian and pei fu is 2. The first similarity may be set to 98% for an editing distance of 0, 90% for an editing distance of 1, 80% for an editing distance of 2, and so on. The reference texts whose first similarity with the keyword to be corrected is greater than or equal to 80% may then be determined as the candidate reference texts, namely yue xiang, yue mei, yue bao, shou yi and pei fu.

Step S206, determining a target reference text according to the phoneme similarity of the keywords to be corrected and the candidate reference text.

A phoneme is obtained by converting each character of the keyword to be corrected into a syllable and then splitting the syllable. The syllable can be split into its initial and final, or into its individual letters; the application does not limit this. One way to determine the phoneme similarity is to split the syllables of the keyword to be corrected and of the candidate reference text into initials and finals, compare each initial and final of the keyword with the corresponding initial and final of the candidate reference text to obtain the editing distance between them, and determine the similarity from that editing distance.

Under one comparison rule, each adjustment of an initial or final of the keyword to be corrected into the corresponding initial or final of the candidate reference text adds 1 to the editing distance. Under another rule, when the initials of the two syllables are the same, each such adjustment adds 0.5; when the initials differ, each adjustment adds 1. In both cases the editing distance between the keyword to be corrected and the candidate reference text is the sum of these values, and the similarity follows the principle that the smaller the editing distance, the greater the similarity. Alternatively, the keyword to be corrected and the candidate reference text can be input into a preset language model for vectorization, the resulting vectors processed with a preset algorithm, and the phoneme similarity determined from the result. The application does not limit the method used.
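The weighted comparison rule (0.5 per adjustment when the initials match, 1 per adjustment when they differ) might be sketched as below. This is one possible reading of the rule, not the application's implementation. It assumes each syllable is already split into an (initial, final) pair and that both texts have the same number of syllables; the function name `weighted_phoneme_distance` is hypothetical. The pinyin syllables are those of the example that follows.

```python
def weighted_phoneme_distance(keyword, reference):
    """keyword / reference: lists of (initial, final) pairs, one per syllable.

    Assumption of this sketch: both lists have the same length.
    """
    if len(keyword) != len(reference):
        raise ValueError("this sketch only handles equal syllable counts")
    distance = 0.0
    for (k_init, k_final), (r_init, r_final) in zip(keyword, reference):
        if k_init == r_init:
            # Same initial: adjusting the final costs 0.5.
            if k_final != r_final:
                distance += 0.5
        else:
            # Different initial: each adjustment costs 1.
            distance += 1.0
            if k_final != r_final:
                distance += 1.0
    return distance


yue_xian = [("y", "ue"), ("x", "ian")]
yue_xiang = [("y", "ue"), ("x", "iang")]
yue_mei = [("y", "ue"), ("m", "ei")]

near = weighted_phoneme_distance(yue_xian, yue_xiang)  # only one final differs
far = weighted_phoneme_distance(yue_xian, yue_mei)     # initial and final differ
```

Under this reading, a near-homophone that differs only in a final is penalized less than a syllable whose initial also differs, which is the refinement the weighted rule is meant to provide.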

For example, yue xian ("month line") and the candidate reference texts can be converted into the syllables yue xian, yue xiang, yue mei, yue bao, shou yi and pei fu, and split into initials and finals as y/ue x/ian, y/ue x/iang, y/ue m/ei, y/ue b/ao, sh/ou y/i and p/ei f/u. First, the y/ue of y/ue x/ian is compared with the y/ue of y/ue x/iang: y is compared with y and ue with ue, giving an edit distance of 0. Next, the x/ian of y/ue x/ian is compared with the x/iang of y/ue x/iang: x is compared with x and ian with iang, giving an edit distance of 1. The edit distance between yue xian and yue xiang is therefore 1. By analogy, the other candidate reference texts are compared with the keyword to be corrected, giving edit distances of 1 between yue xian and yue xiang, 4 between yue xian and yue mei, 4 between yue xian and yue bao, 7 between yue xian and shou yi, and 7 between yue xian and pei fu.
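The distances quoted in this example (1, 4, 4, 7, 7) are consistent with a position-by-position comparison of the letters within each syllable, where a letter with no counterpart counts as one adjustment. The sketch below assumes that reading. The function names are hypothetical, and the choice of the smallest distance as the target is an illustration rather than the thresholding described elsewhere in the application.

```python
from itertools import zip_longest


def syllable_distance(a, b):
    """Count the positions at which two syllables differ, letter by letter.

    A letter with no counterpart in the shorter syllable counts as one
    adjustment (zip_longest pads the shorter syllable with None).
    """
    return sum(1 for x, y in zip_longest(a, b) if x != y)


def phoneme_distance(keyword, reference):
    """Sum the per-syllable distances, pairing syllables in order."""
    pairs = zip_longest(keyword.split(), reference.split(), fillvalue="")
    return sum(syllable_distance(k, r) for k, r in pairs)


candidates = ["yue xiang", "yue mei", "yue bao", "shou yi", "pei fu"]
distances = {c: phoneme_distance("yue xian", c) for c in candidates}
# distances == {"yue xiang": 1, "yue mei": 4, "yue bao": 4,
#               "shou yi": 7, "pei fu": 7}

# Illustrative choice: take the candidate with the smallest distance.
target = min(distances, key=distances.get)  # "yue xiang"
```

A standard Levenshtein distance would give 3 rather than 4 for yue bao, since the shared letter a could be kept, so the positional comparison is what reproduces the figures in the text.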

And S208, correcting the keywords to be corrected based on the target reference text.

As described above, the target reference text is determined from the candidate reference texts, which are drawn from the word bank corresponding to the scene; the characters, words and sentences in that word bank are the accurate ones commonly used in the scene. Because the application scene of the text to be processed matches that of the word bank, and the phonemes of the target reference text are highly similar to those of the keyword to be corrected, it is concluded that the device made an error in speech recognition. The keyword to be corrected is therefore corrected based on the target reference text, and the corrected text is output to the user or reported to the terminal, which performs subsequent operations on it. This improves the terminal's ability to process services.

Illustratively, given the phoneme similarity determined above between yue xiang and the keyword to be corrected yue xian ("month line"), the keyword yue xian is corrected to yue xiang, and the recognized text "I want to buy the Yifangda yue xiang mixed-type fund" is finally output.
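The final correction step amounts to a guarded replacement, which could be sketched as follows. The function `correct_text` and its `max_distance` cut-off are hypothetical illustrations of judging from the editing distance whether to correct; the application does not specify this interface. The sample sentence is the romanized form of the example text.

```python
def correct_text(text, keyword, target, distance, max_distance=1):
    """Replace keyword with target when the phoneme editing distance is
    small enough; otherwise return the text unchanged.

    max_distance is an assumed cut-off for this sketch.
    """
    if distance <= max_distance:
        return text.replace(keyword, target)
    return text


initial = "I want to buy the Yifangda yue xian mixed-type fund"
corrected = correct_text(initial, "yue xian", "yue xiang", distance=1)
unchanged = correct_text(initial, "yue xian", "shou yi", distance=7)
```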

In an embodiment, as shown in fig. 3, this embodiment is an alternative embodiment of a method for determining candidate reference texts, and the method includes the following steps:

step S302, the keywords to be corrected are respectively compared with a plurality of reference texts to obtain a plurality of comparison results, and the first similarity between the keywords to be corrected and the plurality of reference texts is determined based on the comparison results.

The first similarity between the keyword to be corrected and the plurality of reference texts can be determined in several ways. For example, the keyword to be corrected and the reference texts can be converted into audio, audio features extracted from each, and the features of the keyword compared with those of each reference text to obtain a feature similarity, from which the first similarity is determined. Alternatively, the first similarity can be determined by calculating the edit distance between the keyword to be corrected and each of the reference texts. This is not limited in this application.

Step S304, determining a reference text with a first similarity to the keyword to be corrected being greater than or equal to a preset first threshold value in the plurality of reference texts as a candidate reference text.

The preset first threshold is a similarity threshold, for example a similarity of 90%. It may be set to the same or different values for different application scenarios. Its purpose is to provide the criterion for judging whether the keyword to be corrected needs correction. It may be obtained from the results of prior experiments, and the present application is not limited thereto.

In the keyword correction method of this embodiment, the keyword to be corrected is compared with a plurality of reference texts and the candidate reference texts are determined from the comparison results, which achieves a preliminary screening of the reference texts. The comparison is simple and does not occupy many resources. If the similarity determined in this preliminary screening does not reach the preset threshold, the reference text need not go through the subsequent fine screening, which improves the efficiency of keyword correction.
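A minimal sketch of the threshold screening in step S304, assuming the first similarities have already been computed and are stored as fractions between 0 and 1. The function name `select_candidates`, the placeholder reference names and the similarity values are illustrative and not taken from the application.

```python
def select_candidates(similarities, first_threshold=0.90):
    """Keep the reference texts whose first similarity with the keyword
    to be corrected is greater than or equal to the threshold."""
    return [ref for ref, sim in similarities.items() if sim >= first_threshold]


# Illustrative values only.
similarities = {"ref_a": 0.95, "ref_b": 0.90, "ref_c": 0.60}
candidates = select_candidates(similarities)
```

Because the threshold is a parameter, different application scenarios can pass different values, as the paragraph above allows.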

In an embodiment, as shown in fig. 4, this embodiment is an optional method embodiment for determining first similarities between a keyword to be corrected and a plurality of reference texts when the first similarity is a character similarity, and the method includes the following steps:

step S402, comparing the characters in the keywords to be corrected with the characters in the reference texts respectively according to the arrangement sequence of the characters by taking the characters as a comparison unit to obtain a plurality of editing distances, wherein the editing distances are the times required for adjusting the characters in the keywords to be corrected to the characters in the reference texts.

Illustratively, if the keyword to be corrected is "month line", it is compared with each of the reference texts (Everest, Sharpee, beauty, insurance, robust, high risk, low risk, profit, and payment). These names are English renderings of Chinese terms, so the character counts refer to the Chinese originals. The comparison first compares the first character of "month line" with the first character of "Everest", then compares the second character of "month line" with the second character of "Everest", and so on, giving an edit distance of 3 between "month line" and "Everest". By analogy, the edit distance between "month line" and "Sharpee" is 2, between "month line" and "insurance" is 2, between "month line" and "robust" is 3, between "month line" and "high risk" is 4, between "month line" and "low risk" is 4, between "month line" and "profit" is 2, and between "month line" and "payment" is 2.
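
The character-level comparison can be sketched as follows. The sketch assumes that "according to the arrangement order" means a strict position-by-position comparison in which any unmatched trailing character counts as one edit; this reading is an assumption, and a standard Levenshtein distance, which allows shifted alignments, may give smaller values. The function name and the placeholder tokens are hypothetical.

```python
from itertools import zip_longest


def positional_edit_distance(source, target):
    """Count positions at which two sequences differ, comparing in order.

    A position missing from the shorter sequence counts as one edit.
    """
    return sum(1 for s, t in zip_longest(source, target) if s != t)


# Placeholder tokens stand in for individual characters (hypothetical data).
keyword = ["k1", "k2"]
print(positional_edit_distance(keyword, ["r1", "r2", "r3"]))  # three-character reference
print(positional_edit_distance(keyword, ["k1", "r2"]))        # shares the first character
```

Under this reading, a two-character keyword compared with a three-character reference text sharing no characters yields a distance of 3, which is consistent with the example above.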

Step S404, determining character similarity of the keywords to be corrected and a plurality of reference texts according to the editing distance between the keywords to be corrected and the reference texts.

Wherein, suppose the preset correspondence between the edit distance and the first similarity is as follows: an edit distance of 0 corresponds to a first similarity of 98%; an edit distance in the range 0-1, to 95%; an edit distance of 1, to 90%; an edit distance in the range 1-2, to 88%; an edit distance of 2, to 85%; an edit distance in the range 2-3, to 60%; an edit distance of 3, to 40%; an edit distance in the range 3-4, to 30%; an edit distance of 4, to 10%; and so on. Then it can be determined that the character similarity between "month line" and "Everest" is 40%, between "month line" and "Sharpee" is 85%, between "month line" and "robust" is 40%, between "month line" and "high risk" is 40%, between "month line" and "low risk" is 40%, between "month line" and "profit" is 85%, and between "month line" and "payment" is 85%.
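
The preset correspondence can be expressed as a lookup, sketched below. Two points are assumptions: the ranges such as "0-1" are read as distances strictly between the two integers, and distances above 4, which the text covers only with "and so on", are left undefined. The names `EXACT`, `BETWEEN`, and `similarity_from_distance` are hypothetical.

```python
# Exact integer distances and their similarities, as listed in the text.
EXACT = {0: 0.98, 1: 0.90, 2: 0.85, 3: 0.40, 4: 0.10}
# Distances strictly between two integers, keyed by the lower bound (assumed reading).
BETWEEN = {0: 0.95, 1: 0.88, 2: 0.60, 3: 0.30}


def similarity_from_distance(distance):
    """Map an edit distance to a similarity; None when outside the listed table."""
    if distance < 0 or distance > 4:
        return None  # the text only says "and so on" beyond 4
    if distance == int(distance):
        return EXACT[int(distance)]
    return BETWEEN[int(distance)]  # int() truncates, giving the lower bound


print(similarity_from_distance(2))
print(similarity_from_distance(1.5))
```

Non-integer distances can arise when the edit distance is divided by the syllable length of the keyword, as in a later embodiment.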

The embodiment of the application provides a keyword correction method in which the keyword to be corrected is compared with the plurality of reference texts at character granularity, thereby achieving a preliminary screening of the plurality of reference texts. The comparison method is simple and does not occupy many resources, and the probability of correctly correcting the keyword can be improved.

In an embodiment, as shown in fig. 5, this embodiment is an optional method embodiment for determining the first similarity between the keyword to be corrected and the plurality of reference texts when the first similarity is the syllable similarity, and the method includes the following steps:

Step S502, taking the syllable as the comparison unit, comparing each syllable in the keyword to be corrected with each syllable in each of the plurality of reference texts according to the arrangement order of the syllables to obtain a plurality of edit distances, wherein an edit distance is the number of edits required to change the syllables in the keyword to be corrected into the syllables in the reference text.

If the keyword to be corrected is yue xian ("month line"), it is compared with each of the reference texts (yi fang da, yue xiang, yue mei, yue bao, robust, high-risk, low-risk, income and claim). The comparison first converts the keyword and each reference text into syllables. Comparing yue with yi gives an edit distance of 2, comparing xian with fang gives an edit distance of 4, and comparing an empty syllable with da gives an edit distance of 2, so the edit distance between yue xian and yi fang da is 8. By analogy, the other reference texts are compared with the keyword to be corrected in the same manner, so that the edit distance between yue xian and yue xiang is 1, the edit distance between yue xian and yue mei is 4, the edit distance between yue xian and yue bao is 4, and so on, which is not described herein again.
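
The syllable-level comparison can be sketched as follows. As an assumption, syllables are aligned by position and the letters within each aligned pair are compared in order, with any unmatched letter counting as one edit; this reading reproduces the distances quoted above, whereas a standard Levenshtein distance would give 3 rather than 4 for xian versus fang. The function name is hypothetical.

```python
from itertools import zip_longest


def syllable_edit_distance(source, target):
    """Sum letter-by-letter differences over syllables aligned by position."""
    total = 0
    for src_syl, tgt_syl in zip_longest(source, target, fillvalue=""):
        # Within each aligned syllable pair, compare letters in order;
        # a letter with no counterpart counts as one edit.
        total += sum(1 for a, b in zip_longest(src_syl, tgt_syl) if a != b)
    return total


keyword = ["yue", "xian"]
for reference in (["yi", "fang", "da"], ["yue", "xiang"], ["yue", "mei"], ["yue", "bao"]):
    print(" ".join(reference), syllable_edit_distance(keyword, reference))
```

Run on the pinyin of the example, the sketch yields 8, 1, 4 and 4 for yi fang da, yue xiang, yue mei and yue bao respectively.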

Step S504, determining syllable similarity between the keywords to be corrected and the plurality of reference texts according to the editing distance between the keywords to be corrected and the reference texts.

Wherein, suppose the preset correspondence between the edit distance and the first similarity (here, the syllable similarity) is as follows: an edit distance of 0 corresponds to a first similarity of 98%; an edit distance in the range 0-1, to 95%; an edit distance of 1, to 90%; an edit distance in the range 1-2, to 88%; an edit distance of 2, to 85%; an edit distance in the range 2-3, to 60%; an edit distance of 3, to 40%; an edit distance in the range 3-4, to 30%; an edit distance of 4, to 10%; and so on. Then it can be determined that the syllable similarity between yue xian and yi fang da is 1%, between yue xian and yue xiang is 90%, between yue xian and yue mei is 10%, between yue xian and yue bao is 10%, and so on.

The embodiment of the application provides a keyword correction method in which the keyword to be corrected is compared with the plurality of reference texts at syllable granularity, thereby achieving a preliminary screening of the plurality of reference texts. Comparing at syllable granularity determines the edit distance between the keyword to be corrected and each reference text more finely: the edit distance for a reference text of high similarity can be far smaller than that for a reference text of low similarity. The edit distance is therefore calculated more accurately, and the reference texts can be preliminarily screened more accurately.

In an embodiment, as shown in fig. 6, this embodiment is an optional embodiment of a method for determining a target reference text according to a phoneme similarity between a keyword to be corrected and a candidate reference text, and the method includes the following steps:

step S602, respectively inputting each phoneme in the syllables of the keyword to be corrected and each phoneme in the syllables of the candidate reference text into a preset language model for vectorization processing, so as to obtain a plurality of first word vectors of the keyword to be corrected and a plurality of second word vectors of the candidate reference text.

The preset language model is, for example, a Word2vec language model, which is a tool capable of generating word vectors. The phonemes in the syllables of the keyword to be corrected and the phonemes in the syllables of the candidate reference text are input into the preset language model to vectorize the phonemes, yielding initial word vectors and final word vectors for the syllables of the keyword to be corrected and initial word vectors and final word vectors for the syllables of the candidate reference text. Because the keyword to be corrected may include a plurality of characters, processing each of its phonemes with the preset language model yields a plurality of initial word vectors and a plurality of final word vectors, which are collectively referred to as first word vectors; similarly, the plurality of initial word vectors and the plurality of final word vectors of the candidate reference text are collectively referred to as second word vectors.
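
Before vectorization, each syllable has to be split into its initial and final. The sketch below shows one way to do this for toneless pinyin; it is an illustration only. Treating "y" and "w" as initials is a simplifying assumption, the names `INITIALS` and `split_syllable` are hypothetical, and the subsequent embedding with a language model is not shown.

```python
# Standard Hanyu Pinyin initials; two-letter initials are listed first so that
# "zh", "ch" and "sh" are matched before "z", "c" and "s".
INITIALS = ("zh", "ch", "sh", "b", "p", "m", "f", "d", "t", "n", "l", "g", "k",
            "h", "j", "q", "x", "r", "z", "c", "s", "y", "w")


def split_syllable(syllable):
    """Split a toneless pinyin syllable into (initial, final)."""
    for initial in INITIALS:
        if syllable.startswith(initial):
            return initial, syllable[len(initial):]
    return "", syllable  # zero-initial syllables such as "an" or "er"


phonemes = [split_syllable(s) for s in ("yue", "xian")]
print(phonemes)
```

Each resulting initial and final would then be looked up in the trained model to obtain its vector.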

Step S604, determining phoneme similarity of the keyword to be corrected and the candidate reference text according to the hyper-parameter, the Euclidean distance between the first word vector and the second word vector corresponding to the first word vector and the maximum Euclidean distance.

The hyper-parameter may take a value greater than 0 and smaller than 1. Here, the hyper-parameter, the Euclidean distance between the first word vector and the second word vector corresponding to the first word vector, and the maximum Euclidean distance may be combined through a preset calculation relationship, such as summation, subtraction, quotient, product, summation followed by subtraction, or quotient followed by product, to obtain a calculation result, and the phoneme similarity between the keyword to be corrected and the candidate reference text is determined according to the calculation result. This is not limited in this application.
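
One of the calculation relationships listed above, quotient followed by product, can be sketched as follows. This is not the relational expression of the application, which is not reproduced in this text; the combination alpha * d / S, the function names, and the example vectors are assumptions chosen only to show how the three quantities might interact.

```python
import math


def euclidean_distance(x, y):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))


def substitution_cost(x, y, alpha, max_distance):
    """Illustrative cost: alpha * d(x, y) / S (quotient first, then product)."""
    return alpha * euclidean_distance(x, y) / max_distance


# Hypothetical 2-dimensional phoneme vectors; real Word2vec vectors are larger.
vec_a, vec_b = [0.0, 0.0], [3.0, 4.0]
print(substitution_cost(vec_a, vec_b, alpha=0.5, max_distance=10.0))
```

Dividing by the maximum Euclidean distance keeps the quotient between 0 and 1, so with a hyper-parameter below 1 the cost of replacing one phoneme by a similar-sounding one stays below the cost of a full edit.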

Step S606, determining, among the candidate reference texts, each candidate reference text whose second similarity to the keyword to be corrected is greater than or equal to a preset second threshold as a target reference text.

The preset second threshold is also a similarity threshold, and the preset second threshold may be set to be smaller than the preset first threshold, so that the screening range of the target reference text may be further narrowed, and the required target reference text may be screened more quickly.

In an embodiment, the present embodiment is an optional method embodiment for determining a phoneme similarity between a keyword to be corrected and a candidate reference text according to a hyper-parameter, a euclidean distance between a first word vector and a second word vector corresponding to the first word vector, and a maximum euclidean distance, including:

The embodiment of the application provides a keyword correction method that determines the edit distance between the keyword to be corrected and the candidate reference text through the relational expression, so that the closer the vectors are, the smaller the cost of a single edit (insertion, deletion, or substitution) and hence the smaller the edit distance between the keyword to be corrected and the candidate reference text. Based on the characteristics of the Word2vec language model, the closer two pronunciations are, the closer their vector representations are, and the smaller the edit distance obtained from the calculation formula. By contrast, when the edit distance between a keyword to be corrected and a candidate reference text with similar pronunciation is set according to manual experience, the edit distance of each similar-pronunciation pair becomes a separate hyper-parameter, which is not conducive to model tuning; the calculation formula is more consistent with statistical regularities.

In an embodiment, as shown in fig. 7, this embodiment is an alternative method embodiment for processing the edit distance, and the method steps are as follows:

step S702, processing the editing distance between the keyword to be corrected and the candidate reference text according to the editing distance between the keyword to be corrected and the candidate reference text and the syllable length of the keyword to be corrected;

step S704, determining the phoneme similarity between the keyword to be corrected and the candidate reference text according to the processed edit distance between the keyword to be corrected and the candidate reference text.

The processing of the edit distance according to the edit distance between the keyword to be corrected and the candidate reference text and the syllable length of the keyword to be corrected may be, for example, taking the quotient of the edit distance and the syllable length of the keyword to be corrected. Determining the phoneme similarity from the processed edit distance allows the phoneme similarity between the keyword to be corrected and the candidate reference text to be determined more accurately, which helps improve the accuracy of keyword correction.
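
The quotient processing can be sketched as follows. The text does not define "syllable length"; the sketch assumes it is the total number of letters in the keyword's pinyin syllables, and the function name is hypothetical.

```python
def normalize_edit_distance(edit_distance, keyword_syllables):
    """Divide the edit distance by the total letter count of the keyword's syllables."""
    syllable_length = sum(len(s) for s in keyword_syllables)
    if syllable_length == 0:
        raise ValueError("keyword has no syllables")
    return edit_distance / syllable_length


print(normalize_edit_distance(1, ["yue", "xian"]))
```

Normalizing in this way makes distances comparable across keywords of different lengths: one edit in a seven-letter keyword weighs less than one edit in a three-letter keyword.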

It should be understood that although the steps in the flow charts of fig. 2-7 are shown in the order indicated by the arrows, the steps are not necessarily performed in that order. Unless explicitly stated otherwise, the steps are not strictly limited to the order shown and described and may be performed in other orders. Moreover, at least some of the steps in fig. 2-7 may include multiple sub-steps or stages, which are not necessarily performed at the same time but may be performed at different times, and which are not necessarily performed in sequence but may be performed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.

In one embodiment, as shown in fig. 8, there is provided a keyword correction apparatus including: a first determination module, a second determination module, a third determination module, and a correction module, wherein:

the first determining module is used for determining keywords to be corrected in the text to be processed according to a plurality of reference texts, wherein the plurality of reference texts are texts related to the text to be processed;

the second determining module is used for determining candidate reference texts from the plurality of reference texts through first similarities between the keywords to be corrected and the plurality of reference texts;

the third determining module is used for determining a target reference text according to the phoneme similarity of the keywords to be corrected and the candidate reference text;

and the correction module is used for correcting the keywords to be corrected based on the target reference text.

In one embodiment, the second determining module includes: a comparison unit and a first determination unit;

the comparison unit is used for comparing the keywords to be corrected with the plurality of reference texts respectively to obtain a plurality of comparison results, and determining first similarity between the keywords to be corrected and the plurality of reference texts based on the comparison results;

the first determining unit is used for determining, among the plurality of reference texts, the reference texts whose first similarity to the keywords to be corrected is greater than or equal to a preset first threshold value as candidate reference texts.

In one embodiment, the comparing unit is further configured to compare the characters in the keyword to be corrected with the characters in the multiple reference texts respectively according to the arrangement order of the characters by using the characters as a comparison unit to obtain multiple editing distances, where the editing distances are times required for adjusting the characters in the keyword to be corrected to the characters in the reference texts;

and the first determining unit is also used for determining the character similarity between the keywords to be corrected and the plurality of reference texts according to the editing distance between the keywords to be corrected and the reference texts.

In one embodiment, the comparing unit is further configured to compare, by using the syllable as a comparison unit, each syllable in the keyword to be corrected with each syllable in the plurality of reference texts according to the arrangement order of the syllables, so as to obtain a plurality of edit distances, where the edit distances are times required for adjusting each syllable in the keyword to be corrected to each syllable in the reference texts;

and the first determining unit is also used for determining the syllable similarity of the keyword to be corrected and the plurality of reference texts according to the editing distance between the keyword to be corrected and the reference texts.

In one embodiment, the third determining module comprises a processing unit, a second determining unit and a third determining unit;

the processing unit is used for respectively inputting each phoneme in the syllables of the keyword to be corrected and each phoneme in the syllables of the candidate reference text into a preset language model for vectorization processing to obtain a plurality of first word vectors of the keyword to be corrected and a plurality of second word vectors of the candidate reference text;

the second determining unit is used for determining the phoneme similarity of the keyword to be corrected and the candidate reference text according to the hyper-parameter, the Euclidean distance between the first word vector and a second word vector corresponding to the first word vector and the maximum Euclidean distance;

and the third determining unit is used for determining, among the candidate reference texts, the candidate reference texts whose second similarity to the keywords to be corrected is greater than or equal to a preset second threshold value as the target reference texts.

In one embodiment, the second determining unit is further configured to determine the editing distance between the keyword to be corrected and the candidate reference text according to a relational expression [formula omitted in source], wherein alpha is the hyper-parameter, S is the maximum Euclidean distance, n is the dimension of the vectors, x_i is the first word vector, and y_i is the second word vector; and to determine the phoneme similarity of the keywords to be corrected and the candidate reference texts according to the editing distance between the keywords to be corrected and the candidate reference texts.

In one embodiment, the apparatus further comprises a processing module;

the processing module is used for processing the editing distance between the keyword to be corrected and the candidate reference text according to the editing distance between the keyword to be corrected and the candidate reference text and the syllable length of the keyword to be corrected;

and the third determining module is also used for determining the phoneme similarity between the keywords to be corrected and the candidate reference texts according to the processed editing distance between the keywords to be corrected and the candidate reference texts.

For the specific definition of the keyword correction apparatus, reference may be made to the above definition of the keyword correction method, which is not described herein again. The modules in the keyword correction apparatus may be wholly or partially implemented by software, hardware, or a combination thereof. The modules may be embedded, in hardware form, in a processor of the computer device or be independent of it, or may be stored, in software form, in a memory of the computer device, so that the processor can call them and execute the operations corresponding to each module.

In one embodiment, a computer device is provided, which may be a terminal, and its internal structure diagram may be as shown in fig. 9. The computer device includes a processor, a memory, a communication interface, a display screen, and an input device connected by a system bus. Wherein the processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device comprises a nonvolatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program. The internal memory provides an environment for the operation of an operating system and computer programs in the non-volatile storage medium. The communication interface of the computer device is used for carrying out wired or wireless communication with an external terminal, and the wireless communication can be realized through WIFI, an operator network, NFC (near field communication) or other technologies. The computer program is executed by a processor to implement a keyword correction method. The display screen of the computer equipment can be a liquid crystal display screen or an electronic ink display screen, and the input device of the computer equipment can be a touch layer covered on the display screen, a key, a track ball or a touch pad arranged on the shell of the computer equipment, an external keyboard, a touch pad or a mouse and the like.

Those skilled in the art will appreciate that the architecture shown in fig. 9 is merely a block diagram of some of the structures associated with the disclosed aspects and is not intended to limit the computing devices to which the disclosed aspects apply, as a particular computing device may include more or fewer components than those shown, may combine certain components, or may have a different arrangement of components.

In one embodiment, a computer device is provided, comprising a memory and a processor, the memory having a computer program stored therein, the processor implementing the following steps when executing the computer program:

determining keywords to be corrected in the text to be processed according to a plurality of reference texts, wherein the plurality of reference texts are texts related to the text to be processed;

determining candidate reference texts from the multiple reference texts through first similarity of the keywords to be corrected and the multiple reference texts;

determining a target reference text according to the phoneme similarity of the keywords to be corrected and the candidate reference text;

and correcting the keywords to be corrected based on the target reference text.

In one embodiment, the processor, when executing the computer program, further performs the steps of: respectively comparing the keywords to be corrected with a plurality of reference texts to obtain a plurality of comparison results, and determining first similarity between the keywords to be corrected and the plurality of reference texts based on the comparison results; and determining the reference texts with the first similarity to the keywords to be corrected being more than or equal to a preset first threshold value in the plurality of reference texts as candidate reference texts.

In one embodiment, when the first similarity is a character similarity, the processor executes the computer program to further perform the following steps: comparing the characters in the keywords to be corrected with the characters in the reference texts respectively according to the arrangement sequence of the characters by taking the characters as a comparison unit to obtain a plurality of editing distances, wherein the editing distances are the times required by adjusting the characters in the keywords to be corrected into the characters in the reference texts; and determining the character similarity of the keywords to be corrected and the plurality of reference texts according to the editing distance between the keywords to be corrected and the reference texts.

In one embodiment, when the first similarity is a syllable similarity, the processor when executing the computer program further performs the steps of: comparing each syllable in the keyword to be corrected with each syllable in the reference texts according to the arrangement sequence of the syllables by taking the syllables as a comparison unit to obtain a plurality of editing distances, wherein the editing distances are the times required by adjusting each syllable in the keyword to be corrected into each syllable in the reference texts; and determining syllable similarity between the keywords to be corrected and the plurality of reference texts according to the editing distance between the keywords to be corrected and the reference texts.

In one embodiment, the processor, when executing the computer program, further performs the steps of: respectively inputting each phoneme in the syllables of the keyword to be corrected and each phoneme in the syllables of the candidate reference text into a preset language model for vectorization processing to obtain a plurality of first word vectors of the keyword to be corrected and a plurality of second word vectors of the candidate reference text; determining phoneme similarity of the keywords to be corrected and the candidate reference text according to the hyper-parameters, Euclidean distance between the first word vector and a second word vector corresponding to the first word vector and the maximum Euclidean distance; and determining the candidate texts with the second similarity to the keywords to be corrected being more than or equal to a preset second threshold value as target reference texts.

In one embodiment, the processor, when executing the computer program, further performs the steps of: determining the editing distance between the keyword to be corrected and the candidate reference text according to a relational expression [formula omitted in source], wherein alpha is the hyper-parameter, S is the maximum Euclidean distance, n is the dimension of the vectors, x_i is the first word vector, and y_i is the second word vector; and determining the phoneme similarity of the keywords to be corrected and the candidate reference texts according to the editing distance between the keywords to be corrected and the candidate reference texts.

In one embodiment, the processor, when executing the computer program, further performs the steps of: processing the editing distance between the keyword to be corrected and the candidate reference text according to the editing distance between the keyword to be corrected and the candidate reference text and the syllable length of the keyword to be corrected; and determining the phoneme similarity of the keywords to be corrected and the candidate reference text according to the processed editing distance between the keywords to be corrected and the candidate reference text.

In one embodiment, a computer-readable storage medium is provided, having a computer program stored thereon, which when executed by a processor, performs the steps of:

determining keywords to be corrected in the text to be processed according to a plurality of reference texts, wherein the plurality of reference texts are texts related to the text to be processed;

determining candidate reference texts from the multiple reference texts through first similarity of the keywords to be corrected and the multiple reference texts;

determining a target reference text according to the phoneme similarity of the keywords to be corrected and the candidate reference text;

and correcting the keywords to be corrected based on the target reference text.

In one embodiment, the computer program when executed by the processor further performs the steps of: respectively comparing the keywords to be corrected with a plurality of reference texts to obtain a plurality of comparison results, and determining first similarity between the keywords to be corrected and the plurality of reference texts based on the comparison results; and determining the reference texts with the first similarity to the keywords to be corrected being more than or equal to a preset first threshold value in the plurality of reference texts as candidate reference texts.

In one embodiment, when the first similarity is a character similarity, the computer program when executed by the processor further performs the steps of: comparing the characters in the keywords to be corrected with the characters in the reference texts respectively according to the arrangement sequence of the characters by taking the characters as a comparison unit to obtain a plurality of editing distances, wherein the editing distances are the times required by adjusting the characters in the keywords to be corrected into the characters in the reference texts; and determining the character similarity of the keywords to be corrected and the plurality of reference texts according to the editing distance between the keywords to be corrected and the reference texts.

In one embodiment, when the first similarity is a syllable similarity, the computer program when executed by the processor further performs the steps of: comparing each syllable in the keyword to be corrected with each syllable in the reference texts according to the arrangement sequence of the syllables by taking the syllables as a comparison unit to obtain a plurality of editing distances, wherein the editing distances are the times required by adjusting each syllable in the keyword to be corrected into each syllable in the reference texts; and determining syllable similarity between the keywords to be corrected and the plurality of reference texts according to the editing distance between the keywords to be corrected and the reference texts.

In one embodiment, the computer program when executed by the processor further performs the steps of: respectively inputting each phoneme in the syllables of the keyword to be corrected and each phoneme in the syllables of the candidate reference text into a preset language model for vectorization processing to obtain a plurality of first word vectors of the keyword to be corrected and a plurality of second word vectors of the candidate reference text; determining phoneme similarity of the keywords to be corrected and the candidate reference text according to the hyper-parameters, Euclidean distance between the first word vector and a second word vector corresponding to the first word vector and the maximum Euclidean distance; and determining the candidate texts with the second similarity to the keywords to be corrected being more than or equal to a preset second threshold value as target reference texts.

In one embodiment, the computer program when executed by the processor further performs the steps of: determining the editing distance between the keyword to be corrected and the candidate reference text according to a relational expression [formula omitted in source], wherein alpha is the hyper-parameter, S is the maximum Euclidean distance, n is the dimension of the vectors, x_i is the first word vector, and y_i is the second word vector; and determining the phoneme similarity of the keywords to be corrected and the candidate reference texts according to the editing distance between the keywords to be corrected and the candidate reference texts.

In one embodiment, the computer program when executed by the processor further performs the steps of: processing the editing distance between the keyword to be corrected and the candidate reference text according to the editing distance between the keyword to be corrected and the candidate reference text and the syllable length of the keyword to be corrected; and determining the phoneme similarity of the keywords to be corrected and the candidate reference text according to the processed editing distance between the keywords to be corrected and the candidate reference text.

It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program instructing the relevant hardware. The computer program can be stored in a non-volatile computer-readable storage medium and, when executed, can include the processes of the embodiments of the methods described above. Any reference to memory, storage, database or other medium used in the embodiments provided herein can include at least one of non-volatile and volatile memory. Non-volatile memory may include Read-Only Memory (ROM), magnetic tape, floppy disk, flash memory, optical storage, or the like. Volatile memory can include Random Access Memory (RAM) or external cache memory. By way of illustration and not limitation, RAM can take many forms, such as Static Random Access Memory (SRAM) or Dynamic Random Access Memory (DRAM), among others.

The technical features of the above embodiments can be arbitrarily combined, and for the sake of brevity, all possible combinations of the technical features in the above embodiments are not described, but should be considered as the scope of the present specification as long as there is no contradiction between the combinations of the technical features.

The above-mentioned embodiments express only several embodiments of the present application, and their description is relatively specific and detailed, but they should not be construed as limiting the scope of the invention. It should be noted that a person skilled in the art can make several variations and modifications without departing from the concept of the present application, all of which fall within the scope of protection of the present application. Therefore, the protection scope of the present patent shall be subject to the appended claims.
