New word discovery method and device, electronic device and storage medium

Document No.: 1889941 Publication date: 2021-11-26

Reading note: This technology, "New word discovery method and device, electronic device and storage medium", was designed by 陈诚陈守志董井然张�杰曾令英梁浩强孙雨豪 on 2021-03-01. Main content: The present disclosure provides a new word discovery method and apparatus, an electronic device, and a computer-readable storage medium, and relates to the field of computer technology. The new word discovery method comprises: acquiring a plurality of text data, each text data having a type label; extracting new words from the plurality of text data to obtain one or more new words; determining a target type label among the type labels, and calculating a relevance index between each new word and the target type label; and determining, according to the relevance index, a target new word associated with the target type label among the new words. By introducing the relevance index, the present disclosure can improve the relevance of the new word discovery result to the target type while maintaining the accuracy of new word discovery.

1. A method for discovering new words, comprising:

acquiring a plurality of text data, wherein each text data has a type label;

extracting new words from the text data to obtain one or more new words;

determining a target type label among the type labels, and calculating a relevance index between the new word and the target type label;

and determining a target new word associated with the target type label in the new words according to the relevance index.

2. The method according to claim 1, wherein the extracting new words from the text data to obtain one or more new words comprises:

dividing each text data into a plurality of sentences, and extracting a plurality of candidate words from each sentence;

calculating an adjacent character richness index and an internal solidity index of each candidate word;

and when the adjacent character richness index and the internal solidity index are greater than a preset richness threshold and a preset solidity threshold, respectively, taking the candidate word as the new word.

3. The method according to claim 2, wherein the extracting a plurality of candidate words from each sentence comprises:

extracting a plurality of the candidate words from each sentence in a plurality of different character lengths, respectively.

4. The method according to claim 2, wherein the calculating the adjacent character richness index and the internal solidity index of each candidate word comprises:

acquiring left and right adjacent character sets of the candidate word from the text data, and calculating the information entropy between the candidate word and the left and right adjacent character sets, respectively, to obtain the adjacent character richness index;

and calculating pointwise mutual information inside the candidate word to obtain the internal solidity index.

5. The method for discovering new words according to claim 4, wherein the calculating the information entropy between the candidate word and the left and right adjacent character sets comprises:

calculating, by formula, the information entropy E(w) between the candidate word and the left and right adjacent character sets; wherein w_nei is the left and right adjacent character set, and w is the candidate word.

6. The method according to claim 4, wherein the calculating pointwise mutual information inside the candidate word to obtain the internal solidity index comprises:

calculating, by formula, the internal solidity index PMI(x, y); wherein x and y are text segments in the candidate word.

7. The method according to claim 1, wherein the calculating a relevance index between the new word and the target type label comprises:

calculating a first index based on the target type label, the new word, and the sentences, the first index representing a probability that the new word belongs to the target type label;

calculating a second index based on the type labels other than the target type label, the new word, and the sentences, the second index representing a probability that the new word belongs to the other type labels;

and calculating the ratio of the first index to the second index to obtain the relevance index.

8. The method of claim 7, wherein the calculating a first index based on the target type label, the new word, and the sentences comprises:

counting a first number of the sentences belonging to the target type label and a second number of the sentences belonging to the target type label and containing the new word;

and calculating the ratio of the second number to the first number to obtain the first index.

9. The method according to claim 7, wherein the calculating a second index based on the type labels other than the target type label, the new word, and the sentences comprises:

counting a third number of the sentences belonging to the other type labels and a fourth number of the sentences belonging to the other type labels and containing the new word;

and calculating the ratio of the fourth number to the third number to obtain the second index.

10. The method according to claim 1, wherein the determining a target new word associated with the target type label in the new words according to the relevance index comprises:

sorting the new words according to the relevance index, and determining a plurality of target new words according to the sorting result.

11. The method for discovering new words according to claim 10, wherein the determining a plurality of target new words according to the sorting result comprises:

taking the new words ranked within a target number at the front of the sorting result as the target new words.

12. The method for discovering new words according to claim 10, wherein the determining a plurality of target new words according to the sorting result comprises:

taking the new words in the sorting result whose relevance index is greater than a preset relevance threshold as the target new words.

13. A new word discovery apparatus, comprising:

the text acquisition module is used for acquiring a plurality of text data, and each text data has a type label;

the new word extraction module is used for extracting new words from the text data to obtain one or more new words;

the relevance calculation module is used for determining a target type label among the type labels and calculating a relevance index between the new word and the target type label;

and the target new word discovery module is used for determining a target new word associated with the target type label in the new words according to the relevance index.

14. A computer-readable medium, on which a computer program is stored which, when being executed by a processor, carries out the method according to any one of claims 1 to 12.

15. An electronic device, comprising:

one or more processors;

storage means for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to carry out the method of any one of claims 1 to 12.

Technical Field

The present disclosure relates to the field of computer technologies, and in particular, to a new word discovery method, a new word discovery apparatus, an electronic device, and a computer-readable storage medium.

Background

Social, economic, cultural, and technological development and change influence language, and the most visible effect is the emergence of new words. New words come from every area of production and daily life: a new word may be an Internet neologism, a word that arises in the course of production and business operations, or a word specific to a certain industry or field. How to identify new words quickly and effectively is therefore increasingly important in text processing and information mining. In the related art, only metrics along the new-word judgment dimension are considered, so the result of new word discovery may be irrelevant to the target task.

It is to be noted that the information disclosed in the above background section is only for enhancement of understanding of the background of the present disclosure, and thus may include information that does not constitute prior art known to those of ordinary skill in the art.

Disclosure of Invention

An object of the present disclosure is to provide a new word discovery method, a new word discovery apparatus, an electronic device, and a computer-readable storage medium, which overcome, at least to some extent, the problem that a result of new word discovery is not related to a target task due to limitations and disadvantages of the related art.

According to a first aspect of the present disclosure, there is provided a new word discovery method, including:

acquiring a plurality of text data, wherein each text data has a type label;

extracting new words from the text data to obtain one or more new words;

determining a target type label among the type labels, and calculating a relevance index between the new word and the target type label;

and determining a target new word associated with the target type label in the new words according to the relevance index.

According to a second aspect of the present disclosure, there is provided a new word discovery apparatus including:

the text acquisition module is used for acquiring a plurality of text data, and each text data has a type label;

the new word extraction module is used for extracting new words from the text data to obtain one or more new words;

the relevance calculation module is used for determining a target type label among the type labels and calculating a relevance index between the new word and the target type label;

and the target new word discovery module is used for determining a target new word associated with the target type label in the new words according to the relevance index.

In an exemplary embodiment of the disclosure, the new word extraction module performs new word extraction on the plurality of text data to obtain one or more new words by performing the following steps:

dividing each text data into a plurality of sentences, and extracting a plurality of candidate words from each sentence;

calculating an adjacent character richness index and an internal solidity index of each candidate word;

and when the adjacent character richness index and the internal solidity index are greater than a preset richness threshold and a preset solidity threshold, respectively, taking the candidate word as the new word.

In an exemplary embodiment of the present disclosure, the new word extraction module extracts a plurality of candidate words from each of the sentences by performing the following steps:

extracting a plurality of the candidate words from each sentence in a plurality of different character lengths, respectively.

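In an exemplary embodiment of the disclosure, the new word extraction module calculates the adjacent character richness index and the internal solidity index of each candidate word by performing the following steps:

acquiring left and right adjacent character sets of the candidate word from the text data, and calculating the information entropy between the candidate word and the left and right adjacent character sets, respectively, to obtain the adjacent character richness index;

and calculating pointwise mutual information inside the candidate word to obtain the internal solidity index.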

In an exemplary embodiment of the disclosure, the new word extraction module calculates the information entropy between the candidate word and the left and right adjacent character sets by performing the following steps:

calculating, by formula, the information entropy E(w) between the candidate word and the left and right adjacent character sets; wherein w_nei is the left and right adjacent character set, and w is the candidate word.

In an exemplary embodiment of the disclosure, the new word extraction module calculates pointwise mutual information inside the candidate word to obtain the internal solidity index by performing the following steps:

calculating, by formula, the internal solidity index PMI(x, y); wherein x and y are text segments in the candidate word.
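For orientation, the two indexes are conventionally written as follows. This is a sketch under the standard definitions of information entropy and pointwise mutual information, not necessarily the exact expressions of the disclosure: the set symbol W_nei (the observed left or right adjacent characters of w) and the probabilities P(·), estimated from corpus frequencies, are notational assumptions.

```latex
E(w) = -\sum_{w_{nei} \in W_{nei}} P(w_{nei} \mid w)\,\log P(w_{nei} \mid w)

\mathrm{PMI}(x, y) = \log \frac{P(x, y)}{P(x)\,P(y)}
```

Under these definitions, a larger E(w) means the candidate word appears next to more varied characters, and a larger PMI(x, y) means the segments x and y co-occur more often than chance would predict.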

In an exemplary embodiment of the present disclosure, the relevance calculation module calculates the relevance index between the new word and the target type label by performing the following steps:

calculating a first index based on the target type label, the new word, and the sentences, the first index representing a probability that the new word belongs to the target type label;

calculating a second index based on the type labels other than the target type label, the new word, and the sentences, the second index representing a probability that the new word belongs to the other type labels;

and calculating the ratio of the first index to the second index to obtain the relevance index.

In an exemplary embodiment of the present disclosure, the relevance calculation module calculates a first index based on the target type label, the new word, and the sentences by performing the following steps:

counting a first number of the sentences belonging to the target type label and a second number of the sentences belonging to the target type label and containing the new word;

and calculating the ratio of the second number to the first number to obtain the first index.

In an exemplary embodiment of the present disclosure, the relevance calculation module calculates a second index based on the type labels other than the target type label, the new word, and the sentences by performing the following steps:

counting a third number of the sentences belonging to the other type labels and a fourth number of the sentences belonging to the other type labels and containing the new word;

and calculating the ratio of the fourth number to the third number to obtain the second index.
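The counting steps above can be sketched in Python as follows. This is a minimal illustration, not the disclosed implementation: the function name `relevance_index`, the `(text, label)` sentence representation, substring matching for "contains the new word", and the small `eps` added to the denominator to avoid division by zero are all assumptions.

```python
def relevance_index(sentences, new_word, target_label, eps=1e-9):
    """Ratio of the first index to the second index for one new word.

    sentences: list of (sentence_text, type_label) pairs (assumed layout).
    eps: hypothetical smoothing term; the source does not say how a zero
         second index is handled.
    """
    # First number: sentences carrying the target type label.
    first_number = sum(1 for _, label in sentences if label == target_label)
    # Second number: target-label sentences that contain the new word.
    second_number = sum(
        1 for text, label in sentences
        if label == target_label and new_word in text
    )
    # Third number: sentences carrying any other type label.
    third_number = sum(1 for _, label in sentences if label != target_label)
    # Fourth number: other-label sentences that contain the new word.
    fourth_number = sum(
        1 for text, label in sentences
        if label != target_label and new_word in text
    )

    first_index = second_number / first_number if first_number else 0.0
    second_index = fourth_number / third_number if third_number else 0.0
    return first_index / (second_index + eps)
```

A word that appears in most target-label sentences but rarely elsewhere gets a large value; a word spread evenly across labels gets a value near 1.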

In an exemplary embodiment of the disclosure, the target new word discovery module determines a target new word associated with the target type label from the new words according to the relevance index by performing the following steps:

sorting the new words according to the relevance index, and determining a plurality of target new words according to the sorting result.

In an exemplary embodiment of the present disclosure, the target new word discovery module determines a plurality of the target new words according to the sorting result by performing either of the following steps:

acquiring the new words ranked within a target number at the front of the sorting result as the target new words;

or acquiring the new words in the sorting result whose relevance index is greater than a preset relevance threshold as the target new words.
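Both selection strategies can be sketched with one small helper. The function name `select_target_new_words` and the dictionary of scores are hypothetical; the sketch assumes a relevance index has already been computed for each new word and that sorting is in descending order.

```python
def select_target_new_words(scores, target_number=None, threshold=None):
    """Pick target new words from {new_word: relevance_index}.

    target_number: keep only the first N words of the descending sort.
    threshold: keep only words whose index is strictly greater than it.
    Either argument may be omitted; both names are illustrative.
    """
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    if target_number is not None:
        ranked = ranked[:target_number]
    if threshold is not None:
        ranked = [(word, score) for word, score in ranked if score > threshold]
    return [word for word, _ in ranked]
```

For example, `select_target_new_words(scores, target_number=20)` applies the top-N rule, while `select_target_new_words(scores, threshold=2.0)` applies the threshold rule.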

According to a third aspect of the present disclosure, there is provided an electronic device comprising: a processor; and a memory for storing executable instructions of the processor; wherein the processor is configured to perform the method of any one of the above via execution of the executable instructions.

According to a fourth aspect of the present disclosure, there is provided a computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the method of any one of the above.

According to a fifth aspect of the present disclosure, there is provided a computer program product or computer program comprising computer instructions stored in a computer-readable storage medium. A processor of a computer device reads the computer instructions from the computer-readable storage medium and executes them, so that the computer device performs the new word discovery method provided in the above embodiments.

Exemplary embodiments of the present disclosure may have some or all of the following benefits:

In the new word discovery method provided in an example embodiment of the present disclosure, a plurality of text data are acquired, each text data having a type label; new words are extracted from the text data to obtain one or more new words; a target type label is determined among the type labels, and a relevance index between the new word and the target type label is calculated; and a target new word associated with the target type label is determined among the new words according to the relevance index. On the one hand, determining a target type label among the type labels ties new word discovery to the target type, so that new words related to a specific task scenario can be acquired. On the other hand, introducing a relevance index associated with the target type label allows the required target new words to be selected according to their degree of relevance.

It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.

Drawings

The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and together with the description, serve to explain the principles of the disclosure. It is to be understood that the drawings in the following description are merely exemplary of the disclosure, and that other drawings may be derived from those drawings by one of ordinary skill in the art without the exercise of inventive faculty.

FIG. 1 is a schematic diagram illustrating an exemplary system architecture to which a new word discovery method and apparatus according to an embodiment of the present disclosure may be applied;

FIG. 2 schematically illustrates a flow diagram of a new word discovery method according to one embodiment of the present disclosure;

FIG. 3 schematically illustrates a flow diagram for new word extraction for a plurality of text data according to one embodiment of the present disclosure;

FIG. 4 schematically illustrates a flow of determining new words from an adjacent character richness index and an internal solidity index, according to one embodiment of the present disclosure;

FIG. 5 schematically shows a flow of computing a relevance indicator between a new word and a target type tag in one embodiment according to the present disclosure;

FIG. 6 schematically shows a flow diagram for calculating a probability that a new word belongs to a target type tag in accordance with one embodiment of the present disclosure;

FIG. 7 schematically illustrates a flow diagram for calculating probabilities of new words belonging to other types of tags in accordance with an embodiment of the present disclosure;

FIG. 8 schematically illustrates a flow chart of a new word discovery method according to one application scenario of the present disclosure;

FIG. 9 schematically illustrates a block diagram of a new word discovery apparatus according to one embodiment of the present disclosure;

FIG. 10 illustrates a schematic structural diagram of a computer system suitable for use with the electronic device used to implement embodiments of the present disclosure;

FIG. 11 schematically shows a structural diagram of a distributed system applied to a blockchain system according to an embodiment of the present disclosure.

Detailed Description

Example embodiments will now be described more fully with reference to the accompanying drawings. Example embodiments may, however, be embodied in many different forms and should not be construed as limited to the examples set forth herein; rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concept of example embodiments to those skilled in the art. The described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. In the following description, numerous specific details are provided to give a thorough understanding of embodiments of the disclosure. One skilled in the relevant art will recognize, however, that the subject matter of the present disclosure can be practiced without one or more of the specific details, or with other methods, components, devices, steps, and the like. In other instances, well-known technical solutions have not been shown or described in detail to avoid obscuring aspects of the present disclosure.

Furthermore, the drawings are merely schematic illustrations of the present disclosure and are not necessarily drawn to scale. The same reference numerals in the drawings denote the same or similar parts, and thus their repetitive description will be omitted. Some of the block diagrams shown in the figures are functional entities and do not necessarily correspond to physically or logically separate entities. These functional entities may be implemented in the form of software, or in one or more hardware modules or integrated circuits, or in different networks and/or processor devices and/or microcontroller devices.

Fig. 1 is a schematic diagram illustrating a system architecture of an exemplary application environment to which a new word discovery method and apparatus according to an embodiment of the present disclosure may be applied.

As shown in fig. 1, the system architecture 100 may include one or more of terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 serves as a medium for providing communication links between the terminal devices 101, 102, 103 and the server 105. Network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, to name a few. The terminal devices 101, 102, 103 may be various electronic devices having a display screen, including but not limited to desktop computers, portable computers, smart phones, tablet computers, and the like. It should be understood that the number of terminal devices, networks, and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation. For example, server 105 may be a server cluster comprised of multiple servers, or the like.

The new word discovery method provided by the embodiment of the present disclosure may be executed by the server 105, and accordingly, a new word discovery apparatus may be disposed in the server 105. The new word discovery method provided by the embodiment of the present disclosure may also be executed by the terminal devices 101, 102, and 103, and accordingly, the new word discovery apparatus may be disposed in the terminal devices 101, 102, and 103. The new word discovery method provided by the present disclosure may also be executed by the terminal devices 101, 102, and 103 and the server 105 together, and accordingly, the new word discovery apparatus may be disposed in the terminal devices 101, 102, and 103 and the server 105, which is not particularly limited in this exemplary embodiment.

For example, in the present exemplary embodiment, the server 105 may acquire a plurality of text data through the terminal devices 101, 102, 103, each text data having a type label; then extract new words from the acquired text data to obtain one or more new words; determine a target type label among the type labels, and calculate a relevance index between the new word and the target type label; and finally determine a target new word associated with the target type label among the new words according to the relevance index.

Artificial Intelligence (AI) is the theory, methods, techniques, and application systems that use a digital computer, or a machine controlled by a digital computer, to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use that knowledge to obtain the best results. In other words, artificial intelligence is a comprehensive branch of computer science that attempts to understand the essence of intelligence and to produce a new kind of intelligent machine that can react in a manner similar to human intelligence. Artificial intelligence studies the design principles and implementation methods of various intelligent machines, so that the machines have the functions of perception, reasoning, and decision making.

Artificial intelligence is a comprehensive discipline that covers a wide range of fields, including both hardware-level and software-level technologies. Basic artificial intelligence technologies generally include sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing, operating/interaction systems, mechatronics, and the like.

Natural Language Processing (NLP) is an important direction in the fields of computer science and artificial intelligence. It studies theories and methods that enable effective communication between humans and computers using natural language. Natural language processing is a science that integrates linguistics, computer science, and mathematics. Research in this field involves natural language, that is, the language people use every day, so it is closely related to the study of linguistics. Natural language processing techniques typically include text processing, semantic understanding, machine translation, robot question answering, knowledge graphs, and the like.

Machine Learning (ML) is a multi-disciplinary field that draws on probability theory, statistics, approximation theory, convex analysis, algorithm complexity theory, and other disciplines. It specializes in studying how a computer can simulate or implement human learning behavior in order to acquire new knowledge or skills and reorganize existing knowledge structures so as to continuously improve its own performance. Machine learning is the core of artificial intelligence and the fundamental way to make computers intelligent, and it is applied across all fields of artificial intelligence. Machine learning and deep learning generally include techniques such as artificial neural networks, belief networks, reinforcement learning, transfer learning, inductive learning, and learning from demonstration.

Social, economic, cultural, and technological development and change give rise to new words, for example Internet neologisms, words that arise in production and business operations, or words specific to a certain industry or field. The development of artificial intelligence technologies such as natural language processing and machine learning has helped advance research on this process of change. Within this research, how to recognize new words quickly and effectively in text processing and information mining is an important direction.

The related art implements new word discovery based on information entropy. For example, a new word can be extracted from text data using two measures: adjacent character richness and internal solidity. The adjacent character richness index measures whether a candidate word in the text data has rich left and right adjacent characters; that is, whether the candidate is a word is decided by judging whether it appears in a rich variety of language contexts. The internal solidity index measures how far the co-occurrence of the characters that make up the candidate word is from being accidental.

Although the above method can extract new words from text data, it considers only the adjacent character richness and internal solidity of the candidate words and ignores the relevance between the candidate words and the field or category, so the extracted new words may be of little use in practice. For example, in black and gray market scenarios related to financial risk control, new words in a particular fraud field need to be discovered in order to uncover (new) fraud techniques and channels. In such a scenario, the words found must not only be new but also be closely related to the target field or category.

In order to solve the problems existing in the above method, in the present exemplary embodiment, the inventor proposes a new technical solution, and the technical solution of the embodiments of the present disclosure is explained in detail below:

the present exemplary embodiment first provides a new word discovery method, which specifically includes the following steps, as shown in fig. 2:

step S210: acquiring a plurality of text data, wherein each text data has a type label;

step S220: extracting new words from the text data to obtain one or more new words;

step S230: determining a target type label among the type labels, and calculating a relevance index between the new word and the target type label;

step S240: and determining a target new word associated with the target type label in the new words according to the relevance index.

In the new word discovery method provided by an example embodiment of the present disclosure, on one hand, by determining a target type tag among various types of tags, the new word discovery may be associated with a target type, so that a new word related to a specific task scenario may be acquired. On the other hand, a relevance index associated with the target type label is introduced, so that the required target new words associated with the target type label can be acquired according to the relevance degree.

The above steps of the present exemplary embodiment are explained in more detail below:

in step S210, a plurality of text data are acquired, and each text data has a type tag.

In the present exemplary embodiment, the text data is text material related to a certain field. For example, the text data may relate to the field of risk control, and its source may be user complaint text or text obtained through other channels such as questionnaires, which is not limited in this exemplary embodiment.

Each text data has a corresponding type label indicating the type of the text data. Taking the risk control field above as an example, the type labels of the text data may include different types such as friend-making fraud, financial fraud, fake-order rebate fraud, and the like. It should be noted that the above scenario is only an exemplary illustration, and the scope of protection of the exemplary embodiment is not limited thereto.

In step S220, new word extraction is performed on the plurality of text data to obtain one or more new words.

After the text data is obtained, the new word discovery method provided by the present exemplary embodiment first extracts new words from the text data, so that new words related to the target type can be obtained in later steps. New words are newly coined words that emerge as modern society develops and the environment keeps changing. For example, a new word may be a proper noun, an abbreviation, a popular expression, or another type of word that is not in the word segmenter's vocabulary and therefore causes segmentation errors, which is not particularly limited in this exemplary embodiment. A word segmenter splits a string of text into individual terms and normalizes each term.

In the present exemplary embodiment, the process of extracting new words from the text data to obtain one or more new words may be implemented, for example, by the flow shown in fig. 3, which may include the following steps:

in step S310, each text data is divided into a plurality of sentences, and a plurality of candidate words are extracted from each sentence.

In the present exemplary embodiment, the text data described above contains a plurality of sentences each of which is composed of a plurality of candidate words. The candidate words are text fragments which are to be processed in the text data and are possibly new words.

The process of dividing each text data into a plurality of sentences may be performed according to the punctuation marks in the text data. For example, the text data may be split at punctuation marks into a sentence set [S₁, S₂, …, S_n], where the type label of each sentence in the sentence set is the same as the type label of the text data to which the sentence belongs.
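A minimal Python sketch of this punctuation-based split follows. The delimiter set in `_SENTENCE_DELIMITERS` and the function name `split_into_sentences` are assumptions for illustration; the source does not list which punctuation marks are used.

```python
import re

# Assumed delimiter set (Chinese and ASCII punctuation plus newlines).
_SENTENCE_DELIMITERS = r"[。！？；，、,.!?;\n]+"


def split_into_sentences(text, type_label):
    """Split one text into sentences; each inherits the text's type label."""
    parts = re.split(_SENTENCE_DELIMITERS, text)
    # Drop empty fragments left by leading or trailing punctuation.
    return [(part.strip(), type_label) for part in parts if part.strip()]
```

Each returned pair keeps the label of the originating text, which matches the statement that a sentence shares the type label of the text data it belongs to.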

The above process of extracting multiple candidate words from each sentence can be implemented as follows: a plurality of candidate words are extracted from each sentence with a plurality of different character lengths, respectively. For example, the text segments in the sentences may be extracted as candidate words in units of tuples with different lengths, where the length of the tuple is the character length of the candidate word. Specifically, the 2-tuples, 3-tuples, …, n-tuples in each sentence can be extracted as candidate words, where the value of n is determined according to the actual scene and is generally 4. Taking the sentence "natural language processing algorithm" (自然语言处理算法 in Chinese) and 2-tuples as an example, the extracted 2-tuples are all pairs of adjacent characters: [自然, 然语, 语言, 言处, 处理, 理算, 算法]. It should be noted that the above scenario is only an exemplary illustration, and the scope of protection of the exemplary embodiment is not limited thereto.

In this exemplary embodiment, each extracted candidate word has a type tag that is the same as that of the sentence and the text data to which the candidate word belongs; specifically, the type tag of the text data or sentence may be mapped to each candidate word. Taking the 2-tuples of the sentence "natural language processing algorithm" (自然语言处理算法), namely [自然, 然语, 语言, 言处, 处理, 理算, 算法], as an example, if the type tag of the sentence and text data to which they belong is friend-making fraud, all of these candidate words carry the friend-making fraud type tag. It should be noted that the above scenario is only an exemplary illustration, and the scope of protection of the exemplary embodiment is not limited thereto.
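A minimal sketch of the tuple extraction and label mapping described above, assuming character-level tuples of length 2 to n. The function name `extract_candidates` and the parameter `max_n` are illustrative names, not taken from the disclosure.

```python
def extract_candidates(sentence, label, max_n=4):
    """Return every character 2-tuple ... max_n-tuple of `sentence`.

    Each candidate is paired with the type label of the sentence,
    mirroring the label mapping described in the text.
    """
    candidates = []
    for n in range(2, max_n + 1):
        # Slide a window of n characters across the sentence.
        for start in range(len(sentence) - n + 1):
            candidates.append((sentence[start:start + n], label))
    return candidates


# 2-tuples of the example sentence "natural language processing algorithm".
bigrams = [
    word
    for word, _ in extract_candidates("自然语言处理算法", "friend-making fraud", max_n=2)
]
```

With the default `max_n=4`, the same eight-character sentence yields 7 + 6 + 5 = 18 candidates, all carrying the sentence's label.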

In step S320, the neighborhood word richness index and the internal solidity index of each candidate word are calculated.

In the present exemplary embodiment, after the candidate words are obtained, it is necessary to determine, based on entropy, whether each candidate word is a new word. Specifically, this can be done by calculating the neighborhood word richness index and the internal solidity index of each candidate word.

The above-mentioned neighborhood word richness index is used to measure the richness of the language environment of a candidate word, that is, the degree of freedom with which the candidate word combines externally with other text segments. The neighborhood word richness is explained in detail by taking the two candidate words "cup" and "generation" as examples: the candidate word "cup" can appear in many different expressions, such as "washing cup", "buying cup", "glass cup" and "designing cup"; the candidate word "generation", by contrast, has only fixed usages, such as "one generation", "this generation", "previous generation" and "next generation". It can be seen that the text segments that can appear to the left of the candidate word "generation" are limited, so "generation" can be considered not to be a word on its own; the actual words are whole units such as "one generation" and "this generation". The neighborhood word richness index thus measures how freely a text segment is used, and is also an important criterion for judging whether the text segment forms a word.

In the present exemplary embodiment, since a text segment that can form a word should be able to appear flexibly in various different contexts and therefore have rich left and right adjacent word sets, the neighborhood word richness index can be calculated from the information entropy of the left and right adjacent word sets of a candidate word: the left and right adjacent word sets of the candidate word are acquired from the text data, and the information entropy between the candidate word and each of the left and right adjacent word sets is calculated to obtain the neighborhood word richness index.

The information entropy is a measure of the amount of information, and the higher the information entropy is, the richer the information amount is, and the larger the uncertainty is. When the left adjacent word set or the right adjacent word set of the candidate word is obtained, specifically, the left entropy corresponding to the left adjacent word set or the right entropy corresponding to the right adjacent word set may be calculated by the following formula:

$$E(w) = -\sum_{w_n \in w_{nei}} P(w_n \mid w)\,\log P(w_n \mid w)$$

where E(w) is the left entropy or the right entropy, w_nei is the left adjacent word set or the right adjacent word set, w_n is an adjacent word in that set, P(w_n | w) is the probability that w_n appears adjacent to w, and w is a candidate word. It should be noted that the above scenario is only an exemplary illustration, and the scope of protection of the exemplary embodiment is not limited thereto.
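The left and right entropy can be sketched as below. This is an illustrative sketch under stated assumptions: adjacent "words" are taken to be single adjacent characters, the natural logarithm is used, and the helper names `neighbor_entropy` and `left_right_entropy` are hypothetical. The source does not say how the left and right entropies are combined into one richness index, so the sketch returns both.

```python
import math
from collections import Counter


def neighbor_entropy(neighbors):
    """Shannon entropy of the empirical distribution of `neighbors`."""
    counts = Counter(neighbors)
    total = sum(counts.values())
    # p * log(1 / p) equals -p * log(p) and keeps every term non-negative.
    return sum((c / total) * math.log(total / c) for c in counts.values())


def left_right_entropy(sentences, word):
    """Collect the characters adjacent to each occurrence of `word`
    and return (left entropy, right entropy)."""
    left, right = [], []
    for sentence in sentences:
        start = sentence.find(word)
        while start != -1:
            if start > 0:
                left.append(sentence[start - 1])
            end = start + len(word)
            if end < len(sentence):
                right.append(sentence[end])
            start = sentence.find(word, start + 1)
    return neighbor_entropy(left), neighbor_entropy(right)


# Hypothetical mini-corpus: "杯子" (cup) with three distinct neighbours on each side.
left_h, right_h = left_right_entropy(["洗杯子了", "买杯子吧", "玻璃杯子好"], "杯子")
```

One common way to obtain a single score is to take the smaller of the two entropies, but that choice is an assumption rather than something stated in the text.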

In this example embodiment, the internal solidity index is used to measure the non-contingency of the candidate word forming a word, that is, the degree to which its components occur together more often than chance would predict. In an actual scene, a plurality of characters that frequently occur together are likely to form a word, but on this basis the non-contingency of the word formation also needs to be considered. For example, in a news corpus of 19 million characters, character combinations containing "of" or "in" occur very frequently, yet such combinations obviously do not constitute words.

Taking the two candidate words "singer" and "of singing" as an example, the non-contingency of word formation is explained as follows: assuming that "singer" appears 117 times and "of singing" appears 275 times in the plurality of text data, "of singing" would be more likely to be a word in terms of frequency alone. However, suppose the character combination is regarded as a random event. "Singer" is composed of "singing" and "person", which occur 2502 and 32272 times in the text, respectively, so if the two were combined purely by chance, the probability of occurrence of "singer" would be P(singing) × P(person) = 2.24 × 10^-7, whereas the actual probability of occurrence of "singer" is 27 times this calculated probability. Similarly, the probability of "of" and "singing" being combined by chance is P(of) × P(singing) = 3.07 × 10^-6, and the actual probability of occurrence of "of singing" is only 4.7 times that. Therefore, "of singing" occurs mainly because the common word "of" frequently appears together with other words, while "singer" is a genuine Chinese word whose components are highly cohesive. It should be noted that the above scenario is only an exemplary illustration, and does not limit the scope of protection of the exemplary embodiment.
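The arithmetic in this example can be checked with a few lines of code. The sketch assumes the counts come from the 19-million-character corpus mentioned in the preceding paragraph; the text does not state the corpus size for this example explicitly, but the quoted figures are consistent with it. The function name is illustrative.

```python
CORPUS_CHARS = 19_000_000  # assumption: corpus size taken from the preceding example


def observed_over_expected(count_whole, count_x, count_y, total=CORPUS_CHARS):
    """Compare the observed probability of a combination with the
    probability expected if its two parts co-occurred by chance."""
    p_observed = count_whole / total
    p_expected = (count_x / total) * (count_y / total)
    return p_observed / p_expected, p_expected


# "singer" = "singing" + "person": 117 occurrences vs. 2502 and 32272.
ratio, expected = observed_over_expected(117, 2502, 32272)
# expected is about 2.24e-7 and ratio is about 27.5, matching the text.
```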

In the present exemplary embodiment, the internal solidity index may be calculated by quantifying the degree of cohesion inside the candidate word. For example, the internal solidity index may be obtained by calculating the pointwise mutual information (PMI) inside the candidate word. Specifically, the internal solidity index PMI(x, y) may be calculated by the following formula:

$$\mathrm{PMI}(x, y) = \log \frac{p(x, y)}{p(x)\,p(y)}$$

where x and y are text segments in the candidate word, p(x, y) is the probability that x and y occur together as the candidate word, and p(x) and p(y) are the probabilities that x and y occur individually. Taking the above-mentioned "singer" as an example, the internal solidity index of the candidate word may be calculated as

$$\mathrm{PMI}(\text{singing}, \text{person}) = \log \frac{P(\text{singer})}{P(\text{singing})\,P(\text{person})}$$

It should be noted that the above scenario is only an exemplary illustration, and the scope of protection of the exemplary embodiment is not limited thereto.
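A small sketch of the pointwise mutual information computed from raw counts. Two things here are assumptions rather than statements from the source: the natural logarithm is used, and for candidates longer than two characters the helper `internal_solidity` takes the minimum PMI over all two-way splits, which is one common convention. All function and parameter names are illustrative.

```python
import math


def pmi(count_xy, count_x, count_y, total):
    """Pointwise mutual information of segments x and y from raw counts."""
    p_xy = count_xy / total
    p_x = count_x / total
    p_y = count_y / total
    return math.log(p_xy / (p_x * p_y))


def internal_solidity(word, counts, total):
    """Minimum PMI over every split of `word` into a left and right part.

    `counts` maps each text segment (the word and all of its parts)
    to its occurrence count. Taking the minimum is an assumption.
    """
    return min(
        pmi(counts[word], counts[word[:k]], counts[word[k:]], total)
        for k in range(1, len(word))
    )


# "singer" example from the text, assuming a 19-million-character corpus.
singer_pmi = pmi(117, 2502, 32272, 19_000_000)
```

Exponentiating `singer_pmi` recovers the ratio of roughly 27 between the observed and chance probabilities discussed above.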

In step S330, when the neighboring word richness index and the internal solidity index are respectively greater than the corresponding preset richness threshold and the preset solidity threshold, the candidate word is taken as a new word.

In the present exemplary embodiment, after the neighborhood word richness index and the internal solidity index are calculated in step S320, the new words among the candidate words are determined based on the calculated neighborhood word richness index and internal solidity index. For example, the new words may be screened out by a preset richness threshold and a preset solidity threshold. The screening process, as shown in fig. 4, may include the following steps:

in step S410, it is determined whether the above-mentioned neighboring word richness index is greater than a preset richness threshold.

In this step, the external word-forming freedom of the candidate word is judged; specifically, it is determined whether the neighborhood word richness index calculated in step S320 is greater than the preset richness threshold. If yes, proceed to step S420; otherwise, jump to step S440.

In step S420, it is determined whether the internal solidity index is greater than a preset solidity threshold.

In this step, the internal solidity of the candidate word is judged; specifically, it is determined whether the internal solidity index calculated in step S320 is greater than the preset solidity threshold. If yes, proceed to step S430; otherwise, jump to step S440.

In step S430, the candidate word is determined as a new word.

In step S440, the candidate word is discarded.

It should be noted that the above-mentioned flow steps in fig. 4 are only an exemplary description, and the scope of protection of the present exemplary embodiment is not limited thereto.
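The two-stage screening of fig. 4 can be sketched as follows. The function names, the dictionary layout, and the sample scores and thresholds are hypothetical and chosen only for illustration.

```python
def is_new_word(richness, solidity, richness_threshold, solidity_threshold):
    """Apply steps S410 to S440 to one candidate word."""
    if richness <= richness_threshold:   # S410 fails: go to S440 (discard)
        return False
    if solidity <= solidity_threshold:   # S420 fails: go to S440 (discard)
        return False
    return True                          # S430: keep as a new word


def filter_new_words(scored, richness_threshold, solidity_threshold):
    """`scored` maps candidate -> (richness index, solidity index)."""
    return [
        word
        for word, (richness, solidity) in scored.items()
        if is_new_word(richness, solidity, richness_threshold, solidity_threshold)
    ]


# Hypothetical scores and thresholds.
scored = {"alpha": (2.1, 5.0), "beta": (0.3, 6.0), "gamma": (2.5, 1.0)}
new_words = filter_new_words(scored, richness_threshold=1.0, solidity_threshold=3.0)
```

Only the candidate that exceeds both thresholds survives; a candidate failing either check is discarded, as in step S440.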

In step S230, a target type tag is determined among the type tags, and a correlation index between the new word and the target type tag is calculated.

In the present exemplary embodiment, in order to make the extracted new words relevant to the task to be processed, after the new words are extracted in step S220, new words related to the task may be further screened by calculating the degree of correlation between each new word and the type tag of the target task. The target type tag is the tag related to the task to be processed and is determined among the type tags. For example, assuming that the type tags of the text data acquired in step S210 include three types, namely friend-making fraud, financial fraud, and order-brushing rebate fraud, and the current task is to identify clues of friend-making fraud crimes, the target type tag should be determined as the friend-making fraud type tag among the three. It should be noted that the above scenario is only an exemplary illustration, and the scope of protection of the exemplary embodiment is not limited thereto.

The relevancy index is used for measuring the degree of correlation between an extracted new word and the target type tag. For example, in order to make the calculation result more accurate, the relevancy index may be calculated from both the relevancy between the new word and the target type tag and the relevancy between the new word and the other type tags except the target type tag. Specifically, as shown in fig. 5, the calculation may include the following steps:

in step S510, a first index for indicating a probability that a new word belongs to a target type tag is calculated based on the target type tag, the new word, and the sentence.

In this exemplary embodiment, the first index is used to indicate the probability that the new word belongs to the target type tag. The method for calculating the first index may be implemented by the steps shown in fig. 6, for example:

in step S610, a first number of sentences belonging to the target type tag and a second number of sentences belonging to the target type tag and containing the new word are counted.

In this step, the first number is a total number of sentences belonging to the target type tag in the plurality of text data, and the second number is a total number of sentences belonging to the target type tag and including the new word in the plurality of text data. The step is used for counting the first number and the second number.

In step S620, a ratio of the second number to the first number is calculated to obtain the first index.

It should be noted that the above scenario is only an exemplary illustration, and the scope of protection of the exemplary embodiment is not limited thereto.

In step S520, a second index for indicating a probability that the new word belongs to the other type tags is calculated based on the other type tags except the target type tag, the new word, and the sentence.

In this exemplary embodiment, the second index is used to indicate the probability that the new word belongs to a type tag other than the target type tag. For example, if the type tags of the acquired text data comprise three types, namely friend-making fraud, financial fraud and order-brushing rebate fraud, and the target type tag is friend-making fraud, then the other type tags are financial fraud and order-brushing rebate fraud. The second index may be calculated, for example, by the steps shown in fig. 7:

in step S710, a third number of sentences belonging to other types of tags and a fourth number of sentences belonging to other types of tags and containing the new word are counted.

In this step, the third number is a total number of sentences belonging to other types of tags in the plurality of text data, and the fourth number is a total number of sentences belonging to other types of tags and including the new word in the plurality of text data. The step is used for counting the third number and the fourth number.

In step S720, a ratio of the fourth number to the third number is calculated to obtain the second index.

In step S530, a ratio of the first index and the second index is calculated to obtain a correlation index.

In the exemplary embodiment, the relevance index calculated through steps S510 to S530 takes into account both the probability that the new word belongs to the target type tag and the probability that it belongs to other type tags. Compared with considering only the probability that the new word belongs to the target type tag, this avoids the situation in which that probability is high merely because the new word is itself a common word, thereby improving the reliability and accuracy of the relevance calculation. It should be noted that the above scenario is only an exemplary illustration, and the relevance index between the new word and the target type tag may also be calculated in other manners, which is not limited in this exemplary embodiment.
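Steps S510 to S530 can be sketched as a single counting pass. The sketch assumes each sentence carries exactly one type tag, and its handling of zero counts (raising an error, or returning infinity when the word never appears under other tags) is an assumption, since the text does not address those cases. The function name and sample data are hypothetical.

```python
def relevance_index(new_word, sentences, target_label):
    """Ratio of the first index to the second index for one new word.

    `sentences` is an iterable of (sentence text, type tag) pairs.
    """
    first_number = second_number = 0   # target-tag sentences / those containing the word
    third_number = fourth_number = 0   # other-tag sentences / those containing the word
    for text, label in sentences:
        contains = new_word in text
        if label == target_label:
            first_number += 1
            second_number += contains
        else:
            third_number += 1
            fourth_number += contains
    if first_number == 0 or third_number == 0:
        raise ValueError("need sentences both with and without the target tag")
    first_index = second_number / first_number    # steps S610 and S620
    second_index = fourth_number / third_number   # steps S710 and S720
    if second_index == 0:
        # Assumption: a word absent from all other tags is maximally relevant.
        return float("inf") if first_index > 0 else 0.0
    return first_index / second_index             # step S530


# Hypothetical labelled sentences: "T" is the target tag, "O" any other tag.
sample = [("a x", "T"), ("b x", "T"), ("c", "T"),
          ("d x", "O"), ("e", "O"), ("f", "O"), ("g", "O")]
score = relevance_index("x", sample, "T")  # (2/3) / (1/4)
```

A word that is equally frequent under every tag scores close to 1, which is how the ratio discounts common words.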

In step S240, a target new word associated with the target type tag is determined among the new words according to the relevance index.

In this exemplary embodiment, after the relevance index between the new word and the target type tag is obtained through calculation, the target new words related to the target type tag may be determined among the new words according to actual needs based on the relevance index. Specifically, the new words may be sorted according to the relevancy index, and a plurality of target new words may be determined according to the sorting result. For example, the top target number of new words in the sorting result may be taken as the target new words, where the target number is the required number of new words related to the target type; alternatively, the new words whose relevancy index is greater than a preset relevancy threshold may be taken as the target new words, which is not particularly limited in this exemplary embodiment.

Next, the flow of the new word discovery method is described in full by taking the risk control application scenario shown in fig. 8 as an example. As shown in fig. 8, the specific application scenario includes the following steps:
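Both selection strategies can be sketched in one helper. The function name, the parameter names `top_k` and `min_score`, and the sample scores are hypothetical.

```python
def select_target_new_words(scores, top_k=None, min_score=None):
    """Rank new words by relevancy index and keep the requested subset.

    `scores` maps new word -> relevancy index. `top_k` keeps the first
    k words of the ranking; `min_score` keeps words scoring above it.
    """
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    if min_score is not None:
        ranked = [(word, score) for word, score in ranked if score > min_score]
    if top_k is not None:
        ranked = ranked[:top_k]
    return [word for word, _ in ranked]


# Hypothetical relevancy scores.
scores = {"a": 3.2, "b": 10.0, "c": 0.8, "d": 5.5}
top_two = select_target_new_words(scores, top_k=2)
above_one = select_target_new_words(scores, min_score=1.0)
```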

in step S810, a plurality of text data is acquired.

In this step, a plurality of text data each having a type tag is acquired. The application scenario is a risk control scenario, and the acquired text data are a plurality of text data respectively related to friend-making fraud, financial fraud and order-brushing rebate fraud. For example, the text data may be user complaint texts, and the corresponding type tags may be friend-making fraud, financial fraud, order-brushing rebate fraud and the like.

In step S820, the text data is divided into a plurality of sentences.

In this step, the plurality of text data acquired as described above are divided by punctuation into a set of sentences [S₁, S₂, …, S_n], and the type label label_{s_i} corresponding to each sentence S_i is recorded, i.e., the type label of the text data to which the sentence belongs; one sentence may correspond to one or more type labels.

In step S830, candidate words in each sentence are extracted.

In this step, candidate words are extracted from each sentence with different character lengths. For example, the 2-tuples, 3-tuples, …, n-tuples in each sentence can be extracted as candidate words, and in this specific application scenario the value of n is 4. Taking the 2-tuples of the sentence "natural language processing algorithm" (自然语言处理算法 in Chinese) as an example, the extracted candidate words include: [自然, 然语, 语言, 言处, 处理, 理算, 算法]. Meanwhile, the type tag corresponding to the sentence is mapped to each candidate word. For example, when the above-mentioned "natural language processing algorithm" appears in text data of friend-making fraud, the 2-tuple "自然" ("nature") carries the "friend-making fraud" tag.

In step S840, the neighborhood richness index and the internal solidity index of the candidate word are calculated.

In this step, the neighborhood word richness index and the internal solidity index of each candidate word are calculated to obtain the corresponding neighborhood word richness score score₁ and internal solidity score score₂.

When the neighborhood word richness index is calculated, since a text segment that can form a word should be able to appear flexibly in various different contexts and therefore have rich left and right adjacent word sets, the index can be obtained from the information entropy of the left and right adjacent word sets of a candidate word: the left and right adjacent word sets of the candidate word are acquired from the text data, and the information entropy between the candidate word and each of the left and right adjacent word sets is calculated by the following formula to obtain the neighborhood word richness index:

$$E(w) = -\sum_{w_n \in w_{nei}} P(w_n \mid w)\,\log P(w_n \mid w)$$

where E(w) is the left entropy or the right entropy, w_nei is the left adjacent word set or the right adjacent word set, w_n is an adjacent word in that set, P(w_n | w) is the probability that w_n appears adjacent to w, and w is a candidate word.

The internal solidity index may be calculated by quantifying the degree of cohesion inside the candidate word. For example, the internal solidity index may be obtained by calculating the pointwise mutual information (PMI) inside the candidate word. Specifically, the internal solidity index PMI(x, y) may be calculated by the following formula:

$$\mathrm{PMI}(x, y) = \log \frac{p(x, y)}{p(x)\,p(y)}$$

where x and y are text segments in the candidate word. Taking the above-mentioned "singer" as an example, the internal solidity index of the candidate word may be calculated as

$$\mathrm{PMI}(\text{singing}, \text{person}) = \log \frac{P(\text{singer})}{P(\text{singing})\,P(\text{person})}$$

In step S850, it is determined whether the neighborhood word richness index is greater than a preset neighborhood word richness threshold.

In this step, it is determined whether the neighborhood word richness index is greater than a preset neighborhood word richness threshold value alpha, if so, the step S860 is continuously executed, otherwise, the step S870 is executed.

In step S860, it is determined whether the internal freezing degree indicator is greater than a preset internal freezing degree threshold.

In this step, it is determined whether the internal solidity index is greater than a preset internal solidity threshold beta, if so, the candidate word is used as a new word, and step S880 is continuously performed, otherwise, step S870 is performed.

In step S870, the candidate word is discarded.

In step S880, a relevance index between the new word and the target type tag is calculated.

In this step, a relevance index of the new word and a target type tag determined by the actual task is calculated, for example, if the current task is analyzing friend-making fraud, the target type tag is friend-making fraud.

The process of calculating the relevance index between each new word ngram_i and the target type tag may be as follows: count the total number P_j of sentences whose type tag is the target type tag label_j; count the total number P_i of sentences whose type tag is the target type tag label_j and which contain the new word ngram_i; count the total number N₀ of sentences whose type tag is not the target type tag label_j; count the total number N_i of sentences whose type tag is not the target type tag label_j and which contain the new word ngram_i; and calculate the relevance index by the formula

$$\frac{P_i / P_j}{N_i / N_0}$$

that is, the ratio of the first index P_i / P_j to the second index N_i / N₀.
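As a worked illustration of this calculation, following the ratio of the first index to the second index described in steps S510 to S530, consider purely hypothetical counts: P_j = 1000 friend-making fraud sentences of which P_i = 50 contain the new word, and N₀ = 4000 sentences of other types of which N_i = 20 contain it.

```latex
\[
\frac{P_i / P_j}{N_i / N_0}
  = \frac{50 / 1000}{20 / 4000}
  = \frac{0.05}{0.005}
  = 10
\]
```

Under these assumed counts the new word is ten times more likely to appear in a sentence of the target type than in a sentence of any other type, so it would rank high among the target new words.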

In step S890, the target new word is obtained by screening.

In the step, new words related to the target type labels are output in a descending order according to the relevance indexes, and if only the most relevant target number of new words or a result with the relevance indexes larger than a certain threshold value is needed, the target new words can be obtained through screening according to the sorting result.

It should be noted that although the various steps of the methods of the present disclosure are depicted in the drawings in a particular order, this does not require or imply that these steps must be performed in this particular order, or that all of the depicted steps must be performed, to achieve desirable results. Additionally or alternatively, certain steps may be omitted, multiple steps combined into one step execution, and/or one step broken down into multiple step executions, etc.

Further, in the present exemplary embodiment, a new word discovery apparatus is also provided. Referring to fig. 9, the new word discovery apparatus 900 may include a text acquisition module 910, a new word extraction module 920, a relevancy calculation module 930, and a target new word discovery module 940. Wherein:

the text acquisition module 910 may be configured to acquire a plurality of text data, where each text data has a type tag;

a new word extracting module 920, configured to perform new word extraction on the multiple pieces of text data to obtain one or more new words;

the relevance calculating module 930 may be configured to determine a target type tag among the various types of tags, and calculate a relevance index between the new word and the target type tag;

the target new word discovering module 940 may be configured to determine a target new word associated with the target type tag from the new words according to the relevancy indicator.

The text data is text data related to a certain field. For example, the text data may be text data related to the field of risk control, and the source of the text data may be user complaint texts, or text data obtained through other channels such as questionnaires, which is not limited in this exemplary embodiment.

In an exemplary embodiment of the disclosure, the new word extraction module performs new word extraction on a plurality of text data by performing the following method to obtain one or more new words:

dividing each text data into a plurality of sentences, and extracting a plurality of candidate words from each sentence;

calculating the neighborhood word richness index and the internal solidity index of each candidate word;

when the neighborhood word richness index and the internal solidity index are respectively greater than the corresponding preset richness threshold and preset solidity threshold, taking the candidate word as a new word.

The above-mentioned neighborhood word richness index is used to measure the richness of the language environment of the candidate word, that is, the degree of freedom with which the candidate word combines externally with other text segments. The internal solidity index is used for measuring the non-contingency of the candidate word forming a word.

In an exemplary embodiment of the present disclosure, the new word extraction module extracts a plurality of candidate words from each sentence by performing the following steps:

extracting a plurality of candidate words from each sentence with a plurality of different character lengths, respectively. For example, the text segments in the sentences may be extracted as candidate words in units of tuples with different lengths, where the length of the tuple is the character length of the candidate word. Specifically, the 2-tuples, 3-tuples, …, n-tuples in each sentence can be extracted as candidate words, where the value of n is determined according to the actual scene and is generally 4. Taking the sentence "natural language processing algorithm" (自然语言处理算法 in Chinese) and 2-tuples as an example, the extracted 2-tuples are: [自然, 然语, 语言, 言处, 处理, 理算, 算法].

In an exemplary embodiment of the present disclosure, the new word extraction module calculates the neighborhood richness index and the internal solidity index of each candidate word by performing the following methods:

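acquiring the left and right adjacent word sets of the candidate word from the text data, and calculating the information entropy between the candidate word and the left and right adjacent word sets to obtain the neighborhood word richness index;

and calculating the pointwise mutual information inside the candidate word to obtain the internal solidity index.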

In an exemplary embodiment of the disclosure, the new word extraction module calculates the information entropy between the candidate word and the left and right neighbor sets by performing the following steps:

by the formula $E(w) = -\sum_{w_n \in w_{nei}} P(w_n \mid w)\,\log P(w_n \mid w)$, calculating the information entropy E(w) between the candidate word and the left and right adjacent word sets; where w_nei is the left or right adjacent word set, w_n is an adjacent word in that set, and w is the candidate word; the information entropy is a measure of the amount of information, and the higher the information entropy is, the richer the information amount is, and the larger the uncertainty is.

by the formula $\mathrm{PMI}(x, y) = \log \frac{p(x, y)}{p(x)\,p(y)}$, calculating the internal solidity index PMI(x, y); where x and y are text segments in the candidate word.

In an exemplary embodiment of the disclosure, the relevancy index is used to measure a relevancy between the extracted new word and the target type tag. For example, in order to make the calculation result more accurate, the relevancy index may be calculated by the relevancy of the new word and the target type tag, and other types of tags except the target type tag. Specifically, the relevance calculating module calculates the relevance index between the new word and the target type label by performing the following steps:

calculating a first index based on the target type tag, the new word and the sentence, wherein the first index is used for representing the probability that the new word belongs to the target type tag;

calculating a second index based on other type tags except the target type tag, the new word and the sentence, wherein the second index is used for representing the probability that the new word belongs to the other type tags;

and calculating the ratio of the first index to the second index to obtain the correlation index.

In an exemplary embodiment of the present disclosure, the relevancy calculation module calculates the first index based on the target type tag, the new word, and the sentence by performing the following steps:

counting a first number of sentences belonging to the target type tags and a second number of sentences belonging to the target type tags and containing new words;

and calculating the ratio of the second number to the first number to obtain a first index.

In an exemplary embodiment of the present disclosure, the relevancy calculation module calculates the second index based on other type tags other than the target type tag, the new word, and the sentence by performing the following steps:

counting a third number of sentences belonging to other types of tags and a fourth number of sentences belonging to other types of tags and containing new words;

and calculating the ratio of the fourth number to the third number to obtain a second index.

In the exemplary embodiment, the relevance index calculated through the above steps takes into account both the probability that the new word belongs to the target type tag and the probability that it belongs to other type tags. Compared with considering only the probability that the new word belongs to the target type tag, this avoids the situation in which that probability is high merely because the new word is itself a common word, thereby improving the reliability and accuracy of the relevance calculation. It should be noted that the above scenario is only an exemplary illustration, and the relevance index between the new word and the target type tag may also be calculated in other manners, which is not limited in this exemplary embodiment.

In an exemplary embodiment of the disclosure, the target new word discovery module determines the target new word associated with the target type tag from the new words according to the relevancy indicator by performing the following steps:

and sorting the new words according to the relevancy indexes, and determining a plurality of target new words according to sorting results.

In an exemplary embodiment of the present disclosure, the target new word discovery module outputs a plurality of target new words according to the sorting result by performing the following steps:

and acquiring the top target number of new words in the sorting result as target new words.

In an exemplary embodiment of the present disclosure, the target new word discovery module outputs a plurality of target new words according to the sorting result by performing the following steps:

and acquiring the new words whose relevancy index is greater than a preset relevancy threshold in the sorting result as target new words.

The details of each module or unit in the above-mentioned new word finding device have been described in detail in the corresponding new word finding method, and therefore are not described herein again.

It should be noted that although several modules or units of the device for action execution are mentioned in the above detailed description, such a division is not mandatory. Indeed, according to embodiments of the present disclosure, the features and functionality of two or more modules or units described above may be embodied in one module or unit. Conversely, the features and functions of one module or unit described above may be further divided so as to be embodied by a plurality of modules or units.

FIG. 10 illustrates a schematic structural diagram of a computer system suitable for use in implementing an electronic device of an embodiment of the present disclosure.

It should be noted that the computer system 1000 of the electronic device shown in fig. 10 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present disclosure.

As shown in fig. 10, the computer system 1000 includes a Central Processing Unit (CPU) 1001 that can perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM) 1002 or a program loaded from a storage section 1008 into a Random Access Memory (RAM) 1003. In the RAM 1003, various programs and data necessary for system operation are also stored. The CPU 1001, ROM 1002, and RAM 1003 are connected to each other via a bus 1004. An input/output (I/O) interface 1005 is also connected to the bus 1004.

The following components are connected to the I/O interface 1005: an input section 1006 including a keyboard, a mouse, and the like; an output section 1007 including a display such as a Cathode Ray Tube (CRT) or a Liquid Crystal Display (LCD), and a speaker; a storage section 1008 including a hard disk and the like; and a communication section 1009 including a network interface card such as a LAN card, a modem, or the like. The communication section 1009 performs communication processing via a network such as the internet. A drive 1010 is also connected to the I/O interface 1005 as necessary. A removable medium 1011 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like is mounted on the drive 1010 as necessary, so that a computer program read out therefrom is installed into the storage section 1008 as necessary.

In particular, according to an embodiment of the present disclosure, the processes described with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated in the flowchart. In such an embodiment, the computer program may be downloaded and installed from a network through the communication section 1009 and/or installed from the removable medium 1011. When the computer program is executed by the Central Processing Unit (CPU) 1001, the various functions defined in the method and apparatus of the present application are performed. In some embodiments, the computer system 1000 may also include an AI (Artificial Intelligence) processor for processing computing operations related to machine learning.

In some embodiments, the server and the terminal device may be nodes in a distributed system. The distributed system may be a blockchain system, that is, a distributed system formed by nodes (computing devices of any form in the access network, such as servers and user terminals) connected through network communication. The nodes can form a peer-to-peer (P2P) network, and any type of computing device, such as a server, a terminal device, or another electronic device, can become a node in the blockchain system by joining the peer-to-peer network. Each node comprises a hardware layer, a middle layer, an operating system layer and an application layer.

Fig. 11 is a schematic structural diagram of a distributed system 1100 applied to a blockchain system according to an exemplary embodiment of the present disclosure, where functions of each node in the blockchain system include:

1. Routing, a basic function of every node, used to support communication between nodes.

Besides the routing function, the node may also have the following functions:

2. Application, deployed in the blockchain to implement specific services according to actual service requirements. The application records data related to the implemented functions to form record data, carries a digital signature in the record data to indicate the source of the task data, and sends the record data to other nodes in the blockchain system, so that the other nodes add the record data to a temporary block once the source and integrity of the record data have been verified successfully. For example, the application may be configured to implement a word segmentation function, obtain the segmented text and form record data from it, and send the record data carrying a digital signature to other nodes in the blockchain system.

3. Blockchain, comprising a series of blocks connected to one another in the chronological order of their generation. Once a new block has been added to the blockchain it cannot be removed, and the blocks record the record data submitted by nodes in the blockchain system. For example, the node in the blockchain where the server is located records the record data of the candidate word extraction process.
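The application and blockchain functions listed above can be illustrated with a minimal Python sketch. It is not the implementation of the present disclosure: the names `make_record`, `verify_record`, `Block` and `Node` are hypothetical, the record layout is assumed, and an HMAC with a shared key stands in for the digital signature only because the Python standard library has no asymmetric signature scheme. A real blockchain system would use public-key signatures and a consensus step before sealing a block.

```python
import hashlib
import hmac
import json
from dataclasses import dataclass


def _canonical(obj) -> bytes:
    # Deterministic serialization so that signer and verifier hash the same bytes.
    return json.dumps(obj, sort_keys=True, ensure_ascii=False).encode("utf-8")


def make_record(node_id: str, payload: dict, key: bytes) -> dict:
    """Form record data carrying a signature (HMAC stand-in, an assumption)."""
    body = {"node_id": node_id, "payload": payload}
    signature = hmac.new(key, _canonical(body), hashlib.sha256).hexdigest()
    return {**body, "signature": signature}


def verify_record(record: dict, key: bytes) -> bool:
    """Check source and integrity by recomputing the signature."""
    body = {"node_id": record["node_id"], "payload": record["payload"]}
    expected = hmac.new(key, _canonical(body), hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, record["signature"])


@dataclass
class Block:
    index: int
    prev_hash: str
    records: list

    def digest(self) -> str:
        body = {"index": self.index, "prev_hash": self.prev_hash,
                "records": self.records}
        return hashlib.sha256(_canonical(body)).hexdigest()


class Node:
    def __init__(self, known_keys: dict):
        self.known_keys = known_keys            # node_id -> key (hypothetical registry)
        self.chain = [Block(0, "0" * 64, [])]   # genesis block
        self.temporary_block = []               # verified records awaiting sealing

    def receive(self, record: dict) -> bool:
        """Add a record to the temporary block only if verification succeeds."""
        key = self.known_keys.get(record["node_id"])
        if key is None or not verify_record(record, key):
            return False
        self.temporary_block.append(record)
        return True

    def seal(self) -> Block:
        """Append the temporary block to the chain, linked to the previous block."""
        prev = self.chain[-1]
        block = Block(prev.index + 1, prev.digest(), self.temporary_block)
        self.chain.append(block)
        self.temporary_block = []
        return block

    def is_chain_valid(self) -> bool:
        # Each block must reference the digest of the block generated before it.
        return all(cur.prev_hash == prev.digest()
                   for prev, cur in zip(self.chain, self.chain[1:]))
```

In this sketch, a node running the word segmentation application would call `make_record` with the segmented text as the payload, and a receiving node would call `receive` followed by `seal`. Because every block stores the digest of its predecessor, altering an earlier block makes `is_chain_valid` return `False`, which loosely mirrors the property that blocks cannot be removed once added.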

As another aspect, the present application also provides a computer-readable medium, which may be contained in the electronic device described in the above embodiments; or may exist separately without being assembled into the electronic device. The computer readable medium carries one or more programs which, when executed by an electronic device, cause the electronic device to implement the method described in the above embodiments.

It should be noted that the computer readable medium shown in the present disclosure may be a computer readable signal medium, a computer readable storage medium, or any combination of the two. A computer readable storage medium may be, for example, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present disclosure, a computer readable storage medium may be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device. In contrast, in the present disclosure, a computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electromagnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wired, fiber optic cable, RF, etc., or any suitable combination of the foregoing.

The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.

Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This application is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.

It will be understood that the present disclosure is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the present disclosure is limited only by the appended claims.
