Object keyword extraction method and device

Document No.: 1953441  Publication date: 2021-12-10  Views: 13  Language: Chinese

Reading note: this technology, "Object keyword extraction method and device" (对象的关键词提取方法及装置), was designed and created by 王艳花, 张晓辉, 李志鹏, 李瑶 and 张光宇 on 2020-09-03. Its main content is as follows: an embodiment of the present application provides a method and an apparatus for extracting keywords of an object, the method comprising: acquiring text information corresponding to a first object, wherein the text information is used for describing the first object; determining a plurality of candidate keywords corresponding to the first object according to the text information; and determining at least one keyword of the first object among the candidate keywords according to the similarity between the candidate keywords and the probability that each candidate keyword is a keyword. Determining the candidate keywords corresponding to the first object from the text information allows candidate keywords to be extracted from the text information automatically, quickly and efficiently, and filtering the candidate keywords according to their pairwise similarities and their probabilities of being keywords guarantees the accuracy of the finally determined keywords of the first object.

1. A method for extracting a keyword of an object, comprising:

acquiring text information corresponding to a first object, wherein the text information is used for describing the first object;

determining a plurality of candidate keywords corresponding to the first object according to the text information;

and determining at least one keyword of the first object in the candidate keywords according to the similarity of the candidate keywords and the probability that the candidate keywords are keywords.

2. The method of claim 1, wherein determining a plurality of candidate keywords corresponding to the first object according to the text information comprises:

processing the text information through a first model to obtain a plurality of candidate keywords;

the first model is obtained by learning a plurality of groups of samples, each group of samples comprises sample text information and sample candidate keywords, and the plurality of groups of samples are generated by a second model.

3. The method of claim 2, wherein the process of generating the plurality of sets of samples by the second model comprises:

acquiring the sample text information;

performing word segmentation processing on the sample text information through the second model to obtain a plurality of sample words and the probability that each sample word is a keyword;

and determining sample candidate keywords in the plurality of sample words according to the probability that each sample word is a keyword, wherein the probability that the sample candidate keywords are keywords is greater than a first threshold value.

4. The method according to any of claims 1-3, wherein determining at least one keyword of the first object among the plurality of candidate keywords based on the similarity of the plurality of candidate keywords and the probability that the candidate keyword is a keyword comprises:

for each two candidate keywords in the plurality of candidate keywords, judging whether the similarity between the two candidate keywords is greater than a preset threshold value;

if so, combining the two candidate keywords into one keyword according to the probability that the two candidate keywords are the keywords respectively;

and if not, determining the two candidate keywords as the keywords of the first object.

5. The method of claim 4, wherein merging the two candidate keywords into one keyword according to the probability that the two candidate keywords are keywords, respectively, comprises:

and merging the two candidate keywords into a target keyword, wherein the target keyword is a keyword with higher probability of being a keyword in the two candidate keywords.

6. The method according to any one of claims 1-5, wherein determining a plurality of candidate keywords corresponding to the first object according to the text information comprises:

sentence division processing is carried out on the text information to obtain a plurality of short sentences;

determining whether each short sentence comprises a keyword or not through a binary classification model, and determining the short sentence comprising the keyword as a target short sentence to obtain at least one target short sentence;

performing word segmentation processing on each target short sentence to obtain a plurality of first words;

filtering stop words from the plurality of first words to obtain a plurality of second words;

and performing keyword prediction processing on the plurality of second words to obtain a plurality of candidate keywords.

7. The method of any of claims 1-6, wherein the first model is a pointer generation network;

the output layer of the pointer generation network comprises a generation probability, and the generation probability is used for indicating the probability that the next output word of the decoder at each time step is from a preset word list; and

the attention distribution function of the pointer generation network includes a coverage factor.

8. The method according to any of claims 1-6, wherein the text information comprises at least one of:

network data corresponding to the first object, wherein the network data comprises description information of the first object;

and data in a detail page, wherein the detail page is a web page introducing the first object.

9. An apparatus for extracting a keyword of an object, comprising:

the acquisition module is used for acquiring text information corresponding to a first object, and the text information is used for describing the first object;

the determining module is used for determining a plurality of candidate keywords corresponding to the first object according to the text information;

the determining module is further configured to determine at least one keyword of the first object among the candidate keywords according to the similarity of the candidate keywords and the probability that the candidate keywords are keywords.

10. An apparatus for extracting a keyword of an object, comprising:

a memory for storing a program;

a processor for executing the program stored by the memory, the processor being configured to perform the method of any of claims 1 to 8 when the program is executed.

11. A computer-readable storage medium comprising instructions which, when executed on a computer, cause the computer to perform the method of any one of claims 1 to 8.

Technical Field

The embodiment of the application relates to the technical field of computers, in particular to a method and a device for extracting keywords of an object.

Background

At present, online shopping becomes a very important shopping mode, and keywords of a commodity can be provided on a graphical user interface so that a user can quickly know characteristics of the commodity.

The extraction of the keywords of the goods is particularly important, and in the prior art, when the keywords of the goods are extracted, the keywords of the goods submitted by a seller are usually received, the submitted keywords are manually checked, and the keywords which pass the checking are used as the keywords to be displayed.

However, depending on the implementation of manual submission to obtain the keywords of the goods, the operation of obtaining the keywords may be inefficient.

Disclosure of Invention

The embodiment of the application provides a method and a device for extracting keywords of an object, so as to overcome the problem of low operation efficiency of obtaining the keywords.

In a first aspect, an embodiment of the present application provides a method for extracting a keyword of an object, including:

acquiring text information corresponding to a first object, wherein the text information is used for describing the first object;

determining a plurality of candidate keywords corresponding to the first object according to the text information;

and determining at least one keyword of the first object in the candidate keywords according to the similarity of the candidate keywords and the probability that the candidate keywords are keywords.

In one possible design, determining a plurality of candidate keywords corresponding to the first object according to the text information includes:

processing the text information through a first model to obtain a plurality of candidate keywords;

the first model is obtained by learning a plurality of groups of samples, each group of samples comprises sample text information and sample candidate keywords, and the plurality of groups of samples are generated by a second model.

In one possible design, the process of generating the plurality of sets of samples by the second model includes:

acquiring the sample text information;

performing word segmentation processing on the sample text information through the second model to obtain a plurality of sample words and the probability that each sample word is a keyword;

and determining sample candidate keywords in the plurality of sample words according to the probability that each sample word is a keyword, wherein the probability that the sample candidate keywords are keywords is greater than a first threshold value.

In one possible design, determining at least one keyword of the first object among the candidate keywords according to the similarity of the candidate keywords and the probability that the candidate keyword is the keyword comprises:

for each two candidate keywords in the plurality of candidate keywords, judging whether the similarity between the two candidate keywords is greater than a preset threshold value;

if so, combining the two candidate keywords into one keyword according to the probability that the two candidate keywords are the keywords respectively;

and if not, determining the two candidate keywords as the keywords of the first object.

In one possible design, merging the two candidate keywords into one keyword according to the probability that the two candidate keywords are each a keyword includes:

and merging the two candidate keywords into a target keyword, wherein the target keyword is a keyword with higher probability of being a keyword in the two candidate keywords.

In one possible design, determining a plurality of candidate keywords corresponding to the first object according to the text information includes:

sentence division processing is carried out on the text information to obtain a plurality of short sentences;

determining whether each short sentence comprises a keyword or not through a binary classification model, and determining the short sentence comprising the keyword as a target short sentence to obtain at least one target short sentence;

performing word segmentation processing on each target short sentence to obtain a plurality of first words;

filtering stop words from the plurality of first words to obtain a plurality of second words;

and performing keyword prediction processing on the plurality of second words to obtain a plurality of candidate keywords.

In one possible design, the first model is a pointer generation network;

the output layer of the pointer generation network comprises a generation probability, and the generation probability is used for indicating the probability that the next output word of the decoder at each time step is from a preset word list; and

the attention distribution function of the pointer generation network includes a coverage factor.

In one possible design, the text information includes at least one of:

network data corresponding to the first object, wherein the network data comprises description information of the first object;

and data in a detail page, wherein the detail page is a web page introducing the first object.

In a second aspect, an embodiment of the present application provides an apparatus for extracting a keyword of an object, including:

the acquisition module is used for acquiring text information corresponding to a first object, and the text information is used for describing the first object;

the determining module is used for determining a plurality of candidate keywords corresponding to the first object according to the text information;

the determining module is further configured to determine at least one keyword of the first object among the candidate keywords according to the similarity of the candidate keywords and the probability that the candidate keywords are keywords.

In one possible design, the determining module is specifically configured to:

processing the text information through a first model to obtain a plurality of candidate keywords;

the first model is obtained by learning a plurality of groups of samples, each group of samples comprises sample text information and sample candidate keywords, and the plurality of groups of samples are generated by a second model.

In one possible design, the process of generating the plurality of sets of samples by the second model includes:

acquiring the sample text information;

performing word segmentation processing on the sample text information through the second model to obtain a plurality of sample words and the probability that each sample word is a keyword;

and determining sample candidate keywords in the plurality of sample words according to the probability that each sample word is a keyword, wherein the probability that the sample candidate keywords are keywords is greater than a first threshold value.

In one possible design, the determining module is specifically configured to:

for each two candidate keywords in the plurality of candidate keywords, judging whether the similarity between the two candidate keywords is greater than a preset threshold value;

if so, combining the two candidate keywords into one keyword according to the probability that the two candidate keywords are the keywords respectively;

and if not, determining the two candidate keywords as the keywords of the first object.

In one possible design, the determining module is specifically configured to:

and merging the two candidate keywords into a target keyword, wherein the target keyword is a keyword with higher probability of being a keyword in the two candidate keywords.

In one possible design, the determining module is specifically configured to:

sentence division processing is carried out on the text information to obtain a plurality of short sentences;

determining whether each short sentence comprises a keyword or not through a binary classification model, and determining the short sentence comprising the keyword as a target short sentence to obtain at least one target short sentence;

performing word segmentation processing on each target short sentence to obtain a plurality of first words;

filtering stop words from the plurality of first words to obtain a plurality of second words;

and performing keyword prediction processing on the plurality of second words to obtain a plurality of candidate keywords.

In one possible design, the first model is a pointer generation network;

the output layer of the pointer generation network comprises a generation probability, and the generation probability is used for indicating the probability that the next output word of the decoder at each time step is from a preset word list; and

the attention distribution function of the pointer generation network includes a coverage factor.

In one possible design, the text information includes at least one of:

network data corresponding to the first object, wherein the network data comprises description information of the first object;

and data in a detail page, wherein the detail page is a web page introducing the first object.

In a third aspect, an embodiment of the present application provides an apparatus for extracting a keyword of an object, including:

a memory for storing a program;

a processor for executing the program stored by the memory, the processor being adapted to perform the method as described above in the first aspect and any one of the various possible designs of the first aspect when the program is executed.

In a fourth aspect, embodiments of the present application provide a computer-readable storage medium, comprising instructions which, when executed on a computer, cause the computer to perform the method as described above in the first aspect and any one of the various possible designs of the first aspect.

The embodiment of the application provides a method and a device for extracting keywords of an object, wherein the method comprises the following steps: acquiring text information corresponding to the first object, wherein the text information is used for describing the first object; determining a plurality of candidate keywords corresponding to the first object according to the text information; and determining at least one keyword of the first object among the candidate keywords according to the similarity of the candidate keywords and the probability that each candidate keyword is a keyword. The candidate keywords corresponding to the first object are determined through the text information, so candidate keywords can be extracted from the text information automatically, quickly and efficiently; the candidate keywords are then filtered according to their pairwise similarities and their probabilities of being keywords, which guarantees the accuracy of the finally determined keywords of the first object.

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and those skilled in the art can also obtain other drawings according to the drawings without creative efforts.

FIG. 1 is a schematic diagram of keywords provided herein;

fig. 2 is a flowchart of a keyword extraction method for an object according to an embodiment of the present application;

fig. 3 is a flowchart of a keyword extraction method for an object according to an embodiment of the present application;

fig. 4 is a schematic network structure diagram of a first model provided in an embodiment of the present application;

FIG. 5 is a flowchart of a keyword extraction method for an object according to yet another embodiment of the present application;

FIG. 6 is a schematic diagram of a flow chart of keyword extraction for an object according to an embodiment of the present application;

FIG. 7 is a diagram of extracted keywords provided by an embodiment of the present application;

fig. 8 is a schematic structural diagram of an apparatus for extracting keywords from an object according to an embodiment of the present disclosure;

fig. 9 is a schematic hardware structure diagram of an object keyword extraction device according to an embodiment of the present application.

Detailed Description

In order to make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some embodiments of the present application, but not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.

For ease of understanding, the relevant concepts to which this application relates will be explained first:

seq2seq generation network: the Sequence-to-Sequence (seq2seq) generation network is an end-to-end generation method that makes minimal assumptions about sequence structure. A seq2seq generation network uses a multilayer Long Short-Term Memory network (LSTM) to encode the input sequence into an intermediate vector, and another deep LSTM to decode the intermediate vector into the target sequence, thereby solving the problem that conventional Deep Neural Networks (DNNs) cannot generate sequences.

oov: not at the word stock (Out-of-vocabulary, OOV) is that at the time of natural language processing or text processing, there is usually a word stock (vocabulary), which may be pre-loaded, or custom, or extracted from the current first data set, and if there is another second data set after it, and some words in this second data set are not in the existing vocabulary, these words may be called Out-of-vocabulary, abbreviated as OOV.

Pointer generation network: the Pointer-generator network (Pointer-generator network) is a generation model for abstract extraction, and is improved from two aspects aiming at the traditional sequence-to-sequence model, and the model has the capability of copying words from a source text by introducing a Pointer and simultaneously reserves the capability of generating new words by a generator; the same word is prevented from being repeatedly generated by recording the content that has been generated using the covarage mechanism.

LSTM: the Long Short-Term Memory network (LSTM) is a time recursive neural network with Long-Term Memory capability, and important components comprise Forget Gate, Input Gate and Output Gate which are respectively responsible for determining whether the current Input is adopted or not, whether the current Input is memorized for a Long time or not and whether the Input in the Memory is Output currently or not, and the model can be used for better capturing the dependency relationship between longer sentences.

Hyper-parameter: in the context of machine learning, a hyper-parameter is a parameter whose value is set before the learning process begins, rather than being obtained through training.

Levenshtein distance: also called edit distance, it measures the similarity of two strings. Simply put, it is the minimum number of edit operations (insertions, deletions and substitutions) required to change one string into the other.
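The definition above is the classic dynamic-programming recurrence, sketched here (this is the textbook algorithm, usable for comparing near-duplicate candidate keywords):

```python
# Edit distance between two strings via the standard row-by-row
# dynamic-programming table (insert / delete / substitute, each cost 1).
def levenshtein(a: str, b: str) -> int:
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]

print(levenshtein("kitten", "sitting"))  # classic example: 3 edits
```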

N-Gram language model: an algorithm based on a statistical language model, built on the assumption that the occurrence of the N-th word in a sentence depends only on the preceding N-1 words, so that the probability of the whole sentence is the product of the probabilities of the individual words, i.e.

P(w1, w2, ..., wm) = P(w1) * P(w2 | w1) * P(w3 | w1, w2) * ... * P(wm | w1, ..., wm-1),

Balancing effectiveness against time and space overhead, only the words nearest the current word are treated as related, yielding the Uni-Gram (unigram), Bi-Gram (bigram) and Tri-Gram (trigram) models, which are commonly applied to measure how natural a sentence is.
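The Bi-Gram case can be sketched with counts over a toy corpus (the corpus and sentences below are assumptions for illustration):

```python
from collections import Counter

# Bi-Gram model: P(sentence) is approximated by the product of conditional
# probabilities P(w_n | w_{n-1}), estimated from corpus counts.
corpus = [
    ["long", "battery", "life"],
    ["long", "battery", "life"],
    ["long", "standby", "time"],
]
unigrams = Counter(w for s in corpus for w in s)
bigrams = Counter(pair for s in corpus for pair in zip(s, s[1:]))

def sentence_probability(sentence):
    p = 1.0
    for prev, cur in zip(sentence, sentence[1:]):
        p *= bigrams[(prev, cur)] / unigrams[prev]
    return p

print(sentence_probability(["long", "battery", "life"]))  # (2/3) * (2/2)
```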

The technical background related to the present application is further described in detail as follows:

At present, with the continuous development of online shopping, the user's shopping experience also needs to improve correspondingly. When a user browses goods through a terminal device, directly presenting an overlong product introduction makes it difficult for the user to identify the product's characteristics in time, and space in the graphical user interface of the terminal device is limited. Keywords of a product can therefore be provided in the graphical user interface, so that the user can quickly determine the product's characteristics.

The display mode of the keywords of the product can be, for example, as shown in fig. 1, and fig. 1 is a schematic diagram of the keywords provided in the present application.

Referring to fig. 1, assuming that a current user browses a "mobile phone" commodity through a terminal device, information of the commodity a and the commodity B is provided on a graphical user interface of the current terminal device, and the information may include keywords of the commodity, for example, "wireless charging support" and "face recognition" illustrated as a keyword 101 of the commodity a, and "ultra-long endurance", "shocking volume" and "AI face recognition" illustrated as a keyword 102 of the commodity B.

As can be appreciated based on the description of FIG. 1, keywords for a good may be used to indicate characteristics of the good.

At present, in the prior art, when a selling point is extracted, in a possible implementation manner, a keyword of a commodity submitted by a seller may be received, manual review is performed on the submitted keyword, and the keyword that passes the review is used as a keyword to be displayed.

However, the manual-submission mode is inefficient at obtaining keywords. The number of product categories and Stock Keeping Units (SKUs) covered by merchant-submitted keywords is small and cannot support calls from multiple applications, which wastes platform resources; furthermore, when some applications receive keywords submitted by sellers in the form of e-mails, the platform cannot update the product keywords in time.

In another possible implementation manner, the generation of the keyword may be performed based on the seq2seq generation network.

For example, a seq2seq generation network may perform the following steps: data extraction, data labeling, data preprocessing, model training and keyword generation.

In data extraction, a data source containing keywords is obtained. In data labeling, the keywords in the data source are labeled manually. Data preprocessing performs operations such as word segmentation and stop-word filtering. In model training, the data set is divided into a training set and a validation set, and the generation network is trained on the data set to obtain a trained model. Keyword generation then feeds a data source to the trained model to obtain the extracted keywords.

However, the only connection between encoding and decoding in the seq2seq generation network is a fixed-length semantic vector, i.e. the encoder compresses the information of the whole sequence into a fixed-length vector.

Therefore, the seq2seq generation network has two disadvantages. First, the semantic vector cannot fully represent the information of the whole sequence, and the information carried by earlier inputs is diluted by later inputs; the longer the input sequence, the more severe this dilution, so the decoder does not receive sufficient information about the input sequence and decoding accuracy drops. Second, the seq2seq generation network depends on a target dictionary, so it cannot cope with out-of-vocabulary (OOV) words and easily generates repeated words.

That is, extracting keywords with a seq2seq generation network has the following two problems. One is that an extracted keyword may not be a word in the original text; for example, the keyword "dry energy saving" may be extracted even though "dry energy saving" does not appear in the original text. The other is that extracted keywords may contain repetition, for example the keyword "comfortable and comfortable".

To address these problems in the prior art, the present application provides a method for extracting object keywords: the original text is processed by a first model to extract the object's keywords quickly and efficiently, and because the first model incorporates a generation probability and a coverage factor, it effectively solves the prior-art problems that extracted keywords are not words in the original text and that extracted keywords are repeated.

First, referring to fig. 2, fig. 2 is a flowchart of a method for extracting a keyword from an object provided in an embodiment of the present application, where an execution subject in the embodiment of fig. 2 may be, for example, a server, or may also be a processor, and any device that can be used for data processing may be used as the execution subject in this embodiment.

As shown in fig. 2, the method includes:

s201, acquiring text information corresponding to the first object, wherein the text information is used for describing the first object.

In this embodiment, the first object may be an object from which a keyword is to be extracted, for example, the first object may be a commodity in a shopping platform, or in other possible implementation manners, the first object may also be an object of an entity, and the like.

In a possible implementation, the text information corresponding to the first object is used to describe the first object. Taking a commodity as the first object, the text information may include at least one of the following: network data corresponding to the first object and data in a detail page of the first object. The network data may include, for example, the title of the product, reach articles and high-quality reviews; the data in the detail page may include, for example, product detail information.

The product detail information may be, for example, text recognized from pictures of the product detail page using Optical Character Recognition (OCR).

In other possible implementation manners, the text information may further include any information for describing the first object, and the specific implementation manner of the text information is not limited in this embodiment.

S202, determining a plurality of candidate keywords corresponding to the first object according to the text information.

In this embodiment, the text information is used to describe the first object, and the text information may include at least one keyword corresponding to the first object, so that a plurality of candidate keywords corresponding to the first object may be determined according to the text information.

In a possible implementation, the text information may be processed by a first model to extract a plurality of candidate keywords from it. The first model is a model for extracting keywords from text information; it is learned from a plurality of groups of samples, each group comprising sample text information and sample candidate keywords.

Therefore, in the embodiment, the text information is processed based on the first model, and the multiple candidate keywords can be effectively extracted from the text information quickly and efficiently.

In another possible implementation, word segmentation processing may be performed on the text information, and whether each word after word segmentation is a keyword is determined, so as to extract a plurality of candidate keywords from the text information.

In this embodiment, when the first object is a commodity, the keyword of the first object may be understood as a selling point of the commodity, for example.

S203, determining at least one keyword of the first object in the candidate keywords according to the similarity of the candidate keywords and the probability that the candidate keywords are the keywords.

It can be understood that the keywords obtained in step S202 are only candidate keywords and require further screening. For example, if highly similar words exist among the plurality of candidate keywords, such as "extra long endurance" and "extra long standby", one of the two similar candidate keywords may be selected as a keyword of the first object.

In a possible implementation manner, the similarity between any two candidate keywords among the plurality of candidate keywords may be obtained; if two candidate keywords have a similarity greater than a preset threshold, the two candidate keywords may be merged, for example by calculating the probability that each is a keyword and merging the two into the one with the higher probability.

The above operation is performed for every two candidate keywords, thereby determining at least one keyword of the first object among the plurality of candidate keywords.

In a possible implementation manner, after obtaining the at least one keyword of the first object, the embodiment may provide the at least one keyword at a position in the graphical user interface corresponding to the first object, so that a user may quickly obtain the keyword of the first object, and thus, characteristics of the first object are quickly and efficiently known.

The method for extracting the keywords of the object provided by the embodiment of the application comprises the following steps: acquiring text information corresponding to the first object, where the text information is used for describing the first object; determining a plurality of candidate keywords corresponding to the first object according to the text information; and determining at least one keyword of the first object among the candidate keywords according to the similarities of the candidate keywords and the probabilities that the candidate keywords are keywords. Determining the candidate keywords corresponding to the first object through the text information enables the candidate keywords to be automatically extracted from the text information quickly and efficiently, and filtering the candidate keywords according to the similarities of the candidate keywords and the probabilities that the candidate keywords are keywords ensures the accuracy of the finally determined keywords of the first object.

On the basis of the foregoing embodiment, the following describes in further detail the method for extracting a keyword from an object provided by the present application, where fig. 3 is a flowchart of the method for extracting a keyword from an object provided by the present application embodiment, and fig. 4 is a schematic diagram of a network structure of a first model provided by the present application embodiment.

As shown in fig. 3, the method includes:

S301, acquiring text information corresponding to the first object, wherein the text information is used for describing the first object.

The implementation manner of S301 is the same as that of S201, and is not described herein again.

S302, performing sentence division processing on the text information to obtain a plurality of short sentences.

The sentence dividing processing may be implemented by, for example, obtaining punctuation marks in the text information, and dividing the text information into sentences according to the punctuation marks to obtain a plurality of short sentences.

For example, the current text information is: "The wall-mounted air conditioner does not occupy the space of a lower-layer room and is popular in small-sized families; it supports 4 sleep modes, meets the sleep requirements of different people, and ensures that the family has sufficient rest; it skillfully utilizes the self-cleaning technology of the evaporator to solve the problem of dust in the machine and bring clean air."

Then, after the text information undergoes sentence division processing, for example, the following short sentences can be obtained: "the wall-mounted air conditioner does not occupy the lower-layer room space", "is popular in small-sized families", "supports 4 sleep modes", "meets the sleep requirements of different people", "ensures the full rest of the family", "skillfully utilizes the self-cleaning technology of the evaporator to solve the problem of dust in the machine", and "brings clean air".

In other possible implementation manners, any sentence segmentation algorithm may be adopted to perform sentence segmentation processing on the text information, and the specific implementation of the sentence segmentation processing is not particularly limited as long as the text information can be divided into a plurality of short sentences.
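The punctuation-based sentence division described above can be sketched as follows; the delimiter set and the function name are illustrative assumptions, not the embodiment's exact implementation:

```python
import re

def split_clauses(text):
    """Split text into short clauses at common Chinese/English punctuation;
    the delimiter set is an assumption and can be extended as needed."""
    clauses = re.split(r"[,.;!?、，。；！？]+", text)
    # Drop empty fragments left by trailing punctuation.
    return [c.strip() for c in clauses if c.strip()]

text = "Supports 4 sleep modes, meets different needs. Brings clean air!"
print(split_clauses(text))
# → ['Supports 4 sleep modes', 'meets different needs', 'Brings clean air']
```

Any dedicated sentence segmentation library could replace the regular expression; the key point is only that the text is cut into short clauses before classification.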

S303, determining whether each short sentence includes a keyword through a binary classification model, and determining the short sentences that include keywords as target short sentences, so as to obtain at least one target short sentence.

In this embodiment, in order to improve the operation efficiency, a plurality of phrases may be preliminarily screened through the binary classification model.

For example, the binary classification model may determine whether each short sentence includes a keyword; short sentences that do not include keywords are filtered out and receive no subsequent processing, while short sentences that include keywords are determined as target short sentences, yielding at least one target short sentence. Subsequent processing is performed only on the target short sentences, which effectively improves the operation efficiency of keyword extraction.

In this embodiment, the binary classification model is a model trained on sample keywords, so the binary classification model can determine whether a short sentence includes a keyword: its input is a short sentence, and its output indicates whether the short sentence includes a keyword.

In the actual implementation process, the detailed implementation of the binary classification model can be selected according to actual requirements, as long as the model can output whether a short sentence includes a keyword.
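The screening step S303 can be sketched as follows; the `contains_keyword` argument stands in for the trained binary classification model (short sentence in, yes/no out), and the stub below is a hypothetical placeholder, not the trained model itself:

```python
def filter_target_clauses(clauses, contains_keyword):
    """Keep only the short sentences the binary classifier marks as
    containing a keyword (step S303)."""
    return [c for c in clauses if contains_keyword(c)]

# Hypothetical stand-in for the trained binary classification model:
# treat clauses mentioning a feature term as positive.
FEATURE_TERMS = {"sleep", "self-cleaning", "endurance"}
stub_model = lambda clause: any(term in clause for term in FEATURE_TERMS)

clauses = ["supports 4 sleep modes", "is popular in small families"]
print(filter_target_clauses(clauses, stub_model))  # → ['supports 4 sleep modes']
```

In a deployment, `stub_model` would be replaced by the trained classifier's predict call.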

S304, performing word segmentation processing on the target short sentence to obtain a plurality of first words.

S305, stop word filtering processing is carried out on the first vocabulary, and a plurality of second vocabularies are obtained.

S304 and S305 are described below together:

In this embodiment, the target short sentence is a short sentence that the binary classification model has determined to include a keyword, so the target short sentence can be segmented to obtain a plurality of first words.

The specific implementation of the word segmentation processing may be selected according to actual requirements. For example, the target short sentence may be segmented according to a word segmentation algorithm to obtain a plurality of first words, where the word segmentation algorithm may be any one of the following: a word segmentation method based on character string matching, a word segmentation method based on understanding, or a word segmentation method based on statistics. The present embodiment does not limit the specific implementation of the word segmentation processing.

After the plurality of first words are obtained, stop word filtering may be applied to them. Stop words may include words with no actual meaning, such as "what", "in", "and", and "then", and may also include very frequently used words, such as "me". Filtering the stop words out of the first words yields a plurality of second words.

In one possible implementation, the stop word list may be predetermined, such that the stop word filtering process is performed on the first vocabulary according to the stop word list.
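Steps S304 and S305 can be sketched as follows for already-segmented input; the stop word list here is a tiny illustrative sample, not a full list:

```python
# Tiny illustrative stop-word list; a deployment would load a full list.
STOP_WORDS = {"what", "in", "and", "then", "me", "the", "of"}

def remove_stop_words(first_words):
    """Filter stop words out of the segmented first words (step S305)."""
    return [w for w in first_words if w not in STOP_WORDS]

first_words = ["supports", "the", "sleep", "modes", "of", "users"]
print(remove_stop_words(first_words))  # → ['supports', 'sleep', 'modes', 'users']
```

A set lookup keeps the filter O(1) per word regardless of how large the stop word list grows.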

S306, performing keyword prediction processing on the plurality of second words to obtain a plurality of candidate keywords.

The keyword prediction processing may obtain the candidate keywords from the plurality of second words.

In a possible implementation manner, the candidate keywords may be generated by a pointer-generator network (Pointer-Generator Network). For example, the text information may be input into the pointer-generator network to obtain a plurality of candidate keywords. The network structure of the pointer-generator network in this embodiment is described below with reference to fig. 4.

As shown in fig. 4, the characters in the text information can be fed into an encoder one by one, thereby generating a series of encoder hidden states h_i, where h_i participates in the calculation of the attention coefficients; the encoder may be, for example, a single-layer bidirectional LSTM.

In each round of processing, the output state s_t at the decoder end also participates in the calculation of the attention coefficients; for example, the attention coefficients are calculated from the decoder output state s_t and the encoder hidden states h_i, where the decoder may be, for example, a single-layer unidirectional LSTM.

Based on the above description, it can be determined that current seq2seq generation networks have the following two problems:

1) The extracted keywords may not be words in the original text.

For example, the original sentence is segmented into ['dry', 'dust-removal', 'effective', 'reduced', 'your', 'clean', 'annoying'].

The keywords extracted by the seq2seq generation network may be: "drying and energy saving". This keyword is not a word in the original sentence but may be any word in the vocabulary, so the generated keyword lacks accuracy.

2) The extracted keywords may be repeated.

For example, the original sentence is segmented into: ['thickened', 'designed', 'feel', 'comfort'].

The selling points extracted by the seq2seq generation network may be: "comfort, comfort", so there is duplication in the generated keywords.

In response to the first problem, that the extracted keywords are not words in the original text, the pointer network in this embodiment adds a generation probability P_gen to the output layer.

The generation probability P_gen is used to determine, at each time step, the probability that the next word output by the decoder comes from the Source Text or from the Vocabulary, so that when an out-of-vocabulary (OOV) word is encountered, the word can be copied directly from the original sentence as output, avoiding the problem of extracted keywords not being words in the original text.

The generation probability P_gen may satisfy, for example, the following formula one:

P_gen = σ(w_h · h*_t + w_s · s_t + w_x · x_t + b_ptr)    (formula one)

where P_gen is the generation probability, a probability value calculated from the encoder context vector h*_t, the decoder state s_t, and the decoder input vector x_t at the current time step; it represents the probability that the next output is taken from the vocabulary rather than copied from the original sentence. w_h, w_s, w_x, and b_ptr are parameters to be learned, and σ is the sigmoid function.

The final vocabulary output is P(w), where P(w) may satisfy the following formula two:

P(w) = P_gen · P_vocab(w) + (1 - P_gen) · Σ_{i: w_i = w} a_i^t    (formula two)

where P_vocab(w) represents the probability that the current decoder output is the word w taken from the vocabulary, and Σ_{i: w_i = w} a_i^t represents the probability that the current decoder output is the word w copied from the original sentence, obtained by summing the attention weights a_i^t over the source positions where w appears.

Then, if the word at the current moment does not appear in the original text, Σ_{i: w_i = w} a_i^t is 0; and if the word at the current moment is not recorded in the predefined vocabulary, P_vocab(w) is 0.

By adding the generation probability P_gen to the output layer of the pointer-generator network, the problem that the extracted keywords are not words in the original text can be effectively avoided. For example, in the above example, the correct keyword can be extracted as "drying and dust removal", which consists of words in the original text.
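A minimal sketch of formula two, mixing the vocabulary distribution with the copy distribution, might look like this; the names, shapes, and toy numbers are illustrative assumptions, not the exact pointer-generator implementation:

```python
import numpy as np

def final_distribution(p_gen, p_vocab, attention, src_ids):
    """Formula two: P(w) = p_gen * P_vocab(w) + (1 - p_gen) * (summed
    attention over the source positions where w appears)."""
    p_final = p_gen * p_vocab
    for pos, word_id in enumerate(src_ids):
        # Copy mass from the source text onto the id of the word at `pos`.
        p_final[word_id] += (1.0 - p_gen) * attention[pos]
    return p_final

p_vocab = np.array([0.5, 0.3, 0.2])  # decoder's vocabulary distribution
attention = np.array([0.6, 0.4])     # attention over a 2-token source text
src_ids = [2, 0]                     # vocabulary ids of the source tokens
p = final_distribution(0.7, p_vocab, attention, src_ids)
print(p.round(2))  # mixes to [0.47, 0.21, 0.32], still summing to 1
```

Because the copy term adds mass at the ids of source words, a word absent from the predefined vocabulary can still receive probability, which is how the OOV case is handled.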

In view of the second problem mentioned above, that the extracted keywords may be repeated, the pointer network in this embodiment introduces a coverage factor into the attention distribution function in order to avoid attending again to a word that has already been attended to. The coverage factor is formed by summing the attention distributions of all decoder time steps before the current time step t, and may satisfy, for example, the following formula three:

c^t = Σ_{t'=0}^{t-1} a^{t'}    (formula three)

where c^t is the coverage vector, which accumulates the attention paid to each word in steps 0 to t-1 before time step t; it prevents words that have already been attended to from being attended to again at time step t, thereby solving the problem of repeated words in the output result. The accumulated attention information is taken as input and added directly into the attention mechanism at the input end, so that it can guide the attention over the original text.

After the coverage factor is introduced, the calculation of the attention weights at the encoder end is adjusted accordingly, where the new encoder-end attention weight e_i^t may satisfy the following formula four:

e_i^t = v^T tanh(W_h h_i + W_s s_t + w_c c_i^t + b_attn)    (formula four)

where W_h, W_s, and w_c are parameters to be learned, tanh is the hyperbolic tangent function, h_i is the encoder hidden state at the current time point, s_t is the decoder hidden state output at the previous time point, c_i^t is the coverage value of the i-th word at the t-th time step, b_attn is a bias parameter to be learned, and v^T is a parameter to be learned.

By introducing the coverage factor, attending again to a word that has already been attended to can be effectively avoided, so as to avoid repetition in the extracted keywords. For example, in the above example, the correct keyword can be extracted as "comfortable feel", avoiding repeated extracted keywords.
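A simplified sketch of coverage-aware attention (formulas three and four) follows; the scalar penalty `w_c` replaces the learned coverage term and is an assumption of this sketch:

```python
import numpy as np

def attention_with_coverage(scores, coverage, w_c):
    """One decoding step of coverage-aware attention. Per formula three the
    coverage is the running sum of past attention distributions; here a
    scalar penalty w_c * coverage is subtracted from the raw scores as a
    simplification of the learned coverage term in formula four."""
    adjusted = scores - w_c * coverage
    weights = np.exp(adjusted - adjusted.max())  # numerically stable softmax
    attn = weights / weights.sum()
    return attn, coverage + attn  # attention and updated coverage c^{t+1}

scores = np.array([2.0, 1.0, 0.5])  # fixed encoder scores for illustration
coverage = np.zeros(3)
attn1, coverage = attention_with_coverage(scores, coverage, w_c=5.0)
attn2, coverage = attention_with_coverage(scores, coverage, w_c=5.0)
# The word attended at step 1 is down-weighted at step 2.
print(attn1.argmax(), attn2.argmax())  # → 0 1
```

The effect demonstrated here is exactly the anti-repetition behavior described above: heavy attention at one step suppresses attention to the same position at later steps.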

S307, for every two candidate keywords among the plurality of candidate keywords, judging whether the similarity between the two candidate keywords is greater than a preset threshold; if so, executing S308; if not, executing S309.

After the plurality of candidate keywords are obtained, some of them may be very similar, so the candidate keywords may be screened according to the similarities of the plurality of candidate keywords. In a possible implementation, for every two candidate keywords among the plurality of candidate keywords, the similarity between the two candidate keywords may be calculated and compared with the preset threshold.

One possible implementation of calculating the similarity between two candidate keywords is described below:

For example, the similarity may be calculated according to the following formula five:

similarity = (Levenshtein ratio + Jaro-Winkler distance + longest common substring score + edit distance score) / 4    (formula five)

The Levenshtein ratio may satisfy the following formula six:

similarity = (sum - ldist) / sum    (formula six)

where, assuming the similarity between character string a and character string b is calculated, similarity is the Levenshtein ratio, sum is the total length of character strings a and b, and ldist is the class edit distance, in which a deletion or an insertion still costs 1 but a substitution costs 2 instead.

Here is an example: suppose a and b are two different single-character strings. With the ordinary edit distance of 1, the similarity would be calculated as (2 - 1)/2 = 0.5, which is obviously not appropriate; with the class edit distance, the substitution costs 2, and the similarity is calculated as (2 - 2)/2 = 0.
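The Levenshtein ratio with the class edit distance of formula six can be sketched as follows:

```python
def levenshtein_ratio(a, b):
    """Formula six: similarity = (sum - ldist) / sum, where ldist is the
    class edit distance (deletion/insertion cost 1, substitution cost 2)."""
    prev = list(range(len(b) + 1))  # DP row: distance from "" to b[:j]
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            sub = prev[j - 1] + (0 if ca == cb else 2)  # substitution costs 2
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, sub))
        prev = cur
    total = len(a) + len(b)
    return (total - prev[-1]) / total

# Two different single characters: the ordinary edit distance would give 0.5,
# while the class edit distance gives the more sensible 0.
print(levenshtein_ratio("x", "y"), levenshtein_ratio("ab", "ab"))  # → 0.0 1.0
```

With substitution costing 2, a substitution is never cheaper than a delete plus an insert, which is what makes completely different strings score 0.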

The Jaro-Winkler distance may satisfy the following formula seven:

d_w = d_j + l · p · (1 - d_j)    (formula seven)

where p is a factor used to reward prefix matches, l is the length of the common prefix, and d_w is the final similarity. The Jaro-Winkler algorithm gives higher scores to strings that begin with the same characters, which is why p and l are defined. d_j satisfies the following formula eight:

d_j = (1/3) · (m/|s1| + m/|s2| + (m - t)/m)    (formula eight)

where s1 and s2 are the two character strings to be compared, m is the number of matching characters between s1 and s2, t is the number of transpositions, and d_j is the final score.

The number of transpositions is determined according to the matching window value: two characters are considered to match when the distance between their positions is smaller than the matching window value, and a transposition is counted when the matched characters appear in different positions.

The matching window value may satisfy the following formula nine:

MW = MAX(|s1|, |s2|) / 2 - 1    (formula nine)

where MW is the matching window value and MAX is the maximum function.

Here is an example: for the pair of strings AECFR and AMECFDR, MW = 2.5 and m = 5. The matching characters A-E-C-F-R appear in the same order in both strings, so no transposition is required and t = 0.
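The Jaro similarity of formula eight, with the Jaro-Winkler adjustment of formula seven, can be sketched as follows; the window is floored to an integer here, a common simplification of the MW value above:

```python
def jaro(s1, s2):
    """Jaro similarity per formula eight: d_j = (m/|s1| + m/|s2| + (m-t)/m) / 3."""
    if s1 == s2:
        return 1.0
    window = max(max(len(s1), len(s2)) // 2 - 1, 0)
    used = [False] * len(s2)  # s2 positions already matched
    matches = []              # s1 characters matched, in s1 order
    for i, c in enumerate(s1):
        lo, hi = max(0, i - window), min(len(s2), i + window + 1)
        for j in range(lo, hi):
            if not used[j] and s2[j] == c:
                used[j] = True
                matches.append(c)
                break
    m = len(matches)
    if m == 0:
        return 0.0
    # Transpositions: matched characters whose order differs between strings.
    s2_matches = [s2[j] for j in range(len(s2)) if used[j]]
    t = sum(a != b for a, b in zip(matches, s2_matches)) // 2
    return (m / len(s1) + m / len(s2) + (m - t) / m) / 3

def jaro_winkler(s1, s2, p=0.1):
    """Formula seven: d_w = d_j + l * p * (1 - d_j), l = common prefix (<= 4)."""
    dj = jaro(s1, s2)
    l = 0
    for a, b in zip(s1[:4], s2[:4]):
        if a != b:
            break
        l += 1
    return dj + l * p * (1 - dj)

print(round(jaro("AECFR", "AMECFDR"), 4))          # → 0.9048
print(round(jaro_winkler("AECFR", "AMECFDR"), 4))  # → 0.9143
```

For the example pair, m = 5 and t = 0, giving d_j = (1 + 5/7 + 1)/3 ≈ 0.9048, and the shared prefix "A" lifts the Jaro-Winkler score slightly above it.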

The longest common substring score may satisfy the following formula ten:

score = len(LCS(s1, s2)) / avg(|s1|, |s2|)    (formula ten)

where LCS(s1, s2) is the longest common substring of s1 and s2, and avg is the averaging function.

The edit distance score may satisfy the following formula eleven:

score = 1 - ldist(s1, s2) / MAX(|s1|, |s2|)    (formula eleven)

where ldist is the edit distance, normalized by the length of the longer string so that the score falls in [0, 1].

in this embodiment, the preset threshold corresponding to the similarity may be set according to actual requirements, and this embodiment does not particularly limit this.

S308, combining the two candidate keywords into a keyword according to the probability that the two candidate keywords are the keywords respectively.

In one possible implementation, if the similarity between two candidate keywords is greater than the preset threshold, the probability that each of the two candidate keywords is a keyword may be determined. For example, suppose the two candidate keywords "extra long endurance" and "extra long standby" currently exist.

The probability that "extra long endurance" is a keyword and the probability that "extra long standby" is a keyword can be determined, and the two candidate keywords are merged into one keyword according to these probabilities.

In one possible implementation, the two candidate keywords may be merged into a target keyword, where the target keyword is the one of the two candidate keywords that has the higher probability of being a keyword.

For example, if the probability of "extra long endurance" being a keyword is 98% and the probability of "extra long standby" being a keyword is 87%, the two candidate keywords "extra long endurance" and "extra long standby" may be merged into "extra long endurance", thereby obtaining the keyword "extra long endurance".

The implementation manner of determining the probability of the candidate keyword may be, for example, processing the candidate keyword through an N-Gram language model to obtain the probability that the candidate keyword is the keyword.

In other possible implementations, for example, one of the two candidate keywords may be arbitrarily selected as the target keyword.

S309, determining the two candidate keywords as the keywords of the first object.

In another possible implementation, if the similarity between the two candidate keywords is not greater than the preset threshold, the two candidate keywords may be determined to be dissimilar; in this case no merge operation is required, and both candidate keywords may be determined as keywords of the first object.
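Steps S307 to S309 can be sketched as a greedy merge; `keyword_prob` and `similarity` stand in for the probability model and the combined similarity of formula five, and the stub values below are illustrative:

```python
def select_keywords(candidates, keyword_prob, similarity, threshold=0.8):
    """Greedy version of S307-S309: visit candidates from most to least
    probable; drop a candidate when it is too similar to one already kept,
    which merges each similar pair into its higher-probability member."""
    kept = []
    for cand in sorted(candidates, key=keyword_prob, reverse=True):
        if all(similarity(cand, k) <= threshold for k in kept):
            kept.append(cand)
    return kept

# Illustrative stand-ins for the probability model and formula five.
probs = {"extra long endurance": 0.98, "extra long standby": 0.87,
         "self-cleaning": 0.92}
sim = lambda a, b: 0.9 if a.startswith("extra") and b.startswith("extra") else 0.1

print(select_keywords(probs, probs.get, sim))
# → ['extra long endurance', 'self-cleaning']
```

Visiting candidates in descending probability order guarantees that whenever a similar pair is found, the kept member is the higher-probability one, matching the merge rule of S308.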

According to the method for extracting the keywords of the object provided by this embodiment, the candidate keywords corresponding to the first object are determined from the text information through the pointer-generator network; introducing the generation probability into the pointer-generator network effectively avoids extracting keywords that are not words in the original text, and introducing the coverage factor effectively avoids repeated extracted keywords, thereby effectively guaranteeing the correctness of the extracted keywords. In addition, by determining the similarity between every two candidate keywords, similar candidate keywords can be merged, so that the finally determined keywords of the first object are concise and accurate.

On the basis of the foregoing embodiments, before the text information is processed by the first model, the first model needs to be trained so that it can output candidate keywords according to text information. It can be understood that training the first model requires sample training data. In a possible implementation of this embodiment, a second model may be used to generate multiple groups of samples. The following describes, with reference to a specific embodiment, how multiple groups of samples are generated according to the second model in the present application:

fig. 5 is a flowchart of a keyword extraction method for an object according to yet another embodiment of the present application.

As shown in fig. 5, the method includes:

S501, acquiring sample text information.

In the present embodiment, the sample text information is similar to the text information described above, except that the text information is used for directly extracting keywords, whereas the sample text information is used for generating training data.

S502, performing word segmentation processing on the sample text information through the second model to obtain a plurality of sample words and the probability that each sample word is a keyword.

In this embodiment, the second model may be, for example, an N-Gram language model, where the N-Gram language model may output a plurality of sample words and probabilities that the sample words are keywords with sample text information as input.

Before the sample words and their probabilities are obtained from the N-Gram language model, the N-Gram language model is trained. For example, the N-Gram model may be trained on existing keyword data to obtain a trained N-Gram model that can output sample words and the probability that each sample word is a keyword.

After the training of the N-Gram model is completed, the sample text information may be first subjected to word segmentation processing and stop word processing, which are similar to those described above and will not be described herein again.

Then, keyword candidate phrases can be generated. Since the length of a keyword of the first object is usually 3 to 7 words, N in the N-Gram may be selected from 1, 2, and 3 to generate 1-gram, 2-gram, and 3-gram phrases. For example, if the words "one-key", "automatic", and "cleaning" currently exist, the generated 1-gram phrases may include "one-key", "automatic", and "cleaning", the generated 2-gram phrases may include "one-key automatic" and "automatic cleaning", and the generated 3-gram phrase may include "one-key automatic cleaning".

In this embodiment, the candidate phrases of the keywords may be used as sample words, and then the candidate phrases of the keywords may be scored based on the trained N-Gram model, so as to obtain the probability that each sample word is a keyword.
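The generation of 1-gram to 3-gram candidate phrases described above can be sketched as follows (tokens are joined with spaces here for readability; Chinese tokens would be concatenated directly):

```python
def candidate_phrases(tokens, max_n=3):
    """Generate the 1-gram to max_n-gram keyword candidate phrases."""
    phrases = []
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            phrases.append(" ".join(tokens[i:i + n]))
    return phrases

print(candidate_phrases(["one-key", "automatic", "cleaning"]))
# → ['one-key', 'automatic', 'cleaning', 'one-key automatic',
#    'automatic cleaning', 'one-key automatic cleaning']
```

Each of these candidate phrases would then be scored by the trained N-Gram model to obtain its probability of being a keyword.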

The idea of scoring phrases with the N-Gram model to generate keywords is to infer, from the probabilities of phrase combinations counted in the existing keyword training data, how plausible it is that a new short phrase is a keyword: the higher the probability, the more likely the phrase is a reasonable keyword.

For example, the probability that the phrase "one-key automatic cleaning" is a keyword is calculated as follows, where <s> denotes the start and </s> denotes the end:

P('one-key automatic cleaning') = P(one-key | <s>) × P(automatic | one-key) × P(cleaning | automatic) × P(</s> | cleaning)

Generally, since the probability values are all less than 1, to avoid the successive multiplication making the score smaller and smaller, logarithms are taken on both sides so that the product becomes a sum, as follows:

log(P('one-key automatic cleaning')) = log(P(one-key | <s>)) + log(P(automatic | one-key)) + log(P(cleaning | automatic)) + log(P(</s> | cleaning))

Thus, the implementation of calculating the probability that a certain phrase is a keyword may satisfy the following formula twelve:

P(w_i | w_{i-1}) = C(w_{i-1}, w_i) / C(w_{i-1})    (formula twelve)

where C(w_{i-1}, w_i) denotes the number of times w_{i-1} and w_i appear together in the sample text information, C(w_{i-1}) denotes the total number of occurrences of w_{i-1}, and P(w_i | w_{i-1}) denotes the probability that w_i follows w_{i-1}, used to score whether the phrase is a keyword.
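Formula twelve and the log-probability scoring above can be sketched as follows; the tiny corpus and the `eps` smoothing for unseen bigrams are assumptions of this sketch:

```python
from collections import Counter
from math import log

def train_bigram(keyword_corpus):
    """Count unigrams and bigrams (with <s>/</s> markers) from keyword data."""
    unigrams, bigrams = Counter(), Counter()
    for tokens in keyword_corpus:
        padded = ["<s>"] + tokens + ["</s>"]
        unigrams.update(padded[:-1])
        bigrams.update(zip(padded[:-1], padded[1:]))
    return unigrams, bigrams

def log_prob(tokens, unigrams, bigrams, eps=1e-9):
    """Sum of log P(w_i | w_{i-1}) per formula twelve; unseen bigrams fall
    back to a tiny eps probability instead of log(0)."""
    padded = ["<s>"] + tokens + ["</s>"]
    total = 0.0
    for a, b in zip(padded[:-1], padded[1:]):
        p = bigrams[(a, b)] / unigrams[a] if bigrams[(a, b)] else eps
        total += log(p)
    return total

corpus = [["one-key", "automatic", "cleaning"], ["automatic", "cleaning"]]
uni, bi = train_bigram(corpus)
seen = log_prob(["one-key", "automatic", "cleaning"], uni, bi)
unseen = log_prob(["cleaning", "one-key"], uni, bi)
print(seen > unseen)  # a phrase seen in training scores higher → True
```

Candidate phrases whose score exceeds the first threshold would then be kept as sample keywords, as in step S503.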

S503, determining sample candidate keywords in the plurality of sample vocabularies according to the probability that each sample vocabulary is the keyword, wherein the probability that the sample candidate keywords are the keywords is larger than a first threshold value.

After the probability of each sample word is obtained, the sample words whose probability of being a keyword is greater than the first threshold may be determined as sample keywords, thereby obtaining a plurality of sample keywords.

The training data in this embodiment includes sample text information and sample keywords, and in the process of training the first model according to the training data, the sample text information may be processed by using the first model, and the first model learns according to the sample keywords corresponding to the sample text information, thereby implementing training of the first model.

In the embodiment of the application, the sample text data is processed through the second model to obtain the sample keywords, so that the training data can be automatically generated, wherein the training data comprises the sample text data and the sample keywords, and the efficiency of obtaining the training data can be effectively improved.

In another possible implementation, for example, if the first object is a commodity, the core industry attributes sorted out from the commodity's industry attributes may be associated with related articles to obtain a candidate set of keyword phrases, and the required sample keywords are screened out from the candidate set manually.

After the first model is trained with the obtained training data, it can be deployed as an interface for calling. This avoids the situation in which, because the first model is too large, distributed calling would require distributing the model to every node and occupying a large amount of node memory; it thus effectively saves the network resources consumed in distribution and shortens the prediction time.

On the basis of the foregoing embodiments, the following introduces, with reference to fig. 6, the overall process of model training, deployment, and implementation in the method for extracting a keyword of an object provided by the present application, where fig. 6 is a schematic diagram of the flow units of object keyword extraction provided by an embodiment of the present application:

referring to fig. 6, fig. 6 includes a data unit, a model training unit, a model deployment unit, and a keyword extraction unit.

The data unit may automatically generate the training data through the second model described above, and/or may obtain the training data manually.

The training data obtained by the data unit can be used for a model training unit, and the model training unit trains the first model according to the training data to obtain the trained first model.

And then the model deployment unit deploys the trained first model in an interface mode, so that resources are effectively saved, and the prediction time is shortened.

Finally, the keywords may be extracted according to the keyword extraction unit, for example, text information may be input into the trained first model, so that the first model outputs candidate keywords, and the candidate keywords may be screened according to the similarity between every two candidate keywords in the candidate keywords, so as to obtain the keywords of the first object.

In a possible implementation manner, as can be understood with reference to fig. 7, fig. 7 is a schematic diagram of extracted keywords provided in an embodiment of the present application.

The text information may be as shown in 701 in fig. 7, and the keywords extracted according to each text information may be as shown in 702 in fig. 7, and in one possible implementation, 702 in fig. 7 may be a candidate keyword in the present application.

The specific implementation of each unit is described in detail in the above embodiments. In summary, the object keyword extraction method provided by the present application can automatically extract the keywords of the first object from the text information, and can ensure the correctness and simplicity of the extracted keywords.

Fig. 8 is a schematic structural diagram of an object keyword extraction apparatus according to an embodiment of the present application. As shown in fig. 8, the apparatus 80 includes: an acquisition module 801 and a determination module 802.

An obtaining module 801, configured to obtain text information corresponding to a first object, where the text information is used to describe the first object;

a determining module 802, configured to determine, according to the text information, a plurality of candidate keywords corresponding to the first object;

the determining module 802 is further configured to determine at least one keyword of the first object in the candidate keywords according to the similarity of the candidate keywords and the probability that the candidate keywords are keywords.

In one possible design, the determining module 802 is specifically configured to:

processing the text information through a first model to obtain a plurality of candidate keywords;

the first model is obtained by learning from a plurality of groups of samples, each group of samples comprises sample text information and sample candidate keywords, and the plurality of groups of samples are generated by a second model.

In one possible design, the process of generating the plurality of sets of samples by the second model includes:

acquiring the sample text information;

performing word segmentation processing on the sample text information through the second model to obtain a plurality of sample words and the probability that each sample word is a keyword;

and determining sample candidate keywords in the plurality of sample words according to the probability that each sample word is a keyword, wherein the probability that the sample candidate keywords are keywords is greater than a first threshold value.

In one possible design, the determining module 802 is specifically configured to:

for each two candidate keywords in the plurality of candidate keywords, judging whether the similarity between the two candidate keywords is greater than a preset threshold value;

if so, combining the two candidate keywords into one keyword according to the probability that the two candidate keywords are the keywords respectively;

and if not, determining the two candidate keywords as the keywords of the first object.

In one possible design, the determining module 802 is specifically configured to:

and merging the two candidate keywords into a target keyword, wherein the target keyword is the one of the two candidate keywords that has the higher probability of being a keyword.

In one possible design, the determining module 802 is specifically configured to:

sentence division processing is carried out on the text information to obtain a plurality of short sentences;

determining whether each short sentence comprises a keyword or not through a binary classification model, and determining the short sentence comprising the keyword as a target short sentence to obtain at least one target short sentence;

performing word segmentation processing on each target short sentence to obtain a plurality of first words;

filtering stop words from the plurality of first words to obtain a plurality of second words;

and performing keyword prediction processing on the plurality of second words to obtain a plurality of candidate keywords.
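The five steps above form a pipeline, sketched below under the assumption that the binary sentence classifier, the tokenizer, and the keyword predictor are supplied externally (all three are placeholders, not components defined by this application).

```python
import re

def extract_candidates(text, is_keyword_sentence, tokenize, predict_keywords,
                       stop_words):
    # 1. Sentence segmentation of the text information into short sentences.
    short_sentences = [s for s in re.split(r"[.!?;,]", text) if s.strip()]
    # 2. Keep only sentences the binary classification model marks as
    #    containing a keyword (the target short sentences).
    targets = [s for s in short_sentences if is_keyword_sentence(s)]
    # 3. Word segmentation of each target short sentence -> first words.
    first_words = [w for s in targets for w in tokenize(s)]
    # 4. Stop-word filtering -> second words.
    second_words = [w for w in first_words if w not in stop_words]
    # 5. Keyword prediction over the second words -> candidate keywords.
    return predict_keywords(second_words)
```

Filtering sentences before tokenizing keeps the downstream prediction step focused on the short sentences most likely to carry keywords.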

In one possible design, the first model is a pointer generation network;

the output layer of the pointer generation network comprises a generation probability, and the generation probability is used for indicating the probability that the next output word of the decoder at each time step is from a preset word list; and

the attention distribution function of the pointer generation network includes a coverage factor.
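In a pointer generation network, the generation probability mixes the vocabulary distribution with a copy distribution taken from attention over the source, and the coverage factor accumulates past attention to discourage repetition. A minimal sketch of these two mechanisms (following the standard pointer-generator formulation, with NumPy arrays standing in for decoder tensors):

```python
import numpy as np

def final_distribution(p_gen, vocab_dist, attention, source_ids):
    """p_gen: probability the next word comes from the preset word list.
    vocab_dist: distribution over the preset word list (sums to 1).
    attention: attention weights over source positions (sums to 1).
    source_ids: word-list id of each source position."""
    final = p_gen * vocab_dist
    # Scatter-add the copy probabilities onto the ids of the source words.
    np.add.at(final, source_ids, (1.0 - p_gen) * attention)
    return final

def update_coverage(coverage, attention):
    # Coverage factor: running sum of past attention distributions,
    # fed back into the attention function to penalize re-attending.
    return coverage + attention
```

Because both input distributions sum to 1, the mixture also sums to 1, so copied source words and generated word-list words compete in a single distribution at each time step.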

In one possible design, the text information includes at least one of:

network data corresponding to the first object, wherein the network data comprises description information of the first object;

and a detail page corresponding to the first object, wherein the detail page is a web page introducing the first object.

The apparatus provided in this embodiment may be used to implement the technical solutions of the above method embodiments, and the implementation principles and technical effects are similar, which are not described herein again.

Fig. 9 is a schematic diagram of a hardware structure of an object keyword extraction device according to an embodiment of the present application, and as shown in fig. 9, an object keyword extraction device 90 according to the present embodiment includes: a processor 901 and a memory 902; wherein

A memory 902 for storing computer-executable instructions;

the processor 901 is configured to execute computer-executable instructions stored in the memory to implement the steps performed by the keyword extraction method of the object in the foregoing embodiments. Reference may be made in particular to the description relating to the method embodiments described above.

Alternatively, the memory 902 may be separate or integrated with the processor 901.

When the memory 902 is separately provided, the keyword extracting apparatus of the object further includes a bus 903 for connecting the memory 902 and the processor 901.

An embodiment of the present application further provides a computer-readable storage medium, where computer-executable instructions are stored in the computer-readable storage medium, and when a processor executes the computer-executable instructions, the method for extracting the keyword of the object, which is performed by the apparatus for extracting the keyword of the object, is implemented.

In the several embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. For example, the above-described device embodiments are merely illustrative, and for example, the division of the modules is only one logical division, and other divisions may be realized in practice, for example, a plurality of modules may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or modules, and may be in an electrical, mechanical or other form.

The integrated module implemented in the form of a software functional module may be stored in a computer-readable storage medium. The software functional module is stored in a storage medium and includes several instructions for enabling a computer device (which may be a personal computer, a server, or a network device) or a processor (processor) to execute some steps of the methods according to the embodiments of the present application.

It should be understood that the processor may be a Central Processing Unit (CPU), another general-purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), or the like. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor. The steps of a method disclosed in connection with the present application may be embodied directly in a hardware processor, or in a combination of hardware and software modules within the processor.

The memory may comprise a high-speed RAM memory, and may further comprise a non-volatile memory (NVM), such as at least one disk memory; the memory may also be a USB disk, a removable hard disk, a read-only memory, a magnetic disk, an optical disk, or the like.

The bus may be an Industry Standard Architecture (ISA) bus, a Peripheral Component Interconnect (PCI) bus, an Extended ISA (EISA) bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, the buses in the figures of the present application are not limited to only one bus or one type of bus.

The storage medium may be implemented by any type or combination of volatile or non-volatile memory devices, such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disks. A storage media may be any available media that can be accessed by a general purpose or special purpose computer.

Those of ordinary skill in the art will understand that: all or a portion of the steps of implementing the above-described method embodiments may be performed by hardware associated with program instructions. The program may be stored in a computer-readable storage medium. When executed, the program performs steps comprising the method embodiments described above; and the aforementioned storage medium includes: various media that can store program codes, such as ROM, RAM, magnetic or optical disks.

Finally, it should be noted that: the above embodiments are only used for illustrating the technical solutions of the present application, and not for limiting the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; and the modifications or the substitutions do not make the essence of the corresponding technical solutions depart from the scope of the technical solutions of the embodiments of the present application.
