Model training method, link scoring method, device, equipment, medium and product

Document No.: 1846145    Publication date: 2021-11-16

Reading note: This technology, "Model training method, link scoring method, apparatus, device, medium and product", was designed by 余文利, 刘伟, 杨国强, 陈由之, 王鹏, 张博 and 华轶名 on 2021-06-16. Main content: The disclosure provides a model training method, a link scoring method, an apparatus, a device, a medium and a product. It relates to the field of artificial intelligence, in particular to machine learning and content recommendation, and can be applied to link crawling and link library maintenance scenarios. The method comprises: obtaining link quality information and link statistics associated with a set of links stored in a link library, the set of links being associated with a selected link; and training a link scoring model such that, using the trained link scoring model, the difference between a score for the selected link, determined based on the link quality information and the link statistics, and an annotation score pre-labeled for the selected link is less than a threshold difference. With this method, the link scoring model can be trained on link quality information and link statistics at main-domain, site and directory granularity, so that crawled links can be scored accurately and efficiently, which improves the quality and efficiency of link crawling and link library maintenance.

1. A model training method, comprising:

obtaining link quality information and link statistics associated with a set of links stored in a link library, the set of links being associated with a selected link; and

training a link scoring model such that, using the trained link scoring model, a difference between a score for the selected link, determined based on the link quality information and the link statistics, and a pre-labeled annotation score for the selected link is less than a threshold difference.

2. The method of claim 1, wherein the set of links being associated with the selected link comprises at least one of:

the set of links and the selected link belong to the same main domain;

the set of links and the selected link belong to the same site; and

the set of links and the selected link belong to the same directory under the same site.

3. The method of claim 1, wherein the link quality information comprises at least one of:

a proportion of the links in the set of links that are associated with bad pages;

a proportion of the links in the set of links that are low-quality links; and

a ratio of the number of high-quality links in the set of links that were stored in the link library within a most recent first predetermined time period to the number of links in the link library that were stored within the most recent first predetermined time period.

4. The method of claim 1, wherein the link statistics include at least one of:

a number of links in the set of links that were stored in the link library within a most recent second predetermined time period;

a number of views, within a most recent third predetermined time period, of those pages associated with the set of links that are associated with bid data; and

a number of dynamic links in the set of links that were stored in the link library within a most recent fourth predetermined time period.

5. A link scoring method comprising determining a score for an input link using a link scoring model trained according to any one of claims 1 to 4.

6. The method of claim 5, further comprising:

crawling the input link through a web crawler.

7. The method of claim 5, further comprising:

storing the input link in a link library if the score is above a first threshold score.

8. The method of claim 5, further comprising:

discarding the input link if the score is below a second threshold score.

9. A model training apparatus comprising:

an information acquisition module configured to acquire link quality information and link statistics information associated with a set of links stored in a link library, the set of links being associated with the selected link; and

a model training module configured to train a link scoring model such that, using the trained link scoring model, a difference between a score for the selected link, determined based on the link quality information and the link statistics, and a pre-labeled annotation score for the selected link is less than a threshold difference.

10. The apparatus of claim 9, wherein the set of links being associated with the selected link comprises at least one of:

the set of links and the selected link belong to the same main domain;

the set of links and the selected link belong to the same site; and

the set of links and the selected link belong to the same directory under the same site.

11. The apparatus of claim 9, wherein the link quality information comprises at least one of:

a proportion of the links in the set of links that are associated with bad pages;

a proportion of the links in the set of links that are low-quality links; and

a ratio of the number of high-quality links in the set of links that were stored in the link library within a most recent first predetermined time period to the number of links in the link library that were stored within the most recent first predetermined time period.

12. The apparatus of claim 9, wherein the link statistics comprise at least one of:

a number of links in the set of links that were stored in the link library within a most recent second predetermined time period;

a number of views, within a most recent third predetermined time period, of those pages associated with the set of links that are associated with bid data; and

a number of dynamic links in the set of links that were stored in the link library within a most recent fourth predetermined time period.

13. A link scoring apparatus configured to determine a score for an input link using a link scoring model trained by the model training apparatus according to any one of claims 9 to 12.

14. The apparatus of claim 13, further comprising:

a link crawling module configured to crawl the input link through a web crawler.

15. The apparatus of claim 13, further comprising:

a link storage module configured to store the input link in a link library if the score is above a first threshold score.

16. The apparatus of claim 13, further comprising:

a link drop module configured to drop the input link if the score is below a second threshold score.

17. An electronic device, comprising:

at least one processor; and

a memory communicatively coupled to the at least one processor; wherein

the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-4.

18. An electronic device, comprising:

at least one processor; and

a memory communicatively coupled to the at least one processor; wherein

the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 5-8.

19. A non-transitory computer readable storage medium having stored thereon computer instructions for causing the computer to perform the method of any one of claims 1-4.

20. A non-transitory computer readable storage medium having stored thereon computer instructions for causing the computer to perform the method of any one of claims 5-8.

21. A computer program product comprising a computer program which, when executed by a processor, performs the method of any one of claims 1-4.

22. A computer program product comprising a computer program which, when executed by a processor, performs the method of any one of claims 5-8.

Technical Field

The present disclosure relates to the field of artificial intelligence, particularly to the field of machine learning and content recommendation, applicable to link crawling and link library maintenance scenarios, and more particularly to a model training method, a link scoring method, a model training apparatus, a link scoring apparatus, an electronic device, a computer readable storage medium, and a computer program product.

Background

Web crawlers (spiders) can crawl billions of links, or even more, from the Internet every day. However, the vast majority of crawled links are spam links or worthless links. In addition, the link discovery module of a link library may discover new links based on the links that have already been crawled. Therefore, if the links stored in the link library are all spam links or worthless links, the newly discovered and crawled links will also be spam links or worthless links, and the storage resources of the link library are seriously wasted. Meanwhile, the front-end display module of a search engine also uses the link library to select link resources that satisfy user queries, so too many spam links or worthless links in the link library seriously degrade the user's query experience.

However, conventional techniques for link library maintenance do not solve the above problems.

Disclosure of Invention

According to an embodiment of the present disclosure, there is provided a model training method, a link scoring method, a model training apparatus, a link scoring apparatus, an electronic device, a computer-readable storage medium, and a computer program product.

In a first aspect of the present disclosure, there is provided a model training method comprising: obtaining link quality information and link statistics associated with a set of links stored in a link library, the set of links being associated with a selected link; and training a link scoring model such that, using the trained link scoring model, a difference between a score for the selected link, determined based on the link quality information and the link statistics, and a pre-labeled annotation score for the selected link is less than a threshold difference.

In a second aspect of the present disclosure, there is provided a link scoring method comprising determining a score for an input link using a link scoring model trained in accordance with the first aspect of the present disclosure.

In a third aspect of the present disclosure, there is provided a model training apparatus comprising: an information acquisition module configured to acquire link quality information and link statistics associated with a set of links stored in a link library, the set of links being associated with a selected link; and a model training module configured to train a link scoring model such that, using the trained link scoring model, a difference between a score for the selected link, determined based on the link quality information and the link statistics, and a pre-labeled annotation score for the selected link is less than a threshold difference.

In a fourth aspect of the present disclosure, there is provided a link scoring apparatus configured to determine a score for an input link using a link scoring model trained by a model training apparatus according to the third aspect of the present disclosure.

In a fifth aspect of the present disclosure, there is provided an electronic device comprising at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to implement a method according to the first aspect of the disclosure.

In a sixth aspect of the present disclosure, there is provided an electronic device comprising at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to implement a method according to the second aspect of the disclosure.

In a seventh aspect of the present disclosure, there is provided a non-transitory computer readable storage medium having stored thereon computer instructions for causing a computer to implement a method according to the first aspect of the present disclosure.

In an eighth aspect of the present disclosure, there is provided a non-transitory computer readable storage medium having stored thereon computer instructions for causing a computer to implement a method according to the second aspect of the present disclosure.

In a ninth aspect of the disclosure, a computer program product is provided, comprising a computer program which, when executed by a processor, performs the method according to the first aspect of the disclosure.

In a tenth aspect of the disclosure, a computer program product is provided, comprising a computer program which, when executed by a processor, performs the method according to the second aspect of the disclosure.

With the technology according to the present application, a link scoring method is provided. With this technical solution, a link scoring model can be trained on link quality information and link statistics associated with links at main-domain, site and directory granularity, so that crawled links can be scored accurately and efficiently. Low-scoring links can then be discarded, and the associated page content and associated links can be crawled for high-scoring links. This improves the quality and efficiency of link crawling and link library maintenance, saves storage resources of the link library, and improves the experience of users of the link library.

It should be understood that the statements herein reciting aspects are not intended to limit the critical or essential features of the embodiments of the present disclosure, nor are they intended to limit the scope of the present disclosure. Other features of the present disclosure will become apparent from the following description.

Drawings

The foregoing and other objects, features and advantages of the disclosure will be apparent from the following more particular descriptions of exemplary embodiments of the disclosure as illustrated in the accompanying drawings wherein like reference numbers generally represent like parts throughout the exemplary embodiments of the disclosure. It should be understood that the drawings are for a better understanding of the present solution and do not constitute a limitation of the present disclosure. Wherein:

FIG. 1 illustrates a schematic block diagram of a model training environment 100 in which a model training method in certain embodiments of the present disclosure may be implemented;

FIG. 2 illustrates a flow diagram of a model training method 200 according to an embodiment of the present disclosure;

FIG. 3 illustrates a flow diagram of a link scoring method 300 according to an embodiment of the present disclosure;

FIG. 4 shows a schematic block diagram of a model training apparatus 400 according to an embodiment of the present disclosure; and

FIG. 5 illustrates a schematic block diagram of an example electronic device 500 that can be used to implement embodiments of the present disclosure.

Like or corresponding reference characters designate like or corresponding parts throughout the several views.

Detailed Description

Preferred embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While the preferred embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.

The term "include" and its variants, as used herein, are open-ended, i.e., "including but not limited to". Unless specifically stated otherwise, the term "or" means "and/or". The term "based on" means "based at least in part on". The terms "one example embodiment" and "one embodiment" mean "at least one example embodiment". The term "another embodiment" means "at least one additional embodiment". The terms "first," "second," and the like may refer to different or the same object. Other definitions, both explicit and implicit, may also be included below.

As described in the Background above, conventional techniques for link library maintenance are deficient. Specifically, in conventional schemes only bare link data is stored in the link library, without the content data of the page associated with each link. Very few features are therefore available for judging page quality, usually only features of the link itself, such as the length of the URL, so it is difficult to determine the quality of a link accurately.

In order to at least partially solve one or more of the above problems and other potential problems, embodiments of the present disclosure provide a link scoring method. With this technical solution, a link scoring model can be trained on link quality information and link statistics associated with links at main-domain, site and directory granularity, so that crawled links can be scored accurately and efficiently. Low-scoring links can then be discarded, and the associated page content and associated links can be crawled for high-scoring links, thereby improving the quality and efficiency of link crawling and link library maintenance, saving storage resources of the link library, and improving the experience of users of the link library.

FIG. 1 illustrates a schematic block diagram of a model training environment 100 in which a model training method in certain embodiments of the present disclosure may be implemented. In accordance with one or more embodiments of the present disclosure, model training environment 100 may be a cloud environment. As shown in FIG. 1, model training environment 100 includes a computing device 110. In the model training environment 100, the training related data 120 is provided to the computing device 110 as input. The training related data 120 may include, for example, the selected link together with link quality information and link statistics associated with a set of links stored in a link library, where the set of links is associated with the selected link. The computing device 110 may interact with the link scoring model 130. For example, the computing device 110 may provide at least a portion of the training related data 120 to the link scoring model 130 and receive from the link scoring model 130 a score for the selected link that the model determined based on the training related data 120. The computing device 110 may then compare this score with the corresponding annotation score pre-labeled for the selected link, determine whether the difference is less than a threshold difference, and issue instructions to the link scoring model 130 to adjust its parameters, thereby training the link scoring model 130.

In accordance with one or more embodiments of the present disclosure, in the model training environment 100, after the computing device 110 receives the score for the selected link determined by the link scoring model 130 based on the training-related data 120, instructions to adjust the parameters of the link scoring model 130 may be issued to the link scoring model 130 by comparing the score for the selected link determined by the link scoring model 130 to the corresponding pre-labeled annotation score for the selected link, such that the score for the selected link determined by the link scoring model 130 differs from the corresponding pre-labeled annotation score for the selected link as little as possible.

It should be appreciated that the model training environment 100 is merely exemplary and not limiting, and that it is scalable: it may include more computing devices 110, more training-related data 120 may be provided as input to the computing devices 110, and the computing devices 110 may interact with more link scoring models 130. In this way, the needs of more users to train link scoring models 130, simultaneously or not, using more computing devices 110 and more training-related data 120 can be satisfied. Further, after training of the link scoring model 130 is completed, the computing device 110 may also use the trained link scoring model 130 to determine a score for an input link.

In the model training environment 100 shown in FIG. 1, the input of training-related data 120 to the computing device 110 and the interaction between the computing device 110 and the link scoring model 130 may be conducted over a network.

FIG. 2 shows a flow diagram of a model training method 200 according to an embodiment of the present disclosure. In particular, the model training method 200 may be performed by the computing device 110 in the model training environment 100 shown in FIG. 1. It should be understood that the model training method 200 may also include additional operations not shown and/or may omit the operations shown, as the scope of the present disclosure is not limited in this respect.

At block 202, the computing device 110 obtains link quality information and link statistics associated with a set of links stored in a link library. In accordance with one or more embodiments of the present disclosure, a set of links is associated with the selected link, and the computing device 110 may obtain link quality information and link statistics associated with the set of links stored in the link library based on receiving an indication of the selected link.

According to one or more embodiments of the present disclosure, the set of links being associated with the selected link includes at least one of: the set of links and the selected link belong to the same main domain; the set of links and the selected link belong to the same site; and the set of links and the selected link belong to the same directory under the same site. The main domain takes a form such as baidu.com; when the main domain is baidu.com, sites www.baidu.com, zhidao. The site takes a form such as www.baidu.com. The directory takes a form such as www.baidu.com/index/a, and the links belonging to the same directory may include links that match the same regular expression.
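The three granularities can be illustrated with a minimal Python sketch. The function names, the example.com URLs, and the rule that the main domain is the last two host labels are all assumptions made for illustration; the disclosure does not specify how these keys are derived, and a production system would likely consult a public-suffix list.

```python
from urllib.parse import urlsplit


def link_keys(url: str) -> dict:
    """Return hypothetical main-domain, site, and directory keys for a link.

    Assumption: the main domain is the last two host labels, which only
    holds for simple suffixes such as ".com".
    """
    parts = urlsplit(url if "://" in url else "http://" + url)
    site = parts.hostname or ""
    main_domain = ".".join(site.split(".")[-2:])
    # Directory key = site plus the path with its last segment removed.
    dir_path = parts.path.rsplit("/", 1)[0]
    return {"main_domain": main_domain, "site": site, "directory": site + dir_path}


def associated_links(selected: str, library: list, granularity: str) -> list:
    """Links in the library that share the selected link's key at one granularity."""
    key = link_keys(selected)[granularity]
    return [u for u in library if link_keys(u)[granularity] == key]


# Illustrative link library (URLs are made up).
library = [
    "http://www.example.com/index/a/1.html",
    "http://www.example.com/index/b/2.html",
    "http://blog.example.com/q/3.html",
    "http://other.org/x",
]
selected = "http://www.example.com/index/a/page.html"
same_domain = associated_links(selected, library, "main_domain")
same_site = associated_links(selected, library, "site")
same_directory = associated_links(selected, library, "directory")
```

Moving from main domain to site to directory narrows the associated set, so features computed at each level describe the selected link's neighborhood at increasing specificity.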

According to one or more embodiments of the present disclosure, the link quality information includes at least one of: the proportion of links in the set of links that are associated with bad pages; the proportion of links in the set of links that are low-quality links; the proportion of pages, among the pages associated with the set of links, whose page content failed to be crawled; the proportion of links in the set of links that are high-quality links; and the ratio of the number of high-quality links in the set of links that were stored in the link library within the most recent first predetermined time period to the number of links in the link library that were stored within the most recent first predetermined time period.

Links associated with bad pages may include, for example, links whose associated page content relates to pornography, gambling, fraud, etc., or links whose associated page is a diversion page. Links associated with bad pages may be determined, for example, by analyzing the page content associated with the links using a trained semantic model. According to embodiments of the present disclosure, the score for such links may be lower.

The low-quality links may include, for example, links where the associated page is blank or where the associated page is missing, and are considered low-quality links because such links do not link to valuable content. According to embodiments of the present disclosure, the score for such links may be lower.

Pages that fail to capture page content may include, for example, pages that fail to capture page content associated with a link due to server instability, related content having been deleted, and so forth. Such pages are often difficult to stably provide valuable content. According to embodiments of the present disclosure, the score for such links may be lower.

High-quality links may include, for example, links for which the page content of the associated page is easy to obtain and save, and for which the page content has higher value or is more popular. High-quality links may be determined, for example, by analyzing the pages associated with the links using a trained policy model. According to embodiments of the present disclosure, the score for such links may be higher. It is to be noted that two quantities may be counted at the same time: the overall proportion of high-quality links in the set of links, and the ratio of high-quality links in the set of links stored in the link library within the most recent first predetermined time period to the number of links stored in the link library within that period. Long-term quality and real-time quality can thus be considered together, for example when the pages of a certain site were once of high quality but their quality has recently degraded seriously.
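The quality ratios described above can be sketched as follows. This is only an illustration: the `LinkRecord` fields are hypothetical labels assumed to have been produced upstream (for instance by the semantic and policy models mentioned in the text), and time is simplified to an integer day index.

```python
from dataclasses import dataclass


@dataclass
class LinkRecord:
    # All fields are hypothetical; the disclosure does not define a record layout.
    url: str
    bad_page: bool       # page flagged as bad (e.g. by a semantic model)
    low_quality: bool    # blank or missing page
    fetch_failed: bool   # page content could not be crawled
    high_quality: bool   # flagged as high quality (e.g. by a policy model)
    stored_day: int      # day index on which the link entered the library


def ratio(numerator: int, denominator: int) -> float:
    """Safe division that returns 0.0 for an empty denominator."""
    return numerator / denominator if denominator else 0.0


def quality_features(link_set, library, today: int, window_days: int) -> dict:
    """Compute the five quality ratios for one associated link set."""
    n = len(link_set)
    recent_hq_in_set = sum(
        1 for r in link_set if r.high_quality and today - r.stored_day < window_days
    )
    recent_in_library = sum(1 for r in library if today - r.stored_day < window_days)
    return {
        "bad_page_ratio": ratio(sum(r.bad_page for r in link_set), n),
        "low_quality_ratio": ratio(sum(r.low_quality for r in link_set), n),
        "fetch_failed_ratio": ratio(sum(r.fetch_failed for r in link_set), n),
        "high_quality_ratio": ratio(sum(r.high_quality for r in link_set), n),
        # Recent high-quality links in the set vs. all recently stored library links.
        "recent_high_quality_ratio": ratio(recent_hq_in_set, recent_in_library),
    }
```

Keeping both `high_quality_ratio` and `recent_high_quality_ratio` mirrors the text's point that long-term and recent quality can diverge.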

According to one or more embodiments of the present disclosure, the link statistics include at least one of: the number of links in the set of links; the number of links in the set of links that were stored in the link library within the most recent second predetermined time period; the times at which the links in the set of links were stored in the link library; the number of views, within the most recent third predetermined time period, of those pages associated with the set of links that are associated with bid data; the length of the links in the set of links; the number of dynamic links in the set of links that were stored in the link library within the most recent fourth predetermined time period; and the number of static links in the set of links that were stored in the link library within the most recent fourth predetermined time period.

It is to be noted that the number of links in the link set and the number of links in the link set that are stored in the link library in the most recent second predetermined time period may be counted at the same time, so that whether an abnormality occurs in the number of links stored in the link library can be determined by the historical number of links and the real-time number of links. According to an embodiment of the present disclosure, when it is determined that the number of links stored in the link library within the second predetermined period is abnormal according to the aforementioned number of links, the score of such links may be adjusted. At this point, if the number of links recently stored in the link library is significantly reduced, such links may be scored lower, and if the number of links recently stored in the link library is significantly increased, it may be further determined how to adjust the scoring based on the content of the page associated with the links, as the page relating to objectionable content may also result in a significant increase in the number of links recently stored in the link library.
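One way to compare the historical and recent link counts is sketched below. The comparison rule, the `factor` parameter, and the function name are assumptions made for illustration; the disclosure does not say how an abnormality is detected.

```python
def storage_trend(total_links: int, recent_links: int,
                  history_days: int, window_days: int,
                  factor: float = 3.0) -> str:
    """Classify the recent storage volume as "drop", "surge", or "normal".

    Assumption: the expected recent count is the older daily storage rate
    multiplied by the window length, and a deviation by more than `factor`
    in either direction counts as abnormal.
    """
    older_links = total_links - recent_links
    older_days = history_days - window_days
    if older_days <= 0 or older_links <= 0:
        return "normal"  # not enough history to judge
    expected = older_links / older_days * window_days
    if recent_links * factor < expected:
        return "drop"    # the text suggests scoring such links lower
    if recent_links > expected * factor:
        return "surge"   # the text suggests checking page content before adjusting
    return "normal"
```

A "surge" is deliberately not mapped to a score change here, since the text notes that pages with objectionable content can also cause a sharp increase.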

The times at which links in the set of links are stored in the link library may indicate the time interval between the storage of two links, which may be used, for example, to determine how frequently links are stored in the link library. For example, if links from a site are frequently stored in the link library, this may indicate that links from the site are highly storable. According to embodiments of the present disclosure, the score for such links may be higher.

The pages associated with bid data may include, for example, pages that contribute significantly to the site, or pages that relate to the main content of the site. For example, if the site is a merchandise site, a page associated with merchandise trading may be considered a page associated with bid data. According to embodiments of the present disclosure, the score for such a link may be higher if the number of views of the pages associated with bid data is larger.

The length of the links in the set of links may include, for example, the number of characters included by the link. For example, links from the same site typically include a number of characters that is within a range, so that the length of the link can be used to determine whether the link is an abnormal link. If a link is determined to be an abnormal link by the length of the link, the score for such link may be lower according to embodiments of the present disclosure.

Dynamic links and static links are categories of links. The dynamic link may include, for example, a link associated with a live page. Dynamic links are generally less stable than static links, as live pages may stop playing after a period of time. Thus, according to embodiments of the present disclosure, static links may be scored higher than dynamic links when other conditions are the same or similar.

According to one or more embodiments of the present disclosure, the lengths of the first, second, third, and fourth predetermined periods of time mentioned in the aforementioned link quality information and link statistical information may be the same.
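The link statistics listed above can be gathered in a short sketch. The `StoredLink` fields are hypothetical, a single window length is used for all periods (which the text permits), and `bid_page_views` is assumed to already be restricted to the relevant time period.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class StoredLink:
    # Hypothetical record layout, not taken from the disclosure.
    url: str
    stored_day: int        # day index on which the link entered the library
    is_dynamic: bool       # dynamic vs. static link, as labeled upstream
    bid_page_views: int    # views of the page if it is associated with bid data, else 0


def statistics_features(link_set, today: int, window_days: int) -> dict:
    """Compute count, timing, length, and dynamic/static statistics for a link set."""
    recent = [l for l in link_set if today - l.stored_day < window_days]
    days = sorted(l.stored_day for l in link_set)
    gaps = [later - earlier for earlier, later in zip(days, days[1:])]
    return {
        "link_count": len(link_set),
        "recent_link_count": len(recent),
        # Smaller gaps mean links from this set are stored more frequently.
        "mean_storage_gap_days": mean(gaps) if gaps else 0.0,
        "bid_page_views": sum(l.bid_page_views for l in link_set),
        "mean_link_length": mean(len(l.url) for l in link_set) if link_set else 0.0,
        "recent_dynamic_count": sum(1 for l in recent if l.is_dynamic),
        "recent_static_count": sum(1 for l in recent if not l.is_dynamic),
    }
```

A link whose length lies far from `mean_link_length` for its site could then be treated as a candidate abnormal link, in line with the discussion of link length above.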

At block 204, the computing device 110 trains the link scoring model 130 such that, using the trained link scoring model 130, the difference in the score for the selected link and the pre-labeled annotation score for the selected link, as determined based on the link quality information and the link statistics acquired at block 202, is less than a threshold difference.

In accordance with one or more embodiments of the present disclosure, the computing device 110 may train the link scoring model 130 such that the score for the selected link and the pre-labeled annotation score for the selected link, as determined based on the link quality information and link statistics obtained at block 202, differ as little as possible or converge using the trained link scoring model 130.

In accordance with one or more embodiments of the present disclosure, a loss value may be introduced to facilitate comparison of the score for the selected link determined by the link scoring model 130 with the pre-labeled annotation score for the selected link. For example, the computing device 110 may use a loss function to determine a loss value between the score for the selected link determined by the link scoring model 130 and the pre-labeled annotation score for the selected link, and then adjust the parameters of the link scoring model 130 so that this loss value is as small as possible.

In accordance with one or more embodiments of the present disclosure, the link scoring model 130 may be trained by obtaining positive and negative samples from the selected link, the link quality information and link statistics obtained at block 202, and the pre-labeled annotation score for the selected link. For example, positive and negative samples may be distinguished based on site information such as site value, or a link associated with a page judged to be bad by the semantic model may be determined to be a negative sample. It should be noted that, when the link scoring model 130 is trained, more weight may be given to factors that reduce the score, so that as many low-quality link types as possible are learned, which may help the trained link scoring model 130 determine link scores more accurately.
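The training criterion can be illustrated with a minimal sketch. The disclosure does not specify the model type or the loss function; a linear model trained by gradient descent on mean squared error is assumed here purely for illustration, and the feature values, learning rate, and threshold are made up.

```python
def predict(weights, bias, features):
    """Score a link as a weighted sum of its features (assumed linear model)."""
    return bias + sum(w * x for w, x in zip(weights, features))


def train(samples, threshold=0.05, lr=0.1, max_epochs=10000):
    """Adjust parameters until every score is within `threshold` of its annotation.

    `samples` is a list of (features, annotated_score) pairs.
    """
    n_features = len(samples[0][0])
    weights, bias = [0.0] * n_features, 0.0
    m = len(samples)
    for _ in range(max_epochs):
        errors = [predict(weights, bias, x) - y for x, y in samples]
        if max(abs(e) for e in errors) < threshold:
            break  # difference to the annotation scores is below the threshold
        # Gradient step on the mean squared error loss.
        for j in range(n_features):
            grad = 2 / m * sum(e * x[j] for e, (x, _) in zip(errors, samples))
            weights[j] -= lr * grad
        bias -= lr * 2 / m * sum(errors)
    return weights, bias


# Illustrative samples: features are [bad_page_ratio, high_quality_ratio].
samples = [
    ([0.0, 1.0], 1.0),
    ([1.0, 0.0], 0.0),
    ([0.5, 0.5], 0.5),
    ([0.0, 0.0], 0.5),
]
weights, bias = train(samples)
```

On this toy data the learned weight for the bad-page ratio is negative and the weight for the high-quality ratio is positive, matching the direction of influence described in the text.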

FIG. 3 shows a flow diagram of a link scoring method 300 according to an embodiment of the present disclosure. In particular, the link scoring method 300 may be performed by the computing device 110 in the model training environment 100 shown in FIG. 1. It should be understood that the link scoring method 300 may also include additional operations not shown and/or may omit the operations shown, as the scope of the present disclosure is not limited in this respect.

At block 302, the computing device 110 crawls an input link through a web crawler. According to some embodiments of the present disclosure, the input link may be crawled through a web crawler, and whether to store the input link in a link library may be further determined by scoring the input link. According to other embodiments of the present disclosure, the input link may be a link that is already stored in the link library, in which case the input link need not be crawled by a web crawler, and whether to remove the input link from the link library may be further determined by scoring the input link.

At block 304, the computing device 110 determines a score for the input link using the link scoring model 130 trained in accordance with the model training method 200. In accordance with one or more embodiments of the present disclosure, after an input link is obtained, computing device 110 may determine a score for the input link using link scoring model 130 trained in accordance with model training method 200 in conjunction with link quality information and link statistics associated with a set of links stored in a link library associated with the input link.

At block 306, the computing device 110 determines whether the score determined at block 304 is above a first threshold score. If the score determined at block 304 is above the first threshold score, the method 300 proceeds to block 308; otherwise, method 300 proceeds to block 310. According to one or more embodiments of the present disclosure, the first threshold score is a preset threshold score, and links having a score higher than the first threshold score are considered high quality links.

At block 308, the computing device 110 stores the input link in a link library.

At block 310, the computing device 110 determines whether the score determined at block 304 is below a second threshold score. If the score determined at block 304 is below the second threshold score, the method 300 proceeds to block 312; otherwise, the method 300 may take no further action, or the input link may be marked as pending for further manual review.

At block 312, the computing device 110 discards the input link.

In accordance with one or more embodiments of the present disclosure, the first threshold score and the second threshold score may be the same.

It should be appreciated that when the input link is a link that is already stored in the link library, the method 300 may be modified to omit blocks 302, 306, and 308, such that only a determination is made as to whether the score of the input link is below the second threshold score, and the input link is deleted from the link library when its score is below the second threshold score.
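The threshold logic of blocks 306 to 312, together with the variant for links already in the link library, can be summarized in a short sketch. The function names, the `Decision` enumeration, and the example threshold values are hypothetical; only the comparisons themselves follow the description.

```python
from enum import Enum


class Decision(Enum):
    STORE = "store"      # block 308: store the input link in the link library
    DISCARD = "discard"  # block 312: discard the input link
    PENDING = "pending"  # neither threshold crossed: e.g. mark for manual review


def decide_new_link(score: float, first_threshold: float, second_threshold: float) -> Decision:
    """Apply blocks 306 and 310 to a newly crawled input link."""
    if first_threshold < second_threshold:
        raise ValueError("first_threshold must not be below second_threshold")
    if score > first_threshold:   # block 306
        return Decision.STORE
    if score < second_threshold:  # block 310
        return Decision.DISCARD
    return Decision.PENDING


def should_remove_stored_link(score: float, second_threshold: float) -> bool:
    """Variant for a link already in the link library: only the lower threshold applies."""
    return score < second_threshold
```

When the two thresholds are set to the same value, the pending case only arises for a score exactly equal to that value.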

The foregoing describes, with reference to FIGS. 1 to 3, the model training environment 100 in which a model training method in certain embodiments of the present disclosure may be implemented, the model training method 200 according to an embodiment of the present disclosure, and the link scoring method 300 according to an embodiment of the present disclosure. It should be understood that the above description is intended to better illustrate what is recited in the present disclosure, and is not intended to be limiting in any way.

It should be understood that the number of various elements and the size of physical quantities employed in the various drawings of the present disclosure are by way of example only and are not limiting upon the scope of the present disclosure. The above numbers and sizes may be arbitrarily set as needed without affecting the normal implementation of the embodiments of the present disclosure.

Details of the model training method 200 and the link scoring method 300 according to embodiments of the present disclosure have been described above with reference to fig. 1 to 3. Hereinafter, the respective modules in the model training apparatus will be described with reference to fig. 4.

FIG. 4 is a schematic block diagram of a model training apparatus 400 in accordance with an embodiment of the present disclosure. As shown in FIG. 4, the model training apparatus 400 may include: an information acquisition module 410 configured to acquire link quality information and link statistics associated with a set of links stored in a link library, the set of links being associated with the selected link; and a model training module 420 configured to train the link scoring model such that, using the trained link scoring model, the difference between the score determined for the selected link based on the link quality information and the link statistics and the annotation score pre-labeled for the selected link is less than a threshold difference.

In one or more embodiments, the set of links being associated with the selected link comprises at least one of:

the set of links and the selected link belong to the same main domain;

the set of links and the selected link belong to the same site; and

the set of links and the selected link belong to the same directory under the same site.
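The three granularities listed above can be illustrated by deriving grouping keys from a URL. This is a sketch under stated assumptions: the main domain is approximated as the last two host labels, which is wrong for multi-label public suffixes such as `co.uk` (a real system would likely consult the Public Suffix List), and the names `granularity_keys` and `group_links` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List
from urllib.parse import urlsplit


def granularity_keys(url: str) -> Dict[str, str]:
    """Return main-domain, site, and directory keys for a link."""
    parts = urlsplit(url)
    site = parts.hostname or ""
    labels = site.split(".")
    # Naive assumption: the main domain is the last two labels of the host name.
    main_domain = ".".join(labels[-2:]) if len(labels) >= 2 else site
    # Directory: the path up to and including the last slash.
    directory = parts.path.rsplit("/", 1)[0] + "/"
    return {"main_domain": main_domain, "site": site, "directory": site + directory}


def group_links(urls: Iterable[str], level: str) -> Dict[str, List[str]]:
    """Group links into sets that share the same key at the given granularity."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for url in urls:
        groups[granularity_keys(url)[level]].append(url)
    return dict(groups)


urls = [
    "https://news.example.com/sports/2021/a.html",
    "https://news.example.com/sports/2021/b.html",
    "https://blog.example.com/post/1",
]
by_directory = group_links(urls, "directory")
by_main_domain = group_links(urls, "main_domain")
```

For a selected link, the set of links at each granularity is then simply the group that shares its key.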

In one or more embodiments, the link quality information includes at least one of:

the proportion of links in the set of links that are associated with bad pages;

the proportion of low-quality links in the set of links;

the proportion of pages, among the pages associated with the set of links, whose page content failed to be crawled;

the proportion of high-quality links in the set of links; and

a ratio of the number of high-quality links in the set of links that were stored in the link library during a most recent first predetermined time period to the number of links in the link library that were stored during the most recent first predetermined time period.
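The proportions listed above could be computed along the following lines. The record schema, the one-page-per-link simplification, the 30-day default for the first predetermined time period, and the choice to take the denominator of the last ratio from the same set of links are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class LinkRecord:
    # Hypothetical per-link flags; the disclosure does not define a record schema.
    bad_page: bool      # the associated page was judged bad
    low_quality: bool   # the link was judged low quality
    high_quality: bool  # the link was judged high quality
    fetch_failed: bool  # crawling the page content failed (one page per link assumed)
    age_days: float     # days since the link was stored in the link library


def ratio(count: int, total: int) -> float:
    """Safe proportion that returns 0.0 for an empty denominator."""
    return count / total if total else 0.0


def link_quality_info(links: List[LinkRecord], first_period_days: float = 30.0) -> Dict[str, float]:
    """Compute the quality proportions for one set of links."""
    total = len(links)
    recent = [l for l in links if l.age_days <= first_period_days]
    return {
        "bad_page_ratio": ratio(sum(l.bad_page for l in links), total),
        "low_quality_ratio": ratio(sum(l.low_quality for l in links), total),
        "fetch_failed_ratio": ratio(sum(l.fetch_failed for l in links), total),
        "high_quality_ratio": ratio(sum(l.high_quality for l in links), total),
        # Assumption: the denominator counts recently stored links in this same set.
        "recent_high_quality_ratio": ratio(sum(l.high_quality for l in recent), len(recent)),
    }


links = [
    LinkRecord(True, True, False, False, 5),
    LinkRecord(False, False, True, False, 10),
    LinkRecord(False, False, True, True, 100),
    LinkRecord(False, False, False, False, 20),
]
info = link_quality_info(links)
```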

In one or more embodiments, the link statistics include at least one of:

the number of links in the set of links;

the number of links in the set of links that were stored in the link library within a most recent second predetermined time period;

the time at which the links in the set of links are stored in the link library;

a number of views of a page associated with the offer data within a most recent third predetermined time period of pages associated with the set of links;

the length of the links in the set of links;

the number of dynamic links in the link set that were stored in the link library within the most recent fourth predetermined time period; and

the number of static links in the set of links that were stored in the link library within the most recent fourth predetermined period of time.
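Several of the statistics listed above can be sketched as simple aggregations over a set of links. The record fields, the period lengths, the use of means to summarize storage time and link length, and the rule that a link is dynamic when its URL carries a query string are assumptions; the page-view statistic is omitted because it depends on data not modeled here.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Dict, List
from urllib.parse import urlsplit


@dataclass
class StoredLink:
    url: str
    age_days: float  # hypothetical: days since the link was stored in the link library


def is_dynamic(url: str) -> bool:
    """Assumption: a link is dynamic if its URL has a non-empty query string."""
    return bool(urlsplit(url).query)


def link_statistics(links: List[StoredLink], second_period_days: float = 7.0,
                    fourth_period_days: float = 30.0) -> Dict[str, float]:
    """Aggregate count, recency, length, and dynamic/static statistics."""
    recent = [l for l in links if l.age_days <= fourth_period_days]
    return {
        "link_count": len(links),
        "recent_link_count": sum(l.age_days <= second_period_days for l in links),
        "mean_age_days": mean(l.age_days for l in links) if links else 0.0,
        "mean_url_length": mean(len(l.url) for l in links) if links else 0.0,
        "recent_dynamic_count": sum(is_dynamic(l.url) for l in recent),
        "recent_static_count": sum(not is_dynamic(l.url) for l in recent),
    }


links = [
    StoredLink("https://example.com/a.html", 3),
    StoredLink("https://example.com/list?id=1", 10),
    StoredLink("https://example.com/b.html", 60),
]
stats = link_statistics(links)
```

Such statistics, computed per main domain, site, and directory, could then be concatenated with the quality proportions to form the feature vector passed to the scoring model.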

As can be seen from the above description with reference to FIGS. 1 to 4, the technical solution according to the embodiments of the present disclosure has many advantages over conventional solutions. For example, with this technical solution, a link scoring model can be trained based on link quality information and link statistics associated with links at the main domain, site, and directory granularities, and can be used to score crawled links accurately and efficiently. Low-scoring links can then be discarded, while the associated page content and associated links can be crawled for high-scoring links. This improves the quality and efficiency of link crawling and link library maintenance, saves storage resources of the link library, and improves the experience of users of the link library.

Specifically, the technical solution according to the embodiments of the present disclosure can determine the quality of links at large scale, drawing on a large amount of rich and diverse link quality information and link statistics. In addition, the technical solution according to the embodiments of the present disclosure supports cleaning up junk links or worthless links stored in the link library, so that storage resources of the link library can be effectively saved and a good usage environment for the link library can be ensured.

In accordance with embodiments of the present disclosure, there are also provided a link scoring apparatus, an electronic device, a computer-readable storage medium, and a computer program product, the link scoring apparatus being configured to determine a score for an input link using a link scoring model trained by the model training apparatus 400 in accordance with the present disclosure.

FIG. 5 illustrates a schematic block diagram of an example electronic device 500 that can be used to implement embodiments of the present disclosure. For example, the computing device 110 as shown in FIG. 1 and the model training apparatus 400 as shown in FIG. 4 may be implemented by the electronic device 500. The electronic device 500 is intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital processing, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be examples only, and are not intended to limit implementations of the disclosure described and/or claimed herein.

As shown in FIG. 5, the device 500 comprises a computing unit 501 which may perform various suitable actions and processes in accordance with a computer program stored in a read-only memory (ROM) 502 or a computer program loaded from a storage unit 508 into a random access memory (RAM) 503. In the RAM 503, various programs and data required for the operation of the device 500 can also be stored. The computing unit 501, the ROM 502, and the RAM 503 are connected to each other by a bus 504. An input/output (I/O) interface 505 is also connected to the bus 504.

A number of components in the device 500 are connected to the I/O interface 505, including: an input unit 506 such as a keyboard, a mouse, or the like; an output unit 507 such as various types of displays, speakers, and the like; a storage unit 508, such as a magnetic disk, optical disk, or the like; and a communication unit 509 such as a network card, modem, wireless communication transceiver, etc. The communication unit 509 allows the device 500 to exchange information/data with other devices through a computer network such as the internet and/or various telecommunication networks.

The computing unit 501 may be any of a variety of general-purpose and/or special-purpose processing components having processing and computing capabilities. Some examples of the computing unit 501 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various specialized Artificial Intelligence (AI) computing chips, computing units running various machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, and so forth. The computing unit 501 performs the various methods and processes described above, such as the model training method 200 and the link scoring method 300. For example, in some embodiments, the model training method 200 and the link scoring method 300 may be implemented as computer software programs tangibly embodied in a machine-readable medium, such as the storage unit 508. In some embodiments, part or all of the computer program may be loaded and/or installed onto the device 500 via the ROM 502 and/or the communication unit 509. When the computer program is loaded into the RAM 503 and executed by the computing unit 501, one or more of the steps of the model training method 200 and the link scoring method 300 described above may be performed. Alternatively, in other embodiments, the computing unit 501 may be configured to perform the model training method 200 and the link scoring method 300 by any other suitable means (e.g., by way of firmware).

Various implementations of the systems and techniques described above may be realized in digital electronic circuitry, integrated circuitry, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), systems on a chip (SOCs), complex programmable logic devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various implementations may include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.

Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. These program codes may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the program codes, when executed by the processor or controller, cause the functions/acts specified in the flowchart and/or block diagram block or blocks to be performed. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package, partly on the machine and partly on a remote machine or entirely on the remote machine or server.

In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.

To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.

The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include Local Area Networks (LANs), Wide Area Networks (WANs), and the Internet.

The computer system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.

It should be understood that various forms of the flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present disclosure may be executed in parallel, sequentially, or in different orders, as long as the desired results of the technical solutions disclosed in the present disclosure can be achieved, and the present disclosure is not limited herein.

The above detailed description should not be construed as limiting the scope of the disclosure. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present disclosure should be included in the scope of protection of the present disclosure.
