Application content quality analysis method and application content quality analysis device

Document No.: 1465989 Publication date: 2020-02-21

Reading note: This technology, "Application content quality analysis method and application content quality analysis device", was created by 刘颖慧, 魏进武, 许丹丹 and 赵慧 on 2019-10-31. Main content: The invention discloses an application content quality analysis method and an application content quality analysis device, wherein the method comprises the following steps: determining related applications according to the input keywords, and determining articles in each application; respectively calculating the effective index, the flow index and the content exclusivity index of each article in each application; and respectively calculating the content quality index of each application according to the effective index, the flow index and the content exclusivity index of all articles in the application. The method and the device can realize comprehensive evaluation of the content quality of the application, provide certain decision support when performing personalized recommendation of the application to the user, and help the user to filter and select the application.

1. A method for application content quality analysis, the method comprising:

determining related applications according to the input keywords, and determining articles in each application;

respectively calculating the effective index, the flow index and the content exclusivity index of each article in each application;

and respectively calculating the content quality index of each application according to the effective index, the flow index and the content exclusivity index of all articles in each application.

2. The method of claim 1, wherein the determining related applications according to the input keywords and determining articles in each application comprises:

determining a homepage link of a related application according to the keyword;

respectively determining the number of sub-links contained in each homepage link, wherein the sub-links are links in the application corresponding to the homepage link;

classifying each application according to the number of the sub-links;

and acquiring the articles in each application according to the type of each application.

3. The method of claim 2, wherein the classifying each application according to the number of the sub-links comprises:

in response to the number of the sub-links being greater than a preset first threshold, dividing the corresponding applications into first-class applications;

in response to the number of the sub-links being less than or equal to the first threshold, dividing the corresponding applications into second-class applications.

4. The method of claim 1, wherein the calculating the effective index of each article in each application comprises:

calculating the total word number of each article in each application;

and respectively determining the effective index of each article according to the total word number of each article and a preset second threshold value.

5. The method of claim 1, wherein the calculating the flow index of each article in each application comprises:

respectively calculating the sharing quantity, the comment quantity, the praise quantity and the reading quantity of each article in each application;

respectively determining a maximum value of the share quantity, a maximum value of the comment quantity, a maximum value of the praise quantity and a maximum value of the reading quantity;

and respectively determining the flow index of each article according to the sharing amount, the comment amount, the praise amount, the reading amount, the maximum value of the sharing amount, the maximum value of the comment amount, the maximum value of the praise amount, the maximum value of the reading amount and preset weight of each article.

6. The method for analyzing the application content quality according to claim 1, wherein the calculating the content exclusivity index of each article in each application comprises:

calculating the Hamming distance between any two articles in each application respectively;

respectively determining articles with the Hamming distance in a first range and articles with the Hamming distance in a second range and meeting preset conditions for all articles in all applications;

and respectively calculating the content exclusivity index of each article according to the average value of the Hamming distances of the articles with the Hamming distances in the first range and the sum of the Hamming distances of the articles with the Hamming distances in the second range and meeting the preset condition.

7. The method of claim 6, wherein the determining articles with the Hamming distance in the second range and meeting the preset conditions comprises: for each article in all the applications, determining first articles whose Hamming distance from the current article is within the second range, and determining, from the first articles, the articles whose publication time is earlier than that of the current article.

8. The method for analyzing the application content quality according to claim 1, wherein the calculating the content quality index of each application according to the effective index, the flow index and the content exclusivity index of all articles in each application comprises:

calculating the comprehensive index of each article according to the effective index, the flow index, the content exclusivity index and the preset weight of each article in each application;

determining the lowest index of each article according to the effective index and the content exclusivity index of each article;

and respectively calculating the proportion of the articles with the lowest index smaller than 1 in each application, and calculating the content quality index of each application according to the comprehensive index of each article in each application and the proportion of the articles with the lowest index smaller than 1.

9. The method for analyzing the application content quality according to any one of claims 1 to 8, further comprising, after the calculating the content quality index of each application according to the effective index, the flow index and the content exclusivity index:

and ranking the content quality indexes of the applications, and determining the applications to be recommended according to the ranking.

10. An application content quality analysis device is characterized by comprising a determining module, a first calculating module and a second calculating module, wherein the determining module is used for determining related applications according to input keywords and determining articles in each application;

the first calculation module is used for calculating the effective index, the flow index and the content exclusivity index of each article in each application respectively;

the second calculating module is used for calculating the content quality index of each application according to the effective index, the flow index and the content exclusivity index of all articles in each application.

Technical Field

The invention relates to the technical field of application recommendation, in particular to an application content quality analysis method and an application content quality analysis device.

Background

With the rapid development of internet technology, human life and work increasingly depend on PC (personal computer) end applications and mobile end applications, and the applications provide great convenience for human life. Meanwhile, various different types of applications in the application market are increased explosively, various requirements of users are met to a certain extent, and more choices are provided for the users.

At present, the mass of original-content applications, such as travel guide and product recommendation applications, includes a large number of homogeneous products. These homogeneous applications have the same or similar functions, and plagiarism and article spinning (rewriting another author's article to disguise copying) may exist among them. In addition, most articles in some applications are short or attract little traffic, and the quality of the applications is uneven, so users have to filter and select articles during use, which undoubtedly harms both the user experience and the usefulness of original-content applications.

Therefore, there is a need for an application content quality analysis method and an application content quality analysis apparatus for comprehensively evaluating the content quality of an application and providing certain decision support when performing personalized recommendation of the application to a user.

Disclosure of Invention

Therefore, the invention provides an application content quality analysis method and an application content quality analysis device, which aim to solve the problem that in the prior art, users cannot effectively filter and select due to the fact that the content quality of massive applications is uneven.

In order to achieve the above object, a first aspect of the present invention provides an application content quality analysis method, including:

determining related applications according to the input keywords, and determining articles in each application;

respectively calculating the effective index, the flow index and the content exclusivity index of each article in each application;

and respectively calculating the content quality index of each application according to the effective index, the flow index and the content exclusivity index of all articles in each application.

Preferably, the determining the relevant applications according to the input keywords and determining the articles in each application includes:

determining a homepage link of a related application according to the keyword;

respectively determining the number of sub-links contained in each homepage link, wherein the sub-links are links in the application corresponding to the homepage link;

classifying each application according to the number of the sub-links;

and acquiring the articles in each application according to the type of each application.

Preferably, the classifying the applications according to the number of the sub-links includes:

in response to the number of the sub-links being greater than a preset first threshold, dividing the corresponding applications into first-class applications;

in response to the number of child links being less than or equal to the first threshold, dividing the corresponding application into a second class of applications.

Preferably, the calculating the validity index of each article in each application comprises:

calculating the total word number of each article in each application;

and respectively determining the effective index of each article according to the total word number of each article and a preset second threshold value.

Preferably, the calculating the flow index of each article in each application includes:

respectively calculating the sharing quantity, the comment quantity, the praise quantity and the reading quantity of each article in each application;

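respectively determining a maximum value of the share quantity, a maximum value of the comment quantity, a maximum value of the praise quantity and a maximum value of the reading quantity;

and respectively determining the flow index of each article according to the sharing amount, the comment amount, the praise amount, the reading amount, the maximum value of the sharing amount, the maximum value of the comment amount, the maximum value of the praise amount, the maximum value of the reading amount and preset weight of each article.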

Preferably, the calculating the content exclusivity index of each article in each application comprises:

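calculating the Hamming distance between any two articles in each application respectively;

respectively determining articles with the Hamming distance in a first range and articles with the Hamming distance in a second range and meeting preset conditions for all articles in all applications;

and respectively calculating the content exclusivity index of each article according to the average value of the Hamming distances of the articles with the Hamming distances in the first range and the sum of the Hamming distances of the articles with the Hamming distances in the second range and meeting the preset condition.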

Preferably, the determining articles with the Hamming distance in the second range and meeting the preset conditions includes: for each article in all the applications, determining first articles whose Hamming distance from the current article is within the second range, and determining, from the first articles, the articles whose publication time is earlier than that of the current article.

Preferably, the calculating the content quality index of each application according to the effective index, the traffic index and the content exclusivity index of all articles in each application respectively includes:

calculating the comprehensive index of each article according to the effective index, the flow index, the content exclusivity index and the preset weight of each article in each application;

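determining the lowest index of each article according to the effective index and the content exclusivity index of each article;

and respectively calculating the proportion of the articles with the lowest index smaller than 1 in each application, and calculating the content quality index of each application according to the comprehensive index of each article in each application and the proportion of the articles with the lowest index smaller than 1.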

Preferably, after the calculating the content quality index of each application according to the effective index, the flow index and the content exclusivity index, the method further comprises the following steps:

and ranking the content quality indexes of the applications, and determining the applications to be recommended according to the ranking.

In order to achieve the above object, a second aspect of the present invention provides an application content quality analysis apparatus, which includes a determination module, a first calculation module, and a second calculation module, where the determination module is configured to determine, according to an input keyword, a relevant application and determine an article in each of the applications;

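the first calculation module is used for calculating the effective index, the flow index and the content exclusivity index of each article in each application respectively;

the second calculating module is used for calculating the content quality index of each application according to the effective index, the flow index and the content exclusivity index of all articles in each application.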

The embodiment of the invention can achieve the following beneficial technical effects:

determining related applications according to input keywords, determining articles in each application, respectively calculating the effective index, the flow index and the content exclusivity index of each article in each application, and respectively calculating the content quality index of each application according to the effective index, the flow index and the content exclusivity index of all the articles in the application. The method and the device can realize comprehensive evaluation of the content quality of the application, provide certain decision support when performing personalized recommendation of the application to the user, and help the user to filter and select the application.

Drawings

The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the principles of the invention and not to limit the invention.

Fig. 1 is a schematic flow chart of an application content quality analysis method according to an embodiment of the present invention;

Fig. 2 is a schematic flow chart of calculating the effective index of each article in each application according to an embodiment of the present invention;

Fig. 3 is a schematic flow chart of calculating the flow index of each article in each application according to an embodiment of the present invention;

Fig. 4 is a schematic flow chart of calculating the content exclusivity index of each article in each application according to an embodiment of the present invention;

Fig. 5 is a schematic flow chart of calculating the content quality index of each application according to an embodiment of the present invention;

Fig. 6a is a first schematic structural diagram of an application content quality analysis apparatus according to an embodiment of the present invention;

Fig. 6b is a second schematic structural diagram of an application content quality analysis apparatus according to an embodiment of the present invention.

Detailed Description

The following detailed description of embodiments of the invention refers to the accompanying drawings. It should be understood that the detailed description and specific examples, while indicating the present invention, are given by way of illustration and explanation only, not limitation.

As shown in fig. 1, the method for analyzing the quality of the application content according to the present invention may include the following steps:

step S101, determining relevant applications according to the input keywords, and determining articles in each application.

In the embodiment of the present invention, the applications are original-content applications, such as websites or APPs (smartphone third-party applications) for travel guides, product recommendations and the like. It should be noted that the applications of the present invention do not include personal blog websites or web-novel reading websites.

Keywords may be words that characterize the type of application, such as "travel site" and "product recommendation," by which the relevant type of application may be determined, and altering the keywords may determine different types of applications.

The article may be an article within an original-content application, such as a Yunnan travel route guide or a recommendation article about a laptop computer. In this step, a crawler program can be used to crawl all article content within different types of applications. Taking the input keyword "travel website" as an example, the relevant applications can be determined to be Tuniu, Mafengwo, Qunar and similar travel applications, and all article content in these applications can then be collected by the crawler program.

Step S102, calculating the effective index, the flow index and the content exclusivity index of each article in each application.

The crawler program can crawl all contents in a web link where the article is located, and after the article is crawled, all literal contents and traffic data (such as praise amount and forwarding amount) of the article can be obtained.

In the embodiment of the invention, the effective index represents the validity of the article and can be calculated according to the length of the article. The flow index represents the network popularity of the article and can be calculated from traffic data that reflect that popularity. The content exclusivity index characterizes how similar the article is to other articles, and can be calculated according to the textual similarity among the articles.

Step S103, respectively calculating the content quality index of each application according to the effective index, the flow index and the content exclusivity index of all articles in each application.

The content quality of the application can be evaluated by combining the effective index, the flow index and the content exclusivity index of all articles in the application.

Calculating the content quality index of each application from the effective index, the flow index and the content exclusivity index of all articles in the application allows the content quality of the application to be analyzed comprehensively in terms of the validity, network popularity and similarity of its articles. Operations such as screening and recommending relevant applications can then be performed according to the content quality index of each application.

As can be seen from the foregoing steps S101-S103, in the present invention, relevant applications are determined according to input keywords, articles in each application are determined, an effective index, a traffic index, and a content exclusivity index of each article in each application are respectively calculated, and a content quality index of each application is respectively calculated according to the effective indexes, the traffic indexes, and the content exclusivity indexes of all the articles in the application. The method and the device can realize comprehensive evaluation of the content quality of the application, provide certain decision support when performing personalized recommendation of the application to the user, and help the user to filter and select the application.

Further, in the present invention, determining relevant applications according to the input keywords and determining articles in each application may include the following steps: determining a homepage link of a related application according to the keyword; respectively determining the number of sub-links contained in each homepage link, wherein the sub-links are links in the application corresponding to the homepage links; classifying each application according to the number of the sub-links; and acquiring the articles in each application according to the type of each application.

In the embodiment of the invention, an automatic data acquisition process can be established, and a crawler program is used to obtain the homepage links of the applications related to the keyword. Specifically, the keyword is entered into a search engine to obtain a large number of links to related industry websites or APP web pages, and all of these links can be crawled. Typically, a link in which the top-level domain name is not followed by a website directory (e.g., "www.tuniu.com") may be considered the official homepage link of an application. The homepage links of the relevant applications may be screened out using a regular-expression filter.

After the homepage links are screened out, a dictionary of (key, value) pairs can be established from the searched keyword and the related homepage links, where the key is the entered keyword and the value is the second-level domain name in the homepage link (e.g., tuniu in www.tuniu.com). Identical key-value pairs in the dictionary are deduplicated.
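The screening and dictionary-building steps above can be sketched as follows. This is a minimal illustration, not the patented implementation: the regular expression, the function names `second_level_domain` and `build_homepage_dict`, and the rule of taking the label just left of the top-level domain are all assumptions, and the naive rule does not handle multi-part suffixes such as ".com.cn".

```python
import re
from urllib.parse import urlparse

# Assumption: a homepage link is a host with no path beyond an optional "/".
HOMEPAGE_RE = re.compile(r"^(?:https?://)?[^/\s]+/?$")


def second_level_domain(url: str) -> str:
    """Return the label just left of the top-level domain, e.g. 'tuniu'."""
    # urlparse only fills in netloc when a scheme or a leading '//' is present.
    host = urlparse(url if "://" in url else "//" + url).netloc
    labels = host.split(".")
    return labels[-2] if len(labels) >= 2 else host


def build_homepage_dict(keyword: str, links: list[str]) -> dict[str, set[str]]:
    """Map the keyword to the deduplicated second-level domains of homepage links."""
    # Using a set removes identical (keyword, domain) pairs automatically.
    domains = {second_level_domain(u) for u in links if HOMEPAGE_RE.match(u)}
    return {keyword: domains}


links = [
    "www.tuniu.com",
    "http://www.tuniu.com/",
    "http://www.tuniu.com/tripts/31191618",  # has a directory, so it is filtered out
]
print(build_homepage_dict("travel website", links))
```

Storing the values in a set is one simple way to obtain the deduplication the text describes; a list with an explicit membership check would work equally well.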

In the embodiment of the invention, the crawler program can be used to automatically and recursively obtain the number of sub-links in an application. Generally, web page links whose addresses share the same second-level domain name can be regarded as links within the same application. For example, "http://www.tuniu.com/tripts/31191618", "http://go.tuniu.com/" and "http://www.tuniu.com/" are all links within Tuniu, and the first two can be regarded as sub-links of the last. Specifically, taking the Tuniu website as an example, the crawler program first obtains, from the <a href> tags in the page source of the homepage link http://www.tuniu.com/, the sub-links whose domain names include "tuniu"; it then obtains the further sub-links within each sub-link layer by layer, and finally obtains all sub-links in the Tuniu website and deduplicates them.

Typically, an application may include a PC-side version and a mobile terminal (e.g., mobile phone) version, and an application that includes both versions generally contains more sub-links than an application that includes only the mobile terminal version. Therefore, the applications can be classified according to the number of sub-links, so that applications including both the PC-side version and the mobile terminal version are distinguished from applications including only the mobile terminal version.

For the application simultaneously comprising the PC end version and the mobile terminal version, only the article content of the application of the PC end version needs to be collected. Specifically, a crawler program matched with the page format of different applications can be established, and article contents in each application can be automatically and iteratively crawled and stored.

For an application that includes only the mobile terminal version, the application needs to be downloaded to a matching terminal, and the article content of the mobile terminal version is collected. Specifically, an emulator environment can be established for the different applications, in which each application is automatically downloaded, registered and verified, and the article content in each application is then automatically and iteratively crawled and stored.

Further, in the present invention, classifying each application according to the number of sub-links may include the following steps: in response to the number of the sub-links being greater than a preset first threshold, dividing the corresponding applications into first-class applications; responsive to the number of child links being less than or equal to a first threshold, the corresponding application is classified as a second type of application.

In the embodiment of the present invention, a preset first threshold may be set according to an actual situation, if the number of the sub-links is greater than the preset first threshold, the application may be considered to include both the PC-side version and the mobile terminal version, the application is divided into a first type of application, and if the number of the sub-links is less than or equal to the preset first threshold, the application may be considered to include only the mobile terminal version, the application is divided into a second type of application. It should be noted that, the specific value of the preset first threshold is not particularly limited, and may be adjusted according to actual situations.
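The threshold rule can be written down directly. In this sketch the function name `classify_application` and the threshold value used in the example are hypothetical; the source leaves the first threshold to be chosen according to the actual situation.

```python
def classify_application(sub_link_count: int, first_threshold: int) -> str:
    """Classify an application by its number of sub-links.

    'first'  -> treated as having both a PC-side and a mobile terminal version
    'second' -> treated as having only a mobile terminal version
    """
    return "first" if sub_link_count > first_threshold else "second"


# Hypothetical threshold of 500 sub-links, chosen only for illustration.
print(classify_application(12000, 500))
print(classify_application(500, 500))
```

Note that a count exactly equal to the threshold falls into the second class, matching the "less than or equal to" wording above.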

Further, as shown in fig. 2, calculating the effective index of each article in each application in the present invention may include the following steps:

in step S201, the total number of words of each article in each application is calculated.

Step S202, respectively determining the effective index of each article according to the total word number of each article and a preset second threshold value.

In the embodiment of the invention, the preset second threshold can be set according to the actual situation, and the effective index of an article is determined by comparing the total word number of the article with the preset second threshold. It should be noted that the effective index may be set to 0 or 1, but the present invention is not limited to these values.
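Steps S201 and S202 can be sketched as below. Two points are assumptions rather than statements from the source: the total word number is approximated by counting non-whitespace characters, and an article is given index 1 when that count reaches the second threshold and 0 otherwise (the source does not say which side of the comparison maps to which value). The function name `effective_index` is hypothetical.

```python
def effective_index(text: str, second_threshold: int) -> int:
    """Return 1 if the article is long enough to count as valid, else 0."""
    # Assumption: count non-whitespace characters as a proxy for the word number.
    total = sum(1 for ch in text if not ch.isspace())
    # Assumption: reaching the threshold yields 1, falling short yields 0.
    return 1 if total >= second_threshold else 0


print(effective_index("A short note.", 300))
```

A tokenizer-based word count could replace the character count without changing the comparison step.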

Further, as shown in fig. 3, the calculating the flow index of each article in each application in the present invention may include the following steps:

step S301, respectively calculating the sharing amount, the comment amount, the praise amount and the reading amount of each article in each application.

Specifically, the sharing amount, the comment amount, the praise amount and the reading amount of the article can be further calculated according to the crawled article content, and the flow popularity of the article can be comprehensively evaluated according to the indexes. And calculating the sharing quantity, the comment quantity, the praise quantity and the reading quantity of each article in each application aiming at different applications.

Step S302, determining a maximum sharing amount, a maximum comment amount, a maximum like amount and a maximum reading amount respectively.

Specifically, the maximum among the sharing amounts of all the articles in each application, i.e., the maximum sharing amount, may be determined; similarly, the maximum comment amount, the maximum praise amount and the maximum reading amount among all the articles in each application may be determined. These maxima are determined separately for each application.

Step S303, determining the flow index of each article according to the sharing amount, the comment amount, the praise amount, the reading amount, the maximum sharing amount, the maximum comment amount, the maximum praise amount, the maximum reading amount and the preset weight of each article.

Specifically, for each article in each application, the sharing coefficient, the comment coefficient, the praise coefficient and the reading coefficient of the article can be determined from the sharing amount, the comment amount, the praise amount and the reading amount of the article together with the maximum sharing amount, the maximum comment amount, the maximum praise amount and the maximum reading amount among all the articles in the application. For example, if the maximum sharing amount among all articles in a certain application is N, and the sharing amount of a certain article in the application is K, the sharing coefficient of the article is K/N; the comment coefficient, the praise coefficient and the reading coefficient of the article can be calculated in the same way.

In the embodiment of the invention, the importance of the sharing amount, the comment amount, the praise amount and the reading amount in influencing the flow popularity of an article is considered to decrease in that order. For example, the weights corresponding to the sharing amount, the comment amount, the praise amount and the reading amount can be set as P₁, P₂, P₃ and P₄ respectively, and the formula used to determine the flow index of each article may then be: flow index = 1 + (sharing coefficient × P₁ + comment coefficient × P₂ + praise coefficient × P₃ + reading coefficient × P₄). It should be noted that the present invention does not particularly limit the specific values of P₁, P₂, P₃ and P₄.
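Steps S301 to S303 can be sketched as follows for the articles of a single application. The `Traffic` record, the function name `flow_indices` and the default weights (0.4, 0.3, 0.2, 0.1) are hypothetical; the weights were picked only so that they decrease in the order described and are not values given in the source. Treating a coefficient as 0 when the corresponding maximum is 0 is also an assumption.

```python
from dataclasses import dataclass


@dataclass
class Traffic:
    shares: int
    comments: int
    likes: int  # the "praise amount" in the text
    reads: int


FIELDS = ("shares", "comments", "likes", "reads")


def flow_indices(articles: list[Traffic],
                 weights: tuple[float, float, float, float] = (0.4, 0.3, 0.2, 0.1)) -> list[float]:
    """Compute 1 + sum(coefficient * weight) for each article of one application."""
    # Step S302: per-metric maxima over all articles in the application.
    maxima = [max((getattr(a, f) for a in articles), default=0) for f in FIELDS]
    result = []
    for a in articles:
        weighted = 0.0
        for f, m, w in zip(FIELDS, maxima, weights):
            # Coefficient K/N; assumed to be 0 when the maximum N is 0.
            coeff = getattr(a, f) / m if m else 0.0
            weighted += coeff * w
        result.append(1 + weighted)
    return result


articles = [Traffic(10, 20, 30, 40), Traffic(5, 10, 15, 20)]
print(flow_indices(articles))
```

Because every coefficient lies in [0, 1], the flow index lies between 1 and 1 plus the sum of the weights.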

Further, as shown in fig. 4, in the present invention, calculating the content exclusivity index of each article in each application may include the following steps:

step S401, calculating the Hamming distance between any two articles in each application respectively.

In the embodiment of the present invention, part of the SIMHASH (similarity hash) algorithm can be used to calculate the Hamming distance between two articles. Specifically, in the first step, word segmentation is performed on the text content of each article in each application to obtain a series of words, and meaningless modal particles are removed. In the second step, the hash value of each word is calculated: a hash algorithm mixes the input data and outputs a fixed-length string of binary 0s and 1s, so that each word corresponds to one hash value. In the third step, the hash value of each word is weighted: the number of occurrences of the word in the article is used as its weight, with positive weighting applied to the 1 bits of the hash value and negative weighting applied to the 0 bits. For example, if "party" appears 4 times in a certain article and its hash value is 10011, weighting the hash value yields "4, -4, -4, 4, 4". In the fourth step, all the words of an article are merged, i.e., the weighted values of all words are added position by position; for example, merging the two words "5, -5, 5, 5, -5" and "3, -3, -3, -3, 3" gives "8, -8, 2, 2, -2". In the fifth step, dimension reduction is performed on the merged number string of an article, i.e., positive entries become 1 and negative entries become 0; for example, reducing "8, -8, 2, 2, -2" gives 10110.
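The weighting, merging and dimension-reduction steps can be sketched as below, reproducing the small numeric examples from the text. Several details are assumptions: word segmentation and particle removal are taken as already done (the input is a list of words), SHA-256 truncated to 64 bits stands in for the unspecified hash algorithm, a merged value of exactly 0 is mapped to 0, and all function names are hypothetical.

```python
import hashlib
from collections import Counter


def word_hash(word: str, bits: int = 64) -> str:
    """Hash a word to a fixed-length '0'/'1' string (SHA-256 is an assumption)."""
    digest = hashlib.sha256(word.encode("utf-8")).digest()  # 256-bit digest
    value = int.from_bytes(digest, "big") >> (256 - bits)   # keep the top `bits` bits
    return format(value, f"0{bits}b")


def weight_hash(hash_bits: str, freq: int) -> list[int]:
    """Step 3: +freq for each 1 bit, -freq for each 0 bit."""
    return [freq if b == "1" else -freq for b in hash_bits]


def merge(weighted: list[list[int]]) -> list[int]:
    """Step 4: add the weighted values of all words position by position."""
    return [sum(column) for column in zip(*weighted)]


def reduce_to_signature(totals: list[int]) -> str:
    """Step 5: positive entries become 1, all others 0 (zero -> 0 is an assumption)."""
    return "".join("1" if t > 0 else "0" for t in totals)


def simhash(words: list[str], bits: int = 64) -> str:
    """Signature of one article from its segmented words (1 <= bits <= 256)."""
    weighted = [weight_hash(word_hash(w, bits), n) for w, n in Counter(words).items()]
    return reduce_to_signature(merge(weighted)) if weighted else "0" * bits


print(weight_hash("10011", 4))
print(merge([[5, -5, 5, 5, -5], [3, -3, -3, -3, 3]]))
print(reduce_to_signature([8, -8, 2, 2, -2]))
```

Keeping the signature as a bit string mirrors the character-string signatures described in the text; an integer representation would be more compact for large collections.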

Through the above steps, the SIMHASH signature of each article in each application, which is a character string, can be obtained. From the SIMHASH signatures, the Hamming distance between any two articles across all applications can be calculated, i.e., the number of bit positions at which the SIMHASH signatures of the two articles differ. Specifically, an XOR operation may be performed on the bits at the same positions of the SIMHASH signatures of any two articles, and the number of 1s in the XOR result is counted; this number is the Hamming distance between the two articles.

In the embodiment of the present invention, for each article of each application, the number of 1s in the XOR result of two SIMHASH signatures can be denoted Count1_K, where K refers to the K-th article that participates in the calculation together with the current article; that is, the Hamming distance between the K-th article and the current article is Count1_K. If there are N articles in total within all applications, there are (N-1) Count1_K values for each article of each application, where K is any natural number in [1, N-1].
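As a sketch of how the (N-1) Count1_K values per article might be collected, the following assumes the signatures are stored in a dictionary keyed by a hypothetical article identifier.

```python
def count1_values(signatures: dict[str, str]) -> dict[str, dict[str, int]]:
    """Map each article id to its Hamming distances to the other N-1 articles."""
    result: dict[str, dict[str, int]] = {}
    for a, sig_a in signatures.items():
        result[a] = {
            b: bin(int(sig_a, 2) ^ int(sig_b, 2)).count("1")
            for b, sig_b in signatures.items()
            if b != a  # an article is not compared with itself
        }
    return result
```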

Step S402, for each article in all applications, determining the articles whose Hamming distance to that article is within a first range, and the articles whose Hamming distance is within a second range and which satisfy a preset condition.

In the embodiment of the present invention, the first range may be set to (3, 10] and the second range may be set to [0, 3]. Specifically, taking as an example the case where two applications retrieved with the same keyword contain 10 articles in total, each article of an application has 9 Count1_K values, where K is in [1, 9]. From these 9 Count1_K values, the values that fall within the first range, and the values that fall within the second range and satisfy the preset condition, are screened out. It should be noted that the first range and the second range of the present invention are not limited to (3, 10] and [0, 3] and can be adjusted according to the actual situation.

In this way, the articles meeting the above requirements are screened out for each article in all applications.

In step S403, content exclusivity indexes of the articles are calculated according to the average of the hamming distances of the articles whose hamming distances are within the first range and the sum of the hamming distances of the articles whose hamming distances are within the second range and satisfy preset conditions.

For each article of each application, the average of the Hamming distances of the articles within the first range, and the sum of the Hamming distances of the articles within the second range that satisfy the preset condition, can be calculated from the Hamming distances between the screened articles and the current article; the content exclusivity index of the article can then be represented by the ratio of the average to the sum. Taking 7 articles screened for a certain article as an example, the Hamming distances between the 7 articles and the article are 2, 2, 3, 4, 4, 5 and 6. The average of the Hamming distances within the first range is (4+4+5+6)/4 = 4.75, and the sum of the Hamming distances within the second range that satisfy the preset condition is 2+2+3 = 7. The content exclusivity index R of the article is therefore 4.75/7 ≈ 0.68.
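The ratio described above can be sketched as follows. The function name is illustrative, and returning None when either group is empty or the sum is zero is an assumption, since the text does not say how those cases are handled.

```python
from typing import Optional


def content_exclusivity_index(first_range: list[int],
                              second_range: list[int]) -> Optional[float]:
    """Average of the first-range distances divided by the sum of the second-range ones."""
    total_second = sum(second_range)
    if not first_range or total_second == 0:
        return None  # assumed handling; the source leaves this case open
    mean_first = sum(first_range) / len(first_range)
    return mean_first / total_second


# Worked example from the text: (4+4+5+6)/4 = 4.75 and 2+2+3 = 7.
r = content_exclusivity_index([4, 4, 5, 6], [2, 2, 3])
```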

Further, the determining of the article with the hamming distance within the second range and satisfying the preset condition in the present invention may include the following steps: and for each article in all the applications, determining first articles with the Hamming distance within a second range, and determining the article with the publication time earlier than that of the current article from each first article.

Specifically, for each article in each application, all Count1_K values (i.e., Hamming distances) within the interval [0, 3] can be screened out first; the articles corresponding to these Count1_K values are the first articles. Then, the articles whose publication time is earlier than that of the current article are screened out from all the first articles. The articles finally screened out are the articles whose Hamming distance is within the second range and which satisfy the preset condition.
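The screening by range and publication time can be sketched as below, using the example ranges (3, 10] and [0, 3]. The Neighbor record and the function name are hypothetical, and publication times are simplified to dates.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Neighbor:
    distance: int    # Hamming distance to the current article
    published: date  # publication date of the neighboring article


def screen_neighbors(current_published: date,
                     neighbors: list[Neighbor],
                     first: tuple[int, int] = (3, 10),
                     second: tuple[int, int] = (0, 3)) -> tuple[list[int], list[int]]:
    """Return (first-range distances, second-range distances of earlier articles)."""
    # First range is half-open: lower bound excluded, upper bound included.
    first_range = [n.distance for n in neighbors
                   if first[0] < n.distance <= first[1]]
    # Second range is closed, with the extra condition of earlier publication.
    second_range = [n.distance for n in neighbors
                    if second[0] <= n.distance <= second[1]
                    and n.published < current_published]
    return first_range, second_range
```

A near-duplicate that was published later than the current article is dropped from the second group, so only earlier near-duplicates contribute to the sum.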

Further, as shown in fig. 5, the calculating the content quality index of each application according to the effective index, the traffic index and the content exclusivity index of all articles in each application in the present invention may include the following steps:

Step S501, calculating a comprehensive index of each article according to the effective index, the flow index, the content exclusivity index and the preset weight of each article in each application.

In the embodiment of the invention, the comprehensive index of an article is calculated from the three indexes of the article. The traffic index and the content exclusivity index of the article can be given weights Q₁ and Q₂ respectively, and the formula used to calculate the comprehensive index of the article may then be: comprehensive index = effective index × (traffic index × Q₁ + content exclusivity index × Q₂). It should be noted that the present invention does not particularly limit the specific values of Q₁ and Q₂. In the same way, the comprehensive index can be calculated for each article of each application.
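A minimal sketch of this calculation follows, reading the formula as comprehensive index = effective index × (traffic index × Q₁ + content exclusivity index × Q₂). That reading, the function name, and the default weights of 0.5 are assumptions; the text does not fix the values of Q₁ and Q₂.

```python
def comprehensive_index(effective: float,
                        traffic: float,
                        exclusivity: float,
                        q1: float = 0.5,   # assumed weight of the traffic index
                        q2: float = 0.5) -> float:  # assumed weight of exclusivity
    """Effective index times the weighted sum of the other two indexes."""
    return effective * (traffic * q1 + exclusivity * q2)
```

Under this reading, an article with an effective index of 0 receives a comprehensive index of 0 regardless of its traffic or exclusivity.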

Step S502, determining the lowest index of each article according to the effective index and the content exclusivity index of each article.

In the embodiment of the present invention, it may be considered that at least two aspects of the effective index and the content exclusivity index of an article are required to evaluate the content quality of the article, and therefore the lowest index of the article may be the minimum value between the effective index and the content exclusivity index. Similarly, for each article for each application, the lowest index for that article may be determined.

Step S503, respectively calculating the proportion of articles with the lowest index less than 1 in each application, and calculating the content quality index of each application according to the comprehensive index of each article in each application and the proportion of articles with the lowest index less than 1.

In the embodiment of the invention, for each application, the number of articles whose lowest index is smaller than 1 is counted first and, together with the total number of articles in the application, is used to calculate the proportion of articles whose lowest index is smaller than 1. The average of the comprehensive indexes of all articles within the application is then calculated. The formula used to calculate the content quality index of an application may be: content quality index = (sum of the comprehensive indexes of all articles in the application / total number of articles) × (number of articles in the application whose lowest index is smaller than 1 / total number of articles) × 100%. In the same way, the content quality index of each application can be calculated from the comprehensive indexes of all articles in the application and the proportion of articles whose lowest index is smaller than 1.
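The application-level calculation can be sketched as below, taking the lowest index of an article as the minimum of its effective index and content exclusivity index and applying the two ratios exactly as they are stated in the text. The tuple layout and the zero result for an application with no articles are assumptions.

```python
def content_quality_index(articles: list[tuple[float, float, float]]) -> float:
    """articles holds (comprehensive, effective, exclusivity) per article; result is a percentage."""
    total = len(articles)
    if total == 0:
        return 0.0  # assumed handling for an application with no articles
    mean_comprehensive = sum(c for c, _, _ in articles) / total
    # Lowest index of an article = min(effective index, content exclusivity index).
    low_count = sum(1 for _, e, x in articles if min(e, x) < 1)
    return mean_comprehensive * (low_count / total) * 100
```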

In the embodiment of the present invention, the content quality index of an application characterizes what percentage of the article content within the application is valid and of high quality.

Further, after the content quality index of each application is calculated according to the effective index, the traffic index and the content exclusivity index, the method may further include the following steps: ranking the applications by content quality index, and determining the applications to be recommended according to the ranking.

In the embodiment of the invention, content quality ranking lists of the same type can be constructed for the applications related to different keywords. When applications need to be recommended to a user according to the user's preferred type, the specific applications to recommend are determined according to these ranking lists. The first few applications on a ranking list can be recommended, or all applications on the ranking list whose content quality index falls within a certain range can be recommended.
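Both recommendation options can be sketched as follows, assuming the content quality indexes are held in a dictionary keyed by application name; the function and parameter names are illustrative.

```python
from typing import Optional


def recommend(quality: dict[str, float],
              top_n: Optional[int] = None,
              index_range: Optional[tuple[float, float]] = None) -> list[str]:
    """Rank applications by content quality index, highest first, then filter."""
    ranked = sorted(quality.items(), key=lambda kv: kv[1], reverse=True)
    if index_range is not None:
        low, high = index_range
        # Keep only applications whose index lies in the closed range.
        ranked = [(app, q) for app, q in ranked if low <= q <= high]
    if top_n is not None:
        ranked = ranked[:top_n]
    return [app for app, _ in ranked]
```

Passing top_n selects the first few applications on the ranking list, while index_range selects every application whose index falls within the given bounds.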

In the embodiment of the invention, the flow of the application content quality analysis method can be repeated at regular time, the application ranking is carried out again, and the latest decision support is provided for the user.

Based on the same technical concept, an embodiment of the present invention further provides an application content quality analysis apparatus. As shown in fig. 6a, the apparatus includes a determination module 601, a first calculation module 602, and a second calculation module 603, where the determination module 601 is configured to determine related applications according to an input keyword, and determine the articles in each application.

The first calculating module 602 is configured to calculate the effective index, the traffic index, and the content exclusivity index of each article in each application.

The second calculating module 603 is configured to calculate the content quality index of each application according to the effective indexes, the traffic indexes, and the content exclusivity indexes of all articles in the application.

Further, the determining module 601 is configured to determine a homepage link of the relevant application according to the keyword; respectively determining the number of sub-links contained in each homepage link, wherein the sub-links are links in the application corresponding to the homepage link; classifying each application according to the number of the sub-links; and acquiring the articles in each application according to the type of each application.

Further, the determining module 601 is configured to, in response to that the number of the sub-links is greater than a preset first threshold, divide the corresponding applications into first-class applications; in response to the number of child links being less than or equal to the first threshold, dividing the corresponding application into a second class of applications.

Further, the first calculating module 602 is configured to calculate a total word count of each article in each application; and respectively determining the effective index of each article according to the total word number of each article and a preset second threshold value.

Further, the first calculating module 602 is configured to calculate sharing amount, comment amount, praise amount, and reading amount of each article in each application, respectively; respectively determining a maximum value of the share quantity, a maximum value of the comment quantity, a maximum value of the praise quantity and a maximum value of the reading quantity; and respectively determining the flow index of each article according to the sharing amount, the comment amount, the praise amount, the reading amount, the maximum value of the sharing amount, the maximum value of the comment amount, the maximum value of the praise amount, the maximum value of the reading amount and preset weight of each article.

Further, the first calculating module 602 is configured to calculate hamming distances between any two articles in each of the applications respectively; respectively determining articles with the Hamming distance in a first range and articles with the Hamming distance in a second range and meeting preset conditions for all articles in all applications; and respectively calculating the content exclusivity index of each article according to the average value of the Hamming distances of the articles with the Hamming distances in the first range and the sum of the Hamming distances of the articles with the Hamming distances in the second range and meeting the preset condition.

Further, the first calculation module 602 is configured to, for each article in all applications, determine first articles with a hamming distance within a second range, and determine, from each first article, an article with a publication time earlier than that of the current article.

Further, the second calculating module 603 is configured to calculate a comprehensive index of each article according to the effective index, the traffic index, the content exclusivity index, and the preset weight of each article in each application; determining the lowest index of each article according to the effective index and the content exclusivity index of each article; and respectively calculating the proportion of the articles with the lowest index smaller than 1 in each application, and calculating the content quality index of each application according to the comprehensive index of each article in each application and the proportion of the articles with the lowest index smaller than 1.

Further, as shown in fig. 6b, the apparatus for analyzing content quality of an application according to an embodiment of the present invention may further include a recommending module 604, where the recommending module 604 is configured to rank the content quality indexes of the applications, and determine the application to be recommended according to the ranking.

It will be understood that the above embodiments are merely exemplary embodiments taken to illustrate the principles of the present invention, which is not limited thereto. It will be apparent to those skilled in the art that various modifications and improvements can be made without departing from the spirit and substance of the invention, and these modifications and improvements are also considered to be within the scope of the invention.
