Thermal control method and system based on semantic real-time analysis

Document No.: 1963842. Publication date: 2021-12-14.

Reading note: this technology, "Thermal control method and system based on semantic real-time analysis", was designed by 杨建仁 (Yang Jianren) on 2021-08-19. Abstract: The invention provides a thermal control method and system based on semantic real-time analysis. Internet webpage data and search-ranking keywords are collected in real time through web crawler technology, and the collected webpage data are divided by URL into a plurality of different text files for storage. The character strings read from these files are split by a word segmentation algorithm into a plurality of different word segmentation arrays, which form the set Cps. A plurality of association sequences of the search-ranking keywords in each word segmentation array of the set Cps are calculated, and the association sequence with the highest keyword popularity is selected as the key association sequence and sent to the client. This realizes information screening and information extraction across multiple related texts according to real-time hot search words, and thereby real-time analysis and thermal control according to real-time search keywords.

1. A thermal control method based on semantic real-time analysis, characterized by comprising the following steps:

S100, collecting internet webpage data and search-ranking keywords in real time through web crawler technology;

S200, dividing the collected internet webpage data, according to their different URLs, into a plurality of different text files for storage;

S300, reading the plurality of different text files into a plurality of different character strings, segmenting the character strings into a plurality of different word segmentation arrays through a word segmentation algorithm, and taking the word segmentation arrays as a set Cps;

S400, calculating, according to the search-ranking keywords, a plurality of association sequences of the search-ranking keywords in each word segmentation array in the set Cps;

S500, calculating and selecting, from the plurality of association sequences, the association sequence with the highest keyword popularity as the key association sequence;

S600, sending the key association sequence to the client.

2. The thermal control method based on semantic real-time analysis according to claim 1, wherein in S100, the method for collecting internet webpage data and search-ranking keywords in real time through web crawler technology comprises: collecting internet webpage data and search-ranking hot search keywords in real time through web crawler technology, the hot search keywords being recorded as keywords, wherein the internet webpage data and the hot search keywords are obtained from one or more of the Baidu API interface, the Sogou search API interface, the 360 Search API interface, and the Bing search API interface.

3. The thermal control method based on semantic real-time analysis according to claim 1, wherein in S200, the method for dividing the collected internet webpage data, according to their different URLs, into a plurality of different text files for storage comprises: storing the collected internet webpage data in JSON format as structured data, each piece of structured data comprising the character string data of the corresponding webpage and the URL of the website from which it was collected; reading the character string data in each piece of structured data according to its URL; and dividing the read character string data by URL into a plurality of different text files for storage.

4. The thermal control method based on semantic real-time analysis according to claim 2, wherein in S300, the method for reading the plurality of different text files into a plurality of different character strings, segmenting the character strings into a plurality of different word segmentation arrays through a word segmentation algorithm, and taking the word segmentation arrays as the set Cps comprises: reading the valid character information in each of the text files as a character string; segmenting each character string through a Chinese word segmentation algorithm to obtain a plurality of different character string arrays, recorded as word segmentation arrays; and recording the set of the word segmentation arrays as the set Cps.

5. The method of claim 4, wherein in S400, the method for calculating, according to the search-ranking keywords, a plurality of association sequences of the search-ranking keywords in each word segmentation array in the set Cps comprises: recording the set of search-ranking keywords as the set Querys, the number of elements in Querys as n, and the sequence number of an element in Querys as i, i ∈ [1, n], where Querys = {Q(1), Q(2), …, Q(n-1), Q(n)};

recording the number of elements in the set Cps as m and the sequence number of an element in Cps as j, j ∈ [1, m], where Cps = {Cps(1), Cps(2), …, Cps(m-1), Cps(m)};

letting the variable k denote the array length of each word segmentation array Cps(j) in the set Cps, the variable h denote the sequence number of a character string in Cps(j), and Cps(j, h) denote the character string with sequence number h in the element with sequence number j of Cps, h ∈ [1, k], where Cps(j) = [Cps(j, 1), …, Cps(j, k-1), Cps(j, k)];

letting the function Glv() compute the word vector of an input character string through a word embedding algorithm, so that Glv(Cps(j, h)) is the word vector of the character string with sequence number h in the element with sequence number j of Cps, recorded as G(j, h) = Glv(Cps(j, h)), and Glv(Q(i)) is the word vector of the keyword with sequence number i in Querys, recorded as Gq(i) = Glv(Q(i)); the variable q denotes the q-th dimension of a word vector and the variable p denotes the number of dimensions of a word vector, so that G(j, h)[q] is the value of the q-th dimension of G(j, h) and Gq(i)[q] is the value of the q-th dimension of Gq(i);

the function Sim() calculates the tendency degree between two input vectors, so that Sim(Gq(i), G(j, h)) is the tendency degree between the word vectors Gq(i) and G(j, h); the calculation formula of the tendency degree Sim(Gq(i), G(j, h)) is:

calculating the plurality of association sequences of each search-ranking keyword in the set Querys in each word segmentation array in the set Cps comprises the following steps:

S401, setting the value of the variable i to 1; creating an empty set Chianset, whose elements are distinct and ordered; go to S402;

S402, obtaining the element Q(i) with sequence number i in Querys; obtaining Gq(i) from Q(i) through the function Glv(); go to S403;

S403, setting the value of the variable j to 1; go to S404;

S404, obtaining the element Cps(j) with sequence number j in Cps; creating an empty array Simset; go to S405;

S405, setting the value of the variable h to 1; go to S406;

S406, obtaining the element Cps(j, h) with sequence number h in Cps(j); obtaining G(j, h) from Cps(j, h) through the function Glv(); go to S407;

S407, obtaining the tendency degree Sim(Gq(i), G(j, h)); adding the tendency degree Sim(Gq(i), G(j, h)) to the array Simset; go to S408;

S408, judging whether the constraint condition h ≥ k is met; if yes, go to S4081; otherwise, go to S4082;

S4081, calculating the arithmetic mean sim_avg of the elements in the array Simset, and taking the set of sequence numbers of the elements in Simset whose values are greater than sim_avg as the set Seq; taking each element in the set Seq as a target sequence number, extracting the elements of Cps(j) at the target sequence numbers as the array Chain, and adding the array Chain to the set Chianset; go to S409;

S4082, increasing the value of h by 1; go to S406;

S409, setting the value of h to 1; go to S410;

S410, judging whether the constraint condition j ≥ m is met; if yes, go to S411; otherwise, go to S4101;

S4101, increasing the value of j by 1; go to S404;

S411, setting the value of j to 1; go to S412;

S412, judging whether the constraint condition i ≥ n is met; if yes, go to S413; otherwise, go to S4121;

S4121, increasing the value of i by 1; go to S402;

S413, obtaining the set Chianset;

each array in the set Chianset is an association sequence of the corresponding search-ranking keyword in the set Querys, and the set of these association sequences is recorded as the set Litset.

6. The thermal control method based on semantic real-time analysis according to claim 5, wherein in S500, the method for calculating and selecting, from the plurality of association sequences, the association sequence with the highest keyword popularity as the key association sequence comprises: obtaining, through a search API interface, the keyword Qri with the highest popularity in the set Querys at the current moment; obtaining the sequence number i of Qri in the set Querys; and obtaining the element with sequence number i in the set Litset, recorded as Litset(i), wherein Litset(i) is the required key association sequence.

7. The thermal control method based on semantic real-time analysis according to claim 6, wherein in S600, the method for sending the key association sequence to the client comprises: sending the key association sequence Litset(i) to the client, and concatenating the elements of Litset(i) into a character string for printing and display.

8. A thermal control system based on semantic real-time analysis, characterized in that the system comprises: a processor, a memory, and a computer program stored in the memory and executable on the processor, wherein the processor executes the computer program to implement the steps of the thermal control method based on semantic real-time analysis according to claim 1; the system can run on computing devices such as desktop computers, notebook computers, mobile phones, palmtop computers, and cloud data centers, and the runnable system may include the processor, the memory, and a server cluster.

Technical Field

The invention belongs to the technical field of information processing, and particularly relates to a thermal control method and system based on semantic real-time analysis.

Background

The Internet is an important channel through which people search for and obtain key information and topics, and it plays an important role in modern information dissemination. Users can express their views and attitudes over the internet in real time and at large scale, which in turn produces immediate and substantial effects on public opinion. For processing large-scale internet information, current thermal control monitoring systems rely on artificial intelligence and distributed big-data technology. One example is the method for tracking and analyzing hotspot events based on emotion analysis disclosed in publication No. CN109582801A: the original text of keywords related to the hotspot event to be analyzed can be input into the analysis system through a user operation module, and the word senses of the keywords can be understood accurately by recognizing emotional text within the keyword text. Such systems, however, are not well suited to efficiently extracting information for the hot search keywords of a real-time search system.

Disclosure of Invention

The present invention aims to provide a thermal control method and system based on semantic real-time analysis that solve one or more technical problems in the prior art, or at least provide a beneficial alternative or create favorable conditions.

The invention provides a thermal control method and system based on semantic real-time analysis. Internet webpage data and search-ranking keywords are collected in real time through web crawler technology, and the collected webpage data are divided by URL into a plurality of different text files for storage. The character strings read from these files are split by a word segmentation algorithm into a plurality of different word segmentation arrays, which form the set Cps. A plurality of association sequences of the search-ranking keywords in each word segmentation array of the set Cps are calculated, and the association sequence with the highest keyword popularity is selected as the key association sequence and sent to the client. This realizes information screening and information extraction across multiple related texts according to real-time hot search words, and thereby real-time analysis and thermal control according to real-time search keywords.

In order to achieve the above object, according to an aspect of the present disclosure, there is provided a thermal control method based on semantic real-time analysis, the method including the steps of:

S100, collecting internet webpage data and search-ranking keywords in real time through web crawler technology;

S200, dividing the collected internet webpage data, according to their different URLs, into a plurality of different text files for storage;

S300, reading the plurality of different text files into a plurality of different character strings, segmenting the character strings into a plurality of different word segmentation arrays through a word segmentation algorithm, and taking the word segmentation arrays as a set Cps;

S400, calculating, according to the search-ranking keywords, a plurality of association sequences of the search-ranking keywords in each word segmentation array in the set Cps;

S500, calculating and selecting, from the plurality of association sequences, the association sequence with the highest keyword popularity as the key association sequence;

S600, sending the key association sequence to the client.

Further, in S100, the method for collecting internet webpage data and search-ranking keywords in real time through web crawler technology includes: collecting internet webpage data and search-ranking hot search keywords in real time through web crawler technology, the hot search keywords being recorded as keywords. The internet webpage data and the search-ranking keywords are obtained from one or more of the Baidu API interface, the Sogou search API interface, the 360 Search API interface, and the Bing search API interface. The web crawler technology comprises any one of a focused web crawler (Focused Crawler), the Fish Search algorithm, the Shark Search algorithm, an incremental web crawler (Incremental Web Crawler), or a deep web crawler (Deep Web Crawler).

Further, in S200, the method for dividing the collected internet webpage data, according to their different URLs, into a plurality of different text files for storage includes: storing the collected internet webpage data in JSON format as structured data, each piece of structured data comprising the character string data of the corresponding webpage and the URL of the website from which it was collected; reading the character string data in each piece of structured data according to its URL; and dividing the read character string data by URL into a plurality of different text files for storage.
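As an illustration of S200, the following minimal Python sketch writes the text of each crawled record to one file per distinct URL. The JSON field names `url` and `text`, the function name `split_by_url`, and the use of a SHA-256 digest as the file name are assumptions made for this example; the source does not specify them.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def split_by_url(records_json: str, out_dir: Path) -> dict[str, Path]:
    """Write the text of each crawled record to one file per distinct URL."""
    out_dir.mkdir(parents=True, exist_ok=True)
    url_to_path: dict[str, Path] = {}
    for record in json.loads(records_json):
        # "url" and "text" are hypothetical field names for this sketch.
        url, text = record["url"], record["text"]
        # A URL is not a safe file name, so a digest of it is used instead.
        name = hashlib.sha256(url.encode("utf-8")).hexdigest()[:16] + ".txt"
        path = out_dir / name
        # Records that share a URL are appended to the same file.
        with path.open("a", encoding="utf-8") as fh:
            fh.write(text + "\n")
        url_to_path[url] = path
    return url_to_path


sample = json.dumps([
    {"url": "https://example.com/a", "text": "first page"},
    {"url": "https://example.com/b", "text": "second page"},
    {"url": "https://example.com/a", "text": "first page, later crawl"},
], ensure_ascii=False)

files = split_by_url(sample, Path(tempfile.mkdtemp()))
```

Keying the files by a digest of the URL keeps the mapping deterministic, so a later crawl of the same page lands in the same file.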

Further, in S300, the method for reading the plurality of different text files into a plurality of different character strings, segmenting the character strings into a plurality of different word segmentation arrays through a word segmentation algorithm, and taking the word segmentation arrays as the set Cps includes: reading the valid character information in each of the text files as a character string; segmenting each character string through a Chinese word segmentation algorithm to obtain a plurality of different character string arrays, recorded as word segmentation arrays; and recording the set of the word segmentation arrays as the set Cps.
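The source does not name a particular Chinese word segmentation algorithm. As a stand-in, the sketch below uses forward maximum matching against a small dictionary; the function name `segment`, the toy vocabulary, and the `max_len` parameter are all assumptions for illustration, and a production system would more likely call an established segmentation library.

```python
def segment(text: str, vocabulary: set[str], max_len: int = 4) -> list[str]:
    """Forward maximum matching: at each position take the longest dictionary word."""
    tokens: list[str] = []
    pos = 0
    while pos < len(text):
        if text[pos].isspace():
            # Whitespace is treated as non-valid character information.
            pos += 1
            continue
        # Try the longest candidate first; fall back to a single character.
        for size in range(min(max_len, len(text) - pos), 0, -1):
            candidate = text[pos:pos + size]
            if size == 1 or candidate in vocabulary:
                tokens.append(candidate)
                pos += size
                break
    return tokens


# Toy dictionary and inputs, purely illustrative.
vocabulary = {"语义", "实时", "分析", "搜索", "热词"}
texts = ["语义实时分析", "搜索热词"]

# Cps: one word segmentation array per text file.
cps = [segment(t, vocabulary) for t in texts]
```

Here `cps` plays the role of the set Cps, with `cps[j]` corresponding to Cps(j+1) because Python lists are zero-indexed.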

Further, in S400, the method for calculating, according to the search-ranking keywords, a plurality of association sequences of the search-ranking keywords in each word segmentation array in the set Cps includes: recording the set of search-ranking keywords as the set Querys, the number of elements in Querys as n, and the sequence number of an element in Querys as i, i ∈ [1, n], where Querys = {Q(1), Q(2), …, Q(n-1), Q(n)} and Q(i) denotes the i-th keyword;

recording the number of elements in the set Cps as m and the sequence number of an element in Cps as j, j ∈ [1, m], where Cps = {Cps(1), Cps(2), …, Cps(m-1), Cps(m)};

letting the variable k denote the array length of each word segmentation array Cps(j) in the set Cps, the variable h denote the sequence number of a character string in Cps(j), and Cps(j, h) denote the character string with sequence number h in the element with sequence number j of Cps, h ∈ [1, k], where Cps(j) = [Cps(j, 1), …, Cps(j, k-1), Cps(j, k)];

letting the function Glv() compute the word vector of an input character string through a word embedding algorithm, so that Glv(Cps(j, h)) is the word vector of the character string with sequence number h in the element with sequence number j of Cps, recorded as G(j, h) = Glv(Cps(j, h)), and Glv(Q(i)) is the word vector of the keyword with sequence number i in Querys, recorded as Gq(i) = Glv(Q(i)); the variable q denotes the q-th dimension of a word vector and the variable p denotes the number of dimensions of a word vector, so that G(j, h)[q] is the value of the q-th dimension of G(j, h) and Gq(i)[q] is the value of the q-th dimension of Gq(i);

the function Sim() calculates the tendency degree between two input vectors, so that Sim(Gq(i), G(j, h)) is the tendency degree between the word vectors Gq(i) and G(j, h); the calculation formula of the tendency degree Sim(Gq(i), G(j, h)) is:
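The formula for Sim() does not survive in this text. For readers who want something concrete to experiment with, the sketch below assumes cosine similarity over the p dimensions of the two word vectors. This is a common choice for comparing word embeddings, but it is an assumption of this example and not the formula of the source.

```python
import math


def sim(gq: list[float], g: list[float]) -> float:
    """Cosine similarity, used here as an ASSUMED stand-in for Sim()."""
    if len(gq) != len(g):
        raise ValueError("both word vectors must have the same number of dimensions p")
    # Sum over q = 1..p of Gq(i)[q] * G(j, h)[q].
    dot = sum(a * b for a, b in zip(gq, g))
    norm = math.sqrt(sum(a * a for a in gq)) * math.sqrt(sum(b * b for b in g))
    # A zero vector has no direction; return 0.0 rather than dividing by zero.
    return dot / norm if norm else 0.0
```

Any other measure that maps two vectors to a single score could be substituted, since the later steps only compare each score with the arithmetic mean of the scores in Simset.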

calculating the plurality of association sequences of each search-ranking keyword in the set Querys in each word segmentation array in the set Cps comprises the following steps:

S401, starting the program; setting the value of the variable i to 1; creating an empty set Chianset, whose elements are distinct and ordered; go to S402;

S402, obtaining the element Q(i) with sequence number i in Querys; obtaining Gq(i) from Q(i) through the function Glv(); go to S403;

S403, setting the value of the variable j to 1; go to S404;

S404, obtaining the element Cps(j) with sequence number j in Cps; creating an empty array Simset; go to S405;

S405, setting the value of the variable h to 1; go to S406;

S406, obtaining the element Cps(j, h) with sequence number h in Cps(j); obtaining G(j, h) from Cps(j, h) through the function Glv(); go to S407;

S407, obtaining Sim(Gq(i), G(j, h)); adding Sim(Gq(i), G(j, h)) to the array Simset; go to S408;

S408, judging whether the constraint condition h ≥ k is met; if yes, go to S4081; otherwise, go to S4082;

S4081, calculating the arithmetic mean sim_avg of the elements in the array Simset, and taking the set of sequence numbers of the elements in Simset whose values are greater than sim_avg as the set Seq; taking each element in the set Seq as a target sequence number, extracting the elements of Cps(j) at the target sequence numbers as the array Chain, and adding the array Chain to the set Chianset; go to S409;

S4082, increasing the value of h by 1; go to S406;

S409, setting the value of h to 1; go to S410;

S410, judging whether the constraint condition j ≥ m is met; if yes, go to S411; otherwise, go to S4101;

S4101, increasing the value of j by 1; go to S404;

S411, setting the value of j to 1; go to S412;

S412, judging whether the constraint condition i ≥ n is met; if yes, go to S413; otherwise, go to S4121;

S4121, increasing the value of i by 1; go to S402;

S413, outputting the set Chianset; ending the program;

each array in the set Chianset is an association sequence of the corresponding search-ranking keyword in the set Querys, and the set of these association sequences is recorded as the set Litset.
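Steps S401 to S413 amount to three nested loops over i, j and h. The Python sketch below restates them under several assumptions: cosine similarity stands in for the unspecified Sim(), a small lookup table stands in for the word embedding function Glv(), and the names `build_chianset`, `cosine`, `glv` and `toy_vectors` are hypothetical.

```python
import math
from typing import Callable, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """ASSUMPTION: cosine similarity stands in for the unspecified Sim()."""
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return sum(x * y for x, y in zip(a, b)) / norm if norm else 0.0


def build_chianset(
    querys: Sequence[str],
    cps: Sequence[Sequence[str]],
    glv: Callable[[str], Vector],
    sim: Callable[[Vector, Vector], float] = cosine,
) -> list[list[str]]:
    """Return one Chain per (keyword, word segmentation array) pair, in (i, j) order."""
    chianset: list[list[str]] = []
    for keyword in querys:              # S401, S402, S412, S4121: loop over i
        gq = glv(keyword)
        for tokens in cps:              # S403, S404, S410, S4101: loop over j
            # S405 to S408, S4082: score every token against the keyword.
            simset = [sim(gq, glv(token)) for token in tokens]
            if not simset:              # empty array: nothing to extract
                chianset.append([])
                continue
            sim_avg = sum(simset) / len(simset)   # S4081: arithmetic mean
            # Keep the tokens whose score is strictly above the mean.
            chain = [t for t, s in zip(tokens, simset) if s > sim_avg]
            chianset.append(chain)
    return chianset


# Toy two-dimensional embeddings, purely illustrative.
toy_vectors = {
    "storm": (1.0, 0.0), "rain": (0.9, 0.1), "flood": (0.8, 0.2),
    "match": (0.0, 1.0), "goal": (0.1, 0.9),
}


def glv(word: str) -> Vector:
    """Hypothetical Glv(): look the word up, falling back to a zero vector."""
    return toy_vectors.get(word, (0.0, 0.0))


chianset = build_chianset(["storm"], [["rain", "goal", "flood", "match"]], glv)
```

In this toy run the tokens `rain` and `flood` score above the mean and form the Chain. The sketch keeps the result as a list and does not remove duplicate Chains, so that each position still maps back to its (i, j) pair; the source instead describes Chianset as a set with distinct elements.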

Further, in S500, the method for calculating and selecting, from the plurality of association sequences, the association sequence with the highest keyword popularity as the key association sequence includes: obtaining, through a search API interface, the keyword Qri with the highest popularity in the set Querys at the current moment; obtaining the sequence number i of Qri in the set Querys; and obtaining the element with sequence number i in the set Litset, recorded as Litset(i), wherein Litset(i) is the required key association sequence.
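A minimal sketch of this selection step follows. It assumes that Litset holds exactly one association sequence per keyword, aligned with Querys by position, and that the popularity returned by the search API is available as a plain mapping; the names `select_key_sequence` and `heat` are hypothetical.

```python
def select_key_sequence(
    querys: list[str],
    litset: list[list[str]],
    heat: dict[str, float],
) -> tuple[str, list[str]]:
    """Return the hottest keyword Qri and its association sequence Litset(i)."""
    if len(querys) != len(litset):
        raise ValueError("this sketch assumes one association sequence per keyword")
    # `heat` stands in for the popularity figures returned by a search API.
    qri = max(querys, key=lambda q: heat.get(q, 0.0))
    i = querys.index(qri)       # sequence number of Qri in Querys (zero-based here)
    return qri, litset[i]


querys = ["storm", "final"]
litset = [["rain", "flood"], ["goal", "match"]]
heat = {"storm": 120.0, "final": 340.0}   # made-up popularity values

keyword, key_sequence = select_key_sequence(querys, litset, heat)
```

With the made-up values above, `final` is the hottest keyword, so its sequence is returned.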

Further, in S600, the method for sending the key association sequence to the client includes: sending the key association sequence Litset(i) to the client, and concatenating the elements of Litset(i) into a character string for printing and display.
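The concatenation in S600 can be sketched as below. The JSON message layout, the field names `keyword` and `text`, and the function name `build_client_message` are assumptions for illustration; the source does not describe a transport or message format.

```python
import json


def build_client_message(keyword: str, key_sequence: list[str], separator: str = "") -> str:
    """Concatenate the elements of Litset(i) and wrap them in a JSON message."""
    payload = {
        "keyword": keyword,
        # String splicing: join the elements in their original order.
        "text": separator.join(key_sequence),
    }
    return json.dumps(payload, ensure_ascii=False)


message = build_client_message("storm", ["rain", "flood"], separator=" ")
print(message)
```

An empty separator suits Chinese text, where words are not delimited by spaces; a space is used here only because the toy tokens are English.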

The present disclosure also provides a thermal control system based on semantic real-time analysis, comprising: a processor, a memory, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements the steps of the thermal control method based on semantic real-time analysis according to claim 1. The system can run on computing devices such as desktop computers, notebook computers, mobile phones, tablet computers, palmtop computers, and cloud data centers; the runnable system may include, but is not limited to, the processor, the memory, and a server cluster. The processor executes the computer program to run in the following units of the system:

a data acquisition unit, configured to collect internet webpage data and search-ranking keywords in real time through web crawler technology;

a data sorting unit, configured to divide the collected internet webpage data, according to their different URLs, into a plurality of different text files for storage;

a word segmentation unit, configured to read the plurality of different text files into a plurality of different character strings, segment the character strings into a plurality of different word segmentation arrays through a word segmentation algorithm, and take the word segmentation arrays as the set Cps;

an association sequence calculation unit, configured to calculate, according to the search-ranking keywords, a plurality of association sequences of the search-ranking keywords in each word segmentation array in the set Cps;

a key association sequence selection unit, configured to calculate and select, from the plurality of association sequences, the association sequence with the highest keyword popularity as the key association sequence;

and a sending unit, configured to send the key association sequence to the client.
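One way to picture how the six units fit together is a small pipeline object whose fields are the units. The sketch below is an assumption about structure only: the class name `ThermalControlPipeline`, the field names, and the callable signatures are hypothetical, and the stub implementations exist solely to make the example runnable.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class ThermalControlPipeline:
    """Hypothetical wiring of the six units; each field is one unit."""
    acquire: Callable[[], tuple[list[str], list[tuple[str, str]]]]   # data acquisition
    sort_by_url: Callable[[list[tuple[str, str]]], dict[str, str]]   # data sorting
    segment: Callable[[str], list[str]]                              # word segmentation
    associate: Callable[[list[str], list[list[str]]], list[list[str]]]
    select: Callable[[list[str], list[list[str]]], list[str]]
    send: Callable[[list[str]], None]

    def run(self) -> list[str]:
        keywords, records = self.acquire()                     # S100
        pages = self.sort_by_url(records)                      # S200
        cps = [self.segment(text) for text in pages.values()]  # S300
        litset = self.associate(keywords, cps)                 # S400
        key_sequence = self.select(keywords, litset)           # S500
        self.send(key_sequence)                                # S600
        return key_sequence


sent: list[list[str]] = []
pipeline = ThermalControlPipeline(
    acquire=lambda: (["storm"], [("https://example.com/a", "rain flood goal")]),
    sort_by_url=lambda records: dict(records),
    segment=str.split,
    associate=lambda keywords, cps: [cps[0][:2] for _ in keywords],  # stub
    select=lambda keywords, litset: litset[0],                       # stub
    send=sent.append,
)
result = pipeline.run()
```

Passing each unit in as a callable keeps the units independently replaceable, for example swapping the stub `segment` for a real Chinese word segmenter.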

The beneficial effects of the invention are as follows: the invention provides a thermal control method and system based on semantic real-time analysis in which internet webpage data and search-ranking keywords are collected in real time through web crawler technology, a plurality of association sequences of the search-ranking keywords in each word segmentation array are calculated, and the association sequence with the highest keyword popularity is then selected as the key association sequence and sent to the client. This realizes information screening and information extraction across multiple related texts according to real-time hot search words, and thereby real-time analysis and thermal control according to real-time search keywords.

Drawings

The foregoing and other features of the present disclosure will become more apparent from the detailed description of the embodiments shown in conjunction with the drawings, in which like reference characters designate the same or similar elements. The drawings described below are merely some examples of the present disclosure; those skilled in the art may derive other drawings from them without inventive effort. In the drawings:

FIG. 1 is a flow chart of a thermal control method based on semantic real-time analysis;

FIG. 2 is a system structure diagram of a thermal control system based on semantic real-time analysis.

Detailed Description

The conception, specific structure and technical effects of the present disclosure will be clearly and completely described below in conjunction with the embodiments and the accompanying drawings to fully understand the objects, aspects and effects of the present disclosure. It should be noted that the embodiments and features of the embodiments in the present application may be combined with each other without conflict.

In the description of the present invention, "several" means one or more and "a plurality" means two or more; "greater than", "less than", "exceeding", and the like are understood as excluding the stated number, while "above", "below", "within", and the like are understood as including it. Where "first" and "second" are used, they serve only to distinguish technical features and are not to be understood as indicating or implying relative importance, the number of the technical features indicated, or the order of the technical features indicated.

FIG. 1 is a flowchart of the thermal control method based on semantic real-time analysis according to the present invention. A thermal control method and system based on semantic real-time analysis according to an embodiment of the present invention are described below with reference to FIG. 1.

The present disclosure provides a thermal control method based on semantic real-time analysis, which specifically includes the following steps:

S100, collecting internet webpage data and search-ranking keywords in real time through web crawler technology;

S200, dividing the collected internet webpage data, according to their different URLs, into a plurality of different text files for storage;

S300, reading the plurality of different text files into a plurality of different character strings, segmenting the character strings into a plurality of different word segmentation arrays through a word segmentation algorithm, and taking the word segmentation arrays as a set Cps;

S400, calculating, according to the search-ranking keywords, a plurality of association sequences of the search-ranking keywords in each word segmentation array in the set Cps;

S500, calculating and selecting, from the plurality of association sequences, the association sequence with the highest keyword popularity as the key association sequence;

S600, sending the key association sequence to the client.

Further, in S100, the method for collecting internet webpage data and search-ranking keywords in real time through web crawler technology includes: collecting internet webpage data and search-ranking hot search keywords in real time through web crawler technology, the hot search keywords being recorded as keywords. The internet webpage data and the search-ranking keywords are obtained from one or more of the Baidu API interface, the Sogou search API interface, the 360 Search API interface, and the Bing search API interface. The web crawler technology comprises any one of a focused web crawler (Focused Crawler), the Fish Search algorithm, the Shark Search algorithm, an incremental web crawler (Incremental Web Crawler), or a deep web crawler (Deep Web Crawler).

The hot search keyword may also be a character string with the highest frequency after word segmentation is performed on text data in any one or more webpage data.
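This frequency-based alternative can be sketched with a simple counter over the word segmentation arrays. The function name `hot_keyword` and the tie-breaking behaviour (the first token encountered wins) are assumptions of this example.

```python
from collections import Counter
from typing import Optional


def hot_keyword(token_arrays: list[list[str]]) -> Optional[str]:
    """Return the most frequent token across all word segmentation arrays."""
    counts = Counter(token for tokens in token_arrays for token in tokens)
    if not counts:
        return None   # no text data, so no hot search keyword
    # most_common(1) returns a list holding one (token, count) pair.
    return counts.most_common(1)[0][0]


pages = [["rain", "flood", "rain"], ["goal", "rain"]]
top = hot_keyword(pages)
```

In practice, very common function words would probably need to be filtered out before counting; the source does not mention such a filter.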

Further, in S200, the method for dividing the collected internet webpage data, according to their different URLs, into a plurality of different text files for storage includes: storing the collected internet webpage data in JSON format as structured data, each piece of structured data comprising the character string data of the corresponding webpage and the URL of the website from which it was collected; reading the character string data in each piece of structured data according to its URL; and dividing the read character string data by URL into a plurality of different text files for storage.

Further, in S300, the method for reading the plurality of different text files into a plurality of different character strings, segmenting the character strings into a plurality of different word segmentation arrays through a word segmentation algorithm, and taking the word segmentation arrays as the set Cps includes: reading each of the text files into a character string; segmenting each character string through a Chinese word segmentation algorithm to obtain a plurality of different character string arrays, recorded as word segmentation arrays; and recording the set of the word segmentation arrays as the set Cps.

Further, in S400, the method for calculating, according to the search ranking keywords, a plurality of association sequences of the search ranking keywords in each word segmentation array in the set Cps is as follows: the set of search ranking keywords is recorded as the set Querys, the number of elements in Querys is n, and the sequence number of an element in Querys is i, i ∈ [1, n], so that Querys = {Q(1), Q(2), …, Q(n-1), Q(n)};

the number of elements in the set Cps is m, the sequence number of an element in Cps is j, j ∈ [1, m], and Cps = {Cps(1), Cps(2), …, Cps(m-1), Cps(m)};

the variable k denotes the array length of a word segmentation array Cps(j) in the set Cps, the variable h denotes the sequence number of a character string within the word segmentation array Cps(j), and Cps(j, h) denotes the character string with sequence number h in the element with sequence number j of the set Cps, h ∈ [1, k], so that Cps(j) = [Cps(j, 1), …, Cps(j, k-1), Cps(j, k)];

the function Glv() computes the word vector of an input character string through a word embedding algorithm; Glv(Cps(j, h)) denotes the word vector obtained for the character string with sequence number h in the element with sequence number j of the set Cps, and G(j, h) = Glv(Cps(j, h)); Glv(Q(i)) denotes the word vector obtained for the element with sequence number i of the set Querys, and gq(i) = Glv(Q(i)); the variable q denotes the q-th dimension of a word vector and the variable p denotes the number of dimensions of a word vector; G(j, h)[q] denotes the value of the q-th dimension of the word vector G(j, h), and gq(i)[q] denotes the value of the q-th dimension of the word vector gq(i);

the word embedding algorithm is at least any one of Word2Vec, the Skip-Gram model, or the GloVe algorithm.

The function Sim() calculates the tendency degree between two input vectors; Sim(gq(i), G(j, h)) denotes the tendency degree between the word vectors gq(i) and G(j, h) calculated by the function Sim(). The calculation formula for Sim(gq(i), G(j, h)) is missing from this text.
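Because the formula itself is not available in this text, the following sketch assumes that Sim() is the cosine similarity of the two word vectors, a common way of comparing word embeddings. This is an assumption for illustration only and may differ from the formula in the original publication.

```python
import math


def sim(u, v):
    """Cosine similarity of two equal-length vectors (assumed form of Sim)."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same number of dimensions p")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    # A zero vector has no direction; treat its similarity as 0 (an assumption).
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical 3-dimensional word vectors standing in for gq(i) and G(j, h).
gq_i = [1.0, 2.0, 0.0]
g_jh = [2.0, 4.0, 0.0]
print(sim(gq_i, g_jh))
```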

The plurality of association sequences of each search ranking keyword in the set Querys in each word segmentation array in the set Cps are calculated through the following steps:

S401, start the program; set the value of the variable i to 1; create an empty set Chianset, whose elements are mutually distinct and ordered; go to S402;

S402, obtain the element Q(i) with sequence number i in Querys; obtain gq(i) by applying the function Glv() to Q(i); go to S403;

S403, set the value of the variable j to 1; go to S404;

S404, obtain the element Cps(j) with sequence number j in Cps; create an empty array Simset; go to S405;

S405, set the value of the variable h to 1; go to S406;

S406, obtain the element Cps(j, h) with sequence number h in Cps(j); obtain G(j, h) by applying the function Glv() to Cps(j, h); go to S407;

S407, obtain the tendency degree Sim(gq(i), G(j, h)) by applying the function Sim() to gq(i) and G(j, h); add Sim(gq(i), G(j, h)) to the array Simset; go to S408;

S408, judge whether the constraint condition h ≥ k is met; if so, go to S4081; if not, go to S4082;

S4081, calculate the arithmetic mean sim_avg of the elements in the array Simset, and take the set of sequence numbers of the elements in Simset whose values are greater than sim_avg as the set Seq; taking each element of Seq as a target sequence number, extract the elements of Cps(j) at the target sequence numbers as the array Chain, and add the array Chain to the set Chianset; go to S409;

S4082, increase the value of h by 1; go to S406;

S409, set the value of h to 1; go to S410;

S410, judge whether the constraint condition j ≥ m is met; if so, go to S411; if not, go to S4101;

S4101, increase the value of j by 1; go to S404;

S411, set the value of j to 1; go to S412;

S412, judge whether the constraint condition i ≥ n is met; if so, go to S413; if not, go to S4121;

S4121, increase the value of i by 1; go to S402;

S413, output the set Chianset; end the program;

Each array in the set Chianset is an association sequence of the corresponding search ranking keyword in the set Querys, and the set of these association sequences is recorded as the set Litset.
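Steps S401 to S413 amount to three nested loops over keywords, word segmentation arrays, and tokens. The sketch below follows those steps under stated assumptions: Sim() is taken to be cosine similarity, Glv() is replaced by a lookup in a tiny hypothetical vector table (`toy_vectors`), and the names `association_sequences`, `cosine`, and `toy_glv` are illustrative rather than taken from the source.

```python
import math


def cosine(u, v):
    """Cosine similarity; an assumed stand-in for the function Sim()."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def association_sequences(querys, cps, glv, sim=cosine):
    """Follow S401-S413: build the ordered, duplicate-free set Chianset."""
    chianset = []                                   # S401
    for keyword in querys:                          # S402, S412, S4121
        gq = glv(keyword)
        for tokens in cps:                          # S403, S404, S410, S4101
            # S405-S408: tendency degree of every token against the keyword.
            simset = [sim(gq, glv(tok)) for tok in tokens]
            if not simset:
                continue  # empty arrays are skipped (not covered by the source)
            sim_avg = sum(simset) / len(simset)     # S4081
            chain = [tok for tok, s in zip(tokens, simset) if s > sim_avg]
            if chain not in chianset:               # elements mutually distinct
                chianset.append(chain)
    return chianset                                 # S413


# Hypothetical 2-dimensional word vectors in place of a trained embedding.
toy_vectors = {
    "heat": (1.0, 0.0),
    "hot": (0.9, 0.1),
    "rain": (0.0, 1.0),
    "sun": (0.8, 0.2),
}


def toy_glv(word):
    """Look up a word vector; unknown words map to the zero vector."""
    return toy_vectors.get(word, (0.0, 0.0))


result = association_sequences(["heat"], [["hot", "rain", "sun"]], toy_glv)
print(result)
```

In this toy run the tokens "hot" and "sun" score above the mean similarity to "heat" while "rain" does not, so the resulting chain keeps only the first two.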

Further, in S500, the method for calculating and selecting the association sequence with the highest keyword popularity among the plurality of association sequences as the key association sequence is as follows: the keyword Qri with the highest popularity in the set Querys at the current moment is obtained through a search API interface; the sequence number i of Qri in the set Querys is obtained; the element with sequence number i in the set Litset is obtained according to the sequence number i and recorded as Litset(i); Litset(i) is the required key association sequence.

Further, in S600, the method for sending the key association sequence to the client is as follows: the key association sequence Litset(i) is sent to the client, and the elements of Litset(i) are concatenated into a character string and displayed.
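Steps S500 and S600 can be sketched as an index lookup followed by string concatenation. The search API is not specified in the source, so the hottest keyword is simply passed in as an argument here; the function names `select_key_sequence` and `render` and the sample data are hypothetical.

```python
def select_key_sequence(querys, litset, hottest_keyword):
    """S500: find the index of the hottest keyword and return Litset(i)."""
    i = querys.index(hottest_keyword)  # raises ValueError if not in Querys
    return litset[i]


def render(sequence):
    """S600: concatenate the elements of the key association sequence."""
    return "".join(sequence)


# Hypothetical data: one association sequence per search ranking keyword.
querys = ["heat", "rain"]
litset = [["hot", "sun"], ["storm", "flood"]]

key_sequence = select_key_sequence(querys, litset, "rain")
print(render(key_sequence))
```

Note that Python lists are 0-indexed, whereas the sequence numbers in the text start at 1; the lookup above hides that difference.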

The thermal control system based on semantic real-time analysis comprises a processor that executes a computer program to implement the steps of the above embodiment of the thermal control method based on semantic real-time analysis. The system may run on computing devices such as desktop computers, notebook computers, palmtop computers, and cloud data centers, and the system on which it runs may include, but is not limited to, a processor, a memory, and a server cluster.

As shown in Fig. 2, the thermal control system based on semantic real-time analysis according to the embodiment of the present disclosure comprises a processor, a memory, and a computer program stored in the memory and executable on the processor. When executing the computer program, the processor implements the steps of the above embodiment of the thermal control method based on semantic real-time analysis, and the computer program runs in the following units of the system:

a data acquisition unit, used for collecting internet webpage data and search ranking keywords in real time through web crawler technology;

a data sorting unit, used for dividing the collected internet webpage data into a plurality of different text files according to their different URLs for storage;

a word segmentation unit, used for reading the plurality of different text files into a plurality of different character strings and segmenting the read character strings into a plurality of different word segmentation arrays through a word segmentation algorithm, the word segmentation arrays being taken as the set Cps;

an association sequence calculation unit, used for calculating, according to the search ranking keywords, a plurality of association sequences of the search ranking keywords in each word segmentation array in the set Cps;

a key association sequence selection unit, used for calculating and selecting the association sequence with the highest keyword popularity among the plurality of association sequences as the key association sequence;

and a sending unit, used for sending the key association sequence to the client.

The thermal control system based on semantic real-time analysis can run on computing devices such as desktop computers, notebook computers, palmtop computers, and cloud data centers. The system includes, but is not limited to, a processor and a memory. Those skilled in the art will understand that the above is only an example of the thermal control method and system based on semantic real-time analysis and does not limit them; the system may include more or fewer components than described, combine certain components, or use different components. For example, the thermal control system based on semantic real-time analysis may further include input-output devices, network access devices, a bus, and the like.

The processor may be a Central Processing Unit (CPU), another general-purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, discrete gate or transistor logic, discrete hardware components, and the like. The general-purpose processor may be a microprocessor or any conventional processor. The processor is the control center of the thermal control system based on semantic real-time analysis and connects the various parts of the whole system through various interfaces and lines.

The memory can be used for storing the computer program and/or modules, and the processor realizes the various functions of the thermal control method and system based on semantic real-time analysis by running or executing the computer program and/or modules stored in the memory and calling data stored in the memory. The memory may mainly include a program storage area and a data storage area, wherein the program storage area may store an operating system and an application program required by at least one function (such as a sound playing function or an image playing function), and the data storage area may store data (such as audio data or a phonebook) created according to the use of the cellular phone. In addition, the memory may include high-speed random access memory, and may also include non-volatile memory, such as a hard disk, an internal memory, a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, a flash memory card (Flash Card), at least one magnetic disk storage device, a flash memory device, or other volatile solid-state storage device.

The invention provides a thermal control method and system based on semantic real-time analysis. Internet webpage data and search ranking keywords are collected in real time through web crawler technology; the collected internet webpage data are divided into a plurality of different text files according to their different URLs for storage; the character strings read from these files are segmented into a plurality of different word segmentation arrays through a word segmentation algorithm to obtain the set Cps; a plurality of association sequences of the search ranking keywords in each word segmentation array of the set Cps are calculated; and the association sequence with the highest keyword popularity is selected as the key association sequence and sent to the client. In this way, information screening and information extraction over a plurality of related texts according to real-time search hot words are realized, achieving real-time analysis and thermal control based on real-time search keywords.

Although the description of the present disclosure has been rather exhaustive and particularly described with respect to several illustrated embodiments, it is not intended to be limited to any such details or embodiments or any particular embodiments, so as to effectively encompass the intended scope of the present disclosure. Furthermore, the foregoing describes the disclosure in terms of embodiments foreseen by the inventor for which an enabling description was available, notwithstanding that insubstantial modifications of the disclosure, not presently foreseen, may nonetheless represent equivalent modifications thereto.
