System, method and equipment for measuring inter-city relation based on microblog public sentiment

Document No.: 1831341 Published: 2021-11-12

Reading note: This technology, "System, method and equipment for measuring inter-city relation based on microblog public sentiment", was designed by 张文生, 杨阳 and 白江波 on 2021-08-16. The invention belongs to the field of city relation measurement and specifically relates to a system, method and equipment for measuring inter-city relations based on microblog public sentiment. It aims to solve the problem that traditional city measurement methods require a great deal of effort to compile or collect basic data, and that this data lags in timeliness, so that inter-city relation measurements are neither timely nor accurate. The system comprises a city public opinion crawling submodule, a public opinion information sorting submodule and a city relation measurement submodule. The public opinion information sorting submodule comprises a directed acyclic graph word segmentation unit, a BERT word segmentation unit and a weighting calculation unit. The city relation measurement submodule comprises a first frequency calculation unit, a second frequency calculation unit and a city relation measurement unit. The invention improves the timeliness and accuracy of inter-city relation measurement.

1. A system for measuring inter-city relations based on microblog public sentiment, characterized by comprising: a city public opinion crawling submodule, a public opinion information sorting submodule and a city relation measurement submodule, which are connected;

the city public opinion crawling submodule is configured to acquire the names of the cities whose relation is to be measured, and to crawl, by crawler technology, the microblog data related to those city names from microblogs of set theme types, as input data;

the public opinion information sorting submodule comprises a directed acyclic graph word segmentation unit, a BERT word segmentation unit and a weighting calculation unit;

the directed acyclic graph word segmentation unit is configured to construct a directed acyclic graph corresponding to each text in the input data, and find a path with the maximum probability in the directed acyclic graph by using a dynamic programming algorithm to obtain a word segmentation result corresponding to each text, wherein the word segmentation result is used as a first word segmentation result; deleting stop words in the first segmentation result according to the stop word dictionary to obtain a second segmentation result;

the BERT word segmentation unit is configured to extract word vectors of words in the input data through a pre-constructed word embedding matrix; inputting the word vector of each word and the position of the word vector in the text into a BERT model, and obtaining word segmentation results of each text of the input data as third word segmentation results;

the weighting calculation unit is configured to perform weighted summation on the second word segmentation result and the third word segmentation result corresponding to each text in the input data to obtain a final word segmentation result of each text;

the city relation measurement submodule comprises a first frequency calculation unit, a second frequency calculation unit and a city relation measurement unit;

the first frequency calculation unit is configured to count, based on the final word segmentation result of each text in the input data and taking each city name as an entry, the frequency with which other city names occur in the segmented microblogs of that entry, to weight the frequency by the sum of the microblog's comment, like and forwarding counts, and to take the weighted frequency as a first frequency;

the second frequency calculation unit is configured to obtain, taking each city name as an entry, the word frequency (TF) and the inverse file frequency (IDF) in the segmented input data, and to multiply the two to obtain a second frequency;

the city relation measurement unit is configured to compute a weighted sum of the first frequency and the second frequency as the measure of the relation between cities.

2. The system for measuring inter-city relation based on microblog public sentiments according to claim 1, wherein the method for constructing the directed acyclic graph corresponding to each text in the input data comprises the following steps:

counting the word frequency of each word in the input data, and storing the word frequency in a dictionary form;

and after storage, constructing a directed acyclic graph by taking the word frequency of each word as a node according to the position of each word in the text and the tail position of the corresponding text.

3. The system for measuring inter-city relation based on microblog public sentiments according to claim 1, wherein the BERT model extracts multi-semantic information through a multi-head self-attention layer as follows:

$$M_i = \mathrm{Attention}(QW_i^Q, KW_i^K, VW_i^V)$$

$$M(Q,K,V) = \mathrm{concat}(M_i)\,w_0$$

wherein Q, K and V are the query vector, key vector and value vector respectively; $W_i^Q$, $W_i^K$ and $W_i^V$ are the projection matrices of Q, K and V; $M_i$ is a single-head self-attention layer; $w_0$ is the weight matrix; $M(Q,K,V)$ denotes the multi-head self-attention layer; concat denotes concatenation; and $\mathrm{Attention}(QW_i^Q, KW_i^K, VW_i^V)$ denotes the single-head self-attention algorithm.

4. The system for measuring inter-city relation based on microblog public sentiments according to claim 3, wherein the dot-product attention layer of the BERT model performs the following processing:

$$\mathrm{Attention}(Q,K,V) = \mathrm{softmax}\left(\frac{QK^T}{\sqrt{d_k}}\right)V$$

wherein Attention(Q, K, V) denotes the self-attention algorithm of the dot-product attention layer, T denotes transposition, and $d_k$ denotes the dimension of the key vector K.

5. The system for measuring inter-city relation based on microblog public sentiments according to claim 1, wherein the word frequency of the entry is obtained as:

$$tf_{i,j} = \frac{n_{i,j}}{\sum_k n_{k,j}}$$

wherein $tf_{i,j}$ denotes the frequency with which the entry occurs in the text, i.e. the word frequency; $n_{i,j}$ denotes the number of times the entry occurs in file $d_j$; and $\sum_k n_{k,j}$ denotes the total number of occurrences of all entries in file $d_j$.

6. The system for measuring inter-city relation based on microblog public sentiments according to claim 1, wherein the inverse file frequency is obtained as:

$$idf_i = \log\frac{|D|}{|\{j : t_i \in d_j\}|}$$

wherein $idf_i$ denotes the inverse file frequency of the i-th entry $t_i$, $|D|$ is the total number of microblog public opinion files, and $|\{j : t_i \in d_j\}|$ denotes the number of files containing the entry $t_i$.

7. A method for measuring inter-city relations based on microblog public sentiment, characterized by comprising the following steps:

S10, acquiring the names of the cities whose relation is to be measured; crawling, by crawler technology, the microblog data related to those city names from microblogs of set theme types, as input data;

S20, constructing a directed acyclic graph corresponding to each text in the input data, and finding the path with the maximum probability in the directed acyclic graph by a dynamic programming algorithm to obtain a word segmentation result for each text as a first word segmentation result; deleting stop words from the first word segmentation result according to a stop word dictionary to obtain a second word segmentation result;

extracting word vectors of all words in the input data through a pre-constructed word embedding matrix; inputting the word vector of each word and the position of the word vector in the text into a BERT model, and obtaining word segmentation results of each text of the input data as third word segmentation results;

carrying out weighted summation on the second word segmentation result and the third word segmentation result corresponding to each text in the input data to obtain a final word segmentation result of each text;

S30, based on the final word segmentation result of each text in the input data and taking each city name as an entry, counting the frequency with which other city names occur in the segmented microblogs of that entry, weighting the frequency by the sum of the microblog's comment, like and forwarding counts, and taking the weighted frequency as a first frequency;

taking each city name as an entry, obtaining the word frequency (TF) and the inverse file frequency (IDF) in the segmented input data, and multiplying the two to obtain a second frequency;

and computing a weighted sum of the first frequency and the second frequency as the measure of the relation between cities.

8. An electronic device, comprising:

at least one processor; and

a memory communicatively coupled to the at least one processor; wherein

the memory stores instructions executable by the processor to implement the method of measuring inter-city relations based on microblog public opinions of claim 7.

9. A computer-readable storage medium storing computer instructions for execution by the computer to implement the method for measuring inter-city relations based on microblog public opinions according to claim 7.

Technical Field

The invention belongs to the field of city relation measurement, and particularly relates to a system, a method and equipment for measuring city relation based on microblog public sentiment.

Background

As globalization deepens and global competition intensifies, city groups are increasingly becoming the new spatial units of global economic competition. A city group is the highest form of spatial organization reached when urban development matures, and its existence depends on the direct connections among the cities within it; these connections and interactions also form the rudiments of inter-city relations. On this basis, the invention uses public sentiment information from the "microblog" online community and measures the relations among the cities of a city group by sorting and analyzing the information published by relevant accounts.

Disclosure of Invention

In order to solve the above problems in the prior art, namely that traditional city measurement methods require a great deal of effort to compile or collect basic data and that this data lags in timeliness, so that inter-city relation measurements are neither timely nor accurate, the invention provides, in a first aspect, a system for measuring inter-city relations based on microblog public sentiment, comprising: a city public opinion crawling submodule, a public opinion information sorting submodule and a city relation measurement submodule, which are connected;

the city public opinion crawling submodule is configured to acquire the names of the cities whose relation is to be measured, and to crawl, by crawler technology, the microblog data related to those city names from microblogs of set theme types, as input data;

the public opinion information sorting submodule comprises a directed acyclic graph word segmentation unit, a BERT word segmentation unit and a weighting calculation unit;

the directed acyclic graph word segmentation unit is configured to construct a directed acyclic graph corresponding to each text in the input data, and find a path with the maximum probability in the directed acyclic graph by using a dynamic programming algorithm to obtain a word segmentation result corresponding to each text, wherein the word segmentation result is used as a first word segmentation result; deleting stop words in the first segmentation result according to the stop word dictionary to obtain a second segmentation result;

the BERT word segmentation unit is configured to extract word vectors of words in the input data through a pre-constructed word embedding matrix; inputting the word vector of each word and the position of the word vector in the text into a BERT model, and obtaining word segmentation results of each text of the input data as third word segmentation results;

the weighting calculation unit is configured to perform weighted summation on the second word segmentation result and the third word segmentation result corresponding to each text in the input data to obtain a final word segmentation result of each text;

the city relation measurement submodule comprises a first frequency calculation unit, a second frequency calculation unit and a city relation measurement unit;

the first frequency calculation unit is configured to count, based on the final word segmentation result of each text in the input data and taking each city name as an entry, the frequency with which other city names occur in the segmented microblogs of that entry, to weight the frequency by the sum of the microblog's comment, like and forwarding counts, and to take the weighted frequency as a first frequency;

the second frequency calculation unit is configured to obtain, taking each city name as an entry, the word frequency (TF) and the inverse file frequency (IDF) in the segmented input data, and to multiply the two to obtain a second frequency;

the city relation measurement unit is configured to compute a weighted sum of the first frequency and the second frequency as the measure of the relation between cities.

In some preferred embodiments, the method of "constructing a directed acyclic graph corresponding to each text in the input data" includes:

counting the word frequency of each word in the input data, and storing the word frequency in a dictionary form;

and after storage, constructing a directed acyclic graph by taking the word frequency of each word as a node according to the position of each word in the text and the tail position of the corresponding text.

In some preferred embodiments, the BERT model extracts multi-semantic information through a multi-head self-attention layer as follows:

$$M_i = \mathrm{Attention}(QW_i^Q, KW_i^K, VW_i^V)$$

$$M(Q,K,V) = \mathrm{concat}(M_i)\,w_0$$

wherein Q, K and V are the query vector, key vector and value vector respectively; $W_i^Q$, $W_i^K$ and $W_i^V$ are the projection matrices of Q, K and V; $M_i$ is a single-head self-attention layer; $w_0$ is the weight matrix; $M(Q,K,V)$ denotes the multi-head self-attention layer; concat denotes concatenation; and $\mathrm{Attention}(QW_i^Q, KW_i^K, VW_i^V)$ denotes the single-head self-attention algorithm.

In some preferred embodiments, the dot-product attention layer of the BERT model performs the following processing:

$$\mathrm{Attention}(Q,K,V) = \mathrm{softmax}\left(\frac{QK^T}{\sqrt{d_k}}\right)V$$

wherein Attention(Q, K, V) denotes the self-attention algorithm of the dot-product attention layer, T denotes transposition, and $d_k$ denotes the dimension of the key vector K.

In some preferred embodiments, the word frequency of the entry is obtained as:

$$tf_{i,j} = \frac{n_{i,j}}{\sum_k n_{k,j}}$$

wherein $tf_{i,j}$ denotes the frequency with which the entry occurs in the text, i.e. the word frequency; $n_{i,j}$ denotes the number of times the entry occurs in file $d_j$; and $\sum_k n_{k,j}$ denotes the total number of occurrences of all entries in file $d_j$.

In some preferred embodiments, the inverse file frequency is obtained as:

$$idf_i = \log\frac{|D|}{|\{j : t_i \in d_j\}|}$$

wherein $idf_i$ denotes the inverse file frequency of the i-th entry $t_i$, $|D|$ is the total number of microblog public opinion files, and $|\{j : t_i \in d_j\}|$ denotes the number of files containing the entry $t_i$.

In a second aspect, the invention provides a method for measuring inter-city relations based on microblog public sentiment, comprising the following steps:

S10, acquiring the names of the cities whose relation is to be measured; crawling, by crawler technology, the microblog data related to those city names from microblogs of set theme types, as input data;

S20, constructing a directed acyclic graph corresponding to each text in the input data, and finding the path with the maximum probability in the directed acyclic graph by a dynamic programming algorithm to obtain a word segmentation result for each text as a first word segmentation result; deleting stop words from the first word segmentation result according to a stop word dictionary to obtain a second word segmentation result;

extracting word vectors of all words in the input data through a pre-constructed word embedding matrix; inputting the word vector of each word and the position of the word vector in the text into a BERT model, and obtaining word segmentation results of each text of the input data as third word segmentation results;

carrying out weighted summation on the second word segmentation result and the third word segmentation result corresponding to each text in the input data to obtain a final word segmentation result of each text;

S30, based on the final word segmentation result of each text in the input data and taking each city name as an entry, counting the frequency with which other city names occur in the segmented microblogs of that entry, weighting the frequency by the sum of the microblog's comment, like and forwarding counts, and taking the weighted frequency as a first frequency;

taking each city name as an entry, obtaining the word frequency (TF) and the inverse file frequency (IDF) in the segmented input data, and multiplying the two to obtain a second frequency;

and computing a weighted sum of the first frequency and the second frequency as the measure of the relation between cities.

In a third aspect of the invention, an electronic device is proposed, comprising: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the processor to implement the above method for measuring inter-city relations based on microblog public sentiment.

In a fourth aspect of the present invention, a computer-readable storage medium is provided, which stores computer instructions to be executed by a computer to implement the above method for measuring inter-city relations based on microblog public sentiment.

The invention has the beneficial effects that:

the invention improves the timeliness and accuracy of inter-city relation measurement.

The invention conditionally extracts microblog public opinion data centered on the published content, release time, and numbers of comments, forwards and likes, and screens and measures it effectively, so that the resulting inter-city relations are more timely and accurate. This addresses the heavy workload of collecting physical data and the serious lag in data timeliness found in traditional city measurement methods, provides a new way of measuring inter-city relations, and effectively reduces the required cost.

Drawings

Other features, objects and advantages of the present application will become more apparent upon reading of the following detailed description of non-limiting embodiments thereof, made with reference to the accompanying drawings.

Fig. 1 is a schematic diagram of the framework of the method for measuring inter-city relations based on microblog public sentiment according to an embodiment of the invention;

Fig. 2 is a schematic structural diagram of a computer system suitable for implementing an electronic device according to an embodiment of the present application.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings, and it is apparent that the described embodiments are some, but not all embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

The present application will be described in further detail with reference to the following drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the relevant invention and not restrictive of the invention. It should be noted that, for convenience of description, only the portions related to the related invention are shown in the drawings.

It should be noted that the embodiments and features of the embodiments in the present application may be combined with each other without conflict.

The system for measuring inter-city relations based on microblog public sentiment according to the invention comprises: a city public opinion crawling submodule 100, a public opinion information sorting submodule 200 and a city relation measurement submodule 300;

the city public opinion crawling submodule 100 is configured to acquire the names of the cities whose relation is to be measured, and to crawl, by crawler technology, the microblog data related to those city names from microblogs of set theme types, as input data;

the public opinion information sorting submodule 200 comprises a directed acyclic graph word segmentation unit, a BERT word segmentation unit and a weighting calculation unit;

the directed acyclic graph word segmentation unit is configured to construct a directed acyclic graph corresponding to each text in the input data, and find a path with the maximum probability in the directed acyclic graph by using a dynamic programming algorithm to obtain a word segmentation result corresponding to each text, wherein the word segmentation result is used as a first word segmentation result; deleting stop words in the first segmentation result according to the stop word dictionary to obtain a second segmentation result;

the BERT word segmentation unit is configured to extract word vectors of words in the input data through a pre-constructed word embedding matrix; inputting the word vector of each word and the position of the word vector in the text into a BERT model, and obtaining word segmentation results of each text of the input data as third word segmentation results;

the weighting calculation unit is configured to perform weighted summation on the second word segmentation result and the third word segmentation result corresponding to each text in the input data to obtain a final word segmentation result of each text;

the city relation measurement submodule 300 comprises a first frequency calculation unit, a second frequency calculation unit and a city relation measurement unit;

the first frequency calculation unit is configured to count, based on the final word segmentation result of each text in the input data and taking each city name as an entry, the frequency with which other city names occur in the segmented microblogs of that entry, to weight the frequency by the sum of the microblog's comment, like and forwarding counts, and to take the weighted frequency as a first frequency;

the second frequency calculation unit is configured to obtain, taking each city name as an entry, the word frequency (TF) and the inverse file frequency (IDF) in the segmented input data, and to multiply the two to obtain a second frequency;

the city relation measurement unit is configured to compute a weighted sum of the first frequency and the second frequency as the measure of the relation between cities.
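The final weighted summation performed by the city relation measurement unit might be sketched as follows. This is only an illustration under stated assumptions: the function name `relation_measure`, the weights `w_first` and `w_second` and the sample values are hypothetical, since the text does not specify the weights or any normalization of the two frequencies.

```python
from typing import Dict, Tuple

CityPair = Tuple[str, str]  # (entry city, other city)


def relation_measure(
    first: Dict[CityPair, float],
    second: Dict[CityPair, float],
    w_first: float = 0.5,   # hypothetical weight of the first frequency
    w_second: float = 0.5,  # hypothetical weight of the second frequency
) -> Dict[CityPair, float]:
    """Weighted sum of the first and second frequency for every city pair."""
    pairs = set(first) | set(second)
    return {
        pair: w_first * first.get(pair, 0.0) + w_second * second.get(pair, 0.0)
        for pair in pairs
    }


# Made-up sample values, for illustration only.
first_freq = {("Guangzhou", "Shenzhen"): 20.0}
second_freq = {("Guangzhou", "Shenzhen"): 0.4, ("Guangzhou", "Foshan"): 0.1}
measure = relation_measure(first_freq, second_freq)
```

Because the first frequency is weighted by interaction counts while the second is a TF-IDF coefficient, the two may differ greatly in scale; whether and how they are rescaled before summation is left open here.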

In order to more clearly explain the system for measuring the inter-city relationship based on the microblog public sentiments, the following describes each module in one embodiment of the system in detail with reference to fig. 1.

The city public opinion crawling submodule 100 is configured to acquire the names of the cities whose relation is to be measured, and to crawl, by crawler technology, the microblog data related to those city names from microblogs of set theme types, as input data;

in this embodiment, crawler technology is used to select, according to the city names, a number of representative microblog accounts (e.g., government affairs services, police announcements, public media) for information crawling. The crawled content of each microblog includes: id, content, release time, release location, number of likes, number of forwards, number of comments, topics and @ users, and the data are stored in a table as required.

Taking the Guangdong-Hong Kong-Macao Greater Bay Area as an example, for each of the nine cities in the area, three to four representative microblog accounts are selected covering government affairs releases, police affairs, general news and the like, such as "Guangzhou Release", "Guangzhou Public Security" and "Guangzhou Daily". Using the date as the selection criterion, 244,370 microblogs published by 57 accounts between June 2019 and October 2020 are crawled, each with its id, content, release time, release location, number of likes, number of forwards, number of comments, topics and @ users. The fields have the following meanings:

microblog id: the id of the microblog, in the form of a string of digits;

microblog bid: the bid of the microblog;

microblog content: the text of the microblog;

microblog release location: the release location, for microblogs posted with a location;

microblog release time: the time at which the microblog was published, accurate to the day;

number of likes: the number of times the microblog was liked;

number of forwards: the number of times the microblog was forwarded;

number of comments: the number of times the microblog was commented on;

topic: the microblog topics, i.e. the content between two # characters; if there are several topics, they are separated by English commas, and if there is none, the value is empty;

@ user: the users mentioned with @ in the microblog; if there are several, they are separated by English commas, and if there is none, the value is empty.
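The fields listed above could be held in a simple record type, sketched below. The class name `MicroblogRecord`, its attribute names, the helper `split_field` and all sample values are hypothetical and chosen only for illustration; the text specifies the fields but not how they are stored beyond "in a table".

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List


def split_field(raw: str) -> List[str]:
    """Split a comma-separated topic or @ user field; empty input gives []."""
    return [part for part in raw.split(",") if part]


@dataclass
class MicroblogRecord:
    """One crawled microblog. Attribute names are hypothetical."""
    id: str
    bid: str
    content: str
    release_location: str
    release_date: date
    likes: int
    forwards: int
    comments: int
    topics: List[str] = field(default_factory=list)
    at_users: List[str] = field(default_factory=list)

    @property
    def engagement(self) -> int:
        """Sum of comments, likes and forwards, usable later as a weight."""
        return self.comments + self.likes + self.forwards


# Made-up sample values, for illustration only.
record = MicroblogRecord(
    id="0000000000000001",
    bid="EXAMPLEBID",
    content="Guangzhou and Shenzhen deepen metro cooperation",
    release_location="",
    release_date=date(2020, 1, 1),
    likes=120,
    forwards=30,
    comments=15,
    topics=split_field("metro,cooperation"),
    at_users=split_field(""),
)
print(record.engagement)  # 165
```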

The public opinion information sorting submodule 200 comprises a directed acyclic graph word segmentation unit, a BERT word segmentation unit and a weighting calculation unit;

the public opinion information sorting submodule sorts the collected microblog public opinion information. It mainly combines named entity recognition with a pre-trained BERT model and traditional word segmentation, and accurately extracts the required information (namely microblogs containing more than one other city name) by weighted comparison of the outputs of the two units. The specific steps are as follows:

the directed acyclic graph word segmentation unit is configured to construct a directed acyclic graph corresponding to each text in the input data, and find a path with the maximum probability in the directed acyclic graph by using a dynamic programming algorithm to obtain a word segmentation result corresponding to each text, wherein the word segmentation result is used as a first word segmentation result; deleting stop words in the first segmentation result according to the stop word dictionary to obtain a second segmentation result;

in this embodiment, the word frequency of each word in the collected microblog data is counted and stored in dictionary form, i.e. a statistical dictionary is built. A list, i.e. the directed acyclic graph, is then built from the position of each word and the end position of the corresponding division (i.e. the end position of the text corresponding to the word). Finally, a dynamic programming algorithm finds the path with the maximum probability in this list, where the occurrence probability of each word equals its word frequency divided by the sum of the word frequencies of all words in the dictionary; this yields the word segmentation, which is taken as the first word segmentation result. The segmented entries are then filtered, deleting those contained in the stop word dictionary, to obtain the second word segmentation result.
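The dictionary-based segmentation described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the function names, the tiny frequency dictionary and the stop word set are hypothetical, and the fallback that treats an unknown character as a one-character word with frequency 1 is an assumption.

```python
import math
from typing import Dict, List, Set


def build_dag(text: str, freq: Dict[str, int]) -> Dict[int, List[int]]:
    """Map each start index to the end indices of dictionary words."""
    dag = {}
    for start in range(len(text)):
        ends = [end for end in range(start, len(text))
                if text[start:end + 1] in freq]
        # Assumption: an unknown character forms a one-character word.
        dag[start] = ends or [start]
    return dag


def segment(text: str, freq: Dict[str, int]) -> List[str]:
    """Find the maximum-probability path through the DAG, right to left."""
    total = sum(freq.values())
    dag = build_dag(text, freq)
    n = len(text)
    # best[i] = (best log-probability of text[i:], end index of first word)
    best = {n: (0.0, n)}
    for start in range(n - 1, -1, -1):
        best[start] = max(
            (math.log(freq.get(text[start:end + 1], 1) / total)
             + best[end + 1][0], end)
            for end in dag[start]
        )
    words, i = [], 0
    while i < n:
        end = best[i][1]
        words.append(text[i:end + 1])
        i = end + 1
    return words


def remove_stop_words(words: List[str], stop_words: Set[str]) -> List[str]:
    """Drop every entry that appears in the stop word dictionary."""
    return [w for w in words if w not in stop_words]


# Made-up word frequencies, for illustration only.
freq = {"广州": 50, "深圳": 40, "与": 30, "合作": 20,
        "广": 2, "州": 2, "深": 1, "圳": 1, "合": 3, "作": 3}
first_result = segment("广州与深圳合作", freq)
second_result = remove_stop_words(first_result, {"与"})
print(first_result)   # ['广州', '与', '深圳', '合作']
print(second_result)  # ['广州', '深圳', '合作']
```

Log-probabilities are summed instead of multiplying raw probabilities so that long texts do not underflow; the maximizing path is the same.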

The BERT word segmentation unit is configured to extract word vectors of words in the input data through a pre-constructed word embedding matrix; inputting the word vector of each word and the position of the word vector in the text into a BERT model, and obtaining word segmentation results of each text of the input data as third word segmentation results;

in this embodiment, the acquired microblog data are fed into a trained BERT model for word segmentation. BERT is built on a bidirectional Transformer encoder and, to improve learning, replaces and masks 15% of the input tokens during pre-training. After word embedding and position encoding, the input sequence is fed into a multi-head self-attention layer to extract multi-semantic information, namely:

$$M_i = \mathrm{Attention}(QW_i^Q, KW_i^K, VW_i^V) \tag{1}$$

$$M(Q,K,V) = \mathrm{concat}(M_i)\,w_0 \tag{2}$$

wherein Q, K and V are the query vector, key vector and value vector respectively; $W_i^Q$, $W_i^K$ and $W_i^V$ are the projection matrices of Q, K and V; $M_i$ is a single-head self-attention layer; $w_0$ is the weight matrix; $M(Q,K,V)$ denotes the multi-head self-attention layer; concat denotes concatenation; and $\mathrm{Attention}(QW_i^Q, KW_i^K, VW_i^V)$ denotes the single-head self-attention algorithm.

The self-attention result is obtained through the scaled dot-product attention layer, namely:

$$\mathrm{Attention}(Q,K,V) = \mathrm{softmax}\left(\frac{QK^T}{\sqrt{d_k}}\right)V \tag{3}$$

wherein Attention(Q, K, V) denotes the self-attention algorithm of the dot-product attention layer, T denotes transposition, and $d_k$ denotes the dimension of the key vector K. After the word vectors are weighted and combined by attention, each word vector contains information about all words in the current sentence. On this basis, residual connection and layer normalization are applied to the processed information, where the layer normalization is:

$$\mathrm{LN}(x_i) = \alpha \times \frac{x_i - \mu}{\sqrt{\sigma^2 + \epsilon}} + \beta \tag{4}$$

wherein $x_i$ is the output of the previous layer, $\mu$ and $\sigma$ are the mean and standard deviation, $\epsilon$ prevents division by zero when the standard deviation is 0, and $\alpha$ and $\beta$ are parameters that compensate for information lost in normalization. The processed information is then fed into a feedforward neural network, and these operations are repeated several times to perform the classification of BERT and obtain the word segmentation result.
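The attention and layer normalization steps described above can be illustrated with a small pure-Python sketch. It is not a BERT implementation: the helper names, the list-of-lists matrix representation and the toy inputs are assumptions made for illustration, and a real model would use a tensor library with learned weights.

```python
import math
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product of two list-of-lists matrices."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def softmax(row: Sequence[float]) -> List[float]:
    """Numerically stable softmax over one row of scores."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention(q: Matrix, k: Matrix, v: Matrix) -> Matrix:
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V."""
    d_k = len(k[0])
    k_t = [list(col) for col in zip(*k)]
    scores = matmul(q, k_t)
    weights = [softmax([s / math.sqrt(d_k) for s in row]) for row in scores]
    return matmul(weights, v)


def multi_head(q: Matrix, k: Matrix, v: Matrix,
               heads: List[Tuple[Matrix, Matrix, Matrix]],
               w0: Matrix) -> Matrix:
    """Run each head on projected inputs, concatenate, multiply by w0."""
    outputs = [attention(matmul(q, wq), matmul(k, wk), matmul(v, wv))
               for wq, wk, wv in heads]
    merged = [sum((out[i] for out in outputs), []) for i in range(len(q))]
    return matmul(merged, w0)


def layer_norm(x: Sequence[float], alpha: float = 1.0,
               beta: float = 0.0, eps: float = 1e-6) -> List[float]:
    """Normalize one vector to zero mean and unit variance, then rescale."""
    mu = sum(x) / len(x)
    var = sum((v - mu) ** 2 for v in x) / len(x)
    return [alpha * (v - mu) / math.sqrt(var + eps) + beta for v in x]


# Toy inputs: two tokens with two features each, identity projections.
identity = [[1.0, 0.0], [0.0, 1.0]]
tokens = [[1.0, 0.0], [0.0, 1.0]]
single = attention(tokens, tokens, tokens)
multi = multi_head(tokens, tokens, tokens,
                   [(identity, identity, identity)], identity)
normed = layer_norm([1.0, 2.0, 3.0])
```

With a single head and identity projections, `multi_head` reduces to `attention`, which gives a quick sanity check of the concatenation step.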

The weighting calculation unit is configured to perform weighted summation on the second word segmentation result and the third word segmentation result corresponding to each text in the input data to obtain a final word segmentation result of each text;

in this embodiment, the word segmentation results of the directed acyclic graph and of BERT are combined by weighted calculation, so that useless data are removed and valid data are retained, which completes the data cleaning.
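The text does not spell out how the two word segmentation results are weighted, so the following is only one possible reading, stated as an assumption: each result votes for the tokens it contains, the votes are combined with hypothetical weights `w_dag` and `w_bert`, and tokens whose combined score reaches a hypothetical `threshold` are kept.

```python
from typing import Dict, List


def combine_segmentations(
    dag_tokens: List[str],
    bert_tokens: List[str],
    w_dag: float = 0.4,      # hypothetical weight of the DAG result
    w_bert: float = 0.6,     # hypothetical weight of the BERT result
    threshold: float = 0.6,  # hypothetical cut-off for keeping a token
) -> List[str]:
    """Score each distinct token by weighted votes and keep the strong ones."""
    scores: Dict[str, float] = {}
    # dict.fromkeys removes duplicates while preserving first-seen order.
    for token in dict.fromkeys(dag_tokens):
        scores[token] = scores.get(token, 0.0) + w_dag
    for token in dict.fromkeys(bert_tokens):
        scores[token] = scores.get(token, 0.0) + w_bert
    return [token for token, score in scores.items() if score >= threshold]


# Made-up token lists, for illustration only.
dag_result = ["Guangzhou", "Shenzhen", "cooperation", "strengthen"]
bert_result = ["Guangzhou", "Shenzhen", "cooperation"]
final_result = combine_segmentations(dag_result, bert_result)
print(final_result)  # ['Guangzhou', 'Shenzhen', 'cooperation']
```

With these example weights, a token found only by the dictionary-based unit is dropped, while a token found by the BERT unit alone is kept; other weightings would give different cleaning behavior.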

The city relation measurement submodule 300 comprises a first frequency calculation unit, a second frequency calculation unit and a city relation measurement unit;

the city relation measurement submodule performs calculations on the sorted information and finally produces the inter-city relation measurement. It comprises a frequency calculation weighted by comments, forwards and likes, and a calculation based on TF-IDF (word frequency-inverse file frequency). The specific steps are as follows:

the first frequency calculation unit is configured to count, based on the final word segmentation result of each text in the input data, the frequency with which other city names occur in the segmented microblogs retrieved under each city-name entry, to weight each frequency by the sum of the comment, like and repost numbers of the microblog, and to take the weighted frequency as a first frequency;

In this embodiment, the city relation measurement is performed after the data has been cleaned. A propagation mixed weight is built from the repost, comment and like counts of each microblog, and this weight is applied when counting the word frequencies of other cities appearing under a city entry (i.e., weighted word frequency statistics); the result is the first frequency.
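The weighted word frequency statistics can be sketched as follows. The record layout (a dict with `tokens`, `comments`, `likes` and `reposts` keys) and the function name are assumptions made for this example, as is the reading that a microblog belongs to a city entry when its tokens contain that city's name.

```python
from collections import defaultdict

def first_frequency(posts, target_city, city_names):
    """Weighted count of other city names in posts that mention `target_city`.

    Each occurrence is weighted by comments + likes + reposts of its post.
    """
    weighted = defaultdict(float)
    for post in posts:
        tokens = post["tokens"]
        if target_city not in tokens:
            continue  # post does not belong to this city entry
        weight = post["comments"] + post["likes"] + post["reposts"]
        for tok in tokens:
            if tok in city_names and tok != target_city:
                weighted[tok] += weight
    return dict(weighted)

posts = [
    {"tokens": ["Beijing", "Shanghai", "trade"], "comments": 2, "likes": 5, "reposts": 3},
    {"tokens": ["Beijing", "Shanghai", "Shanghai", "Tianjin"], "comments": 1, "likes": 0, "reposts": 0},
    {"tokens": ["Guangzhou", "Shanghai"], "comments": 9, "likes": 9, "reposts": 9},
]
cities = {"Beijing", "Shanghai", "Tianjin", "Guangzhou"}
result = first_frequency(posts, "Beijing", cities)
```

In this sketch a post with no comments, likes or reposts contributes nothing. Whether such posts should instead receive a base weight is a design choice the text leaves open.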

The second frequency calculation unit is configured to take each city name as an entry, obtain its term frequency (TF) and inverse document frequency (IDF) in the segmented input data, and multiply the two to obtain a second frequency;

In this embodiment, the relation is determined in combination with TF-IDF. The TF-IDF coefficient is TF × IDF, where TF is the frequency with which the entry occurs in the text:

$$tf_{i,j} = \frac{n_{i,j}}{\sum_k n_{k,j}}$$

wherein $tf_{i,j}$ is the frequency with which entry $t_i$ occurs in document $d_j$, i.e. the term frequency, $n_{i,j}$ is the number of times the entry occurs in document $d_j$, and $\sum_k n_{k,j}$ is the total number of occurrences of all entries in document $d_j$.

The inverse document frequency IDF is obtained as follows:

$$idf_i = \log \frac{|D|}{|\{j : t_i \in d_j\}|}$$

wherein $idf_i$ is the inverse document frequency of the i-th entry $t_i$, $|D|$ is the total number of microblog public opinion documents, and $|\{j : t_i \in d_j\}|$ is the number of documents containing the entry $t_i$.

In addition, the segmented input data from which the TF-IDF of each city-name entry is obtained is the input data segmented according to the final word segmentation result of each text.
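The second frequency can be sketched in a few lines, treating each segmented microblog as one document. The natural logarithm and the choice to return 0 for an entry that appears in no document are assumptions of this example, and the function names are hypothetical.

```python
import math

def tf(term, doc):
    """Term frequency: occurrences of `term` divided by the total tokens in `doc`."""
    return doc.count(term) / len(doc) if doc else 0.0

def idf(term, docs):
    """Inverse document frequency: log(|D| / number of documents containing `term`)."""
    containing = sum(1 for d in docs if term in d)
    return math.log(len(docs) / containing) if containing else 0.0

def second_frequency(term, doc, docs):
    """TF-IDF coefficient, i.e. TF multiplied by IDF."""
    return tf(term, doc) * idf(term, docs)

docs = [
    ["Beijing", "Shanghai", "trade", "Shanghai"],
    ["Beijing", "rail"],
    ["Guangzhou", "port"],
]
score = second_frequency("Shanghai", docs[0], docs)  # 0.5 * ln(3)
```

An entry that occurs in every document gets an IDF of 0 under this formula, so a city mentioned everywhere contributes nothing to the second frequency.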

The city relation measurement unit is configured to perform weighted summation on the first frequency and the second frequency to serve as relation measurement between cities.

In this embodiment, different weights are assigned according to attributes of the microblog accounts involved, such as government affairs, police and news accounts. The first frequency and the second frequency are weighted and integrated to construct a relation index between the cities associated with the given microblog accounts, thereby realizing the measurement of inter-city relations based on microblog public sentiment.
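A minimal sketch of this final combination is given below. The text gives neither the weight values nor the way account-type weights enter the sum, so the table `ACCOUNT_WEIGHTS`, the default weight for unlisted account types and the multiplicative form are all hypothetical choices for illustration.

```python
# Hypothetical weights per account attribute; the source gives no values.
ACCOUNT_WEIGHTS = {"government": 1.0, "police": 0.8, "news": 0.6}
DEFAULT_ACCOUNT_WEIGHT = 0.5  # assumed weight for any other account type

def relation_index(first_freq, second_freq, account_type, w_first=0.5, w_second=0.5):
    """Weighted sum of the two frequencies, scaled by an account-type weight."""
    account_weight = ACCOUNT_WEIGHTS.get(account_type, DEFAULT_ACCOUNT_WEIGHT)
    return account_weight * (w_first * first_freq + w_second * second_freq)

index = relation_index(12.0, 0.5, "government")
```

Because the first frequency is an engagement-weighted count and the second is a TF-IDF coefficient, the two can differ by orders of magnitude. In practice both would likely be rescaled to a common range before summation, a step this sketch omits.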

It should be noted that the system for measuring inter-city relations based on microblog public sentiment provided by the above embodiment is illustrated only by the division into the functional modules described above. In practical applications, the above functions may be allocated to different functional modules as needed; that is, the modules or steps in the embodiment of the present invention may be further decomposed or combined. For example, the modules in the above embodiment may be combined into one module, or may be further split into multiple sub-modules, so as to complete all or part of the functions described above. The names of the modules and steps involved in the embodiments of the present invention serve only to distinguish the modules or steps, and are not to be construed as unduly limiting the present invention.

The invention provides a microblog public opinion-based inter-city relation measuring method, which specifically comprises the following steps:

S10, obtaining the names of the cities whose relation is to be measured; crawling, by a crawler technology, the microblog data related to those city names from the microblogs of the set topic types, as input data;

S20, constructing a directed acyclic graph corresponding to each text in the input data, and searching for the path with the maximum probability in the directed acyclic graph by using a dynamic programming algorithm to obtain a word segmentation result corresponding to each text as a first word segmentation result; deleting stop words from the first word segmentation result according to the stop word dictionary to obtain a second word segmentation result;

extracting word vectors of all words in the input data through a pre-constructed word embedding matrix; inputting the word vector of each word and the position of the word vector in the text into a BERT model, and obtaining word segmentation results of each text of the input data as third word segmentation results;

carrying out weighted summation on the second word segmentation result and the third word segmentation result corresponding to each text in the input data to obtain a final word segmentation result of each text;

S30, counting, based on the final word segmentation result of each text in the input data, the frequency with which other city names occur in the segmented microblogs retrieved under each city-name entry, weighting each frequency by the sum of the comment, like and repost numbers of the microblog, and taking the weighted frequency as a first frequency;

taking each city name as an entry, obtaining its term frequency (TF) and inverse document frequency (IDF) in the segmented input data, and multiplying the two to obtain a second frequency;

and performing weighted summation of the first frequency and the second frequency to obtain the relation measure between cities.
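The directed acyclic graph search in step S20 can be sketched as follows, under the assumption that it resembles the dictionary-based approach of common Chinese segmenters: every dictionary word found in the sentence is an edge, and dynamic programming from right to left selects the path with the highest total log-probability. The toy frequency dictionary, the stop word set and the function names are hypothetical.

```python
import math

def build_dag(sentence, freq):
    """Map each start index to the end indices of dictionary words starting there."""
    n = len(sentence)
    dag = {}
    for start in range(n):
        ends = [end for end in range(start, n) if sentence[start:end + 1] in freq]
        dag[start] = ends or [start]  # unknown character: fall back to a single char
    return dag

def segment(sentence, freq):
    """Return the maximum-probability segmentation of `sentence`."""
    log_total = math.log(sum(freq.values()))
    n = len(sentence)
    dag = build_dag(sentence, freq)
    # route[i] = (best log-probability of sentence[i:], end index of its first word)
    route = {n: (0.0, 0)}
    for i in range(n - 1, -1, -1):
        route[i] = max(
            (math.log(freq.get(sentence[i:end + 1], 1)) - log_total + route[end + 1][0], end)
            for end in dag[i]
        )
    words, i = [], 0
    while i < n:
        end = route[i][1]
        words.append(sentence[i:end + 1])
        i = end + 1
    return words

def remove_stop_words(words, stop_words):
    """Drop every word listed in the stop word dictionary."""
    return [w for w in words if w not in stop_words]

# Toy dictionary of word counts (illustrative values only).
freq = {"北京": 50, "上海": 50, "北": 5, "京": 5, "上": 5, "海": 5,
        "和": 20, "合作": 30, "合": 3, "作": 3}
first_result = segment("北京和上海合作", freq)
second_result = remove_stop_words(first_result, {"和"})
```

Here `first_result` corresponds to the first word segmentation result and `second_result` to the second, after stop words are deleted.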

It can be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working process and related description of the method described above may refer to the corresponding process in the foregoing system embodiment, and are not described herein again.

An electronic device according to a third embodiment of the present invention includes at least one processor and a memory communicatively coupled to the at least one processor, wherein the memory stores instructions executable by the processor to implement the above method for measuring inter-city relations based on microblog public sentiment.

A computer-readable storage medium according to a fourth embodiment of the present invention stores computer instructions to be executed by a computer to implement the above method for measuring inter-city relations based on microblog public sentiment.

It can be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working processes and related descriptions of the above-described device and computer-readable storage medium may refer to the corresponding processes in the foregoing method embodiment, and are not described herein again.

Referring now to FIG. 2, there is illustrated a block diagram of a computer system suitable for use as a server in implementing embodiments of the method, system, and apparatus of the present application. The server shown in FIG. 2 is only an example, and should not impose any limitation on the functions and the scope of use of the embodiments of the present application.

As shown in FIG. 2, the computer system includes a Central Processing Unit (CPU) 201 that can perform various appropriate actions and processes in accordance with a program stored in a Read Only Memory (ROM) 202 or a program loaded from a storage section 208 into a Random Access Memory (RAM) 203. In the RAM 203, various programs and data necessary for system operation are also stored. The CPU 201, ROM 202, and RAM 203 are connected to each other via a bus 204. An Input/Output (I/O) interface 205 is also connected to the bus 204.

The following components are connected to the I/O interface 205: an input section 206 including a keyboard, a mouse, and the like; an output section 207 including a display such as a Cathode Ray Tube (CRT) or a Liquid Crystal Display (LCD), a speaker, and the like; a storage section 208 including a hard disk and the like; and a communication section 209 including a network interface card such as a Local Area Network (LAN) card, a modem, or the like. The communication section 209 performs communication processing via a network such as the Internet. A drive 210 is also connected to the I/O interface 205 as needed. A removable medium 211 such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory is mounted on the drive 210 as necessary, so that a computer program read from it can be installed into the storage section 208 as necessary.

In particular, according to an embodiment of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated in the flowchart. In such an embodiment, the computer program may be downloaded and installed from a network through the communication section 209 and/or installed from the removable medium 211. More specific examples of a computer-readable storage medium may include, but are not limited to, an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a Read Only Memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present application, a computer readable storage medium may be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device. A computer readable signal medium, by contrast, may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.

Computer program code for carrying out operations for aspects of the present application may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, Smalltalk, C++ or the like and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).

The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.

The terms "first," "second," and the like are used for distinguishing between similar elements and not necessarily for describing or implying a particular order or sequence.

The terms "comprises," "comprising," or any other similar term are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus.

So far, the technical solutions of the present invention have been described in connection with the preferred embodiments shown in the drawings, but those skilled in the art will readily understand that the scope of the present invention is not limited to these specific embodiments. Those skilled in the art can make equivalent changes or substitutions to the related technical features without departing from the principle of the invention, and the technical solutions after such changes or substitutions will fall within the protection scope of the invention.
