Network structure-based detection method, medium and equipment for research hotspot evolution trend

Document No.: 1831797  Published: 2021-11-12

Note: This technology, 基于网络结构的研究热点演变趋势检测方法、介质及设备 (Network structure-based detection method, medium and equipment for research hotspot evolution trend), was designed by 胡艳梅, 刘佳 and 刘宏 on 2021-08-19. Main content: The invention claims a method, medium and device for detecting the evolution trend of research hotspots based on network structure, comprising the following steps: A. dividing subsets by year: the paper dataset is split by year to obtain a sub-dataset of papers for each year; B. for the paper sub-dataset of a single year, constructing an author co-authorship network, in which each node represents an author, each edge indicates that the two corresponding authors co-authored at least one paper, and the edge weight is set to the number of co-authored papers, with one attribute set for each node: the set of keywords of the papers the corresponding author published in that year; C. research group detection, comprising research group partitioning and research group representation; D. research hotspot detection; E. performing research group and research hotspot detection on the author co-authorship networks of all years; F. analyzing the evolution of research groups and research hotspots to obtain an evolution trend graph of the research hotspots.

1. A method for detecting the evolution trend of research hotspots based on network structure, characterized by comprising the following steps:

A. Dividing subsets by year

The paper dataset is split by year to obtain a sub-dataset of papers for each year;

B. Constructing the author co-authorship network

For the paper sub-dataset of a single year, an author co-authorship network is constructed, in which each node represents an author, each edge indicates that the two corresponding authors co-authored at least one paper, and the edge weight is set to the number of papers they co-authored; one attribute is set for each node: the set of keywords of the papers the corresponding author published in that year;

C. Research group detection, comprising research group partitioning and research group representation;

D. Research hotspot detection;

E. Repeating steps B to D until research groups and research hotspots have been detected for all years;

F. Analyzing the evolution of research groups and research hotspots to obtain an evolution trend graph of the research hotspots.

2. The method for detecting the evolution trend of research hotspots based on network structure according to claim 1, wherein the research group detection of step C specifically comprises the following steps:

C.a Research group partitioning

Authors are divided into different research groups using the topology of the author co-authorship network: authors within the same research group interact densely, i.e. collaborate closely, while authors in different research groups interact sparsely, i.e. collaborate little or not at all;

C.b Research group representation

C1: For each research group, the keyword sets of all its authors are merged, the occurrence frequency of each keyword is accumulated, and the 3 keywords with the highest frequency are selected to form the high-frequency keyword set of the research group;

C2: The high-frequency keyword set of the research group is vectorized with a word2vec model: for each high-frequency keyword, a word vector is obtained for each of its words through the word2vec model, and the mean of these word vectors is taken as the word vector of the high-frequency keyword.

3. The method for detecting the evolution trend of research hotspots based on network structure according to claim 1, wherein the research hotspot detection of step D specifically comprises:

D1: Research topic detection: the high-frequency keyword sets of all research groups are pooled and the keywords are clustered by their word vectors, each cluster corresponding to one research topic.

D2: Construction of the research group-research topic-research hotspot relationship graph: 1) each research group is associated with research topics through its high-frequency keyword set: if a high-frequency keyword belongs to a research topic, that topic is a research topic of the research group; 2) research topics are associated with research hotspots: for each research topic, every keyword whose frequency is higher than the threshold is a research hotspot.

4. The method for detecting the evolution trend of research hotspots based on network structure according to claim 1, wherein the evolution analysis of research groups and research hotspots of step F specifically comprises:

F1: Similarity is calculated between the research groups of adjacent years, and a research group evolution graph is constructed;

The similarity of a research group of year t to a research group of year t+1 is calculated as follows: [formula not preserved in this text]

The similarity of a research group of year t+1 to a research group of year t is calculated as follows: [formula not preserved in this text]

F3: From the research group evolution graph, the evolution of the corresponding research topics is obtained;

F4: The research hotspots of every year are counted, and a frequency curve is drawn for each research hotspot with the year on the horizontal axis and the frequency on the vertical axis, giving the evolution trend graph of the research hotspots.

5. The method for detecting the evolution trend of research hotspots based on network structure according to claim 4, wherein the research group evolution graph of F1 is constructed by the following steps: a) each research group is treated as a node, and the nodes of the research groups of the same year form one node group; b) connecting edges are established between the node groups of adjacent years: 1) for a research group of year t+1, if for every research group of year t the two similarities between the pair are both less than the threshold, the year t+1 group is a newly formed research group, and its node is not connected to any node of year t; 2) for a research group of year t+1, if exactly one research group of year t exists for which the two similarities between the pair are both greater than the threshold, the two are the same research group in adjacent years, and their nodes are connected by an edge; 3) for a research group of year t, if for every research group of year t+1 the two similarities between the pair are both less than the threshold, the year t group has disappeared in year t+1, and its node is not connected to any node of year t+1; 4) for a research group of year t+1, if two research groups of year t exist whose similarities with it are both greater than the threshold, the two year t groups have merged into the year t+1 group, and each of their nodes is connected to the node of the year t+1 group; 5) for a research group of year t, if two research groups of year t+1 exist whose similarities with it are both greater than the threshold, the year t group has split into the two year t+1 groups, and its node is connected to each of their nodes.

6. The method for detecting the evolution trend of research hotspots based on network structure according to claim 4, wherein step F3, obtaining the evolution of the corresponding research topics from the research group evolution graph, specifically comprises:

If the similarity between a research topic of one research group and a research topic of a research group connected to it in the evolution graph is greater than the threshold, the two research topics are regarded as research topics of the same research group, that is, the research topic of that group has not changed significantly. The similarity of the two research topics is measured by the Euclidean distance between the means of their word vectors, and the calculation formula is as follows: [formula not preserved in this text]

where v^i is the mean of the word vectors of all keywords in the first research topic, v^j is the mean of the word vectors of all keywords in the second research topic, and K is the dimension of the keyword word vectors.

7. A computer-readable storage medium storing a computer program which, when executed by a processor, implements the method for detecting the evolution trend of research hotspots based on network structure according to any one of claims 1 to 6.

8. A network structure-based research hotspot evolution trend detection device, characterized by comprising the computer-readable storage medium of claim 7 and a processor for invoking and processing a computer program stored in the computer-readable storage medium.

Technical Field

The invention belongs to the technical field of literature research hotspot detection and research hotspot evolution trend detection, and particularly relates to a network structure-based research hotspot evolution trend detection method.

Background

As scientific research deepens and interdisciplinary work continues to merge fields, the literature produced by scholars accumulates in large quantities in every academic field, and the research hotspots of a field change over time. Analyzing this literature comprehensively with scientific, intelligent methods, so that research topics and research hotspots can be detected quickly and accurately from huge literature databases and their evolution trends tracked, is of great significance to researchers choosing research directions and making research plans.

To grasp the research hotspots of a field quickly, researchers typically rely on their own experience to search, read and collate the literature. They also often use traditional bibliometric methods, counting, for example, the number of publications in each research direction and the number of publications by relevant authors, to determine the research hotspots. However, determining research directions merely by searching the literature with keywords may miss some research content, which makes the results less accurate. Moreover, these methods all require researchers to take part personally, so subjective factors inevitably enter the results.

Common bibliometric tools use burst detection to find terms whose usage rises sharply over a period of time and treat them as frontier terms. In practice, researchers using such tools usually take the burst terms as feature words, construct a word co-occurrence network, and explore research hotspots through that network. Word co-occurrence networks are currently the main way of extracting research hotspots. One approach (comparative patent 1, a scientific research hotspot analysis and prediction method based on a knowledge graph) first constructs a keyword co-occurrence network, measures the distance between nodes of the network from the keyword co-occurrence frequency, determines research topics by clustering the keywords, fuses structured data such as discipline information with the keywords to form a scientific knowledge graph, and finally reads the knowledge graph to detect the evolution trend of the topics. Another approach (a research frontier identification method and device combining word semantics and word co-occurrence information) slices the paper data by time and builds a keyword co-occurrence network for each time slice from the keyword co-occurrence frequency, then represents the keywords with word embeddings, calculates the semantic similarity between keywords, and adjusts the nodes of the co-occurrence network on that semantic basis. The adjusted keyword network is then clustered into several clusters. Finally, the similarity between clusters of adjacent time slices is calculated to form a topic evolution map, from which frontier topics and hot technologies are explored. These methods can only find research hotspots from the perspective of keyword co-occurrence and do not consider the main contributors to research hotspots: the authors of the papers.

Scientific collaboration is the mainstream form of scientific research today, and differences in how closely authors collaborate give rise to different research groups. Moreover, authors collaborate only when their research content is similar or identical, so the research content within one research group is partly consistent. Research hotspots are therefore closely tied to the research groups behind them: the research content that more research groups focus on becomes a research hotspot. The author co-authorship network directly reflects the collaboration between authors. The invention therefore starts from the author co-authorship network, finds all research groups using its topology, determines the research topics from the semantics of the keywords of the papers written by the authors in each research group, and finally detects the research hotspots within the research topics from the keyword frequencies. Applying this processing to the paper sets of different years makes it possible to detect not only the evolution trend of the research hotspots but also the research groups that contributed to forming them. Further, by tracking the evolution of the research groups and their research topics, the evolution history of both can also be obtained.

Disclosure of Invention

The present invention is directed to solving the above problems of the prior art. A method, medium and device for detecting research hotspot trend based on a network structure are provided. The technical scheme of the invention is as follows:

The method for detecting the evolution trend of research hotspots based on network structure comprises the following steps:

A. Dividing subsets by year

The paper dataset is split by year to obtain a sub-dataset of papers for each year;

B. Constructing the author co-authorship network;

C. Research group detection, comprising research group partitioning and research group representation;

D. Research hotspot detection;

E. Repeating steps B to D until research groups and research hotspots have been detected for all years;

F. Analyzing the evolution of research groups and research hotspots to obtain an evolution trend graph of the research hotspots.

A computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements a network structure based research hotspot extraction analysis method as recited in any one of the preceding claims.

The device for detecting the evolution trend of the research hotspot based on the network structure comprises the computer readable storage medium and a processor, wherein the processor is used for calling and processing a computer program stored in the computer readable storage medium.

The invention has the following advantages and beneficial effects:

The invention breaks free of the word co-occurrence network traditionally used to explore research hotspots. It holds that the formation of a research hotspot depends on the support of research groups, and that the more research groups attend to a piece of research content, the hotter that content is. Starting from the author co-authorship network, the invention detects research hotspots and their evolution trends by detecting research groups. In addition, the invention can also: 1) detect the research groups that contribute to the formation of a research hotspot, and see which other research content those groups attend to; 2) obtain the evolution history of the research groups and research topics, which explains the evolution of the research hotspots more clearly.

Drawings

FIG. 1 is a flowchart illustrating a method for detecting trends in research hotspots based on a network architecture according to a preferred embodiment of the present invention;

FIG. 2 is the research group-research topic-research hotspot relationship diagram;

FIG. 3 is the research group evolution diagram.

Detailed Description

The technical solutions in the embodiments of the present invention will be described in detail and clearly with reference to the accompanying drawings. The described embodiments are only some of the embodiments of the present invention.

The technical scheme for solving the technical problems is as follows:

FIG. 1 is the workflow of the network structure-based research hotspot trend detection method, which specifically comprises the following steps:

A. Dividing subsets by year

The paper dataset is split by year to obtain a sub-dataset of papers for each year.
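Step A can be sketched as follows. This is a minimal illustration, assuming each paper is a dictionary with hypothetical `year`, `authors` and `keywords` fields; the source does not prescribe a record layout.

```python
from collections import defaultdict


def split_by_year(papers):
    """Group paper records into one sub-dataset per publication year."""
    subsets = defaultdict(list)
    for paper in papers:
        subsets[paper["year"]].append(paper)
    return dict(subsets)


# Toy records; the field names are illustrative assumptions.
papers = [
    {"year": 2019, "authors": ["A", "B"], "keywords": ["graph mining"]},
    {"year": 2020, "authors": ["A", "C"], "keywords": ["topic model"]},
    {"year": 2019, "authors": ["B", "C"], "keywords": ["graph mining"]},
]
yearly = split_by_year(papers)
```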

B. Constructing the author co-authorship network

An author co-authorship network is constructed for the paper sub-dataset of a single year. Each node represents an author, each edge indicates that the two corresponding authors co-authored at least one paper, and the edge weight is set to the number of papers they co-authored. One attribute is set for each node: the set of keywords of the papers the corresponding author published in that year.
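The network of step B can be held in two plain dictionaries, as in the sketch below. The record fields (`authors`, `keywords`) and the function name are assumptions made for illustration; a graph library would serve equally well.

```python
from collections import defaultdict
from itertools import combinations


def build_coauthor_network(papers):
    """Return (edge_weights, node_keywords) for one year's papers.

    edge_weights maps a sorted author pair to the number of co-authored papers;
    node_keywords maps each author to the keyword set of their papers that year.
    """
    edge_weights = defaultdict(int)
    node_keywords = defaultdict(set)
    for paper in papers:
        authors = sorted(set(paper["authors"]))
        for author in authors:
            node_keywords[author].update(paper["keywords"])
        # Every pair of co-authors on this paper gains one unit of weight.
        for pair in combinations(authors, 2):
            edge_weights[pair] += 1
    return dict(edge_weights), dict(node_keywords)


# Toy data for a single year (hypothetical).
papers_2019 = [
    {"authors": ["A", "B"], "keywords": ["graph mining", "community"]},
    {"authors": ["B", "A", "C"], "keywords": ["graph mining"]},
]
edges, keywords = build_coauthor_network(papers_2019)
```

Sorting the author names gives each undirected edge a single canonical key, so ("A", "B") and ("B", "A") are counted together.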

C. Research group detection

C.a Research group partitioning

Authors are divided into different research groups using the topology of the author co-authorship network: authors within the same research group interact densely (i.e., collaborate closely), while authors in different research groups interact sparsely (i.e., collaborate little or not at all).
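The source does not name a partitioning algorithm; any community detection method that uses the network topology would fit. The sketch below uses a deliberately simple stand-in that is not the patented method: it keeps only edges whose weight reaches an assumed cutoff `min_weight` and takes the connected components as research groups.

```python
def partition_groups(nodes, edge_weights, min_weight=2):
    """Stand-in partition: connected components over edges with weight >= min_weight."""
    parent = {n: n for n in nodes}

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (u, v), w in edge_weights.items():
        if w >= min_weight:
            parent[find(u)] = find(v)

    groups = {}
    for n in nodes:
        groups.setdefault(find(n), set()).add(n)
    return sorted(groups.values(), key=lambda g: sorted(g))


# Toy network (hypothetical): C-D is a weak tie and does not join the groups.
nodes = ["A", "B", "C", "D", "E"]
edge_weights = {("A", "B"): 3, ("B", "C"): 2, ("C", "D"): 1, ("D", "E"): 4}
groups = partition_groups(nodes, edge_weights)
```

A modularity-based or label-propagation method would usually give better groups on real data; the cutoff here only illustrates the idea of dense ties inside a group and sparse ties between groups.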

C.b Research group representation

C1: For each research group, the keyword sets of all its authors are merged, the occurrence frequency of each keyword is accumulated, and the 3 keywords with the highest frequency are selected to form the high-frequency keyword set of the research group.
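Step C1 can be sketched with a counter over the authors' keyword sets. The alphabetical tie-break is an assumption added here to make the result deterministic; the source does not say how ties are resolved.

```python
from collections import Counter


def high_frequency_keywords(group, node_keywords, top_n=3):
    """Merge the keyword sets of a group's authors and return the top_n (keyword, count) pairs."""
    counts = Counter()
    for author in group:
        counts.update(node_keywords[author])
    # Highest count first; ties broken alphabetically (an assumption).
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:top_n]


# Toy keyword sets (hypothetical).
node_keywords = {
    "A": {"graph mining", "community detection", "word2vec"},
    "B": {"graph mining", "community detection"},
    "C": {"graph mining", "clustering"},
}
top = high_frequency_keywords({"A", "B", "C"}, node_keywords)
```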

C2: The high-frequency keyword set of the research group is vectorized with a word2vec model: for each high-frequency keyword, a word vector is obtained for each of its words through the word2vec model, and the mean of these word vectors is taken as the word vector of the high-frequency keyword.
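The averaging in step C2 can be illustrated as below. A small hand-written dictionary stands in for a trained word2vec model, and splitting a keyword on whitespace is an assumption about tokenization; both are hypothetical choices for the sketch.

```python
def keyword_vector(keyword, word_vectors):
    """Average the vectors of the words that make up a keyword phrase."""
    vectors = [word_vectors[w] for w in keyword.split() if w in word_vectors]
    if not vectors:
        raise KeyError(f"no word vector available for {keyword!r}")
    dim = len(vectors[0])
    return [sum(v[k] for v in vectors) / len(vectors) for k in range(dim)]


# Hypothetical 2-dimensional embeddings standing in for a trained word2vec model.
word_vectors = {"graph": [1.0, 0.0], "mining": [0.0, 1.0], "clustering": [0.5, 0.5]}
vec = keyword_vector("graph mining", word_vectors)
```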

D. Research hotspot detection

D1: Research topic detection: the high-frequency keyword sets of all research groups are pooled and the keywords are clustered by their word vectors, each cluster corresponding to one research topic.
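The clustering algorithm of step D1 is not specified in the source. As a stand-in, the sketch below groups keywords by single linkage with an assumed Euclidean distance cutoff `max_distance`; k-means or any other vector clustering method could be substituted.

```python
import math


def cluster_keywords(keyword_vectors, max_distance=0.5):
    """Stand-in clustering: single-linkage grouping with a distance cutoff."""
    keywords = sorted(keyword_vectors)
    label = {kw: kw for kw in keywords}  # each keyword starts as its own cluster

    for i, a in enumerate(keywords):
        for b in keywords[i + 1:]:
            if math.dist(keyword_vectors[a], keyword_vectors[b]) <= max_distance:
                old, new = label[b], label[a]
                # Merge b's whole cluster into a's cluster.
                for kw in keywords:
                    if label[kw] == old:
                        label[kw] = new

    topics = {}
    for kw in keywords:
        topics.setdefault(label[kw], []).append(kw)
    return list(topics.values())


# Toy keyword vectors (hypothetical).
keyword_vectors = {
    "community detection": [0.0, 0.1],
    "graph mining": [0.0, 0.0],
    "topic model": [1.0, 1.0],
    "word2vec": [1.0, 0.8],
}
topics = cluster_keywords(keyword_vectors)
```

Each returned list is one research topic in the sense of D1.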

D2: Construction of the research group-research topic-research hotspot relationship graph: 1) Each research group is associated with research topics through its high-frequency keyword set: if a high-frequency keyword belongs to a research topic, that topic is a research topic of the research group. 2) Research topics are associated with research hotspots: for each research topic, every keyword whose frequency is higher than the threshold is a research hotspot.
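The two associations of step D2 can be sketched as two dictionary comprehensions. The identifiers (`C1`, `T1`), the frequency table and the threshold value are hypothetical; the source does not state how the keyword frequency is aggregated or what threshold is used.

```python
def build_relationships(group_keywords, topics, keyword_freq, threshold):
    """Link research groups to topics and topics to hotspots.

    group_keywords: group id -> high-frequency keyword list
    topics:         topic id -> keyword list (one cluster per topic)
    keyword_freq:   keyword -> frequency (aggregation is an assumption)
    """
    # A group is linked to every topic that contains one of its keywords.
    group_topics = {
        g: sorted(t for t, kws in topics.items() if set(kws) & set(g_kws))
        for g, g_kws in group_keywords.items()
    }
    # Within a topic, keywords above the threshold are its hotspots.
    topic_hotspots = {
        t: sorted(kw for kw in kws if keyword_freq.get(kw, 0) > threshold)
        for t, kws in topics.items()
    }
    return group_topics, topic_hotspots


group_keywords = {"C1": ["graph mining", "word2vec"], "C2": ["topic model"]}
topics = {
    "T1": ["graph mining", "community detection"],
    "T2": ["topic model", "word2vec"],
}
keyword_freq = {"graph mining": 5, "community detection": 1, "topic model": 3, "word2vec": 2}
group_topics, topic_hotspots = build_relationships(group_keywords, topics, keyword_freq, threshold=2)
```

In the toy data, group C1 spans two topics and topic T2 is shared by both groups, which matches the many-to-many relationship the method allows.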

E. Steps B to D are repeated until research groups and research hotspots have been detected for all years.

F. Research group and research hotspot evolution analysis

F1: Similarity is calculated between the research groups of adjacent years, and a research group evolution graph is constructed.

The similarity of a research group of year t to a research group of year t+1 is calculated as follows: [formula not preserved in this text]

The similarity of a research group of year t+1 to a research group of year t is calculated as follows: [formula not preserved in this text]

The research group evolution graph is constructed by the following steps:

a) Each research group is treated as a node, and the nodes of the research groups of the same year form one node group.

b) Connecting edges are established between the node groups of adjacent years:

1) For a research group of year t+1, if for every research group of year t the two similarities between the pair are both less than the threshold, the year t+1 group is a newly formed research group, and its node is not connected to any node of year t.

2) For a research group of year t+1, if exactly one research group of year t exists for which the two similarities between the pair are both greater than the threshold, the two are the same research group in adjacent years, and their nodes are connected by an edge.

3) For a research group of year t, if for every research group of year t+1 the two similarities between the pair are both less than the threshold, the year t group has disappeared in year t+1, and its node is not connected to any node of year t+1.

4) For a research group of year t+1, if two research groups of year t exist whose similarities with it are both greater than the threshold, the two year t groups have merged into the year t+1 group, and each of their nodes is connected to the node of the year t+1 group.

5) For a research group of year t, if two research groups of year t+1 exist whose similarities with it are both greater than the threshold, the year t group has split into the two year t+1 groups, and its node is connected to each of their nodes.
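The five cases above can be sketched as a classification over the edges of the evolution graph. Because the similarity formulas are not preserved in this text, the sketch assumes the threshold tests have already been applied and takes their outcome as a set `links` of (year t group, year t+1 group) pairs. The function name, the event labels and the use of "two or more" for merge and split are assumptions.

```python
def classify_events(prev_groups, next_groups, links):
    """Label evolution events between year t and year t+1.

    links is a set of (prev_group, next_group) pairs that passed the threshold test.
    """
    events = []
    for c in next_groups:
        parents = sorted(p for p in prev_groups if (p, c) in links)
        if not parents:
            events.append(("appear", c))
        elif len(parents) >= 2:  # the source describes the two-group case
            events.append(("merge", tuple(parents), c))
    for p in prev_groups:
        children = sorted(c for c in next_groups if (p, c) in links)
        if not children:
            events.append(("disappear", p))
        elif len(children) >= 2:  # the source describes the two-group case
            events.append(("split", p, tuple(children)))
        elif sum((q, children[0]) in links for q in prev_groups) == 1:
            # One child with one parent: the same group in adjacent years.
            events.append(("continue", p, children[0]))
    return events


# Toy example mirroring the events illustrated in FIG. 3 (identifiers are hypothetical).
prev_groups = ["C1", "C2", "C3", "C4", "C5"]
next_groups = ["C1_1", "C1_2", "C3", "C4_5", "C6"]
links = {("C1", "C1_1"), ("C1", "C1_2"), ("C3", "C3"), ("C4", "C4_5"), ("C5", "C4_5")}
events = classify_events(prev_groups, next_groups, links)
```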

F2: From the research group evolution graph, the evolution of the corresponding research topics can be further obtained: if the similarity between a research topic of one research group and a research topic of a research group connected to it in the evolution graph is greater than the threshold, the two research topics are regarded as research topics of the same research group, that is, the research topic of that group has not changed significantly. The similarity of the two research topics is measured by the Euclidean distance between the means of their word vectors, and the calculation formula is as follows: [formula not preserved in this text]

where v^i is the mean of the word vectors of all keywords in the first research topic, v^j is the mean of the word vectors of all keywords in the second research topic, and K is the dimension of the keyword word vectors.
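Since the formula itself is not preserved in this text, the sketch below computes the standard Euclidean distance between the two mean vectors over the K dimensions, which is what the surrounding definitions describe. How the source turns this distance into a similarity that is compared with a threshold is not stated, so the sketch returns the distance only; the function names are hypothetical.

```python
import math


def topic_mean_vector(keyword_vectors):
    """Mean of the word vectors of all keywords in one research topic."""
    n = len(keyword_vectors)
    dim = len(keyword_vectors[0])
    return [sum(v[k] for v in keyword_vectors) / n for k in range(dim)]


def topic_distance(topic_i, topic_j):
    """Euclidean distance between the two topics' mean vectors over K dimensions."""
    vi, vj = topic_mean_vector(topic_i), topic_mean_vector(topic_j)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(vi, vj)))


# Toy topics with K = 2 (hypothetical keyword vectors).
topic_i = [[0.0, 0.0], [2.0, 0.0]]  # mean vector (1.0, 0.0)
topic_j = [[4.0, 4.0], [4.0, 4.0]]  # mean vector (4.0, 4.0)
d = topic_distance(topic_i, topic_j)
```

A smaller distance means the topics are closer, so a threshold on similarity corresponds to an upper bound on this distance.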

F3: The research hotspots of every year are counted, and a frequency curve is drawn for each research hotspot with the year on the horizontal axis and the frequency on the vertical axis, giving the evolution trend graph of the research hotspots.
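The data behind the trend graph of F3 can be prepared as one frequency series per hotspot, as sketched below. The input layout (year mapped to hotspot frequencies) and the choice to record 0 for years in which a hotspot is absent are assumptions.

```python
def hotspot_trends(yearly_hotspots):
    """Turn {year: {hotspot: frequency}} into (years, {hotspot: [frequency per year]})."""
    years = sorted(yearly_hotspots)
    names = sorted({h for counts in yearly_hotspots.values() for h in counts})
    # A hotspot that is absent in a year gets frequency 0 for that year.
    series = {h: [yearly_hotspots[y].get(h, 0) for y in years] for h in names}
    return years, series


# Toy yearly counts (hypothetical).
yearly_hotspots = {
    2020: {"graph mining": 7, "topic model": 2},
    2019: {"graph mining": 4},
    2021: {"topic model": 6},
}
years, series = hotspot_trends(yearly_hotspots)
```

Plotting each entry of `series` against `years` with any charting tool gives one curve per research hotspot.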

FIG. 2 is the research group-research topic-research hotspot relationship diagram. A research group C is associated with research topics T through its high-frequency keyword set; the research content of one research group may belong to several research topics, and one research topic may be attended to by several research groups. A research hotspot H is determined by the high-frequency keywords of a research topic.

FIG. 3 is the research group evolution diagram. The evolution events that a research group may undergo over time are: split, disappear, remain unchanged, merge, appear, etc. Research group C_1 at time t splits into C_{1,1} and C_{1,2} at time t+1; research group C_2 at time t disappears at time t+1; research group C_3 at time t remains unchanged at time t+1; C_4 and C_5 at time t merge into research group C_{4,5} at time t+1; research group C_6 is a newly emerged research group at time t+1.

The systems, devices, modules or units illustrated in the above embodiments may be implemented by a computer chip or an entity, or by a product with certain functions. One typical implementation device is a computer. In particular, the computer may be, for example, a personal computer, a laptop computer, a cellular telephone, a camera phone, a smartphone, a personal digital assistant, a media player, a navigation device, an email device, a game console, a tablet computer, a wearable device, or a combination of any of these devices.

Computer-readable media, including both permanent and non-permanent, removable and non-removable media, may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, modules of a program, or other data. Examples of computer storage media include, but are not limited to, phase change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape or magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media such as modulated data signals and carrier waves.

It should also be noted that the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other like elements in the process, method, article, or apparatus that comprises the element.

The above examples are to be construed as merely illustrative and not limitative of the remainder of the disclosure. After reading the description of the invention, the skilled person can make various changes or modifications to the invention, and these equivalent changes and modifications also fall into the scope of the invention defined by the claims.
