Event extraction and judgment method and system

Document No.: 1963734. Published: 2021-12-14.

Reading note: this technology, "Event extraction and judgment method and system" (一种事件抽取判断方法及系统), was designed by 于兴文 on 2021-06-02. Abstract: The invention discloses an event extraction and judgment method and system, relating to the field of information processing. The method comprises the following steps: acquiring a natural corpus and preprocessing it to obtain a target corpus; judging the event type of the target corpus with each algorithm in an algorithm library to obtain multiple groups of target type results; outputting an optimal type result based on the multiple groups of target type results; extracting target event elements from the target corpus based on named entity recognition and a pattern matching algorithm, and judging the target event elements; and outputting an event extraction result based on the correspondence between the optimal type result and the target event elements. Using the algorithm library gives higher precision in event type extraction. In addition, two-stage recall and fine screening improves the accuracy of event element extraction, which improves the precision of the overall event extraction task in specific business scenarios.

1. An event extraction and judgment method is characterized by comprising the following steps:

acquiring a natural corpus, and preprocessing the natural corpus to acquire a target corpus;

based on the target corpus, respectively judging the event types of the target corpus by using an algorithm library so as to respectively obtain a plurality of groups of target type results;

outputting an optimal type result based on the multiple groups of target type results;

extracting target event elements of the target corpus based on named entity recognition and a pattern matching algorithm, and judging the target event elements;

and outputting an event extraction result based on the corresponding relation between the optimal type result and the target event element.

2. The method as claimed in claim 1, wherein preprocessing the natural corpus to obtain the target corpus comprises performing semantic error correction, sentence segmentation, and uncommon word processing on the natural corpus.

3. The method according to claim 1, wherein, in judging the event type of the target corpus with the algorithm library to obtain the plurality of groups of target type results, the algorithm library comprises at least a text classification algorithm, a text matching algorithm, and a trigger word matching algorithm;

the text classification algorithm classifies the event type of the target corpus;

the text matching algorithm builds a sample vector library by collecting sample corpus data and gathering the strong-intent information in it, and classifies the event type by vectorizing the target corpus, collecting the target strong-intent information, and computing the distance between it and the vectors in the vector library;

the trigger word matching algorithm classifies the event type through trigger words and trigger patterns.

4. The method as claimed in claim 1, wherein, in extracting the target event elements of the target corpus based on named entity recognition and the pattern matching algorithm, the recognition result of the named entity recognition is input into a template dictionary of the pattern matching algorithm for correction and filtering.

5. An event extraction judging system is characterized by comprising a data preprocessing module, an event type judging module, an event element extraction module and an event output module;

the data preprocessing module is configured to acquire natural corpus and preprocess the natural corpus to acquire target corpus;

the event type judging module is configured to judge the event type of the target corpus with each algorithm in an algorithm library to obtain a plurality of groups of target type results, and to output an optimal type result based on the plurality of groups of target type results;

the event element extraction module is configured to extract target event elements of the target corpus based on named entity recognition and pattern matching algorithm, and judge the target event elements;

the event output module is configured to output an event extraction result based on the corresponding relationship between the optimal type result and the target event element.

6. The event extraction and judgment system according to claim 5, further comprising a data collection module, wherein the data collection module is configured to collect natural corpus by microphone or keyboard input and send the natural corpus to the data preprocessing module.

7. A computer-readable storage medium, in which a computer program is stored, wherein the computer program is configured to carry out the method of any one of claims 1 to 4 when executed.

8. An electronic device comprising a memory and a processor, wherein the memory has stored therein a computer program, and wherein the processor is arranged to execute the computer program to perform the method of any of claims 1 to 4.

Technical Field

The invention relates to the field of information processing, in particular to an event extraction and judgment method and system.

Background

With the rapid development of the internet, more and more information is presented to users as electronic text. The concept of information extraction was proposed to help users quickly find the information they need in this mass of data. Information extraction refers to extracting factual information from natural language text and describing it in a structured form. Event extraction is an important research direction within information extraction: it extracts event information of interest from text that contains it and presents events expressed in natural language in a structured form, such as person, place, time, and thing. Event extraction therefore has broad application prospects in the current era of massive information.

Event extraction extracts the events a user is interested in from text describing event information and presents them in a structured form, so events can be extracted from massive amounts of natural text. Many chat applications are on the market today, such as QQ and WeChat, which are commonly used in daily communication, and DingTalk, which is commonly used at work. For both work and study, chat software is now an indispensable network tool.

Users of chat software generally communicate in natural language, and chat messages often contain a great deal of event information. An event here has two attributes: an event type and event elements. The content of the event elements varies with the event type but generally includes fields such as time, place, and person.

At present, event extraction for natural language in chat software usually involves two aspects: judging the event type and extracting the event elements. Existing event extraction methods are either pattern matching based or machine learning based. Pattern matching in a specialized domain requires defining a large number of templates for event recognition and extraction. Traditional machine learning approaches typically convert event extraction into a classification problem based on phrase-level or sentence-level information. Such a method performs sentence segmentation, word segmentation, entity recognition, and syntactic and dependency analysis on the text; uses natural language processing tools to extract lexical and contextual semantic features of candidate words; builds a feature vector as the classifier input; uses the classifier to predict the trigger word of an event; and judges the event type from the type of the trigger word.

The pattern matching approach matches the sentences to be extracted against existing templates under a given pattern (context). Constructing the patterns requires domain expertise and manual effort, so labor and time costs are high and portability is poor: moving from one domain to another amounts to rebuilding the patterns from scratch. The machine learning approach divides event extraction into an entity extraction stage and an event judgment stage, so errors in named entity recognition affect event judgment and accumulate. In addition, a specific domain often requires a large number of hand-crafted features, feature selection is costly, and maintainability degrades as model complexity grows.

Disclosure of Invention

The invention aims to provide an event extraction and judgment method and system that at least address the false detections that often arise in event extraction because templates are inflexible and complex in form.

In order to achieve the purpose, the invention provides the following technical scheme: an event extraction judging method comprises the following steps:

acquiring a natural corpus, and preprocessing the natural corpus to acquire a target corpus;

based on the target corpus, respectively judging the event types of the target corpus by using an algorithm library so as to respectively obtain a plurality of groups of target type results;

outputting an optimal type result based on the multiple groups of target type results;

extracting target event elements of the target corpus based on named entity recognition and a pattern matching algorithm, and judging the target event elements;

and outputting an event extraction result based on the corresponding relation between the optimal type result and the target event element.

In a further configuration, preprocessing the natural corpus to obtain the target corpus comprises performing semantic error correction, sentence segmentation, and uncommon word processing on the natural corpus.

In a further configuration, the event type of the target corpus is judged with each algorithm in the algorithm library to obtain multiple groups of target type results, wherein the algorithm library comprises at least a text classification algorithm, a text matching algorithm, and a trigger word matching algorithm;

the text classification algorithm classifies the event type of the target corpus;

the text matching algorithm builds a sample vector library by collecting sample corpus data and gathering the strong-intent information in it, and classifies the event type by vectorizing the target corpus, collecting the target strong-intent information, and computing the distance between it and the vectors in the vector library;

the trigger word matching algorithm classifies the event type through trigger words and trigger patterns.

In a further configuration, the target event elements of the target corpus are extracted based on named entity recognition and a pattern matching algorithm, wherein the recognition result of the named entity recognition is input into a template dictionary of the pattern matching algorithm for correction and filtering.

The application also provides an event extraction judgment system which is characterized by comprising a data preprocessing module, an event type judgment module, an event element extraction module and an event output module;

the data preprocessing module is configured to acquire natural corpus and preprocess the natural corpus to acquire target corpus;

the event type judging module is configured to judge the event type of the target corpus with each algorithm in an algorithm library to obtain a plurality of groups of target type results, and to output an optimal type result based on the plurality of groups of target type results;

the event element extraction module is configured to extract target event elements of the target corpus and judge the target event elements based on named entity recognition and pattern matching algorithm;

the event output module is configured to output an event extraction result based on the correspondence between the optimal type result and the target event element.

In a further configuration, the system further includes a data collection module configured to collect the natural corpus through microphone or keyboard input and send it to the data preprocessing module.

The present application also provides a computer-readable storage medium, in which a computer program is stored, wherein the computer program is configured to perform the aforementioned method when executed.

The present application also provides an electronic device comprising a memory having a computer program stored therein and a processor arranged to run the computer program to perform the aforementioned method.

Compared with the prior art, the invention has the following beneficial effects. Each stage of event extraction from the natural corpus is improved in a different way, and the method integrates multiple natural language processing techniques, such as text disambiguation, dependency parsing, named entity recognition, text semantic matching, and text classification. This makes the whole event mining process more flexible and intelligent and reduces manual workload. The method further uses templates and dictionaries to control the event extraction process and results and to act as a fallback, which helps keep the extraction results controllable and accurate.

Drawings

FIG. 1 is a block diagram of the hardware structure of a mobile terminal for an event extraction and judgment method according to an embodiment of the present application;

FIG. 2 is a flowchart of the general steps of the method of the present application;

FIG. 3 is a flowchart of the detailed steps of the extraction and judgment method of the present application;

FIG. 4 is a block diagram of the system of the present application.

Detailed Description

The present application will be described in detail below with reference to the accompanying drawings in conjunction with embodiments. It should be noted that the embodiments and features of the embodiments in the present application may be combined with each other without conflict.

It should be noted that the terms "first," "second," and the like in the description and claims of the present application and in the above-described drawings are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order.

The method embodiments of the present application may be executed on a mobile terminal, a computer terminal, or a similar computing device. Taking execution on a mobile terminal as an example, FIG. 1 is a block diagram of the hardware structure of a mobile terminal for an event extraction and judgment method according to an embodiment of the present application. As shown in FIG. 1, the mobile terminal may include one or more processors 102 (only one is shown in FIG. 1; the processor 102 may include, but is not limited to, a processing device such as a microprocessor (MCU) or a programmable logic device (FPGA)) and a memory 104 for storing data. The mobile terminal may further include a transmission device 106 for communication and an input/output device 108. Those skilled in the art will understand that the structure shown in FIG. 1 is only illustrative and does not limit the structure of the mobile terminal. For example, the mobile terminal may include more or fewer components than shown in FIG. 1, or have a different configuration.

The memory 104 can be used to store computer programs, for example software programs and modules of application software, such as the computer program corresponding to the event extraction and judgment method in the embodiments of the present application. The processor 102 runs the computer program stored in the memory 104 to execute various functional applications and data processing, thereby implementing the method described above. The memory 104 may include high-speed random access memory and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some examples, the memory 104 may further include memory located remotely from the processor 102, which may be connected to the mobile terminal over a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.

The transmission device 106 is used to receive or transmit data via a network. A specific example of such a network is a wireless network provided by the communication provider of the mobile terminal. In one example, the transmission device 106 includes a network interface controller (NIC), which can connect to other network devices through a base station in order to communicate with the internet. In another example, the transmission device 106 can be a radio frequency (RF) module, which communicates with the internet wirelessly.

As shown in FIG. 2 and FIG. 3, an event extraction and judgment method includes the following steps:

acquiring a natural corpus and preprocessing it to obtain a target corpus, wherein semantic error correction, sentence segmentation, and uncommon word processing are performed on the natural corpus;

based on the target corpus, respectively judging the event types of the target corpus by using an algorithm library so as to respectively obtain a plurality of groups of target type results;

outputting an optimal type result based on the multiple groups of target type results;

extracting target event elements of the target corpus based on named entity recognition and a pattern matching algorithm, and judging the target event elements;

and outputting an event extraction result based on the corresponding relation between the optimal type result and the target event element.
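The last step, outputting a result from the correspondence between the optimal type and the extracted elements, can be pictured as a lookup in a per-type schema. The following is a minimal sketch under stated assumptions: the `EVENT_SCHEMA` table, the role names, and the `Event` record are hypothetical and do not come from the application.

```python
from dataclasses import dataclass, field

# Hypothetical schema: the element roles each event type is expected to carry.
EVENT_SCHEMA = {
    "meeting": ("time", "place", "person"),
    "payment": ("time", "person", "amount"),
}


@dataclass
class Event:
    event_type: str
    elements: dict = field(default_factory=dict)


def assemble_event(best_type, extracted):
    """Keep only the extracted elements whose role belongs to the event type."""
    roles = EVENT_SCHEMA.get(best_type, ())
    kept = {role: extracted[role] for role in roles if role in extracted}
    return Event(event_type=best_type, elements=kept)


event = assemble_event(
    "meeting",
    {"time": "3 pm tomorrow", "place": "room 402", "amount": "200 yuan"},
)
print(event)
```

Here the `amount` element is dropped because the illustrative `meeting` schema has no such role.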

In preprocessing, natural language chat messages obtained from chat software or other communication environments are taken as input and serve as the natural corpus. Because chat text is voluminous and may contain typos or ill-formed words, the input is first processed with the pycorrector semantic error correction tool. Longer passages are segmented into sentences, and some special characters and stop words are processed or replaced, to safeguard the quality of subsequent event extraction.
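The preprocessing stage described above might be sketched as follows. This is an illustration only: the stop-word list and replacement table are hypothetical, the `corrector` argument is a placeholder where an error-correction tool such as pycorrector could be plugged in (its API is deliberately not called here), and whitespace tokenization is an assumption that would need a proper word segmenter for Chinese text.

```python
import re

# Hypothetical stop words and character replacements, for illustration only.
STOP_WORDS = {"um", "uh"}
REPLACEMENTS = {"\u3000": " ", "~": ""}


def identity_corrector(text):
    """Placeholder for a semantic error-correction step."""
    return text


def split_sentences(text):
    """Break text after Chinese or Western sentence-ending punctuation."""
    parts = re.split(r"(?<=[。！？!?.;；])\s*", text)
    return [p.strip() for p in parts if p.strip()]


def preprocess(raw, corrector=identity_corrector):
    """Correct errors, replace special characters, split, drop stop words."""
    text = corrector(raw)
    for old, new in REPLACEMENTS.items():
        text = text.replace(old, new)
    cleaned = []
    for sentence in split_sentences(text):
        # Whitespace tokenization is an assumption; Chinese needs a segmenter.
        tokens = [t for t in sentence.split() if t.lower() not in STOP_WORDS]
        if tokens:
            cleaned.append(" ".join(tokens))
    return cleaned


print(preprocess("Um let's meet at 3 pm tomorrow~! Please bring the contract."))
```

Splitting on every full stop is a simplification; decimals and abbreviations would need extra handling in practice.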

Based on the target corpus, the event type is judged with each algorithm in the algorithm library to obtain multiple groups of target type results. The algorithm library comprises at least a text classification algorithm, a text matching algorithm, and a trigger word matching algorithm, and all three are applied to the same input to judge the event type. In practice the implementation is not limited to these three algorithms, but running multiple algorithms side by side remains the central idea of this embodiment.

First, the text classification algorithm classifies the event type of the target corpus; in this embodiment, an albert + nn natural language classification algorithm may be used.

Second, the text matching algorithm is built on a sample vector library, which is constructed by collecting sample corpus data and gathering the strong-intent information in it. Event types are classified by vectorizing the target corpus, collecting the target strong-intent information, and computing the distance between it and the vectors in the vector library. In this embodiment, chat messages are vectorized with the simbert + write method, the collected strong-intent vectors are stored as a searchable vector library, and the distance between a new chat message and the vectors in the library is computed to judge the event type of the text.
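The vector-library lookup can be illustrated with a small nearest-neighbour search. This sketch makes several assumptions: a toy bag-of-words `embed` function stands in for the sentence encoder used in the embodiment, cosine similarity stands in for the unspecified distance measure, and the library entries and threshold are invented for the example.

```python
import math
from collections import Counter


def embed(text):
    """Toy bag-of-words vector; a stand-in for a real sentence encoder."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Hypothetical strong-intent samples with their event types.
VECTOR_LIBRARY = [
    (embed("let us meet at the office"), "meeting"),
    (embed("please transfer the money to me"), "payment"),
]


def match_event_type(text, threshold=0.3):
    """Return (label, score) of the closest sample, or (None, score)."""
    query = embed(text)
    score, label = max((cosine(query, vec), lab) for vec, lab in VECTOR_LIBRARY)
    return (label, score) if score >= threshold else (None, score)


print(match_event_type("can we meet at the office tomorrow"))
print(match_event_type("nice weather today"))
```

The threshold lets the matcher abstain when no stored sample is close, which leaves the decision to the other algorithms in the library.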

Third, the trigger word matching algorithm classifies event types through trigger words and trigger patterns: the chat text is matched against them in the traditional way to judge its event type.
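Trigger word and trigger pattern matching could look like the sketch below. The word lists and regular expressions are hypothetical examples, not taken from the application, and English text is used for simplicity.

```python
import re

# Hypothetical trigger words and trigger patterns per event type.
TRIGGER_WORDS = {
    "meeting": ("meet", "meeting", "appointment"),
    "payment": ("pay", "transfer", "refund"),
}
TRIGGER_PATTERNS = {
    "meeting": (re.compile(r"\bsee you (at|in)\b", re.I),),
    "payment": (re.compile(r"\bsend (me )?\d+ (yuan|dollars)\b", re.I),),
}


def match_triggers(text):
    """Return event types whose trigger words or patterns occur in the text."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    hits = []
    for event_type, triggers in TRIGGER_WORDS.items():
        word_hit = any(t in words for t in triggers)
        patterns = TRIGGER_PATTERNS.get(event_type, ())
        pattern_hit = any(p.search(text) for p in patterns)
        if word_hit or pattern_hit:
            hits.append(event_type)
    return hits


print(match_triggers("See you at the cafe"))
print(match_triggers("Please transfer the deposit"))
```

Returning a list rather than a single label keeps ambiguous messages visible to whatever step combines the results.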

After the three algorithms have produced three groups of results, the optimal type result is output based on them. In this embodiment, a tree model combines the results of the three algorithms and outputs the optimal event type judgment. Alternatively, another neural network model or a traditional model algorithm, after sample training or manual parameter setting, may be used to select one result for output.
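To make the combination step concrete, the sketch below uses a hand-written cascade of threshold tests in place of the trained tree model. It is not the embodiment's model: the input shapes, the order of the tests, and the thresholds 0.8 and 0.9 are all assumptions chosen for illustration, whereas a real tree would learn its splits from labelled samples.

```python
def choose_best_type(classifier, matcher, triggers):
    """Pick one event type from three results using nested threshold tests.

    classifier and matcher are (label, confidence) pairs; triggers is a list
    of labels. The thresholds are illustrative, not learned.
    """
    cls_label, cls_conf = classifier
    match_label, match_conf = matcher
    if match_label is not None and match_conf >= 0.8:
        return match_label   # very close to a stored strong-intent sample
    if cls_label in triggers:
        return cls_label     # classifier agrees with a trigger hit
    if cls_conf >= 0.9:
        return cls_label     # classifier alone is highly confident
    if len(triggers) == 1:
        return triggers[0]   # a single unambiguous trigger hit
    return None              # no reliable event type


print(choose_best_type(("payment", 0.6), ("meeting", 0.85), []))
print(choose_best_type(("meeting", 0.5), (None, 0.0), ["payment"]))
```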

Target event elements of the target corpus are extracted based on named entity recognition and a pattern matching algorithm, and the named entity recognition results are fed into the template dictionary of the pattern matching algorithm for correction and filtering. Event elements are thus judged and mined with two algorithms. An albert + self-attention model performs named entity recognition (NER) on the chat text, but its results are unstable in some cases, so the NER results are passed through the template dictionary for correction and filtering. In this module, NER serves as a broad recall of event elements, and the template dictionary serves as the subsequent fine screening.
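The two-stage idea, broad recall by NER followed by fine screening with a template dictionary, can be sketched as a filter over candidate entities. The NER model is not reproduced here; its output is represented as a plain list of (role, text) pairs, and the dictionary contents, including the alias tables and validating patterns, are hypothetical.

```python
import re

# Hypothetical template dictionary: per role, a validating pattern and a
# table of corrections for common variants.
TEMPLATE_DICTIONARY = {
    "time": {
        "pattern": re.compile(
            r"^(\d{1,2}(:\d{2})? ?(am|pm)|today|tomorrow)$", re.I
        ),
        "aliases": {"tmr": "tomorrow"},
    },
    "place": {
        "pattern": re.compile(r"^(room \d+|office|lobby)$", re.I),
        "aliases": {"rm 402": "room 402"},
    },
}


def refine_entities(candidates):
    """Correct and filter (role, text) NER candidates with the dictionary."""
    kept = []
    for role, text in candidates:
        entry = TEMPLATE_DICTIONARY.get(role)
        if entry is None:
            continue  # role unknown to the dictionary: drop the candidate
        text = entry["aliases"].get(text.lower(), text)  # correction step
        if entry["pattern"].match(text):                 # filtering step
            kept.append((role, text))
    return kept


ner_output = [("time", "tmr"), ("time", "soonish"), ("place", "rm 402")]
print(refine_entities(ner_output))
```

In this sketch the dictionary both normalizes variants and rejects candidates that fit no template, which is one way to dampen unstable model output.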

Because a text classification algorithm, a semantic matching algorithm, and trigger word matching are used together, the method achieves higher precision in event type extraction. Likewise, combining the NER algorithm with pattern matching improves the accuracy of event element extraction through two-stage recall and fine screening. Overall, this embodiment improves the precision of the event extraction task in a specific business scenario.

As shown in FIG. 3 and FIG. 4, this embodiment further discloses an event extraction and judgment system, which applies the foregoing method and includes a data preprocessing module, an event type judgment module, an event element extraction module, and an event output module;

the data preprocessing module is configured to acquire natural corpus and preprocess the natural corpus to acquire target corpus;

the event type judging module is configured to judge the event type of the target corpus with each algorithm in an algorithm library to obtain a plurality of groups of target type results, and to output an optimal type result based on the plurality of groups of target type results;

the event element extraction module is configured to extract target event elements of the target corpus and judge the target event elements based on named entity recognition and pattern matching algorithm;

the event output module is configured to output an event extraction result based on the correspondence between the optimal type result and the target event element.

The system further comprises a data collection module, which is configured to collect the natural corpus through microphone or keyboard input and send it to the data preprocessing module.
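One way to picture how the modules fit together is a small container that chains four callables. This is a structural sketch only: the class name, field names, and the trivial lambdas used as module stand-ins are assumptions, and the data collection module is represented simply by the raw string passed to `run`.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class EventExtractionSystem:
    preprocess: Callable[[str], List[str]]               # data preprocessing
    judge_type: Callable[[str], Optional[str]]           # event type judging
    extract_elements: Callable[[str], Dict[str, str]]    # element extraction
    output: Callable[[str, Dict[str, str]], dict]        # event output

    def run(self, raw: str) -> List[dict]:
        """Pass collected text through the four modules in order."""
        results = []
        for sentence in self.preprocess(raw):
            event_type = self.judge_type(sentence)
            if event_type is None:
                continue  # no event found in this sentence
            elements = self.extract_elements(sentence)
            results.append(self.output(event_type, elements))
        return results


# Trivial stand-ins for each module, for illustration only.
system = EventExtractionSystem(
    preprocess=lambda raw: [s.strip() for s in raw.split(".") if s.strip()],
    judge_type=lambda s: "meeting" if "meet" in s.lower() else None,
    extract_elements=lambda s: (
        {"time": "tomorrow"} if "tomorrow" in s.lower() else {}
    ),
    output=lambda t, e: {"type": t, "elements": e},
)
print(system.run("Nice weather. Let us meet tomorrow."))
```

Keeping each module behind a plain callable means any one of them can be swapped without touching the others.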

The present application also provides a computer-readable storage medium, in which a computer program is stored, wherein the computer program is arranged to perform the aforementioned method when executed.

The present application also provides an electronic device comprising a memory having a computer program stored therein and a processor arranged to run the computer program to perform the aforementioned method.

Optionally, in this embodiment, a person skilled in the art will understand that all or part of the steps of the methods in the foregoing embodiments may be implemented by a program instructing hardware associated with the terminal device. The program may be stored in a computer-readable storage medium, which may include a flash disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, and the like.

The above-mentioned serial numbers of the embodiments of the present application are merely for description and do not represent the merits of the embodiments.

The integrated unit in the above embodiments, if implemented in the form of a software functional unit and sold or used as a separate product, may be stored in the above computer-readable storage medium. Based on this understanding, the technical solutions of the present application may be embodied in the form of a software product, which is stored in a storage medium and includes several instructions for causing one or more computer devices (which may be personal computers, servers, network devices, etc.) to execute all or part of the steps of the methods described in the embodiments of the present application.

In the above embodiments of the present application, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.

In the several embodiments provided in the present application, it should be understood that the disclosed client can be implemented in other ways. The apparatus embodiments described above are merely illustrative. For example, the division into units is only a division by logical function, and other divisions are possible in an actual implementation: a plurality of units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, units, or modules, and may be electrical or take another form.

The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.

In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a hardware form, and can also be realized in a software functional unit form.

The foregoing is only a preferred embodiment of the present application. It should be noted that a person skilled in the art can make several modifications and improvements without departing from the principle of the present application, and these modifications and improvements should also be regarded as falling within the protection scope of the present application.
