Method and device for extracting elements from divorce dispute judgment documents

Document No.: 1043317 Publication date: 2020-10-09

Reading notes: This technology, "Method and device for extracting elements from divorce dispute judgment documents", was designed by 刘大双, 晋耀红, 李德彦 and 张志一 on 2020-06-12. The application discloses a method and device for extracting elements from divorce dispute judgment documents. The method first cuts a divorce dispute judgment document into a plurality of text blocks according to a preset judgment document directory structure, each text block corresponding to one directory title in that structure; it then selects, according to the directory titles of the text blocks, the target text block corresponding to each target element to be extracted; finally, it extracts the element values of the different target elements from the different target text blocks. With this method, complete document elements can be extracted from a divorce dispute judgment document, so that relevant personnel can understand the document through its elements.

1. A method for extracting elements from a divorce judgment document, characterized by comprising the following steps:

acquiring a divorce judgment document;

cutting the divorce judgment document into a plurality of text blocks according to a preset judgment document directory structure, wherein the judgment document directory structure comprises a plurality of directory titles, and each text block corresponds to one directory title;

selecting, from the plurality of text blocks, a target text block corresponding to a target element to be extracted according to the directory titles corresponding to the text blocks;

and extracting the element value of the target element from the selected target text block.

2. The method of claim 1, wherein selecting a target text block corresponding to a target element to be extracted from the plurality of text blocks according to a directory title corresponding to the text block comprises:

acquiring a preset correspondence between directory titles and document elements;

selecting the directory title corresponding to the target element according to the preset correspondence;

and determining a target text block corresponding to the target element according to the directory title corresponding to the target element.

3. The method of claim 1, wherein the extracting the element value of the target element from the selected target text block comprises:

acquiring an element tree corresponding to the target elements, wherein the element tree comprises element nodes corresponding to each target element and extraction rules of the element nodes;

and extracting the element value of the target element from the corresponding target text block by using the element tree.

4. The method of claim 3, wherein the element node corresponding to the target element has at least two child nodes, each child node corresponding to one category label of the target element, and wherein after extracting the element value of the target element from the corresponding target text block, the method further comprises:

classifying the element value of the target element to obtain the category label hit by the element value;

taking a first preset value as the element value of the child node corresponding to the hit category label;

and taking a second preset value as the element value of each child node corresponding to a missed category label.

5. The method of claim 4, wherein after extracting the element value of the target element from the corresponding target text block, the method further comprises:

generating a data object that uses the element node name and/or the child node name corresponding to the target element as a field name;

and assigning values to the data object using the element values corresponding to the element node name and/or the child node name.

6. The method according to any one of claims 1-5, wherein the target element comprises a litigation request element, and when the target element is the litigation request element, the extracting the element value of the target element from the selected target text block comprises:

extracting litigation request information texts from text blocks corresponding to the litigation request elements;

and segmenting the extracted litigation request information text to obtain one or more independent litigation request items.

7. The method of any of claims 1-5, wherein the target element comprises a dispute focus element, and wherein the element node corresponding to the dispute focus element comprises one or more child nodes corresponding to dispute focus category labels;

when the target element is the dispute focus element, the extracting the element value of the target element from the selected target text block includes:

identifying a dispute focus information text from a text block corresponding to the dispute focus element;

segmenting the identified dispute focus information text to obtain a dispute focus list comprising one or more independent dispute focus items;

obtaining a dispute focus category label hit by each independent dispute focus item;

and taking the first preset value as the element value of the child node corresponding to the hit dispute focus category label, and taking the second preset value as the element value of the child node corresponding to the missed dispute focus category label.

8. The method according to any one of claims 1 to 5, wherein the target element comprises a child condition element, the element node corresponding to the child condition element comprises a child node corresponding to a child condition element class label, and when the target element is a child condition element, the extracting the element value of the target element from the selected target text block comprises:

identifying a sentence where child information is located from a text block corresponding to the child condition element;

extracting element values of the child condition elements from the sentence where the identified child information is located;

acquiring a category label hit by the element value of the child condition element;

and taking the first preset value as the element value of the child node corresponding to the hit category label, and taking the second preset value as the element value of the child node corresponding to the missed category label.

9. The method according to any one of claims 1-5, wherein the target elements comprise evidence-class elements, the evidence-class elements including evidence-presentation elements, cross-examination elements, and court-certification elements;

the evidence-presentation elements comprise an evidence number, an evidence list, a fact to be proven and evidence-presentation rules, the evidence-presentation rules comprise at least one evidence item, and each evidence item comprises an evidence list element arranged in positional order and a fact-to-be-proven element adjacent to the evidence list element;

the cross-examination elements comprise a cross-examined evidence number, a cross-examination result, a cross-examination opinion and cross-examination rules, the cross-examination rules comprise at least one cross-examination item, and each cross-examination item comprises an evidence number element, a cross-examination result element and a cross-examination opinion element arranged in positional order;

the court-certification elements comprise an evidence number, an evidence provider, a court-certification result and court-certification rules, the court-certification rules comprise at least one certification item, and each certification item comprises an evidence provider element arranged in positional order, an evidence number element adjacent to the evidence provider element, and a court-certification result element adjacent to the evidence number element.

10. A device for extracting elements from a divorce judgment document, the device comprising:

a text acquisition module, configured to acquire a divorce judgment document;

a text cutting module, configured to cut the divorce judgment document into a plurality of text blocks according to a preset judgment document directory structure, wherein the judgment document directory structure comprises a plurality of directory titles, and each text block corresponds to one directory title;

a selection module, configured to select, from the plurality of text blocks, a target text block corresponding to a target element to be extracted according to the directory titles corresponding to the text blocks;

and an element extraction module, configured to extract the element value of the target element from the selected target text block.

Technical Field

The application relates to the technical field of text processing, and in particular to a method and device for extracting elements from divorce judgment documents.

Background

A judgment document is the carrier that records the process and outcome of litigation activities, such as trials, conducted by a people's court, and it is also the sole certificate by which the people's court determines and allocates the substantive rights and obligations of the parties. Judgment documents usually follow a regular structural framework and writing format, which may differ slightly between document types. Common types include civil judgment documents (e.g., civil judgments), criminal judgment documents (e.g., criminal judgments), administrative judgment documents (e.g., administrative judgments), and other general litigation documents.

Because judgment documents record important information, such as the trial process and the judgment result, that is valuable for analysis, for example for case analysis and case retrieval, extracting this valuable information (i.e., the document elements) from judgment documents is a basic need of practitioners in the field.

A divorce judgment document is one kind of civil judgment document. How to extract complete document elements from a divorce judgment document, so that its content can be fully understood, is a technical problem that those skilled in the art urgently need to solve.

Disclosure of Invention

The application provides a method and device for extracting elements from divorce judgment documents, in order to solve the problem of how to extract complete document elements from a divorce judgment document.

In a first aspect, the present application provides a method for extracting elements from a divorce judgment document, including:

acquiring a divorce judgment document;

cutting the divorce judgment document into a plurality of text blocks according to a preset judgment document directory structure, wherein the judgment document directory structure comprises a plurality of directory titles, and each text block corresponds to one directory title;

selecting, from the plurality of text blocks, a target text block corresponding to a target element to be extracted according to the directory titles corresponding to the text blocks;

and extracting the element value of the target element from the selected target text block.

In a second aspect, the present application further provides a device for extracting elements from a divorce judgment document, the device comprising:

a text acquisition module, configured to acquire a divorce judgment document;

a text cutting module, configured to cut the divorce judgment document into a plurality of text blocks according to a preset judgment document directory structure, wherein the judgment document directory structure comprises a plurality of directory titles, and each text block corresponds to one directory title;

a selection module, configured to select, from the plurality of text blocks, a target text block corresponding to a target element to be extracted according to the directory titles corresponding to the text blocks;

and an element extraction module, configured to extract the element value of the target element from the selected target text block.

According to the above technical solution, the method first cuts the divorce judgment document into a plurality of text blocks according to a preset judgment document directory structure, each text block corresponding to one directory title in that structure; it then selects, according to the directory titles of the text blocks, the target text block corresponding to each target element to be extracted; finally, it extracts the element values of the different target elements from the different target text blocks. With this method, complete document elements can be extracted from a divorce judgment document, so that relevant personnel can understand the document through its elements.

Drawings

In order to explain the technical solution of the present application more clearly, the drawings used in the embodiments are briefly described below. It is obvious that those skilled in the art can obtain other drawings from these drawings without any creative effort.

FIG. 1 is a flowchart of an exemplary method for extracting elements from a divorce judgment document according to the present application;

FIG. 2 is a block diagram of an exemplary device for extracting elements from a divorce judgment document according to the present application.

Detailed Description

In order to make those skilled in the art better understand the technical solutions in the present application, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.

In the judicial field, a judgment document is a specialized document that records the process and outcome of litigation activities, such as trials, conducted by the people's courts. It generally has a uniform structure and writing format, and each component (i.e., text block) corresponds to a content topic that indicates the subject matter covered by that component.

Taking a civil judgment as an example, it consists of header information, party information, trial process, litigant request, litigant defense, dispute focus, evidence catalogue, trial findings, court opinion, judgment result and tail information. The component corresponding to each topic has a specific writing format or manner of description, and each component contains fixed elements; for example, the header information always contains the name of the trial court, the case number and the like.

Because a judgment document records important information, such as the trial process and result, that is valuable for analysis, the document can be fully understood by extracting that information from it, for example basic document elements such as the case type, case number, name of the trial court, court level, region, collegial panel members, acceptance time and trial time.

In order to fully understand the content of a divorce judgment document, an embodiment of the present application provides a method for extracting elements from a divorce judgment document. FIG. 1 is a flowchart of the method according to an exemplary embodiment of the present application. As shown in FIG. 1, the method may include:

Step 101: acquiring a divorce judgment document.

In the present application, the divorce judgment document may be a first-instance judgment in a divorce dispute, for example the first-instance judgment in the divorce dispute of a certain Li.

Step 102: cutting the divorce judgment document into a plurality of text blocks according to a preset judgment document directory structure, wherein the judgment document directory structure comprises a plurality of directory titles, and each text block corresponds to one directory title.

In some embodiments, a data set of divorce judgment documents of a certain scale is collected, and the directory structure of these documents and the writing characteristics of each component are learned and mined. Using the characteristics of the directory structure, a directory tree is built with the directory titles as directory nodes; using the writing characteristics of each component, an extraction rule, such as at least one extraction expression, is designed for each directory node. With the extraction rule of a directory node, the text block corresponding to that node, i.e. the text block corresponding to that directory title in the judgment document directory structure, can be extracted from the divorce judgment document.

In some embodiments, the directory nodes list, in order, the content topics of the text blocks that may appear in a judgment document, and the extraction rule under each directory node is used to extract from the judgment document the text block corresponding to that node or content topic, a text block comprising one or more paragraphs. An exemplary directory tree is as follows:

Civil judgment
  - Header information - <extraction expression>
  - Party information - <extraction expression>
  - Trial process - <extraction expression>
  - Litigant request - <extraction expression>
  - Litigant defense - <extraction expression>
  - Trial findings - <extraction expression>
  - Dispute focus - <extraction expression>
  - Court opinion - <extraction expression>
  - Judgment result - <extraction expression>
  - Tail information - <extraction expression>

Here, "Civil judgment" is the name of the directory tree, selected according to the document type, and "Header information" and the like are the directory nodes included in the directory tree.

In some embodiments, the extraction expression corresponding to each directory node is used to extract the block header information of each text block, so that the start position of each text block can be determined according to the block header information, and paragraph contents between two adjacent start positions are extracted to obtain the corresponding text block.

Cutting a divorce judgment document by the above method yields, for example, the following result:

<header information>

People's Court of XX District, Beijing

Civil Judgment

(2018) Jing 0105 Min Chu No. 77967

<party information>

Plaintiff: Liu, male, born on October 31, 1983, Han nationality, residing in Chaoyang District, Beijing.

Entrusted litigation agent: Zhao, lawyer at Beijing XX Law Firm.

Defendant: Luo, female, born on April 25, 1985, Han nationality, residing in Chaoyang District, Beijing.

Entrusted litigation agent: Xian, lawyer at Beijing XX Law Firm.

<trial process>

In the case of the post-divorce property dispute between plaintiff Liu and defendant Luo, after this court accepted and docketed the case, …. The trial of this case has now concluded.

<litigant request>

Liu submitted the following claims to this court: 1. Luo shall pay me 309678 yuan; 2. the litigation costs shall be borne by the defendant. Facts and reasons: Luo and I … In November 2012, I applied to purchase a two-bedroom unit located at No. X, Building X, Courtyard X, Chaoyang District, … I consider that Luo should pay me the portion that I have repaid. If Luo requests that the house be divided by shares, I request that Luo pay half of the taxes and loan and half of the renovation costs.

<litigant defense>

Luo argued in defense: the divorce judgment stated that the division of the house and the debts should be resolved together; I request that the house be divided by shares, with two thirds of the house allotted to me.

<evidence catalogue>

The plaintiff submitted the following evidence in support of the claims:

Evidence 1, list 1, fact to be proven 1;

Evidence 2, list 2, fact to be proven 2;

After cross-examination in court, this court certifies the evidence as follows:

Evidence 1, certification result 1;

Evidence 2, certification result 2;

<trial findings>

After trial, this court finds the following facts: ….

<dispute focus>

Luo claims that the house in question should be divided by shares. Luo considers that, because the house was applied for in the name of the family and in consideration of the parties' daughter, Luo should own two thirds of it. Liu considers that, although the house in question was applied for in the name of the family, the main consideration was that the eligibility for the purchase lottery was obtained through his status as a demobilized serviceman. Liu claims that Luo should reimburse the loan and taxes repaid after the divorce, and requests that Luo pay half of the renovation costs.

<court opinion>

This court holds that, according to the relevant legal provisions, ….

<judgment result>

The judgment is as follows:

From the date on which this judgment takes effect, ….

<tail information>

In the above example, because the directory nodes of the directory tree are designed according to the block composition of judgment documents, the two correspond to each other: each text block obtained by cutting corresponds to one directory node in the directory tree, and the directory title of a text block can be obtained from the name of its directory node.
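The cutting step can be illustrated with a minimal Python sketch. It assumes that each directory node carries one regular expression matching the first line of its block; the patterns, the English sample text and the name `split_into_blocks` are hypothetical illustrations, not the extraction expressions of the application. Treating everything before the first detected block header as header information is likewise an assumption made to keep the sketch short.

```python
import re

# Hypothetical directory tree: (directory title, pattern matching the block header).
# These patterns are illustrative assumptions only.
DIRECTORY_RULES = [
    ("party information", r"^Plaintiff:"),
    ("litigant request", r"requests the court"),
    ("court opinion", r"^This court holds"),
    ("judgment result", r"^The judgment is as follows"),
]

SAMPLE = (
    "XX District People's Court\n"
    "Civil Judgment\n"
    "Plaintiff: Liu\n"
    "Defendant: Luo\n"
    "Liu requests the court to order: 1. payment; 2. costs.\n"
    "This court holds that the claim is supported.\n"
    "The judgment is as follows:\n"
    "Luo shall pay Liu.\n"
)


def split_into_blocks(document: str) -> dict[str, str]:
    """Cut a document into text blocks keyed by directory title."""
    starts = []
    for title, pattern in DIRECTORY_RULES:
        match = re.search(pattern, document, flags=re.MULTILINE)
        if match:
            # A block starts at the beginning of the line containing its header.
            line_start = document.rfind("\n", 0, match.start()) + 1
            starts.append((line_start, title))
    starts.sort()

    blocks = {}
    # Assumption: text before the first detected header is the header information.
    if starts and starts[0][0] > 0:
        blocks["header information"] = document[: starts[0][0]].strip()
    # Each block runs from its start position to the next block's start position.
    for index, (position, title) in enumerate(starts):
        end = starts[index + 1][0] if index + 1 < len(starts) else len(document)
        blocks[title] = document[position:end].strip()
    return blocks


blocks = split_into_blocks(SAMPLE)
```

Each key of `blocks` is a directory title and each value is the paragraph content between two adjacent start positions, which mirrors the description above.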

Step 103: selecting, from the plurality of text blocks, a target text block corresponding to the target element to be extracted according to the directory titles corresponding to the text blocks.

In some embodiments, by learning the data structure characteristics of divorce judgment documents, a knowledge system of professional terms for divorce cases is mined, such as child custody allocation, division of joint property and grounds for the breakdown of the marital relationship. An element architecture for divorce judgment documents is then designed according to this knowledge system; it comprises multi-level, multi-dimensional document element information such as litigation requests, case facts, dispute focuses and judgment results.

In the embodiment of the present application, the document element to be extracted, that is, the target element, is selected from the preset element architecture.

An exemplary element architecture, established by learning and mining the data structure characteristics of divorce judgment documents, is as follows:

Litigation request
  - Litigation request list
  - Whether the defendant agrees to divorce
    - yes
    - no
    - none
Case facts
  - Child condition element
    - Number of children
    - Child's name
    - ……
    - Age
    - Relationship with the plaintiff: String (stepchild, biological child, adopted child, child born in wedlock)
    - Relationship with the defendant: String (stepchild, biological child, adopted child, child born in wedlock)
    - Whether breastfeeding
    - Whether a minor
    - Party previously lived with: String (plaintiff, defendant)
    - Party the child wishes to live with: String (plaintiff, defendant)
    - Party willing to raise the child: String (plaintiff, defendant)
  - Marriage type: String (de facto marriage, legal marriage)
  - Whether separated
  - ……
  - Whether the plaintiff is the female party
  - Whether the plaintiff is on active military service
  - Whether the defendant is on active military service
  - Protection of the female party's rights: String (during pregnancy, within 1 year after childbirth, within 6 months after termination of pregnancy)
  - Joint property
  - Joint claims and debts
  - Whether suit has been filed multiple times
  - Evidence-class elements
    - Plaintiff's presentation of evidence
      - Presentation statement
      - Presentation rules
        - Evidence number
        - Evidence list
        - Fact to be proven
    - Defendant's presentation of evidence
      - ……
    - Plaintiff's cross-examination
      - Cross-examination statement
      - Cross-examination rules
        - Evidence number
        - Cross-examination result
        - Cross-examination opinion
    - Defendant's cross-examination
      - ……
    - Court certification element
      - Certification statement
      - Certification rules
        - Evidence number
        - Evidence provider
        - Court certification result
        - Basis of court certification
Dispute focus
  - Dispute focus list
  - Dispute focus category 1
  - Dispute focus category 2
  - ……
Judgment result
  - Whether divorce is granted
  - Child custody allocation
  - Joint property division
  - ……

It will be appreciated that different document elements may be contained in different text blocks. For example, the litigation request element is contained in the text block whose directory title is litigant request, while the child condition element may be contained in one or more of the text blocks whose directory titles are party information, litigant request, litigant defense, trial findings and court opinion. The text block containing a given target element can therefore be selected as the object of analysis; for example, when the litigation request element is to be extracted, the text block whose directory title is litigant request is selected as the object to be structured.

For ease of distinction and explanation, a text block that is selected according to a target element and contains that target element is defined as the target text block corresponding to the target element.

In some embodiments, a preset correspondence between directory titles and document elements is established in advance. When the target element to be extracted is known, the directory title corresponding to the target element can be selected according to the preset correspondence, and the target text block containing the target element is thereby determined.

In this embodiment, the preset correspondence between directory titles and document elements can be shown in the following table:

It should be noted that the preset correspondence is only an exemplary representation and does not include the elements of all levels and dimensions in the element architecture; those skilled in the art can further refine and extend it on the basis of this exemplary representation.
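As a minimal sketch of how such a preset correspondence might be used, the mapping below contains only the two examples given in the description (the litigation request element and the child condition element); it is not the complete correspondence. The names `ELEMENT_TO_TITLES` and `select_target_blocks` are hypothetical.

```python
# Preset correspondence, limited to the two examples stated in the description.
# Keys are target elements, values are the directory titles that may contain them.
ELEMENT_TO_TITLES = {
    "litigation request": ["litigant request"],
    "child condition": [
        "party information",
        "litigant request",
        "litigant defense",
        "trial findings",
        "court opinion",
    ],
}


def select_target_blocks(blocks: dict[str, str], target_element: str) -> dict[str, str]:
    """Return the text blocks whose directory title corresponds to the target element."""
    titles = ELEMENT_TO_TITLES.get(target_element, [])
    # Keep only titles that actually occur in this document.
    return {title: blocks[title] for title in titles if title in blocks}


example_blocks = {
    "party information": "Plaintiff: Liu ...",
    "litigant request": "Liu requests the court to order ...",
    "judgment result": "The judgment is as follows ...",
}
targets = select_target_blocks(example_blocks, "child condition")
```

Here `targets` contains the party information and litigant request blocks, because those are the only candidate titles present in `example_blocks`.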

Step 104: extracting the element value of the target element from the selected target text block.

In order to extract from a given target text block the document elements it contains, an element tree matching that text block is created in advance, so that different document elements are extracted from different text blocks using different element trees. Each element tree comprises at least one element node, each element node corresponds to at least one extraction rule, and the extraction rules are used to extract the document elements corresponding to the element nodes from the target text block.

On this basis, in step 104, the element tree to be used is selected according to the target text block and/or the target element, and the element nodes of that tree are used to extract, from the corresponding target text block, the element value of the target element corresponding to each element node.

For example, for the child condition element, the pre-created element tree is as follows:

Child condition element
  - Number of children - <extraction rule>
  - Child's name - <extraction rule>
  - Child's gender - <extraction rule>
  - Date of birth - <extraction rule>
  - Age - <extraction rule>
  - Relationship with the plaintiff - <extraction rule>
    - stepchild
    - biological child
    - adopted child
    - child born in wedlock
  - Relationship with the defendant - <extraction rule>
    - stepchild
    - biological child
    - adopted child
    - child born in wedlock
  - Whether breastfeeding - <extraction rule>
  - Whether a minor - <extraction rule>
  - Party previously lived with - <extraction rule>
    - plaintiff
    - defendant
  - Party the child wishes to live with - <extraction rule>
    - plaintiff
    - defendant
  - Party willing to raise the child - <extraction rule>
    - plaintiff
    - defendant
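One way to represent such an element tree in code is sketched below. This is an assumption-laden illustration: the `ElementNode` class, the `extract` function, the single-regex form of an extraction rule and the English sample sentence are all hypothetical, and only three of the nodes listed above are modelled.

```python
import re
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ElementNode:
    """An element node with an optional extraction rule and child nodes."""
    name: str
    rule: Optional[str] = None  # regex with exactly one capture group (assumption)
    children: list["ElementNode"] = field(default_factory=list)


def extract(node: ElementNode, text: str) -> dict[str, Optional[str]]:
    """Walk the tree and apply each node's rule to the target text block."""
    result: dict[str, Optional[str]] = {}
    if node.rule is not None:
        match = re.search(node.rule, text)
        result[node.name] = match.group(1) if match else None
    for child in node.children:
        result.update(extract(child, text))
    return result


# Hypothetical rules for an English sample sentence.
CHILD_TREE = ElementNode(
    "child condition",
    children=[
        ElementNode("number of children", r"have (\w+) (?:daughter|son|child)"),
        ElementNode("child name", r"named (\w+)"),
        ElementNode("date of birth", r"born on (\d{4}-\d{2}-\d{2})"),
    ],
)

TEXT = "The parties have one daughter named Lili, born on 2014-05-06."
values = extract(CHILD_TREE, TEXT)
```

A node without a rule, such as the root here, only groups its children and contributes no value of its own.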

In some embodiments, since different element nodes are used for extracting different target elements, the extraction rule corresponding to each element node is different, and the extraction rule may be: a positioning rule, a time extraction rule, or a normalized element matching rule.

The positioning rule comprises a pre-positioning rule and a post-positioning rule, both based on regular expressions. Its main principle is to determine the start position of the target element in the target text block using the pre-positioning rule, and to determine the end position of the target element in the target text block using the post-positioning rule.

In some embodiments, determining the start position of the target element in the target text block using the pre-positioning rule includes: identifying the pre-positioning information of the target element using the pre-positioning rule; and determining the start position of the target element in the target text block according to the pre-positioning information. The pre-positioning information may be a specific Chinese word or a specific Chinese context, such as the role label in front of the name of a collegial panel member. It may also be a Chinese or non-Chinese character at a specific position index; for example, the Chinese character at the first position of the header information block serves as the pre-positioning information for the name of the trial court.

In some embodiments, determining the end position of the target element in the target text block using the post-positioning rule includes: identifying the post-positioning information of the target element using the post-positioning rule; and determining the end position of the target element in the target text block according to the post-positioning information. The post-positioning information may be a specific suffix feature word, for example "court" or "tribunal" for the name of the trial court. It may also be a non-Chinese character at a specific position index, such as a line-feed character.
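The interplay of the two rules can be sketched in Python as follows. The function name `extract_between`, the `include_rear` flag and the English sample text are assumptions for illustration; the application does not specify how the rules are encoded.

```python
import re
from typing import Optional


def extract_between(text: str, front_rule: str, rear_rule: str,
                    include_rear: bool = False) -> Optional[str]:
    """Extract the span located by a pre-positioning and a post-positioning rule."""
    front = re.search(front_rule, text, flags=re.MULTILINE)
    if front is None:
        return None
    start = front.end()  # the element starts right after the pre-positioning info
    rear = re.compile(rear_rule).search(text, start)
    if rear is None:
        return None
    # A suffix feature word (e.g. "Court") belongs to the value; a line feed does not.
    end = rear.end() if include_rear else rear.start()
    return text[start:end].strip()


header = "Beijing XX District People's Court\nCivil Judgment\n"
panel = "Presiding judge: Wang\nJudge: Li\n"

# Position-index pre-positioning (first character) with a suffix feature word.
court_name = extract_between(header, r"^", r"Court", include_rear=True)
# Role label as pre-positioning info, line feed as post-positioning info.
presiding = extract_between(panel, r"Presiding judge:", r"\n")
```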

In some embodiments, the extraction rule corresponding to one or more element nodes in the element tree is a time extraction rule. The time extraction rule is specifically at least one time extraction expression for extracting time elements from a text block such as the trial process block, where the time elements include the "appeal time", "acceptance time", "case-docketing time", "trial time" and "trial period" contained in the trial process block, and the like.

Specifically, the time extraction expression is a regular expression that supports multiple date formats and recognizes numerals written in Chinese characters, in Arabic digits, and in full-width or half-width form.
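A minimal sketch of such an expression is given below. It covers only the 年/月/日 date format and is an illustration rather than the expression used in the application; full-width digits are folded to half-width with Unicode NFKC normalization, and the names `DATE_RE` and `extract_dates` are hypothetical. No calendar validation is performed.

```python
import re
import unicodedata

CN_DIGITS = {"〇": 0, "零": 0, "一": 1, "二": 2, "三": 3, "四": 4,
             "五": 5, "六": 6, "七": 7, "八": 8, "九": 9}

# Year, month and day are each either Arabic digits or Chinese numerals.
DATE_RE = re.compile(
    r"([0-9]{4}|[〇零一二三四五六七八九]{4})年"
    r"([0-9]{1,2}|[一二三四五六七八九十]{1,3})月"
    r"([0-9]{1,2}|[一二三四五六七八九十]{1,3})日"
)


def _to_int(token: str) -> int:
    """Convert an Arabic or Chinese numeral token to an integer."""
    if token.isascii():
        return int(token)
    if "十" in token:
        # Handles forms such as 十, 十二, 二十 and 二十五.
        tens, _, ones = token.partition("十")
        return (CN_DIGITS[tens] if tens else 1) * 10 + (CN_DIGITS[ones] if ones else 0)
    value = 0
    for ch in token:  # digit-by-digit form, e.g. 二〇一八
        value = value * 10 + CN_DIGITS[ch]
    return value


def extract_dates(text: str) -> list[str]:
    """Return all dates found in the text as ISO-style strings."""
    text = unicodedata.normalize("NFKC", text)  # full-width digits -> half-width
    return [
        f"{_to_int(y):04d}-{_to_int(m):02d}-{_to_int(d):02d}"
        for y, m, d in DATE_RE.findall(text)
    ]


dates = extract_dates("２０１２年１１月３日立案，二〇一八年十二月二十五日开庭")
```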

To improve the accuracy of the extracted time elements, in some embodiments the time information in the text block corresponding to the time elements first undergoes reference resolution, and the element values of the time elements are then extracted from the text block after reference resolution.

In some embodiments, the extraction rule corresponding to one or more element nodes in the element tree is a normalized element matching rule. Here, a normalized element can be understood as a document element that must be expressed with standard characters or words in the referee document, such as the gender element of a party or a child, which must be expressed by a standard word such as "male" or "female".

In a specific implementation, a standard word set containing at least one standard word is preset for the target normalized element. The standard words in the set are then matched against the target text block, and the document element is extracted from the target text block according to the matching result.
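The matching step can be sketched as a plain string lookup. The word set 男/女 ("male"/"female"), the function name `match_standard_word`, and the choice of returning the earliest hit are assumptions made for illustration.

```python
# Assumed standard word set for the gender element: "male", "female".
GENDER_WORDS = {"男", "女"}


def match_standard_word(text_block, standard_words):
    """Return the standard word that appears first in the block, or None."""
    hits = [(text_block.find(word), word)
            for word in standard_words if word in text_block]
    return min(hits)[1] if hits else None


print(match_standard_word("原告刘某，女，1985年出生", GENDER_WORDS))
```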

In some embodiments, the element node corresponding to the target element has at least two sub-nodes, and each sub-node corresponds to a class label of the target element, which is substantially a class label of the extraction result of the element node corresponding to the target element. The extraction results of part of element nodes are classified in a targeted manner, so that element extraction and classification with finer granularity are realized, and the accuracy of element identification and extraction results is improved.

In some embodiments, the target element may be a litigation request element, and the target title of the text block corresponding to the litigation request element may be "litigation party request". With reference to the above exemplary element architecture, the litigation request elements may specifically include a litigation request list and whether the defendant agrees to the divorce. Accordingly, the element tree corresponding to the litigation request elements includes at least an element node corresponding to the litigation request list and an element node corresponding to whether the defendant agrees to the divorce. The latter node may have three child nodes, "yes", "no" and "none", which are the category labels of its extraction result. In this embodiment, the corresponding element node is first used to extract the litigation request information text from the corresponding target text block, and the extracted text is then segmented to obtain one or more independent litigation request items, which are the element values of the litigation request list. During segmentation, it is first determined whether the litigation request information text contains item symbols such as "1" and "2". If so, the item symbols are identified and the independent litigation request items are separated from the text by string splitting; if not, the text is split directly at separators such as periods and semicolons.

For example, the litigation request information text extracted from the aforementioned divorce dispute referee document of Liu and Luo, and the independent litigation request items separated from it, are as follows:

Litigation request information text:

Plaintiff So-and-so 1 filed the following litigation requests with this court: 1. The defendant shall pay me 309678 yuan; 2. The litigation fees shall be borne by the defendant.

Independent litigation request items:

The defendant shall pay me 309678 yuan

The litigation fees shall be borne by the defendant

In the above example, the independent litigation request items are the result corresponding to the litigation request list node.
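The segmentation logic described above can be sketched as follows. The regular expressions, the rule of discarding the lead-in before the first colon, and the function name `split_requests` are assumptions; the application only states that item symbols are tried first and separators are used as the fallback.

```python
import re

# Item symbols such as "1." or "2、" at the start or right after a separator.
# The (?!\d) guard keeps decimal amounts such as "3096.78" from being split.
ITEM_MARK = re.compile(r"(?:^|(?<=[；;。，,\s]))\d{1,2}[\.、．](?!\d)")
# Fallback separators: periods and semicolons.
SEPARATORS = re.compile(r"[。；;]")
TRIM = " 。；;."


def split_requests(text):
    """Split a litigation request text into independent request items."""
    # Assumption: the requests follow the first colon of the lead-in sentence.
    body = re.split(r"[:：]", text, maxsplit=1)[-1]
    if ITEM_MARK.search(body):
        parts = ITEM_MARK.split(body)
    else:
        parts = SEPARATORS.split(body)
    return [part.strip(TRIM) for part in parts if part.strip(TRIM)]


text = "原告某1向本院提出诉讼请求：1.被告支付我309678元；2.诉讼费由被告承担。"
for item in split_requests(text):
    print(item)
```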

In addition, the corresponding element node is used to extract, from the corresponding target text block, the expression text stating whether the defendant agrees to the divorce, and the extracted expression text is classified. A first predetermined value is used as the element value of the child node corresponding to the category label hit by the expression text, and a second predetermined value is used as the element value of each child node corresponding to a category label that the expression text missed. For example, if the expression text hits "yes", the child node "yes" is assigned "True", and the child nodes "no" and "none" are assigned "False".
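A small sketch of this assignment is given below. The keyword-based classifier `classify_consent` is a hypothetical stand-in, since the application does not say how the expression text is classified; `assign_label_values` shows the first and second predetermined values being written to the child nodes.

```python
CONSENT_LABELS = ["yes", "no", "none"]


def classify_consent(expression_text):
    """Hypothetical keyword classifier for the defendant's position."""
    if not expression_text:
        return "none"
    # Check the negative phrase first because it contains the positive one.
    if "不同意离婚" in expression_text:   # "does not agree to divorce"
        return "no"
    if "同意离婚" in expression_text:     # "agrees to divorce"
        return "yes"
    return "none"


def assign_label_values(labels, hit_labels, first_value=True, second_value=False):
    """Give hit labels the first predetermined value and the rest the second."""
    return {label: first_value if label in hit_labels else second_value
            for label in labels}


hit = classify_consent("被告辩称，同意离婚。")
print(assign_label_values(CONSENT_LABELS, {hit}))
```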

In some embodiments, the target element may be a case fact element. The case fact elements include child condition elements and other fact elements, and the child condition elements in turn include child basic condition elements and child other condition elements. Because the child other condition elements need to be classified, at least two child nodes are set under each element node corresponding to a child other condition element, and each child node corresponds to one category label of that element. For example, the child other condition elements include the relationship between the child and the plaintiff/defendant, whether the child is a minor or in the lactation period, and which party the child is willing to follow. The child nodes under the element node for the relationship with the plaintiff/defendant correspond to stepchild, biological child, adopted child, and child born in wedlock, respectively; the child nodes under the element node for whether the child is a minor or in the lactation period correspond to "yes" and "no", respectively; and the child nodes under the element node for which party the child is willing to follow correspond to the plaintiff and the defendant, respectively. The other fact elements include common property, marriage type, whether there is intervention, residence time, and the like. Some of the other fact elements are also document elements that need to be classified; for these, at least two child nodes corresponding to the category labels are likewise set under the corresponding element node. For example, two child nodes, "yes" and "no", are set under the element node corresponding to whether there is intervention.

In some embodiments, child condition elements are extracted from the corresponding target text block according to the following steps:

firstly, identifying a sentence where child information is located from a text block corresponding to the child condition element;

secondly, analyzing the sentence where each piece of child information is located item by item, and extracting child condition elements from the sentence where the piece of child information is located by using corresponding element nodes; specifically, the child basic situation element is extracted using the element node corresponding to the child basic situation element, and the child other situation is extracted using the element node corresponding to the child other situation element.

Then, for the child basic condition elements, such as the name, gender, and age of the child, a data object is generated and stored directly with the extraction result of the corresponding element node as the element value. For the child other condition elements, the extraction result of the corresponding element node must be classified to obtain the hit category label; the first predetermined value is used as the element value of the child node corresponding to the hit category label, and the second predetermined value is used as the element value of each child node corresponding to a missed category label. For example, the extraction result corresponding to the age of a child is classified to determine whether the child is a minor or in the lactation period. Likewise, the expression text extracted by the element node for the relationship with the plaintiff/defendant is classified to determine whether the child is a child born in wedlock, an adopted child, a stepchild, or a biological child; if no category is hit, the result defaults to a child born in wedlock.

As the above embodiment shows, further classifying the child other condition elements achieves multi-dimensional, multi-level extraction of the child condition elements. For example, the age of a child and whether the child is a minor are elements of different levels and dimensions, and whether the child is a minor is a result obtained by classifying the child's age. Relevant personnel can thus learn the child's age and also see directly whether the child is a minor.
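To make the relation between the two levels concrete, the sketch below derives the "whether minor" child nodes from an extracted age string. The 18-year threshold, the age pattern, and the name `classify_minor` are assumptions; the application does not state the thresholds it uses, and no lactation-period threshold is attempted here.

```python
import re

MINOR_AGE_LIMIT = 18  # assumed threshold; not stated in the application


def classify_minor(age_text):
    """Derive the 'whether minor' child nodes from an extracted age string."""
    match = re.search(r"(\d+)\s*(?:周岁|岁)", age_text)
    if match is None:
        return None
    age = int(match.group(1))
    is_minor = age < MINOR_AGE_LIMIT
    return {"age": age, "whether minor": {"yes": is_minor, "no": not is_minor}}


print(classify_minor("现年6岁"))
```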

For other case factual elements, element nodes in the element tree are used to extract element results according to the element tree corresponding to the specific target element.

Illustratively, for other case fact elements, the pre-created element tree structure is as follows:

Other case facts
- Marriage type ---------- <extraction rule>
  - De facto marriage
  - Legal marriage
- Whether there is intervention ---------- <extraction rule>
  - Yes
  - No
……
- Common property
  - House information
  ……
- Common claims
- Common debts
- Whether suit has been filed multiple times
  - Yes
  - No
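An element tree like the one above could be held in a small recursive data structure. The sketch below is one possible representation, not the application's own: the class `ElementNode`, its fields, and the keyword rule `marriage_type_rule` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class ElementNode:
    name: str
    # Extraction rule attached to the node, if any.
    rule: Optional[Callable[[str], Optional[str]]] = None
    # Child nodes; for classified elements these are the category labels.
    children: List["ElementNode"] = field(default_factory=list)

    def walk(self, depth=0):
        """Yield (depth, node) pairs in pre-order."""
        yield depth, self
        for child in self.children:
            yield from child.walk(depth + 1)


def marriage_type_rule(text):
    # Hypothetical keyword rule standing in for a real extraction rule.
    return "de facto marriage" if "事实婚姻" in text else "legal marriage"


tree = ElementNode("other case facts", children=[
    ElementNode("marriage type", marriage_type_rule, [
        ElementNode("de facto marriage"),
        ElementNode("legal marriage"),
    ]),
    ElementNode("whether there is intervention", children=[
        ElementNode("yes"),
        ElementNode("no"),
    ]),
])

for depth, node in tree.walk():
    print("  " * depth + "- " + node.name)
```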

In some embodiments, the target elements further include evidence elements, which correspond to the text block whose directory title is "evidence" (in other directory structures, the text blocks corresponding to evidence elements may be those titled "proof" and "court certification result").

In some embodiments, the evidence elements include proof elements and cross-examination elements. The proof elements include plaintiff proof elements and defendant proof elements, and the cross-examination elements include plaintiff cross-examination elements and defendant cross-examination elements.

The proof elements include the proof evidence number, the evidence list, the proved fact, and the proof details. The proof details include at least one proof item, and each proof item includes an evidence list element, arranged in positional order, and the proved-fact element adjacent to it. Specifically, the proof evidence numbers, evidence lists, and proved facts are each extracted from the corresponding target text block by the corresponding element nodes. Then, according to the position indexes of the evidence numbers, evidence lists, and proved facts in the text block, the text following each "evidence list" is searched for a "proved fact" until the next "evidence list" is reached. Each evidence list element, in positional order, together with the proved-fact element adjacent to it, forms one proof item.
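The position-based pairing can be sketched as below, assuming each extraction result is available as a `(position, text)` pair sorted by position. The function name `pair_proof_items`, the dictionary keys, and the sample data are illustrative assumptions.

```python
def pair_proof_items(evidence_lists, proved_facts):
    """Pair each evidence list with the first proved fact that follows it.

    Both arguments are lists of (position, text) tuples sorted by position.
    The search for a fact stops at the next evidence list.
    """
    items = []
    for i, (pos, evidence) in enumerate(evidence_lists):
        if i + 1 < len(evidence_lists):
            next_pos = evidence_lists[i + 1][0]
        else:
            next_pos = float("inf")
        fact = next((text for p, text in proved_facts if pos < p < next_pos), None)
        items.append({"evidence list": evidence, "proved fact": fact})
    return items


evidence_lists = [(0, "结婚证"), (30, "房产证"), (60, "存折")]
proved_facts = [(10, "证明双方系合法夫妻"), (40, "证明房屋为共同财产")]
for item in pair_proof_items(evidence_lists, proved_facts):
    print(item)
```

An evidence list with no proved fact before the next evidence list yields `None` for the fact, so the item count always equals the number of evidence lists.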

The cross-examination (quality certification) elements include the cross-examined evidence number, the cross-examination result, the cross-examination opinion, and the cross-examination details. The cross-examination details include at least one cross-examination item, and each cross-examination item includes an evidence number element, a cross-examination result element, and a cross-examination opinion element arranged in positional order. Specifically, taking each "cross-examination result" as the reference, the preceding text is searched for the "evidence number" back to the previous "cross-examination result", and the following text is searched for the "cross-examination opinion" up to the next "cross-examination result". The evidence number element, the cross-examination result element adjacent to it, and the cross-examination opinion element adjacent to the result element, in positional order, form one cross-examination item.
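The two-directional search around each result can be sketched in the same style. Inputs are again assumed to be `(position, text)` pairs sorted by position, and the function name `build_cross_exam_items` and the sample values are hypothetical.

```python
def build_cross_exam_items(numbers, results, opinions):
    """Anchor on each result; look before it for a number, after it for an opinion.

    All arguments are lists of (position, text) tuples sorted by position.
    """
    items = []
    for i, (pos, result) in enumerate(results):
        prev_pos = results[i - 1][0] if i > 0 else -1
        next_pos = results[i + 1][0] if i + 1 < len(results) else float("inf")
        # Evidence numbers between the previous result and this one.
        before = [text for p, text in numbers if prev_pos < p < pos]
        # Opinions between this result and the next one.
        after = [text for p, text in opinions if pos < p < next_pos]
        items.append({
            "evidence number": before[-1] if before else None,
            "result": result,
            "opinion": after[0] if after else None,
        })
    return items


numbers = [(0, "证据1"), (50, "证据2")]
results = [(10, "无异议"), (60, "有异议")]
opinions = [(70, "真实性不认可")]
for item in build_cross_exam_items(numbers, results, opinions):
    print(item)
```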

The court certification elements include the evidence number, the evidence provider, the court certification result, and the court certification details. The court certification details include at least one certification basis item, and each certification basis item includes an evidence provider element, the evidence number element adjacent to it, and the court certification result element adjacent to the evidence number element, arranged in positional order. Specifically, taking each "court certification result" as the reference, the preceding text is searched for the "evidence provider" and the "evidence number" back to the previous "court certification result". The evidence provider element, the adjacent evidence number element, and the adjacent court certification result element, in positional order, form one certification basis item.

In some embodiments, the target element may be a dispute focus element. With reference to the exemplary element architecture described above, the dispute focus element may further comprise a dispute focus list and at least one dispute focus category. The dispute focus list comprises one or more independent dispute focus items, and each independent dispute focus item, after being classified, may hit one or more dispute focus category labels. Correspondingly, the element tree corresponding to the dispute focus element comprises at least an element node corresponding to the dispute focus list and an element node corresponding to the dispute focus category, and one or more child nodes corresponding to the dispute focus category labels are arranged under the element node corresponding to the dispute focus category.

In some embodiments, the dispute focus information text is first identified from the corresponding target text block by using the element nodes specified in the corresponding element tree, and the dispute focus information text is segmented to obtain a dispute focus list including one or more independent dispute focus items. Each independent dispute focus item is then analyzed one by one to obtain the dispute focus category labels it hits; a first preset value is used as the element value of each child node corresponding to a hit dispute focus category label, and a second preset value is used as the element value of each child node corresponding to a missed dispute focus category label.
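The flow of segmenting and then labelling each item can be sketched as follows. A keyword lookup is used here purely as a stand-in for the classifier, and the category names, keywords, and function names are hypothetical.

```python
import re

# Hypothetical keyword lists standing in for a trained classifier.
CATEGORY_KEYWORDS = {
    "how houses are handled": ["房屋", "房产"],   # "house", "real estate"
    "house loan repayment": ["贷款", "房贷"],     # "loan", "mortgage"
    "who raises the child": ["抚养"],             # "raise / support"
}


def split_focus(text):
    """Split the dispute focus text into independent items at sentence marks."""
    return [part.strip() for part in re.split(r"[。；;]", text) if part.strip()]


def label_item(item, first_value=True, second_value=False):
    """Assign the first preset value to hit categories, the second to the rest."""
    return {
        category: first_value if any(k in item for k in keywords) else second_value
        for category, keywords in CATEGORY_KEYWORDS.items()
    }


items = split_focus("卢某主张按份额分割涉案房屋。刘某主张卢某偿还婚后贷款。")
for item in items:
    print(item, label_item(item))
```

Because every category is tested independently, one item can hit several labels at once, which matches the multi-label behaviour described above.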

In some embodiments, text data of independent dispute focus items with known categories is used as training samples. A classification model based on a neural network is trained on a certain number of these samples, and the trained model is then used to classify independent dispute focus items of unknown category.

Illustratively, the dispute focus information text extracted from the divorce dispute referee document of Lu and Liu, and the independent dispute focus items separated from it, are as follows:

Dispute focus information text:

Lu claims that the house involved in the case should be divided by shares. Lu holds that, because the house involved in the case was applied for in the name of the family and considering the daughter of both parties, Lu should have two thirds of the ownership. Liu holds that although the house involved in the case was applied for in the name of the family, the main consideration was the lottery qualification obtained by virtue of the status of a demobilized serviceman. Liu claims that Lu should reimburse the loan and taxes that Liu paid after the marriage, and requires Lu to pay half of the decoration cost.

Independent dispute focus items:

Lu claims that the house involved in the case should be divided by shares

Lu holds that, because the house involved in the case was applied for in the name of the family and considering the daughter of both parties, Lu should have two thirds of the ownership

Liu holds that although the house involved in the case was applied for in the name of the family, the main consideration was the lottery qualification obtained by virtue of the status of a demobilized serviceman

Liu claims that Lu should reimburse the loan and taxes that Liu paid after the marriage, and requires Lu to pay half of the decoration cost

In an exemplary divorce element architecture, some of the dispute focus categories are as follows:

Divorce dispute focus categories
- Whether the relationship has broken down
- Whether there is family violence
- Who raises the child
- Property division
- Debt division
- Claim division
- How common property is handled
- How common debts are handled
- How common claims are handled
……
- House loan repayment
- House discount payment
- How houses are handled

Illustratively, the dispute focus category labels hit by the independent dispute focus item "Lu claims that the house involved in the case should be divided by shares" include at least "how houses are handled", and the labels hit by the item "Liu claims that Lu should reimburse the loan and taxes that Liu paid after the marriage, and requires Lu to pay half of the decoration cost" include at least "house loan repayment". The first predetermined value is then stored as the element value of the child nodes corresponding to "how houses are handled" and "house loan repayment", respectively.

As can be seen from the above embodiments, the dispute focus elements include not only the dispute focus list composed of the independent dispute focus items, but also one or more category labels for each independent dispute focus item. The extracted dispute focus elements are therefore multi-level and multi-dimensional: the dispute focus list and the dispute focus categories are document elements of different levels, while the individual dispute focus categories are document elements of different dimensions.

In some embodiments, the target elements further include a judgment result element, which includes whether divorce is granted, a child fostering allocation element, common property allocation, and the like. Whether divorce is granted further includes the two categories "yes" and "no"; the child fostering allocation element further includes the fostering party, the years of burden, the burden ratio, the burden amount, the payment method, and the like; and the common property allocation further includes house ownership, vehicle ownership, and the like.

In a specific implementation, the judgment result text extracted by the corresponding node is analyzed to obtain the classification result of whether divorce is granted. If no such classification result is obtained, it is determined whether any child node under the element node corresponding to child fostering allocation has an extraction result; if so, divorce is judged to have been granted.
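This fallback can be sketched as below. The keyword patterns in `classify_divorce` are assumed for illustration, since the application does not describe how the judgment text is classified; `whether_divorce_granted` implements the fallback on the fostering allocation results.

```python
import re


def classify_divorce(judgment_text):
    """Hypothetical keyword classifier; returns 'yes', 'no', or None."""
    # Check the negative phrase first because it contains the positive one.
    if re.search(r"不准予?[^。；]*离婚", judgment_text):   # "divorce not granted"
        return "no"
    if re.search(r"准予[^。；]*离婚", judgment_text):      # "divorce granted"
        return "yes"
    return None


def whether_divorce_granted(judgment_text, fostering_results):
    """Fall back to the fostering allocation results when no label is found."""
    label = classify_divorce(judgment_text)
    if label is None and any(v is not None for v in fostering_results.values()):
        label = "yes"
    return label


print(whether_divorce_granted("婚生女由原告抚养。", {"fostering party": "原告"}))
```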

Illustratively, the element nodes for extracting the child fostering allocation elements are composed as follows:

Child fostering allocation elements
- Fostering party
- Years of burden
- Burden amount
- Burden ratio
- Payment method
  - Payment method 1
  - Payment method 2

Whether divorce is granted
- Yes
- No

Minor information is screened from the child condition element results, and string matching is performed between the minor information (such as the minors' names) and the judgment result text to obtain the position indexes of all minor information in the judgment result text.
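The string-matching step can be sketched with repeated substring search. The function name `locate_minors` and the sample judgment text are illustrative assumptions.

```python
def locate_minors(names, judgment_text):
    """Return every position index at which each minor's name occurs."""
    positions = {}
    for name in names:
        found = []
        start = judgment_text.find(name)
        while start != -1:
            found.append(start)
            start = judgment_text.find(name, start + 1)
        positions[name] = found
    return positions


judgment_text = "婚生女刘小某由原告抚养，被告每月支付刘小某抚养费1000元。"
print(locate_minors(["刘小某"], judgment_text))
```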

For the fostering allocation elements of each child, the corresponding information is extracted using the extraction rules of the corresponding element nodes. It is determined whether the fostering party node has an extraction result. If so, fostering party information exists in the judgment result; the extraction result of the fostering party node is obtained, and a party's role label or name, such as plaintiff or defendant, is extracted from it as the element value of the fostering party.

In some embodiments, the method of the present application further comprises: generating a data object whose field names are the element node names and/or child node names corresponding to the target element, and assigning values to the data object using the element values corresponding to those element node names and/or child node names.

For nodes that take the first preset value or the second preset value as their element value, such as child nodes under a dispute focus category, child nodes under a child other condition element, or child nodes under a judgment result element, the data object containing the name of the corresponding child node is assigned the first preset value or the second preset value.

In this embodiment, the assigned data objects are the structured results of the divorce dispute referee document. Each data object includes one or more information pairs of the form key (field name): value (field value), such as "child gender: female", "whether minor: yes", "willing fostering party: plaintiff", and "house belongs to defendant: yes".

In some embodiments, the field value in the data object corresponding to the field name may be a null value.
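A data object of this kind could be assembled as in the sketch below. The flat dictionary layout, the "node: label" field-name convention, and the function name `build_data_object` are assumptions; `None` represents a null field value.

```python
import json


def build_data_object(element_values, label_values):
    """Merge plain element values and per-label preset values into one object.

    element_values maps element node names to extracted values (None = null).
    label_values maps element node names to {child node name: preset value}.
    """
    data_object = dict(element_values)
    for node_name, labels in label_values.items():
        for label, value in labels.items():
            data_object[f"{node_name}: {label}"] = value
    return data_object


record = build_data_object(
    {"child gender": "female", "child name": None},
    {"whether minor": {"yes": True, "no": False}},
)
print(json.dumps(record, ensure_ascii=False, indent=2))
```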

As can be seen from the above embodiments, the present application provides a method for extracting elements from a divorce dispute referee document. The method first cuts the divorce dispute referee document into a plurality of text blocks according to a preset referee document directory structure, where each text block corresponds to one directory title in the referee document directory structure. It then selects, from the plurality of text blocks, the target text block corresponding to the target element to be extracted according to the directory title corresponding to each text block, and finally extracts the element values of different target elements from different target text blocks. With this method, complete document elements can be extracted from a divorce dispute referee document, so that relevant personnel can conveniently understand the document through its elements.

Corresponding to the method for extracting elements from a divorce dispute referee document provided by the above embodiments, the present application also provides a device for extracting elements from a divorce dispute referee document. As shown in fig. 2, the device includes:

a text acquisition module 201, configured to acquire a divorce dispute referee document; a text segmentation module 202, configured to cut the divorce dispute referee document into a plurality of text blocks according to a preset referee document directory structure, where the referee document directory structure includes a plurality of directory titles and each text block corresponds to one directory title; a selection module 203, configured to select, from the plurality of text blocks, a target text block corresponding to a target element to be extracted according to the directory title corresponding to each text block; and an element extraction module 204, configured to extract an element value of the target element from the selected target text block.

In some embodiments, the selection module 203 is specifically configured to: acquire a preset correspondence between directory titles and document elements; select the directory title corresponding to the target element according to the preset correspondence; and determine the target text block corresponding to the target element according to the directory title corresponding to the target element.

In some embodiments, the element extraction module 204 is specifically configured to: acquire an element tree corresponding to the target elements, where the element tree includes an element node corresponding to each target element and the extraction rule of each element node; and extract, by using the element nodes in the element tree, the element value of the target element corresponding to each element node from the corresponding target text block.
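The cooperation of the segmentation, selection, and extraction modules can be sketched end to end as follows. The marker phrases used to find directory titles, the title-to-element correspondence, the regular-expression extraction rule, and all function names are illustrative assumptions; the application does not specify how directory titles are located in the text.

```python
import re

# Assumed marker phrase that opens the text block for each directory title.
DIRECTORY_MARKERS = {
    "litigation party request": "提出诉讼请求",   # "filed litigation requests"
    "judgment result": "判决如下",                 # "judgment is as follows"
}
# Assumed preset correspondence between document elements and directory titles.
TITLE_FOR_ELEMENT = {
    "litigation request": "litigation party request",
    "judgment result": "judgment result",
}


def segment(document):
    """Text segmentation module: one text block per directory title found."""
    hits = sorted(
        (document.find(marker), title)
        for title, marker in DIRECTORY_MARKERS.items()
        if marker in document
    )
    blocks = {}
    for i, (pos, title) in enumerate(hits):
        end = hits[i + 1][0] if i + 1 < len(hits) else len(document)
        blocks[title] = document[pos:end]
    return blocks


def select(blocks, target_element):
    """Selection module: look up the block via the preset correspondence."""
    return blocks.get(TITLE_FOR_ELEMENT[target_element])


def extract(block, rule):
    """Element extraction module: apply one extraction rule to one block."""
    match = rule.search(block) if block else None
    return match.group(1) if match else None


document = "原告向本院提出诉讼请求：判令离婚。本院判决如下：准予离婚。"
blocks = segment(document)
target = select(blocks, "judgment result")
print(extract(target, re.compile(r"判决如下[:：](.+?)。")))
```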

In some embodiments, the element node corresponding to the target element has at least two child nodes, each child node corresponding to one category label of the target element, and the device further comprises a classification module configured to: classify the element value of the target element to obtain the category label hit by the element value; use the first preset value as the element value of the child node corresponding to the hit category label; and use the second preset value as the element value of each child node corresponding to a missed category label.

In some embodiments, the device further comprises a data object generation module, configured to: generate a data object whose field names are the element node names and/or child node names corresponding to the target element; and assign values to the data object using the element values corresponding to those element node names and/or child node names.

In specific implementation, the present invention further provides a computer storage medium, wherein the computer storage medium may store a program, and the program may include some or all of the steps in the embodiments of the method provided by the present invention when executed. The storage medium may be a magnetic disk, an optical disk, a read-only memory (ROM) or a Random Access Memory (RAM).

Those skilled in the art will readily appreciate that the techniques of the embodiments of the present invention may be implemented as software plus a required general purpose hardware platform. Based on such understanding, the technical solutions in the embodiments of the present invention may be essentially or partially implemented in the form of a software product, which may be stored in a storage medium, such as ROM/RAM, magnetic disk, optical disk, etc., and includes several instructions for enabling a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the method according to the embodiments or some parts of the embodiments.

For identical or similar parts, the various embodiments in this specification may be referred to one another. In particular, since the device embodiment is substantially similar to the method embodiment, its description is brief, and the relevant points can be found in the description of the method embodiment.

The above-described embodiments of the present invention should not be construed as limiting the scope of the present invention.
