Multi-mode-fused resume layout analysis method and device

Document No.: 1922109 Publication date: 2021-12-03

Abstract: The invention, 一种融合多模态的简历版面分析方法及装置 (Multi-mode-fused resume layout analysis method and device), was designed by 于兴文 on 2021-08-17. It discloses a multimodal-fusion resume layout analysis method and device. The method takes a resume image as the data source, extracts the text line regions in the image to obtain text position information, and recognizes the text content in those regions to obtain natural-language text information. The natural-language information is used as input to generate text content encodings, and the position information is used as input to generate text position encodings. An attention mechanism then computes a text-to-text attention matrix and a text-to-relative-position attention matrix, and a result matrix generated from these attention matrices yields the structured resume. The invention builds a deep learning model around relative position attention encoding that fuses natural-language information with position information. Because the model considers each piece of text together with its position and has a receptive field covering the whole resume, it can make effective use of context, which supports the accuracy of the algorithm.

1. A multimodal-fusion resume layout analysis method, characterized by comprising the following steps:

(1) taking a resume image as the data source, extracting the text line regions in the resume image using the CRAFT algorithm, generating text boxes, and obtaining the text position information of the resume image;

(2) taking the resume image as the data source, recognizing the text content in the resume image using the CRNN algorithm to obtain natural-language text information;

(3) taking the natural-language text information as input, encoding it with the ALBERT language model to obtain text content encodings;

(4) taking the text position information as input, generating position encodings of the text using sinusoidal position encoding;

(5) fusing the text content encodings and the text position encodings using an attention mechanism; setting the categories of resume content; classifying each text box; determining the line relationships between text boxes so that fragmented text boxes belonging to the same line are merged into one line; decoding the line-relationship results and the text box categories using the attention mechanism; and outputting the category to which each text line belongs, thereby obtaining the structured resume.

2. The multimodal-fusion resume layout analysis method according to claim 1, wherein the text box is the upright (axis-aligned) bounding rectangle of the text contour, which the CRAFT algorithm generates from the character position information and the association (affinity) information between characters.

3. The multimodal-fusion resume layout analysis method according to claim 1, wherein step (4) specifically comprises: constructing a position encoding dictionary using sinusoidal position encoding, in which the positions from 0 to p, p positions in total, are each converted into an i-dimensional vector; generating the absolute position encodings of the text boxes; computing the relative positions between text boxes using a softsign function; and generating the relative position encodings between text boxes.

4. The method of claim 3, wherein the softsign function is:

where x is the distance between the two text boxes.

5. The multimodal-fusion resume layout analysis method according to claim 3, wherein p is a positive integer preset according to the amount of content information in the resume layout, and i is 312.

6. The multimodal-fusion resume layout analysis method according to claim 1, wherein step (5) specifically comprises:

(5.1) fusing the text content encoding with the absolute text position information: $arr_1 = txt + abs$, where $txt$ is the text content encoding matrix, $abs$ is the absolute text position information matrix, and $arr_1$ is an intermediate matrix;

(5.2) further fusing the relative position information matrix: $arr_2 = arr_1 \cdot pos^T$, where $pos$ is the relative position information matrix between texts and $arr_2$ is the attention matrix;

(5.3) passing the attention matrix through linear transformations to generate $out_1$ and $out_2$, where $out_1$ is used to determine the category of the current text box and $out_2$ is used to determine whether text boxes belong to the same line;

(5.4) arranging the text boxes of each line from left to right, outputting the category of the line as the mode (most frequent value) of the categories of its text boxes, and outputting the structured resume.

7. The multimodal-fusion resume layout analysis method according to claim 1, wherein the categories of resume content include text categories related to the resume, such as basic information, work experience, education experience, training experience, and project experience.

8. A multimodal-fusion resume layout analysis apparatus, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the computer program, when loaded into the processor, implements the multimodal-fusion resume layout analysis method of any one of claims 1 to 7.

Technical Field

The invention relates to the technical field of natural language processing, and in particular to a multimodal-fusion resume layout analysis method and device.

Background

Resume layout analysis means structuring the content of a resume. A conventional resume generally contains several sections, such as basic information, job-seeking intention, education experience, work experience, and project experience. The text in a resume image or PDF document needs to be assigned automatically to the corresponding section, which provides a basis for human resources management and candidate matching.

The industry currently implements this technique in two main ways. The first is based on natural-language data: a large number of complex text parsing rules are designed to extract the information for each section. This approach requires many rules to be written by hand for different resume content, which is laborious. Resume formats are also highly varied, and each job seeker may use a different template, so it is hard to design rules that cover all resumes. The second uses natural-language analysis algorithms to parse the resume content, typically classifying the natural-language information in the resume with a deep text classification algorithm. Compared with the first approach, it is more intelligent: it does not require a large number of hand-written rules and can be trained automatically on the structure of resumes. It also has drawbacks. First, it depends heavily on the accuracy of the natural-language information. Most resumes are stored as Word or PDF files, and extracting their text accurately requires a mature text parsing tool; otherwise the extracted text is often out of order, duplicated, or incomplete, which increases project development cost. Second, the algorithm relies on natural-language information alone, yet a passage such as 'I worked very diligently at XX' could be classified as self-evaluation, work experience, or project experience. Classifying such text accurately requires also introducing its position in the resume and its context.

Disclosure of Invention

Purpose of the invention: the invention mainly addresses two problems: the low quality of the natural-language data extracted from resumes, and the fact that algorithm models relying on a single form of data are hard to fit and hard to make more accurate.

Technical solution: a multimodal-fusion resume layout analysis method, comprising the following steps:

(1) taking a resume image as the data source, extracting the text line regions in the resume image using the CRAFT algorithm, generating text boxes, and obtaining the text position information of the resume image;

(2) taking the resume image as the data source, recognizing the text content in the resume image using the CRNN algorithm to obtain natural-language text information;

(3) taking the natural-language text information as input, encoding it with the ALBERT language model to obtain text content encodings;

(4) taking the text position information as input, generating position encodings of the text using sinusoidal position encoding;

(5) fusing the text content encodings and the text position encodings using an attention mechanism; setting the categories of resume content; classifying each text box; determining the line relationships between text boxes so that fragmented text boxes belonging to the same line are merged into one line; decoding the line-relationship results and the text box categories using the attention mechanism; and outputting the category to which each text line belongs, thereby obtaining the structured resume.

The text box is the upright (axis-aligned) bounding rectangle of the text contour, which the CRAFT algorithm generates from the character position information and the association (affinity) information between characters.

Specifically, step (4) constructs a position encoding dictionary using sinusoidal position encoding, in which the positions from 0 to p, p positions in total, are each converted into an i-dimensional vector; generates the absolute position encodings of the text boxes; computes the relative positions between text boxes using a softsign function; and generates the relative position encodings between text boxes.

The softsign function is as follows:

where x is the distance between the two text boxes.

Here p is a positive integer preset according to the amount of content information in the resume layout, and i is 312.

Step (5) specifically comprises:

(5.1) fusing the text content encoding with the absolute text position information: $arr_1 = txt + abs$, where $txt$ is the text content encoding matrix, $abs$ is the absolute text position information matrix, and $arr_1$ is an intermediate matrix;

(5.2) further fusing the relative position information matrix: $arr_2 = arr_1 \cdot pos^T$, where $pos$ is the relative position information matrix between texts and $arr_2$ is the attention matrix;

(5.3) passing the attention matrix through linear transformations to generate $out_1$ and $out_2$, where $out_1$ is used to determine the category of the current text box and $out_2$ is used to determine whether text boxes belong to the same line;

(5.4) arranging the text boxes of each line from left to right, outputting the category of the line as the mode (most frequent value) of the categories of its text boxes, and outputting the structured resume.

The categories of resume content include basic information, work experience, education experience, training experience, project experience, and other text categories related to the resume.

A multimodal-fusion resume layout analysis apparatus comprises a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the computer program, when loaded into the processor, implements the multimodal-fusion resume layout analysis method of any one of claims 1 to 7.

Beneficial effects: compared with the prior art, the invention has the following notable advantages:

Advantages of the algorithm model. The model takes the content and position of each text box as input, and outputs whether text boxes belong to the same line and the category of each text box. It combines several inputs, is a multi-task learning model with multimodal fusion, and improves the working efficiency of the model.

Advantages of the algorithm steps. In typical resume processing projects, text is extracted from resume PDF and Word files and only the natural language is processed. The source of the resume text is not accounted for, garbled or disordered characters occur easily, and a large amount of work is needed to handle these abnormal characters. The detailed position of the text within the resume is also discarded during text extraction from PDF and Word files, even though this information is useful. Using OCR to obtain both the text and its position can greatly reduce project development cost and development complexity. The invention introduces OCR text detection and text recognition as the means of obtaining high-accuracy natural-language information, and builds a deep network model that fuses natural-language information with position information. The model contains ALBERT natural-language encoding, sinusoidal relative position encoding, and an attention mechanism, realizing the fusion of multimodal information.

Drawings

FIG. 1 is a flow chart of the present invention.

Detailed Description

The technical solution of the invention is further described below with reference to the accompanying drawing.

Example 1:

As shown in FIG. 1, a multimodal-fusion resume layout analysis method comprises the following steps:

(1) taking a resume image as the data source, extracting the text line regions in the resume image using the CRAFT algorithm, generating text boxes, and obtaining the text position information of the resume image;

(2) taking the resume image as the data source, recognizing the text content in the resume image using the CRNN algorithm to obtain natural-language text information;

(3) taking the natural-language text information as input, encoding it with the ALBERT language model to obtain text content encodings;

(4) taking the text position information as input, generating position encodings of the text using sinusoidal position encoding;

(5) fusing the text content encodings and the text position encodings using an attention mechanism; setting the categories of resume content; classifying each text box; determining the line relationships between text boxes so that fragmented text boxes belonging to the same line are merged into one line; decoding the line-relationship results and the text box categories using the attention mechanism; and outputting the category to which each text line belongs, thereby obtaining the structured resume.

The text box is the upright (axis-aligned) bounding rectangle of the text contour, which the CRAFT algorithm generates from the character position information and the association (affinity) information between characters.
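The relationship between a detected text contour and its upright bounding rectangle can be illustrated with a minimal sketch. It assumes the detector returns each contour as a list of (x, y) points; the function name `upright_bounding_box` and the sample coordinates are hypothetical and are not taken from the CRAFT implementation.

```python
from typing import Iterable, Tuple

Point = Tuple[float, float]


def upright_bounding_box(contour: Iterable[Point]) -> Tuple[float, float, float, float]:
    """Return (x_min, y_min, x_max, y_max) of the axis-aligned box enclosing the contour."""
    points = list(contour)
    if not points:
        raise ValueError("contour must contain at least one point")
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return min(xs), min(ys), max(xs), max(ys)


# A slightly tilted quadrilateral, as a text detector might return for one text region
# (made-up coordinates for illustration only).
quad = [(12.0, 30.5), (118.0, 28.0), (119.5, 52.0), (13.0, 54.5)]
box = upright_bounding_box(quad)
```

The resulting box discards the tilt of the contour, which is acceptable for scanned resumes where text lines are close to horizontal.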

Specifically, step (4) constructs a position encoding dictionary using sinusoidal position encoding, in which the positions from 0 to p, p positions in total, are each converted into an i-dimensional vector; generates the absolute position encodings of the text boxes; computes the relative positions between text boxes using a softsign function; and generates the relative position encodings between text boxes.

The softsign function is as follows:

where x is the distance between the two text boxes.

Here p is a positive integer preset according to the amount of content information in the resume layout, and i is 312.
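A position encoding dictionary of this kind could be built as in the following sketch. The text does not give the exact sinusoidal formula, so the sketch assumes the widely used Transformer-style definition, in which even dimensions use a sine and odd dimensions a cosine of the position divided by a power of 10000. The function name `sinusoidal_table` and the choice of 1000 positions are illustrative assumptions.

```python
import math
from typing import List


def sinusoidal_table(num_positions: int, dim: int) -> List[List[float]]:
    """Build a lookup table with one dim-dimensional sinusoidal vector per position."""
    table = []
    for pos in range(num_positions):
        row = []
        for k in range(dim):
            # Dimensions 2j and 2j+1 share one frequency (assumed Transformer-style formula).
            angle = pos / (10000 ** (2 * (k // 2) / dim))
            row.append(math.sin(angle) if k % 2 == 0 else math.cos(angle))
        table.append(row)
    return table


# p = 1000 positions and i = 312 dimensions; p = 1000 is only an example value.
table = sinusoidal_table(num_positions=1000, dim=312)
abs_a = table[1]  # absolute position encoding for a text box at position 1
abs_b = table[3]  # absolute position encoding for a text box at position 3
```

Because the table is computed once and then indexed, looking up the encoding of a text box is a constant-time operation at inference time.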

To aid understanding, the position encoding process in step (4) is illustrated with an example:

Assume that the position of the first text box A is 1, the position of the second text box B is 3, and p is 1000.

A position encoding dictionary is constructed using sinusoidal position encoding: each of the 1000 positions is converted into a 312-dimensional vector (position 1 becomes one 312-dimensional vector, and position 3 another), giving 1000 vectors of 312 dimensions in total.

Look up the absolute positions of the text boxes: the position of text box A is the vector in the absolute position encoding dictionary corresponding to 1, and the position of text box B is the vector in the absolute position encoding dictionary corresponding to 3.

Look up the relative position between the text boxes: the distance between A and B is 2. This relative distance is passed through the softsign function; because the softsign function used here has a limit of 1000, any distance is compressed to within 1000. After softsign processing, 2 becomes 2.02, which is rounded down to 2, and this value is looked up in the table to obtain the relative position encoding between the text boxes.
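The idea of compressing an arbitrary distance into a bounded table index can be sketched as follows. The exact softsign formula used by the invention is not reproduced in this text, so the sketch starts from the standard softsign, x / (1 + |x|), whose limit is 1, and scales it so that its limit becomes p. This scaled form, the function names, and the clamping step are assumptions, and the intermediate values it produces need not match those quoted in the example above.

```python
import math


def softsign(x: float) -> float:
    """Standard softsign: maps any real x into the interval (-1, 1)."""
    return x / (1.0 + abs(x))


def scaled_softsign(x: float, limit: float) -> float:
    """Hypothetical scaled variant: close to x when |x| << limit, saturating at +/-limit."""
    return limit * softsign(x / limit)


def relative_index(pos_a: int, pos_b: int, limit: int = 1000) -> int:
    """Compress the distance between two box positions into an index in [0, limit - 1]."""
    distance = abs(pos_b - pos_a)
    compressed = math.floor(scaled_softsign(distance, limit))
    # Clamp so that floating-point saturation can never index past the table.
    return min(compressed, limit - 1)


near = relative_index(1, 3)       # two text boxes two positions apart
far = relative_index(0, 10**9)    # an extreme distance still yields a valid index
```

The index can then be used to look up a relative position encoding in the same kind of dictionary that holds the absolute encodings. Whatever the exact scaling, the useful property is that the index stays inside the table for any distance.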

Step (5) specifically comprises:

(5.1) fusing the text content encoding with the absolute text position information: $arr_1 = txt + abs$, where $txt$ is the text content encoding matrix, $abs$ is the absolute text position information matrix, and $arr_1$ is an intermediate matrix;

(5.2) further fusing the relative position information matrix: $arr_2 = arr_1 \cdot pos^T$, where $pos$ is the relative position information matrix between texts and $arr_2$ is the attention matrix;

(5.3) passing the attention matrix through linear transformations to generate $out_1$ and $out_2$, where $out_1$ is used to determine the category of the current text box and $out_2$ is used to determine whether text boxes belong to the same line;

(5.4) arranging the text boxes of each line from left to right, outputting the category of the line as the mode (most frequent value) of the categories of its text boxes, and outputting the structured resume.
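Steps (5.1) to (5.3) can be traced on toy matrices with the sketch below. It follows the formulas literally and reads the product in (5.2) as a matrix product. The matrix shapes (n text boxes by d dimensions for the content, absolute, and relative position matrices), the weight matrices `w_cls` and `w_line`, and all numeric values are assumptions for illustration. A trained model would learn the weights and would typically add normalization such as a softmax.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def add(a: Matrix, b: Matrix) -> Matrix:
    """Element-wise sum of two matrices of the same shape."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def transpose(m: Matrix) -> Matrix:
    return [list(col) for col in zip(*m)]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product; a is (r x k), b is (k x c)."""
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]


def fuse(txt: Matrix, abs_pos: Matrix, rel_pos: Matrix,
         w_cls: Matrix, w_line: Matrix) -> Tuple[Matrix, Matrix, Matrix]:
    arr1 = add(txt, abs_pos)                 # (5.1) content + absolute position
    arr2 = matmul(arr1, transpose(rel_pos))  # (5.2) attention matrix, shape n x n
    out1 = matmul(arr2, w_cls)               # (5.3) per-box category scores
    out2 = matmul(arr2, w_line)              # (5.3) pairwise same-line scores
    return arr2, out1, out2


# Toy example: n = 2 text boxes, d = 3 dimensions, 3 categories (made-up numbers).
txt = [[1.0, 0.0, 2.0], [0.0, 1.0, 1.0]]
abs_pos = [[0.0, 1.0, 0.0], [1.0, 0.0, 0.0]]
rel_pos = [[1.0, 0.0, 0.0], [0.0, 1.0, 1.0]]
w_cls = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]   # hypothetical weights, n x categories
w_line = [[1.0, 0.0], [0.0, 1.0]]            # hypothetical weights, n x n
arr2, out1, out2 = fuse(txt, abs_pos, rel_pos, w_cls, w_line)
```

Sharing one attention matrix between the two output heads is what makes the model multi-task: the same fused representation feeds both the category decision and the same-line decision.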

The categories of resume content include basic information, work experience, education experience, training experience, project experience, and other text categories related to the resume.
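The line-level output of step (5.4) can be sketched as a left-to-right sort followed by a majority vote over these categories. The `TextBox` fields, the function name `summarize_line`, and the sample boxes are hypothetical. Ties are resolved in favor of the category seen first, which is a choice made for this sketch and is not specified in the text.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class TextBox:
    x_min: float    # left edge of the box, used for left-to-right ordering
    text: str       # recognized text content
    category: str   # category predicted for this box


def summarize_line(boxes: List[TextBox]) -> Tuple[str, str]:
    """Join the boxes of one line left to right and pick the line's most frequent category."""
    if not boxes:
        raise ValueError("a line must contain at least one text box")
    ordered = sorted(boxes, key=lambda b: b.x_min)
    line_text = " ".join(b.text for b in ordered)
    # most_common(1) returns the mode; ties go to the category encountered first.
    line_category = Counter(b.category for b in ordered).most_common(1)[0][0]
    return line_text, line_category


line = [
    TextBox(200.0, "Engineer", "work experience"),
    TextBox(10.0, "2019-2021", "work experience"),
    TextBox(90.0, "XX Company", "project experience"),
]
line_text, line_category = summarize_line(line)
```

Voting at line level lets a single misclassified box be corrected by its neighbors on the same line.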

Example 2:

A multimodal-fusion resume layout analysis apparatus comprises a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the computer program, when loaded into the processor, implements the multimodal-fusion resume layout analysis method of any one of claims 1 to 7.
