Test paper content extraction method, test paper matching method, device, equipment and medium

Document No.: 1613974 Publication date: 2020-01-10

Summary: This technology, "Test paper content extraction method, test paper matching method, device, equipment and medium" (试卷内容提取方法、试卷匹配方法、装置、设备以及介质), was created by 朱达华, 徐宋传 and 陈晓宇 on 2019-09-17. The invention relates to the field of computer technology, in particular to a test paper content extraction method, a test paper matching method, a device, equipment and a medium. The test paper content extraction method comprises the following steps: S10: if document test questions are obtained, obtaining a test question document from the document test questions; S20: obtaining a document content file from the test question document, wherein the document content file is a file in XML format; S30: traversing the document content file and obtaining document paragraph data from it; S40: obtaining the text content of each piece of document paragraph data and forming that text content into a corresponding paragraph object; S50: adding the paragraph objects to a set plist and taking the set plist as the test paper content set. The invention can quickly obtain the content of a test question document and can extract specific test questions from that content.

1. A test paper content extraction method, characterized by comprising the following steps:

S10: if document test questions are obtained, obtaining a test question document from the document test questions;

S20: obtaining a document content file from the test question document, wherein the document content file is a file in XML format;

S30: traversing the document content file and obtaining document paragraph data from the document content file;

S40: obtaining the text content of each piece of document paragraph data, and forming the text content of each piece of document paragraph data into a corresponding paragraph object;

S50: adding the paragraph objects to a set plist, and taking the set plist as the test paper content set.

2. The test paper content extraction method according to claim 1, wherein step S20 includes:

S21: obtaining the document format of the test question document;

S22: checking the compatibility of the document format, and if the document format is found to be incompatible, converting it into a compatible format;

S23: obtaining the document content file from the test question document in the compatible format.

3. A test paper matching method, characterized by comprising the following steps:

S60: obtaining a preset matching rule, and traversing a test paper content set according to the matching rule to obtain major-question paragraph data, wherein the test paper content set is obtained by the test paper content extraction method of any one of claims 1-2;

S70: obtaining the corresponding question-type description information from the major-question paragraph data, and obtaining the corresponding sub-question paragraph data according to the question-type description information;

S80: composing the major-question paragraph data and the sub-question paragraph data into a replacement file;

S90: replacing the document content file in the test paper content set with the replacement file to obtain a test question file.

4. The test paper matching method according to claim 3, wherein, after step S60 and before step S70, the test paper matching method further comprises:

S61: if the major-question paragraph data is the first major-question paragraph data matched, and that first-matched major-question paragraph data is not the first paragraph object of the test paper content set, obtaining a title matching rule from the matching rule;

S62: obtaining the object sequence number of the first-matched major-question paragraph data, and obtaining from the test paper content set the paragraph objects whose sequence numbers are smaller than that object sequence number;

S63: matching those paragraph objects against the title matching rule, and if the match succeeds, taking the matching result as the test paper title.

5. The test paper matching method according to claim 3, wherein step S70 includes:

S71: obtaining, from the matching rule and according to the question-type description information, the sub-question matching rule corresponding to each piece of major-question paragraph data;

S72: traversing the test paper content set according to the sub-question matching rule to obtain a sub-question list;

S73: traversing all sub-question objects in the sub-question list, and assigning each sub-question object to its corresponding major-question object to obtain the sub-question paragraph data.

6. A test paper content extraction device, characterized by comprising:

a test question acquisition module, configured to obtain a test question document from document test questions if the document test questions are obtained;

a content acquisition module, configured to obtain a document content file from the test question document, wherein the document content file is a file in XML format;

a paragraph traversal module, configured to traverse the document content file and obtain document paragraph data from the document content file;

an object acquisition module, configured to obtain the text content of each piece of document paragraph data and form the text content of each piece of document paragraph data into a corresponding paragraph object;

and an object adding module, configured to add the paragraph objects to a set plist and take the set plist as the test paper content set.

7. A test paper matching apparatus, characterized in that the test paper matching apparatus comprises:

a major-question paragraph traversal module, configured to obtain a preset matching rule and traverse a test paper content set according to the matching rule to obtain major-question paragraph data, wherein the test paper content set is obtained by the test paper content extraction method of any one of claims 1-2;

a sub-question paragraph acquisition module, configured to obtain the corresponding question-type description information from the major-question paragraph data and obtain the corresponding sub-question paragraph data according to the question-type description information;

a replacement file acquisition module, configured to compose the major-question paragraph data and the sub-question paragraph data into a replacement file;

and a replacement module, configured to replace the document content file in the test paper content set with the replacement file to obtain a test question file.

8. The test paper matching apparatus of claim 7, wherein the test paper matching apparatus further comprises:

a matching sub-module, configured to obtain a title matching rule from the matching rule if the major-question paragraph data is the first major-question paragraph data matched and that first-matched major-question paragraph data is not the first paragraph object of the test paper content set;

an object acquisition sub-module, configured to obtain the object sequence number of the first-matched major-question paragraph data and obtain from the test paper content set the paragraph objects whose sequence numbers are smaller than that object sequence number;

and a title matching sub-module, configured to match those paragraph objects against the title matching rule and, if the match succeeds, take the matching result as the test paper title.

9. A computer device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, characterized in that the processor implements the steps of the test paper content extraction method according to any one of claims 1 to 2 when executing the computer program or implements the steps of the test paper matching method according to any one of claims 3 to 5 when executing the computer program.

10. A computer-readable storage medium, in which a computer program is stored, which computer program, when being executed by a processor, carries out the steps of the test paper content extraction method according to any one of claims 1 to 2; alternatively, the computer program realizes the steps of the test paper matching method according to any one of claims 3 to 5 when executed by a processor.

Technical Field

The present invention relates to the field of computer technologies, and in particular, to a method for extracting test paper content, a method for matching test paper, an apparatus, a device, and a medium.

Background

At present, in schools, and especially for high-school students preparing for the college entrance examination, students sit a large number of examinations or work through a large number of test questions so that they encounter more questions and improve their results.

When a teacher sets a paper, the questions must be selected from a large number of question banks or test papers and then assembled into a new test paper or exercise sheet. A question bank is usually created by splitting the questions out of existing test papers and storing the split questions. However, splitting an existing document test paper requires manual marking, which involves considerable manual work, is tedious and is error-prone, so there is room for improvement.

Disclosure of Invention

The invention aims to provide a test paper content extraction method for rapidly acquiring test question document content.

The above object of the present invention is achieved by the following technical solutions:

A test paper content extraction method comprises the following steps:

S10: if document test questions are obtained, obtaining a test question document from the document test questions;

S20: obtaining a document content file from the test question document, wherein the document content file is a file in XML format;

S30: traversing the document content file and obtaining document paragraph data from the document content file;

S40: obtaining the text content of each piece of document paragraph data, and forming the text content of each piece of document paragraph data into a corresponding paragraph object;

S50: adding the paragraph objects to a set plist, and taking the set plist as the test paper content set.

With this technical solution, a document content file in XML format is obtained from the test question document in the document test questions, so the tags of the document paragraphs can be read from the test question document. The corresponding paragraphs, and the document paragraph data in each paragraph, can then be obtained from those paragraph tags, which makes it easy to read the paragraph object from each piece of document paragraph data. Meanwhile, the paragraph objects are added to the set plist, so the preset rules for obtaining specific test questions from the test question document can be stored in plist file form; specific test questions can then be obtained from the test question document automatically, which facilitates recognizing and splitting the test question document.
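Steps S20 to S50 can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes the test question document is a .docx package whose XML content file is `word/document.xml` (the source only says the content file is in XML format), and the `Paragraph` class, `extract_plist` and `make_sample_docx` are hypothetical names introduced for the example.

```python
import io
import zipfile
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from xml.sax.saxutils import escape

# WordprocessingML namespace used by word/document.xml inside a .docx package.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


@dataclass
class Paragraph:
    """Hypothetical paragraph object: sequence number plus plain text."""
    index: int
    text: str


def extract_plist(docx_bytes: bytes) -> list[Paragraph]:
    """Read the XML content file and collect one object per paragraph."""
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as package:
        root = ET.fromstring(package.read("word/document.xml"))  # S20

    plist: list[Paragraph] = []
    for p in root.iter(f"{{{W_NS}}}p"):  # S30: every <w:p> paragraph element
        # S40: the paragraph text is the concatenation of its <w:t> runs.
        text = "".join(t.text or "" for t in p.iter(f"{{{W_NS}}}t"))
        plist.append(Paragraph(index=len(plist), text=text))  # S50
    return plist


def make_sample_docx(lines: list[str]) -> bytes:
    """Build a minimal in-memory package, for demonstration only."""
    body = "".join(
        f"<w:p><w:r><w:t>{escape(line)}</w:t></w:r></w:p>" for line in lines
    )
    xml = f'<w:document xmlns:w="{W_NS}"><w:body>{body}</w:body></w:document>'
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as package:
        package.writestr("word/document.xml", xml)
    return buffer.getvalue()


sample = make_sample_docx(["Midterm Exam", "I. Multiple choice", "1. 2 + 2 = ?"])
for paragraph in extract_plist(sample):
    print(paragraph.index, paragraph.text)
```

Keeping the sequence number on each paragraph object is a design choice made here so that later matching steps can refer to paragraphs by position.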

The invention is further configured such that step S20 includes:

S21: obtaining the document format of the test question document;

S22: checking the compatibility of the document format, and if the document format is found to be incompatible, converting it into a compatible format;

S23: obtaining the document content file from the test question document in the compatible format.

By adopting the technical scheme, the compatibility of the document is judged, and the format of the test question document in the incompatible format is converted, so that the accuracy in obtaining the document content file can be ensured, and the subsequent extraction and splitting of the test questions are facilitated.
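The compatibility check of steps S21 to S23 might look like the sketch below. The source does not say which formats count as compatible or how conversion is done, so both are assumptions here: `.docx` is treated as the only compatible format, and the converter is passed in as a callable (`fake_convert` is a stand-in that only renames the path and performs no real conversion).

```python
from pathlib import Path
from typing import Callable

# Assumption: only .docx is treated as compatible in this sketch.
COMPATIBLE_SUFFIXES = {".docx"}


def ensure_compatible(path: Path, convert: Callable[[Path], Path]) -> Path:
    """Return a path to a compatible document, converting it if needed."""
    suffix = path.suffix.lower()  # S21: read the document format
    if suffix in COMPATIBLE_SUFFIXES:  # S22: compatibility check
        return path
    converted = convert(path)  # S22: convert an incompatible format
    if converted.suffix.lower() not in COMPATIBLE_SUFFIXES:
        raise ValueError(f"conversion of {path.name} did not yield a compatible format")
    return converted  # S23: the caller extracts the content file from this path


def fake_convert(path: Path) -> Path:
    """Hypothetical stand-in converter: changes the suffix only."""
    return path.with_suffix(".docx")


print(ensure_compatible(Path("exam.doc"), fake_convert))
```

Injecting the converter keeps the check independent of any particular conversion tool.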

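The second object of the invention is to provide a test paper matching method capable of obtaining specific test questions from the test question document content.

A test paper matching method comprises the following steps:

S60: obtaining a preset matching rule, and traversing a test paper content set according to the matching rule to obtain major-question paragraph data, wherein the test paper content set is obtained by the test paper content extraction method described above;

S70: obtaining the corresponding question-type description information from the major-question paragraph data, and obtaining the corresponding sub-question paragraph data according to the question-type description information;

S80: composing the major-question paragraph data and the sub-question paragraph data into a replacement file;

S90: replacing the document content file in the test paper content set with the replacement file to obtain a test question file.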

With this technical solution, a matching rule is preset and, according to that rule, the major-question paragraph data is found by traversing the test paper content set produced by the test paper content extraction method. Matching the major-question paragraph data then allows the sub-questions under each major question, and the content of each sub-question, to be matched in turn. Because the specific test question content in the test paper content set is identified by matching, the questions in the document test paper can be split and stored separately. This reduces manual intervention in splitting test papers, reduces splitting errors, helps teachers build a question bank, and makes it easier for teachers to compose test papers.
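Step S60, together with reading the question-type description used in step S70, can be illustrated with a regular expression. The source does not specify the form of the matching rule, so the pattern below is an assumption: a major-question heading is taken to be a Chinese or Roman numeral, a separator, and then the question type (for example "一、选择题" or "I. Multiple choice"). For brevity the content set is a plain list of strings, and `MAJOR_RULE`, `MajorQuestion` and `match_major_questions` are hypothetical names.

```python
import re
from dataclasses import dataclass

# Assumed matching rule: numeral, separator, then the question-type description.
MAJOR_RULE = re.compile(
    r"^\s*(?P<number>[一二三四五六七八九十]+|[IVX]+)[、.．]\s*(?P<type>.+?)\s*$"
)


@dataclass
class MajorQuestion:
    index: int          # sequence number of the heading paragraph in the content set
    number: str         # the numeral as written in the heading
    question_type: str  # question-type description information


def match_major_questions(plist: list[str]) -> list[MajorQuestion]:
    """Traverse the content set and keep paragraphs that match the heading rule."""
    majors: list[MajorQuestion] = []
    for index, text in enumerate(plist):
        match = MAJOR_RULE.match(text)
        if match:
            majors.append(MajorQuestion(index, match.group("number"), match.group("type")))
    return majors


plist = [
    "Midterm Exam",
    "I. Multiple choice",
    "1. 2 + 2 = ?",
    "A. 3  B. 4",
    "II. Fill in the blanks",
    "2. The capital of France is ____.",
]
for major in match_major_questions(plist):
    print(major.index, major.number, major.question_type)
```

Option lines such as "A. 3  B. 4" are not matched because the rule accepts only the numeral characters listed in the pattern.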

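The invention is further configured such that, after step S60 and before step S70, the test paper matching method further includes:

S61: if the major-question paragraph data is the first major-question paragraph data matched, and that first-matched major-question paragraph data is not the first paragraph object of the test paper content set, obtaining a title matching rule from the matching rule;

S62: obtaining the object sequence number of the first-matched major-question paragraph data, and obtaining from the test paper content set the paragraph objects whose sequence numbers are smaller than that object sequence number;

S63: matching those paragraph objects against the title matching rule, and if the match succeeds, taking the matching result as the test paper title.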

With this technical solution, once the matched major-question paragraph data is determined to be the first major-question paragraph data matched, the title of the document test paper can be matched among the paragraph objects whose object sequence numbers are smaller than that of this major-question paragraph data.
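A sketch of the title lookup in steps S61 to S63 follows. The title matching rule is not given in the source, so `TITLE_RULE` is an assumed keyword pattern, and `match_title` is a hypothetical helper that receives the content set as a list of strings plus the sequence number of the first matched major question.

```python
import re
from typing import Optional

# Assumed title rule: a paragraph that mentions an exam-like keyword.
TITLE_RULE = re.compile(r"exam|test|quiz|试卷|考试|试题", re.IGNORECASE)


def match_title(plist: list[str], first_major_index: int) -> Optional[str]:
    """Look for the test paper title among paragraphs before the first major question."""
    if first_major_index <= 0:  # S61: nothing precedes the first major question
        return None
    for text in plist[:first_major_index]:  # S62: smaller sequence numbers only
        if TITLE_RULE.search(text):  # S63: apply the title matching rule
            return text.strip()
    return None


print(match_title(["  2019 Midterm Exam ", "Name: ____", "I. Multiple choice"], 2))
```

Restricting the search to the paragraphs before the first major question avoids mistaking a question stem for the title.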

The invention is further configured such that step S70 includes:

S71: obtaining, from the matching rule and according to the question-type description information, the sub-question matching rule corresponding to each piece of major-question paragraph data;

S72: traversing the test paper content set according to the sub-question matching rule to obtain a sub-question list;

S73: traversing all sub-question objects in the sub-question list, and assigning each sub-question object to its corresponding major-question object to obtain the sub-question paragraph data.

With this technical solution, the sub-question matching rule selected by the question-type description information is used to match the corresponding sub-question list from each piece of major-question paragraph data, and the sub-question objects traversed from the sub-question list are associated with the corresponding major-question objects, so the specific sub-question content of each major question can be obtained.
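Steps S71 to S73 can be sketched as below, under stated assumptions: sub-question rules are regular expressions keyed by the question-type description, a sub-question starts at a numbered line, and unnumbered lines that follow (such as answer options) belong to the preceding sub-question. `SUB_RULES`, `DEFAULT_SUB_RULE`, the two dataclasses and `attach_sub_questions` are hypothetical names, not taken from the source.

```python
import re
from dataclasses import dataclass, field

# Assumed rules: a numbered stem; fill-in questions must also contain a blank.
DEFAULT_SUB_RULE = re.compile(r"^\s*(\d+)[.．、]")
SUB_RULES = {
    "Fill in the blanks": re.compile(r"^\s*(\d+)[.．、].*_{2,}"),
}


@dataclass
class SubQuestion:
    number: int
    lines: list[str] = field(default_factory=list)


@dataclass
class MajorQuestion:
    question_type: str
    start: int  # sequence number of the heading paragraph
    end: int    # one past the last paragraph of this major question
    subs: list[SubQuestion] = field(default_factory=list)


def attach_sub_questions(plist: list[str], major: MajorQuestion) -> MajorQuestion:
    rule = SUB_RULES.get(major.question_type, DEFAULT_SUB_RULE)  # S71
    for text in plist[major.start + 1 : major.end]:  # S72
        match = rule.match(text)
        if match:
            # S73: the new sub-question object is attached to its major question.
            major.subs.append(SubQuestion(int(match.group(1)), [text]))
        elif major.subs:
            major.subs[-1].lines.append(text)  # continuation, e.g. answer options
    return major


plist = [
    "I. Multiple choice",
    "1. 2 + 2 = ?",
    "A. 3  B. 4",
    "2. 3 + 3 = ?",
    "A. 5  B. 6",
    "II. Fill in the blanks",
    "3. The capital of France is ____.",
]
choice = attach_sub_questions(plist, MajorQuestion("Multiple choice", 0, 5))
for sub in choice.subs:
    print(sub.number, sub.lines)
```

Selecting the rule by question type lets each kind of major question define its own notion of where a sub-question begins.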

The third object of the invention is realized by the following technical scheme:

A test paper content extraction device, the test paper content extraction device comprising:

a test question acquisition module, configured to obtain a test question document from document test questions if the document test questions are obtained;

a content acquisition module, configured to obtain a document content file from the test question document, wherein the document content file is a file in XML format;

a paragraph traversal module, configured to traverse the document content file and obtain document paragraph data from the document content file;

an object acquisition module, configured to obtain the text content of each piece of document paragraph data and form the text content of each piece of document paragraph data into a corresponding paragraph object;

and an object adding module, configured to add the paragraph objects to a set plist and take the set plist as the test paper content set.

The fourth object of the invention is realized by the following technical scheme:

A test paper matching apparatus, the test paper matching apparatus comprising:

a major-question paragraph traversal module, configured to obtain a preset matching rule and traverse a test paper content set according to the matching rule to obtain major-question paragraph data, wherein the test paper content set is obtained by the test paper content extraction method described above;

a sub-question paragraph acquisition module, configured to obtain the corresponding question-type description information from the major-question paragraph data and obtain the corresponding sub-question paragraph data according to the question-type description information;

a replacement file acquisition module, configured to compose the major-question paragraph data and the sub-question paragraph data into a replacement file;

and a replacement module, configured to replace the document content file in the test paper content set with the replacement file to obtain a test question file.

The fifth object of the invention is achieved by the following technical solution:

a computer device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, the processor implementing the steps of the above-mentioned test paper content extraction method when executing the computer program.

The sixth object of the present invention is achieved by the following technical solutions:

a computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the steps of the above-mentioned test paper matching method.

In conclusion, the beneficial technical effects of the invention are as follows:

1. A document content file in XML format is obtained from the test question document in the document test questions, so the tags of the document paragraphs can be read from the test question document. The corresponding paragraphs, and the document paragraph data in each paragraph, can then be obtained from those paragraph tags, which makes it easy to read the paragraph object from each piece of document paragraph data. Meanwhile, the paragraph objects are added to the set plist, so the preset rules for obtaining specific test questions from the test question document can be stored in plist file form; specific test questions can then be obtained from the test question document automatically, which facilitates recognizing and splitting the test question document;

2. A matching rule is preset and, according to that rule, the major-question paragraph data is found by traversing the test paper content set produced by the test paper content extraction method. Matching the major-question paragraph data then allows the sub-questions under each major question, and the content of each sub-question, to be matched in turn. Because the specific test question content in the test paper content set is identified by matching, the questions in the document test paper can be split and stored separately. This reduces manual intervention in splitting test papers, reduces splitting errors, helps teachers build a question bank, and makes it easier for teachers to compose test papers.

Drawings

FIG. 1 is a flow chart of a method for extracting test paper content according to an embodiment of the present invention;

FIG. 2 is a flowchart of the implementation of step S20 in the test paper content extraction method according to an embodiment of the present invention;

FIG. 3 is a flowchart of a method for matching test paper according to an embodiment of the present invention;

FIG. 4 is another flow chart of a method of matching test sheets in an embodiment of the present invention;

FIG. 5 is a flowchart illustrating the implementation of step S70 in the test paper matching method according to an embodiment of the present invention;

FIG. 6 is a schematic block diagram of a test paper content extracting apparatus according to an embodiment of the present invention;

FIG. 7 is a schematic block diagram of a test paper matching apparatus according to an embodiment of the present invention;

FIG. 8 is a schematic diagram of a computer device according to an embodiment of the invention.

Detailed Description

The present invention will be described in further detail with reference to the accompanying drawings.
