Java-based method, device and computer equipment for converting a docx file into an xml file

Document No.: 1544836 Publication date: 2020-01-17

Reading note: This technology, "Java-based method, device and computer equipment for converting a docx file into an xml file", was designed and created by 潘孝 (Pan Xiao) on 2019-09-19. Its main content is as follows: The invention relates to a Java-based method, device and computer equipment for converting a docx file into an xml file. The method comprises: receiving a target docx file; preprocessing the target docx file to obtain an initial xml file and a media file; creating a new target xml file; writing the initial xml file and the media file into the root node of the target xml file; and outputting and saving the target xml file. The scheme is implemented in Java: a received target docx file is preprocessed to obtain an initial xml file and a media file, a new target xml file is created, the initial xml file and the media file are written into its root node, and finally the target xml file is output and saved, thereby converting the docx file into an xml file. The converted xml file has high fidelity, and the scheme is provided as a jar package that can also be embedded in the user's own application, giving higher conversion quality.

1. A Java-based method for converting a docx file into an xml file, characterized by comprising the following steps:

receiving a target docx file;

preprocessing the target docx file to obtain an initial xml file and a media file;

creating a new target xml file;

writing the initial xml file and the media file into a root node of the target xml file;

and outputting and saving the target xml file.

2. The Java-based method for converting a docx file into an xml file according to claim 1, wherein the step of preprocessing the target docx file to obtain an initial xml file and a media file comprises:

decompressing the target docx file by using the POI tool set to obtain a decompressed file;

extracting the initial xml file and the media file from the decompressed file.

3. The Java-based method for converting a docx file into an xml file according to claim 1, wherein the root node of the target xml file is pkg:package.

4. The Java-based method for converting a docx file into an xml file according to claim 3, wherein the step of writing the initial xml file and the media file into the root node of the target xml file comprises:

writing the initial xml file into the root node of the target xml file in the form of pkg:part/pkg:xmlData;

encoding the media file and writing it into the root node of the target xml file in the form of pkg:part/pkg:binaryData.

5. A Java-based apparatus for converting a docx file into an xml file, comprising:

a receiving unit, configured to receive a target docx file;

a preprocessing unit, configured to preprocess the target docx file to obtain an initial xml file and a media file;

a creating unit, configured to create a target xml file;

a writing unit, configured to write the initial xml file and the media file into a root node of the target xml file;

and an output unit, configured to output and save the target xml file.

6. The Java-based apparatus for converting a docx file into an xml file according to claim 5, wherein the preprocessing unit comprises a decompression module and an extraction module;

the decompression module is configured to decompress the target docx file by using the POI tool set to obtain a decompressed file;

the extraction module is configured to extract the initial xml file and the media file from the decompressed file.

7. The Java-based apparatus for converting a docx file into an xml file according to claim 5, wherein the root node of the target xml file is pkg:package.

8. The Java-based apparatus for converting a docx file into an xml file according to claim 5, wherein the writing unit comprises a first writing module and a second writing module;

the first writing module is configured to write the initial xml file into the root node of the target xml file in the form of pkg:part/pkg:xmlData;

and the second writing module is configured to write the encoded media file into the root node of the target xml file in the form of pkg:part/pkg:binaryData.

9. A computer device, characterized in that the computer device comprises a memory storing a computer program and a processor, wherein the processor, when executing the computer program, implements the Java-based method for converting a docx file into an xml file according to any one of claims 1 to 4.

10. A storage medium, characterized in that the storage medium stores a computer program which, when executed by a processor, implements the Java-based method for converting a docx file into an xml file according to any one of claims 1 to 4.

Technical Field

The invention relates to the field of file conversion, and in particular to a Java-based method, device and computer equipment for converting a docx file into an xml file.

Background

Microsoft Word is a word-processing application from Microsoft Corporation and is increasingly used in office automation. With office automation now in very wide use, there is a need to generate Word documents efficiently, reliably and in bulk from large amounts of existing XML (Extensible Markup Language) data.

OOXML is a technical specification developed by Microsoft Corporation for the Office 2007 products, and it became an ECMA standard in December 2006. The docx format is one of the OOXML document formats. At present, only Microsoft's Office SDK supports converting a docx file into an xml file, but the Office SDK depends on the .NET environment, has poor compatibility across Office versions, and cannot be used in a Java environment.

The existing technical solutions have poor compatibility with Office versions. Moreover, when the converted xml file is opened in Office software it is inconsistent with the original file, images show signs of compression, and the file cannot be used as a FreeMarker template. Therefore, there is a need for a Java-based method, apparatus and computer device for converting a docx file into an xml file.

Disclosure of Invention

The invention aims to overcome the defects of the prior art by providing a Java-based method, device and computer equipment for converting a docx file into an xml file.

To achieve the above purpose, the invention adopts the following technical solution: a Java-based method for converting a docx file into an xml file, comprising the following steps:

receiving a target docx file;

preprocessing the target docx file to obtain an initial xml file and a media file;

creating a new target xml file;

writing the initial xml file and the media file into a root node of the target xml file;

and outputting and saving the target xml file.

In a further technical solution, the step of preprocessing the target docx file to obtain the initial xml file and the media file comprises:

decompressing the target docx file by using the POI tool set to obtain a decompressed file;

extracting the initial xml file and the media file from the decompressed file.

In a further technical solution, the root node of the target xml file is pkg:package.

In a further technical solution, the step of writing the initial xml file and the media file into the root node of the target xml file comprises:

writing the initial xml file into the root node of the target xml file in the form of pkg:part/pkg:xmlData;

encoding the media file and writing it into the root node of the target xml file in the form of pkg:part/pkg:binaryData.

The invention also adopts the following technical solution: a Java-based apparatus for converting a docx file into an xml file, comprising:

a receiving unit, configured to receive a target docx file;

a preprocessing unit, configured to preprocess the target docx file to obtain an initial xml file and a media file;

a creating unit, configured to create a target xml file;

a writing unit, configured to write the initial xml file and the media file into a root node of the target xml file;

and an output unit, configured to output and save the target xml file.

The preprocessing unit comprises a decompression module and an extraction module;

the decompression module is configured to decompress the target docx file by using the POI tool set to obtain a decompressed file;

the extraction module is configured to extract the initial xml file and the media file from the decompressed file.

In a further technical solution, the root node of the target xml file is pkg:package.

The writing unit comprises a first writing module and a second writing module;

the first writing module is configured to write the initial xml file into the root node of the target xml file in the form of pkg:part/pkg:xmlData;

and the second writing module is configured to write the encoded media file into the root node of the target xml file in the form of pkg:part/pkg:binaryData.

The invention also adopts the following technical solution: a computer device comprising a memory storing a computer program and a processor, wherein the processor, when executing the computer program, implements the Java-based method for converting a docx file into an xml file described above.

The invention also adopts the following technical solution: a storage medium storing a computer program which, when executed by a processor, implements the Java-based method for converting a docx file into an xml file described above.

Compared with the prior art, the invention has the following beneficial effects: it is implemented in Java; a received target docx file is preprocessed to obtain an initial xml file and a media file, a new target xml file is created, the initial xml file and the media file are written into the root node of the target xml file, and finally the target xml file is output and saved, thereby converting the docx file into an xml file.

The invention is further described below with reference to the accompanying drawings and specific embodiments.

Drawings

To illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings used in describing the embodiments are briefly introduced below. Evidently, the drawings described below show only some embodiments of the present invention, and those skilled in the art can derive other drawings from them without creative effort.

Fig. 1 is a schematic diagram of an application scenario of a Java-based method for converting a docx file into an xml file according to an embodiment of the present invention;

Fig. 2 is a schematic flowchart of a Java-based method for converting a docx file into an xml file according to an embodiment of the present invention;

Fig. 3 is a schematic sub-flowchart of a Java-based method for converting a docx file into an xml file according to an embodiment of the present invention;

Fig. 4 is a schematic sub-flowchart of a Java-based method for converting a docx file into an xml file according to another embodiment of the present invention;

Fig. 5 is a schematic block diagram of a Java-based apparatus for converting a docx file into an xml file according to an embodiment of the present invention;

Fig. 6 is a schematic block diagram of the preprocessing unit of a Java-based apparatus for converting a docx file into an xml file according to an embodiment of the present invention;

Fig. 7 is a schematic block diagram of the writing unit of a Java-based apparatus for converting a docx file into an xml file according to an embodiment of the present invention;

Fig. 8 is a schematic block diagram of a computer device according to an embodiment of the present invention.

Detailed Description

The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

It will be understood that the terms "comprises" and/or "comprising," when used in this specification and the appended claims, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.

It is also to be understood that the terminology used in the description of the invention herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used in the specification of the present invention and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.

It should be further understood that the term "and/or" as used in this specification and the appended claims refers to and includes any and all possible combinations of one or more of the associated listed items.

Referring to Fig. 1 and Fig. 2, Fig. 1 is a schematic diagram of an application scenario of a Java-based method for converting a docx file into an xml file according to an embodiment of the present invention, and Fig. 2 is a schematic flowchart of the method. The method is applied to a server that exchanges data with a terminal: the terminal transmits a target docx file to the server; the server preprocesses the target docx file to obtain an initial xml file and a media file, creates a new target xml file into which the initial xml file and the media file are written, and finally obtains and outputs the target xml file.

As shown in Fig. 2, the method includes the following steps S110 to S150.

S110, receiving a target docx file.

In this embodiment, the target docx file is the document to be converted into an xml file. The terminal first transmits the target docx file to the server, which performs the subsequent processing.

S120, preprocessing the target docx file to obtain an initial xml file and a media file.

In this embodiment, the target docx file is preprocessed to obtain an initial xml file and a media file, which are then converted into the target xml file.

Referring to Fig. 3, in an embodiment, step S120 includes steps S121 and S122.

S121, decompressing the target docx file by using the POI tool set to obtain a decompressed file.

S122, extracting the initial xml file and the media file from the decompressed file.

In one embodiment, the POI tool set, also known as Apache POI, is an open-source library of the Apache Software Foundation that provides APIs for Java programs to read and write files in Microsoft Office formats. The target docx file can be decompressed with the POI tool set to obtain a decompressed file, from which the initial xml file and the media file used to build the target xml file are then extracted.
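The decompression step can be sketched in Java as follows. This is a minimal sketch under stated assumptions, not the patent's implementation: instead of the POI tool set it uses the JDK's `java.util.zip`, relying on the fact that a docx file is a ZIP package, so that it runs without extra dependencies. The class name `DocxUnzipper` is hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical helper: JDK ZIP reading stands in for the POI tool set here.
final class DocxUnzipper {

    /** Reads every file entry of a docx (ZIP) stream into memory, keyed by entry name. */
    static Map<String, byte[]> unzip(InputStream docx) throws IOException {
        Map<String, byte[]> parts = new LinkedHashMap<>(); // preserves archive order
        try (ZipInputStream zip = new ZipInputStream(docx)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if (!entry.isDirectory()) {
                    // readAllBytes() stops at the end of the current entry (Java 9+).
                    parts.put(entry.getName(), zip.readAllBytes());
                }
            }
        }
        return parts;
    }

    public static void main(String[] args) throws IOException {
        // Build a tiny in-memory ZIP that stands in for a real docx file.
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (ZipOutputStream out = new ZipOutputStream(buf)) {
            out.putNextEntry(new ZipEntry("word/document.xml"));
            out.write("<w:document/>".getBytes(StandardCharsets.UTF_8));
            out.closeEntry();
            out.putNextEntry(new ZipEntry("word/media/image1.png"));
            out.write(new byte[] {1, 2, 3});
            out.closeEntry();
        }
        Map<String, byte[]> parts = unzip(new ByteArrayInputStream(buf.toByteArray()));
        System.out.println(parts.keySet()); // [word/document.xml, word/media/image1.png]
    }
}
```

For a real file, pass `Files.newInputStream(path)`. With Apache POI itself, the package parts are exposed through `OPCPackage` instead.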

Specifically, the media files are the files under the word/media directory, and the initial xml file includes the following files:

./[Content_Types].xml;

./_rels/.rels;

./docProps/app.xml;

./docProps/core.xml;

./word/document.xml;

./word/fontTable.xml;

./word/numbering.xml;

./word/settings.xml;

./word/styles.xml;

./word/webSettings.xml;

./word/theme/theme1.xml.
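Sorting the decompressed entries into the two groups listed above could look like the following sketch. The rule is an assumption: entries under `word/media/` are treated as media files, and other entries ending in `.xml` or `.rels` as initial xml files. The class and field names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical classifier for the entry names of a decompressed docx package.
final class PartClassifier {
    static final String MEDIA_PREFIX = "word/media/";

    /** Holds the two groups produced by the extraction step. */
    static final class Split {
        final List<String> xmlParts;   // later written as pkg:xmlData
        final List<String> mediaParts; // later written as Base64 pkg:binaryData

        Split(List<String> xmlParts, List<String> mediaParts) {
            this.xmlParts = Collections.unmodifiableList(xmlParts);
            this.mediaParts = Collections.unmodifiableList(mediaParts);
        }
    }

    static Split split(List<String> entryNames) {
        List<String> xml = new ArrayList<>();
        List<String> media = new ArrayList<>();
        for (String name : entryNames) {
            if (name.startsWith(MEDIA_PREFIX)) {
                media.add(name);
            } else if (name.endsWith(".xml") || name.endsWith(".rels")) {
                xml.add(name);
            }
            // Any other entry is ignored in this sketch.
        }
        return new Split(xml, media);
    }

    public static void main(String[] args) {
        Split s = split(Arrays.asList("[Content_Types].xml", "_rels/.rels",
                "word/document.xml", "word/media/image1.png"));
        System.out.println(s.xmlParts);   // [[Content_Types].xml, _rels/.rels, word/document.xml]
        System.out.println(s.mediaParts); // [word/media/image1.png]
    }
}
```

Entry names inside the ZIP carry no leading `./`; that prefix in the list above denotes paths relative to the package root.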

S130, creating a target xml file.

In this embodiment, xml (Extensible Markup Language) is a subset of the Standard Generalized Markup Language (SGML) and is used to mark up electronic documents so that they are structured. An xml file must have exactly one root node: it is the outermost element, every other node is a descendant of it, and it fully contains all other nodes in the xml file. A new xml file is created as the target xml file, into which the content of the original target docx file is subsequently written to generate the final target xml file.

Specifically, the root node of the target xml file is pkg:package.
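A minimal sketch of creating the target xml file with a pkg:package root, using the JDK's DOM API. The namespace URI `http://schemas.microsoft.com/office/2006/xmlPackage` and the `mso-application` processing instruction are assumptions taken from the single-file ("Flat OPC") XML format that Word itself writes; the source names only the root node. The class name `TargetXmlFactory` is hypothetical.

```java
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical factory for the empty target document.
final class TargetXmlFactory {
    // Assumed namespace of the pkg: prefix (Flat OPC package format).
    static final String PKG_NS = "http://schemas.microsoft.com/office/2006/xmlPackage";

    /** Creates a document whose single root node is pkg:package. */
    static Document newTarget() throws ParserConfigurationException {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true);
        Document doc = dbf.newDocumentBuilder().newDocument();
        // Optional hint that associates the file with Word (assumption, see lead-in).
        doc.appendChild(doc.createProcessingInstruction("mso-application", "progid=\"Word.Document\""));
        Element root = doc.createElementNS(PKG_NS, "pkg:package");
        // Declare the prefix explicitly so the serializer always emits xmlns:pkg.
        root.setAttributeNS(XMLConstants.XMLNS_ATTRIBUTE_NS_URI, "xmlns:pkg", PKG_NS);
        doc.appendChild(root); // every pkg:part is later appended under this root
        return doc;
    }
}
```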

S140, writing the initial xml file and the media file into the root node of the target xml file.

In this embodiment, the converted target xml file is obtained by writing the initial xml file and the media file into the newly created target xml file.

Referring to Fig. 4, in an embodiment, step S140 includes steps S141 and S142.

S141, writing the initial xml file into the root node of the target xml file in the form of pkg:part/pkg:xmlData.

S142, encoding the media file and writing it into the root node of the target xml file in the form of pkg:part/pkg:binaryData.

In an embodiment, the converted target xml file is obtained by writing the initial xml file and the media file into the newly created target xml file. The conversion method of this scheme fully follows the OOXML standard, which avoids distortion. Specifically, the media file is Base64-encoded and then written into the root node of the target xml file.
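Steps S141 and S142 could be sketched with DOM as follows. This is an illustrative sketch, not the patent's code: the class `PartWriter`, its method names, the leading `/` on part names and the `pkg:name`/`pkg:contentType` attributes are assumptions modeled on Word's Flat OPC files, and the caller is assumed to supply each part's content type (for example, from `[Content_Types].xml`). Base64 is an encoding rather than encryption: it only makes binary data representable as XML text.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.util.Base64;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.SAXException;

// Hypothetical writer for pkg:part children under an existing pkg:package root.
final class PartWriter {
    static final String PKG_NS = "http://schemas.microsoft.com/office/2006/xmlPackage";

    /** S141: parses an xml part and copies it under pkg:part/pkg:xmlData. */
    static void addXmlPart(Document target, String name, String contentType, byte[] xml)
            throws ParserConfigurationException, SAXException, IOException {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true);
        // Reject DTDs, since the uploaded docx is untrusted input.
        dbf.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
        Document source = dbf.newDocumentBuilder().parse(new ByteArrayInputStream(xml));
        Element xmlData = target.createElementNS(PKG_NS, "pkg:xmlData");
        xmlData.appendChild(target.importNode(source.getDocumentElement(), true)); // deep copy
        newPart(target, name, contentType).appendChild(xmlData);
    }

    /** S142: Base64-encodes a media part and stores it under pkg:part/pkg:binaryData. */
    static void addBinaryPart(Document target, String name, String contentType, byte[] data) {
        Element binaryData = target.createElementNS(PKG_NS, "pkg:binaryData");
        binaryData.setTextContent(Base64.getEncoder().encodeToString(data));
        newPart(target, name, contentType).appendChild(binaryData);
    }

    /** Appends an empty pkg:part carrying the name and content type to the root node. */
    private static Element newPart(Document target, String name, String contentType) {
        Element part = target.createElementNS(PKG_NS, "pkg:part");
        part.setAttributeNS(PKG_NS, "pkg:name", "/" + name); // assumed absolute part name
        part.setAttributeNS(PKG_NS, "pkg:contentType", contentType);
        target.getDocumentElement().appendChild(part);
        return part;
    }
}
```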

S150, outputting and saving the target xml file.

In this embodiment, once writing is finished, the conversion from the docx file to the xml file is complete. Because the conversion method fully follows the OOXML standard, the converted target xml file has high fidelity. The scheme is provided as a jar package (a Java archive) that users can easily embed in their own applications.
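Outputting and saving the target document could be done with the JDK's identity `Transformer`, as in the sketch below. The class and method names are hypothetical; indentation is deliberately not requested so that the serializer does not add whitespace to the copied document content.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;

// Hypothetical output helper for the finished target document.
final class TargetXmlWriter {

    /** Serializes the target document as UTF-8 XML text (identity transform). */
    static String toXmlString(Document target) throws TransformerException {
        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
        t.setOutputProperty(OutputKeys.STANDALONE, "yes");
        StringWriter out = new StringWriter();
        t.transform(new DOMSource(target), new StreamResult(out));
        return out.toString();
    }

    /** Saves the serialized document to disk, replacing any existing file. */
    static void save(Document target, Path file) throws TransformerException, IOException {
        Files.write(file, toXmlString(target).getBytes(StandardCharsets.UTF_8));
    }
}
```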

The scheme is implemented in Java, and the conversion fully follows the OOXML standard, which avoids distortion. A received target docx file is preprocessed to obtain an initial xml file and a media file, a target xml file is created, the initial xml file and the media file are written into the root node of the target xml file, and finally the target xml file is output and saved, thereby converting the docx file into an xml file.

Fig. 5 is a schematic block diagram of a Java-based apparatus for converting a docx file into an xml file according to an embodiment of the present invention. As shown in Fig. 5, corresponding to the method of the above embodiment, the invention further provides a Java-based apparatus for converting a docx file into an xml file. The apparatus comprises units for executing the method described above and can be configured in a desktop computer, a tablet computer, a portable computer, and the like. Specifically, referring to Fig. 5, the apparatus includes a receiving unit 10, a preprocessing unit 20, a creating unit 30, a writing unit 40 and an output unit 50.
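One way to map units 10 to 50 onto Java is to give each unit a functional interface and let the apparatus call them in the order S110 to S150. The sketch below is hypothetical: the class name, the generic target type `T` and the use of `java.util.function` types are illustrative choices, not taken from the source.

```java
import java.util.Map;
import java.util.Objects;
import java.util.function.BiConsumer;
import java.util.function.Consumer;
import java.util.function.Function;
import java.util.function.Supplier;

/** Hypothetical composition of the five units; T is the in-memory form of the target xml file. */
final class DocxToXmlApparatus<T> {
    private final Supplier<byte[]> receivingUnit;                          // unit 10
    private final Function<byte[], Map<String, byte[]>> preprocessingUnit; // unit 20
    private final Supplier<T> creatingUnit;                                // unit 30
    private final BiConsumer<T, Map<String, byte[]>> writingUnit;          // unit 40
    private final Consumer<T> outputUnit;                                  // unit 50

    DocxToXmlApparatus(Supplier<byte[]> receivingUnit,
                       Function<byte[], Map<String, byte[]>> preprocessingUnit,
                       Supplier<T> creatingUnit,
                       BiConsumer<T, Map<String, byte[]>> writingUnit,
                       Consumer<T> outputUnit) {
        this.receivingUnit = Objects.requireNonNull(receivingUnit);
        this.preprocessingUnit = Objects.requireNonNull(preprocessingUnit);
        this.creatingUnit = Objects.requireNonNull(creatingUnit);
        this.writingUnit = Objects.requireNonNull(writingUnit);
        this.outputUnit = Objects.requireNonNull(outputUnit);
    }

    /** Runs the units in the same order as steps S110 to S150. */
    void convert() {
        byte[] docx = receivingUnit.get();                         // S110
        Map<String, byte[]> parts = preprocessingUnit.apply(docx); // S120: entry name -> bytes
        T target = creatingUnit.get();                             // S130
        writingUnit.accept(target, parts);                         // S140
        outputUnit.accept(target);                                 // S150
    }
}
```

Because each unit is only an interface, a module such as the decompression module can be swapped (for example, POI-based versus JDK-based decompression) without changing `convert()`.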

A receiving unit 10, configured to receive a target docx file.

In this embodiment, the target docx file is the document to be converted into an xml file. The terminal first transmits the target docx file to the server, which performs the subsequent processing.

The preprocessing unit 20 is configured to preprocess the target docx file to obtain an initial xml file and a media file.

In this embodiment, the target docx file is preprocessed to obtain an initial xml file and a media file, which are then converted into the target xml file.

Referring to Fig. 6, in one embodiment, the preprocessing unit 20 includes a decompression module 21 and an extraction module 22.

The decompression module 21 is configured to decompress the target docx file by using the POI tool set to obtain a decompressed file.

The extraction module 22 is configured to extract the initial xml file and the media file from the decompressed file.

In one embodiment, the POI tool set, also known as Apache POI, is an open-source library of the Apache Software Foundation that provides APIs for Java programs to read and write files in Microsoft Office formats. The target docx file can be decompressed with the POI tool set to obtain a decompressed file, from which the initial xml file and the media file used to build the target xml file are then extracted.

Specifically, the media files are the files under the word/media directory, and the initial xml file includes the following files:

./[Content_Types].xml;

./_rels/.rels;

./docProps/app.xml;

./docProps/core.xml;

./word/document.xml;

./word/fontTable.xml;

./word/numbering.xml;

./word/settings.xml;

./word/styles.xml;

./word/webSettings.xml;

./word/theme/theme1.xml.

The creating unit 30 is configured to create a target xml file.

In this embodiment, xml (Extensible Markup Language) is a subset of the Standard Generalized Markup Language (SGML) and is used to mark up electronic documents so that they are structured. An xml file must have exactly one root node: it is the outermost element, every other node is a descendant of it, and it fully contains all other nodes in the xml file. A new xml file is created as the target xml file, into which the content of the original target docx file is subsequently written to generate the final target xml file.

Specifically, the root node of the target xml file is pkg:package.

The writing unit 40 is configured to write the initial xml file and the media file into the root node of the target xml file.

In this embodiment, the converted target xml file is obtained by writing the initial xml file and the media file into the newly created target xml file.

Referring to Fig. 7, in this embodiment, the writing unit 40 includes a first writing module 41 and a second writing module 42.

The first writing module 41 is configured to write the initial xml file into the root node of the target xml file in the form of pkg:part/pkg:xmlData.

The second writing module 42 is configured to write the encoded media file into the root node of the target xml file in the form of pkg:part/pkg:binaryData.

In an embodiment, the converted target xml file is obtained by writing the initial xml file and the media file into the newly created target xml file. The conversion method of this scheme fully follows the OOXML standard, which avoids distortion. Specifically, the media file is Base64-encoded and then written into the root node of the target xml file.

The output unit 50 is configured to output and save the target xml file.

In this embodiment, once writing is finished, the conversion from the docx file to the xml file is complete. Because the conversion method fully follows the OOXML standard, the converted target xml file has high fidelity. The scheme is provided as a jar package (a Java archive) that users can easily embed in their own applications.

The scheme is implemented in Java, and the conversion fully follows the OOXML standard, which avoids distortion. A received target docx file is preprocessed to obtain an initial xml file and a media file, a target xml file is created, the initial xml file and the media file are written into the root node of the target xml file, and finally the target xml file is output and saved, thereby converting the docx file into an xml file.

It should be noted that, as those skilled in the art can clearly understand, for the specific implementation of the Java-based apparatus for converting a docx file into an xml file and of each of its units, reference may be made to the corresponding description in the foregoing method embodiment; for convenience and brevity, the details are not repeated here.

Referring to fig. 8, fig. 8 is a schematic block diagram of a computer device according to an embodiment of the present application. The computer device 500 may be a terminal or a server, where the terminal may be an electronic device with a communication function, such as a smart phone, a tablet computer, a notebook computer, a desktop computer, a personal digital assistant, and a wearable device. The server may be an independent server or a server cluster composed of a plurality of servers.

Referring to fig. 8, the computer device 500 includes a processor 502, memory, and a network interface 505 connected by a system bus 501, where the memory may include a non-volatile storage medium 503 and an internal memory 504.

The non-volatile storage medium 503 may store an operating system 5031 and a computer program 5032. The computer program 5032 comprises program instructions that, when executed, cause the processor 502 to perform the Java-based method for converting a docx file into an xml file.

The processor 502 is used to provide computing and control capabilities to support the operation of the overall computer device 500.

The internal memory 504 provides an environment for running the computer program 5032 stored in the non-volatile storage medium 503; when the computer program 5032 is executed by the processor 502, the processor 502 performs the Java-based method for converting a docx file into an xml file.

The network interface 505 is used for network communication with other devices. Those skilled in the art will appreciate that the structure shown in Fig. 8 is only a block diagram of the part of the structure relevant to the present solution and does not limit the computer device 500 to which the solution is applied; a particular computer device 500 may include more or fewer components than shown, combine certain components, or have a different arrangement of components.

Wherein the processor 502 is adapted to run a computer program 5032 stored in the memory.

It should be understood that, in the embodiments of the present application, the processor 502 may be a central processing unit (CPU), or another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. The general-purpose processor may be a microprocessor or any conventional processor.

It will be understood by those skilled in the art that all or part of the flow of the method implementing the above embodiments may be implemented by a computer program instructing associated hardware. The computer program includes program instructions, and the computer program may be stored in a storage medium, which is a computer-readable storage medium. The program instructions are executed by at least one processor in the computer system to implement the flow steps of the embodiments of the method described above.

Accordingly, the present invention also provides a storage medium. The storage medium may be a computer-readable storage medium. The storage medium stores a computer program.

The storage medium may be a USB flash drive, a removable hard disk, a read-only memory (ROM), a magnetic disk, an optical disk, or any other computer-readable storage medium that can store program code.

Those of ordinary skill in the art will appreciate that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein may be implemented in electronic hardware, computer software, or a combination of both, and that the components and steps of each example have been described above generally in terms of their functions in order to clearly illustrate the interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends on the particular application and the design constraints of the technical solution. Skilled artisans may implement the described functionality in different ways for each particular application, but such implementation decisions should not be interpreted as departing from the scope of the present invention.

In the embodiments provided in the present invention, it should be understood that the disclosed apparatus and method may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative: the division into units is only a division by logical function, and other divisions are possible in an actual implementation. For example, various units or components may be combined or integrated into another system, or some features may be omitted or not implemented.

The steps in the method of the embodiments of the invention can be reordered, combined and deleted according to actual needs. The units in the apparatus of the embodiments of the invention can be merged, divided and deleted according to actual needs. In addition, the functional units in the embodiments of the present invention may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit.

If the integrated unit is implemented as a software functional unit and sold or used as a stand-alone product, it may be stored in a storage medium. Based on this understanding, the essence of the technical solution of the present invention, or the part of it that contributes to the prior art, or all or part of the technical solution, may be embodied as a software product that is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a terminal, a network device, or the like) to execute all or part of the steps of the method in the embodiments of the present invention.

While the invention has been described with reference to specific embodiments, the invention is not limited thereto, and various equivalent modifications and substitutions can be easily made by those skilled in the art within the technical scope of the invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.
