Layout recovery method and device and electronic equipment

Document No.: 1087368 Publication date: 2020-10-20

Reading note: This technology, "Layout recovery method, apparatus and electronic device", was designed by 龙坤 on 2020-07-06. Embodiments of the present disclosure disclose a layout recovery method, an apparatus and an electronic device. One embodiment of the method comprises: extracting text line position information in a target layout; determining the complexity of the target layout according to the text line position information, wherein the complexity represents how complex the text arrangement in the target layout is; selecting, according to the complexity, a target typesetting recovery function for the target layout, wherein a typesetting recovery function is used to reconstruct the target layout based on the text line position information; and recovering the target layout using the target typesetting recovery function. A new layout recovery approach is thereby provided.

1. A method for restoring a layout, comprising:

extracting text line position information in a target layout;

determining the complexity of the target layout according to the text line position information, wherein the complexity is used for representing the complexity of text arrangement in the target layout;

selecting a target typesetting recovery function aiming at the target layout according to the complexity, wherein the typesetting recovery function is used for reconstructing the target layout based on the text line position information;

and adopting the target typesetting recovery function to recover the target layout.

2. The method of claim 1, wherein determining the complexity of the target layout based on the text line position information comprises:

determining the complexity of the target layout based on at least one of: a first number of times, a second number of times, and whether the target layout is in columns; wherein the first number of times indicates the number of times a misalignment relationship occurs in the target layout, the misalignment relationship indicating that two adjacent text lines do not overlap in the line direction; and the second number of times indicates the number of times an overlap relationship occurs in the target layout, the overlap relationship indicating that the difference between the line coordinates of two text lines is smaller than the height of either text line.

3. The method of claim 2, wherein determining the complexity of the target layout based on at least one of the first number of times, the second number of times, and whether the target layout is in columns comprises:

determining whether the target layout is in columns;

in response to determining that the target layout is in columns, determining that the complexity is a first preset value, wherein the first preset value is greater than a preset complexity threshold;

and in response to determining that the target layout is not in columns, determining the complexity according to the first number of times, the second number of times and the number of text lines in the target layout.

4. The method of claim 2, wherein determining the complexity of the target layout based on at least one of the first number of times, the second number of times, and whether the target layout is in columns comprises:

determining whether the target layout is in columns;

in response to determining that the target layout is in columns, determining the complexity according to a first column indication value, the first number of times, the second number of times and the number of text lines in the target layout;

and in response to determining that the target layout is not in columns, determining the complexity according to a second column indication value, the first number of times, the second number of times and the number of text lines in the target layout.

5. The method of claim 2, wherein the text line position information includes line coordinates of the text line; and

whether the target layout is in columns or not is determined by the following method:

determining whether at least two non-intersected coordinate areas exist in the line direction according to the line coordinates of each text line of the target layout;

if yes, determining that the target layout is in columns;

and if not, determining that the target layout is not in columns.

6. The method of claim 1, wherein the at least two predefined typesetting recovery functions comprise a typesetting recovery function based on column coordinates; and

selecting a target typesetting recovery function aiming at the target layout according to the complexity, wherein the selecting the target typesetting recovery function comprises the following steps:

and in response to the complexity not being larger than a preset complexity threshold, selecting a typesetting recovery function based on the column coordinates as a target typesetting recovery function.

7. The method of claim 1, wherein the at least two predefined typesetting recovery functions comprise a typesetting recovery function based on deep learning; and

selecting a target typesetting recovery function aiming at the target layout according to the complexity, wherein the selecting the target typesetting recovery function comprises the following steps:

and in response to the complexity being larger than a preset complexity threshold, selecting the typesetting recovery function based on deep learning as the target typesetting recovery function.

8. A layout recovery apparatus, comprising:

an extraction unit configured to extract text line position information in a target layout;

a determining unit configured to determine the complexity of the target layout according to the text line position information, wherein the complexity is used for representing the complexity of text arrangement in the target layout;

a selecting unit configured to select a target typesetting recovery function for the target layout according to the complexity, wherein the typesetting recovery function is used for reconstructing the target layout based on the text line position information;

and a recovery unit configured to recover the target layout by adopting the target typesetting recovery function.

9. An electronic device, comprising:

one or more processors;

a storage device storing one or more programs,

wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of any one of claims 1-7.

10. A computer-readable medium, on which a computer program is stored which, when being executed by a processor, carries out the method according to any one of claims 1-7.

Technical Field

The present disclosure relates to the field of computer technologies, and in particular, to a method and an apparatus for restoring a layout, and an electronic device.

Background

With the development of the internet, users increasingly use computers for a wide range of tasks. For example, a user may use a Portable Document Format (PDF) file to convey information to other users. PDF is cross-platform, preserves the original format (layout) of a file, and is an open standard, so PDF-compatible software can be developed freely without paying royalties.

In some application scenarios, a computer may parse a file in PDF format to obtain information in the file.

Disclosure of Invention

This disclosure is provided to introduce concepts in a simplified form that are further described below in the detailed description. This disclosure is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used to limit the scope of the claimed subject matter.

In a first aspect, an embodiment of the present disclosure provides a method for restoring a layout, where the method includes: extracting text line position information in a target layout; determining the complexity of the target layout according to the text line position information, wherein the complexity is used for representing the complexity of text arrangement in the target layout; selecting a target typesetting recovery function aiming at the target layout according to the complexity, wherein the typesetting recovery function is used for reconstructing the target layout based on the text line position information; and adopting the target typesetting recovery function to recover the target layout.

In a second aspect, an embodiment of the present disclosure provides a layout recovery apparatus including: the extraction unit is used for extracting the text line position information in the target layout; the determining unit is used for determining the complexity of the target layout according to the text line position information, wherein the complexity is used for representing the complexity of text arrangement in the target layout; the selecting unit is used for selecting a target typesetting recovery function aiming at the target layout according to the complexity, wherein the typesetting recovery function is used for reconstructing the target layout based on the text line position information; and the recovery unit is used for recovering the target layout by adopting the target typesetting recovery function.

In a third aspect, an embodiment of the present disclosure provides an electronic device, including: one or more processors; storage means for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the method of restoring a layout as described in the first aspect.

In a fourth aspect, the disclosed embodiments provide a computer readable medium, on which a computer program is stored, which when executed by a processor, implements the steps of the layout recovery method according to the first aspect.

According to the layout recovery method, apparatus and electronic device provided by the embodiments of the present disclosure, text line position information is first extracted for a target layout; the complexity of the target layout is then determined according to the text line position information; a target typesetting recovery function is then selected from at least two typesetting recovery functions according to the complexity; finally, the target layout is recovered using the target typesetting recovery function. A new layout recovery approach is thereby provided.

In addition, the complexity provides a basis for selecting the typesetting recovery function. In general, a single typesetting recovery function can hardly offer both high calculation speed and high calculation accuracy. Selecting the function according to the complexity makes it possible to choose a recovery mode suited to the actual condition of the target layout. When the target layout is simple, a typesetting recovery function that is fast but less accurate in general can be selected; because the layout is simple, the recovery remains highly accurate while the speed increases. When the target layout is complex, a typesetting recovery function that is slower but more accurate can be selected, so that the recovery of the target layout is still highly accurate. That is, the calculation speed can be increased as much as possible while high accuracy of the recovery is ensured.

Drawings

The above and other features, advantages and aspects of various embodiments of the present disclosure will become more apparent by referring to the following detailed description when taken in conjunction with the accompanying drawings. Throughout the drawings, the same or similar reference numbers refer to the same or similar elements. It should be understood that the drawings are schematic and that elements and features are not necessarily drawn to scale.

FIG. 1 is a flow diagram of one embodiment of a layout recovery method according to the present disclosure;

FIG. 2 is a schematic diagram of an exemplary representation of text line information according to the present disclosure;

FIG. 3 is a flow diagram of yet another embodiment of a layout recovery method according to the present disclosure;

FIG. 4 is a schematic illustration of a misalignment relationship according to the present disclosure;

FIG. 5 is a schematic illustration of an overlapping relationship according to the present disclosure;

FIG. 6 is an alternative implementation of step 302 according to the present disclosure;

FIG. 7 is a schematic diagram illustrating an embodiment of a layout recovery apparatus according to the present disclosure;

FIG. 8 is an exemplary system architecture to which the layout restoration method of one embodiment of the present disclosure may be applied;

FIG. 9 is a schematic diagram of a basic structure of an electronic device provided according to an embodiment of the present disclosure.

Detailed Description

Embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While certain embodiments of the present disclosure are shown in the drawings, it is to be understood that the present disclosure may be embodied in various forms and should not be construed as limited to the embodiments set forth herein, but rather are provided for a more thorough and complete understanding of the present disclosure. It should be understood that the drawings and embodiments of the disclosure are for illustration purposes only and are not intended to limit the scope of the disclosure.

It should be understood that the various steps recited in the method embodiments of the present disclosure may be performed in a different order, and/or performed in parallel. Moreover, method embodiments may include additional steps and/or omit performing the illustrated steps. The scope of the present disclosure is not limited in this respect.

The term "include" and variations thereof as used herein are open-ended, i.e., "including but not limited to". The term "based on" is "based, at least in part, on". The term "one embodiment" means "at least one embodiment"; the term "another embodiment" means "at least one additional embodiment"; the term "some embodiments" means "at least some embodiments". Relevant definitions for other terms will be given in the following description.

It should be noted that the terms "first", "second", and the like in the present disclosure are only used for distinguishing different devices, modules or units, and are not used for limiting the order or interdependence relationship of the functions performed by the devices, modules or units.

It is noted that the modifiers "a", "an" and "the" in this disclosure are illustrative rather than limiting, and those skilled in the art will understand that they should be read as "one or more" unless the context clearly dictates otherwise.

The names of messages or information exchanged between devices in the embodiments of the present disclosure are for illustrative purposes only, and are not intended to limit the scope of the messages or information.

Referring to FIG. 1, a flow of one embodiment of a layout recovery method according to the present disclosure is shown. The layout recovery method can be applied to a server or a terminal device. The layout recovery method shown in FIG. 1 comprises the following steps:

Step 101, extracting text line position information in the target layout.

In the present embodiment, an execution subject (e.g., server) of the layout recovery method may extract text line position information in the target layout.

In this embodiment, the target layout may be any layout. For example, it may be the target layout to be parsed. It is understood that a file may be divided into multiple pages. Each page may be referred to as a layout. The "target" in the target layout is for clearly explaining the page to be analyzed, and does not constitute a limitation on the layout.

In some application scenarios, any one or more pages in the document may be the target layout. In some application scenarios, a partial area in a page may also be understood as a target layout.

In this embodiment, the format of the file in which the target layout is located is not limited. As an example, the target layout may be a page in a Portable Document Format file, or may be a page in a picture file. A Portable Document Format file may also be referred to as a PDF file. Here, the specific content of the PDF file is not limited. As an example, the PDF file used in this embodiment may be a PDF file to be parsed.

In this embodiment, the text line position information may indicate the position of the text line. Each text line may correspond to text line position information indicating the position of the text line.

In this embodiment, the form of the text line position information is not limited. As an example, the text line position information may be expressed in coordinates in a rectangular coordinate system, or may be expressed in coordinates in other types of coordinate systems (for example, a polar coordinate system).

In some application scenarios, text extraction may be performed on the target layout. The extracted text information can include not only the characters in the file but also rich text information for each character, such as its position coordinates, font, font size, color and whether it is bold. Based on the rich text information, the discrete characters may be divided into a plurality of text lines, and for each text line the following may be recorded: the line coordinates of its leftmost and rightmost characters in the row direction (e.g., the x-axis), its column coordinate in the column direction (e.g., the y-axis), and its height. Here, the row direction and the column direction are perpendicular.
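The grouping described above could be sketched as follows. This is only a minimal illustration, not the disclosed implementation: the `TextLine` record, the `(char, x, y, height)` character tuple, and the `y_tol` tolerance used to decide that two characters share a line are all hypothetical choices made for the example.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical character record: (char, x, y, height).
Char = Tuple[str, float, float, float]


@dataclass
class TextLine:
    text: str
    x_left: float   # line coordinate of the leftmost character
    x_right: float  # line coordinate of the rightmost character
    y: float        # column coordinate of the text line
    height: float   # height of the text line


def _make_line(chars: List[Char]) -> TextLine:
    # Order the characters of one line from left to right.
    chars = sorted(chars, key=lambda c: c[1])
    return TextLine(
        text="".join(c[0] for c in chars),
        x_left=chars[0][1],
        x_right=chars[-1][1],
        y=chars[0][2],
        height=max(c[3] for c in chars),
    )


def group_chars_into_lines(chars: List[Char], y_tol: float = 1.0) -> List[TextLine]:
    """Group characters whose y coordinates are within y_tol (an assumed rule)."""
    lines: List[TextLine] = []
    current: List[Char] = []
    for ch in sorted(chars, key=lambda c: (c[2], c[1])):
        # Start a new line when the y coordinate jumps by more than y_tol.
        if current and abs(ch[2] - current[0][2]) > y_tol:
            lines.append(_make_line(current))
            current = []
        current.append(ch)
    if current:
        lines.append(_make_line(current))
    return lines
```

A real extractor would likely also split a line on large horizontal gaps; that rule is omitted here because the text does not specify it.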

Referring to FIG. 2, an exemplary representation of text line information is shown. In FIG. 2, "In _ text" may represent a text line. The meaning of the content preceding each "#" is given after that "#" and is not described again here.

Step 102, determining the complexity of the target layout according to the text line position information.

In this embodiment, the execution body may determine the complexity of the target layout according to the text line position information.

Here, the above complexity is used to represent the complexity of the text arrangement in the target layout.

Here, the complexity determination manner may be set according to an actual application scenario, and is not limited herein.

By way of example, the target layout may involve, but is not limited to, the following situations: sub-regions serving as semantic groups, and left and right columns. Each of these can produce a complex layout that cannot be recovered simply by arranging text lines from top to bottom. Therefore, in a practical application scenario, parameters describing the situations that may cause layout recovery errors can be used as factors for determining the complexity. That is, the complexity may be generated from predefined parameters that characterize causes of layout recovery errors.

Step 103, selecting a target typesetting recovery function for the target layout according to the complexity.

In this embodiment, the execution subject may select a target typesetting recovery function for the target layout according to the complexity.

Here, a typesetting recovery function is used to reconstruct the target layout based on the text line position information, that is, it may be used to recover the layout. The typesetting recovery function can take the text line position information as its independent variable and layout structure information as its dependent variable. The layout structure information may include sub-region position information and/or the text lines in the sub-regions. It can be understood that the recovered layout can be used to obtain semantic groups, on which semantic recognition and other steps are then performed.

Optionally, the execution subject may select a target layout recovery function for the target layout from at least two predefined layout recovery functions.

Here, the number of the predefined typesetting restoration functions may be at least two, and may be two, three, or the like, as an example.

Here, different layout restoration functions may differ in calculation speed, calculation accuracy, and the like.

The specific content of the predefined various typesetting restoration functions can be set according to the actual situation, and is not limited herein.

In some application scenarios, the at least two predefined typographical recovery functions may comprise a column direction based typographical recovery function.

As an example, the typesetting recovery function based on the column direction is briefly described as follows. If the target layout is arranged simply from top to bottom, the text lines can be sorted in ascending order of their vertical column coordinate (y coordinate), which restores the order of the text. This approach is fast and efficient.
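A minimal sketch of this sorting step is given below. The `(text, y)` tuple and the function name are hypothetical, and the sketch assumes that the y coordinate grows from the top of the page toward the bottom.

```python
from typing import List, Tuple

# Hypothetical text line record: (text, y), with y increasing downward.
Line = Tuple[str, float]


def restore_by_column_coordinate(lines: List[Line]) -> List[str]:
    """Return the text of the lines ordered from top to bottom."""
    # sorted() is stable, so lines with equal y keep their input order.
    return [text for text, _ in sorted(lines, key=lambda line: line[1])]
```

If the coordinate system has its origin at the bottom of the page, as is common in PDF files, the sort key would need to be negated.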

In some application scenarios, the at least two predefined typographical recovery functions may comprise a deep learning based typographical recovery function.

In some application scenarios, when the layout is extremely complex, for example when there are left and right columns and text lines are misaligned or overlap, sorting directly by the y coordinate will scramble the text order. In this case, a typesetting recovery strategy based on deep learning may be needed.

As an example, the input to the typesetting recovery function based on deep learning may include the text content of the text lines. The function roughly consists of the following steps. First, text belonging to the same subject (e.g., the same piece of work experience) in the layout is framed with a rectangular box, optionally using a deep learning object detection technique. Second, the text within each rectangular box is sorted. Third, all rectangular boxes are sorted. After these three steps, the disordered text of a complex layout is restored to an ordered state that matches human reading order. However, because it relies on object detection based on deep learning, this approach takes noticeably more time than sorting directly by the y coordinate.

Step 104, adopting the target typesetting recovery function to recover the target layout.
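The second and third steps can be illustrated with the sketch below, which takes the rectangular boxes as given input rather than running a detector. Everything here is an assumption made for illustration: the tuple formats, the rule that a line belongs to the first box containing its top-left point, the top-to-bottom order inside a box, and the left-to-right then top-to-bottom order of the boxes. The text does not specify any of these rules.

```python
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # hypothetical (x0, y0, x1, y1)
Line = Tuple[str, float, float]          # hypothetical (text, x, y)


def order_with_boxes(boxes: List[Box], lines: List[Line]) -> List[str]:
    """Order text lines given boxes that were detected beforehand."""
    grouped: Dict[int, List[Line]] = {i: [] for i in range(len(boxes))}
    for line in lines:
        _, x, y = line
        for i, (x0, y0, x1, y1) in enumerate(boxes):
            if x0 <= x <= x1 and y0 <= y <= y1:
                grouped[i].append(line)
                break  # lines outside every box are ignored in this sketch

    # Assumed box order: left to right, then top to bottom.
    box_order = sorted(range(len(boxes)), key=lambda i: (boxes[i][0], boxes[i][1]))

    result: List[str] = []
    for i in box_order:
        # Assumed order inside a box: top to bottom, then left to right.
        in_box = sorted(grouped[i], key=lambda l: (l[2], l[1]))
        result.extend(text for text, _, _ in in_box)
    return result
```

In practice the boxes would come from an object detection model, and the box ordering rule would depend on the reading order of the document type.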

In this embodiment, the execution body may adopt the target typesetting recovery function to recover the target layout.

Here, the execution body may recover the target layout using the calculation procedure that corresponds to the selected target typesetting recovery function; the procedure differs by function type and is not described again here. As an example, the calculation in step 104 may follow the procedures of the two typesetting recovery functions described above.

It should be noted that, in the layout recovery method provided in this embodiment, text line position information is first extracted for the target layout in the Portable Document Format file; the complexity of the target layout is then determined according to the text line position information; a target typesetting recovery function is then selected from at least two typesetting recovery functions according to the complexity; finally, the target layout is recovered using the target typesetting recovery function. A new layout recovery approach is thereby provided.

In addition, the complexity provides a basis for selecting the typesetting recovery function. In general, a single typesetting recovery function can hardly offer both high calculation speed and high calculation accuracy. Selecting the function according to the complexity makes it possible to choose a recovery mode suited to the actual condition of the target layout. When the target layout is simple, a typesetting recovery function that is fast but less accurate in general can be selected; because the layout is simple, the recovery remains highly accurate while the speed increases. When the target layout is complex, a typesetting recovery function that is slower but more accurate can be selected, so that the recovery of the target layout is still highly accurate. That is, the calculation speed can be increased as much as possible while high accuracy of the recovery is ensured.

It can be understood that the complexity is calculated for each layout in the file and the typesetting recovery function is selected flexibly according to that complexity, which ensures the accuracy of file recovery while greatly improving the speed of layout recovery.

Referring to FIG. 3, a flow of yet another embodiment of a layout recovery method according to the present disclosure is shown. The layout recovery method can be applied to a server or a terminal device. The layout recovery method shown in FIG. 3 comprises the following steps:

Step 301, extracting the text line position information in the target layout.

In the present embodiment, an execution subject (e.g., server) of the layout recovery method may extract text line position information in the target layout.

It should be noted that, for details of implementation and technical effects of step 301, reference may be made to the description of step 101, and details are not described herein.

Step 302, determining the complexity of the target layout based on at least one of: the first number of times, the second number of times, and whether the target layout is in columns.

In the present embodiment, the execution subject may determine the complexity of the target layout based on at least one of, but not limited to: the first number of times, the second number of times, and whether the target layout is in columns.

In this embodiment, the first number may indicate the number of times the misalignment relationship occurs in the target layout.

In this embodiment, the misalignment relationship may be the case where two adjacent text lines do not overlap in the line direction.

Referring to FIG. 4, a schematic diagram of the misalignment relationship is shown. In FIG. 4, the text line containing "DDDDDD" and the text line containing "AAAAAAAAAAAAAAAA" are adjacent text lines that do not overlap in the line direction, so the two are in a misalignment relationship. Likewise, the text line containing "dddddddd" and the text line containing "CCCCCCCCCCCCCCCCC" are adjacent text lines that do not overlap in the line direction, so they are also in a misalignment relationship. The BDE region and the AC region are different semantic regions; if the text lines were simply arranged from top to bottom by column coordinate, the text "DDDDDD" might be grouped with the AC region for semantic parsing, which may cause errors.

In this embodiment, the second number of times indicates the number of times the overlap relationship occurs in the target layout.

In this embodiment, the overlap relationship indicates that the difference between the line coordinates of two text lines is smaller than the height of either text line.
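Counting misalignment relationships could look like the sketch below. It is an illustration under stated assumptions: the `(x_left, x_right, y)` tuple is hypothetical, and "adjacent" is taken to mean consecutive after sorting by the y coordinate, which the text does not define precisely.

```python
from typing import List, Tuple

# Hypothetical text line record: (x_left, x_right, y).
Line = Tuple[float, float, float]


def count_misalignments(lines: List[Line]) -> int:
    """Count adjacent line pairs whose horizontal extents do not overlap."""
    ordered = sorted(lines, key=lambda line: line[2])  # top to bottom
    count = 0
    for (l0, r0, _), (l1, r1, _) in zip(ordered, ordered[1:]):
        # The intervals [l0, r0] and [l1, r1] are disjoint when one ends
        # before the other begins.
        if r0 < l1 or r1 < l0:
            count += 1
    return count
```

For a layout shaped like the one described for FIG. 4, where a short line on one side sits vertically between two lines on the other side, this function would count two misalignments.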

Referring to FIG. 5, a schematic diagram of the overlap relationship is shown. In FIG. 5, the difference between the column coordinates of the text line containing "FFFFFFFFFFFFFFFFFF" and the text line containing "GGGGGG" is smaller than the height of either text line, so the two are in an overlap relationship.

In some application scenarios, the overlap relationship often arises when two text lines spatially belong to the same line but are separated by multiple spaces, so that text extraction splits them into two text lines. This case also easily disrupts the recovered order of the layout. Detecting this relationship mainly involves checking whether the two text lines overlap in the vertical direction.

Here, column division may refer to dividing the text in the layout into two or more columns, where the columns are relatively fixed in position and have substantially the same width. Current writing is generally horizontal, so column division usually refers to horizontal column division; as an example, the layout can be divided into a left column and a right column.
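A sketch of counting overlap relationships follows. It adopts the vertical-overlap reading given in the paragraph above, and it makes two assumptions that are not stated in the text: only lines that are consecutive after sorting by y are compared, and "smaller than the height of either text line" is implemented as smaller than the smaller of the two heights. The `(y, height)` tuple is hypothetical.

```python
from typing import List, Tuple

# Hypothetical text line record: (y, height).
Line = Tuple[float, float]


def count_overlaps(lines: List[Line]) -> int:
    """Count consecutive line pairs that overlap vertically."""
    ordered = sorted(lines, key=lambda line: line[0])
    count = 0
    for (y0, h0), (y1, h1) in zip(ordered, ordered[1:]):
        # Two lines overlap when their y coordinates differ by less
        # than the height of the shorter line.
        if abs(y1 - y0) < min(h0, h1):
            count += 1
    return count
```

Comparing every pair of lines instead of only consecutive ones is another plausible reading and would change the count for layouts with three or more fragments on one visual line.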

The specific manner for determining whether the target layout is in columns may be set according to an actual application, and is not limited herein.

In some embodiments, whether the target layout is in columns may be determined as follows: determining, according to the line coordinates of each text line of the target layout, whether at least two non-intersecting coordinate regions exist in the line direction; if yes, determining that the target layout is in columns; and if not, determining that the target layout is not in columns.

In other words, all text lines can be projected vertically onto the bottom of the page, and the number of disjoint projection regions at the bottom of the page is then counted. If there is only one projection region, the layout is not in columns; if there are two or more projection regions, the layout is in columns.
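The projection test can be sketched as an interval merge over the horizontal extents of the text lines. The function names and the `(x_left, x_right)` interval format are hypothetical; touching intervals are treated as one region here, which is an assumption.

```python
from typing import List, Optional, Tuple

# Hypothetical horizontal extent of a text line: (x_left, x_right).
Interval = Tuple[float, float]


def count_projection_regions(intervals: List[Interval]) -> int:
    """Count disjoint regions after projecting all lines onto the x-axis."""
    regions = 0
    current_right: Optional[float] = None
    for left, right in sorted(intervals):
        if current_right is None or left > current_right:
            # A gap before this interval starts a new projection region.
            regions += 1
            current_right = right
        else:
            # The interval extends the current region.
            current_right = max(current_right, right)
    return regions


def is_in_columns(intervals: List[Interval]) -> bool:
    """A layout is treated as being in columns if it has two or more regions."""
    return count_projection_regions(intervals) >= 2
```

A full-width title above a two-column body would merge everything into one region under this rule, so a practical implementation might apply the test to a sub-region of the page.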

Here, the text line position information may include line coordinates and column coordinates of the text line.

In some alternative implementations, step 302 may be implemented in various ways.

In some embodiments, step 302 may include: determining whether the target layout is in columns; in response to determining that the target layout is in columns, determining the complexity according to the first column indication value, the first number of times, the second number of times and the number of text lines in the target layout; and in response to determining that the target layout is not in columns, determining the complexity according to the second column indication value, the first number of times, the second number of times and the number of text lines in the target layout.

Here, the first column indication value may indicate that the layout is in columns, and may be 1, for example; the second column indication value may indicate that the layout is not in columns, and may be 0, for example.

Here, determining the complexity according to the first column indication value, the first number of times, the second number of times, and the number of text lines in the target layout may include: dividing the first number of times by the number of text lines to obtain a first ratio; dividing the second number of times by the number of text lines to obtain a second ratio; and then computing a weighted average of the first column indication value, the first ratio and the second ratio to obtain the complexity.

Here, determining the complexity according to the second column indication value, the first number of times, the second number of times, and the number of text lines in the target layout may include: dividing the first number of times by the number of text lines to obtain a first ratio; dividing the second number of times by the number of text lines to obtain a second ratio; and then computing a weighted average of the second column indication value, the first ratio and the second ratio to obtain the complexity.
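The weighted average described above could be computed as in the following sketch. The indication values 1 and 0 follow the example given in the text; the function name, the parameter names and the default equal weights are assumptions, since the text does not specify the weights.

```python
from typing import Tuple


def layout_complexity(
    in_columns: bool,
    first_count: int,
    second_count: int,
    num_lines: int,
    weights: Tuple[float, float, float] = (1.0, 1.0, 1.0),  # assumed equal weights
) -> float:
    """Weighted average of the column indication value and the two ratios."""
    if num_lines <= 0:
        raise ValueError("num_lines must be positive")
    indication = 1.0 if in_columns else 0.0   # first / second column indication value
    first_ratio = first_count / num_lines     # misalignments per text line
    second_ratio = second_count / num_lines   # overlaps per text line
    w_col, w_first, w_second = weights
    total = w_col + w_first + w_second
    return (w_col * indication + w_first * first_ratio + w_second * second_ratio) / total
```

With equal weights, a layout in columns that has no misalignments or overlaps scores about 0.33, which is above the example threshold of 0.3 mentioned for step 303; different weights would shift that boundary.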

Step 303, determining whether the complexity is greater than a preset complexity threshold.

In this embodiment, the execution subject may determine whether the complexity is greater than a preset complexity threshold.

Here, the specific value of the preset complexity threshold may be set according to an actual application scenario, and is not limited herein. As an example, the complexity threshold may be 0.3.

Step 304, in response to the complexity being not greater than the preset complexity threshold, selecting a typesetting recovery function based on the column coordinates as a target typesetting recovery function.

In this embodiment, the execution subject may select, in response to the complexity being not greater than the preset complexity threshold, the typesetting recovery function based on the column coordinates as the target typesetting recovery function.

Step 305, in response to the complexity being greater than the preset complexity threshold, selecting a typesetting recovery function based on deep learning as a target typesetting recovery function.

In this embodiment, the execution subject may select, in response to the complexity being greater than the preset complexity threshold, the typesetting recovery function based on deep learning as the target typesetting recovery function.

Step 306, adopting the target typesetting recovery function to recover the target layout.

In this embodiment, the execution subject may adopt the target typesetting recovery function to recover the target layout.
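Steps 303 to 306 amount to a threshold-based dispatch, which might be sketched as follows. The two recovery functions here are placeholders with hypothetical names; the threshold default of 0.3 is the example value mentioned in the text.

```python
def recover_by_column_coordinates(text_lines):
    """Placeholder for the column-coordinate-based recovery function."""
    return list(text_lines)


def recover_by_deep_learning(text_lines):
    """Placeholder for the deep-learning-based recovery function."""
    return list(text_lines)


def select_recovery_function(complexity, threshold=0.3):
    """Pick the recovery function by comparing complexity with the threshold."""
    if complexity > threshold:
        return recover_by_deep_learning       # step 305: complex layout
    return recover_by_column_coordinates      # step 304: not greater than threshold


def recover_layout(text_lines, complexity, threshold=0.3):
    """Step 306: apply the selected function to the target layout."""
    recover = select_recovery_function(complexity, threshold)
    return recover(text_lines)


print(select_recovery_function(0.1).__name__)
print(select_recovery_function(0.8).__name__)
```

A complexity exactly equal to the threshold falls on the column-coordinate side, matching the "not greater than" wording of step 304.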

It should be noted that, in the embodiment shown in fig. 3, the complexity is determined from one or more of the first number of times, the second number of times and whether the layout is in columns; then, according to the relation between the complexity and the complexity threshold, either the typesetting recovery function based on column coordinates or the typesetting recovery function based on deep learning is selected. This fits practical application scenarios, ensuring recovery accuracy while improving efficiency.

In particular, the typesetting recovery function based on deep learning can deal with the text disorder problem of a complicated layout, but takes a long time. The typesetting recovery function based on column coordinates can only rearrange some simple layouts and cannot recover the text order of complex layouts, but the time it takes is negligible. In other words, using the target detection technique based on deep learning gives high accuracy, but takes significantly more time than sorting directly by column coordinates. In actual application scenarios, particularly complex layouts are few. This embodiment therefore provides a way of determining complexity: the complexity of the layout is evaluated, and different typesetting recovery strategies are selected according to the complexity score, which can remarkably improve layout recovery efficiency.

Referring to fig. 6, an alternative implementation of step 302 described above is shown. As shown in fig. 6, step 302 may include step 3021, step 3022, and step 3023.

Step 3021, determining whether the target layout is in columns.

In this implementation, the execution subject may determine whether the target layout is in columns.

Step 3022, in response to determining that the target layout is in columns, determining the complexity to be a first preset value.

Here, the first preset value is greater than a preset complexity threshold.

Step 3023, in response to determining that the target layout is not in columns, determining the complexity according to the first number of times, the second number of times, and the number of text lines in the target layout.

Here, the first number of times may be divided by the number of text lines to obtain a first ratio. The second number of times may be divided by the number of text lines to obtain a second ratio. A weighted average of the first ratio and the second ratio may then be taken as the complexity.

It is understood that the first ratio is a value of 0 or more and 1 or less, and so is the second ratio. If the text in the target layout can be arranged from top to bottom without disorder, the output complexity is 0; if there is disorder, the complexity is greater than 0, with a maximum of 1. After the complexity score of the layout is output, a typesetting recovery strategy is selected according to the preset complexity threshold: when the score is greater than the complexity threshold, the typesetting recovery strategy based on deep learning is used, and when the score is not greater than the threshold, the typesetting recovery strategy based on column coordinates is used.
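The fig. 6 variant of step 302 can be sketched as below. The function name, the first preset value of 1.0 and the equal weights are assumptions chosen only so that the example runs; the text requires only that the first preset value exceed the complexity threshold.

```python
def complexity_fig6(in_columns, first_count, second_count, num_lines,
                    first_preset_value=1.0, weights=(0.5, 0.5)):
    """Steps 3021 to 3023: preset value for column layouts, ratios otherwise.

    first_preset_value and weights are hypothetical example values.
    """
    if in_columns:
        # Step 3022: skip the ratio computation entirely.
        return first_preset_value
    if num_lines <= 0:
        return 0.0  # assumed convention for a layout with no text lines
    # Step 3023: weighted average of the two ratios, each in [0, 1].
    first_ratio = first_count / num_lines
    second_ratio = second_count / num_lines
    w_first, w_second = weights
    return (w_first * first_ratio + w_second * second_ratio) / (w_first + w_second)


threshold = 0.3  # example threshold value from the text
for args in [(True, 0, 0, 10), (False, 0, 0, 10), (False, 2, 1, 10)]:
    score = complexity_fig6(*args)
    print(args, score, "deep learning" if score > threshold else "column coordinates")
```

Because both ratios lie between 0 and 1, their weighted average also stays between 0 and 1, consistent with the range stated above.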

It should be noted that whether the layout is in columns is detected first. If the layout is in columns and sorting uses only the column coordinates, a left line and a right line are ordered one after the other, which may make the text order completely wrong. However, the text order may be correct if the text lines are sorted by column coordinates within each column. It is therefore important to detect whether the layout is in columns: if it is, the complexity is set to the first preset value (greater than the preset complexity threshold), so that the typesetting recovery function based on deep learning can be adopted; if it is not, the complexity is determined mainly from the overlap relation and the dislocation relation. In this way, the amount of calculation for determining the complexity is reduced while the accuracy and efficiency of layout recovery are balanced.
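The interleaving problem can be seen in a toy two-column page. This sketch assumes that the "column coordinate" corresponds to a vertical position (here the hypothetical field `top`) and that the column boundary `split_x` is already known; neither detail is stated in the text.

```python
def order_by_vertical_position(lines):
    """Sort all text lines by vertical position only, ignoring columns."""
    return [line["text"] for line in sorted(lines, key=lambda line: line["top"])]


def order_within_columns(lines, split_x):
    """Sort the left column first, then the right, each by vertical position."""
    return [
        line["text"]
        for line in sorted(lines, key=lambda line: (line["left"] >= split_x, line["top"]))
    ]


page = [
    {"text": "L1", "left": 10, "top": 10},
    {"text": "R1", "left": 110, "top": 12},
    {"text": "L2", "left": 10, "top": 30},
    {"text": "R2", "left": 110, "top": 32},
]
print(order_by_vertical_position(page))   # left and right lines interleave
print(order_within_columns(page, 100))    # each column is read top to bottom
```

The first ordering alternates between the two columns, which is the failure the paragraph describes; the second keeps each column contiguous.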

With further reference to fig. 7, as an implementation of the methods shown in the above-mentioned figures, the present disclosure provides an embodiment of a layout recovery apparatus, which corresponds to the embodiment of the method shown in fig. 1, and which can be applied to various electronic devices.

As shown in fig. 7, the layout recovery apparatus of the present embodiment includes: an extraction unit 701, a determination unit 702, a selection unit 703, and a recovery unit 704. The extraction unit is used for extracting text line position information of the target layout; the determining unit is used for determining the complexity of the target layout according to the text line position information, wherein the complexity is used for representing the complexity of text arrangement in the target layout; the selecting unit is used for selecting a target typesetting recovery function aiming at the target layout according to the complexity, wherein the typesetting recovery function is used for reconstructing the target layout based on the text line position information; and the recovery unit is used for recovering the target layout by adopting the target typesetting recovery function.

In this embodiment, specific processes of the extracting unit 701, the determining unit 702, the selecting unit 703 and the recovering unit 704 of the layout recovering apparatus and the technical effects brought by the processes can refer to the related descriptions of step 101, step 102, step 103 and step 104 in the corresponding embodiment of fig. 1, which are not described herein again.

In some embodiments, determining the complexity of the target layout according to the text line position information, where the complexity is used to characterize the complexity of text arrangement in the layout, includes: determining the complexity of the target layout based on at least one of: a first number of times, a second number of times, and whether the target layout is in columns; the first number of times indicates the number of times a dislocation relation occurs in the target layout, where the dislocation relation indicates that two adjacent text lines do not overlap in the line direction; the second number of times indicates the number of times an overlap relation occurs in the target layout, where the overlap relation indicates that the difference between the line coordinates of two text lines is smaller than the height of any text line.
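One possible reading of the two counts is sketched below. Several points are assumptions rather than statements from the text: each text line is given as (left, right, top, height), "adjacent" means consecutive in extraction order, the overlap relation is checked only between adjacent lines, and the compared coordinate is the vertical position.

```python
from typing import List, Tuple

# Assumed representation of one text line: (left, right, top, height).
TextLine = Tuple[float, float, float, float]


def count_relations(lines: List[TextLine]) -> Tuple[int, int]:
    """Return (first_count, second_count) under the assumptions stated above."""
    first_count = 0   # dislocation: adjacent lines with disjoint horizontal extents
    second_count = 0  # overlap: vertical positions closer than a line height
    for (a_left, a_right, a_top, a_h), (b_left, b_right, b_top, b_h) in zip(lines, lines[1:]):
        if a_right < b_left or b_right < a_left:
            first_count += 1
        if abs(a_top - b_top) < min(a_h, b_h):
            second_count += 1
    return first_count, second_count


lines = [(10, 90, 10, 12), (110, 200, 12, 12), (10, 90, 30, 12)]
print(count_relations(lines))
```

In this sample the first two lines sit side by side at nearly the same height, so they contribute to both counts, while the jump back to the left column contributes only a dislocation.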

In some embodiments, determining the complexity of the target layout based on at least one of the first number of times, the second number of times, and whether the target layout is in columns includes: determining whether the target layout is in columns; in response to determining that the target layout is in columns, determining that the complexity is a first preset value, where the first preset value is greater than a preset complexity threshold; and in response to determining that the target layout is not in columns, determining the complexity according to the first number of times, the second number of times and the number of text lines in the target layout.

In some embodiments, determining the complexity of the target layout based on at least one of the first number of times, the second number of times, and whether the target layout is in columns includes: determining whether the target layout is in columns; in response to determining that the target layout is in columns, determining the complexity according to the first column indication value, the first number of times, the second number of times and the number of text lines in the target layout; and in response to determining that the target layout is not in columns, determining the complexity according to the second column indication value, the first number of times, the second number of times and the number of text lines in the target layout.

In some embodiments, the text line position information includes line coordinates and column coordinates of the text line; and whether the target layout is in columns is determined as follows: determining, according to the line coordinates of each text line of the target layout, whether at least two disjoint coordinate areas exist in the line direction; if yes, determining that the target layout is in columns; and if not, determining that the target layout is not in columns.

In some embodiments, the at least two predefined typesetting recovery functions include a typesetting recovery function based on column coordinates; and selecting a target typesetting recovery function aiming at the target layout according to the complexity, wherein the method comprises the following steps: and in response to the complexity not being larger than a preset complexity threshold, selecting a typesetting recovery function based on the column coordinates as a target typesetting recovery function.

In some embodiments, the at least two predefined typesetting recovery functions comprise a deep learning-based typesetting recovery function; and selecting a target typesetting recovery function aiming at the target layout according to the complexity, wherein the method comprises the following steps: and responding to the fact that the complexity is larger than a preset complexity threshold value, and selecting a typesetting recovery function based on deep learning as a target typesetting recovery function.

Referring to fig. 8, fig. 8 illustrates an exemplary system architecture to which the layout recovery method of one embodiment of the present disclosure may be applied.

As shown in fig. 8, the system architecture may include terminal devices 801, 802, 803, a network 804, and a server 805. The network 804 serves to provide a medium for communication links between the terminal devices 801, 802, 803 and the server 805. Network 804 may include various types of connections, such as wired or wireless communication links, or fiber optic cables, to name a few.

The terminal devices 801, 802, 803 may interact with a server 805 over a network 804 to receive or send messages or the like. The terminal devices 801, 802, 803 may have various client applications installed thereon, such as a web browser application, a search-type application, and a news-information-type application. The client application in the terminal device 801, 802, 803 may receive the instruction of the user, and complete the corresponding function according to the instruction of the user, for example, add the corresponding information in the information according to the instruction of the user.

The terminal devices 801, 802, 803 may be hardware or software. When the terminal devices 801, 802, 803 are hardware, they may be various electronic devices having a display screen and supporting web browsing, including but not limited to smart phones, tablet computers, e-book readers, MP3 players (Moving Picture Experts Group Audio Layer III), MP4 players (Moving Picture Experts Group Audio Layer IV), laptop portable computers, desktop computers, and the like. When the terminal devices 801, 802, 803 are software, they can be installed in the electronic devices listed above, and may be implemented as multiple pieces of software or software modules (e.g., software or software modules used to provide distributed services) or as a single piece of software or software module. This is not particularly limited herein.

The server 805 may be a server providing various services, for example, receiving an information acquisition request sent by the terminal devices 801, 802, and 803, acquiring presentation information corresponding to the information acquisition request in various ways according to the information acquisition request, and sending the relevant data of the presentation information to the terminal devices 801, 802, 803.

It should be noted that the layout recovery method provided by the embodiment of the present disclosure may be executed by a terminal device, and accordingly, the layout recovery apparatus may be disposed in the terminal devices 801, 802, and 803. In addition, the layout recovery method provided by the embodiment of the present disclosure may also be executed by the server 805, and accordingly, the layout recovery apparatus may be disposed in the server 805.

It should be understood that the number of terminal devices, networks, and servers in fig. 8 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.

Referring now to fig. 9, shown is a schematic diagram of an electronic device (e.g., a terminal device or a server of fig. 8) suitable for use in implementing embodiments of the present disclosure. The terminal device in the embodiments of the present disclosure may include, but is not limited to, a mobile terminal such as a mobile phone, a notebook computer, a digital broadcast receiver, a PDA (personal digital assistant), a PAD (tablet computer), a PMP (portable multimedia player), a vehicle terminal (e.g., a car navigation terminal), and the like, and a stationary terminal such as a digital TV, a desktop computer, and the like. The electronic device shown in fig. 9 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present disclosure.

As shown in fig. 9, the electronic device may include a processing means (e.g., a central processing unit, a graphics processor, etc.) 901, which may perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM) 902 or a program loaded from a storage means 908 into a Random Access Memory (RAM) 903. In the RAM 903, various programs and data necessary for the operation of the electronic apparatus 900 are also stored. The processing means 901, the ROM 902, and the RAM 903 are connected to each other through a bus 904. An input/output (I/O) interface 905 is also connected to the bus 904.

Generally, the following devices may be connected to the I/O interface 905: input devices 906 including, for example, a touch screen, touch pad, keyboard, mouse, camera, microphone, accelerometer, gyroscope, etc.; an output device 907 including, for example, a Liquid Crystal Display (LCD), a speaker, a vibrator, and the like; storage 908 including, for example, magnetic tape, hard disk, etc.; and a communication device 909. The communication means 909 may allow the electronic device to perform wireless or wired communication with other devices to exchange data. While fig. 9 illustrates an electronic device having various means, it is to be understood that not all illustrated means are required to be implemented or provided. More or fewer devices may alternatively be implemented or provided.

In particular, according to an embodiment of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program carried on a non-transitory computer readable medium, the computer program containing program code for performing the method illustrated by the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network through the communication device 909, or installed from the storage device 908, or installed from the ROM 902. The computer program performs the above-described functions defined in the methods of the embodiments of the present disclosure when executed by the processing apparatus 901.

It should be noted that the computer readable medium in the present disclosure can be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present disclosure, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In contrast, in the present disclosure, a computer readable signal medium may comprise a propagated data signal with computer readable program code embodied therein, either in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: electrical wires, optical cables, RF (radio frequency), etc., or any suitable combination of the foregoing.

In some embodiments, the clients and servers may communicate using any currently known or future developed network protocol, such as HTTP (HyperText Transfer Protocol), and may be interconnected with any form or medium of digital data communication (e.g., a communications network). Examples of communication networks include a local area network ("LAN"), a wide area network ("WAN"), an internet (e.g., the Internet), and peer-to-peer networks (e.g., ad hoc peer-to-peer networks), as well as any currently known or future developed network.

The computer readable medium may be embodied in the electronic device; or may exist separately without being assembled into the electronic device.

The computer readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to: extracting text line position information in a target layout; determining the complexity of the target layout according to the text line position information, wherein the complexity is used for representing the complexity of text arrangement in the target layout; selecting a target typesetting recovery function aiming at the target layout according to the complexity, wherein the typesetting recovery function is used for reconstructing the target layout based on the text line position information; and adopting the target typesetting recovery function to recover the target layout.

Computer program code for carrying out operations for the present disclosure may be written in any combination of one or more programming languages, including but not limited to an object oriented programming language such as Java, Smalltalk, C++, and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).

The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.

The units described in the embodiments of the present disclosure may be implemented by software or hardware. Here, the name of a unit does not constitute a limitation of the unit itself in some cases, and for example, the extraction unit may also be described as a "unit that extracts text line position information".

The functions described herein above may be performed, at least in part, by one or more hardware logic components. For example, without limitation, exemplary types of hardware logic components that may be used include: Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), systems on a chip (SOCs), Complex Programmable Logic Devices (CPLDs), and the like.

In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.

The foregoing description is only exemplary of the preferred embodiments of the disclosure and is illustrative of the principles of the technology employed. It will be appreciated by those skilled in the art that the scope of the disclosure herein is not limited to the particular combination of features described above, but also encompasses other embodiments in which any combination of the features described above or their equivalents does not depart from the spirit of the disclosure, for example, technical solutions formed by replacing the above features with (but not limited to) features having similar functions disclosed in this disclosure.

Further, while operations are depicted in a particular order, this should not be understood as requiring that such operations be performed in the particular order shown or in sequential order. Under certain circumstances, multitasking and parallel processing may be advantageous. Likewise, while several specific implementation details are included in the above discussion, these should not be construed as limitations on the scope of the disclosure. Certain features that are described in the context of separate embodiments can also be implemented in combination in a single embodiment. Conversely, various features that are described in the context of a single embodiment can also be implemented in multiple embodiments separately or in any suitable subcombination.

Although the subject matter has been described in language specific to structural features and/or methodological acts, it is to be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described above. Rather, the specific features and acts described above are disclosed as example forms of implementing the claims.
