Address recognition method, device, equipment and computer readable storage medium

Document No.: 1170279  Publication date: 2020-09-18

Reading note: This technology, "Address recognition method, device, equipment and computer readable storage medium", was designed and created by Zhang Manli (张慢丽), Huang Guocai (黄国财) and Chen Zheng (陈政) on 2020-06-10. Its main content is as follows. The invention discloses an address recognition method comprising: if an address text is detected, identifying, based on a pre-constructed administrative division tree and an offset pointer corresponding to the address text, the optimal area item of the address text in the administrative division tree; determining the superior-subordinate relationship of the optimal area item and determining, based on that relationship, whether the optimal area item is correct; and, if it is correct, outputting the recognition result corresponding to the address text based on the optimal area item. The invention also discloses an address recognition apparatus, a device and a computer-readable storage medium. During recognition, the pre-constructed administrative division tree and the offset pointer are used to identify the optimal area item in the address text, and the item is then checked so that a recognition result is output only when the optimal area item is correct. This improves the positive judgment rate of address recognition and enables intelligent address recognition.

1. An address identification method, characterized in that the address identification method comprises the following steps:

if an address text is detected, identifying, based on a pre-constructed administrative division tree and an offset pointer corresponding to the address text, the optimal area item of the address text in the administrative division tree;

determining the superior-subordinate relationship of the optimal area item, and determining, based on the superior-subordinate relationship, whether the optimal area item is correct;

and if the optimal area item is correct, outputting a recognition result corresponding to the address text based on the optimal area item.

2. The address recognition method according to claim 1, wherein the step of, if the address text is detected, identifying the optimal area item of the address text in the administrative division tree based on the pre-constructed administrative division tree and the offset pointer corresponding to the address text comprises:

if the address text is detected, determining a target entry corresponding to the address text based on the offset pointer, and determining a target area item matched with the address text in the administrative division tree based on the target entry;

and determining the optimal area item of the address text in the administrative division tree based on the target area item and the area level of the target area item.

3. The address recognition method according to claim 2, wherein the target entry includes at least a first target entry and a second target entry, the target area item includes at least a first target area item and a second target area item, and the step of, if the address text is detected, determining the target entry corresponding to the address text based on the offset pointer and determining, based on the target entry, the target area item in the administrative division tree that matches the address text comprises:

if the address text is detected, determining a first target entry corresponding to the address text based on the offset pointer, and determining a first target area item matched with the first target entry in the administrative division tree;

and controlling the offset pointer to shift by a preset text unit, determining a second target entry corresponding to the address text based on the shifted offset pointer, and determining, among the child nodes of the first target area item in the administrative division tree, a second target area item that matches the second target entry.

4. The address recognition method according to claim 3, wherein the step of, if the address text is detected, determining the first target entry corresponding to the address text based on the offset pointer comprises:

if the address text is detected, taking the target text content pointed to by the offset pointer in the address text as a current primary key, and determining whether an entry matching the current primary key exists in the administrative division tree;

if the entry exists, determining whether the entry has a sub-entry;

if the sub-entry exists, controlling the offset pointer to shift by a preset text unit in the reading direction of the address text, updating the current primary key based on the target text content and the text content pointed to by the shifted offset pointer, and returning to the step of determining whether an entry matching the current primary key exists in the administrative division tree;

and if the sub-entry does not exist, determining whether the entry is an administrative area name entry, and if so, determining the entry as the first target entry.

5. The address recognition method of claim 4, wherein after the step of determining whether there is an entry in the administrative division tree that matches the current primary key, the address recognition method further comprises:

if the entry does not exist, determining whether the number of jumps of the offset pointer has not exceeded a preset number;

if so, controlling the offset pointer to jump by a preset text unit in the reading direction of the address text, and incrementing the recorded number of jumps of the offset pointer;

updating the current primary key based on the text content pointed to by the offset pointer after the jump, and returning to the step of determining whether an entry matching the current primary key exists in the administrative division tree.

6. The address recognition method according to claim 2, wherein the step of determining the optimal area item of the address text in the administrative division tree based on the target area item and the area level of the target area item comprises:

determining the area level of the target area item, and determining the matching type of the target area item based on the area code of the target area item;

and determining the optimal area item of the address text in the administrative division tree based on the target area item, the area level and the matching type.

7. The address recognition method according to claim 6, wherein the step of determining the optimal area item of the address text in the administrative division tree based on the target area item, the area level and the matching type comprises:

determining whether an optimal area item corresponding to the area level has been recorded in a recording area corresponding to the address text;

if so, updating the optimal area item corresponding to the area level based on the matching type;

if the area level is below the provincial level and the superior area item of the target area item is not recorded in the recording area, matching upwards in the administrative division tree, based on the superior code of the target area item, to find the superior area item of the target area item;

and updating the optimal area item recorded for each area level in the recording area based on a preset matching rule, the target area item and the superior area item.

8. The address recognition method according to any one of claims 1 to 7, wherein, before the step of, if the address text is detected, identifying the optimal area item of the address text in the administrative division tree based on the pre-constructed administrative division tree and the offset pointer corresponding to the address text, the address recognition method further comprises:

if a text to be recognized is detected, recognizing non-administrative-area information in the text to be recognized based on a preset rule;

separating the non-administrative-area information from the text to be recognized to obtain the address text;

and the step of, if the optimal area item is correct, outputting the recognition result corresponding to the address text based on the optimal area item comprises:

if the optimal area item is correct, outputting the recognition result corresponding to the address text based on the optimal area item and the non-administrative-area information.

9. An address recognition apparatus, characterized in that the address recognition apparatus comprises:

an identification module, configured to, if an address text is detected, identify, based on a pre-constructed administrative division tree and an offset pointer corresponding to the address text, the optimal area item of the address text in the administrative division tree;

a determining module, configured to determine the superior-subordinate relationship of the optimal area item and determine, based on the superior-subordinate relationship, whether the optimal area item is correct;

and an output module, configured to output a recognition result corresponding to the address text based on the optimal area item if the optimal area item is correct.

10. An address recognition device, characterized in that the address recognition device comprises a memory, a processor and an address recognition program stored on the memory and executable on the processor, wherein the address recognition program, when executed by the processor, implements the steps of the address recognition method according to any one of claims 1 to 8.

11. A computer-readable storage medium, characterized in that an address recognition program is stored on the computer-readable storage medium, which when executed by a processor implements the steps of the address recognition method according to any one of claims 1 to 8.

Technical Field

The present invention relates to the technical field of financial technology (Fintech), and in particular to an address recognition method, apparatus, device and computer-readable storage medium.

Background

In recent years, with the development of financial technology (Fintech), and of internet finance in particular, address recognition technology has been introduced into the daily services of financial institutions such as banks. In these services, user information often needs to be standardized, that is, arranged in a uniform format to facilitate management and verification. User information includes, among other things, the address information filled in by the user, so how to recognize address information is a technical problem that financial institutions such as banks need to solve.

In the prior art, address entries in an address text are generally searched for and then matched level by level (province, city, district, street and village) to obtain a standardized address in a certain format; alternatively, an address classification model is designed in which addresses are divided into several classes, each represented as a tuple, and the address text is converted accordingly to obtain a standardized address in a certain format.

However, the prior art cannot accurately recognize address texts that contain dirty data, nor can it reliably determine whether the administrative areas named in an address are actually subordinate to one another. For example, in an address that places Nanshan District under Guangzhou City, Nanshan District appears to be a lower-level administrative area of Guangzhou, but the actual Nanshan District is under the jurisdiction of Shenzhen City, not Guangzhou City, and the presence of such a name blocks the entire recognition process. Existing address recognition is therefore not intelligent enough, and its positive judgment rate still needs to be improved.

Disclosure of Invention

The main object of the present invention is to provide an address recognition method, apparatus, device and computer-readable storage medium, so as to improve the positive judgment rate of address recognition.

In order to achieve the above object, the present invention provides an address recognition method, including the steps of:

if an address text is detected, identifying, based on a pre-constructed administrative division tree and an offset pointer corresponding to the address text, the optimal area item of the address text in the administrative division tree;

determining the superior-subordinate relationship of the optimal area item, and determining, based on the superior-subordinate relationship, whether the optimal area item is correct;

and if the optimal area item is correct, outputting a recognition result corresponding to the address text based on the optimal area item.
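The correctness check in the second step can be illustrated with the Nanshan District example from the Background. The sketch below is not the patent's own procedure: it assumes that each optimal area item carries a 12-digit statistical area code whose leading digit segments identify its superior areas, so a lower item lies within an upper item only if both share the upper item's significant prefix. The names `SEGMENT_ENDS` and `is_consistent` are hypothetical.

```python
from typing import Dict

# Assumed 12-digit code layout: level -> number of leading digits that identify the area.
# 1 = province, 2 = city, 3 = county/district, 4 = township/street, 5 = village.
SEGMENT_ENDS: Dict[int, int] = {1: 2, 2: 4, 3: 6, 4: 9, 5: 12}


def is_consistent(record: Dict[int, str]) -> bool:
    """Check that each recorded area lies within the next recorded area above it.

    record maps an area level to the code of the optimal area item at that level.
    """
    levels = sorted(record)
    for upper, lower in zip(levels, levels[1:]):
        prefix_len = SEGMENT_ENDS[upper]
        # A subordinate area's code starts with its superior's significant digits.
        if record[lower][:prefix_len] != record[upper][:prefix_len]:
            return False
    return True


# Nanshan District (440305) belongs to Shenzhen (4403), not Guangzhou (4401).
print(is_consistent({2: "440300000000", 3: "440305000000"}))  # True
print(is_consistent({2: "440100000000", 3: "440305000000"}))  # False
```

Checking only adjacent recorded levels is enough here, because "lies within" is transitive under this prefix layout.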

Preferably, the step of, if the address text is detected, identifying the optimal area item of the address text in the administrative division tree based on the pre-constructed administrative division tree and the offset pointer corresponding to the address text includes:

if the address text is detected, determining a target entry corresponding to the address text based on the offset pointer, and determining a target area item matched with the address text in the administrative division tree based on the target entry;

and determining the optimal area item of the address text in the administrative division tree based on the target area item and the area level of the target area item.

Preferably, the target entry includes at least a first target entry and a second target entry, the target area item includes at least a first target area item and a second target area item, and the step of, if the address text is detected, determining the target entry corresponding to the address text based on the offset pointer and determining, based on the target entry, the target area item in the administrative division tree that matches the address text includes:

if the address text is detected, determining a first target entry corresponding to the address text based on the offset pointer, and determining a first target area item matched with the first target entry in the administrative division tree;

and controlling the offset pointer to shift by a preset text unit, determining a second target entry corresponding to the address text based on the shifted offset pointer, and determining, among the child nodes of the first target area item in the administrative division tree, a second target area item that matches the second target entry.
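Restricting the second match to the children of the first target area item is what prevents a district from being matched under the wrong city. A minimal sketch under stated assumptions: the children's names are passed as a plain list, and a longest-prefix comparison stands in for the entry-by-entry pointer loop. `match_in_children` is a hypothetical name.

```python
from typing import List, Optional, Tuple


def match_in_children(text: str, pos: int,
                      children: List[str]) -> Optional[Tuple[str, int]]:
    """Match the text at `pos` against the names of the first target area item's children.

    Returns (matched child name, index just after it), or None if no child matches.
    """
    best: Optional[str] = None
    for name in children:
        # Prefer the longest child name that the text continues with at this position.
        if text.startswith(name, pos) and (best is None or len(name) > len(best)):
            best = name
    return (best, pos + len(best)) if best is not None else None


# After "广东省" (3 characters) is matched, only Guangdong's children are searched.
print(match_in_children("广东省深圳市南山区", 3, ["广州市", "深圳市"]))  # ('深圳市', 6)
# Nanshan District is not a child of Guangzhou, so no match is found there.
print(match_in_children("广东省广州市南山区", 6, ["越秀区", "天河区"]))  # None
```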

Preferably, the step of, if the address text is detected, determining the first target entry corresponding to the address text based on the offset pointer includes:

if the address text is detected, taking the target text content pointed to by the offset pointer in the address text as a current primary key, and determining whether an entry matching the current primary key exists in the administrative division tree;

if the entry exists, determining whether the entry has a sub-entry;

if the sub-entry exists, controlling the offset pointer to shift by a preset text unit in the reading direction of the address text, updating the current primary key based on the target text content and the text content pointed to by the shifted offset pointer, and returning to the step of determining whether an entry matching the current primary key exists in the administrative division tree;

and if the sub-entry does not exist, determining whether the entry is an administrative area name entry, and if so, determining the entry as the first target entry.
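The primary-key loop reads as a walk down a character-level prefix structure. The sketch below follows the steps literally and rests on three assumptions: one Chinese character per preset text unit, an `entries` mapping from each entry to its set of sub-entries, and a set of full administrative area names. `find_first_target_entry` is a hypothetical name. Read literally, the loop returns `None` when a longer key stops matching, and the jump steps that follow in the text handle that case.

```python
from typing import Dict, Optional, Set, Tuple


def find_first_target_entry(text: str, start: int,
                            entries: Dict[str, Set[str]],
                            area_names: Set[str]) -> Optional[Tuple[str, int]]:
    """Literal reading of the primary-key loop.

    Returns (first target entry, index just after it) or None.
    """
    pos = start
    if pos >= len(text):
        return None
    key = text[pos]                    # current primary key: text under the pointer
    while key in entries:              # an entry matching the primary key exists
        if entries[key]:               # the entry has sub-entries: extend the key
            pos += 1                   # shift the pointer by one text unit
            if pos >= len(text):
                return None
            key += text[pos]
            continue
        # No sub-entries: accept the entry only if it is an administrative area name.
        return (key, pos + 1) if key in area_names else None
    return None                        # no matching entry for the current primary key


entries = {"江": {"江苏", "江西"}, "江苏": {"江苏省"}, "江西": {"江西省"},
           "江苏省": set(), "江西省": set()}
names = {"江苏省", "江西省"}
print(find_first_target_entry("江苏省南京市", 0, entries, names))  # ('江苏省', 3)
```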

Preferably, after the step of determining whether there is an entry matching the current primary key in the administrative division tree, the address identification method further includes:

if the entry does not exist, determining whether the number of jumps of the offset pointer has not exceeded a preset number;

if so, controlling the offset pointer to jump by a preset text unit in the reading direction of the address text, and incrementing the recorded number of jumps of the offset pointer;

updating the current primary key based on the text content pointed to by the offset pointer after the jump, and returning to the step of determining whether an entry matching the current primary key exists in the administrative division tree.
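The jump steps let the pointer skip dirty characters, such as stray symbols, before a match begins, and the preset number bounds how much text can be skipped. A minimal sketch, assuming one character per text unit and a hypothetical `max_jumps` default of 3. Here the count allows at most `max_jumps` jumps in total, which is one possible reading of "has not exceeded". `skip_dirty_prefix` is a hypothetical name.

```python
from typing import Container, Optional


def skip_dirty_prefix(text: str, pos: int, entries: Container[str],
                      max_jumps: int = 3) -> Optional[int]:
    """Jump over characters that start no entry; return the index where matching can begin."""
    jumps = 0
    while pos < len(text):
        if text[pos] in entries:       # the current primary key matches an entry
            return pos
        if jumps >= max_jumps:         # jump budget exhausted: give up
            return None
        pos += 1                       # jump one text unit in the reading direction
        jumps += 1                     # accumulate the jump count
    return None


print(skip_dirty_prefix("##江苏省", 0, {"江"}))    # 2
print(skip_dirty_prefix("####江苏省", 0, {"江"}))  # None
```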

Preferably, the step of determining the optimal area item of the address text in the administrative division tree based on the target area item and the area level of the target area item includes:

determining the area level of the target area item, and determining the matching type of the target area item based on the area code of the target area item;

and determining the optimal area item of the address text in the administrative division tree based on the target area item, the area level and the matching type.
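When area codes follow a fixed segment layout, the area level can be read directly from the code. The source does not specify the layout, so this sketch assumes the 12-digit statistical division code used in mainland China: 2 province digits, 2 city, 2 county, 3 township and 3 village. `area_level` is a hypothetical name. The matching type is left out because this section does not define how it is derived.

```python
def area_level(code: str) -> int:
    """Derive the area level from an assumed 12-digit statistical division code.

    Levels: 1 province, 2 city, 3 county/district, 4 township/street, 5 village.
    """
    if len(code) != 12 or not code.isdigit():
        raise ValueError("expected a 12-digit area code")
    # Digit positions where the province, city, county and township segments end.
    for level, end in enumerate((2, 4, 6, 9), start=1):
        if set(code[end:]) == {"0"}:   # all lower segments are zero
            return level
    return 5                           # the village segment is non-zero


print(area_level("440000000000"))  # 1 (Guangdong Province)
print(area_level("440305000000"))  # 3 (Nanshan District)
```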

Preferably, the step of determining the optimal area item of the address text in the administrative division tree based on the target area item, the area level and the matching type includes:

determining whether an optimal area item corresponding to the area level has been recorded in a recording area corresponding to the address text;

if so, updating the optimal area item corresponding to the area level based on the matching type;

if the area level is below the provincial level and the superior area item of the target area item is not recorded in the recording area, matching upwards in the administrative division tree, based on the superior code of the target area item, to find the superior area item of the target area item;

and updating the optimal area item recorded for each area level in the recording area based on a preset matching rule, the target area item and the superior area item.
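The recording area can be modelled as a mapping from area level to the code of the optimal area item recorded at that level. This sketch rests on two assumptions. First, codes follow the mainland 12-digit layout (2 province, 2 city, 2 county, 3 township and 3 village digits), so each superior code can be derived by zeroing the lowest non-zero segment. Second, the "preset matching rule" is a hypothetical keep-what-is-already-recorded rule. `level_of`, `superior_code` and `record_with_superiors` are hypothetical names, and a code-to-name dictionary stands in for the administrative division tree.

```python
from typing import Dict

SEGMENT_ENDS = (2, 4, 6, 9)   # assumed ends of the province/city/county/township segments


def level_of(code: str) -> int:
    """Level under the assumed 12-digit layout: 1 province ... 5 village."""
    for level, end in enumerate(SEGMENT_ENDS, start=1):
        if set(code[end:]) == {"0"}:
            return level
    return 5


def superior_code(code: str) -> str:
    """Zero out the lowest non-zero segment to obtain the superior area's code."""
    level = level_of(code)
    if level == 1:
        raise ValueError("provinces have no superior area code in this layout")
    keep = SEGMENT_ENDS[level - 2]          # digits retained by the superior level
    return code[:keep] + "0" * (len(code) - keep)


def record_with_superiors(record: Dict[int, str], code: str,
                          names: Dict[str, str]) -> Dict[int, str]:
    """Record a target area item and fill in its missing superior items."""
    level = level_of(code)
    record.setdefault(level, code)          # hypothetical rule: keep an existing record
    while level > 1:                        # below provincial level: match upwards
        code = superior_code(code)
        level = level_of(code)
        if level in record:                 # superior level already recorded: stop
            break
        if code in names:
            record[level] = code
    return record


names = {"440000000000": "广东省", "440300000000": "深圳市",
         "440305000000": "南山区", "440100000000": "广州市"}
print(record_with_superiors({}, "440305000000", names))
# {3: '440305000000', 2: '440300000000', 1: '440000000000'}
```

If Guangzhou is already recorded at city level, recording Nanshan District leaves the conflicting pair in place. Rejecting that pair is the job of the superior-subordinate check, not of this step.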

Preferably, before the step of, if the address text is detected, identifying the optimal area item of the address text in the administrative division tree based on the pre-constructed administrative division tree and the offset pointer corresponding to the address text, the address recognition method further includes:

if a text to be recognized is detected, recognizing non-administrative-area information in the text to be recognized based on a preset rule;

separating the non-administrative-area information from the text to be recognized to obtain the address text;

and the step of, if the optimal area item is correct, outputting the recognition result corresponding to the address text based on the optimal area item includes:

if the optimal area item is correct, outputting the recognition result corresponding to the address text based on the optimal area item and the non-administrative-area information.
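This section does not specify the preset rule for separating non-administrative-area information. As a stand-in, the sketch treats everything after the last administrative suffix (省, 市, 区, 县, 镇, 乡, 街道, 村) as non-administrative detail, such as a road name or house number. `ADMIN_SUFFIXES` and `split_non_admin` are hypothetical names. A practical rule would also have to handle road or building names that themselves contain these characters.

```python
from typing import Tuple

# Hypothetical preset rule: administrative names end with one of these suffixes.
ADMIN_SUFFIXES = ("省", "市", "区", "县", "镇", "乡", "街道", "村")


def split_non_admin(text: str) -> Tuple[str, str]:
    """Split text into (address text, non-administrative-area information)."""
    # End position of the last administrative suffix anywhere in the text (0 if none).
    cut = max((text.rfind(s) + len(s) for s in ADMIN_SUFFIXES if s in text), default=0)
    return text[:cut], text[cut:]


print(split_non_admin("广东省深圳市南山区科技园路88号"))
# ('广东省深圳市南山区', '科技园路88号')
```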

In addition, to achieve the above object, the present invention also provides an address recognition apparatus including:

an identification module, configured to, if an address text is detected, identify, based on a pre-constructed administrative division tree and an offset pointer corresponding to the address text, the optimal area item of the address text in the administrative division tree;

a determining module, configured to determine the superior-subordinate relationship of the optimal area item and determine, based on the superior-subordinate relationship, whether the optimal area item is correct;

and an output module, configured to output a recognition result corresponding to the address text based on the optimal area item if the optimal area item is correct.

Preferably, the identification module is further configured to:

if the address text is detected, determining a target entry corresponding to the address text based on the offset pointer, and determining a target area item matched with the address text in the administrative division tree based on the target entry;

and determining the optimal area item of the address text in the administrative division tree based on the target area item and the area level of the target area item.

Preferably, the target entry at least includes a first target entry and a second target entry, the target area item at least includes a first target area item and a second target area item, and the identification module is further configured to:

if the address text is detected, determining a first target entry corresponding to the address text based on the offset pointer, and determining a first target area item matched with the first target entry in the administrative division tree;

and controlling the offset pointer to shift by a preset text unit, determining a second target entry corresponding to the address text based on the shifted offset pointer, and determining, among the child nodes of the first target area item in the administrative division tree, a second target area item that matches the second target entry.

Preferably, the identification module is further configured to:

if the address text is detected, taking the target text content pointed to by the offset pointer in the address text as a current primary key, and determining whether an entry matching the current primary key exists in the administrative division tree;

if the entry exists, determining whether the entry has a sub-entry;

if the sub-entry exists, controlling the offset pointer to shift by a preset text unit in the reading direction of the address text, updating the current primary key based on the target text content and the text content pointed to by the shifted offset pointer, and returning to the step of determining whether an entry matching the current primary key exists in the administrative division tree;

and if the sub-entry does not exist, determining whether the entry is an administrative area name entry, and if so, determining the entry as the first target entry.

Preferably, the identification module is further configured to:

if the entry does not exist, determining whether the number of jumps of the offset pointer has not exceeded a preset number;

if so, controlling the offset pointer to jump by a preset text unit in the reading direction of the address text, and incrementing the recorded number of jumps of the offset pointer;

updating the current primary key based on the text content pointed to by the offset pointer after the jump, and returning to the step of determining whether an entry matching the current primary key exists in the administrative division tree.

Preferably, the identification module is further configured to:

determining the area level of the target area item, and determining the matching type of the target area item based on the area code of the target area item;

and determining the optimal area item of the address text in the administrative division tree based on the target area item, the area level and the matching type.

Preferably, the identification module is further configured to:

determining whether an optimal area item corresponding to the area level has been recorded in a recording area corresponding to the address text;

if so, updating the optimal area item corresponding to the area level based on the matching type;

if the area level is below the provincial level and the superior area item of the target area item is not recorded in the recording area, matching upwards in the administrative division tree, based on the superior code of the target area item, to find the superior area item of the target area item;

and updating the optimal area item recorded for each area level in the recording area based on a preset matching rule, the target area item and the superior area item.

Preferably, the address recognition apparatus further includes a separation module, configured to:

if a text to be recognized is detected, recognizing non-administrative-area information in the text to be recognized based on a preset rule;

and separating the non-administrative-area information from the text to be recognized to obtain the address text.

The output module is further configured to:

if the optimal area item is correct, outputting the recognition result corresponding to the address text based on the optimal area item and the non-administrative-area information.

In addition, to achieve the above object, the present invention also provides an address recognition device, including a memory, a processor and an address recognition program stored on the memory and executable on the processor, wherein the address recognition program, when executed by the processor, implements the steps of the address recognition method described above.

Further, to achieve the above object, the present invention also provides a computer-readable storage medium having stored thereon an address recognition program which, when executed by a processor, implements the steps of the address recognition method as described above.

According to the address recognition method provided by the present invention, if an address text is detected, the optimal area item of the address text in a pre-constructed administrative division tree is identified based on the tree and an offset pointer corresponding to the address text; the superior-subordinate relationship of the optimal area item is determined, and whether the optimal area item is correct is determined based on that relationship; and if the optimal area item is correct, the recognition result corresponding to the address text is output based on the optimal area item. During recognition, the administrative division tree and the offset pointer are used to identify the optimal area item in the address text, and the item is then checked so that a recognition result is output only when it is correct. This improves the positive judgment rate of address recognition and enables intelligent address recognition.

Drawings

FIG. 1 is a schematic diagram of the device architecture of a hardware operating environment according to an embodiment of the present invention;

FIG. 2 is a flowchart of an address recognition method according to a first embodiment of the present invention.

The realization of the objects, functional features and advantages of the present invention will be further described with reference to the accompanying drawings.

Detailed Description

It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.

FIG. 1 is a schematic structural diagram of a device in a hardware operating environment according to an embodiment of the present invention.

The device of the embodiment of the invention can be a mobile terminal or a server device.

As shown in FIG. 1, the device may include a processor 1001 such as a CPU, a network interface 1004, a user interface 1003, a memory 1005 and a communication bus 1002. The communication bus 1002 implements connection and communication between these components. The user interface 1003 may include a display screen (Display) and an input unit such as a keyboard (Keyboard); optionally, the user interface 1003 may also include a standard wired interface and a wireless interface. The network interface 1004 may optionally include a standard wired interface and a wireless interface (e.g., a Wi-Fi interface). The memory 1005 may be a high-speed RAM or a non-volatile memory (e.g., a magnetic disk memory). Optionally, the memory 1005 may also be a storage device independent of the processor 1001.

Those skilled in the art will appreciate that the configuration of the apparatus shown in fig. 1 is not intended to be limiting of the apparatus and may include more or fewer components than those shown, or some components may be combined, or a different arrangement of components.

As shown in fig. 1, a memory 1005, which is a kind of computer storage medium, may include therein an operating system, a network communication module, a user interface module, and an address recognition program.

The operating system is a program that manages and controls the address recognition device and its software resources, and supports the running of the network communication module, the user interface module, the address recognition program and other programs or software; the network communication module manages and controls the network interface 1004; and the user interface module manages and controls the user interface 1003.

In the address recognition device shown in FIG. 1, the processor 1001 calls the address recognition program stored in the memory 1005 and performs the operations in the embodiments of the address recognition method described below.

Based on the above hardware structure, embodiments of the address recognition method of the present invention are provided.

Referring to FIG. 2, FIG. 2 is a schematic flowchart of a first embodiment of the address recognition method of the present invention. The method includes:

Step S10: if an address text is detected, identifying, based on a pre-constructed administrative division tree and an offset pointer corresponding to the address text, the optimal area item of the address text in the administrative division tree;

Step S20: determining the superior-subordinate relationship of the optimal area item, and determining, based on that relationship, whether the optimal area item is correct;

Step S30: if the optimal area item is correct, outputting the recognition result corresponding to the address text based on the optimal area item.

The address recognition method is applied to an address recognition device of a financial institution such as a bank. The address recognition device may be a terminal, a robot or a PC; for convenience of description it is referred to below as the recognition device. In this embodiment, the recognition device provides a text input window and acquires the address text that the user enters in that window. In addition, the recognition device may interface with the business system of a financial institution such as a bank to retrieve customers' credit reporting data from that system, the credit reporting data including, among other things, the customers' address texts.

When an address text is detected, the optimal area item of the address text is identified through the pre-constructed administrative division tree and the offset pointer, and the optimal area item is then checked. This improves the positive judgment rate of the optimal area item and makes the final output recognition result more reliable.

The respective steps will be described in detail below:

Step S10: if an address text is detected, identifying, based on a pre-constructed administrative division tree and an offset pointer corresponding to the address text, the optimal area item of the address text in the administrative division tree;

In this embodiment, if the recognition device detects an address text, it identifies the optimal area item of the address text based on the administrative division tree and the offset pointer corresponding to the address text.

The administrative division tree is constructed in advance, and the specific construction process comprises the following steps:

step i, loading an administrative division address base, and creating an administrative division tree which takes a country as a root node and each administrative area as a branch node on the basis of the area level corresponding to each administrative area in the administrative division address base;

That is, before any address text is identified, the identification device loads an administrative division address library, which contains the name, administrative code, and other attributes of each administrative area. The library is loaded as a tree structure rooted at a country such as China, namely the administrative division tree. Taking China as an example, the administrative division tree has seven layers: China, province, city, district, street, village, and town. Each node is an administrative area; the parent node of a node is its upper-level administrative area (upper-level area), its child nodes are its lower-level administrative areas (lower-level areas), and nodes on the same layer have the same area level.
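The tree described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the class name `DivisionNode`, the `(name, code, parent_code)` row format, and the six-digit codes used in the usage example are assumptions chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Optional, Tuple


@dataclass(eq=False)  # identity comparison avoids recursing through parent/children
class DivisionNode:
    name: str                 # administrative area name, e.g. "广东省"
    code: str                 # administrative code (illustrative values)
    depth: int = 0            # 0 = country (root), 1 = province, 2 = city, ...
    parent: Optional["DivisionNode"] = field(default=None, repr=False)
    children: List["DivisionNode"] = field(default_factory=list)

    def add_child(self, name: str, code: str) -> "DivisionNode":
        # A child node is the lower-level area of its parent, one layer deeper.
        child = DivisionNode(name, code, self.depth + 1, self)
        self.children.append(child)
        return child


def build_tree(
    rows: Iterable[Tuple[str, str, str]],
) -> Tuple[DivisionNode, Dict[str, DivisionNode]]:
    """rows: (name, code, parent_code), ordered so that parents come first."""
    root = DivisionNode("中国", "000000")
    by_code: Dict[str, DivisionNode] = {root.code: root}
    for name, code, parent_code in rows:
        by_code[code] = by_code[parent_code].add_child(name, code)
    return root, by_code
```

For example, loading `("广东省", "440000", "000000")`, `("深圳市", "440300", "440000")`, and `("福田区", "440304", "440300")` yields a node for Futian District at depth 3 whose parent is Shenzhen City.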

Step ii, traversing the administrative division tree to determine the first character of each node, and creating the entries corresponding to the administrative division tree based on the first character of each node.

After the administrative division tree is loaded, the tree is traversed to determine the first character of each node, and the first character of each node is added to the entries (entry library). Names that share the same character are placed under the entry for that character, and their following characters form sub-entries under that entry. For example, Jiangsu Province (江苏省) and Jiangxi Province (江西省) share the first character "Jiang" (江): "Jiangsu" (江苏) and "Jiangxi" (江西) are sub-entries of "Jiang", "Jiangsu Province" is a sub-entry of "Jiangsu", and "Jiangxi Province" is a sub-entry of "Jiangxi".
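The entry library behaves like a character trie: each entry is keyed by one character and its sub-entries are the characters that can follow it. A minimal sketch, assuming that reading; `EntryNode` and `build_entries` are hypothetical names.

```python
from typing import Dict, Iterable, List


class EntryNode:
    """One entry in the entry library; `children` holds its sub-entries."""

    def __init__(self) -> None:
        self.children: Dict[str, "EntryNode"] = {}  # sub-entries keyed by the next character
        self.names: List[str] = []                  # full area names that end at this entry


def build_entries(names: Iterable[str]) -> EntryNode:
    root = EntryNode()
    for name in names:
        node = root
        # The first character creates (or reuses) a top-level entry; every further
        # character becomes a sub-entry of the entry before it.
        for ch in name:
            node = node.children.setdefault(ch, EntryNode())
        node.names.append(name)
    return root
```

With `build_entries(["江苏省", "江西省"])`, the single top-level entry "江" has the sub-entries "苏" and "西", and the full name "江苏省" is stored three levels down.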

At this point, the administrative division tree and its entries are fully constructed, in preparation for identifying subsequent address texts.

In addition, the offset pointer is a virtual data structure that also records its number of jumps. When the identification device detects an address text, it generates an offset pointer for that text; the pointer initially points to the first character of the address text, and as recognition progresses the identification device moves the pointer along the text direction. The text content pointed to by the offset pointer is the primary key of the address text. In this embodiment, the offset pointer preferably advances character by character, that is, it is offset one character at a time.
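A minimal sketch of such a pointer, assuming left-to-right text and a preset text unit of one character; the class `OffsetPointer` and its method names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class OffsetPointer:
    text: str
    pos: int = 0    # index of the character pointed to; starts at the first character
    jumps: int = 0  # number of jumps made over unmatched (dirty) text

    def current(self) -> str:
        """Character currently pointed to, or "" once the text is exhausted."""
        return self.text[self.pos] if self.pos < len(self.text) else ""

    def offset(self, unit: int = 1) -> None:
        """Move left to right by `unit` characters (the preset text unit)."""
        self.pos += unit

    def jump(self, unit: int = 1) -> None:
        """Skip `unit` characters of unmatched text and count the jump."""
        self.pos += unit
        self.jumps += 1

    def exhausted(self) -> bool:
        return self.pos >= len(self.text)
```

Keeping the jump count inside the pointer means the jump limit can be checked without any extra state.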

In this embodiment, using the pre-constructed administrative division tree and the offset pointer for the address text, the identification device compares the content of the address text with the entries of the administrative division tree character by character, so as to identify the optimal area items of the address text.

Further, in an embodiment, step S10 includes:

step a, if an address text is detected, determining a target entry corresponding to the address text based on the offset pointer, and determining a target area item matched with the address text in the administrative division tree based on the target entry;

In an embodiment, if the identification device detects an address text, it determines the target entry corresponding to the address text through the offset pointer and then matches the target entry against each level of administrative areas in the administrative division tree, so as to determine the target area item of the address text at each level. For example, if the target entry of the current address text is "Jiangsu Province", the target area item at the provincial level is Jiangsu Province; if the target entry is "Futian District", the target area item at the district level is Futian District.

Specifically, in an embodiment, the target entry at least includes a first target entry and a second target entry, the target area item at least includes a first target area item and a second target area item, and step a includes:

step a1, if an address text is detected, determining a first target entry corresponding to the address text based on the offset pointer, and determining a first target area item matched with the first target entry in the administrative division tree;

In an embodiment, if the identification device detects an address text, it determines the first target entry corresponding to the address text through the offset pointer and then searches the administrative division tree for a first target area item matching that entry. For example, if the first target entry at the current offset pointer is "Guangdong Province", it is compared with the nodes of the administrative division tree, and the first target area item is determined to be the provincial-level administrative area Guangdong Province.

Further, in one embodiment, step a1 includes:

step a11, if an address text is detected, determining that the target text content pointed by the offset pointer in the address text is a current primary key, and determining whether a vocabulary entry matched with the current primary key exists in the administrative division tree;

in an embodiment, if the identification device detects the address text, the position of the offset pointer in the address text is determined, the target text content pointed by the offset pointer at the current position is determined to be the current primary key, and the current primary key is compared with the entry library in the administrative division tree to determine whether an entry matched with the current primary key exists in the entry library.

Step a12, if the entry exists, determining whether a sub-entry exists in the entry;

In an embodiment, after it is determined that the entry library of the administrative division tree contains an entry corresponding to the current primary key, it is further determined whether sub-entries exist under that entry, that is, whether traversal of the entry is complete. For example, if the entry corresponding to the current primary key is "Guang" (广), the sub-entries "Guangdong" (广东) and "Guangxi" (广西) exist under it.

Step a13, if sub-entries exist, controlling the offset pointer to move by a preset text unit along the text direction of the address text, updating the current primary key based on the target text content and the text content pointed to by the offset pointer after the offset, and continuing to execute the step of determining whether an entry matching the current primary key exists in the administrative division tree;

In an embodiment, if the entry corresponding to the current primary key has sub-entries, the identification device moves the offset pointer by a preset text unit along the text direction of the address text. The text direction can be left to right, right to left, top to bottom, or bottom to top; in this embodiment it is preferably left to right. The preset text unit is the displacement of each offset and is preferably one character; for example, the identification device moves the offset pointer one character to the right in the address text.

The target text content is then updated from the original target text content and the text content pointed to after the offset: the two are concatenated into new target text content, which becomes the current primary key, and the step of determining whether an entry matching the current primary key exists in the administrative division tree is executed again. For example, if the current address text is "Jiangsu Province" (江苏省) and the offset pointer points to "Jiang" (江), the current primary key is "Jiang". Because the entry for "Jiang" has sub-entries such as "Jiangsu" (江苏) and "Jiangning" (江宁), its traversal is not complete, so the identification device moves the offset pointer one character to the right to "Su" (苏), concatenates "Jiang" and "Su" into "Jiangsu", and updates the current primary key to "Jiangsu". It then determines whether the entry library contains an entry for "Jiangsu" and, if so, whether that entry has sub-entries.

Step a14, if the sub-entry does not exist, determining whether the entry is an administrative area name entry, wherein if yes, determining that the entry is a first target entry.

In an embodiment, consider the case where the entry corresponding to the current primary key has no sub-entries. For example, if the address text is "Shenzhen City Nanshan District" (深圳市南山区) and the current primary key is "Shenzhen City" (深圳市), exactly one entry matches in the administrative division tree and it has no sub-entries, so it is determined whether the entry is an administrative area name entry (item).

It will be appreciated that even if the offset pointer continued to move, the resulting primary key "Shenzhen City Nan" (深圳市南) could not match any entry, because no node in the administrative division tree has that name. Therefore, once it is determined that no sub-entry exists, no further offset is needed, and it is determined whether the entry is an administrative area name entry. If the entry corresponding to the current primary key is an administrative area name entry, it is determined to be the first target entry.
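Steps a11 to a14 amount to walking a character trie from the pointer position for as long as the extended primary key still names an entry. A minimal sketch, assuming the entry library is stored as such a trie; `build_entries` and `match_entry` are hypothetical names. Rather than moving the pointer and then finding that the extended key (e.g. "深圳市南") has no entry, the sketch peeks at the next character first, which gives the same result.

```python
from typing import Dict, Iterable, List, Optional, Tuple


class EntryNode:
    def __init__(self) -> None:
        self.children: Dict[str, "EntryNode"] = {}  # sub-entries keyed by next character
        self.names: List[str] = []                  # area names that end at this entry


def build_entries(names: Iterable[str]) -> EntryNode:
    root = EntryNode()
    for name in names:
        node = root
        for ch in name:
            node = node.children.setdefault(ch, EntryNode())
        node.names.append(name)
    return root


def match_entry(root: EntryNode, text: str, start: int) -> Optional[Tuple[str, int]]:
    """Return (area name, index just after the match) for the entry starting at `start`."""
    if start >= len(text) or text[start] not in root.children:
        return None                      # step a11: no entry matches the primary key
    node, pos = root.children[text[start]], start
    # Steps a12/a13: while the entry has a sub-entry for the next character,
    # offset the pointer by one character and extend the primary key.
    while pos + 1 < len(text) and text[pos + 1] in node.children:
        pos += 1
        node = node.children[text[pos]]
    # Step a14: the key cannot be extended; accept it only if it is a name entry.
    return (node.names[0], pos + 1) if node.names else None
```

For "深圳市南山区", a first call from index 0 returns "深圳市" and index 3, and a second call from index 3 returns "南山区".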

It should be noted that some special addresses have many aliases, and for convenience or out of habit users often abbreviate them to uncommon aliases when typing, which can make identification inaccurate or even wrong.

Such addresses tend to share certain traits: their full names are long, many are names of autonomous regions, and aliases are common for border and western areas. Therefore, to make recognition more intelligent and raise the rate of correct judgments, in an embodiment administrative area aliases are also added to the administrative division tree. For example, the alias "Neimeng" (内蒙) is added to the Inner Mongolia Autonomous Region, the alias "Heilong" (黑龙) to Heilongjiang Province, and the alias "Ningxia" to the Ningxia Hui Autonomous Region.

Therefore, when the entry for the current primary key has sub-entries but the key obtained after the next offset matches no entry, it is determined whether the current primary key is an alias; if so, the alias is taken as the first target entry. For example, if the address text is "Neimeng Baotou City" (内蒙包头市) and the current primary key is "Neimeng" (内蒙), that entry has sub-entries, but the key "Neimeng Bao" (内蒙包) obtained after offsetting the pointer matches no entry. At this point it is checked whether the previous primary key "Neimeng" is an alias; it is, so "Neimeng" is determined to be the first target entry, and the offset pointer is restored to the position of the previous primary key, so that identification of the next target area item, "Baotou City", starts from "Bao" (包).
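The alias fallback can be sketched by storing each alias in the same trie as the full names, tagged with the full name it stands for. This is an illustrative sketch only; `build_entries`, `match_with_alias`, and the alias table in the usage example are assumptions. Because the walk only advances while the next character extends a valid entry, "restoring" the pointer simply means resuming right after the alias.

```python
from typing import Dict, List, Optional, Tuple


class EntryNode:
    def __init__(self) -> None:
        self.children: Dict[str, "EntryNode"] = {}
        self.items: List[Tuple[str, str]] = []  # (full area name, "full" or "alias")


def build_entries(areas: Dict[str, List[str]]) -> EntryNode:
    """`areas` maps each full administrative name to its aliases."""
    root = EntryNode()
    for full, aliases in areas.items():
        surfaces = [(full, "full")] + [(alias, "alias") for alias in aliases]
        for surface, kind in surfaces:
            node = root
            for ch in surface:
                node = node.children.setdefault(ch, EntryNode())
            node.items.append((full, kind))
    return root


def match_with_alias(root: EntryNode, text: str, start: int) -> Optional[Tuple[str, str, int]]:
    """Return (full name, match kind, resume index) for the entry starting at `start`."""
    if start >= len(text) or text[start] not in root.children:
        return None
    node, pos = root.children[text[start]], start
    while pos + 1 < len(text) and text[pos + 1] in node.children:
        pos += 1
        node = node.children[text[pos]]
    # "内蒙" still has the sub-entry "内蒙古", but "内蒙包" matches nothing, so the walk
    # stops at "内蒙"; as an alias it is accepted and the pointer resumes at pos + 1.
    if not node.items:
        return None
    full, kind = node.items[0]
    return full, kind, pos + 1
```

With `{"内蒙古自治区": ["内蒙"], "包头市": []}`, the text "内蒙包头市" yields the Inner Mongolia Autonomous Region as an alias match ending at index 2, after which "包头市" is matched from "包".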

Step a2, controlling the offset pointer to move by a preset text unit, determining a second target entry corresponding to the address text based on the offset pointer after the offset, and determining, among the child nodes of the first target area item in the administrative division tree, a second target area item matching the second target entry.

In an embodiment, after the first target area item is determined, the offset pointer is moved along the text direction of the address text by the preset text unit. As in step a13, the text direction is preferably left to right and the preset text unit is preferably one character.

Therefore, the second target entry corresponding to the address text is determined according to the offset pointer after the offset, and the determination process of the second target entry is similar to that of the first target entry, which is not described herein again.

It should be noted that the search for the second target entry could start from the root node of the administrative division tree; in this embodiment, however, the second target entry is preferably matched among the child nodes of the first target area item, and the second target area item matching the second target entry is determined under those child nodes.

For example, if the address text is "Guangdong Province Shenzhen City" (广东省深圳市), the current primary key is "Guangdong Province". After the first target area item, the provincial-level area Guangdong Province, is matched in the administrative division tree, the offset pointer is moved to the right. If the primary key after the offset (three character offsets) is "Shenzhen City", the second target area item is determined under the child nodes of the first target area item, that is, "Shenzhen City" is searched for only among the areas under the jurisdiction of Guangdong Province. Because not every node of the administrative division tree has to be searched, search efficiency is effectively improved.

In addition, in an embodiment, the second target area item can also be searched for among all nodes of the administrative division tree, which avoids failing to find it when the first target area item has a lower area level than the second. For example, in "Shenzhen City Guangdong Province" (深圳市广东省), when the second target area item corresponding to "Guangdong Province" is being determined, the child nodes of the first target area item "Shenzhen City" do not include Guangdong Province, so the second target area item is searched for among all nodes of the administrative division tree.
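The two search strategies, children of the first target area item first and then the whole tree, can be sketched as below. The flat `code -> (name, superior code)` table, its contents, and the function name `find_second_item` are illustrative assumptions.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical subset of the address base: code -> (name, superior code)
DIVISIONS: Dict[str, Tuple[str, str]] = {
    "440000": ("广东省", "000000"),
    "440100": ("广州市", "440000"),
    "440300": ("深圳市", "440000"),
    "440304": ("福田区", "440300"),
}

# Child index, so the preferred search only touches one subtree.
CHILDREN: Dict[str, List[str]] = {}
for _code, (_name, _parent) in DIVISIONS.items():
    CHILDREN.setdefault(_parent, []).append(_code)


def find_second_item(key: str, first_code: str) -> Optional[str]:
    # Preferred: look only among the child nodes of the first target area item.
    for code in CHILDREN.get(first_code, []):
        if DIVISIONS[code][0] == key:
            return code
    # Fallback: search all nodes, e.g. "深圳市广东省", where the higher level comes second.
    for code, (name, _) in DIVISIONS.items():
        if name == key:
            return code
    return None
```

In the usual province-then-city order only the first loop runs; the full scan is paid only for reversed or unusual input.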

It can be understood that, if the target area item further includes a third target area item, the determination process is similar to the determination process of the second target area item, and is not described herein again.

Step b, determining the optimal area item of the address text in the administrative division tree based on the target area item and the area level of the target area item.

In one embodiment, after the target area item is determined, its area level is further determined; for example, "Guangzhou City" has the city area level. Finally, the optimal area item of the address text in the administrative division tree is determined based on the target area item and its area level.

Specifically, in an embodiment, step b includes:

step b1, based on the area code of the target area item, determining the area level of the target area item and determining the matching type of the target area item;

In an embodiment, after determining a target area item, the identification device determines its area code from the node position of the target area item in the administrative division tree, and then determines the area level from that code; for example, "Guangzhou City" has the administrative code 440100 and the city area level. It further determines the matching type of the target area item, which is either full-name matching or non-full-name matching; for example, "Inner Mongolia Autonomous Region" is a full-name match, while the alias "Neimeng" (内蒙) is a non-full-name match.
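For the top three levels, the area level can be read off a six-digit administrative code by its trailing zeros. The sketch below assumes the common GB/T 2260-style layout (two digits each for province, city, and district) and ignores longer street and town codes; the function names are hypothetical.

```python
def area_level(code: str) -> str:
    """Area level from a six-digit administrative code (assumed GB/T 2260-style layout)."""
    if code.endswith("0000"):
        return "province"   # e.g. 440000
    if code.endswith("00"):
        return "city"       # e.g. 440100, Guangzhou City
    return "district"       # e.g. 440304


def match_type(matched_text: str, full_name: str) -> str:
    """A full-name match only when the matched text is the complete official name."""
    return "full" if matched_text == full_name else "non-full"
```

Under this layout, 440100 reads as a city-level code, which agrees with the Guangzhou City example above.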

Step b2, based on the target area item, the area level and the matching type, determining the best area item of the address text in the administrative division tree.

In one embodiment, the identification device determines the best area item of the address text at each area level in the administrative division tree based on the target area item, the area level of the target area item, and the matching type of the target area item.

Specifically, in one embodiment, step b2 includes:

step b21, determining whether the recording area corresponding to the address text has recorded the best area item corresponding to the area level;

In one embodiment, the identification device sets up a recording area curDivision for the address text; the recording area is a data structure that stores administrative area matching items. After the address text has been traversed in depth and its entries matched, several administrative area name entries may match because of duplicate names, aliases, or errors in the user's input. All items of the matched entry are therefore traversed, and the best area item is selected by full-name-match priority and area-level priority, with a smaller area-level priority number preferred.

Therefore, it is first determined whether the recording area corresponding to the address text has recorded the optimal area item corresponding to the area level of the target area item.

Step b22, if yes, based on the matching type, updating the best area item corresponding to the area level;

in an embodiment, if the optimal area item corresponding to the area level of the target area item is already recorded in the recording area, the optimal area item is updated based on the matching type of the target area item.

In a specific implementation, if the matching type of the current target area item is full-name matching, its credibility is high, so the current target area item replaces the recorded item as the optimal area item of that area level. If the matching type of the current target area item is non-full-name matching, it is determined whether the matching attribute of the recorded optimal area item is upward matching; if so, the current target area item becomes the optimal area item of that area level, and if not, no update is performed. The reason is that an upward match is not a true match, so it has lower priority than a target area item obtained by true matching; a recorded optimal area item obtained by upward matching is therefore replaced by the target area item. Here, upward matching means inferring a higher-level area item from a lower-level area item.

Further, in an embodiment, if the matching attribute of the recorded optimal area item is not upward matching, it is further determined whether the recorded optimal area item is a full-name match. If so, the area-level priorities of the two items are compared: if the priority number of the current target area item's area level is smaller than that of the recorded optimal area item, the target area item becomes the optimal area item; if it is greater than or equal, no update is performed. If the recorded optimal area item is neither an upward match nor a full-name match, that is, it is an alias match, the target area item becomes the optimal area item.
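The update decision of step b22, including the refinement just described, can be condensed into one predicate. This is a sketch under assumptions: the `AreaItem` type, the match labels `"full"`, `"alias"`, and `"upward"`, and the integer priority field are illustrative, with a smaller priority number meaning a higher administrative level.

```python
from dataclasses import dataclass


@dataclass
class AreaItem:
    name: str
    match: str      # "full", "alias" (non-full-name), or "upward" (inferred from a lower level)
    priority: int   # smaller number = higher administrative level


def should_replace(recorded: AreaItem, current: AreaItem) -> bool:
    """Decide whether `current` replaces the best item already recorded for its level."""
    if current.match == "full":
        return True                                  # full-name match: high credibility
    if recorded.match == "upward":
        return True                                  # upward match is not a true match
    if recorded.match == "full":
        return current.priority < recorded.priority  # keep unless current ranks higher
    return True                                      # recorded item came from an alias match
```

Ordering the checks this way makes a full-name match win outright, while a non-full-name match only displaces weaker evidence.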

Step b23, if the area level is below the provincial level and the recording area has not recorded the upper-level area item of the target area item, matching the upper-level area item of the target area item upward in the administrative division tree based on the superior code of the target area item;

In an embodiment, if the area level of the current target area item is below the provincial level and its upper-level area item is not recorded in the recording area (for example, the current target area item is "Futian District" and its upper-level area item "Shenzhen City" is not recorded), the upper-level area item is matched upward in the administrative division tree according to the superior code of the target area item, filling in the missing upper-level area item.

It should be noted that upward matching can continue up to the provincial level: in the example above, after "Shenzhen City" is matched, if its own upper-level area item is not recorded in the recording area either, upward matching continues to "Guangdong Province".
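Upward matching follows superior codes until it reaches the provincial level or a level that is already recorded. A minimal sketch, assuming a flat `code -> (name, level, superior code)` table and a recording area modeled as a `level -> (name, match type)` dict; both shapes and the name `match_upward` are hypothetical.

```python
from typing import Dict, Tuple

# Hypothetical subset: code -> (name, area level, superior code)
DIVISIONS: Dict[str, Tuple[str, str, str]] = {
    "440000": ("广东省", "province", "000000"),
    "440300": ("深圳市", "city", "440000"),
    "440304": ("福田区", "district", "440300"),
}


def match_upward(record: Dict[str, Tuple[str, str]], code: str) -> None:
    """Fill missing superior levels of `code` into `record`, marking them as upward matches."""
    current = code
    while True:
        _, level, parent = DIVISIONS[current]
        if level == "province" or parent not in DIVISIONS:
            return                           # reached the top of the chain
        parent_name, parent_level, _ = DIVISIONS[parent]
        if parent_level in record:
            return                           # superior already recorded; stop here
        record[parent_level] = (parent_name, "upward")
        current = parent
```

Tagging filled-in items as `"upward"` is what later lets a true match replace them.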

Step b24, updating and recording the optimal area item of each area level in the recording area based on the preset matching rule, the target area item, and the upper-level area item.

In one embodiment, the optimal area item of each area level is updated and recorded according to a preset matching rule, the target area item and the upper-level area item of the target area item.

In a specific implementation, the correspondence between area levels and priorities is set as shown in the following table; a smaller priority number means a higher administrative level. Note that municipalities directly under the central government and provinces have different priorities; county-level cities directly administered by a province are at the same level as prefecture-level cities but have a different priority; and the level-4 addresses of a specific platform are at the same level as streets and townships but have a different priority.

[Table: area levels and their priority numbers]

The preset matching rules are as follows:

1. Province: if the area level of the current target area item is provincial, the target area item is recorded directly as the provincial best area item acceptableItem in the recording area curDivision;

2. ProvinceLevelCity: if the area level of the current target area item is a municipality directly under the central government, the target area item is recorded as the provincial best area item in the recording area, and its first child node is recorded as the city-level best area item. For example, Beijing City has no province above it, so Beijing City is recorded as the provincial best area item and its child node "Municipal Districts" (市辖区) as the city-level best area item;

3. City: if the area level of the current target area item is city level, check whether the city level in the recording area has a value, and if so, update the city-level optimal area item to the target area item; then check whether the provincial level in the recording area has a value, and if not, take the parent node of the target area item, that is, its upper-level area item, match it upward, and update the provincial level of the recording area;

4. CityLevelDistrict: if the area level of the current target area item is a county-level city directly administered by a province, determine whether an area item of a lower level, such as a town, has already been identified and whether its matching type is full-name matching; if so, the optimal area item at this level is not updated. If not, write the target area item to both the city level and the district level of the recording area; then check whether the provincial level in the recording area has a value, and if not, take the upper-level area item of the target area item and match it upward to the provincial level;

5. District: if the area level of the current target area item is district or county level, determine whether the district/county level in the recording area has a value; if not, update it to the current target area item, take the upper-level area item of the target area item, and match upward to the provincial level;

6. Street, PlatformL4: if the area level of the current target area item is street level, determine the target area item as the street-level optimal area item; then determine whether the district level in the recording area has a value, and if not, take the upper-level area item of the target area item and match upward to the provincial level.

7. Town: if the area level of the current target area item is township level, determine the target area item as the township-level optimal area item; then determine whether the district/county level in the recording area has a value, and if not, take the upper-level area item of the target area item and match upward to the provincial level.

8. Village: the rule is similar to the township rule. Note that for villages, upward matching proceeds at most to the district level, because the more levels are matched upward, the higher the probability of a matching error; limiting the upward range reduces the recognition error rate.
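Rules 1 to 3 can be sketched as a dispatch on the area level of the target area item. This simplified sketch omits the step b22 comparison and the lower levels; the table contents, the slot names, and the assumption that the "first child node" is the child with the smallest code are all illustrative, not taken from the source.

```python
from typing import Dict, Tuple

# Hypothetical subset: code -> (name, area level, superior code)
DIVISIONS: Dict[str, Tuple[str, str, str]] = {
    "110000": ("北京市", "ProvinceLevelCity", "000000"),
    "110100": ("市辖区", "City", "110000"),
    "440000": ("广东省", "Province", "000000"),
    "440300": ("深圳市", "City", "440000"),
}


def first_child(code: str) -> str:
    # Assumption: the "first" child node is the one with the smallest code.
    return min(c for c, (_, _, parent) in DIVISIONS.items() if parent == code)


def apply_rule(record: Dict[str, str], code: str) -> None:
    """Apply matching rules 1-3 to a recording area modeled as slot -> best area name."""
    name, level, parent = DIVISIONS[code]
    if level == "Province":                       # rule 1: record directly
        record["province"] = name
    elif level == "ProvinceLevelCity":            # rule 2: municipality fills the province slot
        record["province"] = name
        record["city"] = DIVISIONS[first_child(code)][0]
    elif level == "City":                         # rule 3: record the city
        record["city"] = name
        if "province" not in record and parent in DIVISIONS:
            record["province"] = DIVISIONS[parent][0]  # match the superior upward
```

Applying the rule for Beijing City fills both the province and city slots, while Shenzhen City fills the city slot and, if the province is still empty, backfills Guangdong Province.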

Step S20, determining the superior-inferior relation of the optimal area item, and determining whether the optimal area item is correct based on the superior-inferior relation.

In this embodiment, after the optimal area items of each area level are determined, the superior-inferior relationship among them is further determined. Because the optimal area items are updated continuously, it is difficult to guarantee that the optimal area items of adjacent area levels are in a correct superior-inferior relationship, even though upward matching was performed while determining them.

Therefore, after the optimal area items of each area level are determined, their superior-inferior relationship is checked. Specifically, the administrative code of each level's optimal area item and the administrative code of its superior are determined, and the superior administrative code of the current level's optimal area item is compared with the administrative code of the previous (higher) level's optimal area item in the recording area. If the two codes are consistent, the two optimal area items are in a correct superior-inferior relationship; otherwise, their superior-inferior relationship is wrong.
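The consistency check compares each recorded item's superior code with the code recorded one level up. A minimal sketch, assuming the recording area maps a level name to the administrative code of its best item; the table and the function name `wrong_relations` are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical subset: code -> (name, superior code)
DIVISIONS: Dict[str, Tuple[str, str]] = {
    "440000": ("广东省", "000000"),
    "440100": ("广州市", "440000"),
    "440300": ("深圳市", "440000"),
    "440304": ("福田区", "440300"),
}
LEVELS = ["province", "city", "district"]  # higher level first


def wrong_relations(record: Dict[str, str]) -> List[Tuple[str, str]]:
    """`record` maps level -> code of the best area item; returns mismatched level pairs."""
    wrong = []
    for upper, lower in zip(LEVELS, LEVELS[1:]):
        if upper in record and lower in record:
            superior_code = DIVISIONS[record[lower]][1]
            if superior_code != record[upper]:
                wrong.append((upper, lower))
    return wrong
```

Only adjacent levels that are both recorded are compared, so a missing city does not cause a district to be checked against the province.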

For a wrong superior-inferior relationship, correction proceeds upward according to the full-name matching rule: the optimal area item obtained by full-name matching is identified and, taking it as the starting point, the optimal area items at the higher area levels are matched upward and corrected.

Further, when the superior-inferior relationship of the optimal area items is determined, the number of optimal area items in the recording area whose matching type is full-name matching is also counted. If that number is smaller than a preset number, for example 2, the recognition result is likely to be wrong, and in that case the province and city are set to empty.
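The full-name count check can be sketched as follows, assuming the recording area maps a level to a `(name, match type)` pair and that "set to empty" means blank values; the function name and data shape are illustrative.

```python
from typing import Dict, Tuple


def drop_unreliable(
    record: Dict[str, Tuple[str, str]], min_full: int = 2
) -> Dict[str, Tuple[str, str]]:
    """Clear province and city when fewer than `min_full` items are full-name matches."""
    full = sum(1 for _, kind in record.values() if kind == "full")
    if full < min_full:
        record["province"] = ("", "")
        record["city"] = ("", "")
    return record
```

With the example threshold of 2, a record whose only full-name match is the district loses its province and city.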

Step S30, if the optimal area item is correct, outputting the identification result corresponding to the address text based on the optimal area item.

In this embodiment, if the superior-inferior relationship of the optimal area items is determined to be correct, the identification result of the address text is output; specifically, a standardized result is output in a preset format, such as the order province, city, district.
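Producing the standardized output can be as simple as reading the recording area in the preset order. A sketch assuming a `level -> name` dict and the province, city, district order mentioned above; `format_result` is a hypothetical name.

```python
from typing import Dict, Sequence


def format_result(
    record: Dict[str, str], order: Sequence[str] = ("province", "city", "district")
) -> str:
    """Join the best area names in the preset order, skipping levels with no value."""
    return "".join(record.get(level, "") for level in order)
```

Because the order is a parameter, other preset formats only need a different level sequence.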

In this embodiment, if an address text is detected, the optimal area item of the address text in the administrative division tree is identified based on the pre-constructed administrative division tree and the offset pointer corresponding to the address text; the superior-inferior relation of the optimal area item is determined, and whether the optimal area item is correct is determined based on that relation; and if it is correct, the identification result corresponding to the address text is output based on the optimal area item. During identification, the pre-constructed administrative division tree and the offset pointer allow the optimal area item in the address text to be identified correctly, and the optimal area item is then checked so that the identification result is output only when the optimal area item is correct. This raises the rate of correct address identification and achieves intelligent address identification.

Further, based on the first embodiment of the address recognition method of the present invention, a second embodiment of the address recognition method of the present invention is proposed.

The second embodiment of the address recognition method differs from the first embodiment in that, after the step of, if an address text is detected, determining that the target text content pointed to by the offset pointer in the address text is the current primary key and determining whether an entry matching the current primary key exists in the administrative division tree, the address recognition method further includes:

step a15, if no such entry exists, determining whether the jump count of the offset pointer has not exceeded a preset number of times;

step a16, if it has not, controlling the offset pointer to jump by a preset text unit along the text direction of the address text, and cumulatively updating the jump count of the offset pointer;

step a17, updating the current primary key based on the text content pointed to by the offset pointer after the jump, and executing again the step of determining whether an entry matching the current primary key exists in the administrative division tree.

In this embodiment, when the address text contains dirty data that cannot be recognized, the jumps of the offset pointer prevent the dirty data from affecting recognition, so the recognition process as a whole is not interrupted and intelligent address recognition is achieved.

The respective steps will be described in detail below:

Step a15: if no such entry exists, determine whether the number of jumps of the offset pointer has not exceeded the preset number.

In this embodiment, if no entry matching the current primary key exists in the administrative division tree, it is determined whether the number of jumps of the offset pointer has not exceeded a preset number. The preset number may be determined from the historical recognition duration of address texts. If the number of jumps were not limited, the offset pointer would keep jumping through an address text that is entirely incorrect, adding recognition time to no purpose. Therefore, it is checked whether the current number of jumps of the offset pointer has not exceeded the preset number; in a specific implementation, the preset number may be 3.

Step a16: if so, control the offset pointer to jump by one preset text unit in the text direction of the address text, and cumulatively update the number of jumps of the offset pointer.

In this embodiment, if the number of jumps of the offset pointer has not exceeded the preset number, the recognition device controls the offset pointer to jump by one preset text unit in the text direction of the address text. The text direction may be left to right, right to left, top to bottom, or bottom to top; in this embodiment it is preferably left to right. The preset text unit is the displacement by which the offset pointer moves, and in this embodiment it is preferably one character. That is, when the text content currently pointed to does not match any entry, the current primary key is regarded as dirty data: the recognition device moves the offset pointer one character to the right in the address text, skipping the current content, and cumulatively updates the number of jumps of the offset pointer.

Step a17: update the current primary key based on the text content pointed to by the offset pointer after the jump, and execute the step of determining whether an entry matching the current primary key exists in the administrative division tree.

In this embodiment, when no entry matches the text content pointed to by the offset pointer, that content is determined to be dirty data and is skipped. The text content pointed to after the jump becomes the current primary key, and the step of determining whether an entry matching the current primary key exists in the administrative division tree is executed again. This loop continues until either the number of jumps of the offset pointer exceeds the preset number or an entry matching the current primary key is found in the administrative division tree.

For example, suppose the offset pointer points to a stray character rendered as "name" that precedes "Guangzhou" in the address text. Because the administrative division tree contains no entry matching "name", it is determined whether the number of jumps of the offset pointer has exceeded the preset number. If it has not, the offset pointer is controlled to skip "name" and point to "wide" (the character "Guang", the first character of "Guangzhou"), and matching resumes from "wide". The recognition process is therefore not stopped or interrupted by dirty data.
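Steps a15 to a17 can be sketched in Python as a small loop. This is a minimal illustration under stated assumptions: the preset text unit is one character, the text direction is left to right, the administrative division tree is reduced to a hypothetical set of keys, and `skip_dirty` is an illustrative name rather than the patent's implementation.

```python
from typing import Optional, Set


def skip_dirty(text: str, pos: int, entries: Set[str], max_jumps: int = 3) -> Optional[int]:
    """Move the offset pointer past unmatched (dirty) characters.

    Returns the first position whose one-character primary key matches an entry,
    or None once the jump budget is used up or the end of the text is reached.
    """
    jumps = 0
    while pos < len(text):
        primary_key = text[pos]          # preset text unit: one character
        if primary_key in entries:       # a matching entry exists: resume normal matching
            return pos
        if jumps >= max_jumps:           # step a15: no jumps left within the preset number
            return None
        pos += 1                         # step a16: jump one character left to right
        jumps += 1                       #           and accumulate the jump count
        # step a17: the loop re-checks the character now under the pointer
    return None                          # end of text reached without a match


entries = {"广", "广东", "广州"}           # hypothetical keys: "Guang", "Guangdong", "Guangzhou"
print(skip_dirty("广东名广州", 2, entries))  # 3: the stray character at index 2 is skipped
```

Capping the jumps at `max_jumps` (3, matching the value suggested above) bounds the time spent on an address text that is entirely incorrect.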

Further, in an embodiment, the current primary key may be formed from the original target text content together with the text content pointed to by the offset pointer after an offset. If no entry matches such a primary key, it is further determined whether the original target text content, taken as a primary key on its own, corresponds to a unique administrative area name entry. If so, the corresponding target area item is determined from that administrative area name entry; if not, the current primary key is determined to be dirty data and discarded.

For example, consider an address text for Tianhe, Guangzhou City, Guangdong in which a stray character "name" follows "Guangdong". When "Guangdong" is recognized, sub-entries of "Guangdong" exist, so the offset pointer continues to offset and the current primary key becomes "Guangdong" + "name". Because no entry matches this key, it must be determined whether "Guangdong" corresponds to a unique administrative area name entry. "Guangdong" has only one such entry, "Guangdong Province", so "Guangdong Province" can be determined to be the target area item corresponding to "Guangdong".

If instead the stray character "name" follows "Guang" (the first character of "Guangzhou") in an address text for Tianhe District, Guangzhou City, then when "Guang" is recognized, sub-entries of "Guang" exist, so the offset pointer continues to offset and the current primary key becomes "Guang" + "name". No entry matches this key, so it must be determined whether "Guang" corresponds to a unique administrative area name entry. "Guang" corresponds to several such entries, such as Guangdong Province and Guangxi, so it cannot be determined which one is meant; the current primary key is therefore determined to be dirty data and discarded.
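The unique-entry fallback in the two examples above can be expressed as a small Python sketch. The index `NAME_INDEX`, the set `tree_keys`, and the function `resolve_failed_extension` are hypothetical stand-ins for the administrative division tree; the sketch only shows the decision between continuing, accepting a unique administrative area name entry, and discarding dirty data.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical index: primary key -> administrative area name entries it can stand for.
NAME_INDEX: Dict[str, List[str]] = {
    "广东": ["广东省"],                   # "Guangdong" -> Guangdong Province only
    "广": ["广东省", "广西壮族自治区"],     # "Guang" -> Guangdong or Guangxi: ambiguous
}


def resolve_failed_extension(original_key: str, next_unit: str,
                             tree_keys: Set[str]) -> Tuple[str, str]:
    """Decide what to do after extending the primary key by one character.

    Returns ("continue", key), ("target", area_name) or ("dirty", key).
    """
    extended_key = original_key + next_unit
    if extended_key in tree_keys:
        return ("continue", extended_key)    # an entry matches: keep matching
    candidates = NAME_INDEX.get(original_key, [])
    if len(candidates) == 1:
        return ("target", candidates[0])     # unique administrative area name entry
    return ("dirty", extended_key)           # not unique: discard as dirty data


tree_keys = {"广", "广东", "广州", "广东省"}
print(resolve_failed_extension("广东", "名", tree_keys))  # ('target', '广东省')
print(resolve_failed_extension("广", "名", tree_keys))    # ('dirty', '广名')
```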

During recognition of the address text, whenever dirty data that cannot be matched to any entry is encountered, the offset pointer is controlled to skip it. Recognition therefore continues instead of stopping or stalling because of dirty data, and intelligent recognition of the address text is achieved.

Further, based on the first and second embodiments of the address recognition method of the present invention, a third embodiment of the address recognition method of the present invention is proposed.

The third embodiment differs from the first and second embodiments as follows. Before the step of identifying, if an address text is detected, the optimal area item of the address text in the administrative division tree based on the pre-constructed administrative division tree and the offset pointer corresponding to the address text, the address recognition method further includes:

Step c: if a text to be recognized is detected, recognizing non-administrative region information in the text to be recognized based on a preset rule;

Step d: separating the non-administrative region information from the text to be recognized to obtain the address text;

and the step of outputting, if the optimal area item is correct, the recognition result corresponding to the address text based on the optimal area item includes:

Step e: if the optimal area item is correct, outputting the recognition result corresponding to the address text based on the optimal area item and the non-administrative region information.

In this embodiment, non-administrative region information (for example, "1688") is separated out before the address text is recognized, so that it does not interfere with recognition and the recognition process runs more smoothly. When the recognition result is finally output, the non-administrative region information is output together with the optimal area item of each area level, which improves recognition accuracy.

The respective steps will be described in detail below:

Step c: if a text to be recognized is detected, recognize the non-administrative region information in the text to be recognized based on a preset rule.

In this embodiment, if the recognition device detects a text to be recognized, it distinguishes the non-administrative region information in that text using a preset rule, which comprises a filtering rule and a matching rule. First, special characters such as "-", "@", and "&" are removed from the text according to the filtering rule. Then the non-administrative region information in the text is matched according to the matching rule; for example, a first regular expression may be constructed to match building numbers, a second regular expression to match road information, and so on.

Step d: separate the non-administrative region information from the text to be recognized to obtain the address text.

In this embodiment, once the non-administrative region information has been obtained, it is separated from the text to be recognized. What remains is an address text containing only administrative region content, which makes the subsequent recognition of the optimal area item easier.
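Steps c and d can be sketched with Python's `re` module. The character set, the hyphen handling, and both regular expressions are assumptions for illustration only. In particular, because the building number in the example below keeps its hyphens, this sketch collapses runs such as "- -" into one "-" instead of deleting "-" outright. Only the building number and a parenthesised note are handled; a road pattern would be added the same way. Stripping meaningless prefixes such as "123435asdfasg;;;" is not covered.

```python
import re
from typing import Dict, Tuple

# Assumed filtering rule: drop "@" and "&", and collapse hyphen runs such as "- -"
# into a single "-" so that building numbers keep their separators.
DROP_CHARS = re.compile(r"[@&]")
HYPHEN_RUNS = re.compile(r"-(?:\s*-)+")

# Assumed matching rules: one regular expression per kind of non-administrative information.
BUILDING_NUM = re.compile(r"[\[【]?(\d+(?:-\d+)+)[\]】]?")   # e.g. "[18-3-502]"
NOTE = re.compile(r"[((]([^()()]*)[))]")                 # e.g. "(near Changrong Town)"


def separate_non_admin(text: str) -> Tuple[str, Dict[str, str]]:
    """Return the address text plus the separated non-administrative fields."""
    cleaned = HYPHEN_RUNS.sub("-", DROP_CHARS.sub("", text))   # step c: filtering rule
    info: Dict[str, str] = {}
    for name, pattern in (("buildingNum", BUILDING_NUM), ("text", NOTE)):
        match = pattern.search(cleaned)                        # step c: matching rule
        if match:
            info[name] = match.group(1)
            cleaned = cleaned[:match.start()] + cleaned[match.end():]  # step d: separate
    return cleaned.strip(), info


address_text, info = separate_non_admin(
    "[18- -3-502]Jiangsu Tai Xinghua City Changrong Town (near Changrong Town)")
print(address_text)  # Jiangsu Tai Xinghua City Changrong Town
print(info)          # {'buildingNum': '18-3-502', 'text': 'near Changrong Town'}
```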

Further, step S30 includes:

Step e: if the optimal area item is correct, output the recognition result corresponding to the address text based on the optimal area item and the non-administrative region information.

In this embodiment, because the non-administrative region information is also part of the address information, it is output as part of the final recognition result. In other words, the non-administrative information is separated first so that the recognition device can recognize the address text more easily, and after the correct optimal area item has been identified, the non-administrative information is combined with it to output a complete recognition result.

Examples are as follows:

Input text to be recognized: "123435asdfasg;;; [18- -3-502] Jiangsu Tai Xinghua City Changrong Town (near Changrong Town)";

Recognition result:

Address{
  provinceId=320000000000, province='Jiangsu Province',
  cityId=321200000000, city='Taizhou City',
  districtId=321281000000, district='Xinghua City',
  streetId=321281119000, street='Changrong Town',
  townId=321281119000, town='Changrong Town',
  villageId=null, village='null',
  road='null',
  roadNum='null',
  buildingNum='18-3-502',
  text='near Changrong Town'
}

That is, meaningless characters such as "123435asdfasg;;;" are removed; the non-administrative region information, namely the building number "18-3-502" and the note "near Changrong Town", is separated out; the optimal area items for Changrong Town, Xinghua City, Taizhou, Jiangsu are identified; and finally the optimal area item of each area level is combined with the non-administrative region information and output, giving a complete and correct recognition result.

In this embodiment, the non-administrative region information is first separated from the text to be recognized so that the optimal area item is easier to identify, and the two are combined again when the recognition result is output. A complete and correct recognition result is thus output, the correct-recognition rate of address recognition is improved, and intelligent address recognition is achieved.
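The merge performed in step e can be sketched in Python with a dataclass whose field names mirror the example recognition result above. The `Address` class and `build_result` function are hypothetical; the sketch assumes the optimal area items arrive as a mapping from level name to a `(code, name)` pair and rejects any field the record does not define.

```python
from dataclasses import dataclass, fields
from typing import Dict, Optional, Tuple


@dataclass
class Address:
    """Field names mirror the example recognition result in this description."""
    provinceId: Optional[str] = None
    province: Optional[str] = None
    cityId: Optional[str] = None
    city: Optional[str] = None
    districtId: Optional[str] = None
    district: Optional[str] = None
    streetId: Optional[str] = None
    street: Optional[str] = None
    townId: Optional[str] = None
    town: Optional[str] = None
    villageId: Optional[str] = None
    village: Optional[str] = None
    road: Optional[str] = None
    roadNum: Optional[str] = None
    buildingNum: Optional[str] = None
    text: Optional[str] = None


def build_result(best_items: Dict[str, Tuple[str, str]], non_admin: Dict[str, str]) -> Address:
    """Combine the optimal area item of each level with the non-administrative information."""
    known = {f.name for f in fields(Address)}
    values: Dict[str, str] = {}
    for level, (code, name) in best_items.items():
        values[f"{level}Id"] = code
        values[level] = name
    values.update(non_admin)
    unknown = set(values) - known
    if unknown:
        raise ValueError(f"unexpected fields: {sorted(unknown)}")
    return Address(**values)


result = build_result({"province": ("320000000000", "Jiangsu Province"),
                       "city": ("321200000000", "Taizhou City")},
                      {"buildingNum": "18-3-502", "text": "near Changrong Town"})
print(result.city, result.buildingNum)  # Taizhou City 18-3-502
```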

The invention also provides an address identification device. The address recognition device of the present invention includes:

an identification module, configured to identify, if an address text is detected, the optimal area item of the address text in the administrative division tree based on a pre-constructed administrative division tree and an offset pointer corresponding to the address text;

a determining module, configured to determine the superior-inferior relation of the optimal area item and to determine, based on the superior-inferior relation, whether the optimal area item is correct;

and an output module, configured to output, if the optimal area item is correct, the recognition result corresponding to the address text based on the optimal area item.

Preferably, the identification module is further configured to:

if the address text is detected, determining a target entry corresponding to the address text based on the offset pointer, and determining a target area item matched with the address text in the administrative division tree based on the target entry;

and determining the optimal area item of the address text in the administrative division tree based on the target area item and the area level of the target area item.

Preferably, the target entry at least includes a first target entry and a second target entry, the target area item at least includes a first target area item and a second target area item, and the identification module is further configured to:

if the address text is detected, determining a first target entry corresponding to the address text based on the offset pointer, and determining a first target area item matched with the first target entry in the administrative division tree;

and controlling the offset pointer to offset by one preset text unit, determining a second target entry corresponding to the address text based on the offset pointer after the offset, and determining, among the child nodes of the first target area item in the administrative division tree, a second target area item matching the second target entry.
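Restricting the second match to the child nodes of the first target area item can be sketched as follows. The `DivisionNode` class, the function `match_in_children`, and the exact-name matching rule are assumptions; the toy tree reuses the codes from the example recognition result with English names.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class DivisionNode:
    code: str
    name: str
    children: List["DivisionNode"] = field(default_factory=list)


def match_in_children(parent: DivisionNode, entry: str) -> Optional[DivisionNode]:
    """Look for the second target area item only among the first item's child nodes."""
    for child in parent.children:
        if child.name == entry:      # assumed rule: exact match on the area name
            return child
    return None


# Toy tree built from the codes in the example recognition result.
xinghua = DivisionNode("321281000000", "Xinghua City")
taizhou = DivisionNode("321200000000", "Taizhou City", [xinghua])
jiangsu = DivisionNode("320000000000", "Jiangsu Province", [taizhou])

print(match_in_children(jiangsu, "Taizhou City").code)  # 321200000000
print(match_in_children(jiangsu, "Xinghua City"))       # None: a grandchild, not a child
```

Searching only the children keeps the second match consistent with the first by construction, which is the property the later superior-inferior check relies on.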

Preferably, the identification module is further configured to:

if the address text is detected, determining that the target text content pointed to by the offset pointer in the address text is the current primary key, and determining whether an entry matching the current primary key exists in the administrative division tree;

if the entry exists, determining whether the entry has a sub-entry;

if a sub-entry exists, controlling the offset pointer to offset by one preset text unit in the text direction of the address text, updating the current primary key based on the target text content and the text content pointed to by the offset pointer after the offset, and continuing to execute the step of determining whether an entry matching the current primary key exists in the administrative division tree;

and if no sub-entry exists, determining whether the entry is an administrative area name entry, and if so, determining that the entry is the first target entry.
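The primary-key extension described above can be sketched in Python. The entry table `ENTRIES` is hypothetical: each key maps to `True` if it is an administrative area name entry and `False` if it is only a prefix of longer entries, and "has a sub-entry" is read as "some longer key extends it". When an extension matches no entry, the sketch returns `None` and leaves the jump and unique-entry fallback to the separate logic described in the second embodiment.

```python
from typing import Dict, Optional

# Hypothetical entry table: key -> True if it is an administrative area name entry.
ENTRIES: Dict[str, bool] = {
    "泰": False, "泰州": True, "泰州市": True,   # Tai / Taizhou / Taizhou City
    "兴": False, "兴化": True, "兴化市": True,   # Xing / Xinghua / Xinghua City
}


def has_sub_entry(key: str, entries: Dict[str, bool]) -> bool:
    """An entry has sub-entries if some longer entry extends it."""
    return any(other != key and other.startswith(key) for other in entries)


def first_target_entry(text: str, pos: int, entries: Dict[str, bool]) -> Optional[str]:
    """Grow the primary key one character at a time while the matched entry has sub-entries."""
    end = pos + 1
    while end <= len(text):
        key = text[pos:end]                    # current primary key
        if key not in entries:
            return None                        # no entry: jump / fallback handling applies
        if has_sub_entry(key, entries) and end < len(text):
            end += 1                           # offset the pointer by one character
            continue
        return key if entries[key] else None   # keep the entry only if it names an area
    return None


print(first_target_entry("泰州市兴化市", 0, ENTRIES))  # 泰州市
print(first_target_entry("泰州市兴化市", 3, ENTRIES))  # 兴化市
```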

Preferably, the identification module is further configured to:

if the entry does not exist, determining whether the number of jumps of the offset pointer has not exceeded the preset number;

if so, controlling the offset pointer to jump by one preset text unit in the text direction of the address text, and cumulatively updating the number of jumps of the offset pointer;

updating the current primary key based on the text content pointed to by the offset pointer after the jump, and executing the step of determining whether an entry matching the current primary key exists in the administrative division tree.

Preferably, the identification module is further configured to:

determining the region level of the target region item and determining the matching type of the target region item based on the region code of the target region item;

and determining the optimal area item of the address text in the administrative division tree based on the target area item, the area level and the matching type.
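Deriving the area level from an area code can be sketched as follows, assuming the same 12-digit layout as the example recognition result, where trailing zeros mark levels that are not used. The matching type is not defined in this section, so the sketch derives only the level and the superior code used for upward matching. The names `region_level` and `superior_code` are hypothetical.

```python
from typing import Optional

# Assumed 12-digit layout: 2 province + 2 city + 2 county/district + 3 town/street + 3 village.
LEVELS = ["province", "city", "district", "street", "village"]
SEGMENT_END = {"province": 2, "city": 4, "district": 6, "street": 9, "village": 12}


def region_level(code: str) -> str:
    """Infer the area level: the first level after whose segment only zeros remain."""
    if len(code) != 12 or not code.isdigit():
        raise ValueError(f"expected a 12-digit code, got {code!r}")
    for level in LEVELS[:-1]:
        if code[SEGMENT_END[level]:].strip("0") == "":
            return level
    return "village"


def superior_code(code: str) -> Optional[str]:
    """Code of the upper-level area item; None for a province."""
    index = LEVELS.index(region_level(code))
    if index == 0:
        return None
    keep = SEGMENT_END[LEVELS[index - 1]]
    return code[:keep].ljust(12, "0")


print(region_level("321281119000"), superior_code("321281119000"))  # street 321281000000
```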

Preferably, the identification module is further configured to:

determining whether the optimal area item corresponding to the area level is recorded in a recording area corresponding to the address text;

if so, updating the optimal area item corresponding to the area level based on the matching type;

if the area level is below the provincial level and the superior area item of the target area item is not recorded in the recording area, matching the superior area item of the target area item upwards in the administrative division tree based on the superior code of the target area item;

and updating and recording the optimal area item of each area level in the recording area based on a preset matching rule, the target area item and the superior area item.

Preferably, the address recognition apparatus further includes a separation module, and the separation module is configured to:

if the text to be recognized is detected, recognizing non-administrative region information in the text to be recognized based on a preset rule;

and separating out the non-administrative region information in the text to be recognized to obtain the address text.

The output module is further configured to:

and if the optimal area item is correct, outputting the recognition result corresponding to the address text based on the optimal area item and the non-administrative region information.

The invention also provides a computer readable storage medium.

The computer-readable storage medium of the present invention has stored thereon an address recognition program which, when executed by a processor, implements the steps of the address recognition method as described above.

For the method implemented when the address recognition program is executed on the processor, reference may be made to the embodiments of the address recognition method of the present invention; details are not repeated here.

It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or system that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or system. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other like elements in a process, method, article, or system that comprises the element.

The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments.

Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation manner. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium (e.g., ROM/RAM, magnetic disk, optical disk) as described above and includes instructions for enabling a terminal device (e.g., a mobile phone, a computer, a server, an air conditioner, or a network device) to execute the method according to the embodiments of the present invention.

The above description is only a preferred embodiment of the present invention, and not intended to limit the scope of the present invention, and all modifications of equivalent structures and equivalent processes, which are made by using the contents of the present specification and the accompanying drawings, or directly or indirectly applied to other related technical fields, are included in the scope of the present invention.
