Address text processing method, device and equipment

Document No.: 699311 Publication date: 2021-05-04

Reading note: This technology, Address text processing method, device and equipment, was designed and created by 刘楚, 郑华飞, 谢朋峻, 李林琳 and 司罗 on 2019-10-30. Main content: The embodiments of the invention provide an address text processing method, device and equipment. The address text processing device comprises an encoder, a decoder, an indicator and a generator, wherein the encoder is communicatively connected to the decoder and the indicator respectively, and the decoder, the indicator and the generator are communicatively connected to one another; the encoder is used for encoding each address word in an address number element; the decoder is used for decoding the encoded data output by the encoder or the standard word output by the generator; the indicator is used for adjusting the weight of each address word in the address number element according to the encoded data and the decoded data output by the decoder; the generator is used for generating a standard word corresponding to the address word at least according to the decoded data output by the decoder and the weight adjusted by the indicator, so that standard address number data corresponding to the address number element is generated from the standard words. The device achieves a better processing effect.

1. An address text processing device, characterized by comprising an encoder, a decoder, an indicator and a generator, wherein the encoder is communicatively connected to the decoder and the indicator respectively, and the decoder, the indicator and the generator are communicatively connected to one another;

the encoder is used for encoding each address word in an address number element;

the decoder is used for decoding the encoded data output by the encoder after encoding or the standard word output by the generator;

the indicator is used for adjusting the weight of each address word in the address number element according to the encoded data and the decoded data output by the decoder after decoding;

the generator is used for generating a standard word corresponding to the address word at least according to the decoded data output by the decoder and the weight adjusted by the indicator, so as to generate standard address number data corresponding to the address number element according to the standard word.

2. The device of claim 1, further comprising a text processor communicatively connected to the encoder for obtaining the address number elements in the address text to be processed.

3. The device of claim 1, wherein the encoder is a bidirectional long short-term memory (LSTM) network layer, or the decoder is a unidirectional long short-term memory network layer.

4. The device of claim 1, wherein the indicator is configured to perform a weighted average operation on the decoded data and the encoded data, determine the weight of each address word according to the result of the weighted average operation, and obtain a context encoding vector of the address number element according to the weights.

5. The device of claim 4, wherein

the generator is used for performing a weighted summation operation on the decoded data and the context encoding vector, and obtaining a dictionary distribution probability of the address number element according to the result of the weighted summation operation; performing a weighted summation operation on the context encoding vector and the dictionary distribution probability, and obtaining a generation probability according to the result of the weighted summation operation; and generating a standard word corresponding to the current address word according to the generation probability, the context encoding vector and the dictionary distribution probability.

6. The device of claim 1, wherein the address text processing device receives an address text to be searched containing numbers, normalizes the address text to be searched through the encoder, the decoder, the indicator and the generator, and generates an address text containing standard address number data according to the result of the normalization.

7. The device of claim 1, wherein the address text processing device obtains an address text to be processed from an electronic business card, normalizes the address text to be processed through the encoder, the decoder, the indicator and the generator, and generates an address text containing standard address number data according to the result of the normalization.

8. An address text processing method, comprising:

acquiring an address text to be processed containing address number elements, and inputting the address text to be processed into the address text processing device of any one of claims 1-7;

and acquiring the standard address number data that is output by the address text processing device and corresponds to the address text to be processed.

9. The method of claim 8, wherein before acquiring the address text to be processed containing address number elements, the method further comprises:

training the address text processing device on standard address numbers.

10. The method of claim 9, wherein training the address text processing device on standard address numbers comprises:

acquiring address number elements in an address text for training, and generating address number samples according to the address number elements;

and training the address text processing device with the address number samples as training samples and conformity to the address number output standard as the training target, so that the trained address text processing device can be used to obtain standard address number data corresponding to an input address number element.

11. The method of claim 10, wherein training the address text processing device on standard address numbers comprises:

inputting the address number sample into the address text processing device, and generating standard address number data corresponding to the address number sample through the encoder, the decoder, the indicator and the generator;

determining the difference between the standard address number data and the address number output standard, and adjusting the training parameters of the address text processing device according to the difference;

and continuing to train the address text processing device with the adjusted training parameters until a training termination condition is met.

12. The method of claim 11, wherein training the address text processing device on standard address numbers comprises:

acquiring the encoded data output by the encoder after encoding the address number sample;

inputting the encoded data, or the previous standard word output by the address text processing device, into the decoder to obtain decoded data;

performing, by the indicator, a weighted average operation on the decoded data and the encoded data, determining the weight of each address word according to the result of the weighted average operation, and obtaining a context encoding vector of the address number sample according to the weights;

performing, by the generator, a weighted summation operation on the decoded data and the context encoding vector, and obtaining a dictionary distribution probability of the address number sample according to the result of the weighted summation operation;

performing, by the generator, a weighted summation operation on the context encoding vector and the dictionary distribution probability, and obtaining a generation probability according to the result of the weighted summation operation;

generating, by the generator, a standard word corresponding to the current address word according to the generation probability, the context encoding vector and the dictionary distribution probability;

updating the current address word, and returning to the step of inputting into the decoder the standard word corresponding to the address word preceding the current address word, continuing until a corresponding standard word has been generated for every address word;

and generating standard address number data corresponding to the address number sample according to the standard words corresponding to all the address words.

13. The method of claim 10, wherein training the address text processing device on standard address numbers comprises:

acquiring the address elements in the address text for training according to address element labeling information in the address text for training;

segmenting address number elements from the address elements;

and generating the address number sample according to the address number elements.

14. The method according to claim 8, wherein the address text to be processed containing address number elements is an address text to be searched containing numbers, and the obtaining the address text to be processed containing address number elements and inputting the address text to be processed into the address text processing device comprises:

inputting the address text to be searched containing numbers into the address text processing device, normalizing the address text to be searched through the encoder, the decoder, the indicator and the generator, and outputting standard address number data corresponding to the address text to be searched as the normalization result;

the method further comprises the following steps:

and generating an address text containing standard address number data according to the normalization processing result.

15. The method according to claim 8, wherein the address text to be processed containing address number elements is an address text to be processed in an electronic business card, and the obtaining the address text to be processed containing address number elements and inputting the address text to be processed into the address text processing device comprises:

inputting the address text to be processed in the electronic business card into the address text processing device, normalizing the address text to be processed through the encoder, the decoder, the indicator and the generator, and outputting standard address number data corresponding to the address text to be processed in the electronic business card as the normalization result;

the method further comprises the following steps:

and generating an address text containing standard address number data according to the normalization processing result.

16. An address text processing apparatus, characterized by comprising:

a first obtaining module, configured to obtain an address text to be processed containing address number elements, and input the address text to be processed into the address text processing device according to any one of claims 1 to 7;

and an output module, configured to acquire the standard address number data that is output by the address text processing device and corresponds to the address text to be processed.

17. A computer storage medium having stored thereon a computer program which, when executed by a processor, implements the address text processing method according to any one of claims 8 to 15.

Technical Field

The embodiment of the invention relates to the technical field of computers, in particular to an address text processing method, device and equipment.

Background

In the prior art, address texts are used in fields such as address navigation and express mail delivery, but the address texts used in daily life are often neither standard nor uniform, and the same address can be expressed in multiple ways. For example, Chinese numerals and Arabic numerals are mixed, and expressions are not standardized. A single location may be written as "room 3-506 in a cell 10", "three units 506 in a cell 10", or "cells 10-3-506". These problems make address text parsing and standardization difficult.

Existing address text standardization generally relies on manual inspection of the data: based on what is observed, regular-expression templates are written to convert address texts in different expression forms into a standard form. This approach is labor-intensive, covers only a limited number of expression forms, and cannot reliably convert every expression form into the standardized one.
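The limitation of template-based conversion can be illustrated with a minimal sketch. The two templates, the English surface forms and the output format below are hypothetical stand-ins chosen for illustration; they are not taken from any existing system.

```python
import re

# Hypothetical hand-written templates; each one covers exactly one surface form.
TEMPLATES = [
    # "10-3-506" -> building 10, unit 3, room 506
    (re.compile(r"^(\d+)-(\d+)-(\d+)$"), "Building {0}, Unit {1}, Room {2}"),
    # "Building 10 Room 3-506" -> building 10, unit 3, room 506
    (re.compile(r"^Building (\d+) Room (\d+)-(\d+)$"), "Building {0}, Unit {1}, Room {2}"),
]


def normalize_with_templates(text):
    """Return the standardized form, or None if no template matches."""
    for pattern, output_format in TEMPLATES:
        match = pattern.match(text.strip())
        if match:
            return output_format.format(*match.groups())
    return None


print(normalize_with_templates("10-3-506"))
print(normalize_with_templates("Building 10 Unit three Room 506"))
```

The first call is covered by a template, while the second returns None because nobody wrote a template for that form. Every newly observed expression form requires another hand-written rule.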

Disclosure of Invention

In view of the above, embodiments of the present invention provide an address text processing scheme to solve some or all of the above problems.

According to a first aspect of the embodiments of the present invention, there is provided an address text processing device, comprising an encoder, a decoder, an indicator and a generator, wherein the encoder is communicatively connected to the decoder and the indicator respectively, and the decoder, the indicator and the generator are communicatively connected to one another; the encoder is used for encoding each address word in an address number element; the decoder is used for decoding the encoded data output by the encoder after encoding or the standard word output by the generator; the indicator is used for adjusting the weight of each address word in the address number element according to the encoded data and the decoded data output by the decoder after decoding; the generator is used for generating a standard word corresponding to the address word at least according to the decoded data output by the decoder and the weight adjusted by the indicator, so as to generate standard address number data corresponding to the address number element according to the standard word.

According to a second aspect of the embodiments of the present invention, there is provided an address text processing method, including: acquiring an address text to be processed containing address number elements, and inputting the address text to be processed into the address text processing device; and acquiring the standard address number data that is output by the address text processing device and corresponds to the address text to be processed.

According to the address text processing scheme provided by the embodiments of the invention, input address number elements can be converted automatically through the encoder, the decoder, the indicator and the generator, so that the address number elements are accurately converted into standard address number data that meets the address number output standard. Address text standardization is thereby achieved, labor is saved, and a wide variety of expression forms can be covered.

Drawings

In order to illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings used in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show only some of the embodiments of the present invention, and a person skilled in the art can obtain other drawings based on them.

Fig. 1 is a schematic structural diagram of an address text processing device according to a first embodiment of the present invention;

Fig. 2 is a schematic structural diagram of an address text processing device according to a second embodiment of the present invention;

Fig. 3 is a flowchart of the steps of an address text processing method according to a fourth embodiment of the present invention;

Fig. 4 is a flowchart of the steps of an address text processing method according to a fifth embodiment of the present invention;

Fig. 5 is a block diagram of an address text processing device that uses an indicator-generator network model according to the fifth embodiment of the present invention;

Fig. 6 is a flowchart of the steps of an address text processing method according to a sixth embodiment of the present invention;

Fig. 7 is a block diagram of an address text processing device according to a seventh embodiment of the present invention;

Fig. 8 is a block diagram of an address text processing device according to an eighth embodiment of the present invention;

Fig. 9 is a schematic structural diagram of an electronic device according to a ninth embodiment of the present invention.

Detailed Description

To help those skilled in the art better understand the technical solutions in the embodiments of the present invention, these technical solutions are described clearly and completely below with reference to the drawings in the embodiments. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person skilled in the art based on the embodiments of the present invention shall fall within the scope of protection of the embodiments of the present invention.

The following further describes specific implementation of the embodiments of the present invention with reference to the drawings.

Example one

Referring to fig. 1, a schematic structural diagram of an address text processing apparatus according to a first embodiment of the present invention is shown.

The address text processing device of this embodiment comprises an encoder 101, a decoder 103, an indicator 105 and a generator 107, wherein the encoder 101 is communicatively connected to the decoder 103 and the indicator 105 respectively, and the decoder 103, the indicator 105 and the generator 107 are communicatively connected to one another; the encoder 101 is configured to encode each address word in an address number element; the decoder 103 is configured to decode the encoded data output by the encoder 101 after encoding or the standard word output by the generator 107; the indicator 105 is configured to adjust the weight of each address word in the address number element according to the encoded data and the decoded data output by the decoder 103 after decoding; the generator 107 is configured to generate a standard word corresponding to the address word at least according to the decoded data output by the decoder 103 and the weight adjusted by the indicator 105, so as to generate standard address number data corresponding to the address number element according to the standard word.

In this embodiment, the address text processing device is configured to convert the address number elements in an address text into standard address number data, thereby standardizing the address text for subsequent use.

The address number element can be understood as an element containing numbers in the address text, such as a road number (e.g., 969, 212-214, etc.), a floor number (e.g., 7 units, three units, 12 floors, minus one floor, etc.), a building number (e.g., 1 floor, 10 floors, B seats, etc.), a room number (e.g., 910 rooms, 1002 households, etc.), and the like.

The address words in an address number element can be defined and divided by those skilled in the art according to actual requirements. For example, No. 969 may be divided into the address words "9", "6", "9" and "No.". Alternatively, it may be divided into the address words "969" and "No.". The embodiments of the invention do not limit the rules for defining and dividing the address words in an address number element.
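The two division rules mentioned above can be sketched as follows. The function names are hypothetical, and the sample string "969号" is an assumed original-language form of "No. 969" used only for illustration.

```python
import re


def split_by_character(element):
    """Treat every character as one address word."""
    return list(element)


def split_by_digit_run(element):
    """Keep each run of ASCII digits together; every other character stands alone."""
    return re.findall(r"[0-9]+|[^0-9]", element)


print(split_by_character("969号"))   # ['9', '6', '9', '号']
print(split_by_digit_run("969号"))   # ['969', '号']
```

Character-level division keeps the vocabulary small, while digit-run division yields shorter sequences; either is compatible with the description, which leaves the rule open.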

The encoder 101 is configured to encode the address words in the address number element and output encoded data. The encoding can be implemented by those skilled in the art in any suitable way as required. For example, a neural network may be used as the encoder 101; it may be a unidirectional long short-term memory (LSTM) network, a bidirectional LSTM network, a multi-layer LSTM network, a gated recurrent unit (GRU) network, or the like, as long as the input address number element can be encoded.

The encoded data output by the encoder 101 is transmitted to the decoder 103 and the indicator 105, respectively.

While one address text is being processed, a corresponding standard word is output for each address word in the address number element. The decoder 103 performs one decoding operation for each address word that is converted.

While one address text is being processed, if the decoding operation performed by the decoder 103 is the first one, the decoder 103 receives the encoded data output by the encoder 101; otherwise, the decoder 103 receives the standard word output by the generator 107 (i.e., the standard word corresponding to the address word preceding the current address word). The decoder 103 decodes the received encoded data or standard word and outputs decoded data. Decoding may be implemented by those skilled in the art in any suitable way. For example, a neural network may be used as the decoder; it may be a unidirectional long short-term memory (LSTM) network, a bidirectional LSTM network, a multi-layer LSTM network, a gated recurrent unit (GRU) network, or the like.
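The choice of decoder input at each step can be sketched as a simple loop. This is a minimal illustration, not the patented implementation: `decode_step` and `generate_word` are hypothetical callables standing in for the decoder 103 and the generator 107, and the recurrent state that an LSTM or GRU decoder would carry between steps is omitted for brevity.

```python
def run_decoding(encoded_data, decode_step, generate_word, num_words):
    """Produce one standard word per address word.

    decode_step and generate_word are hypothetical stand-ins for the decoder
    and the generator; any callables with these shapes will do.
    """
    standard_words = []
    for step in range(num_words):
        if step == 0:
            # First decoding: the decoder receives the encoder output.
            decoder_input = encoded_data
        else:
            # Later decodings: the decoder receives the previous standard word.
            decoder_input = standard_words[-1]
        decoded_data = decode_step(decoder_input)
        standard_words.append(generate_word(decoded_data))
    return standard_words


# Toy usage: record what the decoder sees at each step.
seen_inputs = []

def toy_decode_step(decoder_input):
    seen_inputs.append(decoder_input)
    return decoder_input

def toy_generate_word(decoded_data):
    return "w%d" % len(seen_inputs)

words = run_decoding("ENC", toy_decode_step, toy_generate_word, 3)
print(words)        # ['w1', 'w2', 'w3']
print(seen_inputs)  # ['ENC', 'w1', 'w2']
```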

The decoded data output by the decoder 103 is transmitted to the indicator 105 and the generator 107, respectively.

While one address text is being processed, for each address word the indicator 105 adjusts the weight of each address word in the address number element according to the encoded data and the decoded data, so that the output standard word corresponds to the current address word. The weight adjustment may be implemented by a person skilled in the art in any suitable way; for example, the indicator may be implemented as a pointer network.

The weights adjusted by the indicator 105 are transmitted to the generator 107. The generator 107 is configured to generate a standard word corresponding to the current address word at least according to the decoded data and the weights adjusted by the indicator 105, so that standard address number data corresponding to the address number element is generated from the standard words.

While one address text is being processed, after each standard word is generated the indicator 105 selects a new address word of interest in the address number element, thereby updating the current address word.

According to this embodiment, input address number elements can be converted automatically through the encoder 101, the decoder 103, the indicator 105 and the generator 107, so that the address number elements are accurately converted into standard address number data that meets the address number output standard. Address text standardization is thereby achieved, labor is saved, and a wide variety of expression forms can be covered.

In addition, while one address text is being processed, the indicator 105 can adjust the weight of each address word each time according to the encoded data and the decoded data output by the decoder 103. The content of interest can therefore be selected from the address number element each time a standard word is output, so that each standard word to be output has a different degree of correlation with each address word in the address number element. This reduces the influence of irrelevant address words on the standard word and improves conversion accuracy. It also avoids a problem of existing address translation models, which simply compress the address text into a single vector and then search a preset dictionary with that vector to generate the output text: such models tend to be unstable and may generate numbers that do not appear in the address text.

Example two

Referring to fig. 2, a schematic structural diagram of an address text processing apparatus according to a second embodiment of the present invention is shown.

The address text processing apparatus of the present embodiment further optimizes the address text processing apparatus of the first embodiment.

As described in the first embodiment, the address text processing apparatus includes an encoder 101, a decoder 103, an indicator 105, and a generator 107.

In this embodiment, the apparatus further includes a text processor 109, where the text processor 109 is connected in communication with the encoder 101, and is configured to obtain address number elements in the address text to be processed.

In one case, the address text to be processed may be obtained from an address library containing a large number of address texts. When address texts are obtained from the address library, a portion of them can be sampled according to the different expression forms they use and labeled with address elements, yielding address element labeling information.

The address element labeling information indicates the type of each address element, such as province (prov), city (city), district (district), road (road), road number (roadno), detailed location (poi), building number (houseno), floor number (floor), room number (roomno), and the like. All address elements are then obtained according to the address element labeling information.

A person skilled in the art can obtain the address element labeling information in any way required; for example, the address elements in the address text may be labeled manually, labeled by regular-expression rules, or labeled by a trained neural network model.

In another case, the address text to be processed may be an address text input by a user. After the address text to be processed is obtained, it can likewise be labeled manually, by regular-expression rules, or by a trained neural network model, so as to obtain the address element labeling information.

The text processor 109 is arranged to segment address number elements from the address elements.

In one feasible approach, after all address elements are obtained, they are clustered so that address elements with the same address element labeling information fall into one class, and the address number elements containing numbers, such as road number address elements, floor number address elements and room number address elements, are segmented from the clustering result. The approach is not limited to this; other ways of segmenting address number elements, such as keyword extraction, are also applicable.
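The clustering and segmentation step can be sketched as grouping labeled elements by their label and keeping the number-bearing groups. The label names follow the labeling scheme described earlier; treating exactly these four labels as number-bearing, and the sample texts, are assumptions made for illustration.

```python
from collections import defaultdict

# Labels follow the scheme described above; which of them count as
# number-bearing is an assumption for this sketch.
NUMBER_LABELS = {"roadno", "houseno", "floor", "roomno"}


def group_by_label(tagged_elements):
    """Cluster (text, label) pairs so that equal labels end up together."""
    clusters = defaultdict(list)
    for text, label in tagged_elements:
        clusters[label].append(text)
    return dict(clusters)


def select_number_elements(clusters):
    """Keep only the clusters whose label marks an address number element."""
    return {label: texts for label, texts in clusters.items() if label in NUMBER_LABELS}


tagged = [
    ("Some Road", "road"),
    ("No. 969", "roadno"),
    ("Room 910", "roomno"),
    ("No. 212", "roadno"),
]
number_elements = select_number_elements(group_by_label(tagged))
print(number_elements)  # {'roadno': ['No. 969', 'No. 212'], 'roomno': ['Room 910']}
```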

In addition, the text processor 109 can conveniently and accurately obtain the address number elements from the address text to be processed, thereby improving the usability.

Example three

The address text processing apparatus of the present embodiment further optimizes the address text processing apparatus of the first embodiment.

As described in the first embodiment, the address text processing apparatus includes an encoder 101, a decoder 103, an indicator 105, and a generator 107. The address text processing means may also include the text processor 109 or may not include the text processor 109.

In this embodiment, the indicator 105 is configured to perform a weighted average operation on the decoded data and the encoded data, determine the weight of each address word according to the result of the weighted average operation, and obtain a context encoding vector of the address number element according to the weights.
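One common way to realize such an indicator is attention: score each encoded address word against the current decoded vector, normalize the scores into weights, and take the weighted sum of the encoded vectors as the context encoding vector. The sketch below assumes dot-product scoring and softmax normalization, neither of which is specified in the text; the function names are hypothetical.

```python
import math


def softmax(scores):
    """Normalize scores into weights that sum to 1 (numerically stable)."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def indicator(encoded, decoded):
    """Return (weights, context) for one decoding step.

    encoded: one vector per address word; decoded: one vector of the same size.
    Dot-product scoring is an assumption made for this sketch.
    """
    scores = [sum(h * d for h, d in zip(vector, decoded)) for vector in encoded]
    weights = softmax(scores)
    size = len(decoded)
    context = [
        sum(w * vector[i] for w, vector in zip(weights, encoded))
        for i in range(size)
    ]
    return weights, context


encoded = [[1.0, 0.0], [0.0, 1.0]]   # two address words
decoded = [2.0, 0.0]                 # current decoder output
weights, context = indicator(encoded, decoded)
print(weights, context)
```

In this toy input the first address word aligns with the decoded vector, so it receives the larger weight and dominates the context encoding vector.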

In this case, the generator 107 is configured to perform a weighted summation operation on the decoded data and the context encoding vector, and obtain a dictionary distribution probability of the address number element according to the result of the weighted summation operation; perform a weighted summation operation on the context encoding vector and the dictionary distribution probability, and obtain a generation probability according to the result of the weighted summation operation; and generate a standard word corresponding to the current address word according to the generation probability, the context encoding vector and the dictionary distribution probability.
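The three generator operations can be sketched as follows. This is only an illustration under stated assumptions: the softmax and sigmoid nonlinearities, the parameter names in `params`, and the final mixing rule (dictionary distribution scaled by the generation probability, plus the indicator weights scaled by its complement, as in pointer-generator networks) are not spelled out in the text.

```python
import math


def softmax(scores):
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def generate(decoded, context, weights, source_words, vocab, params):
    """Pick one standard word for the current decoding step.

    params holds hypothetical learned values: "W_vocab" (one row per dictionary
    word, applied to decoded + context), "w_context", "w_vocab" and "bias".
    """
    features = decoded + context
    logits = [sum(w * f for w, f in zip(row, features)) for row in params["W_vocab"]]
    p_vocab = softmax(logits)                      # dictionary distribution
    p_gen = sigmoid(
        sum(w * c for w, c in zip(params["w_context"], context))
        + sum(w * p for w, p in zip(params["w_vocab"], p_vocab))
        + params["bias"]
    )                                              # generation probability
    final = {word: p_gen * p for word, p in zip(vocab, p_vocab)}
    for word, weight in zip(source_words, weights):
        # Assumed copy term: input words share the remaining probability mass.
        final[word] = final.get(word, 0.0) + (1.0 - p_gen) * weight
    best = max(final, key=final.get)
    return best, final


params = {"W_vocab": [[0.0, 0.0], [0.0, 0.0]], "w_context": [0.0],
          "w_vocab": [0.0, 0.0], "bias": 0.0}
best, final = generate([0.0], [0.0], [0.9, 0.1], ["18", "No."], ["1", "No."], params)
print(best)  # 18
```

With these toy values the word "18" is absent from the dictionary yet still wins, because the indicator weights let the generator copy it from the input. This is how such a design can avoid emitting numbers that do not appear in the address text.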

In this way, the indicator 105 can select the content of interest from the address number element each time a standard word is output, so that each standard word to be output has a different degree of correlation with each address word in the address number element. This reduces the influence of irrelevant address words on the standard word and improves conversion accuracy. Moreover, the generated context encoding vector integrates all the context information in the address number element, so the generator 107 can take that context into account when generating standard words, ensuring conversion accuracy.

In a first usage scenario, the address text processing device receives an address text to be searched containing numbers, normalizes the address text to be searched through the encoder, the decoder, the indicator and the generator, and generates an address text containing standard address number data according to the normalization result.

For example, when a user uses the search function of a map, or an address search function while filling in an address, the user inputs an address text to be searched containing numbers, such as "Zhongguancun East Road No. eighteen". After the address text processing device acquires the address text to be searched, the text is input into the encoder and normalized through the encoder, the decoder, the indicator and the generator, and an address text containing standard address number data is output, namely "Zhongguancun East Road No. 18". A search can subsequently be made based on the address text containing standard address number data.

Of course, the expression form of the standard address number data may be determined as needed, and this embodiment does not limit it.

Therefore, the address text processing device can carry out normalization processing on the address number elements in the address text to be searched input by the user, so that the processed address text to be searched contains standard address number data, the searching accuracy rate is higher during subsequent address searching, and the searching effect is ensured.

In a second usage scenario, the address text processing device obtains an address text to be processed in the electronic business card, normalizes the address to be processed through the encoder, the decoder, the indicator and the generator, and generates the address text containing standard address number data according to a normalization processing result.

For example, when the user enters an address in an electronic business card, such as "Zhongguancun East Road, Haidian District", the address text processing device acquires the address text to be processed and normalizes it through the encoder, the decoder, the indicator and the generator, so that it is converted into an address text containing standard address number data, namely "Zhongguancun East Road No. 18, Haidian District" (the expression form in this example is consistent with the set address number output standard). In this way, the address expression forms in electronic business cards can be unified, and accuracy is better when the addresses in the electronic business cards are used later. In addition, the address text processing device may optionally perform some completion processing on the address text to be processed. For example, "Zhongguancun East Road No. 18, Haidian District" may be completed to "Zhongguancun East Road No. 18, Haidian District, Beijing".

Therefore, the address text processing device can carry out normalization processing on address digital elements in the address text to be processed in the electronic business card, so that the address expression form in the processed electronic business card is more uniform and accurate, and the reliability of the address in the electronic business card used subsequently is better. For example, when the address text in the electronic business card is used as the receiving address, the receiving address can be more accurate, and the adverse effect caused by the nonstandard expression form can be avoided.

Example four

Referring to fig. 3, a flowchart illustrating steps of an address text processing method according to a fourth embodiment of the present invention is shown.

The address text processing method of the embodiment comprises the following steps: acquiring an address text to be processed containing address digital elements, and inputting the address text to be processed into the address text processing device; and acquiring standard address digital data which is output by the address text processing device and corresponds to the address text to be processed.

The address number element may be understood as an element containing a number in an address text, for example, a road number (e.g., 969, 212-214, etc.), a floor number (e.g., 7-unit, three-unit, 12-floor, minus one floor, etc.), a building number (e.g., 1-floor, 10-floor, B-seat, etc.), a room number (e.g., 910-room, 1002-house, etc.), and the like.

The address text processing device can process the address text to be processed in the manner as described in the foregoing embodiments one to three, and therefore the processing procedure is not described herein again.

After the address text processing device processes the address text to be processed, it outputs standard address number data that corresponds to the address number elements in the address text to be processed and satisfies the address number output standard. For example, if the address number output standard expresses building numbers in the form "@ th building", building numbers written in other forms are converted into the corresponding "1 st building", "10 th building", and the like.

Therefore, the address texts to be processed adopting different expression modes can be normalized, so that the expression modes are uniform and standard, and the subsequent use of the address texts is facilitated. For example, in the process of searching addresses, the search result is more accurate.

Optionally, in a first case, the address text to be processed containing address number elements is an address text to be searched that contains numbers. In this case, acquiring the address text to be processed containing address number elements and inputting it into the address text processing device includes: inputting the address text to be searched into the address text processing device, normalizing it through the encoder, the decoder, the indicator and the generator, and outputting the standard address number data corresponding to it as a normalization processing result.

In this case, the method further comprises: and generating an address text containing standard address number data according to the normalization processing result.

Optionally, in a second case, the address text to be processed containing address number elements is the address text to be processed in an electronic business card. In this case, acquiring the address text to be processed containing address number elements and inputting it into the address text processing device includes: inputting the address text to be processed in the electronic business card into the address text processing device, normalizing it through the encoder, the decoder, the indicator and the generator, and outputting the standard address number data corresponding to it as a normalization processing result.

In this case, the method further comprises: generating an address text containing standard address number data according to the normalization processing result.

Optionally, before the obtaining the address text to be processed containing the address number element, the method further includes: and carrying out standard address digital training on the address text processing device.

For example, the address text processing apparatus may be implemented as a device deployed with a pointer generation network model. The standard address digital training of the address text processing device comprises the following steps:

Step S102: acquire address number elements from the training address text, and generate address number samples according to the address number elements.

As shown in the previous example, the address number elements included in different address texts are different, but all the address number elements include at least one address word. In a specific application, address words in the address number elements can be set and divided by those skilled in the art according to actual requirements.

For example, "9", "6", "9", and "no" in No. 969 are address words. Alternatively, for example, the address words in the 969 number may also be "969" and "number", etc. The embodiment of the invention does not limit the setting and division rules of the address words in the address digital elements.
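The two division rules mentioned above can be sketched as follows. This is only an illustration under stated assumptions: the function names `split_into_characters` and `split_into_runs` are hypothetical, and the input string "969号" is assumed to be the original-language form of "No. 969"; the embodiment itself does not prescribe any particular rule.

```python
import re


def split_into_characters(element: str) -> list[str]:
    """Finest split: every non-space character is its own address word."""
    return [ch for ch in element if not ch.isspace()]


def split_into_runs(element: str) -> list[str]:
    """Coarser split: a run of digits is one address word,
    and a run of other non-space characters is one address word."""
    return re.findall(r"\d+|[^\d\s]+", element)


element = "969号"  # assumed original-language form of "No. 969"
print(split_into_characters(element))  # ['9', '6', '9', '号']
print(split_into_runs(element))        # ['969', '号']
```

Either split yields a sequence of address words that can be fed to the encoder one word at a time; which granularity works better is left to the practitioner.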

In this embodiment, address code samples are generated based on address code elements. Since the address code samples are generated from the address code elements, the address code samples also include at least one address word.

Based on the address code elements, the corresponding address code output standard can be further preset for subsequent use. The address code output standard is used to indicate a standard expression manner of the address code element, and a person skilled in the art may preset any appropriate address code output standard according to needs, which is not limited in this embodiment.

For example, for address number elements at the road number level, such as "No. 12", "No. thirteen" and "No. one hundred 08", the corresponding address number output standard may have the format "No. @", which indicates a single road number; that is, the final outputs are "No. 12", "No. 13" and "No. 108". Based on this, label information can be set for each address number element (for example, the label information corresponding to "No. thirteen" is "No. 13"), so that the address number sample and its label information can be used together as input to train the pointer generation network model, for example {No. 12, No. 12}, {No. thirteen, No. 13}, {No. one hundred 08, No. 108}, and so on. In each pair, the former item is the address number sample and the latter item is the label information corresponding to that sample.

For example, for address number elements at the floor number level, such as "level 1", "level three" and "level 17", the corresponding address number output standard may have the format "@ layer", which indicates a single floor number; that is, the final outputs are "1 layer", "3 layer" and "17 layer". Based on this, label information can be set for each address number element to generate the input information of the pointer generation network model, such as {level 1, 1 layer}, {level three, 3 layer}, {level 17, 17 layer}, and so on.

Of course, in other embodiments, a person skilled in the art may construct any suitable form of address code sample as needed, and this embodiment is not limited to this.
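As a minimal sketch of how such sample and label pairs might be held in memory, the following assumes a hypothetical `AddressNumberSample` record and a hypothetical `render_label` helper that fills the "@" placeholder of an output standard. Neither name comes from the embodiment, and the surface forms used are illustrative only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AddressNumberSample:
    source: str  # address number element as it appears in the training text
    label: str   # the same element written in the output standard


def render_label(standard: str, number: int) -> str:
    """Fill the '@' placeholder of an output standard with a number."""
    if standard.count("@") != 1:
        raise ValueError("this sketch supports exactly one '@' placeholder")
    return standard.replace("@", str(number))


# Floor-level samples under the assumed standard "@ layer".
samples = [
    AddressNumberSample("level 1", render_label("@ layer", 1)),
    AddressNumberSample("level three", render_label("@ layer", 3)),
    AddressNumberSample("level 17", render_label("@ layer", 17)),
]

for s in samples:
    print(f"{{{s.source}, {s.label}}}")
```

In practice the number in each label would come from manual or rule-assisted annotation; the helper only shows how one output standard yields labels of a uniform shape.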

Step S104: train the address text processing device, taking the address number samples as training samples and satisfaction of the address number output standard as the training target, so that the trained address text processing device can be used to obtain standard address number data corresponding to input address number elements.

The address text processing device may be a device deployed with a pointer generation network model. Training the address text processing means may be training a pointer generation network model.

The pointer generation network model is a generation network model with a pointer function, which can realize sequence-to-sequence conversion. The pointer generation network model of the present embodiment is a model in which an attention mechanism is introduced, and it includes an indicating part and a generating part.

The pointer generation network model of the present embodiment can convert an input sequence of an indefinite length into an output sequence of an indefinite length, and thus can convert address number elements in an arbitrary expression into standard address number data satisfying a set address number output standard.

The generation part can convert the selected concerned content into standard words meeting the set address digital output standard according to the input address digital samples, and then determines the standard address digital data according to the standard words. Therefore, the address texts in different expression modes can be unified, and the standardization of the address texts is realized.

Because the attention mechanism is introduced, the indicating part of the pointer generation network model can select the content of interest from the address number sample each time a standard word is output, so that each standard word to be output has a different correlation with each address word in the sample. This reduces the influence of irrelevant address words on the standard word and improves conversion accuracy. It also addresses a problem of existing address translation models, which compress the address text into a single vector and then look the compressed vector up in a preset dictionary, and which are therefore prone to jitter and may generate numbers that do not exist in the address text.

In one feasible implementation, when the address text processing device is trained, each address word in an address number sample is input into the device; the device processes the sample, generates the standard word corresponding to each address word, and combines all the standard words into standard address number data. The training parameters of the device are then adjusted according to the standard address number data and the set address number output standard. Another address number sample is then input into the device and training continues until a training termination condition is met (for example, a set number of training iterations is reached or a set training threshold is reached). In this way, the trained address text processing device learns better training parameters and can accurately convert input address number elements into the corresponding standard address number data.

In addition, when each standard word is generated, the indicating part of the address text processing device can select different address words in the concerned address digital code sample, so that the generated standard word has higher correlation with the concerned address word, soft alignment between the standard word and the address word is realized, and the accuracy of conversion is further ensured.

For example, suppose the address number sample is "twelve storied building", made up of the address words "ten", "two" and "storied building". During conversion, the indicating part first selects "ten" as the address word of interest, so that when the generation part outputs a standard word, "ten" has the largest influence on it while the influence of "two" and "storied building" is relatively small; the generation part can therefore accurately output the standard word corresponding to "ten", namely 1. Similarly, after the standard word 1 is output, the indicating part selects "two" as the address word of interest, so that the influence of "two" becomes large and the influence of the other address words becomes small; the output standard word is therefore 2.

Therefore, when the standard words are output each time, the address text processing device can focus attention on the related address words, interference of other unrelated address words on the output standard words is reduced, and accuracy is improved.

In this embodiment, address number elements are acquired from the address text, address number samples are generated from the address number elements, and the address number samples are used as training samples to train the pointer generation network, so that the trained address text processing device is sensitive to numbers and can accurately convert the address number elements in an address text. With the trained address text processing device, the address number elements in an address text can be converted automatically and accurately into standard address number data that satisfies the address number output standard. This standardizes the address text, saves manpower, and can comprehensively cover a variety of expression forms.

In addition, the indicating part of the address text processing device can select the content of interest from the address number sample each time a standard word is output, so that each standard word to be output has a different correlation with each address word in the sample. This reduces the influence of irrelevant address words on the standard word and improves conversion accuracy. It also addresses the problem that an existing address translation model, which simply compresses the address text into a vector, looks the compressed vector up in a preset dictionary and generates an output text, is prone to jitter and may generate numbers that do not exist in the address text.

The address text processing method of the present embodiment may be executed by any suitable electronic device having data processing capabilities, including but not limited to: servers, mobile terminals (such as tablet computers, mobile phones and the like), PCs and the like.

Example five

Referring to fig. 4, a flowchart illustrating steps of an address text processing method according to a fifth embodiment of the present invention is shown.

The present embodiment takes a specific address text processing apparatus as an example, and describes an address text processing method of the present embodiment, which includes steps S102 and S104 described in the fourth embodiment.

Fig. 5 shows a block diagram of an address text processing apparatus that uses a pointer generation network model. In the present embodiment, the address text processing apparatus includes an encoder, a decoder, an indicator, and a generator.

Wherein the indicator belongs to the aforementioned indication part. The encoder, decoder and generator belong to the aforementioned generation section.

The encoder is used for encoding each address word in an address number sample (an address number sample comprises at least one address word) and outputting encoded data. Those skilled in the art may adopt any appropriate neural network as the encoder as required. For example, the encoder may be a unidirectional long short-term memory (LSTM) network, a bidirectional LSTM network, a multi-layer LSTM network, a gated recurrent unit (GRU) network, or the like, as long as the input address number samples can be encoded.

The decoder is used for decoding the encoded data, or the standard word corresponding to the address word preceding the current address word, and outputting decoded data. Those skilled in the art may adopt any suitable neural network as the decoder. For example, the decoder may be a unidirectional long short-term memory (LSTM) network, a bidirectional LSTM network, a multi-layer LSTM network, a gated recurrent unit (GRU) network, or the like.

The indicator is used for adjusting the weight of each address word in the address number sample according to the encoded data and the decoded data. The indicator may be implemented as a pointer network.

The generator is used for generating a standard word corresponding to the current address word at least according to the decoded data and the weight adjusted by the indicator.

After each standard word is generated, the current address word is updated, and the indicator selects a new address word of interest in the address number sample.

In step S104, training the address text processing apparatus, with the address number samples as training samples and satisfaction of the set address number output standard as the training target, includes the following substeps:

Substep S1041: input the address number sample into the address text processing device, and generate standard address number data corresponding to the address number sample through the encoder, the decoder, the indicator and the generator.

The address number sample is input into the encoder as the input data of the encoder, and the label data annotated according to the corresponding address number output standard is used as the supervision (namely the training target) of the address text processing device.

For example, for an address number element such as "eleven to seventeen", the corresponding address number output standard takes the form "@#-@#", which represents a range of consecutive floor numbers; that is, the output is "11#-17#".

Wherein generating, by the encoder, the decoder, the indicator, and the generator, standard address code data corresponding to the address code samples comprises:

Step A: acquire the encoded data output after the encoder encodes the address number sample.

Specifically, each address word of the address digital code sample is input into the encoder, and the encoder performs encoding processing on each address word to obtain encoded data.

Taking a single-layer bidirectional long short-term memory network as an example of the encoder, during encoding the hidden layer of the network processes each address word and produces the output encoded data.

Because the encoder is a long short-term memory network, the encoded data it outputs includes encoded semantic data (CJ in fig. 5, where J is the index of the last address word in the address number sample, J is a positive integer greater than or equal to 1, and CJ is the encoded semantic data output by the encoder when the last address word is input) and encoded hidden layer data (h1 to hJ in fig. 5, where hJ is the encoded hidden layer data generated by the encoder's hidden layer processing of the J-th address word in the sample). Of course, a different network may be used as the encoder, in which case the content of the encoded data it outputs may differ.

Step B: input the encoded data, or the standard word corresponding to the address word preceding the current address word, into the decoder to obtain decoded data.

Taking a single-layer unidirectional long short-term memory network as an example of the decoder, the decoder performs one decoding operation at each processing time.

At the first processing time, the input data of the decoder includes the encoded data output by the encoder and a start identifier such as the <start> character. At processing times other than the first, the input of the decoder includes the standard word corresponding to the previous address word, output at the previous processing time, and part or all of the decoded data output at the previous processing time.

When decoding, the hidden layer of the decoder decodes the input data and outputs the decoded data. The decoded data includes decoded semantic data (in fig. 5, D1 to Di, where i is a positive integer greater than or equal to 1, i is equal to 1 at the first processing time, and Di is decoded semantic data output by the decoder at the ith processing time) and decoded hidden data (in fig. 5, Hi is decoded hidden data output by the decoder at the ith processing time). Of course, the decoder may use different networks and the content in the decoded data it outputs may be different.

Step C: perform a weighted average operation on the decoded data and the encoded data through the indicator, determine the weight of each address word according to the result of the weighted average operation, and obtain a context coding vector of the address number sample according to the weights.

At the first processing time, the indicator performs the weighted average operation on the encoded hidden layer data (h1 to hJ) in the encoded data and the decoded hidden layer data in the decoded data output by the decoder at the current processing time, and determines the weight of each address word in the address number sample according to the result. A context coding vector of the address number sample (H shown in fig. 5) is then generated according to the determined weights. The context coding vector indicates semantic features among the address words in the sample and can be represented as a probability for each address word, so that the standard word at the current time can be output accurately according to these semantic features.

Specifically, the weighted average may be calculated as follows: the decoded hidden layer data at the current processing time is combined with each of the encoded hidden layer data h1 to hJ to give J calculation results; a softmax operation is then applied to the J results, so that each lies in the interval from 0 to 1 and they sum to 1; and the result of the softmax operation is used as the weight of each address word.
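The weight calculation described above can be sketched in a few lines. This is a minimal illustration, not the embodiment's implementation: the text does not say how the decoded hidden layer data is combined with each of h1 to hJ, so a dot product is assumed here, and the weighted sum returned as `context` is likewise an assumption about how the weights may be turned into a vector. All function names are hypothetical.

```python
import math


def softmax(scores):
    """Map J scores to weights in [0, 1] that sum to 1."""
    m = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def attend(decoder_hidden, encoder_hiddens):
    """Return (weights, context) for one processing time.

    decoder_hidden: decoded hidden layer data at the current time
    encoder_hiddens: encoded hidden layer data h1..hJ, one per address word
    """
    # Assumed scoring rule: dot product with each encoder hidden vector.
    scores = [dot(decoder_hidden, h) for h in encoder_hiddens]
    weights = softmax(scores)
    dim = len(encoder_hiddens[0])
    # Assumed aggregation: weighted sum of the encoder hidden vectors.
    context = [
        sum(w * h[k] for w, h in zip(weights, encoder_hiddens))
        for k in range(dim)
    ]
    return weights, context


weights, context = attend([1.0, 0.0], [[2.0, 0.0], [0.0, 2.0], [0.0, 0.0]])
print(weights)  # the first address word receives the largest weight
```

The address word whose encoded hidden layer data best matches the current decoder state gets the largest weight, which is the "address word of interest" behaviour the embodiment relies on.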

Of course, in other embodiments, the indicator may use other methods to calculate the weight of each address word, which is not limited in this embodiment.

Optionally, for ease of calculation, in the present embodiment the indicator generates a context coding vector according to the calculated weights.

At the i-th processing time (i ranges from 1 to the number of standard words in the standard address number data; here i is a positive integer greater than 1, such as 2, 3 or 4), the indicator performs the weighted average operation on the encoded hidden layer data in the encoded data and the decoded hidden layer data in the decoded data output by the decoder at the i-th processing time, and determines the weight of each address word in the address number sample according to the result. The context coding vector is then generated according to the determined weights.

Step D: perform a weighted summation operation on the decoded data and the context coding vector through the generator, and obtain the dictionary distribution probability of the address number sample according to the result of the weighted summation operation.

The dictionary distribution probability indicates, for each dictionary word in the dictionary, the probability that the standard word output at the current time is that word. The dictionary is constructed according to the address words in the address number samples.

In this embodiment, the dictionary distribution probability is determined according to the decoded hidden layer data and the context coding vector at the current processing time in the decoded data. The specific determination means can be implemented by those skilled in the art by adopting any appropriate prior art means according to actual needs, and the embodiment of the present invention is not limited thereto.

Step E: perform a weighted summation operation on the context coding vector and the dictionary distribution probability through the generator, and obtain a generation probability according to the result of the weighted summation operation.

The generation probability (Pgen shown in fig. 5) is used to indicate the source of the standard word output at the current processing time, in other words, it is determined according to the generation probability that the standard word output at the current processing time is acquired from the dictionary or the standard word output at the current processing time is acquired from the address words of the address code sample.

By acquiring the generation probability, the address words can be directly copied from the address digital code samples as output standard words when needed, so that the problem of generating digital codes which do not exist in the address text in the prior art during conversion can be avoided, and the accuracy is improved.

In this embodiment, the generation probability at the i-th processing time is determined based on the standard word output at the (i-1)-th processing time (denoted by Yi-1), the decoded semantic data in the decoded data at the i-th processing time of the decoder (denoted by Ci), and the context coding vector H at the i-th processing time.

In one possible approach, the generation probability can be expressed as formula 1:

$$P_{gen} = \sigma\left(w_{H} H^{*} + w_{C} C_{i} + w_{Y} Y_{i-1} + b\right)$$ (formula 1)

where $\sigma(\cdot)$ may be any suitable processing function that can be determined by one skilled in the art as desired, such as a sigmoid function, which serves as a threshold function of the neural network and maps a variable to a value between 0 and 1, thereby representing the generation probability. The weights are denoted here by $w_{H}$, $w_{C}$ and $w_{Y}$: $w_{H}$ is the weight of the context coding vector $H^{*}$, and $H^{*}$ is the context coding vector at the i-th processing time; $w_{C}$ is the weight of the decoded semantic data $C_{i}$, and $C_{i}$ is the decoded semantic data at the i-th processing time; $w_{Y}$ is the weight of the standard word output at the previous processing time (i.e. time i-1), and $Y_{i-1}$ is a vector or matrix representation of the standard word output at the (i-1)-th processing time; and $b$ is an offset value.
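The generation probability described above, a sigmoid applied to a weighted combination of the context coding vector, the decoded semantic data and the previous standard word plus an offset, can be sketched as follows. The parameter names `w_h`, `w_c` and `w_y` are hypothetical, all quantities are assumed to be plain vectors, and the weighted combination is assumed to be a dot product; the numbers are made up for illustration.

```python
import math


def sigmoid(x: float) -> float:
    """Numerically stable logistic function, mapping any real x into [0, 1]."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def generation_probability(context, decoded_semantic, prev_word,
                           w_h, w_c, w_y, b):
    """P_gen = sigmoid(w_h . H* + w_c . C_i + w_y . Y_{i-1} + b)."""
    z = dot(w_h, context) + dot(w_c, decoded_semantic) + dot(w_y, prev_word) + b
    return sigmoid(z)


p_gen = generation_probability(
    context=[0.2, 0.8],          # H* at the current processing time
    decoded_semantic=[0.5],      # C_i
    prev_word=[1.0, 0.0],        # Y_{i-1}, e.g. an embedding of the last word
    w_h=[0.3, -0.4], w_c=[1.0], w_y=[0.2, 0.2], b=-0.1,
)
print(p_gen)  # a value strictly between 0 and 1
```

A value near 1 favours taking the next standard word from the dictionary, and a value near 0 favours copying an address word from the input sample.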

Step F: generate, through the generator, the standard word corresponding to the current address word according to the generation probability, the context coding vector and the dictionary distribution probability.

In this embodiment, let $P(w)$ denote the probability that the standard word output at the current processing time is the word $w$. $P(w)$ can be expressed by the following formula 2:

$$P(w) = P_{gen}\, P_{vocab}(w) + (1 - P_{gen}) \sum_{j:\, w_{j} = w} H^{*}_{j}$$ (formula 2)

where $P_{gen}$ is the generation probability, $P_{vocab}(w)$ is the dictionary distribution probability of $w$, and $\sum_{j:\, w_{j} = w} H^{*}_{j}$ is the probability that the context coding vector at the i-th processing time assigns to the address words $w_{j}$ of the sample that are equal to $w$.
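Formula 2 mixes two distributions: one over dictionary words and one over the address words of the input sample. The sketch below shows that mixture under stated assumptions: `final_distribution` is a hypothetical name, the dictionary distribution is given as a plain dict, and the per-address-word probabilities indicated by the context coding vector are passed in as `copy_weights`. The numbers are invented.

```python
def final_distribution(p_gen, p_vocab, source_words, copy_weights):
    """Combine dictionary and copy probabilities as in formula 2.

    p_gen: generation probability in [0, 1]
    p_vocab: dict mapping each dictionary word to its probability
    source_words: address words of the input sample, in order
    copy_weights: probability assigned to each address word (sums to 1)
    """
    probs = {word: p_gen * p for word, p in p_vocab.items()}
    for word, weight in zip(source_words, copy_weights):
        # Repeated address words accumulate, matching the sum over j: w_j = w.
        probs[word] = probs.get(word, 0.0) + (1.0 - p_gen) * weight
    return probs


probs = final_distribution(
    p_gen=0.25,
    p_vocab={"1": 0.5, "2": 0.5},
    source_words=["ten", "two", "18"],
    copy_weights=[0.1, 0.1, 0.8],
)
best = max(probs, key=probs.get)
print(best)  # the copied address word "18" dominates when P_gen is small
```

Because the copy term can place mass on words that are absent from the dictionary, a number that appears in the input can be reproduced verbatim instead of being replaced by a dictionary word.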

Step G: update the current address word, and return to the step of inputting the standard word corresponding to the address word preceding the current address word into the decoder, continuing execution until corresponding standard words have been generated for all address words.

After the standard word at the i-th processing time is generated, the current address word is updated for the (i+1)-th processing time. For example, after the standard word corresponding to the j-th address word (j ranges from 1 to J) is generated, the current address word becomes the (j+1)-th address word, and the process returns to the step of inputting the standard word corresponding to the address word preceding the current address word into the decoder, so as to obtain the standard word corresponding to the new current address word (the (j+1)-th address word), until corresponding standard words have been generated for all address words.

Step H: generate, through the generator, the standard address number data corresponding to the address number sample according to the standard words corresponding to all the address words.

After the standard words corresponding to all address words are obtained, the standard words can be spliced together to generate standard address digital data corresponding to the address digital samples, or standard address digital data can be generated according to the standard words in other ways.

When a standard word is generated, the generator of the address text processing device determines a generation probability according to the context coding vector, the standard word corresponding to the previous address word and the decoded data. According to the generation probability, it then decides whether to select an address word from the address number sample and output it as the standard word, or to output a word from the dictionary as the standard word according to the dictionary distribution probability. The device can therefore accommodate the requirement that some information (such as numbers) be retained without conversion during address text standardization, which addresses the prior-art problem that undifferentiated conversion of the address text causes the output address text to contain address numbers that are absent from the original address text.

Substep S1042: determine the difference between the standard address number data and the address number output standard, and adjust the training parameters of the address text processing device according to the difference.

After the standard address code data is obtained, training parameters (including but not limited to conventional training parameters such as weights) of the address text processing device can be adjusted according to the difference (such as loss value) between the standard address code data and the set address code output standard. The skilled person can determine the adjusted training parameters by any suitable calculation method, such as a gradient descent method, and the like, which is not limited in this embodiment.

Substep S1043: continue to train the address text processing device using the adjusted training parameters until the training termination condition is met.

The address text processing device is trained iteratively. After the training parameters are adjusted in each iteration, the process returns to substep S1041, the next address number sample is input into the device, and training continues until a training termination condition is met, for example, the difference between the standard address number data and the set address number output standard is smaller than a set value, or the number of training iterations reaches a set number.
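The control flow of substeps S1041 to S1043 (process a sample, measure the difference, adjust the parameters, stop on either termination condition) can be illustrated with a deliberately tiny stand-in. The one-parameter model below is an assumption made purely so the loop is runnable; it is not the pointer generation network, and the names `train`, `max_steps` and `loss_threshold` are hypothetical.

```python
def train(samples, lr=0.1, max_steps=1000, loss_threshold=1e-6):
    """Gradient descent on a toy model y = w * x with two stop conditions.

    Returns (w, steps_taken, last_loss).
    """
    w = 0.0
    loss = float("inf")
    step = 0
    while step < max_steps:                  # condition 1: set number of iterations
        x, y = samples[step % len(samples)]  # S1041: take the next sample
        pred = w * x
        loss = (pred - y) ** 2               # S1042: difference from the target
        if loss < loss_threshold:            # condition 2: difference below set value
            break
        w -= lr * 2.0 * (pred - y) * x       # S1042: adjust the parameter
        step += 1                            # S1043: continue with adjusted parameter
    return w, step, loss


w, steps, loss = train([(1.0, 2.0), (2.0, 4.0)])
print(w, steps, loss)
```

In a real setting the loss would compare the generated standard address number data with the labelled data, and the update would cover all network parameters, typically through an automatic differentiation library.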

The trained address text processing device can accurately convert the address digital elements in the address text, realizes the unification of the expression modes of the address digital elements, and provides a good basis for subsequent navigation or other data processing according to the address text.

As can be seen from the foregoing, the trained address text processing apparatus can implement the address text standardization, which includes:

and inputting the address digital elements into an encoder of the trained address text processing device to obtain encoded data (which comprises encoded semantic data and encoded hidden layer data corresponding to each address word) output by the encoder.

At the first processing time of the decoder, the coded semantic data and the preset < star > character are input into the decoder, and the decoding data (including the decoded semantic data at the current processing time and the decoded hidden layer data at the current processing time) at the current processing time is obtained by processing the hidden layer of the decoder.

The indicator determines the context coding vector at the current processing time according to the decoded hidden layer data at the current processing time and the encoded hidden layer data corresponding to each address word.
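A minimal sketch of how an indicator might weight the address words and form a context coding vector is given below. The dot-product scoring and the softmax normalization are assumptions borrowed from common attention mechanisms; the function names are hypothetical, and the embodiment itself only states that a weighted operation over the decoded and encoded hidden layer data is used.

```python
import math


def softmax(scores):
    """Normalize raw scores into weights that sum to 1."""
    m = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def context_vector(decoder_hidden, encoder_hiddens):
    """Weight each address word and return (weights, context vector).

    Scores are dot products between the decoder hidden state and each
    encoder hidden state (an assumed scoring function).
    """
    scores = [sum(d * e for d, e in zip(decoder_hidden, h)) for h in encoder_hiddens]
    weights = softmax(scores)
    dim = len(decoder_hidden)
    context = [
        sum(w * h[i] for w, h in zip(weights, encoder_hiddens)) for i in range(dim)
    ]
    return weights, context


# Two address words with 2-dimensional hidden states (hypothetical values).
weights, context = context_vector([1.0, 0.0], [[5.0, 0.0], [0.0, 5.0]])
```

Here the first address word receives almost all of the weight because its hidden state is aligned with the decoder state.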

The generator determines the dictionary distribution probability according to the context coding vector at the current processing time and the decoded hidden layer data at the current processing time.

The generator determines the generation probability according to the context coding vector at the current processing time, the standard word corresponding to the previous address word, and the decoded semantic data at the current processing time. Because the current processing time is the first processing time, the standard word corresponding to the previous address word is the preset <start> character.
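One common way to turn these three inputs into a single probability is a sigmoid over a weighted sum, as sketched below. This form is an assumption taken from typical pointer-generator models rather than from this embodiment, and the weight vectors `w_c`, `w_s`, `w_x` and the bias are hypothetical learned parameters.

```python
import math


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def generation_probability(context, decoder_state, prev_word_vec,
                           w_c, w_s, w_x, bias=0.0):
    """Return a value in (0, 1) from a weighted sum passed through a sigmoid."""
    z = dot(w_c, context) + dot(w_s, decoder_state) + dot(w_x, prev_word_vec) + bias
    # Numerically stable sigmoid for both signs of z.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


# With all-zero weights the sigmoid input is 0.
p_gen_zero = generation_probability([1.0, 2.0], [0.5], [0.3],
                                    [0.0, 0.0], [0.0], [0.0])
# A positive weighted sum pushes the probability above one half.
p_gen_high = generation_probability([1.0, 2.0], [0.5], [0.3],
                                    [1.0, 1.0], [1.0], [1.0])
```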

The generator determines the standard word that is output at the current processing time and corresponds to the current address word according to the generation probability, the context coding vector and the dictionary distribution probability at the current processing time.
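The sketch below shows one way the generation probability could combine the dictionary distribution with the input address words. It assumes the usual pointer-generator mixture, in which the per-word weights from the indicator act as a copy distribution; the embodiment describes the combination only in general terms, and the function names and example values are hypothetical.

```python
def final_distribution(p_gen, vocab_probs, source_words, copy_weights):
    """Mix dictionary probabilities with copy weights over the input words.

    p_gen scales the dictionary distribution; (1 - p_gen) scales the weight
    placed on each input address word.
    """
    mixed = {word: p_gen * p for word, p in vocab_probs.items()}
    for word, weight in zip(source_words, copy_weights):
        mixed[word] = mixed.get(word, 0.0) + (1.0 - p_gen) * weight
    return mixed


def pick_word(distribution):
    """Greedy choice: the word with the highest mixed probability."""
    return max(distribution, key=distribution.get)


# Hypothetical values: a low generation probability favors copying "12".
dist = final_distribution(
    p_gen=0.2,
    vocab_probs={"No.": 0.6, "Building": 0.3, "<end>": 0.1},
    source_words=["12", "bldg"],
    copy_weights=[0.9, 0.1],
)
chosen = pick_word(dist)
```

Because the number "12" is copied from the input rather than generated from the dictionary, the output cannot contain a number that was absent from the original text.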

At the next processing time, the decoded semantic data output at the current processing time and the standard word corresponding to the current address word are input into the decoder, and the process returns to the step of processing the input data through the hidden layer of the decoder. This repeats until the output standard word is a preset character, such as <end>, indicating that processing is finished.

After all the standard words are obtained, they can be spliced into standard address number data, so that the address number elements of the address text, and hence the address text itself, are standardized.
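The step-by-step decoding and the final splicing can be sketched as a simple loop. The callback `step_fn`, the marker strings and the step limit are assumptions for illustration; in the device the per-step work is done by the decoder, the indicator and the generator.

```python
START, END = "<start>", "<end>"  # placeholder marker strings


def decode(step_fn, max_steps=50):
    """Run greedy decoding and splice the standard words into one string.

    step_fn(previous_word, state) -> (next_word, new_state) is a hypothetical
    callback wrapping one decoder/indicator/generator step.
    """
    words = []
    previous, state = START, None
    for _ in range(max_steps):  # guard against a model that never emits END
        word, state = step_fn(previous, state)
        if word == END:
            break
        words.append(word)
        previous = word
    return "".join(words)


def scripted_step(previous_word, state):
    """Toy step function that replays a fixed sequence of standard words."""
    script = ["12", "-", "3", END]
    index = 0 if state is None else state
    return script[index], index + 1


result = decode(scripted_step)
```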

Because the address text processing device is an end-to-end network model, the labeled data used for training only needs to specify the input data and output data of the device; manual rules do not need to be written for each kind of input data, which saves a large amount of manpower. In addition, compared with other standardization methods, the address text processing device includes an indicator, which can select address words that need no processing (such as numbers) from the originally input address words and retain them. This solves the prior-art problem that indiscriminate processing of the input data produces output numbers that are not present in the original text. The address text processing device can convert the address text into a unified expression while preserving the original information, so that subsequent applications based on the address text, such as navigation or express delivery, are more convenient and accurate.

In this embodiment, the address number elements in the address text are obtained, address number samples are generated from them, and the samples are used as training samples to train the indicator generation network, so that the trained address text processing device is sensitive to numbers and can accurately convert the address number elements in an address text. The trained address text processing device converts the address number elements in an address text automatically and accurately into standard address number data that meets the address number output standard, thereby standardizing the address text, saving manpower, and comprehensively covering the various expression modes.

When the trained address text processing device generates a standard word, the generator determines, according to the generation probability, whether the standard word at the current time is taken from the input address words or generated from the dictionary, which further ensures conversion accuracy.

Example six

Referring to fig. 6, a flowchart illustrating steps of an address text processing method according to a sixth embodiment of the present invention is shown.

The address text processing method of the present embodiment includes steps S102 to S104 of the fourth or fifth embodiment.

Step S102 comprises the following substeps:

Substep S1021: acquire the address elements in the address text according to the address element labeling information in the address text.

The address text may be obtained from an address library containing a large number of address texts. When address texts are obtained from the address library, a portion of them can be sampled according to the different expression modes they use and labeled with address elements, yielding the address element labeling information.

To ensure the training effect on the address text processing device, the sampled address texts should contain roughly equal numbers of texts for each expression mode and should cover all or most of the expression modes found in address texts.
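A balanced sample of this kind can be drawn by grouping the address texts by expression mode and taking the same number from each group. In the sketch below, `mode_of` is a hypothetical classifier supplied by the caller, and the toy classifier and example strings are assumptions for illustration only.

```python
import random
from collections import defaultdict


def balanced_sample(address_texts, mode_of, per_mode, seed=0):
    """Draw up to per_mode texts from every expression mode."""
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    by_mode = defaultdict(list)
    for text in address_texts:
        by_mode[mode_of(text)].append(text)
    sample = []
    for mode in sorted(by_mode):
        group = by_mode[mode]
        sample.extend(rng.sample(group, min(per_mode, len(group))))
    return sample


def toy_mode(text):
    """Toy expression-mode classifier based on the separator used."""
    if "-" in text:
        return "hyphen"
    if "#" in text:
        return "hash"
    return "other"


texts = ["1-2-3", "4-5", "6-7-8", "Bldg 3 #201", "No. 9"]
sample = balanced_sample(texts, toy_mode, per_mode=1)
```

With `per_mode=1` the sample contains one text per expression mode, even though the hyphenated form dominates the library.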

Substep S1022: segment the address number elements from the address elements.

Substep S1023: generate the address number samples from the address number elements.

As mentioned above, an address number sample may include only address number elements, or may include both address number elements and other elements.
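Substeps S1022 and S1023 can be illustrated with a simple pattern-based segmenter. The regular expression below is an assumption that only recognizes Arabic digits joined by a few separators; real address number elements may use other numerals or separators, and the embodiment relies on labeling information rather than a fixed pattern. All names are hypothetical.

```python
import re

# Digits optionally joined by '-', '#' or '/' (an assumed, simplified pattern).
NUMBER_ELEMENT = re.compile(r"[0-9]+(?:[-#/][0-9]+)*")


def segment_number_elements(address_element):
    """Return the number elements found in one address element."""
    return NUMBER_ELEMENT.findall(address_element)


def make_samples(address_elements):
    """Build samples that contain only address number elements."""
    samples = []
    for element in address_elements:
        samples.extend(segment_number_elements(element))
    return samples


samples = make_samples(["Building 3-2-501", "No. 18"])
```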

In addition, different address number output standards can be set for different types of address number elements, so that labeling information can be set for the address number elements as a basis for subsequently training the address text processing device.

Obtaining the address elements according to the address element labeling information, and segmenting the address number elements from them to generate the address number samples, improves the efficiency of sample generation, allows the address number elements to be obtained quickly and comprehensively, improves the training efficiency of the address text processing device, and reduces the training cost.

Example seven

Referring to fig. 7, a block diagram of an address text processing apparatus according to a seventh embodiment of the present invention is shown.

The address text processing device of the present embodiment includes: a first obtaining module 502, configured to obtain a to-be-processed address text containing address number elements, and input the to-be-processed address text into the address text processing device; and an output module 504, configured to obtain the standard address number data that is output by the address text processing device and corresponds to the to-be-processed address text.

According to this embodiment, the input address number elements are converted automatically by the encoder, the decoder, the indicator and the generator, so that they are accurately converted into standard address number data meeting the address number output standard. Address text standardization is thereby achieved, manpower is saved, and the various expression modes can be comprehensively covered.

Example eight

Referring to fig. 8, there is shown a block diagram of an address text processing apparatus according to an eighth embodiment of the present invention.

The address text processing device of the present embodiment includes: a first obtaining module 602, configured to obtain a to-be-processed address text containing address number elements, and input the to-be-processed address text into the address text processing device; and an output module 604, configured to obtain the standard address number data that is output by the address text processing device and corresponds to the to-be-processed address text.

Optionally, the apparatus further comprises: a training module 606, configured to perform standard address code training on the address text processing apparatus before the to-be-processed address text including the address code element is obtained.

Optionally, the training module 606 comprises: a second obtaining module 6061, configured to obtain an address number element in an address text for training, and generate an address number sample according to the address number element; and a processing module 6062, configured to train the address text processing apparatus by using the address digital code sample as a training sample and using the address digital code output standard as a training target, so as to obtain standard address digital data corresponding to the input address digital code element by using the trained address text processing apparatus.

Optionally, the processing module 6062 includes: an input module, configured to input the address code sample into the address text processing device, and generate standard address code data corresponding to the address code sample through the encoder, the decoder, the indicator, and the generator; the determining module is used for determining the difference between the standard address digital data and the address digital output standard and adjusting the training parameters of the address text processing device according to the difference; and the adjusting module is used for continuing to train the address text processing device by using the adjusted training parameters until the training termination condition is met.

Optionally, the input module comprises: a third obtaining module, configured to obtain the encoded data output after the encoder encodes the address number sample; a fourth obtaining module, configured to input the encoded data, or the previous standard word output by the address text processing device, into the decoder to obtain decoded data; a weight adjusting module, configured to perform, by the indicator, a weighted average operation on the decoded data and the encoded data, determine the weight of each address word according to the result of the weighted average operation, and obtain a context coding vector of the address number sample according to the weights; a fifth obtaining module, configured to perform, by the generator, a weighted summation operation on the decoded data and the context coding vector, and obtain a dictionary distribution probability of the address number sample according to the result of the weighted summation operation; a sixth obtaining module, configured to perform, by the generator, a weighted summation operation on the context coding vector and the dictionary distribution probability, and obtain a generation probability according to the result of the weighted summation operation; a first generation module, configured to generate, by the generator, a standard word corresponding to the current address word according to the generation probability, the context coding vector and the dictionary distribution probability; an updating module, configured to update the current address word and return to the step of inputting the standard word corresponding to the address word preceding the current address word into the decoder, continuing until a corresponding standard word has been generated for every address word; and a second generation module, configured to generate standard address number data corresponding to the address number sample according to the standard words corresponding to all the address words.

Optionally, the second obtaining module 6061 includes: a seventh obtaining module, configured to obtain, according to address element tagging information in the address text for training, an address element in the address text for training; a segmentation module for segmenting address digital elements from the address elements; a third generating module, configured to generate the address code samples according to the address code elements.

Optionally, the address text to be processed containing address number elements is an address text to be searched containing numbers, the first obtaining module 602 is further configured to input the address text to be searched containing numbers into the address text processing apparatus, normalize the address text to be searched through the encoder, the decoder, the indicator, and the generator, and output standard address number data corresponding to the address text to be searched containing numbers as a normalization result; the apparatus further comprises: and the fourth generation module is used for generating an address text containing standard address number data according to the normalization processing result.

Optionally, the address text to be processed containing the address number elements is an address text to be processed in an electronic business card, the first obtaining module is further configured to input the address text to be processed in the electronic business card into the address text processing device, normalize the address to be processed through the encoder, the decoder, the indicator and the generator, and output standard address number data corresponding to the address text to be processed in the electronic business card as a normalization result; the apparatus further comprises: and the fifth generation module is used for generating an address text containing standard address number data according to the normalization processing result.

The address text processing apparatus of this embodiment is used to implement the corresponding address text processing method in the foregoing multiple method embodiments, and has the beneficial effects of the corresponding method embodiment, which are not described herein again.

Example nine

Referring to fig. 9, a schematic structural diagram of an electronic device according to a ninth embodiment of the present invention is shown, and the specific embodiment of the present invention does not limit the specific implementation of the electronic device.

As shown in fig. 9, the electronic device may include: a processor 702, a communication interface 704, a memory 706, and a communication bus 708.

Wherein:

The processor 702, the communication interface 704, and the memory 706 communicate with each other via the communication bus 708.

The communication interface 704 is used to communicate with other electronic devices, such as a terminal device or a server.

The processor 702 is configured to execute the program 710, and may specifically execute relevant steps in the above-described address text processing method embodiment.

In particular, the program 710 may include program code that includes computer operating instructions.

The processor 702 may be a central processing unit (CPU), an application-specific integrated circuit (ASIC), or one or more integrated circuits configured to implement an embodiment of the present invention. The electronic device comprises one or more processors, which may be of the same type, such as one or more CPUs, or of different types, such as one or more CPUs and one or more ASICs.

The memory 706 stores a program 710. The memory 706 may comprise high-speed RAM, and may also include non-volatile memory, such as at least one disk memory.

The program 710 may specifically be used to cause the processor 702 to perform the following operations: acquiring an address text to be processed containing address digital elements, and inputting the address text to be processed into the address text processing device; and acquiring standard address digital data which is output by the address text processing device and corresponds to the address text to be processed.

In an alternative embodiment, the program 710 is further configured to cause the processor 702 to perform standard address code training on the address text processing apparatus before the obtaining of the address text to be processed containing address code elements.

In an alternative embodiment, the program 710 is further configured to enable the processor 702 to obtain address number elements in an address text for training when performing standard address number training on the address text processing apparatus, and generate an address number sample according to the address number elements; and training the address text processing device by taking the address digital code sample as a training sample and meeting the address digital code output standard as a training target so as to obtain standard address digital code data corresponding to the input address digital code element by using the trained address text processing device.

In an alternative embodiment, the program 710 is further configured to cause the processor 702 to input the address code sample into the address text processing apparatus when performing standard address code training on the address text processing apparatus, and generate standard address code data corresponding to the address code sample through the encoder, the decoder, the indicator and the generator; determining the difference between the standard address digital data and the address digital output standard, and adjusting the training parameters of the address text processing device according to the difference; and continuing to train the address text processing device by using the adjusted training parameters until the training termination condition is met.

In an optional implementation manner, the program 710 is further configured to enable the processor 702 to obtain encoded data output after the encoder performs encoding processing on the address digital samples when performing standard address digital training on the address text processing apparatus; inputting the coded data or the previous standard word output by the address text processing device into the decoder to obtain decoded data; carrying out weighted average operation on the decoded data and the encoded data through the indicator, determining the weight of each address word according to the result of the weighted average operation, and obtaining a context encoding vector of the address digital sample according to the weight; carrying out weighted summation operation on the decoding data and the context coding vector through the generator, and obtaining dictionary distribution probability of the address digital code sample according to the result of the weighted summation operation; carrying out weighted summation operation on the context coding vector and the dictionary distribution probability through the generator, and obtaining a generation probability according to the result of the weighted summation operation; generating a standard word corresponding to the current address word according to the generation probability, the context coding vector and the dictionary distribution probability through the generator; updating the current address word, and returning to the step of inputting the standard word corresponding to the address word before the obtained current address word into the decoder to continue execution until all the address words generate the corresponding standard word; and generating standard address digital data corresponding to the address digital samples according to the standard words corresponding to all the address words.

In an alternative embodiment, the program 710 is further configured to enable the processor 702 to obtain address elements in the address text for training according to the address element labeling information in the address text for training when performing standard address digital training on the address text processing apparatus; segmenting address number elements from the address elements; and generating the address code sample according to the address code element.

In an optional implementation manner, the address text to be processed containing address number elements is an address text to be searched that contains numbers, and the program 710 is further configured to enable the processor 702, when acquiring the address text to be processed containing address number elements and inputting it into the address text processing device, to input the address text to be searched into the address text processing device, normalize the address text to be searched through the encoder, the decoder, the indicator and the generator, and output the standard address number data corresponding to the address text to be searched as a normalization processing result; and to generate an address text containing the standard address number data according to the normalization processing result.

In an optional implementation manner, the address text to be processed containing address number elements is an address text to be processed in an electronic business card, and the program 710 is further configured to enable the processor 702 to input the address text to be processed in the electronic business card into the address text processing apparatus when the address text to be processed containing address number elements is acquired and input into the address text processing apparatus, and perform normalization processing on the address to be processed through the encoder, the decoder, the indicator and the generator, and output standard address number data corresponding to the address text to be processed in the electronic business card as a normalization processing result; and generating an address text containing standard address number data according to the normalization processing result.

For the specific implementation of each step in the program 710, reference may be made to the descriptions of the corresponding steps and units in the foregoing address text processing method embodiments, which are not repeated here. Those skilled in the art will understand that, for convenience and brevity of description, the specific working processes of the above-described devices and modules may refer to the corresponding process descriptions in the foregoing method embodiments, and are likewise not repeated here.

By the electronic equipment, the address digital elements in the address text are obtained, the address digital samples are generated according to the address digital elements, and the pointer generation network is trained by using the address digital samples as training samples, so that the trained address text processing device can be sensitive to numbers and can accurately convert the address digital elements in the address text. By the aid of the trained address text processing device, address digital elements in the address text can be automatically converted by the trained address text processing device, so that the address digital elements are accurately converted into standard address digital data meeting address digital output standards, address text standardization is achieved, manpower is saved, and various different expression modes can be comprehensively covered.

It should be noted that, according to the implementation requirement, each component/step described in the embodiment of the present invention may be divided into more components/steps, and two or more components/steps or partial operations of the components/steps may also be combined into a new component/step to achieve the purpose of the embodiment of the present invention.

The above-described method according to an embodiment of the present invention may be implemented in hardware or firmware, or as software or computer code storable in a recording medium such as a CD-ROM, a RAM, a floppy disk, a hard disk, or a magneto-optical disk, or as computer code originally stored in a remote recording medium or a non-transitory machine-readable medium, downloaded through a network, and stored in a local recording medium, so that the method described herein can be processed by such software stored on a recording medium using a general-purpose computer, a dedicated processor, or programmable or dedicated hardware such as an ASIC or FPGA. It will be appreciated that the computer, processor, microprocessor controller or programmable hardware includes memory components (e.g., RAM, ROM, flash memory, etc.) that can store or receive software or computer code that, when accessed and executed by the computer, processor or hardware, implements the address text processing method described herein. Further, when a general-purpose computer accesses code for implementing the address text processing method shown herein, execution of the code converts the general-purpose computer into a special-purpose computer for executing the address text processing method shown herein.

Those of ordinary skill in the art will appreciate that the various illustrative elements and method steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present embodiments.

The above embodiments are only for illustrating the embodiments of the present invention and not for limiting the embodiments of the present invention, and those skilled in the art can make various changes and modifications without departing from the spirit and scope of the embodiments of the present invention, so that all equivalent technical solutions also belong to the scope of the embodiments of the present invention, and the scope of patent protection of the embodiments of the present invention should be defined by the claims.
