Regular expression conversion method, device, equipment and storage medium

Document No.: 1799295    Publication date: 2021-11-05

Reading note: This technology, "Regular expression conversion method, device, equipment and storage medium", was designed and created by Fu Dongbo (傅东博) on 2021-06-30. Its main content is as follows. The application provides a regular expression conversion method, device, equipment and storage medium. The method includes: acquiring an original logic expression; generating a hierarchical list corresponding to the original logic expression, the hierarchical list including a plurality of sub-expressions arranged in a preset order of logic levels; and converting the original logic expression into a corresponding regular expression according to the hierarchical list and a preset meta-character mapping table. The application automatically converts the original logic expression input by a user into the corresponding regular expression, so that regular expressions meeting the user's logic requirements are generated automatically, quickly and accurately, including complex regular expressions with multiple nested logic levels. A large number of regular expressions can be generated quickly, meeting the timeliness requirements of online text matching. Text matching that freely combines character-level, word-level and sentence-pattern-level conditions is possible without step-by-step matching, which improves text matching efficiency.

1. A regular expression conversion method, comprising:

acquiring an original logic expression;

generating a hierarchical list corresponding to the original logic expression, wherein the hierarchical list comprises a plurality of sub-expressions arranged in a preset order of logic levels;

and converting the original logic expression into a corresponding regular expression according to the hierarchical list and a preset meta-character mapping table.

2. The method of claim 1, wherein generating the hierarchical list corresponding to the original logical expression comprises:

acquiring a constant array and a logic expression corresponding to the constant array from the original logic expression;

and generating the hierarchical list corresponding to the original logic expression according to the constant array and the logic expression.

3. The method of claim 2, wherein generating the hierarchical list corresponding to the original logic expression according to the constant array and the logic expression comprises:

replacing each constant identifier in the logic expression with the corresponding constant character in the constant array to obtain a constant logic expression;

splitting the constant logic expression into a plurality of sub-expressions according to the logic symbols and brackets included in the constant logic expression;

and generating the hierarchical list corresponding to the original logic expression according to the operation priority of each logic symbol and the plurality of sub-expressions.

4. The method according to claim 3, wherein the generating a hierarchical list corresponding to the original logical expression according to the operation priority of each logical symbol and the plurality of sub-expressions comprises:

respectively determining a logic symbol to be executed in each sub-expression;

respectively determining the logic level corresponding to each sub-expression according to the operation priority of the logic symbol to be executed in each sub-expression and the bracket level of the bracket in each sub-expression in the constant logic expression;

and sorting the sub-expressions in a preset order of logic levels to obtain the hierarchical list corresponding to the original logic expression.

5. The method according to any one of claims 1 to 4, wherein converting the original logic expression into a corresponding regular expression according to the hierarchical list and a preset meta-character mapping table comprises:

converting each sub-expression included in the hierarchical list into a sub-regular expression according to the preset meta-character mapping table;

and assembling the obtained sub-regular expressions into the regular expression corresponding to the original logic expression.

6. The method according to claim 5, wherein converting each sub-expression included in the hierarchical list into a sub-regular expression according to the preset meta-character mapping table comprises:

randomly selecting one sub-expression from the one or more sub-expressions currently at the highest level in the hierarchical list;

converting the selected sub-expression into a corresponding sub-regular expression according to the preset meta-character mapping table;

and deleting the selected sub-expression from the hierarchical list, returning to the operation of randomly selecting one sub-expression from the one or more sub-expressions currently at the highest level, and repeating this loop until the hierarchical list no longer contains any sub-expression.

7. The method according to claim 6, wherein converting the selected sub-expression into a corresponding sub-regular expression according to the preset meta-character mapping table comprises:

determining whether the selected sub-expression contains a preset symbol configured in the preset meta-character mapping table;

if so, acquiring the meta-character and the replacement rule corresponding to the preset symbol from the preset meta-character mapping table;

and replacing the preset symbol in the sub-expression with the corresponding meta-character according to the corresponding replacement rule, to obtain the sub-regular expression corresponding to the sub-expression.

8. The method according to any one of claims 1 to 4, further comprising, after converting the original logic expression into the corresponding regular expression:

compiling a regular expression corresponding to the original logic expression;

and performing text matching according to the compiled regular expression.

9. A regular expression conversion apparatus, comprising:

an acquisition module, configured to acquire an original logic expression;

a generation module, configured to generate a hierarchical list corresponding to the original logic expression according to the operation priority of each logic symbol included in the original logic expression;

and a conversion module, configured to convert the original logic expression into a corresponding regular expression according to the hierarchical list and a preset meta-character mapping table.

10. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor executes the computer program to implement the method of any one of claims 1-8.

11. A computer-readable storage medium storing a computer program, wherein the program, when executed by a processor, implements the method according to any one of claims 1 to 8.

Technical Field

The application belongs to the technical field of natural language processing, and particularly relates to a regular expression conversion method, a device, equipment and a storage medium.

Background

In the field of natural language processing, a regular expression is a string that describes a syntactic rule and is used to match text conforming to that rule; regular expressions are commonly used for text retrieval, text matching and text replacement. A regular expression is composed of constant characters, which are the literal text to be matched, and meta-characters, which define the operation rules applied when the constant characters are matched.

In the related art, a user manually writes a plurality of regular expressions according to the text information to be matched, forming a regular-expression set; the set is provided to a device, which performs standardized matching of the content to be matched against it.

However, writing regular expressions manually is inefficient and error-prone, and text matching with manually written regular expressions takes a long time and has limited accuracy.

Disclosure of Invention

The application provides a regular expression conversion method, device, equipment and storage medium for automatically converting an original logic expression into a corresponding regular expression. Regular expressions meeting the user's logic requirements are generated automatically, quickly and accurately, and a large number of them can be generated rapidly, meeting the timeliness requirements of online text matching.

An embodiment of a first aspect of the present application provides a regular expression conversion method, including:

acquiring an original logic expression;

generating a hierarchical list corresponding to the original logic expression, wherein the hierarchical list comprises a plurality of sub-expressions arranged in a preset order of logic levels;

and converting the original logic expression into a corresponding regular expression according to the hierarchical list and a preset meta-character mapping table.

In some embodiments of the present application, the generating a hierarchical list corresponding to the original logical expression includes:

acquiring a constant array and a logic expression corresponding to the constant array from the original logic expression;

and generating the hierarchical list corresponding to the original logic expression according to the constant array and the logic expression.

In some embodiments of the present application, the generating a hierarchical list corresponding to the original logic expression according to the constant array and the logic expression includes:

replacing each constant identifier in the logic expression with the corresponding constant character in the constant array to obtain a constant logic expression;

splitting the constant logic expression into a plurality of sub-expressions according to the logic symbols and brackets included in the constant logic expression;

and generating the hierarchical list corresponding to the original logic expression according to the operation priority of each logic symbol and the plurality of sub-expressions.

In some embodiments of the present application, the generating a hierarchical list corresponding to the original logical expression according to the operation priority of each logical symbol and the plurality of sub-expressions includes:

respectively determining a logic symbol to be executed in each sub-expression;

respectively determining the logic level corresponding to each sub-expression according to the operation priority of the logic symbol to be executed in each sub-expression and the bracket level of the bracket in each sub-expression in the constant logic expression;

and sorting the sub-expressions in a preset order of logic levels to obtain the hierarchical list corresponding to the original logic expression.

In some embodiments of the present application, the converting the original logic expression into a corresponding regular expression according to the hierarchical list and a preset meta-character mapping table includes:

converting each sub-expression included in the hierarchical list into a sub-regular expression according to the preset meta-character mapping table;

and assembling the obtained sub-regular expressions into the regular expression corresponding to the original logic expression.

In some embodiments of the present application, the converting, according to a preset meta-character mapping table, each sub-expression included in the hierarchical list into a sub-regular expression respectively includes:

randomly selecting one sub-expression from the one or more sub-expressions currently at the highest level in the hierarchical list;

converting the selected sub-expression into a corresponding sub-regular expression according to the preset meta-character mapping table;

deleting the selected sub-expression from the hierarchical list, returning to the operation of randomly selecting one sub-expression from the one or more sub-expressions currently at the highest level, and repeating this loop until the hierarchical list no longer contains any sub-expression.

In some embodiments of the present application, converting the selected sub-expression into a corresponding sub-regular expression according to the preset meta-character mapping table includes:

determining whether the selected sub-expression contains a preset symbol configured in the preset meta-character mapping table;

if so, acquiring the meta-character and the replacement rule corresponding to the preset symbol from the preset meta-character mapping table;

and replacing the preset symbol in the sub-expression with the corresponding meta-character according to the corresponding replacement rule, to obtain the sub-regular expression corresponding to the sub-expression.
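By way of illustration only, the selection loop and symbol replacement described above can be sketched in Python for the logic (pain OR ache) AND NOT (not…pain OR not…ache). The meta-character mapping table here is hypothetical: each constant becomes a lookahead (?=.*…), OR becomes alternation, AND becomes consecutive lookaheads, and NOT becomes a negative lookahead (?!…), so the final pattern is a set of zero-width assertions anchored at the start of the text. The hierarchical-list format [(level, sub-expression), ...], the 'A#B#n' connector rule (A, at most n characters, then B) and all names are assumptions, not the application's own table.

```python
import random
import re

# Hypothetical meta-character mapping table: logic symbol -> replacement rule.
# Every converted piece is a zero-width assertion checked at the start of the text.
META_MAP = {
    "|": "(?:{0}|{1})",  # OR: either assertion may hold
    "&": "{0}{1}",       # AND: consecutive assertions must both hold
    "!": "(?!{0})",      # NOT: negative lookahead
}


def leaf(constant: str) -> str:
    """Constant -> lookahead; 'A#B#n' is assumed to mean A, at most n chars, then B."""
    parts = constant.split("#")
    if len(parts) == 3 and parts[2].isdigit():
        body = f"{re.escape(parts[0])}.{{0,{int(parts[2])}}}{re.escape(parts[1])}"
    else:
        body = re.escape(constant)
    return f"(?=.*{body})"


def convert(hier_list):
    """hier_list: [(level, sub-expression), ...]; higher levels are converted first."""
    pending = list(hier_list)
    done = {}  # id(sub-expression) -> its sub-regular expression

    def operand(x):
        # A nested sub-expression has a higher level, so it is already converted.
        return done[id(x)] if isinstance(x, list) else leaf(x)

    while pending:
        top = max(level for level, _ in pending)
        # Randomly pick one of the sub-expressions currently at the highest level.
        index = random.choice([i for i, (level, _) in enumerate(pending) if level == top])
        _, node = pending.pop(index)  # delete it from the pending list
        if len(node) == 2:  # ['!', operand]
            done[id(node)] = META_MAP[node[0]].format(operand(node[1]))
        else:               # [left, symbol, right]
            done[id(node)] = META_MAP[node[1]].format(operand(node[0]), operand(node[2]))
    root = min(hier_list, key=lambda entry: entry[0])[1]  # lowest level: whole expression
    return "^" + done[id(root)]


a = ["pain", "|", "ache"]
b = ["not#pain#5", "|", "not#ache#5"]
n = ["!", b]
whole = [a, "&", n]
regex = convert([(3, b), (2, a), (2, n), (1, whole)])
print(regex)
```

Because a child always sits one level above its parent, the random choice among equal-level entries never affects the result; only the conversion order changes.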

In some embodiments of the present application, after converting the original logic expression into a corresponding regular expression, the method further includes:

compiling a regular expression corresponding to the original logic expression;

and performing text matching according to the compiled regular expression.
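Compilation and matching can be illustrated with Python's standard re module. The pattern string below is an assumption: it is what a hypothetical lookahead-based mapping table might produce for the logic (pain OR ache) AND NOT (not…pain OR not…ache), and it is not taken from the application. Compiling once and reusing the compiled object is what makes batch matching cheap.

```python
import re

# Assumed conversion output (hypothetical mapping: constants -> lookaheads,
# '|' -> alternation, '&' -> consecutive lookaheads, '!' -> negative lookahead).
CONVERTED = r"^(?:(?=.*pain)|(?=.*ache))(?!(?:(?=.*not.{0,5}pain)|(?=.*not.{0,5}ache)))"

# Compile once and reuse; re.DOTALL lets '.' also cross line breaks in multi-line texts.
pattern = re.compile(CONVERTED, re.DOTALL)

texts = [
    "the pain is obvious",
    "I am not in pain",
    "slight ache after exercise",
    "no complaints",
]
hits = [t for t in texts if pattern.match(t)]
print(hits)  # ['the pain is obvious', 'slight ache after exercise']
```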

An embodiment of a second aspect of the present application provides a regular expression conversion apparatus, including:

an acquisition module, configured to acquire an original logic expression;

a generation module, configured to generate a hierarchical list corresponding to the original logic expression according to the operation priority of each logic symbol included in the original logic expression;

and a conversion module, configured to convert the original logic expression into a corresponding regular expression according to the hierarchical list and a preset meta-character mapping table.

Embodiments of the third aspect of the present application provide an electronic device, including a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor executes the computer program to implement the method of the first aspect.

An embodiment of a fourth aspect of the present application provides a computer-readable storage medium having a computer program stored thereon, the program being executable by a processor to implement the method of the first aspect.

The technical solutions provided in the embodiments of the application have at least the following technical effects or advantages:

In the embodiments of the application, the user can edit an original logic expression as needed, and the device automatically converts it into the corresponding regular expression. Regular expressions meeting the user's logic requirements are thus generated automatically, quickly and accurately, including complex regular expressions with multiple nested logic levels, for example those corresponding to complex logic such as "present and absent …" or "absent, but present only when … exists". In addition, a large number of regular expressions can be generated quickly, meeting the timeliness requirements of online text matching; text matching that freely combines character-level, word-level and sentence-pattern-level conditions is possible without step-by-step matching, which improves text matching efficiency.

Additional aspects and advantages of the present application will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the present application.

Drawings

Various other advantages and benefits will become apparent to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for purposes of illustrating the preferred embodiments and are not to be construed as limiting the application. Also, like reference numerals are used to refer to like parts throughout the drawings.

In the drawings:

FIG. 1 illustrates a flow chart of a regular expression conversion method provided by an embodiment of the present application;

FIG. 2 illustrates another flow diagram of a regular expression conversion method provided by an embodiment of the present application;

FIG. 3 is a schematic structural diagram of a regular expression conversion apparatus according to an embodiment of the present application;

FIG. 4 is a schematic structural diagram of an electronic device according to an embodiment of the present application;

FIG. 5 is a schematic diagram of a storage medium provided in an embodiment of the present application.

Detailed Description

Exemplary embodiments of the present application will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present application are shown in the drawings, it should be understood that the present application may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.

It is to be noted that, unless otherwise specified, technical or scientific terms used herein shall have the ordinary meaning as understood by those skilled in the art to which this application belongs.

The following describes a regular expression conversion method, a regular expression conversion device, regular expression conversion equipment and a storage medium according to embodiments of the present application with reference to the drawings.

In the related art in natural language processing, regular expressions are often used for text retrieval, text matching, text replacement and similar operations. Specifically, a user manually writes a plurality of regular expressions according to the text information to be matched, forming a regular-expression set; the set is provided to a device, which performs standardized matching of the content to be matched against it. However, writing regular expressions manually is inefficient and error-prone, and text matching with manually written regular expressions takes a long time and has limited accuracy.

Accordingly, the embodiment of the present application provides a regular expression conversion method in which the user only needs to provide an original logic expression. A corresponding hierarchical list, containing a plurality of sub-expressions arranged by level, is generated automatically from the original logic expression. Then, proceeding from the highest level of the hierarchical list to the lowest, the preset logic symbols in each sub-expression are replaced with meta-characters according to the mappings among preset logic symbols, meta-characters and replacement rules defined in a preset meta-character mapping table, yielding the regular expression corresponding to the original logic expression. The method automatically generates regular expressions that meet the user's logic requirements, including complex regular expressions with multiple nested logic levels, and can quickly generate a large number of them, meeting the timeliness requirements of online text matching. It also supports text matching that freely combines character-level, word-level and sentence-pattern-level conditions without step-by-step matching, improving text matching efficiency.

Referring to FIG. 1, the method specifically includes the following steps:

Step 101: acquire an original logic expression.

The embodiment of the application provides a regular conversion interface in which the user enters an original logic expression, editing a custom original logic expression in that interface. The original logic expression may include a constant array and a logic expression corresponding to the constant array. The constant array includes a plurality of constant characters, comprising constant characters that must be hit and constant characters that must not be hit when text is matched. A constant character may be a character, a symbol, a number, or a string combining at least two of these; characters may be single characters, words or sentences in any language, such as Chinese, English or German, and symbols may be any symbols, such as punctuation marks, mathematical symbols or other character symbols. For example, the constant array ['pain', 'ache', 'not#pain#5', 'not#ache#5'] contains four constant characters: 'pain', 'ache', 'not#pain#5' and 'not#ache#5'. Here "#" is a connector: 'not#pain#5' denotes every pattern of the form "not … pain" in which the word "not" and the word "pain" are separated by no more than 5 characters.
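The "#" connector can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that a constant of the form 'A#B#n' maps to the regex fragment A.{0,n}B (A, then at most n arbitrary characters, then B) and that a constant without connectors is matched literally; the function name constant_to_regex is hypothetical, and the application does not state how it encodes connectors.

```python
import re


def constant_to_regex(constant: str) -> str:
    """Convert one constant character string into a regex fragment.

    Hypothetical rule: 'A#B#n' means A, then at most n arbitrary characters,
    then B; a constant without connectors is matched literally.
    """
    parts = constant.split("#")
    if len(parts) == 3 and parts[2].isdigit():
        left, right, gap = parts
        # Escape literal parts so that characters such as '.' are not read as meta-characters.
        return f"{re.escape(left)}.{{0,{int(gap)}}}{re.escape(right)}"
    return re.escape(constant)


print(constant_to_regex("pain"))        # pain
print(constant_to_regex("not#pain#5"))  # not.{0,5}pain
print(bool(re.search(constant_to_regex("not#pain#5"), "I am not in pain")))      # True
print(bool(re.search(constant_to_regex("not#pain#5"), "not feeling any pain")))  # False
```

In the second search the gap between "not" and "pain" is 13 characters, so the 5-character limit rejects it.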

The logic expression corresponding to the constant array expresses the logical relationships among the constant characters in the array. It consists of logic symbols, brackets and constant identifiers: each constant identifier identifies one constant character in the constant array, and the logic symbols express the logical relationships among the constant characters. For example, the constant array ['pain', 'ache', 'not#pain#5', 'not#ache#5'] may correspond to the logic expression (1|2)&!(3|4), where the constant identifiers 1, 2, 3 and 4 identify 'pain', 'ache', 'not#pain#5' and 'not#ache#5', respectively, and "|", "&" and "!" are the logic symbols OR, AND and NOT. In this example each constant identifier is the position of the constant character in the constant array; in practical applications, each constant character in the array may instead be labeled with a subscript, and that subscript is used as the constant identifier in the logic expression.
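As an illustration of how such a logic expression might be read, the following sketch splits it into tokens and resolves each identifier against the constant array. The token grammar (integer constant identifiers, the symbols '|', '&', '!' and round brackets, whitespace ignored), the 1-based positions and the name tokenize are assumptions inferred from the example above.

```python
import re

# Assumed token grammar: integer constant identifiers, the logic symbols
# '|' (OR), '&' (AND), '!' (NOT), and round brackets; whitespace is skipped.
TOKEN_RE = re.compile(r"\s*(\d+|[|&!()])")


def tokenize(logic: str) -> list:
    """Split a logic expression such as '(1|2)&!(3|4)' into tokens."""
    tokens, pos = [], 0
    logic = logic.rstrip()  # trailing whitespace would otherwise fail the match
    while pos < len(logic):
        m = TOKEN_RE.match(logic, pos)
        if m is None:
            raise ValueError(f"unexpected character at position {pos}: {logic[pos]!r}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens


words = ["pain", "ache", "not#pain#5", "not#ache#5"]
tokens = tokenize("(1|2)&!(3|4)")
# Identifiers are taken to be 1-based positions in the constant array, as in the example.
identified = {t: words[int(t) - 1] for t in tokens if t.isdigit()}
print(tokens)  # ['(', '1', '|', '2', ')', '&', '!', '(', '3', '|', '4', ')']
print(identified)
```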

In addition to the constant array and the corresponding logic expression, the original logic expression input by the user may further include a text label, which expresses the text meaning the user wants to match; for example, the text label may be "evident pain". Accordingly, the complete original logic expression entered by the user may be: 'sense of pain': {'words': ['pain', 'ache', 'not#pain#5', 'not#ache#5'], 'logic': '(1|2)&!(3|4)'}. Alternatively, it may be {'words': ['pain', 'ache', 'not#pain#5', 'not#ache#5'], 'logic': '(1|2)&!(3|4)'}, or {['pain', 'ache', 'not#pain#5', 'not#ache#5'], '(1|2)&!(3|4)'}, or {[pain, ache, not#pain#5, not#ache#5], (1|2)&!(3|4)}, or [pain, ache, not#pain#5, not#ache#5], (1|2)&!(3|4), and so on. Of the four constant characters, 'pain' and 'ache' are the constant characters to be hit in text matching, while 'not#pain#5' and 'not#ache#5' are constant characters that must not be hit.

The structures shown above are only examples; the embodiment of the application does not limit the data composition structure of the original logic expression, which may be designed as required in practical applications. To help the user enter an original logic expression that conforms to the predefined structure, and to avoid repeated corrections caused by format errors, the embodiment may display, together with the regular conversion interface, prompt information describing the structural format of the original logic expression, optionally including an example, so that the user can enter the expression in the correct format by referring to it.

When it is detected that the user has edited the regular conversion interface, the original logic expression input by the user is obtained from the interface. Because the user edits a custom original logic expression, personalized logic-matching requirements are met. The content of the original logic expression is simple: when entering it, the user does not need to consider the complex syntax or the numerous meta-characters of regular expressions, so input errors are unlikely and editing is efficient.

Step 102: generate a hierarchical list corresponding to the original logic expression, the hierarchical list including a plurality of sub-expressions arranged in a preset order of logic levels.

The hierarchical list corresponding to the original logic expression includes a plurality of sub-expressions that are split from the original logic expression and arranged in a preset order of logic levels, either from high to low or from low to high. The higher the logic level of a sub-expression, the higher its execution priority when the original logic expression is evaluated.

This step generates the hierarchical list corresponding to the original logic expression through the following steps S1 and S2:

S1: acquire the constant array and the logic expression corresponding to the constant array from the original logic expression.

After the original logic expression input by the user is obtained from the regular conversion interface, it is parsed according to the preset data composition structure, and the constant array it contains and the logic expression corresponding to that constant array are identified.

For example, assume that the original logic expression obtained from the regular conversion interface is 'sense of pain': {'words': ['pain', 'ache', 'not#pain#5', 'not#ache#5'], 'logic': '(1|2)&!(3|4)'}. The constant array ['pain', 'ache', 'not#pain#5', 'not#ache#5'] and the corresponding logic expression '(1|2)&!(3|4)' are extracted from it.
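Step S1 can be sketched under the assumption that the original logic expression is written in the keyed form above and is a valid Python literal, so the standard-library ast.literal_eval can parse it without executing code. The function name parse_original and this input convention are illustrative assumptions; the application itself does not fix a data composition structure.

```python
import ast
from typing import List, Optional, Tuple


def parse_original(text: str) -> Tuple[Optional[str], List[str], str]:
    """Split an original logic expression into (text label, constant array, logic expression).

    Assumes the keyed form written as a Python literal, with or without a leading label:
      "'label': {'words': [...], 'logic': '...'}"  or  "{'words': [...], 'logic': '...'}"
    """
    text = text.strip()
    label = None
    if text.startswith("{"):
        body = ast.literal_eval(text)
    else:
        # "'label': {...}" becomes a one-entry dict once wrapped in braces.
        ((label, body),) = ast.literal_eval("{" + text + "}").items()
    return label, list(body["words"]), body["logic"]


raw = ("'sense of pain': {'words': ['pain', 'ache', 'not#pain#5', 'not#ache#5'], "
       "'logic': '(1|2)&!(3|4)'}")
label, words, logic = parse_original(raw)
print(label, words, logic)
```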

S2: generate the hierarchical list corresponding to the original logic expression according to the constant array and the logic expression.

After the constant array and the corresponding logic expression are obtained in step S1, the hierarchical list corresponding to the original logic expression is generated through the following steps S21 to S23:

S21: replace the constant identifiers in the logic expression with the corresponding constant characters in the constant array to obtain a constant logic expression.

Starting from the first character of the logic expression corresponding to the constant array, it is determined whether the current character is a constant identifier. If so, the constant character corresponding to that identifier is acquired from the constant array and replaces the identifier in the logic expression; if not, the current character is skipped and the next character is checked. By traversing the characters of the logic expression in sequence in this way, every constant identifier is replaced with its constant character, yielding the constant logic expression.
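A minimal sketch of the traversal in S21, assuming identifiers are 1-based positions in the constant array (as in the example) and may span more than one digit. For simplicity it returns a flat token list rather than the nested constant logic expression; substitute_constants is a hypothetical name.

```python
from typing import List

DIGITS = "0123456789"


def substitute_constants(logic: str, words: List[str]) -> List[str]:
    """Traverse the logic expression left to right, replacing each constant
    identifier with its constant character and keeping every logic symbol
    and bracket as a separate token (whitespace is skipped)."""
    out, i = [], 0
    while i < len(logic):
        ch = logic[i]
        if ch in DIGITS:
            # Assumption: an identifier may span several digits, e.g. '12'.
            j = i
            while j < len(logic) and logic[j] in DIGITS:
                j += 1
            out.append(words[int(logic[i:j]) - 1])  # 1-based position in the array
            i = j
        elif ch.isspace():
            i += 1
        else:
            out.append(ch)  # '|', '&', '!', '(' or ')' is kept unchanged
            i += 1
    return out


words = ["pain", "ache", "not#pain#5", "not#ache#5"]
print(substitute_constants("(1|2)&!(3|4)", words))
```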

For example, the constant array ['pain', 'ache', 'not#pain#5', 'not#ache#5'] and the corresponding logic expression '(1|2)&!(3|4)' are obtained from the original logic expression. The constant identifiers 1, 2, 3 and 4 in the logic expression are replaced in turn by the constant characters 'pain', 'ache', 'not#pain#5' and 'not#ache#5', giving the constant logic expression [['pain', '|', 'ache'], '&', ['!', ['not#pain#5', '|', 'not#ache#5']]].
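The nested form of this constant logic expression can be reproduced with a small recursive-descent parser. This is a sketch, not the application's procedure: it assumes the priority order NOT "!" > AND "&" > OR "|", assumes a flat token list in which identifiers have already been replaced by constants, and represents a binary sub-expression as [left, symbol, right] and NOT as ['!', operand], matching the example.

```python
from typing import List, Union

Node = Union[str, list]


def build_constant_logic(tokens: List[str]) -> Node:
    """Nest a substituted token list by bracket and by the assumed priority
    NOT '!' > AND '&' > OR '|'; each list node executes one logic symbol."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        return tok

    def parse_or():  # lowest priority
        node = parse_and()
        while peek() == "|":
            take()
            node = [node, "|", parse_and()]
        return node

    def parse_and():
        node = parse_not()
        while peek() == "&":
            take()
            node = [node, "&", parse_not()]
        return node

    def parse_not():  # highest priority, plus brackets and constants
        tok = peek()
        if tok is None or tok in ("|", "&", ")"):
            raise ValueError(f"unexpected token {tok!r}")
        if tok == "!":
            take()
            return ["!", parse_not()]
        if tok == "(":
            take()
            node = parse_or()
            if peek() != ")":
                raise ValueError("missing ')'")
            take()
            return node
        return take()  # a constant character

    result = parse_or()
    if pos != len(tokens):
        raise ValueError(f"unexpected token {tokens[pos]!r}")
    return result


tokens = ['(', 'pain', '|', 'ache', ')', '&', '!', '(', 'not#pain#5', '|', 'not#ache#5', ')']
print(build_constant_logic(tokens))
# [['pain', '|', 'ache'], '&', ['!', ['not#pain#5', '|', 'not#ache#5']]]
```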

In other embodiments of the present application, an operation priority is preset for each logic symbol, for example, from highest to lowest: NOT "!", AND "&", OR "|". The constant identifiers are then replaced in order of the operation priority of the logic symbols that operate on them: identifiers operated on by NOT "!" are replaced first, then those operated on by "&", and finally those operated on by "|".

For example, in the logic expression '(1|2)&!(3|4)', the logic symbol with the highest operation priority, NOT "!", operates on (3|4), which is a logical operation rather than a constant identifier, so no replacement is performed. The logic symbol with the next-highest priority, "&", operates on (1|2) and !(3|4), which are also logic expressions rather than specific constant identifiers, so again no replacement is performed. Finally, the logic symbol with the lowest operation priority, "|", operates on 1 and 2 in (1|2) and on 3 and 4 in (3|4), which are specific constant identifiers. The corresponding constant characters are therefore acquired from the constant array ['pain', 'ache', 'not#pain#5', 'not#ache#5'] and substituted for these identifiers in '(1|2)&!(3|4)', giving the final constant logic expression [['pain', '|', 'ache'], '&', ['!', ['not#pain#5', '|', 'not#ache#5']]].

In other embodiments of the present application, the constant identifiers may instead be replaced level by level according to the hierarchy expressed by the brackets in the logic expression. For example, the highest level in '(1|2)&!(3|4)' is the level enclosed by the two pairs of brackets, so the constant identifiers in (1|2) and (3|4) are replaced first.
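A sketch of the bracket-first ordering, assuming bracket depth is counted by scanning the logic expression so that identifiers inside deeper brackets are replaced first. In '(1|2)&!(3|4)' all four identifiers sit at depth 1, so the hypothetical expression '((1|2)&3)|4' is also used below to make the ordering visible; the function name is likewise hypothetical.

```python
from typing import List, Tuple

DIGITS = "0123456789"


def identifiers_by_bracket_depth(logic: str) -> List[Tuple[int, str]]:
    """Return (bracket depth, identifier) pairs, deepest brackets first.

    sorted() is stable even with reverse=True, so identifiers at the same
    depth keep their left-to-right order."""
    depth, found, i = 0, [], 0
    while i < len(logic):
        ch = logic[i]
        if ch in DIGITS:
            j = i
            while j < len(logic) and logic[j] in DIGITS:
                j += 1
            found.append((depth, logic[i:j]))
            i = j
            continue
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        i += 1
    return sorted(found, key=lambda pair: pair[0], reverse=True)


words = ["pain", "ache", "not#pain#5", "not#ache#5"]
print(identifiers_by_bracket_depth("(1|2)&!(3|4)"))
for depth, ident in identifiers_by_bracket_depth("((1|2)&3)|4"):
    print(depth, ident, "->", words[int(ident) - 1])
```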

After each constant identifier in the logic expression has been replaced with the corresponding constant character in any of the above manners, the hierarchical list corresponding to the original logic expression is generated through the following steps S22 and S23.

S22: split the constant logic expression into a plurality of sub-expressions according to the logic symbols and brackets it contains.

Based on the logical symbols and brackets in the constant logic expression, the part of the expression that each logical symbol operates on is determined and split out as a sub-expression, so that each resulting sub-expression performs the operation of exactly one logical symbol.

For example, the constant logic expression [['pain', '|', 'pain'], '&', ['!', ['not#pain#5', '|', 'not#pain#5']]] contains four logical symbols, which from left to right are '|', '&', '!' and '|'. The first '|' corresponds to the sub-expression ['pain', '|', 'pain']; the last '|' corresponds to ['not#pain#5', '|', 'not#pain#5']; NOT '!' corresponds to ['!', ['not#pain#5', '|', 'not#pain#5']]; and '&' corresponds to the whole expression [['pain', '|', 'pain'], '&', ['!', ['not#pain#5', '|', 'not#pain#5']]].
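Once the constant logic expression is held as nested lists, step S22 reduces to collecting every list node, because each node carries exactly one logical symbol. The Python sketch below assumes a hypothetical representation in which a binary operation is [left, op, right], NOT is ['!', operand], and constants are plain strings; the walk is pre-order, so sub-expressions come out outermost first rather than in left-to-right symbol order.

```python
def split_subexpressions(constant_expr):
    """Return every sub-expression (one per logical symbol), outermost first."""
    found = []

    def walk(node):
        if isinstance(node, list):  # constants (strings) are leaves, not sub-expressions
            found.append(node)
            for child in node:
                walk(child)

    walk(constant_expr)
    return found


def symbol_of(sub):
    """The single logical symbol a sub-expression executes."""
    return sub[0] if len(sub) == 2 else sub[1]


expr = [["pain", "|", "pain"], "&", ["!", ["not#pain#5", "|", "not#pain#5"]]]
for sub in split_subexpressions(expr):
    print(symbol_of(sub), sub)
```

For the example this yields the four sub-expressions listed above, one each for '&', the first '|', '!' and the last '|'.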

S23: Generate the hierarchical list corresponding to the original logic expression according to the operation priority of each logical symbol and the plurality of sub-expressions.

Specifically, the logical symbol to be executed in each sub-expression is determined; the logic level of each sub-expression is then determined from the operation priority of that symbol and from the bracket level, within the constant logic expression, of the brackets enclosing the sub-expression; finally, the sub-expressions are sorted in the preset order of logic levels to obtain the hierarchical list corresponding to the original logic expression. The preset order may run from high to low logic level or from low to high.

For example, the constant logic expression [['pain', '|', 'pain'], '&', ['!', ['not#pain#5', '|', 'not#pain#5']]] has four sub-expressions: ['pain', '|', 'pain'], ['not#pain#5', '|', 'not#pain#5'], ['!', ['not#pain#5', '|', 'not#pain#5']] and [['pain', '|', 'pain'], '&', ['!', ['not#pain#5', '|', 'not#pain#5']]]. The symbol executed in ['pain', '|', 'pain'] is '|', and its brackets are at bracket level 2 in the original constant logic expression. The symbol executed in ['not#pain#5', '|', 'not#pain#5'] is '|', and its brackets are at bracket level 3. The symbol executed in ['!', ['not#pain#5', '|', 'not#pain#5']] is '!', and its brackets are at bracket level 2; note that each sub-expression is concerned only with its outermost brackets, because the inner brackets have already been handled in other sub-expressions. The symbol executed in [['pain', '|', 'pain'], '&', ['!', ['not#pain#5', '|', 'not#pain#5']]] is '&', and its outermost brackets are at bracket level 1. In this example, the higher the bracket level, the higher the execution priority.

The logic level of a sub-expression is determined from the operation priority of its logical symbol and from its bracket level: the higher the bracket level, the higher the logic level, and the higher the operation priority of the logical symbol, the higher the logic level. The sub-expressions are first sorted by the bracket level of their outermost brackets, from high to low; sub-expressions with the same bracket level are then sorted by the operation priority of their logical symbols. Here the sub-expressions are arranged from high to low logic level, although they may equally be arranged from low to high.

For example, ['pain', '|', 'pain'] corresponds to the logical symbol '|' and bracket level 2; ['not#pain#5', '|', 'not#pain#5'] corresponds to '|' and bracket level 3; ['!', ['not#pain#5', '|', 'not#pain#5']] corresponds to '!' and bracket level 2; and [['pain', '|', 'pain'], '&', ['!', ['not#pain#5', '|', 'not#pain#5']]] corresponds to '&' and bracket level 1. Arranging the sub-expressions by bracket level from high to low gives the order shown in Table 1. There, ['pain', '|', 'pain'] and ['!', ['not#pain#5', '|', 'not#pain#5']] share the same level, so they are sorted by the operation priority of their logical symbols. Because the priority of '!' (for ['!', ['not#pain#5', '|', 'not#pain#5']]) is higher than that of '|' (for ['pain', '|', 'pain']), the hierarchical list obtained after the final sort is as shown in Table 2.

TABLE 1

Bracket level 3: ['not#pain#5', '|', 'not#pain#5']
Bracket level 2: ['pain', '|', 'pain'] and ['!', ['not#pain#5', '|', 'not#pain#5']]
Bracket level 1: [['pain', '|', 'pain'], '&', ['!', ['not#pain#5', '|', 'not#pain#5']]]

TABLE 2

['not#pain#5', '|', 'not#pain#5']
['!', ['not#pain#5', '|', 'not#pain#5']]
['pain', '|', 'pain']
[['pain', '|', 'pain'], '&', ['!', ['not#pain#5', '|', 'not#pain#5']]]
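The ordering rule of step S23 can be sketched in Python as a single sort. This assumes that the nesting depth of a sub-expression in the nested-list representation stands in for its bracket level (outermost = 1), and that the PRIORITY values, which are hypothetical, encode the order "!" > "&" > "|". Sorting on the key (negative level, negative priority) puts higher bracket levels first and breaks ties by operator priority.

```python
# Hypothetical numeric priorities encoding the order "!" > "&" > "|".
PRIORITY = {"!": 3, "&": 2, "|": 1}


def symbol_of(sub):
    return sub[0] if len(sub) == 2 else sub[1]


def with_bracket_levels(node, level=1):
    """List (sub_expression, bracket_level) pairs; nesting depth stands in for bracket level."""
    if not isinstance(node, list):
        return []
    pairs = [(node, level)]
    for child in node:
        pairs.extend(with_bracket_levels(child, level + 1))
    return pairs


def hierarchical_list(constant_expr):
    pairs = with_bracket_levels(constant_expr)
    # Higher bracket level first; within a level, higher operator priority first.
    pairs.sort(key=lambda p: (-p[1], -PRIORITY[symbol_of(p[0])]))
    return [sub for sub, _ in pairs]


expr = [["pain", "|", "pain"], "&", ["!", ["not#pain#5", "|", "not#pain#5"]]]
for sub in hierarchical_list(expr):
    print(sub)
```

For the example expression the loop prints the four rows of Table 2 in order.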

Step 103: Convert the original logic expression into the corresponding regular expression according to the hierarchical list and the preset meta-character mapping table.

Some symbols used in the original logic expression are simple, convenient and easy to understand when typed by hand, so users are less likely to make mistakes when editing the original logic expression manually, which improves efficiency. However, these symbols cannot be compiled by a programming language's regular expression engine, so they must be converted into the standard notation of regular expressions. To this end, the embodiments of the present application configure a preset meta-character mapping table in advance, which records the mapping between preset symbols, meta-characters and replacement rules. A preset symbol may be any symbol such as "#", "@" or "^", or a character string containing such a symbol. The meta-character corresponding to a preset symbol is either a standard regular expression meta-character or a combination of several meta-characters assembled according to the corresponding replacement rule.

In this step, each sub-expression in the hierarchical list is converted into a sub-regular expression according to the preset meta-character mapping table. Specifically, one sub-expression is selected at random from the sub-expressions at the currently highest level of the hierarchical list; the selected sub-expression is converted into its sub-regular expression according to the preset meta-character mapping table; the selected sub-expression is then deleted from the hierarchical list, and the process returns to the random-selection step, repeating until the hierarchical list contains no more sub-expressions. This cyclic, recursive replacement improves conversion efficiency, makes omissions unlikely, and yields high accuracy.
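The conversion loop of step 103 can be sketched in Python as follows. Assumptions: the level of a sub-expression is taken to be its nesting depth, deepest first, which guarantees that every child is converted before its parent; convert_one is a hypothetical caller-supplied callback standing in for the mapping-table conversion; and results are kept in a dictionary keyed by object identity so that a parent can reuse, rather than reconvert, the results of its children.

```python
import random


def subexpressions_by_depth(node, depth=1):
    """Hypothetical helper: (sub_expression, level) pairs, using nesting depth as the level."""
    if not isinstance(node, list):
        return []
    pairs = [(node, depth)]
    for child in node:
        pairs += subexpressions_by_depth(child, depth + 1)
    return pairs


def convert_hierarchical_list(levelled, convert_one, rng=None):
    """Repeatedly take a random sub-expression from the highest remaining level,
    convert it, and delete it from the list until the list is empty.

    convert_one(sub, done) returns the converted text for `sub`; `done` maps
    id(child_list) -> text for every sub-expression converted so far.
    """
    if rng is None:
        rng = random.Random()
    remaining = list(levelled)
    done = {}
    while remaining:
        top = max(level for _, level in remaining)
        idx = rng.choice([i for i, (_, level) in enumerate(remaining) if level == top])
        sub, _ = remaining.pop(idx)  # delete the chosen sub-expression from the list
        done[id(sub)] = convert_one(sub, done)
    return done


def show(sub, done):
    """Toy converter: reuse converted children, keep constants and symbols as-is."""
    parts = [done[id(x)] if isinstance(x, list) else x for x in sub]
    return "(" + " ".join(parts) + ")"


expr = [["pain", "|", "pain"], "&", ["!", ["not#pain#5", "|", "not#pain#5"]]]
done = convert_hierarchical_list(subexpressions_by_depth(expr), show)
print(done[id(expr)])  # ((pain | pain) & (! (not#pain#5 | not#pain#5)))
```

The random choice only affects the order among sub-expressions on the same level; because such sub-expressions never contain one another, the final result is the same whichever one is picked.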

Concretely, a selected sub-expression is converted into its sub-regular expression as follows. First, determine whether the sub-expression contains a preset symbol configured in the preset meta-character mapping table. If it does, obtain the meta-character and replacement rule for that preset symbol from the table, and replace the preset symbol in the sub-expression with the corresponding meta-character according to that replacement rule, yielding the sub-regular expression for the sub-expression.

A preset symbol configured in the preset meta-character mapping table may be a character string containing the connector "#", such as "A#B#δ" or "A#C#B#δ", where A, B and C are words or similar units used for text matching and δ is a number. "A#B#δ" denotes the sentence pattern A……B, in which at most δ characters may separate A and B. "A#C#B#δ" denotes the sentence pattern A……C……B, in which at most δ characters may separate A and C and C and B. In general, any number of intermediate words may appear between the first word A and the last word B of such a string, all joined by the connector "#".

The preset meta-character mapping table configures meta-characters and replacement rules for character strings containing the connector "#"; for example, the meta-character combination formed for "A#B#δ" under its replacement rule begins "A(.".

As shown above, the number that limits the character count applies between the first word and the last word of the whole expression, and the replacement makes use of the meta-character "?". Therefore, "#δ" in "A#B#δ" may be replaced with an expression beginning "(.". At this point, the replacement for the character-count limit is complete.

For the words located between the first and last words, such as the middle part of "A#B#C#δ", the algorithm extracts each word joined by the connector "#" other than the first and last words and replaces the connectors "#" before and after it with meta-characters involving "?". After all replacements are finished, "()" is added around the whole result from head to tail to complete the hierarchical wrapping, at which point the replacement operation is finished.

For example, the constant "A#B#C#20" is converted into an expression beginning "A(.", and the constant "A#B#C#D#20" is converted into an expression beginning "A(.".
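Because the exact meta-character combinations are cut off in this text, the following Python sketch shows only one plausible replacement rule for the "#" connector, not the one in the mapping table of the application. It assumes that every gap between consecutive words may hold at most δ characters, matched lazily with ".{0,δ}?", and wraps the result in "()" as described above. MAPPING_TABLE, connector_to_regex and constant_to_regex are hypothetical names; constant_to_regex also illustrates the lookup described for step 103: check for a preset symbol, fetch its rule, and apply it.

```python
import re


def connector_to_regex(constant):
    """Hypothetical rule for 'W1#W2#...#Wn#N' constants.

    Assumption: each gap between consecutive words may contain at most N
    characters, matched lazily; the whole result is wrapped in '()'.
    """
    *words, limit = constant.split("#")
    if len(words) < 2 or not limit.isdigit():
        raise ValueError(f"not a connector pattern: {constant!r}")
    gap = f"(.{{0,{int(limit)}}}?)"
    return "(" + gap.join(re.escape(w) for w in words) + ")"


# Hypothetical preset meta-character mapping table: preset symbol -> replacement rule.
MAPPING_TABLE = {"#": connector_to_regex}


def constant_to_regex(constant):
    """Apply the rule of the first preset symbol found; otherwise match the constant literally."""
    for symbol, rule in MAPPING_TABLE.items():
        if symbol in constant:
            return rule(constant)
    return "(" + re.escape(constant) + ")"


print(connector_to_regex("A#B#C#20"))  # (A(.{0,20}?)B(.{0,20}?)C)
print(constant_to_regex("pain"))       # (pain)
```

Under these assumptions "not#pain#5" becomes "(not(.{0,5}?)pain)", which matches "not in pain" but not "not really any pain".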

The conversion that generates the regular expression is described below using the hierarchical list in Table 2 as an example. First, the highest-level sub-expression ['not#pain#5', '|', 'not#pain#5'] is obtained from the hierarchical list. It is determined to contain the preset symbol "#" configured in the preset meta-character mapping table, so each constant containing "#" in the sub-expression is replaced according to the table, giving the corresponding sub-regular expression, which begins "(not(.". Next, the sub-expression at the next level, ['!', ['not#pain#5', '|', 'not#pain#5']], is obtained. Its inner part has already been converted as a higher-level sub-expression and need not be converted again, so the part not yet converted is determined to be ['!', ], and that part is converted into an expression beginning "(.". The sub-expression ['pain', '|', 'pain'] is then obtained from the hierarchical list and converted into ((pain)|(pain)). Finally, the sub-expression [['pain', '|', 'pain'], '&', ['!', ['not#pain#5', '|', 'not#pain#5']]] is obtained; the part not yet converted is determined to be [, '&', ], which is converted into an expression beginning "((.".

Finally, the sub-regular expressions obtained are spliced into the regular expression corresponding to the original logic expression. For example, the sub-regular expressions obtained above are spliced into the regular expression for the original logic expression, which begins { 'pain evident': '(.
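The whole constant logic expression can also be turned into one regular expression by recursion, which converts children before parents just as the hierarchical list does. The Python sketch below is not the encoding used in the application (its AND and NOT forms are truncated in this text, and its OR form "((pain)|(pain))" consumes text). Instead it swaps in a common lookahead-based technique: every node becomes a zero-width assertion checked at the start of the text, so AND is a concatenation of assertions, OR is an alternation of them, and NOT is a negative lookahead. Constants such as "not#pain#5" are assumed to allow at most N characters between words (".{0,N}?"); all names are hypothetical.

```python
import re


def leaf_regex(constant):
    """Regex for one constant. Assumption: 'W1#...#Wn#N' allows at most N characters
    between consecutive words; any other constant is matched literally."""
    parts = constant.split("#")
    if len(parts) >= 3 and parts[-1].isdigit():
        gap = f".{{0,{int(parts[-1])}}}?"
        return gap.join(re.escape(w) for w in parts[:-1])
    return re.escape(constant)


def assertion(node):
    """Encode a constant logic expression as zero-width assertions tested at one position."""
    if isinstance(node, str):
        return f"(?=.*{leaf_regex(node)})"  # the constant occurs after this position
    if len(node) == 2 and node[0] == "!":
        return f"(?!{assertion(node[1])})"  # NOT: the operand's assertion fails
    left, op, right = node
    if op == "&":
        return assertion(left) + assertion(right)  # AND: both hold at the same position
    if op == "|":
        return f"(?:{assertion(left)}|{assertion(right)})"  # OR: either one holds
    raise ValueError(f"unknown operator {op!r}")


def to_regex(constant_expr):
    # Anchor at the start so every lookahead scans the whole text.
    return "^" + assertion(constant_expr)


expr = [["pain", "|", "pain"], "&", ["!", ["not#pain#5", "|", "not#pain#5"]]]
pattern = to_regex(expr)
print(pattern)
```

For the example expression, the resulting pattern matches "my arm is in pain" but not "there is not any pain" or "no complaints today"; pass re.DOTALL when texts may span several lines, since "." does not match a newline otherwise.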

In this way, the original logic expression entered by the user is automatically converted into the corresponding regular expression; the regular expression is then compiled, and text matching is performed on the text library to be matched using the compiled regular expression.
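Compiling the generated expressions and matching them against a text library is straightforward with Python's re module. The sketch below compiles each labelled pattern once and keeps the texts it matches. The function name is hypothetical, and the rule string is a hand-written example in assertion style ("pain" present, no "not" within five characters before "pain"), not the truncated expression in this description.

```python
import re


def match_text_library(rules, texts):
    """Compile each labelled regex once and return, per label, the texts it matches.

    `rules` maps a label (e.g. 'pain evident') to a regex string; the patterns are
    assumed to be anchored assertion-style expressions, so re.match is used.
    """
    compiled = {label: re.compile(pattern, re.DOTALL) for label, pattern in rules.items()}
    return {label: [t for t in texts if rx.match(t)] for label, rx in compiled.items()}


rules = {"pain evident": r"^(?=.*pain)(?!.*not.{0,5}?pain)"}  # hypothetical example rule
texts = ["my back is in pain", "I was not in pain", "all good"]
print(match_text_library(rules, texts))  # {'pain evident': ['my back is in pain']}
```

Compiling once per rule, rather than per text, keeps the cost of a large text library dominated by the matching itself.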

For example, for the regular expression { 'pain evident': '(. However, the doctor who gave the injection today was not only friendly in attitude and particularly skilled in technique, but also feels pain, and the doctor knows how to give injections.

To make the regular expression conversion process of the present application easier to follow, it is described below with reference to the accompanying drawings, as shown in fig. 2.

A1: Receive the original logic expression edited by the user from the regular expression conversion interface.

A2: Generate the hierarchical list corresponding to the original logic expression; the hierarchical list comprises a plurality of sub-expressions arranged in the preset order of logic levels.

A3: Randomly select one sub-expression from the sub-expressions at the currently highest level of the hierarchical list.

A4: Determine whether the current sub-expression contains a preset symbol configured in the preset meta-character mapping table; if so, go to step A5, otherwise go to step A6.

A5: Replace the preset symbol in the sub-expression with the corresponding meta-character according to that symbol's replacement rule, obtaining the sub-regular expression for the sub-expression.

A6: Delete the current sub-expression from the hierarchical list.

A7: Determine whether any sub-expressions remain in the hierarchical list; if so, return to step A3, otherwise go to step A8.

A8: Splice the sub-regular expressions obtained into the regular expression corresponding to the original logic expression.

In the embodiments of the present application, the user can edit the original logic expression freely, and the device automatically converts it into the corresponding regular expression. Regular expressions that meet the user's logical requirements are thus generated automatically, quickly and accurately, including complex regular expressions with logic levels nested several layers deep, for example for logic such as "with … but without …" or "without … but with …, and only when … is present". A large number of regular expressions can be generated quickly, meeting the timeliness requirements of online text matching; text matching can freely combine character-level, word-level and sentence-pattern-level conditions without matching in separate stages, which improves text matching efficiency.

The embodiment of the present application further provides a regular expression conversion device, which is used for executing the regular expression conversion method provided in any of the above embodiments. As shown in fig. 3, the apparatus includes:

an obtaining module 201, configured to obtain an original logic expression;

a generating module 202, configured to generate a hierarchical list corresponding to the original logical expression according to the operation priority of each logical symbol included in the original logical expression;

a conversion module 203, configured to convert the original logic expression into the corresponding regular expression according to the hierarchical list and the preset meta-character mapping table.

The generating module 202 is specifically configured to obtain a constant array and the logic expression corresponding to the constant array from the original logic expression, and to generate the hierarchical list corresponding to the original logic expression according to the constant array and the logic expression.

The generating module 202 is configured to replace the constant identifiers in the logic expression with the corresponding constant characters in the constant array to obtain a constant logic expression; split the constant logic expression into a plurality of sub-expressions according to the logical symbols and brackets it contains; and generate the hierarchical list corresponding to the original logic expression according to the operation priority of each logical symbol and the plurality of sub-expressions.

The generating module 202 is configured to determine the logical symbol to be executed in each sub-expression; determine the logic level of each sub-expression according to the operation priority of that symbol and the bracket level, within the constant logic expression, of the brackets enclosing the sub-expression; and sort the sub-expressions in the preset order of logic levels to obtain the hierarchical list corresponding to the original logic expression.

The conversion module 203 is configured to convert each sub-expression in the hierarchical list into a sub-regular expression according to the preset meta-character mapping table, and to splice the sub-regular expressions obtained into the regular expression corresponding to the original logic expression.

The conversion module 203 is configured to randomly select one sub-expression from the sub-expressions at the currently highest level of the hierarchical list; convert the selected sub-expression into the corresponding sub-regular expression according to the preset meta-character mapping table; delete the selected sub-expression from the hierarchical list; and return to the random-selection operation, repeating until the hierarchical list contains no more sub-expressions.

The conversion module 203 is configured to determine whether the selected sub-expression contains a preset symbol configured in the preset meta-character mapping table; if so, obtain the meta-character and replacement rule corresponding to that preset symbol from the table; and replace the preset symbol in the sub-expression with the corresponding meta-character according to that replacement rule to obtain the sub-regular expression for the sub-expression.

The device further includes a matching module, configured to compile the regular expression corresponding to the original logic expression and to perform text matching according to the compiled regular expression.

The regular expression conversion device provided by the above embodiment of the present application is based on the same inventive concept as the regular expression conversion method provided by the embodiments of the present application, and has the same beneficial effects as the method adopted, run or implemented by the application program stored in it.

The embodiment of the application also provides electronic equipment for executing the regular expression conversion method. Referring to fig. 4, a schematic diagram of an electronic device provided in some embodiments of the present application is shown. As shown in fig. 4, the electronic device 8 includes: a processor 800, a memory 801, a bus 802 and a communication interface 803, the processor 800, the communication interface 803 and the memory 801 being connected by the bus 802; the memory 801 stores a computer program that can be executed on the processor 800, and the processor 800 executes the regular expression conversion method provided in any of the foregoing embodiments when executing the computer program.

The memory 801 may include high-speed random access memory (RAM) and may also include non-volatile memory, such as at least one disk memory. The communication connection between this system network element and at least one other network element is implemented through at least one communication interface 803 (which may be wired or wireless), using the Internet, a wide area network, a local network, a metropolitan area network, and the like.

The bus 802 may be an ISA bus, a PCI bus, an EISA bus, or the like, and may be divided into an address bus, a data bus, a control bus, and so on. The memory 801 stores a program, and the processor 800 executes the program after receiving an execution instruction; the regular expression conversion method disclosed in any embodiment of the present application may be applied to, or implemented by, the processor 800.

The processor 800 may be an integrated circuit chip with signal processing capability. In implementation, the steps of the above method may be completed by integrated hardware logic circuits in the processor 800 or by instructions in the form of software. The processor 800 may be a general-purpose processor, including a central processing unit (CPU), a network processor (NP), and the like; it may also be a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component, and may implement or perform the methods, steps and logic blocks disclosed in the embodiments of the present application. A general-purpose processor may be a microprocessor or any conventional processor. The steps of the method disclosed in connection with the embodiments of the present application may be performed directly by a hardware decoding processor, or by a combination of hardware and software modules in a decoding processor. The software module may reside in a storage medium well known in the art, such as RAM, flash memory, ROM, PROM, EPROM, or registers. The storage medium is located in the memory 801, and the processor 800 reads the information in the memory 801 and completes the steps of the method in combination with its hardware.

The electronic device provided by the embodiment of the application and the regular expression conversion method provided by the embodiment of the application have the same inventive concept and have the same beneficial effects as the method adopted, operated or realized by the electronic device.

Referring to fig. 5, the computer-readable storage medium is an optical disc 30, and a computer program (i.e., a program product) is stored thereon, and when being executed by a processor, the computer program may execute the regular expression conversion method provided in any of the foregoing embodiments.

It should be noted that examples of the computer-readable storage medium may also include, but are not limited to, phase change memory (PRAM), Static Random Access Memory (SRAM), Dynamic Random Access Memory (DRAM), other types of Random Access Memory (RAM), Read Only Memory (ROM), Electrically Erasable Programmable Read Only Memory (EEPROM), flash memory, or other optical and magnetic storage media, which are not described in detail herein.

The computer-readable storage medium provided by the above embodiment of the present application is based on the same inventive concept as the regular expression conversion method provided by the embodiments of the present application, and has the same beneficial effects as the method adopted, run or implemented by the application program stored in it.

It should be noted that:

in the description provided herein, numerous specific details are set forth. However, it is understood that embodiments of the application may be practiced without these specific details. In some instances, well-known structures and techniques have not been shown in detail in order not to obscure an understanding of this description.

Similarly, it should be appreciated that in the foregoing description of exemplary embodiments of the application, various features of the application are sometimes grouped together in a single embodiment, figure, or description thereof for the purpose of streamlining the disclosure and aiding in the understanding of one or more of the various inventive aspects. However, this method of disclosure is not to be interpreted as reflecting an intention that the claimed application requires more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in less than all features of a single foregoing disclosed embodiment. Thus, the claims following the detailed description are hereby expressly incorporated into this detailed description, with each claim standing on its own as a separate embodiment of this application.

Furthermore, those skilled in the art will appreciate that although some embodiments described herein include certain features that are included in other embodiments but not others, combinations of features of different embodiments are meant to be within the scope of the application and to form different embodiments. For example, in the following claims, any of the claimed embodiments may be used in any combination.

The above description is only for the preferred embodiment of the present application, but the scope of the present application is not limited thereto, and any changes or substitutions that can be easily conceived by those skilled in the art within the technical scope of the present application should be covered within the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.
