Method for generating short regular expression from finite automaton

Document No.: 1846552 Published: 2021-11-16

Reading note: This technology, a method for generating a short regular expression from a finite automaton, was designed by 高俊涛, 刘云峰, 文必龙, 王志宝 and 王永安 on 2021-08-23. The invention discloses a method for generating a short regular expression from a finite automaton, comprising: S1, for the space of candidate state elimination sequences, defining a temporary variable S* to store the current optimal state elimination sequence and a temporary variable L to store the length of the current shortest regular expression; S2, describing the intermediate results of state elimination with expression automata, and estimating the length of the generated regular expression from the size of the expression automaton; S3, pruning the space of candidate state elimination sequences based on the temporary variables S* and L, thereby improving the efficiency of the search for a state elimination sequence; S4, converting the finite automaton into a short regular expression by the state elimination method, following the optimal state elimination sequence found by the search. The invention can be applied to software behavior modeling, model-based testing, software engineering, and research on computation and languages, and significantly improves the efficiency with which shorter regular expressions are generated from automata.

1. A method for generating a short regular expression from a finite automaton, characterized in that the finite automaton is converted into a regular expression by the state elimination method, an optimal state elimination sequence is obtained by searching the set of candidate state elimination sequences, and a short regular expression is finally generated; the method comprises the following steps:

S1, for the space of candidate state elimination sequences, defining a temporary variable S* to store the current optimal state elimination sequence and a temporary variable L to store the length of the current shortest regular expression, S* and L serving as temporary variables while the candidate space is searched;

S2, describing the intermediate results of state elimination with expression automata, and estimating the length of the generated regular expression from the size of the expression automaton;

S3, pruning the space of candidate state elimination sequences based on the temporary variables S* and L, thereby improving the efficiency of the search for a state elimination sequence;

S4, converting the finite automaton into a short regular expression by the state elimination method, following the optimal state elimination sequence found by the search.

2. The method for generating a short regular expression from a finite automaton according to claim 1, wherein in step S1 all possible state elimination sequences arising during state elimination on the finite automaton are determined to form the space of candidate state elimination sequences, a temporary variable S* is defined to store the current optimal state elimination sequence, and a temporary variable L is defined to store the length of the current shortest regular expression.

3. The method for generating a short regular expression from a finite automaton according to claim 1, wherein in step S2 each stage of state elimination on the finite automaton selects one state to eliminate and generates an expression automaton, denoted EA_s, where EA denotes the current expression automaton and s is a state sequence; s may be either a complete state elimination sequence containing all states to be eliminated or an incomplete subsequence containing only some of them; during state elimination, the expression automaton generated at each stage depends on the state currently being eliminated and on the expression automaton of the previous stage; according to these dependencies, the complete set of expression automata that can be generated from the given finite automaton is organized into a tree, in which each node represents one expression automaton and the expression automaton of a node depends on that of its parent node; the expression automaton of a leaf node contains only the initial state and the final state, so each leaf node uniquely determines a regular expression, whose length is the character size of the leaf node's expression automaton.

4. The method for generating a short regular expression from a finite automaton according to claim 1, wherein in step S3 the optimal sequence search algorithm with pruning strategy, OSSAP(A_n), performs a global search of the space of candidate state elimination sequences; during the search the NextSeq(l, d) algorithm is called to perform pruning, and for each state elimination sequence the Eliminate(EA, q_k) algorithm is used to eliminate states; |EA| is compared with the currently recorded shortest length L to decide whether pruning is required: if |EA| is smaller than L, pruning is required; if |EA| is larger than L, pruning is not required; finally, a state elimination sequence that required no pruning, or one obtained after pruning, is returned, this being the optimal state elimination sequence; where A_n denotes a finite automaton requiring n elimination operations, l denotes a state elimination sequence, d denotes the pruning depth, EA denotes the current expression automaton, |EA| denotes the character size of the current expression automaton, and q_k denotes the k-th state of the elimination sequence.

5. The method for generating a short regular expression from a finite automaton according to claim 4, wherein the NextSeq(l, d) algorithm implements the following procedure:

enumerating the next unpruned state elimination sequence in lexicographic order; if the length of the state elimination sequence equals the depth of the tree node of the expression automaton currently being searched, no pruning is required; otherwise, all state elimination sequences whose prefix is the path from the root node to the current search node are skipped, which completes the pruning; the suffix of the state elimination sequence after the search node is arranged in ascending order to ensure that state elimination sequences are enumerated in lexicographic order.

Technical Field

The invention relates to the field of computer technology for regular language models, and in particular to a method for generating a short regular expression from a finite automaton.

Background

Regular expressions have the same expressive power as finite automata; both are models of the regular languages. A regular expression can be converted into an equivalent finite automaton, and a finite automaton can be converted into an equivalent regular expression. Because regular expressions are easier for humans to read, they have become the text pattern description tool commonly adopted by computer system tools and programming languages. Converting finite automata into concise regular expressions makes regular languages easier to understand and apply and allows research results in automata learning to be applied more widely, so research on generating concise regular expressions from finite automata has both theoretical value and practical significance. Existing methods for generating regular expressions from finite automata use heuristic algorithms to optimize the elimination sequence, such as the flow method, the must-pass path method, the loop calculation method, the dynamic degree product method, the static degree product method, and the DM weight method, but the regular expressions they generate are still long overall.

Disclosure of Invention

The invention aims to overcome the defects of the prior art and provides a method for generating a short regular expression from a finite automaton.

To achieve this purpose, the technical solution provided by the invention is a method for generating a short regular expression from a finite automaton, in which the finite automaton is converted into a regular expression by the state elimination method, an optimal state elimination sequence is obtained by searching the set of candidate state elimination sequences, and a short regular expression is finally generated; the method comprises the following steps:

S1, for the space of candidate state elimination sequences, defining a temporary variable S* to store the current optimal state elimination sequence and a temporary variable L to store the length of the current shortest regular expression, S* and L serving as temporary variables while the candidate space is searched;

S2, describing the intermediate results of state elimination with expression automata, and estimating the length of the generated regular expression from the size of the expression automaton;

S3, pruning the space of candidate state elimination sequences based on the temporary variables S* and L, thereby improving the efficiency of the search for a state elimination sequence;

S4, converting the finite automaton into a short regular expression by the state elimination method, following the optimal state elimination sequence found by the search.

Further, in step S1, all possible state elimination sequences arising during state elimination on the finite automaton are determined to form the space of candidate state elimination sequences, a temporary variable S* is defined to store the current optimal state elimination sequence, and a temporary variable L is defined to store the length of the current shortest regular expression.

Further, in step S2, each stage of state elimination on the finite automaton selects one state to eliminate and generates an expression automaton, denoted EA_s, where EA denotes the current expression automaton and s is a state sequence; s may be either a complete state elimination sequence containing all states to be eliminated or an incomplete subsequence containing only some of them. During state elimination, the expression automaton generated at each stage depends on the state currently being eliminated and on the expression automaton of the previous stage. According to these dependencies, the complete set of expression automata that can be generated from the given finite automaton is organized into a tree, in which each node represents one expression automaton and the expression automaton of a node depends on that of its parent node. The expression automaton of a leaf node contains only the initial state and the final state, so each leaf node uniquely determines a regular expression, whose length is the character size of the leaf node's expression automaton.
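To make the shape of this tree concrete, the following minimal Python sketch enumerates its nodes for a small set of states. It assumes that each node is identified by the elimination prefix s that produced EA_s; the function name `tree_nodes` and the state names are illustrative and do not come from the invention.

```python
from itertools import permutations
from math import factorial

def tree_nodes(states):
    """Return every elimination prefix s; each prefix indexes one node EA_s."""
    ordered = sorted(states)
    n = len(ordered)
    # Prefixes of length 0 (the root) up to length n (the leaves).
    return [p for k in range(n + 1) for p in permutations(ordered, k)]

nodes = tree_nodes(["q2", "q3"])
leaves = [s for s in nodes if len(s) == 2]
print(len(nodes), len(leaves))  # 5 2
```

With n states to eliminate the tree has n! leaves, one per complete elimination sequence, which is why an exhaustive search benefits from pruning.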

Further, in step S3, the optimal sequence search algorithm with pruning strategy, OSSAP(A_n), performs a global search of the space of candidate state elimination sequences; during the search the NextSeq(l, d) algorithm is called to perform pruning, and for each state elimination sequence the Eliminate(EA, q_k) algorithm is used to eliminate states. |EA| is compared with the currently recorded shortest length L to decide whether pruning is required: if |EA| is smaller than L, pruning is required; if |EA| is larger than L, pruning is not required. Finally, a state elimination sequence that required no pruning, or one obtained after pruning, is returned, this being the optimal state elimination sequence. Here A_n denotes a finite automaton requiring n elimination operations, l denotes a state elimination sequence, d denotes the pruning depth, EA denotes the current expression automaton, |EA| denotes the character size of the current expression automaton, and q_k denotes the k-th state of the elimination sequence.

Further, the NextSeq(l, d) algorithm implements the following procedure:

enumerating the next unpruned state elimination sequence in lexicographic order; if the length of the state elimination sequence equals the depth of the tree node of the expression automaton currently being searched, no pruning is required; otherwise, all state elimination sequences whose prefix is the path from the root node to the current search node are skipped, which completes the pruning; the suffix of the state elimination sequence after the search node is arranged in ascending order to ensure that state elimination sequences are enumerated in lexicographic order.
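One possible reading of this procedure is sketched below in Python. The function name `next_seq`, the use of lists for sequences, and the convention that d counts the states on the path from the root to the current search node are assumptions made for illustration. The sketch first jumps to the last sequence sharing the prefix `l[:d]` and then takes an ordinary next-permutation step, which leaves the new suffix in ascending order.

```python
def next_seq(l, d):
    """Return the next sequence in lexicographic order that does not start
    with l[:d], or None when no such sequence remains."""
    seq = list(l)
    # Sorting the suffix in descending order yields the last sequence with
    # prefix seq[:d], so every sequence under that prefix is skipped.
    seq[d:] = sorted(seq[d:], reverse=True)
    # Ordinary next-permutation step: find the rightmost ascent.
    i = len(seq) - 2
    while i >= 0 and seq[i] >= seq[i + 1]:
        i -= 1
    if i < 0:
        return None  # the whole candidate space has been enumerated
    j = len(seq) - 1
    while seq[j] <= seq[i]:
        j -= 1
    seq[i], seq[j] = seq[j], seq[i]
    # An ascending suffix keeps the enumeration in lexicographic order.
    seq[i + 1:] = sorted(seq[i + 1:])
    return seq

print(next_seq([1, 2, 3, 4], 4))  # [1, 2, 4, 3]
print(next_seq([1, 2, 3, 4], 1))  # [2, 1, 3, 4]
```

When d equals the length of the sequence, nothing is skipped and the call reduces to a plain next-permutation step; a smaller d discards every sequence below the search node at depth d.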

Compared with the prior art, the invention has the following advantages and beneficial effects:

1. Through pruning, the invention significantly improves the efficiency of finding the optimal elimination sequence and of generating a short regular expression.

2. The invention can be applied to software behavior modeling, model-based testing, software engineering, and research on computation and languages. It is particularly suited to improving the ability of finite automata to model and process complex behaviors, improving the quality of regular expressions, making full use of research results in the field of automata learning, and promoting the development of sequence-type big data processing technology.

Drawings

FIG. 1 is a schematic flow diagram of the process of the present invention.

FIG. 2 is a diagram of the state elimination process of the method of the present invention.

Detailed Description

The present invention will be described in further detail with reference to examples and drawings, but the present invention is not limited thereto.

Referring to FIG. 1, the present embodiment discloses a method for generating a short regular expression from a finite automaton, comprising the following steps:

1) Determine that all possible state elimination sequences arising during state elimination on the finite automaton form the space of candidate state elimination sequences; define a temporary variable S* to store the current optimal state elimination sequence and a temporary variable L to store the length of the current shortest regular expression; S* and L serve as temporary variables while the candidate space is searched.

2) Each stage of state elimination on the finite automaton selects one state to eliminate and generates an expression automaton, denoted EA_s, where EA denotes the current expression automaton and s is a state sequence; s may be either a complete state elimination sequence containing all states to be eliminated or an incomplete subsequence containing only some of them. During state elimination, the expression automaton generated at each stage depends on the state currently being eliminated and on the expression automaton of the previous stage. According to these dependencies, the complete set of expression automata that can be generated from the given finite automaton is organized into a tree, in which each node represents one expression automaton and the expression automaton of a node depends on that of its parent node. The expression automaton of a leaf node contains only the initial state and the final state, so each leaf node uniquely determines a regular expression, whose length is the character size of the leaf node's expression automaton.

3) The optimal sequence search algorithm with pruning strategy, OSSAP(A_n), performs a global search of the space of candidate state elimination sequences; during the search the NextSeq(l, d) algorithm is called to perform pruning, and for each state elimination sequence the Eliminate(EA, q_k) algorithm is used to eliminate states. |EA| is compared with the currently recorded shortest length L to decide whether pruning is required: if |EA| is smaller than L, pruning is required; if |EA| is larger than L, pruning is not required. Finally, a state elimination sequence that required no pruning, or one obtained after pruning, is returned, this being the optimal state elimination sequence. Here A_n denotes a finite automaton requiring n elimination operations, l denotes a state elimination sequence, d denotes the pruning depth, EA denotes the current expression automaton, |EA| denotes the character size of the current expression automaton, and q_k denotes the k-th state of the elimination sequence.

The NextSeq(l, d) algorithm implements the following procedure: enumerate the next unpruned state elimination sequence in lexicographic order; if the length of the state elimination sequence equals the depth of the tree node of the expression automaton currently being searched, no pruning is required; otherwise, skip all state elimination sequences whose prefix is the path from the root node to the current search node, which completes the pruning. The suffix of the state elimination sequence after the search node is arranged in ascending order to ensure that state elimination sequences are enumerated in lexicographic order.

4) Convert the finite automaton into a short regular expression by the state elimination method, following the optimal state elimination sequence found by the search.

Referring to FIG. 2, the state elimination process of the method of this embodiment includes:

Step 1: in the finite automaton A_1, state q1 is the initial state and state q4 is the final state;

Step 2: eliminating state q3 from the deterministic finite automaton A_1 yields the expression automaton A_2, in which the transitions from q1 to q3 and from q3 to q2 are merged into the transition from q1 to q2;

Step 3: eliminating state q2 from the expression automaton A_2 yields the expression automaton A_3, in which the transitions from q1 to q2 and from q2 to q4 are merged into the transition from q1 to q4;

Step 4: merging the transitions from q1 to q1 and from q4 to q4 into the transition from q1 to q4 forms the final expression automaton A_4, whose equivalent regular expression is a(d|(b|ck*)h)p*.
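The same sequence of merges can be traced in code. The Python sketch below uses a hypothetical four-state automaton whose labels are invented for illustration; it is not the automaton of FIG. 2, whose transitions are not fully listed in the text. The dictionary representation and the helper names `eliminate` and `final_regex` are likewise assumptions, and `final_regex` assumes there is no transition from the final state back to the initial state.

```python
import re

def _top_union(r):
    """True if r contains '|' outside any parentheses."""
    depth = 0
    for ch in r:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        elif ch == "|" and depth == 0:
            return True
    return False

def _wrap(r):
    return f"({r})" if _top_union(r) else r

def _star(r):
    return f"{r}*" if len(r) == 1 else f"({r})*"

def eliminate(ea, q):
    """Remove state q, rerouting every path p -> q -> t as a label on p -> t."""
    loop = _star(ea[(q, q)]) if (q, q) in ea else ""
    ins = [(p, r) for (p, t), r in ea.items() if t == q and p != q]
    outs = [(t, r) for (p, t), r in ea.items() if p == q and t != q]
    new = {k: r for k, r in ea.items() if q not in k}
    for p, rin in ins:
        for t, rout in outs:
            path = _wrap(rin) + loop + _wrap(rout)
            new[(p, t)] = f"{new[(p, t)]}|{path}" if (p, t) in new else path
    return new

def final_regex(ea, init, final):
    """Fold the self-loops on init and final into the init -> final label."""
    pre = _star(ea[(init, init)]) if (init, init) in ea else ""
    post = _star(ea[(final, final)]) if (final, final) in ea else ""
    return pre + _wrap(ea[(init, final)]) + post

# Hypothetical labels; q1 is the initial state and q4 the final state.
a1 = {("q1", "q1"): "a", ("q1", "q2"): "b", ("q1", "q3"): "c",
      ("q3", "q3"): "k", ("q3", "q2"): "g", ("q2", "q4"): "h",
      ("q1", "q4"): "d", ("q4", "q4"): "p"}
a2 = eliminate(a1, "q3")   # q1 -> q3 -> q2 merged into q1 -> q2
a3 = eliminate(a2, "q2")   # q1 -> q2 -> q4 merged into q1 -> q4
regex = final_regex(a3, "q1", "q4")
print(regex)                                    # a*(d|(b|ck*g)h)p*
print(bool(re.fullmatch(regex, "aackkghpp")))   # True
```

The final check with `re.fullmatch` confirms that a word accepted by the hypothetical automaton (loop on q1, then q1 to q3 to q2 to q4, then loop on q4) is matched by the generated expression.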

The above embodiments are preferred embodiments of the present invention, but the present invention is not limited to the above embodiments, and any other changes, modifications, substitutions, combinations, and simplifications which do not depart from the spirit and principle of the present invention should be construed as equivalents thereof, and all such changes, modifications, substitutions, combinations, and simplifications are intended to be included in the scope of the present invention.
