Mongolian international standard code-shape code conversion method and device and computer terminal

Document No.: 763053  Publication date: 2021-04-06

Abstract: This technology, a Mongolian international standard code-to-shape code conversion method and device and computer terminal, was designed by 范道尔吉 and 武慧娟 on 2020-12-28. The invention discloses a method and a device for converting Mongolian international standard codes into shape codes, and a computer terminal. The method first enumerates all possible Unicode encoding sequences of each Mongolian letter, then divides them into four parts according to the position attributes "sole", "head", "middle", and "end", and sorts each part in descending order of Unicode string length. It then obtains the Mongolian Unicode string to be converted, initializes a conversion result sequence, and traverses each part in turn, testing for matches and storing the results in the conversion result sequence. This converts Mongolian international standard codes into shape codes. The method can handle many-to-many mappings, mappings without a fixed conversion relation, and codes that are not of fixed length. The resulting glyph codes can serve as an intermediate code, which facilitates Mongolian printed-text recognition and handwriting recognition and makes Mongolian Unicode codes convenient to represent and store.

1. A Mongolian international standard code-to-shape code conversion method is characterized by comprising the following steps:

(1) enumerating all possible Unicode encoding sequences that generate each Mongolian letter, recording each in the format "Unicode string = letter ID", and denoting the resulting list as UL;

(2) dividing UL into four parts according to position attribute, denoted UL_A, UL_S, UL_M, and UL_E, and sorting the rows of each part in descending order of Unicode string length; wherein UL_X(i).uni and UL_X(i).scode denote, respectively, the Unicode string before the "=" and the letter ID after the "=" in row i of UL_X, and X is one of the letters A, S, M, E;

(3) denoting the Mongolian Unicode string to be converted as mgl, and initializing the conversion result sequence glist = [];

(4) traversing UL_A and judging whether mgl exists in UL_A; if it exists and its row number is i, setting glist = [UL_A(i).scode] and ending the traversal of UL_A; otherwise continuing to traverse UL_A;

(5) traversing UL_S and judging whether UL_S(i).uni matches the beginning of mgl; if there is a matching row, setting glist = [UL_S(i).scode] and mgl = mgl - UL_S(i).uni, and continuing to traverse UL_S; otherwise aborting the traversal of UL_S;

(6) traversing UL_E and judging whether mgl exists in UL_E; if it exists and its row number is i, setting glist += [UL_E(i).scode] and ending the traversal of UL_E; otherwise continuing to traverse UL_E;

(7) traversing UL_M and judging whether UL_M(i).uni matches the beginning of mgl; if there is a matching row, setting glist += [UL_M(i).scode] and mgl = mgl - UL_M(i).uni, then ending the traversal of UL_M when mgl is empty and otherwise continuing to traverse UL_M; if there is no matching row, aborting the traversal of UL_M.

2. The Mongolian international standard code-to-shape code conversion method of claim 1, wherein the position attributes comprise "sole", "head", "middle", and "end"; wherein the position attribute "sole" denotes an independent word without a connector, the position attribute "head" denotes a string that contains a connector and must appear at the beginning of a word, the position attribute "middle" denotes a string that contains a connector and must appear in the middle of a word, and the position attribute "end" denotes a string that contains a connector and must appear at the end of a word.

3. The Mongolian international standard code-to-shape code conversion method of claim 1, wherein the number of Mongolian letters is 382, and the Mongolian letters are numbered with three-digit numbers in row-then-column order.

4. The Mongolian international standard code-to-shape code conversion method of claim 3, wherein all possible Unicode encoding sequences of the Mongolian letter with letter ID "001" are as follows:

1833 180B 1823 180B=001

1833 1826 180C=001

1833 180B 1823 200D=001

1833 180B 1824 200D=001

1833 180B 1825 180B 200D=001

1833 180B 1826 180B 200D=001

200D 1832 1823 180B=001

200D 1832 1824 180B=001

200D 1833 1823 180B=001

200D 1833 1824 180B=001

200D 1832 1823 200D=001

200D 1832 1824 200D=001

200D 1832 1825 200D=001

200D 1832 1826 200D=001

200D 1833 1823 200D=001

200D 1833 1824 200D=001

200D 1833 1825 200D=001

200D 1833 1826 200D=001

202F 1833 1824 200D=001

202F 1833 1826 200D=001

here, "200D" represents a connector.

5. The Mongolian international standard code-to-shape code conversion method of claim 1, wherein in step (5), error code 1 is issued after the traversal of UL_S is aborted; and in step (7), error code 2 is issued after the traversal of UL_M is aborted.

6. A Mongolian international standard code-to-shape code conversion method is characterized by comprising the following steps:

(1) defining a set of Mongolian letters comprising a plurality of Mongolian letters, and numbering each Mongolian letter with a three-digit number in row-then-column order to obtain the letter ID of each Mongolian letter;

(2) enumerating all possible Unicode encoding sequences that generate each Mongolian letter, recording each in the format "Unicode string = letter ID", and collecting the records into an unordered list;

(3) dividing the unordered list into a sole part, a head part, a middle part, and an end part according to the position attributes "sole", "head", "middle", and "end", and sorting each part in descending order of Unicode string length; wherein the position attribute "sole" denotes an independent word without a connector, the position attribute "head" denotes a string that contains a connector and must appear at the beginning of a word, the position attribute "middle" denotes a string that contains a connector and must appear in the middle of a word, and the position attribute "end" denotes a string that contains a connector and must appear at the end of a word;

(4) obtaining the Mongolian Unicode string to be converted, and initializing a conversion result sequence;

(5) traversing the sole part and judging whether the Mongolian Unicode string to be converted exists in the sole part; if so, storing the letter ID after the "=" of the corresponding row in the conversion result sequence; otherwise continuing to traverse the sole part;

(6) traversing the head part and judging whether the Unicode string of each row of the head part matches the beginning of the Mongolian Unicode string to be converted; if there is a matching row, storing the letter ID after the "=" of that row in the conversion result sequence, deleting the matched portion from the Mongolian Unicode string to be converted, and continuing to traverse the head part; if there is no matching row, aborting the traversal of the head part;

(7) traversing the middle part and judging whether the Mongolian Unicode string to be converted exists in the middle part; if so, appending the letter ID after the "=" of the corresponding row to the conversion result sequence; otherwise continuing to traverse the middle part;

(8) traversing the end part and judging whether the Unicode string of each row of the end part matches the beginning of the Mongolian Unicode string to be converted; if there is a matching row, appending the letter ID after the "=" of that row to the conversion result sequence, deleting the matched portion from the Mongolian Unicode string to be converted, and continuing to traverse the end part; if there is no matching row, aborting the traversal of the end part.

7. The Mongolian international standard code-to-shape code conversion method of claim 6, wherein in step (1), the number of Mongolian letters is 382;

in step (6), error code 1 is issued after the traversal of the head part is aborted;

in step (8), error code 2 is issued after the traversal of the end part is aborted.

8. A Mongolian international standard code-to-shape code conversion device, which applies the Mongolian international standard code-to-shape code conversion method according to any one of claims 1 to 5, comprising:

an enumeration module, configured to enumerate all possible Unicode encoding sequences that generate each Mongolian letter, record each in the format "Unicode string = letter ID", and denote the resulting list as UL;

a dividing module, configured to divide UL into four parts according to position attribute, denoted UL_A, UL_S, UL_M, and UL_E, and to sort the rows of each part in descending order of Unicode string length; wherein UL_X(i).uni and UL_X(i).scode denote, respectively, the Unicode string before the "=" and the letter ID after the "=" in row i of UL_X, and X is one of the letters A, S, M, E;

an initialization module, configured to denote the Mongolian Unicode string to be converted as mgl and to initialize the conversion result sequence glist = [];

a first traversal module, configured to traverse UL_A and judge whether mgl exists in UL_A; if it exists and its row number is i, set glist = [UL_A(i).scode] and end the traversal of UL_A; otherwise continue to traverse UL_A;

a second traversal module, configured to traverse UL_S and judge whether UL_S(i).uni matches the beginning of mgl; if there is a matching row, set glist = [UL_S(i).scode] and mgl = mgl - UL_S(i).uni, and continue to traverse UL_S; otherwise abort the traversal of UL_S;

a third traversal module, configured to traverse UL_E and judge whether mgl exists in UL_E; if it exists and its row number is i, set glist += [UL_E(i).scode] and end the traversal of UL_E; otherwise continue to traverse UL_E;

and a fourth traversal module, configured to traverse UL_M and judge whether UL_M(i).uni matches the beginning of mgl; if there is a matching row, set glist += [UL_M(i).scode] and mgl = mgl - UL_M(i).uni, then end the traversal of UL_M when mgl is empty and otherwise continue to traverse UL_M; if there is no matching row, abort the traversal of UL_M.

9. A Mongolian international standard code-to-shape code conversion device, which applies the Mongolian international standard code-to-shape code conversion method according to claim 6 or 7, comprising:

a definition module, configured to define a set of Mongolian letters comprising a plurality of Mongolian letters, and to number each Mongolian letter with a three-digit number in row-then-column order to obtain the letter ID of each Mongolian letter;

an enumeration module, configured to enumerate all possible Unicode encoding sequences that generate each Mongolian letter, record each in the format "Unicode string = letter ID", and collect the records into an unordered list;

a dividing module, configured to divide the unordered list into a sole part, a head part, a middle part, and an end part according to the position attributes "sole", "head", "middle", and "end", and to sort each part in descending order of Unicode string length; wherein the position attribute "sole" denotes an independent word without a connector, the position attribute "head" denotes a string that contains a connector and must appear at the beginning of a word, the position attribute "middle" denotes a string that contains a connector and must appear in the middle of a word, and the position attribute "end" denotes a string that contains a connector and must appear at the end of a word;

an initialization module, configured to obtain the Mongolian Unicode string to be converted and to initialize a conversion result sequence;

a first traversal module, configured to traverse the sole part and judge whether the Mongolian Unicode string to be converted exists in the sole part; if so, store the letter ID after the "=" of the corresponding row in the conversion result sequence; otherwise continue to traverse the sole part;

a second traversal module, configured to traverse the head part and judge whether the Unicode string of each row of the head part matches the beginning of the Mongolian Unicode string to be converted; if there is a matching row, store the letter ID after the "=" of that row in the conversion result sequence, delete the matched portion from the Mongolian Unicode string to be converted, and continue to traverse the head part; if there is no matching row, abort the traversal of the head part;

a third traversal module, configured to traverse the middle part and judge whether the Mongolian Unicode string to be converted exists in the middle part; if so, append the letter ID after the "=" of the corresponding row to the conversion result sequence; otherwise continue to traverse the middle part;

and a fourth traversal module, configured to traverse the end part and judge whether the Unicode string of each row of the end part matches the beginning of the Mongolian Unicode string to be converted; if there is a matching row, append the letter ID after the "=" of that row to the conversion result sequence, delete the matched portion from the Mongolian Unicode string to be converted, and continue to traverse the end part; if there is no matching row, abort the traversal of the end part.

10. A computer terminal comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, characterized in that the processor, when executing the program, carries out the steps of the Mongolian international standard code-to-shape code conversion method according to any one of claims 1 to 7.

Technical Field

The invention relates to a conversion method in the technical field of information processing, in particular to a Mongolian international standard code-shape code conversion method, a Mongolian international standard code-shape code conversion device and a computer terminal.

Background

Research on Mongolian information processing began with typesetting. Because a typesetting system is concerned with the shape of the text, it only requires that each word be rendered in the correct shape. Mongolian encoding schemes based on shape codes arose as a result. When different research institutions drew up their own shape-code schemes, some defined a single code for each glyph, so that one code could stand for several letters with different pronunciations; some defined several codes for the same glyph, so that identical glyphs with different codes stood for letters with different pronunciations; and some redefined partial structures of several letters as one character, or recombined partial strokes of letters on the basis of writing habits and aesthetics, and defined one code for each such character.

The Mongolian international standard code was published by the International Organization for Standardization and the Unicode Technical Committee in 1993 in the ISO/IEC 10646 international standard character set. In this standard, a block starting at U+1800 is allocated to the Mongolian character set. The code positions actually occupied by Mongolian characters are U+1800 to U+18AF. The Mongolian international standard character set covers traditional Mongolian, Todo, Sibe, and Manchu, together with the Ali Gali characters that Mongolian, Todo, and Manchu use to transcribe Tibetan and Sanskrit, as well as punctuation marks, digits, and control characters. Most letters of traditional Mongolian, Todo, Sibe, and Manchu take different variant forms depending on their position in a word (word-initial, word-medial, word-final) and other factors; a single letter can sometimes have more than ten variant forms.
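The code range stated above can be checked mechanically. The following is a minimal sketch, assuming only the range U+1800 to U+18AF given in the text; the function names `in_mongolian_block` and `outside_block` are hypothetical and chosen for illustration.

```python
# Range taken from the text: U+1800 to U+18AF inclusive.
MONGOLIAN_BLOCK = range(0x1800, 0x18B0)


def in_mongolian_block(ch):
    """Return True if the single character ch lies in U+1800..U+18AF."""
    return ord(ch) in MONGOLIAN_BLOCK


def outside_block(text):
    """List, as 'U+XXXX' strings, the code points of text outside the block."""
    return [f"U+{ord(c):04X}" for c in text if not in_mongolian_block(c)]


# U+200D (the connector used later in the document) is not in the block.
print(outside_block("\u200d\u1833\u1823"))
```

Control characters such as the connector 200D and the 202F seen in the enumerated sequences lie outside this block, so a converter cannot simply reject everything outside U+1800 to U+18AF.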

Current Mongolian printed-text recognition and handwriting recognition tasks sometimes need a glyph code as an intermediate code instead of using Unicode directly, because Mongolian Unicode encoding cannot directly represent glyph information. At present, however, Mongolian resources are all represented and stored in Unicode. A conversion from Mongolian Unicode encoding to glyph codes is therefore needed, to solve the problem that existing Mongolian Unicode encoding cannot be used directly in Mongolian printed-text recognition and handwriting recognition tasks and is inconvenient to use.

Disclosure of Invention

In order to solve the technical problem that the existing Mongolian Unicode cannot be directly used in Mongolian print recognition and handwriting recognition tasks and is inconvenient to use, the invention provides a method and a device for converting Mongolian international standard codes into shape codes and a computer terminal.

The invention is realized by adopting the following technical scheme: a Mongolian international standard code-to-shape code conversion method comprises the following steps:

(1) enumerating all possible Unicode encoding sequences that generate each Mongolian letter, recording each in the format "Unicode string = letter ID", and denoting the resulting list as UL;

(2) dividing UL into four parts according to position attribute, denoted UL_A, UL_S, UL_M, and UL_E, and sorting the rows of each part in descending order of Unicode string length; wherein UL_X(i).uni and UL_X(i).scode denote, respectively, the Unicode string before the "=" and the letter ID after the "=" in row i of UL_X, and X is one of the letters A, S, M, E;

(3) denoting the Mongolian Unicode string to be converted as mgl, and initializing the conversion result sequence glist = [];

(4) traversing UL_A and judging whether mgl exists in UL_A; if it exists and its row number is i, setting glist = [UL_A(i).scode] and ending the traversal of UL_A; otherwise continuing to traverse UL_A;

(5) traversing UL_S and judging whether UL_S(i).uni matches the beginning of mgl; if there is a matching row, setting glist = [UL_S(i).scode] and mgl = mgl - UL_S(i).uni, and continuing to traverse UL_S; otherwise aborting the traversal of UL_S;

(6) traversing UL_E and judging whether mgl exists in UL_E; if it exists and its row number is i, setting glist += [UL_E(i).scode] and ending the traversal of UL_E; otherwise continuing to traverse UL_E;

(7) traversing UL_M and judging whether UL_M(i).uni matches the beginning of mgl; if there is a matching row, setting glist += [UL_M(i).scode] and mgl = mgl - UL_M(i).uni, then ending the traversal of UL_M when mgl is empty and otherwise continuing to traverse UL_M; if there is no matching row, aborting the traversal of UL_M.

The invention enumerates all possible Unicode encoding sequences of each Mongolian letter, divides them into four parts according to position attribute, and sorts each part in descending order of Unicode string length. It then traverses the parts in turn, testing for matches and storing the results in the conversion result sequence, and thereby converts Mongolian international standard codes into shape codes. The method can handle many-to-many mappings, mappings without a fixed conversion relation, and codes that are not of fixed length. The glyph codes it generates can serve as an intermediate code, which facilitates Mongolian printed-text recognition and handwriting recognition and makes Mongolian Unicode codes convenient to represent and store. This solves the technical problem that existing Mongolian Unicode encoding cannot be used directly in Mongolian printed-text recognition and handwriting recognition tasks and is inconvenient to use.
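The traversal in steps (3) to (7) can be sketched as a longest-match-first scan. This is an illustrative reading, not the patented implementation, and it rests on several assumptions: the four tables are already sorted longest string first; the table strings can be compared directly with the input (how the connector 200D in the enumerated sequences is reconciled with real text is not specified in this passage); the letter ID is taken from the row that matched in the list being traversed; and the first matching head form ends the head scan. The names `convert` and `ConversionError` and the toy tables, which use Latin placeholders and made-up IDs, are hypothetical.

```python
class ConversionError(Exception):
    """Raised when no row matches; code 1 = head scan, code 2 = middle scan."""

    def __init__(self, code, rest):
        super().__init__(f"error code {code}: no match at {rest!r}")
        self.code = code
        self.rest = rest


def convert(mgl, ul_a, ul_s, ul_m, ul_e):
    """Convert mgl to a list of letter IDs.

    Each ul_* argument is a list of (uni, scode) rows sorted by len(uni),
    longest first, so the first prefix match is also the longest one.
    """
    glist = []
    # Step (4): the whole word may be a "sole" form.
    for uni, scode in ul_a:
        if uni == mgl:
            return [scode]
    # Step (5): otherwise a "head" form must match the start of the word.
    for uni, scode in ul_s:
        if mgl.startswith(uni):
            glist.append(scode)
            mgl = mgl[len(uni):]
            break
    else:
        raise ConversionError(1, mgl)
    while mgl:
        # Step (6): the remainder may be exactly one "end" form.
        for uni, scode in ul_e:
            if uni == mgl:
                return glist + [scode]
        # Step (7): otherwise strip one "middle" form and repeat.
        for uni, scode in ul_m:
            if mgl.startswith(uni):
                glist.append(scode)
                mgl = mgl[len(uni):]
                break
        else:
            raise ConversionError(2, mgl)
    return glist


# Toy tables with placeholder strings and hypothetical IDs.
UL_A = [("x", "900")]
UL_S = [("ab", "001"), ("a", "002")]
UL_M = [("c", "003")]
UL_E = [("d", "004")]
print(convert("abcd", UL_A, UL_S, UL_M, UL_E))  # ['001', '003', '004']
```

Sorting each table longest first is what lets a simple linear scan prefer "ab" over "a" in the toy head table; without it, the shorter row could match first and leave a remainder that no later row covers.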

As a further improvement of the above solution, the position attributes comprise "sole", "head", "middle", and "end"; wherein the position attribute "sole" denotes an independent word without a connector, the position attribute "head" denotes a string that contains a connector and must appear at the beginning of a word, the position attribute "middle" denotes a string that contains a connector and must appear in the middle of a word, and the position attribute "end" denotes a string that contains a connector and must appear at the end of a word.
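One way to derive the four parts from the enumerated strings is to look at where the connector sits. The sketch below is an assumption drawn from the shape of the listed sequences, not a rule stated in the text: a string with no leading or trailing connector is treated as "sole" (A), a trailing connector only as "head" (S), connectors at both ends as "middle" (M), and a leading connector only as "end" (E). The names `position_attribute` and `partition` are hypothetical.

```python
ZWJ = "\u200d"  # the connector written as 200D in the enumerated sequences


def position_attribute(uni):
    """Guess the position attribute of a table string from its connectors."""
    starts = uni.startswith(ZWJ)
    ends = uni.endswith(ZWJ)
    if starts and ends:
        return "M"  # joins on both sides: middle of a word
    if ends:
        return "S"  # joins only to what follows: beginning of a word
    if starts:
        return "E"  # joins only to what precedes: end of a word
    return "A"      # no joining at either edge: stands alone


def partition(ul):
    """Split (uni, scode) rows into four parts, each sorted longest first."""
    parts = {"A": [], "S": [], "M": [], "E": []}
    for uni, scode in ul:
        parts[position_attribute(uni)].append((uni, scode))
    for rows in parts.values():
        rows.sort(key=lambda row: len(row[0]), reverse=True)
    return parts


# "1833 180B 1823 200D" from the document's table ends with the connector.
print(position_attribute("\u1833\u180b\u1823\u200d"))  # S
```

`list.sort` is stable, so rows of equal length keep their original relative order within each part.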

As a further improvement of the above solution, the number of Mongolian letters is 382, and the Mongolian letters are numbered with three-digit numbers in row-then-column order.
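The three-digit letter IDs can be generated with zero padding. This is a small sketch that assumes the numbering runs consecutively from 001 (the only ID shown in the document) to 382; the mapping from each ID to an actual letter comes from the letter chart and is not reproduced here.

```python
LETTER_COUNT = 382  # number of Mongolian letters stated in the text

# Zero-padded three-digit IDs; consecutive numbering from 001 is an assumption.
letter_ids = [f"{n:03d}" for n in range(1, LETTER_COUNT + 1)]

print(letter_ids[0], letter_ids[-1])  # 001 382
```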

As a further improvement of the above scheme, all possible Unicode encoding sequences of the Mongolian letter with letter ID "001" are:

1833 180B 1823 180B=001

1833 1826 180C=001

1833 180B 1823 200D=001

1833 180B 1824 200D=001

1833 180B 1825 180B 200D=001

1833 180B 1826 180B 200D=001

200D 1832 1823 180B=001

200D 1832 1824 180B=001

200D 1833 1823 180B=001

200D 1833 1824 180B=001

200D 1832 1823 200D=001

200D 1832 1824 200D=001

200D 1832 1825 200D=001

200D 1832 1826 200D=001

200D 1833 1823 200D=001

200D 1833 1824 200D=001

200D 1833 1825 200D=001

200D 1833 1826 200D=001

202F 1833 1824 200D=001

202F 1833 1826 200D=001

here, "200D" represents a connector.
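Rows in the "Unicode string = letter ID" format shown above are easy to load into the list UL. The following is a minimal sketch, assuming each row is a space-separated list of hexadecimal code points followed by "=" and the letter ID; the function name `parse_entry` is hypothetical, and the three sample rows are copied from the table above.

```python
def parse_entry(line):
    """Split 'HEX HEX ...=ID' into (unicode_string, letter_id)."""
    left, letter_id = line.split("=")
    uni = "".join(chr(int(h, 16)) for h in left.split())
    return uni, letter_id.strip()


rows = [
    "1833 180B 1823 180B=001",
    "200D 1833 1826 200D=001",
    "202F 1833 1824 200D=001",
]
UL = [parse_entry(r) for r in rows]

for uni, scode in UL:
    # Show each string as code points again, with its length and letter ID.
    print(" ".join(f"{ord(c):04X}" for c in uni), len(uni), scode)
```

Because several rows map to the same letter ID, the table is many-to-one for a given letter, which is why the conversion is driven by string matching and not by a per-code-point lookup.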

As a further improvement of the above scheme, in step (5), error code 1 is issued after the traversal of UL_S is aborted; and in step (7), error code 2 is issued after the traversal of UL_M is aborted.

The invention also provides a method for converting Mongolian international standard codes into shape codes, which comprises the following steps:

(1) defining a set of Mongolian letters comprising a plurality of Mongolian letters, and numbering each Mongolian letter with a three-digit number in row-then-column order to obtain the letter ID of each Mongolian letter;

(2) enumerating all possible Unicode encoding sequences that generate each Mongolian letter, recording each in the format "Unicode string = letter ID", and collecting the records into an unordered list;

(3) dividing the unordered list into a sole part, a head part, a middle part, and an end part according to the position attributes "sole", "head", "middle", and "end", and sorting each part in descending order of Unicode string length; wherein the position attribute "sole" denotes an independent word without a connector, the position attribute "head" denotes a string that contains a connector and must appear at the beginning of a word, the position attribute "middle" denotes a string that contains a connector and must appear in the middle of a word, and the position attribute "end" denotes a string that contains a connector and must appear at the end of a word;

(4) obtaining the Mongolian Unicode string to be converted, and initializing a conversion result sequence;

(5) traversing the sole part and judging whether the Mongolian Unicode string to be converted exists in the sole part; if so, storing the letter ID after the "=" of the corresponding row in the conversion result sequence; otherwise continuing to traverse the sole part;

(6) traversing the head part and judging whether the Unicode string of each row of the head part matches the beginning of the Mongolian Unicode string to be converted; if there is a matching row, storing the letter ID after the "=" of that row in the conversion result sequence, deleting the matched portion from the Mongolian Unicode string to be converted, and continuing to traverse the head part; if there is no matching row, aborting the traversal of the head part;

(7) traversing the middle part and judging whether the Mongolian Unicode string to be converted exists in the middle part; if so, appending the letter ID after the "=" of the corresponding row to the conversion result sequence; otherwise continuing to traverse the middle part;

(8) traversing the end part and judging whether the Unicode string of each row of the end part matches the beginning of the Mongolian Unicode string to be converted; if there is a matching row, appending the letter ID after the "=" of that row to the conversion result sequence, deleting the matched portion from the Mongolian Unicode string to be converted, and continuing to traverse the end part; if there is no matching row, aborting the traversal of the end part.

As a further improvement of the above scheme, in step (1), the number of Mongolian letters is 382;

in step (6), error code 1 is issued after the traversal of the head part is aborted;

in step (8), error code 2 is issued after the traversal of the end part is aborted.

The present invention also provides a Mongolian international standard code-to-shape code conversion device, which applies any of the above Mongolian international standard code-to-shape code conversion methods, and comprises:

an enumeration module, configured to enumerate all possible Unicode encoding sequences that generate each Mongolian letter, record each in the format "Unicode string = letter ID", and denote the resulting list as UL;

a dividing module, configured to divide UL into four parts according to position attribute, denoted UL_A, UL_S, UL_M, and UL_E, and to sort the rows of each part in descending order of Unicode string length; wherein UL_X(i).uni and UL_X(i).scode denote, respectively, the Unicode string before the "=" and the letter ID after the "=" in row i of UL_X, and X is one of the letters A, S, M, E;

an initialization module, configured to denote the Mongolian Unicode string to be converted as mgl and to initialize the conversion result sequence glist = [];

a first traversal module, configured to traverse UL_A and judge whether mgl exists in UL_A; if it exists and its row number is i, set glist = [UL_A(i).scode] and end the traversal of UL_A; otherwise continue to traverse UL_A;

a second traversal module, configured to traverse UL_S and judge whether UL_S(i).uni matches the beginning of mgl; if there is a matching row, set glist = [UL_S(i).scode] and mgl = mgl - UL_S(i).uni, and continue to traverse UL_S; otherwise abort the traversal of UL_S;

a third traversal module, configured to traverse UL_E and judge whether mgl exists in UL_E; if it exists and its row number is i, set glist += [UL_E(i).scode] and end the traversal of UL_E; otherwise continue to traverse UL_E;

and a fourth traversal module, configured to traverse UL_M and judge whether UL_M(i).uni matches the beginning of mgl; if there is a matching row, set glist += [UL_M(i).scode] and mgl = mgl - UL_M(i).uni, then end the traversal of UL_M when mgl is empty and otherwise continue to traverse UL_M; if there is no matching row, abort the traversal of UL_M.

The present invention also provides a Mongolian international standard code-to-shape code conversion device, which applies any of the above Mongolian international standard code-to-shape code conversion methods, and comprises:

a definition module, configured to define a set of Mongolian letters comprising a plurality of Mongolian letters, and to number each Mongolian letter with a three-digit number in row-then-column order to obtain the letter ID of each Mongolian letter;

an enumeration module, configured to enumerate all possible Unicode encoding sequences that generate each Mongolian letter, record each in the format "Unicode string = letter ID", and collect the records into an unordered list;

a dividing module, configured to divide the unordered list into a sole part, a head part, a middle part, and an end part according to the position attributes "sole", "head", "middle", and "end", and to sort each part in descending order of Unicode string length; wherein the position attribute "sole" denotes an independent word without a connector, the position attribute "head" denotes a string that contains a connector and must appear at the beginning of a word, the position attribute "middle" denotes a string that contains a connector and must appear in the middle of a word, and the position attribute "end" denotes a string that contains a connector and must appear at the end of a word;

an initialization module, configured to obtain the Mongolian Unicode string to be converted and to initialize a conversion result sequence;

a first traversal module, configured to traverse the sole part and judge whether the Mongolian Unicode string to be converted exists in the sole part; if so, store the letter ID after the "=" of the corresponding row in the conversion result sequence; otherwise continue to traverse the sole part;

a second traversal module, configured to traverse the head part and judge whether the Unicode string of each row of the head part matches the beginning of the Mongolian Unicode string to be converted; if there is a matching row, store the letter ID after the "=" of that row in the conversion result sequence, delete the matched portion from the Mongolian Unicode string to be converted, and continue to traverse the head part; if there is no matching row, abort the traversal of the head part;

a third traversal module, configured to traverse the middle part and judge whether the Mongolian Unicode string to be converted exists in the middle part; if so, append the letter ID after the "=" of the corresponding row to the conversion result sequence; otherwise continue to traverse the middle part;

and a fourth traversal module, configured to traverse the end part and judge whether the Unicode string of each row of the end part matches the beginning of the Mongolian Unicode string to be converted; if there is a matching row, append the letter ID after the "=" of that row to the conversion result sequence, delete the matched portion from the Mongolian Unicode string to be converted, and continue to traverse the end part; if there is no matching row, abort the traversal of the end part.

The present invention also provides a computer terminal comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the steps of any of the above-mentioned Mongolian international standard code-to-shape code conversion methods when executing the program.

Compared with the existing Mongolian Unicode coding, the Mongolian international standard coding-shape code conversion method, the Mongolian international standard coding-shape code conversion device and the computer terminal have the following beneficial effects:

1. The Mongolian international standard code-to-shape code conversion method enumerates all possible Unicode encoding sequences of each Mongolian letter, divides them into four parts according to position attribute, sorts each part in descending order of Unicode string length, and then traverses the parts in turn, testing for matches and storing the results in the conversion result sequence. It thereby converts Mongolian international standard codes into shape codes: it can handle many-to-many mappings, mappings without a fixed conversion relation, and codes that are not of fixed length, and it generates glyph codes that can serve as an intermediate code.

2. The beneficial effect of the Mongolian international standard code-shape code conversion device is the same as that of the Mongolian international standard code-shape code conversion method, and the description is omitted here.

3. The beneficial effect of the computer terminal is the same as that of the Mongolian international standard code-shape code conversion method, and the description is omitted here.

Drawings

FIG. 1 is a diagram showing Mongolian Unicode encoding in embodiment 1 of the present invention.

FIG. 2 is a statistical chart of 382 Mongolian letters in example 1 of the present invention.

FIG. 3 is a flowchart of the Mongolian international standard code-to-shape code conversion method according to embodiment 1 of the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.

Example 1

The embodiment provides a Mongolian international standard code-to-shape code conversion method for converting Mongolian international standard codes into shape codes. Referring to FIG. 1, a Mongolian letter can take several variant forms, and the inventors found through research that, under the relevant rules of ISO/IEC 10646, only one of these forms, called the "nominal character", is encoded. The general principle for selecting the nominal character is: for vowels, the independent form is used; for consonants, the word-initial form that appears before the vowel "a" is used. All other forms are called "variant presentation forms". If letters of different languages share the same form at the beginning of a word but take different forms in the middle or at the end of a word, they are encoded as separate characters to distinguish the languages. For example, the letter pronounced BA is encoded as U+182A for traditional Mongolian and as U+184B (MONGOLIAN LETTER TODO BA) for Todo.

Traditional Mongolian has four special vowels: O, U, OE and UE. The variant presentation forms of the first two share the same glyph, as do those of the last two, yet they are four distinct letters. The character set encodes them using the independent form of O, the word-initial form of U, the independent form of OE and the word-initial form of UE.

In each language, the variant presentation form to use for a nominal character can generally be determined from its position in the word or from the part of speech. A few variant forms cannot be determined this way and must be selected by control characters in the character set. The control characters used by the Mongolian international standard code character set are U+180B, U+180C, U+180D, U+180E, U+202F, U+200C and U+200D; the last three come from the common symbol area. Current Mongolian print recognition and handwriting recognition tasks sometimes need glyph codes as an intermediate code instead of using Unicode directly, because Mongolian Unicode cannot represent glyph information directly, yet Mongolian resources are currently represented and stored in Unicode. From the glyph perspective, Mongolian has no fixed set of independent letters comparable to the 26 letters of English; different letter sets can be defined from different perspectives.

Based on this, the present embodiment defines the number of the Mongolian letters as 382, i.e. a Mongolian letter set containing 382 letters is formed, and the 382 Mongolian letters are shown in FIG. 2.

Each Mongolian letter is assigned a three-digit code, numbered row first and then column, starting from the upper-left corner of FIG. 2.

The conversion process converts Mongolian represented in Unicode into Mongolian represented by letter numbers. For example, the Mongolian word encoded as "0x1820 0x1837 0x1820 0x1833" is converted to "244 196 369". The conversion relationship has the following characteristics:

(1) It is a many-to-many relationship. For example, the two code points "0x1837 0x1820" in the above example are converted together to the single letter ID "196";

(2) There is no fixed conversion relationship. For example, "0x1820" is converted to "244" in the above example, but in other words "0x1820" may be converted to "307", "310", "329", "357", and so on;

(3) The conversion is not fixed-length: in the above example, "0x1820 → 244", "0x1837 0x1820 → 196" and "0x1833 → 369".
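The worked example can be replayed in a few lines of Python. The sketch below uses only the three mappings quoted above; `EXAMPLE_MAP` and `to_hex_tokens` are names introduced here for illustration, and the segmentation is hard-coded from the example rather than computed.

```python
# Mappings quoted in the example: Unicode code-point sequence -> letter ID.
EXAMPLE_MAP = {
    ("1820",): "244",
    ("1837", "1820"): "196",
    ("1833",): "369",
}


def to_hex_tokens(text):
    """Render each character as a 4-digit uppercase hexadecimal code point."""
    return [f"{ord(ch):04X}" for ch in text]


word = "\u1820\u1837\u1820\u1833"
tokens = to_hex_tokens(word)

# Segmentation copied from the example: 1 + 2 + 1 code points.
segments = [tuple(tokens[0:1]), tuple(tokens[1:3]), tuple(tokens[3:4])]
ids = [EXAMPLE_MAP[segment] for segment in segments]

print(tokens, "->", ids)
```

Four code points yield three letter IDs, which is why the conversion cannot be done with a per-character lookup table.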

Referring to FIG. 3, in view of the above, the inventors propose a conversion method. The Mongolian international standard code-to-shape code conversion method of this embodiment includes the following steps.

(1) Enumerate all possible Unicode encoding sequences that generate each Mongolian letter, record each in the format "Unicode string = letter ID", and denote the resulting list UL. For example, all possible Unicode encoding sequences of the Mongolian letter with ID 001 are:

1833 180B 1823 180B=001

1833 1826 180C=001

1833 180B 1823 200D=001

1833 180B 1824 200D=001

1833 180B 1825 180B 200D=001

1833 180B 1826 180B 200D=001

200D 1832 1823 180B=001

200D 1832 1824 180B=001

200D 1833 1823 180B=001

200D 1833 1824 180B=001

200D 1832 1823 200D=001

200D 1832 1824 200D=001

200D 1832 1825 200D=001

200D 1832 1826 200D=001

200D 1833 1823 200D=001

200D 1833 1824 200D=001

200D 1833 1825 200D=001

200D 1833 1826 200D=001

202F 1833 1824 200D=001

202F 1833 1826 200D=001

Here, "200D" represents a connector, and X stands for the rest of the string. A string without "200D" is an independent word, represented by the position attribute "isolated"; "X 200D" indicates that the string must appear at the beginning of a word, represented by the position attribute "initial"; "200D X 200D" indicates that the string must appear in the middle of a word, represented by the position attribute "medial"; "200D X" indicates that the string must appear at the end of a word, represented by the position attribute "final".

(2) Divide UL into four parts according to the position attribute, denoted UL_A, UL_S, UL_M and UL_E respectively, and sort the rows of each part in descending order of Unicode string length. Here UL_X(i).uni and UL_X(i).scode denote the Unicode string before "=" and the letter ID after "=" in row i of UL_X, where X is one of the letters A, S, M, E. In this embodiment, the position attributes are "isolated", "initial", "medial" and "final": "isolated" denotes an independent word without a connector, "initial" denotes a string that has a connector and must appear at the beginning of a word, "medial" denotes a string that has a connector and must appear in the middle of a word, and "final" denotes a string that has a connector and must appear at the end of a word.

(3) Denote the Mongolian Unicode string to be converted as mgl and initialize the conversion result sequence glist = [], where [] denotes an empty sequence. mgl is stored in "4-digit hexadecimal" format.

(4) Traverse UL_A and determine whether mgl exists in UL_A, that is, whether mgl equals UL_A(i).uni for some row i. If so, set glist = [UL_A(i).scode] and end the traversal of UL_A; otherwise, continue traversing UL_A.

(5) Traverse UL_S and determine whether UL_S(i).uni matches the beginning of mgl. If a row matches, set glist = [UL_S(i).scode] and mgl = mgl − UL_S(i).uni, that is, remove the matched part from the beginning of mgl, and continue traversing UL_S; if no row matches, terminate the traversal of UL_S. In this step, error code 1 is issued after the traversal of UL_S is terminated.

(6) Traverse UL_E and determine whether mgl exists in UL_E. If so, set glist += [UL_E(i).scode] and end the traversal of UL_E; otherwise, continue traversing UL_E.

(7) Traverse UL_M and determine whether UL_M(i).uni matches the beginning of mgl. If a row matches, set glist += [UL_M(i).scode] and mgl = mgl − UL_M(i).uni; if mgl is then empty, end the traversal of UL_M, otherwise continue traversing UL_M. If no row matches, terminate the traversal of UL_M. In this step, error code 2 is issued after the traversal of UL_M is terminated.
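The steps above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the patented implementation: the names `build_parts`, `convert` and `ConversionError` are hypothetical; the three rows of `UL` are toy data built from the worked example, with their position attributes assumed; the leading or trailing `200D` marker is assumed to encode position only and is stripped before matching; step (5) is assumed to consume a single initial form; and steps (6) and (7) are assumed to repeat until `mgl` is empty.

```python
ZWJ = "200D"  # connector that marks where a sequence joins its neighbours


class ConversionError(Exception):
    """Raised with code 1 (no initial match) or 2 (no medial match)."""

    def __init__(self, code):
        super().__init__(f"conversion failed with error code {code}")
        self.code = code


def build_parts(ul_lines):
    """Steps (1)-(2): split UL rows into A/S/M/E, longest string first."""
    parts = {"A": [], "S": [], "M": [], "E": []}
    keys = {(False, False): "A", (False, True): "S",
            (True, True): "M", (True, False): "E"}
    for line in ul_lines:
        uni, scode = line.split("=")
        tokens = uni.split()
        before, after = tokens[0] == ZWJ, tokens[-1] == ZWJ
        # Assumption: drop the positional connector before matching.
        core = tuple(tokens[int(before):len(tokens) - int(after)])
        parts[keys[(before, after)]].append((core, scode))
    for rows in parts.values():
        rows.sort(key=lambda row: len(row[0]), reverse=True)
    return parts


def convert(mgl, parts):
    """Steps (3)-(7): convert a space-separated hex string to letter IDs."""
    mgl = tuple(mgl.split())
    for uni, scode in parts["A"]:          # step (4): whole-word match
        if uni == mgl:
            return [scode]
    glist = None
    for uni, scode in parts["S"]:          # step (5): initial form
        if mgl[:len(uni)] == uni:
            glist, mgl = [scode], mgl[len(uni):]
            break
    if glist is None:
        raise ConversionError(1)
    while mgl:
        final = next((s for u, s in parts["E"] if u == mgl), None)
        if final is not None:              # step (6): remainder is a final form
            glist.append(final)
            return glist
        for uni, scode in parts["M"]:      # step (7): medial form
            if mgl[:len(uni)] == uni:
                glist.append(scode)
                mgl = mgl[len(uni):]
                break
        else:
            raise ConversionError(2)
    return glist


# Toy table (assumed rows); a real UL would cover all 382 letters.
UL = ["1820 200D=244", "200D 1837 1820 200D=196", "200D 1833=369"]
print(convert("1820 1837 1820 1833", build_parts(UL)))
```

Sorting each part longest-first means that a multi-code-point sequence is tried before any shorter sequence that is a prefix of it, which is what lets a greedy scan handle the variable-length mapping.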

In summary, compared with the existing Mongolian Unicode encoding, the Mongolian international standard encoding-shape code conversion method of the embodiment has the following advantages:

The Mongolian international standard code-to-shape code conversion method first enumerates all possible Unicode encoding sequences of each Mongolian letter, divides them into four parts according to their position attributes, and sorts each part in descending order of Unicode string length. It then traverses and tests each part in turn, storing the matched letter IDs in the conversion result sequence, and thereby converts Mongolian international standard codes into shape codes. The method handles many-to-many relationships, conversions that have no fixed mapping, and codes that are not of fixed length, so it can generate glyph codes for use as intermediate codes.

Example 2

This embodiment provides a method for converting the Mongolian international standard code into a shape code, which is similar to that of embodiment 1, and in this embodiment, the method specifically includes the following steps.

(1) Define a Mongolian letter set containing a plurality of Mongolian letters. Each Mongolian letter is assigned a three-digit number, row first and then column, which gives the letter ID of each Mongolian letter. The set contains 382 Mongolian letters, shown in FIG. 2 of Example 1.

(2) Enumerate all possible Unicode encoding sequences that generate each Mongolian letter, and record each in the format "Unicode string = letter ID" to form an unordered list.

(3) Divide the unordered list into an isolated part, an initial part, a medial part and a final part according to the position attributes "isolated", "initial", "medial" and "final", and sort each part in descending order of Unicode string length. Here "isolated" denotes an independent word without a connector, "initial" denotes a string that has a connector and must appear at the beginning of a word, "medial" denotes a string that has a connector and must appear in the middle of a word, and "final" denotes a string that has a connector and must appear at the end of a word.

(4) Obtain the Mongolian Unicode string to be converted and initialize the conversion result sequence.

(5) Traverse the isolated part and determine whether the Mongolian Unicode string to be converted exists in it. If so, store the letter ID following "=" on the corresponding row in the conversion result sequence; otherwise, continue traversing the isolated part.

(6) Traverse the initial part and determine whether the Unicode string of each row matches the beginning of the Mongolian Unicode string to be converted. If a row matches, store the letter ID following "=" on that row in the conversion result sequence, delete the matched part from the Mongolian Unicode string to be converted, and continue traversing the initial part. If no row matches, terminate the traversal of the initial part and issue error code 1.

(7) Traverse the medial part and determine whether the Mongolian Unicode string to be converted exists in it. If so, append the letter ID following "=" on the corresponding row to the conversion result sequence; otherwise, continue traversing the medial part.

(8) Traverse the final part and determine whether the Unicode string of each row matches the beginning of the Mongolian Unicode string to be converted. If a row matches, append the letter ID following "=" on that row to the conversion result sequence, delete the matched part from the Mongolian Unicode string to be converted, and continue traversing the final part. If no row matches, terminate the traversal of the final part and issue error code 2.
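The numbering in step (1) can be illustrated with a short Python sketch. It assumes that numbering starts at 001 (consistent with the ID 001 used in Example 1) and uses a placeholder grid, since the actual layout is the 382-letter chart of FIG. 2; `assign_letter_ids` is a hypothetical helper name.

```python
def assign_letter_ids(grid):
    """Number letters row by row from the upper-left corner as 3-digit IDs."""
    ids = {}
    next_id = 1  # assumption: the first letter receives ID 001
    for row in grid:
        for letter in row:
            ids[letter] = f"{next_id:03d}"
            next_id += 1
    return ids


# Placeholder names stand in for the glyphs of FIG. 2.
grid = [["g0", "g1", "g2"],
        ["g3", "g4", "g5"]]
ids = assign_letter_ids(grid)
print(ids)
```

Zero-padding to three digits keeps every ID the same width, so a concatenated result sequence can still be split back into individual letter IDs.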

Example 3

The embodiment provides a device for converting the Mongolian international standard code into the shape code, which applies the method for converting the Mongolian international standard code into the shape code in embodiment 1 and comprises an enumeration module, a division module, an initialization module, a first traversal module, a second traversal module, a third traversal module and a fourth traversal module.

The enumeration module enumerates all possible Unicode encoding sequences that generate each Mongolian letter, records each in the format "Unicode string = letter ID", and denotes the resulting list UL. The dividing module divides UL into four parts according to the position attribute, denoted UL_A, UL_S, UL_M and UL_E respectively, and sorts the rows of each part in descending order of Unicode string length. Here UL_X(i).uni and UL_X(i).scode denote the Unicode string before "=" and the letter ID after "=" in row i of UL_X, where X is one of the letters A, S, M, E. The initialization module denotes the Mongolian Unicode string to be converted as mgl and initializes the conversion result sequence glist = [].

The first traversal module traverses UL_A and determines whether mgl exists in UL_A; if so, glist = [UL_A(i).scode] and the traversal of UL_A ends, otherwise the traversal of UL_A continues. The second traversal module traverses UL_S and determines whether UL_S(i).uni matches the beginning of mgl; if a row matches, glist = [UL_S(i).scode] and mgl = mgl − UL_S(i).uni, and the traversal of UL_S continues, and if no row matches, the traversal of UL_S is terminated. The third traversal module traverses UL_E and determines whether mgl exists in UL_E; if so, glist += [UL_E(i).scode] and the traversal of UL_E ends, otherwise the traversal of UL_E continues. The fourth traversal module traverses UL_M and determines whether UL_M(i).uni matches the beginning of mgl; if a row matches, glist += [UL_M(i).scode] and mgl = mgl − UL_M(i).uni, and the traversal of UL_M ends when mgl is empty and continues otherwise; if no row matches, the traversal of UL_M is terminated. The four traversal modules respectively traverse the four parts produced by the dividing module, thereby converting Mongolian international standard codes into shape codes.

Example 4

The embodiment provides a device for converting the Mongolian international standard code into the shape code, which applies the method for converting the Mongolian international standard code into the shape code in the embodiment 2 and comprises a definition module, an enumeration module, a division module, an initialization module, a traversal module I, a traversal module II, a traversal module III and a traversal module IV.

The definition module defines a Mongolian letter set containing a plurality of Mongolian letters. Each Mongolian letter is assigned a three-digit number, row first and then column, which gives the letter ID of each Mongolian letter. The enumeration module enumerates all possible Unicode encoding sequences that generate each Mongolian letter and records each in the format "Unicode string = letter ID" to form an unordered list. The dividing module divides the unordered list into an isolated part, an initial part, a medial part and a final part according to the position attributes "isolated", "initial", "medial" and "final", and sorts each part in descending order of Unicode string length. Here "isolated" denotes an independent word without a connector, "initial" denotes a string that has a connector and must appear at the beginning of a word, "medial" denotes a string that has a connector and must appear in the middle of a word, and "final" denotes a string that has a connector and must appear at the end of a word. The initialization module obtains the Mongolian Unicode string to be converted and initializes the conversion result sequence.

Traversal module I traverses the isolated part and determines whether the Mongolian Unicode string to be converted exists in it; if so, the letter ID following "=" on the corresponding row is stored in the conversion result sequence, otherwise the traversal of the isolated part continues. Traversal module II traverses the initial part and determines whether the Unicode string of each row matches the beginning of the Mongolian Unicode string to be converted; if a row matches, the letter ID following "=" on that row is stored in the conversion result sequence, the matched part is deleted from the Mongolian Unicode string to be converted, and the traversal of the initial part continues, and if no row matches, the traversal of the initial part is terminated. Traversal module III traverses the medial part and determines whether the Mongolian Unicode string to be converted exists in it; if so, the letter ID following "=" on the corresponding row is appended to the conversion result sequence, otherwise the traversal of the medial part continues. Traversal module IV traverses the final part and determines whether the Unicode string of each row matches the beginning of the Mongolian Unicode string to be converted; if a row matches, the letter ID following "=" on that row is appended to the conversion result sequence, the matched part is deleted from the Mongolian Unicode string to be converted, and the traversal of the final part continues, and if no row matches, the traversal of the final part is terminated. The four traversal modules respectively traverse the four parts produced by the dividing module, thereby converting Mongolian international standard codes into shape codes.

Example 5

The present embodiments provide a computer terminal comprising a memory, a processor, and a computer program stored on the memory and executable on the processor. The processor, when executing the program, implements the steps of the Mongolian International Standard code-to-shape code conversion method of embodiment 1 or 2.

When the method of embodiment 1 or 2 is applied, the method may be applied in a form of software, for example, a program designed to run independently is installed on a computer terminal, and the computer terminal may be a computer, a smart phone, a control system, other internet of things equipment, and the like. The method of embodiment 1 may also be designed as an embedded running program, and installed on a computer terminal, such as a single chip microcomputer.

Example 6

The present embodiment provides a computer-readable storage medium having a computer program stored thereon. The program, when executed by a processor, implements the steps of the Mongolian International Standard code-to-shape code conversion method of embodiment 1 or 2.

The method of embodiment 1 or 2 may be applied in the form of software, for example as a standalone program stored on a computer-readable storage medium such as a USB flash drive. The drive may be designed as a USB key, so that the program carrying out the whole method is started by an external trigger through the drive.

The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents and improvements made within the spirit and principle of the present invention are intended to be included within the scope of the present invention.
