ANTLR 4-based source code translation method

Document No.: 935271 Publication date: 2021-03-05

Abstract: The ANTLR 4-based source code translation method (一种基于antlr4的源码翻译方法) was devised by 马猛飞, 魏志强, 杨永全, 贾东宁, 俞茂学, 马广浩, 桂琳 and 许佳立 on 2020-12-02. The invention provides a source code translation method based on ANTLR4 and belongs to the technical field of translation methods. The method is based on a multithreading mechanism, translates efficiently, and supports multi-language parsing. It comprises the following steps: classifying the source code files contained in a project to be translated by language type, and placing the source code files of each type into a separate to-be-parsed directory; dispatching, by a scheduling thread, the source code files in the different to-be-parsed directories to different preprocessing threads; analyzing, by each preprocessing thread, the dependency relations among the source code files dispatched to it, determining the parsing order, and calculating the expected total number of output parsed files; fetching, by the parsing threads, the source code files in the parsing order, translating them using ANTLR4, and outputting the corresponding parsed files; determining whether the total number of output parsed files equals the expected total; if so, ending the translation, otherwise repeating the above steps; and integrating the output parsed files to obtain the translation result.

1. An ANTLR 4-based source code translation method, characterized by comprising the following steps:

(1) classifying the source code files contained in a project to be translated by language type, analyzing and storing the calling relations among source code files of different types, creating as many to-be-parsed directories as there are source code file types, and placing the source code files of each type into its own to-be-parsed directory;

(2) creating as many preprocessing threads as there are source code file types, and dispatching, by a scheduling thread, the source code files in each to-be-parsed directory to a different preprocessing thread;

(3) running the preprocessing threads in parallel, wherein each preprocessing thread analyzes and stores the dependency relations among the source code files of the same type dispatched to it, determines the parsing order of those source code files from the dependency relations, stores the parsing order in a list file, and calculates the number of parsed files that should be output for the source code files of that type; and summing the numbers calculated by all preprocessing threads to obtain the expected total number of output parsed files;

(4) creating a plurality of parsing threads, the number of parsing threads being greater than or equal to the number of source code file types, wherein different parsing threads read different list files and run in parallel; each parsing thread fetches source code files one by one from the to-be-parsed directory in the parsing order recorded in the list file it has read, parses each file into a syntax tree using ANTLR4, translates the source code file using the syntax tree, and outputs the corresponding parsed file;

(5) after all parsing threads have finished parsing, determining whether the total number of output parsed files equals the expected total number of output parsed files;

(6) if the totals in step (5) are equal, ending the translation; if they are not equal, repeating steps (2) to (6) until the total number of output parsed files equals the expected total;

(7) integrating the output parsed files according to the calling relations among source code files of different types and the dependency relations among source code files of the same type, to obtain the translation result.

2. The ANTLR 4-based source code translation method according to claim 1, wherein: step (1) further comprises, after the source code files of different types have been placed into the different to-be-parsed directories, a step of adding a flag field to each source code file according to its processing state, wherein the flag value is 0, 1 or 2: a flag value of 0 denotes the unparsed state, a flag value of 1 denotes the parsing-in-progress state, and a flag value of 2 denotes the parsed state.

3. The ANTLR 4-based source code translation method according to claim 2, wherein: in step (4), a parsing thread fetches only source code files whose flag value is 0, changes the flag value of a source code file to 1 after fetching it, and changes the flag value to 2 after the source code file has been translated.

4. The ANTLR 4-based source code translation method according to claim 3, wherein in step (5), whether all parsing threads have finished parsing is determined as follows: checking whether the flag values of all source code files are 2; if so, all parsing threads have finished parsing; if not, the parsing threads have not all finished parsing.

5. The ANTLR 4-based source code translation method according to claim 1, wherein: step (7) further comprises, after the output parsed files have been integrated, a step of previewing and debugging the integrated file.

Technical Field

The invention belongs to the technical field of translation methods, and particularly relates to a source code translation method based on ANTLR 4.

Background

To translate a project more conveniently and intuitively, the source code of the project usually has to be converted from its source-language form into a target-language form, and ANTLR4 simplifies this process. ANTLR4 is a powerful parser generator for reading, processing, executing and translating structured text or binary files, and it is widely used in academia and industry. ANTLR4 defines its own grammar notation: the user writes a g4 file according to that notation, and once ANTLR4 has read the g4 file it can parse input in the format the g4 file describes and generate a syntax tree. As shown in Fig. 1, the ANTLR4 parsing process mainly includes two steps. First, a lexical analyzer (lexer) is generated for the language; the lexer is a program that converts input text into lexical symbols (tokens). Second, a parser recognizes the sentence structure from the tokens and generates a syntax tree (parse tree), a data structure that records how the parser recognized the structure of the input sentence and the components of that structure. By traversing all nodes of the syntax tree, the user can restructure and replace nodes according to the structure and business logic of the target language to generate a new syntax tree, and then output it, thereby completing the translation from one language to another.
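The lexer, parser and tree-traversal stages described above can be illustrated with a small hand-written stand-in. This is only a sketch in Python: it does not use ANTLR4 or any generated code, and the toy grammar (integers with `+` and `*`), the tuple-based tree and the function names `lex`, `parse` and `translate` are assumptions made for illustration. It translates infix arithmetic into prefix notation by walking the tree.

```python
import re


def lex(text):
    """Lexer stage: convert input text into (kind, value) tokens."""
    return [("NUM", t) if t.isdigit() else ("OP", t)
            for t in re.findall(r"\d+|[+*]", text)]


def parse(tokens):
    """Parser stage: build a tree for  expr: term ('+' term)* ;  term: NUM ('*' NUM)* ."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else (None, None)

    def atom():
        nonlocal pos
        kind, value = peek()
        if kind != "NUM":
            raise SyntaxError(f"expected a number at token {pos}")
        pos += 1
        return ("num", value)

    def term():
        nonlocal pos
        node = atom()
        while peek() == ("OP", "*"):
            pos += 1
            node = ("mul", node, atom())
        return node

    def expr():
        nonlocal pos
        node = term()
        while peek() == ("OP", "+"):
            pos += 1
            node = ("add", node, term())
        return node

    tree = expr()
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token at position {pos}")
    return tree


def translate(node):
    """Traversal stage: walk the tree and emit the target form (prefix notation)."""
    if node[0] == "num":
        return node[1]
    op = {"add": "+", "mul": "*"}[node[0]]
    return f"({op} {translate(node[1])} {translate(node[2])})"


print(translate(parse(lex("1 + 2 * 3"))))  # prints (+ 1 (* 2 3))
```

In a real ANTLR4 workflow the lexer and parser would be generated from the g4 file rather than written by hand; only the traversal logic would be user code.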

However, ANTLR4-based translation of a project currently tends to use a serial mechanism, processing one file only after the previous file has been processed. Translating a large open-source project therefore takes a long time, and execution efficiency is low. Moreover, the source code of many projects is now written in several programming languages rather than one, which makes the structure more complex, so serial parsing in a single mode cannot meet the translation requirements.

Disclosure of Invention

To address the shortcomings of existing ANTLR4 translation methods, the invention provides an ANTLR4-based source code translation method that uses a multithreading mechanism, translates efficiently, and supports multi-language parsing.

To achieve this purpose, the invention adopts the following technical solution:

An ANTLR 4-based source code translation method comprises the following steps:

(1) classifying the source code files contained in a project to be translated by language type, analyzing and storing the calling relations among source code files of different types, creating as many to-be-parsed directories as there are source code file types, and placing the source code files of each type into its own to-be-parsed directory;

(2) creating as many preprocessing threads as there are source code file types, and dispatching, by a scheduling thread, the source code files in each to-be-parsed directory to a different preprocessing thread;

(3) running the preprocessing threads in parallel, wherein each preprocessing thread analyzes and stores the dependency relations among the source code files of the same type dispatched to it, determines the parsing order of those source code files from the dependency relations, stores the parsing order in a list file, and calculates the number of parsed files that should be output for the source code files of that type; and summing the numbers calculated by all preprocessing threads to obtain the expected total number of output parsed files;

(4) creating a plurality of parsing threads, the number of parsing threads being greater than or equal to the number of source code file types, wherein different parsing threads read different list files and run in parallel; each parsing thread fetches source code files one by one from the to-be-parsed directory in the parsing order recorded in the list file it has read, parses each file into a syntax tree using ANTLR4, translates the source code file using the syntax tree, and outputs the corresponding parsed file;

(5) after all parsing threads have finished parsing, determining whether the total number of output parsed files equals the expected total number of output parsed files;

(6) if the totals in step (5) are equal, ending the translation; if they are not equal, repeating steps (2) to (6) until the total number of output parsed files equals the expected total;

(7) integrating the output parsed files according to the calling relations among source code files of different types and the dependency relations among source code files of the same type, to obtain the translation result.

Preferably, step (1) further includes, after the source code files of different types have been placed into the different to-be-parsed directories, a step of adding a flag field to each source code file according to its processing state, wherein the flag value is 0, 1 or 2: a flag value of 0 denotes the unparsed state, a flag value of 1 denotes the parsing-in-progress state, and a flag value of 2 denotes the parsed state.

Preferably, in step (4), a parsing thread fetches only source code files whose flag value is 0, changes the flag value of a source code file to 1 after fetching it, and changes the flag value to 2 after the translation is completed.

Preferably, in step (5), whether all parsing threads have finished parsing is determined as follows: checking whether the flag values of all source code files are 2; if so, all parsing threads have finished parsing; if not, the parsing threads have not all finished parsing.

Preferably, step (7) further includes, after the output parsed files have been integrated, a step of previewing and debugging the integrated file.

Compared with the prior art, the invention has the following advantages and beneficial effects:

1. The ANTLR 4-based source code translation method provided by the invention classifies source code files by language type and, through the cooperation of the designed scheduling thread, preprocessing threads and parsing threads, realizes parallel multi-language parsing, which improves the translation efficiency for complex projects;

2. The ANTLR 4-based source code translation method analyzes the calling relations among source code files of different languages and the dependency relations among source code files of the same language, so that after translation the parsed files can be merged automatically according to these relations without manual combination, which further improves the translation efficiency for complex projects.

Drawings

Fig. 1 is a schematic diagram of the ANTLR4 parsing process in the prior art;

Fig. 2 is a schematic diagram of the parsing process of the ANTLR 4-based source code translation method according to an embodiment of the present invention;

Fig. 3 is a flowchart of the ANTLR 4-based source code translation method according to an embodiment of the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the present application more apparent, the present application will be described and illustrated below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments provided in the present application without any inventive step are within the scope of protection of the present application.

It is obvious that the drawings in the following description are only examples or embodiments of the present application, and that it is also possible for a person skilled in the art to apply the present application to other similar contexts on the basis of these drawings without inventive effort. Moreover, it should be appreciated that in the development of any such actual implementation, as in any engineering or design project, numerous implementation-specific decisions must be made to achieve the developers' specific goals, such as compliance with system-related and business-related constraints, which may vary from one implementation to another.

As shown in Fig. 2, an embodiment of the present invention relates to a source code translation method based on ANTLR4, including the following steps:

(1) Classifying the source code files contained in a project to be translated by language type, analyzing and storing the calling relations among source code files of different types, creating as many to-be-parsed directories as there are source code file types, and placing the source code files of each type into its own to-be-parsed directory. The language types include Java, C++, C, Python, C#, Fortran and other existing languages.

(2) Creating as many preprocessing threads as there are source code file types, and dispatching, by a scheduling thread, the source code files in each to-be-parsed directory to a different preprocessing thread.

(3) Running the preprocessing threads in parallel, wherein each preprocessing thread analyzes and stores the dependency relations among the source code files of the same type dispatched to it, determines the parsing order of those source code files from the dependency relations, stores the parsing order in a list file, and calculates the number of parsed files that should be output for the source code files of that type; and summing the numbers calculated by all preprocessing threads to obtain the expected total number of output parsed files.

(4) Creating a plurality of parsing threads, the number of parsing threads being greater than or equal to the number of source code file types, wherein different parsing threads read different list files and run in parallel; each parsing thread fetches source code files one by one from the to-be-parsed directory in the parsing order recorded in the list file it has read, parses each file into a syntax tree using ANTLR4, translates the source code file using the syntax tree, and outputs the corresponding parsed file. It should be noted that the specific steps of translating a source code file with ANTLR4 are well known to those skilled in the art. The process includes: parsing the source code file, according to the written g4 file, into a top-down syntax tree (ParseTree) in which each node corresponds to a code fragment in the program; traversing all nodes of the ParseTree and restructuring and replacing them according to the rules of the target language to form a new syntax tree; and outputting the new syntax tree to obtain the parsed file.

(5) After all parsing threads have finished parsing, determining whether the total number of output parsed files equals the expected total number of output parsed files.

(6) If the totals in step (5) are equal, ending the translation; if they are not equal, repeating steps (2) to (6) until the total number of output parsed files equals the expected total.

(7) Integrating the output parsed files according to the calling relations among source code files of different types and the dependency relations among source code files of the same type, to obtain the translation result.
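Steps (1) to (3) above can be sketched as follows. This is an illustrative sketch under stated assumptions, not the implementation of the embodiment: the suffix-to-language table, the in-memory `depends_on` mapping (standing in for the stored dependency relations), the function names, and the assumption that each source file yields exactly one parsed file are all hypothetical. The parsing order is computed with a topological sort from the Python standard library (`graphlib`, Python 3.9+), which is one possible way to order files so that dependencies come first.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from graphlib import TopologicalSorter
from pathlib import PurePath

# Assumed suffix-to-language table; the source does not prescribe one.
LANGUAGE_BY_SUFFIX = {".java": "java", ".py": "python", ".c": "c", ".cpp": "cpp"}


def classify(paths):
    """Step (1): group source files by language type; unknown suffixes are skipped."""
    groups = defaultdict(list)
    for path in paths:
        language = LANGUAGE_BY_SUFFIX.get(PurePath(path).suffix.lower())
        if language is not None:
            groups[language].append(path)
    return dict(groups)


def parsing_order(files, depends_on):
    """Step (3): order one language's files so that dependencies come first."""
    members = set(files)
    graph = {f: [d for d in depends_on.get(f, ()) if d in members] for f in files}
    return list(TopologicalSorter(graph).static_order())


def preprocess(paths, depends_on):
    """Steps (2) and (3): one preprocessing job per language, run in parallel."""
    groups = classify(paths)
    with ThreadPoolExecutor(max_workers=max(1, len(groups))) as pool:
        futures = {lang: pool.submit(parsing_order, files, depends_on)
                   for lang, files in groups.items()}
        orders = {lang: future.result() for lang, future in futures.items()}
    # Assumption: one parsed file is expected per source file.
    expected_total = sum(len(order) for order in orders.values())
    return orders, expected_total


orders, expected_total = preprocess(
    ["A.java", "B.java", "main.py", "util.py"],
    {"A.java": ["B.java"], "main.py": ["util.py"]},
)
print(orders, expected_total)
```

A circular dependency among files of one language would make `static_order()` raise `graphlib.CycleError`; how the method handles such cycles is not stated in the source.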

The ANTLR 4-based source code translation method classifies source code files by language type and, through the cooperation of the designed scheduling thread, preprocessing threads and parsing threads, realizes parallel multi-language parsing, which improves the translation efficiency for complex projects. At the same time, the method analyzes the calling relations among source code files of different languages and the dependency relations among source code files of the same language, so that after translation the parsed files can be merged automatically according to these relations without manual combination, which further improves the translation efficiency for complex projects.
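The parallel parsing and count check of steps (4) to (6) might look like the following sketch. Several things here are assumptions for illustration: `translate_file` is a placeholder that merely upper-cases text where a real system would run the ANTLR4-generated parser and tree rewrite, source files are held in an in-memory dictionary instead of directories, and `max_rounds` is an added safety limit that the source does not mention.

```python
from concurrent.futures import ThreadPoolExecutor


def translate_file(source_text):
    """Placeholder for the ANTLR4 parse-and-rewrite step (here: upper-casing)."""
    return source_text.upper()


def parse_list(order, sources, done):
    """Step (4): one parsing thread walks its list in the recorded order."""
    results = {}
    for name in order:
        if name in done:          # already translated in an earlier round
            continue
        try:
            results[name] = translate_file(sources[name])
        except Exception:
            pass                  # leave the file for the next round
    return results


def run_translation(orders, sources, expected_total, max_rounds=3):
    """Steps (4) to (6): parse in parallel, then compare output counts."""
    outputs = {}
    for _ in range(max_rounds):   # max_rounds is an assumed safety limit
        with ThreadPoolExecutor(max_workers=max(1, len(orders))) as pool:
            futures = [pool.submit(parse_list, order, sources, outputs)
                       for order in orders.values()]
            round_results = [future.result() for future in futures]
        for results in round_results:
            outputs.update(results)          # merged in the main thread only
        if len(outputs) == expected_total:   # step (5): compare the totals
            return outputs                   # step (6): equal, translation ends
    raise RuntimeError(f"only {len(outputs)} of {expected_total} files translated")


orders = {"java": ["B.java", "A.java"], "python": ["util.py"]}
sources = {"A.java": "class A {}", "B.java": "class B {}", "util.py": "x = 1"}
outputs = run_translation(orders, sources, expected_total=3)
print(sorted(outputs))
```

Each worker returns its own result dictionary and the main thread merges them, so no shared structure is written concurrently in this sketch.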

To further improve parsing efficiency, step (1) preferably further includes, after the source code files of different types have been placed into the different to-be-parsed directories, a step of adding a flag field to each source code file according to its processing state, wherein the flag value is 0, 1 or 2: a flag value of 0 denotes the unparsed state, a flag value of 1 denotes the parsing-in-progress state, and a flag value of 2 denotes the parsed state. In step (4), a parsing thread fetches only source code files whose flag value is 0, changes the flag value of a source code file to 1 after fetching it, and changes the flag value to 2 after the source code file has been translated. In step (5), whether all parsing threads have finished parsing is determined as follows: checking whether the flag values of all source code files are 2; if so, all parsing threads have finished parsing; if not, the parsing threads have not all finished parsing. Because the flag field is updated in real time according to the processing state of the source code file, only one parsing thread can access a given source code file at any time, and it is easy to determine whether parsing is complete, which improves parsing efficiency.
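The flag mechanism can be sketched as a small state table. The source does not say how the flag update is made safe against two threads fetching the same file at once; the lock used here is an assumption, added because checking for 0 and setting 1 must happen as one atomic step. The class name `FlagTable` and its methods are hypothetical.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

UNPARSED, PARSING, PARSED = 0, 1, 2  # flag values as defined in the text


class FlagTable:
    """Tracks the flag of every source file; the lock is an assumed detail."""

    def __init__(self, names):
        self._flags = {name: UNPARSED for name in names}
        self._lock = threading.Lock()

    def claim(self, name):
        """Atomically change 0 -> 1; True only for the one thread that wins."""
        with self._lock:
            if self._flags[name] != UNPARSED:
                return False
            self._flags[name] = PARSING
            return True

    def mark_parsed(self, name):
        """Change the flag to 2 once translation of the file is complete."""
        with self._lock:
            self._flags[name] = PARSED

    def all_parsed(self):
        """Step (5) check: parsing is complete when every flag equals 2."""
        with self._lock:
            return all(flag == PARSED for flag in self._flags.values())


def worker(table, names, processed):
    for name in names:
        if table.claim(name):        # only files with flag 0 are fetched
            processed.append(name)   # stand-in for the actual translation
            table.mark_parsed(name)


names = [f"file{i}.java" for i in range(20)]
table = FlagTable(names)
processed = []
with ThreadPoolExecutor(max_workers=4) as pool:
    futures = [pool.submit(worker, table, names, processed) for _ in range(4)]
    for future in futures:
        future.result()

print(len(processed), table.all_parsed())  # prints 20 True
```

All four workers scan the same list here, which is deliberately harsher than the method itself, where each parsing thread reads its own list file; the point is that each file is still processed exactly once.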

In addition, step (7) further includes, after the output parsed files have been integrated, a step of previewing and debugging the integrated file to ensure that a satisfactory translation result is output.
