Software defect automatic assignment method and system based on version submission information

Document No. 1614251, published 2020-01-10

Reading note: The technology "Software defect automatic assignment method and system based on version submission information" was designed and created by Zhu Yunlong, Ren Hongmin and Li Lulu on 2019-09-23. Abstract: The invention discloses a software defect automatic assignment method based on version submission information, comprising the following steps. Step 1: extract commit information and bug information, and establish a commit information model and a bug information model. Step 2: compute the bug information model with an LDA model to obtain a bug-topic list. Step 3: perform similarity matching between the commit information model and the bug information model to obtain a bug-fixer list. Step 4: map the bug-topic list and the bug-fixer list onto each other to obtain a fixer-topic list. Step 5: compare the bug-topic list of any bug information against the fixer-topic list to match a suitable software defect repairer. The method addresses the low accuracy of source code file localization and the heavy workload of repairers: it fully mines the valuable information in the commit library, accurately locates source code files and their developers, improves the accuracy of defect assignment, automates software defect assignment, and reduces the workload of repairers.

1. A software defect automatic assignment method based on version submission information, characterized by comprising the following steps:

step 1: extracting commit information from a version control repository and bug information from a software bug repository, and establishing a commit information model and a bug information model, respectively;

step 2: computing the bug information model with an LDA model to obtain a bug-topic list of the bug information model;

step 3: performing similarity matching between the commit information model and the bug information model to obtain a bug-fixer list corresponding to the bug information model;

step 4: mapping the bug-topic list and the bug-fixer list onto each other to obtain a fixer-topic list of the bug information model;

step 5: comparing the bug-topic list corresponding to any bug information in the bug information model against the fixer-topic list, so as to match a suitable software defect repairer.

2. The software defect automatic assignment method based on version submission information as claimed in claim 1, wherein the commit information and the bug information are obtained by preprocessing the defect text features in the version control repository and the software bug repository, respectively, with NLP techniques; the commit information comprises commit description information, commit submission date, the source code files modified by developers, and commit submitters; the bug information comprises bug summary, bug description information, bug comments and bug submission time.

3. The software defect automatic assignment method based on version submission information as claimed in claim 2, wherein the preprocessing extracts the commit information and the bug information from the version control repository and the software bug repository by stemming, stop-word removal, compound-word splitting and vocabulary normalization.

4. The software defect automatic assignment method based on version submission information as claimed in claim 3, wherein the calculation with the LDA model further comprises the following steps:

step 2.1: mapping each bug information in the bug information model onto the LDA model to obtain a bug mapping vector $p(z_i \mid \mathbf{z}_{-i}, \mathbf{w})$, which satisfies:

$$p(z_i = z \mid \mathbf{z}_{-i}, \mathbf{w}) \propto \frac{n_{-i,z}^{(w_i)} + \beta}{n_{-i,z}^{(\cdot)} + V\beta} \cdot \frac{n_{-i,d}^{(z)} + \alpha}{n_d - 1 + T\alpha}$$

wherein α and β are two vectors describing the mutual weight relationship of the multidimensional variables and satisfy α = β = 1/K; $w_i$ is the i-th word in the bug information; $z_i$ is the topic of that word in the bug information; T and V are, respectively, the number of topics z in the bug information and the total number of distinct words w in the bug information; d is the document in which the i-th word $w_i$ is located;

$n_{-i,z}^{(w)}$ is the number of times word w is assigned to topic z; $n_{-i,z}^{(\cdot)}$ is the total number of times all words are assigned to z; $n_{-i,d}^{(z)}$ is the number of times z is assigned to words in document d; $n_d$ is the total number of words in document d; the subscript -i indicates that the current word $w_i$ is excluded from the count;

step 2.2: sampling and iterating the bug mapping vector multiple times to obtain the topic probability distribution vector to which each word w in the bug information belongs; the topic probability distribution vector is $\theta_{d,z}$ and satisfies:

$$\theta_{d,z} = \frac{n_d^{(z)} + \alpha}{n_d + T\alpha}$$

wherein $n_d^{(z)}$ is the number of words with topic z in document d of the bug information, and $n_d$ is the total number of words in document d of the bug information;

step 2.3: pairing the topic probability distribution vector $\theta_{d,z}$ in the bug information one-to-one with the bug mapping vector $p(z_i \mid \mathbf{z}_{-i}, \mathbf{w})$ to obtain the bug-topic list of the bug information model.

5. The software defect automatic assignment method based on version submission information as claimed in claim 4, wherein the similarity matching further comprises the following steps:

step 3.1: analyzing the commit information in the commit information model that is most similar to the bug information to obtain the similar commit information, and counting the number N of source code files modified by the developer in the similar commit information and the submission time $T_C$ of the similar commit;

step 3.2: computing a similarity weight $w_{(b,c)}$ from the similar commit information and the bug information, which satisfies:

[Equation image: formula for the similarity weight $w_{(b,c)}$; not recoverable from the source]

wherein b is the bug information and c is the commit information; $W_B$ is the bug description information in the bug information and $W_C$ is the commit description information in the commit information; N is the number of source code files modified by the developer in the similar commit information; $T_B$ is the submission time in the bug information; and $T_C$ is the commit time of the commit associated with the bug;

step 3.3: selecting the optimal value of the similarity weight $w_{(b,c)}$ for each bug information and putting it in one-to-one correspondence with that bug information, so as to obtain the bug-fixer list corresponding to the bug information model.

6. A software defect automatic assignment system based on version submission information, implemented on the basis of the software defect automatic assignment method based on version submission information as claimed in any one of claims 1 to 5, and comprising:

an information extraction module, connected to the version control repository and the software bug repository, which extracts the commit information and the bug information and establishes the commit information model and the bug information model, respectively;

a list construction module, connected to the information extraction module, which constructs the bug-topic list and the bug-fixer list of the bug information model;

a mapping module, connected to the list construction module, which maps the bug-topic list and the bug-fixer list onto each other to obtain the fixer-topic list of the bug information model;

and an assignment comparison module, connected to the list construction module and the mapping module, which compares the bug-topic list corresponding to the bug information against the fixer-topic list.

7. The software defect automatic assignment system based on version submission information as claimed in claim 6, further comprising a display module connected to the assignment comparison module for displaying the matched suitable software defect repairer.

8. The software defect automatic assignment system based on version submission information as claimed in claim 7, wherein the information extraction module comprises a commit information extraction module and a bug information extraction module; the first end of the commit information extraction module is connected to the version control repository and the second end to the list construction module, to extract the commit information and construct the commit information model; the first end of the bug information extraction module is connected to the software bug repository and the second end to the list construction module, to extract the bug information and construct the bug information model.

9. The software defect automatic assignment system based on version submission information as claimed in claim 8, wherein the list construction module comprises a topic list construction module and a relation list construction module; the first end of the topic list construction module is connected to the bug information extraction module, the second end to the mapping module and the third end to the assignment comparison module, to construct the bug-topic list of the bug information model; the first end of the relation list construction module is connected to the commit information extraction module and the second end to the mapping module, to construct the bug-fixer list corresponding to the bug information model.

10. The software defect automatic assignment system based on version submission information as claimed in claim 9, wherein an LDA model is further provided in the topic list construction module, and the bug-topic list is computed from the bug information model with the LDA model.

Technical Field

The invention relates to the technical field of software defect management within software repository mining, and in particular to a software defect automatic assignment method and system based on version submission information.

Background

As the software industry develops, open source software keeps growing in size and complexity, and more and more defects are produced. If developers do not repair these defects in time, users are increasingly inconvenienced. Timely repair of defects at this scale has become a major problem in software engineering research and practice: it makes software maintenance harder and seriously affects the reliability and availability of software, and maintenance already demands a large investment of cost and effort. In the early days of open source software, projects were small and simple and produced few bugs, so bugs could be assigned to developers manually. For today's open source projects, however, the number of defects has grown so much that defect assignment has become a heavy task that far exceeds what one person can handle.

The commit library in a version control repository reflects many kinds of information about developers. Most prior art, however, only uses the related source code files to identify the relevant developers and analyze their experience, so the valuable information in the commit library is not fully mined.

Earlier software defect assignment techniques extract keywords and developer information from the text of a defect and use information retrieval to compute each developer's contribution to those keywords in the source code files, thereby selecting the best defect repairer. When such methods are used to locate source code files, however, their accuracy is often unsatisfactory.

The current mainstream defect assignment approach first recommends source code files related to the bug and then links the bug to the developers of those files. This approach ignores the large amount of interfering information in source code files, so source file localization is inaccurate. As a result, the precision of the recommended source code files drops, developers may be directed to files they are unfamiliar with, and the repairer's workload increases.

Disclosure of Invention

The invention aims to provide a software defect automatic assignment method and system based on version submission information that solve the problems of low source code file localization accuracy and the heavy workload of repairers. They fully mine the valuable information in the commit library, accurately locate source code files and their developers, effectively improve the accuracy of defect assignment, automate software defect assignment, and reduce the workload of repairers.

In order to achieve the above object, the present invention provides a method for automatically assigning software defects based on version submission information, which comprises the following steps:

step 1: extracting version submission information (commit information) from a version control repository and defect information (bug information) from a software defect repository (software bug repository), and establishing a commit information model and a bug information model, respectively;

step 2: computing the bug information model with a Latent Dirichlet Allocation (LDA) topic model to obtain a bug-topic list of the bug information model;

step 3: performing similarity matching between the commit information model and the bug information model to obtain a bug-fixer relation (bug-fixer) list corresponding to the bug information model;

step 4: mapping the bug-topic list and the bug-fixer list onto each other to obtain a fixer-topic relation list of the bug information model;

step 5: comparing the bug-topic list corresponding to any bug information in the bug information model against the fixer-topic list to match a suitable software defect repairer.

Preferably, the commit information and the bug information are obtained by preprocessing the defect text features in the version control repository and the software bug repository, respectively, with Natural Language Processing (NLP) techniques; the commit information comprises commit description information, commit submission date, the source code files modified by the developer, and the commit submitter; the bug information comprises bug summary, bug description information, bug comments and bug submission time.

Preferably, the preprocessing extracts the commit information and the bug information from the version control repository and the software bug repository using methods such as stemming, stop-word removal, compound-word splitting and vocabulary normalization.

Preferably, the calculation with the LDA model further comprises the following steps:

step 2.1: mapping each bug information in the bug information model onto the LDA model to obtain a bug mapping vector $p(z_i \mid \mathbf{z}_{-i}, \mathbf{w})$, which satisfies:

$$p(z_i = z \mid \mathbf{z}_{-i}, \mathbf{w}) \propto \frac{n_{-i,z}^{(w_i)} + \beta}{n_{-i,z}^{(\cdot)} + V\beta} \cdot \frac{n_{-i,d}^{(z)} + \alpha}{n_d - 1 + T\alpha}$$

wherein α and β are two vectors describing the mutual weight relationship of the multidimensional variables and satisfy α = β = 1/K; $w_i$ is the i-th word in the bug information; $z_i$ is the topic of that word in the bug information; T and V are, respectively, the number of topics z in the bug information and the total number of distinct words w in the bug information; d is the document in which the i-th word $w_i$ is located;

$n_{-i,z}^{(w)}$ is the number of times word w is assigned to topic z; $n_{-i,z}^{(\cdot)}$ is the total number of times all words w are assigned to z;

$n_{-i,d}^{(z)}$ is the number of times z is assigned to words w in document d of the bug information; $n_d$ is the total number of words w contained in document d; the subscript -i indicates that the current word $w_i$ is excluded from the count;

step 2.2: sampling and iterating the bug mapping vector multiple times to obtain the topic probability distribution vector to which each word w in the bug information belongs; the topic probability distribution vector is $\theta_{d,z}$ and satisfies:

$$\theta_{d,z} = \frac{n_d^{(z)} + \alpha}{n_d + T\alpha}$$

wherein $n_d^{(z)}$ is the number of words with topic z in document d of the bug information, and $n_d$ is the total number of words in document d of the bug information;

step 2.3: pairing the topic probability distribution vector $\theta_{d,z}$ in the bug information one-to-one with the bug mapping vector $p(z_i \mid \mathbf{z}_{-i}, \mathbf{w})$ to obtain the bug-topic list of the bug information model.

Preferably, the similarity matching further comprises the following steps:

step 3.1: analyzing the commit information in the commit information model that is most similar to the bug information to obtain the similar commit information, and counting the number N of source code files modified by the developer in the similar commit information and the submission time $T_C$ of the similar commit;

step 3.2: computing a similarity weight $w_{(b,c)}$ from the similar commit information and the bug information, which satisfies:

[Equation image: formula for the similarity weight $w_{(b,c)}$; not recoverable from the source]

wherein b is the bug information and c is the commit information; $W_B$ is the bug description information in the bug information and $W_C$ is the commit description information in the commit information; N is the number of source code files modified by the developer in the similar commit information; $T_B$ is the submission time in the bug information; and $T_C$ is the commit time of the commit associated with the bug;

step 3.3: selecting the optimal value of the similarity weight $w_{(b,c)}$ for each bug information and putting it in one-to-one correspondence with that bug information, so as to obtain the bug-fixer list corresponding to the bug information model.

The invention also provides a software defect automatic assignment system based on version submission information, implemented on the basis of the above method and comprising: an information extraction module, connected to the version control repository and the software bug repository, which extracts the commit information and the bug information and establishes the commit information model and the bug information model, respectively; a list construction module, connected to the information extraction module, which constructs the bug-topic list and the bug-fixer list of the bug information model; a mapping module, connected to the list construction module, which maps the bug-topic list and the bug-fixer list onto each other to obtain the fixer-topic list of the bug information model; and an assignment comparison module, connected to the list construction module and the mapping module, which compares the bug-topic list corresponding to the bug information against the fixer-topic list.

Preferably, the system further comprises a display module connected to the assignment comparison module for displaying the matched suitable software defect repairer.

Preferably, the information extraction module comprises a commit information extraction module and a bug information extraction module; the first end of the commit information extraction module is connected to the version control repository and the second end to the list construction module, to extract the commit information and construct the commit information model; the first end of the bug information extraction module is connected to the software bug repository and the second end to the list construction module, to extract the bug information and construct the bug information model.

Preferably, the list construction module comprises a topic list construction module and a relation list construction module; the first end of the topic list construction module is connected to the bug information extraction module, the second end to the mapping module and the third end to the assignment comparison module, to construct the bug-topic list of the bug information model; the first end of the relation list construction module is connected to the commit information extraction module and the second end to the mapping module, to construct the bug-fixer list corresponding to the bug information model.

Preferably, an LDA model is also provided in the topic list construction module, and the bug-topic list is computed from the bug information model with the LDA model.

By applying the method and system of the invention, the problems of low source code file localization accuracy and the heavy workload of repairers are solved: the valuable information in the commit library is fully mined, source code files and their developers are accurately located, the accuracy of defect assignment is effectively improved, software defects are assigned automatically, and the workload of repairers is reduced.

Compared with the prior art, the invention has the following beneficial effects:

1. The method mines developers' actual expertise from the commit library and then uses the LDA model to avoid the high dimensionality of the vector space model. This overcomes two problems: bug descriptions in the bug information are short, so the resulting vector space is too sparse for distances to be measured well, and the data are sparse and noisy. The accuracy of software defect assignment is thereby improved.

2. The method assigns software defects automatically and reduces the workload of repairers.

Drawings

FIG. 1 is a flow chart of a software defect automatic assignment method according to the present invention;

FIG. 2 is a schematic structural diagram of the software defect automatic assignment system according to the present invention.

Detailed Description

The invention will be further described by the following specific examples in conjunction with the drawings, which are provided for illustration only and are not intended to limit the scope of the invention.

As shown in FIG. 1, the software defect automatic assignment method based on version submission information according to the invention comprises the following steps:

Step 1: extract version submission information (commit information) from the version control repository 1 and defect information (bug information) from the software defect repository (software bug repository) 2 of the software defect tracking system, and establish a commit information model and a defect information model (bug information model), respectively, to record the different pieces of commit information and bug information.

The commit information and the bug information are extracted by preprocessing the defect text features in the version control repository 1 and the software bug repository 2 with Natural Language Processing (NLP) techniques.

The preprocessing applies stemming, stop-word removal, compound-word splitting, vocabulary normalization and similar methods to the version control repository 1 and the software bug repository 2 of the software defect tracking system in order to extract the bug information and the commit information.

Stop words in the software bug repository 2 are words that carry no actual semantics in Natural Language Processing (NLP), such as single letters and common auxiliary words; they occur frequently and are not distinctive, so they are removed during information extraction. Compound-word splitting breaks the text feature information in the software bug repository 2 into multiple words. One kind of text noise is the multiple surface forms of a single word: for example, "use", "using" and "used" are different forms of "use" with similar meanings in context, and vocabulary normalization unifies these forms. Case adjustment converts all words to lower case (or upper case), so that, for example, "STOP", "Stop" and "stop" are all written in the same form.

The commit information comprises commit description information, commit submission date, the source code files modified by the developer, and the commit submitter; the bug information comprises bug summary, bug description information, bug comments and bug submission time. These 4 kinds of bug information together capture the developers' understanding of the defect, and the first 3 are text features of the defect.
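The preprocessing steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the patent's implementation: the stop-word list is a tiny hypothetical sample, compound words are split on camelCase and snake_case boundaries, and a crude suffix-stripping rule stands in for a real stemmer or lemmatizer.

```python
import re

# Hypothetical, deliberately tiny stop-word list (a real system would use a full list).
STOP_WORDS = {"a", "an", "the", "is", "are", "to", "of", "in", "on", "and", "or", "when"}
# Crude suffix rules standing in for a real stemmer such as Porter's (assumption).
SUFFIXES = ("ing", "ed", "es", "s", "e")


def split_compound(token):
    """Split snake_case and camelCase/PascalCase identifiers into their parts."""
    parts = []
    for piece in re.split(r"_+", token):
        parts.extend(re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", piece))
    return parts


def normalize(word):
    """Strip the first matching suffix, keeping a stem of at least 2 letters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 2:
            return word[: -len(suffix)]
    return word


def preprocess(text):
    words = []
    for token in re.findall(r"[A-Za-z0-9_]+", text):
        for part in split_compound(token):
            w = part.lower()                   # case adjustment
            if len(w) < 2 or w in STOP_WORDS:  # drop single letters and stop words
                continue
            words.append(normalize(w))         # vocabulary normalization
    return words
```

For example, `preprocess("use using used")` maps all three forms to the same token, and `split_compound("NullPointerException")` yields `["Null", "Pointer", "Exception"]`. A production pipeline would more likely use an established stemmer and stop-word list.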
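As an illustration only, the two record types could be modeled as follows. The class and field names are hypothetical and simply mirror the fields listed above; the `text()` helper reflects the statement that the first three bug features are textual.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class CommitRecord:
    description: str         # commit description information
    date: datetime           # commit submission date
    modified_files: list     # source code files modified by the developer
    submitter: str           # commit submitter


@dataclass
class BugRecord:
    summary: str                                   # bug summary
    description: str                               # bug description information
    submitted: datetime                            # bug submission time
    comments: list = field(default_factory=list)   # bug comments

    def text(self):
        """Concatenate the three textual features (summary, description, comments)."""
        return " ".join([self.summary, self.description, *self.comments])
```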

Step 2: compute the bug information model with a Latent Dirichlet Allocation (LDA) topic model to obtain the bug-topic list of the bug information model; the calculation with the LDA model further comprises the following steps:

step 2.1: map each bug information in the bug information model onto the LDA model to obtain a bug mapping vector $p(z_i \mid \mathbf{z}_{-i}, \mathbf{w})$, which satisfies:

$$p(z_i = z \mid \mathbf{z}_{-i}, \mathbf{w}) \propto \frac{n_{-i,z}^{(w_i)} + \beta}{n_{-i,z}^{(\cdot)} + V\beta} \cdot \frac{n_{-i,d}^{(z)} + \alpha}{n_d - 1 + T\alpha}$$

wherein α and β are two vectors describing the mutual weight relationship of the multidimensional variables and satisfy α = β = 1/K; $w_i$ is the i-th word in the bug information; $z_i$ is the topic of that word in the bug information; T and V are, respectively, the number of topics z in the bug information and the total number of distinct words w in the bug information; d is the document in which the i-th word $w_i$ is located;

$n_{-i,z}^{(w)}$ is the number of times word w is assigned to topic z;

$n_{-i,z}^{(\cdot)}$ is the total number of times all words w are assigned to z;

$n_{-i,d}^{(z)}$ is the number of times z is assigned to words w in document d of the bug information; $n_d$ is the total number of words w contained in document d. The subscript -i indicates that the current word $w_i$ is excluded from the count.
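A minimal collapsed Gibbs sampler for the conditional in step 2.1 might look like the sketch below. Assumptions: documents are lists of integer word ids in [0, V), α and β are scalars, and the K in α = β = 1/K is taken to be the number of topics T. The document-side denominator is omitted because it does not depend on the candidate topic and cancels when the weights are normalized. The function name is hypothetical.

```python
import random


def gibbs_lda(docs, V, T, alpha, beta, iters=200, seed=0):
    """Collapsed Gibbs sampling for LDA; docs are lists of word ids in [0, V)."""
    rng = random.Random(seed)
    n_wz = [[0] * T for _ in range(V)]  # n_z^(w): times word w is assigned to topic z
    n_z = [0] * T                       # n_z^(.): total words assigned to topic z
    n_dz = [[0] * T for _ in docs]      # n_d^(z): words in document d assigned to z
    z = []                              # current topic of every token
    for d, doc in enumerate(docs):      # random initial assignment
        z.append([rng.randrange(T) for _ in doc])
        for w, t in zip(doc, z[d]):
            n_wz[w][t] += 1
            n_z[t] += 1
            n_dz[d][t] += 1
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                t = z[d][i]
                # Remove token i: the remaining counts are the "-i" counts.
                n_wz[w][t] -= 1
                n_z[t] -= 1
                n_dz[d][t] -= 1
                # Unnormalized p(z_i = k | z_-i, w); the document denominator is constant in k.
                weights = [
                    (n_wz[w][k] + beta) / (n_z[k] + V * beta) * (n_dz[d][k] + alpha)
                    for k in range(T)
                ]
                t = rng.choices(range(T), weights=weights)[0]
                # Add token i back under its newly sampled topic.
                n_wz[w][t] += 1
                n_z[t] += 1
                n_dz[d][t] += 1
                z[d][i] = t
    return z, n_dz
```

Usage: `z, n_dz = gibbs_lda(docs, V=4, T=2, alpha=0.5, beta=0.5)`. The returned `n_dz` holds the per-document topic counts needed for step 2.2. A fixed seed makes runs reproducible.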

Step 2.2: sample and iterate the bug mapping vector multiple times to obtain the topic probability distribution vector to which each word w in the bug information belongs; the topic probability distribution vector is $\theta_{d,z}$ and satisfies:

$$\theta_{d,z} = \frac{n_d^{(z)} + \alpha}{n_d + T\alpha}$$

wherein

$n_d^{(z)}$ is the number of words with topic z in document d of the bug information, and $n_d$ is the total number of words in document d of the bug information.
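Given the per-document topic counts $n_d^{(z)}$ obtained after sampling, the estimate of step 2.2 is a direct computation. A minimal sketch; the function name is hypothetical and α is assumed to be a scalar.

```python
def topic_distribution(n_dz_row, alpha):
    """theta_{d,z} = (n_d^(z) + alpha) / (n_d + T * alpha) for a single document d."""
    T = len(n_dz_row)   # number of topics
    n_d = sum(n_dz_row)  # total number of words in document d
    return [(count + alpha) / (n_d + T * alpha) for count in n_dz_row]
```

For counts `[3, 1]` and α = 0.5, the result is `[0.7, 0.3]`; the smoothing by α keeps every topic's probability strictly positive.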

Step 2.3: pair the topic probability distribution vector $\theta_{d,z}$ in the bug information one-to-one with the bug mapping vector $p(z_i \mid \mathbf{z}_{-i}, \mathbf{w})$ to obtain the bug-topic list of the bug information model.

Selecting the optimal solution further comprises setting the value of K, associating each bug information with one or more bug mapping vectors, and selecting the topic z with the highest probability for that bug information, so as to obtain the bug-topic list of the bug information model.
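The topic selection described above reduces to an argmax over each bug's topic distribution. This sketch keeps only the single most probable topic per bug (the text also allows one or more); the function name and the dictionary representation of the bug-topic list are assumptions.

```python
def bug_topic_list(bug_ids, thetas):
    """Map each bug id to the index of its most probable topic (argmax of theta_d)."""
    return {
        bug: max(range(len(theta)), key=theta.__getitem__)
        for bug, theta in zip(bug_ids, thetas)
    }
```

Ties resolve to the lower topic index, since `max` returns the first maximal element.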

Step 3: perform similarity matching between the commit information model and the bug information model to obtain the bug-fixer relation (bug-fixer) list corresponding to the bug information model; the similarity matching further comprises the following steps:

step 3.1: analyze the commit information in the commit information model that is most similar to the bug information to obtain the similar commit information, and count the number N of source code files modified by the developer in the similar commit information and the submission time $T_C$ of the similar commit. The larger the number N of modified source code files, the stronger the developer's ability to repair the bug; the smaller the difference between the commit time and the bug submission time, the more likely the commit submitter is to repair the bug.
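The source does not specify how the most similar commit is found. As an assumption, the sketch below uses cosine similarity over bag-of-words term counts of the preprocessed bug and commit texts, then reports N and $T_C$ for the best match. The tuple layout of a commit and the function names are hypothetical.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity of two term-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def most_similar_commit(bug_words, commits):
    """commits: list of (words, modified_files, commit_time, submitter) tuples.
    Returns (similarity, N, T_C, submitter) for the most similar commit."""
    bug_vec = Counter(bug_words)
    scored = [(cosine(bug_vec, Counter(c[0])), c) for c in commits]
    sim, (_, files, t_c, submitter) = max(scored, key=lambda s: s[0])
    return sim, len(files), t_c, submitter
```

Cosine similarity is a common choice for short texts because it ignores document length; a TF-IDF weighting would be a natural refinement.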

Step 3.2: compute a similarity weight $w_{(b,c)}$ from the similar commit information and the bug information, which satisfies:

[Equation image: formula for the similarity weight $w_{(b,c)}$; not recoverable from the source]

wherein b is the bug information and c is the commit information; $W_B$ is the bug description information in the bug information and $W_C$ is the commit description information in the commit information; N is the number of source code files modified by the developer in the similar commit information; $T_B$ is the submission time in the bug information; and $T_C$ is the commit time of the commit description information associated with the bug.

Step 3.3: select the optimal value of the similarity weight $w_{(b,c)}$ for each bug information and put it in one-to-one correspondence with that bug information, so as to obtain the bug-fixer list corresponding to the bug information model.

Step 4: map the bug-topic list and the bug-fixer list onto each other to obtain the fixer-topic relation list of the bug information model.

Step 5: compare the bug-topic list corresponding to any bug information in the bug information model against the fixer-topic list to match a suitable software defect repairer.
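Because the formula for $w_{(b,c)}$ is not recoverable from the source, the following sketch uses a HYPOTHETICAL weight that only respects the qualitative rules stated in step 3.1: it grows with text similarity and with N, and shrinks as the gap between $T_B$ and $T_C$ grows. The log damping, the exponential decay and the 30-day constant are arbitrary assumptions. Step 3.3 is then read as keeping the best-weighted commit submitter for each bug.

```python
import math
from datetime import datetime


def similarity_weight(text_sim, n_files, t_bug, t_commit, tau_days=30.0):
    """HYPOTHETICAL w_(b,c), not the patent's formula: increases with text
    similarity and with N (log-damped), decays with the gap |T_B - T_C| in days."""
    gap_days = abs((t_bug - t_commit).total_seconds()) / 86400.0
    return text_sim * math.log1p(n_files) * math.exp(-gap_days / tau_days)


def bug_fixer_list(candidates):
    """candidates: {bug_id: [(submitter, weight), ...]}.
    Keeps the submitter with the largest weight for each bug."""
    return {bug: max(pairs, key=lambda p: p[1])[0]
            for bug, pairs in candidates.items() if pairs}


# Example: a commit touching 3 files, submitted 9 days before the bug report.
example = similarity_weight(0.8, 3, datetime(2019, 6, 10), datetime(2019, 6, 1))
```

Any replacement formula can be dropped into `similarity_weight` without changing `bug_fixer_list`.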
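Step 4 can be read as a join of the two lists on the bug identifier. A minimal sketch, assuming the bug-topic and bug-fixer lists are dictionaries keyed by bug id (a representation not specified in the source); the function name is hypothetical.

```python
from collections import Counter, defaultdict


def fixer_topic_list(bug_topic, bug_fixer):
    """Join bug->topic and bug->fixer on bug id: fixer -> Counter(topic -> #bugs fixed)."""
    table = defaultdict(Counter)
    for bug, fixer in bug_fixer.items():
        if bug in bug_topic:  # skip bugs without a topic assignment
            table[fixer][bug_topic[bug]] += 1
    return dict(table)
```

Counting bugs per (fixer, topic) pair lets the later comparison favor developers with more experience in a topic.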
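One plausible reading of the assignment comparison, sketched below, ranks fixers by how many earlier bugs of the new bug's topic they fixed. The ranking rule, the cutoff `k` and the function name are assumptions, not the patent's stated procedure.

```python
def recommend_fixers(bug_topic_id, fixer_topics, k=3):
    """Rank fixers by the number of past bugs with this topic they fixed.
    fixer_topics: {fixer: {topic: count}}; ties are broken alphabetically."""
    scored = [(counts.get(bug_topic_id, 0), fixer) for fixer, counts in fixer_topics.items()]
    scored = [s for s in scored if s[0] > 0]    # drop fixers with no experience in the topic
    scored.sort(key=lambda s: (-s[0], s[1]))    # most experienced first
    return [fixer for _, fixer in scored[:k]]
```

Returning a short ranked list rather than a single name leaves room for a human triager to choose among close candidates.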

The invention also provides a software defect automatic assignment system based on version submission information, implemented on the basis of the above method. As shown in FIG. 2, the system comprises an information extraction module, a list construction module, a mapping module 3, an assignment comparison module 4 and a display module 5. The information extraction module is connected to the version control repository 1 and the software bug repository 2, extracts the commit information and the bug information, and establishes the commit information model and the bug information model, respectively. The list construction module is connected to the information extraction module and constructs the bug-topic list and the bug-fixer list of the bug information model. The mapping module 3 is connected to the list construction module and maps the bug-topic list and the bug-fixer list onto each other to obtain the fixer-topic list of the bug information model. The assignment comparison module 4 is connected to the list construction module and the mapping module 3 and compares the bug-topic list corresponding to the bug information against the fixer-topic list. The display module 5 is connected to the assignment comparison module 4 and displays the matched suitable software defect repairer.

The information extraction module comprises a commit information extraction module 6 and a bug information extraction module 7. The first end of the commit information extraction module 6 is connected to the version control repository 1 and the second end to the list construction module, to extract the commit information and construct the commit information model. The first end of the bug information extraction module 7 is connected to the software bug repository 2 and the second end to the list construction module, to extract the bug information and construct the bug information model.

The list construction module further comprises a topic list construction module 9 and a relation list construction module 8. The first end of the topic list construction module 9 is connected to the bug information extraction module 7, the second end to the mapping module 3 and the third end to the assignment comparison module 4, to construct the bug-topic list of the bug information model. The first end of the relation list construction module 8 is connected to the commit information extraction module 6 and the second end to the mapping module 3, to construct the bug-fixer list corresponding to the bug information model.

A topic generation model (LDA model) is also provided in the topic list construction module 9, and the bug-topic list is computed from the bug information model with the LDA model.
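The module wiring of FIG. 2 can be expressed as a small pipeline. In this hypothetical sketch each module is an injected callable, so any concrete extraction, LDA or matching routine can be plugged in; the class name, attribute names and call signatures are illustrative only.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class AssignmentSystem:
    """Hypothetical wiring of the FIG. 2 modules; each module is an injected callable."""
    extract_commits: Callable[[], list]            # commit information extraction module 6
    extract_bugs: Callable[[], list]               # bug information extraction module 7
    build_bug_topic: Callable[[list], dict]        # topic list construction module 9 (LDA)
    build_bug_fixer: Callable[[list, list], dict]  # relation list construction module 8
    map_fixer_topic: Callable[[dict, dict], dict]  # mapping module 3
    compare: Callable[[int, dict], list]           # assignment comparison module 4
    display: Callable[[str, list], None] = lambda bug, fixers: print(bug, "->", fixers)  # display module 5

    def run(self, new_bug_id):
        commits, bugs = self.extract_commits(), self.extract_bugs()
        bug_topic = self.build_bug_topic(bugs)
        bug_fixer = self.build_bug_fixer(bugs, commits)
        fixer_topic = self.map_fixer_topic(bug_topic, bug_fixer)
        fixers = self.compare(bug_topic[new_bug_id], fixer_topic)
        self.display(new_bug_id, fixers)
        return fixers
```

Injecting the modules as callables mirrors the block diagram's separation of concerns and makes each stage testable in isolation.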

The working principle of the invention is as follows:

Commit information is extracted from the version control repository and bug information from the software bug repository, and a commit information model and a bug information model are established, respectively. The bug information model is computed with the LDA model to obtain the bug-topic list of the bug information model. Similarity matching between the commit information model and the bug information model yields the bug-fixer list corresponding to the bug information model. The bug-topic list and the bug-fixer list are mapped onto each other to obtain the fixer-topic list of the bug information model. Finally, the bug-topic list corresponding to any bug information in the bug information model is compared against the fixer-topic list to match a suitable software defect repairer.

In conclusion, the method and system of the invention solve the problems of low source code file localization accuracy and the heavy workload of repairers: they fully mine the valuable information in the commit library, accurately locate source code files and their developers, effectively improve the accuracy of defect assignment, automate software defect assignment, and reduce the workload of repairers.

While the present invention has been described in detail with reference to the preferred embodiments, it should be understood that the above description should not be taken as limiting the invention. Various modifications and alterations to this invention will become apparent to those skilled in the art upon reading the foregoing description. Accordingly, the scope of the invention should be determined from the following claims.
