Method and device for determining contribution degree of open source of code, computer equipment and medium

Document No.: 152863 Publication date: 2021-10-26

Reading note: this technology, "Method and device for determining contribution degree of open source of code, computer equipment and medium", was created by 杨占栋, 李昱, 王全礼, 张晨, 张美伟 and 范钟艺 on 2021-07-30. Its main content is as follows. The invention discloses a method, a device, computer equipment and a medium for determining the open source contribution degree of code, and relates to the technical field of automatic program design. The method comprises: extracting the basic contribution features of the code; extracting the features of the code reuse influence; extracting the features of the code's influence on development; and determining the open source contribution degree of the code according to these three kinds of features. The invention can determine the open source contribution degree of code efficiently and accurately.

1. A method for determining the open source contribution degree of code, characterized by comprising the following steps:

extracting basic contribution features of the code;

extracting the characteristics of the code reuse influence;

extracting the characteristics of the code on development influence;

and determining the open source contribution degree of the code according to the basic contribution characteristics of the code, the characteristics of the code reuse influence and the characteristics of the code on the development influence.

2. The method for determining the open source contribution degree of code according to claim 1, wherein extracting the basic contribution features of the code comprises: determining the basic contribution features of the code according to the total number of lines of code submitted and the number of times the code was submitted.

3. The method for determining the open source contribution degree of code according to claim 2, wherein determining the basic contribution features of the code according to the total number of lines of code submitted and the number of times the code was submitted comprises determining the basic contribution features of the code according to the following formula:

C1=α×LoC(d)+β×NoC(d)+γ;

wherein C1 is the basic contribution feature of the code, LoC(d) is the total number of lines of code submitted, NoC(d) is the number of times the code was submitted, and α, β and γ are known parameter values.

4. The method of determining code open source contribution of claim 1, wherein extracting features of code reuse impact comprises:

determining the ranking of each function in the code based on the method of PageRank;

the characteristics of the code reuse impact are determined according to the ranking of each function.

5. The method of claim 4, wherein determining the rank of each function in the code based on the PageRank method comprises determining the rank of each function according to the following formula:

wherein F_i denotes the i-th function in the code, PR(F_i) is the ranking of the i-th function in the code, S(F_ji) denotes the set of all functions that call F_i, F_j is the j-th function in the set S(F_ji), PR(F_j) is the ranking of the j-th function in that set, n_j corresponds to the j-th function that calls F_i, N denotes the total number of functions, and α is a known parameter value.

6. The method of claim 4, wherein determining the characteristics of the code reuse impact based on the ranking of each function comprises determining the characteristics of the code reuse impact according to the following formula:

wherein C2 is the feature of the code reuse influence, SD(F_i) is the set of all functions in the code submitted by the developer, F_i denotes a function in the code, and PR(F_i) is the ranking of that function.

7. The method for determining the contribution of the code to the open source of claim 1, wherein extracting the characteristics of the code on the development influence comprises:

acquiring a log text data vector submitted each time a code is submitted;

performing text encoding on the log text data vector;

knowledge encoding is carried out on the log text data vector;

and combining the text codes and the knowledge codes to obtain combined codes, classifying the combined codes by using softmax regression, and identifying the text category as the characteristics of the code on development influence.

8. The method for determining the contribution of the code to the open source of claim 7, wherein the combining text coding and knowledge coding to obtain combined coding, classifying the combined coding by using softmax regression, and identifying the text category as the characteristic of the code on the development influence comprises determining the characteristic of the code on the development influence according to the following formula:

C3=softmax(e);

wherein C3 is the feature of the code's influence on development, e = [p; q] is the combined code, p is the knowledge code, and q is the text code.

9. The method for determining the degree of contribution of the code according to claim 1, wherein determining the degree of contribution of the code according to the basic contribution feature of the code, the feature of the code reuse influence, and the feature of the code for development influence comprises determining the degree of contribution of the code according to the following formula:

wherein S_d is the open source contribution degree of the code, w_i and b are known parameter values, and C_i comprises the basic contribution feature, the feature of the code reuse influence and the feature of the code's influence on development.

10. A device for determining an open-source contribution degree of a code, comprising:

a first extraction unit for extracting basic contribution features of the code;

a second extraction unit for extracting features of the code reuse influence;

a third extraction unit for extracting the characteristics of the code on the development influence;

and a determining unit for determining the open source contribution degree of the code according to the basic contribution characteristics of the code, the characteristics of the code reuse influence and the characteristics of the code on the development influence.

11. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the method of any of claims 1 to 9 when executing the computer program.

12. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program for executing the method of any one of claims 1 to 9.

Technical Field

The present invention relates to the field of automatic programming technologies, and in particular, to a method and an apparatus for determining a contribution degree of an open source of a code, a computer device, and a medium.

Background

This section is intended to provide a background or context to the embodiments of the invention that are recited in the claims. The description herein is not admitted to be prior art by inclusion in this section.

Developers contribute code to the repositories of software projects, and their contributions are often characterized by simple metrics such as the number of submissions or the number of lines of code. GitHub, currently one of the most popular open source software hosting platforms, simply uses the number of submissions to rank the developers of a project. The Expertise Browser, a classic tool for identifying developer expertise, uses the number of changed lines of code as its indicator, which measures the quantity of code contributed rather than its value. For example, the core functionality of the application logic may be more valuable than an auxiliary script, even though it may contain far less code.

In many cases, it is desirable to compare the contribution values of different developers. Traditional value-based software engineering focuses on creating economic value as a way to prioritize resource allocation and scheduling, although other measures of value may be more relevant in some cases. In open source software projects, developer contributions strongly influence collaboration, coordination and leadership, yet the value of those contributions is currently measured only in the simplest and most basic ways, such as the number of submissions, the number of lines of code submitted and the economic value created by the project.

Disclosure of Invention

The embodiment of the invention provides a method for determining an open source contribution degree, which is used for efficiently and accurately determining the open source contribution degree of a code, and comprises the following steps:

extracting basic contribution features of the code;

extracting the characteristics of the code reuse influence;

extracting the characteristics of the code on development influence;

and determining the open source contribution degree of the code according to the basic contribution characteristics of the code, the characteristics of the code reuse influence and the characteristics of the code on the development influence.

In one embodiment, extracting the basic contribution features of the code may include: determining the basic contribution features of the code according to the total number of lines of code submitted and the number of times the code was submitted.

In one embodiment, determining the base contribution features of the code based on the total number of lines of code submitted and the number of times the code was submitted may include determining the base contribution features of the code according to the following formula:

C1=α×LoC(d)+β×NoC(d)+γ;

wherein C1 is the basic contribution feature of the code, LoC(d) is the total number of lines of code submitted, NoC(d) is the number of times the code was submitted, and α, β and γ are known parameter values.

In one embodiment, extracting the characteristics of the code reuse impact may include:

determining the ranking of each function in the code based on the method of PageRank;

the characteristics of the code reuse impact are determined according to the ranking of each function.

In one embodiment, determining the rank of each function in the code based on the PageRank method may include determining the rank of each function according to the following formula:

wherein F_i denotes the i-th function in the code, PR(F_i) is the ranking of the i-th function in the code, S(F_ji) denotes the set of all functions that call F_i, F_j is the j-th function in the set S(F_ji), PR(F_j) is the ranking of the j-th function in that set, n_j corresponds to the j-th function that calls F_i, N denotes the total number of functions, and α is a known parameter value.

In one embodiment, determining the characteristics of the code reuse impact based on the ranking of each function may include:

wherein C2 is the feature of the code reuse influence, SD(F_i) is the set of all functions in the code submitted by the developer, F_i denotes a function in the code, and PR(F_i) is the ranking of that function.

In one embodiment, extracting the features of the code on the development impact may include:

acquiring a log text data vector submitted each time a code is submitted;

performing text encoding on the log text data vector;

knowledge encoding is carried out on the log text data vector;

and combining the text codes and the knowledge codes to obtain combined codes, classifying the combined codes by using softmax regression, and identifying the text category as the characteristics of the code on development influence.

In one embodiment, combining the text encoding and the knowledge encoding to obtain a combined encoding, classifying the combined encoding using softmax regression, and identifying the text category as the characteristic of the code on the development impact may include determining the characteristic of the code on the development impact according to the following formula:

C3=softmax(e);

wherein C3 is the feature of the code's influence on development, e = [p; q] is the combined code, p is the knowledge code, and q is the text code.

In one embodiment, determining the code open source contribution degree according to the basic contribution characteristics of the code, the characteristics of the code reuse influence and the characteristics of the code on the development influence may include determining the code open source contribution degree according to the following formula:

wherein S_d is the open source contribution degree of the code, w_i and b are known parameter values, and C_i comprises the basic contribution feature, the feature of the code reuse influence and the feature of the code's influence on development.

The embodiment of the invention also provides a device for determining the open source contribution degree, which is used for efficiently and accurately determining the open source contribution degree of the code, and comprises the following components:

a first extraction unit for extracting basic contribution features of the code;

a second extraction unit for extracting features of the code reuse influence;

a third extraction unit for extracting the characteristics of the code on the development influence;

and a determining unit for determining the open source contribution degree of the code according to the basic contribution characteristics of the code, the characteristics of the code reuse influence and the characteristics of the code on the development influence.

In one embodiment, the first extraction unit is specifically configured to: determine the basic contribution characteristics of the code according to the total number of lines of code submitted and the number of times the code was submitted.

In one embodiment, the first extraction unit is specifically configured to determine the basic contribution feature of the code according to the following formula:

C1=α×LoC(d)+β×NoC(d)+γ;

wherein C1 is the basic contribution feature of the code, LoC(d) is the total number of lines of code submitted, NoC(d) is the number of times the code was submitted, and α, β and γ are known parameter values.

In an embodiment, the second extracting unit is specifically configured to:

determining the ranking of each function in the code based on the method of PageRank;

the characteristics of the code reuse impact are determined according to the ranking of each function.

In an embodiment, the second extracting unit is specifically configured to determine the rank of each function according to the following formula:

wherein F_i denotes the i-th function in the code, PR(F_i) is the ranking of the i-th function in the code, S(F_ji) denotes the set of all functions that call F_i, F_j is the j-th function in the set S(F_ji), PR(F_j) is the ranking of the j-th function in that set, n_j corresponds to the j-th function that calls F_i, N denotes the total number of functions, and α is a known parameter value.

In an embodiment, the second extracting unit is specifically configured to determine the feature of the code reuse influence according to the following formula:

wherein C2 is the feature of the code reuse influence, SD(F_i) is the set of all functions in the code submitted by the developer, F_i denotes a function in the code, and PR(F_i) is the ranking of that function.

In an embodiment, the third extraction unit is specifically configured to perform the following:

acquiring a log text data vector submitted each time a code is submitted;

performing text encoding on the log text data vector;

knowledge encoding is carried out on the log text data vector;

and combining the text codes and the knowledge codes to obtain combined codes, classifying the combined codes by using softmax regression, and identifying the text category as the characteristics of the code on development influence.

In an embodiment, the third extraction unit is specifically configured to determine the feature of the code's influence on development according to the following formula:

C3=softmax(e);

wherein C3 is the feature of the code's influence on development, e = [p; q] is the combined code, p is the knowledge code, and q is the text code.

The determining unit is specifically configured to determine the code open source contribution degree according to the following formula:

wherein S_d is the open source contribution degree of the code, w_i and b are known parameter values, and C_i comprises the basic contribution feature, the feature of the code reuse influence and the feature of the code's influence on development.

The embodiment of the present invention further provides a computer device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, and when the processor executes the computer program, the method for determining the open-source contribution degree of the code is implemented.

The embodiment of the invention also provides a computer readable storage medium, which stores a computer program for executing the method for determining the code open source contribution degree.

In the embodiment of the invention, the determination scheme of the code open source contribution degree is realized by the following steps: extracting basic contribution features of the code; extracting the characteristics of the code reuse influence; extracting the characteristics of the code on development influence; and determining the code open source contribution degree according to the basic contribution characteristics of the code, the characteristics of the code reuse influence and the characteristics of the code on the development influence, so that the code open source contribution degree can be efficiently and accurately determined.

Drawings

To illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings used in describing the embodiments or the prior art are briefly introduced below. The drawings described below show only some embodiments of the present invention; those skilled in the art can obtain other drawings from them without creative effort. In the drawings:

FIG. 1 is a flowchart illustrating a method for determining an open-source contribution according to an embodiment of the present invention;

FIG. 2 is a schematic flow chart of extracting features of code reuse influence according to an embodiment of the present invention;

FIG. 3 is a schematic flow chart illustrating the process of extracting the characteristics of the code on the development influence according to the embodiment of the present invention;

FIG. 4 is a schematic structural diagram of an apparatus for determining an open-source contribution degree according to an embodiment of the present invention;

FIG. 5 is a schematic structural diagram of a computer device according to an embodiment of the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the embodiments of the present invention are further described in detail below with reference to the accompanying drawings. The exemplary embodiments and descriptions of the present invention are provided to explain the present invention, but not to limit the present invention.

Before describing embodiments of the present invention, terms related to the embodiments of the present invention will be described.

1. Code is a source file written by a programmer in a language supported by a development tool; it is a set of explicit rules for representing information in discrete form using characters, symbols or signal elements. The principles of code design include uniqueness, standardization and generality, extensibility and stability, ease of recognition and memorization, brevity and uniform format, and ease of modification.

2. Open source is short for open source code. Open source allows users to modify and learn from the source code, but open source systems are still copyrighted and protected by law. Open source software on the market emerges endlessly, and many people consider its most obvious characteristic to be that it is free. In reality, its most obvious characteristic is openness: anyone can obtain the source code of the software, modify it, learn from it and even redistribute it, within the limits of the copyright.

Open source systems serve two groups of users. The first is programmers, who care most about whether the source code can be further developed and reused. The second is ordinary end users, who care only about whether the software's functionality is powerful enough. The core of open source is openness: acceptance, inclusiveness and development, seeking common ground while preserving differences, and mutual benefit. This is the essence of open source.

When using an open source product, users must state that the product comes from open source software and credit the authors of the source code, and modified products must be contributed back to the open source software; otherwise the use is regarded as infringement. At present, piracy is rampant domestically: even closed source software is illegally pirated and its copyright notices falsified, while falsifying the copyright notice of open source software takes no more than a simple search-and-replace. Weak copyright awareness is the biggest obstacle to the development of open source domestically.

Open source started late in China but is developing quickly, and it will certainly become the mainstream of the industry in the future. Software that claims to be open source while encrypting its core code will provoke resentment. The true meaning of open source is that the source code is used for modification and learning; once this is understood, counterfeiting, infringement and illegal acts will necessarily decrease. Open source does not merely mean that the program's source code is open.

With the development of electronic commerce, online shopping has become increasingly popular: about one quarter of the 300 million internet users have shopped online. This large consumer market has also led more and more small and medium-sized companies and individual online merchants to set up their own online shops, especially independent online shops and company e-commerce platforms, in order to build their own independent online shop brands and to manage and expand their online promotion and marketing channels.

3. The contribution degree, also called the contribution rate, is an index for analyzing economic benefit. It is the ratio of the amount of useful results to the resources consumed and occupied, that is, the ratio of output to input, or of what is obtained to what is spent.

The embodiment of the invention provides a scheme for determining (evaluating) the open source contribution degree of code, which comprises the following steps: (1) extracting the basic contribution features of the code; (2) extracting the features of the code reuse influence; (3) extracting the features of the code's influence on development; (4) fusing the basic contribution features, the code reuse influence features and the development influence features, and calculating the score of the open source contribution degree using multiple linear regression. Compared with manual evaluation or simple calculation methods, this scheme can evaluate the open source contribution degree of code more accurately. The scheme is described in detail below.
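The fusion in step (4) can be illustrated with a minimal sketch, assuming the multiple linear regression takes the usual form S_d = Σ w_i × C_i + b. The function name `contribution_score`, the example feature values and the parameter values are all hypothetical, and C3 is treated here as a single numeric value, which is an assumption.

```python
def contribution_score(features, weights, bias):
    """Return S_d = sum_i(w_i * C_i) + b for one developer (assumed linear form)."""
    if len(features) != len(weights):
        raise ValueError("features and weights must have the same length")
    return sum(w * c for w, c in zip(weights, features)) + bias


# Hypothetical values for C1, C2, C3 and for the known parameters w_i and b.
features = [12.0, 0.35, 2.0]
weights = [0.5, 10.0, 1.5]
bias = 1.0
score = contribution_score(features, weights, bias)
print(score)
```

In practice the weights and bias would be fitted beforehand, for example against manually assessed contribution scores; the source only states that they are known parameter values.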

Fig. 1 is a schematic flow chart of a method for determining an open-source contribution degree in an embodiment of the present invention, as shown in fig. 1, the method includes the following steps:

step 101: extracting basic contribution features of the code;

step 102: extracting the characteristics of the code reuse influence;

step 103: extracting the characteristics of the code on development influence;

step 104: and determining the open source contribution degree of the code according to the basic contribution characteristics of the code, the characteristics of the code reuse influence and the characteristics of the code on the development influence.

In the embodiment of the invention, the determination scheme of the code open source contribution degree is realized by the following steps: extracting basic contribution features of the code; extracting the characteristics of the code reuse influence; extracting the characteristics of the code on development influence; and determining the code open source contribution degree according to the basic contribution characteristics of the code, the characteristics of the code reuse influence and the characteristics of the code on the development influence, so that the code open source contribution degree can be efficiently and accurately determined. The individual steps involved in the method are described in detail below.

First, step 101 above, extracting the basic contribution feature of the code, is described.

For a developer, denoted d, the basic contribution feature C1 consists of two parts: the first is the total number of lines of code submitted, LoC(d), and the second is the number of times the code was submitted, NoC(d).

That is, in one embodiment, extracting the basic contribution features of the code may include: determining the basic contribution features of the code according to the total number of lines of code submitted and the number of times the code was submitted.

In one embodiment, determining the base contribution features of the code based on the total number of lines of code submitted and the number of times the code was submitted may include determining the base contribution features of the code according to the following formula:

C1=α×LoC(d)+β×NoC(d)+γ;

wherein C1 is the basic contribution feature of the code, LoC(d) is the total number of lines of code submitted, NoC(d) is the number of times the code was submitted, and α, β and γ are known parameter values.
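As a worked illustration of the formula C1 = α×LoC(d) + β×NoC(d) + γ, the following minimal sketch evaluates it for one developer. The function name `basic_contribution` and all numeric values are hypothetical; the source does not give values for α, β or γ.

```python
def basic_contribution(loc, noc, alpha, beta, gamma):
    """C1 = alpha * LoC(d) + beta * NoC(d) + gamma."""
    return alpha * loc + beta * noc + gamma


# Hypothetical developer: 1200 lines submitted across 30 submissions.
c1 = basic_contribution(loc=1200, noc=30, alpha=0.01, beta=0.5, gamma=1.0)
print(c1)
```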

When the method is specifically implemented, the basic contribution characteristics of the code can be efficiently and accurately determined, and further the code open source contribution degree can be efficiently and accurately determined.

Second, step 102 above, extracting the feature of the code reuse influence, is described.

In one embodiment, as shown in fig. 2, extracting the characteristics of the code reuse impact may include:

step 1021: determining the ranking of each function in the code based on the method of PageRank;

step 1022: the characteristics of the code reuse impact are determined according to the ranking of each function.

The function call graph data of the code is obtained using an existing tool, and the ranking of each function is calculated based on the PageRank method. That is, in one embodiment, determining the ranking of each function in the code based on the PageRank method may include determining the ranking of each function according to the following formula:

wherein F_i denotes the i-th function in the code, PR(F_i) is the ranking of the i-th function in the code, S(F_ji) denotes the set of all functions that call F_i, F_j is the j-th function in the set S(F_ji), PR(F_j) is the ranking of the j-th function in that set, n_j corresponds to the j-th function that calls F_i, N denotes the total number of functions, and α is a known parameter value.
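A minimal sketch of ranking functions with PageRank over a call graph follows. It assumes the standard PageRank form PR(F_i) = (1 - α)/N + α × Σ PR(F_j)/n_j, summed over the functions F_j that call F_i, and it reads n_j as the number of distinct functions called by F_j. Both are assumptions, as are the name `function_pagerank`, the default α = 0.85, the example call graph, and the even redistribution of rank held by functions that call nothing.

```python
def function_pagerank(calls, alpha=0.85, iterations=100, tol=1e-10):
    """Rank functions by power iteration.

    calls maps each function name to the list of functions it calls.
    """
    functions = sorted(set(calls) | {g for callees in calls.values() for g in callees})
    n_total = len(functions)
    callees = {f: sorted(set(calls.get(f, []))) for f in functions}
    rank = {f: 1.0 / n_total for f in functions}
    for _ in range(iterations):
        # Rank held by functions that call nothing is spread evenly (assumed convention).
        dangling = sum(rank[f] for f in functions if not callees[f])
        base = (1.0 - alpha) / n_total + alpha * dangling / n_total
        new_rank = {f: base for f in functions}
        for caller in functions:
            for callee in callees[caller]:
                # Each caller passes an equal share of its rank to every function it calls.
                new_rank[callee] += alpha * rank[caller] / len(callees[caller])
        change = sum(abs(new_rank[f] - rank[f]) for f in functions)
        rank = new_rank
        if change < tol:
            break
    return rank


# Hypothetical call graph: main calls parse and render, and both call util.
call_graph = {"main": ["parse", "render"], "parse": ["util"], "render": ["util"], "util": []}
ranks = function_pagerank(call_graph)
print(ranks)
```

Under these assumptions a heavily reused helper such as `util` ends up ranked above the functions that merely call it.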

Then, for developer d, the accumulated score of all functions in the code that the developer submitted is the developer's final score, that is, the feature of the code reuse influence. Specifically, in one embodiment, determining the feature of the code reuse influence according to the ranking of each function may include:

wherein C2 is the feature of the code reuse influence, SD(F_i) is the set of all functions in the code submitted by the developer, F_i denotes a function in the code, and PR(F_i) is the ranking of that function.
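Since the text describes the feature as the accumulated value of the scores of all functions the developer submitted, a minimal sketch is a plain sum of PR(F_i) over the set SD(F_i). The function name `reuse_influence` and the example rankings are hypothetical, and ignoring functions that have no ranking is an assumption.

```python
def reuse_influence(function_ranks, developer_functions):
    """C2: sum of PR(F_i) over the functions submitted by one developer."""
    return sum(function_ranks[f] for f in developer_functions if f in function_ranks)


# Hypothetical rankings and a developer who submitted parse and util.
function_ranks = {"main": 0.1, "parse": 0.2, "render": 0.2, "util": 0.5}
c2 = reuse_influence(function_ranks, {"parse", "util"})
print(c2)
```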

In specific implementation, the ranking method of each function and the method for determining the characteristics of the code reuse influence can efficiently and accurately determine the characteristics of the code reuse influence, and further can efficiently and accurately determine the code open source contribution degree.

Third, step 103 above, extracting the feature of the code's influence on development, is described.

Each time code is submitted, a submission log is written; this text data is encoded and then classified. The process mainly comprises an input layer, text encoding, knowledge encoding and, finally, text category prediction.

1) Input layer (Input Embedding)

The input layer comprises two parts: a short text sequence of length n and an entity sequence of length m. Three kinds of embedding vectors are used: character embeddings, word embeddings and concept embeddings. A CNN is used at the character level, and pre-trained word vectors are used at the word and concept levels.

2) Text Encoding

The role of this module is to compute the sentence representation q of the text x = (x_1, x_2, …, x_n). Before self-attention is applied, a BiLSTM is added to transform the underlying input. The attention mechanism generates its output vector as a weighted sum, so its representational capability is limited; a BiLSTM, by contrast, is good at capturing the contextual information of a sequence and can further improve the expressive power of the attention network.

The output of the BiLSTM is then passed through a self-attention mechanism (scaled dot-product attention).

The matrix output by the attention mechanism is denoted a, and a max pooling layer is then used to obtain the sentence representation q; the aim is to select the maximum value in each dimension of the vectors so as to capture the most important features.
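The self-attention and max pooling steps can be sketched in plain Python as follows. The sketch assumes that queries, keys and values are all the BiLSTM output vectors and that scores are scaled by the square root of the vector dimension, which is the usual scaled dot-product form; the BiLSTM itself is omitted, and `hidden_states` stands in for its output with hypothetical values.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(hidden):
    """Scaled dot-product attention with Q = K = V = hidden (n rows, d columns)."""
    d = len(hidden[0])
    output = []
    for query in hidden:
        scores = [sum(x * y for x, y in zip(query, key)) / math.sqrt(d) for key in hidden]
        weights = softmax(scores)
        # Each output row is a weighted sum of all rows of hidden.
        output.append([sum(w * row[j] for w, row in zip(weights, hidden)) for j in range(d)])
    return output


def max_pool(matrix):
    """Take the maximum of every column to obtain one vector."""
    return [max(column) for column in zip(*matrix)]


# Hypothetical BiLSTM output for a text of three tokens.
hidden_states = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
attended = self_attention(hidden_states)
q = max_pool(attended)
print(q)
```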

3) Knowledge Encoding

Given a concept set C of size m, denoted (c_1, c_2, …, c_m), where c_i is the i-th concept vector, its vector representation p needs to be obtained. Two attention mechanisms, C-ST (Concept towards Short Text) and C-CS (Concept towards Concept Set), are used to pay more attention to important concepts.

C-ST (Concept towards Short Text) computes the semantic relevance between the text and its corresponding concept set, reducing the adverse effect of incorrect concepts introduced by entity ambiguity or KB (knowledge base) noise.

α_i denotes the attention weight of the i-th concept c_i towards the text; the larger α_i is, the more semantically relevant the concept is to the short text. f(·) is a non-linear activation function, for example the hyperbolic tangent tanh, and softmax is used to normalize the attention weight of each concept. W_1 is a weight matrix, w_1 is a weight vector, and b_1 is the offset.
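A minimal sketch of C-ST attention follows, assuming the weights take the form α_i = softmax(w_1^T f(W_1 [c_i; q] + b_1)), where [c_i; q] is the concatenation of the concept vector and the sentence representation and f is tanh. The concatenation, the scalar offset, the function name `cst_attention` and all parameter values are assumptions made for illustration.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def cst_attention(concepts, q, W1, w1, b1):
    """Attention weight of each concept towards the text (assumed form)."""
    scores = []
    for c in concepts:
        joined = c + q  # list concatenation, standing in for [c_i; q]
        hidden = [math.tanh(sum(w * x for w, x in zip(row, joined)) + b1) for row in W1]
        scores.append(sum(w * h for w, h in zip(w1, hidden)))
    return softmax(scores)


# Hypothetical concept vectors, sentence representation and parameters.
concepts = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]
q = [1.0, 0.0]
W1 = [[1.0, 0.0, 1.0, 0.0], [0.0, 1.0, 0.0, 1.0]]
w1 = [1.0, -1.0]
b1 = 0.0
alpha = cst_attention(concepts, q, W1, w1, b1)
print(alpha)
```

With these hypothetical parameters the first concept, which points in the same direction as q, receives the largest weight.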

C-CS (Concept towards Concept Set) computes the importance of each concept within the concept set.

β_i denotes the attention weight of the ith concept c_i with respect to the entire concept set; W_2 is a weight matrix, w_2 is a weight vector, and b_2 is the offset. The effect of C-CS attention is similar to feature selection: it is a "soft" feature selection that assigns larger weights to important concepts and smaller weights (close to zero) to less important concepts. α_i and β_i are combined as follows:

The combined weight is the final attention weight of the ith concept with respect to the text. γ ∈ [0, 1] acts as a "soft switch" that balances the two weights α_i and β_i. There are two ways to set the value of γ: treat γ as a hyperparameter and tune it manually to the optimum, or let γ take part in the training of the neural network so that it is adjusted automatically.

Currently, the second method is adopted, and γ is computed as follows:

γ = σ(w^T [α; β] + b)

where w and b are parameters to be learned and σ is the sigmoid function.

Finally, the weighted sum of the concept vectors is computed to obtain the semantic vector p representing the concepts. That is, in one embodiment, the knowledge encoding is determined according to the following formula:

where c_i denotes the ith concept vector and m denotes the total number of concepts.
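The knowledge encoding described above can be sketched as follows. The exact formulas are not legible in this text, so the forms used here are assumptions consistent with the symbols that are defined: α_i = softmax(w_1^T tanh(W_1 [c_i; q] + b_1)), β_i = softmax(w_2^T tanh(W_2 c_i + b_2)), γ = σ(w^T [α; β] + b), a final weight obtained by normalizing γ·α_i + (1 − γ)·β_i, and p as the weighted sum of the concept vectors. Scalar offsets, the function names, and all parameter values are hypothetical.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def attention(inputs, W, w, b):
    """Assumed form: softmax over i of w^T tanh(W x_i + b), with a scalar offset b."""
    raw = [dot(w, [math.tanh(dot(row, x) + b) for row in W]) for x in inputs]
    return softmax(raw)

def knowledge_encoding(concepts, q, P):
    # C-ST: each concept is scored against the text via the concatenation [c_i; q].
    alpha = attention([c + q for c in concepts], P["W1"], P["w1"], P["b1"])
    # C-CS: each concept is scored within the concept set.
    beta = attention(concepts, P["W2"], P["w2"], P["b2"])
    # Learned soft switch gamma = sigmoid(w^T [alpha; beta] + b).
    gamma = 1.0 / (1.0 + math.exp(-(dot(P["w"], alpha + beta) + P["b"])))
    # Assumed combination: normalized convex mix of the two weights.
    final = softmax([gamma * a + (1.0 - gamma) * bt for a, bt in zip(alpha, beta)])
    dim = len(concepts[0])
    p = [sum(a * c[k] for a, c in zip(final, concepts)) for k in range(dim)]
    return p, final, gamma

# Toy data: m = 3 concepts of dimension 2, text encoding q of dimension 2.
concepts = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
q = [0.2, 0.8]
params = {
    "W1": [[0.1, -0.2, 0.3, 0.4], [0.0, 0.5, -0.1, 0.2]], "w1": [0.7, -0.3], "b1": 0.0,
    "W2": [[0.2, 0.1], [-0.4, 0.3]], "w2": [0.5, 0.5], "b2": 0.0,
    "w": [0.1] * 6, "b": 0.0,
}
p, weights, gamma = knowledge_encoding(concepts, q, params)
```

In a trained model the parameters would be learned jointly with the rest of the network; here they are fixed only so that the sketch runs.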

4) Text category prediction

The text encoding and the knowledge encoding are merged to generate a new encoding e, namely:

e = [p; q]

The merged encoding is classified using softmax regression to identify the specific category. That is, in one embodiment, merging the text encoding and the knowledge encoding to obtain the merged encoding, classifying the merged encoding using softmax regression, and taking the identified text category as the feature of the code's impact on development may include determining that feature according to the following formula:

C_3 = softmax(e)

where C_3 is the feature of the code's impact on development, e = [p; q] is the merged encoding, p is the knowledge encoding, and q is the text encoding.

Example categories include fixing errors, making improvements, creating new functionality, and maintaining documentation.
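The classification step can be sketched as below. The source writes C_3 = softmax(e); since it also names softmax regression, this sketch assumes a linear map (weights W and offsets b, both hypothetical) applied to e before the softmax. The category labels paraphrase the examples in the text, and the function name classify is an assumption.

```python
import math

# Labels paraphrased from the example categories; the exact label set is an assumption.
CATEGORIES = ["fix error", "improvement", "new feature", "documentation"]

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def classify(p, q, W, b):
    """Softmax regression over the merged encoding e = [p; q]."""
    e = p + q  # list concatenation plays the role of [p; q]
    logits = [sum(w * x for w, x in zip(row, e)) + b_k for row, b_k in zip(W, b)]
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return CATEGORIES[best], probs

# Toy values: identity weights so that the logits equal e itself.
p_vec = [0.4, 0.6]
q_vec = [1.0, 0.0]
W = [[1.0, 0.0, 0.0, 0.0],
     [0.0, 1.0, 0.0, 0.0],
     [0.0, 0.0, 1.0, 0.0],
     [0.0, 0.0, 0.0, 1.0]]
b = [0.0, 0.0, 0.0, 0.0]
label, probs = classify(p_vec, q_vec, W, b)
```

With these toy values the largest logit is the third one, so the predicted label is "new feature".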

From the above, in one embodiment, as shown in Fig. 3, extracting the features of the code's impact on development may include:

Step 1031: acquiring the log text data vector submitted with each code submission;

Step 1032: performing text encoding on the log text data vector;

Step 1033: performing knowledge encoding on the log text data vector;

Step 1034: merging the text encoding and the knowledge encoding to obtain a merged encoding, classifying the merged encoding using softmax regression, and taking the identified text category as the feature of the code's impact on development.

In specific implementation, the above method for extracting the features of the code's impact on development can determine those features efficiently and accurately, and in turn allows the code open source contribution degree to be determined efficiently and accurately.

Fourth, step 104 is described: the basic contribution features of the code, the features of the code reuse impact and the features of the code's impact on development are fused, and the score of the open source contribution degree is calculated using multiple linear regression.

In one embodiment, determining the code open source contribution degree according to the basic contribution features of the code, the features of the code reuse impact and the features of the code's impact on development may include determining the code open source contribution degree according to the following formula:

where S_d is the code open source contribution degree, w_i and b are known parameter values, and C_i ranges over the basic contribution feature, the feature of the code reuse impact and the feature of the code's impact on development.
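The fusion step can be illustrated with a small sketch. Since the formula itself is not legible here, the sketch assumes the usual linear regression form S_d = Σ w_i·C_i + b and treats each C_i as a scalar (for C_3, for instance, a numeric value assigned to the predicted category). The function name and all numbers are hypothetical.

```python
def contribution_score(features, weights, bias):
    """Assumed form: S_d = sum_i w_i * C_i + b, with scalar features."""
    if len(features) != len(weights):
        raise ValueError("features and weights must have the same length")
    return sum(w * c for w, c in zip(weights, features)) + bias

# Hypothetical values for C_1 (basic), C_2 (reuse impact), C_3 (development impact).
features = [120.0, 0.35, 1.0]
weights = [0.01, 2.0, 0.5]
score = contribution_score(features, weights, bias=0.1)
```

In practice the weights and the offset would be fitted to labelled data rather than chosen by hand.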

In specific implementation, determining the code open source contribution degree in this way, from the basic contribution features of the code, the features of the code reuse impact and the features of the code's impact on development, allows the code open source contribution degree to be determined efficiently and accurately.

The embodiment of the present invention further provides a device for determining the code open source contribution degree, as described in the following embodiments. Because the principle by which the device solves the problem is similar to that of the method for determining the code open source contribution degree, the implementation of the device can refer to the implementation of the method, and repeated description is omitted.

Fig. 4 is a schematic structural diagram of a device for determining the code open source contribution degree in an embodiment of the present invention. As shown in Fig. 4, the device includes:

a first extraction unit 01 for extracting basic contribution features of the code;

a second extraction unit 02 for extracting features of the code reuse impact;

a third extraction unit 03 for extracting features of the code's impact on development;

and a determining unit 04 for determining the code open source contribution degree according to the basic contribution features of the code, the features of the code reuse impact, and the features of the code's impact on development.

In one embodiment, the first extraction unit is specifically configured to determine the basic contribution features of the code according to the total number of lines of submitted code and the number of code submissions.

In one embodiment, the first extraction unit is specifically configured to determine the basic contribution features of the code according to the following formula:

C_1 = α × LoC(d) + β × NoC(d) + γ

where C_1 is the basic contribution feature of the code, LoC(d) is the total number of lines of code submitted, NoC(d) is the number of code submissions, and α, β and γ are known parameter values.
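As an illustration of the formula above, the following sketch computes C_1 for one developer from a list of commit records. The Commit record, the field names, and the parameter values are hypothetical; only the formula C_1 = α × LoC(d) + β × NoC(d) + γ comes from the text.

```python
from dataclasses import dataclass

@dataclass
class Commit:
    author: str
    lines_changed: int  # lines of code submitted in this commit

def basic_contribution(commits, developer, alpha, beta, gamma):
    """C_1 = alpha * LoC(d) + beta * NoC(d) + gamma for developer d."""
    own = [c for c in commits if c.author == developer]
    loc = sum(c.lines_changed for c in own)  # LoC(d): total lines submitted
    noc = len(own)                           # NoC(d): number of submissions
    return alpha * loc + beta * noc + gamma

history = [Commit("alice", 100), Commit("alice", 50), Commit("bob", 30)]
c1_alice = basic_contribution(history, "alice", alpha=0.01, beta=0.5, gamma=0.0)
```

For this toy history, LoC(alice) = 150 and NoC(alice) = 2, so C_1 = 0.01 × 150 + 0.5 × 2 = 2.5.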

In an embodiment, the second extraction unit is specifically configured to:

determine the rank of each function in the code using the PageRank method;

determine the features of the code reuse impact according to the rank of each function.

In an embodiment, the second extracting unit is specifically configured to determine the rank of each function according to the following formula:

where F_i denotes the ith function in the code, PR(F_i) is the rank of the ith function in the code, S(F_ji) denotes the set of all functions that call F_i, F_j is the jth function in the set S(F_ji), PR(F_j) is the rank of the jth function in the set S(F_ji), n_j is the call count associated with the calling function F_j, N is the total number of functions, and α is a known parameter value.

In an embodiment, the second extraction unit is specifically configured to determine the features of the code reuse impact according to the following formula:

where C_2 is the feature of the code reuse impact, SD(F_i) is the set of all functions in the code submitted by the developer, F_i denotes a function in the code, and PR(F_i) is the rank of that function.
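The ranking and the reuse feature can be sketched as follows. The formulas are not legible in this text, so the sketch assumes the standard PageRank iteration on the call graph, PR(F_i) = α · Σ PR(F_j)/n_j + (1 − α)/N, with n_j taken as the number of functions that F_j calls, and it assumes C_2 is the sum of the ranks of the developer's functions. Functions that call nothing spread their rank evenly, which is a common convention and not something stated in the source. All names and the toy call graph are hypothetical.

```python
def function_pagerank(calls, alpha=0.85, iterations=100):
    """calls maps each function name to the list of functions it calls."""
    functions = set(calls) | {g for callees in calls.values() for g in callees}
    n_total = len(functions)
    rank = {f: 1.0 / n_total for f in functions}
    for _ in range(iterations):
        new_rank = {f: (1.0 - alpha) / n_total for f in functions}
        for caller in functions:
            callees = calls.get(caller, [])
            if callees:
                # A caller passes its rank on to the functions it calls.
                share = alpha * rank[caller] / len(callees)
                for callee in callees:
                    new_rank[callee] += share
            else:
                # Assumption: a function with no callees spreads its rank evenly.
                share = alpha * rank[caller] / n_total
                for f in functions:
                    new_rank[f] += share
        rank = new_rank
    return rank

def reuse_impact(rank, developer_functions):
    """Assumed form of C_2: sum of PR(F_i) over the developer's functions."""
    return sum(rank[f] for f in developer_functions)

# Toy call graph: main calls parse and util, parse calls util.
calls = {"main": ["parse", "util"], "parse": ["util"], "util": []}
rank = function_pagerank(calls)
c2 = reuse_impact(rank, ["parse", "util"])
```

In this toy graph util is called by both other functions and therefore receives the highest rank, so a developer who wrote util gets a larger reuse feature than one who wrote only main.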

In an embodiment, the third extraction unit is specifically configured to:

acquire the log text data vector submitted with each code submission;

perform text encoding on the log text data vector;

perform knowledge encoding on the log text data vector;

merge the text encoding and the knowledge encoding to obtain a merged encoding, classify the merged encoding using softmax regression, and take the identified text category as the feature of the code's impact on development.

In an embodiment, the third extraction unit is specifically configured to determine the feature of the code's impact on development according to the following formula:

C_3 = softmax(e)

where C_3 is the feature of the code's impact on development, e = [p; q] is the merged encoding, p is the knowledge encoding, and q is the text encoding.

The determining unit is specifically configured to determine the code open source contribution degree according to the following formula:

where S_d is the code open source contribution degree, w_i and b are known parameter values, and C_i ranges over the basic contribution feature, the feature of the code reuse impact and the feature of the code's impact on development.

An embodiment of the present invention further provides a computer device, as shown in Fig. 5, including a memory 302, a processor 304, and a computer program stored in the memory and executable on the processor, where the processor, when executing the computer program, implements any of the above methods for determining the code open source contribution degree.

In particular, the computer device may be a computer terminal, a server or a similar computing device.

An embodiment of the present invention further provides a computer-readable storage medium storing a computer program for executing any of the above methods for determining the code open source contribution degree.

In particular, computer-readable storage media, including both persistent and non-persistent, removable and non-removable media, may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, modules of a program, or other data. Examples of computer-readable storage media include, but are not limited to, phase change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, a computer-readable storage medium does not include transitory computer-readable media such as modulated data signals and carrier waves.

In the embodiment of the invention, the determination scheme of the code open source contribution degree is realized by the following steps: extracting basic contribution features of the code; extracting the characteristics of the code reuse influence; extracting the characteristics of the code on development influence; and determining the code open source contribution degree according to the basic contribution characteristics of the code, the characteristics of the code reuse influence and the characteristics of the code on the development influence, so that the code open source contribution degree can be efficiently and accurately determined.

As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.

The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

The above-mentioned embodiments are intended to illustrate the objects, technical solutions and advantages of the present invention in further detail, and it should be understood that the above-mentioned embodiments are only exemplary embodiments of the present invention, and are not intended to limit the scope of the present invention, and any modifications, equivalent substitutions, improvements and the like made within the spirit and principle of the present invention should be included in the scope of the present invention.
