Dimension reduction comparison method for two large integers in big data analysis

Document No.: 1708491. Publication date: 2019-12-13.

Abstract: This technique, a dimension reduction comparison method for two large integers in big data analysis (一种大数据分析中两个大整数降维比较方法), was designed by 沈华, 张明武, 刘白 and 张依梦 on 2019-08-29. In big data applications, comparing two large integers is a very frequently used basic operation, and its efficiency directly affects the efficiency of the application as a whole. To compare two large integers efficiently, the invention first extracts a characteristic value from each of the two large integers and then obtains the comparison result of the two large integers by comparing their characteristic values. If a large integer is n bits long, the components of its characteristic value are on the order of $\log_2 n$ bits long. The invention thus converts the comparison of two n-bit numbers into a comparison of much shorter numbers, which improves the solving efficiency and efficiently solves the problem of comparing two large integers.

1. A dimension reduction comparison method for two large integers in big data analysis, wherein the two given large integers are A and B, both of binary length n, where n is close to or larger than the number of bits of the computer address bus, i.e. the sizes of A and B are close to or exceed the largest integer that can be stored in computer memory; the n binary bits of the large integer A are denoted $(a_{n-1}, a_{n-2}, \ldots, a_1, a_0)$, and the n binary bits of the large integer B are denoted $(b_{n-1}, b_{n-2}, \ldots, b_1, b_0)$;

characterized in that the method comprises the following steps:

Step 1: representing the large integer A and the large integer B in binary form as $(a_{n-1}, a_{n-2}, \ldots, a_1, a_0)$ and $(b_{n-1}, b_{n-2}, \ldots, b_1, b_0)$;

Step 2: comparing the large integer A and the large integer B based on their binary forms.

2. The dimension reduction comparison method for two large integers in big data analysis according to claim 1, wherein step 2 is implemented by the following sub-steps:

Step 2.1: extracting the characteristic value of each large integer, scanning from its highest bit toward the low bits; the extracted characteristic value of the large integer A is denoted $F_A = (i, k)$ and the extracted characteristic value of the large integer B is denoted $F_B = (j, t)$;

Step 2.2: according to the characteristic value $F_A = (i, k)$ of the large integer A and the characteristic value $F_B = (j, t)$ of the large integer B, making the following judgment:

(i) if $i < j$, the large integer A is smaller than the large integer B, and the comparison process ends;

(ii) if $i > j$, the large integer A is greater than the large integer B, and the comparison process ends;

(iii) if $i = j$, continuing by comparing $k$ and $t$:

a) if $k < t$, the large integer A is smaller than the large integer B, and the comparison process ends;

b) if $k > t$, the large integer A is greater than the large integer B, and the comparison process ends;

c) if ($k = t$ and $k = i + 1$) or ($k = t$ and $i - k = 0$), the large integer A equals the large integer B, and the comparison process ends; otherwise, the characteristic value $F_A = (i, k)$ of the large integer A and the characteristic value $F_B = (j, t)$ of the large integer B are re-extracted toward the low bits, starting from bit $i - k - 1$ of the large integer A and bit $j - t - 1$ of the large integer B, i.e. starting from the highest bit of the sequences $(a_{i-k-1}, a_{i-k-2}, \ldots, a_1, a_0)$ and $(b_{j-t-1}, b_{j-t-2}, \ldots, b_1, b_0)$, and step 2.2 is then repeated.

3. The dimension reduction comparison method for two large integers in big data analysis according to claim 2, wherein step 2.1 is implemented as follows:

according to the binary form $(a_{n-1}, a_{n-2}, \ldots, a_1, a_0)$ of the large integer A, the first bit with value 1 is found scanning from the high bits to the low bits; suppose it is $a_i$, i.e. $a_{n-1} = a_{n-2} = \cdots = a_{i+1} = 0$ and $a_i = 1$; the number of consecutive bits with value 1 starting at $a_i$ is then counted; suppose there are $k$ such bits, i.e. $a_i = a_{i-1} = \cdots = a_{i-(k-1)} = 1$ and $a_{i-k} = 0$; when $k = i + 1$, $a_i = a_{i-1} = \cdots = a_0 = 1$; if all bits are 0, i.e. $a_{n-1} = a_{n-2} = \cdots = a_0 = 0$, then $i = k = 0$; evidently, $i$ and $k$ in the characteristic value $F_A = (i, k)$ of the large integer A are both integers in the range $[0, n]$, whose binary length is $\log_2 n$; wherein $1 \le k < i + 1$;

according to the binary form $(b_{n-1}, b_{n-2}, \ldots, b_1, b_0)$ of the large integer B, the first bit with value 1 is found scanning from the high bits to the low bits; suppose it is $b_j$, i.e. $b_{n-1} = b_{n-2} = \cdots = b_{j+1} = 0$ and $b_j = 1$; the number of consecutive bits with value 1 starting at $b_j$ is then counted; suppose there are $t$ such bits, i.e. $b_j = b_{j-1} = \cdots = b_{j-(t-1)} = 1$ and $b_{j-t} = 0$; when $t = j + 1$, $b_j = b_{j-1} = \cdots = b_0 = 1$; if all bits are 0, i.e. $b_{n-1} = b_{n-2} = \cdots = b_0 = 0$, then $j = t = 0$; evidently, $j$ and $t$ in the characteristic value $F_B = (j, t)$ of the large integer B are both integers in the range $[0, n]$, whose binary length is $\log_2 n$; wherein $1 \le t < j + 1$;

through this characteristic value extraction, the comparison of two large integers A and B of binary length n is converted into a comparison of characteristic values whose components have binary length $\log_2 n$.

Technical Field

The invention belongs to the technical field of big data analysis, and relates to a dimension reduction comparison method for two big integers in big data analysis.

Background

The size of computer memory is determined by the number of bits of the address bus; if the address bus has 32 bits, the memory size is $2^{32}$. In this case, how can large integers be compared? Suppose two large integers in the range $[2^{9999}, 2^{10000}-1]$ are compared with a conventional method based on the idea of "divide and conquer", with each block of size $2^{20}$. Each integer is then divided into 500 blocks, the comparison requires 500 iterations, and the memory space used in each iteration is $(2 \times 2^{20})/8 = 2^{18}$ B, so neither the storage efficiency nor the computational efficiency is high. How to compare large integers efficiently is therefore a problem worth studying.
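The block-wise baseline mentioned above can be sketched as follows. This is only an illustration of the general "divide and conquer" idea, not code from the source: the function name `compare_blockwise` and its parameters are hypothetical, and the 20-bit block width in the usage line is an assumption chosen so that a 10000-bit integer splits into 500 blocks, matching the count in the text.

```python
def compare_blockwise(a: int, b: int, n: int, block_bits: int) -> int:
    """Compare two n-bit non-negative integers block by block, high blocks first.

    Returns -1, 0 or 1. Only one block of each integer is examined per iteration.
    """
    mask = (1 << block_bits) - 1
    num_blocks = -(-n // block_bits)  # ceiling division
    for idx in range(num_blocks - 1, -1, -1):
        shift = idx * block_bits
        block_a = (a >> shift) & mask  # current block of a
        block_b = (b >> shift) & mask  # current block of b
        if block_a != block_b:
            return -1 if block_a < block_b else 1
    return 0  # every block matched


# Assumed sizes: 10000-bit inputs, 20-bit blocks, hence up to 500 iterations.
x = 2**9999 + 5
y = 2**9999 + 7
print(compare_blockwise(x, y, n=10000, block_bits=20))  # -1
```

In the worst case (equal inputs, or inputs differing only in the lowest block) the loop visits every block, which is the cost the text attributes to this approach.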

Disclosure of Invention

In order to solve the above problems, the invention provides a dimension reduction comparison method for two large integers in big data analysis.

The technical scheme adopted by the invention is a dimension reduction comparison method for two large integers in big data analysis. The two given large integers are A and B, both of binary length n, where n is close to or greater than the number of bits of the computer address bus, i.e. A and B are close to or exceed the largest integer that can be stored in computer memory. The n binary bits of the large integer A are denoted $(a_{n-1}, a_{n-2}, \ldots, a_1, a_0)$, and the n binary bits of the large integer B are denoted $(b_{n-1}, b_{n-2}, \ldots, b_1, b_0)$.

The method is characterized by comprising the following steps:

Step 1: representing the large integer A and the large integer B in binary form as $(a_{n-1}, a_{n-2}, \ldots, a_1, a_0)$ and $(b_{n-1}, b_{n-2}, \ldots, b_1, b_0)$;

Step 2: comparing the large integer A and the large integer B based on their binary forms.

Preferably, step 2 is implemented by the following sub-steps:

Step 2.1: the characteristic value of each large integer is extracted, scanning from its highest bit toward the low bits. The extracted characteristic value of the large integer A is denoted $F_A = (i, k)$ and the extracted characteristic value of the large integer B is denoted $F_B = (j, t)$;

Step 2.2: according to the characteristic value $F_A = (i, k)$ of the large integer A and the characteristic value $F_B = (j, t)$ of the large integer B, the following judgment is made:

(i) If $i < j$, the large integer A is smaller than the large integer B, and the comparison process ends;

(ii) If $i > j$, the large integer A is greater than the large integer B, and the comparison process ends;

(iii) If $i = j$, continue by comparing $k$ and $t$:

a) If $k < t$, the large integer A is smaller than the large integer B, and the comparison process ends;

b) If $k > t$, the large integer A is greater than the large integer B, and the comparison process ends;

c) If ($k = t$ and $k = i + 1$) or ($k = t$ and $i - k = 0$), the large integer A equals the large integer B, and the comparison process ends; otherwise, the characteristic value $F_A = (i, k)$ of the large integer A and the characteristic value $F_B = (j, t)$ of the large integer B are re-extracted toward the low bits, starting from bit $i - k - 1$ of the large integer A and bit $j - t - 1$ of the large integer B, i.e. starting from the highest bit of the sequences $(a_{i-k-1}, a_{i-k-2}, \ldots, a_1, a_0)$ and $(b_{j-t-1}, b_{j-t-2}, \ldots, b_1, b_0)$, and step 2.2 is then repeated.

Preferably, step 2.1 is implemented as follows:

According to the binary form $(a_{n-1}, a_{n-2}, \ldots, a_1, a_0)$ of the large integer A, the first bit with value 1 is found scanning from the high bits to the low bits; suppose it is $a_i$, i.e. $a_{n-1} = a_{n-2} = \cdots = a_{i+1} = 0$ and $a_i = 1$. The number of consecutive bits with value 1 starting at $a_i$ is then counted; suppose there are $k$ ($1 \le k < i + 1$) such bits, i.e. $a_i = a_{i-1} = \cdots = a_{i-(k-1)} = 1$ and $a_{i-k} = 0$. When $k = i + 1$, this means $a_i = a_{i-1} = \cdots = a_0 = 1$. If all bits are 0 (i.e. $a_{n-1} = a_{n-2} = \cdots = a_0 = 0$), then $i = k = 0$. $F_A = (i, k)$ is defined as the characteristic value of the large integer A. Evidently, $i$ and $k$ in $F_A = (i, k)$ are both integers in the range $[0, n]$, whose binary length is $\log_2 n$.

According to the binary form $(b_{n-1}, b_{n-2}, \ldots, b_1, b_0)$ of the large integer B, the first bit with value 1 is found scanning from the high bits to the low bits; suppose it is $b_j$, i.e. $b_{n-1} = b_{n-2} = \cdots = b_{j+1} = 0$ and $b_j = 1$. The number of consecutive bits with value 1 starting at $b_j$ is then counted; suppose there are $t$ ($1 \le t < j + 1$) such bits, i.e. $b_j = b_{j-1} = \cdots = b_{j-(t-1)} = 1$ and $b_{j-t} = 0$. When $t = j + 1$, this means $b_j = b_{j-1} = \cdots = b_0 = 1$. If all bits are 0 (i.e. $b_{n-1} = b_{n-2} = \cdots = b_0 = 0$), then $j = t = 0$. $F_B = (j, t)$ is defined as the characteristic value of the large integer B. Evidently, $j$ and $t$ in $F_B = (j, t)$ are both integers in the range $[0, n]$, whose binary length is $\log_2 n$.

Through this characteristic value extraction, the comparison of two large integers A and B of binary length n is converted into a comparison of characteristic values whose components have binary length $\log_2 n$. For example, assume that $n = 1000$ and that A and B are two large integers in the range $[2^{999}, 2^{1000}-1]$; after the above conversion, their comparison becomes a comparison of integers in the far smaller range of the characteristic-value components.
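Steps 2.1 and 2.2 can be put together in a short sketch. It is a minimal illustration under stated assumptions rather than the patent's implementation: the names `extract_feature` and `compare_by_features` are hypothetical, and Python's built-in arbitrary-precision integers and bit operations stand in for the bit sequences, so the sketch shows the control flow only and says nothing about memory use.

```python
def extract_feature(x: int, top: int) -> tuple[int, int]:
    """Return (i, k) for bits top, top-1, ..., 0 of the non-negative integer x.

    i is the index of the highest 1 bit and k is the length of the run of 1s
    starting there; (0, 0) means all scanned bits are 0.
    """
    low = x & ((1 << (top + 1)) - 1)  # keep only bits top..0
    if low == 0:
        return (0, 0)
    i = low.bit_length() - 1
    k = 0
    while k <= i and (low >> (i - k)) & 1:
        k += 1
    return (i, k)


def compare_by_features(a: int, b: int, n: int) -> int:
    """Return -1, 0 or 1 as a < b, a == b or a > b, for n-bit inputs."""
    top = n - 1
    while True:
        i, k = extract_feature(a, top)   # F_A = (i, k)
        j, t = extract_feature(b, top)   # F_B = (j, t)
        if i != j:                       # cases (i) and (ii)
            return -1 if i < j else 1
        if k != t:                       # cases (iii) a) and b)
            return -1 if k < t else 1
        if k == i + 1 or i - k == 0:     # case (iii) c): nothing left to scan
            return 0
        top = i - k - 1                  # re-extract below the terminating 0


print(compare_by_features(0b1010, 0b1001, 4))  # 1
```

When the loop continues, $k = t$ with $1 \le k < i + 1$ and $i - k \ne 0$, so the next starting bit $i - k - 1$ is never negative and the scanned range strictly shrinks each round.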

Compared with the prior art, the method of the invention has the following advantages and beneficial effects:

The invention discloses a dimension reduction comparison method for two large integers in big data analysis. For the problem of large integer comparison in a big data application environment, the method first extracts the characteristic value of each large integer and then converts the comparison of the large integers into a comparison of the corresponding characteristic values. If the bit length of a large integer is n, the components of its characteristic value have bit length $\log_2 n$. For example, suppose two large integers in the range $[2^{9999}, 2^{10000}-1]$ are compared, and suppose the conventional method based on the idea of "divide and conquer" uses blocks of size $2^{20}$; each integer is then divided into 500 blocks, the comparison requires 500 iterations, and the memory space used in each iteration is $(2 \times 2^{20})/8 = 2^{18}$ B, so neither the storage efficiency nor the computational efficiency is high. With the dimension reduction conversion, the comparison becomes a comparison of integers in the much smaller range of the characteristic-value components, and the required memory space shrinks accordingly. The invention therefore achieves a marked dimension reduction of large integers and markedly improves the efficiency of large integer comparison.
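To make the size of a characteristic value concrete, the following sketch counts the bits needed to store one component. It rests on an assumption not stated in the source: the text gives the length as $\log_2 n$, and the sketch reads this as the number of bits needed for any integer in $[0, n]$. The helper name `feature_component_bits` is hypothetical.

```python
def feature_component_bits(n: int) -> int:
    """Bits needed to store any integer in [0, n], the range of i, k, j and t."""
    return max(1, n.bit_length())


for n in (1000, 10000):
    bits = feature_component_bits(n)
    print(f"n = {n}: each component fits in {bits} bits, a pair (i, k) in {2 * bits} bits")
# n = 1000: each component fits in 10 bits, a pair (i, k) in 20 bits
# n = 10000: each component fits in 14 bits, a pair (i, k) in 28 bits
```

This counts the storage for a single pair of characteristic values only; it does not account for the number of extraction rounds a given comparison needs.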

Drawings

FIG. 1: flow chart of an embodiment of the invention;

FIG. 2: schematic diagram 1 of the extraction of the characteristic value of the large integer A in an embodiment of the invention;

FIG. 3: schematic diagram 2 of the extraction of the characteristic value of the large integer A in an embodiment of the invention;

FIG. 4: schematic diagram 3 of the extraction of the characteristic value of the large integer A in an embodiment of the invention;

FIG. 5: schematic diagram 4 of the extraction of the characteristic value of the large integer A in an embodiment of the invention;

FIG. 6: schematic diagram 1 of the extraction of the characteristic value of the large integer B in an embodiment of the invention;

FIG. 7: schematic diagram 2 of the extraction of the characteristic value of the large integer B in an embodiment of the invention;

FIG. 8: schematic diagram 3 of the extraction of the characteristic value of the large integer B in an embodiment of the invention;

FIG. 9: schematic diagram 4 of the extraction of the characteristic value of the large integer B in an embodiment of the invention.

Detailed description of the invention

To help those of ordinary skill in the art understand and implement the invention, the invention is described in further detail below with reference to the accompanying drawings and examples. It is to be understood that the embodiments described here merely illustrate and explain the invention and do not restrict it.

Referring to FIG. 1, the dimension reduction comparison method for two large integers in big data analysis provided by the invention includes the following steps:

Step 1: representing the large integer A and the large integer B in binary form as $(a_{n-1}, a_{n-2}, \ldots, a_1, a_0)$ and $(b_{n-1}, b_{n-2}, \ldots, b_1, b_0)$;

Step 2: comparing the large integer A and the large integer B based on their binary forms.

Step 2.1: the characteristic value of each large integer is extracted, scanning from its highest bit toward the low bits.

Referring to FIGS. 2 and 3, according to the binary form $(a_{n-1}, a_{n-2}, \ldots, a_1, a_0)$ of the large integer A, the first bit with value 1 is found scanning from the high bits to the low bits; suppose it is $a_i$, i.e. $a_{n-1} = a_{n-2} = \cdots = a_{i+1} = 0$ and $a_i = 1$. The number of consecutive bits with value 1 starting at $a_i$ is then counted; suppose there are $k$ ($1 \le k < i + 1$) such bits, i.e. $a_i = a_{i-1} = \cdots = a_{i-(k-1)} = 1$ and $a_{i-k} = 0$.

Referring to FIG. 4, when $k = i + 1$, this means $a_i = a_{i-1} = \cdots = a_0 = 1$.

Referring to FIG. 5, if all the binary bits are 0 (i.e. $a_{n-1} = a_{n-2} = \cdots = a_0 = 0$), then $i = k = 0$.

Evidently, $i$ and $k$ in the characteristic value $F_A = (i, k)$ of the large integer A are both integers in the range $[0, n]$, whose binary length is $\log_2 n$.
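The three situations just described (a run of 1s ended by a 0, a run that reaches bit 0, and an all-zero input) can be checked on short bit strings. This is an illustrative sketch only: the function name `extract_feature_from_bits` and the 8-bit inputs are made up here, and the string representation, written high bit first, is an assumption chosen for readability.

```python
def extract_feature_from_bits(bits: str) -> tuple[int, int]:
    """Return (i, k) for a bit string written high bit first, e.g. '00110100'."""
    n = len(bits)
    first_one = bits.find("1")
    if first_one == -1:
        return (0, 0)                     # all bits are 0: i = k = 0
    i = n - 1 - first_one                 # index of the first 1, counted from bit 0
    run = bits[first_one:]
    k = len(run) - len(run.lstrip("1"))   # length of the run of 1s starting at a_i
    return (i, k)


examples = {
    "00110100": "general case: the run of 1s is ended by a 0",
    "00011111": "the run reaches bit 0, so k = i + 1",
    "00000000": "all bits are 0",
}
for bits, note in examples.items():
    print(bits, extract_feature_from_bits(bits), note)
# 00110100 (5, 2) general case: the run of 1s is ended by a 0
# 00011111 (4, 5) the run reaches bit 0, so k = i + 1
# 00000000 (0, 0) all bits are 0
```

The same routine applies unchanged to the large integer B, yielding $(j, t)$.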

Referring to FIGS. 6 and 7, according to the binary form $(b_{n-1}, b_{n-2}, \ldots, b_1, b_0)$ of the large integer B, the first bit with value 1 is found scanning from the high bits to the low bits; suppose it is $b_j$, i.e. $b_{n-1} = b_{n-2} = \cdots = b_{j+1} = 0$ and $b_j = 1$. The number of consecutive bits with value 1 starting at $b_j$ is then counted; suppose there are $t$ ($1 \le t < j + 1$) such bits, i.e. $b_j = b_{j-1} = \cdots = b_{j-(t-1)} = 1$ and $b_{j-t} = 0$.

Referring to FIG. 8, when $t = j + 1$, this means $b_j = b_{j-1} = \cdots = b_0 = 1$.

Referring to FIG. 9, if all the binary bits are 0 (i.e. $b_{n-1} = b_{n-2} = \cdots = b_0 = 0$), then $j = t = 0$.

Evidently, $j$ and $t$ in the characteristic value $F_B = (j, t)$ of the large integer B are both integers in the range $[0, n]$, whose binary length is $\log_2 n$.

Step 2.2: according to the characteristic value $F_A = (i, k)$ of the large integer A and the characteristic value $F_B = (j, t)$ of the large integer B, the following judgment is made:

(i) If $i < j$, the large integer A is smaller than the large integer B, and the comparison process ends;

(ii) If $i > j$, the large integer A is greater than the large integer B, and the comparison process ends;

(iii) If $i = j$, continue by comparing $k$ and $t$:

a) If $k < t$, the large integer A is smaller than the large integer B, and the comparison process ends;

b) If $k > t$, the large integer A is greater than the large integer B, and the comparison process ends;

c) If ($k = t$ and $k = i + 1$) or ($k = t$ and $i - k = 0$), the large integer A equals the large integer B, and the comparison process ends; otherwise, the characteristic value $F_A = (i, k)$ of the large integer A and the characteristic value $F_B = (j, t)$ of the large integer B are re-extracted toward the low bits, starting from bit $i - k - 1$ of the large integer A and bit $j - t - 1$ of the large integer B, i.e. starting from the highest bit of the sequences $(a_{i-k-1}, a_{i-k-2}, \ldots, a_1, a_0)$ and $(b_{j-t-1}, b_{j-t-2}, \ldots, b_1, b_0)$, and step 2.2 is then repeated.
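As a small worked instance of the rounds described in step 2.2 (the inputs are chosen here for illustration and do not come from the source), take $n = 4$, $A = 1010_2 = 10$ and $B = 1001_2 = 9$:

$$
\begin{aligned}
\text{Round 1: } & F_A = (3, 1),\ F_B = (3, 1) \;\Rightarrow\; i = j,\ k = t,\ k \neq i + 1,\ i - k = 2 \neq 0,\\
& \text{so extraction restarts at bit } i - k - 1 = 1.\\
\text{Round 2: } & (a_1, a_0) = (1, 0),\ (b_1, b_0) = (0, 1) \;\Rightarrow\; F_A = (1, 1),\ F_B = (0, 1),\\
& i > j \;\Rightarrow\; A > B.
\end{aligned}
$$

The result agrees with $10 > 9$. Two rounds are needed because the two integers share the same leading run of 1s and the same terminating 0.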

The invention provides a dimension reduction comparison method for the problem of comparing two large integers in a big data application environment. The characteristic values of the two large integers are first extracted, and the comparison result of the two large integers is then obtained by comparing these characteristic values. If a large integer is n bits long, the components of the characteristic value proposed by the invention are $\log_2 n$ bits long. The invention thus converts the comparison of two n-bit numbers into a comparison of much shorter numbers, which improves the solving efficiency and efficiently solves the problem of comparing two large integers in a big data application environment.

The method can be used for searching, sorting, matching and other applications in a big data environment, and is highly practical.

It should be understood that parts of the specification not set forth in detail are prior art; the above description of the preferred embodiments is intended to be illustrative, and not to be construed as limiting the scope of the invention, which is defined by the appended claims, and all changes and modifications that fall within the metes and bounds of the claims, or equivalences of such metes and bounds are therefore intended to be embraced by the appended claims.
