DNA data storage coding method

Document No.: 1127735. Published: 2020-10-02.

Abstract (DNA data storage coding method, designed by 任兆瑞 on 2020-06-24): The invention discloses a DNA data storage coding method. The method first converts an irrational number into a binary key string, then performs a logical operation between the binary plaintext and the binary key to obtain binary ciphertext, and finally converts the binary ciphertext, two bits at a time, into the DNA sequence to be stored. The binary key string can be produced either by converting the decimal irrational number directly into binary and removing the radix point, or by removing the decimal point from the decimal irrational number and changing each odd digit to 1 and each even digit to 0. The method is intended to overcome the tendency of generated DNA sequences to contain many repeated segments, which makes them unusable, and the encrypted result is described as unlikely to be cracked, so that the advantages of DNA digital storage can be realized.

1. A DNA data storage encoding method, comprising the steps of:

step S1, converting an irrational number into a binary key string;

step S2, performing a bitwise logical operation on the binary plaintext and the binary key string obtained in step S1 to obtain binary ciphertext;

and step S3, converting the binary ciphertext obtained in step S2 sequentially, two bits at a time, mapping the four combinations 00, 01, 10 and 11 to the four DNA bases A/T/G/C to form a DNA sequence.

2. The DNA data storage encoding method of claim 1, wherein the binary key string in step S1 is obtained by converting the decimal irrational number directly into a binary number and removing the radix point.

3. The DNA data storage encoding method of claim 1, wherein the binary key string in step S1 is obtained by removing the decimal point from the decimal irrational number and then changing each odd digit to 1 and each even digit to 0.

4. The method for encoding a DNA data storage according to claim 1, wherein the logical operation in step S2 is an exclusive OR operation.

5. The method for encoding DNA data storage according to claim 1, wherein the logical operation in step S2 is an exclusive-NOR (XNOR) operation.

Technical Field

The invention discloses a coding method for storing data in DNA, the carrier of biological genetic information, and belongs to the technical fields of biotechnology and information technology.

Background

Human information such as text, sound and images has been stored on many kinds of carriers: in ancient times oracle bones, carved stone, silk, parchment, bamboo slips and paper; in modern times records, tapes, floppy disks, compact discs, hard disks and so on. In the 1950s, scientists confirmed DNA (deoxyribonucleic acid) as the carrier of genetic information in organisms. Compared with conventional storage carriers, it has natural advantages for data storage: its storage density is high, and 1 gram of DNA can store the contents of all the books in the world; it is stable over long periods, from tens of thousands to millions of years; and it is easy to carry, whether inside bacteria or other living organisms or in a container for long-term storage.

DNA is an important carrier of genetic material. It is a linear or circular double-helix biological macromolecule produced by billions of years of evolution, and its structure consists of two strands whose bases are complementary to each other. Traditional carriers such as paper represent information mainly as graphics and text; carriers such as optical discs represent it as binary 0/1 signals; DNA represents it as an ordered arrangement of the four bases A/T/G/C, where different sequences carry different information, which is equivalent to a quaternary (base-4) system.

As in the DNA of living organisms generally, DNA used as an information carrier has a special requirement: the A/T/G/C bases should be distributed in roughly uniform proportions, and long repeated segments should be kept to a minimum. The 0s and 1s of electrical and magnetic signals are realized by the presence or absence of current or by opposite magnetic pole directions, so repeated sequences have no effect on storing or copying the information. DNA, however, is replicated by a biological mechanism, and a long repeat, whether of a single base (e.g. 100 consecutive A's) or of a short segment (e.g. 100 consecutive copies of ACTT), can cause errors such as recombination or mismatches in later steps, which seriously compromises DNA as an information store.

Binary source information can be turned into a DNA sequence by generating one A/T/G/C base for every two bits, and the corresponding physical DNA can then be produced by chemical synthesis and PCR to store the information. However, most information contains many repeated segments. Without some form of transcoding, the resulting DNA sequence will contain many repeated segments, which makes the DNA molecule very difficult to synthesize and replicate. DNA synthesis and sequencing technologies are now mature and their costs continue to fall sharply, so this obstacle to DNA data storage urgently needs to be solved in order for DNA digital storage to be widely adopted.

Disclosure of Invention

Technical problem to be solved by the invention

The invention provides a DNA data storage coding method, aiming at solving the problem that more repeated sequences are generated when the existing binary original information is stored in DNA.

Technical scheme

In order to solve the technical problems, the invention adopts the following technical scheme:

A DNA data storage encoding method comprising the steps of:

step 1, converting an irrational number into a binary key string;

step 2, performing a bitwise logical operation on the binary plaintext and the binary key string obtained in step 1 to obtain binary ciphertext;

and step 3, converting the binary ciphertext obtained in step 2 sequentially, two bits at a time, mapping the four combinations 00, 01, 10 and 11 to the four DNA bases A/T/G/C to form a DNA sequence.

Further, the binary key string in step 1 is obtained either by converting the decimal irrational number directly into a binary number and removing the radix point, or by removing the decimal point from the decimal irrational number and then changing each odd digit to 1 and each even digit to 0.

Further, the logical operation in step 2 is an exclusive-OR (XOR) operation or an exclusive-NOR (XNOR) operation.

Advantageous effects

Compared with the prior art, the technical scheme provided by the invention has the following beneficial effects:

The coding method overcomes the defect that generated DNA sequences tend to contain many repeated segments and therefore cannot be used, and the encrypted DNA sequence is unlikely to be cracked;

applying the coding method of the invention can help popularize DNA digital storage and bring its advantages into play.

Drawings

FIG. 1 is a block flow diagram of a data storage method of the present invention;

FIG. 2 is a diagram illustrating an example of a process for storing the Chinese character "Hua" using the method of the present invention;

FIG. 3 is a diagram illustrating an example of a process for storing a Chinese character "one" using the method of the present invention.

Detailed Description

For a further understanding of the invention, reference should be made to the following detailed description taken in conjunction with the accompanying drawings.

As shown in FIG. 1, the first step of the method converts an irrational number into a binary key string. Usable irrational numbers include, but are not limited to, the constant π, the natural base e and similar constants (formula image BDA0002555166830000021 in the original), as well as numbers derived from them by functions, such as 2π/3 (formula image BDA0002555166830000022 in the original);

taking π as an example, its first twenty bits in binary, obtained by the usual decimal-to-binary conversion rule, are 11.001001000011111101; removing the radix point gives 11001001000011111101, which is used as the random key string;
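The direct conversion can be sketched as follows. This is a minimal illustration rather than the inventor's implementation: the function name `direct_binary_key` is hypothetical, and the sketch relies on Python's double-precision `math.pi`, so only roughly the first 50 bits are trustworthy. A longer key would need an arbitrary-precision value of π.

```python
import math


def direct_binary_key(x: float, n_bits: int) -> str:
    """Return the first n_bits of x in binary with the radix point removed."""
    if x < 0 or n_bits <= 0:
        raise ValueError("expects a non-negative number and a positive bit count")
    integer_part = int(x)
    bits = bin(integer_part)[2:]       # integer part, e.g. 3 -> "11"
    fraction = x - integer_part
    while len(bits) < n_bits:
        fraction *= 2                  # move the next fractional bit left of the point
        bit = int(fraction)
        bits += str(bit)
        fraction -= bit
    return bits[:n_bits]


print(direct_binary_key(math.pi, 20))  # 11001001000011111101
```

The printed string matches the twenty-bit key quoted in the text.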

Another way is to change each odd digit of the decimal irrational number to 1 and each even digit to 0 to obtain a random key string, for example:

Decimal digits of π: 31415926535897932

Corresponding random key: 11011100111011110
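The parity rule is simple enough to state in a few lines. The sketch below assumes the digits are supplied as a string; the helper name `parity_key` is hypothetical, and skipping the decimal point inside the helper is a convenience of this sketch.

```python
def parity_key(digits: str) -> str:
    """Map each decimal digit to 1 if it is odd and 0 if it is even.

    Characters that are not decimal digits (such as the decimal point)
    are skipped, which corresponds to removing the decimal point first.
    """
    return "".join(str(int(ch) % 2) for ch in digits if ch in "0123456789")


print(parity_key("3.1415926535897932"))  # 11011100111011110
```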

In application, the irrational number used to generate the binary key string may be expressed in base 10, base 2 itself, base 4, base 8 or another base; it only needs to be converted to base 2 according to the corresponding conversion rule.

The second step of the method performs a bitwise logical operation between the plaintext and the binary key string. This embodiment uses logical XOR and logical XNOR. The XOR rule is: 0^0 = 0; 0^1 = 1; 1^0 = 1; 1^1 = 0. The XNOR rule, written here with the symbol |, is: 0|0 = 1; 0|1 = 0; 1|0 = 0; 1|1 = 1;

in practical applications, since an irrational number has infinitely many digits (π, for example, has currently been computed to trillions of digits), the number of digits taken can be chosen to match the length of the actual binary source information. Furthermore, the random code sequence need not be read from the first digit onward: for example, it may start from a specified position, run in forward or reverse order, or take every other digit or every few digits.
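These selection options can be illustrated with a small helper. This is only one possible reading of the options listed above: the function `select_key_digits` and its parameters `start`, `step` and `reverse` are hypothetical names, and here "reverse" is taken to mean reading the whole digit string backwards before applying `start` and `step`.

```python
def select_key_digits(digits: str, length: int, start: int = 0,
                      step: int = 1, reverse: bool = False) -> str:
    """Pick `length` digits, beginning at index `start` and taking every
    `step`-th digit, optionally after reversing the digit string."""
    if length < 0 or start < 0 or step < 1:
        raise ValueError("length and start must be >= 0 and step must be >= 1")
    source = digits[::-1] if reverse else digits
    chosen = source[start::step][:length]
    if len(chosen) < length:
        raise ValueError("not enough digits for the requested key length")
    return chosen


PI_DIGITS = "31415926535897932"   # the 17 digits quoted in the text

print(select_key_digits(PI_DIGITS, 8))                # 31415926
print(select_key_digits(PI_DIGITS, 8, start=4))       # 59265358
print(select_key_digits(PI_DIGITS, 8, step=2))        # 34525599
print(select_key_digits(PI_DIGITS, 8, reverse=True))  # 23979853
```

The selected digits would then be turned into key bits by whichever conversion rule is in use. Sender and receiver must agree on the same selection parameters for decoding to work.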

To simplify the explanation, 8 bits of binary information are used for illustration. For example, a represents the original binary information, b represents the binary string of the random code key, and c represents the binary ciphertext.

Here, four different pieces of original information a1, a2, a3 and a4 are taken and encoded with the key string b obtained by taking the parity of the first 8 digits of π:

a1=00000000,a2=11111111,a3=10001110,a4=01010101,b=11011100

(1) The encoding operation is XOR, a ^ b = c; the restoration (decoding) operation is also XOR, c ^ b = a.

Encoding:

a1 ^ b = c1, bitwise result: 00000000 ^ 11011100 = 11011100

a2 ^ b = c2, bitwise result: 11111111 ^ 11011100 = 00100011

a3 ^ b = c3, bitwise result: 10001110 ^ 11011100 = 01010010

a4 ^ b = c4, bitwise result: 01010101 ^ 11011100 = 10001001

Restoration:

c1 ^ b = a1, bitwise result: 11011100 ^ 11011100 = 00000000

c2 ^ b = a2, bitwise result: 00100011 ^ 11011100 = 11111111

c3 ^ b = a3, bitwise result: 01010010 ^ 11011100 = 10001110

c4 ^ b = a4, bitwise result: 10001001 ^ 11011100 = 01010101
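The XOR example can be checked with a short script. It is a sketch that represents the bit strings as Python `str` values; the helper name `xor_bits` is an assumption of this illustration.

```python
def xor_bits(a: str, b: str) -> str:
    """Bitwise XOR of two equal-length strings of '0'/'1' characters."""
    if len(a) != len(b):
        raise ValueError("plaintext and key must have the same length")
    return "".join("1" if x != y else "0" for x, y in zip(a, b))


key = "11011100"
for plain in ("00000000", "11111111", "10001110", "01010101"):
    cipher = xor_bits(plain, key)
    # Applying the same key a second time restores the plaintext.
    assert xor_bits(cipher, key) == plain
    print(plain, "^", key, "=", cipher)
```

Because XOR with the same key is its own inverse, a single function serves for both encoding and restoration.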

(2) The encoding operation is XNOR (written | here), a | b = c; the restoration (decoding) operation is also XNOR, c | b = a.

Encoding:

a1 | b = c1, bitwise result: 00000000 | 11011100 = 00100011

a2 | b = c2, bitwise result: 11111111 | 11011100 = 11011100

a3 | b = c3, bitwise result: 10001110 | 11011100 = 10101101

a4 | b = c4, bitwise result: 01010101 | 11011100 = 01110110

Restoration:

c1 | b = a1, bitwise result: 00100011 | 11011100 = 00000000

c2 | b = a2, bitwise result: 11011100 | 11011100 = 11111111

c3 | b = a3, bitwise result: 10101101 | 11011100 = 10001110

c4 | b = a4, bitwise result: 01110110 | 11011100 = 01010101
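The second example follows the truth table 0,0 -> 1; 0,1 -> 0; 1,0 -> 0; 1,1 -> 1, which is the XNOR (equivalence) operation. A sketch that reproduces the figures above, with the hypothetical helper name `xnor_bits`:

```python
def xnor_bits(a: str, b: str) -> str:
    """Bitwise XNOR: the output bit is 1 where the two input bits are equal."""
    if len(a) != len(b):
        raise ValueError("plaintext and key must have the same length")
    return "".join("1" if x == y else "0" for x, y in zip(a, b))


key = "11011100"
for plain in ("00000000", "11111111", "10001110", "01010101"):
    cipher = xnor_bits(plain, key)
    # XNOR with the same key is also self-inverse.
    assert xnor_bits(cipher, key) == plain
    print(plain, "XNOR", key, "=", cipher)
```

Note that Python's own `|` operator is bitwise OR, which is not invertible. The `|` in the worked example is the document's notation for XNOR.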

As the examples show, encoding transforms the original sequence, whether it is all 0s, all 1s, or already a fairly even mix of 1s and 0s, into a sequence in which 1s and 0s are fairly evenly distributed. In practice, the logical operation between the binary key and the source information includes, but is not limited to, the two operations above.

In the third step of the method, the binary ciphertext is converted into DNA sequence information two bits at a time. For example, with the mapping A = 00, T = 11, G = 10, C = 01, the binary ciphertext c1 = 11011100 obtained from the XOR operation is converted into the DNA sequence TCTA.
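The two-bit mapping and its inverse can be sketched as a pair of lookup tables. The function names are hypothetical, and because the text does not say how an odd number of bits should be handled, this sketch simply rejects such input.

```python
# Mapping taken from the example: A = 00, C = 01, G = 10, T = 11.
BASE_FOR_BITS = {"00": "A", "01": "C", "10": "G", "11": "T"}
BITS_FOR_BASE = {base: bits for bits, base in BASE_FOR_BITS.items()}


def bits_to_dna(bits: str) -> str:
    """Convert a bit string to bases, two bits per base."""
    if len(bits) % 2:
        raise ValueError("bit string length must be even")
    return "".join(BASE_FOR_BITS[bits[i:i + 2]] for i in range(0, len(bits), 2))


def dna_to_bits(dna: str) -> str:
    """Convert a base sequence back to the bit string it encodes."""
    return "".join(BITS_FOR_BASE[base] for base in dna)


print(bits_to_dna("11011100"))  # TCTA
print(dna_to_bits("TCTA"))      # 11011100
```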

As shown in FIG. 2, the Chinese character "Hua" is converted into a 16x16 binary dot-matrix bitmap. With the decimal point removed, digits 1 to 256 of the irrational number π are taken and turned into a binary key string by changing each odd digit to 1 and each even digit to 0. The key is XORed bitwise with the original bitmap bits, and the result is converted into a DNA sequence using the mapping 00 -> A, 01 -> C, 10 -> G, 11 -> T.

As shown in FIG. 3, the Chinese character "one" is converted into a 16x16 binary dot-matrix bitmap. With the decimal point removed, digits 257 to 512 of the irrational number π are taken and turned into a binary key string by changing each odd digit to 1 and each even digit to 0. The key is XORed bitwise with the original bitmap bits, and the result is converted into a DNA sequence using the mapping 00 -> A, 01 -> C, 10 -> G, 11 -> T.
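The whole pipeline for a 16x16 bitmap can be sketched end to end. Several things here are assumptions of the sketch and not taken from the figures: the bitmap is a made-up single horizontal stroke, not the actual dot matrix of FIG. 2 or FIG. 3; the digits of π are generated with Machin's formula, which the text does not prescribe; and all function names are hypothetical.

```python
def pi_digits(n: int) -> str:
    """First n decimal digits of pi (leading 3 included), computed with
    Machin's formula pi = 16*arctan(1/5) - 4*arctan(1/239) in integers."""
    scale = 10 ** (n + 10)             # 10 guard digits absorb truncation error

    def arctan_inv(x: int) -> int:
        """scale * arctan(1/x), summed as an alternating series."""
        term = scale // x
        total = term
        k, sign = 3, -1
        while term:
            term //= x * x
            total += sign * (term // k)
            k, sign = k + 2, -sign
        return total

    return str(16 * arctan_inv(5) - 4 * arctan_inv(239))[:n]


BASE_FOR_BITS = {"00": "A", "01": "C", "10": "G", "11": "T"}
BITS_FOR_BASE = {base: bits for bits, base in BASE_FOR_BITS.items()}


def key_bits(offset: int, length: int) -> str:
    """Parity key from `length` digits of pi, skipping the first `offset` digits."""
    digits = pi_digits(offset + length)[offset:]
    return "".join(str(int(d) % 2) for d in digits)


def xor_bits(a: str, b: str) -> str:
    return "".join("1" if x != y else "0" for x, y in zip(a, b))


def encode(bitmap_bits: str, offset: int) -> str:
    """XOR the bitmap with the key, then map every two bits to one base."""
    cipher = xor_bits(bitmap_bits, key_bits(offset, len(bitmap_bits)))
    return "".join(BASE_FOR_BITS[cipher[i:i + 2]] for i in range(0, len(cipher), 2))


def decode(dna: str, offset: int) -> str:
    """Map bases back to bits, then XOR with the same key."""
    cipher = "".join(BITS_FOR_BASE[base] for base in dna)
    return xor_bits(cipher, key_bits(offset, len(cipher)))


# Hypothetical 16x16 bitmap: one horizontal stroke in the eighth row.
rows = ["0" * 16] * 7 + ["1" * 16] + ["0" * 16] * 8
bitmap = "".join(rows)

dna = encode(bitmap, offset=0)        # key from digits 1 to 256 of pi
assert decode(dna, offset=0) == bitmap
print(len(dna), dna[:16])
```

Passing `offset=256` selects digits 257 to 512, as in the second example. A 256-bit bitmap yields 128 bases, and the long runs of zeros in the bitmap no longer appear as long runs of a single base.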

The present invention and its embodiments have been described above schematically and without limitation; what is shown in the drawings is only one embodiment of the invention, and the actual implementation is not limited to it. Therefore, any similar scheme or embodiment that a person skilled in the art, having received this teaching, devises without inventive effort and without departing from the spirit of the invention shall fall within the scope of protection of the invention.
