Lossless compression method and lossless decompression method based on weighted probability model


Reading note: This technique, "A lossless compression method and lossless decompression method based on a weighted probability model" (一种基于加权概率模型的无损压缩方法和无损解压方法), was designed and created by Wang Jielin (王杰林) on 2020-12-28. The invention discloses a lossless compression method and a lossless decompression method based on a weighted probability model. According to the maximum entropy theorem, a uniformly distributed binary sequence cannot be compressed losslessly any further, and existing entropy coding complies with this theorem. The core of the method is a process of equal-length lossless entropy-reduction transformation: because the uniformly distributed binary sequence undergoes equal-length lossless entropy reduction, it can then be compressed losslessly. The compression ratio is related to the user-defined weight coefficient. Obviously, the compression ratio of the method is superior to that of existing entropy coding algorithms. Moreover, since the method encodes and decodes bit by bit, it requires no large hardware cache, and it can be processed in parallel by segments, so it needs few hardware resources. Based on this lossless compression method, the invention also provides a lossless decompression method based on the weighted probability model, realizing the inverse process of the lossless compression method.

1. A lossless compression method based on a weighted probability model is characterized by comprising the following steps:

performing an equal-length lossless entropy-reduction transform on a binary sequence X with a sequence length of n and uniformly distributed symbols to obtain a binary sequence Y with the sequence length of n, wherein a first weight coefficient $r_1$ used in the equal-length lossless entropy-reduction process has the value range $r_1 \in [0.5, 1.0)$;

losslessly compressing the binary sequence Y into a binary sequence Z through a weighted probability model, wherein a second weight coefficient $r_2$ used by the weighted probability model has the value $r_2 = 1$.

2. The weighted probability model-based lossless compression method according to claim 1, wherein the equal-length lossless entropy reduction process includes:

S101, setting initial parameters: $R_0 = 1$, $L_0 = 0$, $i = 1$, $j = 0$, and said $r_1$, said $r_1$ taking any value in the interval [0.5, 1.0);

S102, according to the coding formulas $R_i = R_{i-1}\,\bar p(x_i)$, $L_i = L_{i-1} + R_{i-1}\,F(x_i - 1,\ r_1)$ and $H_i = L_i + R_i$, calculating the interval upper-bound value of symbol 0 at the ith step for the binary sequence X, $H_i^{(0)} = L_{i-1} + R_{i-1}\,r_1\,p(0)$, wherein said $x_i$ represents the ith symbol in the binary sequence X, said $R_i$, $L_i$, $H_i$ are coding parameters, said $p(x_i)$ denotes the probability mass function of the ith symbol $x_i$, and said $\bar p(x_i) = r_1\,p(x_i)$ denotes the weighted probability mass function of the ith symbol $x_i$;

S103, if the value of the binary sequence X is less than $H_i^{(0)}$, outputting symbol 0 and letting j = j + 1; otherwise, outputting symbol 1;

S104, letting i = i + 1; if j is equal to or less than n, jumping to step S102; if j is more than n, obtaining the binary sequence Y.

3. The weighted probability model-based lossless compression method according to claim 2, wherein losslessly compressing the binary sequence Y into the binary sequence Z by the weighted probability model comprises the following steps:

S201, setting initial parameters: $R_0 = 1$, $L_0 = 0$, $i = 1$, $j = 0$, and said $r_2 = 1$;

S202, coding the ith symbol in the binary sequence Y, and if the ith symbol is a symbol 0, entering the step S203; if the ith symbol in the binary sequence Y is symbol 1, jumping to step S204;

S203, according to the coding formulas $R_i = R_{i-1}\,\bar p(y_i)$ and $L_i = L_{i-1} + R_{i-1}\,F(y_i - 1,\ r_2)$, calculating $R_i$ and $L_i$; letting i = i + 1; going to step S205; wherein said $y_i$ represents the ith symbol in the binary sequence Y;

S204, according to the coding formulas $R_i = R_{i-1}\,\bar p(y_i)$ and $L_i = L_{i-1} + R_{i-1}\,F(y_i - 1,\ r_2)$, calculating $R_i$ and $L_i$; letting i = i + 1.

S205, if i is less than or equal to n, jumping to S202; if i is more than n, obtaining the binary sequence Z.

4. A lossless decompression method based on a weighted probability model, applied to the lossless compression method based on the weighted probability model of claim 1, comprising the following steps:

losslessly decompressing the binary sequence Z into the binary sequence Y through a weighted probability model;

and performing an equal-length lossless entropy-increasing transform on the binary sequence Y to obtain the binary sequence X.

5. The weighted probability model-based lossless decompression method according to claim 4, wherein the equal-length lossless entropy-increasing transform of the binary sequence Y into the binary sequence X comprises:

S401, setting initial parameters: $R_0 = 1$, $L_0 = 0$, $i = 1$, $j = 0$, and said $r_1$;

S402, coding the ith symbol in the binary sequence Y, and if the ith symbol is a symbol 0, entering the step S403; if the ith symbol in the binary sequence Y is symbol 1, jumping to step S404;

S403, according to the coding formulas $R_i = R_{i-1}\,\bar p(y_i)$ and $L_i = L_{i-1} + R_{i-1}\,F(y_i - 1,\ r_1)$, calculating $R_i$ and $L_i$; letting i = i + 1; jumping to step S405;

S404, according to the coding formulas $R_i = R_{i-1}\,\bar p(y_i)$ and $L_i = L_{i-1} + R_{i-1}\,F(y_i - 1,\ r_1)$, calculating $R_i$ and $L_i$; letting i = i + 1.

S405, if i is not more than n, jumping to the step S402; if i is more than n, obtaining the binary sequence X.

6. The weighted probability model-based lossless decompression method according to claim 5, wherein the binary sequence Z is decompressed losslessly into the binary sequence Y by a weighted probability model, comprising the steps of:

S301, setting initial parameters: $R_0 = 1$, $L_0 = 0$, $i = 1$, $j = 0$, and said $r_2 = 1$;

S302, according to the coding formulas $R_i = R_{i-1}\,\bar p(z_i)$, $L_i = L_{i-1} + R_{i-1}\,F(z_i - 1,\ r_2)$ and $H_i = L_i + R_i$, calculating the interval upper-bound value $H_i^{(0)}$ of symbol 0 at the ith step in decoding the binary sequence Z, wherein said $z_i$ represents the ith symbol in the binary sequence Z;

S303, if the value of the binary sequence Z is less than $H_i^{(0)}$, outputting symbol 0 and letting j = j + 1; otherwise, outputting symbol 1;

S304, letting i = i + 1; if j is less than or equal to the sequence length of the binary sequence Z, jumping to step S302; and if j is greater than the sequence length of the binary sequence Z, obtaining the binary sequence Y.

7. An encoding device, characterized by comprising: at least one control processor and a memory communicatively connected to the at least one control processor; the memory stores instructions executable by the at least one control processor to enable the at least one control processor to perform the weighted probability model based lossless compression method of any one of claims 1 to 3 and/or the weighted probability model based lossless decompression method of any one of claims 4 to 6.

8. A computer-readable storage medium storing computer-executable instructions for causing a computer to perform the weighted probability model based lossless compression method of any one of claims 1 to 3 and/or the weighted probability model based lossless decompression method of any one of claims 4 to 6.

Technical Field

The invention relates to the technical field of communication coding, in particular to a lossless compression method and a lossless decompression method based on a weighted probability model.

Background

In the big data era, the rapid increase of data volume brings huge pressure to network transmission and storage. In order to solve this problem, on one hand, hardware needs to be upgraded, and on the other hand, lossless encoding algorithms with higher compression rates are mined. Common lossless compression methods include dictionary coding, run-length coding, arithmetic coding, etc., which are collectively called entropy coding, but the current entropy coding has the following defects:

1) according to the maximum entropy theorem, the uniformly distributed binary sequences cannot be subjected to lossless compression, and the existing entropy coding follows the theorem; 2) the compression ratio is low and the demand on hardware resources is high.

Disclosure of Invention

The present invention is directed to solving at least one of the problems in the prior art. To this end, the invention provides a lossless compression method and a lossless decompression method based on a weighted probability model: an equal-length lossless entropy-reduction treatment is applied to a uniformly distributed binary sequence so that the uniformly distributed binary sequence can be compressed losslessly; the compression ratio is improved, few hardware resources are required, and the compression ratio is determined by the weight coefficient used in the equal-length lossless entropy reduction.

In a first aspect of the present invention, a lossless compression method based on a weighted probability model is provided, which includes the following steps:

performing an equal-length lossless entropy-reduction transform on a binary sequence X with a sequence length of n and uniformly distributed symbols to obtain a binary sequence Y with the sequence length of n, wherein a first weight coefficient $r_1$ used in the equal-length lossless entropy-reduction process has the value range $r_1 \in [0.5, 1.0)$;

losslessly compressing the binary sequence Y into a binary sequence Z through a weighted probability model, wherein a second weight coefficient $r_2$ used by the weighted probability model has the value $r_2 = 1$.

In a second aspect of the present invention, a weighted probability model-based lossless decompression method is provided, which is applied to the weighted probability model-based lossless compression method in the first aspect of the present invention, and includes the following steps:

lossless decompressing the binary sequence Z into the binary sequence Y through a weighted probability model;

and performing an equal-length lossless entropy-increasing transform on the binary sequence Y to obtain the binary sequence X.

In a third aspect of the present invention, there is provided an encoding device comprising: at least one control processor and a memory for communicative connection with the at least one control processor; the memory stores instructions executable by the at least one control processor to enable the at least one control processor to perform a weighted probability model based lossless compression method according to the first aspect of the invention and/or a weighted probability model based lossless decompression method according to the second aspect of the invention.

In a fourth aspect of the present invention, there is provided a computer-readable storage medium, wherein the computer-readable storage medium stores computer-executable instructions for causing a computer to perform the weighted probability model based lossless compression method according to the first aspect of the present invention and/or the weighted probability model based lossless decompression method according to the second aspect of the present invention.

According to the embodiment of the invention, at least the following beneficial effects are achieved:

The invention provides a lossless compression method based on a weighted probability model. According to the maximum entropy theorem, a uniformly distributed binary sequence cannot be compressed losslessly any further, and existing entropy coding complies with this theorem. The core of the method is the process of equal-length lossless entropy-reduction transformation: because the uniformly distributed binary sequence undergoes equal-length lossless entropy reduction, it can then be compressed losslessly. The compression ratio is related to the user-defined weight coefficient. Obviously, the compression ratio of the method is superior to that of existing entropy coding algorithms. Moreover, since the method encodes and decodes bit by bit, it requires no large hardware cache, and it can be processed in parallel by segments, so it needs few hardware resources. Based on this lossless compression method, the invention also provides a lossless decompression method based on the weighted probability model, realizing the inverse process of the lossless compression method.

Additional aspects and advantages of the invention will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the invention.

Drawings

The above and/or additional aspects and advantages of the present invention will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings of which:

FIG. 1 is a schematic diagram of F(X, r) for $x_1 = 0, 1$ when n = 1 according to an embodiment of the present invention;

FIG. 2 is a schematic diagram of F(X, r) for $x_2 = 0, 1$ when n = 2 and $x_1$ is known, according to an embodiment of the present invention;

FIG. 3 is a schematic flow chart of a lossless compression method based on a weighted probability model according to an embodiment of the present invention;

FIG. 4 is a schematic flow chart of a lossless decompression method based on a weighted probability model according to an embodiment of the present invention;

fig. 5 is a schematic structural diagram of an encoding apparatus according to an embodiment of the present invention.

Detailed Description

Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to the same or similar elements or elements having the same or similar function throughout. The embodiments described below with reference to the accompanying drawings are illustrative only for the purpose of explaining the present invention, and are not to be construed as limiting the present invention.

To facilitate understanding of the technical solutions of the present invention by those skilled in the art, the related concepts involved in the present application are explained before the embodiments are introduced:

First, the weighted probability and the weighted probability model;

Let $X = \{x_1, x_2, \ldots, x_n\}$ be a random process taking finitely or countably many possible values. Unless specifically stated, this set of possible values is taken to be the set of non-negative integers $A = \{0, 1, 2, \ldots\}$, with $x_i \in A$ ($i = 1, 2, \ldots, n$). There is then a probability space over all values in A:

$P(x_i = x) = p(x)$ (1)

where $x \in A$. Since the random process must move to some value in the set A, at any time i there is:

$\sum_{x \in A} p(x) = 1$ (2)

where $0 \le p(x) \le 1$. Thus, the cumulative distribution function F(a) at any time i can be expressed in terms of p(x) as:

$F(a) = \sum_{x \le a} p(x)$ (3)

where $0 \le F(a) \le 1$ and $a \in A$. An example: if the discrete random variable X has the probability mass function p(0) = 0.5, p(1) = 0.3, and p(2) = 0.2, then F(0) = p(0) = 0.5, F(1) = p(0) + p(1) = 0.8, and F(2) = p(0) + p(1) + p(2) = 1.0.
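The small example above can be checked with a short computation (an illustrative sketch; the dictionary `pmf` and the helper `F` are ours, not the patent's):

```python
# Cumulative distribution F(a) = sum of p(x) over x <= a, per equation (3).
pmf = {0: 0.5, 1: 0.3, 2: 0.2}

def F(a):
    return sum(p for x, p in pmf.items() if x <= a)

print(F(0), F(1), F(2))  # 0.5 0.8 1.0
```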

Let $x_i$ be a Bernoulli random variable; then $x_i \in \{0, 1\}$, and the probability mass function of $x_i$ is $p(0) = P(x_i = 0) = 1 - p$ and $p(1) = P(x_i = 1) = p$, where $0 \le p \le 1$ and $i = 1, 2, \ldots, n$. The random process X has $2^n$ possible realizations, each a binary sequence of length n. Obviously, among these $2^n$ possibilities some binary sequences show distinct morphological features (regularities). For example, some binary sequences satisfy "the number of consecutive symbols 1 in the sequence is at most t"; others satisfy "the number of consecutive symbols 1 in the sequence is at most t, and the number of consecutive symbols 0 in the sequence is at most s". Since the morphological features are known, t and s are known positive integers. Here a Bernoulli random variable means: assume an experiment whose success and failure are represented by X = 1 and X = 0 respectively; the probability mass function of X is $p(0) = P(X = 0) = 1 - p$ and $p(1) = P(X = 1) = p$, where p ($0 \le p \le 1$) is the probability of success; the random variable X is then called a Bernoulli random variable.

Therefore, when the random process X exhibits some known morphological characteristics, it has the following applications in the field of information coding technology:

(1) Data compression: since a symbol 0 is inevitably generated after t consecutive symbols 1 in the binary sequence, the morphological feature of the binary sequence is known information, and lossless coding can improve the compression effect by removing that symbol 0. That is, binary sequences with different morphological features use different methods to remove redundant information.

(2) Data verification: if the decoded binary sequence does not satisfy "the number of consecutive symbols 1 in the sequence is at most t", the decoding is in error, which can be used for data verification. Different morphological features can construct channel coding methods with different code rates.

(3) Digital watermarking: any binary sequence is modified to satisfy a certain morphological feature and then losslessly encoded, thereby constructing a digital watermark encoding method.

Obviously, in such a binary sequence (i.e. a random process), the current symbol state is related to a limited number of previous adjacent symbol states.

Take as an example a binary sequence satisfying the morphological feature "the number of consecutive symbols 1 in the sequence is at most 2"; such a sequence is composed of the phrases "0", "10", and "110". Based on a Markov chain or conditional-probability analysis, there are three probability mass functions for symbol 0, namely p(0|0), p(0|1), and p(0|1, 1), and two for symbol 1, namely p(1|0) and p(1|1). During encoding the source binary sequence is known, so the probability mass function used for each symbol can be selected exactly. During decoding, however, it cannot: for example, once a "0" has been decoded (the probability mass function for the first symbol can be fixed), the next probability mass function cannot be selected exactly because three candidates exist for symbol 0; the same holds for symbol 1. If "01" has been decoded, the next symbol cannot be predicted from the decoded result because two candidates, p(1|1) and p(0|1), remain. Only when "011" has been decoded is there a unique choice, p(0|1, 1), since "011" is necessarily followed by the symbol 0.

Moreover, when the coded data is tampered with or a transmission error occurs, every subsequent symbol may be decoded incorrectly, so a codec cannot be built on the Markov chain or conditional probabilities alone. For these reasons, and based on probability theory, a method for constructing a codec for binary sequences needs to satisfy three conditions:

(1) There is a unique known probability mass function p(x) for each decoded symbol x, where p(x) can be the only known probability mass function derivable during decoding (the inverse process of encoding). For example, when "011" has been decoded, since "011" is necessarily followed by symbol 0, there is a unique probability mass function p(x) = p(0|1, 1).

(2) There is a known variable r characterizing the morphology of the sequence; r may also be the value of a known function f(i) ($i = 1, 2, \ldots, n$), i.e., r = f(i). Sequences with different morphological features should have different values of r.

(3) During coding, r should act on the probability mass function of the corresponding symbol at every position of the sequence.

A function $\bar p(x) = \varphi(p(x), r)$ can be defined to construct coding methods, e.g. the multiplicative forms $r\,p(x)$ (or $p(x)/r$) or the additive forms $p(x) + r$ (or $p(x) - r$), so that the probability mass function and r can be selected exactly throughout encoding and decoding. In the present invention r is called the morphological feature coefficient of the probability mass function, weight coefficient for short. Because the additive forms make the generating polynomials inconvenient to reason about and analyze, the weighted probability mass function and the weighted cumulative distribution function are defined below on the basis of $\bar p(x) = r\,p(x)$, and these two functions are analyzed to obtain the mathematical properties of r. Since p(x) and r are known, and $\bar p$ differs for different x, $\bar p(x, r)$ can be written simply as $\bar p(x)$.

Define 1, the weighted probability mass function is:

$\bar p(a, r) = r\,p(a)$ (4)

where p(a) is the probability mass function of a, $0 \le p(a) \le 1$, and r is the weight coefficient, a known positive real number. Obviously, the weighted probabilities of all symbols sum to:

$\sum_{a \in A} \bar p(a, r) = r \sum_{a \in A} p(a) = r$ (5)

definition 2, a weighted cumulative distribution function (weighted distribution function for short) is:

$F(a, r) = r\,F(a) = r \sum_{x \le a} p(x)$ (6)
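Definitions 1 and 2 translate directly into code (a minimal sketch assuming a small non-negative integer alphabet; the names `p_bar` and `F_w` are illustrative):

```python
def p_bar(p, a, r):
    """Weighted probability mass function, definition 1: p_bar(a, r) = r * p(a)."""
    return r * p[a]

def F_w(p, a, r):
    """Weighted distribution function, definition 2: F(a, r) = r * sum_{x <= a} p(x)."""
    return r * sum(p[x] for x in p if x <= a)

p = {0: 0.5, 1: 0.5}                # uniform binary source
r = 0.9                             # weight coefficient
print(p_bar(p, 0, r))               # 0.45
print(F_w(p, 0, r), F_w(p, 1, r))   # 0.45 0.9 -- the weighted probabilities sum to r, eq. (5)
```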

According to definition 2, the weighted distribution function of sequence X is denoted F(X, r). When n = 1, F(X, r) is:

$F(X, r) = r\,F(x_1) = r\,F(x_1 - 1) + r\,p(x_1)$

As shown in FIG. 1, when n = 2, $x_1$ corresponds to the interval $[F(x_1 - 1, r),\ F(x_1, r))$; since $F(x_1, r) = F(x_1 - 1, r) + r\,p(x_1)$, the interval length is $r\,p(x_1)$. The length of the interval $[F(x_1 - 1, r),\ F(x_1 - 1, r) + r\,p(x_1))$ is then multiplied by the weight coefficient r: if r < 1 the interval shrinks; if r > 1 the interval expands; if r = 1 the interval is unchanged. The interval thus becomes $[F(x_1 - 1, r),\ F(x_1 - 1, r) + r^2 p(x_1))$. Next, $r^2 p(x_1)$ is divided into the probability masses of the individual symbols according to equation (1): the sub-interval corresponding to symbol 0 is $[F(x_1 - 1, r),\ F(x_1 - 1, r) + r^2 p(x_1)\,p(0))$; the sub-interval corresponding to symbol 1 is $[F(x_1 - 1, r) + r^2 p(x_1)\,p(0),\ F(x_1 - 1, r) + r^2 p(x_1)(p(0) + p(1)))$; the sub-interval corresponding to symbol 2 is $[F(x_1 - 1, r) + r^2 p(x_1)(p(0) + p(1)),\ F(x_1 - 1, r) + r^2 p(x_1)(p(0) + p(1) + p(2)))$; and so on. With $F(x_1 - 1, r) = r\,F(x_1 - 1)$, this gives:

$F(X, r) = r\,F(x_1 - 1) + r^2 F(x_2)\,p(x_1) = r\,F(x_1 - 1) + r^2 F(x_2 - 1)\,p(x_1) + r^2 p(x_1)\,p(x_2)$

At this point the interval length is $r^2 p(x_1)\,p(x_2)$, as shown in FIG. 2.

By analogy, when n = 3:

$F(X, r) = r\,F(x_1 - 1) + r^2 F(x_2 - 1)\,p(x_1) + r^3 F(x_3)\,p(x_1)\,p(x_2) = r\,F(x_1 - 1) + r^2 F(x_2 - 1)\,p(x_1) + r^3 F(x_3 - 1)\,p(x_1)\,p(x_2) + r^3 p(x_1)\,p(x_2)\,p(x_3)$

Thus, letting $p(x_0) = 1$ and continuing by analogy, the general form is obtained:

$F(X, r) = \sum_{i=1}^{n}\Big[r^i\,F(x_i - 1)\prod_{j=0}^{i-1} p(x_j)\Big] + r^n \prod_{i=1}^{n} p(x_i)$ (7)

The set of weighted distribution functions satisfying equation (7) is defined as a weighted probability model, a weighted model for short, and is denoted $\{F(X, r)\}$. If $x_i \in A = \{0, 1\}$, then $\{F(X, r)\}$ is called a binary weighted model. Let:

$H_n = F(X, r)$ (8)

$R_n = r^n \prod_{i=1}^{n} p(x_i)$ (9)

$L_n = H_n - R_n$ (10)

Since $x_i$ must take a value in A, $p(x_i) \ge 0$. Clearly, equations (8) to (10) define a sequence of intervals: $L_i$ and $H_i$ are the lower and upper bounds of the interval corresponding to the symbol $x_i$ of sequence X at time i ($i = 1, 2, \ldots, n$), and $R_i = H_i - L_i$ is the interval length. $\{[L_n, H_n)\}$ is the interval sequence defined on the weighted probability model. Equations (8) to (10) are expressed iteratively as:

$R_i = R_{i-1}\,r\,p(x_i)$ (11)

$L_i = L_{i-1} + R_{i-1}\,F(x_i - 1,\ r)$ (12)

$H_i = L_i + R_i$ (13)
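The interval recursion of equations (11) to (13) can be sketched as a simple loop (illustrative only; the function name and the example values are ours, not the patent's):

```python
def weighted_intervals(X, p, r):
    """Iterate equations (11)-(13) over the symbol sequence X and
    return the final interval [L_n, H_n)."""
    L, R = 0.0, 1.0
    for x in X:
        L = L + R * r * sum(p[a] for a in p if a < x)  # (12): F(x_i - 1, r) = r * sum_{a < x_i} p(a)
        R = R * r * p[x]                               # (11): R_i = R_{i-1} * r * p(x_i)
    return L, L + R                                    # (13): H_n = L_n + R_n

p = {0: 0.5, 1: 0.5}
print(weighted_intervals([0, 1, 1], p, r=0.9))         # final interval for the sequence 011
```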

Obviously, when r in equation (7) is a known constant real number, equation (7) is referred to as a static weighted model. If at time i the value of r is given by a known function, $\omega_i = f(i)$, so that the coefficient sequence $W = \{\omega_1, \omega_2, \ldots, \omega_n\}$ is known, then equation (7) can be expressed as:

$F(X, W) = \sum_{i=1}^{n}\Big[\Big(\prod_{k=1}^{i}\omega_k\Big) F(x_i - 1)\prod_{j=1}^{i-1} p(x_j)\Big] + \Big(\prod_{k=1}^{n}\omega_k\Big)\prod_{i=1}^{n} p(x_i)$ (14)

The set of weighted distribution functions satisfying equation (14) is referred to as a dynamic weighted model. When $\omega_1 = \omega_2 = \cdots = \omega_n = r$, $F(X, W) = F(X, r)$. If $\omega_1 = \omega_2 = \cdots = \omega_n = r = 1$, then $F(X, W) = F(X, 1) = F(X)$.

Letting $H_n = F(X, W)$ (15), the iterative equations based on equation (15) are:

$R_i = R_{i-1}\,p(x_i)$ (16)

$L_i = L_{i-1} + R_{i-1}\,F(x_i - 1)$ (17)

$H_i = L_i + R_i$ (18)
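The dynamic model replaces the constant r by a per-position coefficient $\omega_i$, exactly as used in the proof of theorem 1 below (a sketch under the same assumptions as above; with all $\omega_i = 1$ it reduces to the iteration (16)-(18)):

```python
def dynamic_intervals(X, p, W):
    """Interval iteration of the dynamic weighted model with coefficient sequence W."""
    L, R = 0.0, 1.0
    for x, w in zip(X, W):
        L = L + R * w * sum(p[a] for a in p if a < x)  # L_i = L_{i-1} + R_{i-1} * w_i * F(x_i - 1)
        R = R * w * p[x]                               # R_i = R_{i-1} * w_i * p(x_i)
    return L, L + R

p = {0: 0.5, 1: 0.5}
print(dynamic_intervals([0, 1, 1], p, W=[1.0, 1.0, 1.0]))  # standard model: all w_i = 1
```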

Then there is theorem 1: when the weight coefficients $\omega_i$ ($i = 1, 2, \ldots, n$) satisfy $0 < \omega_i \le 1$, the intervals are nested, i.e., $L_i \le L_{i+1} < H_{i+1} \le H_i$ holds.

The proof process of theorem 1 is as follows:

∵ $0 < \omega_{i+1} \le 1$, and from equations (11) to (13), $R_{i+1} = R_i\,\omega_{i+1}\,p(x_{i+1})$;

∴ $0 < R_{i+1} \le R_i\,p(x_{i+1})$;

∵ $L_{i+1} = L_i + R_i\,\omega_{i+1}\,F(x_{i+1} - 1)$, where $R_i\,\omega_{i+1}\,F(x_{i+1} - 1) \ge 0$;

∴ $L_{i+1} \ge L_i$;

∵ $H_{i+1} = L_{i+1} + R_{i+1}$, and $R_{i+1} > 0$;

∴ $L_{i+1} < H_{i+1}$;

∵ $H_{i+1} = L_i + R_i\,\omega_{i+1}\,F(x_{i+1} - 1) + R_i\,\omega_{i+1}\,p(x_{i+1}) \le L_i + R_i\big(F(x_{i+1} - 1) + p(x_{i+1})\big)$;

∵ $F(x_{i+1}) = F(x_{i+1} - 1) + p(x_{i+1})$, $F(x_{i+1}) \le 1$, and $\omega_{i+1} \le 1$;

∴ $H_{i+1} \le L_i + R_i = H_i$.

Theorem 2 can be obtained by the induction method according to theorem 1:

Theorem 2: for $i = 1, 2, \ldots, n$, if the weight coefficients $\omega_i$ satisfy $0 < \omega_i \le 1$, then $L_n \in [L_i, H_i)$ holds for every i, i.e., $[L_n, H_n) \subseteq [L_{n-1}, H_{n-1}) \subseteq \cdots \subseteq [L_1, H_1)$.

Any i has ωi1, then, the model is called as { F (X, W) } as a standard model; any i has a value of 0 < omegai1 or less and omega is presentiIf the value is less than 1, the model is called as a contraction model (F (X, W)); any i has ωiNot less than 1 and omegaiIf > 1, the expansion model is called as { F (X, W) }.

Second, the information entropy of the weighted probability model;

Let the discrete memoryless source sequence be $X = (x_1, x_2, \ldots, x_n)$ with $x_i \in A = \{0, 1, 2, \ldots, s\}$. When the weight coefficient is r = 1, $\bar p(x_i) = p(x_i)$, and therefore $F(X, r) = F(X)$. By the definition of Shannon information entropy, the entropy of X is (with logarithms taken to base s + 1):

$H(X) = -\sum_{a \in A} p(a)\,\log_{s+1} p(a)$

When $r \ne 1$, the random variable $x_i$ with weighted probability $\bar p(x_i) = r\,p(x_i)$ is defined to have the self-information:

$I(x_i) = -\log\big(r\,p(x_i)\big)$

In the set $\{x_i = a\}$ ($i = 1, 2, \ldots, n$; $a \in A$), let $c_a$ denote the number of occurrences of the symbol a. When r is known, the total information content of the source sequence X is:

$I(X) = -\sum_{a \in A} c_a\,\log\big(r\,p(a)\big) = -n\,\log r - \sum_{a \in A} c_a\,\log p(a)$

The average amount of information per symbol is then:

$\frac{I(X)}{n} = -\log r - \sum_{a \in A} \frac{c_a}{n}\,\log p(a)$

As $c_a/n \to p(a)$, this yields definition 3: the weighted model information entropy H(X, r) (unit: bits/symbol) is

$H(X, r) = -\log r + H(X)$
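Definition 3 is easy to check numerically (a sketch with base-2 logarithms for a binary alphabet; the function names are ours):

```python
import math

def H(p):
    """Shannon entropy of a probability mass function, in bits/symbol."""
    return -sum(q * math.log2(q) for q in p.values() if q > 0)

def H_weighted(p, r):
    """Weighted model information entropy, definition 3: H(X, r) = -log2(r) + H(X)."""
    return -math.log2(r) + H(p)

p = {0: 0.5, 1: 0.5}
print(H(p))                # 1.0
print(H_weighted(p, 0.9))  # ~1.152: a contraction model (r < 1) raises the per-symbol information
```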

Then there is theorem 3: the discrete memoryless source sequence $X = (x_1, x_2, \ldots, x_n)$ ($x_i \in A = \{0, 1, 2, \ldots, s\}$, $i = 1, 2, \ldots, n$) can be encoded without distortion by a weighted probability model, with the minimum limit $H(X, r_{max})$, where $r_{max}$ is the largest usable weight coefficient.

The proof process of theorem 3 is as follows:

For any $r > r_{max}$, $L_n \in [L_n, H_n) \wedge L_n \in [L_{n-1}, H_{n-1}) \wedge \cdots \wedge L_n \in [L_i, H_i)$ does not hold, so the sequence X cannot be restored. When $0 < r \le 1$, $-\log r \ge 0$ and $H(X, r) \ge H(X)$; when $1 < r \le r_{max}$, $-\log r < 0$ and $H(X, r) < H(X)$. Clearly, the minimum limit is $H(X, r_{max}) = -\log r_{max} + H(X)$.

Theorem 3 gives the information entropy of the static weighted model. In the dynamic weighted model, when the coefficient sequence $W = \{\omega_1, \omega_2, \ldots, \omega_n\}$ is known, the weighted probability of the independent discrete random sequence X is:

$\bar p(X, W) = \prod_{i=1}^{n} \omega_i\,p(x_i)$

Taking logarithms gives:

$-\log \bar p(X, W) = -\sum_{i=1}^{n} \log \omega_i - \sum_{i=1}^{n} \log p(x_i)$ (26)

Since in the set $\{x_i = a\}$ ($i = 1, 2, \ldots, n$; $a \in A$) the symbol a occurs $c_a$ times:

$-\sum_{i=1}^{n} \log p(x_i) = -\sum_{a \in A} c_a\,\log p(a)$

Obviously, equation (26) can be transformed into:

$-\log \bar p(X, W) = -\sum_{i=1}^{n} \log \omega_i - \sum_{a \in A} c_a\,\log p(a)$ (28)

Averaging equation (28) over the symbols:

$-\frac{1}{n}\log \bar p(X, W) = -\frac{1}{n}\sum_{i=1}^{n} \log \omega_i - \sum_{a \in A} \frac{c_a}{n}\,\log p(a)$

Since $c_a/n \to p(a)$, the second term tends to H(X). Thus:

$H(X, W) = -\frac{1}{n}\sum_{i=1}^{n} \log \omega_i + H(X)$

Let

$\bar r = \Big(\prod_{i=1}^{n} \omega_i\Big)^{1/n}$

so that $H(X, W) = -\log \bar r + H(X)$ is obtained. When $\bar r \le r_{max}$, $L_n \in [L_n, H_n) \wedge L_n \in [L_{n-1}, H_{n-1}) \wedge \cdots \wedge L_n \in [L_i, H_i)$ holds.

Third, embodiments of the invention;

According to the existing theory above: let X be an arbitrary binary sequence of length n in which the probabilities of symbol 0 and symbol 1 are p(0) and p(1). By information theory, the information entropy of sequence X is $H(X) = -p(0)\log_2 p(0) - p(1)\log_2 p(1)$. Assuming that the length of the result after weighted probability model coding equals n, there is:

$-n \log_2 r + n H(X) = n$ (32)

which simplifies to:

$r = 2^{H(X) - 1}$ (33)
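Equation (33) can be evaluated directly; for instance (a sketch; the source probability 0.8 is an arbitrary illustration):

```python
import math

def H_binary(p0):
    """Binary entropy of a source with symbol-0 probability p0, in bits."""
    return -sum(q * math.log2(q) for q in (p0, 1.0 - p0) if q > 0)

def weight_from_entropy(p0):
    """Equation (33): r = 2 ** (H(X) - 1)."""
    return 2.0 ** (H_binary(p0) - 1.0)

print(weight_from_entropy(0.5))  # 1.0  -- uniform source
print(weight_from_entropy(0.8))  # ~0.82
```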

Since $0 \le H(X) \le 1$, it follows that $0.5 \le r \le 1.0$. According to theorem 2 above, because $L_n \in [L_i, H_i)$ and $[L_i, H_i)$ corresponds uniquely to the symbol $x_i$, lossless coding is possible. Taking FIG. 1 and FIG. 2 as an example: because $L_n \in [F(x_1 - 1, r),\ F(x_1, r))$, $x_1$ is decoded, and $x_2$ is decoded in the same way. Obviously, the weighted probability model coding algorithm is lossless. The weighted probability model can therefore transform the binary sequence X losslessly into a binary sequence Y of equal length via equation (32).

This yields conclusion one: the symbols in the binary sequence Y are uniformly distributed, i.e., p(0) = p(1) = 0.5.

The proof of conclusion one is as follows: by information theory, the information entropy of sequence Y is $H(Y) = -p(0)\log_2 p(0) - p(1)\log_2 p(1)$. Since the sequence Y is the result of lossless coding with the weighted probability model, H(Y) = H(X, r). Since the sequences X and Y are of equal length, nH(Y) = n, so equation (32) is satisfied if and only if H(Y) = 1, and H(Y) = 1 implies p(0) = p(1) = 0.5. The probabilities of symbol 0 and symbol 1 in sequence Y are therefore equal, i.e., the symbols are uniformly distributed.

According to conclusion one, an arbitrary binary sequence X can be assigned a weight coefficient by equation (33) and losslessly coded by the weighted probability model into a completely random binary sequence Y, and the binary sequence Y can likewise be losslessly restored to the sequence X. Since the sequence X is known, H(X) is determined and r is known, i.e., r corresponds to the entropy of the sequence X. Because $0 \le H(X) \le 1$, $r \in [0.5, 1.0]$.

If only the binary sequence X of length n is known (with symbols 0 and 1 uniformly distributed in X), then setting any $r \in [0.5, 1.0)$ allows X to be decoded through the weighted model into a sequence Y of length n. Obviously, different values of r yield different entropies of the sequence Y.

Conclusion two: given that the symbols in the binary sequence X of length n are uniformly distributed, and that an arbitrary weight coefficient $r \in [0.5, 1.0)$ is used to decode X through the weighted probability model into a binary sequence Y, the sequence Y must satisfy:

(1) $H(Y) < H(X) = 1$; (2) the length of the sequence Y is n.

The proof process of conclusion two is as follows:

when the binary sequence X is known, the weighting coefficients r correspond one-to-one to the sequence Y according to conclusion one. Since r is known, it can be obtained from formula (33):

$H(Y) = 1 + \log_2 r$ (34)

If r = 1.0: since the completely random binary sequence X has H(X) = 1, equation (33) gives $1 = 2^{H(Y) - 1}$, i.e., H(Y) = 1, so H(Y) = H(X) and the weighted probability model coding process is an equal-entropy transformation; hence r < 1.0 is required. If $0.5 \le r < 1.0$, then $-1 \le \log_2 r < 0$, and substituting into equation (34) gives $0 \le H(Y) < 1$. Then H(Y) < H(X) = 1, and the weighted probability model decoding process is a lossless entropy-reduction transformation. Since the weighted probability model is a lossless encoding and decoding process, the length of the decoded sequence Y is n.

According to conclusion two, any binary sequence X whose symbols are uniformly distributed can be regarded as the result of encoding some sequence Y with the weighted probability model under a known weight coefficient r. When the sequence X is decoded by the weighted probability model with $0.5 \le r < 1.0$, a binary sequence Y with $0 \le H(Y) < 1$ is obtained. The sequence Y is then weighted-coded with r = 1.0 to obtain a sequence Z. Since H(Y) < 1, the length of the sequence Z must be less than n according to Shannon's distortion-free coding theorem (as can be demonstrated by that theorem).

Based on conclusions one and two, and referring to FIG. 3, a first embodiment provides a lossless compression method based on a weighted probability model, comprising the following steps:

S100, performing the equal-length lossless entropy-reduction transform on a binary sequence X with sequence length n and uniformly distributed symbols to obtain a binary sequence Y with sequence length n, wherein the first weight coefficient $r_1$ used in the equal-length lossless entropy-reduction process has the value range $r_1 \in [0.5, 1.0)$.

The specific implementation of step S100 is:

S101, setting initial parameters: $R_0 = 1$, $L_0 = 0$, $i = 1$, $j = 0$, and $r_1$, where $r_1$ takes any value in the interval [0.5, 1.0);

The parameter $r_1$ in step S101 can be set arbitrarily within the interval [0.5, 1.0). The choice of $r_1$ determines the compression ratio of the method.

S102, according to the coding formulas $R_i = R_{i-1}\,\bar p(x_i)$, $L_i = L_{i-1} + R_{i-1}\,F(x_i - 1,\ r_1)$ and $H_i = L_i + R_i$, calculating the interval upper-bound value of symbol 0 at the ith step, $H_i^{(0)} = L_{i-1} + R_{i-1}\,r_1\,p(0)$, wherein $x_i$ denotes the ith symbol in the binary sequence X; $R_i$, $L_i$, $H_i$ are coding parameters; $p(x_i)$ denotes the probability mass function of the ith symbol $x_i$; and $\bar p(x_i) = r_1\,p(x_i)$ denotes the weighted probability mass function of the ith symbol $x_i$;

S103, comparing the value of the binary sequence X (read as a binary fraction) with $H_i^{(0)}$: if the value is less than $H_i^{(0)}$, outputting symbol 0 and letting j = j + 1; otherwise, outputting symbol 1;

S104, letting i = i + 1; if j is equal to or less than n, jumping to step S102; if j is more than n, obtaining the binary sequence Y;
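Under the stated assumption p(0) = p(1) = 0.5 for the uniform source, steps S101 to S104 can be sketched as follows (an illustrative float-precision toy, not the patent's implementation: it reads the value of X as a binary fraction and omits the precision management a practical coder needs):

```python
def entropy_reduction(X, r1):
    """Steps S101-S104: transform the uniformly distributed bit list X into Y
    under the weighted model with weight coefficient r1 in [0.5, 1.0)."""
    v = sum(b * 2.0 ** -(k + 1) for k, b in enumerate(X))  # value of X as a binary fraction
    L, R = 0.0, 1.0
    Y = []
    for _ in range(len(X)):
        H0 = L + R * r1 * 0.5   # S102: H_i^(0), upper bound of the symbol-0 interval
        if v < H0:              # S103: compare the value of X with H_i^(0)
            Y.append(0)
        else:
            Y.append(1)
            L = H0              # L_i = L_{i-1} + R_{i-1} * F(0, r1)
        R = R * r1 * 0.5        # R_i = R_{i-1} * r1 * p(x_i), with p(0) = p(1) = 0.5
    return Y

print(entropy_reduction([1, 0, 1, 1, 0, 0, 1, 0], r1=0.9))
```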

S200, losslessly compressing the binary sequence Y into the binary sequence Z through the weighted probability model, wherein the second weight coefficient $r_2$ used by the weighted probability model has the value $r_2 = 1$.

The specific implementation of step S200 is:

S201, setting initial parameters: $R_0 = 1$, $L_0 = 0$, $i = 1$, $j = 0$, $V = 0$, and $r_2 = 1$;

For convenience of calculation and description, a parameter V is added in step S201; its initial value is 0, and it represents the value of $L_i$ produced by the weighted-model encoding.

S202, coding the ith symbol in the binary sequence Y, and if the ith symbol is a symbol 0, entering the step S203; if the ith symbol in the binary sequence Y is symbol 1, jumping to step S204;

S203, according to the coding formulas $R_i = R_{i-1}\,p(y_i)$ and $L_i = L_{i-1} + R_{i-1}\,F(y_i - 1,\ r_2)$, calculating the values of $R_i$ and $L_i$: here $R_i = R_{i-1}\,p(0)$, and since F(-1) = 0, $L_i = L_{i-1}$; letting i = i + 1; jumping to step S205;

S204, according to the coding formulas $R_i = R_{i-1}\,p(y_i)$ and $L_i = L_{i-1} + R_{i-1}\,F(y_i - 1,\ r_2)$, calculating the values of $R_i$ and $L_i$: here $R_i = R_{i-1}\,p(1)$, and since F(0) = p(0), $L_i = L_{i-1} + R_{i-1}\,p(0)$; letting i = i + 1;

S205, if i is less than or equal to n, jumping to step S202; if i is more than n, letting $V = L_n$, ending the coding, and outputting V, namely the binary sequence Z.
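Steps S201 to S205 with $r_2 = 1$ amount to one pass of ordinary arithmetic coding. A float-precision sketch (illustrative: a practical coder renormalizes and emits bits incrementally, whereas here the code value V = L_n is kept as a single float; p0 denotes the probability of symbol 0 in Y):

```python
def weighted_encode(Y, p0, r=1.0):
    """Steps S201-S205: encode the bit list Y; returns (V, R_n), where V = L_n.
    The binary expansion of V serves as the compressed sequence Z."""
    L, R = 0.0, 1.0
    for y in Y:
        if y == 0:                  # S203: F(-1) = 0, so L is unchanged
            R = R * r * p0          # R_i = R_{i-1} * r * p(0)
        else:                       # S204: F(0, r) = r * p(0)
            L = L + R * r * p0
            R = R * r * (1.0 - p0)  # R_i = R_{i-1} * r * p(1)
    return L, R

V, Rn = weighted_encode([0, 1, 1, 0, 0, 0, 1, 0], p0=0.6)
print(V, Rn)                        # any value in [V, V + Rn) identifies Y
```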

In the lossless compression method based on the weighted probability model provided in this embodiment, the sequence X is first transformed, based on conclusion two, by the equal-length lossless entropy-reduction transform into the sequence Y of equal length, and the sequence Y is then losslessly compressed into the sequence Z through the weighted probability model ($r_2 = 1$).

The lossless compression method based on the weighted probability model provided by the embodiment has the beneficial effects that:

According to the maximum entropy theorem, a uniformly distributed binary sequence cannot be compressed losslessly any further, and existing entropy coding complies with this theorem. The core of this embodiment is the process of equal-length lossless entropy-reduction transformation: because the uniformly distributed binary sequence (the binary sequence X above) undergoes equal-length lossless entropy reduction (yielding the binary sequence Y), the uniformly distributed binary sequence can then be compressed losslessly. The compression ratio is given by equation (34) and is related to the user-defined weight coefficient. Obviously, the compression ratio of the method is superior to that of existing entropy coding algorithms. Moreover, since the method encodes and decodes bit by bit, it requires no large hardware cache, and it can be processed in parallel by segments, so it needs few hardware resources for computation. The main applicable scenarios of the method include re-compressing already-compressed images, videos, and files, with the compression ratio customizable through the weight coefficient (corresponding to $r_1$).

Referring to FIG. 4, on the basis of the first embodiment, a second embodiment of the present invention provides a lossless decompression method based on a weighted probability model. It should be noted that this method is the inverse process of the lossless compression method based on the weighted probability model provided in the first embodiment, and the two are based on the same inventive concept. The lossless decompression method based on the weighted probability model comprises the following steps:

S300, losslessly decompressing the binary sequence Z into the binary sequence Y through the weighted probability model.

The specific implementation manner of step S300 is:

S301, setting initial parameters: $R_0 = 1$, $L_0 = 0$, $i = 1$, $j = 0$, and the known V;

The parameter V in step S301 is known and corresponds to V in step S205 of the first embodiment. The length of the binary sequence Z is known.

S302, according to the coding formulas $R_i = R_{i-1}\,p(z_i)$, $L_i = L_{i-1} + R_{i-1}\,F(z_i - 1,\ r_2)$ and $H_i = L_i + R_i$, calculating the interval upper-bound value of symbol 0 at the ith step, $H_i^{(0)} = L_{i-1} + R_{i-1}\,p(0)$;

S303, comparing V with $H_i^{(0)}$: if $V < H_i^{(0)}$, outputting symbol 0 and letting j = j + 1; if $V \ge H_i^{(0)}$, outputting symbol 1;

S304, letting i = i + 1; if j is less than or equal to the sequence length of the binary sequence Z, jumping to step S302; and if j is greater than the sequence length of the binary sequence Z, obtaining the binary sequence Y.
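Steps S301 to S304 run the same interval arithmetic in reverse. A matching sketch (same float-precision caveats; the decoder must know the length n and the probability p0 the encoder used):

```python
def weighted_encode(Y, p0, r=1.0):
    """Compact version of the encoder sketched after S205 (returns V = L_n only)."""
    L, R = 0.0, 1.0
    for y in Y:
        if y == 1:
            L = L + R * r * p0
        R = R * r * (p0 if y == 0 else 1.0 - p0)
    return L

def weighted_decode(V, n, p0, r=1.0):
    """Steps S301-S304: recover the n-bit sequence Y from the code value V."""
    L, R = 0.0, 1.0
    Y = []
    for _ in range(n):
        H0 = L + R * r * p0                      # S302: upper bound of the symbol-0 interval
        if V < H0:                               # S303
            Y.append(0)
        else:
            Y.append(1)
            L = H0
        R = R * r * (p0 if Y[-1] == 0 else 1.0 - p0)
    return Y

V = weighted_encode([0, 1, 1, 0, 0, 0, 1, 0], p0=0.6)
print(weighted_decode(V, n=8, p0=0.6))           # [0, 1, 1, 0, 0, 0, 1, 0]
```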

S400, performing the equal-length lossless entropy-increasing transform on the binary sequence Y to obtain the binary sequence X.

The specific implementation of step S400 is:

S401, setting initial parameters: $R_0 = 1$, $L_0 = 0$, $i = 1$, $j = 0$, $V_1 = 0$, and $r_1$;

The parameter $r_1$ of step S401 is known and corresponds to $r_1$ in step S101 of the first embodiment; the sequence length n of the binary sequence Y is known. For convenience of calculation and description, a parameter $V_1$ is added in step S401; its initial value is 0, and it represents the value of $L_i$ produced by the weighted-model encoding.

S402, coding the ith symbol in the binary sequence Y, and if the ith symbol is a symbol 0, entering the step S403; if the ith symbol in the binary sequence Y is symbol 1, jumping to step S404;

S403, according to the coding formulas $R_i = R_{i-1}\,\bar p(y_i)$ and $L_i = L_{i-1} + R_{i-1}\,F(y_i - 1,\ r_1)$, calculating the values of $R_i$ and $L_i$: here $R_i = R_{i-1}\,r_1\,p(0)$, and since F(-1) = 0, $L_i = L_{i-1}$; letting i = i + 1; jumping to step S405;

S404, according to the coding formulas $R_i = R_{i-1}\,\bar p(y_i)$ and $L_i = L_{i-1} + R_{i-1}\,F(y_i - 1,\ r_1)$, calculating the values of $R_i$ and $L_i$: here $R_i = R_{i-1}\,r_1\,p(1)$, and since $F(0, r_1) = r_1\,p(0)$, $L_i = L_{i-1} + R_{i-1}\,r_1\,p(0)$; letting i = i + 1; proceeding to step S405;

S405, if i is not more than n, jumping to step S402; if i is more than n, letting $V_1 = L_n$ ($V_1$ = X), ending the encoding and completing the entropy-increasing transform; $V_1$ is the binary sequence X.
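Steps S401 to S405 mirror the entropy-reduction sketch with the encoder and decoder roles exchanged (same caveats; the final comment reports what the patent asserts, namely that $V_1 = L_n$ is taken as the recovered sequence X):

```python
def entropy_reduction(X, r1):
    """Compact version of the sketch after S104."""
    v = sum(b * 2.0 ** -(k + 1) for k, b in enumerate(X))
    L, R, Y = 0.0, 1.0, []
    for _ in range(len(X)):
        H0 = L + R * r1 * 0.5
        if v < H0:
            Y.append(0)
        else:
            Y.append(1)
            L = H0
        R = R * r1 * 0.5
    return Y

def entropy_increase(Y, r1):
    """Steps S401-S405: re-encode Y with the same r1; returns V1 = L_n."""
    L, R = 0.0, 1.0
    for y in Y:
        if y == 1:
            L = L + R * r1 * 0.5  # S404: L_i = L_{i-1} + R_{i-1} * F(0, r1), F(0, r1) = r1 * p(0)
        R = R * r1 * 0.5          # R_i = R_{i-1} * r1 * p(y_i), p(0) = p(1) = 0.5
    return L

X = [1, 0, 1, 1, 0, 0, 1, 0]
Y = entropy_reduction(X, r1=0.9)
V1 = entropy_increase(Y, r1=0.9)
print(Y, V1)  # per step S405, the patent outputs V1 = L_n as the sequence X
```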

The lossless decompression method based on the weighted probability model provided by this embodiment is the inverse process of the method of the first embodiment and mainly comprises: first, decoding the sequence Y from the sequence Z; then, transforming the sequence Y into the sequence X through the lossless entropy-increasing transform based on the conclusions above.

The method of the present embodiment is the reverse process of the method of the first embodiment, and the method of the present embodiment and the method of the first embodiment are the same inventive concept, and the beneficial effects thereof are not described herein again.

Referring to fig. 5, a third embodiment of the present invention provides an encoding device, which may be any type of smart terminal, such as a mobile phone, a tablet computer, a personal computer, etc. Specifically, the encoding device includes: one or more control processors and memory, here exemplified by a control processor. The control processor and the memory may be connected by a bus or other means, here exemplified by a connection via a bus.

The memory, which is a non-transitory computer readable storage medium, may be used to store non-transitory software programs, non-transitory computer executable programs, and modules, such as program instructions/modules corresponding to the encoding devices in the embodiments of the present invention. The control processor implements the weighted probability model based lossless compression method described in the first embodiment above and/or the weighted probability model based lossless decompression method described in the second embodiment above by executing the non-transitory software programs, instructions, and modules stored in the memory.

The memory, which may include a program storage area and a data storage area, wherein the program storage area may store an operating system and an application program required for at least one function; the memory may include high-speed random access memory, and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid-state storage device. In some embodiments, the memory optionally includes memory located remotely from the control processor, and these remote memories may be connected to the encoding device over a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.

The one or more modules are stored in the memory and, when executed by the one or more control processors, perform the weighted probability model based lossless compression method of the first embodiment described above and/or the weighted probability model based lossless decompression method of the second embodiment described above.

In a fourth embodiment of the present invention, a computer-readable storage medium is provided, which stores computer-executable instructions for one or more control processors to perform the weighted probability model-based lossless compression method according to the first embodiment and/or the weighted probability model-based lossless decompression method according to the second embodiment.

Through the above description of the embodiments, those skilled in the art can clearly understand that the embodiments can be implemented by software plus a general hardware platform. Those skilled in the art will appreciate that all or part of the processes in the methods for implementing the embodiments described above can be implemented by hardware related to instructions of a computer program, which can be stored in a computer-readable storage medium, and when executed, can include the processes in the embodiments of the methods described above. The storage medium may be a magnetic disk, an optical disk, a Read Only Memory (ROM), a Random Access Memory (RAM), or the like.

In the description herein, references to the description of the term "one embodiment," "some embodiments," "an illustrative embodiment," "an example," "a specific example," or "some examples" or the like mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, the schematic representations of the terms used above do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.

While embodiments of the invention have been shown and described, it will be understood by those of ordinary skill in the art that: various changes, modifications, substitutions and alterations can be made to the embodiments without departing from the principles and spirit of the invention, the scope of which is defined by the claims and their equivalents.
