In-vehicle interactive audio encryption method, device and equipment

Document No.: 1171441. Publication date: 2020-09-18.

Reading note: this technology, 车内交互音频加密方法、装置及设备 (In-vehicle interactive audio encryption method, device and equipment), was designed by 张宏斌 (Zhang Hongbin), 张启 (Zhang Qi) and 李深安 (Li Shen'an) on 2020-06-01. Abstract: The invention discloses an in-vehicle interactive audio encryption method, device and equipment. The invention abandons encryption based on fixed information and proposes a highly secure and reliable interactive audio encryption scheme that draws on several kinds of dynamic information involved in in-vehicle audio interaction and applies them jointly across multiple stages. Specifically, at least two kinds of dynamic information of different dimensions are extracted from the interactive audio and the interactive scene of a user and fused to generate a watermark. The transcribed text of the interactive audio serves as a further kind of dynamic information and is used to phase-modulate the audio. In addition, all of this dynamic information is combined to generate a corresponding secret key, and the secret key and the encrypted audio are sent out through independent transmission channels. The joint application of these security measures effectively protects the personal information and privacy of in-vehicle users and, compared with existing approaches, greatly increases the difficulty of unauthorized cracking.

1. An in-vehicle interactive audio encryption method is characterized by comprising the following steps:

acquiring first dynamic information and second dynamic information from an interactive audio and an interactive scene of a user;

fusing the first dynamic information and the second dynamic information to generate a watermark sequence;

generating a phase modulation sequence based on the transcription text of the interactive audio;

generating a secret key based on the first dynamic information, the second dynamic information and the transcribed text;

performing phase modulation on the interactive audio according to the phase modulation sequence;

and embedding the watermark generated based on the watermark sequence into the modulated interactive audio to obtain the encrypted interactive audio.

2. The in-vehicle interactive audio encryption method according to claim 1, wherein the first dynamic information is user personal information; the second dynamic information is vehicle information.

3. The in-vehicle interactive audio encryption method according to claim 2, wherein the user personal information includes one or more of: voiceprint information, age and gender; the vehicle information includes: the position, the vehicle speed, the tire pressure, the oil temperature, the battery power, the setting parameters of the user on the equipment in the vehicle, the window open/closed state and the number of passengers.

4. The in-vehicle interactive audio encryption method according to claim 2, wherein the fusing the first dynamic information and the second dynamic information to generate a watermark sequence comprises:

splicing the feature vector of the user personal information and the feature vector of the vehicle information to obtain a feature fusion sequence;

respectively taking each dimension of the feature fusion sequence as an initial condition to generate a first random matrix approximately obeying Gaussian distribution;

and acquiring the watermark sequence from the first random matrix according to a preset strategy.

5. The in-vehicle interactive audio encryption method according to claim 1, wherein the generating a phase modulation sequence based on the transcribed text of the interactive audio comprises:

encoding the transcribed text as a sequence of numbers;

respectively taking each number in the number sequence as an initial condition, and generating a second random matrix approximately obeying Gaussian distribution;

and acquiring the phase modulation sequence from the second random matrix according to a preset strategy.

6. The in-vehicle interactive audio encryption method according to claim 1, wherein the generating a key based on the first dynamic information, the second dynamic information, and the transcribed text comprises:

fusing the first dynamic information and the second dynamic information, and generating a first single-frequency signal according to a preset strategy;

generating a second single-frequency signal according to a preset strategy and the transcribed text;

and fusing the first single-frequency signal and the second single-frequency signal to obtain the secret key.

7. The in-vehicle interactive audio encryption method according to any one of claims 1 to 6, further comprising: assigning the secret key and the encrypted interactive audio to different transmission channels.

8. An in-vehicle interactive audio encryption device, comprising:

the dynamic information acquisition module is used for acquiring first dynamic information and second dynamic information from an interactive audio and an interactive scene of a user;

the watermark generating module is used for fusing the first dynamic information and the second dynamic information to generate a watermark sequence;

the phase modulation sequence generation module is used for generating a phase modulation sequence based on the transcription text of the interactive audio;

a key generation module, configured to generate a key based on the first dynamic information, the second dynamic information, and the transcribed text;

the phase modulation module is used for carrying out phase modulation on the interactive audio according to the phase modulation sequence;

and the watermark adding module is used for embedding the watermark generated based on the watermark sequence into the modulated interactive audio to obtain the encrypted interactive audio.

9. The in-vehicle interactive audio encryption device according to claim 8, further comprising:

and the encryption information transmission configuration module is used for respectively configuring the secret key and the encrypted interactive audio in different transmission channels.

10. An in-vehicle interactive audio encryption device, comprising:

one or more processors, memory, and one or more computer programs, wherein the one or more computer programs are stored in the memory, the one or more computer programs comprising instructions which, when executed by the apparatus, cause the apparatus to perform the in-vehicle interactive audio encryption method of any of claims 1-7.

Technical Field

The invention relates to the field of vehicle networking, in particular to an in-vehicle interactive audio encryption method, device and equipment.

Background

Mature audio processing techniques, including but not limited to echo cancellation, single-microphone noise reduction, microphone-array noise reduction and other front-end speech enhancement techniques, have improved considerably. Meanwhile, with the development and spread of the Internet of Vehicles, external devices can obtain the interactive audio of in-vehicle users in various ways and apply it to tasks such as semantic understanding, business query, data backup, matching retrieval and remote service requests. To perform these operations, the vehicle usually processes the human-machine interaction information at the local front end and then transmits it to a cloud backend. However, the relevant laws, regulations and industry specifications are still incomplete, and transmitting human-machine interaction information generally does not require user authorization, so the information and even the privacy of in-vehicle users can easily be stolen. From an information security perspective, this information and privacy therefore need to be protected at the front end.

In the existing encryption protection method, after front-end noise reduction is applied to the interactive audio, a single-frequency signal encoded with a fixed codeword sequence is added to the high-frequency band of the audio. Because the core of this design is fixed information, the encrypted content is easy to crack.

Disclosure of Invention

In view of the foregoing, the present invention aims to provide an in-vehicle interactive audio encryption method, apparatus and device, and correspondingly a computer-readable storage medium and a computer program product. By jointly encrypting with multiple kinds of dynamic information, it overcomes the drawback of existing approaches, which encrypt in-vehicle interactive voice with fixed information.

The technical scheme adopted by the invention is as follows:

in a first aspect, the present invention provides an in-vehicle interactive audio encryption method, including:

acquiring first dynamic information and second dynamic information from an interactive audio and an interactive scene of a user;

fusing the first dynamic information and the second dynamic information to generate a watermark sequence;

generating a phase modulation sequence based on the transcription text of the interactive audio;

generating a secret key based on the first dynamic information, the second dynamic information and the transcribed text;

performing phase modulation on the interactive audio according to the phase modulation sequence;

and embedding the watermark generated based on the watermark sequence into the modulated interactive audio to obtain the encrypted interactive audio.

In at least one possible implementation manner, the first dynamic information is user personal information; the second dynamic information is vehicle information.

In at least one possible implementation manner, the user personal information includes one or more of the following: voiceprint information, age and gender; the vehicle information includes: the position, the vehicle speed, the tire pressure, the oil temperature, the battery power, the setting parameters of the user on the equipment in the vehicle, the window open/closed state and the number of passengers.

In at least one possible implementation manner, the fusing the first dynamic information and the second dynamic information to generate a watermark sequence includes:

splicing the feature vector of the user personal information and the feature vector of the vehicle information to obtain a feature fusion sequence;

respectively taking each dimension of the feature fusion sequence as an initial condition to generate a first random matrix approximately obeying Gaussian distribution;

and acquiring the watermark sequence from the first random matrix according to a preset strategy.

In at least one possible implementation manner, the generating a phase modulation sequence based on the transcribed text of the interactive audio includes:

encoding the transcribed text as a sequence of numbers;

respectively taking each number in the number sequence as an initial condition, and generating a second random matrix approximately obeying Gaussian distribution;

and acquiring the phase modulation sequence from the second random matrix according to a preset strategy.

In at least one possible implementation manner, the generating a key based on the first dynamic information, the second dynamic information, and the transcribed text includes:

fusing the first dynamic information and the second dynamic information, and generating a first single-frequency signal according to a preset strategy;

generating a second single-frequency signal according to a preset strategy and the transcribed text;

and fusing the first single-frequency signal and the second single-frequency signal to obtain the secret key.

In at least one possible implementation manner, the method further includes: assigning the secret key and the encrypted interactive audio to different transmission channels.

In a second aspect, the present invention provides an in-vehicle interactive audio encryption apparatus, including:

the dynamic information acquisition module is used for acquiring first dynamic information and second dynamic information from an interactive audio and an interactive scene of a user;

the watermark generating module is used for fusing the first dynamic information and the second dynamic information to generate a watermark sequence;

the phase modulation sequence generation module is used for generating a phase modulation sequence based on the transcription text of the interactive audio;

a key generation module, configured to generate a key based on the first dynamic information, the second dynamic information, and the transcribed text;

the phase modulation module is used for carrying out phase modulation on the interactive audio according to the phase modulation sequence;

and the watermark adding module is used for embedding the watermark generated based on the watermark sequence into the modulated interactive audio to obtain the encrypted interactive audio.

In at least one possible implementation manner, the first dynamic information includes voiceprint information of a user; the second dynamic information includes one or more of the following vehicle information: location, vehicle speed, window open/closed state, tire pressure, and number of occupants.

In at least one possible implementation manner, the watermark generating module includes:

the characteristic splicing unit is used for splicing the characteristic vector of the voiceprint information of the user and the characteristic vector of the vehicle information to obtain a characteristic fusion sequence;

the first matrix representation unit is used for respectively taking each dimension of the feature fusion sequence as an initial condition to generate a first random matrix approximately obeying Gaussian distribution;

and the watermark acquisition unit is used for acquiring the watermark sequence from the first random matrix according to a preset strategy.

In at least one possible implementation manner, the phase modulation sequence generation module includes:

an encoding unit for encoding and representing the transcription text as a number sequence;

a second matrix representation unit, configured to generate a second random matrix approximately obeying Gaussian distribution by using each number in the number sequence as an initial condition;

and the phase modulation sequence acquisition unit is used for acquiring the phase modulation sequence from the second random matrix according to a preset strategy.

In at least one possible implementation manner, the key generation module includes:

the first single-frequency signal generating unit is used for fusing the first dynamic information and the second dynamic information and generating a first single-frequency signal according to a preset strategy;

the second single-frequency signal generating unit is used for generating a second single-frequency signal according to a preset strategy and the transcribed text;

and the key generating unit is used for re-fusing the first single-frequency signal and the second single-frequency signal to obtain the key.

In at least one possible implementation manner, the apparatus further includes: an encryption information transmission configuration module, used for assigning the secret key and the encrypted interactive audio to different transmission channels.

In a third aspect, the present invention provides an in-vehicle interactive audio encryption device, including:

one or more processors, memory which may employ a non-volatile storage medium, and one or more computer programs stored in the memory, the one or more computer programs comprising instructions which, when executed by the apparatus, cause the apparatus to perform the method as in the first aspect or any possible implementation of the first aspect.

In a fourth aspect, the present invention provides a computer-readable storage medium having stored thereon a computer program which, when run on a computer, causes the computer to perform the method as described in the first aspect or any possible implementation manner of the first aspect.

In a fifth aspect, the present invention also provides a computer program product for performing the method of the first aspect or any possible implementation manner of the first aspect, when the computer program product is executed by a computer.

In a possible design of the fifth aspect, the relevant program related to the product may be stored in whole or in part on a memory packaged with the processor, or may be stored in part or in whole on a storage medium not packaged with the processor.

The invention abandons encryption based on fixed information and proposes a highly secure and reliable interactive audio encryption scheme that draws on several kinds of dynamic information involved in in-vehicle audio interaction and applies them jointly across multiple stages. Specifically, at least two kinds of dynamic information of different dimensions are extracted from the interactive audio and the interactive scene of a user and fused to generate a watermark. The transcribed text of the interactive audio serves as a further kind of dynamic information and is used to phase-modulate the audio. In addition, all of this dynamic information is combined to generate a corresponding secret key, and the secret key and the encrypted audio are sent out through independent transmission channels. The joint application of these security measures effectively protects the personal information and privacy of in-vehicle users and, compared with existing approaches, greatly increases the difficulty of unauthorized decryption.

Drawings

In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention will be further described with reference to the accompanying drawings, in which:

FIG. 1 is a flowchart of an embodiment of an in-vehicle interactive audio encryption method provided by the present invention;

FIG. 2 is a schematic diagram of an embodiment of a process for generating a watermark provided by the present invention;

FIG. 3 is a block diagram of an embodiment of an in-vehicle interactive audio encryption device provided by the present invention.

Detailed Description

Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to the same or similar elements or elements having the same or similar function throughout. The embodiments described below with reference to the drawings are illustrative only and should not be construed as limiting the invention.

Before the technical scheme of the invention is explained, existing in-vehicle interactive audio encryption schemes are reviewed. Anti-attack technology for in-vehicle interactive voice is not yet mature: during voice interaction, the audio is compressed and encoded, transmitted directly in plaintext, and processed at the server side. Once an attacker intercepts the audio of an in-vehicle user, the voiceprint characteristics of that audio can be used to imitate the user's voice requests, and driving safety can then no longer be guaranteed.

As mentioned above, the conventional encryption method for in-vehicle interactive audio adds a single-frequency signal encoded with a fixed codeword sequence to the high-frequency band of the audio. When decoding, the remote receiving end checks whether the high-frequency band contains the single-frequency signal encoded with the specific codeword sequence, and thereby determines whether the input audio is in-vehicle interactive audio processed by the protected front-end noise reduction engine. This protects the information security of in-vehicle users to a certain extent. However, the watermark information added at high frequency is fixed and easy to crack: an impersonator only needs to intercept a certain amount of audio that was watermarked at the noise reduction front end and analyze its spectrum to find the relatively fixed coding rule, which breaks the watermark coding and removes the protection entirely. Furthermore, adding a single-frequency signal in the high-frequency band can also degrade the listening quality of the audio.

In view of the above, the invention abandons encryption based on fixed information and instead encrypts jointly with several kinds of dynamic information. The encryption information is flexible and changeable, and is therefore more secure than the existing approach.

In combination with the specific embodiment, the present invention provides an embodiment of an in-vehicle interactive audio encryption method, as shown in fig. 1, which may include the following steps:

Step S0, collecting the interactive audio of the user and performing front-end noise reduction processing.

This process is conventional front-end technology and is the basis for the subsequent encryption operations. The interactive audio generally refers to the human-machine interaction speech between the user and in-vehicle equipment such as the head unit; the specific collection and noise reduction methods are not the focus of the invention.

Step S1, acquiring the first dynamic information and the second dynamic information from the interactive audio and the interactive scene.

To improve encryption security, the invention uses dynamic information of at least two dimensions as the basis for encryption: one comes from the interactive audio of the user, and the other relates to the interactive scene, that is, information about the vehicle. Both change dynamically across application environments rather than being constant attributes. For example, the acoustic characteristics of speech differ between users, such as the voiceprint information that characterizes each user's pronunciation, as well as personalized information such as age and gender obtained from the interactive audio with existing speech processing techniques. Likewise, while the vehicle is running, its own information may change in real time, such as the location, the real-time vehicle speed, the tire pressure, the oil temperature, the battery level, and the parameters the user has set for in-vehicle equipment. This relatively flexible dynamic information is what the invention exploits. In practice, different combinations of dynamic information can be selected from the user and vehicle dimensions according to specific requirements. For example, but not limited to, in some possible implementations the first dynamic information may be the personal information of the user (the voiceprint information of the user is used as the illustrative example hereinafter), and the second dynamic information may be vehicle information, preferably including one or more of: location, vehicle speed, window open/closed state, tire pressure, and number of occupants.

In addition, for the above-mentioned dynamic information obtaining method itself, those skilled in the art can understand that a large number of mature technologies can be selected, which is not the focus of the present invention, and will not be described herein.

Step S11, fusing the first dynamic information and the second dynamic information to generate a watermark sequence.

The invention proposes that the watermark be generated from both kinds of dynamic information combined, rather than from either one alone, which also improves encryption reliability. One way to generate the watermark is to encode each kind of dynamic information as a feature sequence, fuse the features of the two, and then map the fused features to a random sequence. A specific watermark generation method is exemplified later and is not detailed here.

Step S2, transcribing the interactive audio.

This process can also be regarded as front-end processing of the audio: the interactive audio is recognized locally as text in order to obtain the specific content of the interaction. Many mature speech recognition techniques are available and are not described here. It should be emphasized, however, that because the purpose of the invention is to encrypt the interactive audio before wireless transmission, the transcription generally takes place locally rather than the audio being sent to a remote end, recognized there and returned.

Step S21, generating a phase modulation sequence based on the transcribed text.

Audio transmission conventionally requires phase modulation. It should be emphasized, however, that to reflect the joint action of multiple kinds of dynamic information, the invention also uses dynamic information at the phase modulation stage: the specific content of the user's interactive audio is treated as a kind of dynamic information and incorporated into the phase modulation step. This process is illustrated in detail later.

Step S3, generating a key based on the first dynamic information, the second dynamic information and the transcribed text.

To reflect once more the joint action of multiple kinds of dynamic information, the key generation process also combines them: the first dynamic information, the second dynamic information and the transcribed text are considered together. This process will be described in detail later.

Step S22, performing phase modulation on the interactive audio according to the phase modulation sequence.

The phase modulation process will be specifically described later.

Step S12, embedding the watermark generated based on the watermark sequence into the modulated interactive audio to obtain the encrypted interactive audio.

The watermarking step will be described in detail later.

In this way, a highly secure encrypted interactive audio and a corresponding key are obtained. Two further points should be noted. First, the order of the above steps is not limited by their sequence numbers: for example, the watermark and the key may be generated in the same stage, the key may also be generated after the watermark has been added, and the watermark sequence and the phase modulation sequence may be generated in either order. None of this limits the invention. (Likewise, the terms "first" and "second" used in the invention carry no meaning of order or rank and serve only to distinguish.) Second, once the encrypted audio and the key have been obtained, they can be assigned to different transmission channels, that is, the encrypted interactive audio and the key are transmitted to the external device through different channels.

As to the specific implementation of generating the phase modulation sequence, reference may be made to the following:

After being processed by the front-end noise reduction system, the interactive audio signal is sent to a local speech recognition system, and the recognition result is the text content of the audio. For convenient representation and subsequent processing, the text can in practice be uniformly encoded as Unicode using UCS-2. UCS-2 is a universal Unicode encoding standard that covers most of the world's characters and symbols, including Chinese characters, English letters and digits. It represents each character with two bytes: for example, the code of the Chinese character "经" (jing) is 0x7ECF and the code of the letter "a" is 0x0061. Once the specific content of the interactive audio has been encoded as Unicode, it can be represented as a sequence of numbers $B = \{b_i \mid 0 \le i < M\}$, where M is the total number of characters in the interactive audio and $0 < b_i < 65536$. Of course, how the text is encoded into numeric form is an optional, variable operation and is not limited in other embodiments.
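The encoding step above can be sketched in a few lines of Python. This is only an illustration: the function name `text_to_ucs2_sequence` is hypothetical, and rejecting characters outside the two-byte range is an assumption about how such input would be handled, since the text does not say.

```python
def text_to_ucs2_sequence(text):
    """Return B = [b_0, ..., b_{M-1}], one 16-bit UCS-2 code per character."""
    codes = []
    for ch in text:
        code = ord(ch)
        if code > 0xFFFF:
            # Assumption: UCS-2 only covers two-byte codes, so reject the rest.
            raise ValueError(f"{ch!r} is outside the UCS-2 range")
        codes.append(code)
    return codes


# "\u7ecf" is the character 经 used as the example in the text.
B = text_to_ucs2_sequence("\u7ecfa")
print([hex(b) for b in B])  # ['0x7ecf', '0x61']
```

Here M is simply `len(B)`, and every entry satisfies the stated bound $0 < b_i < 65536$ for ordinary text.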

The process of generating the phase modulation sequence may be as follows. First, the audio content sequence is normalized so that its values lie in (0, 1), as shown in formula (0.1). Then each normalized number $\tilde{b}_i$ (32 bits) is used as an initial condition to generate a random matrix P that approximately follows a Gaussian distribution, with columns $p_i = [p_{i,0}, p_{i,1}, p_{i,2}, \dots]^T$ defined by

$$p_{i,0} = \tilde{b}_i, \qquad p_{i,j+1} = \mu\, p_{i,j}\,(1 - p_{i,j}),$$

where $b_i$ and $\tilde{b}_i$ are the data before and after normalization respectively. μ is the constant parameter used to generate the random sequence, with $3.5699456 < \mu \le 4$.
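A minimal sketch of this recursion, under stated assumptions: formula (0.1) is not reproduced in the text, so dividing by 65536 is only an assumed normalization that maps $0 < b_i < 65536$ into (0, 1); the value μ = 3.99 and the column length are arbitrary illustrative choices; and Python floats are 64-bit, whereas the text speaks of 32-bit numbers. The function names are hypothetical.

```python
def normalize(codes, scale=65536.0):
    # Assumed stand-in for formula (0.1): b / 65536 lies in (0, 1).
    return [b / scale for b in codes]


def logistic_column(x0, length, mu=3.99):
    """Iterate p_{j+1} = mu * p_j * (1 - p_j), starting from p_0 = x0."""
    column = [x0]
    for _ in range(length - 1):
        column.append(mu * column[-1] * (1.0 - column[-1]))
    return column


def random_matrix(codes, length, mu=3.99):
    """Return P as a list of columns, one column per character code."""
    if not 3.5699456 < mu <= 4.0:
        raise ValueError("mu must satisfy 3.5699456 < mu <= 4")
    return [logistic_column(x, length, mu) for x in normalize(codes)]


# Two character codes give a matrix with two columns of eight values each.
P = random_matrix([0x7ECF, 0x0061], length=8)
```

Because each column depends only on its own initial value, a different transcribed text yields a different matrix, which is what makes the sequence content-dependent.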

Then, according to a predetermined strategy, all columns of the matrix P can be spliced into a single column vector. Taking every K/32 consecutive elements of that column vector as one column (K is the number of FFT points) gives a transformed random matrix $P' = [P'_0\; P'_1\; P'_2\; \dots]$. The m-th column of $P'$ is taken as the phase modulation sequence W. W is a random vector consisting of K/32 elements of 32 bits each, represents the text information of the audio, and is written as the binary sequence $W = \{w_i \mid 0 \le i < K\}$.
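The regrouping described above might look as follows. This sketch assumes that each element is stored as an IEEE 754 single-precision float and that its bits are read most significant bit first; the text fixes neither choice. The names `float_to_bits` and `phase_sequence` are hypothetical, and the input matrix is a toy stand-in for P.

```python
import struct


def float_to_bits(value):
    """Return the 32 bits of `value` as an IEEE 754 single, MSB first."""
    (word,) = struct.unpack(">I", struct.pack(">f", value))
    return [(word >> (31 - i)) & 1 for i in range(32)]


def phase_sequence(columns, fft_size, m):
    """Take the m-th group of fft_size/32 elements and expand it to bits."""
    if fft_size % 32 != 0:
        raise ValueError("fft_size must be a multiple of 32")
    flat = [v for col in columns for v in col]  # splice the columns of P
    group = fft_size // 32
    chunk = flat[m * group:(m + 1) * group]     # m-th column of P'
    if len(chunk) < group:
        raise ValueError("not enough elements for the requested column")
    return [bit for v in chunk for bit in float_to_bits(v)]


columns = [[0.5, 0.25], [0.75, 0.125]]  # toy stand-in for P
W = phase_sequence(columns, fft_size=64, m=1)
```

With K = 64, each column of P' holds two elements, so `W` has 64 bits, one per FFT point.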

For the specific embodiments of generating a watermark sequence and generating a watermark, reference may be made to the following:

Based on the foregoing, the first dynamic information may be the I-vector voice feature of the in-vehicle user, that is, the voiceprint feature of a specific in-vehicle user; the I-vector features of different speakers differ greatly. The I-vector is a sequence of R components indexed by $0 \le i < R$. Extracting the I-vector feature is prior art: for example, PLP features are extracted first, and the zeroth-order and first-order statistics are then computed with a pre-trained GMM model to obtain the I-vector factor; this is neither limited nor detailed here. For the second dynamic information, the collected vehicle position, vehicle speed, window open/closed state, number of people in the vehicle and similar information can likewise be combined and encoded into a car-vector feature, which is also neither limited nor detailed here.

The watermark sequence may be generated as follows. First, the feature sequences of the I-vector and the car-vector are concatenated and then normalized so that their values lie in (0, 1), as shown in formula (0.3). Then, with each dimension of the normalized feature used in turn as an initial condition, a random watermark matrix S that approximately follows a Gaussian distribution is generated, which can be expressed as:

$$S = [s_1, s_2, \ldots, s_R] \tag{0.4}$$

$$s_i = [s_{i,0}, s_{i,1}, s_{i,2}, \ldots]^T \tag{0.5}$$

where

Figure BDA0002519370080000103

and $s_{i,j+1} = \mu\, s_{i,j}\,(1 - s_{i,j})$, with $\mu$ as defined above. The matrix S may then be converted, according to a given strategy, into a row vector $S' = [s_{1,0}, s_{2,0}, \ldots, s_{R,0}, s_{1,1}, s_{2,1}, \ldots, s_{R,1}, \ldots]$. Each element of $S'$ is a floating-point number that is stored in memory as a 32-bit binary sequence, for example $s_{1,0} = [b_0, b_1, \ldots, b_{31}]$ with $b_i \in \{0, 1\}$. The vector $S'$ can therefore be written as a continuous binary bitstream $S' = [s'_0, s'_1, s'_2, \ldots]$ with $s'_i \in \{0, 1\}$, which is the watermark sequence.
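As a rough illustration of this construction, the following Python sketch concatenates two feature vectors, iterates the logistic map for each dimension, interleaves the results in the order given for S', and unpacks each float into 32 bits. Formula (0.3) is not reproduced in the text, so the min-max `normalize` with a small margin is an assumption, as are `MU = 3.99`, the number of `iterations`, the choice of the normalized feature as the starting value, and the function names.

```python
import struct

MU = 3.99  # assumed; the text only requires 3.5699456 < mu <= 4


def normalize(features, margin=1e-3):
    """Min-max scale into (0, 1), keeping a small margin from both ends (assumed form)."""
    lo, hi = min(features), max(features)
    span = (hi - lo) or 1.0
    return [margin + (1.0 - 2.0 * margin) * (x - lo) / span for x in features]


def watermark_bits(i_vector, car_vector, iterations=8, mu=MU):
    """Return the binary watermark sequence built from the spliced features."""
    seeds = normalize(list(i_vector) + list(car_vector))  # spliced feature of length R
    rows = [seeds]                                        # rows[j][i] holds s_{i,j}
    for _ in range(iterations - 1):
        rows.append([mu * s * (1.0 - s) for s in rows[-1]])
    # Row-by-row flattening gives s_{1,0}, ..., s_{R,0}, s_{1,1}, ..., s_{R,1}, ...
    flat = [s for row in rows for s in row]
    bits = []
    for s in flat:
        # Each element is viewed as a 32-bit single-precision float.
        (as_int,) = struct.unpack(">I", struct.pack(">f", s))
        bits.extend((as_int >> (31 - k)) & 1 for k in range(32))
    return bits
```

With a 3-dimensional voice feature and a 3-dimensional vehicle feature, `watermark_bits([0.12, -0.53, 0.87], [60.0, 1.0, 2.0])` yields 6 × 8 × 32 = 1536 bits. Changing any scene value changes the seeds and therefore the whole bitstream.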

The watermark generation process may be as shown in Fig. 2, where $Z^{-1}$ denotes a delay and the feedback coefficient is $c_i = s'_{n+i}$.

[Figure BDA0002519370080000106 and its initial value are not recoverable from the extracted text]

As to the specific implementation of the phase modulation process, reference may be made to the following:

The phase of the interactive audio is modulated with the generated phase modulation sequence, which depends on the audio content. The audio spectrum differs considerably before and after modulation, which improves encryption performance. At the same time, the relative phase between audio frames can be kept unchanged during modulation, so listening quality is not affected.

The phase modulation may proceed as follows. First, the input interactive audio signal x(n) is divided into frames, and a K-point FFT is applied to each frame to obtain the frequency-domain signal, where $A_m(\omega_k)$ denotes the amplitude and the accompanying phase term denotes the phase, and $m \ge 1$ and $k$ are the frame index and the frequency bin index, respectively. The phase is then modulated according to formulas (0.6) and (0.7), and finally a K-point IFFT is applied to the modulated signal

Figure BDA0002519370080000111

to obtain the phase-modulated time-domain signal $y_1(n)$. In formula (0.7),

Figure BDA0002519370080000113

This process is not the focus here and is not described in further detail.
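Formulas (0.6) and (0.7) are not reproduced in this text, so the following Python sketch only illustrates the general frame, transform, phase-shift and inverse-transform flow, not the patent's exact modulation rule. It assumes that bit k of the sequence shifts the phase of bin k by π while the amplitude is kept, that frames do not overlap, and that the same offsets are applied to every frame. A naive DFT from the standard library stands in for the FFT to keep the sketch dependency-free; all function names are hypothetical.

```python
import cmath
import math


def dft(frame):
    """Naive K-point DFT (O(K^2)); a stand-in for an FFT routine."""
    K = len(frame)
    return [sum(frame[n] * cmath.exp(-2j * math.pi * k * n / K) for n in range(K))
            for k in range(K)]


def idft(spectrum):
    """Naive K-point inverse DFT."""
    K = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * n / K) for k in range(K)) / K
            for n in range(K)]


def modulate_frame(frame, bits):
    """Shift the phase of bin k by pi when bits[k] == 1, keeping the amplitude."""
    K = len(frame)
    spectrum = dft(frame)
    out = list(spectrum)
    for k in range(1, K // 2):
        amplitude, phase = cmath.polar(spectrum[k])
        out[k] = cmath.rect(amplitude, phase + math.pi * bits[k])
        out[K - k] = out[k].conjugate()  # mirror bin, so the time signal stays real
    return [value.real for value in idft(out)]


def modulate(signal, bits, K):
    """Apply the same per-bin offsets to every non-overlapping K-sample frame."""
    out = []
    for start in range(0, len(signal) - K + 1, K):
        out.extend(modulate_frame(signal[start:start + K], bits))
    return out
```

Because a shift of π applied twice is a full turn, calling `modulate` a second time with the same bits restores the input in this sketch. That self-inverse property belongs to the assumed rule, not necessarily to the patent's formulas.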

As to a specific embodiment of the process of adding the watermark, reference may be made to the following:

The generated watermark sequence resembles white noise: it is not a single-frequency signal and has equal power in every frequency band, so it does not affect the listening experience. Because the watermark sequence is derived from the interactive audio and approximately follows a Gaussian distribution, it is difficult to crack and offers high encryption security. In addition, the encrypted watermark is embedded into the target audio in the form of white noise and is therefore not easy to detect. The generated white-noise-like watermark is embedded into the phase-modulated audio to obtain the output signal y(n), as shown in formula (0.8):

Figure BDA0002519370080000115

finally, as to a specific implementation of the key generation process, the following can be referred to:

In the above process, the information derived from the interactive audio and from the scene is merged into the interactive audio through phase modulation and watermarking. Although this way of embedding the encrypted watermark is highly concealed, the receiving back end must still be able to remove the embedded watermark and decrypt correctly. The invention therefore also combines the text content of the interactive audio with the I-vector and car-vector feature information to generate a single-frequency key.

The key may be generated as follows. First, the Unicode sequence of the audio text from step 1, $B = \{b_i \mid 0 \le i < M\}$, is represented as a quaternary coded sequence $B' = \{b'_i \mid 0 \le i < M \times 8\}$. Then, for the i-th quaternary symbol ($0 \le i < M \times 8$), a single-frequency signal $eb_i(n)$ is generated according to formula (0.9), where A is the amplitude, which may be a fixed value, and the frequency $f_{b'_i}$ is linear in $b'_i$, that is, $f_{b'_i} = k\,b'_i$. Next, the concatenated I-vector and car-vector sequence $V = \{v_i \mid 0 \le i < R\}$ is processed by the same steps to obtain the single-frequency signal $ev_i(n)$. Finally, the output key e(n) is given by formula (0.10).

$$e(n) = eb_i(k) + ev_i(k), \quad n = i \cdot W + k \tag{0.10}$$
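The key construction can be illustrated with the Python sketch below. Formula (0.9) is not reproduced in the text, so the cosine form of the single-frequency signal is an assumption, as are the window length, the sample rate, the two slope values, the silence padding when one stream is shorter, and the idea that the spliced feature V has already been quantized to 16-bit integers (`feature_codes`). Characters are assumed to fit in 16 bits, which gives 8 quaternary symbols per character as in the text. All names are hypothetical.

```python
import math


def to_quaternary(values, bits_per_value=16):
    """Split each integer into 2-bit symbols, most significant pair first."""
    symbols = []
    for v in values:
        for shift in range(bits_per_value - 2, -1, -2):
            symbols.append((v >> shift) & 0b11)
    return symbols


def tone(symbol, window, slope_hz, sample_rate, amplitude=1.0):
    """One window of a single-frequency signal with frequency slope_hz * symbol."""
    freq = slope_hz * symbol
    return [amplitude * math.cos(2.0 * math.pi * freq * k / sample_rate)
            for k in range(window)]


def window_for(symbols, i, window, slope_hz, sample_rate):
    """Tone for symbol i, or silence once the stream has run out (assumed padding)."""
    if i < len(symbols):
        return tone(symbols[i], window, slope_hz, sample_rate)
    return [0.0] * window


def build_key(text, feature_codes, window=80, sample_rate=16000,
              text_slope_hz=200.0, feature_slope_hz=1000.0):
    """e(n) = eb_i(k) + ev_i(k) with n = i * window + k, following formula (0.10)."""
    text_symbols = to_quaternary([ord(ch) for ch in text])
    feature_symbols = to_quaternary(feature_codes)
    key = []
    for i in range(max(len(text_symbols), len(feature_symbols))):
        eb = window_for(text_symbols, i, window, text_slope_hz, sample_rate)
        ev = window_for(feature_symbols, i, window, feature_slope_hz, sample_rate)
        key.extend(b + v for b, v in zip(eb, ev))
    return key
```

Using different slopes for the two streams is a design choice of this sketch: it keeps the text tones and the feature tones in separate frequency ranges so that a receiver could tell them apart after they are summed.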

The encrypted audio and the key, which are transmitted to the receiving end over independent channels, are thus obtained. Based on the foregoing embodiments and preferred schemes, the processing at the receiving end is briefly introduced here: the receiving end parses the key to obtain the first dynamic information, the second dynamic information and the interactive content, then removes the watermark from the interactive audio and demodulates it in reverse, and finally obtains a clean speech signal.

Specifically, the receiving end first parses the key to obtain the interactive audio text sequence and the concatenated I-vector and car-vector feature sequence. Then, following the way the phase modulation sequence and the watermark sequence were generated, it removes the watermark from the received interactive audio and performs phase demodulation. Finally, from the watermark-free audio signal it derives the transcribed text of the interactive audio and the concatenated I-vector and car-vector features, and checks their legitimacy against the information obtained by parsing the key. Those skilled in the art will appreciate that the decryption process can be derived from the encryption operation described above, with the processing steps reversed accordingly; it is not the focus of the present invention.
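The final legitimacy check can be pictured with a small Python sketch. The text does not state the comparison criterion, so the exact text match, the per-dimension `tolerance` and the `DynamicInfo` container are assumptions introduced purely for illustration.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class DynamicInfo:
    """Transcribed text plus the spliced I-vector / car-vector features."""
    text: str
    features: Sequence[float]


def is_legitimate(from_key, from_audio, tolerance=0.05):
    """Compare information parsed from the key with information recovered from the audio."""
    if from_key.text != from_audio.text:
        return False
    if len(from_key.features) != len(from_audio.features):
        return False
    # Features re-extracted from audio are rarely bit-identical, so allow a small gap.
    return all(abs(a - b) <= tolerance
               for a, b in zip(from_key.features, from_audio.features))
```

In this sketch the audio is accepted only when both the text and every feature dimension agree with what the key carries.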

In conclusion, the invention abandons the fixed information encryption idea, and provides an interactive audio encryption scheme with high safety and reliability based on various dynamic information related to the audio interaction in the vehicle and adopting the idea of multi-link combined application. Specifically, at least two kinds of dynamic information with different dimensions are extracted from interactive audio and an interactive scene of a user, a plurality of kinds of dynamic information are mutually fused to generate a watermark, a transcription text related to interactive audio content is used as another kind of dynamic information for carrying out phase modulation on the audio, besides, the various kinds of dynamic information are considered to be synthesized to generate a corresponding secret key, and the secret key and the encrypted audio are respectively sent out through independent transmission channels, so that personal information and privacy of the user in the vehicle can be effectively protected based on the combined application of various safety measures, and compared with the existing mode, the difficulty of unauthorized cracking is greatly increased.

Corresponding to the above embodiments and preferred schemes, the present invention further provides an embodiment of an in-vehicle interactive audio encryption apparatus, which may specifically include the following components as shown in fig. 3:

the interactive audio front-end processing module 0 is used for acquiring the interactive audio of the user and performing front-end noise reduction processing;

the dynamic information acquisition module 1 is used for acquiring first dynamic information and second dynamic information from an interactive audio and an interactive scene of a user;

the watermark generating module 2 is configured to fuse the first dynamic information and the second dynamic information to generate a watermark sequence;

the phase modulation sequence generation module 3 is used for generating a phase modulation sequence based on the transcription text of the interactive audio;

a key generation module 4, configured to generate a key based on the first dynamic information, the second dynamic information, and the transcribed text;

the phase modulation module 5 is configured to perform phase modulation on the interactive audio according to the phase modulation sequence;

the watermark adding module 6 is used for embedding the watermark generated based on the watermark sequence into the modulated interactive audio to obtain an encrypted interactive audio;

in other embodiments, the apparatus may further include an encrypted information transmission configuration module, configured to configure the key and the encrypted interactive audio in different transmission channels, respectively.
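The way these modules hand data to one another can be sketched as a small Python pipeline. This is only a structural illustration: each field is a placeholder callable for one numbered module, the `transcribe` hook is a hypothetical stand-in for whatever produces the transcribed text, and the acquisition module is assumed to read scene information on its own.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

Audio = List[float]
Bits = List[int]
Features = Sequence[float]


@dataclass
class EncryptionPipeline:
    """Wires modules 0 to 6 together; every field is a placeholder for one module."""
    denoise: Callable[[Audio], Audio]                      # module 0
    acquire: Callable[[Audio], Tuple[Features, Features]]  # module 1
    make_watermark: Callable[[Features, Features], Bits]   # module 2
    make_phase_sequence: Callable[[str], Bits]             # module 3
    make_key: Callable[[Features, Features, str], Audio]   # module 4
    modulate: Callable[[Audio, Bits], Audio]               # module 5
    embed: Callable[[Audio, Bits], Audio]                  # module 6
    transcribe: Callable[[Audio], str]                     # hypothetical transcription hook

    def run(self, raw_audio: Audio) -> Tuple[Audio, Audio]:
        """Return (encrypted audio, key), intended for separate transmission channels."""
        audio = self.denoise(raw_audio)
        first, second = self.acquire(audio)
        text = self.transcribe(audio)
        watermark = self.make_watermark(first, second)
        phase_bits = self.make_phase_sequence(text)
        key = self.make_key(first, second, text)
        encrypted = self.embed(self.modulate(audio, phase_bits), watermark)
        return encrypted, key
```

Keeping each module behind a plain callable mirrors the statement that the division is logical only: any of them could be swapped for a hardware-backed implementation without changing the flow.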

In at least one possible implementation manner, the first dynamic information includes voiceprint information, age and gender of the user; the second dynamic information includes one or more of the following vehicle information: the position, the vehicle speed, the tire pressure, the oil temperature, the battery power, the parameters the user has set on in-vehicle equipment, the window open/closed state and the number of passengers.

In at least one possible implementation manner, the watermark generating module includes:

the characteristic splicing unit is used for splicing the characteristic vector of the personal information of the user and the characteristic vector of the vehicle information to obtain a characteristic fusion sequence;

the first matrix representation unit is used for respectively taking each dimension of the feature fusion sequence as an initial condition to generate a first random matrix approximately obeying Gaussian distribution;

and the watermark acquisition unit is used for acquiring the watermark sequence from the first random matrix according to a preset strategy.

In at least one possible implementation manner, the phase modulation sequence generation module includes:

an encoding unit for encoding and representing the transcription text as a number sequence;

a second matrix representing unit, configured to generate a second random matrix approximately complying with gaussian distribution by using each number in the number sequence as an initial condition;

and the phase modulation sequence acquisition unit is used for acquiring the phase modulation sequence from the second random matrix according to a preset strategy.

In at least one possible implementation manner, the key generation module includes:

the first single-frequency signal generating unit is used for fusing the first dynamic information and the second dynamic information and generating a first single-frequency signal according to a preset strategy;

the second single-frequency signal generating unit is used for generating a second single-frequency signal according to a preset strategy and the transcribed text;

and the key generating unit is used for re-fusing the first single-frequency signal and the second single-frequency signal to obtain the key.

It should be understood that the division of the components in the in-vehicle interactive audio encryption device shown in fig. 3 is only a logical division, and the actual implementation may be wholly or partially integrated into one physical entity or physically separated. And these components may all be implemented in software invoked by a processing element; or may be implemented entirely in hardware; and part of the components can be realized in the form of calling by the processing element in software, and part of the components can be realized in the form of hardware. For example, a certain module may be a separate processing element, or may be integrated into a certain chip of the electronic device. Other components are implemented similarly. In addition, all or part of the components can be integrated together or can be independently realized. In implementation, each step of the above method or each component above may be implemented by an integrated logic circuit of hardware in a processor element or an instruction in the form of software.

For example, the above components may be one or more integrated circuits configured to implement the above methods, such as: one or more Application Specific Integrated Circuits (ASICs), one or more Digital Signal Processors (DSPs), one or more Field Programmable Gate Arrays (FPGAs), etc. For another example, these components may be integrated together and implemented in the form of a System-On-a-Chip (SOC).

In view of the foregoing examples and their preferred embodiments, those skilled in the art will appreciate that, in practice, the invention may be embodied in a variety of forms, which are schematically illustrated by the following carriers:

(1) an in-vehicle interactive audio encryption device, which may include:

one or more processors, memory, and one or more computer programs, wherein the one or more computer programs are stored in the memory, the one or more computer programs comprising instructions, which when executed by the apparatus, cause the apparatus to perform the steps/functions of the foregoing embodiments or equivalent implementations.

(2) A readable storage medium, on which a computer program or the above-mentioned apparatus is stored, which, when executed, causes the computer to perform the steps/functions of the above-mentioned embodiments or equivalent implementations.

In the several embodiments provided by the present invention, any function, if implemented in the form of a software functional unit and sold or used as a separate product, may be stored in a computer readable storage medium. Based on this understanding, the technical solution of the present invention, in essence or in the part that contributes to the prior art, may be embodied in the form of a software product as described below.

(3) A computer program product (which may include the above-described apparatus) which, when run on a computer device, causes the device to perform the in-vehicle interactive audio encryption method of the preceding embodiment or equivalent.

From the above description of the embodiments, it is clear to those skilled in the art that all or part of the steps in the above implementation method can be implemented by software plus a necessary general hardware platform.

In the embodiments of the present invention, "at least one" means one or more, and "a plurality" means two or more. "And/or" describes the association relationship of the associated objects and means that three relationships are possible; for example, "A and/or B" may mean that A exists alone, that A and B exist simultaneously, or that B exists alone, where A and B can each be singular or plural. The character "/" generally indicates that the objects before and after it are in an "or" relationship. "At least one of the following" and similar expressions refer to any combination of these items, including any combination of singular or plural items. For example, at least one of a, b and c may represent: a, b, c, a and b, a and c, b and c, or a and b and c, where a, b and c can each be single or multiple.

Those of skill in the art will appreciate that the various modules, elements, and method steps described in the embodiments disclosed in this specification can be implemented as electronic hardware or as a combination of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and the design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.

In addition, the embodiments in the present specification are described in a progressive manner, and the same and similar parts among the embodiments may be referred to each other. In particular, since the embodiments of devices, apparatuses, etc. are substantially similar to the method embodiments, the relevant points can be found in the descriptions of the method embodiments. The above-described embodiments of devices, apparatuses, etc. are merely illustrative; modules, units, etc. described as separate components may or may not be physically separate, and may be located in one place or distributed over multiple places, for example on nodes of a system network. Some or all of the modules and units can be selected according to actual needs to achieve the purpose of the above-mentioned embodiments. Those skilled in the art can understand and implement this without inventive effort.

The structure, features and effects of the present invention have been described in detail with reference to the embodiments shown in the drawings, but the above embodiments are merely preferred embodiments of the present invention, and it should be understood that technical features related to the above embodiments and preferred modes thereof can be reasonably combined and configured into various equivalent schemes by those skilled in the art without departing from and changing the design idea and technical effects of the present invention; therefore, the invention is not limited to the embodiments shown in the drawings, and all the modifications and equivalent embodiments that can be made according to the idea of the invention are within the scope of the invention as long as they are not beyond the spirit of the description and the drawings.
