Voice data transmission method

Document No.: 1448673    Publication date: 2020-02-18

Reading note: This technology, Voice data transmission method, was designed by Yu Bo (余波) on 2019-11-04. Its main content is as follows. The invention discloses a voice data transmission method comprising: dividing voice data to be transmitted into at least one data fragment, where each data fragment is smaller than a set threshold; performing the following steps for each of the at least one data fragment in turn: generating frame header information and frame tail information corresponding to the current data fragment, where the frame header information comprises start identification information, command information and frame body length information, and the frame tail information comprises check information; packing the frame header information, the frame tail information and the current data fragment into a voice data packet; and sending the resulting voice data packet to a peripheral device. In embodiments of the invention, voice data is divided into fragments smaller than the set threshold before transmission, which lowers the bandwidth requirement, makes voice data transmission more stable, and improves its reliability.

1. A method of voice data transmission, comprising:

dividing voice data to be transmitted into at least one data fragment, wherein the size of each data fragment in the at least one data fragment is smaller than a set threshold value;

sequentially executing the following steps for each of the at least one data fragment:

generating frame header information and frame tail information corresponding to the current data fragment, wherein the frame header information comprises start identification information, command information and frame body length information, and the frame tail information comprises check information;

packing according to the frame header information, the frame tail information and the current data fragment;

and sending the voice data packet obtained by the packing to the peripheral device.

2. The method of claim 1, wherein the command information includes a data segment identification bit for identifying whether the current data segment is a last data segment.

3. The method of claim 1, wherein the generating frame tail information corresponding to the current data fragment comprises:

acquiring the command information and the frame body length information in the frame header information;

and generating the check information according to the command information and the frame body length information in the frame header information.

4. The method of claim 1, wherein the packing according to the frame header information, the frame trailer information, and the current data fragment comprises:

generating a voice data packet frame header according to the frame header information;

generating a voice data packet frame tail according to the frame tail information;

and filling the current data fragment into a voice data packet frame body.

5. The method of claim 4, wherein,

the voice data packet frame header is 5 bytes in size, wherein the start identification information occupies 2 bytes, the command information occupies 1 byte, and the frame body length information occupies 2 bytes;

the voice data packet frame tail is 4 bytes in size and further comprises parameter information, wherein the parameter information occupies 2 bytes and the check information occupies 2 bytes;

the size of the voice data packet frame body is 0-2048 bytes.

6. The method of claim 4, wherein the filling the current data fragment into a voice data packet frame body comprises:

dividing the current data fragment into a plurality of sub-fragments;

and storing the plurality of sub-fragments in a queue structure to obtain the voice data packet frame body.

7. An electronic device, comprising: at least one processor, and a memory communicatively coupled to the at least one processor, wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the steps of the method of any one of claims 1-6.

8. A storage medium on which a computer program is stored which, when being executed by a processor, carries out the steps of the method of any one of claims 1 to 6.

Technical Field

The present invention relates to the field of communications technologies, and in particular, to a voice data transmission method.

Background

At present, the communication protocols of voice DSP chip manufacturers are proprietary, and they differ according to the functions a voice DSP chip supports, its application scenarios and the specific peripherals it communicates with. Voice DSP chips from the various manufacturers on the market each define their own application scenarios and functions; although their communication protocols are all frame-based, the frame formats and contents differ.

Voice DSP chip technology is developing rapidly: processing capability keeps growing, and the chips support ever more functions and application scenarios. This makes designing their communication protocols challenging.

Disclosure of Invention

An embodiment of the present invention provides a voice data transmission method, which is used to solve at least one of the above technical problems.

In a first aspect, an embodiment of the present invention provides a method for transmitting voice data, including:

dividing voice data to be transmitted into at least one data fragment, wherein the size of each data fragment in the at least one data fragment is smaller than a set threshold value;

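sequentially executing the following steps for each of the at least one data fragment:

generating frame header information and frame tail information corresponding to the current data fragment, wherein the frame header information comprises start identification information, command information and frame body length information, and the frame tail information comprises check information;

packing according to the frame header information, the frame tail information and the current data fragment;

and sending the voice data packet obtained by the packing to the peripheral device.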

In some embodiments, the command information includes a data segment identification bit for identifying whether the current data segment is the last data segment.

In some embodiments, the generating frame tail information corresponding to the current data fragment includes:

acquiring the command information and the frame body length information in the frame header information;

acquiring frame body parameter information and frame tail parameter information;

and generating the check information according to the command information and the frame body length information in the frame header information, the frame body parameter information and the frame tail parameter information.

In some embodiments, the packing according to the frame header information, the frame trailer information, and the current data fragment includes:

generating a voice data packet frame header according to the frame header information;

generating a voice data packet frame tail according to the frame tail information;

and filling the current data fragment into a voice data packet frame body.

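In some embodiments, the voice data packet frame header is 5 bytes in size, wherein the start identification information occupies 2 bytes, the command information occupies 1 byte, and the frame body length information occupies 2 bytes;

the voice data packet frame tail is 4 bytes in size and further comprises parameter information, wherein the parameter information occupies 2 bytes and the check information occupies 2 bytes;

the voice data packet frame body is 0 to 2048 bytes in size.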

In some embodiments, the filling the current data fragment into a voice data packet frame body comprises: dividing the current data fragment into a plurality of sub-fragments; and storing the plurality of sub-fragments in a queue structure to obtain the voice data packet frame body.
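A minimal Python sketch of the queue-based frame body, for illustration only. The sub-fragment size `SUB_SIZE` and the helper names are hypothetical (the source gives no sub-fragment size), and `collections.deque` stands in for the unspecified queue structure; dequeuing in FIFO order rebuilds the frame body in the original byte order.

```python
from collections import deque

SUB_SIZE = 256  # hypothetical sub-fragment size; the source does not specify one


def enqueue_sub_fragments(fragment: bytes, sub_size: int = SUB_SIZE) -> deque:
    """Divide one data fragment into sub-fragments and store them in a FIFO queue."""
    if sub_size <= 0:
        raise ValueError("sub_size must be positive")
    queue = deque()
    for start in range(0, len(fragment), sub_size):
        queue.append(fragment[start:start + sub_size])
    return queue


def build_frame_body(queue: deque) -> bytes:
    """Drain the queue in arrival order to obtain the frame body bytes."""
    parts = []
    while queue:
        parts.append(queue.popleft())
    return b"".join(parts)
```

Because the queue preserves arrival order, the frame body produced by draining it is byte-for-byte the original fragment.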

In a second aspect, an embodiment of the present invention provides a storage medium, in which one or more programs including execution instructions are stored, where the execution instructions can be read and executed by an electronic device (including but not limited to a computer, a server, or a network device, etc.) to perform any one of the above-described voice data transmission methods of the present invention.

In a third aspect, an electronic device is provided, comprising at least one processor and a memory communicatively coupled to the at least one processor, wherein the memory stores instructions executable by the at least one processor, and the instructions, when executed by the at least one processor, enable the at least one processor to perform any of the voice data transmission methods of the present invention.

In a fourth aspect, the present invention further provides a computer program product, where the computer program product includes a computer program stored on a storage medium, and the computer program includes program instructions, and when the program instructions are executed by a computer, the computer is caused to execute any one of the voice data transmission methods.

In embodiments of the invention, voice data is divided into data fragments smaller than the set threshold before transmission. Specifically, the frame header consists of the start identifier, the command information and the frame body length, and bit 5 of the command information indicates whether the packet is the last one, so a long payload can be split into several small packets for transmission. This lowers the requirement on transmission bandwidth, makes voice data transmission more stable, and improves its reliability.

Drawings

To illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings used in describing the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present invention; those skilled in the art can derive other drawings from them without creative effort.

FIG. 1 is a flow chart of a voice data transmission method according to an embodiment of the present invention;

FIG. 2 is a flow chart of another embodiment of a voice data transmission method according to the present invention;

FIG. 3 is a schematic structural diagram of an embodiment of an electronic device according to the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention. It should be noted that the embodiments and features of the embodiments in the present application may be combined with each other without conflict.

As shown in FIG. 1, an embodiment of the present invention provides a voice data transmission method, including:

S11, dividing the voice data to be transmitted into at least one data fragment, wherein the size of each data fragment is smaller than a set threshold;

S12, sequentially executing the following steps for each of the at least one data fragment:

S121, generating frame header information and frame tail information corresponding to the current data fragment, wherein the frame header information comprises start identification information, command information and frame body length information, and the frame tail information comprises check information; the command information includes a data fragment identification bit indicating whether the current data fragment is the last one;

S122, packing according to the frame header information, the frame tail information and the current data fragment;

S123, sending the voice data packet obtained by the packing to the peripheral device.
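Steps S11 and S12 can be sketched as follows. This is a minimal Python sketch, assuming the set threshold is expressed in bytes and that "smaller than" is strict, so each fragment holds at most `threshold - 1` bytes; `split_into_fragments` and `iter_fragments` are hypothetical names. The boolean yielded with each fragment is the information S121 would place in the command information.

```python
from typing import Iterator, List, Tuple


def split_into_fragments(voice: bytes, threshold: int) -> List[bytes]:
    """S11: divide voice data into fragments strictly smaller than `threshold` bytes."""
    if threshold < 2:
        raise ValueError("threshold must be at least 2 bytes")
    step = threshold - 1  # largest fragment size that is still below the threshold
    if not voice:
        return [b""]  # the method always yields at least one fragment
    return [voice[i:i + step] for i in range(0, len(voice), step)]


def iter_fragments(voice: bytes, threshold: int) -> Iterator[Tuple[bytes, bool]]:
    """S12: visit the fragments in order, flagging the last one (cf. S121)."""
    fragments = split_into_fragments(voice, threshold)
    for index, fragment in enumerate(fragments):
        yield fragment, index == len(fragments) - 1
```

For example, 10 bytes of voice data with a threshold of 4 yield fragments of 3, 3, 3 and 1 bytes, and only the final fragment is flagged as last.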

In some embodiments, the generating frame tail information corresponding to the current data fragment includes:

acquiring the command information and the frame body length information in the frame header information;

acquiring frame body parameter information and frame tail parameter information;

and generating the check information according to the command information, the frame body length information, the frame body parameter information and the frame tail parameter information.

Bits 0 to 2 of the command information encode the command type, which indicates the type of the data packet; currently defined command types include a handshake type, a configuration type and a query type, among others.

In some embodiments, the crc field is calculated from the cmd and length fields only. The sof field is excluded to reduce computation, because both the sender and the receiver of a data packet identify the sof first.
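A minimal sketch of this header-only check, assuming CRC-16/CCITT-FALSE (polynomial 0x1021, initial value 0xFFFF) and big-endian fields; the source names neither the CRC variant nor the byte order, and `header_crc` is a hypothetical helper. Only the 3 bytes of cmd and length are fed to the CRC, so the sof never enters the computation.

```python
import struct


def crc16_ccitt_false(data: bytes) -> int:
    """Bitwise CRC-16/CCITT-FALSE: poly 0x1021, init 0xFFFF, no reflection, no final XOR."""
    crc = 0xFFFF
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def header_crc(cmd: int, length: int) -> int:
    """CRC over the cmd (1 byte) and length (2 bytes) fields only; sof is skipped."""
    covered = struct.pack(">BH", cmd, length)  # big-endian length is an assumption
    return crc16_ccitt_false(covered)
```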

In some embodiments, the packing according to the frame header information, the frame trailer information, and the current data fragment includes:

generating a voice data packet frame header according to the frame header information;

generating a voice data packet frame tail according to the frame tail information;

and filling the current data fragment into a voice data packet frame body.

The voice data packet frame body is 0 to 2048 bytes in size.

In embodiments of the invention, the protocol compresses the frame header to 5 bytes and the frame tail to 4 bytes. These fields describe the segmentation, identification and verification of each transmitted data packet, and all remaining bytes of the packet carry the data being transmitted. While preserving the extensibility of the protocol, the design devotes as many bytes as possible to data and contains no redundant fields.
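The 5-byte header and 4-byte tail can be expressed as `struct` format strings. This is a sketch only: the field order (sof, cmd, length | body | param, crc) is inferred from the field descriptions and steps S25 to S28, big-endian byte order is an assumption, and `pack_frame` is a hypothetical helper that takes a precomputed crc.

```python
import struct

# Field layout inferred from the description; byte order is an assumption.
HEADER_FMT = ">2sBH"   # sof (2 bytes) + cmd (1 byte) + length (2 bytes)
TRAILER_FMT = ">HH"    # param (2 bytes) + crc (2 bytes)
SOF = b"\x7e\x7e"      # start identifier given in step S25


def frame_overhead() -> int:
    """Bytes spent on header and tail, i.e. everything that is not payload."""
    return struct.calcsize(HEADER_FMT) + struct.calcsize(TRAILER_FMT)


def pack_frame(cmd: int, body: bytes, param: int, crc: int) -> bytes:
    """Concatenate header, frame body and tail; the crc is supplied by the caller."""
    header = struct.pack(HEADER_FMT, SOF, cmd, len(body))
    trailer = struct.pack(TRAILER_FMT, param, crc)
    return header + body + trailer
```

With a 3-byte body the packed frame is 12 bytes, of which 9 are protocol overhead.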

In some embodiments, each sub-fragment is further configured with fragment type information and fragment length information. For example, a sub-fragment is laid out as type (1 byte: 0x02) + length (2 bytes: voice length) + voice data.
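A sketch of the sub-fragment layout just described, assuming a big-endian 2-byte length field (the source does not state the byte order); `encode_tlv` and `encode_voice_sub_fragment` are hypothetical names.

```python
import struct

VOICE_TYPE = 0x02  # type value for voice data given in the description


def encode_tlv(tlv_type: int, value: bytes) -> bytes:
    """type (1 byte) + length (2 bytes, big-endian assumed) + value."""
    if len(value) > 0xFFFF:
        raise ValueError("value does not fit a 2-byte length field")
    return struct.pack(">BH", tlv_type, len(value)) + value


def encode_voice_sub_fragment(voice: bytes) -> bytes:
    """Wrap raw voice bytes as a type-0x02 TLV."""
    return encode_tlv(VOICE_TYPE, voice)
```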

Specifically, the content of the frame body is organized in TLV (type-length-value) format, which makes the subtypes belonging to a given command type easy to extend: new subtypes are added as the business requires. For example, the subtypes of the query command type include the wake-up word, voice data and operating mode, that is, queries for the wake-up word, the voice, and the current operating mode. The sender's frame body may contain zero or more subtypes, and so may the frame body of the receiver's response message.

An embodiment of the invention discloses a communication protocol for a voice DSP chip, whose frame format is shown in the following table:

The frame body as a whole carries the content transmitted by the protocol and uses the TLV frame body format shown in the following table:

type1 | length1 | value1 | type2 | length2 | value2 | … | typen | lengthn | valuen
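Because every TLV states its own length, a receiver can walk the frame body without a separate count of entries. A minimal decoder sketch, assuming a 1-byte type and a big-endian 2-byte length as in the voice sub-fragment layout; `decode_tlvs` is a hypothetical name.

```python
import struct
from typing import List, Tuple


def decode_tlvs(body: bytes) -> List[Tuple[int, bytes]]:
    """Split a frame body into (type, value) pairs; type 1 byte, length 2 bytes."""
    items = []
    offset = 0
    while offset < len(body):
        if offset + 3 > len(body):
            raise ValueError("truncated TLV header at offset %d" % offset)
        tlv_type, length = struct.unpack_from(">BH", body, offset)
        offset += 3
        if offset + length > len(body):
            raise ValueError("TLV value overruns frame body")
        items.append((tlv_type, body[offset:offset + length]))
        offset += length
    return items
```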

The fields of the frame structure are described as follows.

Sending and receiving use a single, unified communication protocol format. The sender is the peripheral SoC/MCU (System on Chip / Microcontroller Unit), and the receiver is the voice DSP chip. A message frame consists of a frame header, a frame body and a frame tail.

Specifically, the frame header consists of the start identifier + command information + frame body length. Bit 5 of the command information indicates whether the packet is the last packet, which allows long data to be split into several small packets for transmission. Bits 0 to 2 of the command information encode the command type, which indicates the type of the data packet; currently defined command types include a handshake type, a configuration type and a query type, among others.
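The command byte can be assembled and parsed with two masks. This sketch assumes that a set bit 5 marks the last packet; the source says only that the bit identifies whether the packet is the last one. The values 0x04 and 0x24 quoted in step S25 share command type 4 in bits 0 to 2 and differ only in bit 5. `make_cmd` and `parse_cmd` are hypothetical helpers.

```python
CMD_TYPE_MASK = 0x07    # bits 0-2: command type
LAST_PACKET_BIT = 0x20  # bit 5: last-packet indicator (assumed: set = last packet)


def make_cmd(cmd_type: int, is_last: bool) -> int:
    """Assemble the 1-byte command information field."""
    if not 0 <= cmd_type <= CMD_TYPE_MASK:
        raise ValueError("command type must fit in bits 0-2")
    return cmd_type | (LAST_PACKET_BIT if is_last else 0)


def parse_cmd(cmd: int) -> tuple:
    """Return (command type, is_last) from a command byte."""
    return cmd & CMD_TYPE_MASK, bool(cmd & LAST_PACKET_BIT)
```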

Specifically, the length field gives the length of the frame body only; it does not include the lengths of the frame header and frame tail.


Specifically, the crc check field is calculated from the cmd, length, frame body and param fields. The sof field is excluded to reduce computation, because both the sender and the receiver of a packet identify the sof first.
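On the receiving side, the sof is checked first and the crc then covers everything between the sof and the crc field. A minimal validation sketch, assuming CRC-16/CCITT-FALSE, big-endian fields and the 5-byte header / 4-byte tail layout; `check_frame` is a hypothetical name, and the crc variant in particular is not stated in the source.

```python
import struct

SOF = b"\x7e\x7e"
HEADER_LEN, TRAILER_LEN = 5, 4


def crc16_ccitt_false(data: bytes) -> int:
    """CRC-16/CCITT-FALSE (assumed; the source does not name the CRC variant)."""
    crc = 0xFFFF
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def check_frame(frame: bytes) -> bytes:
    """Validate one frame and return its body; raises ValueError on any mismatch."""
    if frame[:2] != SOF:  # sof is located first, so the crc need not cover it
        raise ValueError("missing start identifier")
    if len(frame) < HEADER_LEN + TRAILER_LEN:
        raise ValueError("frame too short")
    _cmd, length = struct.unpack_from(">BH", frame, 2)
    if len(frame) != HEADER_LEN + length + TRAILER_LEN:
        raise ValueError("length field does not match frame size")
    _param, crc = struct.unpack_from(">HH", frame, HEADER_LEN + length)
    # crc covers cmd, length, frame body and param: every byte between sof and crc
    if crc16_ccitt_false(frame[2:-2]) != crc:
        raise ValueError("crc mismatch")
    return frame[HEADER_LEN:HEADER_LEN + length]
```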

FIG. 2 is a flowchart of another embodiment of the voice data transmission method of the present invention, which includes the following steps:

S21, querying the voice data in the voice buffer;

S22, judging whether the length of the voice data in the voice buffer exceeds the maximum length set by the protocol. Illustratively, the maximum frame body length = the maximum frame length set by the protocol (0x1000) - the frame header length (0x05) - the frame tail length (0x04), and the maximum voice length = the maximum frame body length - the length of the TLV type field (0x01) - the length of the TLV length field (0x02);

S23, if so, acquiring voice data of the maximum length specified by the protocol, so that the voice data in the buffer is split into a plurality of packets for transmission;

S24, if not, acquiring all the voice data in the buffer;

S25, generating the frame header according to the protocol from the voice data acquired in S23 or S24: the start identifier is 0x7E7E; the command information is 0x04 or 0x24, where bit 5 identifies whether the packet is the last packet; the frame length field holds the frame body length (variable);

S26, filling the frame body with the voice data: type (1 byte: 0x02) + length (2 bytes: voice length) + voice data;

S27, calculating the crc from the frame data, starting from the command information in the frame header;

S28, generating the frame tail: a frame tail variable (2 bytes), currently unused and settable to 0x0, followed by the crc (2 bytes);

S29, sending the voice packet.
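Steps S21 to S29 can be combined into a packetizer. This is a sketch under several assumptions not stated in the source: big-endian fields, CRC-16/CCITT-FALSE computed over cmd, length, frame body and param (following the field description of the crc), and a set bit 5 marking the last packet. `MAX_FRAME_LEN = 0x1000` is the value quoted in S22; since the claims bound the frame body at 2048 bytes, treat it as a configurable parameter. All function names are hypothetical.

```python
import struct
from typing import List

SOF = b"\x7e\x7e"        # start identifier (S25)
CMD_VOICE = 0x04         # command byte quoted in S25
LAST_PACKET_BIT = 0x20   # bit 5; "set = last packet" is an assumption
VOICE_TYPE = 0x02        # TLV type for voice data (S26)
MAX_FRAME_LEN = 0x1000   # maximum frame length quoted in S22
MAX_BODY_LEN = MAX_FRAME_LEN - 0x05 - 0x04   # minus header and tail
MAX_VOICE_LEN = MAX_BODY_LEN - 0x01 - 0x02   # minus TLV type and length fields


def crc16_ccitt_false(data: bytes) -> int:
    """Assumed CRC variant; the source only says the crc field is 2 bytes."""
    crc = 0xFFFF
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def build_voice_packet(chunk: bytes, is_last: bool) -> bytes:
    cmd = CMD_VOICE | (LAST_PACKET_BIT if is_last else 0)          # S25
    body = struct.pack(">BH", VOICE_TYPE, len(chunk)) + chunk      # S26
    # cmd + length + body + param (param unused, set to 0 per S28)
    covered = struct.pack(">BH", cmd, len(body)) + body + struct.pack(">H", 0)
    crc = crc16_ccitt_false(covered)                                # S27
    return SOF + covered + struct.pack(">H", crc)                   # S28


def packetize(buffer: bytes) -> List[bytes]:
    """S21-S29: drain the voice buffer into packets no longer than MAX_FRAME_LEN."""
    packets = []
    remaining = buffer
    while True:
        if len(remaining) > MAX_VOICE_LEN:      # S22 -> S23
            chunk, remaining = remaining[:MAX_VOICE_LEN], remaining[MAX_VOICE_LEN:]
        else:                                   # S22 -> S24
            chunk, remaining = remaining, b""
        packets.append(build_voice_packet(chunk, is_last=not remaining))
        if not remaining:
            return packets
```

With these constants a packet carries at most 0x1000 - 5 - 4 - 1 - 2 = 4084 bytes of voice, so a 5000-byte buffer yields one full 4096-byte packet followed by a 928-byte last packet.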

In actual design and engineering applications, the information queried in a single request varies (for example, only the voice may be queried, or the voice and wake-up information together), and the protocol must support this. Therefore, the frame body of a data packet sent by the peripheral to the voice DSP chip may contain multiple TLVs, one for each requested type. For example:

Request frame structure:

Each TLV in the request frame contains only a type identifying the requested information; the value is empty, so the length is 0.

Response frame structure:

The TLVs in the response frame correspond one-to-one to the TLVs in the request frame.
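A sketch of the request/response pairing: the request lists zero-length TLVs, and the response answers them in the same order. Type codes other than 0x02 (voice) are not given in the source and would be hypothetical, as are the helper names; byte order is again assumed big-endian.

```python
import struct
from typing import Dict, List


def build_request_body(requested_types: List[int]) -> bytes:
    """Request TLVs carry only a type: the value is empty, so length is 0."""
    return b"".join(struct.pack(">BH", t, 0) for t in requested_types)


def build_response_body(request_body: bytes, answers: Dict[int, bytes]) -> bytes:
    """Answer each request TLV in order with a TLV of the same type (one-to-one)."""
    out = []
    offset = 0
    while offset < len(request_body):
        tlv_type, length = struct.unpack_from(">BH", request_body, offset)
        offset += 3 + length
        value = answers[tlv_type]  # KeyError if this type cannot be answered
        out.append(struct.pack(">BH", tlv_type, len(value)) + value)
    return b"".join(out)
```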

It should be noted that for simplicity of explanation, the foregoing method embodiments are described as a series of acts or combination of acts, but those skilled in the art will appreciate that the present invention is not limited by the order of acts, as some steps may occur in other orders or concurrently in accordance with the invention. Further, those skilled in the art should also appreciate that the embodiments described in the specification are preferred embodiments and that the acts and modules referred to are not necessarily required by the invention. In the foregoing embodiments, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.

In some embodiments, the present invention provides a non-transitory computer-readable storage medium, in which one or more programs including executable instructions are stored, and the executable instructions can be read and executed by an electronic device (including but not limited to a computer, a server, or a network device, etc.) to perform any of the above-described voice data transmission methods of the present invention.

In some embodiments, the present invention further provides a computer program product comprising a computer program stored on a non-volatile computer-readable storage medium, the computer program comprising program instructions that, when executed by a computer, cause the computer to perform any of the voice data transmission methods described above.

In some embodiments, an embodiment of the present invention further provides an electronic device, which includes: at least one processor, and a memory communicatively coupled to the at least one processor, wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform a voice data transmission method.

In some embodiments, the present invention further provides a storage medium having a computer program stored thereon, wherein the computer program is configured to implement a voice data transmission method when executed by a processor.

Fig. 3 is a schematic diagram of a hardware structure of an electronic device for executing a voice data transmission method according to another embodiment of the present application, and as shown in fig. 3, the electronic device includes:

one or more processors 310 and a memory 320, one processor 310 being illustrated in fig. 3.

The apparatus for performing the voice data transmission method may further include: an input device 330 and an output device 340.

The processor 310, the memory 320, the input device 330 and the output device 340 may be connected by a bus or by other means; FIG. 3 takes a bus connection as an example.

The memory 320 is a non-volatile computer-readable storage medium and can be used for storing non-volatile software programs, non-volatile computer-executable programs, and modules, such as program instructions/modules corresponding to the voice data transmission method in the embodiment of the present application. The processor 310 executes various functional applications of the server and data processing by executing nonvolatile software programs, instructions and modules stored in the memory 320, that is, implements the voice data transmission method of the above-described method embodiment.

The memory 320 may include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function; the storage data area may store data created according to the use of the voice data transmission apparatus, and the like. Further, the memory 320 may include high speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid state storage device. In some embodiments, memory 320 may optionally include memory located remotely from processor 310, which may be connected to a voice data transmission device via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.

The input device 330 may receive input numeric or character information and generate signals related to user settings and function control of the voice data transmission device. The output device 340 may include a display device such as a display screen.

The one or more modules are stored in the memory 320 and, when executed by the one or more processors 310, perform the voice data transmission method of any of the method embodiments described above.

The product can execute the method provided by the embodiment of the application, and has the corresponding functional modules and beneficial effects of the execution method. For technical details that are not described in detail in this embodiment, reference may be made to the methods provided in the embodiments of the present application.

The electronic device of the embodiments of the present application exists in various forms, including but not limited to:

(1) Mobile communication devices: these are characterized by mobile communication capabilities and are primarily aimed at providing voice and data communications. Such terminals include smartphones (e.g., iPhones), multimedia phones, feature phones and low-end phones, among others.

(2) Ultra-mobile personal computer devices: these belong to the category of personal computers, have computing and processing functions, and generally also support mobile internet access. Such terminals include PDA, MID and UMPC devices, such as iPads.

(3) Portable entertainment devices: these can display and play multimedia content. They include audio and video players (e.g., iPods), handheld game consoles, e-book readers, smart toys and portable car navigation devices.

(4) Servers: similar in architecture to a general-purpose computer, but with higher requirements for processing capability, stability, reliability, security, scalability and manageability, because they must provide highly reliable services.

(5) Other electronic devices with data interaction functions.

The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment.

Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software plus a general hardware platform, and certainly can also be implemented by hardware. Based on such understanding, the above technical solutions substantially or contributing to the related art may be embodied in the form of a software product, which may be stored in a computer-readable storage medium, such as ROM/RAM, magnetic disk, optical disk, etc., and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the method according to the embodiments or some parts of the embodiments.

Finally, it should be noted that: the above embodiments are only used to illustrate the technical solutions of the present application, and not to limit the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions in the embodiments of the present application.
