Voice stream data processing method, device and system

Document No.: 1955184. Publication date: 2021-12-10.

Reading note: This technology, "Voice stream data processing method, device and system", was created by 蒋海滨 and 姚逸丰 on 2021-09-14. Abstract: The invention discloses a method, an apparatus and a system for processing voice stream data. The method comprises: receiving a data packet sent by a client, the data packet encapsulating voice stream data and including a sequence field that records a sequence number indicating the sending order of the data packet; splicing later-received data packets with earlier-received data packets in sequence-number order and then performing semantic analysis to obtain a processing result, the processing result including the sequence number of the data packet corresponding to the voice stream data; and sending the processing result to the client. In this way, the scheme reduces errors during simultaneous interpretation in complex network environments and, when the network fluctuates, can process data that the server had not yet received without affecting real-time processing of the voice stream.

1. A method for processing voice stream data, applied to a server, the method comprising:

receiving a data packet, wherein the data packet encapsulates voice stream data and a sequence number indicating the processing order between the data packet and its associated data packets;

if it is determined, according to the processing order and the sequence number in the data packet, that the data packet is one that, when packets are processed sequentially in the processing order, is to be processed after an associated data packet corresponding to a missing sequence number, storing the data packet;

and suspending processing of the data packet until the associated data packet encapsulating the missing sequence number is received, and then processing the associated data packet corresponding to the missing sequence number and the data packet sequentially in the processing order.

2. The method for processing voice stream data according to claim 1, wherein the processing includes:

performing simultaneous interpretation based on the voice stream data.

3. The method for processing voice stream data according to claim 1, wherein the processing order comprises processing sequentially by consecutive sequence numbers;

and determining, according to the processing order and the sequence number in the data packet, that the data packet is one that, when packets are processed sequentially in the processing order, is to be processed after an associated data packet corresponding to a missing sequence number comprises:

if the sequence number encapsulated in the data packet is not consecutive with the sequence number encapsulated in the data packet at the tail node of a first branch of a target linked list, determining that the data packet is one that, when packets are processed sequentially in the processing order, is to be processed after an associated data packet corresponding to a missing sequence number;

wherein the first branch is used to store, in the processing order, the packets to be processed.

4. The method for processing voice stream data according to claim 3, wherein storing the data packet comprises:

storing the data packet in a second branch of the target linked list, wherein the second branch is used to store, in the processing order, the packets to be merged into the first branch.

5. The method according to claim 4, wherein processing the associated data packet corresponding to the missing sequence number and the data packet sequentially in the processing order, once the associated data packet encapsulating the missing sequence number is received, comprises:

upon receiving the associated data packet encapsulating the missing sequence number, storing the associated data packet as the tail node of the first branch and, when the missing sequence number is consecutive with the sequence number encapsulated in the data packet, merging the data packet into the first branch;

and processing the associated data packet corresponding to the missing sequence number and the data packet in the first branch sequentially in the processing order.

6. The method for processing voice stream data according to claim 1, further comprising:

performing semantic analysis on the processed associated data packet corresponding to the missing sequence number and the processed data packet to obtain a semantic analysis result;

and sending the semantic analysis result to a client.

7. A method for processing voice stream data, applied to a client, the method comprising:

sending a data packet to a server, wherein the data packet encapsulates voice stream data and a sequence number indicating the processing order between the data packet and its associated data packets, so that the server performs the following operations:

if the server determines that the data packet is one that, when packets are processed sequentially in the processing order, is to be processed after an associated data packet corresponding to a missing sequence number, storing the data packet; and suspending processing of the data packet until the associated data packet encapsulating the missing sequence number is received, and then processing the associated data packet corresponding to the missing sequence number and the data packet sequentially in the processing order.

8. An apparatus for processing voice stream data, applied to a server, the apparatus comprising:

a transceiver module, configured to receive a data packet, wherein the data packet encapsulates voice stream data and a sequence number indicating the processing order between the data packet and its associated data packets;

a processing module, configured to: if it is determined, according to the processing order and the sequence number in the data packet, that the data packet is one that, when packets are processed sequentially in the processing order, is to be processed after an associated data packet corresponding to a missing sequence number, store the data packet; and suspend processing of the data packet until the associated data packet encapsulating the missing sequence number is received, and then process the associated data packet corresponding to the missing sequence number and the data packet sequentially in the processing order.

9. An apparatus for processing voice stream data, applied to a client, the apparatus comprising:

a transceiver module, configured to send a data packet to a server, wherein the data packet encapsulates voice stream data and a sequence number indicating the processing order between the data packet and its associated data packets, so that the server performs the following operations:

if it is determined, according to the processing order and the sequence number in the data packet, that the data packet is one that, when packets are processed sequentially in the processing order, is to be processed after an associated data packet corresponding to a missing sequence number, storing the data packet; and suspending processing of the data packet until the associated data packet encapsulating the missing sequence number is received, and then processing the associated data packet corresponding to the missing sequence number and the data packet sequentially in the processing order.

10. A system for processing voice stream data, comprising a client and a server, wherein the client comprises the apparatus for processing voice stream data according to claim 9, and the server comprises the apparatus for processing voice stream data according to claim 8.

11. A computing device, comprising: a processor, a memory, a communication interface and a communication bus, wherein the processor, the memory and the communication interface communicate with one another via the communication bus;

the memory is used for storing at least one executable instruction, and the executable instruction causes the processor to perform the operations corresponding to the method for processing voice stream data according to any one of claims 1 to 6, or the operations corresponding to the method for processing voice stream data according to claim 7.

12. A computer storage medium having stored therein at least one executable instruction for causing a processor to perform an operation corresponding to the method of processing of voice stream data as claimed in any one of claims 1 to 6 or to perform an operation corresponding to the method of processing of voice stream data as claimed in claim 7.

Technical Field

The invention relates to the technical field of voice processing, in particular to a method, a device and a system for processing voice stream data.

Background

In the prior art, real-time simultaneous interpretation is usually performed in a stable network environment, where loss of voice stream packets due to network problems essentially does not occur. When the network fluctuates, the accumulated voice data is transmitted to the server serially.

However, under complex network conditions, for example when the network environment fluctuates while a user is using simultaneous interpretation, voice stream packets are lost, so the server cannot correctly transcribe the speech into text, and problems such as transcription failures and sentence segmentation errors occur.

Disclosure of Invention

In view of the above problems, embodiments of the present invention are proposed to provide a method, an apparatus and a system for processing voice stream data that overcome or at least partially solve the above problems.

According to an aspect of the embodiments of the present invention, there is provided a method for processing voice stream data, applied to a server, the method including:

receiving a data packet, wherein the data packet encapsulates voice stream data and a sequence number indicating the processing order between the data packet and its associated data packets;

if it is determined, according to the processing order and the sequence number in the data packet, that the data packet is one that, when packets are processed sequentially in the processing order, is to be processed after an associated data packet corresponding to a missing sequence number, storing the data packet;

and suspending processing of the data packet until the associated data packet encapsulating the missing sequence number is received, and then processing the associated data packet corresponding to the missing sequence number and the data packet sequentially in the processing order.

According to another aspect of the embodiments of the present invention, there is provided a method for processing voice stream data, which is applied to a client, the method including:

sending a data packet to a server, wherein the data packet encapsulates voice stream data and a sequence number indicating the processing order between the data packet and its associated data packets, so that the server processes the data packet according to the processing order and the sequence number in the data packet.

According to another aspect of the embodiments of the present invention, there is provided a processing apparatus for voice stream data, including:

a transceiver module, configured to receive a data packet, wherein the data packet encapsulates voice stream data and a sequence number indicating the processing order between the data packet and its associated data packets;

a processing module, configured to: if it is determined, according to the processing order and the sequence number in the data packet, that the data packet is one that, when packets are processed sequentially in the processing order, is to be processed after an associated data packet corresponding to a missing sequence number, store the data packet; and suspend processing of the data packet until the associated data packet encapsulating the missing sequence number is received, and then process the associated data packet corresponding to the missing sequence number and the data packet sequentially in the processing order.

According to another aspect of the embodiments of the present invention, there is provided a processing apparatus for voice stream data, including:

a transceiver module, configured to send a data packet to a server, wherein the data packet encapsulates voice stream data and a sequence number indicating the processing order between the data packet and its associated data packets, so that the server performs the following operations:

if it is determined, according to the processing order and the sequence number in the data packet, that the data packet is one that, when packets are processed sequentially in the processing order, is to be processed after an associated data packet corresponding to a missing sequence number, storing the data packet; and suspending processing of the data packet until the associated data packet encapsulating the missing sequence number is received, and then processing the associated data packet corresponding to the missing sequence number and the data packet sequentially in the processing order.

According to a further aspect of the embodiments of the present invention, there is provided a system for processing voice stream data, including a client and a server, where the client includes the processing apparatus for voice stream data on the client side as described above, and the server includes the processing apparatus for voice stream data on the server side as described above.

According to still another aspect of the embodiments of the present invention, there is provided a computing device including: a processor, a memory, a communication interface and a communication bus, wherein the processor, the memory and the communication interface communicate with one another via the communication bus;

the memory is used for storing at least one executable instruction, and the executable instruction causes the processor to perform the operations corresponding to the method for processing voice stream data described above.

According to a further aspect of the embodiments of the present invention, there is provided a computer storage medium, in which at least one executable instruction is stored, and the executable instruction causes a processor to execute operations corresponding to the processing method of voice stream data as described above.

According to the scheme provided by the embodiments of the present invention, the server-side method for processing voice stream data receives a data packet that encapsulates voice stream data and a sequence number indicating the processing order between the data packet and its associated data packets; if it is determined, according to the processing order and the sequence number in the data packet, that the data packet is one that, when packets are processed sequentially in the processing order, is to be processed after an associated data packet corresponding to a missing sequence number, the data packet is stored; processing of the data packet is suspended until the associated data packet encapsulating the missing sequence number is received, and the associated data packet corresponding to the missing sequence number and the data packet are then processed sequentially in the processing order. This addresses the errors that arise when simultaneous interpretation is used in a complex network environment and reduces how often users encounter them; when the network fluctuates, data that the server had not yet received can still be processed without affecting real-time processing of the voice stream.

The foregoing is only an overview of the technical solutions of the embodiments of the present invention. So that the technical means of the embodiments can be understood more clearly and implemented according to the description, and so that the above and other objects, features, and advantages of the embodiments are more readily apparent, specific embodiments are described below.

Drawings

Various other advantages and benefits will become apparent to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for purposes of illustrating the preferred embodiments and are not to be construed as limiting the embodiments of the invention. Also, like reference numerals are used to refer to like parts throughout the drawings. In the drawings:

fig. 1 is a flow chart illustrating a processing method of voice stream data according to an embodiment of the present invention;

fig. 2 is a flowchart illustrating a processing method of voice stream data according to another embodiment of the present invention;

fig. 3 is a flowchart illustrating a processing method of voice stream data according to another embodiment of the present invention;

fig. 4 is a schematic flowchart illustrating a text conversion process performed on a data packet by a server in a processing method of voice stream data according to an embodiment of the present invention;

fig. 5 is a schematic diagram illustrating a situation of a data packet received by a server in a method for processing voice stream data according to an embodiment of the present invention;

fig. 6 is a schematic diagram illustrating an example of a specific implementation of a processing method of voice stream data according to an embodiment of the present invention;

fig. 7 is a schematic block diagram of a processing apparatus for processing voice stream data according to an embodiment of the present invention;

fig. 8 is a schematic structural diagram of a computing device provided in an embodiment of the present invention.

Detailed Description

Exemplary embodiments of the present invention will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the invention are shown in the drawings, it should be understood that the invention can be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the invention to those skilled in the art.

Fig. 1 shows a flowchart of a processing method of voice stream data according to an embodiment of the present invention. As shown in fig. 1, the method is applied to a client, and includes the following steps:

step 11, sending a data packet to a server, wherein the data packet encapsulates voice stream data and a sequence number indicating the processing order between the data packet and its associated data packets, so that the server performs the following operations:

if the server determines that the data packet is one that, when packets are processed sequentially in the processing order, is to be processed after an associated data packet corresponding to a missing sequence number, storing the data packet; and suspending processing of the data packet until the associated data packet encapsulating the missing sequence number is received, and then processing the associated data packet corresponding to the missing sequence number and the data packet sequentially in the processing order.

In this embodiment, the method for processing voice stream data is applied to a client. The client sends a data packet encapsulating voice stream data to a server; the server processes the data packet according to the sequence number in the data packet and the processing order, and then performs semantic analysis, that is, real-time text transcription, to obtain a semantic analysis result. The semantic analysis result includes the sequence number of the data packet corresponding to the voice stream data, and the sequence number indicates the sending order of the data packet. Finally, the semantic analysis result is sent to the client, so that the client can receive it and update the simultaneous interpretation result in real time according to the sequence number. The data packet is preferably a WebSocket data packet, WebSocket being a full-duplex communication protocol between client and server. The sequence field in the data packet is preferably a seq field with a preset length, preferably 32 bits, and records the sequence number indicating the sending order of the data packet; of course, the sequence field may also be 16, 64, or 128 bits long, as determined by the length of the data packet.
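As an illustration only, a data packet carrying a 32-bit seq field in front of the voice payload might be packed and unpacked as follows. The embodiment does not specify the byte layout, so the big-endian header, the function names `encode_packet` and `decode_packet`, and the idea of sending the result as one binary WebSocket message are all assumptions.

```python
import struct

# Assumed layout: a 32-bit unsigned sequence number in network byte order,
# followed directly by the raw voice bytes. The embodiment only states that
# the seq field is preferably 32 bits long.
SEQ_FORMAT = "!I"
SEQ_SIZE = struct.calcsize(SEQ_FORMAT)


def encode_packet(seq: int, audio: bytes) -> bytes:
    """Prefix a chunk of voice stream data with its sequence number."""
    return struct.pack(SEQ_FORMAT, seq) + audio


def decode_packet(packet: bytes) -> tuple[int, bytes]:
    """Split a packet back into (sequence number, voice stream data)."""
    (seq,) = struct.unpack_from(SEQ_FORMAT, packet)
    return seq, packet[SEQ_SIZE:]
```

A 16-bit or 64-bit sequence field would only change `SEQ_FORMAT` (to `"!H"` or `"!Q"`); a 128-bit field has no single `struct` code and would need a different encoding.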

In an optional embodiment of the present invention, the method for processing voice stream data applied to the client may further include:

step 12, when no response message indicating that the server successfully received the data packet is received, storing the data packet in a packet retransmission queue;

and step 13, retransmitting the data message in the message retransmission queue to the server.

In this embodiment, after the client sends the data packet to the server, the client waits for the server to return a response message corresponding to the data packet, which indicates that the server successfully received it. When the client does not receive such a response message, this indicates packet loss caused by problems such as the transmission network, and the client stores the data packet in the packet retransmission queue, which holds the data packets that the client failed to send to the server;

when the client's network transmission returns to normal, the client can resend the data packets in the packet retransmission queue to the server;

it should be noted that the response message is preferably an ACK (acknowledgement character) message, that is, a transmission control character indicating that the sent data packet has been received without error.
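A minimal sketch of the client-side bookkeeping described in steps 12 and 13 is given below. The class name `RetransmittingSender`, its method names, and the way an ACK timeout and network recovery are signalled are hypothetical; the embodiment only states that un-acknowledged packets are placed in a retransmission queue and resent once the network recovers.

```python
from collections import deque


class RetransmittingSender:
    """Tracks packets awaiting an ACK and queues the failed ones for resending."""

    def __init__(self, send):
        self._send = send                # callable(seq, payload); stands in for the real transport
        self._pending = {}               # seq -> payload, sent but not yet acknowledged
        self.retransmit_queue = deque()  # FIFO of (seq, payload) awaiting retransmission

    def send(self, seq, payload):
        self._pending[seq] = payload
        self._send(seq, payload)

    def on_ack(self, seq):
        """The server confirmed receipt, so the packet needs no further handling."""
        self._pending.pop(seq, None)

    def on_ack_timeout(self, seq):
        """No ACK arrived in time (step 12): move the packet to the retransmission queue."""
        payload = self._pending.pop(seq, None)
        if payload is not None:
            self.retransmit_queue.append((seq, payload))

    def flush_retransmit_queue(self):
        """The network has recovered (step 13): resend queued packets in FIFO order."""
        while self.retransmit_queue:
            seq, payload = self.retransmit_queue.popleft()
            self.send(seq, payload)
```

Resent packets go back into the pending map, so a second loss of the same packet is handled in the same way as the first.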

In an optional embodiment of the present invention, the method for processing voice stream data applied to the client may further include:

step 14, receiving a semantic analysis result fed back by the server, the semantic analysis result being obtained by the server performing semantic analysis on the data packet.

In this embodiment, once the client has successfully sent the data packet to the server, it waits for the server to process that packet. After the server finishes processing the data packet, it feeds the semantic analysis result back to the client, and the client can update in real time according to that result. The semantic analysis result preferably includes a sequence number indicating the sending order of the data packet; it should be noted that the semantic analysis result is preferably carried in a WebSocket data packet.
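One way a client could use the sequence number carried in each semantic analysis result is sketched below. The class `TranscriptView`, the choice to key text segments by sequence number, and joining segments with spaces are assumptions made for illustration; the embodiment only says the client updates in real time according to the returned sequence number.

```python
class TranscriptView:
    """Keeps the latest text for each sequence number and renders it in order."""

    def __init__(self):
        self._segments = {}  # seq -> transcribed text for that packet

    def on_result(self, seq, text):
        """Store or overwrite the text for `seq` when a result arrives."""
        self._segments[seq] = text

    def render(self):
        """Return the transcript with segments ordered by sequence number."""
        return " ".join(self._segments[seq] for seq in sorted(self._segments))
```

Because segments are keyed by sequence number, a result that arrives late or is revised by the server lands in the right position without disturbing the rest of the transcript.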

Fig. 2 is a flowchart illustrating a specific embodiment of a method for processing voice stream data according to an embodiment of the present invention. As shown in Fig. 2:

in a specific embodiment, the client splits the user's voice stream into data packets whose sequence fields are Seq = 1, Seq = 2, ..., Seq = n, where 1, 2, ..., n are the sequence numbers indicating the sending order of the data packets. The client sends the data packets to the server in order, and for each data packet that is sent successfully the server returns a corresponding ACK response message to the client, indicating that the server received that packet. When the client does not receive the ACK message for Seq = 5 and the ACK message for Seq = 1 within a certain time, it stores the corresponding data packets in a FIFO (first in, first out) queue, which serves as the packet retransmission queue; when the client's network transmission returns to normal, the client retransmits the data packets in the FIFO queue to the server.
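The splitting step can be sketched as follows, assuming fixed-size chunks; the function name `split_stream` and the `chunk_size` parameter are hypothetical, since the embodiment does not say how the voice stream is divided.

```python
def split_stream(audio: bytes, chunk_size: int):
    """Yield (seq, chunk) pairs, numbering chunks Seq = 1, 2, ..., n in sending order."""
    for seq, start in enumerate(range(0, len(audio), chunk_size), start=1):
        yield seq, audio[start:start + chunk_size]
```

For example, `list(split_stream(b"abcdefg", 3))` gives three packets numbered 1 to 3, the last one shorter than the others.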

In the embodiment of the present invention, a data packet is sent to a server, wherein the data packet encapsulates voice stream data and a sequence number indicating the processing order between the data packet and its associated data packets, so that the server performs the following operations: if the server determines that the data packet is one that, when packets are processed sequentially in the processing order, is to be processed after an associated data packet corresponding to a missing sequence number, storing the data packet; suspending processing of the data packet until the associated data packet encapsulating the missing sequence number is received, and then processing the associated data packet corresponding to the missing sequence number and the data packet sequentially in the processing order. This addresses the packet loss that readily occurs when a user uses simultaneous interpretation in a complex network environment; voice stream data packets that failed to send for network reasons are processed with essentially no delay, and simultaneous interpretation can work normally under a wider range of complex network conditions.

Fig. 3 is a flowchart illustrating a processing method of voice stream data according to another embodiment of the present invention. As shown in fig. 3, the method is applied to a server, and includes the following steps:

step 31, receiving a data packet, wherein the data packet encapsulates voice stream data and a sequence number indicating the processing order between the data packet and its associated data packets;

step 32, if it is determined, according to the processing order and the sequence number in the data packet, that the data packet is one that, when packets are processed sequentially in the processing order, is to be processed after an associated data packet corresponding to a missing sequence number, storing the data packet;

step 33, suspending processing of the data packet until the associated data packet encapsulating the missing sequence number is received, and then processing the associated data packet corresponding to the missing sequence number and the data packet sequentially in the processing order.
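As an illustration only, steps 31 to 33 can be sketched with a hold-back buffer keyed by sequence number. The class name `InOrderProcessor`, the `process` callback, the assumption that numbering starts at 1, and the silent dropping of duplicates are hypothetical choices that the embodiment does not specify.

```python
class InOrderProcessor:
    """Holds back packets that arrive after a gap and releases them in order."""

    def __init__(self, process, first_seq=1):
        self._process = process     # callable(seq, payload), e.g. the transcription step
        self._next_seq = first_seq  # next sequence number that may be processed
        self._held = {}             # seq -> payload, packets waiting for a missing one

    def receive(self, seq, payload):
        # Step 31: a packet arrives. Ignore it if it was already handled or stored
        # (an assumption covering retransmissions that arrive twice).
        if seq < self._next_seq or seq in self._held:
            return
        # Step 32: store the packet; if an earlier sequence number is still missing,
        # it simply stays in the buffer and its processing is suspended.
        self._held[seq] = payload
        # Step 33: once the missing packet is present, process it and every stored
        # packet that follows it consecutively, in sequence-number order.
        while self._next_seq in self._held:
            self._process(self._next_seq, self._held.pop(self._next_seq))
            self._next_seq += 1
```

With arrivals in the order 1, 3, 4, 2, packet 1 is processed immediately, packets 3 and 4 wait, and the arrival of packet 2 releases 2, 3 and 4 in that order.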

In this embodiment, the method for processing voice stream data is applied to a server. The server receives a data packet sent by a client; the data packet is obtained by converting the client's voice stream data into binary bytes and encapsulating them. The server processes the data packet together with the previously received associated data packets according to the processing order and the sequence number in the data packet, to obtain a processing result, and finally sends the processing result to the client, so that the client can update the simultaneous interpretation result in real time according to the sequence number. The data packet includes a sequence field, of a preset bit length, that records the sequence number indicating the sending order of the data packet;

here, the data packet may be a WebSocket data packet, WebSocket being a full-duplex communication protocol between client and server. The sequence field in the data packet is preferably a seq field with a preset length, preferably 32 bits, and records the sequence number indicating the sending order of the data packet; of course, the sequence field may also be 16, 64, or 128 bits long, as determined by the length of the data packet.

In an optional embodiment of the present invention, the processing includes:

performing simultaneous interpretation based on the voice stream data.

In this embodiment, when packets are processed according to the processing order and the sequence numbers in the data packets, simultaneous interpretation is performed on the data packets based on the voice stream data.

In an optional embodiment of the present invention, step 32, when implemented, may include:

step 321, if the sequence number encapsulated in the data packet is not consecutive with the sequence number encapsulated in the data packet at the tail node of the first branch of the target linked list, determining that the data packet is one that, when packets are processed sequentially in the processing order, is to be processed after an associated data packet corresponding to a missing sequence number; the first branch is used to store, in the processing order, the packets to be processed;

in an optional embodiment of the present invention, when step 32 is implemented and the sequence number encapsulated in the data packet is not consecutive with the sequence number encapsulated in the data packet at the tail node of the first branch of the target linked list, storing the data packet may include:

step 322, storing the data packet in a second branch of the target linked list; the second branch is used to store, in the processing order, the packets to be merged into the first branch.

In this embodiment, the server adds each data packet sent by the client to the target linked list. When the sequence number encapsulated in the data packet at the tail node of the first branch chain of the target linked list is not consecutive with the sequence number of the data packet to be added, the server determines that the data packet is a packet that, when packets are processed in the processing order, is to be processed after the associated data packet corresponding to the missing sequence number. When the sequence number encapsulated in the data packet at the tail node of the first branch chain is consecutive with the sequence number of the data packet to be added, the data packet to be added is stored in the second branch chain of the target linked list.

It should be noted that the first branch chain is used for storing, in the processing order, the data packets to be processed. The second branch chain is used for storing, in the processing order, the packets to be merged into the first branch chain; that is, the packets stored in the second branch chain are later merged into the first branch chain according to their sequence numbers.

In an optional embodiment of the present invention, step 33 includes:

step 331, once the associated data packet encapsulating the missing sequence number is received, storing the associated data packet in the first branch chain as its tail node, and, when the missing sequence number is consecutive with the sequence number encapsulated in the data packet, merging the data packet into the first branch chain;

step 332, processing, in the processing order, the associated data packet corresponding to the missing sequence number and the data packet in the first branch chain.

In this embodiment, while the associated data packet encapsulating the missing sequence number has not been received, processing of the data packet is suspended. Once the associated data packet encapsulating the missing sequence number is received, it is stored at the end of the first branch chain as the tail node, and the associated data packet corresponding to the missing sequence number and the data packet in the first branch chain are then processed in the processing order.
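The hold-and-merge behaviour of steps 321 to 332 can be sketched as follows. This is one reading of the embodiment, not its definitive implementation: it assumes that a packet whose sequence number continues the first branch chain is appended there, and that a packet arriving ahead of a gap is held in the second branch chain until the gap is filled. The class name `TwoChainBuffer`, its methods, and the duplicate handling are all hypothetical.

```python
import bisect
from typing import List, Tuple

Packet = Tuple[int, bytes]  # (sequence number, voice stream chunk)


class TwoChainBuffer:
    """Illustrative reorder buffer with a first and a second branch chain."""

    def __init__(self, first_seq: int = 1) -> None:
        self.next_seq = first_seq       # seq that would extend the first chain
        self.first: List[Packet] = []   # in order, waiting to be processed
        self.second: List[Packet] = []  # held back behind a gap, sorted by seq

    def receive(self, seq: int, data: bytes) -> None:
        # Assumption: duplicates and already-consumed packets are ignored.
        if seq < self.next_seq or any(s == seq for s, _ in self.second):
            return
        if seq != self.next_seq:
            # Not consecutive with the first chain's tail: hold the packet.
            bisect.insort(self.second, (seq, data))
            return
        # Consecutive: becomes the new tail node of the first chain.
        self.first.append((seq, data))
        self.next_seq += 1
        # Merge held packets that are now consecutive with the tail.
        while self.second and self.second[0][0] == self.next_seq:
            self.first.append(self.second.pop(0))
            self.next_seq += 1

    def drain(self) -> List[Packet]:
        """Hand the in-order packets to the processing step and clear them."""
        ready, self.first = self.first, []
        return ready
```

For example, after receiving packets 1, 3, and 4, only packet 1 is ready; when packet 2 arrives, packets 2, 3, and 4 all join the first branch chain and can be processed in order.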

In an optional embodiment of the present invention, the method further comprises:

step 34, performing semantic analysis on the result of processing the associated data packet corresponding to the missing sequence number and the data packet, to obtain a semantic analysis result;

step 35, sending the semantic analysis result to the client.

In this embodiment, semantic analysis is performed on the result obtained in step 332 by processing, in the processing order, the associated data packet corresponding to the missing sequence number and the data packet in the first branch chain. The semantic analysis determines whether the data packets on the first branch chain can be parsed into a sentence with clear semantics. If so, the semantic analysis result indicates that the data can be sent to the client; if not, the semantic analysis result indicates that the data cannot be sent to the client.

Fig. 4 is a schematic flowchart of the text conversion processing performed on a data packet by the server in the method for processing voice stream data according to the embodiment of the present invention. As shown in Fig. 4:

the server makes its determination according to the sequence number in the sequence field of the received data packet and the processing order;

the server maintains a target linked list; when the server receives a data packet and the linked list is empty, the data packet is added to the linked list as the head node of a new branch chain and is submitted on the server for semantic analysis and text conversion processing;

if the linked list is not empty, the server determines, according to the sequence number of the received data packet and the processing order, whether the data packet is added to a branch chain of the linked list as a new node. When the data packet carries real-time voice stream data (that is, it is a data packet that has just been received successfully), it is added to the first branch chain of the linked list as a new node. When the data packet carries non-real-time voice stream data (that is, it is a data packet from the retransmission queue), it is added to the second branch chain of the linked list as a new node. The server then determines whether the branch chain to which the data packet was added can be merged with another branch chain. If so, the branch chains are merged, a new thread is created, and semantic analysis and text conversion processing are performed on the data packets corresponding to all nodes on the merged branch chain to obtain a semantic analysis result. If not, the data packet is added to the linked list as the head node of a new branch chain, and text conversion processing is performed on all nodes on the branch chains of the linked list to obtain a semantic analysis result.
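The merging of branch chains described for Fig. 4 can be sketched as below. The sketch is a simplification under stated assumptions: each branch chain is modelled as a plain list of consecutive sequence numbers rather than linked-list nodes, chains are merged whenever the tail of one is immediately followed by the head of the next, and the thread creation and transcription calls are left out. The function name `insert_and_merge` is hypothetical.

```python
from typing import List


def insert_and_merge(chains: List[List[int]], seq: int) -> List[List[int]]:
    """Add a sequence number to the branch chains and merge adjacent chains.

    Each chain is a run of consecutive sequence numbers. The returned list
    is sorted by head node; the input list is not modified.
    """
    # Assumption: a sequence number already held by some chain is ignored.
    for chain in chains:
        if chain[0] <= seq <= chain[-1]:
            return chains

    # Treat the new packet as the head node of a new one-node chain.
    ordered = sorted(chains + [[seq]], key=lambda c: c[0])

    # Merge every pair of chains whose tail and head are adjacent.
    merged = [ordered[0]]
    for chain in ordered[1:]:
        if merged[-1][-1] + 1 == chain[0]:
            merged[-1] = merged[-1] + chain
        else:
            merged.append(chain)
    return merged
```

Starting from an empty list and inserting 1, 2, 6, 3, 4, 5 in that order leaves two chains, `[1, 2, 3, 4]` and `[6]`, until 5 arrives, at which point they merge into the single chain `[1, 2, 3, 4, 5, 6]`.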

After semantic analysis has been performed on all nodes in the server's linked list, a semantic analysis result can be fed back in real time. The analysis result includes a type identifier that indicates whether the content of the semantic analysis result is a final semantic analysis result. If the type identifier is 0, the processing of step 32 continues for the received data packets.

If the type identifier is 1, the voice stream data has been completely transcribed. The head node of the branch chain holding that voice stream data is disconnected from the linked list, and the text obtained from the server's semantic analysis result is translated to obtain a translated text. The semantic analysis result and the translated text are fed back to the client as a processing result, where the processing result includes the sequence number of the data packet corresponding to the voice stream data, and the sequence number represents the sending order of the data packet. The client updates the simultaneous interpretation result in real time according to the sequence number of the data packet.
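How a client might use the sequence number and the type identifier to update its display can be sketched as follows. The embodiment only states that the processing result carries a sequence number and that the type identifier is 0 or 1; the `Result` fields, the `ClientView` class, the delivery of interim results to the client, and the rule that a final result is never replaced by an interim one are assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Result:
    seq: int               # sequence number of the corresponding data packet
    final: int             # type identifier: 0 = interim, 1 = final
    text: str              # transcribed text
    translation: str = ""  # translated text, present on final results


class ClientView:
    """Keeps the latest result per sequence number and renders them in order."""

    def __init__(self) -> None:
        self.entries: Dict[int, Result] = {}

    def update(self, result: Result) -> None:
        current = self.entries.get(result.seq)
        # Assumption: a late interim result must not replace a final one.
        if current is not None and current.final == 1 and result.final == 0:
            return
        self.entries[result.seq] = result

    def render(self) -> str:
        """Join the entries by sending order, preferring the translation."""
        ordered = (self.entries[s] for s in sorted(self.entries))
        return " ".join(r.translation or r.text for r in ordered)
```

Because entries are keyed by sequence number, a result that arrives late for an earlier packet is shown in its correct position rather than appended at the end.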

Fig. 5 is a schematic diagram of a situation in which the server receives data packets in the method for processing voice stream data according to an embodiment of the present invention. The client sends a session, and the server receives the data packets of the session and determines, from the sequence numbers of the data packets, that data packets No. 3 through No. n-2 were not received successfully. The server can continue to receive data packets No. 3 through No. n-2 from the client at a later time, and performs semantic analysis and text conversion processing based on the sequence numbers of the data packets.
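The determination in Fig. 5 that a run of packets is missing can be illustrated with a small helper that reports the gaps in a set of received sequence numbers. The function name `missing_ranges` and its parameters are hypothetical, and the sketch assumes the server knows the first and last sequence numbers of the session.

```python
from typing import Iterable, List, Tuple


def missing_ranges(received: Iterable[int], first: int, last: int) -> List[Tuple[int, int]]:
    """Return inclusive (start, end) ranges in [first, last] that were not received."""
    seen = set(received)
    gaps: List[Tuple[int, int]] = []
    start = None  # first sequence number of the gap currently being scanned
    for seq in range(first, last + 1):
        if seq not in seen:
            if start is None:
                start = seq
        elif start is not None:
            gaps.append((start, seq - 1))
            start = None
    if start is not None:
        gaps.append((start, last))
    return gaps
```

With n = 10 and packets 1, 2, 9, and 10 received, the helper reports the single gap from 3 to 8, which corresponds to packets No. 3 through No. n-2 in the figure.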

Fig. 6 is a schematic diagram of a specific implementation example of the method for processing voice stream data according to an embodiment of the present invention. As shown in Fig. 6, in a specific embodiment, a user at the client has two sentences interpreted simultaneously: "the weather is good today" and "we want to go on a spring outing". Suppose that the voice stream of each Chinese character is sent in one data packet (in actual processing, one Chinese character may be sent in several data packets). The following cases are considered:

Case 1: when the user has just begun to say "today" or "we", the server's linked list is empty. When the server receives the data packet corresponding to the voice stream data of the first character of "today" or "we", the data packet is placed in the linked list as the head node of a new branch chain and is sent directly to the real-time transcription service in the server for semantic analysis and text conversion processing.

Case 2: the data packets corresponding to "weather" in the voice stream spoken by the user are not sent to the server successfully because of a network problem, and the network recovers when the user says "good". At this point, the server's linked list already holds the data packets corresponding to the voice stream data of the characters of "today". When the network recovers, the server receives, almost simultaneously, the data packets for the two characters of "weather" and for "very", "good", and "we". The "good" data packet is added to the linked list as the head node of a new branch chain, and the remaining data packets are added to the two branch chains in order. When the "very" data packet is received, the server determines that the tail node of one branch chain and the head node of the other are adjacent, merges the second branch chain into the first branch chain, and performs text conversion processing. Transcription up to "good" is then complete for this segment, and the "the weather is good today" branch chain is deleted from the linked list.

According to the embodiment of the present invention, a data packet is received, where the data packet encapsulates voice stream data and a sequence number representing the processing order between the data packet and its associated data packets. If it is determined, according to the processing order and the sequence number in the data packet, that the data packet is a packet that, when packets are processed in the processing order, is to be processed after the associated data packet corresponding to a missing sequence number, the data packet is stored. Processing of the data packet is suspended until the associated data packet encapsulating the missing sequence number is received, and the associated data packet corresponding to the missing sequence number and the data packet are then processed in the processing order. This reduces errors when simultaneous interpretation is used in a complex network environment, and allows data that the server has not yet received to be handled under network fluctuation without affecting real-time voice stream processing. Voice stream data that failed to be sent because of network problems is processed with essentially no delay, so the simultaneous interpretation function can operate normally in a complex network environment to a greater extent.

Fig. 7 is a schematic structural diagram illustrating a processing apparatus for processing voice stream data according to an embodiment of the present invention.

As shown in fig. 7, the apparatus 70 includes:

a transceiver module 71, configured to receive a data packet, where the data packet encapsulates voice stream data and a sequence number representing the processing order between the data packet and its associated data packets;

a processing module 72, configured to: if it is determined, according to the processing order and the sequence number in the data packet, that the data packet is a packet that, when packets are processed in the processing order, is to be processed after the associated data packet corresponding to a missing sequence number, store the data packet; and suspend processing of the data packet until the associated data packet encapsulating the missing sequence number is received, and then process, in the processing order, the associated data packet corresponding to the missing sequence number and the data packet.

Optionally, the processing includes:

performing simultaneous interpretation based on the voice stream data.

Optionally, the processing order includes processing in order of consecutive sequence numbers;

then, determining, according to the processing order and the sequence number in the data packet, that the data packet is a packet that, when packets are processed in the processing order, is to be processed after the associated data packet corresponding to the missing sequence number includes:

if the sequence number encapsulated in the data packet is not consecutive with the sequence number encapsulated in the data packet at the tail node of the first branch chain of the target linked list, determining that the data packet is a packet that, when packets are processed in the processing order, is to be processed after the associated data packet corresponding to the missing sequence number;

the first branch chain is used for storing, in the processing order, the packets to be processed.

Optionally, storing the data packet includes:

storing the data packet in a second branch chain of the target linked list; the second branch chain is used for storing, in the processing order, the packets to be merged into the first branch chain.

Optionally, processing, in the processing order, the associated data packet corresponding to the missing sequence number and the data packet after the associated data packet encapsulating the missing sequence number is received includes:

once the associated data packet encapsulating the missing sequence number is received, storing the associated data packet in the first branch chain as its tail node, and merging the data packet into the first branch chain, where the missing sequence number is consecutive with the sequence number encapsulated in the data packet;

processing, in the processing order, the associated data packet corresponding to the missing sequence number and the data packet in the first branch chain.

Optionally, the processing of the voice stream data further includes:

performing semantic analysis on the result of processing the associated data packet corresponding to the missing sequence number and the data packet, to obtain a semantic analysis result;

sending the semantic analysis result to the client.

It should be noted that the apparatus is an apparatus corresponding to the above-described method for processing voice stream data applied to the server, and all the implementations in the above-described method embodiment are applicable to the embodiment of the apparatus, and the same technical effects can be achieved.

The embodiment of the invention provides a processing device of voice stream data, which is applied to a client and comprises:

a transceiver module, configured to send a data packet to a server, where the data packet encapsulates voice stream data and a sequence number representing the processing order between the data packet and its associated data packets, so that the server performs the following operations:

if it is determined, according to the processing order and the sequence number in the data packet, that the data packet is a packet that, when packets are processed in the processing order, is to be processed after the associated data packet corresponding to a missing sequence number, storing the data packet; and suspending processing of the data packet until the associated data packet encapsulating the missing sequence number is received, and then processing, in the processing order, the associated data packet corresponding to the missing sequence number and the data packet.

Optionally, the processing includes:

performing simultaneous interpretation based on the voice stream data.

Optionally, the processing order includes processing in order of consecutive sequence numbers;

then, determining, according to the processing order and the sequence number in the data packet, that the data packet is a packet that, when packets are processed in the processing order, is to be processed after the associated data packet corresponding to the missing sequence number includes:

if the sequence number encapsulated in the data packet is not consecutive with the sequence number encapsulated in the data packet at the tail node of the first branch chain of the target linked list, determining that the data packet is a packet that, when packets are processed in the processing order, is to be processed after the associated data packet corresponding to the missing sequence number;

the first branch chain is used for storing, in the processing order, the packets to be processed.

Optionally, storing the data packet includes:

storing the data packet in a second branch chain of the target linked list; the second branch chain is used for storing, in the processing order, the packets to be merged into the first branch chain.

Optionally, processing, in the processing order, the associated data packet corresponding to the missing sequence number and the data packet after the associated data packet encapsulating the missing sequence number is received includes:

once the associated data packet encapsulating the missing sequence number is received, storing the associated data packet in the first branch chain as its tail node, and merging the data packet into the first branch chain, where the missing sequence number is consecutive with the sequence number encapsulated in the data packet;

processing, in the processing order, the associated data packet corresponding to the missing sequence number and the data packet in the first branch chain.

Optionally, the processing of the voice stream data further includes:

performing semantic analysis on the result of processing the associated data packet corresponding to the missing sequence number and the data packet, to obtain a semantic analysis result;

sending the semantic analysis result to the client.

It should be noted that the apparatus is an apparatus corresponding to the above-described method for processing voice stream data applied to the server, and all the implementations in the above-described method embodiment are applicable to the embodiment of the apparatus, and the same technical effects can be achieved.

The embodiment of the invention provides a system for processing voice stream data, which comprises a client and a server, wherein the client comprises the processing device of the voice stream data, and the server comprises the processing device of the voice stream data. All the implementation manners of the processing method of the voice stream data at the client side and the processing method of the voice stream data at the server side are applicable to the embodiment of the system, and the same technical effect can be achieved.

An embodiment of the present invention provides a non-volatile computer storage medium, where the computer storage medium stores at least one executable instruction, and the computer executable instruction may execute the processing method of the voice stream data in any method embodiment described above.

Fig. 8 is a schematic structural diagram of a computing device according to an embodiment of the present invention, and a specific embodiment of the present invention does not limit a specific implementation of the computing device.

As shown in Fig. 8, the computing device may include: a processor, a communications interface, a memory, and a communications bus.

The processor, the communications interface, and the memory communicate with each other via the communications bus. The communications interface is used for communicating with network elements of other devices, such as clients or other servers. The processor is used for executing a program, and may specifically perform the relevant steps in the above embodiments of the method for processing voice stream data for the computing device.

In particular, the program may include program code comprising computer operating instructions.

The processor may be a central processing unit (CPU), an application-specific integrated circuit (ASIC), or one or more integrated circuits configured to implement embodiments of the present invention. The computing device includes one or more processors, which may be of the same type, such as one or more CPUs, or of different types, such as one or more CPUs and one or more ASICs.

The memory is used for storing the program. The memory may include high-speed RAM, and may also include non-volatile memory, such as at least one disk memory.

The program may specifically be configured to cause the processor to execute the processing method of the voice stream data in any of the above-described method embodiments. For specific implementation of each step in the program, reference may be made to corresponding steps and corresponding descriptions in units in the foregoing processing embodiment of the voice stream data, which are not described herein again. It can be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described devices and modules may refer to the corresponding process descriptions in the foregoing method embodiments, and are not described herein again.

The algorithms or displays presented herein are not inherently related to any particular computer, virtual system, or other apparatus. Various general purpose systems may also be used with the teachings herein. The required structure for constructing such a system will be apparent from the description above. In addition, embodiments of the present invention are not directed to any particular programming language. It is appreciated that a variety of programming languages may be used to implement the teachings of embodiments of the present invention as described herein, and any descriptions of specific languages are provided above to disclose the best modes of embodiments of the invention.

In the description provided herein, numerous specific details are set forth. It is understood, however, that embodiments of the invention may be practiced without these specific details. In some instances, well-known methods, structures and techniques have not been shown in detail in order not to obscure an understanding of this description.

Similarly, it should be appreciated that in the foregoing description of exemplary embodiments of the invention, various features of the embodiments of the invention are sometimes grouped together in a single embodiment, figure, or description thereof for the purpose of streamlining the disclosure and aiding in the understanding of one or more of the various inventive aspects. However, the disclosed method should not be interpreted as reflecting an intention that the claimed embodiments of the invention require more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in less than all features of a single foregoing disclosed embodiment. Thus, the claims following the detailed description are hereby expressly incorporated into this detailed description, with each claim standing on its own as a separate embodiment of this invention.

Those skilled in the art will appreciate that the modules in the device in an embodiment may be adaptively changed and disposed in one or more devices different from the embodiment. The modules or units or components of the embodiments may be combined into one module or unit or component, and furthermore they may be divided into a plurality of sub-modules or sub-units or sub-components. All of the features disclosed in this specification (including any accompanying claims, abstract and drawings), and all of the processes or elements of any method or apparatus so disclosed, may be combined in any combination, except combinations where at least some of such features and/or processes or elements are mutually exclusive. Each feature disclosed in this specification (including any accompanying claims, abstract and drawings) may be replaced by alternative features serving the same, equivalent or similar purpose, unless expressly stated otherwise.

Furthermore, those skilled in the art will appreciate that while some embodiments herein include some features included in other embodiments, rather than other features, combinations of features of different embodiments are meant to be within the scope of the invention and form different embodiments. For example, in the following claims, any of the claimed embodiments may be used in any combination.

The various component embodiments of the invention may be implemented in hardware, or in software modules running on one or more processors, or in a combination thereof. Those skilled in the art will appreciate that a microprocessor or Digital Signal Processor (DSP) may be used in practice to implement some or all of the functionality of some or all of the components according to embodiments of the present invention. Embodiments of the invention may also be implemented as apparatus or device programs (e.g., computer programs and computer program products) for performing a portion or all of the methods described herein. Such programs implementing embodiments of the present invention may be stored on computer-readable media or may be in the form of one or more signals. Such a signal may be downloaded from an internet website or provided on a carrier signal or in any other form.

It should be noted that the above-mentioned embodiments illustrate rather than limit the invention, and that those skilled in the art will be able to design alternative embodiments without departing from the scope of the appended claims. In the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. Embodiments of the invention may be implemented by means of hardware comprising several distinct elements, and by means of a suitably programmed computer. In the unit claims enumerating several means, several of these means may be embodied by one and the same item of hardware. The use of the words first, second, third, and so on does not indicate any ordering; these words may be interpreted as names. The steps in the above embodiments should not be construed as limiting the order of execution unless specified otherwise.
