Audio and video transmission control method based on artificial intelligence AI and related equipment

Document No.: 1077194. Publication date: 2020-10-16.

Reading note: This technology, "Audio and video transmission control method based on artificial intelligence (AI) and related equipment", was designed by 余强 on 2020-06-23. Abstract: The invention relates to the technical field of artificial intelligence and provides an AI-based audio and video transmission control method, comprising: sending a setup request to an avatar platform and establishing a Hypertext Transfer Protocol (HTTP) connection with the avatar platform; sending a text script to the avatar platform, so that the avatar platform generates a real-time audio and video stream according to the text script and a real-person avatar synthesis algorithm; pulling the real-time audio and video stream from the avatar platform through the Real Time Streaming Protocol (RTSP) to obtain an RTSP stream; and transmitting the RTSP stream to an external server over the Transmission Control Protocol (TCP), so that the external server transmits the RTSP stream to a user terminal. The invention also relates to blockchain technology: the RTSP stream can be uploaded to a blockchain. The invention can be applied in smart government and smart community scenarios, thereby promoting the construction of smart cities.

1. An audio and video transmission control method based on artificial intelligence (AI), applied to a control server, characterized in that the method comprises:

sending a setup request to an avatar platform, and establishing a Hypertext Transfer Protocol (HTTP) connection with the avatar platform;

sending a text script to the avatar platform, so that the avatar platform generates a real-time audio and video stream according to the text script and a real-person avatar synthesis algorithm;

pulling the real-time audio and video stream from the avatar platform through the Real Time Streaming Protocol (RTSP) to obtain an RTSP stream;

and transmitting the RTSP stream to an external server over the Transmission Control Protocol (TCP), so as to transmit the RTSP stream to a user terminal through the external server.

2. The transmission control method of audio/video based on artificial intelligence AI according to claim 1, characterized in that the method further comprises:

receiving an RTSP Uniform Resource Locator (URL) address returned by the avatar platform;

wherein pulling the real-time audio and video stream from the avatar platform through the Real Time Streaming Protocol (RTSP) to obtain an RTSP stream comprises:

pulling the real-time audio and video stream from the avatar platform through RTSP according to the URL address to obtain the RTSP stream.

3. The transmission control method of audio/video based on artificial intelligence AI according to claim 1, characterized in that the method further comprises:

when it is detected that the audio and video stream on the user terminal has been closed, sending an HTTP DELETE request to the avatar platform, wherein the DELETE request carries an audio and video stream identifier streamID and requests the avatar platform to stop playing the audio and video stream corresponding to the streamID.

4. The transmission control method of audio/video based on artificial intelligence AI according to claim 1, characterized in that the method further comprises:

detecting whether the text script contains an end keyword;

if the text script contains an end keyword, sending an HTTP DELETE request to the avatar platform when the last frame of the audio and video stream for the text script is received from the avatar platform, wherein the DELETE request carries an audio and video stream identifier streamID and requests the avatar platform to stop playing the audio and video stream corresponding to the streamID.

5. An audio and video transmission control method based on artificial intelligence (AI), applied to an avatar platform, characterized in that the method comprises:

receiving a setup request sent by a control server, and establishing a Hypertext Transfer Protocol (HTTP) connection with the control server;

receiving a text script sent by the control server;

generating a real-time audio and video stream according to the text script and a real-person avatar synthesis algorithm;

and converting the real-time audio and video stream into an RTSP stream by means of the Real Time Streaming Protocol (RTSP), and sending the RTSP stream to the control server so that the RTSP stream can be played.

6. The transmission control method of audio/video based on artificial intelligence AI according to claim 5, characterized in that the method further comprises:

receiving audio and video stream setting information sent by the control server, wherein the audio and video stream setting information comprises an audio and video stream identifier streamID, an audio and video stream format, an audio and video stream size and a pixel size;

wherein generating the real-time audio and video stream according to the text script and the real-person avatar synthesis algorithm comprises:

generating the real-time audio and video stream according to the text script and the real-person avatar synthesis algorithm, in accordance with the audio and video stream format, the audio and video stream size and the pixel size, wherein the real-time audio and video stream uses the streamID.

7. The transmission control method of audio/video based on artificial intelligence AI according to claim 5, characterized in that the method further comprises:

calculating a text response duration according to the time at which the text script was received and the time at which the real-time audio and video stream was generated;

acquiring a standard log duration;

determining whether the text response duration is greater than the standard log duration;

and if the text response duration is greater than the standard log duration, capturing Real-time Transport Protocol (RTP) packets of the real-time audio and video stream to obtain the audio and video file carried by the RTP packets, and analyzing the audio and video file.

8. A control server, characterized in that the control server comprises a processor and a memory, the processor is used for executing a computer program stored in the memory to realize the transmission control method of the artificial intelligence AI-based video and audio according to any one of the claims 1 to 4.

9. An avatar platform characterized in that it comprises a processor and a memory, said processor being adapted to execute a computer program stored in the memory to implement the transmission control method of artificial intelligence AI based audio-video according to any of claims 5 to 7.

10. A computer-readable storage medium, characterized in that the computer-readable storage medium stores at least one instruction, which when executed by a processor, implements the transmission control method of artificial intelligence AI based audio-video according to any one of claims 1 to 4 or 5 to 7.

Technical Field

The invention relates to the technical field of artificial intelligence, in particular to an audio and video transmission control method based on Artificial Intelligence (AI) and related equipment.

Background

With the rise of Artificial Intelligence (AI) technology, AI is more and more widely applied, wherein man-machine conversation scenes based on AI are more and more popular. In the current human-computer conversation scenario, the MP4 file is usually generated in advance from the text, and then the MP4 file is played. However, this method cannot meet the requirement of real-time audio-video interaction in man-machine conversation.

Therefore, how to control the transmission of the audio and video to meet the requirement of real-time interaction is a technical problem to be solved urgently.

Disclosure of Invention

In view of the above, it is necessary to provide an audio/video transmission control method based on artificial intelligence AI and related devices, which can implement audio/video stream interfacing between a control server and an avatar platform in HTTP + RTSP streaming manner, and can meet the real-time interaction requirement of a user in a man-machine interaction scenario.

A first aspect of the present invention provides an audio and video transmission control method based on artificial intelligence (AI), which is applied to a control server and comprises the following steps:

sending a setup request to an avatar platform, and establishing a Hypertext Transfer Protocol (HTTP) connection with the avatar platform;

sending a text script to the avatar platform, so that the avatar platform generates a real-time audio and video stream according to the text script and a real-person avatar synthesis algorithm;

pulling the real-time audio and video stream from the avatar platform through the Real Time Streaming Protocol (RTSP) to obtain an RTSP stream;

and transmitting the RTSP stream to an external server over the Transmission Control Protocol (TCP), so as to transmit the RTSP stream to a user terminal through the external server.

In one possible implementation, the method further includes:

receiving an RTSP Uniform Resource Locator (URL) address returned by the avatar platform;

wherein pulling the real-time audio and video stream from the avatar platform through the Real Time Streaming Protocol (RTSP) to obtain an RTSP stream comprises:

pulling the real-time audio and video stream from the avatar platform through RTSP according to the URL address to obtain the RTSP stream.

In one possible implementation, the method further includes:

when it is detected that the audio and video stream on the user terminal has been closed, sending an HTTP DELETE request to the avatar platform, wherein the DELETE request carries an audio and video stream identifier streamID and requests the avatar platform to stop playing the audio and video stream corresponding to the streamID.

In one possible implementation, the method further includes:

detecting whether the text script contains an end keyword;

if the text script contains an end keyword, sending an HTTP DELETE request to the avatar platform when the last frame of the audio and video stream for the text script is received from the avatar platform, wherein the DELETE request carries an audio and video stream identifier streamID and requests the avatar platform to stop playing the audio and video stream corresponding to the streamID.

A second aspect of the present invention provides an audio and video transmission control method based on artificial intelligence (AI), which is applied to an avatar platform and comprises the following steps:

receiving a setup request sent by a control server, and establishing a Hypertext Transfer Protocol (HTTP) connection with the control server;

receiving a text script sent by the control server;

generating a real-time audio and video stream according to the text script and a real-person avatar synthesis algorithm;

and converting the real-time audio and video stream into an RTSP stream by means of the Real Time Streaming Protocol (RTSP), and sending the RTSP stream to the control server so that the RTSP stream can be played.

In one possible implementation, the method further includes:

receiving audio and video stream setting information sent by the control server, wherein the audio and video stream setting information comprises an audio and video stream identifier streamID, an audio and video stream format, an audio and video stream size and a pixel size;

wherein generating the real-time audio and video stream according to the text script and the real-person avatar synthesis algorithm comprises:

generating the real-time audio and video stream according to the text script and the real-person avatar synthesis algorithm, in accordance with the audio and video stream format, the audio and video stream size and the pixel size, wherein the real-time audio and video stream uses the streamID.

In one possible implementation, the method further includes:

calculating a text response duration according to the time at which the text script was received and the time at which the real-time audio and video stream was generated;

acquiring a standard log duration;

determining whether the text response duration is greater than the standard log duration;

and if the text response duration is greater than the standard log duration, capturing Real-time Transport Protocol (RTP) packets of the real-time audio and video stream to obtain the audio and video file carried by the RTP packets, and analyzing the audio and video file.

A third aspect of the present invention provides a control server, which includes a processor and a memory, wherein the processor is configured to execute a computer program stored in the memory to implement the artificial intelligence AI-based audio/video transmission control method.

A fourth aspect of the present invention provides an avatar platform, comprising a processor and a memory, wherein the processor is configured to execute a computer program stored in the memory to implement the method for controlling transmission of audio and video based on artificial intelligence AI.

A fifth aspect of the present invention provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the artificial intelligence AI-based audio/video transmission control method.

In this technical scheme, the audio and video streams of the control server and the avatar platform are interfaced in an HTTP + RTSP streaming manner, which can meet the user's need for real-time interaction in a man-machine conversation scenario. After the avatar platform generates the real-time audio and video stream, it can also actively check the response duration of that stream, which helps reduce the response duration and improves the real-time performance of the interaction. In addition, when the avatar platform finishes playing the last frame of the audio and video stream, it can send an MRCP message to the control server promptly and accurately, so that the control server can start sound pickup in time and the timeliness of the interaction is ensured.

Drawings

Fig. 1 is a flowchart of a preferred embodiment of an audio/video transmission control method based on artificial intelligence AI disclosed in the present invention.

Fig. 2 is a flowchart of another preferred embodiment of the transmission control method of audio and video based on artificial intelligence AI disclosed in the present invention.

Fig. 3 is a functional block diagram of a transmission control apparatus according to a preferred embodiment of the present disclosure.

Fig. 4 is a functional block diagram of another preferred embodiment of the transmission control apparatus disclosed in the present invention.

Fig. 5 is a schematic structural diagram of a control server according to a preferred embodiment of the present invention, which implements an artificial intelligence AI-based audio/video transmission control method.

Fig. 6 is a schematic structural diagram of an avatar platform according to a preferred embodiment of the present invention for implementing an artificial intelligence AI-based audio/video transmission control method.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

The terms "comprises," "comprising," and "having," and any variations thereof, in the description and claims of this application, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.

In addition, technical solutions between various embodiments may be combined with each other, but must be realized by a person skilled in the art, and when the technical solutions are contradictory or cannot be realized, such a combination should not be considered to exist, and is not within the protection scope of the present invention.

The control server may refer to a computer system capable of providing services to other devices (such as an avatar platform and a user terminal) in a network.

The avatar platform is a device capable of automatically performing numerical calculation and/or information processing according to preset or stored instructions, and its hardware includes, but is not limited to, a microprocessor, an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA), a Digital Signal Processor (DSP), an embedded device, and the like.

The user terminal includes, but is not limited to, any electronic product that can interact with a user through a keyboard, a mouse, a remote controller, a touch pad, or a voice control device, for example, a personal computer, a tablet computer, a smart phone, a Personal Digital Assistant (PDA), or the like.

Referring to fig. 1, fig. 1 is a flowchart illustrating a method for controlling audio/video transmission based on artificial intelligence AI according to a preferred embodiment of the present invention. The audio and video transmission control method based on the artificial intelligence AI is applied to a control server, the sequence of the steps in the flow chart can be changed according to different requirements, and some steps can be omitted.

S11: Send a setup request to the avatar platform, and establish a Hypertext Transfer Protocol (HTTP) connection with the avatar platform.

When the control server detects that the user calls in from the APP of the user terminal, the control server may send a setup request (e.g., a POST request of HTTP) to the avatar platform and establish HTTP connection with the avatar platform.

The control server mainly processes the audio and video streams of the user side on the uplink network and interfaces with the audio and video streams of the avatar platform on the downlink network. This scheme focuses on how the control server interfaces with the audio and video of the avatar platform.

The setup request carries audio and video stream setting information, which comprises an audio and video stream identifier streamID, an audio and video stream format, an audio and video stream size and a pixel size.

The video format is as follows:

Video coding: H.264/AVC
Resolution: 640x480
Frame rate: 15 frames per second
I-frame interval: one I-frame every 1-2 seconds
Profile: Baseline Profile
Level: 3.1
NAL/slice: single slice, single NAL unit per frame

The audio format is as follows:

Audio coding: PCMA
Sampling rate: 8 kHz
Number of channels: 1 (mono)
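As a minimal sketch of how the setting information above might be carried in the body of the HTTP POST setup request, the following serializes it as JSON. The JSON encoding, the class name `StreamSettings` and every field name are illustrative assumptions; the source only lists the parameters and does not define a wire format.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class StreamSettings:
    """Stream settings for the setup request (all field names are hypothetical)."""
    stream_id: str
    video_codec: str = "H.264"
    video_profile: str = "baseline"
    video_level: str = "3.1"
    width: int = 640
    height: int = 480
    frame_rate: int = 15
    audio_codec: str = "PCMA"
    audio_sample_rate_hz: int = 8000
    audio_channels: int = 1


def build_setup_body(settings: StreamSettings) -> bytes:
    """Serialize the settings as a UTF-8 JSON body for the POST setup request."""
    return json.dumps(asdict(settings), sort_keys=True).encode("utf-8")


body = build_setup_body(StreamSettings(stream_id="stream-001"))
print(body.decode("utf-8"))
```

Defaults mirror the video and audio formats listed above, so a caller only has to supply the streamID.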

S12: Send a text script to the avatar platform, so that the avatar platform generates a real-time audio and video stream according to the text script and a real-person avatar synthesis algorithm.

The control server may obtain the text script from another device (such as a dialog management platform). That device can recognize the different scenarios of different users, and the text scripts for different users in different scenarios differ.

The real-time audio and video stream is generated from the current text script, and the real-time audio and video differ for different users in different scenarios, so the requirement for real-time interaction can be met.

After the avatar platform establishes the video session, its HTTP interaction container receives the text sent by the external associated system and forwards the text to be broadcast to the codec container. The codec generates RTP packets containing audio and video. The HTTP interaction container receives these RTP packets and sends them to the RTSP synthesis container, where they are transcoded as required by the front-end format and packaged into an RTSP stream, i.e., the real-time audio and video stream.

Specifically, sending the text script to the avatar platform includes:

sending the text script for the streamID to the avatar platform.

An HTTP PUT request may be sent that carries the streamID and the text script. Different streamIDs correspond to different text scripts, so the audio and video streams of different users can be distinguished.
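The PUT request described above could be built as follows. This is only a sketch: the URL path, the JSON body layout and the host `avatar.example` are assumptions, since the source states only that the request carries the streamID and the text. The request is constructed but not sent.

```python
import json
import urllib.request


def build_script_request(base_url: str, stream_id: str, script: str) -> urllib.request.Request:
    """Build (without sending) a PUT request carrying the streamID and the text script."""
    # The body layout and the URL path are hypothetical.
    body = json.dumps({"streamID": stream_id, "text": script}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/streams/{stream_id}/script",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json; charset=utf-8"},
    )


req = build_script_request("http://avatar.example", "stream-001", "Hello, how can I help you?")
print(req.get_method(), req.full_url)
```

Passing the resulting object to `urllib.request.urlopen` would perform the actual request against a real platform endpoint.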

S13: Pull the real-time audio and video stream from the avatar platform through the Real Time Streaming Protocol (RTSP) to obtain an RTSP stream.

RTSP (Real Time Streaming Protocol) is bidirectional: when RTSP is used, both the client and the server can issue requests. RTSP is a multimedia streaming protocol for controlling audio or video, and it allows multiple streaming sessions to be controlled simultaneously. RTSP does not define the transport protocol used to deliver the media, so the server can choose TCP or UDP to transmit the stream content.

The method further comprises the following steps:

receiving an RTSP Uniform Resource Locator (URL) address returned by the avatar platform;

wherein pulling the real-time audio and video stream from the avatar platform through the Real Time Streaming Protocol (RTSP) to obtain an RTSP stream comprises:

pulling the real-time audio and video stream from the avatar platform through RTSP according to the URL address to obtain the RTSP stream.

When the HTTP connection with the avatar platform is established, the avatar platform returns a URL address, which the control server follows to locate the corresponding audio and video stream. A URL (Uniform Resource Locator), i.e., a network address, is the uniform resource locator used on the World Wide Web.
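To illustrate how a control server might begin pulling from the returned URL, the sketch below extracts the connection endpoint and formats a first RTSP request. The URL shown is hypothetical, and the helper names `rtsp_endpoint` and `rtsp_request` are illustrative. A real pull would continue with SETUP and PLAY after parsing the server's responses, which is not shown.

```python
from typing import Dict, Optional, Tuple
from urllib.parse import urlsplit

RTSP_DEFAULT_PORT = 554  # default port for rtsp:// URLs


def rtsp_endpoint(url: str) -> Tuple[str, int]:
    """Return the (host, port) to connect to for an rtsp:// URL."""
    parts = urlsplit(url)
    if parts.scheme != "rtsp" or not parts.hostname:
        raise ValueError(f"not an RTSP URL: {url!r}")
    return parts.hostname, parts.port or RTSP_DEFAULT_PORT


def rtsp_request(method: str, url: str, cseq: int,
                 headers: Optional[Dict[str, str]] = None) -> bytes:
    """Format one RTSP/1.0 request: request line, CSeq, extra headers, blank line."""
    lines = [f"{method} {url} RTSP/1.0", f"CSeq: {cseq}"]
    lines += [f"{name}: {value}" for name, value in (headers or {}).items()]
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")


url = "rtsp://avatar.example/live/stream-001"  # hypothetical URL returned by the platform
print(rtsp_endpoint(url))
print(rtsp_request("DESCRIBE", url, 1, {"Accept": "application/sdp"}).decode("ascii"))
```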

S14: Transmit the RTSP stream to an external server over the Transmission Control Protocol (TCP), so as to transmit the RTSP stream to the user terminal through the external server.

The Transmission Control Protocol (TCP) is a connection-oriented, reliable, byte-stream-based transport layer communication protocol.

When the RTSP stream is transmitted to the user terminal, the APP of the user terminal starts playing the RTSP stream, so that real-time interaction of the audio and video streams is realized.
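The source specifies only that the RTSP stream is carried over TCP and does not say how packets are delimited on the connection. One common option, assumed here purely for illustration, is the interleaved binary framing defined by the RTSP specification: a `$` byte, a one-byte channel number and a two-byte big-endian length in front of each packet. The sketch below frames packets for sending and splits a received byte buffer back into packets.

```python
import struct
from typing import List, Tuple

_HEADER = struct.Struct("!cBH")  # '$', channel, payload length (network byte order)


def frame_interleaved(channel: int, packet: bytes) -> bytes:
    """Prefix one RTP/RTCP packet with the 4-byte interleaved header."""
    if not 0 <= channel <= 0xFF:
        raise ValueError("channel must fit in one byte")
    if len(packet) > 0xFFFF:
        raise ValueError("packet too large for a 16-bit length field")
    return _HEADER.pack(b"$", channel, len(packet)) + packet


def split_interleaved(buffer: bytes) -> Tuple[List[Tuple[int, bytes]], bytes]:
    """Split a TCP byte buffer into complete (channel, packet) frames plus leftover bytes."""
    frames: List[Tuple[int, bytes]] = []
    offset = 0
    while len(buffer) - offset >= _HEADER.size:
        magic, channel, length = _HEADER.unpack_from(buffer, offset)
        if magic != b"$":
            raise ValueError("stream is not aligned on an interleaved frame")
        start = offset + _HEADER.size
        if len(buffer) - start < length:
            break  # incomplete frame: wait for more bytes from the socket
        frames.append((channel, buffer[start:start + length]))
        offset = start + length
    return frames, buffer[offset:]


wire = frame_interleaved(0, b"video-rtp") + frame_interleaved(2, b"audio-rtp")
print(split_interleaved(wire))
```

Because TCP is a byte stream, the receiver keeps the leftover bytes and prepends them to the next read before calling `split_interleaved` again.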

Optionally, the method further includes:

uploading the RTSP stream to a blockchain.

To ensure the privacy and security of the data, the RTSP stream may be uploaded to a blockchain for storage.

Optionally, the method further includes:

when it is detected that the audio and video stream on the user terminal has been closed, sending an HTTP DELETE request to the avatar platform, wherein the DELETE request carries the streamID and requests the avatar platform to stop playing the audio and video stream corresponding to the streamID.
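A minimal sketch of the stop request follows. Placing the streamID in the URL path is an assumption, as are the path layout and the host `avatar.example`; the source says only that the DELETE request carries the streamID. The request is built but not sent.

```python
import urllib.request


def build_stop_request(base_url: str, stream_id: str) -> urllib.request.Request:
    """Build (without sending) the DELETE request that asks the platform to stop a stream."""
    # Hypothetical resource path; the streamID identifies the stream to stop.
    return urllib.request.Request(url=f"{base_url}/streams/{stream_id}", method="DELETE")


stop = build_stop_request("http://avatar.example", "stream-001")
print(stop.get_method(), stop.full_url)
```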

Optionally, the method further includes:

detecting whether the text script contains an end keyword;

if the text script contains an end keyword, sending an HTTP DELETE request to the avatar platform when the last frame of the audio and video stream for the text script is received from the avatar platform, wherein the DELETE request carries the streamID and requests the avatar platform to stop playing the audio and video stream corresponding to the streamID.
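The end-keyword check can be sketched as a simple substring test. The keyword list below is invented for illustration, as the source does not say which keywords mark the end of a script or how they are matched.

```python
from typing import Iterable

# Hypothetical end keywords; a real deployment would configure its own list.
END_KEYWORDS = ("goodbye", "thank you for calling")


def has_end_keyword(script: str, keywords: Iterable[str] = END_KEYWORDS) -> bool:
    """Return True if the text script contains any end keyword (case-insensitive)."""
    lowered = script.casefold()
    return any(keyword.casefold() in lowered for keyword in keywords)


def should_send_delete(script: str, last_frame_received: bool) -> bool:
    """The DELETE request is due only once the last frame of an ending script has arrived."""
    return has_end_keyword(script) and last_frame_received


print(should_send_delete("Thank you, goodbye!", last_frame_received=True))
```

Waiting for the last frame before sending DELETE keeps the final sentence from being cut off on the user terminal.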

In the method flow described in fig. 1, the audio/video stream interfacing between the control server and the avatar platform is realized in the HTTP + RTSP stream mode, and the real-time interaction requirement of the user in the man-machine interaction scene can be satisfied.

Referring to fig. 2, fig. 2 is a flowchart illustrating another method for controlling audio/video transmission based on artificial intelligence AI according to a preferred embodiment of the present disclosure. This method is applied to the avatar platform. The sequence of the steps in the flowchart can be changed according to different requirements, and some steps can be omitted.

S21: Receive the setup request sent by the control server, and establish a Hypertext Transfer Protocol (HTTP) connection with the control server.

S22: Receive the text script sent by the control server.

S23: Generate a real-time audio and video stream according to the text script and the real-person avatar synthesis algorithm.

The method further comprises the following steps:

receiving audio and video stream setting information sent by the control server, wherein the audio and video stream setting information comprises an audio and video stream identifier streamID, an audio and video stream format, an audio and video stream size and a pixel size;

wherein generating the real-time audio and video stream according to the text script and the real-person avatar synthesis algorithm comprises:

generating the real-time audio and video stream according to the text script and the real-person avatar synthesis algorithm, in accordance with the audio and video stream format, the audio and video stream size and the pixel size, wherein the real-time audio and video stream uses the streamID.

The method further comprises the following steps:

calculating a text response duration according to the time at which the text script was received and the time at which the real-time audio and video stream was generated;

acquiring a standard log duration;

determining whether the text response duration is greater than the standard log duration;

and if the text response duration is greater than the standard log duration, capturing Real-time Transport Protocol (RTP) packets of the real-time audio and video stream to obtain the audio and video file carried by the RTP packets, and analyzing the audio and video file.
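The comparison in the steps above can be sketched as follows. Timestamps are taken to be seconds from a single clock, and the 0.5-second threshold in the example call is an illustrative value only; the source does not give a number for the standard log duration.

```python
def text_response_duration(received_at: float, generated_at: float) -> float:
    """Seconds between receiving the text script and generating the stream."""
    if generated_at < received_at:
        raise ValueError("generation time precedes receiving time")
    return generated_at - received_at


def needs_rtp_capture(received_at: float, generated_at: float,
                      standard_duration: float) -> bool:
    """True when the response is slower than the standard log duration."""
    return text_response_duration(received_at, generated_at) > standard_duration


# Script received at t = 10.0 s, stream generated at t = 10.8 s, threshold 0.5 s (assumed).
print(needs_rtp_capture(10.0, 10.8, standard_duration=0.5))
```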

In this scheme, real-time interaction places high demands on how quickly the stream is generated: once a text script is sent, the first frame for that script needs to be played within a few hundred milliseconds.

If the text response duration is longer than the standard log duration, the avatar platform is responding slowly, which may seriously affect the real-time interaction and the user experience. To reduce the text response duration, the Real-time Transport Protocol (RTP) packets of the real-time audio and video stream need to be captured to obtain the audio and video file carried by the RTP packets, the audio and video file needs to be analyzed, and corresponding measures need to be taken. Techniques for capturing RTP packets and analyzing audio and video files belong to the prior art and are not described here.
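As one example of the kind of analysis that can follow a capture, the sketch below decodes the 12-byte fixed RTP header of a captured packet. How packets are captured is outside this sketch, and the sample packet is synthetic. Payload type 8 is the static RTP payload type for PCMA, which matches the audio coding listed earlier.

```python
import struct
from typing import NamedTuple


class RtpHeader(NamedTuple):
    version: int
    marker: bool
    payload_type: int
    sequence: int
    timestamp: int
    ssrc: int


def parse_rtp_header(packet: bytes) -> RtpHeader:
    """Decode the fixed 12-byte RTP header at the start of a packet."""
    if len(packet) < 12:
        raise ValueError("packet shorter than the fixed RTP header")
    first, second, sequence, timestamp, ssrc = struct.unpack_from("!BBHII", packet)
    version = first >> 6  # top two bits of the first byte
    if version != 2:
        raise ValueError(f"unsupported RTP version {version}")
    return RtpHeader(version, bool(second & 0x80), second & 0x7F, sequence, timestamp, ssrc)


# Synthetic packet: version 2, payload type 8 (PCMA), 160 payload bytes (20 ms at 8 kHz).
sample = struct.pack("!BBHII", 0x80, 8, 7, 160, 0x1234) + b"\xd5" * 160
print(parse_rtp_header(sample))
```

Gaps in `sequence` or irregular `timestamp` steps across consecutive packets are typical signs of loss or generation delay worth investigating.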

In this scheme, the avatar platform also checks the text response duration after the audio and video stream is generated, so that problems can be found promptly, the response duration can be reduced, and the real-time performance of the text response is improved.

S24: Convert the real-time audio and video stream into an RTSP stream by means of the Real Time Streaming Protocol (RTSP), and send the RTSP stream to the control server so that the RTSP stream can be played.

The method further comprises the following steps:

when the last frame of the audio and video stream for the text script has been broadcast, sending a Media Resource Control Protocol (MRCP) notification message to the control server, wherein the MRCP notification message indicates that the broadcast has ended.

When the avatar platform finishes playing the last frame of the audio and video stream for the text script, it sends an MRCP notification message to the control server, so that the control server can start sound pickup in time. This enables accurate control of sound pickup and avoids missed or lost audio.

In the method flow described in fig. 2, after the avatar platform generates the real-time audio and video stream, it can actively check the response duration of that stream, which helps reduce the response duration and improves the real-time performance of the interaction.

The above description is only a specific embodiment of the present invention, but the scope of the present invention is not limited thereto, and it will be apparent to those skilled in the art that modifications may be made without departing from the inventive concept of the present invention, and these modifications are within the scope of the present invention.

Referring to fig. 3, fig. 3 is a functional block diagram of a transmission control apparatus according to a preferred embodiment of the present invention. In some embodiments, the transmission control device operates in a control server. The transmission control means may comprise a plurality of functional modules consisting of program code segments. Program codes of each program segment in the transmission control device may be stored in the memory and executed by the at least one processor to perform part or all of the steps in the artificial intelligence AI based audio/video transmission control method described in fig. 1, for which reference is specifically made to the relevant description in fig. 1, which is not repeated herein.

In this embodiment, the transmission control apparatus may be divided into a plurality of functional modules according to the functions it performs. The functional modules may include: a sending module 301, an establishing module 302 and a pulling module 303. A module as referred to herein is a series of computer program segments that is stored in memory, can be executed by at least one processor, and performs a fixed function.

A sending module 301, configured to send a setup request to the avatar platform.

An establishing module 302, configured to establish a hypertext transfer protocol HTTP connection with the avatar platform.

The sending module 301 is further configured to send a text script to the avatar platform, so that the avatar platform generates a real-time audio and video stream according to the text script and a real-person avatar synthesis algorithm;

and the pulling module 303 is configured to pull the real-time audio/video stream from the avatar platform through a real-time streaming protocol RTSP to obtain an RTSP stream.

The sending module 301 is further configured to transmit the RTSP stream to an external server in a TCP manner, so as to transmit the RTSP stream to a user terminal through the external server.

In the transmission control apparatus described in fig. 3, the audio/video stream interface between the control server and the avatar platform is implemented by combining an HTTP connection with an RTSP stream, which can meet the user's real-time interaction requirements in human-machine interaction scenarios.
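The division into modules 301 to 303 can be illustrated with a minimal sketch. It assumes a hypothetical in-memory stand-in (`FakeAvatarPlatform`) in place of the real avatar platform and a plain list in place of the external server; none of these names, nor the `av:` chunk format, come from the original text, and no real HTTP, RTSP or TCP traffic is produced.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class FakeAvatarPlatform:
    """Hypothetical in-memory stand-in for the avatar platform."""
    setup_requested: bool = False
    connected: bool = False
    texts: List[str] = field(default_factory=list)

    def accept_setup_request(self) -> bool:
        self.setup_requested = True
        return True

    def open_http_connection(self) -> None:
        if not self.setup_requested:
            raise RuntimeError("setup request must arrive first")
        self.connected = True

    def receive_text(self, text: str) -> None:
        if not self.connected:
            raise RuntimeError("HTTP connection not established")
        self.texts.append(text)

    def rtsp_chunks(self) -> List[bytes]:
        # Placeholder for the synthesized stream: one chunk per received text.
        return [("av:" + t).encode("utf-8") for t in self.texts]


class SendingModule:
    """Module 301: sends the setup request and the text, forwards the stream."""

    def __init__(self, platform: FakeAvatarPlatform, external_server: List[bytes]) -> None:
        self.platform = platform
        self.external_server = external_server

    def send_setup_request(self) -> bool:
        return self.platform.accept_setup_request()

    def send_text(self, text: str) -> None:
        self.platform.receive_text(text)

    def forward(self, chunks: List[bytes]) -> int:
        # Stands in for the TCP transfer to the external server.
        self.external_server.extend(chunks)
        return len(chunks)


class EstablishingModule:
    """Module 302: establishes the HTTP connection."""

    def __init__(self, platform: FakeAvatarPlatform) -> None:
        self.platform = platform

    def establish(self) -> None:
        self.platform.open_http_connection()


class PullingModule:
    """Module 303: pulls the stream from the platform."""

    def __init__(self, platform: FakeAvatarPlatform) -> None:
        self.platform = platform

    def pull(self) -> List[bytes]:
        return self.platform.rtsp_chunks()


def run_control_flow(platform: FakeAvatarPlatform,
                     external_server: List[bytes], text: str) -> int:
    """Run the modules in the order described; return the number of chunks forwarded."""
    sender = SendingModule(platform, external_server)
    if not sender.send_setup_request():
        raise RuntimeError("setup request rejected")
    EstablishingModule(platform).establish()
    sender.send_text(text)
    chunks = PullingModule(platform).pull()
    return sender.forward(chunks)
```

Calling `run_control_flow(FakeAvatarPlatform(), [], "hello")` exercises all three modules in sequence. Keeping the modules as separate classes mirrors the logical division in the text, although, as the description itself notes, other divisions are possible.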

Referring to fig. 4, fig. 4 is a functional block diagram of another transmission control apparatus according to another preferred embodiment of the present invention. In some embodiments, the transmission control apparatus runs in an avatar platform. The transmission control apparatus may comprise a plurality of functional modules composed of program code segments. The program code of each segment may be stored in a memory and executed by at least one processor to perform some or all of the steps of the artificial intelligence (AI) based audio and video transmission control method described in fig. 2; for details, refer to the description of fig. 2, which is not repeated here.

In this embodiment, the transmission control apparatus may be divided into a plurality of functional modules according to the functions it performs. The functional modules may include: a receiving module 401, an establishing module 402, a generating module 403 and a sending module 404. A module, as referred to herein, is a series of computer program segments that is stored in a memory, can be executed by at least one processor, and performs a fixed function.

A receiving module 401, configured to receive a setup request sent by the control server.

An establishing module 402, configured to establish a hypertext transfer protocol HTTP connection with the control server.

The receiving module 401 is further configured to receive the text dialect sent by the control server.

A generating module 403, configured to generate a real-time audio/video stream according to the text dialect and the real-person avatar synthesis algorithm.

A sending module 404, configured to convert the real-time audio/video stream into an RTSP stream in a real-time streaming protocol RTSP manner, and send the RTSP stream to the control server, so as to play the RTSP stream.

In the transmission control apparatus described in fig. 4, after the real-time audio/video stream is generated, the response duration of the real-time audio/video stream can be actively checked, which helps to reduce the response duration and improve the real-time performance of the interaction.
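This passage does not say how the response duration is measured or what it is compared against. As a hedged sketch only, the check could time the generation step and compare the elapsed time with a budget; the helper names `timed_generate` and `within_budget`, the injected `clock`, and the budget value are all assumptions introduced for illustration.

```python
import time
from typing import Callable, Tuple


def timed_generate(generate: Callable[[str], bytes], text: str,
                   clock: Callable[[], float] = time.monotonic) -> Tuple[bytes, float]:
    """Run a generation callable and return its output with the elapsed seconds."""
    start = clock()
    stream = generate(text)
    return stream, clock() - start


def within_budget(elapsed_s: float, budget_s: float) -> bool:
    """Return True when the measured response duration does not exceed the budget."""
    return elapsed_s <= budget_s


if __name__ == "__main__":
    # Placeholder generator: the real synthesis algorithm is not described here.
    stream, elapsed = timed_generate(lambda t: t.encode("utf-8"), "hello")
    print(len(stream), within_budget(elapsed, budget_s=0.5))
```

A monotonic clock is used because it is not affected by system clock adjustments, and passing the clock as a parameter makes the measurement easy to test with fixed values.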

Fig. 5 is a schematic structural diagram of a control server that implements the artificial intelligence (AI) based audio and video transmission control method according to a preferred embodiment of the present invention. The control server 5 comprises a memory 51, at least one processor 52, a computer program 53 stored in the memory 51 and executable on the at least one processor 52, and at least one communication bus 54.

It will be understood by those skilled in the art that the schematic diagram shown in fig. 5 is merely an example of the control server 5 and does not limit it. The control server 5 may include more or fewer components than those shown, may combine some components, or may use different components; for example, it may further include input and output devices, network access devices, and the like.

The at least one processor 52 may be a Central Processing Unit (CPU), another general-purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. The processor 52 may be a microprocessor or any conventional processor. The processor 52 is the control center of the control server 5 and connects the various parts of the entire control server 5 through various interfaces and lines.

The memory 51 may be used to store the computer program 53 and/or the modules/units, and the processor 52 implements the various functions of the control server 5 by running or executing the computer program and/or the modules/units stored in the memory 51 and by calling data stored in the memory 51. The memory 51 may mainly include a program storage area and a data storage area. The program storage area may store an operating system, an application program required by at least one function (such as a sound playing function or an image playing function), and the like; the data storage area may store data (such as audio data) created through the use of the control server 5, and the like. Further, the memory 51 may include a non-volatile memory, such as a hard disk, an internal memory, a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, a flash memory card (Flash Card), at least one magnetic disk storage device, a flash memory device, or another non-volatile solid-state storage device.

Referring to fig. 1, the memory 51 in the control server 5 stores a plurality of instructions to implement a transmission control method of audio and video based on artificial intelligence AI, and the processor 52 can execute the plurality of instructions to implement:

sending a setup request to the avatar platform, and establishing a hypertext transfer protocol (HTTP) connection with the avatar platform;

sending a text dialect to the avatar platform, so that the avatar platform generates a real-time audio/video stream according to the text dialect and a real-person avatar synthesis algorithm;

pulling the real-time audio/video stream from the avatar platform through the Real Time Streaming Protocol (RTSP) to obtain an RTSP stream;

and transmitting the RTSP stream to an external server over the Transmission Control Protocol (TCP), so that the RTSP stream is transmitted to a user terminal through the external server.

Specifically, for how the processor 52 implements these instructions, refer to the description of the relevant steps in the embodiment corresponding to fig. 1; details are not repeated here.
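The pull step and the TCP transfer can be made more concrete with a small sketch of RTSP/1.0 message formatting. It rests on an assumption the text does not confirm: that "TCP manner" corresponds to the interleaved binary framing of RFC 2326, in which media packets share the RTSP TCP connection and each is prefixed by `$`, a one-byte channel number and a two-byte big-endian length. The URL `rtsp://avatar.example/live/track0` and the helper names are hypothetical.

```python
import struct
from typing import Dict, Optional, Tuple


def rtsp_request(method: str, url: str, cseq: int,
                 headers: Optional[Dict[str, str]] = None) -> bytes:
    """Format an RTSP/1.0 request: request line, CSeq, extra headers, blank line."""
    lines = [f"{method} {url} RTSP/1.0", f"CSeq: {cseq}"]
    for name, value in (headers or {}).items():
        lines.append(f"{name}: {value}")
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")


def pack_interleaved(channel: int, payload: bytes) -> bytes:
    """Wrap one media packet as '$' + channel (1 byte) + length (2 bytes) + payload."""
    if not 0 <= channel <= 255:
        raise ValueError("channel must fit in one byte")
    if len(payload) > 0xFFFF:
        raise ValueError("payload too large for a 16-bit length field")
    return b"$" + struct.pack("!BH", channel, len(payload)) + payload


def unpack_interleaved(frame: bytes) -> Tuple[int, bytes]:
    """Parse one complete interleaved frame into (channel, payload)."""
    if len(frame) < 4 or frame[:1] != b"$":
        raise ValueError("not an interleaved frame")
    channel, length = struct.unpack("!BH", frame[1:4])
    payload = frame[4:4 + length]
    if len(payload) != length:
        raise ValueError("truncated frame")
    return channel, payload


if __name__ == "__main__":
    # Hypothetical track URL; asks the server to interleave RTP/RTCP on channels 0 and 1.
    setup = rtsp_request(
        "SETUP", "rtsp://avatar.example/live/track0", 2,
        {"Transport": "RTP/AVP/TCP;unicast;interleaved=0-1"},
    )
    print(setup.decode("ascii"))
    print(pack_interleaved(0, b"abc"))
```

A full client would send DESCRIBE, SETUP and PLAY in turn and parse the responses, including the session identifier; that handling is omitted here for brevity.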

In the control server 5 described in fig. 5, the audio/video stream interface between the control server and the avatar platform is implemented by combining an HTTP connection with an RTSP stream, which can meet the user's real-time interaction requirements in human-machine interaction scenarios.

Fig. 6 is a schematic structural diagram of an avatar platform that implements the artificial intelligence (AI) based audio and video transmission control method according to a preferred embodiment of the present invention. The avatar platform 6 comprises a memory 61, at least one processor 62, a computer program 63 stored in the memory 61 and executable on the at least one processor 62, and at least one communication bus 64.

Those skilled in the art will appreciate that the schematic diagram shown in fig. 6 is merely an example of the avatar platform 6 and does not limit it. The avatar platform 6 may include more or fewer components than those shown, may combine some components, or may use different components; for example, it may further include input and output devices, network access devices, and the like.

The at least one processor 62 may be a Central Processing Unit (CPU), another general-purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. The processor 62 may be a microprocessor or any conventional processor. The processor 62 is the control center of the avatar platform 6 and connects the various parts of the entire avatar platform 6 through various interfaces and lines.

The memory 61 may be used to store the computer program 63 and/or the modules/units, and the processor 62 implements the various functions of the avatar platform 6 by running or executing the computer program and/or the modules/units stored in the memory 61 and by calling data stored in the memory 61. The memory 61 may mainly include a program storage area and a data storage area. The program storage area may store an operating system, an application program required by at least one function (such as a sound playing function or an image playing function), and the like; the data storage area may store data (such as audio data) created through the use of the avatar platform 6, and the like. Further, the memory 61 may include a non-volatile memory, such as a hard disk, an internal memory, a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, a flash memory card (Flash Card), at least one magnetic disk storage device, a flash memory device, or another non-volatile solid-state storage device.

Referring to fig. 2, the memory 61 of the avatar platform 6 stores a plurality of instructions to implement a transmission control method of an artificial intelligence AI-based audio/video, and the processor 62 can execute the plurality of instructions to implement:

receiving a setup request sent by a control server, and establishing a hypertext transfer protocol (HTTP) connection with the control server;

receiving the text dialect sent by the control server;

generating a real-time audio/video stream according to the text dialect and a real-person avatar synthesis algorithm;

and converting the real-time audio/video stream into an RTSP stream by means of the Real Time Streaming Protocol (RTSP), and sending the RTSP stream to the control server so that the RTSP stream can be played.

Specifically, for how the processor 62 implements these instructions, refer to the description of the relevant steps in the embodiment corresponding to fig. 2; details are not repeated here.
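The platform-side steps above can be sketched as one small class whose methods follow the order of the instructions. Everything concrete in it is an assumption made for illustration: the class name `AvatarPlatformSketch`, the dictionary-shaped setup request, the word-per-frame placeholder that stands in for the real-person avatar synthesis algorithm, and the two-byte sequence prefix that stands in for real RTSP/RTP packetization.

```python
from typing import Iterable, Iterator, List


class AvatarPlatformSketch:
    """Hypothetical platform-side flow: receive, establish, generate, send."""

    def __init__(self) -> None:
        self.connected = False
        self.pending_texts: List[str] = []

    def receive_setup_request(self, request: dict) -> bool:
        # Accept only requests marked as setup requests (assumed request shape).
        return request.get("type") == "setup"

    def establish_connection(self) -> None:
        # Stands in for establishing the HTTP connection with the control server.
        self.connected = True

    def receive_text(self, text: str) -> None:
        if not self.connected:
            raise RuntimeError("HTTP connection not established")
        self.pending_texts.append(text)

    def generate_stream(self) -> Iterator[bytes]:
        # Placeholder for the synthesis algorithm: one "frame" per word.
        for text in self.pending_texts:
            for word in text.split():
                yield word.encode("utf-8")

    def send_as_rtsp(self, frames: Iterable[bytes], sink: List[bytes]) -> int:
        # Placeholder packetization: prefix each frame with a 16-bit sequence number.
        count = 0
        for seq, frame in enumerate(frames):
            sink.append(seq.to_bytes(2, "big") + frame)
            count += 1
        return count


if __name__ == "__main__":
    platform = AvatarPlatformSketch()
    if platform.receive_setup_request({"type": "setup"}):
        platform.establish_connection()
    platform.receive_text("hello world")
    sink: List[bytes] = []
    print(platform.send_as_rtsp(platform.generate_stream(), sink), sink)
```

Using a generator for `generate_stream` lets frames be sent as they are produced rather than after the whole stream exists, which matches the real-time intent of the description.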

In the avatar platform 6 described in fig. 6, after the real-time audio/video stream is generated, the response duration of the real-time audio/video stream can be actively checked, which helps to reduce the response duration and improve the real-time performance of the interaction.

The modules/units integrated in the control server 5 or the avatar platform 6, if implemented in the form of software functional units and sold or used as separate products, may be stored in a computer-readable storage medium. Based on this understanding, all or part of the flow of the methods in the embodiments of the present invention may also be implemented by a computer program, which may be stored in a computer-readable storage medium; when the computer program is executed by a processor, the steps of the method embodiments are implemented. The computer program comprises computer program code, which may be in the form of source code, object code, an executable file, some intermediate form, or the like. The computer-readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, and a Read-Only Memory (ROM).

In the embodiments provided in the present invention, it should be understood that the disclosed system, apparatus and method may be implemented in other ways. The apparatus embodiments described above are merely illustrative; for example, the division into modules is only one logical functional division, and other divisions are possible in an actual implementation.

The modules described as separate parts may or may not be physically separate, and parts displayed as modules may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment.

In addition, functional modules in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, or in a form of hardware plus a software functional module.

It will be evident to those skilled in the art that the invention is not limited to the details of the foregoing illustrative embodiments, and that the present invention may be embodied in other specific forms without departing from the spirit or essential attributes thereof. The present embodiments are therefore to be considered in all respects as illustrative and not restrictive, the scope of the invention being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein. Any reference signs in the claims shall not be construed as limiting the claim concerned. The units or means recited in the system claims may also be implemented by software or hardware.

Finally, it should be noted that the above embodiments are only for illustrating the technical solutions of the present invention and not for limiting, and although the present invention is described in detail with reference to the preferred embodiments, it should be understood by those skilled in the art that modifications or equivalent substitutions may be made on the technical solutions of the present invention without departing from the spirit and scope of the technical solutions of the present invention.
