Digital television display method and device based on video data packet and audio data packet

Document No.: 38550    Publication date: 2021-09-24

Reading note: this technology, "Digital television display method and device based on video data packet and audio data packet" (基于视频数据包与音频数据包的数字电视展示方法与装置), was designed and created by 罗俊强 and 廖佳秋 on 2021-08-30. Its main content is as follows: The application discloses a digital television display method based on a video data packet and an audio data packet, which comprises the steps of obtaining the video data packet and the audio data packet; displaying a video image on a screen; performing a first sound output operation so as to play the main audio in the form of spherical sound waves; acquiring the sub-audio groups respectively selected by all viewers; sending the common audio in the first sub-audio group to the corresponding mobile terminal; requesting the mobile terminal to perform a second sound output operation; carrying out image acquisition processing to obtain an environment image, inputting the environment image into a preset character recognition model for processing to obtain the recognition result output by the model, and acquiring the positions of all viewers according to the recognition result; sending the proprietary audio to the ultrasonic directional sound output device array; and requesting the ultrasonic directional sound output devices to perform a third sound output operation, thereby improving the playing effect of the digital television.

1. A digital television display method based on video data packets and audio data packets is characterized by comprising the following steps:

s1, the digital television terminal acquires a video data packet and an audio data packet; the audio data packet is composed of a main audio and a plurality of sub audio groups, and each sub audio group is composed of a common audio and a special audio;

s2, generating a video image according to the video data packet, and displaying the video image on a screen of the digital television terminal;

s3, performing a first sound output operation by adopting a first sound output device array preset in the digital television terminal, so as to play the main audio in a spherical sound wave mode;

s4, respectively establishing first communication connection with the mobile terminals of all the viewers, and acquiring the sub-audio groups respectively selected by all the viewers through the first communication connection, so as to construct the three-element corresponding relation of the viewers, the mobile terminals and the sub-audio groups; each viewer carries a mobile terminal, and n viewers are in total, wherein n is an integer which is more than or equal to 2 and less than or equal to the number of the plurality of sub audio groups;

s5, according to the three-element correspondence of the viewer-mobile terminal-sub audio group, sending the common audio in the first sub audio group to the corresponding first mobile terminal, sending the common audio in the second sub audio group to the corresponding second mobile terminal, …, and sending the common audio in the nth sub audio group to the corresponding nth mobile terminal;

s6, respectively sending audio playing instructions to the mobile terminals of all the viewers to request the mobile terminals to perform second audio output operation, so that the inherent speakers are respectively adopted to play the received common audio;

s7, carrying out image acquisition processing on the environment where the viewer is located through a preset camera to obtain an environment image, inputting the environment image into a preset character recognition model for processing to obtain a recognition result output by the character recognition model, and acquiring the positions of all the viewers according to the recognition result; the character recognition model is trained on the basis of a convolutional neural network model;

s8, establishing a second communication connection with a preset ultrasonic directional sound output device, and sending the positions of all viewers, the special audio in the first sub-audio group, the special audio in the second sub-audio group, … and the special audio in the nth sub-audio group to the ultrasonic directional sound output device array;

s9, sending an audio playing instruction to the ultrasonic directional sound output device to request the ultrasonic directional sound output device to perform a third sound output operation, so as to output the proprietary audio in the first sub-audio group in a first directional sound output mode, output the proprietary audio in the second sub-audio group in a second directional sound output mode, …, and output the proprietary audio in the nth sub-audio group in an nth directional sound output mode; wherein the first directional sound is output in a manner such that only a first viewer can hear the proprietary audio in the first sub-audio group, the second directional sound is output in a manner such that only a second viewer can hear the proprietary audio in the second sub-audio group, and the nth directional sound is output in a manner such that only an nth viewer can hear the proprietary audio in the nth sub-audio group.

2. The method for displaying digital televisions according to claim 1, wherein, before step S6 of sending an audio playing command to each of the mobile terminals of all the viewers to request the mobile terminals to perform the second audio output operation so as to play the received common audio with the inherent speakers, the method comprises:

s51, respectively sending position adjustment requests to the mobile terminals of all the viewers to require the first viewer to place the first mobile terminal at the first position, the second viewer to place the second mobile terminal at the second position, …, and the nth viewer to place the nth mobile terminal at the nth position; wherein the distance between the first position and the first viewer is not greater than a preset distance threshold, the distance between the second position and the second viewer is not greater than a preset distance threshold, and the distance between the nth position and the nth viewer is not greater than a preset distance threshold; the distance between the first position and other viewers except the first viewer is not less than a preset distance threshold, the distance between the second position and other viewers except the second viewer is not less than a preset distance threshold, and the distance between the nth position and other viewers except the nth viewer is not less than a preset distance threshold.

3. The method for displaying digital televisions according to claim 1, wherein, before step S7, in which the preset camera captures an image of an environment where the viewer is located to obtain an environment image, and the environment image is input into a preset character recognition model for processing to obtain a recognition result output by the character recognition model, and the positions of all the viewers are obtained according to the recognition result, the method comprises:

s61, carrying out image acquisition processing on a plurality of preset scenes to obtain a plurality of sample images; each preset scene at least comprises two persons;

s62, dividing the sample images into a plurality of training images and a plurality of verification images;

s63, calling a preset convolutional neural network model, and inputting the plurality of training images into the convolutional neural network model for training to obtain a temporary character recognition model;

s64, carrying out verification processing on the temporary character recognition model by using the plurality of verification images to obtain a verification result;

s65, judging whether the verification result is qualified;

and S66, if the verification result is that the verification is qualified, marking the temporary character recognition model as a final character recognition model.

4. The digital television display method based on the video data packets and the audio data packets according to claim 1, wherein the first communication connection and the second communication connection are both ZigBee wireless communication connections.

5. A digital television display device based on video data packets and audio data packets, characterized by comprising:

the data packet acquisition unit is used for indicating the digital television terminal to acquire a video data packet and an audio data packet; the audio data packet is composed of a main audio and a plurality of sub audio groups, and each sub audio group is composed of a common audio and a special audio;

the video image display unit is used for indicating to generate a video image according to the video data packet and displaying the video image on a screen of the digital television terminal;

the first sound output unit is used for indicating that a first sound output device array preset in the digital television terminal is adopted to carry out first sound output operation, so that the main audio is played in a spherical sound wave mode;

the three-element corresponding relation building unit is used for indicating that first communication connection is built with the mobile terminals of all the viewers respectively, and acquiring the sub-audio groups selected by all the viewers respectively through the first communication connection, so that three-element corresponding relation of the viewers-the mobile terminals-the sub-audio groups is built; each viewer carries a mobile terminal, and n viewers are in total, wherein n is an integer which is more than or equal to 2 and less than or equal to the number of the plurality of sub audio groups;

the shared audio sending unit is used for indicating that the shared audio in the first sub audio group is sent to the corresponding first mobile terminal, the shared audio in the second sub audio group is sent to the corresponding second mobile terminal and …, and the shared audio in the nth sub audio group is sent to the corresponding nth mobile terminal according to the three-element correspondence of the viewer-mobile terminal-sub audio group;

a second sound output unit, configured to instruct to send audio playing instructions to the mobile terminals of all viewers, respectively, so as to request the mobile terminals to perform a second sound output operation, so as to play the received common audio with the inherent speakers, respectively;

the character recognition unit is used for indicating a preset camera to acquire an image of the environment where the viewer is located so as to obtain an environment image, inputting the environment image into a preset character recognition model to be processed so as to obtain a recognition result output by the character recognition model, and acquiring the positions of all the viewers according to the recognition result; the character recognition model is trained on the basis of a convolutional neural network model;

the special audio sending unit is used for indicating to establish a second communication connection with a preset ultrasonic directional sound output device, and sending the positions of all the viewers, the special audio in the first sub-audio group, the special audio in the second sub-audio group, … and the special audio in the nth sub-audio group to the ultrasonic directional sound output device array;

a third sound output unit configured to instruct transmission of an audio playback instruction to the ultrasonic directional sound output device to request the ultrasonic directional sound output device to perform a third sound output operation so as to output the exclusive audio in the first sub-audio group in a first directional sound output manner, output the exclusive audio in the second sub-audio group in a second directional sound output manner, …, and output the exclusive audio in the nth sub-audio group in an nth directional sound output manner; wherein the first directional sound is output in a manner such that only a first viewer can hear the proprietary audio in the first sub-audio group, the second directional sound is output in a manner such that only a second viewer can hear the proprietary audio in the second sub-audio group, and the nth directional sound is output in a manner such that only an nth viewer can hear the proprietary audio in the nth sub-audio group.

6. A computer device comprising a memory and a processor, the memory storing a computer program, wherein the processor implements the steps of the method of any one of claims 1 to 4 when executing the computer program.

7. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method of any one of claims 1 to 4.

Technical Field

The present application relates to the field of digital televisions, and in particular, to a digital television display method and apparatus based on video data packets and audio data packets.

Background

When digital television programs are watched, the digital television terminal generally produces stereo sound through two channels, but such stereo still differs noticeably from the original sound scene. Moreover, this stereo sound is the same for every viewer, which makes it poorly suited to interactive television programs. An interactive television program is one in which each of several viewers can select a different viewing perspective within the program (the program contains multiple perspectives, which differ at least in their audio). Existing schemes, however, can only output stereo based on the two-channel principle, so both the effect and the adaptability are poor.

Disclosure of Invention

The application provides a digital television display method based on a video data packet and an audio data packet, which comprises the following steps:

s1, the digital television terminal acquires a video data packet and an audio data packet; the audio data packet is composed of a main audio and a plurality of sub audio groups, and each sub audio group is composed of a common audio and a special audio;

s2, generating a video image according to the video data packet, and displaying the video image on a screen of the digital television terminal;

s3, performing a first sound output operation by adopting a first sound output device array preset in the digital television terminal, so as to play the main audio in a spherical sound wave mode;

s4, respectively establishing first communication connection with the mobile terminals of all the viewers, and acquiring the sub-audio groups respectively selected by all the viewers through the first communication connection, so as to construct the three-element corresponding relation of the viewers, the mobile terminals and the sub-audio groups; each viewer carries a mobile terminal, and n viewers are in total, wherein n is an integer which is more than or equal to 2 and less than or equal to the number of the plurality of sub audio groups;

s5, according to the three-element correspondence of the viewer-mobile terminal-sub audio group, sending the common audio in the first sub audio group to the corresponding first mobile terminal, sending the common audio in the second sub audio group to the corresponding second mobile terminal, …, and sending the common audio in the nth sub audio group to the corresponding nth mobile terminal;

s6, respectively sending audio playing instructions to the mobile terminals of all the viewers to request the mobile terminals to perform second audio output operation, so that the inherent speakers are respectively adopted to play the received common audio;

s7, carrying out image acquisition processing on the environment where the viewer is located through a preset camera to obtain an environment image, inputting the environment image into a preset character recognition model for processing to obtain a recognition result output by the character recognition model, and acquiring the positions of all the viewers according to the recognition result; the character recognition model is trained on the basis of a convolutional neural network model;

s8, establishing a second communication connection with a preset ultrasonic directional sound output device, and sending the positions of all viewers, the special audio in the first sub-audio group, the special audio in the second sub-audio group, … and the special audio in the nth sub-audio group to the ultrasonic directional sound output device array;

s9, sending an audio playing instruction to the ultrasonic directional sound output device to request the ultrasonic directional sound output device to perform a third sound output operation, so as to output the proprietary audio in the first sub-audio group in a first directional sound output mode, output the proprietary audio in the second sub-audio group in a second directional sound output mode, …, and output the proprietary audio in the nth sub-audio group in an nth directional sound output mode; wherein the first directional sound is output in a manner such that only a first viewer can hear the proprietary audio in the first sub-audio group, the second directional sound is output in a manner such that only a second viewer can hear the proprietary audio in the second sub-audio group, and the nth directional sound is output in a manner such that only an nth viewer can hear the proprietary audio in the nth sub-audio group.

Further, before the step S6 of sending an audio playing instruction to the mobile terminals of all the viewers respectively to request the mobile terminals to perform a second audio output operation so as to play the received common audio with the inherent speakers respectively, the method includes:

s51, respectively sending position adjustment requests to the mobile terminals of all the viewers to require the first viewer to place the first mobile terminal at the first position, the second viewer to place the second mobile terminal at the second position, …, and the nth viewer to place the nth mobile terminal at the nth position; wherein the distance between the first position and the first viewer is not greater than a preset distance threshold, the distance between the second position and the second viewer is not greater than a preset distance threshold, and the distance between the nth position and the nth viewer is not greater than a preset distance threshold; the distance between the first position and other viewers except the first viewer is not less than a preset distance threshold, the distance between the second position and other viewers except the second viewer is not less than a preset distance threshold, and the distance between the nth position and other viewers except the nth viewer is not less than a preset distance threshold.

Further, before the step S7 of acquiring an image of an environment where a viewer is located by a preset camera to obtain an environment image, inputting the environment image into a preset character recognition model for processing to obtain a recognition result output by the character recognition model, and obtaining positions of all viewers according to the recognition result, the method includes:

s61, carrying out image acquisition processing on a plurality of preset scenes to obtain a plurality of sample images; each preset scene at least comprises two persons;

s62, dividing the sample images into a plurality of training images and a plurality of verification images;

s63, calling a preset convolutional neural network model, and inputting the plurality of training images into the convolutional neural network model for training to obtain a temporary character recognition model;

s64, carrying out verification processing on the temporary character recognition model by using the plurality of verification images to obtain a verification result;

s65, judging whether the verification result is qualified;

and S66, if the verification result is that the verification is qualified, marking the temporary character recognition model as a final character recognition model.

Furthermore, the first communication connection and the second communication connection are both ZigBee wireless communication connections.

The application provides a digital television display device based on video data package and audio data package, includes:

the data packet acquisition unit is used for indicating the digital television terminal to acquire a video data packet and an audio data packet; the audio data packet is composed of a main audio and a plurality of sub audio groups, and each sub audio group is composed of a common audio and a special audio;

the video image display unit is used for indicating to generate a video image according to the video data packet and displaying the video image on a screen of the digital television terminal;

the first sound output unit is used for indicating that a first sound output device array preset in the digital television terminal is adopted to carry out first sound output operation, so that the main audio is played in a spherical sound wave mode;

the three-element corresponding relation building unit is used for indicating that first communication connection is built with the mobile terminals of all the viewers respectively, and acquiring the sub-audio groups selected by all the viewers respectively through the first communication connection, so that three-element corresponding relation of the viewers-the mobile terminals-the sub-audio groups is built; each viewer carries a mobile terminal, and n viewers are in total, wherein n is an integer which is more than or equal to 2 and less than or equal to the number of the plurality of sub audio groups;

the shared audio sending unit is used for indicating that the shared audio in the first sub audio group is sent to the corresponding first mobile terminal, the shared audio in the second sub audio group is sent to the corresponding second mobile terminal and …, and the shared audio in the nth sub audio group is sent to the corresponding nth mobile terminal according to the three-element correspondence of the viewer-mobile terminal-sub audio group;

a second sound output unit, configured to instruct to send audio playing instructions to the mobile terminals of all viewers, respectively, so as to request the mobile terminals to perform a second sound output operation, so as to play the received common audio with the inherent speakers, respectively;

the character recognition unit is used for indicating a preset camera to acquire an image of the environment where the viewer is located so as to obtain an environment image, inputting the environment image into a preset character recognition model to be processed so as to obtain a recognition result output by the character recognition model, and acquiring the positions of all the viewers according to the recognition result; the character recognition model is trained on the basis of a convolutional neural network model;

the special audio sending unit is used for indicating to establish a second communication connection with a preset ultrasonic directional sound output device, and sending the positions of all the viewers, the special audio in the first sub-audio group, the special audio in the second sub-audio group, … and the special audio in the nth sub-audio group to the ultrasonic directional sound output device array;

a third sound output unit configured to instruct transmission of an audio playback instruction to the ultrasonic directional sound output device to request the ultrasonic directional sound output device to perform a third sound output operation so as to output the exclusive audio in the first sub-audio group in a first directional sound output manner, output the exclusive audio in the second sub-audio group in a second directional sound output manner, …, and output the exclusive audio in the nth sub-audio group in an nth directional sound output manner; wherein the first directional sound is output in a manner such that only a first viewer can hear the proprietary audio in the first sub-audio group, the second directional sound is output in a manner such that only a second viewer can hear the proprietary audio in the second sub-audio group, and the nth directional sound is output in a manner such that only an nth viewer can hear the proprietary audio in the nth sub-audio group.

The present application provides a computer device comprising a memory storing a computer program and a processor implementing the steps of any of the above methods when the processor executes the computer program.

The present application provides a computer-readable storage medium having stored thereon a computer program which, when being executed by a processor, carries out the steps of the method of any of the above.

The digital television display method and device based on the video data packet and the audio data packet, the computer equipment and the storage medium obtain the video data packet and the audio data packet; displaying a video image on a screen of the digital television terminal; performing a first sound output operation to play the main audio in a spherical sound wave manner; acquiring sub-audio groups respectively selected by all viewers, thereby constructing a three-element corresponding relation of the viewers, the mobile terminal and the sub-audio groups; sending the common audio in the first sub-audio group to the corresponding first mobile terminal, sending the common audio in the second sub-audio group to the corresponding second mobile terminal, …, and sending the common audio in the nth sub-audio group to the corresponding nth mobile terminal; requesting the mobile terminal to perform a second sound output operation; carrying out image acquisition processing to obtain an environment image, inputting the environment image into a preset character recognition model for processing to obtain a recognition result output by the character recognition model, and acquiring the positions of all viewers according to the recognition result; sending proprietary audio to the array of ultrasonic directional sound output devices; the ultrasonic directional sound output device is required to perform a third sound output operation, thereby improving the playing effect of the digital television.

Drawings

Fig. 1 is a schematic flowchart illustrating a digital television displaying method based on video data packets and audio data packets according to an embodiment of the present application;

FIG. 2 is a block diagram of a digital TV display device based on video data packets and audio data packets according to an embodiment of the present application;

fig. 3 is a block diagram illustrating a structure of a computer device according to an embodiment of the present application.

The implementation, functional features and advantages of the objectives of the present application will be further explained with reference to the accompanying drawings.

Detailed Description

In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.

Referring to fig. 1, an embodiment of the present application provides a digital television display method based on a video data packet and an audio data packet, including the following steps:

s1, the digital television terminal acquires a video data packet and an audio data packet; the audio data packet is composed of a main audio and a plurality of sub audio groups, and each sub audio group is composed of a common audio and a special audio;

s2, generating a video image according to the video data packet, and displaying the video image on a screen of the digital television terminal;

s3, performing a first sound output operation by adopting a first sound output device array preset in the digital television terminal, so as to play the main audio in a spherical sound wave mode;

s4, respectively establishing first communication connection with the mobile terminals of all the viewers, and acquiring the sub-audio groups respectively selected by all the viewers through the first communication connection, so as to construct the three-element corresponding relation of the viewers, the mobile terminals and the sub-audio groups; each viewer carries a mobile terminal, and n viewers are in total, wherein n is an integer which is more than or equal to 2 and less than or equal to the number of the plurality of sub audio groups;

s5, according to the three-element correspondence of the viewer-mobile terminal-sub audio group, sending the common audio in the first sub audio group to the corresponding first mobile terminal, sending the common audio in the second sub audio group to the corresponding second mobile terminal, …, and sending the common audio in the nth sub audio group to the corresponding nth mobile terminal;

s6, respectively sending audio playing instructions to the mobile terminals of all the viewers to request the mobile terminals to perform second audio output operation, so that the inherent speakers are respectively adopted to play the received common audio;

s7, carrying out image acquisition processing on the environment where the viewer is located through a preset camera to obtain an environment image, inputting the environment image into a preset character recognition model for processing to obtain a recognition result output by the character recognition model, and acquiring the positions of all the viewers according to the recognition result; the character recognition model is trained on the basis of a convolutional neural network model;

s8, establishing a second communication connection with a preset ultrasonic directional sound output device, and sending the positions of all viewers, the special audio in the first sub-audio group, the special audio in the second sub-audio group, … and the special audio in the nth sub-audio group to the ultrasonic directional sound output device array;

s9, sending an audio playing instruction to the ultrasonic directional sound output device to request the ultrasonic directional sound output device to perform a third sound output operation, so as to output the proprietary audio in the first sub-audio group in a first directional sound output mode, output the proprietary audio in the second sub-audio group in a second directional sound output mode, …, and output the proprietary audio in the nth sub-audio group in an nth directional sound output mode; wherein the first directional sound is output in a manner such that only a first viewer can hear the proprietary audio in the first sub-audio group, the second directional sound is output in a manner such that only a second viewer can hear the proprietary audio in the second sub-audio group, and the nth directional sound is output in a manner such that only an nth viewer can hear the proprietary audio in the nth sub-audio group.

The advantages of the present application are that it can provide more effective stereo sound (realized jointly by the digital television terminal's inherent first sound output device array, the speakers of a plurality of mobile terminals, and the ultrasonic directional sound output device) and that it enables the playing of multi-perspective interactive television programs. Both advantages are achieved because the present application uses a first sound output operation, a second sound output operation, and a third sound output operation to provide different sound environments for multiple viewers while the video image is being played.

The method and device are applicable to the display of any feasible digital television program, and are particularly suitable for ensemble-cast television programs, because such programs offer more viewing perspectives.

The implementation process of the present application is described below, taking three viewers as an example:

The three viewers place their three mobile terminals on the coffee table (when placing a terminal, a viewer only needs to put it close to himself and away from the others; it may even be placed against the skin, as long as the sound output is not blocked), with the speaker of each mobile terminal facing its viewer. Although the speaker of a mobile terminal has no directional sound-emitting capability, it is generally located at the bottom of the terminal, and the sound field in the direction the bottom faces is stronger than in other directions; moreover, each mobile terminal is closest to its own viewer. When the digital television program is presented, the video image is shown on the screen. Note that the video images seen by the three viewers are identical, but the audio content is not exactly the same. At the same time, the first sound output device array of the digital television terminal outputs the main audio, which serves two purposes: first, to provide sound such as background music that requires no sense of distance, and second, to form a stereo field together with the mobile terminals.

The three mobile terminals play three different common audios, and these three common audios together with the main audio construct a stereo field. Since the three mobile terminals and the first sound output device array of the digital television terminal are located at certain distances from one another, there are definite distance differences between them, which satisfies the requirement of two-channel or multi-channel stereo output, so a preliminary stereo effect can be provided (the common audio in the present application may itself be two-channel or multi-channel, and all other audio may likewise be two-channel). The process of constructing the stereo effect is similar to conventional stereo playback based on the two-channel principle and is not described again here; the difference is that the present application does not require two speakers to be specially arranged.

Finally, the ultrasonic directional sound output device directionally outputs three different proprietary audios to the three viewers. A proprietary audio is audio that only one viewer can hear. This arrangement is particularly suitable for interactive television programs. For example, suppose a scene contains three characters A, B and C who are far apart from one another; when a sound rings out somewhere in the scene, what A, B and C each hear is different, so three different proprietary audios are directed at the three viewers who have respectively selected those three characters. Further, if in this scene A and B are close to the speaker while C is farther away, then A and B hear the whole line clearly, whereas C hears only a muffled version. Such scenes are common in spy dramas, which is also a type of program to which the present application is well suited. Of course, the application can also be used for ordinary video, and in the extreme case the contents of the proprietary audios may be identical, in which case the emphasis falls on constructing the stereo sound effect.

The ultrasonic directional sound output device relies on the strong directivity of ultrasonic waves: compared with ordinary sound waves, ultrasonic waves have shorter wavelengths and better directivity, so while ordinary sound waves propagate spherically, ultrasonic waves can propagate along an essentially straight path. However, ultrasonic waves themselves cannot be heard by the human ear, so they must be demodulated into audible sound waves near the ear. Concretely, directional ultrasonic sound output uses two or more ultrasonic waves of different frequencies as carriers and demodulates them in the vicinity of the viewer (for example at the viewer's head or body): the two or more ultrasonic waves are superimposed at that position (this is achieved by controlling their phases) so that their mixing produces sum- and difference-frequency components, and the lower-frequency difference component is an ordinary sound wave that falls within the range audible to the human ear. As a result, only the position where the targeted viewer is located can hear the directional sound, and viewers at other positions cannot. A numerical sketch of this difference-frequency effect follows.
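
The following is a minimal numerical sketch of the difference-frequency idea described above, written in Python; the carrier frequencies (40 kHz and 41 kHz), the sample rate and the crude squaring nonlinearity are illustrative assumptions, not values taken from the application.

```python
import numpy as np

fs = 192_000                      # sample rate high enough to represent the ultrasonic carriers
n = 1_920                         # 10 ms of signal at fs
t = np.arange(n) / fs
f1, f2 = 40_000.0, 41_000.0       # two assumed ultrasonic carriers, 1 kHz apart

carrier_sum = np.sin(2 * np.pi * f1 * t) + np.sin(2 * np.pi * f2 * t)

# Nonlinear mixing at the target position (modelled here simply by squaring) yields a
# component at the difference frequency f2 - f1 = 1 kHz, which lies in the audible range.
mixed = carrier_sum ** 2
spectrum = np.abs(np.fft.rfft(mixed))
freqs = np.fft.rfftfreq(n, 1 / fs)
audible = (freqs > 100) & (freqs < 5_000)
print("strongest audible component:", freqs[audible][np.argmax(spectrum[audible])], "Hz")
```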

Accordingly, the ultrasonic directional sound output device, which may take the form of an array, is used to output sound directionally. When outputting directional sound, it outputs several directional sounds toward several viewers, as determined by the time axis of the proprietary audio: if two proprietary audios both carry a sound signal at a given time point, directional sound is output toward the two corresponding viewers at that time point; if only one proprietary audio carries a sound signal at a time point, directional sound is output toward the corresponding viewer only, and no directional sound is output toward the other viewers. A sketch of this rule is given below.
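
As a simple illustration of that rule (a sketch only; the viewer identifiers and the silence threshold below are assumptions, not part of the application):

```python
# At each time point the array targets only those viewers whose proprietary audio
# actually carries a signal at that moment.
def viewers_to_target(special_audio_levels, threshold=1e-4):
    """special_audio_levels: {viewer_id: sample value (or frame energy) at the current time point}."""
    return [viewer for viewer, level in special_audio_levels.items() if abs(level) > threshold]

# Example: at this instant only viewer_1 and viewer_3 have non-silent proprietary audio,
# so directional beams are steered at their positions only.
print(viewers_to_target({"viewer_1": 0.3, "viewer_2": 0.0, "viewer_3": -0.12}))
```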

As described in the above steps S1-S3, the digital tv terminal obtains the video data packet and the audio data packet; the audio data packet is composed of a main audio and a plurality of sub audio groups, and each sub audio group is composed of a common audio and a special audio; generating a video image according to the video data packet, and displaying the video image on a screen of the digital television terminal; and performing a first sound output operation by adopting a first sound output device array preset in the digital television terminal so as to play the main audio in a spherical sound wave mode.

The video data packet and the audio data packet are received as digital signals; the video data packet can be decoded into a video image using a preset decoding scheme, the video image being a sequence of frames arranged in order. The audio data packet is composed of a main audio and a plurality of sub-audio groups, each sub-audio group being composed of a common audio and a proprietary audio, and the audio is intended to be output simultaneously with the video image. Therefore the video image, the main audio, the common audio and the proprietary audio all follow the same time axis, which can be regarded as using the same timer, so that synchronized playing is ensured. A small sketch of this shared clock follows.
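
A minimal sketch of the shared-time-axis idea: video frames and all audio streams are indexed from one clock so they stay synchronized. The frame rate and sample rate below are illustrative assumptions, not values from the application.

```python
def playback_positions(elapsed_seconds, fps=25, audio_rate=48_000):
    frame_index = int(elapsed_seconds * fps)          # which video image to show now
    sample_index = int(elapsed_seconds * audio_rate)  # where every audio stream should be
    return frame_index, sample_index

print(playback_positions(2.5))   # (62, 120000)
```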

The digital television terminal is presented, for example, in the form of a digital television that includes a screen for presenting the video image and an inherent speaker array; this inherent speaker array, i.e. the first sound output device array, performs the first sound output operation so as to play the main audio in the form of spherical sound waves. Here, spherical sound waves means that the first sound output device array emits sound into the environment without any directivity, so the main audio heard by every viewer is the same. The main audio mainly carries sounds that have no spatial-distance requirement, such as background music; it also combines with the speakers of the plurality of mobile terminals to form a preliminary stereo field.

Further, the implementation process of the present application also includes obtaining the positions of the plurality of mobile terminals and the position of the first sound output device array, and adjusting the sound playback parameters of each sub-audio group according to those positions (alternatively, the audio may simply be played back with its originally recorded channels, without adjusting the playback parameters). The common audio of each sub-audio group is then sent to the corresponding mobile terminal, and the proprietary audio of each sub-audio group is sent to the ultrasonic directional sound output device, so that the first sound output device array, the speaker of each mobile terminal, and the ultrasonic directional sound output device operate in coordination along the same time axis. The first, second and third sound output operations are thereby carried out simultaneously, forming stereo sound with a better effect while also enabling person-specific audio output. A sketch of a possible parameter adjustment is given below.
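
The following sketch shows one way such position-based adjustment could look; the inverse-square gain, the delay formula, and the function and terminal names are assumptions for illustration, since the application leaves the concrete adjustment open.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s

def playback_params(terminal_positions, reference):
    """Derive a per-terminal delay and gain from each terminal's distance to a reference point."""
    params = {}
    for terminal, (x, y) in terminal_positions.items():
        d = math.dist((x, y), reference)
        params[terminal] = {
            "delay_s": d / SPEED_OF_SOUND,      # roughly align arrival times with the main audio
            "gain": 1.0 / max(d, 0.5) ** 2,     # crude inverse-square loudness compensation
        }
    return params

print(playback_params({"terminal_1": (1.0, 2.0), "terminal_2": (3.0, 0.5)}, reference=(0.0, 0.0)))
```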

As described in the above steps S4-S6, the first communication connections are respectively established with the mobile terminals of all viewers, and the sub-audio groups respectively selected by all viewers are obtained through the first communication connections, so as to construct the three-element correspondence relationship between the viewer-mobile terminal-sub-audio group; each viewer carries a mobile terminal, and n viewers are in total, wherein n is an integer which is more than or equal to 2 and less than or equal to the number of the plurality of sub audio groups; according to the three-element correspondence of the viewer-mobile terminal-sub audio group, sending the common audio in the first sub audio group to the corresponding first mobile terminal, sending the common audio in the second sub audio group to the corresponding second mobile terminal …, and sending the common audio in the nth sub audio group to the corresponding nth mobile terminal; and respectively sending audio playing instructions to the mobile terminals of all the viewers to request the mobile terminals to perform second sound output operation, so that the received common audio is respectively played by adopting the inherent loudspeakers.

Obtaining the sub-audio groups respectively selected by all viewers is a conventional expression in the art: each viewer selects one sub-audio group, and the digital television terminal thereby determines which sub-audio group has been selected, which is what obtaining means here.

The first communication connection may be implemented using any feasible connection, for example a short-range communication technology, preferably a ZigBee wireless communication connection. In the present application each viewer corresponds to a mobile terminal, and the viewer selects a sub-audio group through that mobile terminal, so the three-element correspondence of viewer-mobile terminal-sub-audio group can be established. The sub-audio groups correspond to viewing perspectives: in a television program, each perspective corresponds to one sub-audio group, so selecting a sub-audio group amounts to selecting the perspective to adopt. A sketch of this correspondence follows.
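
A minimal data-structure sketch of the viewer-mobile terminal-sub-audio group correspondence; the identifiers (viewer_1, terminal_1, etc.) and placeholder byte strings are hypothetical and used purely for illustration.

```python
from dataclasses import dataclass

@dataclass
class SubAudioGroup:
    common_audio: bytes      # played by the selecting viewer's mobile terminal speaker
    special_audio: bytes     # delivered only to the selecting viewer via directional output

# Three-element correspondence: viewer -> (mobile terminal, selected sub-audio group).
correspondence = {
    "viewer_1": ("terminal_1", SubAudioGroup(common_audio=b"...", special_audio=b"...")),
    "viewer_2": ("terminal_2", SubAudioGroup(common_audio=b"...", special_audio=b"...")),
}

def dispatch_common_audio(correspondence, send_to_terminal):
    """Step S5: send each selected group's common audio to the matching mobile terminal."""
    for viewer, (terminal, group) in correspondence.items():
        send_to_terminal(terminal, group.common_audio)
```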

Regarding the description of common audio and proprietary audio in this application: common audio, although emitted from the speakers of different mobile terminals, is not selective and can be heard by all viewers; proprietary audio is selective and can be heard by only one viewer. Playing the received common audio with the respective inherent speakers means playing it through the speaker built into each mobile terminal.

The ellipsis … is used in this application with the same meaning as in mathematics. For example, in sending the common audio in the first sub-audio group to the corresponding first mobile terminal, sending the common audio in the second sub-audio group to the corresponding second mobile terminal, …, and sending the common audio in the nth sub-audio group to the corresponding nth mobile terminal, the ellipsis stands for the omitted sending steps for the sub-audio groups between the second and the nth.

Further, before the step S6 of sending an audio playing instruction to the mobile terminals of all the viewers respectively to request the mobile terminals to perform a second audio output operation so as to play the received common audio with the inherent speakers respectively, the method includes:

s51, respectively sending position adjustment requests to the mobile terminals of all the viewers to require the first viewer to place the first mobile terminal at the first position, the second viewer to place the second mobile terminal at the second position, …, and the nth viewer to place the nth mobile terminal at the nth position; wherein the distance between the first position and the first viewer is not greater than a preset distance threshold, the distance between the second position and the second viewer is not greater than a preset distance threshold, and the distance between the nth position and the nth viewer is not greater than a preset distance threshold; the distance between the first position and other viewers except the first viewer is not less than a preset distance threshold, the distance between the second position and other viewers except the second viewer is not less than a preset distance threshold, and the distance between the nth position and other viewers except the nth viewer is not less than a preset distance threshold.

The position adjustment request reminds each viewer to move his mobile terminal to a suitable position; specifically, the distance between the first position and the first viewer must not exceed the preset distance threshold, while the distance between the first position and every viewer other than the first must be at least that threshold (and likewise for the other positions). This is because, if all the mobile terminals were placed together, there would be no point in using a plurality of mobile terminals for the second sound output. A sketch of this placement check follows.
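
A sketch of the placement rule in step S51, assuming 2D positions and an illustrative threshold value that is not specified by the application:

```python
import math

def placement_ok(terminal_pos, own_viewer_pos, other_viewer_positions, threshold=0.8):
    """Terminal must be within threshold of its own viewer and at least threshold from every other viewer."""
    if math.dist(terminal_pos, own_viewer_pos) > threshold:
        return False
    return all(math.dist(terminal_pos, p) >= threshold for p in other_viewer_positions)

print(placement_ok((1.0, 0.2), (1.1, 0.0), [(3.0, 0.0), (5.2, 0.4)]))  # True
```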

As described in the above steps S7-S9, the environment where the viewer is located is captured by the preset camera to obtain an environment image, the environment image is input into the preset character recognition model to be processed to obtain the recognition result output by the character recognition model, and the positions of all viewers are obtained according to the recognition result; the character recognition model is trained on the basis of a convolutional neural network model; establishing a second communication connection with a preset ultrasonic directional sound output device, and sending the positions of all viewers, the special audio in the first sub-audio group, the special audio in the second sub-audio group, … and the special audio in the nth sub-audio group to the ultrasonic directional sound output device array; sending an audio playing instruction to the ultrasonic directional sound output device to request the ultrasonic directional sound output device to perform a third sound output operation, so as to output the exclusive audio in the first sub-audio group in a first directional sound output manner, output the exclusive audio in the second sub-audio group in a second directional sound output manner, …, and output the exclusive audio in the nth sub-audio group in an nth directional sound output manner; wherein the first directional sound is output in a manner such that only a first viewer can hear the proprietary audio in the first sub-audio group, the second directional sound is output in a manner such that only a second viewer can hear the proprietary audio in the second sub-audio group, and the nth directional sound is output in a manner such that only an nth viewer can hear the proprietary audio in the nth sub-audio group.

The environment image of the present application contains images of all viewers; for example, if all viewers are sitting or lying on a sofa, the environment image is an image captured of the sofa. The environment image also contains the face images of all viewers, allowing more accurate person identification. The convolutional neural network model is a machine learning model comprising an input layer, convolutional layers, pooling layers, fully connected layers, an output layer and the like; it is well suited to image recognition and can therefore handle the character recognition task here, i.e. recognizing which persons appear in the image. A minimal architectural sketch follows.
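
A minimal convolutional classifier sketch in PyTorch, assuming a small closed set of household members (here 4 classes) and 96x96 RGB crops; the application does not fix the architecture, so this only illustrates the input/convolution/pooling/fully-connected/output structure named above.

```python
import torch
import torch.nn as nn

class PersonRecognizer(nn.Module):
    def __init__(self, num_people=4):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(32 * 24 * 24, 64), nn.ReLU(),
            nn.Linear(64, num_people),
        )

    def forward(self, x):
        return self.classifier(self.features(x))

logits = PersonRecognizer()(torch.randn(1, 3, 96, 96))
print(logits.shape)  # torch.Size([1, 4])
```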

Since the present application requires directional sound output that varies from person to person, the positions of the viewers must be determined first. The environment image is therefore used to identify the persons in it (the recognition result), after which the positions of all viewers are obtained from the environment image, making directional sound output possible.

In addition, the character recognition model adopted here differs from a general-purpose model in that its accuracy requirement is low: it only needs to distinguish among a limited set of people, and people who gather to watch television together are generally not strangers to one another. In practice the model therefore has only a few possible recognition outcomes and needs little training data (which is also why the accuracy requirement is low), so training is fast and the scheme is easy to implement.

The ultrasonic directional sound output device can be arranged at any feasible position, preferably on the ceiling. Mounting it on the ceiling means that obstruction by obstacles or by other people barely needs to be considered when outputting directional sound, which makes implementation easier. Of course, it can also be arranged on the side of the sofa.

The principle of the ultrasonic directional sound output device is as described above: ultrasonic waves of at least two different frequencies are output directionally and superimposed at the target position, generating a relatively low-frequency sound at their difference frequency, which the viewer can hear.

Further, before the step S7 of acquiring an image of an environment where a viewer is located by a preset camera to obtain an environment image, inputting the environment image into a preset character recognition model for processing to obtain a recognition result output by the character recognition model, and obtaining positions of all viewers according to the recognition result, the method includes:

s61, carrying out image acquisition processing on a plurality of preset scenes to obtain a plurality of sample images; each preset scene at least comprises two persons;

s62, dividing the sample images into a plurality of training images and a plurality of verification images;

s63, calling a preset convolutional neural network model, and inputting the plurality of training images into the convolutional neural network model for training to obtain a temporary character recognition model;

s64, carrying out verification processing on the temporary character recognition model by using the plurality of verification images to obtain a verification result;

s65, judging whether the verification result is qualified;

and S66, if the verification result is that the verification is qualified, marking the temporary character recognition model as a final character recognition model.

The character recognition model obtained by such training is therefore competent for recognition tasks in a multi-person scene. Furthermore, the plurality of sample images may consist only of images of the members of a single household, which makes the training more targeted and faster; a model trained in this way cannot perform wide-range person recognition (its accuracy on other people is lower), but its accuracy on the household members is high. A schematic sketch of this training flow is given below.
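
A schematic sketch of steps S61-S66: split the sample images, train a temporary model, validate it, and promote it to the final model only if the result is qualified. The train_model and evaluate callables, the 80/20 split and the 0.9 accuracy bar are placeholders, not values from the text.

```python
import random

def build_person_recognition_model(sample_images, labels, train_model, evaluate, qualified_acc=0.9):
    indices = list(range(len(sample_images)))
    random.shuffle(indices)
    split = int(0.8 * len(indices))
    train_idx, val_idx = indices[:split], indices[split:]          # S62: training / verification split

    temporary_model = train_model([sample_images[i] for i in train_idx],
                                  [labels[i] for i in train_idx])  # S63: train temporary model
    accuracy = evaluate(temporary_model,
                        [sample_images[i] for i in val_idx],
                        [labels[i] for i in val_idx])              # S64: verification processing
    if accuracy >= qualified_acc:                                  # S65/S66: qualified -> final model
        return temporary_model
    return None  # not qualified; retrain or collect more samples
```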

According to the digital television display method based on the video data packet and the audio data packet, the video data packet and the audio data packet are obtained; displaying a video image on a screen of the digital television terminal; performing a first sound output operation to play the main audio in a spherical sound wave manner; acquiring sub-audio groups respectively selected by all viewers, thereby constructing a three-element corresponding relation of the viewers, the mobile terminal and the sub-audio groups; sending the common audio in the first sub-audio group to the corresponding first mobile terminal, sending the common audio in the second sub-audio group to the corresponding second mobile terminal, …, and sending the common audio in the nth sub-audio group to the corresponding nth mobile terminal; requesting the mobile terminal to perform a second sound output operation; carrying out image acquisition processing to obtain an environment image, inputting the environment image into a preset character recognition model for processing to obtain a recognition result output by the character recognition model, and acquiring the positions of all viewers according to the recognition result; sending proprietary audio to the array of ultrasonic directional sound output devices; the ultrasonic directional sound output device is required to perform a third sound output operation, thereby improving the playing effect of the digital television.

Referring to fig. 2, an embodiment of the present application provides a digital television display apparatus based on video data packets and audio data packets, including:

a data packet obtaining unit 10, configured to instruct a digital television terminal to obtain a video data packet and an audio data packet; the audio data packet is composed of a main audio and a plurality of sub audio groups, and each sub audio group is composed of a common audio and a special audio;

a video image display unit 20, configured to instruct to generate a video image according to the video data packet, and display the video image on a screen of the digital television terminal;

a first sound output unit 30, configured to instruct a first sound output device array preset in the digital television terminal to perform a first sound output operation, so as to play the main audio in a spherical sound wave manner;

a three-element correspondence relationship establishing unit 40, configured to instruct to establish first communication connections with the mobile terminals of all viewers respectively, and obtain sub-audio groups selected by all viewers respectively through the first communication connections, so as to establish a three-element correspondence relationship between the viewers and the mobile terminals and the sub-audio groups; each viewer carries a mobile terminal, and n viewers are in total, wherein n is an integer which is more than or equal to 2 and less than or equal to the number of the plurality of sub audio groups;

a common audio sending unit 50, configured to instruct, according to the three-element correspondence of the viewer-mobile terminal-sub audio group, to send the common audio in the first sub audio group to the corresponding first mobile terminal, send the common audio in the second sub audio group to the corresponding second mobile terminal, …, and send the common audio in the nth sub audio group to the corresponding nth mobile terminal;

a second sound output unit 60 configured to instruct the mobile terminals of all viewers to send audio playing instructions to the mobile terminals, respectively, so as to request the mobile terminals to perform a second sound output operation, thereby playing the received common audio with the inherent speakers, respectively;

the character recognition unit 70 is configured to instruct a preset camera to perform image acquisition processing on an environment where a viewer is located so as to obtain an environment image, input the environment image into a preset character recognition model for processing so as to obtain a recognition result output by the character recognition model, and obtain positions of all viewers according to the recognition result; the character recognition model is trained on the basis of a convolutional neural network model;

a proprietary audio sending unit 80, configured to establish a second communication connection with a preset ultrasonic directional sound output device array, and to send the positions of all viewers, the proprietary audio in the first sub-audio group, the proprietary audio in the second sub-audio group, …, and the proprietary audio in the nth sub-audio group to the ultrasonic directional sound output device array;

a third sound output unit 90, configured to send an audio playing instruction to the ultrasonic directional sound output device array, so as to request the ultrasonic directional sound output devices to perform a third sound output operation and thereby output the proprietary audio in the first sub-audio group in a first directional sound output manner, output the proprietary audio in the second sub-audio group in a second directional sound output manner, …, and output the proprietary audio in the nth sub-audio group in an nth directional sound output manner; wherein the first directional sound output manner is such that only the first viewer can hear the proprietary audio in the first sub-audio group, the second directional sound output manner is such that only the second viewer can hear the proprietary audio in the second sub-audio group, and the nth directional sound output manner is such that only the nth viewer can hear the proprietary audio in the nth sub-audio group.
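
To illustrate units 80 and 90 together, the sketch below pairs each viewer's estimated position with the proprietary audio of the sub-audio group that viewer selected and hands both to the ultrasonic directional array. send_to_array and the message fields are assumptions about an interface the embodiment leaves unspecified, as is the way detections are matched to viewer identities:

    def dispatch_proprietary_audio(table, packet, viewer_positions, send_to_array):
        # table: viewer_id -> ViewerBinding; packet: AudioDataPacket
        # viewer_positions: viewer_id -> (x, y), e.g. obtained by matching the
        # detections from locate_viewers() to the known viewers
        # send_to_array: callable(message) over the second communication connection
        for viewer_id, binding in table.items():
            group = packet.sub_groups[binding.group_index]
            send_to_array({
                "type": "DIRECTIONAL_PLAY",             # unit 90: third sound output request
                "target": viewer_positions[viewer_id],  # aim the beam so only this viewer hears it
                "data": group.proprietary,              # unit 80: this viewer's proprietary audio
            })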

The operations performed by the above units correspond one to one to the steps of the digital television display method based on the video data packet and the audio data packet in the foregoing embodiment, and are not described herein again.

By means of the digital television display apparatus based on the video data packet and the audio data packet, the video data packet and the audio data packet are obtained; a video image is displayed on the screen of the digital television terminal; a first sound output operation is performed so as to play the main audio in the form of spherical sound waves; the sub-audio groups respectively selected by all viewers are acquired, so as to construct the three-element correspondence of viewer, mobile terminal and sub-audio group; the common audio in the first sub-audio group is sent to the corresponding first mobile terminal, the common audio in the second sub-audio group is sent to the corresponding second mobile terminal, …, and the common audio in the nth sub-audio group is sent to the corresponding nth mobile terminal; the mobile terminals are requested to perform a second sound output operation; image acquisition is performed to obtain an environment image, the environment image is input into a preset person recognition model for processing to obtain the recognition result output by the person recognition model, and the positions of all viewers are acquired according to the recognition result; the proprietary audio is sent to the ultrasonic directional sound output device array; and the ultrasonic directional sound output devices are requested to perform a third sound output operation, thereby improving the playing effect of the digital television.

Referring to fig. 3, an embodiment of the present application further provides a computer device, which may be a server and whose internal structure may be as shown in the figure. The computer device includes a processor, a memory, a network interface, and a database connected by a system bus. The processor of the computer device is used to provide computation and control capabilities. The memory of the computer device comprises a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program, and a database. The internal memory provides an environment for the operation of the operating system and the computer program in the non-volatile storage medium. The database of the computer device is used for storing the data used by the digital television display method based on the video data packet and the audio data packet. The network interface of the computer device is used for communicating with an external terminal through a network connection. The computer program, when executed by the processor, implements the digital television display method based on the video data packet and the audio data packet.

The processor executes the digital television display method based on the video data packet and the audio data packet, and the steps of the method so executed correspond one to one to the steps of the method in the foregoing embodiment; they are not described herein again.

It will be understood by those skilled in the art that the structures shown in the drawings are only block diagrams of some of the structures associated with the embodiments of the present application and do not constitute a limitation on the computer apparatus to which the embodiments of the present application may be applied.

By means of the computer device, the video data packet and the audio data packet are obtained; a video image is displayed on the screen of the digital television terminal; a first sound output operation is performed so as to play the main audio in the form of spherical sound waves; the sub-audio groups respectively selected by all viewers are acquired, so as to construct the three-element correspondence of viewer, mobile terminal and sub-audio group; the common audio in the first sub-audio group is sent to the corresponding first mobile terminal, the common audio in the second sub-audio group is sent to the corresponding second mobile terminal, …, and the common audio in the nth sub-audio group is sent to the corresponding nth mobile terminal; the mobile terminals are requested to perform a second sound output operation; image acquisition is performed to obtain an environment image, the environment image is input into a preset person recognition model for processing to obtain the recognition result output by the person recognition model, and the positions of all viewers are acquired according to the recognition result; the proprietary audio is sent to the ultrasonic directional sound output device array; and the ultrasonic directional sound output devices are requested to perform a third sound output operation, thereby improving the playing effect of the digital television.

An embodiment of the present application further provides a computer-readable storage medium on which a computer program is stored; when the computer program is executed by a processor, the digital television display method based on the video data packet and the audio data packet is implemented, where the steps of the method correspond one to one to the steps of the method in the foregoing embodiment and are not described herein again.

It will be understood by those skilled in the art that all or part of the processes of the methods in the embodiments described above can be implemented by instructing the relevant hardware through a computer program. The computer program can be stored in a non-volatile computer-readable storage medium and, when executed, can include the processes of the embodiments of the methods described above. Any reference to memory, storage, database, or other medium provided herein and used in the embodiments may include non-volatile and/or volatile memory. Non-volatile memory can include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory can include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchronous link DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).

It should be noted that, in this document, the terms "comprises", "comprising", or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, apparatus, article, or method that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, apparatus, article, or method. Without further limitation, an element preceded by the phrase "comprising a ..." does not exclude the presence of other identical elements in the process, apparatus, article, or method that comprises the element.

The above description is only a preferred embodiment of the present application, and not intended to limit the scope of the present application, and all modifications of equivalent structures and equivalent processes, which are made by the contents of the specification and the drawings of the present application, or which are directly or indirectly applied to other related technical fields, are also included in the scope of the present application.
