Multimedia file playing system, related method, device and equipment

文档序号：1820094 发布日期：2021-11-09 浏览：25次中文

阅读说明：本技术 多媒体文件播放系统、相关方法、装置及设备 (Multimedia file playing system, related method, device and equipment ) 是由周明智龙舟于 2020-05-06 设计创作，主要内容包括：本申请公开了多媒体文件播放相关系统、方法、装置及设备。所述系统,针对客户端播放器当前播放的多媒体文件,通过客户端提取与播放进度对应的音频流；向服务端发送所述音频流；以及,在播放器中显示服务端回送的所述音频流的语音翻译文本；服务端通过语音翻译模型,确定所述语音翻译文本,向客户端回送所述语音翻译文本。采用这种处理方式,使得根据当前用户产生的音频流调用语音翻译服务,实现语音即时翻译；因此,可以有效确保用户观看新文件也可同步显示字幕,达到“所听既所见”的实时字幕效果,同时可满足不同语言用户的字幕观看需求。(The application discloses a multimedia file playing related system, method, device and equipment. The system extracts an audio stream corresponding to the playing progress through the client aiming at a multimedia file currently played by the client player; sending the audio stream to a server; displaying the voice translation text of the audio stream returned by the server side in the player; and the server determines the voice translation text through a voice translation model and returns the voice translation text to the client. By adopting the processing mode, the voice translation service is called according to the audio stream generated by the current user, and the instant translation of the voice is realized; therefore, the method can effectively ensure that the user can watch the new file and synchronously display the subtitles, achieves the real-time subtitle effect of 'what you hear is' and can meet the subtitle watching requirements of users with different languages.)

1. A multimedia file playing system, comprising:

the client is used for extracting an audio stream corresponding to the playing progress aiming at the multimedia file currently played by the player; sending the audio stream to a server; displaying the voice translation text of the audio stream returned by the server side in the player;

and the server is used for determining the voice translation text through a voice translation model and returning the voice translation text to the client.

2. A method for playing a multimedia file, comprising:

extracting an audio stream corresponding to the playing progress aiming at the multimedia file currently played by the player;

sending the audio stream to a server;

and displaying the voice translation text of the audio stream returned by the server side in the player.

3. The method of claim 2,

the player comprises a browser player;

the extracting of the audio stream corresponding to the playing progress includes:

and acquiring the audio stream through a data stream capturing module of the browser player.

4. The method of claim 2,

the audio stream comprises an audio stream of millisecond duration.

5. The method of claim 2,

the method further comprises the following steps:

performing compression processing on the audio stream;

the sending the audio stream to a server includes:

and sending the compressed audio stream to the server.

6. The method of claim 5, wherein the performing the compression process on the audio stream is performed by at least one of:

performing a down-sampling process on the audio stream;

and performing gain reduction processing on the audio stream according to the volume data of the audio stream.

7. The method of claim 6, wherein the performing down-sampling processing on the audio stream comprises:

determining a down-sampling rate;

and performing down-sampling processing on the audio stream according to the down-sampling rate.

8. The method of claim 5,

the player comprises a browser player;

the performing compression processing on the audio stream includes:

creating an audio input node according to the audio stream;

creating an audio handler for the audio stream according to the audio input node;

performing compression processing on the audio stream by an audio processing program.

9. The method of claim 2,

the extracting of the audio stream corresponding to the playing progress includes:

extracting an audio stream to be played;

after the audio stream is sent to the server, the audio stream to be played is played through the player, so that when the audio stream to be played is played, the speech translation text of the audio stream to be played is displayed.

10. The method of claim 2, further comprising:

and sending target language information to the server so that the server translates the audio stream into a text of the target language.

11. A method for playing a multimedia file, comprising:

receiving an audio stream corresponding to the playing progress of a currently played multimedia file sent by a client;

determining a speech translation text of the audio stream through a speech translation model;

and returning the voice translation text to the client so that the client displays the voice translation text when playing the audio stream.

12. A multimedia file playback apparatus, comprising:

the audio stream extraction unit is used for extracting an audio stream corresponding to the playing progress aiming at the multimedia file currently played by the player;

the audio stream sending unit is used for sending the audio stream to a server;

and the text display unit is used for displaying the voice translation text of the audio stream returned by the server side in the player.

13. An electronic device, comprising:

a processor; and

a memory for storing a program for implementing a multimedia file playing method, the device executing the following steps after being powered on and running the program of the method through the processor: extracting an audio stream corresponding to the playing progress aiming at the multimedia file currently played by the player; sending the audio stream to a server; and displaying the voice translation text of the audio stream returned by the server side in the player.

14. A multimedia file playback apparatus, comprising:

the data receiving unit is used for receiving an audio stream which is sent by the client and corresponds to the playing progress of the currently played multimedia file;

the translation unit is used for determining a voice translation text of the audio stream through a voice translation model;

and the text returning unit is used for returning the voice translation text to the client so that the client displays the voice translation text when playing the audio stream.

15. An electronic device, comprising:

a processor; and

a memory for storing a program for implementing a multimedia file playing method, the device executing the following steps after being powered on and running the program of the method through the processor: receiving an audio stream corresponding to the playing progress of a currently played multimedia file sent by a client; determining a speech translation text of the audio stream through a speech translation model; and returning the voice translation text to the client so that the client displays the voice translation text when playing the audio stream.

16. A speech translation model quality assessment system, comprising:

the server is used for collecting at least one multimedia file for evaluating the quality of the real-time voice translation model and sending the multimedia file to the client; receiving an audio stream which is sent by a client and corresponds to the playing progress of the multimedia file; determining a voice translation text of the audio stream through the translation model, and returning the voice translation text to a client; receiving voice translation quality information which is sent by a client and corresponds to the multimedia file; determining quality information of the translation model according to the quality information of at least one multimedia file;

the client is used for playing the multimedia file through a browser and extracting the audio stream; and displaying the speech translation text in a player; and determining the voice translation quality information according to the voice translation text.

17. A speech translation model quality assessment method is characterized by comprising the following steps:

collecting at least one multimedia file for evaluating the quality of a real-time speech translation model, and sending the multimedia file to a client;

receiving an audio stream which is sent by a client and corresponds to the playing progress of the multimedia file;

determining a voice translation text of the audio stream through the translation model, and returning the voice translation text to a client;

receiving voice translation quality information which is sent by a client and corresponds to the multimedia file;

determining quality information of the translation model based on the quality information of at least one multimedia file.

18. A speech translation model quality assessment method is characterized by comprising the following steps:

playing a multimedia file for evaluating the quality of the real-time voice translation model through a browser;

extracting an audio stream corresponding to the playing progress of the multimedia file, and sending the audio stream to a server;

displaying the voice translation text of the audio stream returned by the server side in a player;

and determining voice translation quality information corresponding to the multimedia file according to the voice translation text, and sending the quality information to a server.

19. A multimedia file playing control method is characterized by comprising the following steps:

aiming at a multimedia file currently played by a player, extracting an audio stream corresponding to the playing progress, and sending the audio stream to a server;

determining display delay time length information of the voice translation text;

and displaying the voice translation text of the audio stream returned by the server side in the player according to the duration information.

20. The method of claim 19, wherein determining the display delay duration information of the speech translation text comprises:

and determining the duration information according to the hearing level information of the user.

21. The method of claim 19, further comprising:

if the voice hearing difficulty exceeds the hearing level of the user, pausing playing the multimedia file and repeatedly playing the played file segment;

and adjusting the duration information according to the repeated playing times.

22. The method of claim 19, further comprising:

and if the original text of the audio stream comprises words which are not included in the user source language word list, repeatedly playing the audio stream.

23. The method of claim 22, wherein said repeatedly playing the audio stream comprises:

determining reading following duration information;

and determining the playing time interval of the two adjacent audio streams according to the follow-up reading time length information.

24. The method of claim 19, further comprising:

collecting the following reading voice data of a user;

determining a reading following score according to the reading following voice data;

and determining the repeated playing times of the audio stream according to the reading-after score.

25. The method of claim 19, further comprising:

intercepting a file segment of the multimedia file, wherein the voice hearing difficulty of the file exceeds the hearing level of a user;

and storing the file segments so as to repeatedly play the file segments.

26. A multimedia file playing system, comprising:

the client is used for extracting an audio stream corresponding to the playing progress aiming at the multimedia file currently played by the player; sending the audio stream to a server; playing the voice data of the target language of the audio stream returned by the server side in the player;

the server is used for determining the voice translation text through a voice translation model; determining voice data of the target language through a voice synthesis model; and returning the voice data of the target language to the client.

27. A method for playing a multimedia file, comprising:

extracting an audio stream corresponding to the playing progress aiming at the multimedia file currently played by the player;

sending the audio stream to a server;

and playing the voice data of the target language of the audio stream returned by the server side in the player.

28. A method for playing a multimedia file, comprising:

receiving an audio stream corresponding to the playing progress of a currently played multimedia file sent by a client;

determining the voice translation text through a voice translation model;

determining voice data of the target language through a voice synthesis model;

and returning the voice data of the target language to the client.

Technical Field

The application relates to the technical field of voice processing, in particular to a multimedia file playing system, method and device, a voice translation model quality evaluation system and method and electronic equipment.

Background

With the continuous development of internet technology, video websites have been increasingly widely used. When a user watches the audio and video files, the video website can accurately match the current playing progress of the audio and video files and display multi-language subtitles in real time, so that the user can better understand the audio and video contents.

At present, a video network station mainly adopts an off-line voice translation scheme and generates multi-language subtitles based on a video file. Specifically, the scheme calls a voice recognition and translation service to recognize the whole file through the whole voice file provided by the user, and the user can see the real-time caption result of the synchronization of the sound picture and the translation caption after the whole voice file is translated.

However, in the process of implementing the invention, the inventor finds that the technical scheme has at least the following problems: 1) for the newly added audio and video, because the voice translation subtitle of the newly added file is generated in an off-line voice translation mode, a user needs to wait for a certain time, and can only see the synchronous voice translation subtitle of the newly added audio and video after the system completes voice recognition and translation processing on the whole newly added file, but only the file without the subtitle can be watched before the whole newly added file is translated, and the real-time subtitle effect which can be seen by listening cannot be achieved; 2) off-line speech translation usually only generates translation subtitles of one common language, and cannot meet the subtitle viewing requirements of users of different languages. In summary, how to implement real-time speech translation to achieve the effect of synchronizing sound and picture with subtitles, and meet the viewing requirements of users with different languages, is a technical problem that needs to be solved urgently by technical personnel in the field.

Disclosure of Invention

The application provides a multimedia file playing system, which aims to solve the problem that subtitles cannot be displayed when a new file is watched in the prior art. The application further provides a multimedia file playing method and device, a voice translation model quality evaluation system and method and electronic equipment.

The application provides a multimedia file playing system, comprising:

and the server is used for determining the voice translation text through a voice translation model and returning the voice translation text to the client.

The application also provides a multimedia file playing method, which comprises the following steps:

extracting an audio stream corresponding to the playing progress aiming at the multimedia file currently played by the player;

sending the audio stream to a server;

and displaying the voice translation text of the audio stream returned by the server side in the player.

Optionally, the player comprises a browser player;

the extracting of the audio stream corresponding to the playing progress includes:

and acquiring the audio stream through a data stream capturing module of the browser player.

Optionally, the audio stream includes an audio stream of millisecond duration.

Optionally, the method further includes:

performing compression processing on the audio stream;

the sending the audio stream to a server includes:

and sending the compressed audio stream to the server.

Optionally, the compressing the audio stream is performed by at least one of the following methods:

performing a down-sampling process on the audio stream;

and performing gain reduction processing on the audio stream according to the volume data of the audio stream.

Optionally, the performing down-sampling processing on the audio stream includes:

determining a down-sampling rate;

and performing down-sampling processing on the audio stream according to the down-sampling rate.

Optionally, the player comprises a browser player;

the performing compression processing on the audio stream includes:

creating an audio input node according to the audio stream;

creating an audio handler for the audio stream according to the audio input node;

performing compression processing on the audio stream by an audio processing program.

Optionally, the extracting the audio stream corresponding to the playing progress includes:

extracting an audio stream to be played;

Optionally, the method further includes:

and sending target language information to the server so that the server translates the audio stream into a text of the target language.

The application also provides a multimedia file playing method, which comprises the following steps:

receiving an audio stream corresponding to the playing progress of a currently played multimedia file sent by a client;

determining a speech translation text of the audio stream through a speech translation model;

and returning the voice translation text to the client so that the client displays the voice translation text when playing the audio stream.

The present application further provides a multimedia file playing apparatus, including:

the audio stream extraction unit is used for extracting an audio stream corresponding to the playing progress aiming at the multimedia file currently played by the player;

the audio stream sending unit is used for sending the audio stream to a server;

and the text display unit is used for displaying the voice translation text of the audio stream returned by the server side in the player.

The present application further provides an electronic device, comprising:

a processor; and

The present application further provides a multimedia file playing apparatus, including:

the data receiving unit is used for receiving an audio stream which is sent by the client and corresponds to the playing progress of the currently played multimedia file;

the translation unit is used for determining a voice translation text of the audio stream through a voice translation model;

and the text returning unit is used for returning the voice translation text to the client so that the client displays the voice translation text when playing the audio stream.

The present application further provides an electronic device, comprising:

a processor; and

a memory for storing a program for implementing a multimedia file playing method, the device executing the following steps after being powered on and running the program of the method through the processor: receiving an audio stream corresponding to the playing progress of a currently played multimedia file sent by a client; determining a speech translation text of the audio stream through a speech translation model; and returning the voice translation text to the client so that the client displays the voice translation text when playing the audio stream.

The present application further provides a speech translation model quality evaluation system, including:

The application also provides a method for evaluating the quality of the voice translation model, which comprises the following steps: