Sound mixing system, method, server and storage medium based on voice room

Document No.: 88024. Publication date: 2021-10-08.

Reading note: This technology, "Sound mixing system, method, server and storage medium based on voice room" (基于语音房的混音系统、方法、服务器和存储介质), was created by 苏龙超, 成家雄, 钟少奋, 吴济宇 and 黄金强 on 2021-08-18. Abstract: The invention discloses a sound mixing system, a sound mixing method, a server and a storage medium based on a voice room. The system includes a service daemon, a media service center, a media service cluster and a mixing service cluster. According to the service information of each mixing server in the mixing service cluster, the service daemon notifies the media service center of the associated mixing servers of the current voice room in the mixing service cluster. The media service center allocates the associated mixing servers to the target media servers that the current voice room points to in the media service cluster, so that the associated mixing server allocated to each target media server generates a mixed voice stream for each user associated with that target media server in the current voice room. The invention thereby mixes the users of the current voice room in partitions on different associated mixing servers, reduces the mixing overhead of the current voice room on each associated mixing server, and improves the mixing efficiency of the current voice room.

1. A sound mixing system based on a voice room, characterized by comprising: a service daemon, a media service center, a media service cluster and a sound mixing service cluster, wherein the service daemon manages service information of each sound mixing server in the sound mixing service cluster; wherein:

the service daemon informs the media service center of each associated sound mixing server of the current voice room in the sound mixing service cluster according to the service information of each sound mixing server in the sound mixing service cluster;

and the media service center correspondingly allocates each associated mixing server to each target media server pointed by the current voice room in the media service cluster, so as to generate a corresponding mixing voice stream for each user associated with the target media server in the current voice room through the associated mixing server allocated by each target media server.

2. A sound mixing method based on a voice room, applied to the voice room-based sound mixing system of claim 1, comprising:

when a current voice room is opened, notifying a media service center of each associated sound mixing server of the current voice room in a sound mixing service cluster according to service information of each sound mixing server in the sound mixing service cluster, wherein the service information of each sound mixing server in the sound mixing service cluster is managed by a service daemon;

correspondingly allocating each associated mixing server to each target media server pointed by the current voice room in the media service cluster through the media service center;

and generating a corresponding audio-mixing voice stream for each user associated with each target media server in the current voice room through the associated audio-mixing server distributed by each target media server.

3. The method according to claim 2, wherein the generating, by the associated mixing server allocated to each target media server, a corresponding mixed voice stream for each user associated with the target media server in the current voice room comprises:

determining the associated users of each target media server within the current voice room, the associated users including at least one of an on-mic user and a listener of the current voice room;

forwarding, by each target media server, the uplink voice stream of each on-mic user among its associated users to the other target media servers, so that each target media server receives the uplink voice streams of all on-mic users in the current voice room;

and for each associated user of each target media server, mixing the uplink voice streams of the on-mic users in the current voice room other than the associated user through the associated mixing server allocated to the target media server, to obtain the mixed voice stream of the associated user.

4. The method according to claim 2, wherein the notifying a media service center of each associated mixing server of the current speech room in the mixing service cluster through service information of each mixing server in the mixing service cluster when the current speech room is turned on comprises:

when the media service center detects that the current voice room is opened, sending a mixing allocation request of the current voice room to the service daemon through the media service center;

in response to the mixing allocation request, finding an idle mixing server in the mixing service cluster according to the service information of each mixing server managed by the service daemon, taking the idle mixing server as an associated mixing server of the current voice room in the mixing service cluster, and notifying the media service center.

5. The method according to claim 4, wherein the sending, by the media service center, of the mixing allocation request of the current voice room to the service daemon comprises:

if the current voice room is a high-sound-quality voice room, or the number of on-mic users in the current voice room exceeds a preset threshold, sending the mixing allocation request of the current voice room to the service daemon through the media service center.

6. The method according to claim 3, wherein, for each associated user of each target media server, mixing the uplink voice streams of the on-mic users in the current voice room other than the associated user through the associated mixing server allocated to the target media server, to obtain the mixed voice stream of the associated user, comprises:

if the associated users of a target media server include a listener, mixing the uplink voice streams of all on-mic users in the current voice room through the associated mixing server allocated to the target media server, to obtain the mixed voice stream of the listener;

if the associated users of a target media server include a first on-mic user who has an uplink voice stream, mixing the uplink voice streams of the on-mic users in the current voice room other than the first on-mic user through the associated mixing server allocated to the target media server, to obtain the mixed voice stream of the first on-mic user;

and if the associated users of a target media server include a second on-mic user who has no uplink voice stream, taking the mixed voice stream generated for listeners of the target media server in the current voice room as the mixed voice stream of the second on-mic user.

7. The method according to claim 2, wherein the associated mixing servers allocated to each target media server are deployed in a master-standby manner; correspondingly, when a corresponding audio-mixing voice stream is generated for each user associated with each target media server in the current voice room through the associated audio-mixing server allocated by each target media server, the method further includes:

detecting the running state of the primary associated mixing server allocated to each target media server through heartbeat packets of that target media server, or detecting the load state of the primary associated mixing server allocated to each target media server;

and if the primary associated mixing server allocated to a target media server runs abnormally or is overloaded, performing a primary-standby switch of the associated mixing servers allocated to that target media server, and generating a corresponding mixed voice stream for each user associated with the target media server in the current voice room through the standby associated mixing server allocated to the target media server.

8. The method of claim 2, wherein a many-to-many allocation mode is adopted between the target media servers and the associated mixing servers: one target media server can be allocated a plurality of associated mixing servers, and one associated mixing server can be allocated to a plurality of target media servers.

9. A server, characterized in that the server comprises:

one or more processors;

storage means for storing one or more programs;

wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the voice room-based mixing method of any one of claims 2-8.

10. A computer-readable storage medium on which a computer program is stored, the program, when executed by a processor, implementing the voice room-based mixing method according to any one of claims 2 to 8.

Technical Field

The embodiment of the invention relates to the technical field of voice live broadcast, in particular to a voice room-based sound mixing system, method, server and storage medium.

Background

Because voice chat is real-time and interactive, voice rooms are widely used in daily life, and their varied themed activities attract more and more users. The server typically mixes the multiple voice streams reported by the on-mic users in a voice room and sends a single mixed voice stream to each user in the room, which reduces the bandwidth each user needs to receive the downlink voice stream.

At present, when the multiple voice streams in one voice room are mixed, a mixing server is generally allocated to the voice room first. For each on-mic user, that mixing server mixes the voice streams reported by the other on-mic users to obtain one mixed voice stream for that user; it also mixes the voice streams reported by all on-mic users to obtain one mixed voice stream for ordinary listeners.

However, when a single mixing server allocated to the voice room mixes the voice streams for every on-mic user and for the ordinary listeners, the computational overhead of mixing is large and the mixing efficiency drops, which affects real-time interaction in the voice room.

Disclosure of Invention

The embodiments of the invention provide a sound mixing system, a sound mixing method, a server and a storage medium based on a voice room, which mix the users of the current voice room in partitions on different associated mixing servers, greatly reduce the mixing overhead of the current voice room on each associated mixing server, and improve the mixing efficiency of the current voice room.

In a first aspect, an embodiment of the present invention provides a mixing system based on a voice room, where the mixing system includes: a service daemon, a media service center, a media service cluster and a mixing service cluster, wherein the service daemon manages service information of each mixing server in the mixing service cluster.

In a second aspect, an embodiment of the present invention provides a mixing method based on a speech room, which is applied to the mixing system based on the speech room provided in the first aspect, and the method includes:

correspondingly allocating each associated mixing server to each target media server pointed by the current voice room in the media service cluster through the media service center;

In a third aspect, an embodiment of the present invention provides a server, where the server includes:

one or more processors;

storage means for storing one or more programs;

wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the voice room-based mixing method according to any embodiment of the present invention.

In a fourth aspect, an embodiment of the present invention provides a computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements a speech room-based mixing method according to any embodiment of the present invention.

According to the technical solution provided by the embodiments of the invention, when the current voice room is opened, the service daemon allocates a plurality of associated mixing servers to the current voice room from the mixing service cluster according to the service information of each mixing server in the cluster. The media service center then allocates the associated mixing servers to the target media servers that the current voice room points to in the media service cluster, so that each target media server has a corresponding associated mixing server. The associated mixing server allocated to each target media server generates a mixed voice stream for each user associated with that target media server in the current voice room. The users of the current voice room are thus mixed in partitions on different associated mixing servers, and no single mixing server has to generate the mixed voice streams of all users in the room. This greatly reduces the mixing overhead of the current voice room on each associated mixing server, improves mixing efficiency, speeds up downlink voice delivery to each user, and improves the real-time performance of voice interaction in the current voice room.

Drawings

Other features, objects and advantages of the invention will become more apparent upon reading of the detailed description of non-limiting embodiments made with reference to the following drawings:

fig. 1A is a schematic architecture diagram of a mixing system based on a speech room according to an embodiment of the present invention;

fig. 1B is an exemplary schematic diagram of a speech room mixing process according to an embodiment of the present invention;

fig. 2 is a flowchart of a mixing method based on a speech room according to a second embodiment of the present invention;

fig. 3 is a flowchart of a mixing method based on a speech room according to a third embodiment of the present invention;

fig. 4 is a schematic structural diagram of a server according to a fourth embodiment of the present invention.

Detailed Description

The present invention will be described in further detail with reference to the accompanying drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the invention and are not limiting of the invention. It should be further noted that, for the convenience of description, only some of the structures related to the present invention are shown in the drawings, not all of the structures. In addition, the embodiments and features of the embodiments in the present invention may be combined with each other without conflict.

Example one

Fig. 1A is a schematic architecture diagram of a mixing system based on a voice room according to an embodiment of the present invention. This embodiment is applicable to mixing the uplink voice streams of the on-mic users in any voice room. Referring to fig. 1A, the voice room-based mixing system in this embodiment may include a service daemon 110, a media service center 120, a media service cluster 130, and a mixing service cluster 140.

Wherein the service daemon 110 manages service information of each mixing server within the mixing service cluster 140.

Specifically, the service daemon 110 notifies the media service center 120 of each associated mixing server of the current speech room in the mixing service cluster 140 according to the service information of each mixing server in the mixing service cluster 140; the media service center 120 correspondingly allocates each associated mixing server to each target media server pointed by the current voice room in the media service cluster 130, so that the associated mixing server allocated by each target media server generates a corresponding mixing voice stream for each user associated with the target media server in the current voice room.

In this embodiment, the service daemon 110, denoted ServerDaemon, may be configured to maintain service discovery for each front-end process related to the voice room. To accurately track the working status of each server related to the voice room, the media service center 120, each media server in the media service cluster 130, and each mixing server in the mixing service cluster 140 send a registration request to the service daemon 110 and register with it. The service daemon 110 may then record the service information of the registered media service center 120, of each media server in the media service cluster 130, and of each mixing server in the mixing service cluster 140 in corresponding databases, so as to manage the service information of the respective servers efficiently.
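The registration step can be pictured with a minimal Python sketch. It is only an illustration under stated assumptions: the class names `ServiceInfo` and `ServerDaemon`, the role strings, and the `load` field are hypothetical, and an in-memory dictionary stands in for the "corresponding databases" mentioned above.

```python
from dataclasses import dataclass, field


@dataclass
class ServiceInfo:
    server_id: str
    role: str          # assumed role names: "media_center", "media", "mixing"
    online: bool = True
    load: float = 0.0  # assumed normalized load in [0, 1]


@dataclass
class ServerDaemon:
    # One table per role, standing in for the per-role databases.
    tables: dict = field(default_factory=dict)

    def register(self, info: ServiceInfo) -> None:
        """Record (or overwrite) the service information of one server."""
        self.tables.setdefault(info.role, {})[info.server_id] = info

    def servers(self, role: str) -> list:
        """Return the registered servers of a given role."""
        return list(self.tables.get(role, {}).values())


daemon = ServerDaemon()
daemon.register(ServiceInfo("center-1", "media_center"))
daemon.register(ServiceInfo("media-1", "media"))
daemon.register(ServiceInfo("mixer-1", "mixing", load=0.2))
daemon.register(ServiceInfo("mixer-2", "mixing", load=0.7))
```

Keeping one table per role lets the daemon answer the later question "which mixing servers exist and how loaded are they" without scanning media servers.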

Moreover, because the users of a voice room may be served by machine rooms located in different regions, users may log in to the voice room through the media servers of different machine rooms when they join. This embodiment therefore uses the media servers in the media service cluster 130 so that the voice room can cover all of its users. The media service center 120 manages the service work of each media server in the media service cluster 130, for example allocating a corresponding mixing server to each media server.

Further, in order to ensure the high efficiency of the sound mixing processing of the speech room, in this embodiment, a corresponding sound mixing service cluster 140 is set, and the service daemon 110 may allocate a corresponding sound mixing server to each media server in the media service cluster 130 by analyzing the working state, the load condition, and the like of each sound mixing server in the sound mixing service cluster 140, so as to perform the sound mixing operation of each user in the speech room by using a plurality of sound mixing servers.

The following describes an exemplary mixing process for generating a corresponding mixed voice stream for each user in a currently opened voice room in this embodiment:

First, the media service center 120 may detect in real time whether a voice room has been opened. After detecting an opening instruction for a voice room, it takes that voice room as the current voice room of this embodiment and determines room attribute information such as the room type and room identifier (groupId). After the current voice room is opened, users join the voice chat continuously, including on-mic users and ordinary listeners. Because the users of the current voice room are served by machine rooms in different regions, they may log in to the current voice room through several different media servers in the media service cluster 130. The media service center 120 may therefore find the target media servers that the current voice room points to in the media service cluster 130 according to the login information of the users in the room. According to that login information, each target media server may have a plurality of associated users in the current voice room, each of whom joined the room through that target media server.

Then, when it determines that the uplink voice streams of the users in the current voice room need to be mixed, the media service center 120 may request that a mixing server be allocated to each target media server that the current voice room points to in the media service cluster 130, so that the uplink voice streams sent by the on-mic users and received by each target media server can be mixed.

For example, each target media server that the current voice room points to in the media service cluster 130 may send a mixing allocation request to the media service center 120, asking for a mixing server to be allocated to it. The media service center 120 forwards the mixing allocation request of the current voice room to the service daemon 110. In response, the service daemon 110 analyzes the service information of each mixing server in the mixing service cluster 140 that it manages, to determine the working state and load of each mixing server. According to that working state and load, the service daemon 110 may then select from the mixing service cluster 140 a plurality of mixing servers suited to mixing for the current voice room, take them as the associated mixing servers of the current voice room in this embodiment, and notify the media service center 120 of them, so that the media service center 120 connects to each associated mixing server. Each associated mixing server then reports to the media service center 120 the identifiers (S_ID) of the voice rooms it is responsible for mixing, so that the media service center 120 knows the mixing workload of each associated mixing server.
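The screening performed by the service daemon could look like the following Python sketch. The source only says that the daemon considers working state and load; the concrete rule used here (online, below a load ceiling, least loaded first) and the names `MixerInfo`, `select_associated_mixers`, `wanted` and `max_load` are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class MixerInfo:
    server_id: str
    online: bool
    load: float  # assumed normalized load in [0, 1]


def select_associated_mixers(mixers, wanted, max_load=0.8):
    """Return up to `wanted` online mixers under `max_load`, least loaded first."""
    usable = [m for m in mixers if m.online and m.load < max_load]
    usable.sort(key=lambda m: m.load)
    return [m.server_id for m in usable[:wanted]]


cluster = [
    MixerInfo("mixer-a", online=True, load=0.9),   # overloaded, skipped
    MixerInfo("mixer-b", online=True, load=0.1),
    MixerInfo("mixer-c", online=False, load=0.0),  # offline, skipped
    MixerInfo("mixer-d", online=True, load=0.4),
]
chosen = select_associated_mixers(cluster, wanted=2)
print(chosen)  # ['mixer-b', 'mixer-d']
```

If fewer than `wanted` servers qualify, the function simply returns what is available, which matches the retry behavior described later for the case where no mixing server is free.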

Finally, the media service center 120 allocates the associated mixing servers to the target media servers that the current voice room points to in the media service cluster 130 according to the mixing workload of each associated mixing server, so that each target media server can connect to its allocated associated mixing server. The associated mixing server allocated to each target media server can then determine the associated users of that target media server in the current voice room. In other words, all users of the current voice room are divided among the associated mixing servers according to the target media server through which they joined, and are mixed there. The associated mixing server allocated to a target media server only has to generate the mixed voice streams of that target media server's associated users in the current voice room, not those of the users associated with other target media servers. The mixing work for all users in the current voice room is thus distributed across different associated mixing servers, no single mixing server has to generate the mixed voice streams of every user in the room, and the mixing overhead of the current voice room on each associated mixing server is greatly reduced.

In addition, to ensure that mixing is accurate for every user in the current voice room, in this embodiment each target media server, after receiving the uplink voice streams sent by the on-mic users among its associated users, forwards those streams to the other target media servers. Each target media server thereby receives the uplink voice stream of every on-mic user in the current voice room, so that no stream is missing from the mix.

Illustratively, as shown in fig. 1B, suppose the current voice room includes 3 on-mic users and 2 listeners. On-mic user 1 and on-mic user 2 join the current voice room through target media server 1; on-mic user 3, listener 4 and listener 5 join through target media server 2. Target media server 1 is allocated associated mixing server 1, and target media server 2 is allocated associated mixing server 2. Target media server 1 receives the two uplink voice streams sent by on-mic users 1 and 2, target media server 2 receives the one uplink voice stream sent by on-mic user 3, and the two target media servers forward the received uplink voice streams to each other, so that both hold the three uplink voice streams of on-mic users 1, 2 and 3. Associated mixing server 1 then mixes the uplink voice streams of on-mic users 2 and 3 to generate the mixed voice stream of on-mic user 1, and mixes the uplink voice streams of on-mic users 1 and 3 to generate the mixed voice stream of on-mic user 2; it does not generate mixed voice streams for on-mic user 3, listener 4 or listener 5. Associated mixing server 2 mixes the uplink voice streams of on-mic users 1 and 2 to generate the mixed voice stream of on-mic user 3, and mixes the three uplink voice streams of on-mic users 1, 2 and 3 to generate the mixed voice streams of listeners 4 and 5; it does not generate mixed voice streams for on-mic users 1 and 2. The users of the current voice room are thus mixed in partitions on different associated mixing servers.
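The fig. 1B example can be replayed with a short Python sketch that computes, for each associated mixing server, which uplink streams go into each of its users' mixes. The function names and identifiers such as `media-1` and `mixer-1` are hypothetical; the rule itself (every on-mic stream except the user's own) comes from the text. "On-mic user" here means a user who holds a microphone and sends an uplink voice stream.

```python
def mix_sources(user, on_mic_with_stream):
    """Streams to mix for `user`: every on-mic stream except the user's own."""
    return sorted(s for s in on_mic_with_stream if s != user)


def plan_partitioned_mixing(users_by_media_server, mixer_of, on_mic_with_stream):
    """Map each mixing server to {user: streams to mix}, for its own users only."""
    plan = {}
    for media_server, users in users_by_media_server.items():
        mixer = mixer_of[media_server]
        for user in users:
            plan.setdefault(mixer, {})[user] = mix_sources(user, on_mic_with_stream)
    return plan


# Room layout from the example: users 1-3 are on-mic, 4 and 5 are listeners.
users_by_media_server = {
    "media-1": ["user1", "user2"],
    "media-2": ["user3", "listener4", "listener5"],
}
mixer_of = {"media-1": "mixer-1", "media-2": "mixer-2"}
on_mic = {"user1", "user2", "user3"}

plan = plan_partitioned_mixing(users_by_media_server, mixer_of, on_mic)
for mixer, mixes in plan.items():
    print(mixer, mixes)
```

Because a listener never appears in `on_mic`, the exclusion has no effect and the listener receives all three streams. The same holds for an on-mic user who currently has no uplink stream, which is why such a user can reuse the listener mix.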

It should be noted that, to ensure high availability of the associated mixing servers allocated to each target media server, when the media service center 120 allocates the associated mixing servers to the target media servers that the current voice room points to in the media service cluster 130, the associated mixing servers allocated to each target media server may be deployed in a primary-standby manner. That is, each target media server may be allocated at least two associated mixing servers, divided into a primary associated mixing server and a standby associated mixing server. While the downlink voice streams of the users in the current voice room are being mixed, each target media server sends heartbeat packets to its primary associated mixing server to detect that server's running state; alternatively, the primary associated mixing server allocated to each target media server monitors its own load state. When the primary associated mixing server allocated to a target media server runs abnormally or is overloaded, a primary-standby switch is performed: the target media server's connection is switched from the primary associated mixing server to the standby associated mixing server, and the standby associated mixing server generates the mixed voice stream for each associated user of that target media server in the current voice room.
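A minimal Python sketch of the primary-standby switch follows. The source states only that a heartbeat or a load check triggers the switch; the timeout of 3 seconds, the load ceiling of 0.9, and the names `MixerPair` and `check_and_switch` are assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class MixerPair:
    primary: str
    standby: str
    active: str = ""  # filled in by __post_init__

    def __post_init__(self):
        if not self.active:
            self.active = self.primary


def check_and_switch(pair, seconds_since_heartbeat, load,
                     heartbeat_timeout=3.0, max_load=0.9):
    """Switch to the standby if the primary looks dead or overloaded."""
    primary_active = pair.active == pair.primary
    unhealthy = seconds_since_heartbeat > heartbeat_timeout or load > max_load
    if primary_active and unhealthy:
        pair.active = pair.standby
    return pair.active


pair = MixerPair(primary="mixer-1", standby="mixer-1b")
healthy = check_and_switch(pair, seconds_since_heartbeat=0.5, load=0.3)
after_timeout = check_and_switch(pair, seconds_since_heartbeat=5.0, load=0.3)
```

In this sketch the switch is one-way: once the standby is active, later checks leave it in place. Whether and when to switch back to the primary is not described in the source.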

Furthermore, when the media service center 120 allocates an associated mixing server to each target media server, a many-to-many allocation mode may be adopted between the target media servers and the associated mixing servers: one target media server can be allocated a plurality of associated mixing servers, and one associated mixing server can be allocated to a plurality of target media servers.

In addition, when the media service center 120 allocates a certain associated mixing server to a certain target media server, the associated mixing server may select whether to accept the allocation connection with the target media server according to its own load condition. However, when there is no available associated mixing server in the mixing service cluster 140, the media service center 120 cannot allocate a corresponding mixing server to each target media server, and each target media server may periodically initiate a corresponding mixing allocation request to the media service center 120, so that when there is an available associated mixing server in the mixing service cluster 140, it can be allocated to each target media server in time.
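The many-to-many allocation, together with a mixing server's option to refuse an allocation based on its own load, can be sketched in Python as below. The `Allocation` class, the `assign` method and the `max_load` ceiling of 0.8 are hypothetical; the source does not say how a mixing server decides.

```python
from collections import defaultdict


class Allocation:
    """Many-to-many mapping between media servers and mixing servers."""

    def __init__(self):
        self.mixers_of = defaultdict(set)  # media server -> mixing servers
        self.media_of = defaultdict(set)   # mixing server -> media servers

    def assign(self, media_server, mixer, mixer_load, max_load=0.8):
        """Record the pairing only if the mixer accepts it given its load."""
        if mixer_load >= max_load:
            return False  # refused; the media server may ask again later
        self.mixers_of[media_server].add(mixer)
        self.media_of[mixer].add(media_server)
        return True


alloc = Allocation()
alloc.assign("media-1", "mixer-1", mixer_load=0.2)
alloc.assign("media-1", "mixer-2", mixer_load=0.3)
alloc.assign("media-2", "mixer-1", mixer_load=0.2)
accepted = alloc.assign("media-2", "mixer-3", mixer_load=0.95)
```

A `False` return corresponds to the case in which a target media server has to repeat its mixing allocation request periodically until a mixing server becomes available.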

Also, each associated mixing server of the current speech room in the mixing service cluster 140 requested by the media service center 120 from the service daemon 110 may be an idle mixing server in the mixing service cluster 140.

Meanwhile, in this embodiment, whether the current voice room needs mixing can be determined in two ways: by the room type of the current voice room, and by the number of on-mic users.

1) If the current voice room is a high-sound-quality voice room, the bit rate of its voice streams is high. To reduce the traffic taken by downlink voice streams and keep voice room interaction real-time, the downlink voice stream of each user in the current voice room needs to be mixed, so the media service center 120 sends a mixing allocation request for the current voice room to the service daemon 110.

2) If the number of on-mic users in the current voice room exceeds the preset threshold, the current voice room has many uplink voice streams. To keep voice room interaction real-time, the downlink voice stream of each user in the current voice room needs to be mixed, so the media service center 120 sends a mixing allocation request for the current voice room to the service daemon 110.
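The two conditions combine into a simple predicate, shown here as a Python sketch. The function and parameter names are hypothetical, and the threshold value of 4 in the usage lines is an arbitrary example; the source says only that the threshold is preset.

```python
def needs_mixing(is_high_quality_room, on_mic_count, on_mic_threshold):
    """True if either condition for requesting mixing servers holds."""
    return is_high_quality_room or on_mic_count > on_mic_threshold


# Example values; the threshold of 4 is an assumption, not from the source.
small_normal_room = needs_mixing(False, on_mic_count=2, on_mic_threshold=4)
small_hq_room = needs_mixing(True, on_mic_count=2, on_mic_threshold=4)
busy_normal_room = needs_mixing(False, on_mic_count=5, on_mic_threshold=4)
```

Note that the comparison is strict: a room with exactly the threshold number of on-mic users does not trigger a request, following the wording "exceeds the preset threshold".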

It should be noted that the on-mic status of each user in the current voice room can be determined in two ways, namely from the user's on-mic signaling and from the user's uplink voice stream. Using both ensures that on-mic users are identified accurately and prevents the uplink voice stream of an on-mic user from being left out of the mix, which would cause mixing errors.
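One way to read the two-signal check is as a union: a user counts as on-mic if either the signaling or an observed uplink stream says so. The Python sketch below encodes that reading; it is an interpretation, since the source does not specify how the two signals are combined, and the function name is hypothetical.

```python
def on_mic_users(signaled, streaming):
    """Users flagged by on-mic signaling or seen sending an uplink stream."""
    return set(signaled) | set(streaming)


# "carol" sent no signaling (for example, it was lost) but is streaming.
detected = on_mic_users(signaled=["alice", "bob"], streaming=["bob", "carol"])
```

Under this reading, a lost signaling message does not cause a streaming user to be dropped from the mix.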

In the technical solution provided in this embodiment, when the current voice room is opened, the service daemon allocates a plurality of associated mixing servers to the current voice room from the mixing service cluster according to the service information of each mixing server in the cluster. The media service center then allocates the associated mixing servers to the target media servers that the current voice room points to in the media service cluster, so that each target media server has a corresponding associated mixing server. The associated mixing server allocated to each target media server generates a mixed voice stream for each associated user of that target media server in the current voice room. The users of the current voice room are thus mixed in partitions on different associated mixing servers, and no single mixing server has to generate the mixed voice streams of all users in the room. This greatly reduces the mixing overhead of the current voice room on each associated mixing server, improves mixing efficiency, speeds up downlink voice delivery to each user, and improves the real-time performance of voice interaction in the current voice room.

Example two

Fig. 2 is a flowchart of a voice-room-based mixing method according to the second embodiment of the present invention. This embodiment is applicable to mixing the uplink voice streams of the on-mic users in any voice room, and can be applied in the voice-room-based mixing system provided by the above embodiment. The method can be executed by the server provided by the embodiments of the present invention. The server may be a service cluster that integrates a service daemon, a media service center, a media service cluster and a mixing service cluster, each performing its corresponding function.

Specifically, as shown in fig. 2, the method may include the following steps:

S210, when the current voice room is opened, notifying the media service center of each associated mixing server of the current voice room in the mixing service cluster, according to the service information of each mixing server in the mixing service cluster managed by the service daemon.

In this embodiment, the media service center may detect in real time whether a voice room has been opened. After detecting an opening instruction for a voice room, it takes that voice room as the current voice room in this embodiment.

After the current voice room is opened, users continue to join the voice chat, including on-mic users and ordinary listeners. To make sure that no on-mic user in the current voice room is missed, this embodiment can detect the on-mic signaling and the uplink voice stream of each user in real time and set the user's on-mic status accordingly. In other words, the on-mic status of each user is set in two ways, which avoids mixing errors caused by omitting an on-mic user in the current voice room.

Because the users in the current voice room are located in different machine-room regions, they may log in to the current voice room through several different media servers in the media service cluster. The media service center can therefore use the login information of each user in the current voice room to find, in the media service cluster, each target media server that the current voice room points to. According to the same login information, each target media server has a number of associated users in the current voice room, namely the users who joined the current voice room through that target media server.

Specifically, when the media service center determines that the uplink voice streams of users in the current voice room need to be mixed, a corresponding mixing server must be allocated to each target media server that the current voice room points to in the media service cluster, so that the uplink voice streams sent by the on-mic users and received by each target media server can be mixed.

Therefore, the media service center sends a mixing allocation request for the current voice room to the service daemon. In response, the service daemon analyzes the service information it manages for each mixing server in the mixing service cluster to determine each mixing server's working state, load condition and so on. Based on these, the service daemon screens out from the mixing service cluster a plurality of mixing servers suitable for mixing the current voice room, takes them as the associated mixing servers of the current voice room in this embodiment, and notifies the media service center of them so that the media service center can connect to each associated mixing server. Each associated mixing server then reports to the media service center the voice room identifiers (S_ID) of the voice rooms it is already responsible for mixing, so that the media service center learns the mixing workload of each associated mixing server.
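The screening step performed by the service daemon can be sketched as a filter on working state and load followed by a sort. The `MixerInfo` fields, the `max_load` cutoff and the least-loaded-first ordering are illustrative assumptions; the source only says that working state and load condition are taken into account.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class MixerInfo:
    """Hypothetical service information the daemon keeps per mixing server."""
    server_id: str
    healthy: bool  # working state
    load: float    # load condition, assumed normalized to 0.0..1.0


def select_associated_mixers(mixers: List[MixerInfo], needed: int,
                             max_load: float = 0.8) -> List[str]:
    """Pick up to `needed` healthy mixing servers, least loaded first."""
    candidates = [m for m in mixers if m.healthy and m.load < max_load]
    candidates.sort(key=lambda m: m.load)
    return [m.server_id for m in candidates[:needed]]
```

If fewer than `needed` servers qualify, the function returns a shorter list, which corresponds to the case where some target media servers must retry the allocation later.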

S220, correspondingly allocating each associated mixing server, through the media service center, to each target media server that the current voice room points to in the media service cluster.

After the media service center obtains the mixing workload of each associated mixing server, it can allocate the associated mixing servers, according to that workload, to the target media servers that the current voice room points to in the media service cluster, so that each target media server can connect to the associated mixing server allocated to it.

To ensure high availability of the associated mixing servers allocated to each target media server, in this embodiment, when the media service center allocates the associated mixing servers to the target media servers that the current voice room points to in the media service cluster, the associated mixing servers allocated to each target media server may be deployed in a primary-standby manner. That is, each target media server may be allocated at least two associated mixing servers, divided into a primary associated mixing server and a standby associated mixing server. While the downlink voice streams of the users in the current voice room are being mixed, this embodiment may detect the running state of the primary associated mixing server allocated to each target media server through heartbeat packets sent by the target media server, or detect the load state of that primary associated mixing server. If the primary associated mixing server allocated to a target media server runs abnormally or is overloaded, a primary-standby switch is performed for the associated mixing servers allocated to that target media server, so that the standby associated mixing server generates the corresponding mixed voice stream for each associated user of the target media server in the current voice room.

That is to say, each target media server sends heartbeat packets to its allocated primary associated mixing server to detect the running state of the primary associated mixing server, or the primary associated mixing server allocated to each target media server monitors its own load state. When the primary associated mixing server allocated to a target media server runs abnormally or is overloaded, the primary-standby switch is performed: the target media server switches from the primary associated mixing server to the standby associated mixing server, and the standby associated mixing server generates the corresponding mixed voice stream for each associated user of the target media server in the current voice room.
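The primary-standby switch described above can be sketched as follows. The `MixerLink` record, the timeout of 3 seconds and the load limit of 0.9 are hypothetical values chosen for illustration; the source specifies neither concrete thresholds nor what happens to the former primary after a switch (here it simply becomes the standby).

```python
from dataclasses import dataclass

HEARTBEAT_TIMEOUT_S = 3.0  # assumed: silence longer than this counts as abnormal
MAX_LOAD = 0.9             # assumed: load above this counts as overloaded


@dataclass
class MixerLink:
    """Hypothetical state one target media server keeps about its mixers."""
    primary: str
    standby: str
    last_heartbeat_ack: float  # time of the last heartbeat reply from the primary
    primary_load: float        # last load value reported by the primary


def active_mixer(link: MixerLink, now: float) -> str:
    """Return the mixer to use, switching to the standby if the primary looks unhealthy."""
    timed_out = now - link.last_heartbeat_ack > HEARTBEAT_TIMEOUT_S
    overloaded = link.primary_load > MAX_LOAD
    if timed_out or overloaded:
        # Primary-standby switch: the standby takes over and the roles are swapped.
        link.primary, link.standby = link.standby, link.primary
        link.last_heartbeat_ack = now
        link.primary_load = 0.0  # unknown until the new primary reports its load
    return link.primary
```

The current time is passed in as a parameter so that the switch logic can be exercised without a real clock or network.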

Moreover, when the media service center allocates the associated mixing servers to the target media servers, a many-to-many allocation mode can be adopted between target media servers and associated mixing servers: one target media server may be allocated multiple associated mixing servers, and one associated mixing server may be allocated to multiple target media servers.

In addition, when the media service center allocates an associated mixing server to a target media server, the associated mixing server can decide, according to its own load, whether to accept the connection with that target media server. When no associated mixing server is available in the mixing service cluster, the media service center cannot allocate a mixing server to the target media servers. In that case, each target media server may periodically send a mixing allocation request to the media service center, so that an associated mixing server can be allocated promptly once one becomes available in the mixing service cluster.

S230, generating a corresponding mixed voice stream for each associated user of each target media server in the current voice room through the associated mixing server allocated to that target media server.

In this embodiment, the associated mixing server allocated to each target media server determines the associated users of that target media server in the current voice room. In other words, all users of the current voice room are divided among different associated mixing servers according to the target media server through which each user joined, and are mixed there. The associated mixing server allocated to a target media server therefore only needs to generate the mixed voice streams of that target media server's associated users in the current voice room, and does not need to generate the mixed voice streams of the associated users of other target media servers. The mixing work for all users in the current voice room is thus distributed across different associated mixing servers, which implements partitioned mixing of the users in the current voice room. The mixed voice streams of all users in the voice room no longer have to be generated by a single mixing server, and the mixing overhead of the current voice room on each associated mixing server is reduced.
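The partitioning can be illustrated with a minimal sketch that groups users by the mixing server allocated to the media server they logged in through. The two input mappings and all identifiers are hypothetical, and the sketch assumes one active mixing server per target media server for simplicity.

```python
from collections import defaultdict
from typing import Dict, List


def partition_users(login_info: Dict[str, str],
                    mixer_for_media: Dict[str, str]) -> Dict[str, List[str]]:
    """Group the users of one voice room by the mixing server responsible for them.

    login_info:      user id -> id of the target media server the user joined through
    mixer_for_media: target media server id -> id of its allocated mixing server
    """
    users_by_mixer = defaultdict(list)
    for user_id, media_id in login_info.items():
        users_by_mixer[mixer_for_media[media_id]].append(user_id)
    return dict(users_by_mixer)
```

Each mixing server then produces mixed streams only for the users in its own group, so no single server carries the mixing work of the whole room.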

In the technical solution provided in this embodiment, when the current voice room is opened, the service daemon allocates a plurality of associated mixing servers from the mixing service cluster to the current voice room according to the service information of each mixing server in the mixing service cluster. The media service center then allocates the associated mixing servers to the target media servers that the current voice room points to in the media service cluster, so that each target media server is allocated a corresponding associated mixing server. The associated mixing server allocated to each target media server generates a corresponding mixed voice stream for each associated user of that target media server in the current voice room. This implements partitioned mixing of the users in the current voice room across different associated mixing servers, so the mixed voice streams of all users in the voice room no longer have to be generated by a single mixing server. The mixing overhead of the current voice room on each associated mixing server is thereby reduced, mixing efficiency is improved, the downlink efficiency of each user's voice is enhanced, and the real-time performance of voice interaction in the current voice room is improved.

Example three

Fig. 3 is a flowchart of a voice-room-based mixing method according to the third embodiment of the present invention. This embodiment is a refinement based on the above embodiments. As shown in fig. 3, it mainly explains in detail the specific process of generating the mixed voice stream for each user in the current voice room.

Optionally, as shown in fig. 3, the method may include the following steps:

S310, when the media service center detects that the current voice room is opened, sending a mixing allocation request for the current voice room to the service daemon through the media service center.

Optionally, in this embodiment, the media service center detects in real time whether a voice room has been opened, and takes the opened voice room as the current voice room in this embodiment. When it detects that the current voice room is opened and determines that the current voice room needs mixing, the media service center sends a mixing allocation request for the current voice room to the service daemon, so that the service daemon, in response to the request, allocates corresponding associated mixing servers to the current voice room from the mixing service cluster.

It should be noted that, in this embodiment, whether the current voice room needs mixing may be determined in two ways: from the room type of the current voice room and from the number of on-mic users. If the current voice room is a high-sound-quality voice room, or the number of on-mic users in the current voice room exceeds a preset threshold, the media service center sends a mixing allocation request for the current voice room to the service daemon.

For example, when the current voice room is switched from a low-sound-quality voice room to a high-sound-quality voice room, a corresponding mixing allocation request is generated to apply for the mixing service; when the current voice room is switched from a high-sound-quality voice room to a low-sound-quality voice room, a corresponding mixing cancellation request is generated to release the mixing service.

When the number of on-mic users in the current voice room exceeds the preset threshold, the current voice room switches to mixing mode. At this point, the status of all on-mic users among the associated users of each target media server is synchronized to the associated mixing server allocated to that target media server, so that mixing can be performed for each associated user of the target media server. When the number of on-mic users in the current voice room falls below the preset threshold, the current voice room switches to non-mixing mode, that is, the current voice room no longer needs mixing. At this point, a corresponding voice-off message is sent to the associated mixing server allocated to each target media server, so that each of these associated mixing servers can exit the current voice room and destroy the mixing resources it previously held for the current voice room.
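The decision about whether a voice room needs mixing, and which request a mode change triggers, can be sketched as below. The function names and the request labels are hypothetical. Combining the two criteria with a logical OR follows the "or" in the text above; treating a count equal to the threshold as not exceeding it is an assumption, since the source does not say how equality is handled.

```python
from typing import Optional


def needs_mixing(is_high_quality_room: bool, on_mic_count: int, threshold: int) -> bool:
    """A room needs mixing if it is a high-sound-quality room or has many users on the microphone."""
    return is_high_quality_room or on_mic_count > threshold


def request_for_transition(was_mixing: bool, now_mixing: bool) -> Optional[str]:
    """Return the request to send to the service daemon when the mode changes, if any."""
    if now_mixing and not was_mixing:
        return "mixing_allocation_request"    # apply for the mixing service
    if was_mixing and not now_mixing:
        return "mixing_cancellation_request"  # release the mixing service
    return None                               # mode unchanged, nothing to send
```

Evaluating `needs_mixing` each time the room type or the microphone count changes, and comparing the result with the previous mode, yields at most one request per change.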

S320, in response to the mixing allocation request, finding idle mixing servers in the mixing service cluster according to the service information of each mixing server in the mixing service cluster managed in the service daemon, taking them as the associated mixing servers of the current voice room in the mixing service cluster, and notifying the media service center.

In this embodiment, to ensure efficient mixing for the current voice room, the service daemon responds to the mixing allocation request by analyzing the service information it manages for each mixing server in the mixing service cluster and determining the working state and load condition of each mixing server. It then finds the idle mixing servers in the mixing service cluster, takes each of them as an associated mixing server of the current voice room in the mixing service cluster, and notifies the media service center, so that the media service center can allocate the associated mixing servers to the target media servers that the current voice room points to in the media service cluster.

S330, correspondingly distributing each associated mixing server to each target media server pointed by the current voice room in the media service cluster through the media service center.

S340, determining the associated users of each target media server in the current voice room.

Optionally, when the associated mixing server allocated to each target media server performs mixing for the associated users of that target media server in the current voice room, the associated users of each target media server in the current voice room are first determined according to the login information of each user in the current voice room. The associated users of a target media server are the users who joined the current voice room through that target media server, and may include at least one of on-mic users and listeners of the current voice room.

S350, forwarding, by each target media server, the uplink voice streams of the on-mic users among its associated users to the other target media servers, so that each target media server receives the uplink voice streams of all on-mic users in the current voice room.

After the associated users of each target media server in the current voice room are determined, each target media server receives the uplink voice streams sent by the on-mic users among its associated users. To ensure that mixing is accurate for every user in the current voice room, in this embodiment each target media server, after receiving the uplink voice streams sent by the on-mic users among its associated users, forwards them to the other target media servers. Each target media server thus receives the uplink voice stream of every on-mic user in the current voice room, which ensures that the mix for the current voice room is complete.
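The forwarding between target media servers can be sketched as an exchange after which every server holds the uplink streams of the whole room. The function name and the dictionary layout are hypothetical, a voice frame is represented as a plain list of integers, and real forwarding would of course happen over the network rather than in one process.

```python
from typing import Dict, List

Frame = List[int]  # placeholder for one frame of audio samples


def forward_uplinks(local_uplinks: Dict[str, Dict[str, Frame]]) -> Dict[str, Dict[str, Frame]]:
    """Simulate each media server forwarding its local uplink streams to the others.

    local_uplinks: media server id -> {user id -> frame} for users who joined
                   through that server and are currently sending voice.
    Returns:       media server id -> {user id -> frame} for the whole room.
    """
    # Every server starts with a copy of its own local streams.
    room_view = {media_id: dict(streams) for media_id, streams in local_uplinks.items()}
    for sender, streams in local_uplinks.items():
        for receiver in local_uplinks:
            if receiver != sender:
                room_view[receiver].update(streams)
    return room_view
```

After the exchange, a media server with no local speakers still sees every uplink stream in the room, which is what its mixing server needs in order to serve its listeners.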

S360, for each associated user of each target media server, mixing the uplink voice streams of the on-mic users in the current voice room other than that associated user through the associated mixing server allocated to the target media server, to obtain the mixed voice stream of the associated user.

Optionally, after the associated mixing server allocated to each target media server receives the uplink voice streams of the on-mic users in the current voice room, for each associated user of the target media server it mixes the uplink voice streams of all on-mic users other than that associated user, generating the mixed voice stream of that associated user. Performing this step for each associated user yields the mixed voice stream of every associated user of the target media server.

It should be noted that the associated users of each target media server in the current voice room are of two types, namely on-mic users and listeners, and on-mic users in turn fall into two types: those sending an uplink voice stream and those not sending one. Accordingly, the mixing process for the different types of associated users of each target media server in this embodiment can be divided into the following three cases:

1) If the associated users of a target media server include listeners, the uplink voice streams of all on-mic users in the current voice room are mixed by the associated mixing server allocated to the target media server to obtain the listeners' mixed voice stream.

For the listeners associated with each target media server, the uplink voice streams of all on-mic users in the current voice room need to be mixed by the associated mixing server allocated to the target media server to obtain the listeners' mixed voice stream, which ensures that the mix heard by listeners is complete. The target media server then distributes the listeners' mixed voice stream to each listener among its associated users.

2) If the associated users of a target media server include a first on-mic user with an uplink voice stream, the uplink voice streams of the on-mic users in the current voice room other than the first on-mic user are mixed by the associated mixing server allocated to the target media server to obtain the mixed voice stream of the first on-mic user.

For each first on-mic user with an uplink voice stream among the associated users of a target media server, the user's own uplink voice stream does not need to be mixed in. The associated mixing server allocated to the target media server therefore mixes the uplink voice streams of the other on-mic users in the current voice room to obtain the mixed voice stream of the first on-mic user. Performing this step for each first on-mic user among the associated users of the target media server yields the mixed voice stream of every such user.

3) If the associated users of a target media server include a second on-mic user without an uplink voice stream, the mixed voice stream of the listeners of that target media server in the current voice room is used as the mixed voice stream of the second on-mic user.

For each second on-mic user without an uplink voice stream among the associated users of a target media server, the mix is the same as the mix for the listeners of that target media server. To avoid unnecessary mixing work, the listeners' mixed voice stream of the target media server in the current voice room can be used directly as the mixed voice stream of the second on-mic user. No additional mixing is needed for the second on-mic user, which keeps voice room mixing efficient.
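The three cases can be combined in one minimal sketch. Mixing is shown here as sample-wise addition of 16-bit PCM frames with clipping, which is a common simple technique and an assumption on our part; the source does not specify the mixing arithmetic. The function names and the dictionary layout are likewise hypothetical.

```python
from typing import Dict, List

Frame = List[int]  # one frame of 16-bit PCM samples


def mix(frames: List[Frame], frame_len: int) -> Frame:
    """Add frames sample by sample and clip the result to the 16-bit range."""
    mixed = [0] * frame_len
    for frame in frames:
        for i in range(frame_len):
            mixed[i] += frame[i]
    return [max(-32768, min(32767, s)) for s in mixed]


def mix_for_users(uplinks: Dict[str, Frame], local_users: List[str],
                  frame_len: int) -> Dict[str, Frame]:
    """Compute the downlink mix for each associated user of one target media server.

    uplinks:     user id -> frame, for every user in the room currently sending voice
    local_users: associated users of this target media server
    """
    # Case 1: the listeners' mix contains every uplink stream; it is computed once.
    listener_mix = mix(list(uplinks.values()), frame_len)
    result = {}
    for user in local_users:
        if user in uplinks:
            # Case 2: a speaking user hears everyone except themselves.
            others = [f for uid, f in uplinks.items() if uid != user]
            result[user] = mix(others, frame_len)
        else:
            # Cases 1 and 3: listeners and silent users on the microphone
            # share the listeners' mix, so no extra mixing is done for them.
            result[user] = listener_mix
    return result
```

Because the listeners' mix is computed once and reused, the per-frame cost on a mixing server grows with the number of speaking users it serves rather than with the total number of users attached to its media server.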

Example four

Fig. 4 is a schematic structural diagram of a server according to the fourth embodiment of the present invention. As shown in fig. 4, the server includes a processor 40, a storage device 41 and a communication device 42. The server may have one or more processors 40; one processor 40 is taken as an example in fig. 4. The processor 40, the storage device 41 and the communication device 42 in the server may be connected by a bus or by other means; a bus connection is taken as an example in fig. 4.

The server provided by this embodiment can be used to execute the voice-room-based mixing method provided by any of the above embodiments, and has the corresponding functions and beneficial effects.

Example five

The fifth embodiment of the present invention further provides a computer-readable storage medium on which a computer program is stored. When executed by a processor, the computer program can implement the voice-room-based mixing method of any of the above embodiments.

Of course, in the storage medium containing computer-executable instructions provided by the embodiments of the present invention, the computer-executable instructions are not limited to the method operations described above, and may also perform related operations in the voice-room-based mixing method provided by any embodiment of the present invention.

From the above description of the embodiments, it is obvious for those skilled in the art that the present invention can be implemented by software and necessary general hardware, and certainly, can also be implemented by hardware, but the former is a better embodiment in many cases. Based on such understanding, the technical solutions of the present invention may be embodied in the form of a software product, which can be stored in a computer-readable storage medium, such as a floppy disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a FLASH Memory (FLASH), a hard disk or an optical disk of a computer, and includes several instructions for enabling a computer device (which may be a personal computer, a server, or a network device) to execute the methods according to the embodiments of the present invention.

The above description is only a preferred embodiment of the present invention and is not intended to limit the present invention, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.
