Method for conducting an audio and/or video conference

Document No.: 1786421  Publication date: 2019-12-06

Abstract: A method for conducting an audio and/or video conference comprises: one of the terminal devices (EP 1K) coupled to the central conference control device (K-A) assumes the role of a media server, this taking place under the control of the central conference control device (K-A). (Invented by K. Klaghofer; filed 2018-04-12.)

1. A method for conducting an audio and/or video conference, wherein a plurality of terminal devices (EP 1K, EP2, EP 3) are coupled via a data network (14) to a central conference control device (K-A), wherein at least one first terminal device (EP 1K) of the terminal devices (EP 1K, EP2, EP 3) comprises a data processing device which enables the terminal device (EP 1K) to participate in the conference by running an application (24, 34), in particular a browser application (34), characterized in that a predetermined terminal device (EP 1K) of the at least one first terminal device, at least when a predetermined criterion is fulfilled, receives audio and/or video data streams from the other terminal devices (EP 2, EP 3) on the basis of control instructions of the central conference control device (K-A) and, by means of the application (34), in particular a browser application,

(a) mixes the audio and/or video data streams and transmits them in mixed form to the other terminal devices, and/or

(b) transmits a selection of the received audio and/or video data to the other terminal devices.

2. The method according to claim 1, wherein the application is a Webrtc (Web Real-Time Control) browser application (34).

3. The method according to claim 1 or 2, characterized in that a predetermined terminal device (EP 1K) of the at least one first terminal device (EP 1K, EP2, EP 3) receives audio and/or video data streams from all other terminal devices (EP 2, EP 3) and mixes and/or selects said audio and/or video data streams.

4. The method according to claim 3, characterized in that all audio and/or video data streams flow only through the predetermined terminal device (EP 1K) of the at least one first terminal device (EP 1K, EP2, EP 3).

5. The method according to claim 1 or 2, characterized in that a predetermined terminal device of the at least one first terminal device receives audio and/or video data streams from only a subset of the other terminal devices and mixes and/or selects these audio and/or video data streams.

6. The method according to one of claims 3 to 5, characterized in that the predetermined criterion comprises: the other terminal devices from which the audio and/or video data streams are received are coupled to one another in a common local area network (10), i.e. LAN.

7. The method according to one of the preceding claims, characterized in that each terminal device (EP 1K, EP2, EP 3) has to register (S10, S20, S28) at the central conference control device (K-A) in order to participate in a conference; and a predetermined terminal device (EP 1K) of the at least one first terminal device transmits to the central conference control device (K-A) the information that this terminal device can receive audio and/or video data streams and

(a) mix the audio and/or video data streams and/or (b) select from the audio and/or video data streams; and the predetermined criterion comprises: this information has already been transmitted.

8. A computer program product for providing or extending an application, in particular a browser application, on a first data processing device (EP 1K), the computer program product being designed to:

- give the first data processing device (EP 1K) the capability to: receive audio and/or video data from one or more second data processing devices (EP 2, EP 3); mix and/or select the audio and/or video data received from the second data processing devices together with audio and/or video data that are provided by the first data processing device itself and/or received from other second data processing devices; and forward the mixed and/or selected audio and/or video data to at least one second data processing device (EP 2, EP 3).

9. A computer program product for providing or extending an application, in particular a browser application, on a second data processing device, the computer program product being designed to:

- give the second data processing device (EP 2, EP 3) the capability to: exchange control signals with a central data processing device (K-A) and transmit audio and/or video signals to a first data processing device (EP 1K) outside the central data processing device (K-A).

10. A computer program product for providing or extending an application, in particular a multi-party conferencing application, on a third data processing device, the computer program product being designed to:

- give the third data processing device the capability to: obtain information from a first data processing device (EP 1K) stating that said first data processing device is capable of receiving audio and/or video data from second data processing devices (EP 2, EP 3), of mixing and/or selecting said audio and/or video data, and of forwarding the mixed and/or selected audio and/or video data to at least one second data processing device (EP 2, EP 3), wherein the third data processing device (K-A), upon obtaining such information, transfers the task of such mixing and/or selecting to the first data processing device (EP 1K).

Technical Field

The invention relates to a method for conducting an audio and/or video conference, wherein a plurality of terminal devices are coupled to a central conference control device via a data network. At least one predetermined terminal device of the terminal devices comprises a data processing device which enables the terminal device to participate in the conference by running an application, in particular a browser application. Typically, such a predetermined terminal device is a personal computer with an embedded (or connected) camera and microphone as well as a screen and loudspeaker. The data processing device can likewise be a correspondingly equipped tablet computer, smartphone or similar portable terminal device. With the help of the browser, the personal computer (or other device) is able to participate in a conference: images are captured with the camera and speech is picked up with the microphone. The image of the user and/or the images of the other participants are displayed on the screen, and speech output takes place via the loudspeaker.

Background

To date, it has been common for a plurality of participants (clients) to register by means of their browsers on a central conference control device, which is usually located in the cloud. This is illustrated by way of example with reference to fig. 1.

Fig. 1 shows an enterprise network 10 in which the individual clients EP1, EP2 and EP3 (Webrtc browsers) jointly wish to conduct a multiparty conference and for this purpose register in the public cloud 12 with a central conference control device K-A, where a conference application runs under Webrtc. Registration takes place via the signaling paths shown as solid lines in fig. 1, i.e. the signaling runs from the respective client or browser EP1, EP2, EP3 to the conferencing application K-A. The conferencing application K-A then provides the resource through which the audio and/or video conference can take place. This resource is called a media server, in part also a media node, and is abbreviated "MS" in fig. 1. The individual audio and video data packets are transmitted via the data lines 16 drawn with dashed lines to the media server MS, mixed by the media server and sent back to the individual clients in order to provide the audio and/or video conference data. Instead of mixing, it is also possible to "select", i.e. to choose from the received audio or video data channels and forward the selected data streams to the respective clients (such a resource is then called a Selective Forwarding Unit, SFU), so that the data streams are arranged in the clients themselves. For example, the currently speaking participant may be displayed, while participants who remain silent are not selected but filtered out.
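The prior-art flow of fig. 1 can be sketched in JavaScript roughly as follows. This is an illustrative sketch only: the signaling helper `sendToConferenceApp` (standing for the signaling path towards K-A) and the `video#mix` element are assumptions, and ICE candidate exchange is omitted.

```javascript
// Sketch of the centralized architecture of fig. 1: each client sends its media
// over the payload path 16 to the central media server MS and plays back the mix
// it receives (an SFU would instead forward selected individual streams).
async function joinCentralConference(sendToConferenceApp) {
  const pc = new RTCPeerConnection();
  const local = await navigator.mediaDevices.getUserMedia({ audio: true, video: true });
  local.getTracks().forEach(track => pc.addTrack(track, local));

  pc.ontrack = ev => {
    document.querySelector('video#mix').srcObject = ev.streams[0]; // mixed stream from MS
  };

  const offer = await pc.createOffer();
  await pc.setLocalDescription(offer);
  const answer = await sendToConferenceApp({ type: 'join', sdp: offer.sdp }); // signaling path
  await pc.setRemoteDescription({ type: 'answer', sdp: answer.sdp });
}
```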

The disadvantage of the procedure according to fig. 1 is that the audio and video data leave the enterprise network 10 for the purpose of the multi-party conference. The security of the data exchange may thereby be compromised. Furthermore, more resources are required for the data transmission than would be needed if the data were transmitted within the network.

A conference server is known from US 2014/0122600 A1 which takes on the task of mixing audio or video data. Each browser running JavaScript has the corresponding capability to attend the conference, but does not have its own mixer.

EP 1014666 A1 discloses a method for establishing a multipoint connection, i.e. a multiparty conference, between a plurality of terminal devices of an H.323 data communication network, wherein a conference unit causes payload data channels, i.e. audio and video data channels, to be opened between the terminal devices via the conference unit.

EP 1001596 A2 describes a multimedia terminal for telephone calls which is itself capable of multipoint connections.

US 2004/0210637 A1 describes the provision of external conference bridges in addition to other paths.

A paper available since 3 April 2017 at https://webrtchacks.com/web-audio-conference/ describes the principle of a browser assuming the role of a central conference control device in order to conduct a local audio conference. The paper speaks of a "poor man's conference solution" because no server is used. How such an audio conference can actually be set up cannot be learned from this paper.

disclosure of Invention

The object of the present invention is to provide a method for audio and/or video conferencing which is preferably carried out by means of clients having applications, in particular browser applications, and in which the data network is used as efficiently as possible and/or the exchange of audio and/or video data (payload data) is as secure as possible.

In one aspect, this object is achieved by a method according to claim 1; in further aspects, it is achieved by a computer program product according to claim 8, claim 9 or claim 10.

The method according to the invention for audio and/or video conferencing (in which a plurality of terminal devices are coupled to a central conference control device via a data network and in which at least one first terminal device comprises a data processing device which enables the terminal device to participate in a conference by running an application, in particular a browser application) is characterized in that a predetermined terminal device of the at least one first terminal device, at least when a predetermined criterion is fulfilled, receives audio and/or video data streams (i.e. payload data streams) from the other terminal devices on the basis of control commands (signaling) of the central conference control device, and

a) mixes the audio and/or video data streams (i.e. payload data streams), in particular together with the audio and/or video data streams generated by the first terminal device itself, by means of the application, in particular a browser application, and transmits them in mixed form to the other terminal devices, and/or

b) transmits a selection of the received audio and/or video data and of the audio and/or video data generated by the at least one first terminal device to the other terminal devices.

The invention is thus based on the following principle: instead of using a (centralized or distributed) media server outside the clients, as is typically the case in centralized conferences, one of the clients (namely the predetermined terminal device of the at least one first terminal device) itself acts as media server or media node, which is usually only the case in conferencing methods without a conference server. Data exchange can thus take place between the individual browsers or the associated terminal devices without having to forgo the advantages of a central conference control (with corresponding conference room user experience, user authentication, etc.). If, for example, all terminal devices are within the same enterprise network, security-sensitive audio and/or video data do not have to leave the enterprise network. Otherwise, if audio and video data are transmitted encrypted under the Webrtc standard between the clients and a central Webrtc conference server, the audio packets must be decrypted by the conference server (in the cloud), for example in order to be mixed, before they can be re-encrypted and sent back to the clients. There is therefore often no complete "end-to-end privacy". It follows that, for privacy reasons, it may be advantageous to keep the media data local in the LAN. A further advantage is the significant reduction in the required WAN bandwidth, i.e. the bandwidth between the residence/enterprise network/client and the data center of the cloud service provider of the Webrtc conferencing solution.

Preferably, the application is a Webrtc (Web Real-Time Control) browser application. The application may thus be built on Webrtc technology. A predetermined terminal device of the at least one terminal device then only has to be provided with a suitable application plug-in. Such an application plug-in is not to be confused with a browser plug-in, since Webrtc browsers (e.g. Google Chrome, Mozilla Firefox, etc.) do not, by definition of the W3C (World Wide Web Consortium) Webrtc standard, require browser plug-ins, at least for the underlying Webrtc functionality.
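As an illustration of such an application plug-in (the module name `conference-plugin.js`, the class `ConferencePlugin` and the capability fields are assumptions, not part of the disclosure), the basic Webrtc support could be detected and the plug-in loaded on top of it roughly as follows:

```javascript
// Sketch: the underlying Webrtc functionality needs no browser plug-in; only the
// conferencing extension (application plug-in) is loaded on top of it.
function hasWebrtcSupport() {
  return typeof RTCPeerConnection === 'function' &&
         !!(navigator.mediaDevices && navigator.mediaDevices.getUserMedia);
}

async function loadConferencePlugin() {
  if (!hasWebrtcSupport()) return null; // this browser cannot act as a media node
  const { ConferencePlugin } = await import('./conference-plugin.js'); // assumed module
  return new ConferencePlugin({
    confCaps: {
      modes: ['audio', 'video'],
      types: ['mixing', 'selective-forwarding'],
      codecs: ['G.711', 'OPUS', 'H.264', 'VP8']
    }
  });
}
```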

In one variant, a predetermined terminal device of the at least one first terminal device receives audio and/or video data streams from all other terminal devices and mixes and/or selects these audio and/or video data streams. The entire conference is therefore, with respect to the payload data streams, carried in principle by the predetermined first terminal device. Further preferably, it is provided in this case that all audio and/or video data streams flow only through the predetermined first terminal device, i.e. no parallel payload data streams are provided. In this way, security can be ensured within a closed network (e.g. an enterprise network) under the control of the client or its browser.

In a second variant, a predetermined terminal device of the at least one first terminal device receives audio and/or video data streams from only a subset of the other terminal devices and mixes and/or selects these audio and/or video data streams. In this case, for example, an additional conference may be conducted: those participants at terminal devices within the enterprise network may communicate separately, for example in order to discuss their position towards participants outside the enterprise network with whom they are negotiating. Here too, the security-relevant data remain within the closed network, i.e. the enterprise network.

It is therefore preferably provided that the predetermined criterion (upon fulfillment of which the central conference control device causes a predetermined terminal device of the at least one first terminal device to receive and mix/select in the first place) comprises: the other terminal devices from which audio and/or video data streams are received are coupled to one another in a common local area network (LAN).
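How the central conference control device might evaluate this criterion is sketched below. The registration fields `publicAddress` and `localSubnet` are assumptions chosen purely for illustration (for example, endpoints behind the same NAT that report the same local subnet can be taken to share a LAN).

```javascript
// Sketch: the predetermined criterion - all endpoints that would exchange payload
// data streams are coupled in a common LAN.
function inCommonLan(endpoints) {
  if (endpoints.length < 2) return false;
  const ref = endpoints[0];
  return endpoints.every(ep =>
    ep.publicAddress === ref.publicAddress &&   // same public NAT address
    ep.localSubnet === ref.localSubnet);        // same reported local subnet
}

function meetsPredeterminedCriterion(conference) {
  return inCommonLan(conference.endpoints);
}
```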

In a preferred embodiment of the invention, each terminal device must register at the central conference control device in order to participate in a conference. The predetermined terminal device transmits (at the time of such registration, or at an earlier first registration) the following information to the central conference control device: the predetermined terminal device is capable of receiving audio and/or video data streams and of a) mixing these audio and/or video data streams and/or b) selecting from these audio and/or video data streams; and the predetermined criterion comprises: this information has already been transmitted. In this way, the central conference control device can generally be designed to mix or select the payload data streams itself. Only when one of the terminal devices with a correspondingly equipped browser reports that it has a suitable browser extension for conferencing is the task transferred to that browser.
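A minimal sketch of this decision logic on the side of the central conference control device, with invented names (`confCaps`, `chooseMediaNode`, and the helper `meetsPredeterminedCriterion` from the previous sketch), might look as follows:

```javascript
// Sketch: mix/select centrally by default; delegate to a browser only if that
// browser has announced conferencing capability and the predetermined criterion holds.
function chooseMediaNode(conference) {
  const capable = conference.endpoints.find(
    ep => ep.confCaps && ep.confCaps.types && ep.confCaps.types.length > 0);
  if (capable && meetsPredeterminedCriterion(conference)) {
    return { role: 'browser-media-node', endpointId: capable.id }; // e.g. EP1K
  }
  return { role: 'central-media-server' }; // fallback: MS in the cloud
}
```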

In a further aspect of the invention, a computer program product for providing or extending (the latter in the form of a plug-in) an application, in particular a browser application, on a first data processing device is provided. The computer program product is designed to give the first data processing device the capability to receive audio and/or video data from at least one second data processing device, to mix and/or select the audio and/or video data received from the second data processing device together with audio and/or video data provided by the first data processing device itself and/or received from other second data processing devices, and to forward the mixed and/or selected audio and/or video data to the at least one second data processing device. The computer program product thus brings about the operation of a browser with the above-described properties, i.e. the task of mixing audio and/or video data and/or of selecting from audio and/or video data is taken over from the central conference control device. In particular, the browser application can, by means of the computer program product, signal to the central conference control device that it has conferencing capabilities.

In a further aspect of the invention, a computer program product for providing or extending (the latter in the form of a plug-in) an application, in particular a browser application, on a second data processing device is provided, which computer program product is designed to give the second data processing device the capability to exchange control signals with a central data processing device and to transmit audio and/or video signals to a first data processing device outside the central data processing device. The computer program product thus allows a browser application which does not itself have conferencing capability to make use, under the control of the central data processing device, of a browser application which does have that capability.

In a further aspect, the invention provides a computer program product for providing or extending (the latter in the form of a plug-in) an application, in particular a multi-party conferencing application, on a third data processing device, which computer program product is designed to give the third data processing device the capability to obtain information from a first data processing device stating that the first data processing device is able to receive audio and/or video data from second data processing devices, to mix and/or select these audio and/or video data, and to forward the mixed and/or selected audio and/or video data to the other data processing devices, wherein the third data processing device, upon obtaining such information, transfers the task of such mixing and/or selecting to the first data processing device which sent the information.

Drawings

Preferred embodiments of the present invention are further described hereinafter with reference to the accompanying drawings, in which

Fig. 1 illustrates an apparatus for conducting a conference, which apparatus implements a method according to the prior art;

Fig. 2 illustrates an apparatus for conducting a conference, which apparatus implements a method according to the invention;

Fig. 3 illustrates details regarding a Webrtc browser with conferencing capabilities and the data flow from and to the browser for use in the method according to the invention;

Fig. 4 shows a flow chart of the exchange of messages according to the method of the invention.

Detailed Description

In the method according to the invention illustrated with reference to fig. 2, a Webrtc browser EP1K provided with conferencing capability (with conference resources) is used instead of a usual Webrtc browser. The conference resources are controlled by the central application (conference control application) of the cloud Webrtc solution. The other browsers EP2 and EP3 are likewise present in the enterprise network 10. The conference control application K-A is located in the cloud (data center) 12. As in the prior art, there are signaling paths 14 between the respective browsers EP1K, EP2 and EP3 and the conferencing application K-A. However, these signaling paths are not supplemented by corresponding payload data paths outside the enterprise network. In particular, no media server MS, or at least no central media server, is required for the described scenario. Instead, the browser EP1K assumes the role of a media server or media node: the payload data streams (audio and/or video data) are transmitted from the respective other browsers EP2 and EP3 via the signal path 20 to the conference-capable browser EP1K, are mixed there and are sent back again via the same path in mixed form. Alternatively or additionally to mixing, provision may be made for a selection to be carried out by the conference-capable browser EP1K: for example, the image of a participant is presented on the browser whenever that participant happens to be speaking, and is otherwise not presented. The advantage of the approach according to fig. 2 is that the payload data streams remain within the enterprise network 10, which results in higher data security. In addition, less bandwidth is required between the enterprise network 10 and the cloud data center 12.

The structure of the conference-capable browser EP1K is described in more detail below with reference to fig. 3.

First, the conference-capable browser has a client application, for example a JavaScript client application 22, which gives it the ability to communicate with the central conference control application K-A; the browser can thus assume the role of a client. As a plug-in of, or embedded in, this client application, a conference client application 24 is provided, for example likewise in JavaScript, which gives the client the ability to signal that it has conferencing capability. The client application 22 is generally responsible for the Webrtc client signaling with the conference application K-A (see reference sign "SC", "Signaling Client"), while the further application 24 implements the additional signaling of the conference client ("SKC"), which, for example, also accepts multiparty conference instructions of the central conference control device (media server role in EP 1K).
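The split between the client signaling SC (application 22) and the additional conference-client signaling SKC (application 24) could, purely as an illustrative sketch, be carried over a single WebSocket towards K-A as follows; the address, the `channel` field and the objects `clientApp`/`conferenceClientApp` are assumptions.

```javascript
// Sketch: one WebSocket towards the conference application K-A carries both the
// Webrtc client signaling SC (application 22) and the additional conference-client
// signaling SKC (application 24).
const ws = new WebSocket('wss://conference.example.com/k-a'); // address assumed

function sendSC(msg)  { ws.send(JSON.stringify({ channel: 'SC',  ...msg })); }
function sendSKC(msg) { ws.send(JSON.stringify({ channel: 'SKC', ...msg })); }

ws.onmessage = ev => {
  const msg = JSON.parse(ev.data);
  if (msg.channel === 'SKC') {
    conferenceClientApp.handle(msg); // e.g. instruction to take over the media server role
  } else {
    clientApp.handle(msg);           // ordinary Webrtc client signaling
  }
};
```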

The browser 25 comprises a network interface, in particular a Webrtc interface 26 (Webrtc API, Web Real-Time Control Application Programming Interface). The browser comprises a unit 28 for managing sessions and conferences ("Session Management and Conference Management"). In addition, the browser requires a corresponding unit 30 for encoding and decoding audio (a "Voice Engine" with corresponding codecs) and a corresponding unit 32 for the video data (a "Video Engine" with corresponding codecs). Exemplary codecs are G.711, Opus, H.264, VP8 and so on. There is also a unit 34 for mixing or for switching (selection, forwarding). The transport interface 36 (German: Transportschnittstelle) is responsible for forwarding the data, and there is a client/browser OS interface towards the other browsers EP2 and EP3.
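For the audio part of unit 34, the mix can be produced inside the browser with the standard Web Audio API; the following minimal sketch assumes only that the incoming payload streams carry audio tracks (video mixing, e.g. onto a canvas, is omitted).

```javascript
// Sketch: the audio mixing of unit 34 built from Web Audio API nodes.
const audioCtx = new AudioContext();
const mixBus = audioCtx.createMediaStreamDestination(); // output of unit 34

function addToMix(mediaStream) {
  // feed a payload stream (N1, N2, N3, ...) into the mix
  audioCtx.createMediaStreamSource(mediaStream).connect(mixBus);
}

// mixBus.stream is a MediaStream whose audio track carries the mixed signal and
// can be handed to the transport interface 36 via RTCPeerConnection.addTrack().
```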

The browsers EP2 and EP3 likewise exchange corresponding signaling SC with the conference application K-A, where EP denotes "endpoint", i.e. a telecommunications participant. By means of the signaling SKC, the conference application determines that the browser EP1K is to handle the conference media. In response, the browsers EP2 and EP3 send their payload data N2 and N3 to the unit 34, where these payload data N2 and N3 are mixed with the corresponding payload data N1 generated by the units 30 and 32, or a selection is made from them; the mixed or selected ("switched") data are sent back as data M2 and M3 or SW2 and SW3. The signal path 20 shown in fig. 2 is here divided into two paths: a path 20a from the browsers EP2 and EP3 towards the unit 34, and a path 20b away from the unit 34.
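The "switching" alternative of unit 34 (selection instead of mixing) can be sketched as follows; `RTCRtpSender.replaceTrack` is a standard Webrtc call, while the surrounding function and variable names are assumptions.

```javascript
// Sketch: selection ("switching") in unit 34 - forward only the track of the
// currently active speaker over the paths 20b instead of a mix.
function forwardSelectedTrack(activeSpeakerTrack, outgoingPeerConnections) {
  for (const pc of outgoingPeerConnections) { // peer connections towards EP2, EP3, ...
    const sender = pc.getSenders().find(s => s.track && s.track.kind === 'audio');
    if (sender) sender.replaceTrack(activeSpeakerTrack); // returns a Promise
  }
}
```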

In detail, the exchange of data may be performed as set forth subsequently:

Fig. 4 shows the corresponding signals:

In fig. 4, first, in step S10, a request to join or start a conference room is sent by the browser EP1K to the conference control application K-A (see "EP1K Join Request"). Part of the content of the signaling in step S10 is, for example, the IP address and port of the client EP1K, the supported codecs, and further conference-related capabilities of client 1 ("ConfCaps" = Conference Capabilities). Based on this signaling, the conference application K-A recognizes that the client/browser EP1K has the capability to support local browser conferencing, including the detailed conference capabilities. Examples of such detailed conference capabilities (ConfCaps) contained in the signaling are: 1) conference modality (audio conference, video conference, screen-sharing conference (mostly video)), 2) conference type (conference mixing, selective forwarding unit, ...), supported codecs (G.711, OPUS, H.264, VP8, VP9, ...), and conference credentials for authentication.
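Serialized, the content of the signaling in step S10 could look roughly like this (the field names are assumptions; the listed values follow the examples given above, and `sendSC` refers to the signaling sketch further above):

```javascript
// Sketch of the "EP1K Join Request" of step S10 with conference capabilities.
sendSC({
  type: 'JoinRequest',
  endpoint: 'EP1K',
  transport: { ip: '10.0.0.17', port: 50000 },           // example values only
  codecs: ['G.711', 'OPUS', 'H.264', 'VP8', 'VP9'],
  confCaps: {
    modes: ['audio', 'video', 'screenshare'],             // conference modality
    types: ['mixing', 'selective-forwarding'],            // conference type
    credentials: { token: 'example-conference-token' }    // conference credentials
  }
});
```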

In the next step S12, the conferencing application K-A requests the corresponding media resource provided in EP1K from the browser EP1K (in its role as "media node") (see "Conf Create Request" in fig. 4). In step S12, the IP address and port of client 1 are signaled to the browser media resource. In step S14, the browser EP1K (in its role as "media server") replies with the confirmation that it has provided the desired media resource(s) and, within the framework of step S14, signals the IP address/port of the Webrtc browser conference resource. For this purpose, fig. 4 shows the instruction "Conf Create Confirm", which contains the information, sent to the conference application K-A, that the browser EP1K has started a media resource or is maintaining a conference resource. Then, in step S16, the conference control application K-A confirms to the browser EP1K (in its role as "client/Webrtc browser") that a media resource has been created to which the client EP1K can send ("EP1K Join Confirm"). Thus, as shown in step S18, the application conference room is active, with the possibility for the participant (EP 1K) and further Webrtc browsers to join the conference.
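On the side of the browser EP1K, the handling of steps S12 and S14 by the conference client application 24 might be sketched as follows; the handler name and the helper `startLocalConferenceResource` are assumptions.

```javascript
// Sketch: EP1K in its media-node role answers "Conf Create Request" (S12) with
// "Conf Create Confirm" (S14), announcing the address of its conference resource.
async function onSkcMessage(msg) {
  if (msg.type === 'ConfCreateRequest') {
    const resource = await startLocalConferenceResource(); // sets up unit 34 (assumed helper)
    sendSKC({
      type: 'ConfCreateConfirm',
      resource: { ip: resource.ip, port: resource.port }   // Webrtc browser conference resource
    });
  }
}
```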

In step S20, the browser EP2 asks whether it can join the conference ("EP2 Join Request"). Next, in step S22, the browser EP1K (in its role as "media server") receives from the conference control device the instruction to add EP2 to the conference ("Conf Add Request") and, for its part, confirms this in step S24 ("Conf Add Confirm"). Then, in step S26, the conference application K-A sends EP2 the confirmation that it has been joined to the conference with EP1K. Analogously to steps S10, S12, S14 and S16, in which receive IP addresses and ports are exchanged between the client EP1K and the conference resource (in the client EP1K), receive IP addresses and ports are also exchanged between EP2 and the conference resource (in EP1K) in steps S20, S22, S24 and S26.

The corresponding steps S28, S30, S32 and S34 are carried out for the third browser EP3. Then, after the end of step S26 or S34, the browsers EP2 and EP3 send their audio and video data to the conference resource of the browser EP1K in steps S36 and S38. Arrow S40 illustrates that the browser EP1K (in its role as client) also uses its own audio and video data, captured with the assigned microphone or the assigned camera, for the mixing. Here, the client EP1K does not have to send its locally generated media data to the media resource of the same EP1K via the LAN interface (IP address); this can be done inside the browser. In the mixing step S42, the received payload data are mixed such that the resulting data can then be output in step S44 on the browser EP1K itself, or transmitted to the browsers EP2 and EP3 in steps S46 and S48 and output there.
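Step S40, i.e. feeding EP1K's own audio into the mix without a round trip over the LAN interface, can be illustrated with the mixing sketch from above; `addToMix` and `mixBus` refer to that sketch and are assumptions.

```javascript
// Sketch: S40/S42 - EP1K's own payload data N1 go straight into unit 34 inside
// the browser; S44 - local output of the mix on EP1K itself.
async function addOwnMediaToMix() {
  const ownMedia = await navigator.mediaDevices.getUserMedia({ audio: true, video: true });
  addToMix(ownMedia);                 // no LAN interface involved for the local data
  const monitor = new Audio();
  monitor.srcObject = mixBus.stream;  // mixed signal played back locally (S44)
  await monitor.play();
}
```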

The browsers shown in the figures so far are all conference participants. The invention can also be applied when a conference already exists and only a subset of the participants in the enterprise network want to hold an additional conference. In this case, the requesting user requires a corresponding authorization and a (GUI) operating element (e.g. "Split Local Conference") on the user's browser client, which is to be sent via the central conference control application together with the corresponding payload reconfiguration commands.
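The GUI operating element mentioned above could trigger the corresponding request roughly as follows; the element id, message name and participant list are assumptions (`sendSC` again refers to the signaling sketch above).

```javascript
// Sketch: the "Split Local Conference" button asks the central conference control
// application to reconfigure the payload paths for the local subset of participants.
document.querySelector('#split-local-conference').addEventListener('click', () => {
  sendSC({
    type: 'SplitLocalConferenceRequest',
    participants: ['EP1K', 'EP2', 'EP3'] // subset inside the enterprise network
  });
});
```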
