Audio spatialization and enhancement between multiple head-mounted devices

Document No.: 1967158    Publication date: 2021-12-14

Abstract: This technology, Audio spatialization and enhancement between multiple head-mounted devices, was created by William Owen Brimijoin II, Andrew Lovitt, and Philip Robinson on 2020-05-05. A shared communication channel allows audio content to be transmitted and received between multiple users. Each user is associated with a headset configured to transmit audio data to, and receive audio data from, the other users' headsets. After the first user's headset receives audio data corresponding to a second user, the headset spatializes the audio data based on the relative positions of the first and second users, so that when the audio data is presented to the first user its sound appears to originate from the position corresponding to the second user. The headset enhances the audio data based on the deviation between the second user's position and the first user's gaze direction, allowing the first user to hear more clearly the audio data from the other users they are focusing on.

1. A head-mounted device, comprising:

gaze determination circuitry configured to determine a gaze direction of a first user of the headset;

a transceiver configured to receive an audio signal associated with a headset of a second user;

a processing circuit configured to:

determine a relative position associated with the second user with respect to the first user;

determine a deviation of a position of the second user from an enhancement direction of the first user, wherein the enhancement direction is based at least in part on the gaze direction of the first user;

spatialize the audio signal associated with the second user based at least in part on the relative position associated with the second user; and

enhance an amplitude of an audio output signal based at least in part on the identified deviation of the position of the second user relative to the enhancement direction of the first user; and

a speaker assembly configured to project sound based on the spatialized and enhanced audio output signal such that the projected sound is perceived to originate from the position of the second user.

2. The headset of claim 1, further comprising a microphone array comprising a plurality of microphones arranged in a plurality of different locations, the microphone array configured to capture sound in a local area of the first user and generate an audio input signal.

3. The headset of claim 2, wherein the processing circuitry is further configured to:

analyze the audio input signal to identify sound originating from a particular region of the local area of the first user; and

generate a user audio signal from the audio input signal by enhancing a portion of the audio input signal corresponding to sound originating from the particular region.

4. The headset of claim 3, wherein the particular region corresponds to the first user's mouth.

5. The headset of claim 1, wherein the transceiver is further configured to receive location information of the second user.

6. The headset of claim 1, further comprising an antenna array configured to determine the relative position associated with the second user with respect to the first user.

7. The headset of claim 1, wherein the processing circuit is further configured to spatialize the audio output signal based on whether a line of sight exists between the first user and the second user.

8. The headset of claim 1, wherein the gaze determination circuit is configured to:

receive a position of the first user, the position comprising at least a head orientation of the first user; and

determine a relative orientation of the first user's eyes with respect to the first user's head; and

wherein spatializing the audio output signal associated with the second user is based on a relative direction of a position of the second user with respect to a head orientation of the first user.

9. The headset of claim 1, wherein the transceiver is further configured to receive a second audio signal from a third user, and the processing circuit is further configured to:

identify a relative position associated with the third user with respect to the first user;

determine a deviation of the identified relative position of the third user from the enhancement direction of the first user;

compare the identified deviation in the relative position of the third user to the identified deviation in the relative position of the second user; and

enhance, based on a result of the comparison, an amplitude of the second audio signal associated with the third user.

10. A method, comprising:

determining, at a head mounted device of a first user, an enhancement direction of the first user;

receiving, at a first user's headset, an audio signal associated with a second user's headset;

identifying a relative position associated with the second user with respect to the first user;

determining a deviation of the identified relative position of the second user from the enhancement direction of the first user;

spatializing an audio signal associated with the second user based at least in part on the relative position associated with the second user; and

enhancing the amplitude of an audio output signal based at least in part on the identified deviation of the position of the second user relative to the enhancement direction of the first user; and

projecting sound based on the spatialized and enhanced audio output signal such that the projected sound is perceived to originate from the location of the second user.

11. The method of claim 10, further comprising capturing sound in a local area of the first user and generating an audio input signal using a microphone array comprising a plurality of microphones arranged in a plurality of different locations.

12. The method of claim 11, further comprising:

analyzing the audio input signal to identify sound originating from a particular region of the local area of the first user; and

generating a user audio signal from the audio input signal by enhancing a portion of the audio input signal corresponding to sound originating from the particular region.

13. The method of claim 12, wherein the particular region corresponds to the first user's mouth.

14. The method of claim 10, further comprising receiving location information of the second user.

15. The method of claim 10, further comprising receiving signals from a headset of the second user at an antenna array and determining a relative position associated with the second user with respect to the first user based on the received signals.

16. The method of claim 10, wherein spatializing the audio output signal is based on whether a line of sight exists between the first user and the second user.

17. The method of claim 10, wherein determining the enhancement direction of the first user comprises determining the first user's gaze direction by:

receiving a position of the first user, the position comprising at least a head orientation of the first user;

determining a relative orientation of the first user's eyes with respect to the first user's head; and

determining the gaze direction based on the head orientation and the relative orientation of the first user's eyes with respect to the first user's head; and

wherein spatializing the audio output signal associated with the second user is based on a relative direction of the position of the second user with respect to the orientation of the first user.

18. The method of claim 10, further comprising:

receiving a second audio signal from a third user;

identifying a relative position associated with the third user with respect to the first user;

determining a deviation of the identified relative position of the third user from the enhancement direction of the first user;

comparing the identified deviation in the relative position of the third user to the identified deviation in the relative position of the second user; and

based on a result of the comparison, enhancing an amplitude of the second audio signal associated with the third user.

19. A non-transitory computer-readable medium storing instructions that, when executed by one or more processors, cause the one or more processors to perform operations comprising:

determining, at a head mounted device of a first user, an enhancement direction of the first user;

receiving, at a first user's headset, an audio signal associated with a second user's headset;

identifying a relative position associated with the second user with respect to the first user;

determining a deviation of the identified relative position of the second user from the enhancement direction of the first user;

spatializing an audio signal associated with the second user based at least in part on the relative position associated with the second user; and

enhancing the amplitude of an audio output signal based at least in part on the identified deviation of the position of the second user relative to the enhancement direction of the first user; and

projecting sound based on the spatialized and enhanced audio output signal such that the projected sound is perceived to originate from the location of the second user.

20. The non-transitory computer-readable medium of claim 19, wherein determining the enhancement direction of the first user comprises determining the first user's gaze direction by:

receiving a position of the first user, the position comprising at least a head orientation of the first user;

determining a relative orientation of the first user's eyes with respect to the first user's head; and

determining the gaze direction based on the head orientation and the relative orientation of the first user's eyes with respect to the first user's head; and

wherein spatializing the audio output signal associated with the second user is based on a relative direction of the position of the second user with respect to the orientation of the first user.

Background

The present disclosure relates generally to audio communication between users on a shared communication channel, and in particular to spatialization and enhancement of audio signals transmitted between a plurality of different users of the shared communication channel.

In an environment with multiple sound sources, a listener may have trouble attending to a particular sound source while tuning out the others. For example, in a busy room where multiple people are speaking at the same time, it is difficult for a listener to distinguish the speech of a particular speaker from that of the other speakers in the room. This phenomenon is known as the cocktail party problem. In some cases, different sound sources, such as speakers, may have microphones that record their voices, and the recordings are transmitted to the listener for playback. However, it may still be difficult for the listener to tell the sound sources apart, especially when there are a large number of them, or to switch attention between different sound sources.

SUMMARY

Embodiments relate to establishing a shared communication channel between multiple users for transmitting and receiving audio content. Each user is associated with a headset configured to transmit and receive audio data to and from the other user's headsets. The first user's headset, in response to receiving audio data corresponding to the second user, spatializes the audio data based on the relative positions of the first user and the second user such that the audio data presented to the first user appears to originate from a position corresponding to the second user. The headset may also enhance the audio data based on the deviation between the location of the second user and the enhancement direction (e.g., the first user's gaze direction), allowing the first user to hear more clearly the audio data from the other users that they are focusing on.

In some embodiments, a head-mounted device is described. The headset includes a gaze determination system configured to determine a gaze direction of a first user wearing the headset. The headset also includes a receiver configured to receive audio data associated with a second user, the audio data including an audio output signal. The headset also includes processing circuitry configured to identify a relative position associated with the second user with respect to the first user, and determine a deviation of the identified relative position of the second user with respect to a gaze direction of the first user. The processing circuit spatializes the audio output signal associated with the second user based on the relative position associated with the second user. In response to the identified deviation of the location of the second user relative to the gaze direction of the first user being within a threshold amount, the processing circuitry may further enhance the amplitude of the audio output signal based on the deviation. The head-mounted device also includes an audio output interface configured to send the spatialized and enhanced audio output signals to the one or more speakers to produce an output sound for presentation to the first user such that the output sound is perceived to originate from the location of the second user.
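As a rough, non-authoritative sketch of the deviation test and amplitude enhancement described above, the following Python fragment assumes vectors expressed in a shared world frame, an illustrative 20-degree threshold, and a linear gain law; none of these specifics are taken from this disclosure:

```python
import numpy as np

def deviation_and_gain(talker_pos, listener_pos, gaze_dir,
                       threshold_rad=np.deg2rad(20.0), max_gain=2.0):
    """Illustrative sketch: compute the talker's direction, its angular
    deviation from the listener's gaze, and an enhancement gain.
    The threshold and gain law are assumed values, not patented ones."""
    rel = np.asarray(talker_pos, dtype=float) - np.asarray(listener_pos, dtype=float)
    talker_dir = rel / np.linalg.norm(rel)

    gaze = np.asarray(gaze_dir, dtype=float)
    gaze = gaze / np.linalg.norm(gaze)

    # Angular deviation between the enhancement (gaze) direction and the talker.
    deviation = float(np.arccos(np.clip(np.dot(gaze, talker_dir), -1.0, 1.0)))

    # Boost amplitude only when the talker lies within the threshold of gaze.
    gain = 1.0
    if deviation < threshold_rad:
        gain = 1.0 + (max_gain - 1.0) * (1.0 - deviation / threshold_rad)
    return talker_dir, deviation, gain

# Example: a talker 10 degrees off the gaze direction receives a gain of ~1.5.
```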

The method may be performed by an audio system, for example an audio system that is part of a head-mounted device (e.g., a near-eye display or a head-mounted display). The audio system includes a microphone assembly, a transceiver, a controller, and a speaker assembly (e.g., a speaker array).

In particular, embodiments according to the invention are disclosed in the appended claims relating to head-mounted devices, methods and storage media, wherein any feature mentioned in one claim category (e.g. head-mounted devices) may also be claimed in another claim category (e.g. methods, storage media, systems and computer program products). The dependencies or references back in the appended claims are chosen for formal reasons only. However, any subject matter resulting from an intentional back-referencing of any previous claim (especially multiple dependencies) may also be claimed, such that any combination of a claim and its features is disclosed and may be claimed regardless of the dependency selected in the appended claims. The claimed subject matter comprises not only the combination of features set out in the appended claims, but also any other combination of features in the claims, wherein each feature mentioned in the claims may be combined with any other feature or combination of other features in the claims. Furthermore, any embodiments and features described or depicted herein may be claimed in a separate claim and/or in any combination with any embodiment or feature described or depicted herein or with any feature of the appended claims.

In one embodiment, a head-mounted device may include:

a gaze determination circuit configured to determine a gaze direction of a first user of a headset;

a transceiver configured to receive an audio signal associated with a headset of a second user;

a processing circuit configured to:

determine a relative position associated with the second user with respect to the first user;

determine a deviation of the position of the second user from an enhancement direction of the first user, wherein the enhancement direction is based at least in part on the gaze direction of the first user;

spatialize the audio signal associated with the second user based at least in part on the relative position associated with the second user; and

enhance the amplitude of the audio output signal based at least in part on the identified deviation of the position of the second user relative to the enhancement direction of the first user; and

a speaker assembly configured to project sound based on the spatialized and enhanced audio output signal such that the projected sound is perceived to originate from the location of the second user.

In one embodiment, a headset may include a microphone array including a plurality of microphones arranged in a plurality of different locations, the microphone array may be configured to capture sound in a localized area of a first user and generate an audio input signal.

The processing circuitry may be configured to:

analyze the audio input signal to identify sound originating from a particular region of the local area of the first user; and

generate a user audio signal from the audio input signal by enhancing a portion of the audio input signal corresponding to sound originating from the particular region.

The particular region may correspond to a mouth of the first user.

The transceiver may be configured to receive location information of a second user.

In one embodiment, the head-mounted device may include an antenna array configured to determine a relative position associated with the second user with respect to the first user.

The processing circuit may be configured to spatialize the audio output signal based on whether there is a line of sight between the first user and the second user.

The gaze determination circuitry may be configured to:

receive a position of the first user, the position including at least a head orientation of the first user; and

determine a relative orientation of the first user's eyes with respect to the first user's head; and

wherein spatializing the audio output signal associated with the second user is based on a relative direction of a position of the second user with respect to a head orientation of the first user.

The transceiver may be configured to receive a second audio signal from a third user, and the processing circuitry may be configured to:

identify a relative position associated with the third user with respect to the first user;

determine a deviation of the identified relative position of the third user from the enhancement direction of the first user;

compare the identified deviation of the relative position of the third user with the identified deviation of the relative position of the second user; and

enhance, based on a result of the comparison, the amplitude of the second audio signal associated with the third user.

In an embodiment, a method may include:

determining, at a head mounted device of a first user, an enhancement direction of the first user;

receiving, at a first user's headset, an audio signal associated with a second user's headset;

identifying a relative position associated with the second user with respect to the first user;

determining a deviation of the identified relative position of the second user from the enhancement direction of the first user;

spatializing the audio signal associated with the second user based at least in part on the relative position associated with the second user; and

enhancing the amplitude of the audio output signal based at least in part on the identified deviation of the position of the second user relative to the enhancement direction of the first user; and

projecting sound based on the spatialized and enhanced audio output signal such that the projected sound is perceived to originate from the location of the second user.

In one embodiment, a method may include capturing sound in a local area of a first user using a microphone array including a plurality of microphones arranged in a plurality of different locations and generating an audio input signal.

In one embodiment, a method may comprise:

analyzing the audio input signal to identify sound originating from a particular region of the local area of the first user; and

generating a user audio signal from the audio input signal by enhancing a portion of the audio input signal corresponding to sound originating from the particular region.

The particular region may correspond to a mouth of the first user.

In one embodiment, a method may include receiving location information of a second user.

In one embodiment, a method may comprise: at the antenna array, a signal from a headset of a second user is received, and based on the received signal, a relative position associated with the second user with respect to the first user is determined.

The spatialized audio output signal may be based on whether there is a line of sight between the first user and the second user.

Determining the enhancement direction of the first user may include determining a gaze direction of the first user by:

receiving a position of a first user, the position including at least a head orientation of the first user;

determining a relative orientation of the first user's eyes with respect to the first user's head; and

determining the gaze direction based on the head orientation and the relative orientation of the first user's eyes with respect to the first user's head; and

wherein spatializing the audio output signal associated with the second user is based on a relative direction of the position of the second user with respect to the orientation of the first user.

In one embodiment, a method may comprise:

receiving a second audio signal from a third user;

identifying a relative position associated with a third user with respect to the first user;

determining a deviation of the identified relative position of the third user from the enhancement direction of the first user;

comparing the identified deviation of the relative position of the third user with the identified deviation of the relative position of the second user; and

enhancing, based on a result of the comparison, the amplitude of the second audio signal associated with the third user.

In one embodiment, a non-transitory computer-readable medium may store instructions that, when executed by one or more processors, cause the one or more processors to perform operations comprising:

determining, at a head mounted device of a first user, an enhancement direction of the first user;

receiving, at a first user's headset, an audio signal associated with a second user's headset;

identifying a relative position associated with the second user with respect to the first user;

determining a deviation of the identified relative position of the second user from the enhancement direction of the first user;

spatializing the audio signal associated with the second user based at least in part on the relative position associated with the second user; and

enhancing the amplitude of the audio output signal based at least in part on the identified deviation of the position of the second user relative to the enhancement direction of the first user; and

projecting sound based on the spatialized and enhanced audio output signal such that the projected sound is perceived to originate from the location of the second user.

Determining the enhancement direction of the first user may include determining a gaze direction of the first user by:

receiving a position of a first user, the position including at least a head orientation of the first user;

determining a relative orientation of the first user's eyes with respect to the first user's head; and

determining the gaze direction based on the head orientation and the relative orientation of the first user's eyes with respect to the first user's head; and

wherein spatializing the audio output signal associated with the second user is based on a relative direction of the position of the second user with respect to the orientation of the first user.

In an embodiment, one or more computer-readable non-transitory storage media may contain software that is operable when executed to perform a method according to or in any of the embodiments described above.

In an embodiment, a system may comprise: one or more processors; and at least one memory coupled to the processor and comprising instructions executable by the processor, the processor being operable when executing the instructions to perform a method according to or in any of the embodiments described above.

In an embodiment, a computer program product, preferably comprising a computer readable non-transitory storage medium, is operable when executed on a data processing system to perform a method according to or in any of the embodiments described above.

Brief Description of Drawings

Fig. 1 illustrates a high-level diagram of an environment in which a system for audio spatialization and enhancement can be used in accordance with one or more embodiments.

Fig. 2 is an example illustrating a head-mounted device including an audio system that may be worn by users sharing a communication channel in accordance with one or more embodiments.

FIG. 3 shows a block diagram of an audio system in accordance with one or more embodiments.

Fig. 4 illustrates an example of an environment with multiple users utilizing a shared communication channel in accordance with one or more embodiments.

Fig. 5 shows a diagram of filtering a user audio signal in accordance with one or more embodiments.

Fig. 6 is a flow diagram of a process for spatializing and enhancing audio data received from other users in a shared communication channel, according to one or more embodiments.

FIG. 7 is a flow diagram of a process for processing an audio signal corresponding to a user's speaking voice in accordance with one or more embodiments.

Fig. 8 is a system environment of a headset including an audio system as described above, in accordance with one or more embodiments.

The figures depict various embodiments of the present invention for purposes of illustration only. One skilled in the art will readily recognize from the following discussion that alternative embodiments of the structures and methods illustrated herein may be employed without departing from the principles of the invention described herein.

Detailed Description

Embodiments relate to a head-mounted device having an audio system configured to receive audio signals from a plurality of audio sources and to play back the received audio signals to a user (e.g., a wearer of the head-mounted device). The audio system spatializes the audio signals received from a particular audio source based on the relative location of that audio source, such that the audio signals played back to the user appear to originate from the location of the audio source. In some embodiments, the audio system enhances the audio signals received from an audio source based on the location of the audio source and the enhancement direction (e.g., gaze direction) of the user, in order to emphasize the audio data received from a particular audio source and allow the user to switch their attention between different audio sources.

In some embodiments, a shared communication channel is established between a plurality of users within a local area network. Each user wears a headset that includes a transceiver for communicating with (e.g., transmitting audio signals to and receiving audio signals from) the other users in the shared communication channel. Each headset also includes sensors configured to track the location and gaze direction of its user, which can be used to determine the relative locations of the other users in the shared communication channel and how those locations relate to the user's gaze direction.

The headset processes audio signals received from other users of the shared communication channel based on the relative positions of those users, so that the audio signals, when played back to the user, appear to originate from positions corresponding to the other users. The audio signals are also enhanced based on the user's enhancement direction (which may be based on the user's gaze direction and may be used to infer which other users the user is focusing on), where audio signals from other users at locations aligned with the user's enhancement direction may be more strongly enhanced. For example, the first user receives an audio signal from each other user in the shared communication channel; each audio signal is spatialized to indicate the relative position of that user with respect to the first user, and enhanced based on which other user the first user is currently looking at (e.g., as determined by the gaze direction).
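When several talkers are received at once, the per-talker boost could in principle be derived from how closely each talker aligns with the enhancement direction. The following is only an illustrative sketch; the exponential weighting and the `sharpness` and `max_gain` parameters are assumptions, not values from this disclosure:

```python
import numpy as np

def enhancement_gains(deviations_rad, sharpness=4.0, max_gain=2.0):
    """Illustrative only: map each talker's angular deviation from the
    enhancement direction to an amplitude gain. The talker best aligned with
    the gaze gets the largest boost; sharpness and max_gain are assumed
    tuning parameters."""
    d = np.asarray(deviations_rad, dtype=float)
    weights = np.exp(-sharpness * d)            # smaller deviation -> larger weight
    weights = weights / weights.max()           # best-aligned talker -> weight 1.0
    return 1.0 + (max_gain - 1.0) * weights     # per-talker gains in [1, max_gain]

# Example: a talker 5 degrees off gaze is boosted ~2x, one 60 degrees off ~1x.
print(enhancement_gains(np.deg2rad([5.0, 60.0])))
```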

The head-mounted device also includes a microphone for recording the user's own voice, which may then be transmitted to the other users' headsets in the shared communication channel. In some embodiments, the user's own voice may also be played back to the user to help the user adjust the volume of their own speaking voice.

Various embodiments may include or be implemented in conjunction with an artificial reality system. Artificial reality is a form of reality that has been adjusted in some manner before presentation to a user, and may include, for example, virtual reality (VR), augmented reality (AR), mixed reality (MR), hybrid reality, or some combination and/or derivative thereof. Artificial reality content may include fully generated content or generated content combined with captured (e.g., real-world) content. The artificial reality content may include video, audio, haptic feedback, or some combination thereof, any of which may be presented in a single channel or in multiple channels (such as stereoscopic video that produces a three-dimensional effect for the viewer). Further, in some embodiments, artificial reality may also be associated with applications, products, accessories, services, or some combination thereof used, for example, to create content in an artificial reality and/or otherwise used in an artificial reality (e.g., to perform activities therein). An artificial reality system that provides artificial reality content may be implemented on a variety of platforms, including a head-mounted display (HMD) connected to a host computer system, a standalone HMD, a mobile device or computing system, or any other hardware platform capable of providing artificial reality content to one or more viewers.

Fig. 1 shows a high-level diagram of an environment including an audio system 115 in accordance with one or more embodiments. The audio system 115 may be integrated as part of the head mounted device 110 that the user 105A may wear.

A user 105A wearing a head mounted device 110 containing an audio system 115 is in an environment proximate to a plurality of other users (users 105B, 105C, 105D, and 105E). Users 105A-E may be collectively referred to as users 105. Users may talk to each other and thus each user may be considered to correspond to an audio source. Furthermore, there may be additional audio sources in the environment. In environments where a large number of audio sources are close to each other, it may be difficult for the user 105A to focus on any particular audio source (e.g., the voice of a particular other user in the environment).

To facilitate conversation between users in the environment, each user may wear a respective headset with a respective audio system. The audio system 115 communicates with the audio systems of the other headsets to receive audio signals corresponding to the other users' voices and to play back those audio signals to the user 105A. This may allow the user 105A to hear the other users' voices more clearly. In addition, the audio system 115 processes the received audio signals so that the audio signals played back to the user 105A are spatialized, such that the played sound is perceived to originate from the location of the corresponding other user. The played-back audio signals may also be enhanced based on which other user the user 105A is currently looking at.

In some embodiments, multiple users may establish a shared communication channel. For example, fig. 1 shows a first shared communication channel 120A with three users, and a second shared communication channel 120B with two users. The shared communication channel 120 may correspond to a particular group of users who wish to talk to each other. For example, the shared communication channel 120 may include multiple users within a particular proximity of each other (e.g., users sitting at the same table). As used herein, a shared communication channel may refer to a grouping of multiple users, each associated with a corresponding audio system, where each user's audio system is capable of communicating with each other user's audio system within the grouping. For example, each of the three users 105A, 105B, and 105C sharing the communication channel 120A has a respective audio system in communication with the others, while each of the two users 105D and 105E sharing the communication channel 120B has a respective audio system in communication with the other.

In some embodiments, the shared communication channel may include one or more remote users. The shared communication channel may include multiple users within a particular geographic area (e.g., corresponding to a particular room, building, etc.). In some embodiments, the geographic area may be defined based on one or more structures (e.g., walls). As used herein, a remote user may correspond to a user participating in a shared communication channel that is located outside of the geographic region corresponding to the channel. For example, the shared communication channel may include a group of users sitting at a common desk, and one or more additional remote users located in different buildings.

Although fig. 1 shows each shared communication channel 120A and 120B corresponding to a different region, in some embodiments, the different shared communication channels cover overlapping regions. For example, users sharing the communication channel 120B may be commingled with users sharing the communication channel 120A within a common area. In some embodiments, a particular user may be part of more than one shared communication channel (e.g., both shared communication channels 120A and 120B).

In some embodiments, the shared communication channel 120 may be established by a group of one or more users through an exchange of information. For example, a first user may join a common shared communication channel with a second user (e.g., based on a head mounted device worn by the second user or a scannable object such as a badge) by scanning information corresponding to the second user (e.g., using their respective head mounted device 110 or other scanning device). In some embodiments, the shared communication channel is implemented as part of a peer-to-peer network established between the headsets of at least the first and second users.

In some embodiments, one or more users 105 access the application server 130 via the network 125. The network may include the internet, a Local Area Network (LAN), a Wide Area Network (WAN), a mobile wired or wireless network, a private network, a virtual private network, or a combination thereof.

The application server 130 contains one or more applications that facilitate communication between the headsets of different users and may correspond to an online system, a local console, or some combination thereof. For example, application server 130 may contain an application that establishes a shared communication channel between two or more users and maintains metadata corresponding to the established shared communication channel. The application server may comprise an online system. Each user may log onto the online system on the application server 130 and indicate one or more other users with whom they wish to communicate. In some embodiments, a connection between two users 105 may be established if both users indicate a desire to communicate with the other user. A shared communication channel may be formed for each group of users, where each user in the group is connected to every other user in the group.

In other embodiments, a first user may establish the shared communication channel 120 and then additional users may join the shared communication channel 120. For example, the first user may provide a password or other type of authentication to each additional user to allow the additional users to join the shared communication channel via the application server 130 (e.g., provide the password to the additional users verbally or in written form, or transmit the password to the additional users' headsets indicated by the first user through the user interface). In some embodiments, the application server 130 maintains the shared communication channel 120 and transmits updates to the headset of each user of the channel regarding the current state of the channel (e.g., when a new user joins the channel, or when an existing user exits the channel). In some embodiments, the application server 130 is configured to maintain information corresponding to the shared communication channel 120 and transmit current status information about the shared communication channel to each user's headset while communication of audio data between the headsets may be performed peer-to-peer.

In some embodiments, the application server 130 comprises a social networking system. The social networking system may maintain a social graph or other data structure indicating relationships (e.g., friendships) between different users. In some embodiments, only users with a particular type of relationship on the social networking system may establish connections with each other to form a shared communication channel. In some embodiments, the social graph maintained by the application server 130 may be used to automatically establish the shared communication channel 120 between multiple users. For example, a group of users all located within a particular geographic area and having a particular type of social network relationship with each other may be automatically included in a shared communication channel.

In some embodiments, some or all of the functions of the application server 130 may be performed by a local console. For example, a local console may be connected to multiple head mounted devices 110 corresponding to different users 105 in a local environment, and may establish and maintain one or more shared communication channels between groups of users in the environment. In some embodiments, one or more head mounted devices 110 may connect to the application server 130 through a local console.

Fig. 2 is an example illustrating a head mounted device 110 including an audio system that may be worn by a user in a shared communication channel in accordance with one or more embodiments. The head mounted device 110 presents media to the user. In one embodiment, the head mounted device 110 may be a Near Eye Display (NED). In another embodiment, the head mounted device 110 may be a Head Mounted Display (HMD). In general, the head-mounted device may be worn on the face of a user (e.g., user 105) to present content (e.g., media content) using one or both lenses 210 of the head-mounted device. However, the head mounted device 110 may also be used such that the media content is presented to the user in a different manner. Examples of media content presented by the head mounted device 110 include one or more images, video, audio, or some combination thereof. The head mounted device 110 includes an audio system and may include a frame 205, a lens 210, a camera assembly 235, a position sensor 240, an eye tracking sensor 245, and a controller 215 for controlling the audio system, as well as other various sensors of the head mounted device 110. Although fig. 2 shows components of the headset 110 in an example location on the headset 110, these components may be located elsewhere on the headset 110, on a peripheral device paired with the headset 110, or some combination of the two locations.

The head mounted device 110 may correct or enhance the vision of the user, protect the eyes of the user, or provide images to the user. The head mounted device 110 may be glasses to correct visual defects of the user. The head-mounted device 110 may be sunglasses that protect the user's eyes from sunlight. The head-mounted device 110 may be safety glasses that protect the user's eyes from impact. The head mounted device 110 may be a night vision device or infrared goggles to enhance the user's vision at night. The head mounted device 110 may be a near-eye display that generates artificial reality content for the user. Alternatively, the headset 110 may not include the lens 210 and may be the frame 205 with an audio system that provides audio content (e.g., music, radio, podcasts) to the user.

The lens 210 provides or transmits light to a user wearing the head-mounted device 110. The lens 210 may be a prescription lens (e.g., single vision, bifocal, and trifocal or progressive lenses) to help correct the user's vision deficiencies. The prescription lens transmits ambient light to the user wearing the head-mounted device 110. The transmitted ambient light may be altered by the prescription lens to correct defects in the user's vision. The lens 210 may be a polarized lens or a colored lens to protect the user's eyes from sunlight. The lens 210 may have one or more waveguides as part of a waveguide display, where the image light is coupled to the user's eye through an end or edge of the waveguide. The lens 210 may include an electronic display for providing image light and may also include an optical block for magnifying the image light from the electronic display.

In some embodiments, the head mounted device 110 includes a camera component 235 that captures visual information about a local area around the head mounted device 110. In some embodiments, camera component 235 corresponds to a depth camera component (DCA) that captures data describing depth information for a local region. In some embodiments, the DCA may include a light projector (e.g., structured light and/or flash illumination for time-of-flight), an imaging device, and a controller. The captured data may be an image of light projected by the light projector to the local area captured by the imaging device. In one embodiment, the DCA may include two or more cameras and a controller, the cameras oriented to capture portions of the local area in a stereoscopic manner. The captured data may be images of local areas captured stereoscopically by two or more cameras. The controller uses the captured data and depth determination techniques (e.g., structured light, time of flight, stereo imaging, etc.) to calculate depth information for the local region. Based on the depth information, the controller 215 may be able to determine absolute position information of the headset 110 within the local area. The DCA may be integrated with the headset 110 or may be located in a local area external to the headset 110. In the latter embodiment, the controller of the DCA may transmit the depth information to the controller 215 of the headset 110.

The position sensor 240 is configured to generate one or more measurement signals and estimate a current position of the headset 110 based on the generated signals. In some embodiments, the current position of the headset 110 is determined relative to an initial position of the headset 110. The estimated position may include a location of the headset 110 and/or an orientation of the headset 110 or of the head of a user wearing the headset 110, or some combination thereof. For example, the orientation may correspond to the position of each ear relative to a reference point. In some embodiments, when the camera component 235 includes a DCA, the position sensor 240 uses depth information and/or absolute position information from the DCA to estimate the current position of the headset 110. The position sensor 240 may include one or more accelerometers to measure translational motion (forward/backward, up/down, left/right) and one or more gyroscopes to measure rotational motion (e.g., pitch, yaw, roll). In some embodiments, the position sensor 240 includes other types of sensors that may be used to detect motion, such as one or more magnetometers.

In some embodiments, the position sensor 240 includes an Inertial Measurement Unit (IMU) that quickly samples the received measurement signals and calculates an estimated position of the headset 110 from the sampled data. For example, the IMU integrates the measurement signals received from the accelerometers over time to estimate a velocity vector and integrates the velocity vector over time to determine an estimated position of a reference point on the headset 110. The reference point is a point that may be used to describe the position of the headset 110. While the reference point may be defined generally as a point in space, in practice the reference point is defined as a point within the headset 110. In some embodiments, the IMU may be implemented as part of the local controller 215 instead of the location sensor 240.
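As a minimal sketch of the dead-reckoning step just described, the fragment below simply double-integrates accelerometer samples; gravity removal, gyroscope-based orientation handling, and drift correction are deliberately omitted, and all names are illustrative:

```python
import numpy as np

def integrate_imu(accel_samples, dt, v0=None, p0=None):
    """Illustrative double integration of accelerometer samples (shape (N, 3))
    into a position estimate. Real IMU pipelines also rotate samples into the
    world frame using the gyroscope, subtract gravity, and correct drift."""
    velocity = np.zeros(3) if v0 is None else np.asarray(v0, dtype=float).copy()
    position = np.zeros(3) if p0 is None else np.asarray(p0, dtype=float).copy()
    for a in np.asarray(accel_samples, dtype=float):
        velocity += a * dt           # first integration: acceleration -> velocity
        position += velocity * dt    # second integration: velocity -> position
    return position, velocity
```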

Eye tracking sensor 245 is configured to provide sensor readings (e.g., captured images of the user's eyes) that may be used to determine the direction of the user's gaze. When wearing the head mounted device 110, the user's eyes may move relative to their head, allowing the user to look in different directions without having to move their head. As such, the user may be looking in a direction other than directly in front relative to the position and orientation of the head mounted device 110 (e.g., as determined by the position sensor 240).

In some embodiments, eye tracking sensor 245 includes one or more sensors configured to determine an orientation of the user's eye. The eye tracking sensor captures and analyzes images of the user's eye to determine the orientation of the eye relative to the head-mounted device 110. In some embodiments, the eye tracking sensor includes one or more light sources and one or more cameras. The one or more light sources illuminate the eye with IR light, for example an infrared flash (e.g., for time-of-flight depth determination), a structured light pattern (e.g., a dot pattern or bar pattern), a blinking pattern, and the like. For example, a light source may be a vertical-cavity surface-emitting laser, a light emitting diode, a micro LED, some other IR source, or some combination thereof. The one or more cameras are configured to capture images of one or both eyes illuminated with the IR light from the one or more light sources. A camera includes an image sensor (e.g., a complementary metal-oxide-semiconductor sensor, a charge-coupled device, etc.) configured to detect light emitted from the one or more light sources. In some embodiments, the camera may also be capable of detecting light in other bands (e.g., the visible band). The eye tracking sensor uses the captured images and depth determination techniques to determine the eye orientation of one or both of the user's eyes. Depth determination techniques may include, for example, structured light, time of flight, stereo imaging, or some other depth determination method familiar to those skilled in the art. In some embodiments, the eye tracking sensor determines the eye orientation based on the captured images and a model of the user's eye.

The eye orientation determined by the eye tracking sensor may be combined with the determined position of the head-mounted device 110 (e.g., the position determined using the position sensor 240) to determine the user's gaze direction. For example, a vector corresponding to the user's eye orientation (which indicates the orientation of the user's eyes relative to their head) may be added to a vector corresponding to the position of the headset (which indicates the position and orientation of the headset within the local environment) to determine a vector corresponding to the user's gaze direction (which indicates the direction the user is looking at in the local environment). By determining the direction of the user's gaze, the direction in which the user is looking in the environment is identified, which may be combined with knowledge of the locations of other users sharing the communication channel, thereby allowing a determination of which other user the user is looking at.
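One common way to combine the two measurements, shown here only as an illustrative sketch rather than the exact computation described above, is to rotate the eye-in-head gaze vector by the headset orientation; the quaternion convention and axis choices below are assumptions:

```python
import numpy as np
from scipy.spatial.transform import Rotation

def gaze_direction_world(head_quat_xyzw, eye_dir_in_head):
    """Illustrative sketch: express the eye tracker's gaze vector in the world
    frame. head_quat_xyzw is the headset orientation from the position sensor;
    eye_dir_in_head is a unit vector in the head frame (conventions assumed,
    e.g. [1, 0, 0] for looking straight ahead)."""
    head = Rotation.from_quat(head_quat_xyzw)          # head-to-world rotation
    gaze = head.apply(np.asarray(eye_dir_in_head, dtype=float))
    return gaze / np.linalg.norm(gaze)
```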

In some embodiments, eye tracking sensor 245 also receives one or more images from a camera of camera assembly 235 depicting the local area within the FOV of the camera, and maps the determined eye orientation to a location within the received images. In some embodiments, the eye tracking sensor uses object recognition to identify one or more objects (e.g., other users) in the one or more images and maps the determined eye orientation to one or more of the identified objects.

The audio system of the headset 110 is configured to allow the user to communicate with other users within the shared communication channel 120. In some embodiments, the audio system includes a microphone assembly 225, a transceiver 230, a speaker assembly having one or more speakers, and a controller 215. The controller 215 is configured to coordinate operations between the various components of the headset 110. For example, the controller 215 may control the microphone assembly 225 to capture audio data corresponding to the user's voice for transmission to other users within the shared communication channel via the transceiver 230. Further, the controller 215 may receive audio data corresponding to other users of the shared communication channel via the transceiver 230 and process the received audio data (e.g., spatializing and/or enhancing the audio data) based on the relative positions of the other users with respect to the current position of the user (e.g., as determined by the position sensor 240). The processed audio data may be played back to the user using the speaker assembly. Additional details regarding the audio system are discussed with reference to fig. 3.

The microphone assembly 225 records sound within a local area of the headset 110. The local area is the environment surrounding the headset 110. For example, the local area may be a room in which the user wearing the head mounted device 110 is located, or the user wearing the head mounted device 110 may be outside and the local area is the outdoor area in which the microphone assembly can detect sound. The microphone assembly 225 includes one or more acoustic sensors. In some embodiments, an acoustic sensor is configured to record the voice of the user of the head mounted device 110. To this end, the acoustic sensor may be located near the user's mouth and may have a short capture range to avoid capturing other sounds that do not originate from the user. In some embodiments, the acoustic sensor may be located on a separate mouthpiece or other structure positioned closer to the user's mouth.

In some embodiments, the acoustic sensor includes a port corresponding to an aperture in the frame 205 of the headset 110. The port provides an input coupling point for sound from a localized region to an acoustic waveguide that directs the sound to an acoustic sensor. The acoustic sensor captures sound emitted from one or more sound sources in a local area and is configured to detect the sound and convert the detected sound into an electronic format (analog or digital). The acoustic sensor may be an acoustic wave sensor, a microphone, an acoustic transducer or similar sensor adapted to detect sound.

Although fig. 2 shows the microphone assembly 225 located at a single location on the headset 110, in some embodiments, the microphone assembly 225 includes a microphone array having a plurality of acoustic detection locations located on the headset 110. Each acoustic detection location may include an acoustic sensor or port. The acoustic detection locations may be placed on an outer surface of the headset 110, on an inner surface of the headset 110, separate from the headset 110 (e.g., part of some other device), or some combination thereof.

The transceiver 230 is configured to communicate with the transceivers of other users' head mounted devices. For example, the transceiver 230 may transmit data (e.g., audio corresponding to the user's voice) to, and receive data (e.g., audio signals corresponding to the other users' voices) from, other users' headsets within the shared communication channel. In some embodiments, transceiver 230 may access a network (e.g., network 125) to communicate with an application server or console (e.g., an application server configured to maintain a shared communication channel). The transceiver 230 may include a transmitter, a receiver, or both.

The head-mounted device 110 also includes a speaker assembly configured to play back one or more audio signals as sound projected to a user of the head-mounted device 110. In some embodiments, the speaker assembly includes two or more speakers, which allows sound projected to the user (e.g., by adjusting the amplitude of the sound projected through each speaker 220) to be spatialized such that the sound may be heard by the user as originating from a particular location or direction in the local area. For example, as shown in fig. 2, the speaker assembly may include left and right speakers 220a and 220b corresponding to the left and right ears of the user. In some embodiments, the speaker may cover the user's ear (e.g., headphones) or be inserted into the user's ear (e.g., earbuds).
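A minimal two-speaker spatialization sketch follows, using constant-power amplitude panning in place of the HRTF-style rendering a production headset would more likely use; the head-frame axis convention (x forward, y left) is an assumption, not taken from this disclosure:

```python
import numpy as np

def pan_stereo(mono_audio, talker_dir_head):
    """Illustrative sketch: place a mono signal between left and right speakers
    by amplitude panning. talker_dir_head is a unit vector in the listener's
    head frame, assumed here as x forward, y left. Real systems would also add
    interaural time differences or use HRTFs."""
    x, y = float(talker_dir_head[0]), float(talker_dir_head[1])
    azimuth = np.arctan2(y, x)                           # +90 deg = hard left
    pan = np.clip(-azimuth / (np.pi / 2.0), -1.0, 1.0)   # -1 = left, +1 = right
    theta = (pan + 1.0) * np.pi / 4.0                    # constant-power law
    audio = np.asarray(mono_audio, dtype=float)
    return np.cos(theta) * audio, np.sin(theta) * audio  # (left, right)
```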

Although fig. 2 shows two speakers (e.g., left speaker 220a and right speaker 220b), in some embodiments, the speaker assembly may include a speaker array that includes a plurality of acoustic emission locations on the head-mounted device 110. An acoustic emission location is the location of a speaker or a port in the frame 205 of the head mounted device 110. In the case of a port, the port provides an output coupling point for sound from an acoustic waveguide that separates a speaker of the speaker array from the port. Sound emitted from the speaker travels through the acoustic waveguide and then exits through the port into the local area. In some embodiments, the acoustic emission locations are placed on an outer surface of the frame 205 (i.e., a surface that does not face the user), on an inner surface of the frame 205 (a surface that faces the user), or some combination thereof.

Although fig. 2 shows various components of the headset 110 in a particular arrangement, it should be understood that in other embodiments, the headset 110 may contain different components than those described herein, and the components of the headset 110 may have different structures or be arranged differently. In some embodiments, some of the functions discussed above may be performed by different components or combinations of components.

In the configuration shown, the audio system is embedded in the NED worn by the user. In an alternative embodiment, the audio system may be embedded in a Head Mounted Display (HMD) worn by the user. While the above description discusses audio components embedded in a head-mounted device worn by the user, it will be apparent to those skilled in the art that the audio components may be embedded in different head-mounted devices that may be worn elsewhere by the user or operated by the user without being worn.

Audio system

Fig. 3 illustrates a block diagram of an audio system 300 in accordance with one or more embodiments. The audio system 300 may be implemented as part of a headset (e.g., the headset 110) and may include a microphone assembly 225, a transceiver 230, a speaker assembly 330, and a controller 215. Some embodiments of audio system 300 have different components than those described herein. Similarly, functionality may be distributed among the components in a manner different from that described herein. In some embodiments, some functions of the audio system may be part of different components (e.g., some functions may be part of the headset and some functions may be part of a console and/or server).

The microphone assembly 225 is configured to capture sound within a local area of the user and generate audio signals corresponding to the captured sound. In some embodiments, the microphone assembly 225 is configured to capture the voice of the user and includes a plurality of microphones whose outputs can be beamformed toward a particular portion of the local area (e.g., near the user's mouth) to increase detection of sound spoken by the user of the headset. For example, each microphone generates an audio input signal corresponding to the sound detected by that microphone. By analyzing the audio input signal of each microphone, sound originating from a particular region of the user's local area (e.g., near the user's mouth) may be identified. The controller 215 generates a user audio signal from the audio input signals by enhancing the portion of the audio input signals corresponding to sound originating from that particular region. In this way, the user audio signal may be generated such that it reflects sound originating at or near the user's mouth (e.g., sound corresponding to the user's speech). This is useful because it allows a clear audio signal of the user's voice to be captured even in environments with a lot of sound from other sources (e.g., a crowded room).
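Delay-and-sum beamforming is one simple way to realize this mouth-directed pickup. The sketch below is illustrative only (integer-sample delays, assumed parameter names) and is not presented as the algorithm of this disclosure:

```python
import numpy as np

def delay_and_sum(mic_signals, mic_positions, focus_point, fs, c=343.0):
    """Illustrative beamformer: steer a microphone array toward a focus point
    (e.g. a point just in front of the wearer's mouth). mic_signals has shape
    (num_mics, num_samples); mic_positions has shape (num_mics, 3) in the
    same (assumed) headset frame as focus_point."""
    sigs = np.asarray(mic_signals, dtype=float)
    focus = np.asarray(focus_point, dtype=float)
    dists = np.linalg.norm(np.asarray(mic_positions, dtype=float) - focus, axis=1)
    rel_delays = (dists - dists.min()) / c            # seconds behind the nearest mic
    out = np.zeros(sigs.shape[1])
    for sig, delay in zip(sigs, rel_delays):
        out += np.roll(sig, -int(round(delay * fs)))  # advance later arrivals to align
    return out / len(sigs)
```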

The transceiver 230 is configured to transmit data to and receive data from other users within a shared communication channel of which the user is a part. For example, the transceiver 230 may receive audio data captured by the microphone assembly 225 (e.g., audio data corresponding to the user's own voice) and transmit the captured audio data to transceivers on the headsets of other users within the shared communication channel. In addition, the transceiver 230 receives audio data (referred to as audio output signals or audio signals) transmitted by other users in the shared communication channel, which may be processed (e.g., by the local controller 215) and played (e.g., via the speaker 220) to the first user. The transceiver 230 transmits and receives information through electromagnetic waves. The electromagnetic waves may be, for example, Radio Frequency (RF), infrared (IR), or some combination thereof. In some embodiments, the transceiver 230 communicates with the transceivers of other users in the local area using RF and/or infrared communications. In some embodiments, multiple transceivers corresponding to the headsets of multiple users may communicate with each other (e.g., via Bluetooth or another type of protocol) to establish a local network. In some embodiments, the transceiver 230 may also communicate wirelessly (e.g., via Wi-Fi) with an application server over a network (e.g., the internet) or with a local console configured to maintain the shared communication channel. Further, in embodiments where the shared communication channel includes a remote user, the transceiver 230 may communicate with the remote user through the application server or the local console.

In some embodiments, the data transmitted and received by the transceiver 230 includes metadata corresponding to the transmitted/received audio data. The metadata may indicate a user identity (e.g., a user ID) associated with the audio data and information from which the user's location may be derived. For example, the metadata may include current location information of the user (e.g., determined by a location sensor on the user's headset). In some embodiments, the transceiver 230 of the first headset 110 includes an antenna array, each antenna being located at a different location on the first headset 110, such that the relative timing or phase of the signals received by each antenna from the transceiver of the second headset may be used to determine the relative position of the second headset.

The speaker assembly 330 is configured to play back one or more audio signals as sound projected to a user of the head-mounted device. As described above, in some embodiments, the speaker assembly 330 includes two or more speakers, which allows the sound projected to the user (e.g., by adjusting the amplitude of the sound projected through each speaker) to be spatialized such that the sound may appear to the user to originate from a particular location or direction in the local area.

The speaker may be, for example, a moving coil transducer, a piezoelectric transducer, some other device that generates acoustic pressure waves using an electrical signal, or some combination thereof. In some embodiments, the speaker assembly 330 also includes a speaker covering each ear (e.g., headphones, earbuds, etc.). In other embodiments, the speaker assembly 330 does not include any speakers that block the user's ears (e.g., the speakers are on the frame of the headset).

The controller 215 includes circuitry for operating the microphone assembly 225, the transceiver 230, and the speaker assembly 330. This circuitry may include a data store 335, a channel configuration circuit 305, a position tracking circuit 310, a gaze determination circuit 315, a signal manipulation circuit 320, and an audio filtering circuit 325. Although fig. 3 shows the components of the controller 215 as corresponding to different circuits, it should be understood that in other embodiments, the channel configuration circuit 305, the position tracking circuit 310, the gaze determination circuit 315, the signal manipulation circuit 320, and the audio filtering circuit 325 may be embodied in software (e.g., software modules), firmware, hardware, or any combination thereof.

The data store 335 stores data used by various other modules of the controller 215. The stored data may include one or more parameters of the shared communication channel (e.g., identities of other users in the shared communication channel, authentication information for accessing the shared communication channel, etc.). The stored data may include location information associated with the user (e.g., the user's position and posture as determined by the location sensor 240) and/or location information associated with the audio system of other users (e.g., received from other users' headsets). In some embodiments, the data store 335 may store one or more models of local regions. For example, the controller 215 may generate a model of the local area that indicates the location of the user and other users in the local environment, one or more objects in the local environment (e.g., detected using the camera component 235), and so on. The data store 335 may also store one or more eye tracking parameters (e.g., light patterns for eye tracking, models of the user's eyes, etc.), audio content (e.g., recorded audio data, received audio data, etc.), one or more parameters for spatializing the audio content (e.g., head-related transfer functions), one or more parameters for enhancing the audio content (e.g., algorithms for determining an attention score), one or more parameters for filtering the audio content, some other information used by the audio system 300, or some combination thereof.

The channel configuration circuit 305 is configured to maintain membership of a user in a shared communication channel. As used herein, for example, maintaining membership of a user in a shared communication channel may include: establishing a shared communication channel, adding and/or removing users as members to an existing shared communication channel, updating one or more parameters of the shared communication channel (e.g., via communication with an application server or with audio systems of other users in the shared communication channel), performing other actions associated with the shared communication channel, or some combination thereof.

In some embodiments, a user may establish a shared communication channel by providing information corresponding to one or more additional users to the channel configuration circuitry 305 (e.g., via a user interface, via a scanning device, etc.). In response, the channel configuration circuit 305 may establish a shared communication channel to include the user and one or more additional users. In some embodiments, channel configuration circuit 305 transmits data (e.g., via transceiver 230) to the channel configuration circuit associated with each additional user to establish the shared communication channel.

In some embodiments, the channel configuration circuitry associated with each user in the shared communication channel stores information corresponding to that channel in a respective data store (e.g., data store 335). The information may include the identities of other users within the shared communication channel, authentication information required to communicate over the shared communication channel, and so forth. In some embodiments, the channel configuration circuit 305 may detect a change in one or more channel parameters, such as a change in user membership of the channel (e.g., a new user joining the channel, a user exiting the channel, a change in user priority), a change in authentication information associated with the channel, a change in other parameters of the channel, or some combination thereof. In response to detecting a change in the channel parameters, the channel configuration circuit 305 may communicate the change to the channel configuration circuitry of other users in the channel. In this way, the channel configuration circuits of the users may coordinate with each other such that each channel configuration circuit has access to the latest parameters of the shared communication channel, allowing each audio system 300 to communicate with the audio systems of the other users of that channel.

In other embodiments, the channel configuration circuit 305 communicates (via the transceiver 230) with an application server that coordinates the establishment of the shared communication channel (e.g., by communicating with the channel configuration circuit of the audio system of each user to be included in the shared communication channel). For example, the channel configuration circuit 305 communicates with the application server to indicate participation in the shared communication channel and to receive parameters associated with the shared communication channel (e.g., identities of other users within the shared communication channel, any authentication information required to communicate over the shared communication channel, etc.). Further, the channel configuration circuit 305 may communicate with the application server to indicate any changes associated with the user's participation in the channel. The application server may be responsible for maintaining the parameters of the shared communication channel and communicating these parameters to the channel configuration circuits corresponding to the users participating in the channel to ensure that those circuits have access to the latest parameters of the channel.
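
For illustration only, the kind of channel state that the channel configuration circuits might keep synchronized could be sketched as follows; all field names and the simple last-writer-wins policy are hypothetical and not taken from this description:

```python
from dataclasses import dataclass, field

@dataclass
class SharedChannel:
    channel_id: str
    member_ids: set = field(default_factory=set)    # identities of users in the channel
    auth_token: str = ""                            # authentication info for the channel
    priorities: dict = field(default_factory=dict)  # optional per-user priority levels
    version: int = 0                                # bumped on every parameter change

def apply_remote_update(local: SharedChannel, update: SharedChannel) -> SharedChannel:
    """Keep only the newest parameters so every headset converges on the latest
    channel state (a simple last-writer-wins policy, assumed for this sketch)."""
    return update if update.version > local.version else local
```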

The location tracking circuit 310 is configured to determine a current location of the user. The location tracking circuit 310 receives location information corresponding to the user's headset from a location sensor (e.g., location sensor 240) and determines a current location of the headset based on the received location information. The position of the user headset may indicate the position of the user within the local environment as well as the orientation of the user (e.g., the orientation of the headset on the user's head, also referred to below as the "head orientation" of the user). In some embodiments, the position of the user is calculated relative to a reference point. In some embodiments, one or more functions of the location tracking circuitry 310 are performed by the IMU.

The location tracking circuit 310 may also be configured to determine location information corresponding to other users sharing the communication channel. In some embodiments, location information corresponding to other users may be received directly from other users' headsets (e.g., via transceiver 230). For example, location information may be received as metadata accompanying audio data received from one or more other users sharing a communication channel, the location information indicating a current location of the user from whom the audio data was received (e.g., as determined by location tracking modules of the other users' headsets). In some embodiments, the location tracking circuit 310 uses the obtained location information of the other users to determine the relative location of each of the other users with respect to the current location of the user. In some embodiments, the location tracking circuit 310 may use the determined locations of other users to generate or update a model of the local region.

In other embodiments, the location tracking circuit 310 determines the location of the other user based on analyzing signals received from multiple antennas in an antenna array on the other user's headset. For example, in some embodiments, the transceiver 230 of the audio system 300 of the first headset includes an antenna array, each antenna located at a different location on the first headset. The location tracking circuit 310 of the first headset analyzes the signals received at each antenna of the array from the transceiver of the second headset and determines the relative position of the second headset based on the relative timing or phase of the received signals. In other embodiments, the transceiver 230 receives a plurality of different signals transmitted by a transceiver of the second headset, wherein the transceiver of the second headset includes an antenna array comprising a plurality of antennas located at different locations on the second headset. The location tracking circuit 310 analyzes the received signals (e.g., the timing or phase of the received signals) and may thereby determine the relative position of the second headset with respect to the first headset.
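
A minimal sketch of how a bearing could be estimated from the phase difference between two antennas, assuming a narrowband far-field signal and an antenna spacing of at most half a wavelength; the carrier frequency and spacing below are illustrative values, not values from this description:

```python
import numpy as np

def bearing_from_phase(phase_diff_rad, antenna_spacing_m, carrier_freq_hz):
    """Estimate the angle of arrival (relative to the array broadside) of a
    narrowband signal from the phase difference measured between two antennas."""
    wavelength = 3e8 / carrier_freq_hz
    sin_theta = phase_diff_rad * wavelength / (2.0 * np.pi * antenna_spacing_m)
    sin_theta = np.clip(sin_theta, -1.0, 1.0)  # guard against measurement noise
    return np.arcsin(sin_theta)

# Example: a 2.4 GHz signal and antennas 6 cm apart (illustrative values).
angle = bearing_from_phase(phase_diff_rad=0.9,
                           antenna_spacing_m=0.06,
                           carrier_freq_hz=2.4e9)
print(np.degrees(angle))  # ~17 degrees off broadside
```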

The gaze determination circuitry 315 is configured to determine a gaze direction of a user wearing the headset (e.g., based on eye tracking measurements of the eye tracking sensor 245, such as captured images of the user's eyes). As used herein, the direction of the user's gaze corresponds to the direction the user is looking in the local environment. In some embodiments, the user's gaze direction is determined based on a combination of the user's head orientation and the user's eye position. For example, gaze determination circuitry 315 may receive one or more eye tracking measurements (e.g., one or more images of the user's eyes captured by an eye tracking camera) from eye tracking sensor 245 to determine a current eye orientation of the user, receive a head orientation of the user (e.g., determined by location tracking circuitry 310), and modify the head orientation of the user with the determined eye orientation to determine a gaze direction of the user within the local environment. For example, the user's head may face in a first direction. However, if the user's eyes are oriented to look away from the first direction (e.g., not looking straight ahead), the user's gaze direction will be different than the user's head orientation.
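
One possible way to combine the two measurements, assuming the head pose is available as a quaternion and the eye tracker reports a gaze vector in head coordinates (both assumptions made for this sketch):

```python
import numpy as np
from scipy.spatial.transform import Rotation as R

def world_gaze_direction(head_quat_xyzw, eye_dir_in_head):
    """Rotate the eye-relative gaze vector by the head orientation to obtain
    the gaze direction in the local-area (world) frame."""
    head_rotation = R.from_quat(head_quat_xyzw)
    gaze_world = head_rotation.apply(eye_dir_in_head)
    return gaze_world / np.linalg.norm(gaze_world)

# Head turned 30 degrees about the vertical axis; eyes rotated 10 degrees the
# other way within the head frame (illustrative numbers, forward assumed -z).
head = R.from_euler("y", 30, degrees=True).as_quat()
eyes = R.from_euler("y", -10, degrees=True).apply([0.0, 0.0, -1.0])
print(world_gaze_direction(head, eyes))  # gaze differs from the head direction
```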

In some embodiments, the gaze determination circuitry 315 may also receive one or more images of a local region within the FOV of the camera from the camera component 235 and map the determined eye orientation to a location within the received image. The gaze determination circuitry may use object recognition to identify one or more objects (e.g., other users) within the one or more images that correspond to the mapped locations to determine whether the gaze direction of the user is aligned with the one or more identified objects. In some embodiments, the identified objects may be used to generate or update a model of the local region. For example, the location of recognized objects (e.g., other users) within one or more images may be used to determine whether the user is looking at any of the recognized objects, where the recognized objects are located relative to the user, whether the user has a line of sight to the recognized objects, and so forth.

Although fig. 3 shows location tracking circuit 310 and gaze determination circuit 315 as separate modules, in some embodiments, location tracking circuit 310 and gaze determination circuit 315 may be implemented as one single module. For example, a single gaze determination circuit may receive sensor measurements (e.g., location data from location sensor 240 and eye tracking data from eye tracking sensor 245) to determine a location of the user, an orientation of the user's head, and an orientation of the user's eyes relative to their head, from which a gaze direction of the user may be determined.

The signal manipulation circuit 320 is configured to receive one or more audio signals received via the transceiver 230, each audio signal corresponding to an audio system of another user sharing the communication channel (referred to as a "transmitting audio system"), and process the signals to generate audio data to be presented to the user based on the relative positions of the other audio systems with respect to the user.

The signal manipulation circuit 320 identifies the relative position of the sending user with respect to the current position of the user. The location information may be received from the location tracking circuit 310. In some embodiments, the signal manipulation circuit 320 accesses a model of the local area containing the location information for each user in the local area to determine the relative location of the sending user. Further, the signal manipulation circuit 320 may receive an indication of the user's current gaze direction from the gaze determination circuit 315. Based on the relative position of the sending user, the signal manipulation circuit 320 may spatialize the audio signal from the sending user so that when the sound is played to the user (e.g., via the speaker 220), it will appear to originate from the sending user's location.

In some embodiments, the signal manipulation circuit 320 spatializes the audio signal based on one or more generated acoustic transfer functions associated with the audio system. The acoustic transfer function may be a Head Related Transfer Function (HRTF) or other type of acoustic transfer function. The HRTF characterizes how the ear receives sound from a point in space. The HRTF for a particular source position of a person is unique to each ear of the person (and unique to the person) because human anatomy (e.g., ear shape, shoulders, etc.) can affect sound as it travels to the person's ears. For example, in some embodiments, signal manipulation circuit 320 may generate two sets of HRTFs for the user, one for each ear, corresponding to various frequencies and relative positions. One HRTF or a pair of HRTFs may be used to create audio content that includes sound that appears to originate from a particular point in space (e.g., from a location where the audio system is being transmitted). Several HRTFs may be used to create surround sound audio content (e.g., for home entertainment systems, theater speaker systems, immersive environments, etc.), where each HRTF or each pair of HRTFs corresponds to a different point in space, such that the audio content appears to come from several different points in space. A further example of generating HRTFs is described in U.S. patent application No. 16/015,879, entitled "Audio System for Dynamic Determination of qualified Acoustic Transfer Functions," which is hereby incorporated by reference in its entirety.
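
A simplified sketch of HRTF-based spatialization, assuming a set of measured head-related impulse responses (HRIRs) indexed by azimuth; the toy HRIR data below is a placeholder, not a real measurement:

```python
import numpy as np
from scipy.signal import fftconvolve

def spatialize_with_hrtf(mono_signal, source_azimuth_deg,
                         hrir_left, hrir_right, hrir_azimuths_deg):
    """Render a mono audio signal so it appears to originate from
    source_azimuth_deg by convolving it with the closest measured HRIR pair.

    hrir_left / hrir_right: (num_directions, ir_length) arrays of HRIRs
    hrir_azimuths_deg: (num_directions,) azimuths at which the HRIRs were measured
    """
    idx = np.argmin(np.abs(np.asarray(hrir_azimuths_deg) - source_azimuth_deg))
    left = fftconvolve(mono_signal, hrir_left[idx])
    right = fftconvolve(mono_signal, hrir_right[idx])
    return left, right

# Toy HRIR set: three directions, 64-tap impulse responses (placeholders).
azimuths = [-90.0, 0.0, 90.0]
hrirs_l = np.random.randn(3, 64) * 0.01
hrirs_r = np.random.randn(3, 64) * 0.01
voice = np.random.randn(4800)  # stand-in for a received audio signal
left_ear, right_ear = spatialize_with_hrtf(voice, 45.0, hrirs_l, hrirs_r, azimuths)
```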

In some embodiments, the signal manipulation circuit 320 may enhance the audio signal based on the position of the transmitting user relative to the direction of enhancement. As used herein, an enhanced direction of a user may refer to a direction in which the user is inferred to be paying attention. In some embodiments, the direction of augmentation of the user may correspond to a direction of gaze of the user. In other embodiments, the direction of enhancement may be based on the orientation of the user's head, the orientation of the user's head relative to their torso, and the like. For ease of discussion, the enhancement direction will be discussed primarily as corresponding to the gaze direction, but it should be understood that in other embodiments, the enhancement direction may correspond to other directions relative to the user.

As used herein, enhancing an audio signal may refer to positively enhancing the audio signal (e.g., increasing the amplitude of the audio signal relative to other sounds or audio signals) or negatively enhancing the audio signal (e.g., decreasing the amplitude of the audio signal relative to other sounds or audio signals). For example, in some embodiments, audio signals from a sending user that the user is looking at (e.g., suggesting that the user is focusing on the sending user) are positively enhanced, while audio signals from other sending users that the user is not looking at are negatively enhanced, as determined based on the user's gaze direction. This may allow a user to more easily focus on speech from a particular user (e.g., the sending user they are focusing on), while speech from other users is less distracting, especially if a large number of users are speaking simultaneously. In some embodiments, the signal manipulation circuit 320 enhances each received audio signal based on the "attention score" calculated for each transmitting user, which will be described in more detail below with reference to fig. 4.

Since the user's ears are located at fixed positions on the user's head, the signal manipulation circuit 320 may spatialize the received audio data based on the user's head orientation. On the other hand, the signal manipulation circuit 320 enhances the audio data based on the gaze direction of the user to better emphasize audio data originating from other users that the user is actually looking at or is paying attention to.

Although the above discussion primarily relates to enhancing audio data based on a user's gaze direction, in other embodiments, the enhancement of audio data may be based on other directions, such as a user's head direction, a user's head direction modified according to the angle of the user's head relative to their torso, or some combination thereof.

The signal manipulation circuit 320 also outputs the spatialized and enhanced audio signals to the speakers of the speaker assembly 330. For example, based on the spatialization and/or enhancement performed, the signal manipulation circuit 320 may output audio signals of different amplitudes to each speaker of the speaker assembly 330.

The audio filtering circuit 325 is configured to receive a user audio signal (e.g., captured by the microphone assembly 225) corresponding to the user's speech and perform filtering on the user audio signal. The user audio signals may be transmitted to other users in the shared communication channel. Further, in some embodiments, user audio signals may also be played back to the user through the speaker assembly 330.

In some embodiments, because users sharing a communication channel may be in close proximity to each other, the users may be able to hear the actual sounds of the sending user's voice, as well as be able to receive audio data corresponding to the sending user's voice through their headset. Because of the time required to process the received audio signal, audio data may be presented to the user (e.g., through the speaker assembly 330) after the transmitting user's voice may be heard at the user's location. The delay between the time that the actual voice of the sending user can be heard at the user's location and the time that the audio data of the sending user is played to the user through the speaker assembly 330 is referred to as the processing delay. If the processing delay exceeds a certain amount of time, the audio data presented to the first user may sound like an echo to the first user. This creates undesirable audio effects that may distract the user. For example, in some embodiments, echo effects are generated when the processing delay is greater than 10 to 15 milliseconds.

In some embodiments, the audio filtering circuit 325 includes an all-pass filter that manipulates the phase of the user audio signal to produce a temporally dispersed user audio signal (hereinafter referred to as a "diffuse user audio signal"). The diffuse user audio signal may comprise a plurality of diffuse reflections of the user audio signal having the same total energy as the original unfiltered signal. For sounds corresponding to speech, diffusing the user audio signal makes it less likely to be detected by the human ear as a separate auditory event than the unfiltered signal would be. This allows the user audio signal to undergo a longer processing delay before it will be detected as a separate echo by other users receiving the user audio signal. An example of diffusing the user audio signal is described in more detail below with reference to fig. 5. Although the present discussion relates to the audio filtering circuit 325 performing temporal dispersion on the user audio signal in preparation for transmission of the user audio signal to other users within the shared communication channel, in some embodiments, the audio filtering circuit 325 performs temporal dispersion on audio signals received from the audio systems of other users, rather than on the user audio signal, prior to playback to the user. In some embodiments, other filtering techniques may be used. For example, in some embodiments, the audio filtering circuit 325 may modify the frequency magnitude spectrum of the user audio signal instead of, or in addition to, temporally diffusing it.
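
As one possible realization of such temporal dispersion (a sketch, not the filter design described here), a cascade of Schroeder all-pass stages preserves the signal's total energy while smearing it across many reflections; the stage delays and gains below are illustrative:

```python
import numpy as np
from scipy.signal import lfilter

def schroeder_allpass(x, delay_samples, gain):
    """One Schroeder all-pass stage: flat magnitude response (energy preserved)
    while the signal is dispersed in time."""
    b = np.zeros(delay_samples + 1)
    a = np.zeros(delay_samples + 1)
    b[0], b[-1] = -gain, 1.0
    a[0], a[-1] = 1.0, -gain
    return lfilter(b, a, x)

def diffuse(user_audio, sample_rate=48_000):
    """Cascade a few all-pass stages to temporally disperse the user audio
    signal before transmission (stage delays and gains are illustrative)."""
    out = user_audio
    for delay_ms, gain in [(3.1, 0.65), (5.9, 0.6), (8.3, 0.55)]:
        out = schroeder_allpass(out, int(delay_ms * 1e-3 * sample_rate), gain)
    return out

diffused = diffuse(np.random.randn(48_000))  # stand-in for captured speech
```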

In some embodiments, the audio filtering circuit 325 also filters the user audio signal to generate a modified user audio signal to be played back to the user of the audio system 300. When a user speaks in a noisy environment and/or speaks into a microphone, the user may often be unaware of the volume of their own voice because the sound of their own voice is buried in the ambient noise. As a result, users may inadvertently raise their speaking volume beyond what is needed.

To prevent a user from speaking more loudly than necessary to overcome ambient noise, a version of the user audio signal may be played back to the user so that the user can more accurately assess the volume of their own speech. Because a person hears their own voice differently from how their voice is captured by a microphone (e.g., because vibrations in their skull caused by the vocal cords reach their ears in addition to the sound waves traveling through the air), the user audio signal can be modified so that the user recognizes the sound of the user audio signal as their own voice. In some embodiments, the user audio signal is passed through one or more filters that approximate the effect of skull vibration on the user's voice as perceived by the user. In some embodiments, the one or more filters are configured to be generally applicable to most people (e.g., based on average skull shape and size). In other embodiments, the one or more filters may be customized based on one or more user settings. For example, a user of the headset 110 may configure one or more settings of the filter during setup to more closely approximate how they hear their own voice. In some embodiments, the filter may comprise a low-pass filter, wherein the user can adjust the slope and cutoff frequency of the filter. In some embodiments, the filter may include a series of one or more tunable biquad filters, FIR filters, or some combination thereof.
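
As a very rough stand-in for such a filter (the cutoff, order, and the use of a simple Butterworth low-pass are assumptions, not the described implementation), the playback path might apply something like:

```python
import numpy as np
from scipy.signal import butter, lfilter

def self_voice_filter(user_audio, sample_rate=48_000, cutoff_hz=1500.0, order=2):
    """Approximate how the user hears their own voice by emphasizing low
    frequencies (bone-conducted sound reaches the ear mostly at low
    frequencies); the cutoff and order could be exposed as user settings."""
    b, a = butter(order, cutoff_hz, btype="low", fs=sample_rate)
    return lfilter(b, a, user_audio)

sidetone = self_voice_filter(np.random.randn(48_000))  # stand-in for captured speech
```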

In this way, by feeding back a filtered version of the user audio signal to the user, the user can more accurately assess the volume of their speech even in noisy environments and can avoid raising their voice unnecessarily. In some embodiments, the filtered version of the user audio signal is played back to the user only when the amount of noise in the environment (e.g., measured using the microphone assembly) exceeds a threshold, so that the filtered user audio signal is not played back when the environment is quiet enough that the user can be expected to hear their own voice.

Location-based audio spatialization and enhancement

Fig. 4 illustrates an example of an environment with multiple users utilizing a shared communication channel in accordance with one or more embodiments. The environment contains a plurality of users, including a first user 405A and additional users 405B, 405C, and 405D (collectively users 405), which are part of a shared communication channel. Each user 405 wears a respective head mounted device 410, the head mounted device 410 containing an audio system that the user uses to communicate with other users sharing a communication channel. For ease of explanation, only the head mounted device 410 worn by the first user 405A is labeled in fig. 4.

The head mounted device 410 of the first user 405A includes a location sensor and an eye tracking sensor configured to determine a location and a gaze direction of the first user 405A, which may be used to determine the enhancement direction 415 of the first user 405A. Although fig. 4 shows the enhancement direction 415 of the first user 405A aligned with the orientation of the head mounted device 410 and the head of the user 405A, the enhancement direction 415 need not be aligned with the orientation of the head of the user 405A. For example, in some embodiments, the enhancement direction 415 may correspond to a gaze direction of the user 405A. As such, as the user 405A moves their eyes, the enhancement direction 415 may change even though the position of the user 405A and the orientation of the user 405A's head remain stationary. In other embodiments, the enhancement direction of the user 405A may correspond to a head direction of the user (e.g., based on the orientation of the user's head), a head direction of the user modified according to the angle between the orientation of the user's head and torso (e.g., the enhancement direction 415 deviates from the user's head direction as the angle between the user's head and torso increases), or some combination thereof.

Each of the other users 405B, 405C, and 405D within the environment may be transmitting users. In response to the voice of each of the users 405B, 405C, or 405D, audio data is recorded (e.g., by their respective headsets) and transmitted to the headset 410 of the first user 405A (and other users participating in the channel). The signal manipulation circuit 320 of the head mounted device 410 analyzes the relative position of each other user to determine how each user's audio signal should be manipulated.

In some embodiments, the audio system of the headset 410 of the first user 405A determines location information corresponding to each transmitting user transmitting audio signals to the user 405A, and determines, for each transmitting user, a relative location of the transmitting user with respect to the head orientation of the first user 405A, and a deviation between the location of the transmitting user with respect to the direction of enhancement 415 of the first user 405A.

The audio system uses the relative position of the transmitting user with respect to the head orientation of the first user 405A to spatialize the audio signal received from the transmitting user. Using the determined relative position and the determined current head orientation of the user 405A, the audio system spatializes the audio signal such that when projected to the user 405A via the speaker assembly of the headset 410, the sound of the audio signal appears to originate from the corresponding transmitting user's location. In some embodiments, the audio system spatializes the audio signal by setting one or more weights corresponding to each speaker of the speaker assembly. In some embodiments, the audio system uses HRTFs to spatialize the audio signal. By adjusting the amplitude of the audio signal projected to the user 405A through each speaker of the speaker assembly, the generated sound may be made to appear to originate from different locations (e.g., corresponding to the location of the transmitting user).

For example, as shown in FIG. 4, user 405B is located directly in front of the user 405A. In this way, the audio signal from the user 405B is spatialized such that the generated sound is perceived by the user 405A as originating from in front of the user 405A. On the other hand, user 405C and user 405D are located to the left and right of user 405A, respectively. In this way, the audio system spatializes the respective audio signals such that the audio corresponding to users 405C and 405D appears to originate from the respective locations of users 405C and 405D.

In some embodiments, no spatialization is performed on an audio signal that the user 405A receives from a transmitting user who is a remote user. In other embodiments, spatialization may be performed on audio signals received from particular types of remote users (e.g., a remote user associated with a location within a threshold distance from the user 405A).

Further, in some embodiments, no spatialization is performed if there is no line of sight between the user 405A and the sending user. For example, in some embodiments, the audio system may be aware of particular types of objects within the local area, such as walls (e.g., detected using the camera component 235 or another type of sensor). If a vector 425 between the user 405A and the sending user intersects such an object, indicating a lack of line of sight between the user 405A and the sending user, the audio signal from the sending user may not be spatialized. In some embodiments, the audio signal from a transmitting user without line of sight may be spatialized if the distance between the user 405A and the transmitting user is less than a threshold amount, but not spatialized if the distance is greater than the threshold amount. The threshold amount may be predetermined or may be dynamically determined based on one or more user inputs, one or more determined attributes of the local area (e.g., the size of the room), or some combination thereof.
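
A minimal sketch of such a line-of-sight test, assuming obstacles such as walls are approximated by axis-aligned boxes in the local-area model (an assumption for the example):

```python
import numpy as np

def has_line_of_sight(user_pos, sender_pos, obstacle_boxes):
    """Return True if the straight segment between the two users does not pass
    through any obstacle, where each obstacle is an axis-aligned box given as
    (min_corner, max_corner). Uses the standard slab intersection test."""
    origin = np.asarray(user_pos, dtype=float)
    direction = np.asarray(sender_pos, dtype=float) - origin
    for box_min, box_max in obstacle_boxes:
        t_min, t_max = 0.0, 1.0
        blocked = True
        for axis in range(3):
            if abs(direction[axis]) < 1e-9:
                if not (box_min[axis] <= origin[axis] <= box_max[axis]):
                    blocked = False
                    break
                continue
            t1 = (box_min[axis] - origin[axis]) / direction[axis]
            t2 = (box_max[axis] - origin[axis]) / direction[axis]
            t_min = max(t_min, min(t1, t2))
            t_max = min(t_max, max(t1, t2))
            if t_min > t_max:
                blocked = False
                break
        if blocked:
            return False
    return True

# A wall between the users (illustrative coordinates, in meters).
wall = (np.array([1.0, -1.0, 0.0]), np.array([1.1, 1.0, 3.0]))
print(has_line_of_sight([0.0, 0.0, 1.6], [2.0, 0.0, 1.6], [wall]))  # False
```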

Further, the audio system enhances each received audio signal based on the deviation of each respective transmitting user's location from the direction of enhancement 415 of user 405A. As used herein, the deviation of the location of the sending user (e.g., user 405C) from the direction of augmentation of user 405A may be determined based on an angle measured between the direction of augmentation 415 of user 405A and a vector 425 connecting users 405A and 405C. In some embodiments, the audio system may also enhance each received audio signal based on the distance of each respective transmitting user's location relative to user 405A (e.g., audio signals from transmitting users closer to user 405A are enhanced more than audio signals from transmitting users further away).

In the case where multiple audio signals from multiple other users are received and projected to the user 405A, even if the audio signals are spatialized, it may be difficult for the user 405A to focus on the voice of any one user. By selectively enhancing the received audio signal, the user 405A may more easily focus on speech from other users that they are focusing on, while being less distracted by speech from users that they are not.

In some embodiments, it may be inferred which sending user the user 405A is focusing on based on the direction of enhancement 415 of the user 405A. For example, if the direction of augmentation 415 of the user 405A is aligned with the position of another user, the user 405A may be inferred as being focused on the user. For example, as shown in FIG. 4, user 405A may be inferred as being focused on user 405B. In some embodiments, user 405A may be inferred to be focusing on another user if the location of the other user is within a threshold deviation 420 relative to the direction of enhancement 415. For example, as shown in fig. 4, user 405A may be inferred as not paying attention to users 405C and 405D because users 405C and 405D are more than a threshold deviation away from the direction of enhancement 415. In some embodiments, if there are multiple transmitting users within a threshold deviation 420 from the enhancement direction 415, user 405A may be considered to be focusing on the transmitting user at the location closest to user 405A, the transmitting user at the location with the least deviation from the enhancement direction 415, or some combination thereof.

In some embodiments, an "attention score" may be calculated for each other user in the shared communication channel. The attention score may be used as a metric indicating the degree to which the user may be inferred to be paying attention to another user, in order to determine how much to enhance the audio signal received from that user. The attention score of a particular user may be based on a deviation of the user's location from the enhancement direction 415 of the first user 405A, a distance of the user's location from the location of the first user 405A, or some combination thereof. In some embodiments, the attention score is determined as a combination (e.g., a weighted sum) of one or more factors.
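
A toy attention-score computation along these lines, where the weights, the distance normalization, and the gain values are all illustrative assumptions rather than values from this description:

```python
import numpy as np

def attention_score(listener_pos, enhancement_dir, sender_pos,
                    deviation_weight=0.7, distance_weight=0.3, max_distance=10.0):
    """Weighted sum (illustrative weights) of how closely the sender's position
    aligns with the listener's enhancement direction and how near the sender is;
    higher scores mean more enhancement."""
    to_sender = np.asarray(sender_pos, dtype=float) - np.asarray(listener_pos, dtype=float)
    distance = np.linalg.norm(to_sender)
    direction = np.asarray(enhancement_dir, dtype=float)
    cos_dev = np.dot(to_sender, direction) / (distance * np.linalg.norm(direction))
    deviation_deg = np.degrees(np.arccos(np.clip(cos_dev, -1.0, 1.0)))

    alignment = max(0.0, 1.0 - deviation_deg / 180.0)    # 1 when aligned, 0 when behind
    proximity = max(0.0, 1.0 - distance / max_distance)  # 1 when next to the listener
    return deviation_weight * alignment + distance_weight * proximity

def enhancement_gain(score, focus_threshold=0.6):
    """Positively enhance senders the listener appears to focus on and slightly
    attenuate (negatively enhance) the rest."""
    return 1.5 if score >= focus_threshold else 0.7

score = attention_score([0, 0, 0], [0, 0, -1], [0.3, 0.0, -2.0])
print(score, enhancement_gain(score))
```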

The audio system enhances the audio signal received from the sending user based on whether the user 405A is focusing on the sending user (e.g., based on the sending user's attention score). For example, if the user 405A is inferred to be focusing on the sending user, the audio system enhances the audio signal, whereas if the user 405A is inferred to not be focusing on the sending user, the audio system does not enhance the audio signal. Further, in some embodiments, if the user 405A is inferred to not be paying attention to the transmitting user, the audio signal may be negatively enhanced in order to minimize interference with audio signals originating from transmitting users that the user 405A is paying attention to. In some embodiments, the audio signal of the transmitting user may be enhanced based on whether there is another transmitting user that the user 405A is determined to be focusing on (e.g., the audio signal from the user 405C is negatively enhanced if the user 405A is inferred to be focusing on the user 405B, but not negatively enhanced if there is no user within the threshold deviation 420 of the enhancement direction 415).

In some embodiments, the audio system enhances the received audio signal based on the corresponding sending user's attention score. In some embodiments, the amount of enhancement may also be based on the attention scores of other users (e.g., the ranking of the sending user's score relative to other sending users). For example, in the example shown in fig. 4, the audio system of the headset 410 may determine the degree to which to enhance the audio signals from the transmitting users 405B and 405C by comparing the deviation of the location of each transmitting user relative to the direction of enhancement 415 of the user 405A, and enhance each audio signal based on the results of the comparison. For example, in some embodiments, the audio signal from the first transmitting user may be enhanced less if there is a second transmitting user with a higher attention score (e.g., due to a lower deviation from the user's gaze direction) than if there is no second transmitting user (e.g., no audio signal is currently being transmitted) or if the second transmitting user has a lower attention score than the first transmitting user.

Because the sending user's attention score is based on the direction of enhancement 415 of the user 405A, when the direction of enhancement 415 of the user 405A changes (e.g., due to movement of their head or eyes), the attention score of each sending user may be adjusted accordingly, resulting in a different amount of enhancement of their respective audio signals. In some embodiments, the attention score of each sending user is updated periodically. In some embodiments, the attention score of the sending user is updated if the audio system detects that the direction of enhancement 415 of the user 405A has changed by more than a threshold amount.

In embodiments where the enhancement direction 415 corresponds to the user's gaze direction, the enhancement direction 415 may change very quickly because the eyes of the user 405A can move very quickly. In some embodiments, in order to reduce the effect of random eye movements of the user 405A, the enhancement direction 415 is updated only if the gaze of the user 405A has not changed by more than a threshold amount for at least a threshold period of time.

In some embodiments, the attention score of the sending user may also be based on the enhancement direction of the sending user. For example, if the enhancement direction of the transmitting user is facing the user 405A, the audio signal corresponding to the transmitting user may be enhanced more strongly by the signal manipulation circuit 320 than if the gaze direction of the transmitting user were not facing the user 405A. For example, as shown in FIG. 4, even though users 405C and 405D both have similar magnitudes of deviation from the enhancement direction 415 of user 405A, the audio signal from user 405C may be enhanced more than the audio signal from user 405D. In some embodiments, the weight that the sending user's orientation or gaze direction carries in the sending user's attention score may vary based on the deviation of the sending user's location from the enhancement direction 415.

In some embodiments, where the shared communication channel has one or more remote users, the signal manipulation circuit 320 may enhance the audio signal from the remote users based on whether the user 405A is currently focusing on another user in the local area. In some embodiments, user 405A may indicate, via a user interface, one or more modifications for how to enhance the audio signal from a particular transmitting user.

By processing (e.g., spatializing and/or enhancing) the received audio signals based on the relative positions of the respective transmitting users, the signal manipulation circuit 320 thus makes it easier for the user 405A to hear and focus on audio from other users that the user is focusing on (e.g., by enhancing the audio signals from those users), while also allowing the user 405A to better perceive where the other users from whom audio signals are received are located.

Audio filtering for echo reduction

Fig. 5 shows a diagram of filtering a user audio signal in accordance with one or more embodiments. Fig. 5 shows a first graph 505 showing an audio signal measured at the ear canal opening of a first user. The audio system of the first user communicates with the audio system of a second user over a shared communication channel. The audio signals include a real audio signal 510 and a transmitted audio signal 515. The real audio signal 510 corresponds to a sound pressure wave originating from the second user and measured at the ear canal of the first user (i.e., the first user hears the speech of the second user directly). The transmitted audio signal 515 corresponds to an unfiltered audio signal corresponding to the voice of the second user that was recorded (e.g., as the second user's user audio signal), transmitted to the first user's audio system, and played back to the first user through one or more speakers. Due to processing delays associated with recording, transmitting, processing, and playing back the transmitted audio signal, the transmitted audio signal 515 may be detected (i.e., audible to the user) at the ear canal later than the real audio signal 510 by an amount of time corresponding to the processing delay ΔT. If the processing delay ΔT exceeds a certain amount of time (e.g., 10-15 milliseconds), the first user may hear the transmitted audio signal 515 as an auditory event separate from the real audio signal 510, which may create an echo effect that distracts the first user.

The second graph 520 shows audio measured at the location of the first user when the transmitted audio is filtered using an all-pass filter to diffuse the audio signal. As shown in the second graph 520, the same real audio signal 510 is heard at the location of the first user. However, the transmitted audio signal has been filtered to produce a filtered transmitted audio signal 525 that includes a plurality of diffuse reflections. Even if the filtered transmitted audio signal 525 is not heard until ΔT after the real audio signal 510, the diffusion of the transmitted audio signal 525 may cause the first user to interpret the real audio signal 510 and the filtered transmitted audio signal 525 as part of the same auditory event, thereby reducing or eliminating undesirable echo effects. In this way, by filtering the audio signal, longer processing delays can be accommodated without producing undesirable echo effects for the user. In some embodiments, the audio signal is filtered at the transmitting user's headset before being transmitted to other users in the shared communication channel. In other embodiments, the audio signal is filtered at the headset of the user receiving the audio signal. In some embodiments where filtering is performed on the receiver side, the audio system of the receiving headset may determine the delay between the real audio and the transmitted audio and adjust one or more filtering parameters (e.g., an amount of dispersion) based on the determined delay.

In some cases, the first user and the second user may be at a distance from each other such that the transmitted audio 525 is heard at the first user's location before the real audio 510. In some embodiments, the audio system does not perform diffusion filtering on the transmitted audio if the sending user is determined to be at least a threshold distance from the user.

In embodiments where the shared communication channel includes at least one remote user, the audio signals transmitted between the remote and non-remote users need not undergo filtering, since the remote user does not hear the true audio of the non-remote user (and vice versa), so there is no echo effect caused by processing delays. Further, in some embodiments, if it is determined that the distance between the second user and the first user is at least a threshold amount or that a particular structure (e.g., a wall) exists between the first and second users such that the first user may be inferred as not hearing the second user's true audio, the audio from the second user may not be filtered.

Channel priority

In some embodiments, different users on the shared communication channel may be given different priorities. As used herein, the priority of a user in the shared communication channel indicates the level at which audio signals corresponding to the user's voice are enhanced relative to audio signals corresponding to other users, where audio signals from users having a higher priority are enhanced relative to audio signals from users having a lower priority. In some embodiments, the shared communication channel may include a first group of users corresponding to a base priority and at least one user (e.g., a designated speaker or leader) associated with a high priority that takes precedence over the base priority.

For example, while a user associated with the high priority (hereinafter referred to as a "priority user") is not speaking, the audio signals received by the first user corresponding to the base-priority users of the shared communication channel may be processed normally (e.g., spatialized and enhanced based on the relative locations of the users), as described above. However, when the priority user speaks, the audio signal received by the first user corresponding to the priority user is enhanced regardless of the relative positions of the first user and the priority user. Further, while the audio signal from the priority user is played to the first user, the audio signals from the base-priority users may be attenuated to ensure that the first user can clearly hear the voice of the priority user.

In some embodiments, users sharing a communication channel may be organized into more than two different priorities. The audio signal from a user with a higher priority is enhanced relative to the audio signal from a user with a lower priority, allowing the user to hear the voice of the higher priority user more clearly when the higher priority user speaks. In some embodiments, each user sharing a communication channel may assign a personalized priority to other users of the channel based on which other users they are most interested in.
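
As a small illustration (the specific gain values and the convention that larger numbers mean higher priority are assumptions for the sketch), priority-based enhancement might adjust per-sender gains like this:

```python
def priority_adjusted_gain(base_gain, sender_priority, highest_active_priority, base_priority=0):
    """Adjust the enhancement gain for one sender given channel priorities.
    Higher numbers mean higher priority; gain factors are illustrative."""
    if sender_priority < highest_active_priority:
        return base_gain * 0.4      # duck lower-priority voices while a higher-priority user speaks
    if sender_priority > base_priority:
        return max(base_gain, 1.5)  # a designated speaker/leader is boosted regardless of position
    return base_gain

# Example: a leader (priority 1) and a base-priority sender speaking at once.
print(priority_adjusted_gain(1.0, sender_priority=1, highest_active_priority=1))  # 1.5
print(priority_adjusted_gain(1.2, sender_priority=0, highest_active_priority=1))  # 0.48
```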

Process flow

Fig. 6 is a flow diagram of a process for spatializing and enhancing audio data received from other users in a shared communication channel, according to one or more embodiments. The process may be performed by a headset that includes an audio system (e.g., audio system 300). The first user's headset is engaged in a shared communication channel (e.g., headset 110 of user 105 shown in fig. 1, where user 105 is part of shared communication channel 120A). In other embodiments, other entities may perform some or all of the steps of the process (e.g., the console). Likewise, embodiments may include different and/or additional steps, or perform the steps in a different order.

The first user's headset determines 605 an enhancement direction for the first user. In some embodiments, the enhancement direction corresponds to the user's gaze direction, and the head-mounted device comprises an eye tracking sensor and a position sensor for determining the user's gaze direction. For example, the position sensor may determine the position and orientation of the head-mounted device, from which the position and orientation of the first user's head may be inferred. Further, the eye tracking sensor may be used to determine the orientation of the first user's eyes relative to their head. In this way, a combination of the position sensor and the eye tracking sensor may be used to determine the direction of the first user's gaze.

A headset receives 610 audio signals from one or more transmitting users sharing a communication channel (e.g., via a transceiver). The audio signal may correspond to the voice of the sending user and may include further metadata, such as the identity of the sending user and data from which the location of the sending user may be determined.

The head mounted device determines 615 a location associated with each sending user from which the audio signal was received. In some embodiments, the headset receives metadata associated with the audio signal indicative of the location of the transmitting user (e.g., determined by a location sensor on the transmitting user's headset). In other embodiments, the headset receives multiple signals transmitted by multiple antennas (e.g., antenna arrays) located at different locations on the transmitting user's headset. Based on the phase or timing of the received signals, the head-mounted device may determine the relative position of the transmitting user with respect to the first user.

The head mounted device determines 620 the relative position of each transmitting user with respect to the first user. The relative position of the sending user may indicate where the sending user is located relative to the first user based on the head orientation of the first user (e.g., in front of the first user, to the left of the first user, etc.).

The head mounted device determines 625 a deviation between the location of each sending user and the enhancement direction of the first user. The deviation indicates the position of the sending user relative to the direction of enhancement of the first user. In some embodiments, additionally, the headset controller may determine a distance between the sending user and the first user.

The headset spatializes 630 the audio signals of each transmitting user based on the location of the corresponding transmitting user relative to the first user such that the audio signals played to the first user through the two or more speakers sound as if they originated from a particular location (e.g., the location of the transmitting user). In some embodiments, spatializing the audio signal includes configuring the amplitude of the audio signal played through each speaker such that a user can interpret different amplitudes of sound through different speakers as corresponding to sound originating from a particular location.
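
A simpler amplitude-panning sketch of this idea for a two-speaker assembly, using a constant-power pan law (an assumed design choice, distinct from the HRTF approach described earlier):

```python
import numpy as np

def pan_stereo(mono_signal, source_azimuth_deg):
    """Constant-power amplitude panning: split a mono signal across a left and
    a right speaker so the sound appears to come from the given azimuth
    (-90 = fully left, +90 = fully right)."""
    azimuth = np.clip(source_azimuth_deg, -90.0, 90.0)
    pan = np.radians(azimuth + 90.0) / 2.0   # 0 .. pi/2
    left = np.cos(pan) * mono_signal
    right = np.sin(pan) * mono_signal
    return left, right

# A sender located 30 degrees to the right is rendered louder on the right speaker.
left, right = pan_stereo(np.random.randn(4800), source_azimuth_deg=30.0)
```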

The headset enhances 635 the audio signal of each transmitting user based on the deviation of the position of the respective transmitting user with respect to the enhancement direction of the first user. In some embodiments, the audio signal of a transmitting user is enhanced if the position of the transmitting user does not deviate from the enhancement direction by more than a threshold amount. In some embodiments, the magnitude of the enhancement may be inversely proportional to the amount of deviation between the location of the transmitting user and the enhancement direction of the first user. In this way, the audio signal from a transmitting user will be enhanced more strongly when the transmitting user is located closer to the enhancement direction of the first user than when the transmitting user is located farther away from the enhancement direction. In some embodiments, the amount of enhancement may also be based on the number of audio signals currently received from other transmitting users, the distance between the transmitting user and the first user, and so forth. In some embodiments, the enhancement of the audio signal may include an attenuation (e.g., negative enhancement) of the audio signal.

In this way, by spatializing and enhancing the audio signals received from other users in the shared communication channel, the user of the headset may more easily focus on the voices of the other users they are focusing on, as well as allowing the user to perceive the location of each other user from whom the audio signals are received. This allows users to hear more clearly the speech they wish to focus on even in a noisy environment, while maintaining the perception of other users in the environment.

FIG. 7 is a flow diagram of a process for processing an audio signal corresponding to a user's speaking voice in accordance with one or more embodiments. The process may be performed by a headset that includes an audio system (e.g., audio system 300). The first user's headset is engaged in a shared communication channel (e.g., headset 110 of user 105 shown in fig. 1, where user 105 is part of shared communication channel 120A). In other embodiments, other entities may perform some or all of the steps of the process (e.g., the console). Likewise, embodiments may include different and/or additional steps, or perform the steps in a different order.

The head mounted device receives 705 a user audio signal corresponding to the speech of the user of the head mounted device. In some embodiments, the user audio signal is recorded by an acoustic sensor (e.g., acoustic sensor 225) located near the user's mouth. In some embodiments, the user audio signals are generated by a microphone array that uses beamforming to isolate and capture sound from a particular area of the local area (e.g., near the user's mouth).

The head-mounted device applies 710 one or more filters (e.g., all-pass filters) to the user audio signal that temporally disperse the user audio signal to produce a diffuse user audio signal.

The headset transmits 715 the diffuse user audio signal to the other users' headsets in the shared communication channel. By temporally dispersing the user audio signal, the amount of processing delay between the time when another user hears the user's real voice and the time when the transmitted user audio signal is played to them through one or more speakers may be increased without causing the other user to hear the user audio signal as a separate auditory event, which would create an undesirable echo effect.

In some embodiments, rather than temporally dispersing the user audio signals and transmitting the dispersed user audio signals to the other user's headset, the temporal dispersion of the audio signals is performed by the headset receiving the audio signals. In some embodiments, the user audio signals are dispersed in time based on one or more filtering parameters that may be adjusted based on the relative position or distance between the transmitting user and the receiving user's headset.

The head-mounted device applies 720 a speech filter to the user audio signal to produce an altered version of the user audio signal. The speech filter is configured to simulate the effect by which, when a person speaks, vocal cord vibrations transmitted through the skull affect how the person hears their own voice. In some embodiments, a user may manually configure one or more parameters of the speech filter so that the altered user audio signal more closely matches how they hear their own speech.

The headset plays back 725 the altered user audio signal to the user (e.g., through one or more speakers), allowing the user to better perceive the current volume of their speech so that they can better adjust their speaking volume.

Examples of Artificial reality systems

Fig. 8 is a system environment including a headset as the audio system described above, in accordance with one or more embodiments. The system 800 may operate in an artificial reality environment (e.g., virtual reality, augmented reality, mixed reality environment, or some combination thereof). The system 800 shown in fig. 8 includes a headset 805 and an input/output (I/O) interface 815 coupled to a console 810. The headset 805 may be an embodiment of the headset 110. Although fig. 8 illustrates an example system 800 including one headset 805 and one I/O interface 815, in other embodiments any number of these components may be included in the system 800. For example, there may be multiple headsets 805, each headset 805 having an associated I/O interface 815, each headset 805 and I/O interface 815 communicating with the console 810. In alternative configurations, different and/or additional components may be included in system 800. Further, in some embodiments, the functionality described in connection with one or more of the components shown in fig. 8 may be distributed between the components in a different manner than that described in connection with fig. 8. For example, some or all of the functionality of the console 810 is provided by the headset 805.

The head mounted device 805 presents content to the user that includes a view of a physical, real-world environment enhanced with computer-generated elements (e.g., two-dimensional (2D) or three-dimensional (3D) images, 2D or 3D video, sound, etc.). The head mounted device 805 may be an eyewear device or a head mounted display. In some embodiments, the presented content includes audio content (e.g., audio signals received from other users in the shared communication channel).

The head-mounted device 805 includes an audio system 820, a sensor system 825, an electronic display 830, and an optics block 835. The audio system 820 may correspond to the audio system 300 described in fig. 3 and may include the microphone assembly 225, the transceiver 230, the speaker assembly 330, and the controller 215. The audio system 820 is configured to communicate with the audio systems of other HMDs, capture audio signals corresponding to the voice of the user of the HMD 805, process received audio signals (e.g., audio signals from other HMDs), and play back the processed audio signals to the user.

The sensor system 825 includes one or more sensor modules, which may include a camera assembly 235, a position sensor 240, and an eye tracking sensor 245. The sensor modules may be used to generate information about a local area around the HMD 805, as well as to track the location of the HMD 805 and the gaze direction of the user of the HMD 805. In some embodiments, the sensors of the sensor system 825 may be used with the tracking module 855 to track the location of the HMD 805.

Electronic display 830 and optics block 835 are one embodiment of lens 210. Some embodiments of the headset 805 have different components than those described in connection with fig. 8. Furthermore, in other embodiments, the functionality provided by the various components described in connection with fig. 8 may be distributed differently between components of the headset 805 or captured in a separate component remote from the headset 805.

Electronic display 830 displays 2D or 3D images to the user based on data received from console 810. In various embodiments, electronic display 830 comprises a single electronic display or multiple electronic displays (e.g., one display for each eye of the user). Examples of electronic display 830 include: a Liquid Crystal Display (LCD), an Organic Light Emitting Diode (OLED) display, an active matrix organic light emitting diode display (AMOLED), a waveguide display, some other display, or some combination thereof.

In some embodiments, optics block 835 magnifies the image light received from electronic display 830, corrects optical errors associated with the image light, and presents the corrected image light to a user of the headset 805. In various embodiments, optics block 835 includes one or more optical elements. Example optical elements included in optics block 835 include: a waveguide, an aperture, a Fresnel lens, a convex lens, a concave lens, a filter, a reflective surface, or any other suitable optical element that affects image light. Further, optics block 835 may include a combination of different optical elements. In some embodiments, one or more optical elements in optics block 835 may have one or more coatings, such as a partially reflective or anti-reflective coating.

The magnification and focusing of image light by optics block 835 allows electronic display 830 to be physically smaller, lighter in weight, and to consume less power than larger displays. In addition, the magnification may increase the field of view of the content presented by electronic display 830. For example, the displayed content may be presented using nearly all of the user's field of view (e.g., about 110 degrees diagonal), and in some cases all of it. Further, in some embodiments, the amount of magnification may be adjusted by adding or removing optical elements.

In some embodiments, optics block 835 may be designed to correct one or more types of optical errors. Examples of optical errors include barrel or pincushion distortion, longitudinal chromatic aberration, or lateral chromatic aberration. Other types of optical errors may further include spherical aberration, chromatic aberration, errors due to lens field curvature, astigmatism, or any other type of optical error. In some embodiments, the content provided to electronic display 830 for display is pre-distorted, and optics block 835 corrects the distortion when it receives the image light generated from that content by electronic display 830.
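
As an illustration of pre-distortion, the sketch below applies a radial polynomial distortion model to normalized image coordinates before display, so that the opposite distortion introduced by the optics cancels it. The radial polynomial is a common choice in rendering pipelines; the disclosure does not name a specific distortion model, so the function and its coefficients are illustrative.

```python
import numpy as np

def predistort(uv, k1, k2):
    """Pre-distort normalized image coordinates (relative to the image center)
    with a radial polynomial, r' = r * (1 + k1*r^2 + k2*r^4), chosen to be the
    inverse of the distortion introduced by the optics.

    uv: array of shape (N, 2)
    """
    uv = np.asarray(uv, dtype=float)
    r2 = np.sum(uv ** 2, axis=1, keepdims=True)
    scale = 1.0 + k1 * r2 + k2 * r2 ** 2
    return uv * scale
```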

The I/O interface 815 is a device that allows a user to send action requests to, and receive responses from, the console 810. An action request is a request to perform a particular action. For example, an action request may be an instruction to begin or end the capture of image or video data, or an instruction to perform a particular action within an application. The I/O interface 815 may include one or more input devices. Example input devices include: a keyboard, a mouse, a hand controller, or any other suitable device for receiving action requests and communicating them to the console 810. An action request received by the I/O interface 815 is communicated to the console 810, which performs an action corresponding to the action request. In some embodiments, the I/O interface 815 includes one or more position sensors that capture calibration data indicating an estimated position of the I/O interface 815 relative to an initial position of the I/O interface 815. In some embodiments, the I/O interface 815 may provide haptic feedback to the user according to instructions received from the console 810. For example, haptic feedback is provided when an action request is received, or when the console 810 transmits instructions to the I/O interface 815 that cause the I/O interface 815 to generate haptic feedback when the console 810 performs an action. The I/O interface 815 may monitor one or more input responses from the user for use in determining a perceived source direction and/or a perceived source location of audio content.
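
The sketch below shows one plausible shape for this action-request exchange between the I/O interface and the console; the class names, fields, and dispatch scheme are hypothetical and not drawn from the disclosure.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict

@dataclass
class ActionRequest:
    """Hypothetical action request sent from the I/O interface to the console."""
    action: str                          # e.g., "start_capture" or "end_capture"
    params: Dict[str, Any] = field(default_factory=dict)

class Console:
    """Minimal dispatcher standing in for the console's action handling."""
    def __init__(self):
        self._handlers: Dict[str, Callable[[Dict[str, Any]], None]] = {}

    def register(self, action: str, handler: Callable[[Dict[str, Any]], None]) -> None:
        self._handlers[action] = handler

    def handle(self, request: ActionRequest) -> bool:
        """Perform the requested action; the boolean result could drive haptic
        feedback on the I/O interface acknowledging that the action was performed."""
        handler = self._handlers.get(request.action)
        if handler is None:
            return False
        handler(request.params)
        return True
```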

The console 810 provides content to the headset 805 for processing in accordance with information received from one or more of the headset 805 and the I/O interface 815. In the example shown in fig. 8, console 810 includes application storage 850, tracking module 855, and engine 845. Some embodiments of console 810 have different modules or components than those described in connection with fig. 8. Similarly, the functionality described further below may be distributed among components of console 810 in a manner different than that described in connection with fig. 8.

The application storage 850 stores one or more applications for execution by the console 810. An application is a set of instructions that, when executed by a processor, generates content for presentation to a user. Content generated by an application may be responsive to input received from the user via movement of the headset 805 or the I/O interface 815. Examples of applications include: a gaming application, a conferencing application, a video playback application, or other suitable applications. In some embodiments, the console 810 may function as an application server (e.g., application server 130), and the applications may include an application for maintaining a shared communication channel between a group of users (e.g., users of different HMDs 805).

The tracking module 855 calibrates the system environment 800 using one or more calibration parameters and may adjust the one or more calibration parameters to reduce errors in determining the position of the headset 805 or the I/O interface 815. The calibration performed by the tracking module 855 also accounts for information received from one or more sensor modules (e.g., position sensors) of the sensor system 825 in the headset 805 or from one or more sensors included in the I/O interface 815. Further, if tracking of the headset 805 is lost, the tracking module 855 may recalibrate some or all of the system environment 800.

The tracking module 855 uses information from one or more sensors (e.g., the position sensor 240, the camera assembly 235, or some combination thereof) to track the movement of the headset 805 or the I/O interface 815. For example, the tracking module 855 determines the location of a reference point of the headset 805 in a map of the local area based on information from the headset 805. The tracking module 855 may also determine the location of a reference point of the headset 805 or a reference point of the I/O interface 815 using data indicative of the location of the headset 805, or using data indicative of the location of the I/O interface 815 from one or more sensors included in the I/O interface 815, respectively. Further, in some embodiments, the tracking module 855 may use portions of the data indicative of the location of the headset 805 to predict a future location of the headset 805. The tracking module 855 provides the estimated or predicted future location of the headset 805 or the I/O interface 815 to the engine 845. In some embodiments, the tracking module 855 may provide tracking information to the audio system 820 for use in determining how to spatialize and/or enhance a received audio signal.
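
The disclosure does not specify how a future location is predicted; one simple possibility, sketched below, is constant-acceleration extrapolation from the tracking module's recent position, velocity, and acceleration estimates.

```python
import numpy as np

def predict_position(position, velocity, acceleration, dt):
    """Constant-acceleration extrapolation of a headset reference point dt seconds ahead.
    Purely illustrative; any state estimator (e.g., a Kalman filter) could be used instead."""
    position = np.asarray(position, dtype=float)
    velocity = np.asarray(velocity, dtype=float)
    acceleration = np.asarray(acceleration, dtype=float)
    return position + velocity * dt + 0.5 * acceleration * dt ** 2
```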

The engine 845 executes applications within the system environment 800 and receives position information, acceleration information, velocity information, predicted future positions, or some combination thereof, for the headset 805 from the tracking module 855. Based on the received information, the engine 845 determines the content provided to the headset 805 for presentation to the user. For example, if the received information indicates that the user has looked to the left, the engine 845 generates content for the headset 805 that reflects the user's movement in the virtual environment or in an environment that augments the local area with additional content. Further, the engine 845 performs actions within applications executing on the console 810 in response to action requests received from the I/O interface 815 and provides feedback to the user that the actions were performed. The feedback provided may be visual or auditory feedback via the headset 805 or haptic feedback via the I/O interface 815.

Additional configuration information

The foregoing description of the embodiments of the present disclosure has been presented for the purposes of illustration; it is not intended to be exhaustive or to limit the disclosure to the precise form disclosed. One skilled in the relevant art will appreciate that many modifications and variations are possible in light of the above disclosure.

Some portions of this specification describe embodiments of the disclosure in terms of algorithms and symbolic representations of operations on information. These algorithmic descriptions and representations are the means used by those skilled in the data processing arts to convey the substance of their work effectively to others skilled in the art. These operations, while described functionally, computationally, or logically, are understood to be implemented by computer programs or equivalent electrical circuits, microcode, or the like. Moreover, it has also proven convenient at times to refer to these arrangements of operations as modules, without loss of generality. The described operations and their associated modules may be implemented in software, firmware, hardware, or any combination thereof.

Any of the steps, operations, or processes described herein may be performed or implemented with one or more hardware or software modules, alone or in combination with other devices. In one embodiment, the software modules are implemented by a computer program product comprising a computer readable medium containing computer program code executable by a computer processor for performing any or all of the steps, operations, or processes described.

Embodiments of the present disclosure may also relate to apparatuses for performing the operations herein. This apparatus may be specially constructed for the required purposes, and/or it may comprise a general purpose computing device selectively activated or reconfigured by a computer program stored in the computer. Such a computer program may be stored in a non-transitory, tangible computer readable storage medium or any type of medium suitable for storing electronic instructions, which may be coupled to a computer system bus. Moreover, any computing system mentioned in this specification may include a single processor or may be an architecture that employs a multi-processor design to increase computing power.

Embodiments of the present disclosure may also relate to products produced by the computing processes described herein. Such an article of manufacture may comprise information resulting from a computing process, where the information is stored on a non-transitory, tangible computer readable storage medium and may comprise any embodiment of a computer program product or other combination of data described herein.

Finally, the language used in the specification has been principally selected for readability and instructional purposes, and may not have been selected to delineate or circumscribe the inventive subject matter. Therefore, it is intended that the scope of the disclosure be limited not by this detailed description, but rather by any claims that issue based on the application herein. Accordingly, the disclosure of the embodiments is intended to be illustrative, but not limiting, of the scope of the disclosure, which is set forth in the following claims.
