Audio spatialization and enhancement between multiple head-mounted devices

Document No.: 1967158    Publication date: 2021-12-14

Abstract: This technology, Audio spatialization and enhancement between multiple head-mounted devices, was created by William Owen Brimijoin II, Andrew Lovitt, and Philip Robinson on 2020-05-05. A shared communication channel allows audio content to be transmitted and received between multiple users. Each user is associated with a headset configured to transmit audio data to, and receive audio data from, the other users' headsets. After the first user's headset receives audio data corresponding to a second user, the headset spatializes the audio data based on the relative positions of the first and second users, so that when the audio data is presented to the first user its sound appears to originate from the position corresponding to the second user. The headset enhances the audio data based on the deviation between the second user's position and the first user's gaze direction, allowing the first user to hear more clearly the audio data from the other users they are focusing on.

1. A head-mounted device, comprising:

gaze determination circuitry configured to determine a gaze direction of a first user of the headset;

a transceiver configured to receive an audio signal associated with a headset of a second user;

a processing circuit configured to:

determine a relative position associated with the second user with respect to the first user;

determine a deviation of a position of the second user from an enhancement direction of the first user, wherein the enhancement direction is based at least in part on the gaze direction of the first user;

spatialize the audio signal associated with the second user based at least in part on the relative position associated with the second user; and

enhance an amplitude of an audio output signal based at least in part on the identified deviation of the position of the second user relative to the enhancement direction of the first user; and

a speaker assembly configured to project sound based on the spatialized and enhanced audio output signal such that the projected sound is perceived to originate from the position of the second user.

2. The headset of claim 1, further comprising a microphone array comprising a plurality of microphones arranged in a plurality of different locations, the microphone array configured to capture sound in a local area of the first user and generate an audio input signal.

3. The headset of claim 2, wherein the processing circuitry is further configured to:

analyze the audio input signal to identify sound originating from a particular region of the local area of the first user; and

generate a user audio signal from the audio input signal by enhancing a portion of the audio input signal corresponding to sound originating from the particular region.

4. The headset of claim 3, wherein the particular region corresponds to the first user's mouth.

5. The headset of claim 1, wherein the transceiver is further configured to receive location information of the second user.

6. The headset of claim 1, further comprising an antenna array configured to determine the relative position associated with the second user with respect to the first user.

7. The headset of claim 1, wherein the processing circuit is further configured to spatialize the audio output signal based on whether a line of sight exists between the first user and the second user.

8. The headset of claim 1, wherein the gaze determination circuit is configured to:

receive a position of the first user, the position comprising at least a head orientation of the first user; and

determine a relative orientation of the first user's eyes with respect to the first user's head; and

wherein spatializing the audio output signal associated with the second user is based on a relative direction of a position of the second user with respect to a head orientation of the first user.

9. The headset of claim 1, wherein the transceiver is further configured to receive a second audio signal from a third user, and the processing circuit is further configured to:

identify a relative position associated with the third user with respect to the first user;

determine a deviation of the identified relative position of the third user from the enhancement direction of the first user;

compare the identified deviation in the relative position of the third user to the identified deviation in the relative position of the second user; and

enhance, based on a result of the comparison, an amplitude of the second audio signal associated with the third user.

10. A method, comprising:

determining, at a head mounted device of a first user, an enhancement direction of the first user;

receiving, at a first user's headset, an audio signal associated with a second user's headset;

identifying a relative position associated with the second user with respect to the first user;

determining a deviation of the identified relative position of the second user from the enhancement direction of the first user;

spatializing an audio signal associated with the second user based at least in part on the relative position associated with the second user; and

enhancing the amplitude of an audio output signal based at least in part on the identified deviation of the position of the second user relative to the enhancement direction of the first user; and

projecting sound based on the spatialized and enhanced audio output signal such that the projected sound is perceived to originate from the location of the second user.

11. The method of claim 10, further comprising capturing sound in a local area of the first user and generating an audio input signal using a microphone array comprising a plurality of microphones arranged in a plurality of different locations.

12. The method of claim 11, further comprising:

analyzing the audio input signal to identify sound originating from a particular region of the local area of the first user; and

generating a user audio signal from the audio input signal by enhancing a portion of the audio input signal corresponding to sound originating from the particular region.

13. The method of claim 12, wherein the particular region corresponds to the first user's mouth.

14. The method of claim 10, further comprising receiving location information of the second user.

15. The method of claim 10, further comprising receiving signals from a headset of the second user at an antenna array and determining a relative position associated with the second user with respect to the first user based on the received signals.

16. The method of claim 10, wherein spatializing the audio output signal is based on whether a line of sight exists between the first user and the second user.

17. The method of claim 10, wherein determining the enhancement direction of the first user comprises determining the first user's gaze direction by:

receiving a position of the first user, the position comprising at least a head orientation of the first user;

determining a relative orientation of the first user's eyes with respect to the first user's head; and

determining the gaze direction based on the head orientation and the relative orientation of the first user's eyes with respect to the first user's head; and

wherein spatializing the audio output signal associated with the second user is based on a relative direction of the position of the second user with respect to the orientation of the first user.

18. The method of claim 10, further comprising:

receiving a second audio signal from a third user;

identifying a relative position associated with the third user with respect to the first user;

determining a deviation of the identified relative position of the third user from the enhancement direction of the first user;

comparing the identified deviation in the relative position of the third user to the identified deviation in the relative position of the second user; and

based on a result of the comparison, enhancing an amplitude of the second audio signal associated with the third user.

19. A non-transitory computer-readable medium storing instructions that, when executed by one or more processors, cause the one or more processors to perform operations comprising:

determining, at a head mounted device of a first user, an enhancement direction of the first user;

receiving, at a first user's headset, an audio signal associated with a second user's headset;

identifying a relative position associated with the second user with respect to the first user;

determining a deviation of the identified relative position of the second user from the enhancement direction of the first user;

spatializing an audio signal associated with the second user based at least in part on the relative position associated with the second user; and

enhancing the amplitude of an audio output signal based at least in part on the identified deviation of the position of the second user relative to the enhancement direction of the first user; and

projecting sound based on the spatialized and enhanced audio output signal such that the projected sound is perceived to originate from the location of the second user.

20. The non-transitory computer-readable medium of claim 19, wherein determining the enhancement direction of the first user comprises determining the first user's gaze direction by:

receiving a position of the first user, the position comprising at least a head orientation of the first user;

determining a relative orientation of the first user's eyes with respect to the first user's head; and

determining the gaze direction based on the head orientation and the relative orientation of the first user's eyes with respect to the first user's head; and

wherein spatializing the audio output signal associated with the second user is based on a relative direction of the position of the second user with respect to the orientation of the first user.

Background

The present disclosure relates generally to audio communication between users on a shared communication channel, and in particular to spatialization and enhancement of audio signals transmitted between a plurality of different users of the shared communication channel.

In an environment with multiple sound sources, a listener may have trouble attending to a particular sound source while tuning out the others. For example, in a busy room where multiple people are speaking at the same time, it is difficult for a listener to distinguish the speech of a particular speaker from that of the other speakers in the room. This phenomenon is known as the cocktail party problem. In some cases, different sound sources, such as speakers, may have microphones that record their voices, and the recordings are transmitted to the listener for playback. However, it may still be difficult for the listener to tell the sound sources apart, especially when there are a large number of them, or to switch attention between different sound sources.

SUMMARY

Embodiments relate to establishing a shared communication channel between multiple users for transmitting and receiving audio content. Each user is associated with a headset configured to transmit and receive audio data to and from the other user's headsets. The first user's headset, in response to receiving audio data corresponding to the second user, spatializes the audio data based on the relative positions of the first user and the second user such that the audio data presented to the first user appears to originate from a position corresponding to the second user. The headset may also enhance the audio data based on the deviation between the location of the second user and the enhancement direction (e.g., the first user's gaze direction), allowing the first user to hear more clearly the audio data from the other users that they are focusing on.

In some embodiments, a head-mounted device is described. The headset includes a gaze determination system configured to determine a gaze direction of a first user wearing the headset. The headset also includes a receiver configured to receive audio data associated with a second user, the audio data including an audio output signal. The headset also includes processing circuitry configured to identify a relative position associated with the second user with respect to the first user, and determine a deviation of the identified relative position of the second user with respect to a gaze direction of the first user. The processing circuit spatializes the audio output signal associated with the second user based on the relative position associated with the second user. In response to the identified deviation of the location of the second user relative to the gaze direction of the first user being within a threshold amount, the processing circuitry may further enhance the amplitude of the audio output signal based on the deviation. The head-mounted device also includes an audio output interface configured to send the spatialized and enhanced audio output signals to the one or more speakers to produce an output sound for presentation to the first user such that the output sound is perceived to originate from the location of the second user.
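As a rough, non-authoritative sketch of the deviation test and amplitude enhancement described above, the following Python fragment assumes vectors expressed in a shared world frame, an illustrative 20-degree threshold, and a linear gain law; none of these specifics are taken from this disclosure:

```python
import numpy as np

def deviation_and_gain(talker_pos, listener_pos, gaze_dir,
                       threshold_rad=np.deg2rad(20.0), max_gain=2.0):
    """Illustrative sketch: compute the talker's direction, its angular
    deviation from the listener's gaze, and an enhancement gain.
    The threshold and gain law are assumed values, not patented ones."""
    rel = np.asarray(talker_pos, dtype=float) - np.asarray(listener_pos, dtype=float)
    talker_dir = rel / np.linalg.norm(rel)

    gaze = np.asarray(gaze_dir, dtype=float)
    gaze = gaze / np.linalg.norm(gaze)

    # Angular deviation between the enhancement (gaze) direction and the talker.
    deviation = float(np.arccos(np.clip(np.dot(gaze, talker_dir), -1.0, 1.0)))

    # Boost amplitude only when the talker lies within the threshold of gaze.
    gain = 1.0
    if deviation < threshold_rad:
        gain = 1.0 + (max_gain - 1.0) * (1.0 - deviation / threshold_rad)
    return talker_dir, deviation, gain

# Example: a talker 10 degrees off the gaze direction receives a gain of ~1.5.
```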

The method may be performed by an audio system, for example an audio system that is part of a head-mounted device (e.g., a near-eye display or a head-mounted display). The audio system includes a microphone assembly, a transceiver, a controller, and a speaker assembly (e.g., a speaker array).

In particular, embodiments according to the invention are disclosed in the appended claims relating to head-mounted devices, methods and storage media, wherein any feature mentioned in one claim category (e.g. head-mounted devices) may also be claimed in another claim category (e.g. methods, storage media, systems and computer program products). The dependencies or references back in the appended claims are chosen for formal reasons only. However, any subject matter resulting from an intentional back-referencing of any previous claim (especially multiple dependencies) may also be claimed, such that any combination of a claim and its features is disclosed and may be claimed regardless of the dependency selected in the appended claims. The claimed subject matter comprises not only the combination of features set out in the appended claims, but also any other combination of features in the claims, wherein each feature mentioned in the claims may be combined with any other feature or combination of other features in the claims. Furthermore, any embodiments and features described or depicted herein may be claimed in a separate claim and/or in any combination with any embodiment or feature described or depicted herein or with any feature of the appended claims.

In one embodiment, a head-mounted device may include:

a gaze determination circuit configured to determine a gaze direction of a first user of a headset;

a transceiver configured to receive an audio signal associated with a headset of a second user;

a processing circuit configured to:

determine a relative position associated with the second user with respect to the first user;

determine a deviation of the position of the second user from an enhancement direction of the first user, wherein the enhancement direction is based at least in part on the gaze direction of the first user;

spatialize the audio signal associated with the second user based at least in part on the relative position associated with the second user; and

enhance the amplitude of the audio output signal based at least in part on the identified deviation of the position of the second user relative to the enhancement direction of the first user; and

a speaker assembly configured to project sound based on the spatialized and enhanced audio output signal such that the projected sound is perceived to originate from the location of the second user.

In one embodiment, a headset may include a microphone array including a plurality of microphones arranged in a plurality of different locations, the microphone array may be configured to capture sound in a localized area of a first user and generate an audio input signal.

The processing circuitry may be configured to:

analyze the audio input signal to identify sound originating from a particular region of the local area of the first user; and

generate a user audio signal from the audio input signal by enhancing a portion of the audio input signal corresponding to sound originating from the particular region.

The particular region may correspond to a mouth of the first user.

The transceiver may be configured to receive location information of a second user.

In one embodiment, the head-mounted device may include an antenna array configured to determine a relative position associated with the second user with respect to the first user.

The processing circuit may be configured to spatialize the audio output signal based on whether there is a line of sight between the first user and the second user.

The gaze determination circuitry may be configured to:

receive a position of the first user, the position including at least a head orientation of the first user; and

determine a relative orientation of the first user's eyes with respect to the first user's head; and

wherein spatializing the audio output signal associated with the second user is based on a relative direction of a position of the second user with respect to a head orientation of the first user.

The transceiver may be configured to receive a second audio signal from a third user, and the processing circuitry may be configured to:

identify a relative position associated with the third user with respect to the first user;

determine a deviation of the identified relative position of the third user from the enhancement direction of the first user;

compare the identified deviation of the relative position of the third user with the identified deviation of the relative position of the second user; and

enhance, based on a result of the comparison, the amplitude of the second audio signal associated with the third user.

In an embodiment, a method may include:

determining, at a head mounted device of a first user, an enhancement direction of the first user;

receiving, at a first user's headset, an audio signal associated with a second user's headset;

identifying a relative position associated with the second user with respect to the first user;

determining a deviation of the identified relative position of the second user from the enhancement direction of the first user;

spatializing the audio signal associated with the second user based at least in part on the relative position associated with the second user; and

enhancing the amplitude of the audio output signal based at least in part on the identified deviation of the position of the second user relative to the enhancement direction of the first user; and

projecting sound based on the spatialized and enhanced audio output signal such that the projected sound is perceived to originate from the location of the second user.

In one embodiment, a method may include capturing sound in a local area of a first user using a microphone array including a plurality of microphones arranged in a plurality of different locations and generating an audio input signal.

In one embodiment, a method may comprise:

analyzing the audio input signal to identify sound originating from a particular region of the local area of the first user; and

generating a user audio signal from the audio input signal by enhancing a portion of the audio input signal corresponding to sound originating from the particular region.

The particular region may correspond to a mouth of the first user.

In one embodiment, a method may include receiving location information of a second user.

In one embodiment, a method may comprise: at the antenna array, a signal from a headset of a second user is received, and based on the received signal, a relative position associated with the second user with respect to the first user is determined.

The spatialized audio output signal may be based on whether there is a line of sight between the first user and the second user.

Determining the enhancement direction of the first user may include determining a gaze direction of the first user by:

receiving a position of a first user, the position including at least a head orientation of the first user;

determining a relative orientation of the first user's eyes with respect to the first user's head; and

determining the gaze direction based on the head orientation and the relative orientation of the first user's eyes with respect to the first user's head; and

wherein spatializing the audio output signal associated with the second user is based on a relative direction of the position of the second user with respect to the orientation of the first user.

In one embodiment, a method may comprise:

receiving a second audio signal from a third user;

identifying a relative position associated with a third user with respect to the first user;

determining a deviation of the identified relative position of the third user from the enhancement direction of the first user;

comparing the identified deviation of the relative position of the third user with the identified deviation of the relative position of the second user; and

enhancing, based on a result of the comparison, the amplitude of the second audio signal associated with the third user.

In one embodiment, a non-transitory computer-readable medium may store instructions that, when executed by one or more processors, cause the one or more processors to perform operations comprising:

determining, at a head mounted device of a first user, an enhancement direction of the first user;

receiving, at a first user's headset, an audio signal associated with a second user's headset;

identifying a relative position associated with the second user with respect to the first user;

determining a deviation of the identified relative position of the second user from the enhancement direction of the first user;

spatializing the audio signal associated with the second user based at least in part on the relative position associated with the second user; and

enhancing the amplitude of the audio output signal based at least in part on the identified deviation of the position of the second user relative to the enhancement direction of the first user; and

projecting sound based on the spatialized and enhanced audio output signal such that the projected sound is perceived to originate from the location of the second user.

Determining the enhancement direction of the first user may include determining a gaze direction of the first user by:

receiving a position of a first user, the position including at least a head orientation of the first user;

determining a relative orientation of the first user's eyes with respect to the first user's head; and

determining the gaze direction based on the head orientation and the relative orientation of the first user's eyes with respect to the first user's head; and

wherein spatializing the audio output signal associated with the second user is based on a relative direction of the position of the second user with respect to the orientation of the first user.

In an embodiment, one or more computer-readable non-transitory storage media may contain software that is operable when executed to perform a method according to or in any of the embodiments described above.

In an embodiment, a system may comprise: one or more processors; and at least one memory coupled to the processor and comprising instructions executable by the processor, the processor being operable when executing the instructions to perform a method according to or in any of the embodiments described above.

In an embodiment, a computer program product, preferably comprising a computer readable non-transitory storage medium, is operable when executed on a data processing system to perform a method according to or in any of the embodiments described above.

Brief Description of Drawings

Fig. 1 illustrates a high-level diagram of an environment in which a system for audio spatialization and enhancement can be used in accordance with one or more embodiments.

Fig. 2 is an example illustrating a head-mounted device including an audio system that may be worn by users sharing a communication channel in accordance with one or more embodiments.

FIG. 3 shows a block diagram of an audio system in accordance with one or more embodiments.

Fig. 4 illustrates an example of an environment with multiple users utilizing a shared communication channel in accordance with one or more embodiments.

Fig. 5 shows a diagram of filtering a user audio signal in accordance with one or more embodiments.

Fig. 6 is a flow diagram of a process for spatializing and enhancing audio data received from other users in a shared communication channel, according to one or more embodiments.

FIG. 7 is a flow diagram of a process for processing an audio signal corresponding to a user's speaking voice in accordance with one or more embodiments.

Fig. 8 is a system environment of a headset including an audio system as described above, in accordance with one or more embodiments.

The figures depict various embodiments of the present invention for purposes of illustration only. One skilled in the art will readily recognize from the following discussion that alternative embodiments of the structures and methods illustrated herein may be employed without departing from the principles of the invention described herein.

Detailed Description

Embodiments relate to a head-mounted device having an audio system configured to receive audio signals from a plurality of audio sources and to play back the received audio signals to a user (e.g., a wearer of the head-mounted device). The audio system spatializes the audio signals received from a particular audio source based on the relative location of that audio source, such that the audio signals played back to the user appear to originate from the location of the audio source. In some embodiments, the audio system enhances the audio signals received from an audio source based on the location of the audio source and the enhancement direction (e.g., gaze direction) of the user, in order to emphasize the audio data received from a particular audio source and allow the user to switch their attention between different audio sources.

In some embodiments, a shared communication channel is established between a plurality of users within a local area network. Each user wears a headset that includes a transceiver for communicating with (e.g., transmitting audio signals to and receiving audio signals from) the other users in the shared communication channel. Each headset also includes sensors configured to track the location and gaze direction of its user, which can be used to determine the relative locations of the other users in the shared communication channel and how those locations relate to the user's gaze direction.

The headset processes audio signals received from other users of the shared communication channel based on the relative positions of those users, so that the audio signals, when played back to the user, appear to originate from positions corresponding to the other users. The audio signals are also enhanced based on the user's enhancement direction (which may be based on the user's gaze direction and may be used to infer which other users the user is focusing on), where audio signals from other users at locations aligned with the user's enhancement direction may be more strongly enhanced. For example, the first user receives an audio signal from each other user in the shared communication channel; each audio signal is spatialized to indicate the relative position of that user with respect to the first user, and enhanced based on which other user the first user is currently looking at (e.g., as determined by the gaze direction).
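When several talkers are received at once, the per-talker boost could in principle be derived from how closely each talker aligns with the enhancement direction. The following is only an illustrative sketch; the exponential weighting and the `sharpness` and `max_gain` parameters are assumptions, not values from this disclosure:

```python
import numpy as np

def enhancement_gains(deviations_rad, sharpness=4.0, max_gain=2.0):
    """Illustrative only: map each talker's angular deviation from the
    enhancement direction to an amplitude gain. The talker best aligned with
    the gaze gets the largest boost; sharpness and max_gain are assumed
    tuning parameters."""
    d = np.asarray(deviations_rad, dtype=float)
    weights = np.exp(-sharpness * d)            # smaller deviation -> larger weight
    weights = weights / weights.max()           # best-aligned talker -> weight 1.0
    return 1.0 + (max_gain - 1.0) * weights     # per-talker gains in [1, max_gain]

# Example: a talker 5 degrees off gaze is boosted ~2x, one 60 degrees off ~1x.
print(enhancement_gains(np.deg2rad([5.0, 60.0])))
```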

The head-mounted device also includes a microphone for recording the user's own voice, which may then be transmitted to the other users' headsets in the shared communication channel. In some embodiments, the user's own voice may also be played back to the user to help the user adjust the volume of their own speaking voice.

Various embodiments may include or be implemented in conjunction with an artificial reality system. Artificial reality is a form of reality that has been adjusted in some manner before presentation to a user, and may include, for example, virtual reality (VR), augmented reality (AR), mixed reality (MR), hybrid reality, or some combination and/or derivative thereof. Artificial reality content may include fully generated content or generated content combined with captured (e.g., real-world) content. The artificial reality content may include video, audio, haptic feedback, or some combination thereof, any of which may be presented in a single channel or in multiple channels (such as stereoscopic video that produces a three-dimensional effect for the viewer). Further, in some embodiments, artificial reality may also be associated with applications, products, accessories, services, or some combination thereof used, for example, to create content in an artificial reality and/or otherwise used in an artificial reality (e.g., to perform activities therein). An artificial reality system that provides artificial reality content may be implemented on a variety of platforms, including a head-mounted display (HMD) connected to a host computer system, a standalone HMD, a mobile device or computing system, or any other hardware platform capable of providing artificial reality content to one or more viewers.

Fig. 1 shows a high-level diagram of an environment including an audio system 115 in accordance with one or more embodiments. The audio system 115 may be integrated as part of the head mounted device 110 that the user 105A may wear.

A user 105A wearing a head mounted device 110 containing an audio system 115 is in an environment proximate to a plurality of other users (users 105B, 105C, 105D, and 105E). Users 105A-E may be collectively referred to as users 105. Users may talk to each other and thus each user may be considered to correspond to an audio source. Furthermore, there may be additional audio sources in the environment. In environments where a large number of audio sources are close to each other, it may be difficult for the user 105A to focus on any particular audio source (e.g., the voice of a particular other user in the environment).

To facilitate conversation between users in the environment, each user may wear a respective headset with a respective audio system. The audio system 115 communicates with the audio systems of the other headsets to receive audio signals corresponding to the other users' voices and to play back those audio signals to the user 105A. This may allow the user 105A to hear the other users' voices more clearly. In addition, the audio system 115 processes the received audio signals so that the audio signals played back to the user 105A are spatialized, such that the played sound is perceived to originate from the location of the corresponding other user. The played-back audio signals may also be enhanced based on which other user the user 105A is currently looking at.

In some embodiments, multiple users may establish a shared communication channel. For example, fig. 1 shows a first shared communication channel 120A with three users, and a second shared communication channel 120B with two users. The shared communication channel 120 may correspond to a particular group of users who wish to talk to each other. For example, the shared communication channel 120 may include multiple users within a particular proximity of each other (e.g., users sitting at the same table). As used herein, a shared communication channel may refer to a grouping of multiple users, each associated with a corresponding audio system, where each user's audio system is capable of communicating with each other user's audio system within the grouping. For example, each of the three users 105A, 105B, and 105C sharing the communication channel 120A has a respective audio system in communication with the others, while each of the two users 105D and 105E sharing the communication channel 120B has a respective audio system in communication with the other.

In some embodiments, the shared communication channel may include one or more remote users. The shared communication channel may include multiple users within a particular geographic area (e.g., corresponding to a particular room, building, etc.). In some embodiments, the geographic area may be defined based on one or more structures (e.g., walls). As used herein, a remote user may correspond to a user participating in a shared communication channel that is located outside of the geographic region corresponding to the channel. For example, the shared communication channel may include a group of users sitting at a common desk, and one or more additional remote users located in different buildings.

Although fig. 1 shows each shared communication channel 120A and 120B corresponding to a different region, in some embodiments, the different shared communication channels cover overlapping regions. For example, users sharing the communication channel 120B may be commingled with users sharing the communication channel 120A within a common area. In some embodiments, a particular user may be part of more than one shared communication channel (e.g., both shared communication channels 120A and 120B).

In some embodiments, the shared communication channel 120 may be established by a group of one or more users through an exchange of information. For example, a first user may join a common shared communication channel with a second user (e.g., based on a head mounted device worn by the second user or a scannable object such as a badge) by scanning information corresponding to the second user (e.g., using their respective head mounted device 110 or other scanning device). In some embodiments, the shared communication channel is implemented as part of a peer-to-peer network established between the headsets of at least the first and second users.

In some embodiments, one or more users 105 access the application server 130 via the network 125. The network may include the internet, a Local Area Network (LAN), a Wide Area Network (WAN), a mobile wired or wireless network, a private network, a virtual private network, or a combination thereof.

The application server 130 contains one or more applications that facilitate communication between the headsets of different users and may correspond to an online system, a local console, or some combination thereof. For example, application server 130 may contain an application that establishes a shared communication channel between two or more users and maintains metadata corresponding to the established shared communication channel. The application server may comprise an online system. Each user may log onto the online system on the application server 130 and indicate one or more other users with whom they wish to communicate. In some embodiments, a connection between two users 105 may be established if both users indicate a desire to communicate with the other user. A shared communication channel may be formed for each group of users, where each user in the group is connected to every other user in the group.

In other embodiments, a first user may establish the shared communication channel 120 and then additional users may join the shared communication channel 120. For example, the first user may provide a password or other type of authentication to each additional user to allow the additional users to join the shared communication channel via the application server 130 (e.g., provide the password to the additional users verbally or in written form, or transmit the password to the additional users' headsets indicated by the first user through the user interface). In some embodiments, the application server 130 maintains the shared communication channel 120 and transmits updates to the headset of each user of the channel regarding the current state of the channel (e.g., when a new user joins the channel, or when an existing user exits the channel). In some embodiments, the application server 130 is configured to maintain information corresponding to the shared communication channel 120 and transmit current status information about the shared communication channel to each user's headset while communication of audio data between the headsets may be performed peer-to-peer.

In some embodiments, the application server 130 comprises a social networking system. The social networking system may maintain a social graph or other data structure indicating relationships (e.g., friendships) between different users. In some embodiments, only users with a particular type of relationship on the social networking system may establish connections with each other to form a shared communication channel. In some embodiments, the social graph maintained by the application server 130 may be used to automatically establish the shared communication channel 120 between multiple users. For example, a group of users all located within a particular geographic area and having a particular type of social network relationship with each other may be automatically included in a shared communication channel.

In some embodiments, some or all of the functions of the application server 130 may be performed by a local console. For example, a local console may be connected to multiple head mounted devices 110 corresponding to different users 105 in a local environment, and may establish and maintain one or more shared communication channels between groups of users in the environment. In some embodiments, one or more head mounted devices 110 may connect to the application server 130 through a local console.

Fig. 2 is an example illustrating a head mounted device 110 including an audio system that may be worn by a user in a shared communication channel in accordance with one or more embodiments. The head mounted device 110 presents media to the user. In one embodiment, the head mounted device 110 may be a Near Eye Display (NED). In another embodiment, the head mounted device 110 may be a Head Mounted Display (HMD). In general, the head-mounted device may be worn on the face of a user (e.g., user 105) to present content (e.g., media content) using one or both lenses 210 of the head-mounted device. However, the head mounted device 110 may also be used such that the media content is presented to the user in a different manner. Examples of media content presented by the head mounted device 110 include one or more images, video, audio, or some combination thereof. The head mounted device 110 includes an audio system and may include a frame 205, a lens 210, a camera assembly 235, a position sensor 240, an eye tracking sensor 245, and a controller 215 for controlling the audio system, as well as other various sensors of the head mounted device 110. Although fig. 2 shows components of the headset 110 in an example location on the headset 110, these components may be located elsewhere on the headset 110, on a peripheral device paired with the headset 110, or some combination of the two locations.

The head mounted device 110 may correct or enhance the vision of the user, protect the eyes of the user, or provide images to the user. The head mounted device 110 may be glasses to correct visual defects of the user. The head-mounted device 110 may be sunglasses that protect the user's eyes from sunlight. The head-mounted device 110 may be safety glasses that protect the user's eyes from impact. The head mounted device 110 may be a night vision device or infrared goggles to enhance the user's vision at night. The head mounted device 110 may be a near-eye display that generates artificial reality content for the user. Alternatively, the headset 110 may not include the lens 210 and may be the frame 205 with an audio system that provides audio content (e.g., music, radio, podcasts) to the user.

The lens 210 provides or transmits light to a user wearing the head-mounted device 110. The lens 210 may be a prescription lens (e.g., single vision, bifocal, and trifocal or progressive lenses) to help correct the user's vision deficiencies. The prescription lens transmits ambient light to the user wearing the head-mounted device 110. The transmitted ambient light may be altered by the prescription lens to correct defects in the user's vision. The lens 210 may be a polarized lens or a colored lens to protect the user's eyes from sunlight. The lens 210 may have one or more waveguides as part of a waveguide display, where the image light is coupled to the user's eye through an end or edge of the waveguide. The lens 210 may include an electronic display for providing image light and may also include an optical block for magnifying the image light from the electronic display.

In some embodiments, the head mounted device 110 includes a camera component 235 that captures visual information about a local area around the head mounted device 110. In some embodiments, camera component 235 corresponds to a depth camera component (DCA) that captures data describing depth information for a local region. In some embodiments, the DCA may include a light projector (e.g., structured light and/or flash illumination for time-of-flight), an imaging device, and a controller. The captured data may be an image of light projected by the light projector to the local area captured by the imaging device. In one embodiment, the DCA may include two or more cameras and a controller, the cameras oriented to capture portions of the local area in a stereoscopic manner. The captured data may be images of local areas captured stereoscopically by two or more cameras. The controller uses the captured data and depth determination techniques (e.g., structured light, time of flight, stereo imaging, etc.) to calculate depth information for the local region. Based on the depth information, the controller 215 may be able to determine absolute position information of the headset 110 within the local area. The DCA may be integrated with the headset 110 or may be located in a local area external to the headset 110. In the latter embodiment, the controller of the DCA may transmit the depth information to the controller 215 of the headset 110.

The position sensor 240 is configured to generate one or more measurement signals and estimate a current position of the headset 110 based on the generated signals. In some embodiments, the current position of the headset 110 is determined relative to an initial position of the headset 110. The estimated position may include a location of the headset 110 and/or an orientation of the headset 110 or of the head of a user wearing the headset 110, or some combination thereof. For example, the orientation may correspond to the position of each ear relative to a reference point. In some embodiments, when the camera component 235 includes a DCA, the position sensor 240 uses depth information and/or absolute position information from the DCA to estimate the current position of the headset 110. The position sensor 240 may include one or more accelerometers to measure translational motion (forward/backward, up/down, left/right) and one or more gyroscopes to measure rotational motion (e.g., pitch, yaw, roll). In some embodiments, the position sensor 240 includes other types of sensors that may be used to detect motion, such as one or more magnetometers.

In some embodiments, the position sensor 240 includes an Inertial Measurement Unit (IMU) that quickly samples the received measurement signals and calculates an estimated position of the headset 110 from the sampled data. For example, the IMU integrates the measurement signals received from the accelerometers over time to estimate a velocity vector and integrates the velocity vector over time to determine an estimated position of a reference point on the headset 110. The reference point is a point that may be used to describe the position of the headset 110. While the reference point may be defined generally as a point in space, in practice the reference point is defined as a point within the headset 110. In some embodiments, the IMU may be implemented as part of the local controller 215 instead of the location sensor 240.
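As a minimal sketch of the dead-reckoning step just described, the fragment below simply double-integrates accelerometer samples; gravity removal, gyroscope-based orientation handling, and drift correction are deliberately omitted, and all names are illustrative:

```python
import numpy as np

def integrate_imu(accel_samples, dt, v0=None, p0=None):
    """Illustrative double integration of accelerometer samples (shape (N, 3))
    into a position estimate. Real IMU pipelines also rotate samples into the
    world frame using the gyroscope, subtract gravity, and correct drift."""
    velocity = np.zeros(3) if v0 is None else np.asarray(v0, dtype=float).copy()
    position = np.zeros(3) if p0 is None else np.asarray(p0, dtype=float).copy()
    for a in np.asarray(accel_samples, dtype=float):
        velocity += a * dt           # first integration: acceleration -> velocity
        position += velocity * dt    # second integration: velocity -> position
    return position, velocity
```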

Eye tracking sensor 245 is configured to provide sensor readings (e.g., captured images of the user's eyes) that may be used to determine the direction of the user's gaze. When wearing the head mounted device 110, the user's eyes may move relative to their head, allowing the user to look in different directions without having to move their head. As such, the user may be looking in a direction other than directly in front relative to the position and orientation of the head mounted device 110 (e.g., as determined by the position sensor 240).

In some embodiments, eye tracking sensor 245 includes one or more sensors configured to determine an orientation of the user's eye. The eye tracking sensor captures and analyzes images of the user's eye to determine the orientation of the eye relative to the head-mounted device 110. In some embodiments, the eye tracking sensor includes one or more light sources and one or more cameras. The one or more light sources illuminate the eye with IR light, for example an infrared flash (e.g., for time-of-flight depth determination), a structured light pattern (e.g., a dot pattern or bar pattern), a blinking pattern, and the like. For example, a light source may be a vertical-cavity surface-emitting laser, a light emitting diode, a micro LED, some other IR source, or some combination thereof. The one or more cameras are configured to capture images of one or both eyes illuminated with the IR light from the one or more light sources. A camera includes an image sensor (e.g., a complementary metal-oxide-semiconductor sensor, a charge-coupled device, etc.) configured to detect light emitted from the one or more light sources. In some embodiments, the camera may also be capable of detecting light in other bands (e.g., the visible band). The eye tracking sensor uses the captured images and depth determination techniques to determine the eye orientation of one or both of the user's eyes. Depth determination techniques may include, for example, structured light, time of flight, stereo imaging, or some other depth determination method familiar to those skilled in the art. In some embodiments, the eye tracking sensor determines the eye orientation based on the captured images and a model of the user's eye.

The eye orientation determined by the eye tracking sensor may be combined with the determined position of the head-mounted device 110 (e.g., the position determined using the position sensor 240) to determine the user's gaze direction. For example, a vector corresponding to the user's eye orientation (which indicates the orientation of the user's eyes relative to their head) may be added to a vector corresponding to the position of the headset (which indicates the position and orientation of the headset within the local environment) to determine a vector corresponding to the user's gaze direction (which indicates the direction the user is looking at in the local environment). By determining the direction of the user's gaze, the direction in which the user is looking in the environment is identified, which may be combined with knowledge of the locations of other users sharing the communication channel, thereby allowing a determination of which other user the user is looking at.
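One common way to combine the two measurements, shown here only as an illustrative sketch rather than the exact computation described above, is to rotate the eye-in-head gaze vector by the headset orientation; the quaternion convention and axis choices below are assumptions:

```python
import numpy as np
from scipy.spatial.transform import Rotation

def gaze_direction_world(head_quat_xyzw, eye_dir_in_head):
    """Illustrative sketch: express the eye tracker's gaze vector in the world
    frame. head_quat_xyzw is the headset orientation from the position sensor;
    eye_dir_in_head is a unit vector in the head frame (conventions assumed,
    e.g. [1, 0, 0] for looking straight ahead)."""
    head = Rotation.from_quat(head_quat_xyzw)          # head-to-world rotation
    gaze = head.apply(np.asarray(eye_dir_in_head, dtype=float))
    return gaze / np.linalg.norm(gaze)
```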

In some embodiments, eye tracking sensor 245 also receives one or more images from a camera of camera assembly 235 depicting the local area within the FOV of the camera, and maps the determined eye orientation to a location within the received images. In some embodiments, the eye tracking sensor uses object recognition to identify one or more objects (e.g., other users) in the one or more images and maps the determined eye orientation to one or more of the identified objects.

The audio system of the headset 110 is configured to allow the user to communicate with other users within the shared communication channel 120. In some embodiments, the audio system includes a microphone assembly 225, a transceiver 230, a speaker assembly having one or more speakers, and a controller 215. The controller 215 is configured to coordinate operations between the various components of the headset 110. For example, the controller 215 may control the microphone assembly 225 to capture audio data corresponding to the user's voice for transmission to other users within the shared communication channel via the transceiver 230. Further, the controller 215 may receive audio data corresponding to other users of the shared communication channel via the transceiver 230 and process the received audio data (e.g., spatializing and/or enhancing the audio data) based on the relative positions of the other users with respect to the current position of the user (e.g., as determined by the position sensor 240). The processed audio data may be played back to the user using the speaker assembly. Additional details regarding the audio system are discussed with reference to fig. 3.

The microphone assembly 225 records sound within a local area of the headset 110. The local area is the environment surrounding the headset 110. For example, the local area may be a room in which the user wearing the head mounted device 110 is located, or the user wearing the head mounted device 110 may be outside and the local area is the outdoor area in which the microphone assembly can detect sound. The microphone assembly 225 includes one or more acoustic sensors. In some embodiments, an acoustic sensor is configured to record the voice of the user of the head mounted device 110. To this end, the acoustic sensor may be located near the user's mouth and may have a short capture range to avoid capturing other sounds that do not originate from the user. In some embodiments, the acoustic sensor may be located on a separate mouthpiece or other structure positioned closer to the user's mouth.

In some embodiments, the acoustic sensor includes a port corresponding to an aperture in the frame 205 of the headset 110. The port provides an input coupling point for sound from a localized region to an acoustic waveguide that directs the sound to an acoustic sensor. The acoustic sensor captures sound emitted from one or more sound sources in a local area and is configured to detect the sound and convert the detected sound into an electronic format (analog or digital). The acoustic sensor may be an acoustic wave sensor, a microphone, an acoustic transducer or similar sensor adapted to detect sound.

Although fig. 2 shows the microphone assembly 225 located at a single location on the headset 110, in some embodiments, the microphone assembly 225 includes a microphone array having a plurality of acoustic detection locations located on the headset 110. Each acoustic detection location may include an acoustic sensor or port. The acoustic detection locations may be placed on an outer surface of the headset 110, on an inner surface of the headset 110, separate from the headset 110 (e.g., part of some other device), or some combination thereof.

The transceiver 230 is configured to communicate with the transceivers of other users' head mounted devices. For example, the transceiver 230 may transmit data (e.g., audio corresponding to the user's voice) to, and receive data (e.g., audio signals corresponding to the other users' voices) from, other users' headsets within the shared communication channel. In some embodiments, transceiver 230 may access a network (e.g., network 125) to communicate with an application server or console (e.g., an application server configured to maintain a shared communication channel). The transceiver 230 may include a transmitter, a receiver, or both.

The head-mounted device 110 also includes a speaker assembly configured to play back one or more audio signals as sound projected to a user of the head-mounted device 110. In some embodiments, the speaker assembly includes two or more speakers, which allows sound projected to the user (e.g., by adjusting the amplitude of the sound projected through each speaker 220) to be spatialized such that the sound may be heard by the user as originating from a particular location or direction in the local area. For example, as shown in fig. 2, the speaker assembly may include left and right speakers 220a and 220b corresponding to the left and right ears of the user. In some embodiments, the speaker may cover the user's ear (e.g., headphones) or be inserted into the user's ear (e.g., earbuds).
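A minimal two-speaker spatialization sketch follows, using constant-power amplitude panning in place of the HRTF-style rendering a production headset would more likely use; the head-frame axis convention (x forward, y left) is an assumption, not taken from this disclosure:

```python
import numpy as np

def pan_stereo(mono_audio, talker_dir_head):
    """Illustrative sketch: place a mono signal between left and right speakers
    by amplitude panning. talker_dir_head is a unit vector in the listener's
    head frame, assumed here as x forward, y left. Real systems would also add
    interaural time differences or use HRTFs."""
    x, y = float(talker_dir_head[0]), float(talker_dir_head[1])
    azimuth = np.arctan2(y, x)                           # +90 deg = hard left
    pan = np.clip(-azimuth / (np.pi / 2.0), -1.0, 1.0)   # -1 = left, +1 = right
    theta = (pan + 1.0) * np.pi / 4.0                    # constant-power law
    audio = np.asarray(mono_audio, dtype=float)
    return np.cos(theta) * audio, np.sin(theta) * audio  # (left, right)
```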

Although fig. 2 shows two speakers (e.g., left speaker 220a and right speaker 220b), in some embodiments, the speaker assembly may include a speaker array that includes a plurality of acoustic emission locations on the head-mounted device 110. An acoustic emission location is the location of a speaker or a port in the frame 205 of the head mounted device 110. In the case of a port, the port provides an output coupling point for sound from an acoustic waveguide that separates a speaker of the speaker array from the port. Sound emitted from the speaker travels through the acoustic waveguide and then exits through the port into the local area. In some embodiments, the acoustic emission locations are placed on an outer surface of the frame 205 (i.e., a surface that does not face the user), on an inner surface of the frame 205 (a surface that faces the user), or some combination thereof.

Although fig. 2 shows various components of the headset 110 in a particular arrangement, it should be understood that in other embodiments, the headset 110 may contain different components than those described herein, and the components of the headset 110 may have different structures or be arranged differently. In some embodiments, some of the functions discussed above may be performed by different components or combinations of components.

In the configuration shown, the audio system is embedded in the NED worn by the user. In an alternative embodiment, the audio system may be embedded in a Head Mounted Display (HMD) worn by the user. While the above description discusses audio components embedded in a head-mounted device worn by the user, it will be apparent to those skilled in the art that the audio components may be embedded in different head-mounted devices that may be worn elsewhere by the user or operated by the user without being worn.

Audio system

Fig. 3 illustrates a block diagram of an audio system 300 in accordance with one or more embodiments. The audio system 300 may be implemented as part of a headset (e.g., the headset 110) and may include a microphone assembly 225, a transceiver 230, a speaker assembly 330, and a controller 215. Some embodiments of audio system 300 have different components than those described herein. Similarly, functionality may be distributed among the components in a manner different from that described herein. In some embodiments, some functions of the audio system may be part of different components (e.g., some functions may be part of the headset and some functions may be part of a console and/or server).

The microphone assembly 225 is configured to capture sound within a local area of the user and generate audio signals corresponding to the captured sound. In some embodiments, the microphone assembly 225 is configured to capture the voice of the user and includes a plurality of microphones whose outputs can be beamformed toward a particular portion of the local area (e.g., near the user's mouth) to increase detection of sound spoken by the user of the headset. For example, each microphone generates an audio input signal corresponding to the sound detected by that microphone. By analyzing the audio input signal of each microphone, sound originating from a particular region of the user's local area (e.g., near the user's mouth) may be identified. The controller 215 generates a user audio signal from the audio input signals by enhancing the portion of the audio input signals corresponding to sound originating from that particular region. In this way, the user audio signal may be generated such that it reflects sound originating at or near the user's mouth (e.g., sound corresponding to the user's speech). This is useful because it allows a clear audio signal of the user's voice to be captured even in environments with a lot of sound from other sources (e.g., a crowded room).
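Delay-and-sum beamforming is one simple way to realize this mouth-directed pickup. The sketch below is illustrative only (integer-sample delays, assumed parameter names) and is not presented as the algorithm of this disclosure:

```python
import numpy as np

def delay_and_sum(mic_signals, mic_positions, focus_point, fs, c=343.0):
    """Illustrative beamformer: steer a microphone array toward a focus point
    (e.g. a point just in front of the wearer's mouth). mic_signals has shape
    (num_mics, num_samples); mic_positions has shape (num_mics, 3) in the
    same (assumed) headset frame as focus_point."""
    sigs = np.asarray(mic_signals, dtype=float)
    focus = np.asarray(focus_point, dtype=float)
    dists = np.linalg.norm(np.asarray(mic_positions, dtype=float) - focus, axis=1)
    rel_delays = (dists - dists.min()) / c            # seconds behind the nearest mic
    out = np.zeros(sigs.shape[1])
    for sig, delay in zip(sigs, rel_delays):
        out += np.roll(sig, -int(round(delay * fs)))  # advance later arrivals to align
    return out / len(sigs)
```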

The transceiver 230 is configured to transmit data to and receive data from other users within a shared communication channel of which the user is a part. For example, the transceiver 230 may receive audio data captured by the microphone assembly 225 (e.g., audio data corresponding to the user's own voice) and transmit the captured audio data to transceivers on the headsets of other users within the shared communication channel. In addition, the transceiver 230 receives audio data (referred to as audio output signals or audio signals) transmitted by other users in the shared communication channel, which may be processed (e.g., by the local controller 215) and played (e.g., via the speaker 220) to the first user. The transceiver 230 transmits and receives information through electromagnetic waves. The electromagnetic waves may be, for example, Radio Frequency (RF), infrared (IR), or some combination thereof. In some embodiments, the transceiver 230 communicates with the transceivers of other users in the local area using RF and/or infrared communications. In some embodiments, multiple transceivers corresponding to the headsets of multiple users may communicate with each other (e.g., via Bluetooth or another type of protocol) to establish a local network. In some embodiments, the transceiver 230 may also communicate wirelessly (e.g., via Wi-Fi) with an application server over a network (e.g., the internet) or with a local console configured to maintain the shared communication channel. Further, in embodiments where the shared communication channel includes a remote user, the transceiver 230 may communicate with the remote user through the application server or the local console.

In some embodiments, the data transmitted and received by the transceiver 230 includes metadata corresponding to the transmitted/received audio data. The metadata may indicate a user identity (e.g., a user ID) associated with the audio data and information from which the user's location may be derived. For example, the metadata may include current location information of the user (e.g., determined by a location sensor on the user's headset). In some embodiments, the transceiver 230 of the first headset 110 includes an antenna array, each antenna being located at a different location on the first headset 110, such that the relative timing or phase of the signals received by each antenna from the transceiver of the second headset may be used to determine the relative position of the second headset.

The speaker assembly 330 is configured to play back one or more audio signals as sound projected to a user of the head-mounted device. As described above, in some embodiments, the speaker assembly 330 includes two or more speakers, which allows the sound projected to the user (e.g., by adjusting the amplitude of the sound projected through each speaker) to be spatialized such that the sound may appear to the user to originate from a particular location or direction in the local area.

The speaker may be, for example, a moving coil transducer, a piezoelectric transducer, some other device that generates acoustic pressure waves using an electrical signal, or some combination thereof. In some embodiments, the speaker assembly 330 also includes a speaker covering each ear (e.g., headphones, earbuds, etc.). In other embodiments, the speaker assembly 330 does not include any speakers that block the user's ears (e.g., the speakers are on the frame of the headset).

The controller 215 includes circuitry for operating the microphone assembly 225, the transceiver 230, and the speaker assembly 330. This circuitry may include a data store 335, a channel configuration circuit 305, a position tracking circuit 310, a gaze determination circuit 315, a signal manipulation circuit 320, and an audio filtering circuit 325. Although fig. 3 shows the components of the controller 215 as corresponding to different circuits, it should be understood that in other embodiments, the channel configuration circuit 305, the position tracking circuit 310, the gaze determination circuit 315, the signal manipulation circuit 320, and the audio filtering circuit 325 may be embodied in software (e.g., software modules), firmware, hardware, or any combination thereof.

The data store 335 stores data used by various other modules of the controller 215. The stored data may include one or more parameters of the shared communication channel (e.g., identities of other users in the shared communication channel, authentication information for accessing the shared communication channel, etc.). The stored data may include location information associated with the user (e.g., the user's position and posture as determined by the location sensor 240) and/or location information associated with the audio system of other users (e.g., received from other users' headsets). In some embodiments, the data store 335 may store one or more models of local regions. For example, the controller 215 may generate a model of the local area that indicates the location of the user and other users in the local environment, one or more objects in the local environment (e.g., detected using the camera component 235), and so on. The data store 335 may also store one or more eye tracking parameters (e.g., light patterns for eye tracking, models of the user's eyes, etc.), audio content (e.g., recorded audio data, received audio data, etc.), one or more parameters for spatializing the audio content (e.g., head-related transfer functions), one or more parameters for enhancing the audio content (e.g., algorithms for determining an attention score), one or more parameters for filtering the audio content, some other information used by the audio system 300, or some combination thereof.

The channel configuration circuit 305 is configured to maintain membership of a user in a shared communication channel. As used herein, for example, maintaining membership of a user in a shared communication channel may include: establishing a shared communication channel, adding and/or removing users as members to an existing shared communication channel, updating one or more parameters of the shared communication channel (e.g., via communication with an application server or with audio systems of other users in the shared communication channel), performing other actions associated with the shared communication channel, or some combination thereof.

In some embodiments, a user may establish a shared communication channel by providing information corresponding to one or more additional users to the channel configuration circuitry 305 (e.g., via a user interface, via a scanning device, etc.). In response, the channel configuration circuit 305 may establish a shared communication channel to include the user and one or more additional users. In some embodiments, channel configuration circuit 305 transmits data (e.g., via transceiver 230) to the channel configuration circuit associated with each additional user to establish the shared communication channel.

In some embodiments, the channel configuration circuitry associated with each user in the shared communication channel stores information corresponding to that channel in a respective data store (e.g., data store 335). The information may include the identities of other users within the shared communication channel, authentication information required to communicate over the shared communication channel, and so forth. In some embodiments, the channel configuration circuit 305 may detect a change in one or more channel parameters, such as a change in user membership of the channel (e.g., a new user joining the channel, a user exiting the channel, a change in user priority), a change in authentication information associated with the channel, a change in other parameters of the channel, or some combination thereof. In response to detecting a change in the channel parameters, the channel configuration circuit 305 may communicate the change to the channel configuration circuitry of other users in the channel. In this way, the channel configuration circuits of the users may coordinate with each other such that each channel configuration circuit has access to the latest parameters of the shared communication channel, allowing each audio system 300 to communicate with the audio systems of the other users of that channel.

In other embodiments, the channel configuration circuit 305 communicates (via the transceiver 230) with an application server that coordinates the establishment of the shared communication channel (e.g., by communicating with the channel configuration circuit of the audio system of each user to be included in the shared communication channel). For example, the channel configuration circuit 305 communicates with the application server to indicate participation in the shared communication channel and to receive parameters associated with the shared communication channel (e.g., identities of other users within the shared communication channel, any authentication information required to communicate over the shared communication channel, etc.). Further, the channel configuration circuit 305 may communicate with the application server to indicate any changes associated with the user's participation in the channel. The application server may be responsible for maintaining the parameters of the shared communication channel and communicating these parameters to the channel configuration circuits corresponding to the users participating in the channel to ensure that those circuits have access to the latest parameters of the channel.
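
For illustration only, the kind of channel state that the channel configuration circuits might keep synchronized could be sketched as follows; all field names and the simple last-writer-wins policy are hypothetical and not taken from this description:

```python
from dataclasses import dataclass, field

@dataclass
class SharedChannel:
    channel_id: str
    member_ids: set = field(default_factory=set)    # identities of users in the channel
    auth_token: str = ""                            # authentication info for the channel
    priorities: dict = field(default_factory=dict)  # optional per-user priority levels
    version: int = 0                                # bumped on every parameter change

def apply_remote_update(local: SharedChannel, update: SharedChannel) -> SharedChannel:
    """Keep only the newest parameters so every headset converges on the latest
    channel state (a simple last-writer-wins policy, assumed for this sketch)."""
    return update if update.version > local.version else local
```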

The location tracking circuit 310 is configured to determine a current location of the user. The location tracking circuit 310 receives location information corresponding to the user's headset from a location sensor (e.g., location sensor 240) and determines a current location of the headset based on the received location information. The position of the user headset may indicate the position of the user within the local environment as well as the orientation of the user (e.g., the orientation of the headset on the user's head, also referred to below as the "head orientation" of the user). In some embodiments, the position of the user is calculated relative to a reference point. In some embodiments, one or more functions of the location tracking circuitry 310 are performed by the IMU.

The location tracking circuit 310 may also be configured to determine location information corresponding to other users sharing the communication channel. In some embodiments, location information corresponding to other users may be received directly from other users' headsets (e.g., via transceiver 230). For example, location information may be received as metadata accompanying audio data received from one or more other users sharing a communication channel, the location information indicating a current location of the user from whom the audio data was received (e.g., as determined by location tracking modules of the other users' headsets). In some embodiments, the location tracking circuit 310 uses the obtained location information of the other users to determine the relative location of each of the other users with respect to the current location of the user. In some embodiments, the location tracking circuit 310 may use the determined locations of other users to generate or update a model of the local region.

In other embodiments, the location tracking circuit 310 determines the location of the other user based on analyzing signals received from multiple antennas in an antenna array on the other user's headset. For example, in some embodiments, the transceiver 230 of the audio system 300 of the first headset includes an antenna array, each antenna located at a different location on the first headset. The location tracking circuit 310 of the first headset analyzes the signals received at each antenna of the array from the transceiver of the second headset and determines the relative position of the second headset based on the relative timing or phase of the received signals. In other embodiments, the transceiver 230 receives a plurality of different signals transmitted by a transceiver of the second headset, wherein the transceiver of the second headset includes an antenna array comprising a plurality of antennas located at different locations on the second headset. The location tracking circuit 310 analyzes the received signals (e.g., the timing or phase of the received signals) and may thereby determine the relative position of the second headset with respect to the first headset.
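
A minimal sketch of how a bearing could be estimated from the phase difference between two antennas, assuming a narrowband far-field signal and an antenna spacing of at most half a wavelength; the carrier frequency and spacing below are illustrative values, not values from this description:

```python
import numpy as np

def bearing_from_phase(phase_diff_rad, antenna_spacing_m, carrier_freq_hz):
    """Estimate the angle of arrival (relative to the array broadside) of a
    narrowband signal from the phase difference measured between two antennas."""
    wavelength = 3e8 / carrier_freq_hz
    sin_theta = phase_diff_rad * wavelength / (2.0 * np.pi * antenna_spacing_m)
    sin_theta = np.clip(sin_theta, -1.0, 1.0)  # guard against measurement noise
    return np.arcsin(sin_theta)

# Example: a 2.4 GHz signal and antennas 6 cm apart (illustrative values).
angle = bearing_from_phase(phase_diff_rad=0.9,
                           antenna_spacing_m=0.06,
                           carrier_freq_hz=2.4e9)
print(np.degrees(angle))  # ~17 degrees off broadside
```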

The gaze determination circuitry 315 is configured to determine a gaze direction of a user wearing the headset (e.g., based on eye tracking measurements of the eye tracking sensor 245, such as captured images of the user's eyes). As used herein, the direction of the user's gaze corresponds to the direction the user is looking in the local environment. In some embodiments, the user's gaze direction is determined based on a combination of the user's head orientation and the user's eye position. For example, gaze determination circuitry 315 may receive one or more eye tracking measurements (e.g., one or more images of the user's eyes captured by an eye tracking camera) from eye tracking sensor 245 to determine a current eye orientation of the user, receive a head orientation of the user (e.g., determined by location tracking circuitry 310), and modify the head orientation of the user with the determined eye orientation to determine a gaze direction of the user within the local environment. For example, the user's head may face in a first direction. However, if the user's eyes are oriented to look away from the first direction (e.g., not looking straight ahead), the user's gaze direction will be different than the user's head orientation.
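
One possible way to combine the two measurements, assuming the head pose is available as a quaternion and the eye tracker reports a gaze vector in head coordinates (both assumptions made for this sketch):

```python
import numpy as np
from scipy.spatial.transform import Rotation as R

def world_gaze_direction(head_quat_xyzw, eye_dir_in_head):
    """Rotate the eye-relative gaze vector by the head orientation to obtain
    the gaze direction in the local-area (world) frame."""
    head_rotation = R.from_quat(head_quat_xyzw)
    gaze_world = head_rotation.apply(eye_dir_in_head)
    return gaze_world / np.linalg.norm(gaze_world)

# Head turned 30 degrees about the vertical axis; eyes rotated 10 degrees the
# other way within the head frame (illustrative numbers, forward assumed -z).
head = R.from_euler("y", 30, degrees=True).as_quat()
eyes = R.from_euler("y", -10, degrees=True).apply([0.0, 0.0, -1.0])
print(world_gaze_direction(head, eyes))  # gaze differs from the head direction
```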

In some embodiments, the gaze determination circuitry 315 may also receive one or more images of a local region within the FOV of the camera from the camera component 235 and map the determined eye orientation to a location within the received image. The gaze determination circuitry may use object recognition to identify one or more objects (e.g., other users) within the one or more images that correspond to the mapped locations to determine whether the gaze direction of the user is aligned with the one or more identified objects. In some embodiments, the identified objects may be used to generate or update a model of the local region. For example, the location of recognized objects (e.g., other users) within one or more images may be used to determine whether the user is looking at any of the recognized objects, where the recognized objects are located relative to the user, whether the user has a line of sight to the recognized objects, and so forth.

Although fig. 3 shows location tracking circuit 310 and gaze determination circuit 315 as separate modules, in some embodiments, location tracking circuit 310 and gaze determination circuit 315 may be implemented as one single module. For example, a single gaze determination circuit may receive sensor measurements (e.g., location data from location sensor 240 and eye tracking data from eye tracking sensor 245) to determine a location of the user, an orientation of the user's head, and an orientation of the user's eyes relative to their head, from which a gaze direction of the user may be determined.

The signal manipulation circuit 320 is configured to receive one or more audio signals received via the transceiver 230, each audio signal corresponding to an audio system of another user sharing the communication channel (referred to as a "transmitting audio system"), and process the signals to generate audio data to be presented to the user based on the relative positions of the other audio systems with respect to the user.

The signal manipulation circuit 320 identifies the relative position of the sending user with respect to the current position of the user. The location information may be received from the location tracking circuit 310. In some embodiments, the signal manipulation circuit 320 accesses a model of the local area containing the location information for each user in the local area to determine the relative location of the sending user. Further, the signal manipulation circuit 320 may receive an indication of the user's current gaze direction from the gaze determination circuit 315. Based on the relative position of the sending user, the signal manipulation circuit 320 may spatialize the audio signal from the sending user so that when the sound is played to the user (e.g., via the speaker 220), it will appear to originate from the sending user's location.

In some embodiments, the signal manipulation circuit 320 spatializes the audio signal based on one or more generated acoustic transfer functions associated with the audio system. The acoustic transfer function may be a Head Related Transfer Function (HRTF) or other type of acoustic transfer function. The HRTF characterizes how the ear receives sound from a point in space. The HRTF for a particular source position of a person is unique to each ear of the person (and unique to the person) because human anatomy (e.g., ear shape, shoulders, etc.) can affect sound as it travels to the person's ears. For example, in some embodiments, signal manipulation circuit 320 may generate two sets of HRTFs for the user, one for each ear, corresponding to various frequencies and relative positions. One HRTF or a pair of HRTFs may be used to create audio content that includes sound that appears to originate from a particular point in space (e.g., from a location where the audio system is being transmitted). Several HRTFs may be used to create surround sound audio content (e.g., for home entertainment systems, theater speaker systems, immersive environments, etc.), where each HRTF or each pair of HRTFs corresponds to a different point in space, such that the audio content appears to come from several different points in space. A further example of generating HRTFs is described in U.S. patent application No. 16/015,879, entitled "Audio System for Dynamic Determination of qualified Acoustic Transfer Functions," which is hereby incorporated by reference in its entirety.
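
A simplified sketch of HRTF-based spatialization, assuming a set of measured head-related impulse responses (HRIRs) indexed by azimuth; the toy HRIR data below is a placeholder, not a real measurement:

```python
import numpy as np
from scipy.signal import fftconvolve

def spatialize_with_hrtf(mono_signal, source_azimuth_deg,
                         hrir_left, hrir_right, hrir_azimuths_deg):
    """Render a mono audio signal so it appears to originate from
    source_azimuth_deg by convolving it with the closest measured HRIR pair.

    hrir_left / hrir_right: (num_directions, ir_length) arrays of HRIRs
    hrir_azimuths_deg: (num_directions,) azimuths at which the HRIRs were measured
    """
    idx = np.argmin(np.abs(np.asarray(hrir_azimuths_deg) - source_azimuth_deg))
    left = fftconvolve(mono_signal, hrir_left[idx])
    right = fftconvolve(mono_signal, hrir_right[idx])
    return left, right

# Toy HRIR set: three directions, 64-tap impulse responses (placeholders).
azimuths = [-90.0, 0.0, 90.0]
hrirs_l = np.random.randn(3, 64) * 0.01
hrirs_r = np.random.randn(3, 64) * 0.01
voice = np.random.randn(4800)  # stand-in for a received audio signal
left_ear, right_ear = spatialize_with_hrtf(voice, 45.0, hrirs_l, hrirs_r, azimuths)
```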

In some embodiments, the signal manipulation circuit 320 may enhance the audio signal based on the position of the transmitting user relative to the direction of enhancement. As used herein, an enhanced direction of a user may refer to a direction in which the user is inferred to be paying attention. In some embodiments, the direction of augmentation of the user may correspond to a direction of gaze of the user. In other embodiments, the direction of enhancement may be based on the orientation of the user's head, the orientation of the user's head relative to their torso, and the like. For ease of discussion, the enhancement direction will be discussed primarily as corresponding to the gaze direction, but it should be understood that in other embodiments, the enhancement direction may correspond to other directions relative to the user.

As used herein, enhancing an audio signal may refer to positively enhancing the audio signal (e.g., increasing the amplitude of the audio signal relative to other sounds or audio signals) or negatively enhancing the audio signal (e.g., decreasing the amplitude of the audio signal relative to other sounds or audio signals). For example, in some embodiments, audio signals from a sending user that the user is looking at (e.g., suggesting that the user is focusing on the sending user) are positively enhanced, while audio signals from other sending users that the user is not looking at are negatively enhanced, as determined based on the user's gaze direction. This may allow a user to more easily focus on speech from a particular user (e.g., the sending user they are focusing on), while speech from other users is less distracting, especially if a large number of users are speaking simultaneously. In some embodiments, the signal manipulation circuit 320 enhances each received audio signal based on the "attention score" calculated for each transmitting user, which will be described in more detail below with reference to fig. 4.

Since the user's ears are located at fixed positions on the user's head, the signal manipulation circuit 320 may spatialize the received audio data based on the user's head orientation. On the other hand, the signal manipulation circuit 320 enhances the audio data based on the gaze direction of the user to better emphasize audio data originating from other users that the user is actually looking at or is paying attention to.

Although the above discussion primarily relates to enhancing audio data based on a user's gaze direction, in other embodiments, the enhancement of audio data may be based on other directions, such as a user's head direction, a user's head direction modified according to the angle of the user's head relative to their torso, or some combination thereof.

The signal manipulation circuit 320 also outputs the spatialized and enhanced audio signals to the speakers of the speaker assembly 330. For example, based on the spatialization and/or enhancement performed, the signal manipulation circuit 320 may output audio signals of different amplitudes to each speaker of the speaker assembly 330.

The audio filtering circuit 325 is configured to receive a user audio signal (e.g., captured by the microphone assembly 225) corresponding to the user's speech and perform filtering on the user audio signal. The user audio signals may be transmitted to other users in the shared communication channel. Further, in some embodiments, user audio signals may also be played back to the user through the speaker assembly 330.

In some embodiments, because users sharing a communication channel may be in close proximity to each other, the users may be able to hear the actual sounds of the sending user's voice, as well as be able to receive audio data corresponding to the sending user's voice through their headset. Because of the time required to process the received audio signal, audio data may be presented to the user (e.g., through the speaker assembly 330) after the transmitting user's voice may be heard at the user's location. The delay between the time that the actual voice of the sending user can be heard at the user's location and the time that the audio data of the sending user is played to the user through the speaker assembly 330 is referred to as the processing delay. If the processing delay exceeds a certain amount of time, the audio data presented to the first user may sound like an echo to the first user. This creates undesirable audio effects that may distract the user. For example, in some embodiments, echo effects are generated when the processing delay is greater than 10 to 15 milliseconds.

In some embodiments, the audio filtering circuit 325 includes an all-pass filter that manipulates the phase of the user audio signal to produce a temporally dispersed user audio signal (hereinafter referred to as a "diffuse user audio signal"). The diffuse user audio signal may comprise a plurality of diffuse reflections of the user audio signal having the same total energy as the original unfiltered signal. For sounds corresponding to speech, diffusing the user audio signal makes it less likely to be detected by the human ear as a separate auditory event than the unfiltered signal would be. This allows the user audio signal to undergo a longer processing delay before it will be detected as a separate echo by other users receiving the user audio signal. An example of diffusing the user audio signal is described in more detail below with reference to fig. 5. Although the present discussion relates to the audio filtering circuit 325 performing temporal dispersion on the user audio signal in preparation for transmission of the user audio signal to other users within the shared communication channel, in some embodiments, the audio filtering circuit 325 performs temporal dispersion on audio signals received from the audio systems of other users, rather than on the user audio signal, prior to playback to the user. In some embodiments, other filtering techniques may be used. For example, in some embodiments, the audio filtering circuit 325 may modify the frequency magnitude spectrum of the user audio signal instead of, or in addition to, temporally diffusing it.
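
As one possible realization of such temporal dispersion (a sketch, not the filter design described here), a cascade of Schroeder all-pass stages preserves the signal's total energy while smearing it across many reflections; the stage delays and gains below are illustrative:

```python
import numpy as np
from scipy.signal import lfilter

def schroeder_allpass(x, delay_samples, gain):
    """One Schroeder all-pass stage: flat magnitude response (energy preserved)
    while the signal is dispersed in time."""
    b = np.zeros(delay_samples + 1)
    a = np.zeros(delay_samples + 1)
    b[0], b[-1] = -gain, 1.0
    a[0], a[-1] = 1.0, -gain
    return lfilter(b, a, x)

def diffuse(user_audio, sample_rate=48_000):
    """Cascade a few all-pass stages to temporally disperse the user audio
    signal before transmission (stage delays and gains are illustrative)."""
    out = user_audio
    for delay_ms, gain in [(3.1, 0.65), (5.9, 0.6), (8.3, 0.55)]:
        out = schroeder_allpass(out, int(delay_ms * 1e-3 * sample_rate), gain)
    return out

diffused = diffuse(np.random.randn(48_000))  # stand-in for captured speech
```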

In some embodiments, the audio filtering circuit 325 also filters the user audio signal to generate a modified user audio signal to be played back to the user of the audio system 300. When a user speaks in a noisy environment and/or speaks into a microphone, the user may often be unaware of the volume of their own voice because the sound of their own voice is buried in the ambient noise. As a result, users may inadvertently raise their speaking volume beyond what is needed.

To prevent a user from speaking more loudly than necessary to overcome ambient noise, a version of the user audio signal may be played back to the user so that the user can more accurately assess the volume of their own speech. Because a person hears their own voice differently from how their voice is captured by a microphone (e.g., because vibrations in their skull caused by the vocal cords reach their ears in addition to the sound waves traveling through the air), the user audio signal can be modified so that the user recognizes the sound of the user audio signal as their own voice. In some embodiments, the user audio signal is passed through one or more filters that approximate the effect of skull vibration on the user's voice as perceived by the user. In some embodiments, the one or more filters are configured to be generally applicable to most people (e.g., based on average skull shape and size). In other embodiments, the one or more filters may be customized based on one or more user settings. For example, a user of the headset 110 may configure one or more settings of the filter during setup to more closely approximate how they hear their own voice. In some embodiments, the filter may comprise a low-pass filter, wherein the user can adjust the slope and cutoff frequency of the filter. In some embodiments, the filter may include a series of one or more tunable biquad filters, FIR filters, or some combination thereof.
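
As a very rough stand-in for such a filter (the cutoff, order, and the use of a simple Butterworth low-pass are assumptions, not the described implementation), the playback path might apply something like:

```python
import numpy as np
from scipy.signal import butter, lfilter

def self_voice_filter(user_audio, sample_rate=48_000, cutoff_hz=1500.0, order=2):
    """Approximate how the user hears their own voice by emphasizing low
    frequencies (bone-conducted sound reaches the ear mostly at low
    frequencies); the cutoff and order could be exposed as user settings."""
    b, a = butter(order, cutoff_hz, btype="low", fs=sample_rate)
    return lfilter(b, a, user_audio)

sidetone = self_voice_filter(np.random.randn(48_000))  # stand-in for captured speech
```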

In this way, by feeding back a filtered version of the user audio signal to the user, the user can more accurately assess the volume of their speech even in noisy environments and can avoid raising their voice unnecessarily. In some embodiments, the filtered version of the user audio signal is played back to the user only when the amount of noise in the environment (e.g., measured using the microphone assembly) exceeds a threshold, so that the filtered user audio signal is not played back when the environment is quiet enough that the user can be expected to hear their own voice.

Location-based audio spatialization and enhancement

Fig. 4 illustrates an example of an environment with multiple users utilizing a shared communication channel in accordance with one or more embodiments. The environment contains a plurality of users, including a first user 405A and additional users 405B, 405C, and 405D (collectively users 405), which are part of a shared communication channel. Each user 405 wears a respective head mounted device 410, the head mounted device 410 containing an audio system that the user uses to communicate with other users sharing a communication channel. For ease of explanation, only the head mounted device 410 worn by the first user 405A is labeled in fig. 4.

The head mounted device 410 of the first user 405A includes a location sensor and an eye tracking sensor configured to determine a location and a gaze direction of the first user 405A, which may be used to determine the enhancement direction 415 of the first user 405A. Although fig. 4 shows the enhancement direction 415 of the first user 405A aligned with the orientation of the head mounted device 410 and the head of the user 405A, the enhancement direction 415 need not be aligned with the orientation of the head of the user 405A. For example, in some embodiments, the enhancement direction 415 may correspond to a gaze direction of the user 405A. As such, as the user 405A moves their eyes, the enhancement direction 415 may change even though the position of the user 405A and the orientation of the user 405A's head remain stationary. In other embodiments, the enhancement direction of the user 405A may correspond to a head direction of the user (e.g., based on the orientation of the user's head), a head direction of the user modified according to the angle between the orientation of the user's head and torso (e.g., the enhancement direction 415 deviates from the user's head direction as the angle between the user's head and torso increases), or some combination thereof.

Each of the other users 405B, 405C, and 405D within the environment may be transmitting users. In response to the voice of each of the users 405B, 405C, or 405D, audio data is recorded (e.g., by their respective headsets) and transmitted to the headset 410 of the first user 405A (and other users participating in the channel). The signal manipulation circuit 320 of the head mounted device 410 analyzes the relative position of each other user to determine how each user's audio signal should be manipulated.

In some embodiments, the audio system of the headset 410 of the first user 405A determines location information corresponding to each transmitting user transmitting audio signals to the user 405A, and determines, for each transmitting user, a relative location of the transmitting user with respect to the head orientation of the first user 405A, and a deviation between the location of the transmitting user with respect to the direction of enhancement 415 of the first user 405A.

The audio system uses the relative position of the transmitting user with respect to the head orientation of the first user 405A to spatialize the audio signal received from the transmitting user. Using the determined relative position and the determined current head orientation of the user 405A, the audio system spatializes the audio signal such that when projected to the user 405A via the speaker assembly of the headset 410, the sound of the audio signal appears to originate from the corresponding transmitting user's location. In some embodiments, the audio system spatializes the audio signal by setting one or more weights corresponding to each speaker of the speaker assembly. In some embodiments, the audio system uses HRTFs to spatialize the audio signal. By adjusting the amplitude of the audio signal projected to the user 405A through each speaker of the speaker assembly, the generated sound may be made to appear to originate from different locations (e.g., corresponding to the location of the transmitting user).

For example, as shown in FIG. 4, user 405B is located directly in front of the user 405A. In this way, the audio signal from the user 405B is spatialized such that the generated sound is perceived by the user 405A as originating from in front of the user 405A. On the other hand, user 405C and user 405D are located to the left and right of user 405A, respectively. In this way, the audio system spatializes the respective audio signals such that the audio corresponding to users 405C and 405D appears to originate from the respective locations of users 405C and 405D.

In some embodiments, no spatialization is performed on an audio signal that the user 405A receives from a transmitting user who is a remote user. In other embodiments, spatialization may be performed on audio signals received from particular types of remote users (e.g., a remote user associated with a location within a threshold distance from the user 405A).

Further, in some embodiments, no spatialization is performed if there is no line of sight between the user 405A and the sending user. For example, in some embodiments, the audio system may be aware of particular types of objects within the local area, such as walls (e.g., detected using the camera component 235 or another type of sensor). If a vector 425 between the user 405A and the sending user intersects such an object, indicating a lack of line of sight between the user 405A and the sending user, the audio signal from the sending user may not be spatialized. In some embodiments, the audio signal from a transmitting user without line of sight may be spatialized if the distance between the user 405A and the transmitting user is less than a threshold amount, but not spatialized if the distance is greater than the threshold amount. The threshold amount may be predetermined or may be dynamically determined based on one or more user inputs, one or more determined attributes of the local area (e.g., the size of the room), or some combination thereof.
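
A minimal sketch of such a line-of-sight test, assuming obstacles such as walls are approximated by axis-aligned boxes in the local-area model (an assumption for the example):

```python
import numpy as np

def has_line_of_sight(user_pos, sender_pos, obstacle_boxes):
    """Return True if the straight segment between the two users does not pass
    through any obstacle, where each obstacle is an axis-aligned box given as
    (min_corner, max_corner). Uses the standard slab intersection test."""
    origin = np.asarray(user_pos, dtype=float)
    direction = np.asarray(sender_pos, dtype=float) - origin
    for box_min, box_max in obstacle_boxes:
        t_min, t_max = 0.0, 1.0
        blocked = True
        for axis in range(3):
            if abs(direction[axis]) < 1e-9:
                if not (box_min[axis] <= origin[axis] <= box_max[axis]):
                    blocked = False
                    break
                continue
            t1 = (box_min[axis] - origin[axis]) / direction[axis]
            t2 = (box_max[axis] - origin[axis]) / direction[axis]
            t_min = max(t_min, min(t1, t2))
            t_max = min(t_max, max(t1, t2))
            if t_min > t_max:
                blocked = False
                break
        if blocked:
            return False
    return True

# A wall between the users (illustrative coordinates, in meters).
wall = (np.array([1.0, -1.0, 0.0]), np.array([1.1, 1.0, 3.0]))
print(has_line_of_sight([0.0, 0.0, 1.6], [2.0, 0.0, 1.6], [wall]))  # False
```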

Further, the audio system enhances each received audio signal based on the deviation of each respective transmitting user's location from the direction of enhancement 415 of user 405A. As used herein, the deviation of the location of the sending user (e.g., user 405C) from the direction of augmentation of user 405A may be determined based on an angle measured between the direction of augmentation 415 of user 405A and a vector 425 connecting users 405A and 405C. In some embodiments, the audio system may also enhance each received audio signal based on the distance of each respective transmitting user's location relative to user 405A (e.g., audio signals from transmitting users closer to user 405A are enhanced more than audio signals from transmitting users further away).

In the case where multiple audio signals from multiple other users are received and projected to the user 405A, even if the audio signals are spatialized, it may be difficult for the user 405A to focus on the voice of any one user. By selectively enhancing the received audio signal, the user 405A may more easily focus on speech from other users that they are focusing on, while being less distracted by speech from users that they are not.

In some embodiments, it may be inferred which sending user the user 405A is focusing on based on the direction of enhancement 415 of the user 405A. For example, if the direction of augmentation 415 of the user 405A is aligned with the position of another user, the user 405A may be inferred as being focused on the user. For example, as shown in FIG. 4, user 405A may be inferred as being focused on user 405B. In some embodiments, user 405A may be inferred to be focusing on another user if the location of the other user is within a threshold deviation 420 relative to the direction of enhancement 415. For example, as shown in fig. 4, user 405A may be inferred as not paying attention to users 405C and 405D because users 405C and 405D are more than a threshold deviation away from the direction of enhancement 415. In some embodiments, if there are multiple transmitting users within a threshold deviation 420 from the enhancement direction 415, user 405A may be considered to be focusing on the transmitting user at the location closest to user 405A, the transmitting user at the location with the least deviation from the enhancement direction 415, or some combination thereof.

In some embodiments, an "attention score" may be calculated for each other user in the shared communication channel. The attention score may be used as a metric indicating the degree to which the user may be inferred to be paying attention to another user, in order to determine how much to enhance the audio signal received from that user. The attention score of a particular user may be based on a deviation of the user's location from the enhancement direction 415 of the first user 405A, a distance of the user's location from the location of the first user 405A, or some combination thereof. In some embodiments, the attention score is determined as a combination (e.g., a weighted sum) of one or more factors.
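
A toy attention-score computation along these lines, where the weights, the distance normalization, and the gain values are all illustrative assumptions rather than values from this description:

```python
import numpy as np

def attention_score(listener_pos, enhancement_dir, sender_pos,
                    deviation_weight=0.7, distance_weight=0.3, max_distance=10.0):
    """Weighted sum (illustrative weights) of how closely the sender's position
    aligns with the listener's enhancement direction and how near the sender is;
    higher scores mean more enhancement."""
    to_sender = np.asarray(sender_pos, dtype=float) - np.asarray(listener_pos, dtype=float)
    distance = np.linalg.norm(to_sender)
    direction = np.asarray(enhancement_dir, dtype=float)
    cos_dev = np.dot(to_sender, direction) / (distance * np.linalg.norm(direction))
    deviation_deg = np.degrees(np.arccos(np.clip(cos_dev, -1.0, 1.0)))

    alignment = max(0.0, 1.0 - deviation_deg / 180.0)    # 1 when aligned, 0 when behind
    proximity = max(0.0, 1.0 - distance / max_distance)  # 1 when next to the listener
    return deviation_weight * alignment + distance_weight * proximity

def enhancement_gain(score, focus_threshold=0.6):
    """Positively enhance senders the listener appears to focus on and slightly
    attenuate (negatively enhance) the rest."""
    return 1.5 if score >= focus_threshold else 0.7

score = attention_score([0, 0, 0], [0, 0, -1], [0.3, 0.0, -2.0])
print(score, enhancement_gain(score))
```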

The audio system enhances the audio signal received from the sending user based on whether the user 405A is focusing on the sending user (e.g., based on the sending user's attention score). For example, if the user 405A is inferred to be focusing on the sending user, the audio system enhances the audio signal, whereas if the user 405A is inferred to not be focusing on the sending user, the audio system does not enhance the audio signal. Further, in some embodiments, if the user 405A is inferred to not be paying attention to the transmitting user, the audio signal may be negatively enhanced in order to minimize interference with audio signals originating from transmitting users that the user 405A is paying attention to. In some embodiments, the audio signal of the transmitting user may be enhanced based on whether there is another transmitting user that the user 405A is determined to be focusing on (e.g., the audio signal from the user 405C is negatively enhanced if the user 405A is inferred to be focusing on the user 405B, but not negatively enhanced if there is no user within the threshold deviation 420 of the enhancement direction 415).

In some embodiments, the audio system enhances the received audio signal based on the corresponding sending user's attention score. In some embodiments, the amount of enhancement may also be based on the attention scores of other users (e.g., the ranking of the sending user's score relative to other sending users). For example, in the example shown in fig. 4, the audio system of the headset 410 may determine the degree to which to enhance the audio signals from the transmitting users 405B and 405C by comparing the deviation of the location of each transmitting user relative to the direction of enhancement 415 of the user 405A, and enhance each audio signal based on the results of the comparison. For example, in some embodiments, the audio signal from the first transmitting user may be enhanced less if there is a second transmitting user with a higher attention score (e.g., due to a lower deviation from the user's gaze direction) than if there is no second transmitting user (e.g., no audio signal is currently being transmitted) or if the second transmitting user has a lower attention score than the first transmitting user.

Because the sending user's attention score is based on the direction of enhancement 415 of the user 405A, when the direction of enhancement 415 of the user 405A changes (e.g., due to movement of their head or eyes), the attention score of each sending user may be adjusted accordingly, resulting in a different amount of enhancement of their respective audio signals. In some embodiments, the attention score of each sending user is updated periodically. In some embodiments, the attention score of the sending user is updated if the audio system detects that the direction of enhancement 415 of the user 405A has changed by more than a threshold amount.

In embodiments where the enhancement direction 415 corresponds to the user's gaze direction, the enhancement direction 415 may change very quickly because the eyes of the user 405A can move very quickly. In some embodiments, in order to reduce the effect of random eye movements of the user 405A, the enhancement direction 415 is updated only if the gaze of the user 405A has not changed by more than a threshold amount for at least a threshold period of time.

In some embodiments, the attention score of the sending user may also be based on the enhancement direction of the sending user. For example, if the enhancement direction of the transmitting user is facing the user 405A, the audio signal corresponding to the transmitting user may be enhanced more strongly by the signal manipulation circuit 320 than if the gaze direction of the transmitting user were not facing the user 405A. For example, as shown in FIG. 4, even though users 405C and 405D both have similar magnitudes of deviation from the enhancement direction 415 of user 405A, the audio signal from user 405C may be enhanced more than the audio signal from user 405D. In some embodiments, the weight that the sending user's orientation or gaze direction carries in the sending user's attention score may vary based on the deviation of the sending user's location from the enhancement direction 415.

In some embodiments, where the shared communication channel has one or more remote users, the signal manipulation circuit 320 may enhance the audio signal from the remote users based on whether the user 405A is currently focusing on another user in the local area. In some embodiments, user 405A may indicate, via a user interface, one or more modifications for how to enhance the audio signal from a particular transmitting user.

By processing (e.g., spatializing and/or enhancing) the received audio signals based on the relative positions of the respective transmitting users, the signal manipulation circuit 320 thus makes it easier for the user 405A to hear and focus on audio from other users that the user is focusing on (e.g., by enhancing the audio signals from those users), while also allowing the user 405A to better perceive where the other users from whom audio signals are received are located.

Audio filtering for echo reduction

Fig. 5 shows a diagram of filtering a user audio signal in accordance with one or more embodiments. Fig. 5 shows a first graph 505 showing an audio signal measured at the ear canal opening of a first user. The audio system of the first user communicates with the audio system of a second user over a shared communication channel. The audio signals include a real audio signal 510 and a transmitted audio signal 515. The real audio signal 510 corresponds to a sound pressure wave originating from the second user and measured at the ear canal of the first user (i.e., the first user hears the speech of the second user directly). The transmitted audio signal 515 corresponds to an unfiltered audio signal corresponding to the voice of the second user that was recorded (e.g., as the second user's user audio signal), transmitted to the first user's audio system, and played back to the first user through one or more speakers. Due to processing delays associated with recording, transmitting, processing, and playing back the transmitted audio signal, the transmitted audio signal 515 may be detected (i.e., audible to the user) at the ear canal later than the real audio signal 510 by an amount of time corresponding to the processing delay ΔT. If the processing delay ΔT exceeds a certain amount of time (e.g., 10-15 milliseconds), the first user may hear the transmitted audio signal 515 as an auditory event separate from the real audio signal 510, which may create an echo effect that distracts the first user.

The second graph 520 shows audio measured at the location of the first user when the transmitted audio is filtered using an all-pass filter to diffuse the audio signal. As shown in the second graph 520, the same real audio signal 510 is heard at the location of the first user. However, the transmitted audio signal has been filtered to produce a filtered transmitted audio signal 525 that includes a plurality of diffuse reflections. Even if the filtered transmitted audio signal 525 is not heard until ΔT after the real audio signal 510, the diffusion of the transmitted audio signal 525 may cause the first user to interpret the real audio signal 510 and the filtered transmitted audio signal 525 as part of the same auditory event, thereby reducing or eliminating undesirable echo effects. In this way, by filtering the audio signal, longer processing delays can be accommodated without producing undesirable echo effects for the user. In some embodiments, the audio signal is filtered at the transmitting user's headset before being transmitted to other users in the shared communication channel. In other embodiments, the audio signal is filtered at the headset of the user receiving the audio signal. In some embodiments where filtering is performed on the receiver side, the audio system of the receiving headset may determine the delay between the real audio and the transmitted audio and adjust one or more filtering parameters (e.g., an amount of dispersion) based on the determined delay.

In some cases, the first user and the second user may be at a distance from each other such that the transmitted audio 525 is heard at the first user's location before the real audio 510. In some embodiments, the audio system does not perform diffusion filtering on the transmitted audio if the sending user is determined to be at least a threshold distance from the user.

In embodiments where the shared communication channel includes at least one remote user, the audio signals transmitted between the remote and non-remote users need not undergo filtering, since the remote user does not hear the true audio of the non-remote user (and vice versa), so there is no echo effect caused by processing delays. Further, in some embodiments, if it is determined that the distance between the second user and the first user is at least a threshold amount or that a particular structure (e.g., a wall) exists between the first and second users such that the first user may be inferred as not hearing the second user's true audio, the audio from the second user may not be filtered.

Channel priority

In some embodiments, different users on the shared communication channel may be given different priorities. As used herein, the priority of a user in the shared communication channel indicates the level at which audio signals corresponding to the user's voice are enhanced relative to audio signals corresponding to other users, where audio signals from users having a higher priority are enhanced relative to audio signals from users having a lower priority. In some embodiments, the shared communication channel may include a first group of users corresponding to a base priority and at least one user (e.g., a designated speaker or leader) associated with a high priority that takes precedence over the base priority.

For example, while a user associated with the high priority (hereinafter referred to as a "priority user") is not speaking, the audio signals received by the first user corresponding to the base-priority users of the shared communication channel may be processed normally (e.g., spatialized and enhanced based on the relative locations of the users), as described above. However, when the priority user speaks, the audio signal received by the first user corresponding to the priority user is enhanced regardless of the relative positions of the first user and the priority user. Further, while the audio signal from the priority user is played to the first user, the audio signals from the base-priority users may be attenuated to ensure that the first user can clearly hear the voice of the priority user.

In some embodiments, users sharing a communication channel may be organized into more than two different priorities. The audio signal from a user with a higher priority is enhanced relative to the audio signal from a user with a lower priority, allowing the user to hear the voice of the higher priority user more clearly when the higher priority user speaks. In some embodiments, each user sharing a communication channel may assign a personalized priority to other users of the channel based on which other users they are most interested in.
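
As a small illustration (the specific gain values and the convention that larger numbers mean higher priority are assumptions for the sketch), priority-based enhancement might adjust per-sender gains like this:

```python
def priority_adjusted_gain(base_gain, sender_priority, highest_active_priority, base_priority=0):
    """Adjust the enhancement gain for one sender given channel priorities.
    Higher numbers mean higher priority; gain factors are illustrative."""
    if sender_priority < highest_active_priority:
        return base_gain * 0.4      # duck lower-priority voices while a higher-priority user speaks
    if sender_priority > base_priority:
        return max(base_gain, 1.5)  # a designated speaker/leader is boosted regardless of position
    return base_gain

# Example: a leader (priority 1) and a base-priority sender speaking at once.
print(priority_adjusted_gain(1.0, sender_priority=1, highest_active_priority=1))  # 1.5
print(priority_adjusted_gain(1.2, sender_priority=0, highest_active_priority=1))  # 0.48
```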

Process flow

Fig. 6 is a flow diagram of a process for spatializing and enhancing audio data received from other users in a shared communication channel, according to one or more embodiments. The process may be performed by a headset that includes an audio system (e.g., audio system 300). The first user's headset is engaged in a shared communication channel (e.g., headset 110 of user 105 shown in fig. 1, where user 105 is part of shared communication channel 120A). In other embodiments, other entities may perform some or all of the steps of the process (e.g., the console). Likewise, embodiments may include different and/or additional steps, or perform the steps in a different order.

The first user's headset determines 605 an enhancement direction for the first user. In some embodiments, the enhancement direction corresponds to the user's gaze direction, and the head-mounted device comprises an eye tracking sensor and a position sensor for determining the user's gaze direction. For example, the position sensor may determine the position and orientation of the head-mounted device, from which the position and orientation of the first user's head may be inferred. Further, the eye tracking sensor may be used to determine the orientation of the first user's eyes relative to their head. In this way, a combination of the position sensor and the eye tracking sensor may be used to determine the direction of the first user's gaze.

A headset receives 610 audio signals from one or more transmitting users sharing a communication channel (e.g., via a transceiver). The audio signal may correspond to the voice of the sending user and may include further metadata, such as the identity of the sending user and data from which the location of the sending user may be determined.

The head mounted device determines 615 a location associated with each sending user from which the audio signal was received. In some embodiments, the headset receives metadata associated with the audio signal indicative of the location of the transmitting user (e.g., determined by a location sensor on the transmitting user's headset). In other embodiments, the headset receives multiple signals transmitted by multiple antennas (e.g., antenna arrays) located at different locations on the transmitting user's headset. Based on the phase or timing of the received signals, the head-mounted device may determine the relative position of the transmitting user with respect to the first user.

The head mounted device determines 620 the relative position of each transmitting user with respect to the first user. The relative position of the sending user may indicate where the sending user is located relative to the first user based on the head orientation of the first user (e.g., in front of the first user, to the left of the first user, etc.).

The head mounted device determines 625 a deviation between the location of each sending user and the enhancement direction of the first user. The deviation indicates the position of the sending user relative to the direction of enhancement of the first user. In some embodiments, additionally, the headset controller may determine a distance between the sending user and the first user.

The headset spatializes 630 the audio signals of each transmitting user based on the location of the corresponding transmitting user relative to the first user such that the audio signals played to the first user through the two or more speakers sound as if they originated from a particular location (e.g., the location of the transmitting user). In some embodiments, spatializing the audio signal includes configuring the amplitude of the audio signal played through each speaker such that a user can interpret different amplitudes of sound through different speakers as corresponding to sound originating from a particular location.
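
A simpler amplitude-panning sketch of this idea for a two-speaker assembly, using a constant-power pan law (an assumed design choice, distinct from the HRTF approach described earlier):

```python
import numpy as np

def pan_stereo(mono_signal, source_azimuth_deg):
    """Constant-power amplitude panning: split a mono signal across a left and
    a right speaker so the sound appears to come from the given azimuth
    (-90 = fully left, +90 = fully right)."""
    azimuth = np.clip(source_azimuth_deg, -90.0, 90.0)
    pan = np.radians(azimuth + 90.0) / 2.0   # 0 .. pi/2
    left = np.cos(pan) * mono_signal
    right = np.sin(pan) * mono_signal
    return left, right

# A sender located 30 degrees to the right is rendered louder on the right speaker.
left, right = pan_stereo(np.random.randn(4800), source_azimuth_deg=30.0)
```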

The headset enhances 635 the audio signal of each transmitting user based on the deviation of the position of the respective transmitting user with respect to the enhancement direction of the first user. In some embodiments, the audio signal of a transmitting user is enhanced if the position of the transmitting user does not deviate from the enhancement direction by more than a threshold amount. In some embodiments, the magnitude of the enhancement may be inversely proportional to the amount of deviation between the location of the transmitting user and the enhancement direction of the first user. In this way, the audio signal from a transmitting user will be enhanced more strongly when the transmitting user is located closer to the enhancement direction of the first user than when the transmitting user is located farther away from the enhancement direction. In some embodiments, the amount of enhancement may also be based on the number of audio signals currently received from other transmitting users, the distance between the transmitting user and the first user, and so forth. In some embodiments, the enhancement of the audio signal may include an attenuation (e.g., negative enhancement) of the audio signal.

In this way, by spatializing and enhancing the audio signals received from other users in the shared communication channel, the user of the headset may more easily focus on the voices of the other users they are focusing on, as well as allowing the user to perceive the location of each other user from whom the audio signals are received. This allows users to hear more clearly the speech they wish to focus on even in a noisy environment, while maintaining the perception of other users in the environment.

FIG. 7 is a flow diagram of a process for processing an audio signal corresponding to a user's speaking voice in accordance with one or more embodiments. The process may be performed by a headset that includes an audio system (e.g., audio system 300). The first user's headset is engaged in a shared communication channel (e.g., headset 110 of user 105 shown in fig. 1, where user 105 is part of shared communication channel 120A). In other embodiments, other entities may perform some or all of the steps of the process (e.g., the console). Likewise, embodiments may include different and/or additional steps, or perform the steps in a different order.

The head mounted device receives 705 a user audio signal corresponding to the speech of the user of the head mounted device. In some embodiments, the user audio signal is recorded by an acoustic sensor (e.g., acoustic sensor 225) located near the user's mouth. In some embodiments, the user audio signals are generated by a microphone array that uses beamforming to isolate and capture sound from a particular area of the local area (e.g., near the user's mouth).

The head-mounted device applies 710 one or more filters (e.g., all-pass filters) to the user audio signal that temporally disperse the user audio signal to produce a diffuse user audio signal.

The headset transmits 715 the diffuse user audio signal to the other users' headsets in the shared communication channel. By temporally dispersing the user audio signal, the amount of processing delay between the time when another user hears the user's real voice and the time when the transmitted user audio signal is played to them through one or more speakers may be increased without causing the other user to hear the user audio signal as a separate auditory event, which would create an undesirable echo effect.

In some embodiments, rather than temporally dispersing the user audio signals and transmitting the dispersed user audio signals to the other user's headset, the temporal dispersion of the audio signals is performed by the headset receiving the audio signals. In some embodiments, the user audio signals are dispersed in time based on one or more filtering parameters that may be adjusted based on the relative position or distance between the transmitting user and the receiving user's headset.

The head-mounted device applies 720 a speech filter to the user audio signal to produce an altered version of the user audio signal. The speech filter is configured to simulate the effect by which, when a person speaks, vocal cord vibrations transmitted through the skull affect how the person hears their own voice. In some embodiments, a user may manually configure one or more parameters of the speech filter so that the altered user audio signal more closely matches how they hear their own speech.

The headset plays back 725 the altered user audio signal to the user (e.g., through one or more speakers), allowing the user to better perceive the current volume of their speech so that they can better adjust their speaking volume.

Examples of Artificial reality systems

Fig. 8 is a system environment including a headset as the audio system described above, in accordance with one or more embodiments. The system 800 may operate in an artificial reality environment (e.g., virtual reality, augmented reality, mixed reality environment, or some combination thereof). The system 800 shown in fig. 8 includes a headset 805 and an input/output (I/O) interface 815 coupled to a console 810. The headset 805 may be an embodiment of the headset 110. Although fig. 8 illustrates an example system 800 including one headset 805 and one I/O interface 815, in other embodiments any number of these components may be included in the system 800. For example, there may be multiple headsets 805, each headset 805 having an associated I/O interface 815, each headset 805 and I/O interface 815 communicating with the console 810. In alternative configurations, different and/or additional components may be included in system 800. Further, in some embodiments, the functionality described in connection with one or more of the components shown in fig. 8 may be distributed between the components in a different manner than that described in connection with fig. 8. For example, some or all of the functionality of the console 810 is provided by the headset 805.

The head mounted device 805 presents content to the user that includes a view of a physical, real-world environment enhanced with computer-generated elements (e.g., two-dimensional (2D) or three-dimensional (3D) images, 2D or 3D video, sound, etc.). The head mounted device 805 may be an eyewear device or a head mounted display. In some embodiments, the presented content includes audio content (e.g., audio signals received from other users in the shared communication channel).

The head-mounted device 805 includes an audio system 820, a sensor system 825, an electronic display 830, and an optics block 835. The audio system 820 may correspond to the audio system 300 described in fig. 3 and may include the microphone assembly 225, the transceiver 230, the speaker assembly 330, and the controller 215. The audio system 820 is configured to communicate with the audio systems of other HMDs, capture audio signals corresponding to the voice of the user of the HMD 805, process received audio signals (e.g., audio signals from other HMDs), and play back the processed audio signals to the user.

The sensor system 825 includes one or more sensor modules, which may include a camera assembly 235, a position sensor 240, and an eye tracking sensor 245. The sensor modules may be used to generate information about a local area around the HMD 805, as well as to track the location of the HMD 805 and the gaze direction of the user of the HMD 805. In some embodiments, the sensors of the sensor system 825 may be used with the tracking module 855 to track the location of the HMD 805.

Electronic display 830 and optics block 835 are one embodiment of lens 210. Some embodiments of the headset 805 have different components than those described in connection with fig. 8. Furthermore, in other embodiments, the functionality provided by the various components described in connection with fig. 8 may be distributed differently between components of the headset 805 or captured in a separate component remote from the headset 805.

Electronic display 830 displays 2D or 3D images to the user based on data received from console 810. In various embodiments, electronic display 830 comprises a single electronic display or multiple electronic displays (e.g., one display for each eye of the user). Examples of electronic display 830 include: a Liquid Crystal Display (LCD), an Organic Light Emitting Diode (OLED) display, an active matrix organic light emitting diode display (AMOLED), a waveguide display, some other display, or some combination thereof.

In some embodiments, optics block 835 magnifies the image light received from electronic display 830, corrects optical errors associated with the image light, and presents the corrected image light to a user of the headset 805. In various embodiments, optics block 835 includes one or more optical elements. Example optical elements included in optics block 835 include: a waveguide, an aperture, a Fresnel lens, a convex lens, a concave lens, a filter, a reflective surface, or any other suitable optical element that affects image light. Further, optics block 835 may include a combination of different optical elements. In some embodiments, one or more optical elements in optics block 835 may have one or more coatings, such as a partially reflective or anti-reflective coating.

The magnification and focusing of image light by optics block 835 allows electronic display 830 to be physically smaller, lighter in weight, and to consume less power than larger displays. In addition, the magnification may increase the field of view of the content presented by electronic display 830. For example, the displayed content may be presented using nearly all of the user's field of view (e.g., about 110 degrees diagonal), and in some cases all of it. Further, in some embodiments, the amount of magnification may be adjusted by adding or removing optical elements.

In some embodiments, optics block 835 may be designed to correct one or more types of optical errors. Examples of optical errors include barrel or pincushion distortion, longitudinal chromatic aberration, or lateral chromatic aberration. Other types of optical errors may further include spherical aberration, chromatic aberration, errors due to lens field curvature, astigmatism, or any other type of optical error. In some embodiments, the content provided to electronic display 830 for display is pre-distorted, and optics block 835 corrects the distortion when it receives the image light generated from that content by electronic display 830.
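
As an illustration of pre-distortion, the sketch below applies a radial polynomial distortion model to normalized image coordinates before display, so that the opposite distortion introduced by the optics cancels it. The radial polynomial is a common choice in rendering pipelines; the disclosure does not name a specific distortion model, so the function and its coefficients are illustrative.

```python
import numpy as np

def predistort(uv, k1, k2):
    """Pre-distort normalized image coordinates (relative to the image center)
    with a radial polynomial, r' = r * (1 + k1*r^2 + k2*r^4), chosen to be the
    inverse of the distortion introduced by the optics.

    uv: array of shape (N, 2)
    """
    uv = np.asarray(uv, dtype=float)
    r2 = np.sum(uv ** 2, axis=1, keepdims=True)
    scale = 1.0 + k1 * r2 + k2 * r2 ** 2
    return uv * scale
```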

The I/O interface 815 is a device that allows a user to send action requests to, and receive responses from, the console 810. An action request is a request to perform a particular action. For example, an action request may be an instruction to begin or end the capture of image or video data, or an instruction to perform a particular action within an application. The I/O interface 815 may include one or more input devices. Example input devices include: a keyboard, a mouse, a hand controller, or any other suitable device for receiving action requests and communicating them to the console 810. An action request received by the I/O interface 815 is communicated to the console 810, which performs an action corresponding to the action request. In some embodiments, the I/O interface 815 includes one or more position sensors that capture calibration data indicating an estimated position of the I/O interface 815 relative to an initial position of the I/O interface 815. In some embodiments, the I/O interface 815 may provide haptic feedback to the user according to instructions received from the console 810. For example, haptic feedback is provided when an action request is received, or when the console 810 transmits instructions to the I/O interface 815 that cause the I/O interface 815 to generate haptic feedback when the console 810 performs an action. The I/O interface 815 may monitor one or more input responses from the user for use in determining a perceived source direction and/or a perceived source location of audio content.
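
The sketch below shows one plausible shape for this action-request exchange between the I/O interface and the console; the class names, fields, and dispatch scheme are hypothetical and not drawn from the disclosure.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict

@dataclass
class ActionRequest:
    """Hypothetical action request sent from the I/O interface to the console."""
    action: str                          # e.g., "start_capture" or "end_capture"
    params: Dict[str, Any] = field(default_factory=dict)

class Console:
    """Minimal dispatcher standing in for the console's action handling."""
    def __init__(self):
        self._handlers: Dict[str, Callable[[Dict[str, Any]], None]] = {}

    def register(self, action: str, handler: Callable[[Dict[str, Any]], None]) -> None:
        self._handlers[action] = handler

    def handle(self, request: ActionRequest) -> bool:
        """Perform the requested action; the boolean result could drive haptic
        feedback on the I/O interface acknowledging that the action was performed."""
        handler = self._handlers.get(request.action)
        if handler is None:
            return False
        handler(request.params)
        return True
```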

The console 810 provides content to the headset 805 for processing in accordance with information received from one or more of the headset 805 and the I/O interface 815. In the example shown in fig. 8, console 810 includes application storage 850, tracking module 855, and engine 845. Some embodiments of console 810 have different modules or components than those described in connection with fig. 8. Similarly, the functionality described further below may be distributed among components of console 810 in a manner different than that described in connection with fig. 8.

The application storage 850 stores one or more applications for execution by the console 810. An application is a set of instructions that, when executed by a processor, generates content for presentation to a user. Content generated by an application may be responsive to input received from the user via movement of the headset 805 or the I/O interface 815. Examples of applications include: a gaming application, a conferencing application, a video playback application, or other suitable applications. In some embodiments, the console 810 may function as an application server (e.g., application server 130), and the applications may include an application for maintaining a shared communication channel between a group of users (e.g., users of different HMDs 805).

The tracking module 855 calibrates the system environment 800 using one or more calibration parameters and may adjust the one or more calibration parameters to reduce errors in determining the position of the headset 805 or the I/O interface 815. The calibration performed by the tracking module 855 also accounts for information received from one or more sensor modules (e.g., position sensors) of the sensor system 825 in the headset 805 or from one or more sensors included in the I/O interface 815. Further, if tracking of the headset 805 is lost, the tracking module 855 may recalibrate some or all of the system environment 800.

The tracking module 855 uses information from one or more sensors (e.g., the position sensor 240, the camera assembly 235, or some combination thereof) to track the movement of the headset 805 or the I/O interface 815. For example, the tracking module 855 determines the location of a reference point of the headset 805 in a map of the local area based on information from the headset 805. The tracking module 855 may also determine the location of a reference point of the headset 805 or a reference point of the I/O interface 815 using data indicative of the location of the headset 805, or using data indicative of the location of the I/O interface 815 from one or more sensors included in the I/O interface 815, respectively. Further, in some embodiments, the tracking module 855 may use portions of the data indicative of the location of the headset 805 to predict a future location of the headset 805. The tracking module 855 provides the estimated or predicted future location of the headset 805 or the I/O interface 815 to the engine 845. In some embodiments, the tracking module 855 may provide tracking information to the audio system 820 for use in determining how to spatialize and/or enhance a received audio signal.
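
The disclosure does not specify how a future location is predicted; one simple possibility, sketched below, is constant-acceleration extrapolation from the tracking module's recent position, velocity, and acceleration estimates.

```python
import numpy as np

def predict_position(position, velocity, acceleration, dt):
    """Constant-acceleration extrapolation of a headset reference point dt seconds ahead.
    Purely illustrative; any state estimator (e.g., a Kalman filter) could be used instead."""
    position = np.asarray(position, dtype=float)
    velocity = np.asarray(velocity, dtype=float)
    acceleration = np.asarray(acceleration, dtype=float)
    return position + velocity * dt + 0.5 * acceleration * dt ** 2
```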

The engine 845 executes applications within the system environment 800 and receives position information, acceleration information, velocity information, predicted future positions, or some combination thereof, for the headset 805 from the tracking module 855. Based on the received information, the engine 845 determines the content provided to the headset 805 for presentation to the user. For example, if the received information indicates that the user has looked to the left, the engine 845 generates content for the headset 805 that reflects the user's movement in the virtual environment or in an environment that augments the local area with additional content. Further, the engine 845 performs actions within applications executing on the console 810 in response to action requests received from the I/O interface 815 and provides feedback to the user that the actions were performed. The feedback provided may be visual or auditory feedback via the headset 805 or haptic feedback via the I/O interface 815.

Additional configuration information

The foregoing description of the embodiments of the present disclosure has been presented for the purposes of illustration; it is not intended to be exhaustive or to limit the disclosure to the precise form disclosed. One skilled in the relevant art will appreciate that many modifications and variations are possible in light of the above disclosure.

Some portions of this specification describe embodiments of the disclosure in terms of algorithms and symbolic representations of operations on information. These algorithmic descriptions and representations are the means used by those skilled in the data processing arts to convey the substance of their work effectively to others skilled in the art. These operations, while described functionally, computationally, or logically, are understood to be implemented by computer programs or equivalent electrical circuits, microcode, or the like. Moreover, it has also proven convenient at times to refer to these arrangements of operations as modules, without loss of generality. The described operations and their associated modules may be implemented in software, firmware, hardware, or any combination thereof.

Any of the steps, operations, or processes described herein may be performed or implemented with one or more hardware or software modules, alone or in combination with other devices. In one embodiment, the software modules are implemented by a computer program product comprising a computer readable medium containing computer program code executable by a computer processor for performing any or all of the steps, operations, or processes described.

Embodiments of the present disclosure may also relate to apparatuses for performing the operations herein. This apparatus may be specially constructed for the required purposes, and/or it may comprise a general purpose computing device selectively activated or reconfigured by a computer program stored in the computer. Such a computer program may be stored in a non-transitory, tangible computer readable storage medium or any type of medium suitable for storing electronic instructions, which may be coupled to a computer system bus. Moreover, any computing system mentioned in this specification may include a single processor or may be an architecture that employs a multi-processor design to increase computing power.

Embodiments of the present disclosure may also relate to products produced by the computing processes described herein. Such an article of manufacture may comprise information resulting from a computing process, where the information is stored on a non-transitory, tangible computer readable storage medium and may comprise any embodiment of a computer program product or other combination of data described herein.

Finally, the language used in the specification has been principally selected for readability and instructional purposes, and may not have been selected to delineate or circumscribe the inventive subject matter. Therefore, it is intended that the scope of the disclosure be limited not by this detailed description, but rather by any claims that issue based on the application herein. Accordingly, the disclosure of the embodiments is intended to be illustrative, but not limiting, of the scope of the disclosure, which is set forth in the following claims.
