Determining acoustic parameters of a headset using a mapping server


Description: This technology, "Determining acoustic parameters of a headset using a mapping server," was designed and created on 2020-03-17 by Philip Robinson, Carl Schissler, Peter Henry Maresh, Andrew Lovitt, and Sebastia Vicenc Amengual. Its main content is as follows: Determination of a set of acoustic parameters of a headset is presented herein. The set of acoustic parameters may be determined based on a virtual model of physical locations stored in a mapping server. The virtual model describes a plurality of spaces and acoustic properties of those spaces, wherein a location in the virtual model corresponds to the physical location of the headset. The location of the headset in the virtual model is determined based on information, received from the headset, describing at least a portion of the local area. A set of acoustic parameters associated with the physical location of the headset is determined based in part on the determined location in the virtual model and any acoustic parameters associated with the determined location. The headset renders audio content using the set of acoustic parameters received from the mapping server.

1. A method, comprising:

determining, based on information describing at least a portion of a local area, a location of a headset within the local area in a virtual model, the virtual model describing a plurality of spaces and acoustic properties of the spaces, wherein the location in the virtual model corresponds to a physical location of the headset within the local area; and

determining a set of acoustic parameters associated with a physical location of the headset based in part on a determined location in the virtual model and any acoustic parameters associated with the determined location,

wherein the headset renders audio content using the set of acoustic parameters.

2. The method of claim 1, wherein any one or more of the following hold true:

a) the method further comprises the following steps:

receiving, from the headset, information describing at least a portion of the local area, the information including visual information about at least a portion of the local area; or

b) the plurality of spaces comprises: conference rooms, bathrooms, corridors, offices, bedrooms, restaurants, and living rooms; or

c) the audio content is presented as if it originated from an object within the local area; or

d) the set of acoustic parameters includes at least one of:

a reverberation time from a sound source to the headset for each of a plurality of frequency bands,

a reverberation level for each frequency band,

a direct-to-reverberant ratio for each frequency band,

a direction of direct sound from the sound source to the headset for each frequency band,

an amplitude of the direct sound for each frequency band,

a time of early reflections of sound from the sound source to the headset,

a magnitude of the early reflections for each frequency band,

a direction of the early reflections,

room mode frequencies, and

room mode locations.

3. The method of claim 1 or 2, further comprising:

receiving an audio stream from the headset;

determining at least one acoustic parameter based on the received audio stream; and

storing the at least one acoustic parameter to a storage location in the virtual model associated with a physical space in which the headset is located.

4. The method of claim 3, wherein the audio stream is provided from the headset in response to determining, at the headset, that a change in acoustic conditions of the local area over time is above a threshold change.

5. The method of any preceding claim, wherein any one or more of the following hold true:

a) the method further comprises the following steps:

receiving an audio stream from the headset; and

updating the set of acoustic parameters based on the received audio stream,

wherein the audio content presented by the headset is adjusted based in part on the updated set of acoustic parameters; or

b) the method further comprises the following steps:

obtaining one or more acoustic parameters;

comparing the one or more acoustic parameters to the set of acoustic parameters; and

updating the virtual model by replacing at least one acoustic parameter of the set with the one or more acoustic parameters based on the comparison; or

c) the method further comprises the following steps:

in response to a change in acoustic conditions of the local area being above a threshold change, transmitting the set of acoustic parameters to the headset to extrapolate to an adjusted set of acoustic parameters.

6. An apparatus, comprising:

a mapping module configured to determine, based on information describing at least a portion of a local area, a location of a headset within the local area in a virtual model, the virtual model describing a plurality of spaces and acoustic properties of the spaces, wherein the location in the virtual model corresponds to a physical location of the headset within the local area; and

an acoustic module configured to determine a set of acoustic parameters associated with a physical location of the headset based in part on a determined location in the virtual model and any acoustic parameters associated with the determined location,

wherein the headset renders audio content using the set of acoustic parameters.

7. The apparatus of claim 6, further comprising:

a communication module configured to receive information describing at least a portion of the local area from the headset, the information including visual information about at least a portion of the local area captured via one or more camera components of the headset.

8. The apparatus of claim 6 or 7, wherein the audio content is presented as if originating from a virtual object within the local area.

9. The apparatus of any one or more of claims 6 to 8, wherein the set of acoustic parameters comprises at least one of:

a reverberation time from a sound source to the headset for each of a plurality of frequency bands,

a reverberation level for each frequency band,

a direct-to-reverberant ratio for each frequency band,

a direction of direct sound from the sound source to the headset for each frequency band,

an amplitude of the direct sound for each frequency band,

a time of early reflections of sound from the sound source to the headset,

a magnitude of the early reflections for each frequency band,

a direction of the early reflections,

room mode frequencies, and

room mode locations.

10. The apparatus of any one or more of claims 6 to 9, wherein any one or more of the following hold true:

a) the apparatus further comprises:

a communication module configured to receive an audio stream from the headset, wherein

the acoustic module is further configured to determine at least one acoustic parameter based on the received audio stream, and the apparatus further comprises

a non-transitory computer-readable medium configured to store the at least one acoustic parameter to a storage location in the virtual model associated with a physical space in which the headset is located; or

b) the acoustic module is further configured to:

obtain one or more acoustic parameters; and

compare the one or more acoustic parameters to the set of acoustic parameters, and the apparatus further comprises

a non-transitory computer-readable storage medium configured to update the virtual model by replacing at least one acoustic parameter of the set with the one or more acoustic parameters based on the comparison.

11. The apparatus of any one or more of claims 6 to 10, further comprising:

a communication module configured to transmit the set of acoustic parameters to the headset to extrapolate to an adjusted set of acoustic parameters in response to a change in acoustic conditions of the local area being above a threshold change.

12. A non-transitory computer readable storage medium having instructions encoded thereon, which when executed by a processor, cause the processor to:

determining, based on information describing at least a portion of a local area, a location of a headset within the local area in a virtual model, the virtual model describing a plurality of spaces and acoustic properties of the spaces, wherein the location in the virtual model corresponds to a physical location of the headset within the local area; and

determining a set of acoustic parameters associated with a physical location of the headset based in part on a determined location in the virtual model and any acoustic parameters associated with the determined location,

wherein the headset renders audio content using the set of acoustic parameters.

13. The computer-readable medium of claim 12, wherein the instructions further cause the processor to:

receiving an audio stream from the headset;

determining at least one acoustic parameter based on the received audio stream; and

storing the at least one acoustic parameter to a storage location in the virtual model associated with a physical space in which the headset is located, the virtual model being stored in the non-transitory computer-readable storage medium.

14. The computer-readable medium of claim 12 or 13, wherein the instructions further cause the processor to:

obtaining one or more acoustic parameters;

comparing the one or more acoustic parameters to the set of acoustic parameters; and

updating the virtual model by replacing at least one acoustic parameter of the set with the one or more acoustic parameters based on the comparison.

Background

The present disclosure relates generally to audio rendering at a headset, and in particular to using a mapping server to determine acoustic parameters of the headset.

The sound perceived at the ears of two users may differ depending on the direction and location of the sound source relative to each user, as well as on the surroundings of the room in which the sound is perceived. Humans can determine the location of a sound source by comparing the sound perceived at each ear. In an artificial reality environment, simulating the propagation of sound from an object to a listener may use knowledge of room acoustic parameters, such as the reverberation time or the direction of incidence of the strongest early reflections. One technique for determining room acoustic parameters includes placing a microphone at a desired source location, playing a controlled test signal, and deconvolving the test signal with a signal recorded at the listener location. However, this technique typically requires a measurement laboratory or dedicated on-site equipment.
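For illustration only, and not as part of the disclosure, the measurement technique just described can be sketched as a regularized frequency-domain deconvolution; the function and variable names below are hypothetical.

```python
import numpy as np

def estimate_room_impulse_response(test_signal: np.ndarray,
                                   recorded_signal: np.ndarray,
                                   eps: float = 1e-8) -> np.ndarray:
    """Recover a room impulse response by deconvolving the played test signal
    from the signal recorded at the listener location (division in the
    frequency domain, regularized so weak frequencies do not blow up)."""
    n = len(test_signal) + len(recorded_signal) - 1
    test_spectrum = np.fft.rfft(test_signal, n)
    recorded_spectrum = np.fft.rfft(recorded_signal, n)
    rir_spectrum = (recorded_spectrum * np.conj(test_spectrum)
                    / (np.abs(test_spectrum) ** 2 + eps))
    return np.fft.irfft(rir_spectrum, n)
```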

In order to seamlessly place a virtual sound source in an environment, the sound signal arriving at each ear is determined based on the sound propagation path from the source through the environment to the listener (recipient). The various sound propagation paths may be represented by a set of frequency-dependent acoustic parameters used at the headset for presenting audio content to the recipient (the user of the headset). For a particular acoustic configuration of a local environment (room) with unique acoustic properties, the set of frequency-dependent acoustic parameters is typically unique. However, it is impractical to store and update at the headset a separate set of acoustic parameters for every possible acoustic configuration of the local environment. The various sound propagation paths between the source and the receiver in a room constitute the room impulse response, which depends on the specific positions of the source and the receiver. Storing measured or simulated room impulse responses for a dense network of all possible source and receiver locations in a space, or even for a relatively small subset of the most common arrangements, is memory intensive. Determining the room impulse response in real time is also computationally intensive, and increasingly so as the required accuracy grows.

SUMMARY

Embodiments of the present disclosure support methods, computer-readable media, and apparatuses for determining a set of acoustic parameters used to render audio content at a headset. In some embodiments, the set of acoustic parameters is determined based on a virtual model of physical locations stored at a mapping server connected with the headset via a network. The virtual model describes a plurality of spaces and acoustic properties of the spaces, wherein a location in the virtual model corresponds to the physical location of the headset. The mapping server determines the location of the headset in the virtual model based on information describing at least a portion of the local area received from the headset. The mapping server determines a set of acoustic parameters associated with the physical location of the headset based in part on the determined location in the virtual model and any acoustic parameters associated with the determined location. The headset presents audio content to the listener using the set of acoustic parameters received from the mapping server.

Embodiments in accordance with the invention are disclosed in particular in the accompanying claims directed to methods, apparatus and storage media, wherein any feature mentioned in one claim category (e.g., method) may also be claimed in another claim category (e.g., apparatus, storage media, systems and computer program products). The dependencies or back-references in the appended claims are chosen for formal reasons only. However, any subject matter resulting from an intentional back-reference (especially multiple references) to any preceding claim may also be claimed, such that any combination of a claim and its features is disclosed and may be claimed, irrespective of the dependencies chosen in the appended claims. The subject matter which can be claimed comprises not only the combination of features as set forth in the appended claims, but also any other combination of features in the claims, wherein each feature mentioned in the claims can be combined with any other feature or combination of other features in the claims. Furthermore, any of the embodiments and features described or depicted herein may be claimed in a separate claim and/or in any combination with any of the embodiments or features described or depicted herein or in any combination with any of the features of the appended claims.

In one embodiment, a method may comprise:

determining, based on information describing at least a portion of the local area, a location of the headset within the local area in a virtual model describing a plurality of spaces and acoustic properties of the spaces, wherein the location in the virtual model corresponds to a physical location of the headset within the local area; and

determining a set of acoustic parameters associated with a physical location of the headset based in part on the determined location in the virtual model and any acoustic parameters associated with the determined location,

wherein the headset renders the audio content using the set of acoustic parameters.

In one embodiment, a method may comprise:

receiving, from the headset, information describing at least a portion of the local area, the information including visual information about at least a portion of the local area.

The plurality of spaces may include: conference rooms, bathrooms, corridors, offices, bedrooms, restaurants, and living rooms.

The audio content may appear as if it originated from an object within the local area.

The set of acoustic parameters may include at least one of:

a reverberation time from a sound source to the headset for each of a plurality of frequency bands,

a reverberation level for each frequency band,

a direct-to-reverberant ratio for each frequency band,

a direction of direct sound from the sound source to the headset for each frequency band,

an amplitude of the direct sound for each frequency band,

a time of early reflections of sound from the sound source to the headset,

a magnitude of the early reflections for each frequency band,

a direction of the early reflections,

room mode frequencies, and

room mode locations.

In one embodiment, a method may comprise:

receiving an audio stream from the headset;

determining at least one acoustic parameter based on the received audio stream; and

storing the at least one acoustic parameter to a storage location in the virtual model associated with a physical space in which the headset is located.

The audio stream may be provided from the headset in response to determining, at the headset, that a change in acoustic conditions of the local area over time is above a threshold change.

In one embodiment, a method may comprise:

receiving an audio stream from the headset; and

updating the set of acoustic parameters based on the received audio stream,

wherein the audio content presented by the headset is adjusted based in part on the updated set of acoustic parameters.

In one embodiment, a method may comprise:

obtaining one or more acoustic parameters;

comparing the one or more acoustic parameters to the set of acoustic parameters; and

updating, based on the comparison, the virtual model by replacing at least one acoustic parameter of the set with the one or more acoustic parameters.

In one embodiment, a method may comprise:

in response to a change in the acoustic conditions of the local area being above a threshold change, transmitting the set of acoustic parameters to the headset to extrapolate to an adjusted set of acoustic parameters.

In one embodiment, an apparatus may comprise:

a mapping module configured to determine, based on information describing at least a portion of the local area, a location of the headset within the local area in a virtual model, the virtual model describing a plurality of spaces and acoustic properties of the spaces, wherein the location in the virtual model corresponds to a physical location of the headset within the local area; and

an acoustic module configured to determine a set of acoustic parameters associated with a physical location of the headset based in part on the determined location in the virtual model and any acoustic parameters associated with the determined location,

wherein the headset renders the audio content using the set of acoustic parameters.

In one embodiment, an apparatus may comprise:

a communication module configured to receive information describing at least a portion of the local area from the headset, the information including visual information about at least a portion of the local area captured via one or more camera components of the headset.

The audio content may appear to originate from a virtual object within the local area.

The set of acoustic parameters may include at least one of:

a reverberation time from a sound source to the headset for each of a plurality of frequency bands,

a reverberation level for each frequency band,

a direct-to-reverberant ratio for each frequency band,

a direction of direct sound from the sound source to the headset for each frequency band,

an amplitude of the direct sound for each frequency band,

a time of early reflections of sound from the sound source to the headset,

a magnitude of the early reflections for each frequency band,

a direction of the early reflections,

room mode frequencies, and

room mode locations.

In one embodiment, an apparatus may comprise:

a communication module configured to receive an audio stream from the headset, wherein

the acoustic module is further configured to determine at least one acoustic parameter based on the received audio stream, and the apparatus further comprises a non-transitory computer-readable medium configured to store the at least one acoustic parameter to a storage location in the virtual model associated with a physical space in which the headset is located.

The acoustic module may be configured to:

obtain one or more acoustic parameters; and

compare the one or more acoustic parameters to the set of acoustic parameters, and the apparatus further comprises

a non-transitory computer-readable storage medium configured to update the virtual model by replacing at least one acoustic parameter of the set with the one or more acoustic parameters based on the comparison.

In one embodiment, an apparatus may comprise:

a communication module configured to transmit the set of acoustic parameters to the headset to extrapolate to an adjusted set of acoustic parameters in response to a change in acoustic conditions of the local area being above a threshold change.

In one embodiment, a non-transitory computer-readable storage medium may have instructions encoded thereon that, when executed by a processor, cause the processor to perform a method according to any embodiment herein, or perform the following:

determining, based on information describing at least a portion of a local area, a location of a headset within the local area in a virtual model, the virtual model describing a plurality of spaces and acoustic properties of the spaces, wherein the location in the virtual model corresponds to a physical location of the headset within the local area; and

determining a set of acoustic parameters associated with a physical location of the headset based in part on the determined location in the virtual model and any acoustic parameters associated with the determined location,

wherein the headset renders the audio content using the set of acoustic parameters.

The instructions may cause the processor to:

receiving an audio stream from the headset;

determining at least one acoustic parameter based on the received audio stream; and

storing the at least one acoustic parameter to a storage location in the virtual model associated with a physical space in which the headset is located, the virtual model being stored in the non-transitory computer-readable storage medium.

The instructions may cause the processor to:

obtaining one or more acoustic parameters;

comparing the one or more acoustic parameters to the set of acoustic parameters; and

updating, based on the comparison, the virtual model by replacing at least one acoustic parameter of the set with the one or more acoustic parameters.

In an embodiment, one or more computer-readable non-transitory storage media may embody software that is operable when executed to perform a method according to or within any of the embodiments described above.

In an embodiment, a system may include: one or more processors; and at least one memory coupled to the processor and comprising instructions executable by the processor, the processor being operable when executing the instructions to perform a method according to or within any of the embodiments described above.

In an embodiment, a computer program product, preferably comprising a computer-readable non-transitory storage medium, is operable when executed on a data processing system to perform a method according to or within any of the embodiments described above.

Brief Description of Drawings

Fig. 1 is a block diagram of a system environment of a headset in accordance with one or more embodiments.

Fig. 2 illustrates the effect of surfaces in a room on sound propagation between a sound source and a user of a head mounted device, in accordance with one or more embodiments.

Fig. 3A is a block diagram of a mapping server in accordance with one or more embodiments.

Fig. 3B is a block diagram of an audio system of a headset according to one or more embodiments.

Fig. 3C is an example of a virtual model describing a physical space and acoustic properties of the physical space in accordance with one or more embodiments.

Fig. 4 is a perspective view of a headset including an audio system in accordance with one or more embodiments.

Fig. 5A is a flow diagram illustrating a process for determining acoustic parameters of a physical location of a headset in accordance with one or more embodiments.

Fig. 5B is a flow diagram illustrating a process for obtaining acoustic parameters from a mapping server in accordance with one or more embodiments.

Fig. 5C is a flow diagram illustrating a process for reconstructing a room impulse response at a headset in accordance with one or more embodiments.

Fig. 6 is a block diagram of a system environment including a headset and a mapping server in accordance with one or more embodiments.

The figures depict embodiments of the present disclosure for purposes of illustration only. One skilled in the art will readily recognize from the following description that alternative embodiments of the structures and methods illustrated herein may be employed without departing from the principles or advantages of the disclosure described herein.

Detailed Description

Embodiments of the present disclosure may include or be implemented in conjunction with an artificial reality system. Artificial reality is a form of reality that has been adjusted in some way before being presented to the user, and may include, for example, Virtual Reality (VR), Augmented Reality (AR), Mixed Reality (MR), hybrid reality, or some combination and/or derivative thereof. The artificial reality content may include fully generated content or generated content combined with captured (e.g., real-world) content. The artificial reality content may include video, audio, haptic feedback, or some combination thereof, any of which may be presented in a single channel or in multiple channels (e.g., stereoscopic video that produces a three-dimensional effect for the viewer). Further, in some embodiments, the artificial reality may also be associated with applications, products, accessories, services, or some combination thereof, that are used, for example, to create content in the artificial reality and/or are otherwise used in the artificial reality (e.g., to perform activities in the artificial reality). An artificial reality system that provides artificial reality content may be implemented on a variety of platforms, including a headset, a Head Mounted Display (HMD) connected to a host computer system, a stand-alone HMD, a near-eye display (NED), a mobile device or computing system, or any other hardware platform capable of providing artificial reality content to one or more viewers.

A communication system for acoustic matching of a room is presented. The communication system includes a headset with an audio system communicatively coupled to a mapping server. The audio system is implemented on the headset and may include speakers, an acoustic sensor array, a plurality of imaging sensors (cameras), and an audio controller. The imaging sensors determine visual information (e.g., depth information, color information, etc.) for at least a portion of the local area. The headset transmits the visual information to the mapping server (e.g., over a network). The mapping server maintains a virtual model of the world that includes the acoustic properties of spaces in the real world. The mapping server uses the visual information from the headset (e.g., an image of at least a portion of the local area) to determine a location in the virtual model that corresponds to the physical location of the headset. The mapping server determines a set of acoustic parameters (e.g., reverberation time, reverberation level, etc.) associated with the determined location and provides the acoustic parameters to the headset. The set of acoustic parameters is used by the headset (e.g., by the audio controller) to present audio content to a user of the headset. The acoustic sensor array mounted on the headset monitors sound in the local area. In response to determining that the room configuration has changed (e.g., a change in human occupancy level, a window being opened after having been closed, a curtain being opened after having been closed, etc.), the headset may selectively provide some or all of the monitored sound as an audio stream to the mapping server. The mapping server may update the virtual model by recalculating the acoustic parameters based on the audio stream received from the headset.
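As a rough, non-normative sketch of the exchange just described (the class, method, and field names are invented for illustration and do not come from the disclosure):

```python
from dataclasses import dataclass
from typing import Any, Optional, Tuple

@dataclass
class VisualInfo:
    depth_images: list                                   # depth image data from the cameras
    color_images: list                                   # color image data from the cameras
    gps_location: Optional[Tuple[float, float]] = None   # optional coarse location of the room

class MappingServerClient:
    """Headset-side helper: send room observations, receive acoustic parameters."""

    def __init__(self, transport: Any):
        self.transport = transport       # e.g., a network connection to the server

    def request_parameters(self, visual_info: VisualInfo) -> dict:
        # The server localizes the headset in its virtual model and returns the
        # parameter set stored (or computed) for that room configuration.
        return self.transport.send("get_acoustic_parameters", visual_info)

    def report_room_change(self, audio_stream: bytes) -> dict:
        # Sent only when the headset detects that acoustic conditions changed by
        # more than a threshold; the server recomputes parameters and replies.
        return self.transport.send("update_acoustic_parameters", audio_stream)
```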

In some embodiments, the headset obtains information about a set of acoustic parameters that parameterize the impulse response of the local area in which the headset is located. The headset may obtain a set of acoustic parameters from a mapping server. Alternatively, the set of acoustic parameters is stored in the headset. The headset may reconstruct an impulse response for a particular spatial arrangement of the headset and the sound source (e.g., virtual object) by extrapolating the set of acoustic parameters. The reconstructed impulse response may be represented by an adjusted set of acoustic parameters, wherein one or more acoustic parameters from the adjusted set are obtained by dynamically adjusting one or more corresponding acoustic parameters from the original set. The headset renders the audio content (e.g., by an audio controller) using the reconstructed impulse response (i.e., the adjusted set of acoustic parameters).

The headset may be, for example, a NED, an HMD, or some other type of headset. The headset may be part of an artificial reality system. The headset also includes a display and an optical assembly. The display of the headset is configured to emit image light. The optical assembly of the headset is configured to direct the image light to an eye box of the headset, i.e., a region whose position corresponds to the wearer's eye. In some embodiments, the image light may include depth information for a local area around the headset.

Fig. 1 is a block diagram of a system 100 for a headset 110 in accordance with one or more embodiments. The system 100 includes a head-mounted device 110 that can be worn by a user 106 in the room 102. The headset 110 is connected to a mapping server 130 through a network 120.

The network 120 connects the headset 110 to the mapping server 130. Network 120 may include any combination of local area networks and/or wide area networks using wireless and/or wired communication systems. For example, the network 120 may include the internet as well as a mobile telephone network. In one embodiment, network 120 uses standard communication technologies and/or protocols. Thus, network 120 may include links using technologies such as Ethernet, 802.11, Worldwide Interoperability for Microwave Access (WiMAX), 2G/3G/4G mobile communication protocols, Digital Subscriber Line (DSL), Asynchronous Transfer Mode (ATM), InfiniBand, PCI Express advanced switching, and so forth. Similarly, network protocols used on network 120 may include multiprotocol label switching (MPLS), transmission control protocol/internet protocol (TCP/IP), User Datagram Protocol (UDP), hypertext transfer protocol (HTTP), Simple Mail Transfer Protocol (SMTP), File Transfer Protocol (FTP), and so forth. Data exchanged over network 120 may be represented using techniques and/or formats including image data in binary form (e.g., Portable Network Graphics (PNG)), hypertext markup language (HTML), extensible markup language (XML), and so forth. Additionally, all or a portion of the link may be encrypted using conventional encryption techniques, such as Secure Sockets Layer (SSL), Transport Layer Security (TLS), Virtual Private Network (VPN), internet protocol security (IPsec), and so forth. The network 120 may also connect multiple headsets located in the same or different rooms to the same mapping server 130.

The headset 110 presents media to the user. In one embodiment, the headset 110 may be a NED. In another embodiment, the head mounted device 110 may be an HMD. In general, the headset 110 may be worn on the face of a user such that content (e.g., media content) is presented using one or both lenses of the headset. However, the headset 110 may also be used so that media content is presented to the user in a different manner. Examples of media content presented by the headset 110 include one or more images, video, audio, or some combination thereof.

The head mounted device 110 may determine visual information describing at least a portion of the room 102 and provide the visual information to the mapping server 130. For example, the headset 110 may include at least one depth camera component (DCA) that generates depth image data for at least a portion of the room 102. The headset 110 may further include at least one passive camera component (PCA) that generates color image data for at least a portion of the room 102. In some embodiments, the DCA and PCA of the headset 110 are part of simultaneous localization and mapping (SLAM) sensors installed on the headset 110 for determining visual information of the room 102. Thus, depth image data captured by at least one DCA and/or color image data captured by at least one PCA may be referred to as visual information determined by the SLAM sensor of the headset 110.

The headset 110 may transmit the visual information to the mapping server 130 via the network 120 for determining the set of acoustic parameters of the room 102. In another embodiment, in addition to the visual information used to determine the set of acoustic parameters, the headset 110 also provides its location information (e.g., the Global Positioning System (GPS) location of the room 102) to the mapping server 130. Alternatively, the headset 110 provides only the location information to the mapping server 130 for determining the set of acoustic parameters. The set of acoustic parameters may be used to represent various acoustic properties of a particular configuration of the room 102 that together define the acoustic condition in the room 102. A configuration of the room 102 is thus associated with a unique acoustic condition of the room 102. The configuration of the room 102 and the associated acoustic condition may change based on, for example, at least one of: a change in the location of the headset 110 in the room 102, a change in the location of a sound source in the room 102, a change in the human occupancy level of the room 102, a change in one or more acoustic materials of surfaces in the room 102, opening or closing a window of the room 102, opening or closing curtains, opening or closing a door of the room 102, and the like.

The set of acoustic parameters may include some or all of the following: a reverberation time from the sound source to the headset 110 for each of a plurality of frequency bands, a reverberation level for each frequency band, a direct-to-reverberant ratio for each frequency band, a direction of the direct sound from the sound source to the headset 110 for each frequency band, an amplitude of the direct sound for each frequency band, a time of early reflections of the sound from the sound source to the headset 110, a magnitude of the early reflections for each frequency band, a direction of the early reflections, room mode frequencies, room mode locations, and the like. In some embodiments, the frequency dependence of some of the aforementioned acoustic parameters may be clustered into four frequency bands. In some other embodiments, some acoustic parameters may be clustered into more or fewer than four frequency bands. The headset 110 presents audio content to the user 106 using the set of acoustic parameters obtained from the mapping server 130. The audio content is rendered as if it originated from an object (i.e., a real object or a virtual object) within the room 102.
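Purely as an illustration of how such a frequency-banded parameter set could be organized (the field names are hypothetical, and the four-band clustering simply follows the example above):

```python
from dataclasses import dataclass
from typing import Sequence, Tuple

# Example clustering of the audible range into four bands (Hz); the text notes
# that more or fewer bands may be used.
FREQUENCY_BANDS = ((20, 250), (250, 1000), (1000, 4000), (4000, 16000))

@dataclass
class AcousticParameterSet:
    reverberation_time_s: Sequence[float]        # one value per band
    reverberation_level_db: Sequence[float]      # one value per band
    direct_to_reverberant_db: Sequence[float]    # one value per band
    direct_sound_direction: Sequence[Tuple[float, float]]  # (azimuth, elevation) per band
    direct_sound_amplitude: Sequence[float]      # one value per band
    early_reflection_time_s: float
    early_reflection_amplitude: Sequence[float]  # one value per band
    early_reflection_direction: Tuple[float, float]
    room_mode_frequencies_hz: Sequence[float]
    room_mode_locations: Sequence[Tuple[float, float, float]]

    def __post_init__(self):
        # Sanity check that banded fields match the chosen band count.
        assert len(self.reverberation_time_s) == len(FREQUENCY_BANDS)
```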

The headset 110 may further include an acoustic sensor array for monitoring sound in the room 102. The headset 110 may generate an audio stream based on the monitored sound. In response to determining that the configuration of the room 102 has changed, and therefore that the acoustic condition of the room 102 has changed, the headset 110 may selectively provide the audio stream (e.g., via the network 120) to the mapping server 130 for updating one or more acoustic parameters of the room 102 at the mapping server 130. The headset 110 then presents audio content to the user 106 using the updated set of acoustic parameters obtained from the mapping server 130.
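A minimal sketch of the gating just described, assuming the headset keeps a running local estimate of some acoustic quantity (reverberation time is used here only as an example) and a `MappingServerClient` like the one sketched earlier:

```python
def maybe_report_room_change(client, monitored_audio: bytes,
                             previous_estimate_s: float,
                             current_estimate_s: float,
                             threshold_s: float = 0.2):
    """Upload the monitored sound only when the local estimate of an acoustic
    condition (here, reverberation time in seconds) drifts beyond a threshold;
    otherwise keep using the parameter set already on the headset."""
    if abs(current_estimate_s - previous_estimate_s) > threshold_s:
        return client.report_room_change(monitored_audio)   # updated parameters
    return None
```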

In some embodiments, the headset 110 obtains, from the mapping server 130 or from a non-transitory computer-readable storage device (i.e., memory) at the headset 110, the set of acoustic parameters that parameterizes an impulse response of the room 102. The headset 110 may selectively extrapolate the set of acoustic parameters into an adjusted set of acoustic parameters representing a reconstructed room impulse response for a particular configuration of the room 102 that differs from the configuration associated with the obtained set of acoustic parameters. The headset 110 uses the reconstructed room impulse response to present audio content to a user of the headset 110. Further, the headset 110 may include a position sensor or an Inertial Measurement Unit (IMU) that tracks the position (e.g., location and pose) of the headset 110 within the room. Additional details regarding the operation and components of the headset 110 are discussed below in conjunction with fig. 3B, 4, 5B-5C, and 6.
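The disclosure does not spell out the extrapolation itself; one hedged, simplified possibility is to rescale only the direct-sound terms from the new source-listener distance (an inverse-distance assumption) while reusing the late-reverberation terms, which are roughly position-independent within a room:

```python
SPEED_OF_SOUND_M_S = 343.0

def extrapolate_direct_sound(per_band_amplitudes, old_distance_m, new_distance_m):
    """Rescale per-band direct-sound amplitudes for a new source-listener
    distance using a simple 1/r spreading assumption."""
    gain = old_distance_m / new_distance_m
    return [a * gain for a in per_band_amplitudes]

def direct_path_delay_s(new_distance_m):
    """Propagation delay of the direct sound for the new arrangement."""
    return new_distance_m / SPEED_OF_SOUND_M_S
```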

The mapping server 130 facilitates the creation of audio content for the headset 110. The mapping server 130 includes a database that stores a virtual model describing a plurality of spaces and acoustic properties of the spaces, wherein one location in the virtual model corresponds to a current configuration of the room 102. The mapping server 130 receives, from the headset 110 via the network 120, visual information describing at least a portion of the room 102 and/or location information for the room 102. The mapping server 130 determines the location in the virtual model associated with the current configuration of the room 102 based on the received visual information and/or location information. The mapping server 130 determines (e.g., retrieves) a set of acoustic parameters associated with the current configuration of the room 102 based in part on the determined location in the virtual model and any acoustic parameters associated with the determined location. The mapping server 130 may provide information about the set of acoustic parameters to the headset 110 (e.g., via the network 120) for generating audio content at the headset 110. Alternatively, the mapping server 130 may generate an audio signal using the set of acoustic parameters and provide the audio signal to the headset 110 for rendering. In some embodiments, some components of the mapping server 130 may be integrated with another device (e.g., a console) connected to the headset 110 via a wired connection (not shown in fig. 1). Additional details regarding the operation and components of the mapping server 130 are discussed below in conjunction with fig. 3A, 3C, and 5A.

Fig. 2 illustrates the effect of surfaces in a room 200 on sound propagation between a sound source and a user of a headset, in accordance with one or more embodiments. The set of acoustic parameters (e.g., a parameterized room impulse response) represents how sound is transformed as it propagates through the room 200 from the sound source to the user (recipient), and may include the effects of the direct sound path and the reflected sound paths traversed by the sound. For example, the user 106 wearing the headset 110 is positioned in the room 200. The room 200 includes walls, such as walls 202 and 204, that provide surfaces for reflecting sound 208 emitted by an object 206 (e.g., a virtual sound source). When the object 206 emits the sound 208, the sound 208 travels along multiple paths to the headset 110. Some of the sound 208 travels along a direct sound path 210, without reflection, to an ear (e.g., the right ear) of the user 106. The direct sound path 210 may result in attenuation, filtering, and a time delay of the sound caused by the propagation medium (e.g., air) over the distance between the object 206 and the user 106.

Other portions of the sound 208 are reflected before reaching the user 106 and represent reflected sound. For example, another portion of the sound 208 travels along the reflected sound path 212, where the sound is reflected by the wall 202 to the user 106. The reflected sound path 212 may result in an attenuation, filtering, and time delay of the sound 208 caused by the propagation medium of the distance between the object 206 and the wall 202, another attenuation or filtering caused by the reflection off the wall 202, and another attenuation, filtering, and time delay caused by the propagation medium of the distance between the wall 202 and the user 106. The amount of attenuation at the wall 202 depends on the acoustic absorption of the wall 202, which may vary based on the material of the wall 202. In another example, another portion of the sound 208 travels along the reflected sound path 214, where the sound 208 is reflected by an object 216 (e.g., a table) toward the user 106.

The various sound propagation paths 210, 212, 214 within the room 200 represent a room impulse response that depends on the particular positions of the sound source (i.e., the object 206) and the recipient (i.e., the headset 110). The room impulse response contains various information about the room, including low-frequency modes, diffraction paths, transmission through walls, and acoustic material properties of the surfaces. The room impulse response may be parameterized using a set of acoustic parameters. Although the reflected sound paths 212 and 214 are examples of first-order reflections caused by reflection at a single surface, the set of acoustic parameters (e.g., the room impulse response) may combine the effects of higher-order reflections from multiple surfaces or objects. By transforming the audio signal of the object 206 using the set of acoustic parameters, the headset 110 generates audio content for the user 106 that simulates the propagation of the audio signal as sound through the room 200 along the direct sound path 210 and the reflected sound paths 212, 214.
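A minimal numeric sketch of the geometry described above, assuming an inverse-distance spreading law and a single energy absorption coefficient per reflection (both simplifications not mandated by the disclosure):

```python
import math

SPEED_OF_SOUND_M_S = 343.0

def path_delay_and_gain(segment_lengths_m, absorption_coefficients=()):
    """Delay and amplitude gain of one propagation path built from straight
    segments, with one energy absorption coefficient per reflection."""
    total_length = sum(segment_lengths_m)
    delay_s = total_length / SPEED_OF_SOUND_M_S
    gain = 1.0 / max(total_length, 1e-6)            # 1/r spreading loss
    for alpha in absorption_coefficients:           # loss at each reflection
        gain *= math.sqrt(1.0 - alpha)              # amplitude, not energy
    return delay_s, gain

# Direct sound path 210: one segment, no reflections (distances are examples).
print(path_delay_and_gain([3.0]))
# Reflected sound path 212: object -> wall 202 -> listener, one absorbing bounce.
print(path_delay_and_gain([2.0, 4.0], absorption_coefficients=[0.3]))
```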

Note that the propagation paths from the object 206 (sound source) to the user 106 (recipient) within the room 200 can generally be divided into three parts: the direct sound path 210, early reflections (e.g., carried by the reflected sound path 214) corresponding to first-order sound reflections from nearby surfaces, and late reverberation (e.g., carried by the reflected sound path 212) corresponding to first-order or higher-order sound reflections from surfaces farther away. Each part has different perceptual requirements that affect the update rate of the corresponding acoustic parameters. For example, the user 106 may have very little tolerance for latency in the direct sound path 210, so one or more acoustic parameters associated with the direct sound path 210 may be updated at the highest rate. The user 106 may be more tolerant of latency in the early reflections. Late reverberation is least sensitive to head rotation, since in many cases it is isotropic and uniform within the room and therefore does not change as the ears rotate or translate. Computing all of the perceptually important acoustic parameters related to late reverberation is also very expensive. For these reasons, the acoustic parameters associated with early reflections and late reverberation may be computed efficiently outside of real-time operation, e.g., at the mapping server 130, which does not have the stringent energy and computational limitations of the headset 110 but does incur significant latency. Details regarding the operation of the mapping server 130 for determining acoustic parameters are discussed below in conjunction with fig. 3A and 5A.
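The different latency tolerances above might translate into update scheduling along these lines (the intervals are invented for illustration and are not taken from the disclosure):

```python
# Hypothetical refresh intervals: direct sound is updated fastest on the
# headset, early reflections less often, and late reverberation is recomputed
# rarely and can be delegated to the mapping server.
UPDATE_INTERVAL_S = {
    "direct_sound": 0.01,
    "early_reflections": 0.1,
    "late_reverberation": 10.0,
}

def components_due_for_update(seconds_since_update: dict) -> list:
    """Return the parts of the response whose refresh interval has elapsed."""
    return [name for name, interval in UPDATE_INTERVAL_S.items()
            if seconds_since_update.get(name, float("inf")) >= interval]
```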

Fig. 3A is a block diagram of mapping server 130 in accordance with one or more embodiments. The mapping server 130 determines a set of acoustic parameters for the physical space (room) in which the headset 110 is located. The determined set of acoustic parameters may be used at the headset 110 to transform audio signals associated with objects (e.g., virtual or real objects) in the room. To add a convincing sound source to an object, the audio signal output from the head-mounted device 110 should sound as if it had propagated from the localization of the object to the listener in the same way as a co-located natural source. The set of acoustic parameters defines the transformation caused by the propagation of sound from objects in the room to the listener (i.e. the position of the head-mounted device within the room), including the propagation along direct paths and various reflected paths from the surface of the room. Mapping server 130 includes a virtual model database 305, a communication module 310, a mapping module 315, and an acoustic analysis module 320. In other embodiments, the mapping server 130 may have any combination of the listed modules and any additional modules. In some other embodiments, mapping server 130 includes one or more modules that incorporate the functionality of the modules shown in FIG. 3A. A processor (not shown in fig. 3A) of mapping server 130 may run some or all of virtual model database 305, communication module 310, mapping module 315, acoustic analysis module 320, one or more other modules, or modules that incorporate the functionality of the modules shown in fig. 3A.

The virtual model database 305 stores virtual models describing a plurality of physical spaces and acoustic properties of the physical spaces. Each location in the virtual model corresponds to a physical location of the headset 110 within a local region having a particular configuration associated with a unique acoustic condition. The unique acoustic condition represents a condition of a local region having a unique set of acoustic properties represented by a unique set of acoustic parameters. The particular location in the virtual model may correspond to the current physical location of the headset 110 within the room 102. Each position in the virtual model is associated with a set of acoustic parameters of the corresponding physical space, which represents a configuration of the local region. The set of acoustic parameters describes various acoustic properties of a particular configuration of the local region. Physical spaces whose acoustic properties are described in the virtual model include, but are not limited to, conference rooms, bathrooms, hallways, offices, bedrooms, restaurants, and living rooms. Thus, the room 102 of fig. 1 may be a conference room, bathroom, hallway, office, bedroom, restaurant, or living room. In some embodiments, the physical space may be some exterior space (e.g., a garden, etc.) or a combination of various interior and exterior spaces. More details regarding the virtual model structure are discussed below in conjunction with FIG. 3C.
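One way to picture the database layout just described (a sketch only; the names, signatures, and nesting are assumptions rather than the patent's schema):

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

@dataclass
class SpaceConfiguration:
    """One acoustic condition of one physical space (e.g., 'door open, half
    occupied'), together with the parameter set stored for it."""
    geometry_signature: str       # e.g., a hash of the reconstructed surfaces
    material_signature: str       # e.g., a hash of per-surface acoustic materials
    acoustic_parameters: dict     # the frequency-banded parameter set
    confidence: float = 1.0       # weight used when merging new estimates
    timestamp: float = 0.0        # when the parameters were last computed

@dataclass
class VirtualModel:
    """Physical spaces (conference room, bathroom, corridor, ...) mapped to the
    configurations observed for each space."""
    spaces: Dict[str, Dict[str, SpaceConfiguration]] = field(default_factory=dict)

    def lookup(self, space_id: str, config_id: str) -> Optional[SpaceConfiguration]:
        return self.spaces.get(space_id, {}).get(config_id)
```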

The communication module 310 is a module that communicates with the headset 110 via the network 120. The communication module 310 receives, from the headset 110, visual information describing at least a portion of the room 102. In one or more embodiments, the visual information includes image data about at least a portion of the room 102. For example, the communication module 310 receives depth image data captured by the DCA of the headset 110, with information about the shape of the room 102 defined by the surfaces of the room 102 (e.g., surfaces of the walls, floor, and ceiling of the room 102). The communication module 310 may also receive color image data captured by the PCA of the headset 110. The mapping server 130 may use the color image data to associate different acoustic materials with the surfaces of the room 102. The communication module 310 may provide the visual information (e.g., the depth image data and the color image data) received from the headset 110 to the mapping module 315.

The mapping module 315 maps the visual information received from the head mounted device 110 to the location of the virtual model. The mapping module 315 determines the location of the virtual model corresponding to the current physical space in which the headset 110 is located (i.e., the current configuration of the room 102). The mapping module 315 searches the virtual model for a mapping between (i) visual information, including at least information about the surface geometry of the physical space and information about the surface acoustic material, for example, and (ii) a corresponding configuration of the physical space within the virtual model. The mapping is performed by matching the geometry and/or acoustic material information of the received visual information with geometry and/or acoustic material information stored as part of the configuration of the physical space within the virtual model. The corresponding configuration of the physical space within the virtual model corresponds to the model of the physical space in which the headset 110 is currently located. If no match is found, this indicates that the current configuration of the physical space has not been modeled in the virtual model. In this case, the mapping module 315 may notify the acoustic analysis module 320 that no match was found, and the acoustic analysis module 320 determines a set of acoustic parameters based at least in part on the received visual information.
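One naive realization of the matching step, reusing the `VirtualModel` sketch above (the exact-match criterion is a placeholder; the disclosure does not prescribe a particular matching algorithm):

```python
def find_matching_configuration(virtual_model, geometry_signature: str,
                                material_signature: str):
    """Return the (space_id, config_id) whose stored geometry and material
    information matches the received visual information, or None when the
    current configuration has not yet been modeled."""
    for space_id, configs in virtual_model.spaces.items():
        for config_id, config in configs.items():
            if (config.geometry_signature == geometry_signature
                    and config.material_signature == material_signature):
                return space_id, config_id
    return None   # no match: a new set of acoustic parameters must be computed
```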

The acoustic analysis module 320 determines the set of acoustic parameters associated with the physical location of the headset 110 based in part on the determined location in the virtual model obtained from the mapping module 315 and any acoustic parameters in the virtual model associated with the determined location. In some embodiments, the acoustic analysis module 320 retrieves the set of acoustic parameters from the virtual model, as the set of acoustic parameters is stored at the determined location in the virtual model, which is associated with a particular configuration of the space. In some other embodiments, the acoustic analysis module 320 determines the set of acoustic parameters by adjusting a previously determined set of acoustic parameters for that particular space configuration in the virtual model, based at least in part on the visual information received from the headset 110. For example, the acoustic analysis module 320 may run an offline acoustic simulation using the received visual information to determine the set of acoustic parameters.

In some embodiments, the acoustic analysis module 320 determines that the previously generated acoustic parameters are inconsistent with the acoustic conditions of the current physical positioning of the headset 110, for example, by analyzing ambient sounds captured and obtained from the headset 110. The detected mismatch may trigger the regeneration of a new set of acoustic parameters at the mapping server 130. Once recalculated, this new set of acoustic parameters may be input into the virtual model of the mapping server 130, as a replacement for the previous set of acoustic parameters, or as an additional state of the same physical space. In some embodiments, the acoustic analysis module 320 estimates the set of acoustic parameters by analyzing environmental sounds (e.g., speech) received from the headset 110. In some other embodiments, the acoustic analysis module 320 derives the set of acoustic parameters by running an acoustic simulation (e.g., a wave-based acoustic simulation or a ray tracing acoustic simulation) using visual information received from the head-mounted device 110, which may include estimates of room geometry and acoustic material properties. The acoustic analysis module 320 provides the derived set of acoustic parameters to the communication module 310, and the communication module 310 transmits the set of acoustic parameters from the mapping server 130 to the headset 110, e.g., via the network 120.

In some embodiments, as discussed, the communication module 310 receives an audio stream from the headset 110, which may be generated at the headset 110 using the sound in the room 102. The acoustic analysis module 320 may determine one or more acoustic parameters of a particular configuration of the room 102 based on the received audio stream (e.g., by applying a server-based computing algorithm). In some embodiments, the acoustic analysis module 320 estimates one or more acoustic parameters (e.g., reverberation times) from the audio stream based on a statistical model for sound attenuation in the audio stream, e.g., employing a maximum likelihood estimator. In some other embodiments, the acoustic analysis module 320 estimates one or more acoustic parameters based on, for example, time domain information and/or frequency domain information extracted from the received audio stream.
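The disclosure mentions a maximum-likelihood estimator over a statistical model of sound decay; as a simpler stand-in that conveys the idea (not the patent's estimator), a reverberation time can be approximated from the energy decay curve of a decaying segment of the audio stream:

```python
import numpy as np

def estimate_rt60(decaying_segment: np.ndarray, sample_rate: int) -> float:
    """Rough RT60 estimate: Schroeder backward integration of the squared
    signal, then a linear fit to the -5 dB .. -35 dB portion of the decay,
    extrapolated to -60 dB.  Assumes the segment captures a sound decaying
    into silence."""
    energy = np.cumsum(decaying_segment[::-1] ** 2)[::-1]     # Schroeder integral
    decay_db = 10.0 * np.log10(energy / energy[0] + 1e-12)
    t = np.arange(len(decaying_segment)) / sample_rate
    usable = (decay_db <= -5.0) & (decay_db >= -35.0)
    slope_db_per_s, _ = np.polyfit(t[usable], decay_db[usable], 1)
    return -60.0 / slope_db_per_s
```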

In some embodiments, the one or more acoustic parameters determined by the acoustic analysis module 320 represent a new set of acoustic parameters that are not part of the virtual model because the current configuration of the room 102 and the corresponding acoustic conditions of the room 102 are not modeled by the virtual model. In this case, the virtual model database 305 stores a new set of acoustic parameters at a location within the virtual model associated with the current configuration of the room 102, thereby modeling the current acoustic conditions of the room 102. Some or all of one or more acoustic parameters (e.g., frequency-dependent reverberation time, frequency-dependent direct-to-reverberation ratio, etc.) may be stored in the virtual model along with confidence (weights) and absolute timestamps associated with the acoustic parameters, which may be used to recalculate some acoustic parameters.

In some embodiments, the current configuration of the room 102 has been modeled by a virtual model, and the acoustic analysis module 320 recalculates the set of acoustic parameters based on the received audio stream. Alternatively, one or more acoustic parameters of the recalculated set may be determined at the headset 110 based on, for example, sound in at least a local region monitored at the headset 110 and communicated to the mapping server 130. The virtual model database 305 may update the virtual model by replacing the set of acoustic parameters with the recalculated set of acoustic parameters. In one or more embodiments, the acoustic analysis module 320 compares the recalculated set of acoustic parameters to a previously determined set of acoustic parameters. Based on the comparison, the virtual model is updated using the re-calculated set of acoustic parameters when a difference between any re-calculated acoustic parameter and any previously determined acoustic parameter is above a threshold difference.

In some embodiments, if the past estimate is within the threshold of the recalculated acoustic parameter, the acoustic analysis module 320 combines any recalculated acoustic parameter with the past estimate of the corresponding acoustic parameter for the same local region configuration. Past estimates may be stored in the virtual model database 305 at locations of virtual models associated with respective configurations of local regions. In one or more embodiments, the acoustic analysis module 320 applies a weight to the past estimate (e.g., a weight based on a timestamp associated with the past estimate or a stored weight) if the past estimate is not within the threshold of the recalculated acoustic parameter. In some embodiments, the acoustic analysis module 320 applies a material optimization algorithm to the estimate of the at least one acoustic parameter (e.g., reverberation time) and the geometric information of the physical space in which the headset 110 is located to determine the different acoustic materials that will produce the estimate of the at least one acoustic parameter. Information about the acoustic material as well as the geometric information may be stored at different locations of a virtual model that models different configurations and acoustic conditions of the same physical space.
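The comparison-and-update policy described in the last two paragraphs could be sketched per parameter as follows (the blending rule and weights are illustrative assumptions):

```python
def merge_acoustic_parameter(stored_value: float, stored_weight: float,
                             recalculated_value: float,
                             threshold: float) -> float:
    """If the recalculated value stays within the threshold of the stored
    estimate, blend the two (weighting the past estimate); otherwise treat the
    room as changed and let the recalculated value replace the stored one."""
    if abs(recalculated_value - stored_value) <= threshold:
        return ((stored_weight * stored_value + recalculated_value)
                / (stored_weight + 1.0))
    return recalculated_value
```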

In some embodiments, the acoustic analysis module 320 may perform acoustic simulations to generate spatially dependent precomputed acoustic parameters (e.g., a spatially dependent reverberation time, a spatially dependent direct-to-reverberant ratio, etc.). The spatially dependent precomputed acoustic parameters may be stored in the virtual model database 305 at the appropriate locations of the virtual model. The acoustic analysis module 320 may use the precomputed acoustic parameters to recalculate the spatially dependent acoustic parameters whenever the geometry of the physical space and/or the acoustic materials change. The acoustic analysis module 320 may use various inputs for the acoustic simulation, such as, but not limited to: information about the room geometry, acoustic material property estimates, and/or information about the human occupancy level (e.g., empty, partially full, full). The acoustic parameters may be simulated for various occupancy levels and various states of the room (e.g., windows open, windows closed, curtains open, curtains closed, etc.). If the state of the room changes, the mapping server 130 may determine an appropriate set of acoustic parameters for presenting the audio content to the user and transmit it to the headset 110. Otherwise, if an appropriate set of acoustic parameters is not available, the mapping server 130 (e.g., via the acoustic analysis module 320) calculates a new set of acoustic parameters (e.g., via acoustic simulation) and transmits the new set of acoustic parameters to the headset 110.

In some embodiments, the mapping server 130 stores the complete (measured or simulated) room impulse response for a given local area configuration. For example, the configuration of the local region may be based on the particular spatial arrangement of the head mounted device 110 and the sound source. The mapping server 130 may reduce the room impulse response to a defined set of acoustic parameters suitable for network transmission (e.g., the bandwidth of the network 120). The set of acoustic parameters representing a parameterized version of the full impulse response may be stored, for example, as part of the virtual model database 305, or in a separate non-transitory computer-readable storage medium of the mapping server 130 (not shown in fig. 3A).
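The reduction of a full room impulse response to a compact, network-friendly parameter set could, for example, resemble the following sketch, which extracts a broadband reverberation time (via Schroeder backward integration) and a direct-to-reverberant ratio; a per-band analysis and the early-reflection descriptors are omitted, and the function name parameterize_rir is an assumption.

```python
import numpy as np

def parameterize_rir(rir, fs, direct_window_ms=2.5):
    """Reduce a measured room impulse response to a small parameter set.

    This is an illustrative reduction only: RT60 from a Schroeder decay fit and a
    broadband direct-to-reverberant ratio. A production system would compute the
    parameters per frequency band and include early-reflection descriptors.
    """
    rir = np.asarray(rir, dtype=float)
    onset = int(np.argmax(np.abs(rir)))                # index of the direct sound
    split = onset + int(direct_window_ms * 1e-3 * fs)  # end of the direct-sound window

    direct_energy = np.sum(rir[onset:split] ** 2)
    reverb_energy = np.sum(rir[split:] ** 2) + 1e-12
    drr_db = 10.0 * np.log10(direct_energy / reverb_energy)

    # Schroeder backward integration, then a linear fit over the -5 dB to -35 dB range.
    edc = np.cumsum(rir[::-1] ** 2)[::-1]
    edc_db = 10.0 * np.log10(edc / edc[0] + 1e-12)
    i5 = np.argmax(edc_db <= -5.0)
    i35 = np.argmax(edc_db <= -35.0)
    slope = np.polyfit(np.arange(i5, i35) / fs, edc_db[i5:i35], 1)[0]  # dB per second
    rt60_s = -60.0 / slope

    return {"rt60_s": float(rt60_s), "drr_db": float(drr_db)}
```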

Fig. 3B is a block diagram of an audio system 330 of the headset 110 in accordance with one or more embodiments. The audio system 330 includes a transducer assembly 335, an acoustic assembly 340, an audio controller 350, and a communication module 355. In one embodiment, the audio system 330 also includes an input interface (not shown in fig. 3B) for controlling, for example, the operation of the various components of the audio system 330. In other embodiments, the audio system 330 may have any combination of the listed components with any additional components.

The transducer assembly 335 generates sound for the user's ear, for example, based on audio instructions from the audio controller 350. In some embodiments, the transducer assembly 335 is implemented as a pair of air-conducting transducers (e.g., one for each ear) that produce sound by generating airborne sound pressure waves in the user's ears, for example, according to audio instructions from the audio controller 350. Each air-conducting transducer of the transducer assembly 335 may include one or more transducers to cover different portions of the frequency range. For example, a piezoelectric transducer may be used to cover a first portion of the frequency range, while a moving coil transducer may be used to cover a second portion of the frequency range. In some other embodiments, each transducer of the transducer assembly 335 is implemented as a bone conduction transducer that produces sound by vibrating a corresponding bone in the user's head. Each transducer implemented as a bone conduction transducer may be placed behind the auricle and coupled to a portion of the user's bone; vibrating that portion of bone generates sound pressure waves that propagate toward the tissue of the user's cochlea, thereby bypassing the eardrum.

The acoustic assembly 340 may include a plurality of acoustic sensors, e.g., one acoustic sensor per ear. Alternatively, the acoustic assembly 340 includes an array of acoustic sensors (e.g., microphones) mounted on different locations of the headset 110. The acoustic sensor of the acoustic assembly 340 detects the sound pressure wave at the entrance of the ear. One or more acoustic sensors of the acoustic assembly 340 may be located at the entrance of each ear. The one or more acoustic sensors are configured to detect airborne sound pressure waves formed at the entrance of the ear. In one embodiment, the acoustic assembly 340 provides information about the generated sound to the audio controller 350. In another embodiment, the acoustic assembly 340 transmits feedback information of the detected sound pressure waves to the audio controller 350, and the feedback information may be used by the audio controller 350 for calibration of the transducer assembly 335.

In one embodiment, the acoustic assembly 340 includes a microphone located at the entrance of each ear of the wearer. A microphone is a transducer that converts pressure into an electrical signal. The frequency response of the microphone may be relatively flat in some parts of the frequency range and linear in other parts of the frequency range. The microphone may be configured to receive signals from the audio controller 350 to scale the signals detected by the microphone based on the audio instructions provided to the transducer assembly 335. For example, the detected signal may be adjusted based on the audio instructions to avoid clipping or to improve its signal-to-noise ratio.

In another embodiment, the acoustic assembly 340 includes a vibration sensor. The vibration sensor is coupled to a portion of the ear. In some embodiments, the vibration sensor and the transducer assembly 335 are coupled to different portions of the ear. The vibration sensor is similar to the air transducer used in the transducer assembly 335, except that the signal flows in reverse. Instead of an electrical signal producing a mechanical vibration in the transducer, a mechanical vibration generates an electrical signal in the vibration sensor. The vibration sensor may be made of a piezoelectric material that generates an electrical signal when deformed. The piezoelectric material may be a polymer (e.g., PVC, PVDF), a polymer-based composite, a ceramic, or a crystal (e.g., SiO2, PZT). By applying pressure on the piezoelectric material, the polarization of the piezoelectric material changes and generates an electrical signal. The piezoelectric sensor may be coupled to a material (e.g., silicone) that attaches well to the back of the ear. The vibration sensor may also be an accelerometer. The accelerometer may be piezoelectric or capacitive. In one embodiment, the vibration sensor maintains good surface contact with the back of the wearer's ear and maintains a steady amount of force (e.g., 1 newton) against the ear. The vibration sensor may be integrated in an IMU integrated circuit. The IMU is further described with respect to fig. 6.

The audio controller 350 provides audio instructions to the transducer assembly 335 for generating sound by generating audio content using a set of acoustic parameters (e.g., a room impulse response). The audio controller 350 renders the audio content as if it originated from an object (e.g., a virtual object or a real object) within a local area of the headset 110. In one embodiment, the audio controller 350 renders the audio content as if it originated from a virtual sound source by transforming the source audio signal using a set of acoustic parameters for the current configuration of the local region, which can parameterize the room impulse response of the current configuration of the local region.

The audio controller 350 may obtain information describing at least a portion of the local region, for example, from one or more cameras of the head-mounted device 110. The information may include depth image data, color image data, localization information of local regions, or a combination thereof. The depth image data may include geometric information about the shape of the local area defined by the surface of the local area (e.g., the surface of the walls, floor, and ceiling of the local area). The color image data may include information about the acoustic material associated with the local area surface. The positioning information may include GPS coordinates or some other location information of the local area.

In some embodiments, the audio controller 350 generates an audio stream based on the sound in the local area monitored by the acoustic assembly 340 and provides the audio stream to the communication module 355 for selective transmission to the mapping server 130. In some embodiments, the audio controller 350 runs real-time acoustic ray tracing simulations to determine one or more acoustic parameters (e.g., early reflections, direct sound occlusion, etc.). To be able to run real-time acoustic ray tracing simulations, the audio controller 350 requests and obtains information about the configured geometry and/or acoustic parameters of the local area in which the headset 110 is currently located, e.g., from a virtual model stored at the mapping server 130. In some embodiments, the audio controller 350 uses the sound in the local area monitored by the acoustic component 340 and/or visual information determined at the headset 110, such as visual information determined by one or more SLAM sensors mounted on the headset 110, to determine one or more acoustic parameters related to the current configuration of the local area.

A communication module 355 (e.g., a transceiver) is coupled to the audio controller 350 and may be integrated as part of the audio controller 350. The communication module 355 may communicate information describing at least a portion of the local region to the mapping server 130 for determining the set of acoustic parameters at the mapping server 130. The communication module 355 may selectively transmit the audio stream obtained from the audio controller 350 to the mapping server 130 for updating the virtual model of the physical space at the mapping server 130. For example, the communication module 355 transmits the audio stream to the mapping server 130 in response to determining (e.g., by the audio controller 350 based on the monitored sounds) that the change in the acoustic conditions of the local region over time is above a threshold change due to a change in the configuration of the local region, in which case a new or updated set of acoustic parameters is required. In some embodiments, the audio controller 350 determines that the change in the acoustic conditions of the local region is above the threshold change by periodically analyzing the ambient audio stream, for example, by periodically estimating a reverberation time from the time-varying audio stream. For example, changes in acoustic conditions may be caused by: changing an occupancy level of people in the room 102 (e.g., empty, partially full, full), opening or closing a window in the room 102, opening or closing a door of the room 102, opening or closing a window curtain on the window, changing a positioning of the headset 110 in the room 102, changing a positioning of a sound source in the room 102, changing some other feature in the room 102, or a combination thereof. In some embodiments, the communication module 355 communicates the one or more acoustic parameters determined by the audio controller 350 to the mapping server 130 for comparison with a previously determined set of acoustic parameters associated with the current configuration of the local region, to possibly update the virtual model at the mapping server 130.
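The threshold test on the periodically estimated reverberation time might be organized as in the following sketch; the class name AcousticChangeDetector and the 0.1-second threshold are illustrative assumptions.

```python
class AcousticChangeDetector:
    """Hypothetical detector that flags when the local area's acoustics change.

    The audio controller is assumed to supply a periodic reverberation-time
    estimate from the monitored ambient audio stream; when the estimate drifts
    by more than `threshold_s` from the last transmitted value, the audio
    stream is forwarded to the mapping server.
    """

    def __init__(self, threshold_s=0.1):
        self.threshold_s = threshold_s
        self.last_sent_rt60 = None

    def should_transmit(self, rt60_estimate_s):
        if self.last_sent_rt60 is None:
            self.last_sent_rt60 = rt60_estimate_s
            return True
        if abs(rt60_estimate_s - self.last_sent_rt60) > self.threshold_s:
            self.last_sent_rt60 = rt60_estimate_s
            return True
        return False

# Example: only the first and third estimates trigger a transmission to the mapping server.
detector = AcousticChangeDetector(threshold_s=0.1)
detector.should_transmit(0.55)   # True (first estimate)
detector.should_transmit(0.58)   # False (within threshold)
detector.should_transmit(0.75)   # True (change above threshold)
```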

In one embodiment, the communication module 355 receives a set of acoustic parameters from the mapping server 130 for the current configuration of the local region. In another embodiment, the audio controller 350 determines the set of acoustic parameters for the current configuration of the local area based on, for example, visual information of the local area determined by one or more SLAM sensors mounted on the headset 110, sound in the local area monitored by the acoustic component 340, information about the position of the headset 110 in the local area determined by the position sensor 440, information about the position of the sound source in the local area, and the like. In yet another embodiment, the audio controller 350 obtains the set of acoustic parameters from a computer-readable data storage (i.e., memory) coupled to the audio controller 350 (not shown in fig. 3B). The memory may store different sets of acoustic parameters (room impulse responses) for a limited number of physical spatial configurations. The set of acoustic parameters may represent a parameterized form of the room impulse response for the current configuration of the local region.

The audio controller 350 may selectively extrapolate the set of acoustic parameters to an adjusted set of acoustic parameters (i.e., a reconstructed room impulse response) in response to changes over time in the local area configuration that result in changes in the acoustic conditions of the local area. The audio controller 350 may determine the change over time of the acoustic conditions of the local area based on, for example, visual information of the local area, monitored sounds in the local area, information about a change in position of the headset 110 in the local area, information about a change in position of a sound source in the local area, and the like. As the configuration of the local area changes (e.g., due to movement of the headset 110 and/or a sound source in the local area), some of the acoustic parameters in the set change in a systematic manner, and the audio controller 350 may apply an extrapolation scheme to dynamically adjust those acoustic parameters.

In one embodiment, audio controller 350 dynamically adjusts, for example, the amplitude and direction of the direct sound, the delay between the direct sound and early reflections, and/or the direction and amplitude of the early reflections using an extrapolation scheme based on information about the room geometry and pre-computed image sources (e.g., in one iteration). In another embodiment, the audio controller 350 dynamically adjusts some of the acoustic parameters based on, for example, a data-driven approach. In this case, the audio controller 350 may train the model with measurements of a defined number of rooms and source/receiver locations, and the audio controller 350 may predict the impulse response of a particular novel room and source/receiver arrangement based on a priori knowledge. In yet another embodiment, the audio controller 350 dynamically adjusts some of the acoustic parameters by interpolating the acoustic parameters associated with the two rooms as the listener approaches the connection between the rooms. Thus, the parametric representation of the room impulse response represented by the set of acoustic parameters can be dynamically adapted. The audio controller 350 may generate audio instructions for the transducer assembly 335 based at least in part on the dynamically adapted room impulse response.
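The interpolation of acoustic parameters near a connection between two rooms could, under simple assumptions, look like the sketch below; the linear cross-fade, the blend radius, and the function name interpolate_room_parameters are not specified by this disclosure and are used here only to make the idea concrete.

```python
import numpy as np

def interpolate_room_parameters(params_a, params_b, listener_pos, connection_pos, blend_radius_m=2.0):
    """Cross-fade two rooms' parameter sets as the listener nears the connection (e.g., a doorway).

    Far from the connection the listener hears room A unchanged; at the connection the
    parameters are an equal mix of rooms A and B. Both parameter sets are dicts that
    share the same keys.
    """
    distance = float(np.linalg.norm(np.asarray(listener_pos) - np.asarray(connection_pos)))
    w = 0.5 * max(0.0, 1.0 - distance / blend_radius_m)  # 0 far away, 0.5 at the connection
    return {name: (1.0 - w) * params_a[name] + w * params_b[name] for name in params_a}
```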

The audio controller 350 may reconstruct a room impulse response for a particular configuration of local regions by applying an extrapolation scheme to the set of acoustic parameters received from the mapping server 130. The acoustic parameters representing a parameterized form of the room impulse response and related to perceptually relevant room impulse response characteristics may include some or all of the following: for each of a plurality of frequency bands, a reverberation time from the sound source to the headset 110, a reverberation level for each frequency band, a direct to reverberation ratio for each frequency band, a direction of the direct sound from the sound source to the headset 110, a magnitude of the direct sound for each frequency band, a time of early reflections of the sound from the sound source to the headset, a magnitude of early reflections for each frequency band, a direction of early reflections, a room mode frequency, a room mode localization, one or more other acoustic parameters, or a combination thereof.
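Purely as an illustration of how such a parameterized set might be carried between the mapping server 130 and the headset 110, the following sketch collects the listed quantities into a single container; all field names, units, and groupings are assumptions.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

@dataclass
class AcousticParameterSet:
    """Illustrative container for the perceptually relevant parameters listed above.

    Per-band quantities are keyed by band center frequency in Hz; directions are
    (azimuth, elevation) pairs in degrees. Field names are assumptions for the sketch.
    """
    rt60_s: Dict[float, float] = field(default_factory=dict)             # reverberation time per band
    reverb_level_db: Dict[float, float] = field(default_factory=dict)    # reverberation level per band
    drr_db: Dict[float, float] = field(default_factory=dict)             # direct-to-reverberant ratio per band
    direct_direction_deg: Tuple[float, float] = (0.0, 0.0)               # direction of the direct sound
    direct_level_db: Dict[float, float] = field(default_factory=dict)    # amplitude of the direct sound per band
    early_reflection_times_s: List[float] = field(default_factory=list)  # arrival times of early reflections
    early_reflection_levels_db: List[float] = field(default_factory=list)
    early_reflection_directions_deg: List[Tuple[float, float]] = field(default_factory=list)
    room_mode_frequencies_hz: List[float] = field(default_factory=list)
    room_mode_positions_m: List[Tuple[float, float, float]] = field(default_factory=list)
```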

The audio controller 350 may perform a spatial extrapolation on the received set of acoustic parameters to obtain an adjusted set of acoustic parameters representing a reconstructed room impulse response for the current configuration of the local area. When performing the spatial extrapolation, the audio controller 350 may adjust a number of acoustic parameters, such as: the direction of the direct sound, the amplitude of the direct sound relative to the reverberation, the direct sound equalization according to source directivity, the timing of early reflections, the amplitude of early reflections, the direction of early reflections, etc. Note that the reverberation time within a room may remain unchanged and may only need to be adjusted at the intersection of rooms.

In one embodiment, to adjust early reflection timing/magnitude/direction, audio controller 350 performs extrapolation based on the direction of arrival (DOA) of each sample or reflection. In this case, the audio controller 350 may apply an offset to the entire DOA vector. Note that the DOA of the early reflections may be determined by processing audio data obtained by a microphone array mounted on the headset 110. The DOA of the early reflections may then be adjusted based on, for example, the user's location in the room 102 and information about the room geometry.
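A minimal sketch of applying a single offset to the entire DOA vector is given below; restricting the offset to a rotation about the vertical (yaw) axis is a simplifying assumption.

```python
import numpy as np

def offset_doa_vectors(doa_unit_vectors, yaw_offset_deg):
    """Apply a single yaw offset to every direction-of-arrival vector.

    `doa_unit_vectors` is an (N, 3) array of unit vectors (one per detected
    reflection); only a rotation about the vertical axis is shown here.
    """
    theta = np.deg2rad(yaw_offset_deg)
    rotation = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                         [np.sin(theta),  np.cos(theta), 0.0],
                         [0.0,            0.0,           1.0]])
    return np.asarray(doa_unit_vectors) @ rotation.T
```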

In another embodiment, the audio controller 350 may identify low-order reflections based on an Image Source Model (ISM) when the room geometry and the source/listener locations are known. The timing and direction of the identified reflections are modified by operating the ISM as the listener moves. In this case, the amplitude can be adjusted, but the coloration cannot be manipulated. Note that the ISM represents a simulation model that determines the source locations of early reflections, regardless of the listener's location. The early reflection directions can then be calculated by tracing from the image sources to the listener. The image sources of a given source are stored and utilized to generate early reflection directions for any listener location in the room 102.
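For a shoebox-shaped room, the first-order image sources and the resulting early-reflection directions can be computed as in the following sketch; the function names and the axis-aligned room assumption are illustrative only.

```python
import numpy as np

def first_order_image_sources(source, room_dims):
    """First-order image sources of a point source in an axis-aligned (shoebox) room.

    The room spans [0, Lx] x [0, Ly] x [0, Lz]; each wall produces one image by
    mirroring the source across that wall. The listener position is not needed here:
    as the text notes, the same image sources serve any listener location.
    """
    source = np.asarray(source, dtype=float)
    images = []
    for axis, length in enumerate(room_dims):
        for wall in (0.0, length):
            image = source.copy()
            image[axis] = 2.0 * wall - source[axis]   # mirror across the wall plane
            images.append(image)
    return np.array(images)

def early_reflection_directions(images, listener):
    """Unit vectors from the listener toward each image source (reflection arrival directions)."""
    vectors = images - np.asarray(listener, dtype=float)
    return vectors / np.linalg.norm(vectors, axis=1, keepdims=True)

# Example: a 6 m x 4 m x 3 m room, source at (2, 1, 1.5), listener at (4, 3, 1.5).
images = first_order_image_sources([2.0, 1.0, 1.5], [6.0, 4.0, 3.0])
directions = early_reflection_directions(images, [4.0, 3.0, 1.5])
```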

In yet another embodiment, the audio controller 350 may apply a "shoe box model (shoebox model)" of the room 102 to extrapolate acoustic parameters related to early reflection timing/amplitude/direction. The "shoe box model" is a room acoustic approximation based on a rectangular box that is approximately the same size as the actual space. The "shoe box model" can be used to approximate the reflection or reverberation time based on, for example, the Sabine equation. The strongest reflections of the original room impulse response (e.g., measured or simulated for a given source/receiver arrangement) are marked and removed. The low-order ISM using the "shoe box model" then reintroduces the strongest reflections to obtain an extrapolated room impulse response.
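The Sabine approximation referenced above can be written as RT60 = 0.161 * V / A, with V the room volume in cubic meters and A the total absorption in sabins. A short sketch, assuming one broadband absorption coefficient per surface pair, is given below.

```python
def sabine_rt60(room_dims_m, absorption_coefficients):
    """Approximate the reverberation time of a shoebox room with the Sabine equation.

    RT60 = 0.161 * V / A, where V is the room volume (m^3) and A is the total
    absorption (sum of surface area times absorption coefficient). One broadband
    coefficient per surface pair is a simplifying assumption for the sketch.
    """
    lx, ly, lz = room_dims_m
    volume = lx * ly * lz
    surfaces = {
        "floor": lx * ly, "ceiling": lx * ly,
        "walls_xz": 2.0 * lx * lz, "walls_yz": 2.0 * ly * lz,
    }
    total_absorption = sum(surfaces[name] * absorption_coefficients[name] for name in surfaces)
    return 0.161 * volume / total_absorption

# Example: a 6 m x 4 m x 3 m room with moderately absorptive surfaces (RT60 of roughly 0.4 s).
rt60 = sabine_rt60((6.0, 4.0, 3.0),
                   {"floor": 0.3, "ceiling": 0.6, "walls_xz": 0.1, "walls_yz": 0.1})
```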

Fig. 3C is an example of a virtual model 360 that describes physical spaces and the acoustic properties of those spaces in accordance with one or more embodiments. The virtual model 360 may be stored in the virtual model database 305. The virtual model 360 may represent a geographic information storage area in the virtual model database 305 that stores geographically bound triplets of information (i.e., a physical space identifier (ID) 365, a spatial configuration ID 370, and a set of acoustic parameters 375) for all spaces in the world.

The virtual model 360 includes a list of possible physical spaces S1, S2, …, Sn, each identified by a unique physical space ID 365. Physical space ID 365 uniquely identifies a particular type of physical space. Physical space ID 365 may include, for example, a conference room, a bathroom, a hallway, an office, a bedroom, a restaurant and a living room, some other type of physical space, or a combination thereof. Thus, each physical space ID 365 corresponds to a particular type of physical space.

Each physical space ID 365 is associated with one or more spatial configuration IDs 370. Each spatial configuration ID 370 corresponds to a configuration, having particular acoustic conditions, of the physical space identified by the physical space ID 365. The spatial configuration ID 370 may include, for example, an identification of a human occupancy level in the physical space, an identification of a condition of a component of the physical space (e.g., an open/closed window, an open/closed door, etc.), an indication of acoustic materials of objects and/or surfaces in the physical space, an indication of a location of a source and a receiver in the same space, some other type of configuration indication, or some combination thereof. In some embodiments, different configurations of the same physical space may be due to various different conditions in the physical space. Different configurations of the same physical space may be associated with, for example, different occupancies of the same physical space, different conditions of components of the same physical space (e.g., open/closed windows, open/closed doors, etc.), different acoustic materials of objects and/or surfaces in the same physical space, different positioning of sources/receivers in the same physical space, some other characteristic of the physical space, or some combination thereof. Each spatial configuration ID 370 may be represented as a unique code ID (e.g., a binary code) that identifies the configuration of the physical space ID 365. For example, as shown in fig. 3C, the physical space S1 may be associated with p different spatial configurations S1C1, S1C2, …, S1Cp, each spatial configuration representing different acoustic conditions of the same physical space S1; the physical space S2 may be associated with q different spatial configurations S2C1, S2C2, …, S2Cq, each spatial configuration representing different acoustic conditions of the same physical space S2; and the physical space Sn may be associated with r different spatial configurations SnC1, SnC2, …, SnCr, each representing different acoustic conditions of the same physical space Sn. The mapping module 315 may search the virtual model 360 to find the appropriate spatial configuration ID 370 based on the visual information of the physical space received from the head-mounted device 110.

Each spatial configuration ID 370 has specific acoustic conditions associated with a set of acoustic parameters 375 stored in a respective location of the virtual model 360. As shown in fig. 3C, the p different spatial configurations S1C1, S1C2, …, S1Cp of the same physical space S1 are associated with p different sets of acoustic parameters {AP11}, {AP12}, …, {AP1p}. Similarly, as further shown in fig. 3C, the q different spatial configurations S2C1, S2C2, …, S2Cq of the same physical space S2 are associated with q different sets of acoustic parameters {AP21}, {AP22}, …, {AP2q}; and the r different spatial configurations SnC1, SnC2, …, SnCr of the same physical space Sn are associated with r different sets of acoustic parameters {APn1}, {APn2}, …, {APnr}. Once the mapping module 315 finds the spatial configuration ID 370 corresponding to the current configuration of the physical space in which the headset 110 is located, the acoustic analysis module 320 may pull the corresponding set of acoustic parameters 375 from the virtual model 360.
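The triplet organization of fig. 3C can be pictured as a nested lookup, as in the sketch below; the dictionary representation and the function name lookup_acoustic_parameters are assumptions made for illustration.

```python
# Illustrative layout of the triplets in fig. 3C as nested dictionaries:
# physical space ID -> spatial configuration ID -> set of acoustic parameters.
virtual_model = {
    "S1": {"S1C1": {"rt60_s": 0.4}, "S1C2": {"rt60_s": 0.7}},
    "S2": {"S2C1": {"rt60_s": 1.1}},
}

def lookup_acoustic_parameters(model, space_id, config_id):
    """Pull the parameter set for a matched space configuration, or None if unmodeled."""
    return model.get(space_id, {}).get(config_id)

params = lookup_acoustic_parameters(virtual_model, "S1", "S1C2")  # {'rt60_s': 0.7}
```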

Fig. 4 is a perspective view of the headset 110 including an audio system in accordance with one or more embodiments. In some embodiments (as shown in fig. 1), the headset 110 is implemented as a NED. In an alternative embodiment (not shown in fig. 1), the headset 110 is implemented as an HMD. Generally, the headset 110 may be worn on the face of the user such that content (e.g., media content) is presented using one or both lenses 410 of the headset 110. However, the headset 110 may also be used such that media content is presented to the user in a different manner. Examples of media content presented by the headset 110 include one or more images, video, audio, or some combination thereof. The headset 110 may include components such as a frame 405, a lens 410, a DCA425, a PCA 430, a position sensor 440, and an audio system. The audio system of the headset 110 includes, for example, a left speaker 415a, a right speaker 415b, an acoustic sensor array 435, an audio controller 420, one or more other components, or a combination thereof. The audio system of the headset 110 is an embodiment of the audio system 330 described above in connection with fig. 3B. The DCA425 and the PCA 430 may be part of SLAM sensors mounted on the headset 110 for capturing visual information of a local area around some or all of the headset 110. Although fig. 4 shows the components of the headset 110 in example positions on the headset 110, the components may be located elsewhere on the headset 110, on a peripheral device paired with the headset 110, or some combination thereof.

The headset 110 may correct or enhance the vision of the user, protect the eyes of the user, or provide images to the user. The headset 110 may be eyeglasses that correct the user's vision deficiencies. The headset 110 may be sunglasses that protect the user's eyes from the sun. The headset 110 may be safety goggles that protect the user's eyes from impact. The headset 110 may be a night vision device or infrared goggles that enhance the user's night vision. The headset 110 may be a near-eye display that generates artificial reality content for the user. Alternatively, the headset 110 may not include the lens 410 and may be a frame 405 with an audio system that provides audio content (e.g., music, radio, podcasts) to the user.

The frame 405 holds the other components of the headset 110. The frame 405 includes a front portion that holds the lens 410 and end pieces that attach to the head of the user. The front portion of the frame 405 rests on top of the nose of the user. The end pieces (e.g., temple arms) are the portions of the frame 405 that attach at the user's temples. The length of the end pieces may be adjustable (e.g., an adjustable temple length) to fit different users. The end pieces may also include portions that curl behind the user's ears (e.g., temple tips, ear pieces).

The lens 410 provides or transmits light to a user wearing the headset 110. The lens 410 may be a prescription lens (e.g., single vision, bifocal, and trifocal or progressive lenses) to help correct the user's vision deficiencies. The prescription lens transmits ambient light to the user wearing the headset 110. The transmitted ambient light may be altered by the prescription lens to correct the user's vision deficiencies. The lens 410 may be a polarized lens or a colored lens to protect the user's eyes from sunlight. Lens 410 may be one or more waveguides that are part of a waveguide display, where image light is coupled to the user's eye through an end or edge of the waveguide. The lens 410 may include an electronic display for providing image light, and may also include an optical block for magnifying the image light from the electronic display.

Speakers 415a and 415b produce sound for the user's ears. The speakers 415a, 415B are embodiments of the transducers of the transducer assembly 335 in fig. 3B. Speakers 415a and 415b receive audio instructions from audio controller 420 to produce sound. Left speaker 415a may obtain a left audio channel from audio controller 420 and right speaker 415b may obtain a right audio channel from audio controller 420. As shown in fig. 4, each speaker 415a, 415b is coupled to an end piece of the frame 405 and is placed in front of the entrance of the user's respective ear. Although speakers 415a and 415b are shown outside of frame 405, speakers 415a and 415b may be enclosed in frame 405. In some embodiments, instead of separate speakers 415a and 415b for each ear, the headset 110 includes an array of speakers (not shown in fig. 4) integrated into, for example, the end pieces of the frame 405 to improve the directionality of the presented audio content.

The DCA425 captures depth image data describing depth information for a local area (e.g., a room) around the headset 110. In some embodiments, the DCA425 may include a light projector (e.g., structured light and/or flash illumination for time-of-flight), an imaging device, and a controller (not shown in fig. 4). The captured data may be an image of light projected onto the local area by the light projector captured by the imaging device. In one embodiment, the DCA425 may include a controller and two or more cameras oriented to capture portions of a local area in a stereoscopic manner. The captured data may be images captured stereoscopically by two or more cameras of the local area. The controller of the DCA425 uses the captured data and depth determination techniques (e.g., structured light, time of flight, stereo imaging, etc.) to calculate depth information for the local region. Based on the depth information, the controller of the DCA425 determines absolute position information of the headset 110 within the local area. The controller of the DCA425 may also generate a model of the local region. The DCA425 may be integrated with the headset 110 or may be located in a local area external to the headset 110. In some embodiments, the controller of the DCA425 may transmit the depth image data to the audio controller 420 of the headset 110, e.g., for further processing and transmission to the mapping server 130.

The PCA 430 includes one or more passive cameras that generate color (e.g., RGB) image data. Unlike the DCA425, which uses active light emission and reflection, the PCA 430 captures light from the local area environment to generate color image data. The pixel values of the color image data may define the visible color of the object captured in the image data, rather than the pixel values defining the depth or distance from the imaging device. In some embodiments, the PCA 430 comprises a controller that generates color image data based on light captured by the passive imaging device. The PCA 430 may provide the color image data to the audio controller 420, e.g., for further processing and transmission to the mapping server 130.

The acoustic sensor array 435 monitors and records sound in a localized area around some or all of the headset 110. The acoustic sensor array 435 is an embodiment of the acoustic assembly 340 of fig. 3B. As shown in fig. 4, the acoustic sensor array 435 includes a plurality of acoustic sensors having a plurality of acoustic detection locations located on the headset 110. The acoustic sensor array 435 can provide the recorded sound as an audio stream to the audio controller 420.

The position sensor 440 generates one or more measurement signals in response to movement of the headset 110. The position sensor 440 may be located on a portion of the frame 405 of the headset 110. The position sensor 440 may include a position sensor, an Inertial Measurement Unit (IMU), or both. Some embodiments of the headset 110 may or may not include a position sensor 440, or may include more than one position sensor 440. In embodiments where the position sensor 440 comprises an IMU, the IMU generates IMU data based on measurement signals from the position sensor 440. Examples of the position sensor 440 include: one or more accelerometers, one or more gyroscopes, one or more magnetometers, another suitable type of sensor to detect motion, one type of sensor for error correction of the IMU, or some combination thereof. The location sensor 440 may be located external to the IMU, internal to the IMU, or some combination thereof.

Based on the one or more measurement signals, the position sensor 440 estimates a current position of the headset 110 relative to an initial position of the headset 110. The estimated position may include a location of the headset 110 and/or an orientation of the headset 110 or of the head of the user wearing the headset 110, or some combination thereof. The orientation may correspond to the position of each ear relative to a reference point. In some embodiments, the position sensor 440 uses the depth information and/or absolute position information from the DCA425 to estimate the current position of the headset 110. The position sensor 440 may include multiple accelerometers that measure translational motion (forward/backward, up/down, left/right) and multiple gyroscopes that measure rotational motion (e.g., pitch, yaw, roll). In some embodiments, the IMU rapidly samples the measurement signals and calculates the estimated position of the headset 110 from the sampled data. For example, the IMU integrates the measurement signals received from the accelerometers over time to estimate a velocity vector, and integrates the velocity vector over time to determine an estimated location of a reference point on the headset 110. The reference point is a point that may be used to describe the position of the headset 110. Although a reference point may generally be defined as a point in space, in practice the reference point is defined as a point within the headset 110.
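The double integration described above is illustrated by the following sketch; the rectangular integration and the assumption of gravity-compensated accelerometer samples are simplifications relative to a practical IMU pipeline, which would add bias correction and sensor fusion to limit drift.

```python
import numpy as np

def integrate_imu(accelerations, dt, initial_velocity=(0.0, 0.0, 0.0), initial_position=(0.0, 0.0, 0.0)):
    """Estimate a reference-point position by twice integrating sampled accelerations.

    `accelerations` is an (N, 3) array of gravity-compensated accelerometer samples
    taken at a fixed interval `dt` (seconds). Rectangular integration is used for brevity.
    """
    accelerations = np.asarray(accelerations, dtype=float)
    velocities = np.cumsum(accelerations * dt, axis=0) + np.asarray(initial_velocity)  # first integration
    positions = np.cumsum(velocities * dt, axis=0) + np.asarray(initial_position)      # second integration
    return positions[-1]
```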

The audio controller 420 provides audio instructions to the speakers 415a, 415b for generating sound by generating audio content using a set of acoustic parameters (e.g., a room impulse response). Audio controller 420 is an embodiment of audio controller 350 of fig. 3B. The audio controller 420 renders the audio content as if it originated from an object (e.g., a virtual object or a real object) within the local region, for example, by transforming the source audio signal using the currently configured set of acoustic parameters for the local region.

The audio controller 420 may obtain visual information describing at least a portion of the local region, for example, from the DCA425 and/or the PCA 430. The visual information obtained at the audio controller 420 may include depth image data captured by the DCA 425. The visual information obtained at the audio controller 420 may further include color image data captured by the PCA 430. The audio controller 420 may combine the depth image data and the color image data into the visual information that is communicated (e.g., via a communication module (not shown in fig. 4) coupled to the audio controller 420) to the mapping server 130 for determining the set of acoustic parameters. In one embodiment, the communication module (e.g., a transceiver) may be integrated into the audio controller 420. In another embodiment, the communication module may be external to the audio controller 420 and integrated into the frame 405 as a separate module (e.g., the communication module 355 of fig. 3B) coupled to the audio controller 420. In some embodiments, the audio controller 420 generates an audio stream based on sound in the local area monitored by, for example, the acoustic sensor array 435. The communication module coupled to the audio controller 420 may selectively communicate the audio stream to the mapping server 130 for updating the virtual model of the physical space at the mapping server 130.

Fig. 5A is a flow diagram illustrating a process 500 for determining acoustic parameters of a physical location of a headset in accordance with one or more embodiments. The process 500 of fig. 5A may be performed by a component of an apparatus, such as the mapping server 130 of fig. 3A. In other embodiments, other entities (e.g., components of the headset 110 of fig. 4 and/or components shown in fig. 6) may perform some or all of the steps of the process. Likewise, embodiments may include different and/or additional steps, or perform the steps in a different order.

The mapping server 130 determines 505 a position of a headset (e.g., the headset 110) within a local area (e.g., the room 102) in the virtual model (e.g., via the mapping module 315) based on information describing at least a portion of the local area. The stored virtual model describes a plurality of spaces and acoustic properties of the spaces, wherein a position in the virtual model corresponds to a physical position of the head-mounted device within the local area. The information describing at least a portion of the local area may include depth image data having information about a shape of at least a portion of the local area defined by surfaces of the local area (e.g., surfaces of walls, floors, and ceilings) and one or more objects (real objects and/or virtual objects) in the local area. The information describing at least a portion of the local region may further include color image data for associating the acoustic material with a surface of the local region and a surface of the one or more objects. In some embodiments, the information describing at least a portion of the local region may include location information of the local region, such as an address of the local region, a GPS location of the local region, information regarding a latitude and longitude of the local region, and so forth. In some other embodiments, the information describing at least a portion of the local region includes: depth image data, color image data, information about the acoustic material of at least a portion of the local region, positioning information of the local region, some other information, or a combination thereof.

The mapping server 130 determines 510 a set of acoustic parameters associated with the physical location of the headset (e.g., via the acoustic analysis module 320) based in part on the determined location in the virtual model and any acoustic parameters associated with the determined location. In some embodiments, the mapping server 130 retrieves the set of acoustic parameters in the virtual model from the determined locations in the virtual model associated with the spatial configuration in which the headset 110 is currently located. In some other embodiments, the mapping server 130 determines the set of acoustic parameters by adjusting a previously determined set of acoustic parameters in the virtual model based at least in part on information describing at least a portion of the local region received from the headset 110. The mapping server 130 may analyze the audio stream received from the headset 110 to determine if the existing set of acoustic parameters (if available) is consistent with the audio analysis or if recalculation is required. If the existing acoustic parameters are not consistent with the audio analysis, the mapping server 130 may run an acoustic simulation (e.g., a wave-based acoustic simulation or a ray-tracing acoustic simulation) using information describing at least a portion of the local region (e.g., room geometry, estimates of acoustic material properties) to determine a new set of acoustic parameters.
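One way to express the retrieve-or-recalculate decision described above is sketched below; the callables audio_consistent and simulate stand in for the audio-stream analysis and the acoustic simulation, respectively, and, like the function name itself, are assumptions of the sketch.

```python
def determine_acoustic_parameters(model, space_id, config_id, audio_consistent, simulate):
    """Return stored parameters when they match the audio analysis; otherwise re-simulate.

    `audio_consistent` is a predicate applied to the stored set (e.g., comparing a
    reverberation time estimated from the received audio stream against the stored
    value); `simulate` is a callable that runs the acoustic simulation and returns a
    new parameter set, which is then written back into the virtual model.
    """
    stored = model.get(space_id, {}).get(config_id)
    if stored is not None and audio_consistent(stored):
        return stored
    new_params = simulate(space_id, config_id)
    model.setdefault(space_id, {})[config_id] = new_params
    return new_params
```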

The mapping server 130 transmits the determined set of acoustic parameters to the headset for rendering audio content to the user using the set of acoustic parameters. The mapping server 130 also receives an audio stream from the headset 110 (e.g., via the communication module 310). The mapping server 130 determines one or more acoustic parameters (e.g., via the acoustic analysis module 320) based on analyzing the received audio stream. The mapping server 130 may store the one or more acoustic parameters to a storage location in the virtual model associated with the physical space in which the headset 110 is located, creating a new entry in the virtual model if the current acoustic configuration of the physical space has not been modeled. The mapping server 130 may compare the one or more acoustic parameters (e.g., by the acoustic analysis module 320) to a previously determined set of acoustic parameters. Based on the comparison, the mapping server 130 may update the virtual model by replacing at least one acoustic parameter of the set of acoustic parameters with one or more acoustic parameters. In some embodiments, the mapping server 130 re-determines the set of acoustic parameters based on, for example, a server-based simulation algorithm, controlled measurements from the headset 110, or measurements between two or more headsets.

Fig. 5B is a flow diagram illustrating a process 520 for obtaining a set of acoustic parameters from a mapping server in accordance with one or more embodiments. The process 520 of fig. 5B may be performed by components of a device, such as the headset 110 of fig. 4. In other embodiments, other entities (e.g., components of the audio system 330 of fig. 3B and/or components shown in fig. 6) may perform some or all of the steps of the process. Likewise, embodiments may include different and/or additional steps, or perform the steps in a different order.

The headset 110 determines 525 information describing at least a portion of a local area (e.g., the room 102). The information may include depth image data (e.g., generated by the DCA425 of the headset 110) having information about the shape of at least a portion of a local area defined by surfaces of the local area (e.g., surfaces of walls, floors, and ceilings) and one or more (real and/or virtual) objects in the local area. The information may also include color image data (e.g., generated by the PCA 430 of the headset 110) for at least a portion of the local region. In some embodiments, the information describing at least a portion of the local region may include location information of the local region, such as an address of the local region, a GPS location of the local region, information regarding a latitude and longitude of the local region, and so forth. In some other embodiments, the information describing at least a portion of the local region includes: depth image data, color image data, information about the acoustic material of at least a portion of the local region, positioning information of the local region, some other information, or a combination thereof.

The headset 110 transmits 530 information to the mapping server 130 (e.g., via the communication module 355) for determining a position of the headset within the local area in the virtual model and a set of acoustic parameters associated with the position in the virtual model. Each location in the virtual model corresponds to a particular physical location of the headset 110 within the local region, and the virtual model describes a plurality of spaces and acoustic properties of the spaces. In response to determining at the headset 110 that the change in the acoustic conditions of the local region over time is above the threshold change, the headset 110 may also selectively transmit (e.g., via the communication module 355) the audio stream to the mapping server 130 for updating the set of acoustic parameters. The headset 110 generates an audio stream by monitoring sound in a local area.

The headset 110 receives 535 information about the set of acoustic parameters from the mapping server 130 (e.g., via the communication module 355). For example, the received information includes information on a reverberation time of each frequency band of the plurality of frequency bands from the sound source to the head set 110, a reverberation level of each frequency band, a direct reverberation ratio of each frequency band, a direction of a direct sound of each frequency band from the sound source to the head set 110, a magnitude of the direct sound of each frequency band, an early reflection time of the sound from the sound source to the head set, a magnitude of an early reflection of each frequency band, an early reflection direction, a room mode frequency, a room mode localization, and the like.

The headset 110 uses the set of acoustic parameters to present 540 the audio content to the user of the headset 110, for example, by generating and providing appropriate acoustic instructions from the audio controller 420 to the speakers 415a, 415b (i.e., from the audio controller 350 to the transducer assembly 335). When a change in the local area (room environment) causes a change in the acoustic conditions of the local area, the headset 110 may request and obtain an updated set of acoustic parameters from the mapping server 130. In this case, the headset 110 presents the updated audio content to the user using the updated set of acoustic parameters. Alternatively, the set of acoustic parameters may be determined locally at the headset 110 without communicating with the mapping server 130. The headset 110 may determine the set of acoustic parameters (e.g., via the audio controller 350) by running an acoustic simulation (e.g., a wave-based acoustic simulation or a ray-tracing acoustic simulation) using information about the local region (e.g., information about the geometry of the local region, an estimate of acoustic material properties in the local region, etc.) as input.

Fig. 5C is a flow diagram illustrating a process 550 for reconstructing an impulse response of a local region in accordance with one or more embodiments. The process 550 of fig. 5C may be performed by components of a device, such as the audio system 330 of the headset 110. In other embodiments, other entities (e.g., the components shown in fig. 6) may perform some or all of the steps of the process. Likewise, embodiments may include different and/or additional steps, or perform the steps in a different order.

The headset 110 obtains 555 a set of acoustic parameters for a local region (e.g., the room 102) around some or all of the headset 110. In one embodiment, the headset 110 obtains the set of acoustic parameters from the mapping server 130 (e.g., via the communication module 355). In another embodiment, the headset 110 determines the set of acoustic parameters (e.g., via the audio controller 350) based on the depth image data (e.g., from the DCA425 of the headset 110), the color image data (e.g., from the PCA 430 of the headset 110), the sound in the local area (e.g., monitored by the acoustic assembly 340), information about the position of the headset 110 in the local area (e.g., determined by the position sensor 440), and information about the position of the sound source in the local area. In yet another embodiment, the set of acoustic parameters is obtained by the headset 110 (e.g., via the audio controller 350) from a computer-readable data storage (i.e., memory) coupled to the audio controller 350. The set of acoustic parameters may represent a parameterized form of the room impulse response for a configuration of the local region characterized by a unique acoustic condition of the local region.

The headset 110 dynamically adjusts 560 the set of acoustic parameters to an adjusted set of acoustic parameters (e.g., via the audio controller 420) in response to a change in the configuration of the local region by extrapolating the set of acoustic parameters. For example, the change in the local area configuration may be due to a change in the spatial arrangement of the head-mounted device and the sound source (e.g., virtual sound source). The adjusted set of acoustic parameters may represent a parameterized form of the reconstructed room impulse response for the current (changed) configuration of the local region. For example, the direction, timing and amplitude of the early reflections may be adjusted to generate a reconstructed room impulse response for the current configuration of the local region.

The headset 110 presents 565 the audio content to a user of the headset 110 using the reconstructed room impulse response. The headset 110 (e.g., via the audio controller 350) may convolve the audio signal with the reconstructed room impulse response to obtain a transformed audio signal for presentation to the user. The headset 110 may generate and provide appropriate acoustic instructions (e.g., via the audio controller 350) to the transducer assembly 335 (e.g., speakers 415a, 415b) for generating sound corresponding to the transformed audio signal.
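The convolution step can be sketched as follows; a single-channel convolution with peak normalization is shown for brevity, whereas binaural presentation would use one reconstructed impulse response per ear.

```python
import numpy as np
from scipy.signal import fftconvolve

def render_with_rir(dry_signal, room_impulse_response):
    """Transform a source audio signal by convolving it with the reconstructed room impulse response."""
    wet = fftconvolve(np.asarray(dry_signal, dtype=float),
                      np.asarray(room_impulse_response, dtype=float), mode="full")
    peak = np.max(np.abs(wet)) + 1e-12
    return wet / peak  # normalize to avoid clipping at the transducers
```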

System environment

Fig. 6 is a system environment 600 of a headset according to one or more embodiments. The system 600 may operate in an artificial reality environment (e.g., a virtual reality environment, an augmented reality environment, a mixed reality environment, or some combination thereof). The system 600 shown in fig. 6 includes a headset 110, a mapping server 130, and an input/output (I/O) interface 640 coupled to a console 645. Although fig. 6 illustrates an example system 600 including one headset 110 and one I/O interface 640, in other embodiments any number of these components may be included in the system 600. For example, there may be multiple headsets 110, each headset 110 having an associated I/O interface 640, each headset 110 and I/O interface 640 communicating with the console 645. In alternative configurations, different and/or additional components may be included in system 600. Further, in some embodiments, the functionality described in connection with one or more of the components shown in fig. 6 may be distributed among the components in a different manner than that described in connection with fig. 6. For example, some or all of the functionality of the console 645 may be provided by the headset 110.

The headset 110 includes a lens 410, an optics block 610, one or more position sensors 440, a DCA425, an Inertial Measurement Unit (IMU) 615, a PCA 430, and an audio system 330. Some embodiments of the headset 110 have different components than those described in connection with fig. 6. In addition, the functionality provided by the various components described in conjunction with fig. 6 may be distributed differently among the components of the headset 110 in other embodiments, or may be captured in a separate component remote from the headset 110.

The lens 410 may include an electronic display that displays 2D or 3D images to a user according to data received from the console 645. In various embodiments, the lens 410 includes a single electronic display or multiple electronic displays (e.g., a display for each eye of the user). Examples of electronic displays include: a Liquid Crystal Display (LCD), an Organic Light Emitting Diode (OLED) display, an active matrix organic light emitting diode display (AMOLED), some other display, or some combination thereof.

Optics block 610 amplifies image light received from the electronic display, corrects optical errors associated with the image light, and presents the corrected image light to a user of headset 110. In various embodiments, optics block 610 includes one or more optical elements. Example optical elements included in the optical block 610 include: an aperture, fresnel lens, convex lens, concave lens, filter, reflective surface, or any other suitable optical element that affects image light. Furthermore, the optical block 610 may include a combination of different optical elements. In some embodiments, one or more optical elements in optical block 610 may have one or more coatings, such as a partially reflective coating or an anti-reflective coating.

The magnification and focusing of the image light by optics block 610 allows the electronic display to be physically smaller, lighter in weight, and consume less power than larger displays. In addition, the magnification may increase the field of view of the content presented by the electronic display. For example, the field of view of the displayed content is such that the displayed content is presented using almost all of the user's field of view (e.g., approximately 110 degrees diagonal), and in some cases all of the field of view. Further, in some embodiments, the amount of magnification may be adjusted by adding or removing optical elements.

In some embodiments, optics block 610 may be designed to correct one or more types of optical errors. Examples of optical errors include barrel or pincushion distortion, longitudinal chromatic aberration, or lateral chromatic aberration. Other types of optical errors may further include spherical aberration, chromatic aberration, errors due to lens field curvature, astigmatism, or any other type of optical error. In some embodiments, the content provided to the electronic display for display is pre-distorted, and when optics block 610 receives the image light generated from the electronic display based on the content, optics block 610 corrects the distortion.

The IMU 615 is an electronic device that generates data indicative of the position of the headset 110 based on measurement signals received from the one or more position sensors 440. The position sensor 440 generates one or more measurement signals in response to movement of the headset 110. Examples of the position sensor 440 include: one or more accelerometers, one or more gyroscopes, one or more magnetometers, another suitable type of sensor to detect motion, a type of sensor used for error correction of the IMU 615, or some combination thereof. The location sensor 440 may be located outside the IMU 615, inside the IMU 615, or some combination of the two locations.

The DCA425 generates depth image data of a local region such as a room. The depth image data includes pixel values that define a distance from the imaging device and thus provide a (e.g., 3D) map of the locations captured in the depth image data. The DCA425 includes a light projector 620, one or more imaging devices 625, and a controller 630. The light projector 620 may project a structured light pattern or other light that is reflected by objects in the local area and captured by the imaging device 625 to generate depth image data.

For example, the light projector 620 may project multiple Structured Light (SL) elements of different types (e.g., lines, grids, or points) onto a portion of a local area around the headset 110. In various embodiments, the light projector 620 includes an emitter and a template (pattern plate). The emitter is configured to illuminate the template with light (e.g., infrared light). The illuminated template projects an SL pattern comprising a plurality of SL elements into a local area. For example, each SL element projected by an illuminated template is a point associated with a particular location on the template.

Each SL element projected by the DCA425 includes light in the infrared portion of the electromagnetic spectrum. In some embodiments, the illumination source is a laser configured to illuminate the template with infrared light such that it is invisible to a human. In some embodiments, the illumination source may be pulsed. In some embodiments, the illumination source may emit visible light that is pulsed such that the light is not perceptible to the eye.

The SL pattern projected into the local area by the DCA425 deforms when encountering various surfaces and objects in the local area. The one or more imaging devices 625 are each configured to capture one or more images of a local area. Each of the captured one or more images may include a plurality of SL elements (e.g., dots) projected by the light projector 620 and reflected by objects in the local region. Each of the one or more imaging devices 625 may be a detector array, a camera, or a video camera.

The controller 630 generates depth image data based on light captured by the imaging device 625. The controller 630 may further provide the depth image data to the console 645, the audio controller 420, or some other component.

The PCA 430 includes one or more passive cameras that generate color (e.g., RGB) image data. Unlike the DCA425, which uses active light emission and reflection, the PCA 430 captures light from the local area's environment to generate image data. The pixel values of the image data may define the visible color of the object captured in the imaging data, rather than the pixel values defining the depth or distance from the imaging device. In some embodiments, the PCA 430 comprises a controller that generates color image data based on light captured by the passive imaging device. In some embodiments, the DCA425 and the PCA 430 share a common controller. For example, the common controller may map each of one or more images captured in the visible spectrum (e.g., image data) and the infrared spectrum (e.g., depth image data) to one another. In one or more embodiments, the common controller is configured to additionally or alternatively provide one or more images of the local area to the audio controller 420 or the console 645.

The audio system 330 presents audio content to the user of the headset 110 using a set of acoustic parameters that represent acoustic properties of the local area in which the headset 110 is located. The audio system 330 renders the audio content as if it originated from an object (e.g., a virtual object or a real object) within the local area. The audio system 330 may obtain information describing at least a portion of the local region. The audio system 330 may transmit information to the mapping server 130 for determining the set of acoustic parameters at the mapping server 130. The audio system 330 may also receive the set of acoustic parameters from the mapping server 130.

In some embodiments, in response to a change in acoustic conditions of the local region being above a threshold change, the audio system 330 selectively extrapolates the set of acoustic parameters to an adjusted set of acoustic parameters representing a particular configuration of reconstructed impulse responses of the local region. The audio system 330 may present audio content to the user of the headset 110 based at least in part on the reconstructed impulse response.

In some embodiments, the audio system 330 monitors sounds in a local area and generates a corresponding audio stream. The audio system 330 may adjust the set of acoustic parameters based at least in part on the audio stream. The audio system 330 may also selectively transmit the audio stream to the mapping server 130 for updating a virtual model describing various physical spaces and acoustic properties of those spaces in response to determining that the change in acoustic properties of the local region over time is above a threshold change. The audio system 330 and the mapping server 130 of the headset 110 may communicate via a wired or wireless communication link (e.g., the network 120 of fig. 1).

The I/O interface 640 is a device that allows a user to send action requests to the console 645 and receive responses from the console 645. An action request is a request to perform a particular action. For example, an action request may be an instruction to start or end capture of image or video data, or an instruction to perform a particular action within an application. The I/O interface 640 may include one or more input devices. Example input devices include: a keyboard, a mouse, a game controller, or any other suitable device for receiving action requests and communicating the action requests to the console 645. An action request received by the I/O interface 640 is communicated to the console 645, which performs an action corresponding to the action request. In some embodiments, as further described above, the I/O interface 640 includes the IMU 615, which captures calibration data indicating an estimated location of the I/O interface 640 relative to an initial location of the I/O interface 640. In some embodiments, the I/O interface 640 may provide haptic feedback to the user in accordance with instructions received from the console 645. For example, haptic feedback is provided when an action request is received, or the console 645 communicates instructions to the I/O interface 640 causing the I/O interface 640 to generate haptic feedback when the console 645 performs an action.
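The request/feedback loop between the I/O interface 640 and the console 645 can be sketched as a dispatch table of actions plus a haptic acknowledgement. The class and method names below (including the vibrate call) are illustrative assumptions, not an API defined by the original text.

```python
from dataclasses import dataclass, field

@dataclass
class ActionRequest:
    name: str                                 # e.g., "start_video_capture"
    payload: dict = field(default_factory=dict)

class Console:
    """Illustrative console: executes a requested action and instructs the
    I/O interface to provide haptic feedback when the action is performed."""

    def __init__(self, actions, io_interface):
        self.actions = actions                # mapping of name -> callable
        self.io_interface = io_interface      # assumed to expose vibrate()

    def handle(self, request: ActionRequest):
        handler = self.actions.get(request.name)
        if handler is None:
            return
        handler(**request.payload)
        self.io_interface.vibrate(duration_ms=50)   # haptic acknowledgement
```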

The console 645 provides content to the headset 110 for processing in accordance with information received from one or more of the DCA 425, the PCA 430, the headset 110, and the I/O interface 640. In the example shown in fig. 6, the console 645 includes application storage 650, a tracking module 655, and an engine 660. Some embodiments of the console 645 have different modules or components than those described in conjunction with fig. 6. Similarly, the functionality described further below may be distributed among the components of the console 645 in a manner different than that described in conjunction with fig. 6. In some embodiments, the functionality discussed herein with reference to the console 645 may be implemented in the headset 110 or in a remote system.

The application storage 650 stores one or more applications for execution by the console 645. An application is a set of instructions that, when executed by a processor, generate content for presentation to a user. The content generated by the application may be responsive to input received from the user via the movement of the headset 110 or the I/O interface 640. Examples of applications include: a gaming application, a conferencing application, a video playback application, or other suitable application.

The tracking module 655 calibrates the local area of the system 600 using one or more calibration parameters, and may adjust the one or more calibration parameters to reduce error in the determination of the position of the headset 110 or the I/O interface 640. For example, the tracking module 655 communicates a calibration parameter to the DCA 425 to adjust the focus of the DCA 425 to more accurately determine the locations of SL elements captured by the DCA 425. The calibration performed by the tracking module 655 may also take into account information received from the IMU 615 in the headset 110 and/or the IMU 615 included in the I/O interface 640. Additionally, if tracking of the headset 110 is lost (e.g., the DCA 425 loses line of sight to at least a threshold number of the projected SL elements), the tracking module 655 may recalibrate some or all of the system 600.
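The recalibration trigger described above can be sketched as a simple check on how many projected SL elements remain visible to the DCA. The threshold value and callback below are illustrative placeholders only.

```python
def check_tracking(visible_sl_elements, threshold, recalibrate):
    """Request recalibration when too few projected SL elements remain visible.

    visible_sl_elements: count of SL dots currently detected by the DCA;
    threshold and the recalibrate callback are illustrative placeholders.
    """
    if visible_sl_elements < threshold:
        recalibrate()     # recalibrate some or all of the system
        return False      # tracking lost
    return True           # tracking maintained
```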

The tracking module 655 uses information from the DCA 425, the PCA 430, the one or more location sensors 440, the IMU 615, or some combination thereof, to track movement of the headset 110 or the I/O interface 640. For example, the tracking module 655 determines the location of a reference point of the headset 110 in a map of the local area based on information from the headset 110. The tracking module 655 may also determine the location of an object or a virtual object. Additionally, in some embodiments, the tracking module 655 may use portions of data indicating the position of the headset 110 from the IMU 615 and a representation of the local area from the DCA 425 to predict a future location of the headset 110. The tracking module 655 provides the estimated or predicted future location of the headset 110 or the I/O interface 640 to the engine 660.
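A tracking module might predict a future location by extrapolating the current IMU-derived state under a constant-acceleration assumption. The function below is a minimal sketch of that idea; the state representation and time step are illustrative.

```python
import numpy as np

def predict_future_position(position, velocity, acceleration, dt_s):
    """Constant-acceleration extrapolation of the headset's reference point.

    position, velocity, acceleration: 3-vectors from the IMU/tracking state;
    dt_s: how far into the future to predict (illustrative model).
    """
    position = np.asarray(position, dtype=float)
    velocity = np.asarray(velocity, dtype=float)
    acceleration = np.asarray(acceleration, dtype=float)
    return position + velocity * dt_s + 0.5 * acceleration * dt_s ** 2

# Example: headset moving along x at 0.5 m/s with no acceleration.
print(predict_future_position([0, 0, 0], [0.5, 0, 0], [0, 0, 0], dt_s=0.1))
```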

The engine 660 executes the application and receives position information, acceleration information, velocity information, predicted future positions, or some combination thereof, of the headset 110 from the tracking module 655. Based on the received information, the engine 660 determines content to provide to the headset 110 for presentation to the user. For example, if the received information indicates that the user has looked to the left, the engine 660 generates content for the headset 110 that mirrors the user's movement in a virtual local area or in a local area augmenting the local area with additional content. In addition, the engine 660 performs an action within an application executing on the console 645 in response to an action request received from the I/O interface 640 and provides feedback to the user that the action was performed. The provided feedback may be visual or audible feedback via the headset 110, or haptic feedback via the I/O interface 640.

Additional configuration information

The foregoing description of embodiments of the present disclosure has been presented for purposes of illustration; it is not intended to be exhaustive or to limit the disclosure to the precise form disclosed. One skilled in the relevant art will recognize that many modifications and variations are possible in light of the above disclosure.

Some portions of the present description describe embodiments of the disclosure in terms of algorithms and symbolic representations of operations on information. These algorithmic descriptions and representations are commonly used by those skilled in the data processing arts to convey the substance of their work effectively to others skilled in the art. These operations, while described functionally, computationally, or logically, are understood to be implemented by computer programs or equivalent electrical circuits, microcode, or the like. Moreover, it has also proven convenient at times to refer to these arrangements of operations as modules, without loss of generality. The described operations and their associated modules may be embodied in software, firmware, hardware, or any combination thereof.

Any of the steps, operations, or processes described herein may be performed or implemented using one or more hardware or software modules, alone or in combination with other devices. In one embodiment, the software modules are implemented using a computer program product comprising a computer-readable medium containing computer program code, which may be executed by a computer processor, for performing any or all of the steps, operations, or processes described.

Embodiments of the present disclosure may also relate to apparatuses for performing the operations herein. The apparatus may be specially constructed for the required purposes, and/or it may comprise a general purpose computing device selectively activated or reconfigured by a computer program stored in the computer. Such a computer program may be stored in a non-transitory, tangible computer readable storage medium, or any type of medium suitable for storing electronic instructions, which may be coupled to a computer system bus. Moreover, any computing system referred to in the specification may include a single processor, or may be an architecture that employs a multi-processor design to increase computing power.

Embodiments of the present disclosure may also relate to products produced by the computing processes described herein. Such products may include information obtained from computing processes, where the information is stored on non-transitory, tangible computer-readable storage media and may include any embodiment of a computer program product or other combination of data described herein.

Finally, the language used in the specification has been principally selected for readability and instructional purposes, and it may not have been selected to delineate or circumscribe the inventive subject matter. It is therefore intended that the scope of the disclosure be limited not by this detailed description, but rather by any claims that issue on an application based thereupon. Accordingly, the disclosure of the embodiments is intended to be illustrative, but not limiting, of the scope of the disclosure, which is set forth in the following claims.
