Dual listener position for mixed reality
Reading note: This technology, Dual Listener Position for Mixed Reality, was created by A·A·塔吉克 (A. A. Tajik) on 2019-02-15. Its main content includes the following: A method of rendering audio signals in a mixed reality environment is disclosed, the method comprising the steps of: identifying a first-ear listener position in the mixed reality environment; identifying a second-ear listener position in the mixed reality environment; identifying a first virtual sound source in the mixed reality environment; identifying a first object in the mixed reality environment; determining a first audio signal in the mixed reality environment, wherein the first audio signal originates from the first virtual sound source and intersects the first-ear listener position; determining a second audio signal in the mixed reality environment, wherein the second audio signal originates from the first virtual sound source, intersects the first object, and intersects the second-ear listener position; determining a third audio signal based on the second audio signal and the first object; presenting the first audio signal to a first ear of a user via a first speaker; and presenting the third audio signal to a second ear of the user via a second speaker.
1. A method of rendering audio signals in a mixed reality environment, the method comprising:
identifying a first-ear listener position in the mixed reality environment;
identifying a second-ear listener position in the mixed reality environment;
identifying a first virtual sound source in the mixed reality environment;
identifying a first object in the mixed reality environment;
determining a first audio signal in the mixed reality environment, wherein the first audio signal originates from the first virtual sound source and intersects the first-ear listener position;
determining a second audio signal in the mixed reality environment, wherein the second audio signal originates from the first virtual sound source, intersects the first object, and intersects the second-ear listener position;
determining a third audio signal based on the second audio signal and the first object;
presenting the first audio signal to a first ear of a user via a first speaker; and
presenting the third audio signal to a second ear of the user via a second speaker.
2. The method of claim 1, wherein determining the third audio signal from the second audio signal comprises applying a low pass filter to the second audio signal, the low pass filter having parameters based on the first object.
3. The method of claim 1, wherein determining the third audio signal from the second audio signal comprises applying an attenuation to the second audio signal, the intensity of the attenuation being based on the first object.
4. The method of claim 1, wherein identifying the first object comprises identifying a real object.
5. The method of claim 4, wherein identifying the real object comprises determining a position of the real object relative to the user in the mixed reality environment using a sensor.
6. The method of claim 5, wherein the sensor comprises a depth camera.
7. The method of claim 4, further comprising generating helper data corresponding to the real object.
8. The method of claim 4, further comprising generating a virtual object corresponding to the real object.
9. The method of claim 1, further comprising: identifying a second virtual object, wherein the first audio signal intersects the second virtual object and a fourth audio signal is determined based on the second virtual object.
10. A system, comprising:
a wearable head device, comprising:
a display for displaying a mixed reality environment to a user, the display comprising a transmissive eyepiece through which a real environment is visible;
a first speaker configured to present an audio signal to a first ear of the user; and
a second speaker configured to present audio signals to a second ear of the user; and
one or more processors configured to perform:
identifying a first-ear listener position in the mixed reality environment;
identifying a second-ear listener position in the mixed reality environment;
identifying a first virtual sound source in the mixed reality environment;
identifying a first object in the mixed reality environment;
determining a first audio signal in the mixed reality environment, wherein the first audio signal originates from the first virtual sound source and intersects the first-ear listener position;
determining a second audio signal in the mixed reality environment, wherein the second audio signal originates from the first virtual sound source, intersects the first object, and intersects the second-ear listener position;
determining a third audio signal based on the second audio signal and the first object;
presenting the first audio signal to the first ear via the first speaker; and
presenting the third audio signal to the second ear via the second speaker.
11. The system of claim 10, wherein determining the third audio signal from the second audio signal comprises applying a low pass filter to the second audio signal, the low pass filter having parameters based on the first object.
12. The system of claim 10, wherein determining the third audio signal from the second audio signal comprises applying an attenuation to the second audio signal, the intensity of the attenuation being based on the first object.
13. The system of claim 10, wherein identifying the first object comprises identifying a real object.
14. The system of claim 13, wherein the wearable head device further comprises a sensor, and wherein identifying the real object comprises using the sensor to determine a position of the real object relative to the user in the mixed reality environment.
15. The system of claim 14, wherein the sensor comprises a depth camera.
16. The system of claim 13, the one or more processors further configured to perform generating helper data corresponding to the real object.
17. The system of claim 13, the one or more processors further configured to perform generating a virtual object corresponding to the real object.
18. The system of claim 10, the one or more processors further configured to perform identifying a second virtual object, wherein the first audio signal intersects the second virtual object and a fourth audio signal is determined based on the second virtual object.
Technical Field
This application claims the benefit of U.S. provisional patent application No. 62/631,422, filed on February 15, 2018, which is incorporated herein by reference in its entirety.
The present disclosure relates generally to systems and methods for presenting audio signals, and in particular to systems and methods for presenting stereo audio signals to users of mixed reality systems.
Background
Virtual environments are ubiquitous in computing, appearing in video games (where a virtual environment may represent a game world); maps (where a virtual environment may represent terrain to be navigated); simulations (where a virtual environment may simulate a real environment); digital storytelling (where virtual characters may interact with each other in a virtual environment); and many other applications. Modern computer users are generally comfortable perceiving and interacting with virtual environments. However, the user's experience with a virtual environment may be limited by the technology used to present it. For example, conventional displays (e.g., 2D display screens) and audio systems (e.g., fixed speakers) may not be able to present a virtual environment in a manner that produces a convincing, realistic, and immersive experience.
Virtual reality ("VR"), augmented reality ("AR"), mixed reality ("MR"), and related technologies (collectively "XR") share the ability to present sensory information corresponding to a virtual environment represented by data in a computer system to a user of the XR system. The present disclosure contemplates the distinction between VR, AR, and MR systems (although some systems may be classified as VR in one aspect (e.g., visual aspects) and concurrently as AR or MR in another aspect (e.g., audio aspects)). As used herein, a VR system presents a virtual environment that, in at least one aspect, replaces a user's real environment; for example, the VR system may present a view of the virtual environment to the user while obscuring a view of his or her real environment, such as with a light-blocking head mounted display. Similarly, the VR system may present audio corresponding to the virtual environment to the user while blocking (attenuating) audio from the real environment.
VR systems may experience various disadvantages resulting from replacing the user's real environment with a virtual environment. One drawback is the sensation of motion sickness that may occur when the user's field of view in the virtual environment no longer corresponds to the state of his or her inner ear, which detects a person's balance and orientation in the real (non-virtual) environment. Similarly, users may experience disorientation in VR environments where their own bodies and limbs are not directly visible (users rely on the view of their bodies and limbs to feel grounded in a real environment). Another drawback is the computational burden (e.g., storage, processing power) placed on VR systems, which must present a full 3D virtual environment, particularly in real-time applications that attempt to immerse users in the virtual environment. Similarly, such environments may need to reach a very high standard of realism to be considered immersive, as users tend to be sensitive to even minor imperfections in the virtual environment, any of which may undermine the user's sense of immersion. Further, VR systems cannot take advantage of the wide range of sensory data available in the real environment, such as the various sights and sounds people experience in the real world. A related drawback is that VR systems may struggle to create shared environments in which multiple users can interact, because users who share a physical space in the real environment may not be able to directly see or interact with each other in a virtual environment.
As used herein, in at least one aspect, an AR system presents a virtual environment that overlaps or overlays a real environment. For example, the AR system may present a view of the virtual environment to the user overlaid on a view of the user's real environment, such as with a transmissive head mounted display that presents the displayed image while allowing light to pass through the display into the user's eyes. Similarly, the AR system may present audio corresponding to the virtual environment to the user while mixing in audio from the real environment. Similarly, as used herein, like an AR system, an MR system, in at least one aspect, presents a virtual environment that overlaps or overlays a real environment, and may additionally, in at least one aspect, allow the virtual environment in the MR system to interact with the real environment. For example, a virtual character in the virtual environment may switch a light switch in the real environment such that a corresponding light bulb in the real environment is turned on or off. As another example, the virtual character may react to audio signals in the real environment (such as with facial expressions). By maintaining a representation of the real environment, AR and MR systems can avoid some of the aforementioned drawbacks of VR systems; for example, motion sickness of a user is reduced because visual cues from the real environment (including the user's own body) can remain visible, and such a system does not need to present the user with a fully realized 3D environment in order to be immersive. Further, AR and MR systems may utilize real world sensory input (e.g., views and sounds of scenes, objects, and other users) to create new applications that enhance the input.
XR systems can provide users with various ways to interact with virtual environments; for example, the XR system may include various sensors (e.g., cameras, microphones, etc.) for detecting the user's position and orientation, facial expressions, speech, and other characteristics; and present the information as input to the virtual environment. Some XR systems may include sensor-equipped input devices, such as virtual "mallets," and may be configured to detect the position, orientation, or other characteristics of the input device.
XR systems can offer uniquely enhanced immersion and realism by combining virtual visual and audio cues with real sights and sounds. For example, it may be desirable to present audio cues to a user of an XR system in a manner that simulates aspects, particularly subtle aspects, of our own auditory experience. The present disclosure relates to presenting stereo audio signals, originating from a single sound source in a mixed reality environment, to a user, enabling the user to identify the position and orientation of the sound source in the mixed reality environment based on differences between the signals received by the user's left and right ears. By identifying the location and orientation of a sound source in a mixed reality environment using audio cues, a user may experience heightened awareness of virtual sounds originating from that location and orientation. Furthermore, the user's immersion in the mixed reality environment can be enhanced by presenting stereo audio that corresponds not only to the direct audio signal, but also to a fully immersive soundscape generated using a 3D propagation model.
Disclosure of Invention
Examples of the present disclosure describe systems and methods for rendering audio signals in a mixed reality environment. In one example, a method comprises the steps of: identifying a first-ear listener position in a mixed reality environment; identifying a second-ear listener position in the mixed reality environment; identifying a first virtual sound source in the mixed reality environment; identifying a first object in the mixed reality environment; determining a first audio signal in the mixed reality environment, wherein the first audio signal originates from the first virtual sound source and intersects the first-ear listener position; determining a second audio signal in the mixed reality environment, wherein the second audio signal originates from the first virtual sound source, intersects the first object, and intersects the second-ear listener position; determining a third audio signal based on the second audio signal and the first object; presenting the first audio signal to a first ear of a user via a first speaker; and presenting the third audio signal to a second ear of the user via a second speaker.
Drawings
FIGS. 1A-1C illustrate an example mixed reality environment.
FIGS. 2A-2D illustrate components of an example mixed reality system that may be used to interact with a mixed reality environment.
FIG. 3A illustrates an example mixed reality handheld controller that can be used to provide input to a mixed reality environment.
FIG. 3B illustrates an example auxiliary unit that may be included in an example mixed reality system.
FIG. 4 illustrates an example functional block diagram for an example mixed reality system.
FIGS. 5A-5B illustrate an example mixed reality environment including a user, a virtual sound source, and an audio signal originating from the virtual sound source.
FIG. 6 illustrates an example flow diagram for presenting a stereo audio signal to a user of a mixed reality environment.
FIG. 7 illustrates an example functional block diagram of an example augmented reality processing system.
Detailed Description
In the following description of the examples, reference is made to the accompanying drawings, which form a part hereof, and in which is shown by way of illustration specific examples that may be practiced. It is to be understood that other examples may be used and structural changes may be made without departing from the scope of the disclosed examples.
Mixed reality environment
Like all people, users of mixed reality systems exist in a real environment, that is, a three-dimensional portion of the "real world," and all of its content, that can be perceived by the user. For example, a user perceives the real environment using one's ordinary human senses (sight, sound, touch, taste, smell) and interacts with the real environment by moving one's own body within it. A location in a real environment may be described as a coordinate in a coordinate space; for example, the coordinates may include latitude, longitude, and altitude relative to sea level; distances from a reference point in three orthogonal dimensions; or other suitable values. Likewise, a vector may describe a quantity having a direction and a magnitude in the coordinate space.
A computing device may maintain a representation of a virtual environment, for example, in memory associated with the device. As used herein, a virtual environment is a computational representation of a three-dimensional space. A virtual environment may include representations of any object, action, signal, parameter, coordinate, vector, or other characteristic associated with that space. In some examples, circuitry (e.g., a processor) of a computing device may maintain and update the state of a virtual environment; that is, the processor may determine the state of the virtual environment at a second time t1 based on data associated with the virtual environment and/or input provided by the user at a first time t0. For example, if an object in the virtual environment is located at a first coordinate at time t0 and has certain programmed physical parameters (e.g., mass, coefficient of friction), and input received from the user indicates that a force should be applied to the object along a direction vector, the processor may apply laws of kinematics to determine the location of the object at time t1 using basic mechanics. The processor may use any suitable information known about the virtual environment, and/or any suitable input, to determine the state of the virtual environment at time t1. While maintaining and updating the state of the virtual environment, the processor may execute any suitable software, including software related to the creation and deletion of virtual objects in the virtual environment; software (e.g., scripts) for defining the behavior of virtual objects or characters in the virtual environment; software for defining the behavior of signals (e.g., audio signals) in the virtual environment; software for creating and updating parameters associated with the virtual environment; software for generating audio signals in the virtual environment; software for processing inputs and outputs; software for implementing network operations; software for applying asset data (e.g., animation data for moving virtual objects over time); or many other possibilities.
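By way of illustration, the following is a minimal sketch of such a state update from time t0 to time t1. The class and function names (VirtualObject, step) and the forward-Euler integration are illustrative assumptions, not part of the disclosure; a real engine would use its own physics and scripting hooks.

```python
# Minimal sketch of updating a virtual object's state from time t0 to t1.
# VirtualObject, step(), and forward-Euler integration are illustrative
# assumptions; a real engine would use its own physics and scripting hooks.
from dataclasses import dataclass

@dataclass
class VirtualObject:
    position: list      # coordinates in the virtual coordinate space
    velocity: list
    mass: float

def step(obj: VirtualObject, applied_force: list, dt: float) -> None:
    """Advance the object's state by dt seconds under an applied force."""
    # Newton's second law: acceleration = force / mass
    accel = [f / obj.mass for f in applied_force]
    # Integrate velocity and position (forward Euler)
    obj.velocity = [v + a * dt for v, a in zip(obj.velocity, accel)]
    obj.position = [p + v * dt for p, v in zip(obj.position, obj.velocity)]

# State at t0; user input supplies a force; the state at t1 follows from kinematics.
box = VirtualObject(position=[0.0, 0.0, 0.0], velocity=[0.0, 0.0, 0.0], mass=2.0)
step(box, applied_force=[1.0, 0.0, 0.0], dt=1.0 / 60.0)
print(box.position)
```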
An output device (such as a display or speaker) may present any or all aspects of the virtual environment to a user. For example, the virtual environment may include virtual objects (which may include representations of inanimate objects; humans; animals; lights; etc.) that may be presented to the user. The processor may determine a view of the virtual environment (e.g., corresponding to a "camera" having a coordinate origin, a view axis, and a viewing cone); and rendering a visual scene of the virtual environment corresponding to the view to the display. Any suitable rendering technique may be used for this purpose. In some examples, the visual scene may include only some virtual objects in the virtual environment and not some other virtual objects. Similarly, the virtual environment may include audio aspects that may be presented to the user as one or more audio signals. For example, a virtual object in the virtual environment may generate sound that originates from the position coordinates of the object (e.g., a virtual character may speak or cause a sound effect); or the virtual environment may be associated with music cues or environmental sounds, which may or may not be associated with a particular location. The processor may determine audio signals corresponding to the "listener" coordinates-e.g., corresponding to a composite of sounds in the virtual environment and mixed and processed to simulate audio signals to be heard by the listener at the listener coordinates-and present the audio signals to the user via the one or more speakers.
Because a virtual environment exists only as a computational structure, a user cannot directly perceive it using one's ordinary senses. Rather, a user can perceive a virtual environment only indirectly, as it is presented to the user, for example, through a display, speakers, haptic output devices, and so forth. Similarly, a user cannot directly touch, manipulate, or otherwise interact with a virtual environment; but input data can be provided, via input devices or sensors, to a processor that can use the device or sensor data to update the virtual environment. For example, a camera sensor may provide optical data indicating that a user is attempting to move an object in the virtual environment, and the processor may use this data to cause the object to respond accordingly in the virtual environment.
The mixed reality system may present a mixed reality environment ("MRE") that combines aspects of the real environment and the virtual environment to the user, e.g., using a transmissive display and/or one or more speakers (which may be, e.g., included in a wearable head device). In some embodiments, the one or more speakers may be external to the wearable head device. As used herein, an MRE is a simultaneous representation of a real environment and a corresponding virtual environment. In some examples, the corresponding real environment and virtual environment share a single coordinate space; in some examples, the real coordinate space and the corresponding virtual coordinate space are related to each other by a transformation matrix (or other suitable representation). Thus, a single coordinate (along with a transformation matrix, in some examples) may define a first location in the real environment, and a second corresponding location in the virtual environment; and vice versa.
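The following sketch illustrates how a single coordinate can define corresponding locations in the real and virtual coordinate spaces via a transformation matrix, as described above. The specific matrix values (a rotation plus an offset) are illustrative assumptions.

```python
# Sketch of relating a real coordinate space to a virtual coordinate space via a
# 4x4 homogeneous transformation matrix. The specific matrix values are
# illustrative assumptions.
import numpy as np

# Example transform: the virtual space is rotated 90 degrees about the vertical
# axis and offset 2 meters along x relative to the real space.
theta = np.pi / 2
real_to_virtual = np.array([
    [np.cos(theta),  0.0, np.sin(theta), 2.0],
    [0.0,            1.0, 0.0,           0.0],
    [-np.sin(theta), 0.0, np.cos(theta), 0.0],
    [0.0,            0.0, 0.0,           1.0],
])

def to_virtual(p_real):
    """Map a point from real-environment coordinates to virtual coordinates."""
    p = np.append(np.asarray(p_real, dtype=float), 1.0)   # homogeneous coordinates
    return (real_to_virtual @ p)[:3]

def to_real(p_virtual):
    """Inverse mapping: a single coordinate defines a location in both spaces."""
    p = np.append(np.asarray(p_virtual, dtype=float), 1.0)
    return (np.linalg.inv(real_to_virtual) @ p)[:3]

lamp_post_real = [1.0, 0.0, 3.0]
print(to_virtual(lamp_post_real))            # corresponding virtual location
print(to_real(to_virtual(lamp_post_real)))   # round-trips to the real location
```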
In an MRE, a virtual object (e.g., in a virtual environment associated with the MRE) may correspond to a real object (e.g., in a real environment associated with the MRE). For example, if the real environment of the MRE includes a real light pole (real object) at the location coordinates, the virtual environment of the MRE may include a virtual light pole (virtual object) at the corresponding location coordinates. As used herein, a real object in combination with its corresponding virtual object together constitute a "mixed reality object". There is no need for the virtual object to perfectly match or align with the corresponding real object. In some examples, the virtual object may be a simplified version of the corresponding real object. For example, if the real environment includes a real light pole, the corresponding virtual object may include a cylinder having roughly the same height and radius as the real light pole (reflecting that the light pole may be roughly cylindrical in shape). Simplifying virtual objects in this manner may allow for computational efficiency and may simplify the computations to be performed on such virtual objects. Further, in some examples of MREs, not all real objects in the real environment may be associated with corresponding virtual objects. Likewise, in some examples of MREs, not all virtual objects in the virtual environment may be associated with corresponding real objects. That is, some virtual objects may only be in the virtual environment of the MRE without any real-world counterparts.
In some examples, a virtual object may have characteristics that differ, sometimes drastically, from those of the corresponding real object. For example, while the real environment in an MRE may include a green, two-armed cactus (a prickly, inanimate object), the corresponding virtual object in the MRE may have the characteristics of a green, two-armed virtual character with facial features and a gruff demeanor. In this example, the virtual object resembles its corresponding real object in certain characteristics (color, number of arms) but differs from the real object in other characteristics (facial features, personality). In this way, virtual objects have the potential to represent real objects in a creative, abstract, exaggerated, or fanciful manner; or to impart behaviors (e.g., human personalities) to otherwise inanimate real objects. In some examples, a virtual object may be a purely fanciful creation with no real-world counterpart (e.g., a virtual monster in the virtual environment, perhaps at a location corresponding to empty space in the real environment).
Compared to VR systems, which present a virtual environment to a user while obscuring the real environment, a mixed reality system presenting an MRE offers the advantage that the real environment remains perceptible while the virtual environment is presented. Thus, a user of the mixed reality system is able to experience and interact with the corresponding virtual environment using visual and audio cues associated with the real environment. As an example, while a user of a VR system may struggle to perceive or interact with a virtual object displayed in a virtual environment (because, as described above, the user cannot directly perceive or interact with the virtual environment), a user of an MR system may find it intuitive and natural to interact with a virtual object by seeing, hearing, and touching the corresponding real object in his or her own real environment. This level of interactivity may heighten the user's sense of immersion, connection, and engagement with the virtual environment. Similarly, by presenting the real environment and the virtual environment simultaneously, mixed reality systems can reduce the negative psychological sensations (e.g., cognitive dissonance) and negative physical sensations (e.g., motion sickness) associated with VR systems. Mixed reality systems further offer many possibilities for applications that can augment or alter our experience of the real world.
FIG. 1A illustrates an example
FIG. 1B illustrates an example
With respect to fig. 1A and 1B, the environment/world coordinate
FIG. 1C illustrates an example MRE150 that simultaneously presents aspects of the
In the illustrated example, the mixed reality object includes corresponding real and virtual object pairs (i.e., 122A/122B, 124A/124B, 126A/126B) that occupy corresponding locations in the coordinate
In some examples, real objects (e.g., 122A, 124A, 126A) may be associated with virtual content or auxiliary data that may not necessarily constitute virtual objects. The virtual content or the auxiliary data may facilitate processing or handling of the virtual object in the mixed reality environment. For example, such virtual content may include a two-dimensional representation of: a corresponding real object; a custom asset type associated with the corresponding real object; or statistics associated with the corresponding real object. This information may enable or facilitate calculations involving real objects without incurring unnecessary computational overhead.
In some examples, the above-described presentation may also include an audio aspect. For example, in MRE150,
Example Mixed reality System
An example
Fig. 2A-2D illustrate components of an example mixed reality system 200 (which may correspond to the mixed reality system 112) that may be used to present an MRE (which may correspond to the MRE 150) or other virtual environment to a user. Fig. 2A shows a perspective view of a
In some examples, the
In the example shown in fig. 2A-2D, left image-forming modulated
In some examples, as shown in fig. 2D, each of the
In some examples, to create a perception that the displayed content is three-dimensional, stereoadjusted left and right eye imagery may be presented to the user through
Fig. 2D shows an edge-facing view from the top of the
Fig. 3A illustrates an example hand-held
Fig. 3B illustrates an example
In some examples, the
Fig. 4 illustrates an example functional block diagram that may correspond to an example mixed reality system, such as the
In some examples, it may become desirable to transform coordinates from a local coordinate space (e.g., a coordinate space fixed relative to wearable head device 400A) to an inertial coordinate space (e.g., a coordinate space fixed relative to the real environment), for example, in order to compensate for motion of wearable head device 400A relative to coordinate
In some examples, the depth camera 444 may supply 3D imagery to a gesture tracker 411, which may be implemented in a processor of the wearable head device 400A. The gesture tracker 411 may recognize a user's gestures, for example, by matching 3D imagery received from the depth camera 444 to stored patterns representing those gestures. Other suitable techniques for recognizing the user's gestures will be apparent.
In some examples, the one or more processors 416 may be configured to receive data from the 6DOF wearable head device subsystem 404B, the IMU 409, the SLAM/visual odometry block 406, the depth camera 444, and/or the gesture tracker 411 of the wearable head device. The processor 416 may also send and receive control signals for the 6DOF totem system 404A. The processor 416 may be coupled wirelessly to the 6DOF totem system 404A, such as in examples in which the handheld controller 400B is untethered. The processor 416 may also communicate with additional components, such as an audio-visual content memory 418, a Graphics Processing Unit (GPU) 420, and/or a Digital Signal Processor (DSP) audio spatializer 422. The DSP audio spatializer 422 may be coupled to a Head Related Transfer Function (HRTF) memory 425. The GPU 420 may include a left channel output coupled to the left imagewise modulated light source 424 and a right channel output coupled to the right imagewise modulated light source 426. The GPU 420 may output stereoscopic image data to the imagewise modulated light sources 424, 426, for example as described above with respect to FIGS. 2A-2D. The DSP audio spatializer 422 may output audio to the left speaker 412 and/or the right speaker 414. The DSP audio spatializer 422 may receive input from the processor 416 indicating a direction vector from the user to a virtual sound source (which may be moved by the user, e.g., via the handheld controller 320). Based on the direction vector, the DSP audio spatializer 422 may determine a corresponding HRTF (e.g., by accessing the HRTF, or by interpolating multiple HRTFs). The DSP audio spatializer 422 may then apply the determined HRTF to an audio signal, such as an audio signal corresponding to a virtual sound generated by a virtual object. This can improve the believability and realism of the virtual sound by incorporating the relative position and orientation of the user with respect to the virtual sound in the mixed reality environment, that is, by presenting a virtual sound that matches the user's expectation of what that virtual sound would sound like if it were a real sound in a real environment.
In some examples, such as shown in fig. 4, one or more of the processor 416, the GPU 420, the DSP audio spatializer 422, the HRTF memory 425, and the audio/visual content memory 418 may be included in an auxiliary unit 400C (which may correspond to
While fig. 4 presents elements corresponding to various components of an example mixed reality system, various other suitable arrangements of these components will be apparent to those skilled in the art. For example, elements presented in fig. 4 as being associated with the auxiliary unit 400C could instead be associated with the wearable head device 400A or the handheld controller 400B. Furthermore, some mixed reality systems may forego the handheld controller 400B or the auxiliary unit 400C altogether. Such variations and modifications are to be understood as being included within the scope of the disclosed examples.
Virtual sound source
As described above, an MRE (such as experienced via a mixed reality system (e.g., the
The sound source may correspond to a real object and/or a virtual object. For example, a virtual object (e.g.,
In some virtual or mixed reality environments, when users are presented with audio signals such as those described above, they may have difficulty quickly and accurately identifying the source of the audio signal in the virtual environment, even though identifying audio sources in the real environment is an intuitive, natural ability. It is therefore desirable to improve the user's ability to perceive the position or orientation of a sound source in an MRE, so that the user's experience in a virtual or mixed reality environment more closely resembles experience in the real world.
Similarly, some virtual or mixed reality environments suffer from the perception that they do not feel authentic or believable. One reason for this perception is that audio and visual cues do not always match each other in the virtual environment. For example, if a user is located behind a large brick wall in an MRE, the user may expect sound coming from behind the brick wall to be quieter and more muffled than sound originating from beside the user. This expectation is based on our own auditory experience in the real world, where sound becomes quiet and muffled when blocked by a large, dense object. When the user is presented with an audio signal purportedly originating from behind the brick wall, but presented prominently and at full volume, the illusion that the user is behind the brick wall (or that the sound source is behind it) is compromised. The entire virtual experience may feel fake and inauthentic, in part because it does not match our expectations based on real-world interactions. Further, in some cases an "uncanny valley" problem arises, in which even subtle differences between virtual and real experiences can cause a feeling of discomfort. It is desirable to improve the user's experience by presenting audio signals in the MRE that appear to realistically interact, even in subtle ways, with objects in the user's environment. The more consistent such audio signals are with our expectations, based on our real-world experience, the more immersive and engaging the user's MRE experience will be.
One way the human brain detects the position and orientation of a sound source is by interpreting differences between the sounds received by the left and right ears. For example, if an audio signal in the real environment reaches the user's left ear before it reaches the right ear (which the human auditory system may determine by, for example, identifying a time delay or phase shift between the left-ear and right-ear signals), the brain may discern that the source of the audio signal is to the left of the user. Similarly, because the effective power of an audio signal generally decreases with distance, and because the signal can be blocked by the user's own head, the brain may discern that the source is to the user's left if the audio signal appears louder to the left ear than to the right ear. Likewise, differences in frequency characteristics between the left-ear and right-ear signals may indicate to the brain the location of the source, or the direction in which the audio signal is traveling.
The above techniques, performed subconsciously by humans, operate by processing stereo audio signals, in particular, by analyzing the differences (e.g., in amplitude, phase, and frequency characteristics), if any, between the respective audio signals generated by a single sound source and received at the left and right ears. As humans, we naturally rely on these stereo hearing techniques to quickly and accurately identify where in our real environment a sound comes from, and in what direction it is traveling. We also rely on such stereo techniques to better understand the world around us, for example, whether a sound source is on the other side of a nearby wall and, if so, how thick the wall is and what material it is made of.
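The binaural cues described above can be illustrated with a small sketch that computes an interaural time difference and an interaural level difference for a source heard by two ears. The simple free-field model (no head shadowing beyond 1/r attenuation) and the numeric values are illustrative assumptions.

```python
# Sketch of binaural localization cues: interaural time difference (ITD) and
# interaural level difference (ILD) for a source heard by two ears. The simple
# free-field model (no head shadowing beyond 1/r attenuation) is an
# illustrative assumption.
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s at room temperature

def interaural_cues(source, left_ear, right_ear):
    """Return (itd_seconds, ild_db); positive ITD means the sound reaches the
    left ear first, positive ILD means it is louder at the left ear."""
    d_left = np.linalg.norm(np.asarray(source) - np.asarray(left_ear))
    d_right = np.linalg.norm(np.asarray(source) - np.asarray(right_ear))
    itd = (d_right - d_left) / SPEED_OF_SOUND
    # Amplitude falls off roughly as 1/distance, so the level difference in dB
    # follows from the ratio of the two path lengths.
    ild = 20.0 * np.log10(d_right / d_left)
    return itd, ild

# Source one meter to the listener's left; ears 0.18 m apart.
itd, ild = interaural_cues(source=[-1.0, 0.0, 0.0],
                           left_ear=[-0.09, 0.0, 0.0],
                           right_ear=[0.09, 0.0, 0.0])
print(f"ITD = {itd*1e6:.0f} us, ILD = {ild:.2f} dB")  # both favor the left ear
```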
It would be desirable for an MRE to convincingly place virtual sound sources in such a way that users can quickly locate them using the same natural stereo techniques our brains use in the real world. Likewise, it may be desirable to use these same techniques to enhance the perception that such virtual sound sources coexist with real and virtual content in the MRE, for example, by presenting stereo audio signals corresponding to those sound sources the way they would be presented in the real world. By presenting the user of an MRE with an audio experience that evokes the audio experience of everyday life, the MRE can enhance the user's sense of immersion and connection when engaging with the MRE.
Fig. 5A and 5B depict a perspective view and a top view, respectively, of an example mixed reality environment 500 (which may correspond to the
The example MRE500 includes a
In some examples, the
In some examples, the
In one example of the MRE500, the
The MRE may also include a representation of one or more listener coordinates, each corresponding to a location (a "listener") in the coordinate system at which a virtual audio signal may be perceived. In some examples, the MRE may also include a representation of one or more listener vectors, representing the orientation of a listener (e.g., for use in determining audio signals that may be affected by the direction in which the listener is facing). In an MRE, the listener coordinates may correspond to the actual locations of the user's ears, which may be determined using SLAM, visual odometry, and/or with the aid of an IMU (e.g., the IMU 409 described above with respect to fig. 4). In some examples, the MRE may include left and right listener coordinates in the coordinate system of the MRE, corresponding to the locations of the user's left and right ears, respectively. By determining a vector of a virtual audio signal from a virtual sound source to a listener coordinate, a real audio signal can be determined that corresponds to how a human listener with an ear at that coordinate would perceive the virtual audio signal.
In some examples, a virtual audio signal includes base sound data (e.g., a computer file representing an audio waveform) and one or more parameters that may be applied to the base sound data. Such parameters may correspond to attenuation (e.g., volume reduction) of the base sound; filtering of the base sound (e.g., a low-pass filter); a time delay (e.g., phase shift) of the base sound; reverberation parameters for applying artificial reverberation and echo effects; voltage-controlled oscillator (VCO) parameters for applying time-based modulation effects; pitch modulation of the base sound (e.g., to simulate the Doppler effect); or other suitable parameters. In some examples, these parameters may be functions of the relationship between the listener coordinate and the virtual audio source. For example, a parameter may define the attenuation of the real audio signal as a decreasing function of the distance from the listener coordinate to the location of the virtual audio source; that is, the gain of the audio signal decreases as the distance from the listener to the virtual audio source increases. As another example, a parameter may define a low-pass filter applied to the virtual audio signal as a function of the distance of the listener coordinate (and/or the angle of the listener vector) from the propagation vector of the virtual audio signal; for example, a listener farther from the virtual audio signal may perceive less high-frequency power in the signal than a listener closer to the signal would perceive. As another example, a parameter may define a time delay (e.g., phase shift) to be applied based on the distance between the listener coordinate and the virtual audio source. In some examples, the processing of the virtual audio signal may be computed using the DSP audio spatializer 422 of fig. 4, which may render the audio signal based on the position and orientation of the user's head using HRTFs.
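A minimal sketch of deriving such distance-dependent parameters for one listener coordinate is shown below: a gain that decreases with distance, a low-pass cutoff that falls with distance, and a time delay given by the path length. The function name (propagation_params), the functional forms, and the constants are illustrative assumptions.

```python
# Sketch of deriving per-ear rendering parameters from the relationship between
# a listener coordinate and a virtual sound source: inverse-distance gain, a
# distance-dependent low-pass cutoff, and a propagation delay. The specific
# functional forms and constants are illustrative assumptions.
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s
SAMPLE_RATE = 48000     # Hz

def propagation_params(source, listener, ref_distance=1.0):
    """Return (gain, lowpass_cutoff_hz, delay_samples) for one ear."""
    distance = max(np.linalg.norm(np.asarray(source) - np.asarray(listener)),
                   ref_distance)
    gain = ref_distance / distance                 # inverse-distance attenuation
    cutoff = 20000.0 / distance                    # farther -> less high-frequency power
    delay_samples = int(round(distance / SPEED_OF_SOUND * SAMPLE_RATE))
    return gain, cutoff, delay_samples

# Left and right listener coordinates yield slightly different parameters,
# which is what produces the stereo cues discussed in this section.
source = [2.0, 0.0, 1.0]
print(propagation_params(source, listener=[-0.09, 0.0, 0.0]))  # left ear
print(propagation_params(source, listener=[0.09, 0.0, 0.0]))   # right ear
```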
The virtual audio signal parameters may be influenced by virtual or real objects (sound occluders) that the virtual audio signal passes through on its way to the listener coordinate. (As used herein, a virtual or real object includes any suitable representation of a virtual or real object in an MRE.) For example, if a virtual audio signal intersects (e.g., is blocked by) a virtual wall in an MRE, the MRE may apply an attenuation to the virtual audio signal (causing the signal to appear quieter to the listener). The MRE may also apply a low-pass filter to the virtual audio signal, causing the signal to appear more muffled as its high-frequency content is reduced. These effects are consistent with our expectations of hearing sounds from behind a wall: the nature of a wall in a real environment is to make sound from the other side quieter, with less high-frequency content, because the wall blocks sound waves originating on the opposite side from reaching the listener. Applying such parameters to the audio signal may be based on characteristics of the virtual wall: for example, a virtual wall that is thicker, or that corresponds to a denser material, may result in a greater degree of attenuation or low-pass filtering than a virtual wall that is thinner or corresponds to a less dense material. In some cases, the virtual object may apply a phase shift or an additive effect to the virtual audio signal. The effect that a virtual object has on the virtual audio signal can be determined by physical modeling of the virtual object; for example, if the virtual object corresponds to a particular material (e.g., brick, aluminum, water), the effect can be applied based on the known transmission characteristics of audio signals through that material in the real world.
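The following sketch illustrates occluder-dependent processing of this kind: denser or thicker occluders apply more attenuation and a lower low-pass cutoff. The material table, the one-pole filter, and the parameter formulas are illustrative assumptions, not values from the disclosure.

```python
# Sketch of modifying a virtual audio signal that passes through an occluding
# object: denser or thicker occluders apply more attenuation and a lower
# low-pass cutoff. The material table and formulas are illustrative assumptions.
import numpy as np

SAMPLE_RATE = 48000  # Hz

# (attenuation in dB per meter of thickness, low-pass cutoff scale factor)
MATERIALS = {
    "brick":    (40.0, 0.05),
    "aluminum": (25.0, 0.15),
    "water":    (10.0, 0.40),
}

def occlude(signal, material, thickness_m):
    """Apply occluder-dependent attenuation and low-pass filtering."""
    atten_db_per_m, cutoff_scale = MATERIALS[material]
    gain = 10.0 ** (-(atten_db_per_m * thickness_m) / 20.0)
    cutoff_hz = 20000.0 * cutoff_scale / max(thickness_m, 0.01)
    # One-pole low-pass filter: y[n] = y[n-1] + a * (x[n] - y[n-1])
    a = 1.0 - np.exp(-2.0 * np.pi * min(cutoff_hz, SAMPLE_RATE / 2) / SAMPLE_RATE)
    out = np.empty_like(signal)
    y = 0.0
    for n, x in enumerate(signal * gain):
        y += a * (x - y)
        out[n] = y
    return out

# A brick wall 0.2 m thick muffles and quiets a 1 kHz tone far more than water.
t = np.arange(SAMPLE_RATE // 10) / SAMPLE_RATE
tone = np.sin(2 * np.pi * 1000 * t)
print(np.max(np.abs(occlude(tone, "brick", 0.2))),
      np.max(np.abs(occlude(tone, "water", 0.2))))
```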
In some examples, the virtual objects that intersect the virtual audio signal may correspond to real objects (e.g., such as
In some examples, a virtual audio signal may intersect a real object that has no corresponding virtual object. For example, characteristics (e.g., position, orientation, size, material) of the real object may be determined by sensors (such as those attached to the wearable head device 510), and those characteristics may be used to process the virtual audio signal, as described above with respect to virtual object occluders.
Stereo effect
As described above, by determining the vector of a virtual audio signal from a virtual sound source to a listener coordinate, it is possible to determine a real audio signal corresponding to how a human listener with an ear at that listener coordinate would perceive the virtual audio signal. In some examples, left and right stereo listener coordinates (corresponding to the left and right ears) may be used instead of a single listener coordinate, which allows the effect of a real object on the audio signal, for example, attenuation or filtering based on the interaction of the audio signal with the real object, to be determined separately for each ear. This may enhance the realism of the virtual environment by mimicking the real-world stereo audio experience, in which receiving different audio signals at each ear helps us understand the sounds in our surroundings. Such effects, in which the left and right ears experience differently affected audio signals, may be particularly pronounced when real objects are very close to the user. For example, if the
A desired stereo auditory effect may be simulated by determining two such vectors-one for each ear-and identifying a unique virtual audio signal for each ear, such as described above. Each of these two unique virtual audio signals may then be converted to a real audio signal and presented to the respective ear via the speaker associated with that ear. The user's brain will process those real audio signals in the same way that it would process normal stereo audio signals in the real world, as described above.
This is illustrated by the example MRE500 in fig. 5A and 5B. The MRE500 includes a
In examples where the
In some examples where
The wall 540 (whether real or virtual) may be considered an acoustic obstruction, as described above. As seen in the top view shown in fig. 5B, two vectors 532 and 534 may represent respective paths of a
The relative importance of these stereo differences may depend on the difference in the frequency spectrum of the signal in question. For example, the phase shift may be more useful for locating high frequency signals than for locating low frequency signals (i.e., signals having a wavelength of about the width of the listener's head). With such low frequency signals, the difference in arrival time between the left and right ear may be more useful for locating these signal sources.
In some examples not shown in fig. 5A-5B, objects (whether real or virtual) such as
An advantage of the MRE 500 over some environments, such as video games presented via conventional display monitors and room speakers, is that the actual positions of the user's ears in the MRE 500 can be determined. As described above with respect to fig. 4, the
By presenting unique and separately determined left and right audio signals via
Asymmetric occlusion effects such as those described above may be particularly significant in the following cases: wherein a real or virtual object (such as wall 540) is physically close to the user's face; or where a real or virtual object occludes one ear, but not the other (such as when the center of the user's face is aligned with the edge of
In some examples, each of the left and right audio signals may not be determined independently, but may be based on another or common audio source. For example, where a single audio source generates both a left audio signal and a right audio signal, the left and right audio signals may be considered to be not completely independent, but rather acoustically related to each other via the single audio source.
Fig. 6 shows an example process 600 for presenting left and right audio signals to a user of an MRE, such as
At stage 605 of process 600, respective locations (e.g., listener coordinates and/or vectors) of a first ear (e.g., the user's left ear 502) and a second ear (e.g., the user's right ear 504) are determined. These locations may be determined using sensors of the
At stage 610, a first virtual sound source may be defined that may correspond to virtual
At stage 620A, a first virtual audio signal may be identified, which may correspond to
At stage 630A, real or virtual objects (one of which may, for example, correspond to the wall 540) intersected by the first virtual audio signal are identified. For example, a trace may be computed along a vector from the first sound source to the first-ear listener coordinate in the MRE 500, and any real or virtual object that intersects the trace may be identified (in some examples, along with parameters of the intersection, such as the location at which, and the vector along which, the signal intersects the real or virtual object). In some cases, there may be no such real or virtual object. Similarly, at stage 630B, real or virtual objects intersected by the second virtual audio signal are identified. Again, in some cases there may be no such real or virtual object.
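The per-ear trace of stages 630A/630B can be sketched as follows, with objects modeled as 2D line segments in a top-down view (as in fig. 5B). The geometry helper, scene contents, and coordinates are illustrative assumptions.

```python
# Sketch of stages 630A/630B: for each ear, trace the path from the virtual
# sound source to that ear's listener coordinate and identify which objects it
# intersects. Objects are modeled as 2D segments in a top-down view; the helper
# and scene contents are illustrative assumptions.
def segments_intersect(p1, p2, q1, q2):
    """True if segment p1-p2 crosses segment q1-q2 (2D orientation test)."""
    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])
    d1, d2 = cross(q1, q2, p1), cross(q1, q2, p2)
    d3, d4 = cross(p1, p2, q1), cross(p1, p2, q2)
    return (d1 * d2 < 0) and (d3 * d4 < 0)

def occluders_on_path(source, ear, objects):
    """Return the objects whose segment blocks the source->ear path."""
    return [name for name, (a, b) in objects.items()
            if segments_intersect(source, ear, a, b)]

# Source ahead of the user and slightly to the right; a wall covers the right
# side, so it blocks the path to the right ear but not the path to the left ear.
objects = {"wall_540": ((0.25, 1.0), (2.0, 1.0))}
source = (0.5, 2.0)
left_ear, right_ear = (-0.09, 0.0), (0.09, 0.0)
print(occluders_on_path(source, left_ear, objects))   # []
print(occluders_on_path(source, right_ear, objects))  # ['wall_540']
```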
In some examples, the real object identified at stage 630A or stage 630B may be identified using a depth camera or other sensor associated with the
At stage 640A, each real or virtual object identified at stage 630A is processed to identify, at stage 650A, any signal modification parameters associated with that real or virtual object. For example, as described above, such signal modification parameters may include functions for determining attenuation, filtering, phase shift, time-based effects (e.g., delay, reverberation, modulation), and/or other effects to be applied to the first virtual audio signal. As described above, these parameters may depend on other parameters associated with the real or virtual object, such as its size, shape, or material. At stage 660A, those signal modification parameters are applied to the first virtual audio signal. For example, if the signal modification parameters specify that the first virtual audio signal should be attenuated by a coefficient that increases linearly with the distance between the listener coordinate and the audio source, then that coefficient may be calculated at stage 660A (i.e., by calculating the distance between the first ear and the first virtual sound source in the MRE 500) and applied to the first virtual audio signal (i.e., by multiplying the amplitude of the signal by the resulting gain factor). In some examples, the signal modification parameters may be determined or applied using the DSP audio spatializer 422 of fig. 4, which may use HRTFs to modify the audio signal based on the position and orientation of the user's head, as described above. Once all of the real or virtual objects identified at stage 630A have been applied at stage 660A, the processed first virtual audio signal (e.g., reflecting the signal modification parameters of all identified real or virtual objects) is output by stage 640A. Similarly, at stage 640B, each real or virtual object identified at stage 630B is processed to identify signal modification parameters (stage 650B), and those signal modification parameters are applied to the second virtual audio signal (stage 660B). Once all real or virtual objects identified at stage 630B have been applied at stage 660B, the processed second virtual audio signal (e.g., reflecting the signal modification parameters of all identified real or virtual objects) is output by stage 640B.
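The per-ear loop of stages 640A/B through 660A/B can be sketched as follows: for each ear, the identified objects contribute signal modification parameters, which are then applied in turn to that ear's virtual audio signal. The gain formula, the lambda-based modifiers, and the data structures are illustrative assumptions.

```python
# Sketch of the per-ear processing of stages 640A/B-660A/B: derive parameters
# for one ear, then apply each identified modification to that ear's signal.
# The gain formula and data structures are illustrative assumptions.
import numpy as np

def distance_gain(source, listener, k=0.25):
    """Attenuation coefficient that grows linearly with source-listener distance."""
    d = np.linalg.norm(np.asarray(source) - np.asarray(listener))
    return 1.0 / (1.0 + k * d)

def process_ear(base_signal, source, listener, occluder_modifiers):
    """Stages 650/660 for one ear: derive parameters, then apply them."""
    signal = base_signal * distance_gain(source, listener)
    for modify in occluder_modifiers:      # e.g., attenuation or low-pass per object
        signal = modify(signal)
    return signal

base = np.sin(2 * np.pi * 440 * np.arange(4800) / 48000)      # base sound data
source = [0.5, 2.0, 0.0]
left = process_ear(base, source, [-0.09, 0.0, 0.0], occluder_modifiers=[])
right = process_ear(base, source, [0.09, 0.0, 0.0],
                    occluder_modifiers=[lambda s: 0.5 * s])   # wall on the right
print(left.max(), right.max())   # the occluded right-ear signal is quieter
```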
At stage 670A, the processed first virtual audio signal output from stage 640A may be used to determine a first audio signal (e.g., a left-channel audio signal) that may be presented to the first ear. For example, at stage 670A, the first virtual audio signal may be mixed with other left-channel audio signals (e.g., other virtual audio signals, music, or dialogue). In some examples, stage 670A may perform little or no processing to determine the first audio signal from the processed first virtual audio signal, such as in a simple mixed reality environment with no other sounds. Stage 670A may include any suitable stereo mixing technique. Similarly, at stage 670B, the processed second virtual audio signal output from stage 640B may be used to determine a second audio signal (e.g., a right-channel audio signal) that may be presented to the second ear.
At stages 680A and 680B, the audio signals output by stages 670A and 670B are presented to the first ear and the second ear, respectively. For example, the left and right stereo signals may be converted to left and right analog signals (e.g., via the DSP audio spatializer 422 of fig. 4) that are amplified and presented to left and
Fig. 7 illustrates a functional block diagram of an example augmented reality processing system 700 that can be used to implement one or more of the examples described above. The example system 700 may be implemented in a mixed reality system, such as the
The example augmented reality processing system 700 may integrate virtual 3D content 704 into the real world with a high degree of realism. For example, audio associated with a local virtual sound source may be located at a distance from the user, and at a position where, if the audio were a real audio signal, it would be partially blocked by a real object. However, in the example system 700, audio may be output by the left and
In the example system 700, the user coordinate determination subsystem 708 may be suitably physically disposed in the
A spatially discriminating real occluding object sensor subsystem 712 ("occlusion subsystem") is included in the example augmented reality processing system 700. The occlusion subsystem 712 may include, for example, the depth camera 444; a non-depth camera (not shown in fig. 7); a sound navigation and ranging (SONAR) sensor (not shown in fig. 7); and/or a light detection and ranging (LIDAR) sensor (not shown in fig. 7). The occlusion subsystem 712 may have a spatial resolution sufficient to distinguish between obstructions that affect the virtual propagation paths corresponding to the left and right listener positions. For example, if the user of the
In the example shown in fig. 7, the occlusion subsystem 712 is coupled to a per-channel (i.e., left and right audio channel) intersection and occlusion range calculator (herein, "occlusion calculator") 714. In this example, the user coordinate determination system 708 and the game engine 702 are also coupled to the occlusion calculator 714. The occlusion calculator 714 may receive coordinates of the virtual audio source from the game engine 702, user coordinates from the user coordinate determination system 708, and information indicating the coordinates (e.g., angular coordinates, optionally including distance) of occlusions from the occlusion subsystem 712. By applying geometry, the occlusion calculator 714 can determine whether there is a blocked or unblocked line of sight from each virtual audio source to each of the left and right listener positions. Although shown as a separate block in fig. 7, the occlusion calculator 714 may be integrated with the game engine 702. In some examples, based on information from the user coordinate determination system 708, an occlusion may initially be sensed by the occlusion subsystem 712 in a user-centric coordinate system, and the coordinates of the occlusion may then be transformed to the ambient coordinate
In some examples, the local virtual sound source 706 may comprise a mono audio signal or left and right spatialized audio signals. Such left and right spatialized audio signals may be determined by applying left and right head-related transfer functions (HRTFs), which may be selected based on the coordinates of the local virtual sound source relative to the user. In the example 700, the game engine 702 is coupled to the user coordinate determination system 708 and receives the coordinates (e.g., location and orientation) of the user from the user coordinate determination system 708. The game engine 702 may itself determine the coordinates of the virtual sound source (e.g., in response to user input) and, having received the user coordinates, may determine the coordinates of the sound source relative to the user through geometry.
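A minimal sketch of producing left and right spatialized signals by selecting head-related impulse responses from the source direction relative to the user follows. The two-entry HRIR table and the nearest-angle lookup are illustrative assumptions; a real system would use measured HRTFs (e.g., from an HRTF memory such as 425) with interpolation.

```python
# Sketch of left/right spatialization via HRTFs selected from the source
# direction relative to the user. The toy HRIR table and nearest-angle lookup
# are illustrative assumptions.
import numpy as np

# Toy head-related impulse responses indexed by azimuth (degrees).
HRIR_TABLE = {
    -90: (np.array([1.0, 0.2]), np.array([0.3, 0.1])),   # source at far left
      0: (np.array([0.8, 0.3]), np.array([0.8, 0.3])),   # source straight ahead
     90: (np.array([0.3, 0.1]), np.array([1.0, 0.2])),   # source at far right
}

def spatialize(mono, source_relative_to_user):
    """Return (left, right) signals for a source direction in user coordinates."""
    x, _, z = source_relative_to_user
    azimuth = np.degrees(np.arctan2(x, z))               # 0 deg = straight ahead
    nearest = min(HRIR_TABLE, key=lambda a: abs(a - azimuth))
    hrir_l, hrir_r = HRIR_TABLE[nearest]
    return np.convolve(mono, hrir_l), np.convolve(mono, hrir_r)

mono = np.sin(2 * np.pi * 440 * np.arange(480) / 48000)
left, right = spatialize(mono, source_relative_to_user=[1.5, 0.0, 0.5])  # mostly to the right
print(np.max(np.abs(left)), np.max(np.abs(right)))       # louder in the right ear
```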
In the example shown in fig. 7, the occlusion calculator 714 is coupled to a filter activation and control 716. In some examples, the filter activation and control 716 is coupled to a left control input 718A of a left filter bypass switch 718 and to a right control input 720A of a right filter bypass switch 720. In some examples, as with other components of the example system 700, the bypass switches 718, 720 may be implemented in software. In the example shown, the left filter bypass switch 718 receives the left channel of the spatialized audio from the game engine 702, and the right filter bypass switch 720 receives the right channel of the spatialized audio from the game engine 702. In some examples in which the game engine 702 outputs a mono audio signal, both bypass switches 718, 720 may receive the same mono audio signal.
In the example shown in fig. 7, the first output 718B of the left bypass switch 718 is coupled to a left digital-to-analog converter ("left D/A") 724 through a left blocking filter 722, and the second output 718C of the left bypass switch 718 is coupled to the left D/A 724 directly (bypassing the left blocking filter 722). Similarly, in this example, the first output 720B of the right bypass switch 720 is coupled to a right digital-to-analog converter ("right D/A") 728 through a right blocking filter 726, and the second output 720C is coupled to the right D/A 728 directly (bypassing the right blocking filter 726).
In the example shown in fig. 7, a set of filter configurations 730 may be used to configure the left blocking filter 722 and/or the right blocking filter 726 based on the output of the occlusion calculator 714 (e.g., via the filter activation and control 716). In some examples, instead of providing the bypass switches 718, 720, a non-filtering pass-through configuration of the blocking filters 722, 726 may be used. The blocking filters 722, 726 may be time-domain or frequency-domain filters. In examples where the filters are time-domain filters, each filter configuration may comprise a set of tap coefficients; in examples where the filters are frequency-domain filters, each filter configuration may comprise a set of band weights. In some examples, instead of a set number of predetermined filter configurations, the filter activation and control 716 may be configured (e.g., programmatically) to define a filter having a level of attenuation that depends on the extent of the occlusion. The filter activation and control 716 may select or define a filter configuration with greater attenuation for greater occlusion, and/or may select or define a filter that attenuates higher frequency bands to a greater degree for greater occlusion, so as to simulate the effect of real-world occlusion.
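The following sketch illustrates one way such filter configurations could look in the frequency-domain case: each configuration is a set of band weights, and a configuration with stronger high-band attenuation is selected as the extent of the occlusion grows. All coefficient values and selection thresholds are illustrative assumptions.

```python
# Sketch of frequency-domain filter configurations (band weights) selected
# according to how much of the propagation path is occluded. All values and
# thresholds are illustrative assumptions.
import numpy as np

# Frequency-domain configurations: weights for (low, mid, high) bands.
FILTER_CONFIGS = {
    "pass_through":      np.array([1.0, 1.0, 1.0]),   # no occlusion
    "partial_occlusion": np.array([0.9, 0.6, 0.3]),
    "full_occlusion":    np.array([0.7, 0.3, 0.1]),   # greater high-band attenuation
}

def select_config(occluded_fraction):
    """Choose a configuration based on how much of the path is occluded."""
    if occluded_fraction <= 0.0:
        return FILTER_CONFIGS["pass_through"]
    if occluded_fraction < 0.5:
        return FILTER_CONFIGS["partial_occlusion"]
    return FILTER_CONFIGS["full_occlusion"]

def apply_band_weights(signal, weights, sample_rate=48000):
    """Weight low/mid/high bands of the signal in the frequency domain."""
    spectrum = np.fft.rfft(signal)
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate)
    bands = np.digitize(freqs, [500.0, 4000.0])       # 0=low, 1=mid, 2=high
    return np.fft.irfft(spectrum * weights[bands], n=len(signal))

tone = np.sin(2 * np.pi * 6000 * np.arange(4800) / 48000)     # high-frequency tone
left_out = apply_band_weights(tone, select_config(0.0))       # left channel unoccluded
right_out = apply_band_weights(tone, select_config(0.8))      # right channel occluded
print(np.max(np.abs(left_out)), np.max(np.abs(right_out)))
```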
In the example shown in FIG. 7, the filter activation and control 716 is coupled to a control input 722A of the left blocking filter 722 and a control input 726A of the right blocking filter 726. Based on the output from the per-channel intersection and blocking range calculator 714, the filter activation and control 716 may separately configure the left blocking filter 722 and the right blocking filter 726 using configurations selected from the filter configurations 730.
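The control flow might resemble the following sketch, in which a predetermined set of configurations (standing in for filter configurations 730) is indexed by each channel's blocking amount; the mapping from blocking to index is an assumption.

```python
def select_configuration(blocking, configurations):
    """Pick one of a predetermined set of configurations for a blocking amount in [0, 1]."""
    index = min(int(blocking * len(configurations)), len(configurations) - 1)
    return configurations[index]

def configure_blocking_filters(left_blocking, right_blocking, configurations):
    """Configure the left and right blocking filters independently."""
    left_config = select_configuration(left_blocking, configurations)
    right_config = select_configuration(right_blocking, configurations)
    return left_config, right_config
```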
In the example shown in FIG. 7, the left D/A 724 is coupled to an input 732A of a left audio amplifier 732 and the right D/A 728 is coupled to an input 734A of a right audio amplifier 734. In the example, the output 732B of the left audio amplifier 732 is coupled to a left speaker configured to present audio to the user's left ear, and the output of the right audio amplifier 734 is coupled to a right speaker configured to present audio to the user's right ear.
It should be noted that the elements of the example functional block diagram shown in FIG. 7 may be arranged in any suitable order, not necessarily the order shown. Further, some elements shown in the example in FIG. 7 (e.g., bypass switches 718, 720) may be omitted as appropriate. The present disclosure is not limited to any particular order or arrangement of functional components shown in the examples.
Some examples of the disclosure relate to a method of rendering an audio signal in a mixed reality environment, the method comprising: identifying a first-ear listener position in the mixed reality environment; identifying a second-ear listener position in the mixed reality environment; identifying a first virtual sound source in the mixed reality environment; identifying a first object in the mixed reality environment; determining a first audio signal in the mixed reality environment, wherein the first audio signal originates from the first virtual sound source and intersects the first-ear listener position; determining a second audio signal in the mixed reality environment, wherein the second audio signal originates from the first virtual sound source, intersects the first object, and intersects the second-ear listener position; determining a third audio signal based on the second audio signal and the first object; presenting the first audio signal to a first ear of a user via a first speaker; and presenting the third audio signal to a second ear of the user via a second speaker. Additionally or alternatively to one or more of the examples disclosed above, in some examples determining the third audio signal from the second audio signal includes applying a low pass filter to the second audio signal, the low pass filter having parameters based on the first object. Additionally or alternatively to one or more of the examples disclosed above, in some examples determining the third audio signal from the second audio signal includes applying an attenuation to the second audio signal, the intensity of the attenuation being based on the first object. Additionally or alternatively to one or more of the examples disclosed above, in some examples identifying the first object includes identifying a real object. Additionally or alternatively to one or more of the examples disclosed above, in some examples identifying the real object includes determining a position of the real object relative to the user in the mixed reality environment using a sensor. Additionally or alternatively to one or more of the examples disclosed above, in some examples the sensor includes a depth camera. Additionally or alternatively to one or more of the examples disclosed above, in some examples the method further comprises generating helper data corresponding to the real object. Additionally or alternatively to one or more of the examples disclosed above, in some examples the method further comprises generating a virtual object corresponding to the real object. Additionally or alternatively to one or more of the examples disclosed above, in some examples the method further comprises identifying a second virtual object, wherein the first audio signal intersects the second virtual object, and determining a fourth audio signal based on the second virtual object.
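To make the sequence of steps concrete, the following is a minimal end-to-end sketch under simplifying assumptions: the first object is modeled as an axis-aligned box, intersection is a segment/box test, and the effect of the object is a single low pass filter plus attenuation; none of the helper names or parameter values come from the disclosure.

```python
import numpy as np
from scipy.signal import butter, lfilter

def segment_hits_box(start, end, box_min, box_max):
    """Slab test: does the straight path from start to end pass through the box?"""
    start = np.asarray(start, dtype=float)
    direction = np.asarray(end, dtype=float) - start
    t_enter, t_exit = 0.0, 1.0
    for axis in range(3):
        if abs(direction[axis]) < 1e-9:
            if not (box_min[axis] <= start[axis] <= box_max[axis]):
                return False
        else:
            a = (box_min[axis] - start[axis]) / direction[axis]
            b = (box_max[axis] - start[axis]) / direction[axis]
            t_enter = max(t_enter, min(a, b))
            t_exit = min(t_exit, max(a, b))
            if t_enter > t_exit:
                return False
    return True

def render_two_ears(source_signal, source_pos, left_ear_pos, right_ear_pos,
                    box_min, box_max, fs=48000, cutoff_hz=2000.0, attenuation_db=9.0):
    """Return (left, right) signals; an ear whose path is blocked gets a filtered copy."""
    b, a = butter(2, cutoff_hz, fs=fs)       # low pass filter with object-based parameters
    gain = 10.0 ** (-attenuation_db / 20.0)  # object-based attenuation
    outputs = []
    for ear_pos in (left_ear_pos, right_ear_pos):
        if segment_hits_box(source_pos, ear_pos, box_min, box_max):
            outputs.append(gain * lfilter(b, a, source_signal))     # occluded ("third") signal
        else:
            outputs.append(np.asarray(source_signal, dtype=float))  # unoccluded signal
    return outputs[0], outputs[1]
```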
Some examples of the disclosure relate to a system comprising: a wearable head device, comprising: a display for displaying a mixed reality environment to a user, the display including a transmissive eyepiece through which a real environment is visible; a first speaker configured to present an audio signal to a first ear of the user; and a second speaker configured to present an audio signal to a second ear of the user; and one or more processors configured to perform: identifying a first-ear listener position in the mixed reality environment; identifying a second-ear listener position in the mixed reality environment; identifying a first virtual sound source in the mixed reality environment; identifying a first object in the mixed reality environment; determining a first audio signal in the mixed reality environment, wherein the first audio signal originates from the first virtual sound source and intersects the first-ear listener position; determining a second audio signal in the mixed reality environment, wherein the second audio signal originates from the first virtual sound source, intersects the first object, and intersects the second-ear listener position; determining a third audio signal based on the second audio signal and the first object; presenting the first audio signal to the first ear via the first speaker; and presenting the third audio signal to the second ear via the second speaker. Additionally or alternatively to one or more of the examples disclosed above, in some examples determining the third audio signal from the second audio signal includes applying a low pass filter to the second audio signal, the low pass filter having parameters based on the first object. Additionally or alternatively to one or more of the examples disclosed above, in some examples determining the third audio signal from the second audio signal includes applying an attenuation to the second audio signal, the intensity of the attenuation being based on the first object. Additionally or alternatively to one or more of the examples disclosed above, in some examples identifying the first object includes identifying a real object. Additionally or alternatively to one or more of the examples disclosed above, in some examples the wearable head device further includes a sensor, and identifying the real object includes using the sensor to determine a location of the real object relative to the user in the mixed reality environment. Additionally or alternatively to one or more of the examples disclosed above, in some examples the sensor includes a depth camera. Additionally or alternatively to one or more of the examples disclosed above, in some examples the one or more processors are further configured to perform generating helper data corresponding to the real object. Additionally or alternatively to one or more of the examples disclosed above, in some examples the one or more processors are further configured to perform generating a virtual object corresponding to the real object. Additionally or alternatively to one or more of the examples disclosed above, in some examples the one or more processors are further configured to perform identifying a second virtual object, wherein the first audio signal intersects the second virtual object, and determining a fourth audio signal based on the second virtual object.
Although the disclosed examples have been fully described with reference to the accompanying drawings, it is to be noted that various changes and modifications will become apparent to those skilled in the art. For example, elements of one or more implementations may be combined, deleted, modified, or supplemented to form further implementations. Such changes and modifications are to be understood as being included within the scope of the disclosed examples as defined by the appended claims.