System and method for directing speaker and microphone arrays using coded light
This technology, System and method for directing speaker and microphone arrays using coded light, was created by Qiong Liu, D. G. Kimber, and Shang Ma on 2019-06-27. Abstract: Systems and methods for directing speaker arrays and microphone arrays using coded light. Methods are described for directing a speaker array or microphone array based on the directional output of miniature light sensors attached to a TV viewer, a worker in a noisy environment, or an AR/VR system user. A light projector mounted on the ceiling emits a different sequential on/off signal for each pixel. Two light sensors are attached to each speaker array or microphone array, and one or more light sensors are attached to each user. Each projector pixel corresponds to a particular direction. A light sensor that receives the projector's sequential signal can therefore determine its direction relative to the projector and report that direction to a central station. When the speaker/microphone array directions and the user directions are known, the system can apply appropriate phase shifts to the different speaker signals and generate directional sound for each person. The central station may likewise determine phase shifts for combining audio from different microphones.
1. A system for directing a speaker array and a microphone array using coded light, the system comprising:
a. a projector configured to project a temporal projector light signal, wherein the temporal projector light signal is encoded for each pixel of the projector with a piece of information comprising pixel coordinates of each pixel of the projector; and
b. at least one light sensor operatively coupled to a computer, wherein the light sensor is configured to detect the temporal projector light signal and generate a sensor signal, and wherein the computer is configured to receive the sensor signal from the light sensor, calculate directional information based on the detected temporal projector light signal, and direct a plurality of microphones or a plurality of speakers based on the calculated directional information.
2. The system of claim 1, wherein the computer is further configured to periodically recalculate the directional information based on the detected temporal projector light signal.
3. The system of claim 1, wherein the calculated directional information includes a direction of a user relative to the plurality of microphones.
4. The system of claim 1, wherein the calculated directional information includes a direction of a user relative to the plurality of speakers.
5. The system of claim 1, wherein the calculated directional information comprises a direction of a user's head relative to a plurality of video monitors displaying a plurality of video streams, wherein the computer is configured to direct, with the plurality of speakers, an audio stream corresponding to the video stream displayed on a first video monitor of the plurality of video monitors toward the user when the user's head is determined to be facing the first video monitor.
6. The system of claim 1, wherein the calculated directional information comprises a direction toward each user in a room, wherein the computer is configured to direct, with the plurality of speakers, an audio stream with specific parameters for each user toward that user.
7. The system of claim 6, wherein the specific parameters comprise sound level.
8. The system of claim 6, wherein the at least one light sensor is disposed on each user.
9. The system of claim 1, wherein the calculated directional information comprises a direction toward a user in a room, wherein the computer is configured to direct a plurality of microphones toward the user.
10. The system of claim 1, wherein the plurality of microphones or the plurality of speakers are directed with a phase shift.
11. A method of directing a speaker array and a microphone array using coded light, the method comprising the steps of:
a. projecting a temporal projector light signal with a projector, wherein the temporal projector light signal is encoded for each pixel of the projector with a piece of information comprising pixel coordinates of each pixel of the projector;
b. detecting the temporal projector light signal with a light sensor operatively coupled to a computer and generating a corresponding sensor signal; and
c. receiving the sensor signal with the computer, calculating directional information based on the detected temporal projector light signal, and directing a plurality of microphones or a plurality of speakers based on the calculated directional information.
12. The method of claim 11, wherein the computer is further configured to periodically recalculate the directional information based on the detected temporal projector light signal.
13. The method of claim 11, wherein the calculated directional information includes a direction of a user relative to the plurality of microphones.
14. The method of claim 11, wherein the calculated directional information includes a direction of a user relative to the plurality of speakers.
15. The method of claim 11, wherein the calculated directional information comprises a direction of a user's head relative to a plurality of video monitors displaying a plurality of video streams, wherein the computer is configured to direct, with the plurality of speakers, an audio stream corresponding to the video stream displayed on a first video monitor of the plurality of video monitors toward the user when the user's head is determined to be facing the first video monitor.
16. The method of claim 11, wherein the calculated directional information comprises a direction toward each user in a room, wherein the computer is configured to direct, with the plurality of speakers, an audio stream with specific parameters for each user toward that user.
17. The method of claim 16, wherein the specific parameters comprise sound level.
18. The method of claim 16, wherein at least one light sensor is disposed on each user.
19. The method of claim 11, wherein the calculated directional information comprises a direction toward a user in a room, wherein the computer is configured to direct a plurality of microphones toward the user.
20. The method of claim 11, wherein the plurality of microphones or the plurality of speakers are directed with a phase shift.
21. A tangible computer readable medium containing a set of instructions to implement a method comprising the steps of:
a. projecting a temporal projector light signal with a projector, wherein the temporal projector light signal is encoded for each pixel of the projector with a piece of information comprising pixel coordinates of each pixel of the projector;
b. detecting the temporal projector light signal with a light sensor operatively coupled to a computer and generating a corresponding sensor signal; and
c. receiving the sensor signal with the computer, calculating directional information based on the detected temporal projector light signal, and directing a plurality of microphones or a plurality of speakers based on the calculated directional information.
Technical Field
The disclosed embodiments relate generally to acoustic systems and, more particularly, to systems and methods for directing speaker arrays and microphone arrays with encoded light.
Background
Conventional headphones and earplugs remain inconvenient in two cases: when TV viewers want to enjoy programs without disturbing other family members, and when different viewers prefer different sound levels. Conventional sound capture methods also cannot capture the desired clear sound in several settings. These include ubiquitous high-quality augmented reality or virtual reality (AR/VR) environments, high-noise environments with multiple participants, and high-end teleconferencing environments with multiple presenters and multiple video feeds. Similarly, conventional sound generation methods cannot produce clear directional sound of the kind humans experience in real-world environments. In an AR/VR environment, a participant may experience confusion or discomfort if the sound effects do not match the orientation of the headset.
Accordingly, in view of the above-mentioned and other shortcomings of conventional acoustic techniques, there is a need for new and improved systems and methods for directing speaker arrays or microphone arrays such that the systems can obtain better sound from all participants and provide a better acoustic experience for everyone by producing more accurate directional sound effects.
Disclosure of Invention
Embodiments described herein are directed to systems and methods that substantially obviate one or more of the above and other problems associated with conventional acoustic systems and methods.
According to an aspect of embodiments described herein, there is provided a system comprising a projector and at least one light sensor. The projector is configured to project a temporal projector light signal, wherein the temporal projector light signal is encoded for each pixel of the projector with a piece of information comprising pixel coordinates of each pixel of the projector. The at least one light sensor is operatively coupled to a computer, wherein the light sensor is configured to detect the temporal projector light signal and generate a sensor signal. The computer is configured to receive the sensor signal from the light sensor, calculate directional information based on the detected temporal projector light signal, and direct a plurality of microphones or a plurality of speakers based on the calculated directional information.
In one or more embodiments, the computer is further configured to periodically recalculate the directional information based on the detected temporal projector light signal.
In one or more embodiments, the calculated directional information includes a user direction relative to the plurality of microphones.
In one or more embodiments, the calculated directional information includes a user direction relative to a plurality of speakers.
In one or more embodiments, the calculated directional information includes a direction of the user's head relative to a plurality of video monitors displaying a plurality of video streams. When the user's head is determined to be facing a first video monitor of the plurality of video monitors, the computer is configured to use the plurality of speakers to direct toward the user an audio stream corresponding to the video stream displayed on the first video monitor.
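The head-direction embodiment above can be illustrated with a small routine that picks which monitor a user is facing. This is a minimal sketch under stated assumptions. The source does not specify how the head direction is obtained, so `head_dir` is assumed to be a facing vector already estimated from the head-mounted light sensor(s). `monitor_dirs` are assumed to be vectors from the user toward each monitor in the same frame. The 30° acceptance cone is a hypothetical parameter, and all names are illustrative.

```python
import math
from typing import Optional, Sequence

Vec = tuple[float, float, float]  # hypothetical 3-D vector type


def angle_between_deg(u: Vec, v: Vec) -> float:
    """Angle between two vectors in degrees (cosine clamped against rounding)."""
    dot = sum(a * b for a, b in zip(u, v))
    norms = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return math.degrees(math.acos(max(-1.0, min(1.0, dot / norms))))


def faced_monitor(head_dir: Vec, monitor_dirs: Sequence[Vec],
                  max_angle_deg: float = 30.0) -> Optional[int]:
    """Index of the monitor closest to the head's facing direction,
    or None if no monitor lies within the assumed acceptance cone."""
    if not monitor_dirs:
        return None
    angles = [angle_between_deg(head_dir, d) for d in monitor_dirs]
    best = min(range(len(angles)), key=lambda i: angles[i])
    return best if angles[best] <= max_angle_deg else None
```

The returned index would select which video stream's audio the speaker array beams toward that user. Returning `None` when the user looks away from all monitors is one possible design choice. Another would be to keep the last selection.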
In one or more embodiments, the calculated directional information includes a direction towards each user in the room, wherein the computer is configured to direct an audio stream having specific parameters for each user towards each user with a plurality of speakers.
In one or more embodiments, the specific parameter includes sound level.
In one or more embodiments, at least one light sensor is disposed on each user.
In one or more embodiments, the calculated directional information comprises a direction towards a user in the room, wherein the computer is configured to direct the plurality of microphones towards the user.
In one or more embodiments, multiple microphones or multiple speakers are directed with phase shifts.
According to another aspect of embodiments described herein, there is provided a method comprising the following steps. First, a temporal projector light signal is projected with a projector, wherein the temporal projector light signal is encoded for each pixel of the projector with a piece of information comprising pixel coordinates of each pixel of the projector. Second, the temporal projector light signal is detected with a light sensor operatively coupled to a computer, and a corresponding sensor signal is generated. Third, the computer receives the sensor signal, calculates directional information based on the detected temporal projector light signal, and directs a plurality of microphones or a plurality of speakers based on the calculated directional information.
In one or more embodiments, the computer is further configured to periodically recalculate the directional information based on the detected temporal projector light signal.
In one or more embodiments, the calculated directional information includes a user direction relative to the plurality of microphones.
In one or more embodiments, the calculated directional information includes a user direction relative to a plurality of speakers.
In one or more embodiments, the calculated directional information includes a direction of the user's head relative to a plurality of video monitors displaying a plurality of video streams. When the user's head is determined to be facing a first video monitor of the plurality of video monitors, the computer is configured to use the plurality of speakers to direct toward the user an audio stream corresponding to the video stream displayed on the first video monitor.
In one or more embodiments, the calculated directional information includes a direction towards each user in the room, wherein the computer is configured to utilize a plurality of speakers to direct an audio stream having specific parameters for each user towards each user.
In one or more embodiments, the specific parameter includes sound level.
In one or more embodiments, at least one light sensor is disposed on each user.
In one or more embodiments, the calculated directional information comprises a direction towards a user in the room, wherein the computer is configured to direct the plurality of microphones towards the user.
In one or more embodiments, multiple microphones or multiple speakers are directed with phase shifts.
According to yet another aspect of embodiments described herein, there is provided a tangible computer readable medium containing a set of instructions to implement a method comprising the following steps. First, a temporal projector light signal is projected with a projector, wherein the temporal projector light signal is encoded for each pixel of the projector with a piece of information comprising pixel coordinates of each pixel of the projector. Second, the temporal projector light signal is detected with a light sensor operatively coupled to a computer, and a corresponding sensor signal is generated. Third, the computer receives the sensor signal, calculates directional information based on the detected temporal projector light signal, and directs a plurality of microphones or a plurality of speakers based on the calculated directional information.
Additional aspects related to the invention will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the invention. The various aspects of the invention will be realized and attained by means of the elements and combinations of various elements and aspects particularly pointed out in the following detailed description and appended claims.
It is to be understood that both the foregoing and the following description are exemplary and explanatory only and are not intended to limit the claimed invention or its application in any way.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and, together with the description, serve to explain and illustrate the principles of the inventive technique. In the drawings:
FIG. 1 illustrates an exemplary embodiment of a control center having a plurality of video monitors displaying a plurality of video feeds and a plurality of human participants.
Fig. 2(a) and 2(b) illustrate two exemplary time-coded light signals generated by a projector.
Fig. 3 illustrates an exemplary embodiment of a speaker/microphone array system.
Fig. 4 illustrates the far-field approximation. When the spacing between the individual speakers or microphones is much smaller than the distance between the speaker array or microphone array and the user, sound can be treated as arriving at, or leaving, every element from the same direction without a large approximation error.
FIG. 5 illustrates an exemplary embodiment of a computer system that may be used to implement the inventive techniques described herein.
Detailed Description
In the following detailed description, reference is made to the accompanying drawings in which like functional elements are referred to by like reference numerals. The above-described figures illustrate by way of example, and not by way of limitation, specific embodiments and implementations consistent with the principles of the invention. These implementations are described in sufficient detail to enable those skilled in the art to practice the invention, and it is to be understood that other implementations may be utilized and that structural changes and/or substitutions of various elements may be made without departing from the scope and spirit of the present invention. The following detailed description is, therefore, not to be taken in a limiting sense. In addition, various embodiments of the invention as described may be implemented in the form of software running on a general purpose computer, in the form of dedicated hardware, or a combination of software and hardware.
According to one aspect of embodiments described herein, an inventive technique is provided for directing a speaker array or microphone array. The technique uses the directional output of miniature light sensors attached to a TV viewer, a worker in a noisy environment, or an AR/VR system user. More specifically, in one embodiment, a projector is mounted on the ceiling of a room and is configured to emit a different sequential on/off signal for each pixel. Two light sensors are attached to each speaker array or microphone array, and one or more light sensors are attached to each user. Each projector pixel corresponds to a particular direction. A light sensor that receives the sequential signal from the projector can therefore determine its direction relative to the projector and report this direction information to a central station. With the speaker/microphone array directions and the user directions known, the system can impart appropriate phase shifts to the different speaker signals and generate directional sound for each person. Similarly, the central station may determine phase shifts for combining the audio outputs of different microphones. Unlike sound source localization based on a microphone array, this method does not suffer from the blind source separation problem and is more reliable and accurate. These characteristics make it suitable for capturing high-quality audio signals with microphone arrays. Likewise, it is suitable for generating high-quality directional loudspeaker signals for TV viewers, AR/VR headset users, and other similar applications. With this arrangement, an AR/VR headset needs much less bandwidth to report its direction than it would need to transmit and receive high-quality audio signals. This matters when there are many headsets in the environment. Furthermore, the described techniques may increase the steering speed of speaker/microphone arrays.
The technique also reduces the weight and power consumption of an AR/VR headset by eliminating the microphone, speaker, and associated circuitry; a light sensor uses much less power than a close-talking microphone and earbud. In addition, with the microphone and earplugs removed from the headset, users may find the experience more natural than wearing a close-talking microphone or earplugs.
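The bandwidth argument above can be made concrete with a back-of-the-envelope calculation. The sketch below is illustrative only. The 70 Hz update rate is taken from the TV example elsewhere in this description. The remaining values are assumptions. The projector resolution is taken as 1920 × 1080, so each report carries an 11-bit column index and an 11-bit row index. The comparison is against uncompressed 48 kHz, 16-bit PCM audio, with one microphone uplink channel plus one earbud downlink channel. Packet overhead and audio compression are ignored.

```python
def position_bitrate(update_hz: float, width_px: int, height_px: int,
                     sensors: int = 1) -> float:
    """Bits per second needed to report decoded pixel coordinates.

    Each report carries one column index and one row index, each using the
    minimum number of bits that can address the assumed projector resolution.
    """
    bits_per_report = (width_px - 1).bit_length() + (height_px - 1).bit_length()
    return update_hz * bits_per_report * sensors


def pcm_bitrate(sample_hz: int, bits_per_sample: int, channels: int = 1) -> int:
    """Bits per second of uncompressed PCM audio."""
    return sample_hz * bits_per_sample * channels


# Assumed example values: 70 Hz updates and a 1920 x 1080 projector,
# versus 48 kHz / 16-bit PCM for one uplink and one downlink channel.
position_bps = position_bitrate(70, 1920, 1080)   # 70 * (11 + 11) = 1540
audio_bps = pcm_bitrate(48_000, 16, channels=2)    # 1_536_000
ratio = audio_bps / position_bps                   # roughly 1000x
```

Under these assumptions, the position stream is about a thousand times smaller than the audio stream. Compressed audio codecs would narrow the gap.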
Fig. 1 illustrates an exemplary embodiment of a control center having a plurality of video monitors displaying a plurality of video feeds and a plurality of human participants.
In this case, uniformly amplifying all video feeds can make the control center very noisy and disrupt the experience of all participants. Face detection can be used to direct multiple speaker arrays, as described, for example, in H. Mizoguchi, Y. Tamai, K. Shinoda, S. Kagami, and K. Nagashima, "Invisible messenger: visually steerable sound beam forming system based on face tracking and speaker array," IEEE/RSJ IROS 2004, Sendai, Japan. This may reduce crosstalk among the video feeds to some extent. However, local participants may still experience noisy conditions because they hear all of the video feeds regardless of the direction they are facing, which is not a natural experience. On the other hand, a state-of-the-art microphone array system can be used to locate participants. One example is described in T. F. Bergh, "Speaker Tracking Based on Face Detection and Voice Activity Detection Using a Microphone Array," IEEE IPIN (Indoor Positioning and Indoor Navigation) 2014, Busan, Korea. With such a system, the local participants must substantially raise their voices for reliable sound source localization. Getting the "attention" of a microphone array is very difficult, because the local participants must compete with the loud audio of all the video feeds.
As will be clear to one of ordinary skill in the art, a person's voice may be picked up with a close-talking microphone, and clear sound can be delivered to each person through individual earplugs. However, high-quality Bluetooth microphones and headsets are expensive, and their audio quality may degrade due to network traffic. Prolonged use of earplugs may also damage the hearing of local participants. An alternative to personal microphones or earbuds in public places is software-steerable speaker arrays and microphone arrays. With software-steerable speaker and microphone arrays and an appropriate method of participant detection, the system can isolate and amplify sound using beamforming techniques, so that its performance approaches the desired mode of operation described above.
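The beamforming that the paragraph above relies on can be sketched as a delay-and-sum microphone beamformer for a uniform linear array. This is a minimal sketch under stated assumptions, not the method of this disclosure. It assumes a far-field source and a known steering angle; in the present system, that angle would be derived from the light-sensor directions rather than from acoustic localization. It also assumes a speed of sound of 343 m/s. It uses FFT-based circular fractional delays, which are convenient for illustration; a real-time system would typically use block-based processing or fractional-delay FIR filters instead. Function names are hypothetical.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value


def mic_delays(num_mics: int, spacing: float, theta_deg: float,
               c: float = SPEED_OF_SOUND) -> np.ndarray:
    """Extra arrival delay (s) at each microphone relative to microphone 0
    for a far-field source at theta_deg from broadside.
    Convention: positive theta means the wavefront reaches microphone 0 first."""
    m = np.arange(num_mics)
    return m * spacing * np.sin(np.radians(theta_deg)) / c


def delay_and_sum(signals: np.ndarray, fs: float, delays) -> np.ndarray:
    """Advance each channel (rows of `signals`) by its delay and average.

    Delays are applied as a linear phase ramp in the frequency domain, which
    implements a circular fractional-sample shift.
    """
    delays = np.asarray(delays, dtype=float)
    n = signals.shape[1]
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    spectra = np.fft.rfft(signals, axis=1)
    # exp(+j*2*pi*f*tau) advances a channel by tau seconds.
    aligned = spectra * np.exp(2j * np.pi * freqs[None, :] * delays[:, None])
    return np.fft.irfft(aligned.mean(axis=0), n=n)
```

The steering angle comes from the light sensors rather than from the audio itself. The beamformer can therefore be re-aimed at every position update without first estimating the source direction from past audio.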
Conventional microphone array approaches use a microphone array to detect sound sources, perform blind source separation, and steer the array toward certain sound sources for better speech capture. With this approach, there is no simple way to determine all sound sources when the number of sound sources exceeds the number of microphones. In high-noise environments, determining the direction of a sound source is also cumbersome and unreliable, and these unreliable source estimates may degrade the beamforming results. In addition, microphone-array-based sound source localization and beamforming algorithms must use past audio signals to estimate the current beamforming direction. This makes them unsuitable for steering a microphone array to follow a moving sound source. Furthermore, in speaker array applications, the user produces no sound that could be used to detect the user's exact location.
To overcome the above and other problems, embodiments of the system deploy an infrared (IR) light projector mounted on the ceiling of a room, along with miniature light sensors attached to each speaker array, each microphone array, and each user. With this arrangement, the entire space is partitioned into solid angles corresponding to different projector pixels. Each projector pixel is modulated with a unique digital sequence code.
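The per-pixel coding described above can be sketched as follows. The source only states that each pixel is modulated with a unique digital sequence code. Using a binary-reflected Gray code of the column index followed by the row index is an assumption here; it is a common choice in structured-light systems because neighboring pixels differ in a single bit. The pinhole-model mapping from pixel to direction, with hypothetical calibration parameters `fx`, `fy`, `cx`, and `cy`, is likewise illustrative. Frame synchronization and thresholding of the sensor's raw intensity samples are omitted.

```python
import math


def gray_encode(n: int) -> int:
    """Binary-reflected Gray code of n (adjacent integers differ in one bit)."""
    return n ^ (n >> 1)


def gray_decode(g: int) -> int:
    """Invert gray_encode by XOR-folding the higher bits down."""
    n = 0
    while g:
        n ^= g
        g >>= 1
    return n


def pixel_sequence(x: int, y: int, x_bits: int, y_bits: int) -> list[int]:
    """On/off pattern (1 = on) that pixel (x, y) emits over x_bits + y_bits
    frames, most significant bit first: Gray(x) followed by Gray(y)."""
    gx, gy = gray_encode(x), gray_encode(y)
    return ([(gx >> i) & 1 for i in reversed(range(x_bits))]
            + [(gy >> i) & 1 for i in reversed(range(y_bits))])


def decode_sequence(bits: list[int], x_bits: int) -> tuple[int, int]:
    """Recover (x, y) from the thresholded on/off samples seen by a sensor."""
    gx = gy = 0
    for b in bits[:x_bits]:
        gx = (gx << 1) | b
    for b in bits[x_bits:]:
        gy = (gy << 1) | b
    return gray_decode(gx), gray_decode(gy)


def pixel_direction(x: float, y: float, fx: float, fy: float,
                    cx: float, cy: float) -> tuple[float, float, float]:
    """Unit ray from the projector through pixel (x, y) under a pinhole model,
    expressed in the projector frame with z along the optical axis."""
    vx, vy = (x - cx) / fx, (y - cy) / fy
    norm = math.sqrt(vx * vx + vy * vy + 1.0)
    return (vx / norm, vy / norm, 1.0 / norm)
```

A sensor that decodes (x, y) thereby obtains the ray from the projector through that pixel. This ray is the directional information it reports to the central station.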
Fig. 2(a) and 2(b) illustrate two exemplary time-coded light signals 201 and 205 generated by a projector.
As will be appreciated by those of ordinary skill in the art, the correspondence between the code embedded in the light and the solid angle is predefined. The system can therefore readily determine a light sensor's direction relative to the light source from the code the sensor receives. In this way, the position of each user and the orientation of each speaker array or microphone array can be determined. The resulting user/speaker array/microphone array relationship can then be used to steer the speaker arrays and microphone arrays. Because current light sensors can be as small as 0.1 mm × 0.1 mm, a light sensor is easier to carry than a close-talking microphone and earbud. In addition, user positions change far more slowly than audio signals oscillate. The Bluetooth/Wi-Fi bandwidth consumed by transmitting position data is therefore much lower than that needed to transmit high-quality audio signals. This can save power and bandwidth for the wearable sensor.
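The step from per-sensor directions to a steering angle can be sketched as below. A single ceiling projector gives each sensor only a ray, not a range. The source does not say how positions are recovered, so this sketch assumes each sensor's depth below the projector is known, for example an array mounted at a known height and a user's sensor at a nominal head height. It also assumes the two array sensors sit at the ends of a linear array. It measures the user's angle from the array's broadside under the far-field approximation. All names are hypothetical.

```python
import math

Vec = tuple[float, float, float]  # hypothetical 3-D vector type


def ray_to_point(ray: Vec, depth: float) -> Vec:
    """Scale a projector ray (z pointing down from the ceiling, z > 0) so that
    its z component equals the assumed depth of the sensor below the projector."""
    s = depth / ray[2]
    return (ray[0] * s, ray[1] * s, ray[2] * s)


def steering_angle_deg(ray_a: Vec, depth_a: float, ray_b: Vec, depth_b: float,
                       ray_user: Vec, depth_user: float) -> float:
    """Angle of the user from the broadside of a linear array whose end sensors
    are a and b. Positive angles point toward the sensor-b end of the array."""
    pa = ray_to_point(ray_a, depth_a)
    pb = ray_to_point(ray_b, depth_b)
    pu = ray_to_point(ray_user, depth_user)
    axis = [pb[i] - pa[i] for i in range(3)]
    center = [(pa[i] + pb[i]) / 2.0 for i in range(3)]
    to_user = [pu[i] - center[i] for i in range(3)]
    axis_len = math.sqrt(sum(a * a for a in axis))
    user_len = math.sqrt(sum(u * u for u in to_user))
    # sin(theta) = component of the user direction along the array axis.
    sin_theta = sum(a * u for a, u in zip(axis, to_user)) / (axis_len * user_len)
    return math.degrees(math.asin(max(-1.0, min(1.0, sin_theta))))
```

The resulting angle is the input a linear-array beamformer needs in order to compute per-element delays or phase shifts.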
In another example, a person wants to watch TV at home without disturbing other family members. With this arrangement, a light sensor on the user can decode the directional signal it receives. The system can then use this information to steer the speaker array so that the user obtains high-quality sound without disturbing other family members or harming the user's own hearing. For family members with different hearing abilities, different volumes may also be provided to people wearing different light sensors. Because the speaker array is steered according to the direction of each light sensor, the TV user or family members can move freely in front of the TV and still receive personalized volume. Because the light sensor can track at 70 Hz or higher, user movement does not disrupt the personalized audio beamforming.
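Personalized volume can be sketched by superposing one steered beam per user, each scaled by that user's gain. This is an illustrative sketch, not the disclosed implementation. It assumes a uniform linear speaker array, far-field listeners at known angles (as would be derived from the light sensors), and a speed of sound of 343 m/s. It uses circular FFT delays for simplicity. `speaker_feeds` and `received_at` are hypothetical helpers. The second simulates what an ideal far-field listener at a given angle would hear, normalized by the number of speakers.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value


def shift(x: np.ndarray, tau: float, fs: float) -> np.ndarray:
    """Circularly delay x by tau seconds (negative tau advances it) via an FFT phase ramp."""
    n = x.size
    f = np.fft.rfftfreq(n, d=1.0 / fs)
    return np.fft.irfft(np.fft.rfft(x) * np.exp(-2j * np.pi * f * tau), n=n)


def speaker_feeds(streams, angles_deg, gains, num_speakers, spacing, fs,
                  c=SPEED_OF_SOUND) -> np.ndarray:
    """Drive signals for a uniform linear speaker array.

    Each user's stream is scaled by that user's gain (personalized volume) and
    pre-advanced on speaker m by m * spacing * sin(angle) / c, so that it adds
    coherently in the direction of that user.
    """
    feeds = np.zeros((num_speakers, streams[0].size))
    for stream, angle, gain in zip(streams, angles_deg, gains):
        step = spacing * np.sin(np.radians(angle)) / c
        for m in range(num_speakers):
            feeds[m] += gain * shift(stream, -m * step, fs)
    return feeds


def received_at(feeds, angle_deg, spacing, fs, c=SPEED_OF_SOUND) -> np.ndarray:
    """Simulated far-field signal at angle_deg: the output of speaker m arrives
    m * spacing * sin(angle) / c later than speaker 0; averaged over speakers."""
    step = spacing * np.sin(np.radians(angle_deg)) / c
    return np.mean([shift(f, m * step, fs) for m, f in enumerate(feeds)], axis=0)
```

With tracking at 70 Hz or more, the feeds would be recomputed, or the per-speaker delays updated, whenever a user's angle changes.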
Fig. 3 illustrates an exemplary embodiment of a speaker/microphone array system. When the distance between the speaking
this distance s may be used to calculate a corresponding phase shift, as is well known to those of ordinary skill in the art. In one or more embodiments, the parameters may be used for microphone array beamforming or speaker array beamforming in the manner described above.
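The phase-shift calculation can be sketched under one assumption: s denotes the extra path length between adjacent array elements under the far-field approximation of Fig. 4. Then, for element spacing d and a direction θ measured from broadside, s = d·sin θ. At frequency f, the corresponding phase shift between adjacent elements is Δφ = 2π·f·s/c, where c is the speed of sound. The sketch below computes these quantities. The 343 m/s value for c and the function names are assumptions. The narrowband weights apply to a single frequency; broadband signals need per-frequency weights or true time delays.

```python
import cmath
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value


def path_difference(spacing: float, theta_deg: float) -> float:
    """Extra path length s between adjacent elements for a far-field direction
    theta_deg measured from the array's broadside."""
    return spacing * math.sin(math.radians(theta_deg))


def phase_shifts(num_elements: int, spacing: float, theta_deg: float,
                 freq_hz: float, c: float = SPEED_OF_SOUND) -> list[float]:
    """Phase (radians) for element m at freq_hz: 2*pi*f*m*s/c."""
    s = path_difference(spacing, theta_deg)
    return [2.0 * math.pi * freq_hz * m * s / c for m in range(num_elements)]


def steering_weights(num_elements: int, spacing: float, theta_deg: float,
                     freq_hz: float, c: float = SPEED_OF_SOUND) -> list[complex]:
    """Narrowband complex weights exp(j*phi_m) that compensate each element's
    phase lag so the contributions add in phase toward theta_deg."""
    return [cmath.exp(1j * p)
            for p in phase_shifts(num_elements, spacing, theta_deg, freq_hz, c)]
```

For a 5 cm spacing and a user 30° off broadside, s is 2.5 cm, and at 1 kHz the adjacent-element phase shift is about 0.46 rad.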
Exemplary Embodiment of a Computer System
FIG. 5 illustrates an exemplary embodiment of a computer system that may be used to implement the inventive techniques described herein.
The
In one or more embodiments,
In one or more embodiments, the
In one or more embodiments, the
In one or more embodiments, the
In one or more embodiments, the functions described herein are implemented by the
The term "computer-readable medium" as used herein refers to any medium that participates in providing instructions to
Common forms of non-transitory computer-readable media include, for example: a floppy disk, a flexible disk, hard disk, magnetic tape, or any other magnetic medium, a CD-ROM, any other optical medium, punch cards, paper tape, any other physical medium with patterns of holes, a RAM, a PROM, an EPROM, a FLASH-EPROM, a FLASH drive, a memory card, any other memory chip or cartridge, or any other medium from which a computer can read. Various forms of computer readable media may be involved in carrying one or more sequences of one or more instructions to
In one or more embodiments, the
1. Operating system (OS) 513, which may be a mobile operating system, for implementing basic system services and managing various hardware components of
2. The
3. The
Finally, it should be understood that the processes and techniques described herein are not inherently related to any particular apparatus, but may be implemented by any suitable combination of components. Also, various types of general purpose devices may be used in accordance with the teachings described herein. It may also prove advantageous to construct specialized apparatus to perform the method steps described herein. The present invention has been described in relation to particular embodiments, which are intended in all respects to be illustrative rather than restrictive. It will be apparent to those skilled in the art that many different combinations of hardware, software, and firmware will be suitable for practicing the present invention. For example, the described software may be implemented using various programming or scripting languages, such as assembler, C/C++, Objective-C, Perl, shell, PHP, Java, and any now known or later developed programming or scripting language.
In addition, other implementations of the invention will be apparent to those skilled in the art from consideration of the specification and practice of the invention disclosed herein. Various aspects and/or components of the described embodiments may be used in acoustic systems and methods, alone or in any combination. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the invention being indicated by the following claims.