Sound source positioning method and device

Document No.: 1214334 Publication date: 2020-09-04

Reading note: This technology, "Sound source positioning method and device" (声源定位方法和装置), was designed and created by 刘鲁鹏, 占凯, 陈宇, 耿岭, 白二伟, 刘颖, 元海明, 郑勇超 and 仇璐 on 2019-02-27. Main content: The embodiments of the application disclose a sound source positioning method and device. One embodiment of the method comprises: performing beamforming on the echo-cancelled target audio, and determining the high-frequency energy and the low-frequency energy of the formed beam in each direction; representing the beams of all directions in the same circle; determining a plurality of sector areas in the circle by using a preset area beam number and a preset area interval; and determining the energy sum of each sector area based on the high-frequency energy and the low-frequency energy of the beams in that sector area, and taking the direction in which the symmetry axis of the sector area with the largest energy sum extends outward from the center of the circle as the sound source direction. The embodiments can determine the high-frequency energy and the low-frequency energy of each sector area, obtain the energy sum of each sector area, and thereby locate the sound source. The method does not require a very high signal sampling frequency and has relatively high positioning accuracy.

1. A sound source localization method, comprising:

performing beam forming processing on the target audio after echo cancellation, and determining high-frequency energy and low-frequency energy of formed beams in each direction;

representing the beams in all directions in the same circle, wherein the center of the circle is determined based on the position of a receiving device for receiving the target audio;

determining a plurality of sector areas in the circle by using a preset area beam number and a preset area interval, wherein the area beam number is the number of beams in a sector area, and the area interval is the number of beams by which two adjacent sector areas are separated;

determining the energy sum of each sector area based on the high-frequency energy and the low-frequency energy of the beam in each direction in the sector area, and taking the direction in which the symmetry axis of the sector area with the largest energy sum extends outward from the center of the circle as the sound source direction.

2. The method of claim 1, wherein the determining a plurality of sector areas in the circle using a preset number of area beams and an area interval comprises:

in the circle, taking the sector area that contains a number of adjacent beams equal to the area beam number as a sliding window, taking the center of the circle as the axis and the area interval as the sliding step, and sliding the window clockwise or counterclockwise to obtain the sector areas, wherein each slide yields one sector area.

3. The method of claim 1, wherein two side edges of the sector area coincide with two beams, respectively;

the size of each sector is the same.

4. The method of claim 1, wherein determining the sum of energies for each sector based on the high frequency energy and the low frequency energy for each directional beam in the sector comprises:

for each direction in the sector area, weighting the high-frequency energy and the low-frequency energy of the direction to obtain a direction energy value of the direction;

weighting the directional energy values of all directions in the sector area to obtain the energy sum of the sector area.

5. The method of claim 1, wherein the high frequency energy is an average high frequency energy of a plurality of frames of audio and the low frequency energy is an average low frequency energy of a plurality of frames of audio;

before determining the energy sum of each sector area based on the high frequency energy and the low frequency energy of each directional beam in the sector area, the method further comprises:

for each direction, determining the high-frequency energy and the low-frequency energy of each frame among the previous preset number of frames of the target audio;

and determining the average high-frequency energy and the average low-frequency energy of these frames.

6. A sound source localization apparatus comprising:

a beam forming unit configured to perform beam forming processing on the echo-cancelled target audio, and determine high-frequency energy and low-frequency energy of each formed directional beam;

a representing unit configured to represent beams in respective directions in the same circle, wherein the center of the circle is determined based on the position of a receiving device receiving the target audio;

an area determination unit configured to determine a plurality of sector areas in the circle by using a preset area beam number and an area interval, wherein the area beam number is the number of beams in the sector area, and the area interval is a distance between edges of the same side of two adjacent sector areas;

a direction determination unit configured to determine a sum of energies of the respective sector areas based on the high-frequency energy and the low-frequency energy of the respective directional beams in the sector areas, and to take an extension direction in which a symmetry axis of the sector area having the largest sum of energies extends outward from a center of a circle as a sound source direction.

7. The apparatus of claim 6, wherein the region determination unit is further configured to:

in the circle, take the sector area that contains a number of adjacent beams equal to the area beam number as a sliding window, take the center of the circle as the axis and the area interval as the sliding step, and slide the window clockwise or counterclockwise to obtain the sector areas, wherein each slide yields one sector area.

8. The apparatus of claim 6, wherein two side edges of the sector area coincide with two beams, respectively;

the size of each sector is the same.

9. The apparatus of claim 6, wherein the direction determination unit is further configured to:

for each direction in the sector area, weighting the high-frequency energy and the low-frequency energy of the direction to obtain a direction energy value of the direction;

weighting the directional energy values of all directions in the sector area to obtain the energy sum of the sector area.

10. The apparatus of claim 6, wherein the high frequency energy is an average high frequency energy of a plurality of frames of audio and the low frequency energy is an average low frequency energy of a plurality of frames of audio;

the device further comprises:

an energy determination unit configured to determine, for each direction, the high-frequency energy and the low-frequency energy of each frame among the previous preset number of frames of the target audio;

an average energy determination unit configured to determine the average high-frequency energy and the average low-frequency energy of these frames.

11. An electronic device, comprising:

one or more processors;

a storage device for storing one or more programs,

wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of any one of claims 1-5.

12. A computer-readable storage medium, on which a computer program is stored, which program, when being executed by a processor, carries out the method according to any one of claims 1-5.

Technical Field

The embodiment of the application relates to the technical field of computers, in particular to the technical field of internet, and particularly relates to a sound source positioning method and device.

Background

With the development of computer technology, the need for communication between humans and machines is becoming more and more urgent. Voice, one of the most natural ways for humans to interact, is also one of the main ways in which people hope to communicate with computers in place of the mouse and keyboard. As demand grows for intelligent terminals such as smart homes, intelligent vehicles and intelligent conference systems, intelligent voice systems, which serve as the entry point to such terminals, are receiving more and more attention.

The sound source positioning technology is an important technology applied to an intelligent voice system, and the accuracy of sound source positioning directly influences the user experience of the intelligent voice system.

Disclosure of Invention

The embodiment of the application provides a sound source positioning method and device.

In a first aspect, an embodiment of the present application provides a sound source localization method, including: performing beamforming on the echo-cancelled target audio, and determining the high-frequency energy and the low-frequency energy of the formed beam in each direction; representing the beams of all directions in the same circle, wherein the center of the circle is determined based on the position of a receiving device that receives the target audio; determining a plurality of sector areas in the circle by using a preset area beam number and a preset area interval, wherein the area beam number is the number of beams in a sector area, and the area interval is the number of beams by which two adjacent sector areas are separated; and determining the energy sum of each sector area based on the high-frequency energy and the low-frequency energy of the beam in each direction in the sector area, and taking the direction in which the symmetry axis of the sector area with the largest energy sum extends outward from the center of the circle as the sound source direction.

In some embodiments, determining a plurality of sector areas in the circle by using a preset area beam number and a preset area interval includes: in the circle, taking the sector area that contains a number of adjacent beams equal to the area beam number as a sliding window, taking the center of the circle as the axis and the area interval as the sliding step, and sliding the window clockwise or counterclockwise to obtain the sector areas, wherein each slide yields one sector area.

In some embodiments, two side edges of the sector area coincide with two beams, respectively; the size of each sector is the same.

In some embodiments, determining the sum of the energies of the respective sector areas based on the high frequency energy and the low frequency energy of the respective directional beams in the sector areas comprises: for each direction in the sector area, weighting the high-frequency energy and the low-frequency energy of the direction to obtain a direction energy value of the direction; and weighting the directional energy values of all directions in the sector area to obtain the energy sum of the sector area.

In some embodiments, the high-frequency energy is the average high-frequency energy of a plurality of frames of audio, and the low-frequency energy is the average low-frequency energy of the plurality of frames of audio; before determining the energy sum of each sector area based on the high-frequency energy and the low-frequency energy of the beam in each direction in the sector area, the method further comprises: for each direction, determining the high-frequency energy and the low-frequency energy of each frame among the previous preset number of frames of the target audio; and determining the average high-frequency energy and the average low-frequency energy of these frames.

In a second aspect, an embodiment of the present application provides a sound source localization apparatus, including: a beam forming unit configured to perform beam forming processing on the echo-cancelled target audio, and determine high-frequency energy and low-frequency energy of each formed directional beam; a representing unit configured to represent beams in respective directions in the same circle, wherein a center of the circle is determined based on a position where a receiving device receiving the target audio is located; an area determination unit configured to determine a plurality of sector areas in a circle by using a preset area beam number and an area interval, wherein the area beam number is the number of beams in the sector area, and the area interval is a distance between edges of the same side of two adjacent sector areas; a direction determination unit configured to determine a sum of energies of the respective sector areas based on the high-frequency energy and the low-frequency energy of the respective directional beams in the sector areas, and to take an extension direction in which a symmetry axis of the sector area having the largest sum of energies extends outward from a center of a circle as a sound source direction.

In some embodiments, the region determination unit is further configured to: in the circle, take the sector area that contains a number of adjacent beams equal to the area beam number as a sliding window, take the center of the circle as the axis and the area interval as the sliding step, and slide the window clockwise or counterclockwise to obtain the sector areas, wherein each slide yields one sector area.

In some embodiments, two side edges of the sector area coincide with two beams, respectively; the size of each sector is the same.

In some embodiments, the direction determination unit is further configured to: for each direction in the sector area, weighting the high-frequency energy and the low-frequency energy of the direction to obtain a direction energy value of the direction; and weighting the directional energy values of all directions in the sector area to obtain the energy sum of the sector area.

In some embodiments, the high-frequency energy is the average high-frequency energy of a plurality of frames of audio, and the low-frequency energy is the average low-frequency energy of the plurality of frames of audio; the apparatus further includes: an energy determination unit configured to determine, for each direction, the high-frequency energy and the low-frequency energy of each frame among the previous preset number of frames of the target audio; and an average energy determination unit configured to determine the average high-frequency energy and the average low-frequency energy of these frames.

In a third aspect, an embodiment of the present application provides an electronic device, including: one or more processors; a storage device for storing one or more programs which, when executed by one or more processors, cause the one or more processors to implement a method as in any embodiment of the sound source localization method.

In a fourth aspect, embodiments of the present application provide a computer-readable storage medium, on which a computer program is stored, which when executed by a processor, implements a method as in any of the embodiments of the sound source localization method.

According to the sound source positioning scheme provided by the embodiment of the application, firstly, the target audio after echo cancellation is subjected to beam forming processing, and high-frequency energy and low-frequency energy of formed beams in each direction are determined. Then, the beams in the respective directions are shown in the same circle with the start point of the beam as the center of the circle. Then, a plurality of sector areas are determined in the circle by using the preset number of area beams and the preset area interval, wherein the number of area beams is the number of beams in the sector area, and the area interval is the distance between the same side edges of two adjacent sector areas. And finally, determining the energy sum of each sector area based on the high-frequency energy and the low-frequency energy of each directional beam in the sector area, and taking the extension direction of the symmetrical axis of the sector area with the maximum energy sum extending outwards from the center of the circle as the sound source direction. The high-frequency energy and the low-frequency energy of each fan-shaped area are determined, so that the energy of each fan-shaped area is obtained, and the position of a sound source is located. The method does not need very high signal sampling frequency and has higher positioning precision.

Drawings

Other features, objects and advantages of the present application will become more apparent upon reading of the following detailed description of non-limiting embodiments thereof, made with reference to the accompanying drawings in which:

FIG. 1 is an exemplary system architecture diagram in which the present application may be applied;

FIG. 2a is a flow chart of one embodiment of a sound source localization method according to the present application;

FIG. 2b is a schematic illustration of a sector-shaped area of a sound source localization method according to the present application;

FIG. 3 is a schematic diagram of an application scenario of a sound source localization method according to the present application;

FIG. 4a is a flow chart of yet another embodiment of a sound source localization method according to the present application;

FIG. 4b is a schematic view of a sector-shaped area according to yet another embodiment of a sound source localization method according to the present application;

FIG. 5 is a schematic structural diagram of one embodiment of a sound source localization apparatus according to the present application;

FIG. 6 is a schematic block diagram of a computer system suitable for use in implementing an electronic device according to embodiments of the present application.

Detailed Description

The present application will be described in further detail with reference to the following drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the relevant invention and not restrictive of the invention. It should be noted that, for convenience of description, only the portions related to the related invention are shown in the drawings.

It should be noted that the embodiments and features of the embodiments in the present application may be combined with each other without conflict. The present application will be described in detail below with reference to the embodiments with reference to the attached drawings.

Fig. 1 shows an exemplary system architecture 100 to which embodiments of the sound source localization method or sound source localization apparatus of the present application may be applied.

As shown in fig. 1, the system architecture 100 may include terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 serves as a medium for providing communication links between the terminal devices 101, 102, 103 and the server 105. Network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, to name a few.

The user may use the terminal devices 101, 102, 103 to interact with the server 105 via the network 104 to receive or send messages or the like. Various communication client applications, such as a sound source positioning application, a voice recognition application, a voice interaction application, a video application, a live broadcast application, an instant messaging tool, a mailbox client, social platform software, and the like, may be installed on the terminal devices 101, 102, and 103.

Here, the terminal devices 101, 102, and 103 may be hardware or software. When the terminal devices 101, 102, 103 are hardware, they may be various electronic devices having a display screen, including but not limited to smart phones, tablet computers, e-book readers, laptop computers, desktop computers, and the like. When the terminal devices 101, 102, 103 are software, they can be installed in the electronic devices listed above. They may be implemented as multiple pieces of software or software modules (e.g., to provide distributed services) or as a single piece of software or software module. No particular limitation is imposed here.

The server 105 may be a server providing various services, such as a background server providing support for the terminal devices 101, 102, 103. The background server may analyze and otherwise process the received data such as the image, and feed back a processing result (e.g., an image showing lines) to the terminal device.

It should be noted that the sound source positioning method provided in the embodiment of the present application may be executed by the server 105 or the terminal devices 101, 102, and 103, and accordingly, the sound source positioning apparatus may be disposed in the server 105 or the terminal devices 101, 102, and 103.

It should be understood that the number of terminal devices, networks, and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.

With continued reference to FIG. 2a, a flow 200 of one embodiment of a sound source localization method according to the present application is shown. The sound source positioning method comprises the following steps:

step 201, performing beam forming processing on the target audio after echo cancellation, and determining high-frequency energy and low-frequency energy of formed beams in each direction.

In this embodiment, the execution body of the sound source localization method (for example, the server or a terminal device shown in fig. 1) may perform beamforming on target audio that has undergone echo cancellation, forming a plurality of beams in different directions. The high-frequency energy and the low-frequency energy of the beam formed in each direction are then determined. Echo cancellation removes the echo of emitted sound. Echoes may arrive from various directions and can interfere significantly with the determination of the sound source. Cancelling the echo before determining the sound source direction therefore allows the direction to be determined more accurately.

Both high frequency and low frequency refer to sound frequencies within a predetermined frequency range, and the frequencies in the high-frequency range are greater than those in the low-frequency range. For example, a single frequency value may be taken as the boundary between high frequency and low frequency, and this boundary may be, for example, 2000 Hz. In particular, a beam may be represented as a spectrum of sound waves, with time on the abscissa and frequency on the ordinate. In this spectrum, the high-frequency and low-frequency sound waves can be identified, and their energies calculated as the high-frequency energy and the low-frequency energy, respectively.
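As an illustration of the high/low split described above, the following is a minimal sketch only, not the implementation of the application. It assumes that energy is measured as the sum of squared DFT magnitudes of one frame, and that a single boundary of 2000 Hz is used. The function name `band_energies` and its parameters are hypothetical, and a naive DFT is used purely to keep the example dependency-free.

```python
import cmath
import math


def band_energies(frame, sample_rate, boundary_hz=2000.0):
    """Return (low_energy, high_energy) for one frame of real-valued samples.

    Assumption: energy is the sum of squared DFT magnitudes of the bins
    on each side of boundary_hz. A naive O(n^2) DFT is used for clarity.
    """
    n = len(frame)
    low = 0.0
    high = 0.0
    # For a real signal, bins 0..n//2 cover all distinct frequencies.
    for k in range(n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(frame))
        power = abs(coeff) ** 2
        freq = k * sample_rate / n  # center frequency of bin k in Hz
        if freq >= boundary_hz:
            high += power
        else:
            low += power
    return low, high


# Usage: a 500 Hz tone falls in the low band, a 4000 Hz tone in the high band.
rate = 16000
tone_low = [math.sin(2 * math.pi * 500 * t / rate) for t in range(160)]
tone_high = [math.sin(2 * math.pi * 4000 * t / rate) for t in range(160)]
print(band_energies(tone_low, rate))
print(band_energies(tone_high, rate))
```

In a real system the same split would normally be computed with an FFT over each beamformed frame; the boundary value and the energy definition remain design choices.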

In practice, beamforming may be performed using various beamforming techniques, for example a minimum variance distortionless response (MVDR) beamformer or a linearly constrained minimum variance (LCMV) beamformer. Specifically, the sound pickup apparatus that receives audio may be a single sound pickup or a combination of multiple sound pickups, that is, a microphone array, in which the pickups each receive their own audio. The individual audio signals received by the microphone array need to be processed to obtain the beam in each direction. Thus, the target audio may be a single audio signal or a plurality of audio signals received by a combination of microphones.

In practice, the high frequency energy and the low frequency energy may be determined in a variety of ways. For example, the high frequency energy and the low frequency energy may be a sequence of the high frequency energy and the low frequency energy of each frame in the previous n frames of audio including the current frame (the latest frame) in the target audio. Alternatively, the high frequency energy and the low frequency energy may be an average of the high frequency energy and an average of the low frequency energy, respectively. Alternatively, the high frequency energy and the low frequency energy may also be the high frequency energy and the low frequency energy of the current frame of the target audio, respectively.

Step 202, representing the beams in all directions in the same circle, wherein the center of the circle is determined based on the position of the receiving device receiving the target audio.

In this embodiment, the execution body may show beams in various directions in the same circle. In particular, the center of the circle may be determined in a variety of ways. For example, the audio receiving position may be used as a center of a circle, and the beams in the respective directions are shown in the same circle. That is, when receiving audio by using the microphone array, the audio receiving position of each microphone can be approximated to a point, and the point is taken as the center of a circle. Alternatively, the beams may coincide with the radius of a circle, and the microphones in the audio receiving device may be located within the radii of the circle. Thus, the beams in all directions in the circle point to all directions with the circle center as a starting point. The resulting beam for each direction through the beamforming process may be represented in this circle.

Step 203, determining a plurality of sector areas in the circle by using the preset number of area beams and the preset area interval, wherein the number of area beams is the number of beams in the sector area, and the area interval is the number of beams spaced between two adjacent sector areas.

In this embodiment, the execution body may determine a plurality of sector areas in the circle by using a preset area beam number and a preset area interval. The preset area beam number may be the same for every sector area or may differ between sector areas. The determined sector areas may overlap. For example, as shown in fig. 2b, the circle in the figure includes four adjacent beams L1, L2, R1 and R2 and two sector areas, sector F1 and sector F2, whose edges are L1, R1 and L2, R2, respectively. Since beam L1 and beam L2 are adjacent, the corresponding edges of the two sectors are one beam apart, so the area interval of the two sectors is 1.

In practice, once the area beam number of each sector and the area interval of two adjacent sectors are known, the sectors may be divided starting from a predetermined point (e.g., point a in fig. 2b). For example, the predetermined point may be taken as a point on one side edge of a sector area, and the area beam number may be used to determine where the other side edge of that sector area intersects the circle. The two side edges of the adjacent sector area are then determined, and so on until every sector area has been determined. The edges of a sector may coincide with beams, in which case the beams that coincide with the edges are counted in the area beam number. Alternatively, the edges of a sector need not coincide with beams; for example, the sector may be obtained by extending outward by 1 degree from each of the two beams closest to the edges.

In some optional implementations of this embodiment, two side edges of the sector area coincide with the two beams, respectively; the size of each sector is the same.

In these alternative implementations, beams may be used as the edges of the sector areas when dividing the circle, so that the division is aligned with the beam positions and each sector area can be determined accurately. For example, if the preset area beam number is 5, each of the two side edges of the sector area coincides with one beam, and three beams lie between them. The same size here means that the central angles of the sector areas are the same.

In these implementations, the sector areas have the same size, so the circle can be divided uniformly and the sound source direction obtained efficiently. Moreover, when the edges of the sector areas coincide with beams, the execution body can determine the sector areas more quickly and accurately.
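The sliding-window division of step 203 can be sketched as follows. This is an illustrative sketch under stated assumptions, not the implementation of the application: beams are assumed to be indexed 0 to N-1 in order around the circle, sector edges are assumed to coincide with beams, and the area interval is interpreted as the number of beams by which the window is moved per slide. The function name `sector_windows` is hypothetical.

```python
def sector_windows(num_beams, beams_per_sector, step):
    """Enumerate sector areas as lists of beam indices around a circle.

    num_beams: total number of beams, indexed 0..num_beams-1 around the circle.
    beams_per_sector: the area beam number (edge beams are counted).
    step: the area interval, i.e. how many beams the window slides each time.
    Indices wrap around, so the last sectors may cross index 0.
    """
    if not 1 <= beams_per_sector <= num_beams:
        raise ValueError("beams_per_sector must be between 1 and num_beams")
    if step < 1:
        raise ValueError("step must be at least 1")
    sectors = []
    for start in range(0, num_beams, step):
        # One slide of the window yields one sector area.
        sectors.append([(start + i) % num_beams
                        for i in range(beams_per_sector)])
    return sectors


# Usage: 8 beams, 3 beams per sector, sliding by 1 beam gives 8 overlapping sectors.
for sector in sector_windows(8, 3, 1):
    print(sector)
```

With a step smaller than the area beam number the sectors overlap, which matches the remark above that the determined sector areas may overlap.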

And step 204, determining the energy sum of each sector area based on the high-frequency energy and the low-frequency energy of each directional beam in the sector area, and taking the extension direction of the symmetry axis of the sector area with the maximum energy sum, which extends outwards from the center of the circle, as the sound source direction.

In this embodiment, the execution body may determine the energy sum of each sector area based on the high-frequency energy and the low-frequency energy of the beam in each direction in the sector area, and take the direction in which the symmetry axis of the sector area with the largest energy sum extends outward from the center of the circle as the sound source direction. The energy sum of a sector area indicates how likely it is that the direction in which the symmetry axis of that sector area extends outward from the center of the circle is the direction of the sound source. The larger the energy sum, the greater the likelihood. In practice, the energy sum of a sector area may be determined in various ways; for example, the sum of the high-frequency energy and the low-frequency energy of the beams in the sector area may be taken as the energy sum of the sector area.
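Step 204 in its simplest form (plain sum of high-frequency and low-frequency energy) might be sketched as below. The sketch assumes, beyond what the text states, that the beams are evenly spaced around the circle with beam 0 at 0 degrees, and that sector edges coincide with beams, so that the symmetry axis lies midway between the first and last beam of the sector. The function name `locate_source` is hypothetical.

```python
def locate_source(high, low, beams_per_sector, step):
    """Return (angle_in_degrees, energy_sum) of the strongest sector.

    high, low: per-beam high-frequency and low-frequency energies, listed
    in order around the circle (assumed evenly spaced, beam 0 at 0 degrees).
    """
    num_beams = len(high)
    spacing = 360.0 / num_beams  # angle between adjacent beams
    best_angle = None
    best_sum = None
    for start in range(0, num_beams, step):
        indices = [(start + i) % num_beams for i in range(beams_per_sector)]
        total = sum(high[i] + low[i] for i in indices)
        if best_sum is None or total > best_sum:
            best_sum = total
            # The symmetry axis is midway between the sector's edge beams.
            best_angle = ((start + (beams_per_sector - 1) / 2.0) * spacing) % 360.0
    return best_angle, best_sum


# Usage: 8 beams, energy concentrated around beam 3 (135 degrees).
high = [0, 0, 1, 5, 1, 0, 0, 0]
low = [0, 0, 1, 4, 1, 0, 0, 0]
print(locate_source(high, low, beams_per_sector=3, step=1))  # (135.0, 13)
```

When two sectors tie, this sketch keeps the first one encountered; the text does not specify a tie-breaking rule.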

In some optional implementations of this embodiment, the "determining the energy sum of each sector area based on the high-frequency energy and the low-frequency energy of each directional beam in the sector area" in step 204 may include:

for each direction in the sector area, weighting the high-frequency energy and the low-frequency energy of the direction to obtain a direction energy value of the direction; and weighting the directional energy values of all directions in the sector area to obtain the energy sum of the sector area.

In these alternative implementations, the execution body may weight, for each direction in the sector region, the high-frequency energy and the low-frequency energy of the direction to obtain a directional energy value of the direction. Then, the execution body may weight the directional energy values of the respective directions in the sector area, obtain a weighted sum of the directional energy values of the respective directions in the sector area, and use the weighted sum of the respective directional energy values as the energy sum. Specifically, the weight of the high frequency energy and the weight of the low frequency energy may be the same or different in the same direction. The weights of the directional energy values for different directions may be the same or different in the same sector.

The weight of the high frequency energy and the weight of the low frequency energy may be preset here. Different weights are set for energy with different frequencies, so that the sound source directions of sounds with different frequencies can be better judged. The same weight may be generally set for the directional energy values of the respective directions. In addition, when the possible directions of the sound source direction are roughly known, beams in different directions can be given different weights to obtain an accurate sound source direction.
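The two-level weighting described above can be sketched as follows. This is an illustration only: the weight values 0.6 and 0.4 are arbitrary placeholders rather than values from the application, and the function names `direction_energy` and `sector_energy_sum` are hypothetical.

```python
def direction_energy(high, low, w_high=0.6, w_low=0.4):
    """Weight the high- and low-frequency energy of one direction."""
    return w_high * high + w_low * low


def sector_energy_sum(beam_indices, high, low, beam_weights=None,
                      w_high=0.6, w_low=0.4):
    """Weighted sum of the directional energy values of one sector area.

    beam_indices: indices of the beams that belong to the sector.
    beam_weights: optional per-beam weights; equal weights of 1.0 are used
    when omitted, matching the common case described in the text.
    """
    if beam_weights is None:
        beam_weights = [1.0] * len(high)
    return sum(beam_weights[i] * direction_energy(high[i], low[i], w_high, w_low)
               for i in beam_indices)


# Usage: equal band weights, then extra weight on the middle beam.
high = [2.0, 4.0, 6.0]
low = [1.0, 1.0, 1.0]
print(sector_energy_sum([0, 1, 2], high, low, w_high=0.5, w_low=0.5))
print(sector_energy_sum([0, 1, 2], high, low, beam_weights=[1.0, 2.0, 1.0],
                        w_high=0.5, w_low=0.5))
```

Passing non-uniform `beam_weights` corresponds to the case where the likely source direction is roughly known in advance.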

In some optional implementations of this embodiment, the high frequency energy is an average high frequency energy of a plurality of frames of the audio, and the low frequency energy is an average low frequency energy of the plurality of frames of the audio;

before step 204, the method further includes:

for each direction, determining the high-frequency energy and the low-frequency energy of each frame among a preset number of preceding frames of the target audio; and determining the average high-frequency energy and the average low-frequency energy over those frames.

In these alternative implementations, the execution body may determine the high-frequency energy and the low-frequency energy of each frame among the preset number of preceding frames. For example, the plurality of frames may be the most recent 100 frames of the audio, including the current frame. The average of the high-frequency energies of these frames is then determined as the average high-frequency energy, and the average of the low-frequency energies of these frames is determined as the average low-frequency energy. The energy sum of each sector area can then be determined using the average high-frequency energy and the average low-frequency energy.

These implementations avoid the problem that the energy values of a single frame may deviate considerably, and obtain more accurate high-frequency energy and low-frequency energy through averaging. The energy sum of each sector area is therefore more accurate, and so is the sound source direction finally obtained.
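A minimal sketch of the per-direction frame averaging is given below, assuming a fixed-length sliding history per direction. The class name `FrameEnergyAverager`, its methods, and the use of `collections.deque` are illustrative assumptions; the window length of 100 follows the example in the text.

```python
from collections import deque


class FrameEnergyAverager:
    """Keeps the most recent `window` frames of energy for every beam direction."""

    def __init__(self, num_directions, window=100):
        if num_directions <= 0 or window <= 0:
            raise ValueError("num_directions and window must be positive")
        # One bounded buffer per direction; old frames drop out automatically.
        self._high = [deque(maxlen=window) for _ in range(num_directions)]
        self._low = [deque(maxlen=window) for _ in range(num_directions)]

    def push(self, high_per_direction, low_per_direction):
        """Add the energies of the current frame, one value per direction."""
        if (len(high_per_direction) != len(self._high)
                or len(low_per_direction) != len(self._low)):
            raise ValueError("one value per direction is required")
        for buf, value in zip(self._high, high_per_direction):
            buf.append(value)
        for buf, value in zip(self._low, low_per_direction):
            buf.append(value)

    def averages(self):
        """Return [(average_high, average_low), ...], one pair per direction."""
        if not self._high[0]:
            raise ValueError("no frames have been pushed yet")
        return [
            (sum(h) / len(h), sum(l) / len(l))
            for h, l in zip(self._high, self._low)
        ]
```

Until the history is full, the average is taken over the frames received so far, which is one possible way to handle the start of the audio.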

With continued reference to fig. 3, fig. 3 is a schematic diagram of an application scenario of the sound source localization method according to the present embodiment. In the application scenario of fig. 3, the execution body 301 may perform beamforming processing on the echo-cancelled target audio 302 and determine the high-frequency energy and low-frequency energy 304 of each formed directional beam 303. The beams in the respective directions are represented in the same circle, with the starting point of the beams as the center of the circle. A plurality of sector areas 307 are determined in the circle by using a preset area beam number (e.g., 3) 305, which is the number of beams in a sector area, and a preset area interval (e.g., 1 beam) 306, which is the distance between the same-side edges of two adjacent sector areas. Based on the high-frequency energy and low-frequency energy 308 of the directional beams in each sector area, the energy sum 309 of each sector area is determined, and the extension direction in which the symmetry axis of the sector area with the largest energy sum extends outward from the center of the circle is taken as the sound source direction 310.

The method provided by the above embodiment of the present application can determine the high-frequency energy and the low-frequency energy of each sector area to obtain the energy sum of each sector area and thereby locate the sound source. The method does not require a very high signal sampling frequency and achieves relatively high positioning accuracy.

With further reference to fig. 4a, a flow 400 of yet another embodiment of a sound source localization method is shown. The process 400 of the sound source localization method includes the following steps:

Step 401, performing beam forming processing on the echo-cancelled target audio, and determining the high-frequency energy and the low-frequency energy of each formed directional beam.

In the present embodiment, an execution subject of the sound source localization method (for example, a server or a terminal device shown in fig. 1) may perform beamforming processing on target audio that has undergone echo cancellation to form a plurality of beams in different directions. Thereafter, the high frequency energy and the low frequency energy of the formed respective directional beams are determined.
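As a rough illustration of how the high-frequency energy and low-frequency energy of one formed beam might be obtained, the sketch below splits the spectrum of a single frame at a boundary frequency and sums the power on each side. It is a sketch under stated assumptions: the function name `band_energies` and the parameter `split_hz` are hypothetical, this passage does not state where the boundary between high and low frequency lies, and a plain discrete Fourier transform is used only to keep the example self-contained.

```python
import cmath
import math


def band_energies(frame, sample_rate, split_hz):
    """Return (high_energy, low_energy) for one frame of one beam signal.

    frame: sequence of real samples produced by beamforming for one direction.
    split_hz: assumed boundary between the low and the high band.
    Energies are sums of squared DFT magnitudes over the one-sided spectrum,
    so they are relative measures suitable for comparing directions.
    """
    n = len(frame)
    if n == 0:
        raise ValueError("frame must not be empty")
    high = 0.0
    low = 0.0
    for k in range(n // 2 + 1):
        # Direct DFT of bin k; fine for short frames, slow for long ones.
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame)
        )
        power = abs(coeff) ** 2
        if k * sample_rate / n < split_hz:
            low += power
        else:
            high += power
    return high, low
```

For example, a 1000 Hz tone sampled at 8000 Hz with `split_hz=2000` puts nearly all of its energy in the low band, while a 3000 Hz tone puts it in the high band. A practical system would more likely use a fast Fourier transform or band-pass filters.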

Step 402, using the starting point of the beam as the center of the circle, and representing the beams in each direction in the same circle.

In this embodiment, the execution body may use the start point of the beam as a center of a circle, and show the beams in the respective directions in the same circle. Thus, the beams in all directions in the circle point to all directions with the circle center as a starting point. The resulting beam for each direction through the beamforming process may be represented in this circle.
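One simple way to picture this representation is to assign each beam an angle around the circle and a unit vector pointing outward from the center. The sketch below assumes, purely for illustration, that the beams are evenly spaced; the function name `beam_directions` and the parameter `start_deg` are hypothetical.

```python
import math


def beam_directions(num_beams, start_deg=0.0):
    """Return [(angle_deg, x, y), ...] for beams drawn from the circle center.

    Assumes evenly spaced beams, which the source does not require.
    (x, y) is the unit vector from the center along the beam.
    """
    if num_beams <= 0:
        raise ValueError("num_beams must be positive")
    spacing = 360.0 / num_beams
    directions = []
    for i in range(num_beams):
        angle = (start_deg + i * spacing) % 360.0
        rad = math.radians(angle)
        directions.append((angle, math.cos(rad), math.sin(rad)))
    return directions
```

If the beams are not evenly spaced, the list of angles would simply be supplied directly instead of being computed from the beam count.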

Step 403, in the circle, taking the sector area occupied by a number of adjacent beams equal to the area beam number as a sliding window, the center of the circle as the axis, and the area interval as the sliding step, sliding clockwise or counterclockwise to obtain the sector areas, wherein one sector area is obtained each time the window slides.

In this embodiment, the execution body may start from a preset starting point on the circle and slide, taking the sector area occupied by the fixed number of beams as the sliding window, the center of the circle as the sliding axis, and the preset area interval as the sliding step. Depending on whether adjacent beams are equally spaced, the sliding window may have the same size or different sizes at each position. As shown in fig. 4b, in the case of a large sliding window, there are 8 sector areas; W1 and W2 are two adjacent sector areas, and the step between them is S.
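The sliding-window construction can be sketched in terms of beam indices. The sketch assumes that the beams are numbered consecutively around the circle in the sliding direction, that the area interval is expressed as a number of beams, that the window wraps around past the last beam, and that sliding stops after one full revolution; the function name `sliding_sectors` is hypothetical.

```python
def sliding_sectors(num_beams, beams_per_sector, step):
    """List the beam indices covered by each sector area.

    num_beams: total number of beams around the circle.
    beams_per_sector: the preset area beam number (window size).
    step: the preset area interval, counted in beams.
    """
    if not 0 < beams_per_sector <= num_beams:
        raise ValueError("beams_per_sector must be between 1 and num_beams")
    if step <= 0:
        raise ValueError("step must be positive")
    sectors = []
    start = 0
    while start < num_beams:
        # Modulo arithmetic lets the window wrap around the circle.
        sectors.append([(start + i) % num_beams for i in range(beams_per_sector)])
        start += step
    return sectors
```

With 8 beams, a window of 3 beams and a step of 1 beam, this yields 8 overlapping sector areas, from `[0, 1, 2]` to `[7, 0, 1]`; a step of 2 yields 4 sector areas.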

Step 404, determining the energy sum of each sector area based on the high-frequency energy and the low-frequency energy of each directional beam in the sector area, and taking the extension direction of the symmetry axis of the sector area with the maximum energy sum, which extends outwards from the center of the circle, as the sound source direction.

In this embodiment, the execution body may determine the energy sum of each sector area based on the high-frequency energy and the low-frequency energy of each directional beam in the sector area, and take the extension direction in which the symmetry axis of the sector area with the largest energy sum extends outward from the center of the circle as the sound source direction. The energy sum of a sector area indicates how likely it is that the sound source lies in the direction in which the symmetry axis of that sector area extends outward from the center of the circle: the larger the energy sum, the greater the likelihood.
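The selection in step 404 can be sketched as a search over sliding sector areas for the largest energy sum. The sketch makes several assumptions that are not stated in the source: the beams are evenly spaced, beam i points at angle i × 360 / N degrees, each beam already has a single directional energy value combining its high- and low-frequency energy, and the area interval is counted in beams. The function name `locate_source` is hypothetical.

```python
def locate_source(energies, beams_per_sector, step):
    """Return (axis_angle_deg, energy_sum) of the sector with the largest sum.

    energies: one combined directional energy value per beam, ordered around
    the circle. The symmetry axis of a sector lies midway between its first
    and last beam.
    """
    n = len(energies)
    if not 0 < beams_per_sector <= n:
        raise ValueError("beams_per_sector must be between 1 and len(energies)")
    if step <= 0:
        raise ValueError("step must be positive")
    spacing = 360.0 / n
    best_sum = None
    best_axis = None
    for start in range(0, n, step):
        total = sum(energies[(start + i) % n] for i in range(beams_per_sector))
        if best_sum is None or total > best_sum:
            best_sum = total
            # Midpoint of the window, measured in beam spacings from beam 0.
            best_axis = ((start + (beams_per_sector - 1) / 2) * spacing) % 360.0
    return best_axis, best_sum
```

For instance, with eight beams whose energies are `[1, 1, 5, 9, 5, 1, 1, 1]`, a window of 3 beams and a step of 1, the window covering beams 2 to 4 has the largest sum (19), and its symmetry axis points along beam 3 at 135 degrees. Ties are resolved in favor of the first sector encountered, which is an arbitrary choice of this sketch.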

The present embodiment may slide a sliding window multiple times to obtain the sector areas, so that the plurality of sector areas are obtained efficiently and accurately.

With further reference to fig. 5, as an implementation of the methods shown in the above-mentioned figures, the present application provides an embodiment of a sound source localization apparatus, which corresponds to the method embodiment shown in fig. 2, and which is particularly applicable in various electronic devices.

As shown in fig. 5, the sound source localization apparatus 500 of the present embodiment includes: a beam forming unit 501, a representing unit 502, an area determination unit 503, and a direction determination unit 504. The beam forming unit 501 is configured to perform beam forming processing on the echo-cancelled target audio and determine the high-frequency energy and the low-frequency energy of each formed directional beam. The representing unit 502 is configured to represent the beams in the respective directions in the same circle, with the starting point of the beams as the center of the circle. The area determination unit 503 is configured to determine a plurality of sector areas in the circle by using a preset area beam number and a preset area interval, wherein the area beam number is the number of beams in a sector area, and the area interval is the distance between the same-side edges of two adjacent sector areas. The direction determination unit 504 is configured to determine the energy sum of each sector area based on the high-frequency energy and the low-frequency energy of each directional beam in the sector area, and to take the extension direction in which the symmetry axis of the sector area with the largest energy sum extends outward from the center of the circle as the sound source direction.

In some embodiments, the beam forming unit 501 of the sound source localization apparatus 500 may perform beam forming processing on the target audio that has undergone echo cancellation to form a plurality of beams in different directions. Thereafter, the high-frequency energy and the low-frequency energy of each formed directional beam are determined. Echo cancellation removes the echo of the emitted sound. Echoes may arrive from various directions and are likely to interfere significantly with the determination of the sound source. Therefore, the echo can be cancelled before the sound source direction is determined, so that the direction is determined more accurately.

In some embodiments, the representing unit 502 may represent beams in various directions in the same circle. In particular, the center of the circle may be determined in a variety of ways. For example, the audio receiving position may be used as a center of a circle, and the beams in the respective directions are shown in the same circle. That is, when receiving audio by using the microphone array, the audio receiving position of each microphone can be approximated to a point, and the point is taken as the center of a circle. Alternatively, the beams may coincide with the radius of a circle, and the microphones in the audio receiving device may be located within the radii of the circle.

In some embodiments, the area determination unit 503 may determine a plurality of sector areas in the circle by using a preset number of area beams and a preset area interval. The number of area beams included in each preset sector area may be equal or different. There may be an overlap between the determined respective sector areas.

In some embodiments, the direction determination unit 504 may determine the energy sum of each sector area based on the high-frequency energy and the low-frequency energy of each directional beam in the sector area, and take the extension direction in which the symmetry axis of the sector area with the largest energy sum extends outward from the center of the circle as the sound source direction. The energy sum of a sector area indicates how likely it is that the sound source lies in the direction in which the symmetry axis of that sector area extends outward from the center of the circle: the larger the energy sum, the greater the likelihood. In practice, the energy sum of a sector area may be determined in various ways; for example, the sum of the high-frequency energy and the low-frequency energy of the beams in the sector area may be taken as the energy sum of the sector area.

In some optional implementations of this embodiment, the area determination unit is further configured to: in the circle, take the sector area occupied by a number of adjacent beams equal to the area beam number as a sliding window, the center of the circle as the axis, and the area interval as the sliding step, and slide clockwise or counterclockwise to obtain the sector areas, wherein one sector area is obtained each time the window slides.

In some optional implementations of this embodiment, two side edges of the sector area coincide with the two beams, respectively; the size of each sector is the same.

In some optional implementations of this embodiment, the direction determining unit is further configured to: for each direction in the sector area, weighting the high-frequency energy and the low-frequency energy of the direction to obtain a direction energy value of the direction; and weighting the directional energy values of all directions in the sector area to obtain the energy sum of the sector area.

In some optional implementations of this embodiment, the high-frequency energy is an average high-frequency energy of a plurality of frames of the audio, and the low-frequency energy is an average low-frequency energy of the plurality of frames of the audio; the apparatus further includes: an energy determination unit configured to determine, for each direction, the high-frequency energy and the low-frequency energy of each frame among a preset number of preceding frames of the target audio; and an average energy determination unit configured to determine the average high-frequency energy and the average low-frequency energy over those frames.

Referring now to FIG. 6, shown is a block diagram of a computer system 600 suitable for use in implementing the electronic device of an embodiment of the present application. The electronic device shown in fig. 6 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present application.

As shown in fig. 6, computer system 600 includes a processor (e.g., central processing unit, graphics processor, etc.) 601, which may perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM) 602 or a program loaded from a storage section 606 into a Random Access Memory (RAM) 603. The RAM 603 also stores various programs and data necessary for the operation of the system 600. The processor 601, the ROM 602, and the RAM 603 are connected to each other via a bus 604. An Input/Output (I/O) interface 605 is also connected to the bus 604.

The following components are connected to the I/O interface 605: a storage section 606 including a hard disk and the like; and a communication section 607 including a network interface card such as a LAN (Local Area Network) card, a modem, or the like. The communication section 607 performs communication processing via a network such as the Internet. A drive 608 is also connected to the I/O interface 605 as needed. A removable medium 609, such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory, is mounted on the drive 608 as necessary, so that a computer program read from it can be installed into the storage section 606 as needed.

In particular, according to an embodiment of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated in the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network through the communication section 607 and/or installed from the removable medium 609. The computer program, when executed by the processor 601, performs the above-described functions defined in the method of the present application. It should be noted that the computer readable medium of the present application can be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present application, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. 
In this application, however, a computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.

The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.

The units described in the embodiments of the present application may be implemented by software or hardware. The described units may also be provided in a processor, which may be described as: a processor including a beam forming unit, a representing unit, an area determination unit, and a direction determination unit. In some cases the names of these units do not limit the units themselves; for example, the beam forming unit may also be described as "a unit that performs beam forming processing on the echo-cancelled target audio and determines the high-frequency energy and the low-frequency energy of each formed directional beam".

As another aspect, the present application also provides a computer-readable medium, which may be contained in the apparatus described in the above embodiments, or may exist separately without being assembled into the apparatus. The computer-readable medium carries one or more programs which, when executed by the apparatus, cause the apparatus to: perform beam forming processing on the echo-cancelled target audio, and determine the high-frequency energy and the low-frequency energy of each formed directional beam; represent the beams in all directions in the same circle, wherein the center of the circle is determined based on the position of a receiving device for receiving the target audio; determine a plurality of sector areas in the circle by using a preset area beam number and a preset area interval, wherein the area beam number is the number of beams in a sector area, and the area interval is the distance between the same-side edges of two adjacent sector areas; and determine the energy sum of each sector area based on the high-frequency energy and the low-frequency energy of each directional beam in the sector area, and take the extension direction in which the symmetry axis of the sector area with the largest energy sum extends outward from the center of the circle as the sound source direction.

The above description is only a preferred embodiment of the application and is illustrative of the principles of the technology employed. It will be appreciated by those skilled in the art that the scope of the invention herein disclosed is not limited to the particular combination of features described above, but also encompasses other arrangements formed by any combination of the above features or their equivalents without departing from the spirit of the invention. For example, the above features may be replaced with (but not limited to) features having similar functions disclosed in the present application.
