Microphone array, recording device and method, and program

Document No.: 1382846. Publication date: 2020-08-14.

Summary: The present technology, "Microphone array, recording device and method, and program", was created by 廖伟翔, 大迫庆一, and 光藤祐基 on 2019-02-15. The present technology relates to a microphone array, a recording apparatus and method, and a program that enable broadband sound field recording to be performed at low cost. The microphone array for sound field recording is composed of a plurality of sub-arrays. Each sub-array includes a plurality of microphones and has a discrete rotationally symmetric shape with a specified radius, and when the radii of the plurality of sub-arrays are arranged as a sequence, the sequence is a generalized arithmetic progression. The present technology can be applied to, for example, a microphone array and a recording apparatus.

1. A microphone array for sound field recording, the microphone array comprising:

a plurality of sub-arrays, each sub-array comprising a plurality of microphones, and each sub-array having a discrete rotationally symmetric shape comprising a specified radius, wherein,

when the radius values of the plurality of sub-arrays are arranged as a sequence, the sequence is a generalized arithmetic progression.

2. The microphone array of claim 1,

each of the plurality of microphones included in the sub-array is arranged away from a center position of the microphone array by a distance corresponding to a radius of the sub-array.

3. The microphone array of claim 1,

when at least one of a zoom-in operation, a zoom-out operation, a rotation operation, and an inversion operation is performed on one of the plurality of sub-arrays, the one of the plurality of sub-arrays coincides with other sub-arrays of the plurality of sub-arrays.

4. The microphone array of claim 1,

the plurality of microphones are arranged such that: when all of the plurality of microphones included in the microphone array are radially projected onto a ring centered on a center position of the microphone array, projected microphones of the plurality of microphones are equally spaced on the ring.

5. The microphone array of claim 1,

all of the plurality of microphones included in the microphone array are omni-directional microphones or at least one of the plurality of microphones included in the microphone array is not an omni-directional microphone.

6. A recording apparatus, comprising:

a spherical harmonic coefficient calculator that calculates spherical harmonic coefficients based on multi-channel signals obtained by sound collection performed by a microphone array for sound field recording, the microphone array including a plurality of sub-arrays, each sub-array including a plurality of microphones, and each sub-array having a discrete rotationally symmetric shape including a specified radius, wherein,

when the radius values of the plurality of sub-arrays are arranged as a sequence, the sequence is a generalized arithmetic progression.

7. The recording apparatus according to claim 6,

the spherical harmonic coefficient calculator calculates the spherical harmonic coefficient by performing mode compensation.

8. The recording apparatus according to claim 7, further comprising:

a spatial resolution controller to limit a number of rows of a transform matrix used to perform the mode compensation based on a particular order of a spherical harmonic domain.

9. The recording apparatus according to claim 8,

the spatial resolution controller determines the particular order based on a maximum value of the radii of the plurality of sub-arrays.

10. The recording apparatus according to claim 8,

the spherical harmonic coefficient calculator calculates the spherical harmonic coefficient by performing the mode compensation based on the multi-channel signal and a pseudo-inverse matrix of the transform matrix whose number of rows has been limited.

11. A recording method, comprising:

calculating, by a recording apparatus, spherical harmonic coefficients based on a multi-channel signal obtained by performing sound collection by a microphone array for sound field recording, the microphone array including a plurality of sub-arrays, each sub-array including a plurality of microphones, and each sub-array having a discrete rotationally symmetric shape including a specified radius, wherein,

when the radius values of the plurality of sub-arrays are arranged as a sequence, the sequence is a generalized arithmetic progression.

12. A program for causing a computer to execute processing, comprising:

calculating spherical harmonic coefficients based on multi-channel signals obtained by performing sound collection by a microphone array for sound field recording, the microphone array including a plurality of sub-arrays, each sub-array including a plurality of microphones, and each sub-array having a discrete rotationally symmetric shape including a specified radius, wherein,

when the radius values of the plurality of sub-arrays are arranged as a sequence, the sequence is a generalized arithmetic progression.

Technical Field

The present technology relates to a microphone array, a recording apparatus, a recording method, and a program, and particularly to a microphone array, a recording apparatus, a recording method, and a program capable of performing broadband sound field recording at low cost.

Background

In recent years, the recording and reproduction of sound waves has become widespread in the audio industry. Compared with the multi-channel reproduction technology in the past, the technology of synthesizing and reconstructing a wave surface makes it possible to localize a sound image of an object disposed in space and perform spatial noise cancellation, and thus can provide a more realistic acoustic experience.

For example, open circular microphone arrays, including omni-directional microphones, are used in a variety of applications.

However, this microphone arrangement of a circular microphone array is not suitable for recording a wave front (sound field) over a wide frequency range. The reason is that, when a circular microphone array is used, the mode function used to obtain the spherical harmonic coefficients of the recorded sound wave surface, which is called a Bessel function, is zero in specific frequency ranges.

Therefore, in order to reduce the region where the mode function is zero, for example, a plurality of microphones are arranged on two or more circles, cardioid directional microphones are used (for example, refer to non-patent document 1), or a rigid baffle is used.

Further, there are some array recording techniques using an omnidirectional microphone (for example, refer to non-patent documents 2 and 3 and patent documents 1 to 3).

Reference list

Non-patent document

Non-patent document 1: Huang, "Design of robust concentric circular differential microphone arrays", The Journal of the Acoustical Society of America, 2017.

Non-patent document 2: Z. Prime and C. Doolan, "A comparison of popular beamforming arrays", Proceedings of Acoustics 2013 Victor Harbor: Science, Technology and Amenity, Annual Conference of the Australian Acoustical Society, 2013.

Non-patent document 3: Mandal, S. P. Ghoshal and A. K. Bhattacharjee, "Concentric circular antenna array synthesis using Particle Swarm Optimization with Constriction Factor Approach", Indian Antenna Week: A Workshop on Advanced Antenna Technology, 2010.

Patent document

Patent document 1: U.S. Pat. No. 6205224

Patent document 2: Japanese Unexamined Patent Application Publication No. 2005-521283

Patent document 3: Japanese Patent Application Laid-Open No. 2011-

Disclosure of Invention

Technical problem

However, it is difficult to perform broadband sound field recording at low cost using the above-described technique.

For example, in many cases, methods such as arranging a plurality of microphones on multiple circles, using cardioid directional microphones, or using a rigid baffle either cannot provide sound field recording over a sufficiently wide frequency range, or are difficult to apply because of cost or physical limitations.

Further, the techniques disclosed in non-patent document 2 and patent documents 1 to 3 are techniques for reducing the side lobes of beamforming, and the technique disclosed in non-patent document 3 is not intended for sound. Therefore, these array recording techniques are not suitable for recording that is intended for reproduction of the wave surface.

The present technology has been made in view of the above circumstances, and an object thereof is to perform broadband sound field recording at low cost.

Solution to the problem

The microphone array of the first aspect of the present technology is a microphone array for sound field recording, which includes a plurality of sub-arrays each including a plurality of microphones and each having a discrete rotationally symmetric shape with a specified radius, wherein when the radius values of the plurality of sub-arrays are arranged as a sequence, the sequence is a generalized arithmetic progression.

In the first aspect of the present technology, the microphone array for sound field recording includes a plurality of sub-arrays; each sub-array includes a plurality of microphones and has a discrete rotationally symmetric shape with a specified radius; and when the radius values of the plurality of sub-arrays are arranged as a sequence, the sequence is a generalized arithmetic progression.

The recording apparatus of the second aspect of the present technology includes a spherical harmonic coefficient calculator that calculates spherical harmonic coefficients based on multi-channel signals obtained by sound collection performed by a microphone array for sound field recording, the microphone array including a plurality of sub-arrays, each sub-array including a plurality of microphones, and each sub-array having a discrete rotationally symmetric shape with a specified radius, wherein when the radius values of the plurality of sub-arrays are arranged as a sequence, the sequence is a generalized arithmetic progression.

A recording method or program of the second aspect of the present technology is a recording method or program corresponding to the recording apparatus of the second aspect of the present technology.

In the second aspect of the present technology, spherical harmonic coefficients are calculated based on multi-channel signals obtained by sound collection performed by a microphone array for sound field recording, the microphone array including a plurality of sub-arrays. Further, each of the plurality of sub-arrays includes a plurality of microphones and has a discrete rotationally symmetric shape with a specified radius, wherein when the radius values of the plurality of sub-arrays are arranged as a sequence, the sequence is a generalized arithmetic progression.

Advantageous effects of the invention

The first and second aspects of the present technology enable broadband sound field recording to be performed at low cost.

Note that the effects described herein are not necessarily limiting, and any of the effects described in the present disclosure may be provided.

Drawings

Fig. 1 is a diagram describing mode function values depending on the settings of a microphone;

Fig. 2 is a diagram describing mode function values depending on the settings of a microphone;

Fig. 3 illustrates an example of a configuration of a microphone array in accordance with the present technique;

Fig. 4 is a diagram describing the arrangement of microphones;

Fig. 5 illustrates an example of a configuration of a microphone array in accordance with the present technique;

Fig. 6 illustrates an example of a configuration of a microphone array in accordance with the present technique;

Fig. 7 illustrates an example of a configuration of a microphone array in accordance with the present technique;

Fig. 8 illustrates an example of a configuration of a microphone array in accordance with the present technique;

Fig. 9 is a diagram describing mode function values depending on the settings of a microphone;

Fig. 10 is a diagram describing condition numbers depending on the settings of a microphone;

Fig. 11 is a diagram describing condition numbers depending on the settings of a microphone;

Fig. 12 is a diagram describing condition numbers depending on the settings of a microphone;

Fig. 13 shows an example of the configuration of a recording system and a reproducing system according to the present technology;

Fig. 14 is a flowchart describing a recording process;

Fig. 15 is a flowchart describing a reproduction process;

Fig. 16 shows an example of the configuration of a computer.

Detailed Description

< first embodiment >

< present technology >

The present technology can record and reproduce a planar sound field over a wide frequency range by using a geometric arrangement of a microphone array.

The present technique makes it possible to determine the settings of each microphone, i.e. the settings of each microphone element in the microphone array, parametrically. Note that it is sufficient if the setting parameters defining the microphone unit settings are appropriately determined according to various use cases. For example, a microphone array comprises a plurality of sub-arrays, each sub-array comprising a plurality of microphones, each sub-array having a discrete rotationally symmetric shape, and the sub-arrays having shapes similar to each other.

The present technology described above makes it possible to improve robustness against errors in microphone placement and errors due to, for example, manufacturing variations of microphones, and to record and reproduce a sound field, i.e., a sound wave front over a wider frequency range. Furthermore, the requirements of microphone cost and microphone unit performance, e.g. signal-to-noise ratio (SNR), can also be easily met.

Embodiments according to the present technology are described below with reference to the drawings.

A microphone array for sound field recording according to the present technology is generally a substantially circular microphone array in which individual microphones are arranged in a two-dimensional plane so as to surround the center of the microphone array. However, the configuration is not limited to this, and the microphone array may be a microphone array for recording a three-dimensional sound field in which respective microphones are disposed in a three-dimensional space.

In other words, the microphone array according to the present technology may be, for example, a substantially spherical microphone array in which the respective microphones are arranged in a three-dimensional space so as to surround the center of the microphone array, when the microphones are arranged in the three-dimensional space.

The following description is continued assuming that a microphone array according to the present technology has a structure obtained by arranging respective microphones in a two-dimensional plane.

If the Bessel function is zero when the signal of the sound wave surface (sound field) recorded by the microphone array is converted into a signal in the spherical harmonic domain, there is a frequency range in which the conversion cannot be performed accurately.

For example, if the microphone array is in the form of a single circle or a double circle, there will be frequency ranges where the mode function value (i.e., the Bessel function value) is zero, as shown in Fig. 1.

Note that in Fig. 1, the horizontal axis represents the wave number and the vertical axis represents the order of the spherical harmonic domain. Further, the shading in Fig. 1 represents the Bessel function values; specifically, black areas represent areas where the Bessel function value is 0 (zero).

More specifically, the Bessel function value shown in Fig. 1 is the maximum of the Bessel function values of the microphones included in the microphone array. The Bessel function value of each microphone varies according to the distance from the center of the microphone array to the microphone.

In Fig. 1, the portion indicated by an arrow Q11 represents the Bessel function value in each region corresponding to a wave number and an order when the microphone array is in the form of a single circle. In this example, the Bessel function values are zero in many areas, such as the area indicated by the arrow Q11, indicating that there are frequency ranges in which the wave front cannot be accurately recorded or reproduced.

On the other hand, the portion indicated by an arrow Q12 represents the Bessel function value in each region corresponding to a wave number and an order when the microphone array is in the form of a double circle. This example shows that the area where the Bessel function value is zero is smaller than in the example indicated by the arrow Q11. However, many of the Bessel function values are still small values close to zero, which may seriously affect the recording and reproduction of the wave front.
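As a rough numerical illustration of the behavior described for Fig. 1, the following Python sketch evaluates only the order-0 spherical Bessel function j_0(x) = sin(x)/x for a single ring and for two rings, taking the maximum magnitude over the rings as the figure does. The radii (0.10 m and 0.15 m), the wave numbers, and the function names are hypothetical choices made for this sketch and are not taken from the document.

```python
import math

def spherical_j0(x):
    """Order-0 spherical Bessel function sin(x)/x, with j0(0) = 1."""
    return 1.0 if x == 0.0 else math.sin(x) / x

def max_mode_value(k, radii):
    """Largest |j0(k r)| over all ring radii (the quantity shown per cell)."""
    return max(abs(spherical_j0(k * r)) for r in radii)

# Hypothetical ring radii in metres (assumed for illustration only).
single_ring = [0.10]
double_ring = [0.10, 0.15]

k_first = math.pi / 0.10          # k r = pi on the 0.10 m ring: a zero of j0
k_shared = 2.0 * math.pi / 0.10   # k r = 2 pi and 3 pi: a zero on both rings

single_at_first = max_mode_value(k_first, single_ring)    # vanishes
double_at_first = max_mode_value(k_first, double_ring)    # second ring helps
double_at_shared = max_mode_value(k_shared, double_ring)  # both rings vanish

print(single_at_first, double_at_first, double_at_shared)
```

With these assumed radii, the second ring removes the first zero of the single ring, but a wave number remains at which both rings vanish together, which is in line with the observation that a double circle alone does not eliminate the problem.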

Also, for example, as shown in Fig. 2, using cardioid directional microphones as the microphones included in the microphone array makes it possible to reduce the area where the Bessel function value is zero, but this results in high cost.

Note that in Fig. 2, the horizontal axis represents the wave number and the vertical axis represents the order of the spherical harmonic domain. Further, the shading in Fig. 2 represents the mode function values, i.e., the Bessel function values; specifically, black areas represent areas where the Bessel function value is 0 (zero). More specifically, the Bessel function value shown in Fig. 2 is the maximum of the Bessel function values of the microphones included in the microphone array.

In the example shown in Fig. 2, compared with the example shown in Fig. 1, there are few regions where the Bessel function value is zero at or below a certain order for each wave number, but the use of cardioid directional microphones results in high cost.

Further, although some array recording techniques using omnidirectional microphones have been proposed in the past, these techniques are not suitable for recording that is intended for reproduction of the sound wave surface.

On the other hand, there is also a simple method for avoiding the state where the Bessel function value is zero. For example, when a microphone array in the form of a double circle is used, and the Bessel function value for one circular microphone array is zero while the Bessel function value for the other circular microphone array is not zero, the non-zero Bessel function value may be used. However, with this method, a signal in the spherical harmonic domain cannot be obtained with sufficient accuracy.

In general, it is not feasible to avoid microphone-specific sensor noise or ambient noise. Further, it is difficult to make the actual position of a microphone coincide exactly with the position given by the coordinates of the theoretical design, due to placement errors or manufacturing variations of the microphones.

Since division by a small Bessel function value is performed when the recorded wave front is reproduced, such noise is amplified, as are the placement errors and the errors due to, for example, manufacturing variations, which seriously affects the numerical calculation. It is therefore important not only to avoid the state where the Bessel function value is zero, but also to optimize or analyze the error tolerance when designing the microphone arrangement.
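The amplification caused by dividing by a small Bessel function value can be made concrete with a short calculation. The sketch below is only an illustration under assumed values: it uses the order-0 spherical Bessel function, a hypothetical ring radius of 0.10 m, and a hypothetical noise amplitude, and compares the residual error far from a zero with the error close to a zero.

```python
import math

def spherical_j0(x):
    """Order-0 spherical Bessel function sin(x)/x, with j0(0) = 1."""
    return 1.0 if x == 0.0 else math.sin(x) / x

def compensated_error(noise_amplitude, k, r):
    """Error left in a coefficient after dividing the noise by |j0(k r)|."""
    return noise_amplitude / abs(spherical_j0(k * r))

noise = 1e-3   # hypothetical sensor-noise amplitude (arbitrary units)
r = 0.10       # hypothetical ring radius in metres

far_from_zero = compensated_error(noise, 0.5 * math.pi / r, r)   # k r = pi / 2
near_zero = compensated_error(noise, 0.99 * math.pi / r, r)      # k r = 0.99 pi

print(far_from_zero, near_zero, near_zero / far_from_zero)
```

Under these assumptions the same noise amplitude ends up dozens of times larger near the zero than far from it, which is why the error tolerance has to be considered when the arrangement is designed.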

In other words, in order to more accurately record and reproduce the wavefront, a high tolerance to an error, i.e., robustness to the error is required. In particular, in view of cost, physical limitations and ease of performing signal processing, it is desirable to design a microphone setup that achieves high tolerance to errors and aims to use a minimum number of omnidirectional microphones.

Here, recording and reproduction of a sound wave front, that is, recording and reproduction of a sound field, are described. Note that the microphones included in the microphone array are also specifically referred to as microphone units hereinafter.

For example, the wave surface of sound can be recorded and reproduced by obtaining the spherical harmonic coefficient of the wave surface.

Specifically, when a circular microphone array is used to record a wave surface, the sound pressure of the wave surface is sampled at Q points under the condition that the sampling theorem is satisfied, and the spherical harmonic coefficient a_mn(k) is obtained from the sampled sound pressure.

Furthermore, dividing the sound pressure by b_n(kr) removes the component b_n(kr), which is included in the sound pressure and depends on the radius r of the circular microphone array.

In other words, the spherical harmonic coefficient a_mn(k) can be obtained using the following formula (1).

[Formula 1]

Note that in formula (1), n and m each denote an order of the spherical harmonic domain, and q is an index indicating each of the Q points at which the sound pressure is sampled, where q = 0, ..., Q-1. The sampling point denoted by the index q is also referred to as point q in the following.

Further, k denotes the wave number, and r denotes the radius of the circular microphone array, i.e., the distance from the center position of the circular microphone array to a microphone unit. θ_q and φ_q denote an elevation angle and an azimuth angle, respectively, which indicate the direction of the microphone unit located at point q.

Further, in formula (1), f denotes the frequency, c_s denotes the wave velocity, b_n(kr) denotes the mode function, and Y_n^m(θ_q, φ_q) denotes the spherical harmonic basis. In particular, when the circular microphone array consists of omnidirectional microphones, the mode function b_n(kr) is a spherical Bessel function. b_n(kr) is also referred to hereinafter simply as the Bessel function. In addition, the superscript symbol attached to the spherical harmonic basis in formula (1) denotes the complex conjugate.
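The image of formula (1) did not survive in this text. A form consistent with the definitions above is given below as an assumed reconstruction, not as the original expression. Here p_k(r, θ_q, φ_q) is notation introduced only for this sketch to denote the sound pressure sampled at point q, the asterisk denotes the complex conjugate, and any quadrature weights or normalization constants used in the original are unknown.

```latex
\[
a_{mn}(k) = \frac{1}{b_n(kr)} \sum_{q=0}^{Q-1}
  p_k(r, \theta_q, \phi_q)\, Y_n^m(\theta_q, \phi_q)^{*},
\qquad k = \frac{2 \pi f}{c_s}
\tag{1}
\]
```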

Note that an example in which the circular microphone array includes omnidirectional microphones is disclosed in detail in, for example, "B. Rafaely, Fundamentals of Spherical Array Processing, Springer, 2015" (hereinafter also referred to as reference 1).

Further, the arithmetic processing in formula (1), in particular the division by the Bessel function b_n(kr), is also referred to as mode compensation. Note that mode compensation is disclosed in detail in, for example, "D. P. Jarrett, E. A. Habets and P. A. Naylor, Theory and Applications of Spherical Microphone Array Processing, Springer, 2017" (hereinafter also referred to as reference 2).

When sound is collected at each point q using a circular microphone array and the sound pressure at each point q is obtained in order to record the wave surface of the sound, the spherical harmonic coefficient a_mn(k) can be obtained from the obtained sound pressure by using formula (1). Further, when the spherical harmonic coefficient a_mn(k) obtained as described above is transmitted to a reproduction system, the reproduction system can reproduce the wave surface (sound field) of the sound using the spherical harmonic coefficient a_mn(k).

Incidentally, a numerical problem known as the Bessel zero problem occurs when the value of the Bessel function b_n(kr) in formula (1) approaches zero. In other words, as described below, when the value of the Bessel function b_n(kr) is close to zero, the condition number of the transformation matrix used to obtain the spherical harmonic coefficient a_mn(k) becomes large, and as a result an accurate spherical harmonic coefficient a_mn(k) cannot be obtained.

When the individual microphone units included in the circular microphone array do not lie on the same ring, i.e., when the microphone units arranged at the respective points q have different radii r_q, the sound pressure sampled at point q is represented by the following formula (2). Note that the radius r_q of a microphone unit corresponds to the distance from the center of the circular microphone array to the microphone unit, i.e., the distance from the center of the circular microphone array to point q.

[Formula 2]
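The image of formula (2) is also missing from this text. Assuming the usual spherical harmonic expansion truncated at a highest order N, and writing p_k(r_q, θ_q, φ_q) for the sound pressure sampled at point q (notation introduced only for this sketch), a form consistent with the surrounding description would be the following. It is an assumed reconstruction, not the original expression.

```latex
\[
p_k(r_q, \theta_q, \phi_q) = \sum_{n=0}^{N} \sum_{m=-n}^{n}
  a_{mn}(k)\, b_n(k r_q)\, Y_n^m(\theta_q, \phi_q)
\tag{2}
\]
```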

In this case, the spherical harmonic coefficients a_mn(k) for each order and wave number are obtained by multiplying the distribution of the sound pressure obtained at the respective points q (i.e., the vector p_k composed of the sound pressures) by B_k^+, where B_k^+ is the pseudo-inverse matrix of the transformation matrix B_k.

In other words, the vector a(k) composed of the spherical harmonic coefficients a_mn(k) for the wave number k can be obtained by, for example, performing the calculation of the following formula (3).

[Formula 3]

In formula (3), B_k^+ denotes the pseudo-inverse matrix of the transformation matrix B_k. Note that the vector p_k is a vector composed of the sound pressure at each point l and is represented by the following formula (4), where l = 0, ..., L and L = Q-1. In other words, in formula (4), l is an index indicating a sound pressure sampling point and corresponds to q described above.

Further, as shown in the following formula (5), the transformation matrix B_k is a matrix whose elements are formed, for each point l and each order n, from the Bessel function b_n(kr_l) and the spherical harmonics Y_n^m(θ_l, φ_l), where 0 ≤ n ≤ N.

[Formula 4]

[Formula 5]
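The images of formulas (3) to (5) are missing from this text. The forms below are an assumed reconstruction based only on the surrounding description. In particular, the layout of B_k, with one row per sampling point l and one column per pair (n, m), is an assumption. The claims speak of limiting the number of rows according to the order, so the original may well use the transposed layout. The symbol p_k(r_l, θ_l, φ_l) for the sound pressure at point l is notation introduced only for this sketch.

```latex
\[
\mathbf{a}(k) = \mathbf{B}_k^{+}\, \mathbf{p}_k
\tag{3}
\]
\[
\mathbf{p}_k = \bigl[\, p_k(r_0, \theta_0, \phi_0),\ \ldots,\
  p_k(r_L, \theta_L, \phi_L) \,\bigr]^{T}
\tag{4}
\]
\[
\mathbf{B}_k =
\begin{bmatrix}
b_0(k r_0)\, Y_0^0(\theta_0, \phi_0) & \cdots & b_N(k r_0)\, Y_N^N(\theta_0, \phi_0) \\
\vdots & \ddots & \vdots \\
b_0(k r_L)\, Y_0^0(\theta_L, \phi_L) & \cdots & b_N(k r_L)\, Y_N^N(\theta_L, \phi_L)
\end{bmatrix}
\tag{5}
\]
```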

In formula (3) above, the division performed as mode compensation in formula (1) is replaced by multiplication by the pseudo-inverse matrix B_k^+. To obtain an accurate spherical harmonic coefficient a_mn(k) using formula (3), the transformation matrix B_k needs to be invertible and must not be ill-conditioned. Whether the transformation matrix B_k is well-conditioned or ill-conditioned can be evaluated using, for example, the condition number of the transformation matrix B_k.
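Mode compensation with a pseudo-inverse can be sketched in a few lines of Python with NumPy. The example below is a minimal illustration under several assumptions that are not from the document: a horizontal array (elevation π/2) with six microphones alternating between two hypothetical radii, a highest order of 1, the mode function taken as the plain spherical Bessel function without any scaling factor, and rows of B_k corresponding to sampling points. The harmonic Y_1^0 is left out because it vanishes in the horizontal plane.

```python
import numpy as np

def j0(x):
    """Spherical Bessel function of order 0 (x must be non-zero)."""
    return np.sin(x) / x

def j1(x):
    """Spherical Bessel function of order 1 (x must be non-zero)."""
    return np.sin(x) / x**2 - np.cos(x) / x

def transform_matrix(k, radii, azimuths):
    """B_k for a horizontal array, orders n <= 1.

    Rows are sampling points; columns are (n, m) = (0, 0), (1, -1), (1, 1).
    """
    y00 = np.full(len(azimuths), 0.5 / np.sqrt(np.pi), dtype=complex)
    c1 = np.sqrt(3.0 / (8.0 * np.pi))
    y1m1 = c1 * np.exp(-1j * azimuths)   # Y_1^{-1} at elevation pi/2
    y1p1 = -c1 * np.exp(1j * azimuths)   # Y_1^{+1} at elevation pi/2
    return np.column_stack([
        j0(k * radii) * y00,
        j1(k * radii) * y1m1,
        j1(k * radii) * y1p1,
    ])

# Hypothetical layout: 6 microphones alternating between two radii (metres).
azimuths = 2.0 * np.pi * np.arange(6) / 6
radii = np.array([0.05, 0.08] * 3)
k = 20.0  # hypothetical wave number in rad/m

B = transform_matrix(k, radii, azimuths)
a_true = np.array([1.0, 0.5 - 0.2j, -0.3j])   # made-up coefficients
p = B @ a_true                                 # simulated sound pressures
a_est = np.linalg.pinv(B) @ p                  # a(k) = B_k^+ p_k
print(np.allclose(a_est, a_true))
```

In this noise-free setting the coefficients are recovered because the assumed B_k has full column rank. With measurement noise, the quality of the estimate depends on how well-conditioned B_k is.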

When the minimum singular value and the maximum singular value of the transformation matrix B_k are denoted by σ_min(B_k) and σ_max(B_k), respectively, the condition number χ(k) of the transformation matrix B_k can be obtained using the following formula (6).

[Formula 6]
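The image of formula (6) is missing from this text. Given that it is described as the condition number obtained from the minimum and maximum singular values, it presumably takes the standard form below. The symbol χ(k) is assumed here for the condition number.

```latex
\[
\chi(k) = \frac{\sigma_{\max}(\mathbf{B}_k)}{\sigma_{\min}(\mathbf{B}_k)}
\tag{6}
\]
```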

In the calculation of formula (3), when the observed vector, in this case the vector p_k, contains an error, the error is amplified by a factor of up to χ(k), where χ(k) denotes the condition number.

Thus, a smaller condition number χ(k) of the transformation matrix B_k is preferable; a small condition number χ(k) indicates a high tolerance to errors, i.e., improved robustness against errors. As a rule of thumb, a matrix with a condition number greater than 100 is regarded as ill-conditioned, although this depends on the application. Note that error tolerance analysis of a circular microphone array or a spherical microphone array based on the condition number is disclosed in detail in, for example, reference 1 mentioned above.
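The condition-number check described above can be sketched with NumPy as follows. The two small matrices are made-up stand-ins for a transformation matrix at two wave numbers, and the helper names are hypothetical. The threshold of 100 is the rule of thumb quoted in the text and would be adjusted per application.

```python
import numpy as np

def condition_number(matrix):
    """Ratio of the largest to the smallest singular value."""
    singular_values = np.linalg.svd(matrix, compute_uv=False)
    return singular_values.max() / singular_values.min()

def is_ill_conditioned(matrix, threshold=100.0):
    """Apply the rule-of-thumb threshold (application dependent)."""
    return condition_number(matrix) > threshold

# Made-up 2x2 matrices standing in for B_k at two wave numbers.
well = np.array([[1.0, 0.2], [0.1, 0.8]])
ill = np.array([[1.0, 0.0], [0.0, 0.001]])

print(condition_number(well), is_ill_conditioned(well))
print(condition_number(ill), is_ill_conditioned(ill))
```

The second matrix has one singular value a thousand times smaller than the other, so an error in the observed vector can be amplified by up to that factor when the pseudo-inverse is applied.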

As described above, the spherical harmonic coefficient a_mn(k) for reproducing the sound wave surface can be obtained by performing the calculation of formula (3), and a well-conditioned transformation matrix B_k can be obtained by appropriately choosing the settings of each microphone unit included in the microphone array and the maximum value (highest order) of the order n of the spherical harmonic domain.

Thus, by appropriately setting the settings of the microphone units and the maximum order of the spherical harmonic domain, the present technique makes it possible to achieve high tolerance to noise (high tolerance to error) over a wide frequency range using fewer omnidirectional microphone units (i.e. at low cost).

Specifically, wave surface recording and reproduction according to the present technology is performed by parametrically designing a microphone array having the following characteristics and performing spatial resolution control according to frequency.

For example, a microphone array according to the present technology has the following features F1 to F3. In other words, the microphone array according to the present technology is designed based on the following features F1 to F3.

(feature F1)

The microphone array comprises a plurality of geometrically similar sub-arrays, and each sub-array is discretely rotationally symmetric.

(feature F2)

The microphone elements are distributed at equal angles, seen from the center of the microphone array.

(feature F3)

When the radius values of the respective sub-arrays are arranged as a sequence (progression), the sequence is a generalized arithmetic progression.

A microphone array according to the present technology includes a plurality of sub-arrays, and each sub-array includes a plurality of microphone units.

Note that the microphone array may comprise a single sub-array, or the sub-array may comprise a single microphone element.

Furthermore, in principle all of the microphone units included in the microphone array are omnidirectional microphones, but some of the microphone units need not be omnidirectional microphones.

The above-described feature F1 is a feature that, when the microphone array includes a plurality of sub-arrays, all the sub-arrays have geometrically similar shapes (in a similar microphone element arrangement). Herein, sub-arrays that are geometrically similar to each other mean that the plurality of microphone units included in the sub-arrays have a similar arrangement.

For example, two sub arrays geometrically similar to each other means that one sub array coincides with the other sub array when at least one of a zoom-in operation, a zoom-out operation, a rotation operation, or an inversion operation is performed on one sub array.

Here, coincidence means that, after an operation such as the zoom-in operation is performed on one sub-array, the position of each microphone unit included in that sub-array coincides with the position of a microphone unit included in the other sub-array. In this case, the center position of each sub-array coincides with the center position of the microphone array.

Furthermore, each sub-array has a discrete rotationally symmetric shape. In other words, the sub-arrays do not have continuous rotational symmetry in which the sub-arrays constantly have the same shape when the sub-arrays are rotated by an arbitrary angle, but the sub-arrays have discrete rotational symmetry in which the shapes of the sub-arrays before and after rotation coincide when the sub-arrays are rotated by a certain angle around the center position of the sub-arrays (i.e., the center position of the microphone array). In a microphone array, a flat frequency characteristic can be achieved, since each sub-array is discretely rotationally symmetric.
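Geometric similarity and discrete rotational symmetry can both be checked numerically. The sketch below, in plain Python, uses a hypothetical four-microphone sub-array with a radius of 0.05 m. The helper names and all numeric values are assumptions made for illustration.

```python
import math

def ring_positions(num_mics, radius, offset=0.0):
    """(x, y) positions of microphones equally spaced on one ring."""
    return [
        (radius * math.cos(offset + 2.0 * math.pi * i / num_mics),
         radius * math.sin(offset + 2.0 * math.pi * i / num_mics))
        for i in range(num_mics)
    ]

def rotate(points, angle):
    """Rotate all points about the array center (0, 0)."""
    c, s = math.cos(angle), math.sin(angle)
    return [(c * x - s * y, s * x + c * y) for x, y in points]

def same_layout(a, b, tol=1e-9):
    """True if the sizes agree and every point of a has a match in b."""
    return len(a) == len(b) and all(
        any(math.hypot(ax - bx, ay - by) < tol for bx, by in b)
        for ax, ay in a
    )

sub_array = ring_positions(num_mics=4, radius=0.05)   # hypothetical sub-array

# Discrete symmetry: a quarter turn maps the layout onto itself ...
quarter_turn = same_layout(rotate(sub_array, math.pi / 2), sub_array)
# ... but an arbitrary angle does not (no continuous symmetry).
arbitrary = same_layout(rotate(sub_array, math.pi / 5), sub_array)

# Similarity: enlarging by a factor of 2 coincides with a 0.10 m sub-array.
enlarged = [(2.0 * x, 2.0 * y) for x, y in sub_array]
similar = same_layout(enlarged, ring_positions(num_mics=4, radius=0.10))

print(quarter_turn, arbitrary, similar)
```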

Furthermore, each sub-array has a specific radius. In particular, in this case, all the microphone units included in the sub-array have an equal radius, and the radius corresponds to the radius of the sub-array. The radius of the microphone elements corresponds to the distance from the center position of the sub-array, i.e. from the center position of the microphone array, to the microphone elements.

Thus, each of the plurality of microphone elements comprised in the sub-array is arranged at a distance from the center position of the microphone array (i.e. the center position of the sub-array), which distance corresponds to the radius of the sub-array.

Feature F2 is a feature that when all the microphone elements included in the microphone array are radially projected onto a single ring centered on the center position of the microphone array, that is, projected onto the circumference of the microphone array, the projected microphone elements are uniformly distributed on the ring. In other words, the projected microphone units are equally spaced on the ring.

Here, the position where the microphone unit is projected on the ring shape is a position where a line connecting (passing through) the microphone units and the center positions of the microphone array intersects the ring shape (circular shape) on which the microphone units are projected. In other words, the position of the microphone unit on the ring shape is a position on which the microphone unit is projected, as viewed from the center position of the microphone array.
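A check of feature F2 might look like the sketch below. The function names, the tolerance, and the two example layouts are hypothetical; the array center is assumed to be the origin, and only the projection angle of each unit is used, since the radius of the ring does not affect the spacing.

```python
import math


def projected_angles(positions):
    """Sorted angles (radians, in [0, 2*pi)) at which units project onto a
    ring centered on the array center, taken here as the origin."""
    return sorted(math.atan2(y, x) % (2 * math.pi) for x, y in positions)


def is_uniform_on_ring(positions, tol=1e-9):
    """True if neighbouring projected units are all separated by 2*pi/Q."""
    angles = projected_angles(positions)
    q = len(angles)
    gaps = [(angles[(i + 1) % q] - angles[i]) % (2 * math.pi)
            for i in range(q)]
    return all(abs(g - 2 * math.pi / q) <= tol for g in gaps)


def ring(num_units, radius, rotation):
    """(x, y) positions of units equally spaced on a circle."""
    return [(radius * math.cos(rotation + 2 * math.pi * i / num_units),
             radius * math.sin(rotation + 2 * math.pi * i / num_units))
            for i in range(num_units)]


# Two sub-arrays of 4 units; the outer one is turned by 45 degrees, so the
# 8 projected units interleave evenly on the ring.
interleaved = ring(4, 0.10, 0.0) + ring(4, 0.20, math.pi / 4)

# Same radii but no relative rotation: the projections coincide in pairs.
aligned = ring(4, 0.10, 0.0) + ring(4, 0.20, 0.0)
```

Here `is_uniform_on_ring(interleaved)` holds while `is_uniform_on_ring(aligned)` does not, which illustrates why the rotation angle of each sub-array matters for feature F2.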

Owing to feature F2 described above, complicated signal processing is no longer necessary after the wavefront is recorded. The omission of complicated signal processing made possible by this feature is described in detail in, for example, reference 1 mentioned above.

Further, feature F3 is a feature that, when the plurality of sub-arrays included in the microphone array includes sub-arrays having different radii, and the radius values of all the sub-arrays included in the microphone array are arranged in ascending or descending order to form a sequence, the sequence is a generalized arithmetic sequence.

In other words, feature F3 is a feature in which the microphone elements are arranged at intervals corresponding to the common differences of the generalized arithmetic sequence in the direction outward from the center of the microphone array (i.e., in the direction away from the center).

A method of arranging microphone units at distances corresponding to radii determined from a logarithmic or geometric series is disclosed in detail in, for example, Z. Prime and C. Doolan, "A comparison of popular beamforming arrays", Proceedings of Acoustics 2013 Victor Harbor: Science, Technology and Amenity, Annual Conference of the Australian Acoustical Society, 2013, and in US Patent No. 6205224.

However, when the spatial resolution is controlled for each frequency, determining the radii of the sub-arrays from a generalized arithmetic sequence according to the present technology reduces the region where the Bessel function value is zero or close to zero more effectively than the above methods. In other words, the condition number X(k) of the transformation matrix B_k becomes smaller.

Further, a microphone array designed to have the features F1 and F2 makes scalable use possible according to requirements, by using a plurality of sub-arrays.

It is assumed that a plurality of geometrically similar sub-arrays are used as sub-arrays comprised in the microphone array. In this case, for example, a scalable use is possible, wherein the microphone array comprises three sub-arrays when there is a sufficient number of available microphone elements, and two sub-arrays when there is a small number of available microphone elements.

Furthermore, the spatial resolution of the conversion performed with the transformation matrix B_k of the microphone array is set appropriately for each frequency in the operating frequency range, that is, depending on the wave number k, so as to obtain accurate sound field information.

For example, when the spherical harmonic coefficient a_mn(k) is obtained by performing the calculation of formula (3), carrying the calculation up to terms of a higher order n generally yields a more accurate spherical harmonic coefficient a_mn(k) with a higher spatial resolution. However, for components of an order n not smaller than a particular order determined from, for example, the microphone unit arrangement, the Bessel function value is zero or close to zero.

Thus, according to the present technology, a process of removing, from the transformation matrix B_k, the rows corresponding to each order n not less than a specific order is performed as spatial resolution control, so as to improve the condition number of the transformation matrix B_k. In other words, the order n used in the operation, that is, the number of rows of the transformation matrix B_k, is limited.

In particular, the present technique is advantageous in that a broadband sound field (wave front) can be recorded using a minimum number of omnidirectional microphone units while achieving a high tolerance for errors.

Spatial resolution control can not only improve error tolerance, but also reduce the amount of computation.

Furthermore, including a plurality of sub-arrays in the microphone array makes it possible to increase the sampling density in the angular direction without using a large number of microphone elements. The reason is that, when the microphone units are radially projected onto a ring centered on the center position of the microphone array, arranging them as a plurality of sub-arrays makes the projected microphone units denser on the ring than when the microphone units are arranged in the form of a single circle.

Furthermore, the microphone array according to the present technology has a self-similar shape, i.e., a fractal shape. Thus, the present technology achieves scalability so that a microphone array can be formed even if only a small number of microphone units are used. In other words, as described above, scalable use is possible.

< example of microphone array configuration >

Next, a more specific example of the configuration of a microphone array according to the present technology is described. Fig. 3 shows an example of a configuration of an embodiment of a microphone array in accordance with the present technology.

The microphone array MA11 shown in fig. 3 is a vortex-shaped microphone array comprising a plurality of omnidirectional microphone elements. Note that in fig. 3, each point represents one microphone unit.

In this example, the microphone array MA11 includes 128 microphone elements, and these microphone elements are arranged in the form of a vortex.

In the microphone array MA11, one sub-array includes 16 microphone elements. In other words, the microphone array MA11 includes 8 sub-arrays having different radii, and the 8 sub-arrays are concentrically arranged.

For example, sub-array SA11 is a portion comprising 16 circularly arranged microphone elements, and likewise sub-array SA12 is a portion comprising 16 circularly arranged microphone elements.

Further, the microphone array MA11 has the above-described features F1 to F3.

For example, the respective sub-arrays included in the microphone array MA11 have shapes that differ only in scaling and rotation angle. Specifically, for example, when sub-array SA11 is enlarged and rotated by a certain angle, the enlarged and rotated sub-array SA11 coincides with sub-array SA12.

Furthermore, the microphone elements of each sub-array are arranged in a circular form centered at a center position O11, which results in the sub-array having a discrete rotationally symmetric shape.

An enlarged view of a portion of the microphone array MA11 is given in fig. 4. Note that in fig. 4, each circle represents one microphone unit. Further, in fig. 4, the same numerals are given in circles respectively representing microphone units included in the same sub-array.

In the example shown in fig. 4, for example, microphone elements having the number "1" are included in the sub-array SA11 shown in fig. 3, and microphone elements having the number "8" are included in the sub-array SA12 shown in fig. 3.

Specifically, this example shows that the respective sub-arrays are adjacently disposed, and that the sequence of the radius values of these sub-arrays is a generalized arithmetic sequence. In other words, for any sub-array, the difference between its radius and the radius of an adjacent sub-array takes one of a plurality of predetermined values, each corresponding to a common difference.
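Feature F3 can be illustrated with a short check, under the reading given above that every step between adjacent radii equals one of several predetermined common differences. The function name, the tolerance, and the radius values are assumptions made for this sketch only.

```python
def is_generalized_arithmetic(radii, common_differences, tol=1e-9):
    """True if each step between adjacent radii (in ascending order) equals
    one of the allowed common differences, within `tol`."""
    ordered = sorted(radii)
    steps = [b - a for a, b in zip(ordered, ordered[1:])]
    return all(any(abs(step - d) <= tol for d in common_differences)
               for step in steps)


# Hypothetical radii in metres; the steps alternate between 0.02 and 0.01.
mixed_steps = [0.10, 0.12, 0.13, 0.15, 0.16]

# An ordinary arithmetic sequence is the special case of a single difference.
single_step = [0.10, 0.15, 0.20, 0.25]

# A geometric series of radii, as in the logarithmic designs, fails the check.
geometric = [0.10, 0.20, 0.40, 0.80]
```

With these values, `is_generalized_arithmetic(mixed_steps, [0.01, 0.02])` and `is_generalized_arithmetic(single_step, [0.05])` hold, while the geometric radii do not pass for either set of differences.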

Note that the microphone array MA11 shown in fig. 3 is also specifically referred to as a vortex microphone array hereinafter.

Furthermore, an example has been described above in which the microphone array comprises 8 sub-arrays and each sub-array comprises 16 microphone elements. However, for example, the microphone array may comprise 4 sub-arrays and each sub-array may comprise 32 microphone elements, or the microphone array may comprise 2 sub-arrays and each sub-array may comprise 64 microphone elements.

Further, the microphone array according to the present technology is not limited to the microphone array shown in fig. 3, and may have any configuration as long as it has the features F1 to F3.

Specifically, for example, the microphone array may have a configuration shown in fig. 5.

In other words, a microphone array MA21 formed of a plurality of omnidirectional microphone units arranged in the form of a flower outline is shown in a portion indicated by an arrow Q31 in fig. 5. Note that in the portion indicated by the arrow Q31, each point represents one microphone unit.

The microphone array MA21 includes 8 sub-arrays, each of which includes 16 circularly arranged microphone elements.

An enlarged view of a portion of the microphone array MA21 is given at the portion indicated by arrow Q32. Note that, in the portion indicated by the arrow Q32, each circle represents one microphone unit, and the same numeral is given in the circles respectively representing the microphone units included in the same sub-array.

This example shows that 8 sub-arrays included in the microphone array MA21 are concentrically arranged, and the respective sub-arrays are adjacently arranged.

Specifically, this example shows that the sub-array including the microphone elements having the number "2" and the sub-array including the microphone elements having the number "8" differ in the rotation angle centered on the center position of the microphone array MA21 (i.e., in the arrangement position of the microphone elements in the rotation direction), but have equal radii.

Also, the sub-array including the microphone unit having the number "3" and the sub-array including the microphone unit having the number "7" are different in rotation angle but have equal radii. Further, the sub-array including the microphone unit having the number "4" and the sub-array including the microphone unit having the number "6" are different in rotation angle but have equal radii.

This microphone array MA21 has the above-mentioned features F1 to F3. Note that the microphone array MA21 is also specifically referred to as a flower-like microphone array hereinafter.

Further, a microphone array according to the present technology may have a configuration shown in fig. 6, fig. 7, or fig. 8, for example.

In other words, for example, a microphone array MA31 formed of a plurality of omnidirectional microphone elements arranged substantially in the form of a vortex is shown in a portion indicated by an arrow Q41 in fig. 6. Note that in the portion indicated by the arrow Q41, each point represents one microphone unit.

The microphone array MA31 includes 8 sub-arrays, each having the above-described characteristic F1. Further, each sub-array includes 16 circularly arranged microphone units.

An enlarged view of a portion of the microphone array MA31 is given at the portion indicated by arrow Q42. Note that, in the portion indicated by the arrow Q42, each circle represents one microphone unit, and the same numeral is given in the circles respectively representing the microphone units included in the same sub-array.

In this example, 8 sub-arrays included in the microphone array MA31 are concentrically arranged, and the rotation angle of each sub-array is randomly determined when the sub-arrays are arranged.

Further, for example, a microphone array MA41 formed of a plurality of omnidirectional microphone elements arranged substantially in the form of a vortex is shown in a portion indicated by an arrow Q51 in fig. 7. Note that in the portion indicated by the arrow Q51, each point represents one microphone unit.

The microphone array MA41 includes 8 sub-arrays, each of which includes 16 circularly arranged microphone elements.

In a portion indicated by an arrow Q52, an enlarged view of a part of the microphone array MA41 is given. Note that, in the portion indicated by the arrow Q52, each circle represents one microphone unit, and the same numeral is given in the circles respectively representing the microphone units included in the same sub-array.

In this example, 8 sub-arrays included in the microphone array MA41 are concentrically arranged, and the rotation angle of each sub-array is randomly determined when the sub-arrays are arranged.

In the microphone arrays shown in fig. 6 and 7, the rotation angle of each sub-array is randomly determined, and the microphone arrays shown in fig. 6 and 7 each have the above-described features F1 to F3. Note that such a microphone array is also specifically referred to as a random-shape microphone array hereinafter.

Further, for example, a microphone array MA51 formed of a plurality of omnidirectional microphone elements arranged in a triple circle is shown in fig. 8. Note that in fig. 8, each point represents one microphone unit.

The microphone array MA51 includes 3 sub-arrays, each sub-array including 43 circularly arranged microphone elements.

Specifically, in this example, 3 sub-arrays included in the microphone array MA51 are concentrically arranged, and when one of the 3 sub-arrays is enlarged or reduced and then rotated, the one of the 3 sub-arrays coincides with the other sub-arrays.

The microphone array having the above-described features F1 to F3 makes it possible to reduce the region where the Bessel function value is zero, and to improve the condition number X(k) of the transformation matrix B_k. For example, when a vortex-shaped microphone array, a flower-shaped microphone array, or a random-shaped microphone array is employed, there is no region where the Bessel function value is zero, as shown in fig. 9.

Note that in fig. 9, the horizontal axis represents the wave number and the vertical axis represents the order in the spherical harmonic domain. Further, light and dark in fig. 9 represent values of the Bessel function, and specifically, a black portion represents a region where the Bessel function value is 0 (zero). More specifically, the Bessel function value shown in fig. 9 is the maximum value among the Bessel function values of the respective sub-arrays included in the microphone array.
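The effect of taking the maximum over sub-arrays can be sketched numerically. The text names the mode function only as a Bessel function, so the cylindrical Bessel function of the first kind J_n is used here as a stand-in; that choice, the function names, and the radii are assumptions of this sketch. J_n is evaluated from Bessel's integral with the trapezoidal rule so that only the standard library is needed.

```python
import math


def bessel_j(n, x, steps=2000):
    """J_n(x) for integer n, from Bessel's integral
    (1/pi) * integral over [0, pi] of cos(n*t - x*sin(t)) dt,
    evaluated with the trapezoidal rule."""
    h = math.pi / steps
    # End points t = 0 and t = pi contribute cos(0) and cos(n*pi).
    total = 0.5 * (1.0 + math.cos(n * math.pi))
    for i in range(1, steps):
        t = i * h
        total += math.cos(n * t - x * math.sin(t))
    return total * h / math.pi


def best_mode_magnitude(n, k, radii):
    """Largest |J_n(k * r_s)| over the sub-array radii r_s."""
    return max(abs(bessel_j(n, k * r)) for r in radii)


# Wave number that places k*r on the first zero of J_0 for r = 0.10.
FIRST_ZERO_J0 = 2.404825557695773
k_zero = FIRST_ZERO_J0 / 0.10

# A single circle sits exactly on the zero; a second radius lifts the maximum.
single = best_mode_magnitude(0, k_zero, [0.10])
multi = best_mode_magnitude(0, k_zero, [0.10, 0.15])
```

In this example `single` is numerically zero while `multi` is clearly nonzero, which is the mechanism by which several radii remove the black regions of a single circular array.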

In fig. 9, a portion indicated by an arrow Q61 represents the Bessel function value in each region corresponding to the wave number k and the order n when the vortex-shaped microphone array is used.

Further, a portion indicated by an arrow Q62 represents the Bessel function value in each region corresponding to the wave number k and the order n when the flower-shaped microphone array is used. Further, a portion indicated by an arrow Q63 represents the Bessel function value in each region corresponding to the wave number k and the order n when a random-shaped microphone array is used.

The examples indicated by arrows Q61 to Q63 show that, in the frequency range from 0 kHz to 8 kHz, the region where the Bessel function value is zero at or below a certain order n, which appears in the example shown in fig. 1, no longer exists. As described above, when there is no longer a region where the Bessel function value is zero, the condition number of the transformation matrix B_k can be made smaller, thereby improving the tolerance to errors over a wide frequency range at low cost.

Furthermore, when spatial resolution control is applied to a microphone array according to the present technology, the transformation matrix B_k is better conditioned the closer the microphone units projected onto the ring are to each other, as shown in fig. 10, for example.

Note that in fig. 10, the horizontal axis represents frequency, and the vertical axis represents the condition number X(k) of the transformation matrix B_k. Further, in the example of fig. 10, the condition number X(k) is the condition number obtained when the spatial resolution control described later is performed.

In this example, curves L11 through L14 represent the condition numbers X(k) of the vortex-shaped microphone array MA11 shown in fig. 3, the flower-shaped microphone array MA21 shown in fig. 5, the random-shaped microphone array MA31 shown in fig. 6, and the random-shaped microphone array MA41 shown in fig. 7, respectively.

Here, this example shows that the condition number X(k) of the flower-shaped microphone array MA21 is the smallest over the entire frequency range, because in the case of the flower-shaped microphone array MA21, the distance between the microphone elements projected onto the ring is the shortest.

On the other hand, in the case of the vortex-shaped microphone array MA11, the distance between the microphone elements is relatively long for every 8 microphone elements. In other words, the microphone elements included in the sub-array of the microphone array MA11 that is closest to the center and the microphone elements included in the sub-array that is farthest from the center are disposed away from each other.

Therefore, the distance between the microphone units projected onto the ring is longer than in the case of the microphone array MA21, and the condition number X(k) of the vortex-shaped microphone array MA11 is slightly larger than the condition number X(k) of the flower-shaped microphone array MA21.

Further, in the case of the random-shaped microphone array MA31 and the random-shaped microphone array MA41, the distance between the microphone elements projected onto the ring is relatively long. Thus, the condition numbers X(k) of the microphone arrays MA31 and MA41 are greater than the condition number X(k) of the vortex-shaped microphone array MA11.

< setting parameters regarding microphone array >

Incidentally, as described above, the present technology makes it possible to parametrically determine the arrangement of each microphone unit in the microphone array.

Here, the parameter indicating the setting of each microphone unit of the microphone array is referred to as a setting parameter, and the set of the plurality of setting parameters is referred to as a setting parameter set. In other words, the settings of each microphone unit comprised in the microphone array are decided by the set of setting parameters.

Specifically, examples of the setting parameters include the number S of sub-arrays, the radius r_s of each sub-array (where s = 0, 1, ..., S-1), and the rotation angle of each sub-array (where s = 0, 1, ..., S-1).

Here, the number of sub-arrays S is the number of sub-arrays included in the microphone array, and the radius r_s of a sub-array corresponds to the distance from the center position of the microphone array to the microphone elements included in that sub-array. The vector comprising the radii r_s of the S sub-arrays is also referred to as the radius vector r_sub hereinafter.

In addition, the rotation angle of a sub-array is the inclination angle of the sub-array with respect to a given direction as viewed from the center position of the microphone array. In other words, the rotation angle of the sub-array indicates the position of the sub-array in the rotation direction centered on the center position of the microphone array.

Specifically, for example, assume that the center position of the microphone array is a center O, and that a direction serving as a specified reference, as viewed from the center O, is a reference direction. In this case, for example, the rotation angle is the angle between the reference direction and the line connecting the center O to the microphone unit that is included in the sub-array and used as a specified reference.

For example, the direction of the microphone unit that is included in the sub-array closest to the center O and used as a reference is set as the reference direction. In this case, the rotation angle of a sub-array indicates the angle by which the sub-array closest to the center O is rotated to be brought into alignment with that sub-array.

Note that the vector comprising the rotation angles of the S sub-arrays is also referred to as the rotation angle vector hereinafter.

The number of sub-arrays S, the radius vector r_sub, and the rotation angle vector, which serve as the setting parameters, are hereinafter also collectively referred to as the setting parameter set.
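How a setting parameter set determines the placement of every microphone unit can be sketched as below. The sketch assumes that the Q units are split evenly over the S sub-arrays and that the units of each sub-array are equally spaced on its circle; the function name `microphone_positions` and the numeric values of the radii and rotation angles are hypothetical and are not taken from the figures.

```python
import math


def microphone_positions(total_units, radii, rotations):
    """Return (x, y) positions for S = len(radii) concentric sub-arrays.

    Assumes `total_units` (Q) is divisible by S and that the units of each
    sub-array are equally spaced, starting at that sub-array's rotation angle.
    """
    num_sub = len(radii)
    if num_sub == 0 or len(rotations) != num_sub or total_units % num_sub != 0:
        raise ValueError("need one rotation per radius and Q divisible by S")
    per_sub = total_units // num_sub
    positions = []
    for radius, rotation in zip(radii, rotations):
        for i in range(per_sub):
            angle = rotation + 2 * math.pi * i / per_sub
            positions.append((radius * math.cos(angle),
                              radius * math.sin(angle)))
    return positions


# Illustrative parameter set: Q = 128 units in S = 8 sub-arrays.
Q, S = 128, 8
r_sub = [0.10 + 0.01 * s for s in range(S)]            # radius vector (metres)
rotation_vector = [2 * math.pi * s / Q for s in range(S)]  # one slot per step
positions = microphone_positions(Q, r_sub, rotation_vector)
```

Shifting each sub-array by one slot of 2π/Q, as done here, is one simple way to make the projected units equally spaced on the ring; other rotation angle vectors are possible.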

For example, the optimal setting parameters depend on the total number of microphone units Q, the operating frequency range [f_min, f_max], the diameter D_m of the microphone unit, and the upper limit X_max of the condition number X(k).

Here, the total number Q of microphone elements is the number of microphone elements included in the microphone array. The number of sub-arrays comprised in the microphone array, i.e. the number of sub-arrays S, is determined by the total number of microphone units Q.

Specifically, for example, when the total number Q of microphone units is 24, the value of the number S of sub-arrays may be set to 1, 2, 3, 4, 6, 12, or 24.

Furthermore, the operating frequency range [f_min, f_max] is the frequency range from the minimum value f_min to the maximum value f_max of the frequency of the target sound.

When the setting parameter set P^Q_opt is determined, each setting parameter is optimized in consideration of the operating frequency range [f_min, f_max].

The diameter D_m of the microphone unit is the diameter of the microphone elements included in the microphone array, and D_m is the lower limit of the absolute value of the common difference of the generalized arithmetic sequence that determines the radius vector r_sub.

For example, assume that the radii r_s of two arbitrary sub-arrays are radius r_i and radius r_j (where i ≠ j). In this case, radius r_i and radius r_j need to satisfy the following formula (7). The reason is that, even when the radius r_s and the rotation angle of each sub-array are taken into account, microphone units having the diameter D_m cannot be physically arranged side by side unless the condition of formula (7) is satisfied.

[ formula 7]

Further, the upper limit X_max is the acceptable value of the condition number X(k) in the operating frequency range [f_min, f_max], that is, the maximum value that the condition number X(k) is allowed to take.

Empirically, a matrix with a condition number X(k) greater than 100 is ill-conditioned and its inverse matrix is unstable, although this depends on the application. However, since multicollinearity is undesirable in many cases, setting the upper limit X_max to about 30 is practically sufficient.
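The role of the upper limit can be illustrated on a toy case. The sketch below computes the 2-norm condition number of a real 2x2 matrix in closed form and compares it with an assumed limit of 30; the function name and the example matrices are hypothetical, and the transformation matrix B_k itself is of course larger and complex-valued.

```python
import math


def condition_number_2x2(matrix):
    """2-norm condition number (largest / smallest singular value) of a
    real 2x2 matrix given as ((a, b), (c, d))."""
    (a, b), (c, d) = matrix
    trace = a * a + b * b + c * c + d * d   # sum of squared singular values
    det_sq = (a * d - b * c) ** 2           # product of squared singular values
    if det_sq == 0.0:
        return math.inf                     # singular matrix
    lam_max = (trace + math.sqrt(max(trace * trace - 4 * det_sq, 0.0))) / 2
    lam_min = det_sq / lam_max              # avoids cancellation in the root
    return math.sqrt(lam_max / lam_min)


X_MAX = 30.0  # upper limit suggested in the text as practically sufficient

well_conditioned = ((2.0, 0.0), (0.0, 1.0))
nearly_dependent = ((1.0, 1.0), (1.0, 1.01))   # rows almost collinear

acceptable = condition_number_2x2(well_conditioned) <= X_MAX
rejected = condition_number_2x2(nearly_dependent) > X_MAX
```

The second matrix has almost linearly dependent rows, which is the multicollinearity mentioned above, and its condition number is far above both 30 and 100.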

By determining the optimal setting parameter set P^Q_opt on the basis of the total number Q of microphone units, the operating frequency range [f_min, f_max], the diameter D_m of the microphone unit, and the upper limit X_max of the condition number X(k) described above, a microphone array including appropriately arranged microphone units can be obtained such that the microphone array has the features F1 to F3.

Specifically, for example, the optimal setting parameter set P^Q_opt is obtained by minimizing the condition number of the transformation matrix B_k over the operating frequency range [f_min, f_max], while imposing constraints given by the total number Q, the diameter D_m of the microphone units, and the upper limit X_max.

The search for the setting parameter set P^Q_opt is implemented by performing an exhaustive search over the possible setting parameters. Empirically, substantially optimal results can also be obtained through a metaheuristic optimization method (e.g., differential evolution).

Note that a metaheuristic optimization method (for example, differential evolution) is disclosed in detail in, for example, R. Storn and K. Price, "Differential Evolution - A Simple and Efficient Heuristic for Global Optimization over Continuous Spaces", Journal of Global Optimization, 1997 (hereinafter also referred to as reference 3).

< control of spatial resolution >

Next, spatial resolution control in the microphone array is described.

For example, reference 1 and reference 2 indicate that it is advantageous to select an appropriate spatial resolution for each frequency range in order to obtain greater robustness. This means that a proper choice of spatial resolution results in a well-conditioned transformation matrix.

In practice, when an arbitrary value kr is given by the wave number k and the radius r of the microphone array, and when the order n is equal to or greater than a certain high order (hereinafter referred to as n₀(kr)), the value of the corresponding mode function (Bessel function) becomes closer to zero as the order n becomes higher.

For example, as shown in the following formula (8), the order determined according to the total number Q of microphone elements of the microphone array is denoted by N_arr.

[ formula 8]

n₀(kr)＜N_arr＝[(Q-1)/2]···(8)

In this case, the spherical harmonic terms (i.e., the elements of the transformation matrix B_k) corresponding to orders n greater than n₀(kr), up to N_arr, do not include reliable information for reproducing the wavefront. The reason is that, for orders n greater than n₀(kr), up to N_arr, the Bessel function value is zero or close to zero.

Thus, in accordance with the present technology, such numerically small spherical harmonic terms are excluded so as to minimize information loss and improve the conditioning of the transformation matrix B_k.

In this case, a process of limiting the number of rows of the transformation matrix B_k, which is used in the operation, including mode compensation, that calculates the spherical harmonic coefficient a_mn(k), is performed as spatial resolution control.

In other words, for example, when max(r_s) denotes the maximum value of the radii r_s of the sub-arrays, the transformation matrix Bⁿ⁰_k obtained by performing spatial resolution control on the transformation matrix B_k is a matrix comprising the 1st to n₀(k×max(r_s))-th rows of the transformation matrix B_k. In other words, as a result of the spatial resolution control, the number of rows of the transformation matrix B_k is limited to n₀(k×max(r_s)) rows on the basis of the order n₀(k×max(r_s)), and the transformation matrix Bⁿ⁰_k is obtained as a transformation matrix with a limited number of rows.

Here, the n₀(k×max(r_s))-th row of the transformation matrix B_k is the row corresponding to the order n₀(k×max(r_s)). The order n₀(k×max(r_s)) is the order for the sub-array whose radius is max(r_s). In other words, when the radius r is max(r_s), the order n₀(k×max(r_s)) is the order n₀(kr).

Any method may be employed to determine the order n₀(kr) for the radius r. For example, the order may be set so as to satisfy n₀(kr) = th × kr, where the threshold th has a value of 1 or 1.1, or the order n₀(kr) may be determined by performing the calculation of the following formula (9). A method that satisfies n₀(kr) = th × kr is disclosed in detail in, for example, references 1 and 2 mentioned above.

[ formula 9]

Note that it is sufficient if the threshold th in formula (9) is a real number between 0 and 1, and a value close to 1 is advantageous. Specifically, for example, the threshold th is set to 0.95. Further, in fig. 3, 5 to 8, and 10 described above, and in fig. 12 described later, the order n₀(kr_s) defined by formula (9) is used throughout.
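The row-limiting step can be sketched with the simple product rule rather than formula (9). The sketch reads that rule as n₀ = th × kr rounded up to an integer, reads the square brackets in formula (8) as a floor, and assumes a speed of sound of 343 m/s to convert frequency to wave number; these readings, the function names, and the placeholder rows are all assumptions.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed for converting frequency to wave number


def array_order(total_units):
    """N_arr from formula (8), reading the square brackets as a floor."""
    return (total_units - 1) // 2


def truncation_order(frequency_hz, max_radius, th=1.0):
    """Order n0 from the product rule th * k * r, rounded up to an integer,
    evaluated at the largest sub-array radius max(r_s)."""
    k = 2 * math.pi * frequency_hz / SPEED_OF_SOUND
    return math.ceil(th * k * max_radius)


def limit_rows(matrix_rows, n0):
    """Spatial resolution control: keep only the 1st to n0-th rows."""
    return matrix_rows[:n0]


# Illustrative values: Q = 128 units, largest radius 0.2 m, 1 kHz.
n_arr = array_order(128)
n0 = truncation_order(1000.0, 0.2)            # k*r is about 3.66
n0_wide = truncation_order(1000.0, 0.2, th=1.1)

# Placeholder rows standing in for the rows of the transformation matrix.
full_rows = [[float(n)] for n in range(n_arr)]
limited_rows = limit_rows(full_rows, n0)
```

At low frequencies n₀ is much smaller than N_arr, so most rows are dropped, which matches the observation that the untruncated matrix is poorly conditioned in the low frequency range.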

By performing such spatial resolution control, it is possible to improve the conditioning of the transformation matrix and to improve the error tolerance. For example, when the spatial resolution is not controlled and when kr = 6, the condition number X(k) of the transformation matrix B_k for microphone arrays with the respective microphone unit arrangements exhibits the values shown in fig. 11. Note that in fig. 11, the horizontal axis represents frequency, and the vertical axis represents the condition number X(k).

In fig. 11, curves L21 through L23 represent the condition numbers of the circular microphone array, the vortex-shaped microphone array MA11 shown in fig. 3, and the flower-shaped microphone array MA21 shown in fig. 5, respectively.

This example shows that the condition number X(k) of the transformation matrix B_k is large in the low frequency range for all the microphone arrays.

Such a phenomenon occurs due to the linear dependence caused by redundant rows of the transformation matrix B_k, and the microphone array according to the present technology makes it possible to resolve such a phenomenon by performing spatial resolution control or appropriate matrix regularization.

On the other hand, when the spatial resolution is controlled, the condition number X(k) of the transformation matrix Bⁿ⁰_k for microphone arrays with the respective microphone unit arrangements exhibits the values shown in fig. 12. Note that in fig. 12, the horizontal axis represents frequency, and the vertical axis represents the condition number X(k).

In fig. 12, curves L31 through L33 represent the condition numbers of the circular microphone array, the vortex-shaped microphone array MA11 shown in fig. 3, and the flower-shaped microphone array MA21 shown in fig. 5, respectively.

This example shows that, compared with the example of fig. 11, the condition number X(k) of the transformation matrix Bⁿ⁰_k is smaller in the low frequency range for all the microphone arrays.

Furthermore, the condition number X(k) of the circular microphone array becomes large at some frequencies. Such deterioration of the conditioning is inherent to the circular microphone array, being caused by the Bessel function value becoming zero, and cannot be resolved by performing spatial resolution control or matrix regularization.

On the other hand, for the vortex-shaped microphone array MA11 and the flower-shaped microphone array MA21, the corresponding condition number X(k) is not greater than 30 at most frequencies. The results show that by performing spatial resolution control on a microphone array with a suitable microphone element arrangement, a better condition number is obtained and the error tolerance is improved.

< example of configuration of recording System and reproducing System >

Next, a configuration example is described of a recording system that records a sound wavefront (sound field) using the above-described microphone array, and of a reproduction system that reproduces the sound wavefront on the basis of the spherical harmonic coefficient a_mn(k) obtained by the recording system.

Such a recording system and such a reproducing system are configured as shown in fig. 13, for example.

In fig. 13, the recording system includes a microphone array 11 and a recording device 12, and the reproduction system includes a reproduction device 13 and a speaker array 14.

Note that the microphone array 11 may be part of the recording apparatus 12, and the speaker array 14 may be part of the reproduction apparatus 13.

In the recording system, a sound wave front is recorded by the microphone array 11, which includes a plurality of microphone units, and the multi-channel sound signal obtained by the recording is supplied to the recording apparatus 12. In other words, the microphone array 11 records the sound wave front by collecting sound with the respective microphone units, and outputs the audio signals obtained by these units as a multi-channel signal.

The microphone array 11 is used to record a sound field, i.e., a sound wave front, and includes a plurality of sub-arrays. Further, each sub-array includes a plurality of microphone units. Specifically, the microphone array 11 is a microphone array having features F1 to F3 described above, for example, one of the microphone arrays shown in Figs. 3 and 5 to 8, and the microphone units included in the microphone array 11 are omnidirectional microphones.

The recording apparatus 12 calculates the spherical harmonic coefficient a_mn(k) using the multi-channel signal supplied from the microphone array 11, and supplies the spherical harmonic coefficient a_mn(k) to the reproduction apparatus 13.

In this example, the recording apparatus 12 includes an input section 21, a time-frequency analyzer 22, a parameter holding section 23, a spatial resolution controller 24, and a spherical harmonic coefficient calculator 25.

The input section 21 performs analog-to-digital (AD) conversion on the multi-channel signal supplied from the microphone array 11 to convert the analog multi-channel signal into a digital signal, and supplies the digital signal to the time-frequency analyzer 22.

The time-frequency analyzer 22 performs a short-time Fourier transform (STFT) on the multi-channel signal supplied from the input section 21, and supplies the resulting time-frequency spectrum to the spherical harmonic coefficient calculator 25. The time-frequency spectrum obtained by the time-frequency analyzer 22 corresponds to the sound pressure shown in equation (4).
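As a rough illustration of this time-frequency analysis step, the following Python sketch applies a windowed short-time Fourier transform to every channel of a multi-channel signal. The frame length, hop size, Hann window, and the function name stft_multichannel are assumptions made for this example; the description does not specify them.

```python
import numpy as np


def stft_multichannel(x, frame_len=256, hop=128):
    """Short-time Fourier transform of a multi-channel signal.

    x has shape (channels, samples). The result has shape
    (channels, frames, frame_len // 2 + 1), one spectrum per channel and frame.
    frame_len and hop are illustrative defaults, not values from the text.
    """
    x = np.asarray(x, dtype=float)
    if x.ndim != 2 or x.shape[1] < frame_len:
        raise ValueError("expected (channels, samples) with samples >= frame_len")
    window = np.hanning(frame_len)  # assumed analysis window
    n_frames = 1 + (x.shape[1] - frame_len) // hop
    # Slice every channel into overlapping frames and apply the window.
    frames = np.stack(
        [x[:, i * hop:i * hop + frame_len] * window for i in range(n_frames)],
        axis=1,
    )
    # One-sided FFT along the time axis of each frame.
    return np.fft.rfft(frames, axis=-1)
```

Each frequency bin of the output corresponds to one wave number k, so later per-wave-number processing can iterate over the last axis.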

The parameter holding section 23 holds a setting parameter set P^Q_opt that is determined on the basis of, for example, the total number Q of microphone units given in advance, the operating frequency range [f_min, f_max], the diameter D_m of a microphone unit, and the upper limit X_max of the condition number X(k).

For example, the microphone array 11 is a microphone array whose shape is determined by the setting parameter set P^Q_opt determined as described above, and the setting parameter set P^Q_opt for the microphone array 11 is held by the parameter holding section 23. In other words, the setting parameter set P^Q_opt is geometrical information indicating the microphone unit arrangement of the microphone array 11.

The parameter holding section 23 supplies the held setting parameter set P^Q_opt to the spatial resolution controller 24 and the spherical harmonic coefficient calculator 25.

The spatial resolution controller 24 controls the spatial resolution on the basis of the setting parameter set P^Q_opt supplied from the parameter holding section 23.

In other words, on the basis of the maximum radius max(r_s) of the sub-arrays included in the microphone array 11, which is determined from the setting parameter set P^Q_opt, the spatial resolution controller 24 performs, for example, the calculation of equation (9) above for each frequency (i.e., for each wave number k) to determine the order n₀(k × max(r_s)). The spatial resolution controller 24 then supplies the order n₀(k × max(r_s)) obtained in this way to the spherical harmonic coefficient calculator 25, and instructs the spherical harmonic coefficient calculator 25 to limit the number of rows of the transformation matrix B_k.
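The per-frequency order selection can be sketched as follows. Equation (9) is not reproduced in this part of the description, so the sketch substitutes the widely used rule of thumb that the order grows as the ceiling of k × max(r_s); that rule, the speed of sound of 343 m/s, the lower bound of 1, and all function names are assumptions for illustration only.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at room temperature


def wavenumber(freq_hz, c=SPEED_OF_SOUND):
    """Wave number k = 2*pi*f / c for a frequency in hertz."""
    return 2.0 * math.pi * freq_hz / c


def truncation_order(k, r_max):
    """Order n0 for wave number k and largest sub-array radius r_max.

    Assumption: ceil(k * r_max) stands in for equation (9), which is not
    shown here. The lower bound of 1 (also an assumption) keeps at least
    one row of the transformation matrix.
    """
    return max(1, math.ceil(k * r_max))


def orders_per_bin(freqs_hz, r_max):
    """Order for each analysis frequency, i.e. for each STFT bin."""
    return [truncation_order(wavenumber(f), r_max) for f in freqs_hz]
```

With this stand-in rule the order increases with frequency, so more rows of B_k are kept at high frequencies than at low ones.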

The spherical harmonic coefficient calculator 25 calculates the spherical harmonic coefficient a_mn(k) using the time-frequency spectrum supplied from the time-frequency analyzer 22, the setting parameter set P^Q_opt supplied from the parameter holding section 23, and the order n₀(k × max(r_s)) supplied from the spatial resolution controller 24.

For example, the spherical harmonic coefficient calculator 25 generates a row-limited transformation matrix B^{n₀}_k in accordance with the instruction given by the spatial resolution controller 24. Specifically, the spherical harmonic coefficient calculator 25 takes the first to n₀(k × max(r_s))-th rows of the transformation matrix B_k, which is determined by the setting parameter set P^Q_opt (i.e., by the microphone unit arrangement of the microphone array 11), as the final transformation matrix B^{n₀}_k.

The transformation matrix B^{n₀}_k is generated for each wave number k (i.e., for each STFT bin) on the basis of the setting parameter set P^Q_opt, which is the geometrical information of the microphone array 11, and the order n₀(k × max(r_s)), which is the output of the spatial resolution controller 24.

The spherical harmonic coefficient calculator 25 obtains a pseudo-inverse matrix of the transformation matrix B^{n₀}_k and, on the basis of this pseudo-inverse matrix and the time-frequency spectrum, performs the calculation of equation (3) above to calculate the spherical harmonic coefficient a_mn(k). For example, the spherical harmonic coefficient calculator 25 uses the Moore-Penrose inverse of the transformation matrix B^{n₀}_k as the pseudo-inverse matrix.
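The row limiting and pseudo-inverse steps can be sketched in Python with NumPy. Equation (3) is not reproduced here, so the sketch assumes a model in which the pressure at the microphones, written as a row vector, equals the coefficient row vector multiplied by the row-limited matrix; that model, the matrix layout (one row per harmonic term, one column per microphone), and the function names are assumptions for illustration.

```python
import numpy as np


def limit_rows(B_k, n0):
    """Keep the first n0 rows of the transformation matrix B_k."""
    return B_k[:n0, :]


def estimate_coefficients(B_limited, pressure):
    """Least-squares coefficient estimate via the Moore-Penrose inverse.

    Assumed model (not taken from equation (3)): pressure = coeffs @ B_limited,
    where pressure holds one time-frequency value per microphone.
    """
    return pressure @ np.linalg.pinv(B_limited)
```

When the row-limited matrix has full row rank, multiplying by the Moore-Penrose inverse recovers the coefficients exactly in the noise-free case; with noise it returns the least-squares solution.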

The spherical harmonic coefficient calculator 25 performs a calculation similar to equation (3) above, in which the spherical harmonic transform (SHT) and mode compensation are performed simultaneously. Here, the mode compensation corresponds to the division by b_n(kr) in equation (1), that is, to a process of dividing the time-frequency spectrum on which the spherical harmonic transform has been performed by a mode function (Bessel function).

Note that the description here assumes that the spherical harmonic transform and the mode compensation are performed simultaneously when the spherical harmonic coefficient a_mn(k) is obtained, but the spherical harmonic transform and the mode compensation may also be performed separately.

In this case, the spherical harmonic coefficient calculator 25 is provided with a processing block for performing the spherical harmonic transform and a processing block for performing the mode compensation. The former block performs the spherical harmonic transform on the time-frequency spectrum, and the latter block divides the transformed time-frequency spectrum by the mode function (Bessel function). In both the spherical harmonic transform and the mode compensation, the operation is performed on terms up to the order determined by n₀(k × max(r_s)).
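A separate mode compensation block might look like the following Python sketch. Equation (1) is not reproduced here, so the sketch assumes the common open-array form b_n(kr) = 4π iⁿ j_n(kr), where j_n is the spherical Bessel function of the first kind; that form, the dictionary layout keyed by (n, m), and the function names are assumptions for illustration.

```python
import math


def spherical_jn(n, x):
    """Spherical Bessel function of the first kind, j_n(x), for x > 0.

    Uses upward recurrence, which is adequate for the small orders and
    moderate arguments of this example but loses accuracy when n >> x.
    """
    if x <= 0:
        raise ValueError("x must be positive")
    j_prev = math.sin(x) / x
    if n == 0:
        return j_prev
    j_curr = math.sin(x) / x ** 2 - math.cos(x) / x
    for m in range(1, n):
        # j_{m+1}(x) = (2m + 1) / x * j_m(x) - j_{m-1}(x)
        j_prev, j_curr = j_curr, (2 * m + 1) / x * j_curr - j_prev
    return j_curr


def mode_function(n, kr):
    """Assumed mode function b_n(kr) = 4*pi * i**n * j_n(kr)."""
    return 4.0 * math.pi * (1j ** n) * spherical_jn(n, kr)


def mode_compensate(sht_values, kr):
    """Divide each transformed value, keyed by (n, m), by b_n(kr)."""
    return {(n, m): v / mode_function(n, kr) for (n, m), v in sht_values.items()}
```

Because j_n(kr) has zeros, this division becomes ill-conditioned near certain values of kr, which is one reason for limiting the order at each wave number.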

In addition, the spherical harmonic coefficient calculator 25 outputs (transmits) the spherical harmonic coefficient a_mn(k) to the reproduction system.

In the reproduction system, a drive signal for driving the speaker array 14 is generated on the basis of the spherical harmonic coefficient a_mn(k) output from the spherical harmonic coefficient calculator 25, and the sound wave front is reproduced. The drive signal may be generated with correction for the speaker characteristics of the speaker array 14, or by using other algorithms.

For example, the reproduction apparatus 13 of the reproduction system includes a speaker setting information holding section 31, a drive signal generator 32, a time-frequency synthesizer 33, and an output section 34.

The speaker setting information holding section 31 holds speaker setting information indicating the setting of speakers included in the speaker array 14, and supplies the held speaker setting information to the drive signal generator 32.

The drive signal generator 32 receives the spherical harmonic coefficient a_mn(k) transmitted from the spherical harmonic coefficient calculator 25, generates a drive signal on the basis of the received spherical harmonic coefficient a_mn(k) and the speaker setting information supplied from the speaker setting information holding section 31, and supplies the generated drive signal to the time-frequency synthesizer 33.

For example, the drive signal generator 32 performs the calculation of equation (2) above, and the sound pressure obtained by the calculation is used as a drive signal in the time-frequency domain. Note that, in the calculation of equation (2), the radius of the reproduction region, i.e., the region in which the sound wave front is reproduced, is used as the radius r_q.

Further, the calculation of equation (2) simultaneously performs the multiplication of the spherical harmonic coefficient a_mn(k) by a Bessel function (i.e., generation of the drive signal in the spherical harmonic domain) and the inverse spherical harmonic transform (ISHT) of the generated drive signal. However, the inverse spherical harmonic transform may be performed after the drive signal is generated in the spherical harmonic domain. In this case, the drive signal generator 32 is provided with a processing block for generating a drive signal in the spherical harmonic domain and a processing block for performing the inverse spherical harmonic transform.

The time-frequency synthesizer 33 performs an inverse short-time Fourier transform (ISTFT) on the drive signal supplied from the drive signal generator 32, and supplies the resulting time-domain drive signal to the output section 34.

The output section 34 performs digital-to-analog (DA) conversion on the drive signal supplied from the time-frequency synthesizer 33, and supplies the resulting analog drive signal to the speaker array 14. The speaker array 14 outputs sound based on the drive signal supplied from the output section 34 to reproduce the sound wave front recorded by the recording system.

For example, the speaker array 14 is obtained by arranging line speaker arrays in a rectangle, each line speaker array being obtained by arranging speakers linearly, and the region inside the speaker array 14 is the reproduction region of the wave front. Note that the speaker array 14 may have any shape, i.e., any speaker arrangement.

<Description of recording processing>

Next, operations of the recording system and the reproducing system shown in fig. 13 are described.

First, the recording process performed by the recording system is described with reference to the flowchart of Fig. 14. Note that, before the recording process is started, the setting parameter set P^Q_opt is determined in advance by the parameter holding section 23 or another processing block, and the setting parameter set P^Q_opt obtained as a result is held by the parameter holding section 23.

In step S11, the spatial resolution controller 24 controls the spatial resolution on the basis of the setting parameter set P^Q_opt supplied from the parameter holding section 23.

For example, the spatial resolution controller 24 performs the calculation of equation (9) above to obtain the order n₀(k × max(r_s)), supplies the obtained order n₀(k × max(r_s)) to the spherical harmonic coefficient calculator 25, and instructs the spherical harmonic coefficient calculator 25 to limit the number of rows of the transformation matrix B_k.

In step S12, the microphone array 11 collects ambient sound using the microphone units, and supplies a multi-channel signal obtained as a result of the collection to the input section 21. The input section 21 performs AD conversion on the multi-channel signal supplied from the microphone array 11, and supplies the multi-channel signal on which the AD conversion has been performed to the time-frequency analyzer 22.

In step S13, the time-frequency analyzer 22 performs a short-time Fourier transform on the multi-channel signal supplied from the input section 21, and supplies the resulting time-frequency spectrum to the spherical harmonic coefficient calculator 25.

In step S14, the spherical harmonic coefficient calculator 25 calculates the spherical harmonic coefficient a_mn(k) on the basis of the time-frequency spectrum from the time-frequency analyzer 22, the setting parameter set P^Q_opt from the parameter holding section 23, and the order n₀(k × max(r_s)) from the spatial resolution controller 24.

In other words, in accordance with the instruction given by the spatial resolution controller 24, the spherical harmonic coefficient calculator 25 generates the transformation matrix B^{n₀}_k on the basis of the order n₀(k × max(r_s)) and calculates the pseudo-inverse matrix of the generated transformation matrix B^{n₀}_k. Then, on the basis of the obtained pseudo-inverse matrix and the time-frequency spectrum, the spherical harmonic coefficient calculator 25 performs a calculation similar to equation (3) to calculate the spherical harmonic coefficient a_mn(k).

The spherical harmonic coefficient calculator 25 outputs the spherical harmonic coefficient a_mn(k) calculated as described above, and the recording process is terminated.

As described above, the recording system records the wave front using the microphone array 11, whose shape (microphone unit arrangement) is determined by the setting parameter set P^Q_opt, and calculates the spherical harmonic coefficient a_mn(k) using a transformation matrix obtained by controlling the spatial resolution. This enables broadband sound field recording to be performed at low cost.

<Description of reproduction processing>

Next, the reproduction process performed by the reproduction system is described with reference to the flowchart of Fig. 15. The reproduction process is started when the drive signal generator 32 of the reproduction apparatus 13 receives the spherical harmonic coefficient a_mn(k) transmitted from the recording system.

In step S41, the drive signal generator 32 generates a drive signal on the basis of the received spherical harmonic coefficient a_mn(k) and the speaker setting information supplied from the speaker setting information holding section 31, and supplies the generated drive signal to the time-frequency synthesizer 33. For example, in step S41, the calculation of equation (2) above is performed, and the sound pressure obtained by the calculation is used as a drive signal in the time-frequency domain.

In step S42, the time-frequency synthesizer 33 performs an inverse short-time Fourier transform on the drive signal supplied from the drive signal generator 32, and supplies the resulting time-domain drive signal to the output section 34. Further, the output section 34 performs DA conversion on the drive signal supplied from the time-frequency synthesizer 33, and supplies the resulting analog drive signal to the speaker array 14.

In step S43, the speaker array 14 outputs sound based on the drive signal supplied from the output section 34 to reproduce the sound wave front recorded by the recording system, and the reproduction process is terminated.

As described above, the reproduction system generates a drive signal on the basis of the received spherical harmonic coefficient a_mn(k), and reproduces the sound wave front on the basis of the generated drive signal. By reproducing the wave front from the spherical harmonic coefficient a_mn(k) received from the recording system, the reproduction system can perform wideband wave front reproduction.

<Configuration example of computer>

Incidentally, the series of processes described above may be executed using hardware or software. When a series of processes is performed using software, a program including the software is installed on a computer. Here, examples of the computer include a computer incorporated into dedicated hardware, and a computer capable of executing various functions by various programs installed thereon, for example, a general-purpose personal computer.

Fig. 16 is a block diagram of an example of a hardware configuration of a computer that executes the above-described series of processes using a program.

In the computer, a central processing unit (CPU) 501, a read only memory (ROM) 502, and a random access memory (RAM) 503 are connected to each other by a bus 504.

Further, an input/output interface 505 is connected to the bus 504. An input section 506, an output section 507, a recording section 508, a communication section 509, and a drive 510 are connected to the input/output interface 505.

The input section 506 includes, for example, a keyboard, a mouse, a microphone array, and an imaging element. The output section 507 includes, for example, a display and a speaker array. The recording section 508 includes, for example, a hard disk and a nonvolatile memory. The communication section 509 includes, for example, a network interface. The drive 510 drives a removable recording medium 511, such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory.

In the computer having the above-described configuration, for example, the CPU 501 loads a program stored in the recording section 508 into the RAM 503 via the input/output interface 505 and the bus 504 and executes the program, thereby executing the series of processes described above.

For example, the program executed by the computer (CPU 501) may be provided by being stored in a removable recording medium 511 serving as, for example, a package medium. Further, the program may be provided via a wired or wireless transmission medium (e.g., a local area network, the internet, or digital satellite broadcasting).

In the computer, the program can be installed on the recording section 508 via the input/output interface 505 from the removable recording medium 511 mounted on the drive 510. Further, the program may be received by the communication section 509 via a wired or wireless transmission medium and installed on the recording section 508. Alternatively, the program may be installed in advance on the ROM 502 or the recording section 508.

Note that the program executed by the computer may be a program that executes processing in chronological order in the order described herein, or may be a program that executes processing in parallel or at necessary timing (for example, at the time of calling).

Furthermore, the embodiments of the present technology are not limited to the above-described examples, and various modifications may be made thereto without departing from the scope of the present technology.

For example, the present technology may also have a configuration of cloud computing in which a plurality of devices share a task of a single function and cooperate to perform the single function via a network.

Further, the respective steps described using the above flowcharts may be shared by a plurality of apparatuses to be executed, in addition to being executed by a single apparatus.

Further, when a single step includes a plurality of processes, the plurality of processes included in the single step may be shared by a plurality of apparatuses to be executed, in addition to being executed by a single apparatus.

Further, the present technology can also adopt the following configuration.

(1) A microphone array for sound field recording, comprising:

a plurality of sub-arrays, each sub-array comprising a plurality of microphones, and each sub-array having a discrete rotationally symmetric shape comprising a specified radius, wherein,

when the radius values of the plurality of sub-arrays form a series, the series is a generalized arithmetic series.

(2) The microphone array according to (1), wherein,

each of the plurality of microphones comprised in the sub-array is arranged at a distance from a central position of the microphone array, which distance corresponds to a radius of the sub-array.

(3) The microphone array according to (1) or (2), wherein,

when at least one of a zoom-in operation, a zoom-out operation, a rotation operation, or an inversion operation is performed on one of the plurality of sub-arrays, the one of the plurality of sub-arrays coincides with another one of the plurality of sub-arrays.

(4) The microphone array according to any one of (1) to (3), wherein,

the plurality of microphones are arranged such that when all of the plurality of microphones included in the microphone array are radially projected onto a ring centered on the center position of the microphone array, the projected microphones of the plurality of microphones are equally spaced on the ring.

(5) The microphone array according to any one of (1) to (4), wherein,

(6) A recording apparatus comprising

a spherical harmonic coefficient calculator that calculates spherical harmonic coefficients based on multi-channel signals obtained by sound collection performed by a microphone array for sound field recording, the microphone array including a plurality of sub-arrays, each sub-array including a plurality of microphones, and each sub-array having a discrete rotationally symmetric shape including a specified radius, wherein,

when the radius values of the plurality of sub-arrays form a series, the series is a generalized arithmetic series.

(7) The recording apparatus according to (6), wherein,

the spherical harmonic coefficient calculator calculates a spherical harmonic coefficient by performing mode compensation.

(8) The recording apparatus according to (7), further comprising

a spatial resolution controller that limits the number of rows of a transformation matrix used to perform the mode compensation, based on a particular order in the spherical harmonic domain.

(9) The recording apparatus according to (8), wherein,

the spatial resolution controller determines the particular order based on a maximum value of the radii of the plurality of sub-arrays.

(10) The recording apparatus according to (8) or (9), wherein,

the spherical harmonic coefficient calculator calculates, by performing the mode compensation, a spherical harmonic coefficient based on the multi-channel signal and a pseudo-inverse matrix of the transformation matrix whose number of rows has been limited.

(11) A recording method comprising

calculating, by a recording apparatus, spherical harmonic coefficients based on a multi-channel signal obtained by sound collection performed by a microphone array for sound field recording, the microphone array including a plurality of sub-arrays, each sub-array including a plurality of microphones, and each sub-array having a discrete rotationally symmetric shape including a specified radius, wherein,

when the radius values of the plurality of sub-arrays form a series, the series is a generalized arithmetic series.

(12) A program for causing a computer to execute processing, comprising:

calculating spherical harmonic coefficients based on multi-channel signals obtained by sound collection performed by a microphone array for sound field recording, the microphone array including a plurality of sub-arrays, each sub-array including a plurality of microphones, and each sub-array having a discrete rotationally symmetric shape including a specified radius, wherein,

when the radius values of the plurality of sub-arrays form a series, the series is a generalized arithmetic series.

List of reference numerals

11 microphone array

12 recording device

22 time-frequency analyzer

23 parameter holding section

24 spatial resolution controller

25 spherical harmonic coefficient calculator.
