Magnetic detection of replay attacks

Document No.: 1525429. Publication date: 2020-02-11.

Abstract: Magnetic detection of replay attacks, created by C. Alonso and J. P. Lesso on 2018-06-25. A method of detecting a replay attack on a voice biometric system, comprising: receiving an audio signal representing speech; detecting a magnetic field; determining whether a correlation exists between the audio signal and the magnetic field; and determining that the audio signal is likely to be caused by a replay attack if a correlation exists between the audio signal and the magnetic field.

1. A method of detecting a replay attack on a voice biometric system, the method comprising:

receiving an audio signal representing speech;

detecting a magnetic field;

determining whether a correlation exists between the audio signal and the magnetic field; and

determining that the audio signal is likely to be caused by a replay attack if a correlation exists between the audio signal and the magnetic field.

2. The method of claim 1, wherein determining whether a correlation exists between the audio signal and the magnetic field comprises:

identifying a first time period during which the audio signal contains speech;

identifying a second time period during which the magnetic field differs from a baseline; and

determining whether the first time period and the second time period are substantially the same.

3. The method of claim 2, comprising: determining that the first time period and the second time period are substantially the same if more than 60% of a first time period during which the audio signal contains speech overlaps a second time period during which the magnetic field is substantially different from baseline, and/or more than 60% of a second time period during which the magnetic field is substantially different from baseline overlaps a first time period during which the audio signal contains speech.

4. The method of claim 3, comprising: determining that a first time period during which the audio signal contains speech and a second time period during which the magnetic field differs significantly from baseline are substantially the same if more than 80% of the first time period overlaps with a second time period during which the magnetic field differs significantly from baseline, and/or more than 80% of the second time period overlaps with a first time period during which the audio signal contains speech.

5. The method of claim 1, wherein determining whether a correlation exists between the audio signal and the magnetic field comprises:

sampling the detected magnetic field at a first sampling rate;

sampling the audio signal at a second sampling rate; and

determining whether there is a correlation between the sampled audio signal and the sampled detected magnetic field.

6. The method of claim 5, comprising:

receiving a series of signal values representative of magnetic field strength;

forming an average of the magnetic field strength over a period of time; and

subtracting the average value of the magnetic field strength from the series of signal values representing the magnetic field strength to form the detected magnetic field.

7. The method of claim 5 or 6, comprising:

obtaining a digital audio signal at a third sampling rate and undersampling the digital audio signal to form the audio signal at the second sampling rate.

8. The method of claim 5, 6 or 7, wherein the second sampling rate is approximately equal to the first sampling rate.

9. The method of claim 5, 6, 7 or 8, wherein the step of determining whether there is a correlation between the sampled audio signal and the sampled detected magnetic field comprises: performing a mathematical correlation operation on the sampled audio signal and the sampled detected magnetic field to obtain an output correlation function, and determining whether a peak value of the output correlation function exceeds a predetermined threshold.

10. The method of claim 1, further comprising:

determining a direction of a source of the audio signal representing speech;

determining a direction of a source of the magnetic field; and

determining that the audio signal is likely to be caused by a replay attack if the direction of the source of the audio signal representing speech corresponds to the direction of the source of the magnetic field.

11. A system for detecting a replay attack on a voice biometric system, the system configured to:

receive an audio signal representing speech;

detect a magnetic field;

determine whether a correlation exists between the audio signal and the magnetic field; and

determine that the audio signal is likely to be caused by a replay attack if a correlation exists between the audio signal and the magnetic field.

12. The system of claim 11, configured to determine whether a correlation exists between the audio signal and the magnetic field by:

identifying a first time period during which the audio signal contains speech;

identifying a second time period during which the magnetic field differs from a baseline; and

determining whether the first time period and the second time period are substantially the same.

13. The system of claim 12, configured to: determine that the first time period and the second time period are substantially the same if more than 60% of a first time period during which the audio signal contains speech overlaps a second time period during which the magnetic field is substantially different from baseline, and/or more than 60% of a second time period during which the magnetic field is substantially different from baseline overlaps a first time period during which the audio signal contains speech.

14. The system of claim 13, configured to: determine that a first time period during which the audio signal contains speech and a second time period during which the magnetic field differs significantly from baseline are substantially the same if more than 80% of the first time period overlaps with a second time period during which the magnetic field differs significantly from baseline, and/or more than 80% of the second time period overlaps with a first time period during which the audio signal contains speech.

15. The system of claim 11, configured to determine whether a correlation exists between the audio signal and the magnetic field by:

sampling the detected magnetic field at a first sampling rate;

sampling the audio signal at a second sampling rate; and

determining whether there is a correlation between the sampled audio signal and the sampled detected magnetic field.

16. The system of claim 15, configured to:

receive a series of signal values representative of magnetic field strength;

form an average of the magnetic field strength over a period of time; and

subtract the average value of the magnetic field strength from the series of signal values representing the magnetic field strength to form the detected magnetic field.

17. The system of claim 15 or 16, configured to:

obtain a digital audio signal at a third sampling rate and undersample the digital audio signal to form the audio signal at the second sampling rate.

18. The system of claim 15, 16 or 17, wherein the second sampling rate is approximately equal to the first sampling rate.

19. The system of claim 15, 16, 17 or 18, configured to determine whether there is a correlation between the sampled audio signal and the sampled detected magnetic field by: performing a mathematical correlation operation on the sampled audio signal and the sampled detected magnetic field to obtain an output correlation function, and determining whether a peak value of the output correlation function exceeds a predetermined threshold.

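20. The system of any of claims 11 to 19, further configured to:

determine a direction of a source of the audio signal representing speech;

determine a direction of a source of the magnetic field; and

determine that the audio signal is likely to be caused by a replay attack if the direction of the source of the audio signal representing speech corresponds to the direction of the source of the magnetic field.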

21. A method of detecting a replay attack on a voice biometric system, the method comprising:

receiving an audio signal representing speech;

detecting a magnetic field; and

determining that the audio signal is likely to be caused by a replay attack if the strength of the magnetic field exceeds a threshold.

22. A system for detecting a replay attack on a voice biometric system, the system configured to:

receive an audio signal representing speech;

detect a magnetic field; and

determine that the audio signal is likely to be caused by a replay attack if the strength of the magnetic field exceeds a threshold.

23. A method of detecting a replay attack on a voice biometric system, the method comprising:

receiving an audio signal representing speech;

determining a direction of a source of the audio signal representing speech;

detecting a magnetic field;

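determining a direction of a source of the magnetic field; and

determining that the audio signal is likely to be caused by a replay attack if the direction of the source of the audio signal representing speech corresponds to the direction of the source of the magnetic field.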

24. The method of claim 23, comprising receiving audio signals representing speech from a plurality of microphones.

25. A method according to claim 23 or 24, comprising detecting components of the magnetic field in three orthogonal directions.

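26. A system for detecting a replay attack on a voice biometric system, the system configured to:

receive an audio signal representing speech;

determine a direction of a source of the audio signal representing speech;

detect a magnetic field;

determine a direction of a source of the magnetic field; and

determine that the audio signal is likely to be caused by a replay attack if the direction of the source of the audio signal representing speech corresponds to the direction of the source of the magnetic field.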

27. The system of claim 26, configured to receive audio signals representing speech from a plurality of microphones.

28. A system according to claim 26 or 27, configured to detect components of a magnetic field in three orthogonal directions.

29. An apparatus comprising the system of any of claims 11-20, 22 or 26-28.

30. The apparatus of claim 29, wherein the apparatus comprises a mobile phone, an audio player, a video player, a mobile computing platform, a gaming device, a remote controller device, a toy, a machine, or a home automation controller or a home appliance.

31. A computer program product comprising a tangible medium readable by a computer and instructions for performing the method of any of claims 1-10, 21 or 23-25.

32. A non-transitory computer-readable storage medium having stored thereon computer-executable instructions that, when executed by processor circuitry, cause the processor circuitry to perform the method of any of claims 1-10, 21 or 23-25.

33. An apparatus comprising the non-transitory computer-readable storage medium of claim 32.

34. The apparatus of claim 33, wherein the apparatus comprises a mobile phone, an audio player, a video player, a mobile computing platform, a gaming device, a remote controller device, a toy, a machine, or a home automation controller or a home appliance.

Technical Field

Embodiments described herein relate to methods and apparatus for detecting replay attacks on voice biometric systems.

Background

Voice biometric systems are becoming more and more widely used. In such systems, users train the system by providing samples of their speech during an enrollment phase. In subsequent use, the system is able to distinguish between registered users and unregistered speakers. Voice biometric systems can in principle be used to control access to various services and systems.

One way in which a malicious party may attempt to defeat a voice biometric system is to obtain a recording of the registered user's voice and then play back the recording in an attempt to impersonate the registered user and gain access to services intended to be restricted to the registered user.

This is known as a replay attack or a spoofing attack.

Disclosure of Invention

According to one aspect of the present invention, a method of detecting a replay attack on a voice biometric system is provided. The method comprises the following steps: receiving an audio signal representing speech; detecting a magnetic field; determining whether a correlation exists between the audio signal and the magnetic field; and determining that the audio signal is likely to be caused by a replay attack if a correlation exists between the audio signal and the magnetic field.

Determining whether a correlation exists between the audio signal and the magnetic field may include: identifying a first time period during which the audio signal contains speech; identifying a second time period during which the magnetic field differs from a baseline; and determining whether the first time period and the second time period are substantially the same.

The method can comprise the following steps: determining that the first time period and the second time period are substantially the same if more than 60% of a first time period during which the audio signal contains speech overlaps a second time period during which the magnetic field is substantially different from the baseline, and/or more than 60% of a second time period during which the magnetic field is substantially different from the baseline overlaps the first time period during which the audio signal contains speech. The method can comprise the following steps: determining that the first time period and the second time period are substantially the same if more than 80% of a first time period during which the audio signal contains speech overlaps a second time period during which the magnetic field is substantially different from the baseline, and/or more than 80% of a second time period during which the magnetic field is substantially different from the baseline overlaps the first time period during which the audio signal contains speech.
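The overlap test described above can be sketched in Python as follows. This is only an illustrative sketch, not the implementation of the invention: the function names, the representation of each time period as a sorted list of non-overlapping `(start, end)` intervals in seconds, and the reading of "and/or" as a logical OR are all assumptions made for the example.

```python
def total_length(intervals):
    """Total duration of a list of non-overlapping (start, end) intervals."""
    return sum(end - start for start, end in intervals)


def overlap_length(a, b):
    """Total time covered by both interval lists (each sorted, non-overlapping)."""
    i = j = 0
    total = 0.0
    while i < len(a) and j < len(b):
        start = max(a[i][0], b[j][0])
        end = min(a[i][1], b[j][1])
        if end > start:
            total += end - start
        # Advance whichever interval finishes first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return total


def periods_substantially_same(speech, magnetic, threshold=0.6):
    """True if more than `threshold` of either period overlaps the other.

    `speech`: intervals in which the audio signal contains speech.
    `magnetic`: intervals in which the field differs from the baseline.
    The default 0.6 corresponds to the 60% figure; pass 0.8 for 80%.
    """
    speech_len = total_length(speech)
    magnetic_len = total_length(magnetic)
    if speech_len == 0 or magnetic_len == 0:
        return False
    common = overlap_length(speech, magnetic)
    return (common / speech_len > threshold
            or common / magnetic_len > threshold)
```

For example, speech detected from 1.0 s to 3.0 s and a magnetic disturbance from 1.2 s to 3.1 s share 1.8 s, which is 90% of the speech period, so the periods would be treated as substantially the same under either threshold.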

Determining whether a correlation exists between the audio signal and the magnetic field may include: sampling the detected magnetic field at a first sampling rate; sampling the audio signal at a second sampling rate; and determining whether a correlation exists between the sampled audio signal and the sampled detected magnetic field.

The method can comprise the following steps: receiving a series of signal values representative of magnetic field strength; forming an average of the magnetic field strength over a period of time; and subtracting the average value of the magnetic field strength from the series of signal values representing the magnetic field strength to form the detected magnetic field.

The method can comprise the following steps: obtaining a digital audio signal at a third sampling rate and undersampling the digital audio signal to form the audio signal at the second sampling rate.

The second sampling rate may be approximately equal to the first sampling rate.

The step of determining whether there is a correlation between the sampled audio signal and the sampled detected magnetic field may comprise: performing a mathematical correlation operation on the sampled audio signal and the sampled detected magnetic field to obtain an output correlation function, and determining whether a peak value of the output correlation function exceeds a predetermined threshold.
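As a rough illustration of the undersampling and correlation steps, the following Python sketch decimates an audio signal, computes a normalized cross-correlation against the magnetic field samples, and compares the peak with a threshold. The function names, the normalization, the use of the absolute value of the peak, the lag range and the threshold of 0.5 are assumptions chosen for the example; the text only requires some mathematical correlation operation and some predetermined threshold. The decimation here simply keeps every N-th sample, without the anti-alias filtering a practical system would probably apply.

```python
import math


def undersample(samples, factor):
    """Keep every `factor`-th sample (no anti-alias filter, for brevity)."""
    return samples[::factor]


def remove_mean(x):
    """Subtract the mean so that a constant offset does not correlate."""
    mean = sum(x) / len(x)
    return [v - mean for v in x]


def normalized_cross_correlation(x, y, max_lag):
    """Return {lag: coefficient} for lags in [-max_lag, max_lag]."""
    x = remove_mean(x)
    y = remove_mean(y)
    norm = math.sqrt(sum(v * v for v in x) * sum(v * v for v in y))
    result = {}
    for lag in range(-max_lag, max_lag + 1):
        if norm == 0:
            result[lag] = 0.0  # one signal is constant: no correlation
            continue
        acc = 0.0
        for n in range(len(x)):
            m = n + lag
            if 0 <= m < len(y):
                acc += x[n] * y[m]
        result[lag] = acc / norm
    return result


def correlation_detected(audio, magnetic, threshold=0.5, max_lag=10):
    """True if the peak of the correlation function exceeds `threshold`.

    Both inputs are assumed to be sampled at (approximately) the same rate.
    """
    n = min(len(audio), len(magnetic))
    corr = normalized_cross_correlation(audio[:n], magnetic[:n], max_lag)
    return max(abs(v) for v in corr.values()) > threshold
```

With an audio signal captured at, say, 1 kHz and a magnetometer running at 100 Hz, `undersample(audio, 10)` would bring the two signals to the same rate before `correlation_detected` is called.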

The method may further comprise: determining a direction of a source of the audio signal representing speech; determining a direction of a source of the magnetic field; and determining that the audio signal is likely to be caused by a replay attack if the direction of the source of the audio signal representing speech corresponds to the direction of the source of the magnetic field.
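One simple way to decide whether two directions "correspond" is to compare the angle between them with a tolerance, as in the Python sketch below. How the two directions are estimated (for example from several microphones and from a three-axis magnetometer) is outside the scope of this sketch, and the vector representation, the function names and the 20 degree tolerance are assumptions introduced purely for illustration.

```python
import math


def angle_between(u, v):
    """Angle in degrees between two 3-D direction vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("direction vectors must be non-zero")
    # Clamp to guard against rounding slightly outside [-1, 1].
    cos_angle = max(-1.0, min(1.0, dot / (norm_u * norm_v)))
    return math.degrees(math.acos(cos_angle))


def directions_correspond(audio_dir, magnetic_dir, tolerance_deg=20.0):
    """True if the two estimated source directions agree within tolerance."""
    return angle_between(audio_dir, magnetic_dir) <= tolerance_deg
```

A result of `True` would then be treated as an indication that the sound and the magnetic field come from the same object, such as a loudspeaker.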

According to one aspect of the present invention, there is provided a system for detecting a replay attack on a voice biometric system, the system being configured to: receive an audio signal representing speech; detect a magnetic field; determine whether a correlation exists between the audio signal and the magnetic field; and determine that the audio signal is likely to be caused by a replay attack if a correlation exists between the audio signal and the magnetic field.

The system may be configured to determine whether a correlation exists between the audio signal and the magnetic field by: identifying a first time period during which the audio signal contains speech; identifying a second time period during which the magnetic field differs from a baseline; and determining whether the first time period and the second time period are substantially the same.

The system may be configured to: determine that the first time period and the second time period are substantially the same if more than 60% of a first time period during which the audio signal contains speech overlaps a second time period during which the magnetic field is substantially different from the baseline, and/or more than 60% of a second time period during which the magnetic field is substantially different from the baseline overlaps the first time period during which the audio signal contains speech.

The system may be configured to: determine that the first time period and the second time period are substantially the same if more than 80% of a first time period during which the audio signal contains speech overlaps a second time period during which the magnetic field is substantially different from the baseline, and/or more than 80% of a second time period during which the magnetic field is substantially different from the baseline overlaps the first time period during which the audio signal contains speech.

The system may be configured to determine whether a correlation exists between the audio signal and the magnetic field by: sampling the detected magnetic field at a first sampling rate; sampling the audio signal at a second sampling rate; and determining whether a correlation exists between the sampled audio signal and the sampled detected magnetic field.

The system may be configured to: receive a series of signal values representative of magnetic field strength; form an average of the magnetic field strength over a period of time; and subtract the average value of the magnetic field strength from the series of signal values representing the magnetic field strength to form the detected magnetic field.

The system may be configured to: obtain a digital audio signal at a third sampling rate and undersample the digital audio signal to form the audio signal at the second sampling rate.

The second sampling rate may be approximately equal to the first sampling rate.

The system may be configured to determine whether a correlation exists between the sampled audio signal and the sampled detected magnetic field by: performing a mathematical correlation operation on the sampled audio signal and the sampled detected magnetic field to obtain an output correlation function, and determining whether a peak value of the output correlation function exceeds a predetermined threshold.

The system may be further configured to: determine a direction of a source of the audio signal representing speech; determine a direction of a source of the magnetic field; and determine that the audio signal is likely to be caused by a replay attack if the direction of the source of the audio signal representing speech corresponds to the direction of the source of the magnetic field.

According to one aspect of the present invention, a method of detecting a replay attack on a voice biometric system is provided. The method comprises the following steps: receiving an audio signal representing speech; detecting a magnetic field; and determining that the audio signal is likely to be caused by a replay attack if the strength of the magnetic field exceeds a threshold.

According to one aspect of the present invention, there is provided a system for detecting a replay attack on a voice biometric system, the system being configured to: receive an audio signal representing speech; detect a magnetic field; and determine that the audio signal is likely to be caused by a replay attack if the strength of the magnetic field exceeds a threshold.

According to one aspect of the present invention, a method of detecting a replay attack on a voice biometric system is provided. The method comprises the following steps: receiving an audio signal representing speech; determining a direction of a source of the audio signal representing speech; detecting a magnetic field; determining a direction of a source of the magnetic field; and determining that the audio signal is likely to be caused by a replay attack if the direction of the source of the audio signal representing speech corresponds to the direction of the source of the magnetic field.

The method may include receiving audio signals representing speech from a plurality of microphones.

The method may comprise detecting components of the magnetic field in three orthogonal directions.

According to one aspect of the present invention, there is provided a system for detecting a replay attack on a voice biometric system, the system being configured to: receive an audio signal representing speech; determine a direction of a source of the audio signal representing speech; detect a magnetic field; determine a direction of a source of the magnetic field; and determine that the audio signal is likely to be caused by a replay attack if the direction of the source of the audio signal representing speech corresponds to the direction of the source of the magnetic field.

The system may be configured to receive audio signals representing speech from a plurality of microphones.

The system may be configured to detect components of the magnetic field in three orthogonal directions.

According to an aspect of the invention, there is provided an apparatus comprising a system according to any of the above aspects. The device may comprise a mobile phone, an audio player, a video player, a mobile computing platform, a gaming device, a remote controller device, a toy, a machine or a home automation controller or a household appliance.

According to an aspect of the invention, there is provided a computer program product comprising a tangible medium readable by a computer and instructions for performing a method according to any of the preceding aspects.

According to an aspect of the invention, there is provided a non-transitory computer-readable storage medium having stored thereon computer-executable instructions that, when executed by processor circuitry, cause the processor circuitry to perform a method according to any one of the preceding aspects.

According to an aspect of the invention, there is provided an apparatus comprising the non-transitory computer-readable storage medium. The device comprises a mobile phone, an audio player, a video player, a mobile computing platform, a gaming device, a remote controller device, a toy, a machine or home automation controller or a household appliance.

Drawings

For a better understanding of the present invention, and to show how the same may be carried into effect, reference will now be made to the accompanying drawings, in which:

FIG. 1 illustrates a smart phone;

FIG. 2 is a schematic diagram illustrating the form of a smartphone;

FIG. 3 illustrates a first scenario in which a replay attack is being performed;

FIG. 4 illustrates a second scenario in which a replay attack is being performed;

FIG. 5 is a flow chart illustrating a method according to the present invention;

FIG. 6 is a block diagram of a system for implementing a method;

FIG. 7 illustrates the results of a first method;

FIG. 8 illustrates the results of a second method;

FIG. 9 illustrates other scenarios in which a replay attack is being performed; and

FIG. 10 illustrates another system for implementing a method.

Detailed Description

The following description sets forth example embodiments according to the present disclosure. Other example embodiments and implementations will be apparent to those of ordinary skill in the art. In addition, those of ordinary skill in the art will recognize that various equivalent techniques may be applied in place of or in combination with the embodiments discussed below, and all such equivalents should be considered encompassed by the present disclosure.

Fig. 1 illustrates a smartphone 10 having a microphone 12 for detecting ambient sounds. In normal use, the microphone is of course used to detect the voice of the user holding the smartphone 10.

Fig. 2 is a schematic diagram illustrating the form of the smartphone 10.

In particular, fig. 2 shows a number of interconnected components of the smartphone 10. It should be understood that the smartphone 10 will in fact contain many other components, but the following description is sufficient for understanding the present invention.

Thus, fig. 2 shows the microphone 12 described above. In some embodiments, the smartphone 10 is provided with a plurality of microphones 12, 12a, 12b, etc.

Fig. 2 also shows a memory 14, which memory 14 may actually be provided as a single component or as multiple components. The memory 14 is arranged to store data and program instructions.

Fig. 2 also shows a processor 16, which processor 16 may likewise be provided as a single component or as multiple components. For example, one component of the processor 16 may be an application processor of the smartphone 10.

Fig. 2 also shows a transceiver 18, which transceiver 18 is arranged to allow the smartphone 10 to communicate with an external network. For example, the transceiver 18 may include circuitry for establishing an internet connection over a WiFi local area network or over a cellular network.

Fig. 2 also shows audio processing circuitry 20 for performing operations on the audio signal detected by the microphone 12 as needed. For example, audio processing circuitry 20 may filter the audio signal or perform other signal processing operations.

Fig. 2 also shows at least one sensor 22. In an embodiment of the invention, the sensor is a magnetic field sensor for detecting a magnetic field. For example, the sensor 22 may be a Hall effect sensor capable of providing separate measurements of magnetic field strength in three orthogonal directions.

In this embodiment, the smartphone 10 is provided with a voice biometric function and a control function. Thus, the smartphone 10 is capable of performing a variety of functions in response to voice commands (spoken commands) from a registered user. The biometric function is able to distinguish between voice commands from a registered user and the same commands spoken by different people. Accordingly, certain embodiments of the present invention relate to the operation of a smartphone or another portable electronic device with some voice operability (e.g., a tablet or laptop computer, a gaming console, a home control system, a home entertainment system, an in-vehicle entertainment system, a home appliance, etc.), where the voice biometric function is performed in the device that is intended to carry out the voice command. Certain other embodiments relate to a system that performs a voice biometric function on a smartphone or other device, which sends a command to a separate device if the voice biometric function is able to confirm that the speaker is a registered user.

In some embodiments, while the voice biometric function is performed on the smartphone 10 or other device located near the user, the transceiver 18 is used to transmit the voice command to a remote voice recognition system that determines the meaning of the voice command. For example, the voice recognition system may be located on one or more remote servers in a cloud computing environment. A signal based on the meaning of the voice command is then returned to the smartphone 10 or other local device.

One way of attempting to spoof a voice biometric system is to play a recording of the registered user's voice, in a so-called replay attack or spoofing attack.

Fig. 3 illustrates one example of a scenario in which a replay attack is being performed. Thus, in fig. 3, the smartphone 10 is provided with a voice biometric function. In this example, the smartphone 10 is, at least temporarily, in the possession of an attacker who has another smartphone 30. The smartphone 30 has been used to record the voice of a registered user of the smartphone 10. The smartphone 30 is brought close to the microphone inlet 12 of the smartphone 10 and the recording of the registered user's voice is played back. If the voice biometric system is not able to detect that the registered user's voice it detects is a recording, the attacker will gain access to one or more services that are intended to be accessible only by the registered user.

It is known that, due to size limitations, smartphones (such as smartphone 30) are typically provided with speakers of relatively low quality. Thus, a recording of the registered user's voice played back through such a speaker will not be a perfect match to the user's voice, and this fact can be used to identify a replay attack. For example, a speaker may have certain frequency characteristics that, if detected in a voice signal received by a voice biometric system, indicate that the signal may be the result of a replay attack.

Fig. 4 shows a second example of a scenario in which a replay attack is being performed, in an attempt to defeat the detection method described above. Thus, in fig. 4, the smartphone 10 is provided with a voice biometric function. Again in this example, the smartphone 10 is, at least temporarily, in the possession of an attacker who has another smartphone 40. The smartphone 40 has been used to record the voice of a registered user of the smartphone 10.

In this embodiment, the smartphone 40 is connected to a high quality speaker 50. The microphone inlet 12 of the smartphone 10 is then positioned close to the speaker 50 and a recording of the registered user's voice is played back through the speaker 50. As before, if the voice biometric system is not able to detect that the registered user's voice it detects is a recording, the attacker will gain access to one or more services that are only accessible by the registered user.

In this embodiment, the speaker 50 may be of sufficiently high quality that a recording of the registered user's speech played back through the speaker will not be reliably distinguishable from the user's speech, and thus the audio characteristics of the voice signal cannot be used to identify a replay attack.

However, it will be appreciated that many loudspeakers (in particular, many high quality loudspeakers) are electromagnetic loudspeakers in which an electrical audio signal is applied to a voice coil located between the poles of a permanent magnet, causing the coil to move rapidly backwards and forwards. This motion moves a diaphragm attached to the coil backwards and forwards, generating acoustic waves. It is recognized herein that if a device such as the smartphone 10 is positioned close to the speaker while sound is being played back, there will be a corresponding change in the magnetic field that will be detectable by the magnetic field sensor 22.

Fig. 5 is a flowchart illustrating a method of detecting a replay attack on a voice biometric system, and fig. 6 is a block diagram illustrating functional blocks in the voice biometric system.

Specifically, in step 60 of the method of fig. 5, an audio signal is received at the input 80 of the system shown in fig. 6. For example, in the device shown in fig. 2, the audio signal received at input 80 may be the audio signal detected by microphone 12, or if there is more than one microphone, may be the sum of the audio signals detected by the microphones.

Meanwhile, in step 62 of the method of fig. 5, an input signal is received at input 82 of the system shown in fig. 6. The input signal received on input 82 is received from a magnetometer. For example, when the method is performed in a device such as a smartphone or tablet computer, the device will typically include a three axis magnetometer that generates output signals containing separate measurements of magnetic field strength in three orthogonal directions.

In some embodiments, the input signal received from the magnetometer is passed to a first pre-processing block 84. For example, if the signal received from the magnetometer contains separate measurements of magnetic field strength in three orthogonal directions, these measurements may be combined to provide a single measurement of magnetic field strength. The measurement of the magnetic field strength can be considered as the square root of the sum of the squares of three separate measurements of the magnetic field strength in three orthogonal directions.

Furthermore, the purpose of the system is to detect any magnetic field generated by a nearby object (e.g. a loudspeaker). To obtain the most useful information about this, one possibility is to process the input signals received from the magnetometer so as to eliminate the influence of the earth's magnetic field. This may be achieved, for example, by forming an average of the magnetic field strength over a period of at least a few seconds, and possibly minutes or hours, and subtracting this average from each individual measurement to obtain an instantaneous measurement of the magnetic field generated by the artificial source. When the measurements of magnetic field strength in the three orthogonal directions are considered separately, they will depend to a large extent on the orientation of the device in the earth's magnetic field. The orientation may be determined from signals generated by accelerometers present in the device and may therefore be taken into account when determining the artificial magnetic field generated by a nearby object (e.g. a loudspeaker).
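The two pre-processing steps described above (combining the three axis readings into one field strength, then subtracting a running average as the baseline) can be sketched as follows. This is a minimal illustration, not the patented implementation: the function names, the trailing-window form of the average and the `window` parameter are assumptions made for the example.

```python
import math

def field_magnitude(bx, by, bz):
    """Combine three orthogonal readings into a single field strength."""
    return math.sqrt(bx * bx + by * by + bz * bz)

def remove_baseline(magnitudes, window):
    """Subtract a trailing moving average from each measurement.

    The average over the last `window` samples stands in for the slowly
    varying baseline (assumed here to be dominated by the earth's field).
    """
    out = []
    running = 0.0
    for i, value in enumerate(magnitudes):
        running += value
        if i >= window:
            running -= magnitudes[i - window]  # drop the sample leaving the window
        count = min(i + 1, window)
        out.append(value - running / count)
    return out
```

At a magnetometer rate of about 100Hz, a window of a few seconds would correspond to a few hundred samples; a longer window tracks the baseline more slowly.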

Typically, in a smartphone, a magnetometer generates a digital signal, with a sampling rate in the range of 80-120Hz, which can be applied as an input signal on the input 82 of the system.

In some embodiments, the audio signal received on input 80 is passed to a second pre-processing block 86. For example, if the audio signal is received in analog form, the pre-processing block 86 may include an analog-to-digital converter for converting the signal to digital form.

In some embodiments, the pre-processing block 86 may include a digital or analog filter to correct for expected non-linearities in the frequency response of the speaker whose presence is detected. Thus, the relationship between magnetic field and frequency in a typical loudspeaker will have a notch shape, that is, at a particular frequency near the mechanical resonance of the loudspeaker, the magnetic field will be particularly low. The pre-processing block 86 may then apply a filter having similar characteristics to the received audio signal, thereby improving the degree of correlation between the audio signal and the magnetic field.

In the embodiment illustrated herein, the second pre-processing block 86 comprises a decimation block. If the analog-to-digital converter in the second pre-processing block 86 generates a digital audio signal with a sampling rate that exceeds the sampling rate of the magnetometer signal, samples of the digital audio signal are discarded so that the resulting sampling rate is approximately equal to the sampling rate R of the magnetometer signal; e.g. the resulting audio sampling rate should be in the range of 0.5R-2R, more preferably in the range of 0.8R-1.2R. For example, if the input signal received from the magnetometer has a sampling rate in the range of 80-120Hz, and the analog-to-digital converter in the second pre-processing block 86 has a sampling rate of 40kHz (which would be typical for an analog-to-digital converter that is present in a device such as a smartphone and that can be used for accurate digital representation of an analog audio signal), only one out of every 400 samples from the analog-to-digital converter would be retained, giving a resulting sampling rate of 40kHz/400 = 100Hz.
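As a rough sketch of this decimation step, the snippet below keeps one sample in every N and discards the rest, exactly as the text describes. The function names are hypothetical, and no anti-aliasing filter is applied because the description mentions only discarding samples.

```python
def decimation_factor(audio_rate_hz, magnetometer_rate_hz):
    """Number of audio samples per retained sample (at least 1)."""
    return max(1, round(audio_rate_hz / magnetometer_rate_hz))

def decimate(samples, factor):
    """Keep one sample in every `factor`, discarding the others."""
    return samples[::factor]

# Worked example from the text: 40kHz audio, ~100Hz magnetometer.
factor = decimation_factor(40_000, 100)        # one in every 400 samples
one_second = decimate(list(range(40_000)), factor)
```

With these figures, one second of audio (40,000 samples) is reduced to 100 samples, matching the magnetometer rate.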

Alternatively, the magnetometer signal may be upsampled to the sampling rate of the audio signal by interleaving zero-value samples between samples of the magnetometer signal.
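The alternative of raising the magnetometer rate by inserting zero-valued samples can be illustrated in the same way. The function name and the integer `factor` parameter are assumptions for the example.

```python
def upsample_zero_stuff(samples, factor):
    """Insert (factor - 1) zero-valued samples after each original sample."""
    out = []
    for s in samples:
        out.append(s)
        out.extend([0] * (factor - 1))
    return out
```

For instance, `upsample_zero_stuff([1, 2], 3)` gives `[1, 0, 0, 2, 0, 0]`, so a 100Hz signal stuffed with a factor of 400 would be at the 40kHz audio rate.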

In step 64 of the method of fig. 5, it is determined whether there is a correlation between the audio signal and the magnetic field. Thus, in fig. 6, the outputs of the first and second pre-processing blocks 84 and 86 are passed to a correlation block 88.

The correlation block 88 may operate in different ways.

Fig. 7 illustrates a first method of determining whether there is a correlation between an audio signal and a magnetic field.

Thus, fig. 7(a) illustrates the form of the decimated audio signal generated by the second pre-processing block 86, while fig. 7(b) illustrates the form of the magnetometer output. In fig. 7(a) and 7(b), the horizontal axis represents time. More specifically, the units on the horizontal axis are the sample numbers of the corresponding digital signals. In each case, the sampling rate is ≈ 100Hz, so 1000 samples correspond to ≈ 10 seconds. In fig. 7(a) and 7(b), the vertical axis represents the intensity of the corresponding signal in arbitrary units. In the case of fig. 7(b), the effect of the earth's magnetic field has been removed by forming an average of the magnetic field strength (e.g., over a period of several seconds). This average is then taken as the baseline and subtracted from each individual measurement. Fig. 7(b) then shows these individual measurements as a difference from this baseline, representing an instantaneous measurement of the magnetic field generated by the artificial source.

In this embodiment, the sample rate of the decimated audio signal generated by the second pre-processing block 86 is exactly the same as the sample rate of the magnetometer output. Thus, fig. 7 (showing two signals of the same sample number) covers equal periods of the two inputs.

It can be seen that the audio signal contains a significant input during the time periods of about samples 15-85, sample 115, sample 300, sample 365, sample 395, 545, etc. For example, it may be determined that the audio signal contains a significant input when the magnitude of the sample values exceeds a threshold value, or when the magnitude of the sample values averaged over a relatively small number of samples exceeds a threshold value. Thus, it can be assumed that the user's voice is present during the period in which these samples are taken.

It can also be seen that during the same periods, the magnetometer output also contains a significant input. For example, the magnetometer output may be determined to contain a significant input when the magnitude of the sample values exceeds a threshold, or when the magnitude of the sample values averaged over a relatively small number of samples exceeds a threshold. Thus, it can be assumed that during the period in which these samples are taken, the device is near a loudspeaker that is producing sound.

If the device is near a loudspeaker that is producing sound at the same time as the microphone detects the voice, this may indicate that the voice was played by the loudspeaker and thus that the device is the target of a replay attack.

Thus, the correlation block 88 may identify a first period of time during which the audio signal contains speech and may identify a second period of time during which a significant magnetic field is present.

In step 66 of the method of fig. 5, it is determined whether the audio signal is likely to be caused by a replay attack. Thus, in fig. 6, the result of the determination of the correlation block 88 is passed to a decision block 90 which determines whether the first time period and the second time period are substantially the same. If substantially the same, it is determined that the audio signal is likely to be caused by a replay attack.

For example, the decision block 90 may determine that the first time period and the second time period are substantially the same if more than 60% of the first time period, during which the audio signal contains speech, overlaps the second time period, during which a substantial magnetic field is present, and/or more than 60% of the second time period overlaps the first time period. Alternatively, the criterion may be 80% rather than 60%, that is, more than 80% of the first time period overlaps the second time period, and/or more than 80% of the second time period overlaps the first time period.
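The first method (thresholding each signal to find its active periods, then testing how much the two periods overlap) might be sketched as below. All names and default values are illustrative assumptions. In particular, the text allows "and/or" between the two overlap tests; this sketch uses the stricter "and" variant.

```python
def active_mask(samples, threshold, smooth=1):
    """Mark samples whose mean magnitude over `smooth` trailing samples exceeds the threshold."""
    mask = []
    for i in range(len(samples)):
        window = samples[max(0, i - smooth + 1): i + 1]
        level = sum(abs(v) for v in window) / len(window)
        mask.append(level > threshold)
    return mask

def overlap_fractions(audio_mask, magnetic_mask):
    """Fraction of each active period that coincides with the other one."""
    both = sum(1 for a, m in zip(audio_mask, magnetic_mask) if a and m)
    n_audio = sum(audio_mask)
    n_mag = sum(magnetic_mask)
    frac_audio = both / n_audio if n_audio else 0.0
    frac_mag = both / n_mag if n_mag else 0.0
    return frac_audio, frac_mag

def periods_substantially_same(audio_mask, magnetic_mask, min_fraction=0.6):
    """Assumed 'and' reading: both overlap fractions must exceed min_fraction."""
    frac_audio, frac_mag = overlap_fractions(audio_mask, magnetic_mask)
    return frac_audio > min_fraction and frac_mag > min_fraction
```

Setting `min_fraction=0.8` gives the 80% variant; replacing `and` with `or` in the last function gives the looser reading of the criterion.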

The method illustrated in fig. 7 is particularly effective when the audio signal and magnetometer output are not subject to a significant amount of noise. Fig. 8 illustrates a second method of determining whether there is a correlation between an audio signal and a magnetic field.

This second method forms a mathematical cross-correlation between the decimated audio signal generated by the second pre-processing block 86 and the samples of the magnetometer output. That is, for a range of delay values, a sequence of samples in one of the signals is correlated with a delayed copy of the other signal. The degree of correlation will be a function of the delay, which can be conveniently measured by the number of sampling periods by which the delayed version has been delayed. Conventionally, autocorrelation is performed on both signals, and the magnitude of the correlation for any delay value is normalized with respect to the magnitude of the autocorrelation at zero delay for both signals.

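Thus, the correlation Rxy[n] between two signals x[m] and y[m] is given by:

$$R_{xy}[n] = \sum_{m} x[m]\, y[m+n]$$

and, after normalization:

$$\hat{R}_{xy}[n] = \frac{R_{xy}[n]}{\sqrt{R_{xx}[0]\, R_{yy}[0]}}$$

where Rxx[0] and Ryy[0] are the autocorrelations of x and y at zero delay.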

FIG. 8 illustrates the results of obtaining cross-correlation in one exemplary embodiment. In particular, trace 100 shows the results obtained when audio input is obtained from a live user's voice, while trace 102 shows the results obtained when audio input is obtained by playing back the user's voice through a speaker.

It can be seen that trace 100 fluctuates, but shows no clear pattern. Trace 102 fluctuates in a similar manner, but it is noted that at a particular delay value 104 there is a very high degree of correlation. This delay value corresponds to zero delay, i.e. the two signals are correlated when they are aligned in time.

This can be assumed to be the result of the fact that: the audio input is obtained by playing back the user's voice through the speaker, so the speaker is generating a magnetic field in synchronization with the sound it produces.

This approach picks out the correlation even when either or both of the audio signal and magnetometer output contain a significant amount of noise.
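A minimal sketch of the normalized cross-correlation described for this second method is given below. It assumes both inputs have the same sampling rate and are roughly zero-mean (for example, a baseline-subtracted magnetometer signal), and it assumes the summation convention x[m]·y[m+lag]; the function name and `max_lag` parameter are hypothetical.

```python
import math

def normalized_cross_correlation(x, y, max_lag):
    """Return {lag: R_xy[lag] / sqrt(R_xx[0] * R_yy[0])} for lags in [-max_lag, max_lag]."""
    # Zero-delay autocorrelations are simply the signal energies.
    energy = math.sqrt(sum(v * v for v in x) * sum(v * v for v in y))
    result = {}
    for lag in range(-max_lag, max_lag + 1):
        total = 0.0
        for m in range(len(x)):
            k = m + lag
            if 0 <= k < len(y):  # ignore terms that fall outside the frame
                total += x[m] * y[k]
        result[lag] = total / energy if energy else 0.0
    return result

# Identical signals correlate perfectly at zero delay.
x = [0, 1, 0, -1, 0, 1, 0, -1]
r_same = normalized_cross_correlation(x, x, 3)

# A copy delayed by one sample peaks at lag 1 under this convention.
y = [0, 0, 1, 0, -1, 0, 1, 0]
r_shift = normalized_cross_correlation(x, y, 3)
peak_lag = max(r_shift, key=r_shift.get)
```

In a replay attack the text expects the peak near zero delay, since the loudspeaker's magnetic field varies in synchronization with the sound it produces.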

In step 66 of the method of fig. 5, it is determined, based on the correlation, whether the audio signal is likely to be caused by a replay attack. Thus, in fig. 6, the result of the determination of the correlation block 88 is passed to a decision block 90, which determines whether the correlation is such that the audio signal should be judged likely to be caused by a replay attack. For example, in some embodiments, the decision block 90 determines that the audio signal is likely to be caused by a replay attack if the peak of the cross-correlation (or specifically, the peak of the cross-correlation occurring at a delay value corresponding to synchronization of the audio input and the measured magnetic field) exceeds a threshold. Fig. 8 shows the result of performing cross-correlation on one frame of data (e.g., comprising 1000 samples, or about 10 seconds). Where the input signal lasts longer than this, the decision block 90 may consider multiple frames of data, for example determining that the audio signal is likely to be caused by a replay attack if the peak of the cross-correlation exceeds a threshold in each frame, or if the peak of the cross-correlation averaged over several frames exceeds a threshold.
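The multi-frame decision rules just described can be sketched as follows. The function name, the `mode` switch and any threshold value passed in are assumptions for illustration; the text does not specify a threshold.

```python
def replay_likely(frame_peaks, threshold, mode="average"):
    """Decide from per-frame cross-correlation peaks whether a replay attack is likely.

    mode="every":   the peak must exceed the threshold in each frame.
    mode="average": the peak averaged over the frames must exceed the threshold.
    """
    if not frame_peaks:
        return False
    if mode == "every":
        return all(p > threshold for p in frame_peaks)
    return sum(frame_peaks) / len(frame_peaks) > threshold
```

The "every" rule is stricter and less prone to false alarms from a single noisy frame, while the "average" rule tolerates one weak frame among several strong ones.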

In addition to determining whether the change in the magnetic field is temporally correlated with the change in the audio signal, it may also be determined whether the direction of the source of the magnetic field corresponds to the direction of the source of the audio signal.

Fig. 9 illustrates how this is done. Specifically, fig. 9 illustrates two other cases in which a replay attack is being performed.

Fig. 9(a) shows the situation where the speaker 120 is placed on a surface and the smartphone 122 is held vertically in front of the speaker and facing it. The smartphone 122 is shown in section so that the internal components of the smartphone may be shown.

In particular, fig. 9(a) shows three microphones 124, 126 and 128, located near the center of the top edge of the smartphone, near the bottom left corner of the smartphone (when the user looks at the front of the smartphone), and near the bottom right corner of the smartphone, respectively.

In addition, FIG. 9(a) shows a three axis magnetometer 130, which three axis magnetometer 130 produces separate measurements of magnetic field strength in the x, y and z directions as shown in FIG. 9 (a).

Loudspeaker 120 is an electromagnetic loudspeaker in which sound is produced by the motion of a coil that is moved by a magnetic field. As shown in fig. 9(a), the magnetic field M is oriented out of the front face 132 of the loudspeaker 120. Assuming that the smartphone 122 is positioned sufficiently close to the front of the speaker 120, it may be assumed that the direction of the magnetic field sensed by the magnetometer 130 will generally be in the z-direction. Thus, if measurements of the magnetic field strength in the x, y and z directions indicate that the magnetic field in the z direction is dominant, it may be assumed that smartphone 122 is positioned near the front of speaker 120 in the orientation illustrated in fig. 9 (a).

The signals received from the three microphones 124, 126, 128 may also be used to determine the direction of the source of the audio signal using known techniques. For example, in the case shown in fig. 9(a), the audio signal generated by the speaker 120 will be received by the three microphones 124, 126, 128 substantially simultaneously. This can be used to determine the general direction in which the source of the audio signal is located.

Thus, in this embodiment, it may be determined that the direction of the source of the magnetic field generally corresponds to the direction of the source of the audio signal. This can be used to further confirm that the audio signal is the result of a replay attack.

Fig. 9(b) shows an alternative situation in which the speaker 120 is placed on a surface and the smartphone 122 is placed face up on the same surface.

In this case, assuming that the smartphone 122 is positioned sufficiently close to the front of the speaker 120, it may be assumed that the direction of the magnetic field sensed by the magnetometer 130 is generally in the y-direction. Thus, if measurements of magnetic field strength in the x, y, and z directions indicate that the magnetic field in the y direction is dominant, it may be assumed that smartphone 122 is positioned near the front of speaker 120 in the orientation illustrated in fig. 9 (b).
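The test for a dominant axis in figs. 9(a) and 9(b) could be sketched as below, applied to baseline-subtracted readings. The `ratio` parameter, which sets how much larger the leading component must be than the next largest, is an assumption; the text says only that one direction is "dominant".

```python
def dominant_axis(bx, by, bz, ratio=2.0):
    """Return 'x', 'y' or 'z' if one component clearly dominates, else None."""
    comps = {"x": abs(bx), "y": abs(by), "z": abs(bz)}
    ranked = sorted(comps, key=comps.get, reverse=True)
    first, second = ranked[0], ranked[1]
    if comps[first] > 0 and comps[first] >= ratio * comps[second]:
        return first
    return None
```

Under the geometry described, a result of `'z'` would be consistent with the orientation of fig. 9(a) and `'y'` with that of fig. 9(b).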

Also, the signals received from the three microphones 124, 126, 128 may be used to determine the location of the source of the audio signal using known techniques. For example, in the scenario shown in fig. 9(b), the audio signal generated by the speaker 120 will be received by both microphones 126, 128 at substantially the same time and shortly thereafter the audio signal will be received by the microphone 124. This can be used to determine the general direction in which the source of the audio signal is located.

Thus, it can again be determined that the source of the magnetic field generally corresponds to the source of the audio signal. This can be used to further confirm that the audio signal is the result of a replay attack.

In other embodiments, the method includes receiving an audio signal representing speech and detecting a magnetic field. In these embodiments, it is determined that the audio signal may be caused by a replay attack if the magnetic field strength exceeds a threshold value. These embodiments are particularly suitable when a replay attack is generated using a loudspeaker containing a large magnet, so that the presence of a large magnetic field indicates a replay attack. In this case, the magnetic field strength may be several times, and possibly even orders of magnitude, greater than the baseline magnetic field strength induced by the earth's magnetic field. Thus, in these cases, there is no need to determine the baseline magnetic field strength and subtract it from the individual measurements.
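For this simpler threshold-only variant, a sketch might compare the raw field strength with a multiple of a nominal earth-field value. The figure of 50 microtesla is an assumed typical value (the earth's field is roughly 25-65 microtesla at the surface), and the multiple of 3 is likewise illustrative.

```python
import math

def large_magnet_nearby(bx, by, bz, earth_field_ut=50.0, multiple=3.0):
    """True if the raw field strength (microtesla) is several times the assumed earth field."""
    strength = math.sqrt(bx * bx + by * by + bz * bz)
    return strength > multiple * earth_field_ut
```

No baseline subtraction is performed here, in line with the observation that it is unnecessary when the field is far stronger than the earth's.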

In other embodiments, the method includes receiving an audio signal representing speech and detecting a magnetic field. It is possible that if multiple microphones are used to detect the signal, the direction of the source of said audio signal representing speech may be determined, for example, using beamforming techniques. The direction of the source of the magnetic field may also be determined. It may be determined that the audio signal is likely to be caused by a replay attack if the direction of the source of said audio signal representing speech corresponds to the direction of the source of said magnetic field.

Fig. 10 illustrates a system for determining whether the direction of the source of the magnetic field generally corresponds to the direction of the source of an audio signal representing speech. The result may be used in conjunction with other methods to further confirm that the audio signal is the result of a replay attack, or it may be used as the sole indication that the audio signal is the result of a replay attack.

Fig. 10 shows a processor 150. The processor 150 receives input signals from the plurality of microphones 12a, …, 12n after appropriate adjustment in the pre-processing blocks 152a, …, 152n. Similarly, the processor 150 receives separate input signals from three magnetometers 154, 156, 158, which measure magnetic field strength in three orthogonal directions, again after appropriate adjustment in pre-processing blocks 160, 162, 164.

The processor 150 may separately calculate the direction of the source of the audio signal (e.g., using standard beamforming techniques) and the direction of the source of the magnetic field, and may then check for a correlation between them.
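One way the final comparison could be sketched, once a direction vector has been estimated for each source, is to test the angle between the two vectors. The direction estimates themselves (e.g. from beamforming) are outside this sketch. The 30 degree tolerance is an assumption, as is the choice to treat opposite vectors as aligned on the basis that the sensed field may point towards or away from the loudspeaker depending on magnet polarity.

```python
import math

def angle_between_deg(u, v):
    """Angle in degrees between two non-zero 3-vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0 or nv == 0:
        raise ValueError("direction vectors must be non-zero")
    cosine = max(-1.0, min(1.0, dot / (nu * nv)))  # guard against rounding error
    return math.degrees(math.acos(cosine))

def directions_correspond(audio_dir, field_dir, max_angle_deg=30.0):
    """True if the two directions lie along roughly the same axis."""
    angle = angle_between_deg(audio_dir, field_dir)
    # Assumption: the sign of the field vector is ignored (magnet polarity unknown).
    return min(angle, 180.0 - angle) <= max_angle_deg
```

For example, an audio direction of `(0, 0, 1)` and a field direction of `(0, 0, -1)` are treated as corresponding, whereas `(0, 0, 1)` and `(1, 0, 0)` are not.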

Alternatively, the processor 150 may be a neural network that is pre-trained using samples of representative speakers having various orientations with respect to the target device.

Accordingly, the disclosed methods and systems can be used to detect situations that may indicate that the received audio signal is the result of a replay attack.

The skilled person will recognise that some aspects of the apparatus and methods described above may be embodied as processor control code, for example on a non-volatile carrier medium such as a magnetic disk, CD-ROM or DVD-ROM, programmed memory such as read only memory (firmware), or on a data carrier such as an optical or electrical signal carrier. For many applications, embodiments of the invention will be implemented on a DSP (digital signal processor), an ASIC (application specific integrated circuit), or an FPGA (field programmable gate array). Thus, the code may comprise conventional program code or microcode or, for example, code for setting up or controlling an ASIC or FPGA. The code may also include code for dynamically configuring a reconfigurable device, such as a reprogrammable array of logic gates. Similarly, the code may include code for a hardware description language, such as Verilog (TM) or VHDL (very high speed integrated circuit hardware description language). As the skilled person will appreciate, the code may be distributed between a plurality of coupled components in communication with each other. Where appropriate, the embodiments may also be implemented using code running on a field-(re)programmable analog array or similar device in order to configure analog hardware.

Note that as used herein, the term module shall be used to refer to a functional unit or block that may be implemented at least in part by dedicated hardware components (such as custom circuitry), and/or by one or more software processors or appropriate code running on a suitable general purpose processor or the like. A module may itself comprise other modules or functional units. A module may be provided by a number of components or sub-modules that need not be co-located and may be provided on different integrated circuits and/or run on different processors.
