Bioassay process

Document No.: 1026959 · Publication date: 2020-10-27

Note: This technology, "Bioassay process" (生物测定过程), was designed and created by J·P·莱索 (J. P. Lesso) on 2019-03-20. Its main content is as follows: The present disclosure provides methods, systems, devices, and computer program products for authenticating a user based on a comparison of an audio signal to a stored speech model for an authorized user. In one aspect, a method comprises: obtaining a first audio signal comprising a representation of a bone-conducted signal, wherein the bone-conducted signal is conducted via at least a portion of a bone of the user; obtaining a second audio signal comprising a representation of an air-conducted signal; and, in response to determining that the first audio signal comprises a speech signal, enabling updating of the stored speech model for the authorized user based on the second audio signal.

1. A method in a biometric authentication system for authenticating a user based on a comparison of an audio signal to a stored speech model for an authorized user, the method comprising:

obtaining a first audio signal comprising a representation of a bone conducted signal, wherein the bone conducted signal is conducted via at least a portion of a bone of the user;

obtaining a second audio signal comprising a representation of an air-conducted signal; and

in response to determining that the first audio signal comprises a speech signal, enabling updating of a stored speech model for the authorized user based on the second audio signal.

2. The method of claim 1, further comprising:

in response to authenticating the user as the authorized user, updating the stored speech model for the authorized user using the second audio signal.

3. The method of claim 2, wherein the user is authenticated as the authorized user based on a biometric process.

4. The method of claim 3, wherein the biometric process comprises a voice biometric process based on the second audio signal.

5. The method of claim 2, wherein the user is authenticated as the authorized user based on a non-biometric process.

6. The method of claim 5, wherein the non-biometric process comprises entering a password for the authorized user.

7. The method of any of the preceding claims, wherein the step of enabling the updating of the stored speech model for the authorized user is further responsive to determining that the second audio signal comprises a speech signal.

8. The method of any of the preceding claims, wherein the step of enabling the updating of the stored speech model for the authorized user based on the second audio signal is further based on a comparison between the first audio signal and the second audio signal.

9. The method of claim 8, wherein enabling updating of the stored speech model for the authorized user based on the second audio signal is in response to detecting a correlation between the first audio signal and the second audio signal.

10. The method of claim 9, wherein enabling updating of the stored speech model for the authorized user based on the second audio signal is in response to detecting a correlation between a portion of the first audio signal identified as comprising the speech signal and a corresponding portion of the second audio signal.

11. The method of any preceding claim, wherein the first audio signal is generated by an in-ear transducer.

12. The method of any preceding claim, wherein the second audio signal is generated by a microphone external to the user's ear.

13. A biometric authentication system for authenticating a user based on a comparison of an audio signal to a stored speech model for an authorized user, the biometric authentication system comprising:

a first input for obtaining a first audio signal comprising a representation of a bone conducted signal, wherein the bone conducted signal is conducted via at least a portion of a bone of the user;

a second input for obtaining a second audio signal comprising a representation of an air-conducted signal; and

an enabling module operable to determine whether the first audio signal comprises a speech signal and, in response to determining that the first audio signal comprises a speech signal, enable updating of a stored speech model for an authorized user based on the second audio signal.

14. The biometric authentication system of claim 13, further comprising a biometric module operable to update the stored speech model for the authorized user using the second audio signal in response to authenticating the user as the authorized user.

15. The biometric authentication system of claim 14, further comprising an authentication module operable to authenticate the user as the authorized user based on a biometric process.

16. The biometric authentication system of claim 15, wherein the biometric process comprises a voice biometric process based on the second audio signal.

17. The biometric authentication system of claim 14, further comprising an authentication module operable to authenticate the user as the authorized user based on a non-biometric process.

18. The biometric authentication system of claim 17, wherein the non-biometric process includes entering a password for the authorized user.

19. The biometric authentication system of any one of claims 13 to 18, wherein the enabling module is further operable to enable updating of the stored speech model for the authorized user based on the second audio signal in response to determining that the second audio signal comprises a speech signal.

20. The biometric authentication system of any one of claims 13 to 19, wherein the enabling module is further operable to enable updating of the stored speech model for the authorized user based on the second audio signal based on a comparison of the first audio signal and the second audio signal.

21. The biometric authentication system of claim 20, wherein the enabling module is further operable to enable an update of the stored speech model for the authorized user based on the second audio signal in response to detecting a correlation between the first audio signal and the second audio signal.

22. The biometric authentication system of any one of claims 13 to 21, wherein the first input is connectable to a transducer adapted for insertion into an ear of a user.

23. The biometric authentication system of any one of claims 13 to 22, wherein the second input is connectable to a voice microphone.

24. The biometric authentication system according to any one of claims 13 to 23, wherein the biometric authentication system is provided on a single integrated circuit.

25. An electronic device for authenticating a user based on a comparison of an audio signal to a stored speech model for an authorized user, the electronic device comprising processing circuitry and a non-transitory machine-readable medium storing instructions that, when executed by the processing circuitry, cause the electronic device to:

obtain a first audio signal comprising a representation of a bone conducted signal, wherein the bone conducted signal is conducted via at least a portion of a bone of the user;

obtain a second audio signal comprising a representation of an air-conducted signal; and

in response to determining that the first audio signal comprises a speech signal, enable updating of a stored speech model for an authorized user based on the second audio signal.

26. The electronic device of claim 25, wherein the electronic device comprises a personal audio device or a host electronic device.

27. A non-transitory machine-readable medium for authenticating a user based on a comparison of an audio signal to a stored speech model for an authorized user, the medium storing instructions that, when executed by processing circuitry, cause an electronic device to:

obtain a first audio signal comprising a representation of a bone conducted signal, wherein the bone conducted signal is conducted via at least a portion of a bone of the user;

obtain a second audio signal comprising a representation of an air-conducted signal; and

in response to determining that the first audio signal comprises a speech signal, enable updating of a stored speech model for an authorized user based on the second audio signal.

Technical Field

Embodiments of the present disclosure relate to methods, devices, and systems for performing biometric processes, and more particularly, to methods, devices, and systems for performing biometric processes that include authenticating a user based on the user's voice.

Background

Biometric technology is becoming increasingly popular as a method for authenticating those users who are attempting to access restricted areas or restricted devices or who are attempting to perform restricted actions. A number of different biometric identifiers are known, including fingerprint recognition, iris recognition and facial recognition.

A voice biometric system authenticates the user based on the user's voice. Prior to authentication using a voice biometric system, the user first enrolls with the system. During enrollment, the voice biometric system acquires biometric data that is characteristic of the user's voice and stores the data as a voice model or voiceprint. Authentication may be based on the user speaking a particular word or phrase used during enrollment (text-dependent), or on speech different from that spoken during enrollment (text-independent). Authentication involves extracting one or more biometric features from an input audio signal and comparing these features to the stored voiceprint. A determination that the acquired data matches or is sufficiently close to the stored voiceprint results in successful authentication of the user. Successful authentication may, for example, allow the user to perform restricted actions, or grant the user access to restricted areas or restricted devices. If the acquired features do not match or are not sufficiently close to the stored voiceprint, the user is not authenticated and the authentication attempt fails. An unsuccessful authentication attempt may prevent the user from performing a restricted action, or deny the user access to a restricted area or restricted device.

The performance of a voice biometric system may be limited by changes in the user's voice that occur during the period between enrollment and authentication. For example, the user's voice may vary with age, illness, or the time of day at which the biometric data is acquired. If the user's voice changes sufficiently, the authentication system may reject the user even though they are authorized and should have been authenticated, a problem known as "false rejection". The voice biometric system may account for changes in the user's voice by collecting additional biometric data at multiple intervals and using these data to update the stored voiceprint. This process is called enrichment.

Enrichment may be a supervised or an unsupervised process. Supervised enrichment involves prompting a user to re-register with the system at multiple intervals. For example, the user may be asked to repeat a particular word or phrase, and the resulting data may be used to update the stored voice print. Prior to this process, the identity of the user is established using one or more authentication techniques (e.g., the user may be required to enter a password or personal identification code). While supervised enrichment provides a reliable method for updating stored voiceprints, it requires the user to actively participate in the enrichment process.

In contrast, unsupervised enrichment uses any speech from the user to update the stored voiceprint, without the user's explicit involvement. Biometric data can be collected during routine use without prompting the user to provide additional input. Unsupervised enrichment therefore allows the stored voiceprint to be updated more frequently, improving the performance of the voice biometric system.

To use unsupervised enrichment effectively, it is important that only the user's own voice is used to update the stored voiceprint. If the voiceprint is erroneously updated using, for example, speech from another talker, the effectiveness of the voice biometric system may be compromised and the user may experience more frequent false rejections. Beyond inconveniencing the user, erroneously updating the stored voiceprint can also pose a significant security risk. Thus, to implement unsupervised enrichment successfully, the voice biometric system should be able to distinguish between the user's voice and other audio detected by the system (e.g., voices of other talkers).

Embodiments of the present disclosure seek to address this and other problems.

Disclosure of Invention

One aspect of the present disclosure provides a method in a biometric authentication system for authenticating a user based on a comparison of an audio signal to a stored speech model for an authorized user. The method comprises the following steps: obtaining a first audio signal comprising a representation of a bone-conducted signal, wherein the bone-conducted signal is conducted via at least a portion of a bone of the user; obtaining a second audio signal comprising a representation of an air-conducted signal; and, in response to determining that the first audio signal comprises a speech signal, enabling updating of the stored speech model for the authorized user based on the second audio signal.

Another aspect provides a biometric authentication system for authenticating a user based on a comparison of an audio signal to a stored speech model for an authorized user. The biometric authentication system includes: a first input for obtaining a first audio signal comprising a representation of a bone-conducted signal, wherein the bone-conducted signal is conducted via at least a portion of a bone of the user; a second input for obtaining a second audio signal comprising a representation of an air-conducted signal; and an enabling module operable to determine whether the first audio signal comprises a speech signal and, in response to determining that the first audio signal comprises a speech signal, enable updating of the stored speech model for the authorized user based on the second audio signal.

Another aspect provides an electronic device for authenticating a user based on a comparison of an audio signal to a stored speech model for an authorized user. The electronic device includes processing circuitry and a non-transitory machine-readable medium storing instructions that, when executed by the processing circuitry, cause the electronic device to: obtain a first audio signal comprising a representation of a bone-conducted signal, wherein the bone-conducted signal is conducted via at least a portion of a bone of the user; obtain a second audio signal comprising a representation of an air-conducted signal; and, in response to determining that the first audio signal comprises a speech signal, enable updating of the stored speech model for the authorized user based on the second audio signal.

Another aspect provides a non-transitory machine-readable medium for authenticating a user based on a comparison of an audio signal to a stored speech model for an authorized user. The medium stores instructions that, when executed by processing circuitry, cause an electronic device to: obtain a first audio signal comprising a representation of a bone-conducted signal, wherein the bone-conducted signal is conducted via at least a portion of a bone of the user; obtain a second audio signal comprising a representation of an air-conducted signal; and, in response to determining that the first audio signal comprises a speech signal, enable updating of the stored speech model for the authorized user based on the second audio signal.

Drawings

For a better understanding of embodiments of the present disclosure, and to show more clearly how the same may be carried into effect, reference will now be made, by way of example only, to the following drawings, in which:

figs. 1a-1f illustrate personal audio devices according to embodiments of the present disclosure;

fig. 2 is a schematic diagram illustrating an arrangement according to an embodiment of the present disclosure;

fig. 3 illustrates a system according to an embodiment of the present disclosure; and

fig. 4 is a flow diagram of a method according to an embodiment of the present disclosure.

Detailed Description

Embodiments of the present disclosure provide methods, apparatuses, and computer programs for enriching or updating stored speech models (also referred to as templates or voiceprints) for authorized users of biometric authentication systems. Various embodiments utilize bone-conducted speech signals (i.e., speech signals that have been conducted at least partially via a portion of the user's skeleton, such as the jaw bone) to identify when the user is speaking and to enable updating of the stored speech model. For example, a method may include obtaining a first audio signal and a second audio signal comprising, respectively, a representation of a bone-conducted signal and a representation of an air-conducted signal. In response to determining that the first audio signal comprises a speech signal, updating of the stored speech model based on the second audio signal may be enabled. Other embodiments may enable updating of the stored speech model in response to determining that the second audio signal comprises a speech signal, or in response to determining that the first audio signal and the second audio signal comprise respective speech signals that are correlated with each other.

Embodiments of the present disclosure may be implemented in a variety of different electronic devices and systems. Figs. 1a-1f illustrate embodiments of personal audio devices that may be used to implement aspects of the present disclosure. As used herein, the term "personal audio device" refers to any electronic device that is suitable for, or configurable to, provide audio playback to substantially a single user. Some examples of suitable personal audio devices are shown in figs. 1a-1f.

Fig. 1a shows a schematic view of a user's ear, comprising the (outer) pinna or auricle 12a and the (inner) ear canal 12b. A personal audio device 20 comprising a circum-aural headphone is worn by the user over the ear. The headphone comprises a housing that substantially surrounds and encloses the pinna 12a, providing a physical barrier between the user's ear and the external environment. A cushion or padding may be provided at an edge of the housing to increase the user's comfort and the acoustic coupling between the headphone and the user's skin (i.e., to provide a more effective barrier between the external environment and the user's ear).

The headphone comprises one or more speakers 22 positioned on its inner surface and arranged to generate acoustic signals towards the user's ear, and in particular the ear canal 12b. The headphone further comprises one or more microphones 24, also positioned on its inner surface, arranged to detect acoustic signals within the internal volume defined by the headphone, the pinna 12a and the ear canal 12b. These microphones 24 are operable to detect bone-conducted speech signals.

The headphone may be operable to perform active noise cancellation, to reduce the amount of noise experienced by its user. Active noise cancellation operates by detecting the noise (i.e., with a microphone) and generating a signal (i.e., with a speaker) that is equal in amplitude but opposite in phase to the noise. The generated signal thus interferes destructively with the noise, mitigating the noise experienced by the user. Active noise cancellation may operate on the basis of feedback signals, feedforward signals, or a combination of both. Feedforward active noise cancellation utilizes one or more microphones on the external surface of the headphone, operating to detect the ambient noise before it reaches the user's ear. The detected noise is processed quickly, and a cancellation signal is generated so as to match the incoming noise as it arrives at the user's ear. Feedback active noise cancellation utilizes one or more error microphones positioned on the internal surface of the headphone, operating to detect the combination of the noise and the audio playback signal generated by the one or more speakers. This combination is used in a feedback loop, together with knowledge of the audio playback signal, to adjust the cancellation signal generated by the speaker so as to reduce the noise. The microphone 24 shown in fig. 1a may therefore form part of an active noise cancellation system, e.g., as an error microphone.
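As a minimal sketch of the feedforward principle just described (the function name and fixed-filter model are illustrative assumptions, not details from this disclosure), the anti-noise signal is simply a phase-inverted estimate of the noise expected to arrive at the ear:

```python
import numpy as np

def feedforward_anti_noise(reference_mic: np.ndarray,
                           path_filter: np.ndarray) -> np.ndarray:
    """Sketch of feedforward ANC: estimate the noise that will reach the
    ear by filtering the external reference-microphone signal through a
    model of the acoustic path, then invert its phase so the speaker
    output destructively interferes with the incoming noise."""
    predicted_noise = np.convolve(reference_mic, path_filter, mode="same")
    return -predicted_noise  # equal amplitude, opposite phase
```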

The personal audio device 20 may include, or be used in conjunction with, a voice microphone arranged to capture an air-conducted representation of the user's voice. See figure 1f for more details.

Fig. 1b shows an alternative personal audio device 30 comprising a supra-aural headphone. A supra-aural headphone does not surround or enclose the user's ear, but rather sits on the pinna 12a. The headphone may comprise a cushion or padding to lessen the impact of ambient noise. Like the circum-aural headphone shown in fig. 1a, the supra-aural headphone comprises one or more speakers 32 and one or more microphones 34. The speakers 32 and the microphones 34 may form part of an active noise cancellation system, with the microphones 34 serving as error microphones.

Fig. 1c shows another alternative personal audio device 40 comprising an intra-concha headphone (or earphone). In use, the intra-concha headphone sits inside the user's concha cavity. The intra-concha headphone may fit loosely within the cavity, allowing the flow of air into and out of the user's ear canal 12b.

As with the devices shown in figs. 1a and 1b, the intra-concha headphone comprises one or more speakers 42 and one or more microphones 44, which may form part of an active noise cancellation system.

Fig. 1d shows another alternative personal audio device 50 comprising an in-ear headphone (or earphone), insert headphone, or earbud. This headphone is configured to be partially or totally inserted within the ear canal 12b, and may provide a relatively tight seal between the ear canal 12b and the external environment (i.e., it may be acoustically closed or sealed). The headphone may comprise one or more speakers 52 and one or more microphones 54 which, as with the other devices described above, may form part of an active noise cancellation system.

Since an in-ear headphone may provide a relatively tight acoustic seal around the ear canal 12b, the external noise detected by the microphone 54 (i.e., external noise from the external environment) may be low.

Fig. 1e shows another alternative personal audio device 60, which is a mobile or cellular telephone or handset. The handset 60 comprises one or more speakers 62 for playback of audio to the user, and one or more similarly positioned microphones 64.

In use, the handset 60 is held close to the user's ear to provide audio playback (e.g., during a call). While a tight acoustic seal is not achieved between the handset 60 and the user's ear, the handset 60 is typically held close enough that the one or more microphones 64 are able to detect bone-conducted speech signals. As with the other devices, the one or more speakers 62 and the one or more microphones 64 may form part of an active noise cancellation system.

The handset 60 further comprises a voice microphone 66, positioned at or near an end of the handset opposite the one or more speakers 62 and the one or more microphones 64. When the handset is held close to the user's face in use, the voice microphone 66 is thus relatively close to the user's mouth and can detect the user's voice as conducted through the air.

All of the personal audio devices described above thus provide audio playback to substantially a single user in use. Each device is also operable to detect bone-conducted voice signals via the respective microphones 24, 34, 44, 54 and 64.

Fig. 1f shows the application of a personal audio device (in this case having a similar construction to the personal audio device 50) to a user. The user has two ear canals 104, 108. A first in-ear headphone 102 (comprising a first speaker or other audio transducer, and a first microphone or other transducer) is inserted into a first ear canal 104, and a second in-ear headphone 106 (comprising a second speaker or other audio transducer, and a second microphone) is inserted into a second ear canal 108.

A voice microphone 110 is also provided, the voice microphone 110 being positioned outside the ear. In the illustrated embodiment, the voice microphone 110 is coupled to the first headphones 102 and the second headphones 106 via a wired connection. However, the voice microphone 110 may be positioned anywhere suitable for detecting the user's voice as conducted through the air, for example on an exterior surface of one or more of the headphones 102, 106. The voice microphone 110 may be coupled to the first headset 102 and the second headset 106 via a wireless connection. The headphones 102, 106 and the voice microphone 110 are also coupled to the host electronic device 112. The host electronic device 112 may be a smart phone or other cellular or mobile phone, a media player, or the like. In some embodiments, processing may be performed in one of the headphones 102, 106, making the host electronic device 112 unnecessary. It should also be noted that although fig. 1f shows two headphones 102, 106, in some embodiments only a single headphone may be provided, or signals from only a single one of the two headphones 102, 106 may be used for the processing described below.

When the user speaks, his or her voice is transmitted through the air to the voice microphone 110, where it is detected. The speech signal is additionally transmitted through part of the user's skeleton or skull, such as the jaw bone, and coupled into the ear canals. The microphones in the headphones 102, 106 therefore detect bone-conducted speech signals.

Those skilled in the art will appreciate that the microphone or other transducer (such as an accelerometer) that detects the bone conducted signal may be the same as the microphone or other transducer provided as part of the active noise cancellation system (e.g., for detecting error signals). Alternatively, separate microphones or transducers may be provided for these individual purposes (or combination of purposes) in the personal audio device described above.

All of the devices shown in figs. 1a-1f and described above may be used to implement aspects of the present disclosure.

Fig. 2 illustrates an arrangement 200 according to various embodiments of the present disclosure. The arrangement 200 comprises a personal audio device 202 and a biometric system 204. The personal audio device 202 may be any device that is suitable for, or configurable to, detect both bone-conducted and air-conducted voice signals from a user. By its nature, the bone-conducted speech signal originates substantially from a single user (i.e., the user of the personal audio device); depending on the environment surrounding the device 202, the air-conducted speech signal may additionally comprise speech signals from nearby talkers. The personal audio device 202 comprises first and second microphones which, in use, are positioned adjacent to or within the user's ear (thereby detecting bone-conducted audio signals) and adjacent to the user's mouth (thereby detecting air-conducted audio signals), respectively. The personal audio device may be wearable and comprise a headphone for each of the user's ears. Alternatively, the personal audio device may be operable to be carried by the user and held adjacent to one or more of the user's ears during use. The personal audio device may comprise a headphone or a mobile telephone handset, as described above with respect to any of figs. 1a to 1f.

The biometric system 204 is coupled to the personal audio device 202 and thus receives biometric data indicative of an individual using the personal audio device. In some embodiments, the biometric system 204 is operable to control the personal audio device 202 to acquire biometric data.

For example, the personal audio device 202 may acquire a bone-conducted voice signal, an air-conducted voice signal, or other voice biometric data, and output the acquired signals to the biometric system 204 for processing.

The biometric system 204 can send appropriate control signals to the personal audio device 202 to initiate acquisition of biometric data and receive biometric data from the personal audio device 202. The biometric system 204 is operable to extract one or more features from the biometric data and utilize those features as part of the biometric process.

Some examples of suitable biometric processes include biometric enrollment and biometric authentication. Enrollment comprises the acquisition and storage of biometric data that is characteristic of an individual. In the present context, such stored data may be known as a "voiceprint". Authentication comprises the acquisition of biometric data from an individual and the comparison of that data to the stored data of one or more enrolled or authorized users. A positive comparison (i.e., the acquired data matches or is sufficiently close to a stored voiceprint or ear print) results in the individual being authenticated. For example, the individual may be permitted to carry out a restricted action, or granted access to a restricted area or restricted device. A negative comparison (i.e., the acquired data does not match or is not sufficiently close to a stored voiceprint or ear print) results in the individual not being authenticated. For example, the individual may not be permitted to carry out the restricted action, or may be denied access to the restricted area or restricted device.

In some embodiments, the biometric system 204 may form part of the personal audio device 202 itself. Alternatively, the biometric system 204 may form part of an electronic host device (e.g., an audio player) to which the personal audio device 202 is coupled by wired or wireless means. In further embodiments, the operation of the biometric system 204 may be distributed between circuitry in the personal audio device 202 and the electronic host device.

Fig. 3 illustrates a system 300 according to an embodiment of the present disclosure.

The system 300 comprises processing circuitry 324, which may include one or more processors, such as a central processing unit, an applications processor (AP), or a digital signal processor (DSP). The system 300 further comprises memory 326, communicatively coupled to the processing circuitry 324. The memory 326 may store instructions which, when executed by the processing circuitry 324, cause the processing circuitry to perform one or more methods as described below (see, e.g., fig. 4).

The one or more processors may perform the methods described herein on the basis of data and program instructions stored in the memory 326. The memory 326 may be provided as a single component or as multiple components, or may be co-integrated with at least some of the processing circuitry 324. In particular, the methods described herein may be performed in the processing circuitry 324 by executing instructions that are stored in the memory 326 in a non-transitory form, with the program instructions being stored either during manufacture of the system 300 or the personal audio device 202, or by upload while the system or device is in use.

The system 300 comprises a first microphone 302, which may belong to a personal audio device (as described above). The first microphone 302 may be configured, in use, to be placed within or adjacent to the user's ear, and is hereinafter referred to as the "ear microphone" 302. As described above, the ear microphone 302 is operable to detect bone-conducted voice signals from the user.

The processing circuitry 324 comprises an analog-to-digital converter (ADC) 304, which receives the electrical audio signal detected by the ear microphone 302 and converts it from the analog domain to the digital domain. Of course, in an alternative implementation the ear microphone 302 may be a digital microphone and generate a digital data signal (which does not then need to be converted to the digital domain).

The system 300 further comprises a second microphone 310, which may belong to the personal audio device 202 (as described above). The second microphone 310 may be configured, in use, to be placed outside the user's ear, and is hereinafter referred to as the "voice microphone" 310. As described above, the voice microphone 310 is operable to detect air-conducted voice signals from the user. The processing circuitry 324 further comprises an ADC 312 for the audio signals detected by the voice microphone 310 (unless the voice microphone 310 is a digital microphone that generates a digital data signal, as discussed above).

The output of the ADC 304 (i.e., the bone-conducted audio signal) is passed to an enablement module 306. The output of the ADC 312 (i.e., the air-conducted audio signal) may optionally also be passed to the enablement module 306. The operation of the enablement module 306 is described in further detail below.

The system 300 implements a voice biometric authentication algorithm, and the air-conducted audio signal is therefore also used to perform voice biometric authentication.

The signal detected by the voice microphone 310 is in the time domain. However, the features extracted for the purposes of the biometric process may be in the frequency domain (in that the frequency content of the user's speech is characteristic of that speech). The processing circuitry 324 therefore comprises a Fourier transform module 308, which converts the detected signal to the frequency domain. For example, the Fourier transform module 308 may implement a fast Fourier transform (FFT).

The transformed signal is then passed to a feature extraction module 314, which extracts one or more features of the transformed signal for use in a biometric process (e.g., biometric enrollment, biometric authentication, etc.). For example, the feature extraction module 314 may extract one or more mel-frequency cepstral coefficients. Alternatively, the feature extraction module may determine the amplitude or energy of the user's speech at one or more predetermined frequencies, or within one or more frequency ranges. The extracted features may correspond to data for a model of the user's speech.
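A rough sketch of this style of frequency-domain feature extraction is shown below; the window, frame length, and band edges are illustrative assumptions rather than values specified in this disclosure:

```python
import numpy as np

def extract_band_energies(frame: np.ndarray, sample_rate: int,
                          bands=((100, 400), (400, 1000), (1000, 3000))):
    """Window one audio frame, transform it to the frequency domain via
    an FFT, and return the log energy within a few illustrative bands."""
    spectrum = np.abs(np.fft.rfft(frame * np.hanning(len(frame)))) ** 2
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sample_rate)
    energies = [spectrum[(freqs >= lo) & (freqs < hi)].sum()
                for lo, hi in bands]
    return np.log(np.asarray(energies) + 1e-10)  # offset avoids log(0)
```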

The extracted features are passed to a biometric module 316, which performs a biometric process using them. For example, the biometric module 316 may perform biometric enrollment, in which the extracted features (or parameters derived from the extracted features) are stored as biometric data characteristic of the individual. The biometric data may be stored within the system or remote from the system, in a memory module 318 (and may be securely accessible by the biometric module 316). Such stored data may be known as a "voiceprint". In another example, the biometric module 316 may perform biometric authentication, and compare the one or more extracted features to corresponding features in the stored voiceprint (or stored voiceprints). Based on the comparison, a biometric score is generated indicating the likelihood that the speech contained in the air-conducted speech signal corresponds to the speech of the authorized user. The score may be compared to a threshold to determine whether the speech contained in the air-conducted speech signal is authenticated as that of the authorized user. For example, in one embodiment the voice is authenticated when the biometric score exceeds the threshold, and is not authenticated when the biometric score falls below the threshold.
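The disclosure does not specify a particular scoring function; as a hedged illustration only, a cosine-similarity score compared against a threshold captures the accept/reject logic described above (real systems may use GMM, PLDA, or neural scoring instead):

```python
import numpy as np

def biometric_score(features: np.ndarray, voiceprint: np.ndarray) -> float:
    """Cosine similarity between the extracted features and the stored
    voiceprint: higher means the speech is more likely the authorized
    user's."""
    denom = np.linalg.norm(features) * np.linalg.norm(voiceprint) + 1e-10
    return float(np.dot(features, voiceprint) / denom)

def authenticate(features: np.ndarray, voiceprint: np.ndarray,
                 threshold: float = 0.8) -> bool:
    """Authenticate only when the biometric score exceeds the threshold."""
    return biometric_score(features, voiceprint) > threshold
```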

As described above, embodiments of the present disclosure relate to the enrichment or updating of a stored voiceprint for an authorized user, and particularly to the use of the bone-conducted audio signal to determine when the air-conducted audio signal contains the speech of the user of the system. Owing to the position of the ear microphone 302 in use, the bone-conducted audio signal should contain substantially only the voice of the user of the system 300; if other voices are present in the bone-conducted audio signal (e.g., owing to other nearby talkers), the signals associated with those voices will have much lower amplitudes than the signal associated with the user's own voice. A positive determination that speech is present in the bone-conducted audio signal can therefore be used to enable updating or enrichment of the voiceprint of the authorized user.

Thus, in one implementation, the enablement module 306 operates to receive the bone conducted audio signal from the ADC 304 and generate an output control signal for the biometric module 316 to enable the biometric module 316 to update the stored speech model based on the air conducted audio signal.

In one embodiment, the enablement module 306 may receive only the bone-conducted audio signal, and may comprise a voice activity detection module or otherwise operate to perform a voice activity detection function, to detect the presence of audio that is characteristic of speech in the bone-conducted audio signal. Note that such voice activity detection does not correspond to talker detection (i.e., identification of a particular talker), but to the detection of speech generally.

Various voice activity detection methods are known in the art, and the present disclosure is not limited in this respect. For example, voice activity detection may be relatively complex, where one or more parameters of the bone-conducted signal (e.g., spectral slope, correlation coefficient, log-likelihood ratio, cepstrum, weighted cepstrum, and/or modified distance metric) are determined and compared to corresponding parameters that are characteristic of speech. In a simpler embodiment, it may be assumed that the user's voice of the personal audio device 202 will dominate the bone conducted signal when the user speaks (i.e., the user's voice will dominate over other noise sources). In this case, the voice activity detection may comprise a simple comparison of the amplitude of the bone-conducted audio signal with a threshold; when the amplitude is above the threshold, it may be assumed that the bone conducted audio signal contains the user's voice.
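The simple amplitude-threshold variant might look like the following sketch, where the frame length and threshold are illustrative assumptions:

```python
import numpy as np

def simple_vad(signal: np.ndarray, threshold: float = 0.01,
               frame_len: int = 320) -> np.ndarray:
    """Per-frame voice activity flags: a frame of the bone-conducted
    signal is assumed to contain the user's speech when its RMS
    amplitude exceeds the threshold, on the premise that the wearer's
    own voice dominates that signal."""
    n_frames = len(signal) // frame_len
    frames = signal[:n_frames * frame_len].reshape(n_frames, frame_len)
    rms = np.sqrt((frames ** 2).mean(axis=1))
    return rms > threshold
```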

In one implementation, in response to determining that the bone conducted audio signal comprises a voice signal, the enablement module 306 outputs a control signal to the biometric module 316 that enables the biometric module 316 to update the stored voiceprint for the authorized user based on the air conducted audio signal.

The enablement module 306 may further receive the air-conducted audio signal from the ADC 312, and determine whether to enable updating of the stored speech model based on both the bone-conducted audio signal and the air-conducted audio signal.

For example, the enablement module 306 may perform a voice activity detection function on the air-conducted audio signal, to detect the presence of audio that is characteristic of speech in the air-conducted audio signal. When both the air-conducted audio signal and the bone-conducted audio signal contain speech, the enablement module 306 may generate the output control signal to the biometric module 316, as described above. In this embodiment, the control signal may be generated when temporally overlapping (i.e., simultaneous) portions of the air-conducted audio signal and the bone-conducted audio signal both comprise speech. In this way, it can be assumed that the speech in the bone-conducted audio signal and the speech in the air-conducted audio signal originate from the same person (i.e., the user).

Additionally or alternatively, the enablement module 306 may cross-correlate the bone-conducted audio signal with the air-conducted audio signal. Having determined that the bone-conducted audio signal comprises speech, the enablement module 306 may cross-correlate the bone-conducted audio signal (particularly the portion of the bone-conducted audio signal comprising speech) with the air-conducted audio signal (particularly the portion of the air-conducted audio signal that is contemporaneous with the speech-containing portion of the bone-conducted audio signal) to determine a level of correlation between the two signals. Any suitable correlation algorithm may be used. In response to a determination that the two signals are correlated (e.g., the correlation exceeds a threshold), the enablement module 306 may output the control signal to the biometric module 316 to enable updating of the stored speech model.
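Since the disclosure leaves the correlation algorithm open, the following is one hedged sketch, using the peak of a normalized cross-correlation between the contemporaneous segments of the two signals:

```python
import numpy as np

def signals_correlated(bone: np.ndarray, air: np.ndarray,
                       threshold: float = 0.5) -> bool:
    """Return True when the peak normalized cross-correlation between
    the bone-conducted and air-conducted segments exceeds a threshold,
    suggesting the speech in both signals comes from the same talker."""
    bone = (bone - bone.mean()) / (bone.std() + 1e-10)
    air = (air - air.mean()) / (air.std() + 1e-10)
    xcorr = np.correlate(bone, air, mode="full") / min(len(bone), len(air))
    return float(np.abs(xcorr).max()) > threshold
```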

The determination to enable updating of the stored speech model may further be based on authentication of the user of the personal audio device 202 as an authorized user. Thus, in the illustrated embodiment, the system 300 further comprises an authentication module 320 coupled to the enablement module 306.

In one embodiment, the authentication module 320 comprises, or is the same as, the biometric module 316. Thus, the system 300 may be used to authenticate the user based on the air-conducted audio signal. The biometric module 316 performs a biometric authentication algorithm on the air-conducted audio signal, comparing one or more features extracted from the air-conducted audio signal to the stored voiceprint for the authorized user. Based on that comparison, an output is generated indicating a determination as to whether the user of the system 300 is the authorized user. This output may be used by the system 300 or the personal audio device generally, e.g., to permit one or more restricted actions. In the illustrated embodiment, the output is additionally or alternatively passed to the enablement module 306, in response to which the enablement module 306 may enable updating of the stored voiceprint.

Additionally or alternatively, the authentication module 320 may comprise one or more alternative authentication mechanisms. For example, the authentication module 320 may perform authentication based on one or more alternative biometrics, such as an ear biometric, a fingerprint, or an iris or retina scan. As a further example, the authentication module 320 may implement an input-output mechanism for accepting and authenticating the user based on a passcode, password, or personal identification code entered by the user and associated with the authorized user. The input-output mechanism may put a question to the user based on the passcode, password, or personal identification code, the answer to which does not reveal the entire passcode, password, or personal identification code. For example, the question may relate to a particular character or digit of the passcode, password, or personal identification code (e.g., "What is the third character of the password?"), or may require a mathematical operation to be performed on the personal identification code or a part of it (e.g., "What is the first digit of the personal identification code plus three?"). The input-output mechanism may output the question audibly (e.g., by playback over a speaker), such that only the user can hear the question. Further, the input-output mechanism may accept the answer audibly (e.g., via the microphone 310) or via some other input mechanism, such as a touchscreen, keypad, or keyboard.
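A hypothetical sketch of such a challenge generator follows; the question format and the arithmetic operation are assumptions for illustration only:

```python
import random

def pin_challenge(pin: str) -> tuple[str, str]:
    """Build a spoken challenge that never reveals the whole code.

    Picks one digit of the personal identification code at random, asks
    for that digit plus a random offset, and returns the pair
    (question_to_play_back, expected_answer)."""
    idx = random.randrange(len(pin))
    offset = random.randint(1, 5)
    question = (f"What is digit {idx + 1} of your personal "
                f"identification code plus {offset}?")
    answer = str(int(pin[idx]) + offset)
    return question, answer
```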

According to an embodiment of the present disclosure, the system 300 is operable to update the stored voiceprint for an authorized user after the user is successfully authenticated as an authorized user.

Thus, the user enrolls with the biometric module 316 (i.e., speech model data is acquired) and a voiceprint 318 is stored for that user. The user may thereafter seek authentication via the system 300, with further voice biometric data being acquired for that purpose, as described above. If the authentication is successful, the biometric module 316 may return a positive authentication message to the enablement module 306, enabling the stored voiceprint 318 for the user to be updated based on the acquired voice data.

If the authentication is unsuccessful, the biometric module 316 may return a negative authentication message. However, the system 300 comprises one or more further authentication mechanisms 320. If the user is subsequently authenticated successfully via one or more of those mechanisms, the enablement module 306 may issue a control signal to the biometric module 316 to update the stored voice model 318 for the user with the data acquired as part of the unsuccessful voice biometric authentication attempt.

Additionally or alternatively, updates to the stored speech model 318 for the user may be based on speech model data acquired solely for that purpose (i.e., and not as part of a successful or unsuccessful authentication attempt). Once the user has been authenticated successfully, the system 300 may utilize the microphone 310 to acquire further speech model data, with or without the user's knowledge. Such data acquisition may be periodic, continuous, on a defined schedule, or upon detection of one or more defined events.

The stored speech model 318 may be updated by the biometric module 316 based on data within the air-conducted audio signal that overlaps in time or coincides with data comprising speech signals in the bone-conducted audio signal. For example, in some embodiments, detected speech in bone conducted audio signals may be used to gate portions of air conducted audio signals to be used to update stored speech models. For this purpose, a time stamp may be applied to the data in each audio signal. Thus, the time stamp of the data frame detected to comprise speech in the bone conducted audio signal may be used to identify the data frame in the air conducted audio signal to be used for updating the stored speech model.
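As a sketch of this timestamp-based gating (framing of both signals on a shared clock is assumed), the per-frame VAD flags computed on the bone-conducted signal select which air-conducted frames are used for the update:

```python
import numpy as np

def gate_air_frames(air_frames: np.ndarray,
                    bone_voiced: np.ndarray) -> np.ndarray:
    """Keep only the air-conducted frames whose timestamps coincide with
    bone-conducted frames flagged as containing speech.

    air_frames: shape (n_frames, frame_len), framed on the same clock as
    the bone-conducted signal; bone_voiced: boolean per-frame mask from
    voice activity detection on the bone-conducted signal."""
    n = min(len(air_frames), len(bone_voiced))
    return air_frames[:n][bone_voiced[:n]]
```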

Fig. 4 is a flow diagram of a method according to an embodiment of the present disclosure.

In step 400, the biometric system obtains a bone-conducted audio signal, for example using any of the microphones 24, 34, 44, 54, 64, or 302. In step 402, the biometric system obtains an air-conducted audio signal, for example using any of the microphones 66, 110, or 310. Although described as discrete steps, those skilled in the art will appreciate that these steps may occur simultaneously, with the bone-conducted audio signal and the air-conducted audio signal relating to the same audio environment at the same time.

In step 404, the biometric system determines whether the bone-conducted audio signal includes any voice activity, using any of the voice activity detection methods described above with respect to the enablement module 306 (e.g., comparison of one or more parameters of the bone-conducted signal to corresponding parameters that are characteristic of speech, or a simple amplitude threshold on the assumption that the user's voice dominates the bone-conducted signal when the user speaks).

If there is no voice activity in the bone-conducted audio signal, it may be assumed that no one is speaking, and the method ends in step 406. If voice activity is present, the method proceeds to step 408, in which the biometric system determines whether the air-conducted audio signal includes any voice activity. Again, any suitable voice activity detection method may be used.

If there is no voice activity in the air-conducted audio signal, it may be assumed that the voice microphone is not operating properly or is in an environment so noisy that the voice cannot be detected, and the method ends in step 406. If there is voice activity in the air-conducted audio signal, the method proceeds to step 410, in which the biometric system determines whether the air-conducted audio signal and the bone-conducted audio signal are correlated with each other.

For example, a correlation value indicative of the level of correlation between the two signals may be compared to a threshold: if the correlation value exceeds the threshold, the signals may be determined to be correlated; if the correlation value is below the threshold, the signals may be determined to be uncorrelated. Any suitable cross-correlation method may be used, and the present disclosure is not limited in this respect.

If the two audio signals are not correlated, it may be assumed that the voice microphone has detected significant levels of noise (e.g., the presence of other talkers). In that case it may be inappropriate to update the stored speech template based on the air-conducted speech signal, and the method therefore proceeds to step 406 and ends. If the audio signals are correlated, the method proceeds to step 412, in which the biometric system determines whether the user is authenticated as an authorized user.

The user may be authenticated as an authorized user via any suitable mechanism. For example, the user may be authenticated based on a voice biometric algorithm performed on the air-conducted audio signal obtained in step 402. Alternatively, authentication may be based on one or more alternative biometrics (such as ear biometrics, fingerprint, iris or retina scans) or non-biometric authentication (such as entry of a passcode, password or personal identification code).

If the user is not authenticated as an authorized user, the method ends in step 406, since the stored speech template for the authorized user should not be updated based on the voice of a different person. If the user is authenticated as an authorized user, the method proceeds to step 414, in which the user's speech model is updated based on the air-conducted audio signal obtained in step 402.
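The overall decision chain of fig. 4 can be summarized in the following sketch, with each stage injected as a callable so the example stays self-contained; the stage implementations are whichever techniques the system designer chooses:

```python
def maybe_update_voiceprint(bone, air, voiceprint, *,
                            has_speech, correlated, authenticate,
                            extract, update):
    """One pass of the fig. 4 flow (steps 404-414). Each keyword
    argument is a callable standing in for the corresponding stage."""
    if not has_speech(bone):         # step 404: nobody is speaking
        return voiceprint
    if not has_speech(air):          # step 408: mic fault or heavy noise
        return voiceprint
    if not correlated(bone, air):    # step 410: speech from someone else
        return voiceprint
    if not authenticate(air):        # step 412: not the authorized user
        return voiceprint
    return update(voiceprint, extract(air))   # step 414: enrich model
```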

The speech model may be updated based on those portions of the air-conducted audio signal that correspond to portions of the bone-conducted audio signal that include speech. For example, those portions of the bone conducted audio signal that contain speech may be used to gate the air conducted audio signal, thereby isolating the user's voice from other noise sources or sources of voice present in the air conducted audio signal.

For example, the stored parameters of the speech model may be updated as follows:

μ_new = α · μ_stored + (1 − α) · μ_calc

where α is a coefficient between 0 and 1, μ_new is the new (i.e., updated) stored speech model parameter, μ_stored is the old (i.e., previous) stored speech model parameter, and μ_calc is the newly acquired speech model data parameter. Thus, the new speech model is based on a combination of the previous speech model and the newly acquired speech model data. Of course, alternative expressions may be used to achieve much the same effect. The value of the coefficient α may be set as needed to achieve a desired rate of change of the stored speech model. For example, it may be desirable for the speech model to change relatively slowly, making the system difficult to crack. Therefore, α may be set to a value close to 1 (e.g., 0.95 or higher).
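A direct transcription of this update rule as code (a sketch; array-valued model parameters are an assumption):

```python
import numpy as np

ALPHA = 0.95  # close to 1, so the stored model changes slowly

def update_model(mu_stored: np.ndarray, mu_calc: np.ndarray,
                 alpha: float = ALPHA) -> np.ndarray:
    """mu_new = alpha * mu_stored + (1 - alpha) * mu_calc: blend the
    previously stored parameters with the newly acquired ones."""
    return alpha * mu_stored + (1.0 - alpha) * mu_calc
```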

Accordingly, embodiments of the present disclosure provide methods, apparatuses, and systems for authenticating a user.
