Biometric process
Note: this biometric process was devised by J. P. Lesso on 2019-03-20. Abstract: The present disclosure provides methods, systems, devices, and computer program products for authenticating a user based on a comparison of an audio signal to a stored speech model for an authorized user. In one aspect, a method comprises: obtaining a first audio signal comprising a representation of a bone-conducted signal, wherein the bone-conducted signal is conducted via at least a portion of a bone of the user; obtaining a second audio signal comprising a representation of an air-conducted signal; and, in response to determining that the first audio signal comprises a speech signal, enabling updating of the stored speech model for the authorized user based on the second audio signal.
1. A method in a biometric authentication system for authenticating a user based on a comparison of an audio signal to a stored speech model for an authorized user, the method comprising:
obtaining a first audio signal comprising a representation of a bone conducted signal, wherein the bone conducted signal is conducted via at least a portion of a bone of the user;
obtaining a second audio signal comprising a representation of an air-conducted signal; and
in response to determining that the first audio signal comprises a speech signal, enabling updating of a stored speech model for the authorized user based on the second audio signal.
2. The method of claim 1, further comprising:
in response to authenticating the user as the authorized user, updating the stored speech model for the authorized user using the second audio signal.
3. The method of claim 2, wherein the user is authenticated as the authorized user based on a biometric process.
4. The method of claim 3, wherein the biometric process comprises a voice biometric process based on the second audio signal.
5. The method of claim 2, wherein the user is authenticated as the authorized user based on a non-biometric process.
6. The method of claim 5, wherein the non-biometric process comprises entering a password for the authorized user.
7. The method of any of the preceding claims, wherein the step of enabling the updating of the stored speech model for the authorized user is further responsive to determining that the second audio signal comprises a speech signal.
8. The method of any of the preceding claims, wherein the step of enabling the updating of the stored speech model for the authorized user based on the second audio signal is further based on a comparison between the first audio signal and the second audio signal.
9. The method of claim 8, wherein enabling updating of the stored speech model for the authorized user based on the second audio signal is in response to detecting a correlation between the first audio signal and the second audio signal.
10. The method of claim 9, wherein enabling updating of the stored speech model for the authorized user based on the second audio signal is in response to detecting a correlation between a portion of the first audio signal identified as comprising the speech signal and a corresponding portion of the second audio signal.
11. The method of any preceding claim, wherein the first audio signal is generated by an in-ear transducer.
12. The method of any preceding claim, wherein the second audio signal is generated by a microphone external to the user's ear.
13. A biometric authentication system for authenticating a user based on a comparison of an audio signal to a stored speech model for an authorized user, the biometric authentication system comprising:
a first input for obtaining a first audio signal comprising a representation of a bone conducted signal, wherein the bone conducted signal is conducted via at least a portion of a bone of the user;
a second input for obtaining a second audio signal comprising a representation of an air-conducted signal; and
an enabling module operable to determine whether the first audio signal comprises a speech signal and, in response to determining that the first audio signal comprises a speech signal, enable updating of a stored speech model for an authorized user based on the second audio signal.
14. The biometric authentication system of claim 13, further comprising a biometric module operable to update the stored speech model for the authorized user using the second audio signal in response to authenticating the user as the authorized user.
15. The biometric authentication system of claim 14, further comprising an authentication module operable to authenticate the user as the authorized user based on a biometric process.
16. The biometric authentication system of claim 15, wherein the biometric process comprises a voice biometric process based on the second audio signal.
17. The biometric authentication system of claim 14, further comprising an authentication module operable to authenticate the user as the authorized user based on a non-biometric process.
18. The biometric authentication system of claim 17, wherein the non-biometric process includes entering a password for the authorized user.
19. The biometric authentication system of any one of claims 13 to 18, wherein the enabling module is further operable to enable updating of the stored speech model for the authorized user based on the second audio signal in response to determining that the second audio signal comprises a speech signal.
20. The biometric authentication system of any one of claims 13 to 19, wherein the enabling module is further operable to enable updating of the stored speech model for the authorized user based on the second audio signal based on a comparison of the first audio signal and the second audio signal.
21. The biometric authentication system of claim 20, wherein the enabling module is further operable to enable an update of the stored speech model for the authorized user based on the second audio signal in response to detecting a correlation between the first audio signal and the second audio signal.
22. The biometric authentication system of any one of claims 13 to 21, wherein the first input is connectable to a transducer adapted for insertion into an ear of a user.
23. The biometric authentication system of any one of claims 13 to 22, wherein the second input is connectable to a voice microphone.
24. The biometric authentication system according to any one of claims 13 to 23, wherein the biometric authentication system is provided on a single integrated circuit.
25. An electronic device for authenticating a user based on a comparison of an audio signal to a stored speech model for an authorized user, the electronic device comprising processing circuitry and a non-transitory machine-readable medium storing instructions that, when executed by the processing circuitry, cause the electronic device to:
obtain a first audio signal comprising a representation of a bone conducted signal, wherein the bone conducted signal is conducted via at least a portion of a bone of the user;
obtain a second audio signal comprising a representation of an air-conducted signal; and
in response to determining that the first audio signal comprises a speech signal, enable updating of a stored speech model for an authorized user based on the second audio signal.
26. The electronic device of claim 25, wherein the electronic device comprises a personal audio device or a host electronic device.
27. A non-transitory machine-readable medium for authenticating a user based on a comparison of an audio signal to a stored speech model for an authorized user, the medium storing instructions that, when executed by processing circuitry, cause an electronic device to:
obtain a first audio signal comprising a representation of a bone conducted signal, wherein the bone conducted signal is conducted via at least a portion of a bone of the user;
obtain a second audio signal comprising a representation of an air-conducted signal; and
in response to determining that the first audio signal comprises a speech signal, enable updating of a stored speech model for an authorized user based on the second audio signal.
Technical Field
Embodiments of the present disclosure relate to methods, devices, and systems for performing biometric processes, and more particularly, to methods, devices, and systems for performing biometric processes that include authenticating a user based on the user's voice.
Background
Biometric technology is becoming increasingly popular as a method for authenticating those users who are attempting to access restricted areas or restricted devices or who are attempting to perform restricted actions. A number of different biometric identifiers are known, including fingerprint recognition, iris recognition and facial recognition.
A voice biometric system authenticates a user based on the user's voice. Prior to authentication, a user first enrolls with the system. During enrollment, the voice biometric system acquires biometric data that is characteristic of the user's voice and stores that data as a voice model or voiceprint. Authentication may be based on a particular word or phrase spoken during enrollment (text-dependent), or on speech that differs from the speech provided during enrollment (text-independent). Authentication involves extracting one or more biometric features from an input audio signal and comparing those features to the stored voiceprint. A determination that the acquired data matches, or is sufficiently close to, the stored voiceprint results in successful authentication of the user. Successful authentication may result in the user being allowed to perform restricted actions, or being authorized to access restricted areas or restricted devices, for example. If the acquired features do not match, or are not sufficiently close to, the stored voiceprint, the user is not authenticated and the authentication attempt fails. An unsuccessful authentication attempt may result in the user not being allowed to perform a restricted action, or being denied access to a restricted area or restricted device.
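The enroll-compare-threshold flow described above can be sketched as follows. This is a minimal illustration only, not the implementation contemplated by the disclosure: features are reduced to fixed-length vectors, the voiceprint to their mean, the comparison to a cosine similarity, and the 0.9 threshold is an assumed, uncalibrated value.

```python
import numpy as np

def enroll(feature_frames):
    """Average per-utterance feature vectors into a stored voiceprint."""
    return np.mean(feature_frames, axis=0)

def authenticate(features, voiceprint, threshold=0.9):
    """Compare extracted features to the stored voiceprint.

    Returns True (authenticated) when the cosine similarity meets or
    exceeds the threshold, False otherwise."""
    score = np.dot(features, voiceprint) / (
        np.linalg.norm(features) * np.linalg.norm(voiceprint))
    return score >= threshold

# Hypothetical enrollment data: three feature vectors from the same user.
enrollment_frames = np.array([[1.0, 0.5, 0.2],
                              [0.9, 0.6, 0.1],
                              [1.1, 0.4, 0.3]])
voiceprint = enroll(enrollment_frames)

print(authenticate(np.array([1.0, 0.5, 0.2]), voiceprint))   # genuine attempt: True
print(authenticate(np.array([-0.2, 1.0, -0.5]), voiceprint)) # impostor: False
```

In practice the threshold trades off false acceptance against false rejection, and the features would come from a front-end such as the one described later in this document.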
The performance of a voice biometric system may be limited by changes in the user's voice that occur in the period between enrollment and authentication. For example, the user's voice may vary with age, illness, or the time of day at which biometric data is acquired. If the user's voice changes sufficiently, the system may reject a user who is authorized and should have been authenticated, a problem known as "false rejection". A voice biometric system may account for changes in the user's voice by collecting additional biometric data at intervals and using those data to update the stored voiceprint. This process is called enrichment.
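The enrichment step itself can be sketched as a simple weighted update of the stored voiceprint. The blending rule and the weight `alpha` below are illustrative assumptions; the disclosure does not prescribe a particular update scheme.

```python
def enrich(voiceprint, new_features, alpha=0.1):
    """Blend newly collected biometric features into the stored voiceprint
    with weight alpha (a hypothetical update rule; the actual scheme is
    implementation-specific)."""
    return [(1 - alpha) * v + alpha * f for v, f in zip(voiceprint, new_features)]

stored = [1.0, 0.5, 0.2]
updated = enrich(stored, [1.2, 0.4, 0.2])
print(updated)  # [1.02, 0.49, 0.2]: the model drifts slightly toward the new sample
```

A small `alpha` lets the voiceprint track gradual voice changes while limiting the damage any single mislabeled sample can do.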
Enrichment may be a supervised or an unsupervised process. Supervised enrichment involves prompting the user to re-enroll with the system at intervals. For example, the user may be asked to repeat a particular word or phrase, and the resulting data may be used to update the stored voiceprint. Prior to this process, the identity of the user is established using one or more authentication techniques (e.g., the user may be required to enter a password or personal identification code). While supervised enrichment provides a reliable method for updating stored voiceprints, it requires the user to actively participate in the enrichment process.
In contrast, unsupervised enrichment uses any speech from the user to update the stored voiceprint without explicit knowledge of the user. Biometric data can be collected during routine use without prompting the user to provide additional input. Thus, unsupervised enrichment allows stored voiceprints to be updated more frequently, thereby improving the performance of the voice biometric system.
To use unsupervised enrichment effectively, it is important that stored voiceprints are updated using only the user's own voice. If a voiceprint is erroneously updated using, for example, speech from another speaker, the effectiveness of the voice biometric system may be compromised and the user may experience more frequent false rejections. Beyond inconveniencing the user, erroneously updating stored voiceprints can also pose a significant security risk. Thus, to implement unsupervised enrichment successfully, a voice biometric system should be able to distinguish between the user's voice and other audio detected by the system (e.g., voices of other speakers).
Embodiments of the present disclosure seek to address this and other problems.
Disclosure of Invention
One aspect of the present disclosure provides a method in a biometric authentication system for authenticating a user based on a comparison of an audio signal to a stored speech model for an authorized user. The method comprises: obtaining a first audio signal comprising a representation of a bone-conducted signal, wherein the bone-conducted signal is conducted via at least a portion of a bone of the user; obtaining a second audio signal comprising a representation of an air-conducted signal; and, in response to determining that the first audio signal comprises a speech signal, enabling updating of the stored speech model for the authorized user based on the second audio signal.
Another aspect provides a biometric authentication system for authenticating a user based on a comparison of an audio signal to a stored speech model for an authorized user. The biometric authentication system comprises: a first input for obtaining a first audio signal comprising a representation of a bone-conducted signal, wherein the bone-conducted signal is conducted via at least a portion of a bone of the user; a second input for obtaining a second audio signal comprising a representation of an air-conducted signal; and an enabling module operable to determine whether the first audio signal comprises a speech signal and, in response to determining that the first audio signal comprises a speech signal, enable updating of the stored speech model for the authorized user based on the second audio signal.
Another aspect provides an electronic device for authenticating a user based on a comparison of an audio signal to a stored speech model for an authorized user. The electronic device comprises processing circuitry and a non-transitory machine-readable medium storing instructions that, when executed by the processing circuitry, cause the electronic device to: obtain a first audio signal comprising a representation of a bone-conducted signal, wherein the bone-conducted signal is conducted via at least a portion of a bone of the user; obtain a second audio signal comprising a representation of an air-conducted signal; and, in response to determining that the first audio signal comprises a speech signal, enable updating of the stored speech model for the authorized user based on the second audio signal.
Another aspect provides a non-transitory machine-readable medium for authenticating a user based on a comparison of an audio signal to a stored speech model for an authorized user. The medium stores instructions that, when executed by processing circuitry, cause an electronic device to: obtain a first audio signal comprising a representation of a bone-conducted signal, wherein the bone-conducted signal is conducted via at least a portion of a bone of the user; obtain a second audio signal comprising a representation of an air-conducted signal; and, in response to determining that the first audio signal comprises a speech signal, enable updating of the stored speech model for the authorized user based on the second audio signal.
Drawings
For a better understanding of embodiments of the present disclosure, and to show more clearly how the same may be carried into effect, reference will now be made, by way of example only, to the following drawings, in which:
figs. 1a to 1f illustrate personal audio devices according to embodiments of the present disclosure;
fig. 2 is a schematic diagram illustrating an arrangement according to an embodiment of the present disclosure;
fig. 3 illustrates a system according to an embodiment of the present disclosure; and
fig. 4 is a flow diagram of a method according to an embodiment of the present disclosure.
Detailed Description
Embodiments of the present disclosure provide methods, apparatuses, and computer programs for enriching or updating stored speech models (also referred to as templates or voiceprints) for authorized users of biometric authentication systems. Various embodiments utilize bone-conducted speech signals (i.e., speech signals that have been conducted at least partially via a portion of the user's bone, such as the jaw bone) to identify when the user is speaking and to enable updating of the stored speech model. For example, a method may include obtaining a first audio signal and a second audio signal comprising a representation of a bone-conducted signal and a representation of an air-conducted signal, respectively. In response to determining that the first audio signal comprises a speech signal, updating of the stored speech model based on the second audio signal may be enabled. Other embodiments may enable updating of the stored speech model in response to determining that the second audio signal comprises a speech signal, or in response to determining that the first audio signal and the second audio signal comprise respective speech signals that are correlated with each other.
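The gating logic described in this overview can be sketched as follows. The function and parameter names are hypothetical, and the voice activity detector and model store are abstracted into injected callables so the sketch stays independent of any particular implementation.

```python
def maybe_update_model(bone_signal, air_signal, contains_speech, update_model):
    """Enable a model update only when the bone-conducted signal carries
    speech, i.e. only when the wearer is actually talking.

    contains_speech and update_model are injected callables standing in
    for a voice activity detector and a speech-model store."""
    if contains_speech(bone_signal):
        update_model(air_signal)
        return True
    return False

updates = []
# Wearer speaking: the bone-conducted channel contains speech.
maybe_update_model("bone+speech", "air+speech",
                   contains_speech=lambda s: "speech" in s,
                   update_model=updates.append)
# Nearby talker only: nothing couples through the wearer's bones.
maybe_update_model("bone-silence", "air+speech",
                   contains_speech=lambda s: "speech" in s,
                   update_model=updates.append)
print(updates)  # ['air+speech']: only the wearer's own utterance was used
```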
Embodiments of the present disclosure may be implemented in a variety of different electronic devices and systems. Figs. 1a to 1f illustrate embodiments of personal audio devices that may be used to implement aspects of the present disclosure. As used herein, the term "personal audio device" refers to any electronic device that is suitable for, or configurable to, provide audio playback to substantially a single user. Some embodiments of suitable personal audio devices are shown in figs. 1a to 1f.
Fig. 1a shows a schematic view of a user's ear, comprising the (external) pinna or auricle 12a and the (internal) ear canal. The personal audio device of fig. 1a comprises headphones worn on or around the ear.
The headphones comprise one or more speakers 22, positioned on the inner surface of the headphones and arranged to generate acoustic signals towards the user's ear, and in particular towards the ear canal.
Headphones may be able to perform active noise cancellation to reduce the amount of noise experienced by the user. Active noise cancellation operates by detecting the noise (e.g., using a microphone) and generating a signal (e.g., using a speaker) with the same amplitude as, but opposite phase to, the noise signal. The generated signal thus interferes destructively with the noise, mitigating the noise experienced by the user. Active noise cancellation may operate on the basis of a feedback signal, a feedforward signal, or a combination of the two. Feedforward active noise cancellation utilizes one or more microphones on the outer surface of the headphones, operating to detect ambient noise before it reaches the user's ear. The detected noise is processed quickly, and a cancellation signal is generated so as to match the incoming noise as it arrives at the user's ear. Feedback active noise cancellation utilizes one or more error microphones on the inner surface of the headphones, operating to detect the combination of the noise and the audio playback signal generated by the one or more speakers. This combination is used in a feedback loop, together with knowledge of the audio playback signal, to adjust the cancellation signal generated by the speaker so as to reduce the noise.
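The anti-phase principle behind active noise cancellation can be illustrated with a short sketch. This idealized example omits the acoustic-path delay and frequency response that a real ANC system must model.

```python
import math

# A minimal sketch of the anti-phase principle: the cancellation signal
# is the detected noise inverted, so the acoustic sum at the ear is
# (ideally) zero. Real ANC must also compensate for the delay and
# frequency response of the acoustic path, omitted here.
noise = [math.sin(2 * math.pi * 100 * t / 8000) for t in range(80)]
anti = [-x for x in noise]                  # equal amplitude, opposite phase
residual = [n + a for n, a in zip(noise, anti)]
print(max(abs(r) for r in residual))  # 0.0: perfect cancellation in the ideal case
```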
Fig. 1b shows an alternative personal audio device.
Fig. 1c shows another alternative personal audio device 40, comprising an intra-concha headphone (or earphone). In use, the intra-concha headphone is located inside the user's external ear cavity. The headphone may be loosely mounted within the cavity, allowing air to flow into and out of the user's ear canal.
As with the devices shown in figs. 1a and 1b, the intra-concha headphone comprises one or more speakers 42 and one or more microphones 44, which may form part of an active noise cancellation system.
Fig. 1d shows another alternative personal audio device, comprising an in-ear headphone (or earphone) that, in use, is inserted into the user's ear canal.
Since an in-ear headphone may provide a relatively tight acoustic seal around the ear canal, external noise reaching the user's ear may be significantly attenuated.
Fig. 1e shows another alternative personal audio device 50.
Thus, in use, all of the personal audio devices described above provide audio playback to substantially a single user. Each device is also operable to detect bone-conducted voice signals via a microphone or other transducer positioned in or near the user's ear.
Fig. 1f shows the application of a personal audio device (in this case having a similar construction to the personal audio device 50) to a user. The user may wear such a device in or on one or both ears.
When the user speaks, his or her voice is transmitted through the air to a voice microphone, and is also conducted through the user's bones (e.g., the jaw bone) to a transducer positioned in or near the ear.
Those skilled in the art will appreciate that the microphone or other transducer (such as an accelerometer) that detects the bone conducted signal may be the same as the microphone or other transducer provided as part of the active noise cancellation system (e.g., for detecting error signals). Alternatively, separate microphones or transducers may be provided for these individual purposes (or combination of purposes) in the personal audio device described above.
All of the devices shown in figs. 1a to 1f and described above may be used to implement aspects of the present disclosure.
Fig. 2 is a schematic diagram illustrating an arrangement according to an embodiment of the present disclosure, comprising a personal audio device 202 coupled to a biometric system. The biometric system is operable to perform one or more biometric processes on audio signals obtained via the personal audio device 202.
Some examples of suitable biometric processes include biometric enrollment and biometric authentication. Enrollment comprises acquiring and storing biometric data that is characteristic of an individual. In the present context, such stored data may be referred to as a "voiceprint". Authentication comprises acquiring biometric data from an individual and comparing those data to stored data for one or more enrolled or authorized users. A positive comparison (i.e., the acquired data matches or is sufficiently close to the stored voiceprint or earprint) results in the individual being authenticated. For example, the individual may be allowed to perform restricted actions, or be authorized to access a restricted area or restricted device. A negative comparison (i.e., the acquired data does not match or is not sufficiently close to the stored voiceprint or earprint) results in the individual not being authenticated. For example, the individual may not be allowed to perform restricted actions, or may be denied access to restricted areas or restricted devices.
Fig. 3 illustrates a system 300 according to an embodiment of the present disclosure.
The system 300 includes processing circuitry 324, which may comprise one or more processors, such as a central processing unit, an application processor (AP), or a digital signal processor (DSP). The system 300 also includes memory 326, communicatively coupled to the processing circuitry 324. The memory 326 may store instructions that, when executed by the processing circuitry 324, cause the processing circuitry 324 to perform one or more methods as described below (see, e.g., fig. 4).
The one or more processors may perform the methods described herein on the basis of data and program instructions stored in the memory 326. The memory 326 may be provided as a single component or as multiple components, or may be co-integrated with at least some of the processing circuitry 324. In particular, the methods described herein may be performed in the processing circuitry 324 by executing instructions stored in the memory 326 in a non-transitory form, with the program instructions stored during manufacture of the system 300 or of the personal audio device.
The system 300 includes a first microphone 302, which may belong to a personal audio device (e.g., as described above). The first microphone 302 may be configured, in use, to be placed in or near the ear of a user, and is hereinafter referred to as the "ear microphone 302". As described above, the ear microphone 302 is operable to detect bone-conducted voice signals from the user.
The processing circuitry 324 includes an analog-to-digital converter (ADC) 304, which receives the electrical audio signal detected by the ear microphone 302 and converts it from the analog domain to the digital domain. Of course, in an alternative implementation, the ear microphone 302 may be a digital microphone that generates a digital data signal (which therefore need not be converted to the digital domain).
The system 300 also includes a second microphone 310, which second microphone 310 may belong to the personal audio device 202 (i.e., as described above). The second microphone 310 may be configured to be placed outside the ear of the user in use. The second microphone 310 is hereinafter referred to as a "voice microphone 310". As described above, voice microphone 310 is operable to detect air-conducted voice signals from a user. Processing circuitry 324 also includes an ADC 312 for audio signals detected by voice microphone 310 (unless voice microphone 310 is a digital microphone that produces digital data signals, as discussed above).
The output of ADC 304 (i.e., the bone-conducted audio signal) is passed to an enablement module 306. The output of ADC 312 (i.e., the air-conducted audio signal) may also, optionally, be passed to the enablement module 306. The operation of the enablement module 306 is described in more detail below.
The system 300 also implements a voice biometric authentication algorithm; thus, the air-conducted audio signal is further used to perform voice biometric authentication.
The signal detected by the voice microphone 310 is in the time domain. However, the features extracted for the purposes of the biometric process may be in the frequency domain (in that the characteristics of the user's speech are frequency-based). Thus, the processing circuitry 324 includes a Fourier transform module 308, which converts the detected signal to the frequency domain. For example, the Fourier transform module 308 may implement a fast Fourier transform (FFT).
The transformed signal is then passed to a feature extraction module 314, which feature extraction module 314 extracts one or more features of the transformed signal for use in a biometric process (e.g., biometric enrollment, biometric authentication, etc.). For example, the feature extraction module 314 may extract one or more mel-frequency cepstral coefficients. Alternatively, the feature extraction module may determine the amplitude or energy of the user's speech at one or more predetermined frequencies or within one or more frequency ranges. The extracted features may correspond to data for a model of the user's speech.
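As a simplified stand-in for the mel-frequency cepstral coefficients mentioned above, the sketch below transforms a frame with an FFT and summarises it as log energies in a few fixed frequency bands. The band edges, sample rate, and frame length are illustrative assumptions, not parameters from the disclosure.

```python
import numpy as np

def band_energies(frame, sample_rate=16000,
                  bands=((0, 1000), (1000, 3000), (3000, 8000))):
    """Transform a time-domain frame with an FFT and summarise it as log
    energy in a few fixed frequency bands -- a simplified stand-in for
    mel-frequency cepstral coefficients."""
    spectrum = np.abs(np.fft.rfft(frame)) ** 2
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sample_rate)
    return [float(np.log10(1e-12 + spectrum[(freqs >= lo) & (freqs < hi)].sum()))
            for lo, hi in bands]

t = np.arange(512) / 16000.0
frame = np.sin(2 * np.pi * 440 * t)  # a 440 Hz tone
feats = band_energies(frame)
print(feats)  # most energy falls in the first (0-1 kHz) band
```

Real systems apply a mel-spaced filterbank and a cepstral transform, but the shape of the pipeline (time frame, FFT, compact frequency-domain summary) is the same.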
The extracted features are passed to a biometric module 316, which performs a biometric process on them. For example, the biometric module 316 may perform biometric enrollment, in which the extracted features (or parameters derived from them) are stored as biometric data characteristic of the individual. The biometric data may be stored in a memory module 318 located within, or remote from, the system 300 (and securely accessible by the biometric module 316). Such stored data may be referred to as a "voiceprint". In another embodiment, the biometric module 316 may perform biometric authentication, comparing one or more extracted features to corresponding features in the stored voiceprint (or voiceprints). Based on the comparison, a biometric score is generated indicating the likelihood that the speech contained in the air-conducted speech signal corresponds to the speech of the authorized user. The score may be compared to a threshold to determine whether that speech is authenticated as being that of an authorized user. For example, in one embodiment, the voice is authenticated when the biometric score exceeds the threshold, and is not authenticated when the biometric score is below the threshold.
As described above, embodiments of the present disclosure relate to the enrichment or updating of stored voiceprints for authorized users, and in particular to using bone-conducted audio signals to determine when air-conducted audio signals include the speech of the user of the system. Owing to the position of the ear microphone 302 in use, the bone-conducted audio signal can generally be expected to contain only the voice of the user of the system 300. If other voices are present in the bone-conducted audio signal (e.g., from other nearby speakers), the signals associated with those voices may have much lower amplitudes than the signals associated with the user's own voice. Thus, a positive determination that speech is present in the bone-conducted audio signal may be used to enable an update or enrichment of the voiceprint of an authorized user.
Thus, in one implementation, the enablement module 306 operates to receive the bone conducted audio signal from the ADC 304 and generate an output control signal for the biometric module 316 to enable the biometric module 316 to update the stored speech model based on the air conducted audio signal.
In one embodiment, enablement module 306 may receive only bone conducted audio signals and include a voice activity detection module or otherwise operate to perform a voice activity detection function to detect the presence of audio in the bone conducted audio signals that is characteristic of speech. Note that such voice activity detection does not correspond to speaker detection (i.e., recognition of a particular speaker), but generally corresponds to detection of speech.
Various voice activity detection methods are known in the art, and the present disclosure is not limited in this respect. For example, voice activity detection may be relatively complex, in which one or more parameters of the bone-conducted signal (e.g., spectral slope, correlation coefficients, log-likelihood ratios, cepstral or weighted cepstral coefficients, and/or modified distance metrics) are determined and compared to corresponding parameters characteristic of speech. In a simpler embodiment, it may be assumed that the user is speaking whenever the amplitude or energy of the bone-conducted signal exceeds a threshold.
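A detector of the simple amplitude/energy kind can be sketched as a per-frame energy threshold. The frame length and threshold below are illustrative, uncalibrated values, not parameters from the disclosure.

```python
def frame_energy(frame):
    """Mean energy of one frame of samples."""
    return sum(x * x for x in frame) / len(frame)

def simple_vad(signal, frame_len=160, threshold=0.01):
    """Flag each frame as speech when its mean energy exceeds a fixed
    threshold -- an illustrative amplitude-based detector."""
    frames = [signal[i:i + frame_len]
              for i in range(0, len(signal) - frame_len + 1, frame_len)]
    return [frame_energy(f) > threshold for f in frames]

quiet = [0.001] * 160        # near-silence
loud = [0.5, -0.5] * 80      # strong oscillation, energy 0.25
print(simple_vad(quiet + loud))  # [False, True]
```

More robust detectors adapt the threshold to the noise floor; a fixed threshold is shown only to keep the sketch minimal.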
In one implementation, in response to determining that the bone conducted audio signal comprises a voice signal, the enablement module 306 outputs a control signal to the biometric module 316 that enables the biometric module 316 to update the stored voiceprint for the authorized user based on the air conducted audio signal.
The enablement module 306 may further receive the air-conducted audio signal from ADC 312 and determine whether to enable updating of the stored speech model based on both the bone-conducted audio signal and the air-conducted audio signal.
For example, the enablement module 306 may perform a voice activity detection function on the air-conducted audio signal to detect the presence of audio characteristic of speech in that signal. When both the air-conducted audio signal and the bone-conducted audio signal contain speech, the enablement module 306 can generate an output control signal to the biometric module 316, as described above. In this embodiment, it will be understood that the control signal may be generated when temporally overlapping (or simultaneous) portions of the air-conducted audio signal and the bone-conducted audio signal both comprise speech. In this way, it may be assumed that the speech in the bone-conducted audio signal and the speech in the air-conducted audio signal originate from the same person (i.e., the user).
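The overlap test can be sketched with per-frame speech flags for the two channels; an update is enabled only for frames where both channels are flagged as speech simultaneously. The flag representation is an illustrative simplification.

```python
def both_speaking(bone_flags, air_flags):
    """Per-frame speech flags for the bone- and air-conducted channels;
    True only for frames where the two overlap in time."""
    return [b and a for b, a in zip(bone_flags, air_flags)]

bone = [False, True, True, False]   # wearer talks in frames 1-2
air = [True, True, True, True]      # air microphone also hears a nearby talker
print(both_speaking(bone, air))  # [False, True, True, False]
```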
Additionally or alternatively, enablement module 306 may cross-correlate the bone conducted audio signal with the air conducted audio signal. After determining that the bone conducted audio signal includes speech, enablement module 306 may cross-correlate the bone conducted audio signal (particularly the portion of the bone conducted audio signal that includes speech) with the air conducted audio signal (particularly the portion of the air conducted audio signal that is concurrently present with the portion of the bone conducted audio signal that includes speech) to determine a level of correlation between the two signals. Any suitable correlation algorithm may be used. In response to determining that the two signals are correlated (e.g., the correlation exceeds a threshold), the enablement module 306 may output a control signal to the biometric module 316 to enable updating of the stored speech model.
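One simple instance of "any suitable correlation algorithm" is a normalized zero-lag correlation between the two (time-aligned) speech segments, compared against a threshold. The 0.7 threshold and the equal-length sample lists are illustrative assumptions.

```python
def signals_correlated(x, y, threshold=0.7):
    """Normalized zero-lag correlation between two equal-length
    sample sequences; values near 1 suggest both microphones
    captured the same utterance. The threshold is illustrative."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x)
           * sum((b - my) ** 2 for b in y)) ** 0.5
    return den > 0 and num / den > threshold
```

A real implementation might instead search over a range of lags to absorb the small propagation delay between bone and air paths.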
The determination to enable updating of the stored speech model may be further based on authentication of the user of the system 300 as the authorized user.
In one embodiment, the authentication module 320 includes the biometric module 316 or is the same as the biometric module 316. Thus, the system 300 may be used to authenticate a user based on air-conducted audio signals. The biometric module 316 performs a biometric authentication algorithm on the air-conducted audio signal and compares one or more features extracted from the air-conducted audio signal to a stored voiceprint for an authorized user. Based on the comparison, an output is generated indicating a determination as to whether the user of the system 300 is the authorized user. This output may be used by the system 300 or the personal audio device generally to allow one or more restricted actions. In the illustrated embodiment, the output is additionally or alternatively passed to the enablement module 306, in response to which the enablement module 306 may enable the updating of the stored voiceprint.
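The comparison of extracted features against a stored voiceprint can be sketched as a similarity score against a threshold. Both the cosine-similarity measure and the 0.9 threshold are placeholder assumptions; the disclosure does not prescribe a particular biometric algorithm or feature set.

```python
def authenticate(features, voiceprint, threshold=0.9):
    """Compare an extracted feature vector to a stored voiceprint
    using cosine similarity. The feature extraction itself, the
    similarity measure, and the threshold are illustrative
    stand-ins for a real voice biometric algorithm."""
    dot = sum(f * v for f, v in zip(features, voiceprint))
    nf = sum(f * f for f in features) ** 0.5
    nv = sum(v * v for v in voiceprint) ** 0.5
    return nf > 0 and nv > 0 and dot / (nf * nv) >= threshold
```

The boolean result corresponds to the output described above: it can both gate restricted actions and be passed to the enablement module.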
Additionally or alternatively, authentication module 320 may include one or more alternative authentication mechanisms. For example, authentication module 320 may perform authentication based on one or more alternative biometrics, such as an ear biometric, a fingerprint, an iris scan, or a retinal scan. As another example, authentication module 320 may implement an input-output mechanism for accepting and authenticating a user based on a passcode, password, or personal identification code entered by the user and associated with the authorized user. The input-output mechanism may present a question to the user based on the passcode, password, or personal identification code, the answer to which does not reveal the entire passcode, password, or personal identification code. For example, a question may be associated with a particular character or digit of the passcode, password, or personal identification code (e.g., "What is the third character of the password?"). The question may require performing a mathematical operation on the personal identification code or a portion of it (e.g., "What is the first digit of the personal identification code plus three?"). The input-output mechanism may output the question audibly (e.g., by playing it back on a speaker) so that only the user can hear it. Further, the input-output mechanism may accept the answer audibly (e.g., through microphone 310) or via some other input mechanism, such as a touch screen, keypad, or keyboard.
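The partial-knowledge challenge above can be sketched as follows. The function name, the fixed choice of the first digit, and the mod-10 wrap are illustrative assumptions; the point is only that the expected answer is derived from the code without revealing it in full.

```python
def make_partial_question(pin):
    """Build a challenge that reveals only a derived value, never
    the whole code: ask for one digit plus three (taken mod 10 so
    the answer is a single digit). The question wording and the
    choice of digit are illustrative."""
    index = 0  # first digit, as in the example above
    question = ("What is the first digit of the personal "
                "identification code plus three?")
    expected = (int(pin[index]) + 3) % 10
    return question, expected
```

An eavesdropper who hears the user's answer learns at most one transformed digit, not the code itself.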
According to an embodiment of the present disclosure, the system 300 is operable to update the stored voiceprint for an authorized user after the user is successfully authenticated as an authorized user.
Thus, the user registers with the biometric module 316 (i.e., speech model data is obtained) and the voiceprint 318 is stored for the user. The user may then seek authentication via the system 300, in the course of which more voice biometric data is obtained, as described above. If the authentication is successful, the biometric module 316 may return a positive authentication message to the enablement module 306, enabling updating of the stored voiceprint 318 for the user based on the newly acquired voice data.
If the authentication is not successful, the biometric module 316 may return a negative authentication message. However, the system 300 includes one or more further authentication mechanisms 320. If the user is subsequently successfully authenticated via one or more of these mechanisms, enablement module 306 can issue a control signal to biometric module 316 to update stored voice model 318 for the user with data acquired as part of an unsuccessful voice biometric authentication attempt.
Additionally or alternatively, updates to the stored speech model 318 for the user may be based on speech model data obtained solely for this purpose (i.e., not as part of a successful or failed authentication attempt). Once the user is successfully authenticated, the system 300 may utilize microphone 310 to obtain more speech model data, with or without the user's knowledge. Such data acquisition may be periodic, continuous, on a defined schedule, or triggered upon detection of one or more defined events.
The stored speech model 318 may be updated by the biometric module 316 based on data within the air-conducted audio signal that overlaps in time or coincides with data comprising speech signals in the bone-conducted audio signal. For example, in some embodiments, detected speech in bone conducted audio signals may be used to gate portions of air conducted audio signals to be used to update stored speech models. For this purpose, a time stamp may be applied to the data in each audio signal. Thus, the time stamp of the data frame detected to comprise speech in the bone conducted audio signal may be used to identify the data frame in the air conducted audio signal to be used for updating the stored speech model.
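The timestamp-based gating described above can be sketched as follows. The representation of the air-conducted signal as (timestamp, frame) pairs is an illustrative assumption; any frame indexing scheme that aligns the two signals would serve.

```python
def gate_frames(air_frames, bone_speech_timestamps):
    """Select the air-conducted frames whose timestamps coincide
    with frames of the bone-conducted signal that were flagged as
    containing speech. `air_frames` is assumed to be a list of
    (timestamp, frame_data) pairs; the timestamps of speech-bearing
    bone-conducted frames act as the gate."""
    wanted = set(bone_speech_timestamps)
    return [frame for ts, frame in air_frames if ts in wanted]
```

Only the gated frames would then be passed to the biometric module for updating the stored speech model, isolating the user's voice from other sound sources.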
Fig. 4 is a flow diagram of a method according to an embodiment of the present disclosure.
In a first step, a first audio signal comprising a representation of a bone-conducted signal is obtained, and the biometric system determines whether the bone-conducted audio signal contains voice activity. In a further step, a second audio signal comprising a representation of an air-conducted signal is obtained, and the biometric system likewise determines whether it contains voice activity.
If there is no speech activity in the bone-conducted audio signal, it may be assumed that no one is speaking, and the method ends.
If there is no voice activity in the air-conducted audio signal, it may be assumed that the voice microphone is not operating properly or is in a noisy environment in which voice cannot be detected, and the method ends.
If voice activity is present in both signals, the biometric system may cross-correlate the bone-conducted audio signal with the air-conducted audio signal. For example, a correlation value indicative of the level of correlation between the two signals may be compared to a threshold value: if the correlation value exceeds the threshold, the signals may be determined to be correlated; if the correlation value is less than the threshold, the signals may be determined to be uncorrelated. Any suitable cross-correlation method may be used, and the disclosure is not limited in this respect.
If the two audio signals are not correlated, it can be assumed that the speech microphone has detected a significant level of noise (e.g., the presence of other speakers). In this case, it may not be appropriate to update the stored speech model based on the air-conducted speech signal, so the method proceeds to step 406 and ends. If the audio signals are correlated, the method proceeds to step 412, where the biometric system determines whether the user is authenticated as an authorized user.
The user may be authenticated as an authorized user via any suitable mechanism. For example, the user may be authenticated based on a voice biometric algorithm performed on the air-conducted audio signal obtained earlier in the method.
If the user is not authenticated as an authorized user, the method ends. If the user is authenticated, the method proceeds to update the stored speech model for the user based on the air-conducted audio signal.
The speech model may be updated based on those portions of the air-conducted audio signal that correspond to portions of the bone-conducted audio signal that include speech. For example, those portions of the bone conducted audio signal that contain speech may be used to gate the air conducted audio signal, thereby isolating the user's voice from other noise sources or sources of voice present in the air conducted audio signal.
For example, the stored parameters of the speech model may be updated as follows:
μ_new = α · μ_stored + (1 − α) · μ_calc
where α is a coefficient between 0 and 1, μ_new is the new (i.e., updated) stored speech model parameter, μ_stored is the old (i.e., previous) stored speech model parameter, and μ_calc is the newly acquired speech model data parameter. Thus, the new speech model is based on a combination of the previous speech model and the newly acquired speech model data. Of course, alternative expressions may be used to achieve substantially the same effect. The value of the coefficient α may be set as needed to achieve a desired rate of change of the stored speech model. For example, it may be desirable for the speech model to change relatively slowly, making the system difficult to defeat. Therefore, α may be set to a value close to 1 (e.g., 0.95 or higher).
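The parameter update above is a simple exponential blend and can be expressed directly in code. The function name is illustrative; it operates on a single scalar parameter, whereas a real model would apply the same rule per parameter.

```python
def update_model_parameter(mu_stored, mu_calc, alpha=0.95):
    """Blend a stored speech-model parameter with a newly computed
    one: mu_new = alpha * mu_stored + (1 - alpha) * mu_calc.
    An alpha close to 1 makes the stored model drift slowly, as
    discussed above."""
    return alpha * mu_stored + (1 - alpha) * mu_calc
```

With alpha = 0.95, a single spoofed or noisy enrollment sample moves each stored parameter by at most 5% of the difference, which is the slow-drift property the text motivates.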
Accordingly, embodiments of the present disclosure provide methods, apparatuses, and systems for authenticating a user.