Secure utterance storage
Note: this technology, "Secure utterance storage," was designed and created by W·F·H·克鲁斯, P·特克, and P·托马斯 on 2018-06-13. Its main content includes: Techniques for secure storage of utterances are disclosed. A computing device captures audio of a person making a spoken utterance. The utterance is provided to a speech-to-text (STT) service, which translates the utterance into text. The STT service may also identify various speaker-specific attributes in the utterance. The text and the attributes are provided to a text-to-speech (TTS) service, which creates speech from the text and a subset of the attributes. The speech is stored in a data store whose security is lower than that required to store the original utterance, and the original utterance may then be discarded. The STT service may also translate the speech generated by the TTS service into text; the text generated by the STT service from that speech is then compared with the text generated by the STT service from the original utterance. If the texts do not match, the original utterance may be retained.
1. A processor to:
causing a speech-to-text service to perform speech recognition on first audio data to identify one or more first words and a plurality of attributes in the first audio data;
causing a text-to-speech service to generate second audio data using at least the one or more first words and a subset of the plurality of attributes; and
causing only the second audio data to be stored.
2. The processor of claim 1, wherein the plurality of attributes are indicative of personally identifiable information of a speaker.
3. The processor of claim 2, wherein the subset of the plurality of attributes includes attributes that do not indicate the personally identifiable information of the speaker.
4. The processor of any one of claims 1, 2, or 3, further to provide a user interface for defining the subset of the plurality of attributes.
5. The processor of any one of claims 1, 2, 3, or 4, further to provide a user interface for specifying whether the first audio data is to be stored or deleted.
6. The processor of any one of claims 1, 2, 3, 4, or 5, further to:
causing speech recognition to be performed on the second audio data to identify one or more second words;
comparing the one or more first words to the one or more second words; and
in response to determining that the one or more first words are the same as the one or more second words, causing the first audio data to be discarded and causing the second audio data to be stored.
7. The processor of claim 6, further responsive to determining that the one or more first words are different from the one or more second words, causing the first audio data to be stored and causing the second audio data to be discarded.
8. The processor of any of claims 1, 2, 3, 4, 5, 6, or 7, wherein the first audio data comprises a first utterance of the one or more first words and the second audio data comprises a second utterance of the one or more first words.
9. A computer-implemented method, comprising:
causing speech recognition to be performed on first audio data to identify one or more first words and a plurality of attributes in the first audio data;
causing second audio data to be generated using at least the one or more first words and a subset of the plurality of attributes; and
causing only the second audio data to be stored.
10. The computer-implemented method of claim 9, wherein the plurality of attributes indicate personally identifiable information of a speaker.
11. The computer-implemented method of claim 10, wherein the subset of the plurality of attributes includes attributes that do not indicate the personally identifiable information of the speaker.
12. The computer-implemented method of any of claims 9, 10, or 11, further comprising providing a user interface for defining the subset of the plurality of attributes.
13. The computer-implemented method of any of claims 9, 10, 11, or 12, further comprising:
performing speech recognition on the second audio data to identify one or more second words;
comparing the one or more first words with the one or more second words; and
in response to determining that the one or more first words are the same as the one or more second words, discarding the first audio data and storing only the second audio data.
14. The computer-implemented method of claim 13, further comprising, in response to determining that the one or more first words are different from the one or more second words, storing the first audio data and discarding the second audio data.
15. The computer-implemented method of any of claims 9, 10, 11, 12, 13, or 14, wherein the first audio data comprises a first utterance of the one or more first words and the second audio data comprises a second utterance of the one or more first words.
Background
Voice-driven computing systems are widely used. These systems typically receive a recording of a user's voice (often referred to as an "utterance"). Voice recognition may be applied to an utterance to determine whether the user has requested that a command be performed, has requested information, or has requested another type of action. In many systems, the original recording of the utterance is saved for future use. Because such recordings may contain attributes from which personally identifiable information (e.g., the user's age or gender) can be derived, high security levels are typically used to store them. Storing recordings at a high security level, however, can consume significant computing resources, such as processor cycles, memory, and storage space.
It is with respect to these and other considerations that the disclosure set forth herein is presented.
Drawings
FIG. 1 is a system architecture diagram illustrating aspects of the configuration and operation of a network service for securely storing utterances, according to one embodiment;
FIG. 2 is a flow diagram showing a routine that illustrates further aspects of the network service of FIG. 1 for securely storing utterances, according to one embodiment;
FIG. 3 is a system architecture diagram illustrating another example embodiment of the network service of FIG. 1 for securely storing utterances;
FIG. 4 is a flow diagram showing a routine that illustrates further aspects of the network service of FIG. 3 for securely storing utterances, according to one embodiment;
FIG. 5 is a system architecture diagram illustrating aspects of the configuration and operation of a processor configured for securely storing utterances, according to one embodiment;
FIG. 6 is a computing system diagram illustrating a configuration of a distributed execution environment that may be used to implement aspects of the technology disclosed herein;
FIG. 7 is a computing system diagram illustrating aspects of a configuration of a data center that may be used to implement aspects of the technology disclosed herein; and
FIG. 8 is a computer architecture diagram showing an illustrative computer hardware architecture for implementing a computing device that may be used to implement aspects of the various techniques presented herein.
Detailed Description
The following detailed description is directed to techniques for securely storing utterances, such as utterances recorded by a voice-driven computing device. Using the disclosed techniques, utterances can be stored in a secure manner while using fewer storage resources than previously required to securely store such utterances. Thus, savings in the utilization of various types of computing resources (such as, but not limited to, processor cycles, memory usage, and mass storage usage) may be realized. In addition, power consumption savings may also be realized because computing resources can be utilized more efficiently using the techniques disclosed herein. The disclosed technology may also provide other technical benefits not specifically identified herein.
To provide the disclosed functionality, in one embodiment, the recorded utterance is provided to a speech-to-text ("STT") service that recognizes the words in the utterance in order to translate the utterance into text. The STT service may also recognize various speaker-specific attributes of the utterance, such as, but not limited to, attributes from which personally identifiable information ("PII") of the speaker may be derived (e.g., attributes indicating the age or gender of the speaker).
The text (i.e., the recognized words) and the utterance attributes are provided to a text-to-speech ("TTS") service, which creates speech from the text and a subset of the attributes recognized by the STT service, thereby removing at least some of the attributes that could be used to derive PII from the utterance. The speech created by the TTS service is then stored in a data store that is less secure, and therefore requires fewer computing resources, than the data store that would be required to store the original utterance containing the PII. The original utterance is then discarded, such as by deleting it. In this manner, the utterance can be stored in a secure manner (i.e., with no PII, or only limited PII, included) while using fewer computing resources than would be needed to securely store the original utterance including the PII.
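The PII-stripping step just described can be sketched as follows. This is an illustrative sketch only: the attribute names, the `strip_pii_attributes` helper, and the toy TTS stand-in are hypothetical, not the API of any real STT or TTS service.

```python
# Hypothetical sketch of the attribute-filtering step described above.
# The attribute keys and the toy TTS stub are illustrative assumptions.

PII_ATTRIBUTE_KEYS = {"age", "gender"}  # attributes that could reveal PII


def strip_pii_attributes(attributes):
    """Keep only attributes that do not indicate PII of the speaker."""
    return {k: v for k, v in attributes.items()
            if k not in PII_ATTRIBUTE_KEYS}


def resynthesize(text, attributes, tts):
    """Re-create speech from the recognized text and the safe subset of
    attributes, making it suitable for a less secure data store."""
    return tts(text, strip_pii_attributes(attributes))


# Toy TTS stand-in that just records what it was asked to speak.
def toy_tts(text, attrs):
    return {"text": text, "attrs": attrs}


speech = resynthesize("turn on the lights",
                      {"age": "adult", "gender": "f", "rate": "fast"},
                      toy_tts)
print(speech["attrs"])  # only the non-PII attribute remains
```

The filtering is a simple key-based allow/deny decision here; a real implementation would decide per attribute whether it can contribute to deriving PII.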
In one embodiment, the STT service may also translate the speech generated by the TTS service back into text. A comparison may then be made between the text the STT service generates from that speech and the text the STT service generated from the original utterance. If the texts match, the original utterance can be discarded and the TTS-generated speech can be stored. If the texts do not match, the original utterance may be retained and the speech generated by the TTS service may be discarded. The text recognized in the original utterance may also be stored. Additional details regarding the various components and processes briefly described above for securely storing utterances are presented below with respect to FIGS. 1-8.
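The verify-then-discard flow described above can be sketched as follows. The `stt`/`tts` callables, the PII filter, and the two store lists are hypothetical stand-ins for whatever services and data stores an implementation actually uses.

```python
# Sketch of the verification flow described above: re-recognize the
# synthesized speech and keep the original only if the texts disagree.
# All service callables and stores here are illustrative assumptions.

def store_utterance(original_audio, stt, tts, strip_pii,
                    secure_store, regular_store):
    """Synthesize a PII-reduced copy, verify it via STT, and store the
    copy in the less secure store when verification succeeds."""
    text, attrs = stt(original_audio)
    synthesized = tts(text, strip_pii(attrs))
    verify_text, _ = stt(synthesized)
    if verify_text == text:
        regular_store.append(synthesized)  # original may be discarded
        return "synthesized"
    secure_store.append(original_audio)    # texts differ: retain original
    return "original"


# Toy services: "audio" is modeled as a (text, attrs) pair so the STT
# round trip is exact and the synthesized copy is kept.
def toy_stt(audio):
    return audio[0], audio[1]


def toy_tts(text, attrs):
    return (text, attrs)


secure, regular = [], []
kept = store_utterance(
    ("play music", {"age": "adult", "rate": "fast"}),
    toy_stt, toy_tts,
    lambda a: {k: v for k, v in a.items() if k != "age"},
    secure, regular)
print(kept)
```

With a real STT service the round trip is lossy, so the equality test acts as a guard: any recognition drift causes the original, fully secured recording to be retained instead.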
It should be appreciated that the subject matter presented herein may be implemented as a computer process, a computer-controlled apparatus, a computing system, or as an article of manufacture such as a computer-readable storage medium. While the subject matter described herein is presented in the general context of program modules that execute on one or more computing devices, those skilled in the art will recognize that other implementations may be performed in combination with other types of program modules. Generally, program modules include routines, programs, components, data structures, and other types of structures that perform particular tasks or implement particular abstract data types.
Those skilled in the art will also appreciate that aspects of the subject matter described herein may be practiced with or in conjunction with other computer system configurations beyond those described herein, including the following: multiprocessor systems, microprocessor-based or programmable consumer electronics, minicomputers, mainframe computers, hand-held computers, personal digital assistants, electronic readers, mobile telephone devices, tablet computing devices, dedicated hardware devices, network appliances, and the like. As mentioned briefly above, the embodiments described herein may be practiced in distributed computing environments where tasks may be performed by remote computing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote memory storage devices.
In the following detailed description, references are made to the accompanying drawings that form a part hereof, and in which are shown, by way of illustration, specific embodiments or examples. The drawings herein are not drawn to scale. Like numbers refer to like elements throughout the several views (which may be referred to herein as "figure(s)").
FIG. 1 is a system architecture diagram illustrating aspects of the configuration and operation of a network service for securely storing utterances, according to one embodiment.
As will be described in detail below, the
In the embodiment shown in FIG. 1, the
The
To address this concern and possibly other considerations, the mechanisms described in detail below remove some or all of the attributes in
To provide the disclosed functionality, the
To securely store the
The
In some implementations, the
Because the
In some embodiments, the
As shown in FIG. 1, attributes 116 and
In some configurations, the
In some embodiments,
As another example, if
It is to be appreciated that, in various embodiments, the above-described process can be performed synchronously as the
FIG. 2 is a flow diagram illustrating a routine 200 that shows further aspects of the network service of FIG. 1 for securely storing utterances, according to one embodiment.
The implementation of the various components described herein is a matter of choice dependent on the performance and other requirements of the computing system. Accordingly, the logical operations described herein are referred to variously as operations, structural devices, acts, or modules. These operations, structural devices, acts and modules may be implemented in software, in firmware, in special purpose digital logic, and any combination thereof. It should also be appreciated that more or fewer operations may be performed than shown in the figures and described herein. These operations may also be performed in parallel, or in a different order than described herein. Some or all of these operations may also be performed by components other than the specifically identified components.
The routine 200 begins at
From
At
At
At
At
From
From
As also described above, in some embodiments,
FIG. 3 is a system architecture diagram illustrating another example embodiment of the network service of FIG. 1 for securely storing utterances.
If the
FIG. 4 is a flow diagram illustrating a routine 400 that shows further aspects of the network service of FIG. 3 for securely storing utterances, according to one embodiment.
From operation 404, the routine 400 proceeds to operation 406, where the
At operation 410, the
If, at operation 410, the
If
FIG. 5 is a system architecture diagram illustrating aspects of the configuration and operation of a processor configured to securely store utterances, according to one embodiment. As shown in FIG. 5,
The instructions decoded by the decoder circuit may be general-purpose instructions or function-specific instructions that receive speech attributes 116 and
FIG. 6 is a system and network diagram illustrating aspects of a distributed execution environment that may be used to implement aspects of the technology disclosed herein.
Each type of computing resource provided by the distributed
In one embodiment, the computing resources provided by the distributed
Users of the distributed
FIG. 7 is a diagram of a computing system illustrating one configuration of a
The server computer 702 may be a standard tower, rack, or blade server computer suitably configured to provide the computing resources 708. As described above, the computing resources 708 may be data processing resources, such as virtual machine instances or hardware computing systems, data storage resources, database resources, networking resources, and the like. Some servers 702 may also be configured to execute a
The
In the
Suitable load balancing devices or other types of network infrastructure components may also be used to balance the load between each data center 704A-704N, between each
FIG. 8 illustrates an example computer architecture for a computer 800 capable of executing program components for implementing various aspects of the functionality described herein. The computer architecture shown in FIG. 8 illustrates a conventional server computer, workstation, desktop computer, laptop, tablet, network appliance, e-reader, smartphone, or other computing device, and may be used to execute any of the software components presented herein. For example, the computer architecture shown in FIG. 8 may be used to execute software components for providing the
The computer 800 includes a substrate 802 or "motherboard" that is a printed circuit board to which various components or devices may be connected via a system bus or other electrical communication path. In an illustrative embodiment, one or more central processing units ("CPUs") 804 operate in conjunction with a chipset 806. CPU 804 may be a standard programmable processor that performs arithmetic and logical operations necessary for the operation of computer 800.
The CPU 804 performs operations by operating switching elements that differentiate and change discrete physical states to transition from one discrete physical state to the next. A switching element may generally include electronic circuitry, such as a flip-flop, that maintains one of two binary states, and electronic circuitry, such as a logic gate, that provides an output state based on a logical combination of the states of one or more other switching elements. These basic switching elements may be combined to create more complex logic circuits including registers, adder-subtractors, arithmetic logic units, floating point units, and the like.
The chipset 806 provides an interface between the CPU 804 and the remaining components and devices on the substrate 802. The chipset 806 may provide an interface to a RAM 808 used as main memory in the computer 800. The chipset 806 may also provide an interface to a computer-readable storage medium, such as read-only memory ("ROM") 810 or non-volatile RAM ("NVRAM"), for storing basic routines that help to start the computer 800 and transfer information between various components and devices. The ROM 810 or NVRAM may also store other software components necessary for the operation of the computer 800 according to embodiments described herein.
The computer 800 may operate in a networked environment using logical connections to remote computing devices and computer systems through a network. Chipset 806 may include functionality to provide network connectivity through NIC 812 (e.g., a gigabit ethernet adapter). The NIC 812 is capable of connecting the computer 800 to other computing devices via the
The computer 800 may be connected to a mass storage device 818 that provides non-volatile storage for the computer. The mass storage device 818 may store an operating system 820, programs 822, and data, which have been described in greater detail herein. The mass storage device 818 may be connected to the computer 800 through a storage controller 814 that is connected to the chipset 806. The mass storage device 818 may consist of one or more physical storage units. The storage controller 814 may interface with the physical storage units through a serial attached SCSI ("SAS") interface, a serial advanced technology attachment ("SATA") interface, a fibre channel ("FC") interface, or another type of interface for physically connecting and transferring data between the computer and the physical storage units.
The computer 800 may store data on the mass storage device 818 by transforming the physical state of the physical memory units to reflect the information being stored. The particular transformation of physical state may depend on various factors, in different implementations of the present description. Examples of such factors may include, but are not limited to, the technology used to implement the physical storage unit, whether mass storage 818 is characterized as primary or secondary storage, and so forth.
For example, the computer 800 may store information to the mass storage device 818 by issuing instructions through the storage controller 814 to change the magnetic properties of a particular location within the disk drive unit, the reflective or refractive properties of a particular location in the optical storage unit, or the electrical properties of a particular capacitor, transistor, or other discrete component in the solid-state storage unit. Other transformations of physical media are possible without departing from the scope and spirit of the present description, with the foregoing examples provided only to facilitate this description. The computer 800 may also read information from the mass storage device 818 by detecting the physical state or characteristics of one or more particular locations within the physical storage unit.
In addition to the mass storage device 818, described above, the computer 800 may also access other computer-readable storage media to store and retrieve information, such as program modules, data structures, or other data. Those skilled in the art will appreciate that computer-readable storage media can be any available media that provide non-transitory storage of data and that can be accessed by the computer 800.
By way of example, and not limitation, computer-readable storage media may include volatile and nonvolatile, removable and non-removable media implemented in any method or technology. Computer-readable storage media include, but are not limited to, RAM, ROM, erasable programmable ROM ("EPROM"), electrically erasable programmable ROM ("EEPROM"), flash memory or other solid-state memory technology, compact disc ROM ("CD-ROM"), digital versatile disks ("DVD"), high-definition DVD ("HD-DVD"), BLU-RAY, or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information in a non-transitory manner.
As mentioned briefly above, the mass storage device 818 may store an operating system 820 for controlling the operation of the computer 800. In one embodiment, operating system 820 is a LINUX operating system. In another embodiment, operating system 820 is the WINDOWS SERVER operating system from MICROSOFT CORPORATION. In other embodiments, a UNIX operating system or one of its variants may be used as operating system 820. It should be understood that other operating systems may be used. The mass storage device 818 may store other systems or applications and data that are utilized by the computer 800.
In one embodiment, the mass storage device 818 or other computer-readable storage medium is encoded with computer-executable instructions that, when loaded into the computer 800 and executed, transform the computer from a general-purpose computing system into a special-purpose computer capable of implementing the embodiments described herein. As described above, these computer-executable instructions transform the computer 800 by specifying how the CPU 804 transitions between states. According to one embodiment, the computer 800 may access a computer-readable storage medium that stores computer-executable instructions that, when executed by the computer 800, perform various processes described herein. Computer 800 may also include a computer-readable storage medium for performing any other computer-implemented operations described herein.
The computer 800 may also include one or more input/output controllers 816 for receiving and processing input from a number of input devices, such as a keyboard, mouse, touchpad, touchscreen, electronic stylus, or other type of input device. Similarly, an input/output controller 816 may provide output to a display, such as a computer monitor, a flat panel display, a digital projector, a printer, or other type of output device. It should be understood that the computer 800 may not include all of the components shown in FIG. 8, may include other components not explicitly shown in FIG. 8, or may utilize an architecture completely different than that shown in FIG. 8.
One or more embodiments disclosed herein may include a processor to cause a speech-to-text service to perform speech recognition on first audio data to identify one or more first words and a plurality of attributes in the first audio data, to cause a text-to-speech service to generate second audio data using at least the one or more first words and a subset of the plurality of attributes, and to cause only the second audio data to be stored.
Optionally, in one or more embodiments disclosed herein, the plurality of attributes may indicate personally identifiable information of the speaker. Optionally, in one or more embodiments disclosed herein, the subset of the plurality of attributes may include attributes that are not indicative of personally identifiable information of the speaker. Optionally, in one or more embodiments disclosed herein, the processor may also provide a user interface for defining the subset of the plurality of attributes. Optionally, in one or more embodiments disclosed herein, the processor may further cause a user interface to be provided for specifying whether the first audio data is to be stored or deleted. Optionally, in one or more embodiments disclosed herein, the processor may further cause speech recognition to be performed on the second audio data to identify one or more second words; cause the one or more first words to be compared to the one or more second words; and, in response to determining that the one or more first words are the same as the one or more second words, cause the first audio data to be discarded and the second audio data to be stored. Optionally, in one or more embodiments disclosed herein, in response to determining that the one or more first words are different from the one or more second words, the processor may further cause the first audio data to be stored and the second audio data to be discarded. Optionally, in one or more embodiments disclosed herein, the first audio data may include a first utterance of the one or more first words, and the second audio data may include a second utterance of the one or more first words.
One or more embodiments disclosed herein may cause speech recognition to be performed on first audio data to identify one or more first words and a plurality of attributes in the first audio data, cause second audio data to be generated using at least the one or more first words and a subset of the plurality of attributes, and cause only the second audio data to be stored.
Optionally, in one or more embodiments disclosed herein, the plurality of attributes may indicate personally identifiable information of the speaker. Optionally, in one or more embodiments disclosed herein, the subset of the plurality of attributes may include attributes that are not indicative of personally identifiable information of the speaker. Optionally, one or more embodiments disclosed herein may provide a user interface for defining the subset of the plurality of attributes. Optionally, one or more embodiments disclosed herein may perform speech recognition on the second audio data to identify one or more second words, compare the one or more first words to the one or more second words, and, in response to determining that the one or more first words are the same as the one or more second words, discard the first audio data and store only the second audio data. Optionally, in response to determining that the one or more first words are different from the one or more second words, one or more embodiments disclosed herein may store the first audio data and discard the second audio data. Optionally, in one or more embodiments disclosed herein, the first audio data may include a first utterance of the one or more first words, and the second audio data may include a second utterance of the one or more first words.
It should be appreciated that techniques for securely storing utterances have been disclosed herein. Although the subject matter presented herein has been described in language specific to computer structural features, methodological acts, and computer readable media, it is to be understood that the invention defined in the appended claims is not necessarily limited to the specific features, acts, or media described herein. Rather, the specific features, acts and mediums are disclosed as example forms of implementing the claims.
The above-described subject matter is provided by way of illustration only and should not be construed as limiting. Furthermore, the claimed subject matter is not limited to implementations that solve any or all disadvantages noted in any part of this disclosure. Various modifications and changes may be made to the subject matter described herein without following the example embodiments and applications illustrated and described, and without departing from the true spirit and scope of the present invention, which is set forth in the following claims.