Secure similarity search for sensitive data

Document No. 1937644, published 2021-12-07

Created by M. Wright and A. Akerib on 2021-05-18. Abstract: A system, comprising: a secure in-memory unit implemented on an Associated Processing Unit (APU) for creating an encrypted vector. The in-memory unit includes a data store and an encryptor. The data store is for storing data, and the encryptor is for encrypting the data into an encrypted vector. Optionally, the unit comprises a neural proxy hash encoder for encoding the data into an encoded vector, and in this embodiment the encryptor encrypts the encoded vector into an encrypted encoded vector. The neural proxy hash encoder includes a trained neural network that includes a plurality of layers for encoding data into a feature set. The trained neural network encodes an image file, an audio file, or a large data set. The APU is implemented on SRAM, non-volatile, or non-destructive memory.

1. A system, comprising:

a secure in-memory unit implemented on an Associated Processing Unit (APU) for creating an encrypted vector, the secure in-memory unit for implementing:

a data store for storing data; and

an encryptor for encrypting the data into an encrypted vector.

2. The system of claim 1, further comprising a neural proxy hash encoder for encoding the data into an encoded vector, the encryptor for encrypting the encoded vector into an encrypted encoded vector.

3. The system of claim 2, wherein the neural proxy hash encoder includes a trained neural network including a plurality of layers for encoding the data into a feature set.

4. The system of claim 3, the trained neural network to encode at least one of: image files, audio files, and large data sets.

5. The system of claim 1 wherein the APU is implemented on one of: SRAM, non-volatile memory, and non-destructive memory.

6. A system, comprising:

a secure in-memory unit implemented on an Associated Processing Unit (APU) for performing a secure similarity search, the secure in-memory unit for implementing:

a decryptor for decrypting the encrypted encoded vector into an encoded vector;

an encoded vector data store for storing a plurality of encoded search candidate vectors; and

a similarity searcher to perform a similarity search between the encoded search query vector and the plurality of encoded search candidate vectors.

7. The system of claim 6, wherein the encoded vector is one of: an encoded search query vector and an encoded search candidate vector.

8. The system of claim 6, wherein the vector data store is to store the encoded search candidate vectors in columns.

9. The system of claim 8, the similarity searcher to perform the similarity search on the plurality of encoded search candidate vectors in the columns in parallel.

10. The system of claim 6, wherein the similarity search is a nearest neighbor search.

11. The system of claim 6 wherein the APU is implemented on one of: SRAM, non-volatile memory, and non-destructive memory.

12. A system, comprising:

a secure in-memory unit implemented on an Associated Processing Unit (APU) for performing a secure similarity search, the secure in-memory unit for implementing:

a decryptor for decrypting the encrypted data vector into a data vector;

a neural proxy hash encoder for encoding the data vector into an encoded search data vector;

an encoded vector data store for storing a plurality of encoded search candidate vectors; and

a similarity searcher to perform a similarity search between the encoded search query vector and the plurality of encoded search candidate vectors.

13. The system of claim 12, wherein the encoded search data vector is one of: an encoded search query vector and an encoded search candidate vector.

14. The system of claim 12, the vector data store to store the encoded search candidate vectors in columns.

15. The system of claim 14, the similarity searcher to perform the similarity search on the plurality of encoded search candidate vectors in the columns in parallel.

16. The system of claim 12, wherein the similarity search is a nearest neighbor search.

17. The system of claim 12, wherein the neural proxy hash encoder includes a trained neural network including a plurality of layers for encoding input data into a feature set.

18. The system of claim 17, the trained neural network to encode at least one of: image files, audio files, and large data set files.

19. The system of claim 12 wherein the APU is implemented on one of: SRAM, non-volatile memory, and non-destructive memory.

20. A system, comprising:

a secure in-memory unit, implemented on an Associated Processing Unit (APU), for secure data transfer, the secure in-memory unit to implement:

a decryptor for decrypting the encrypted data vector into a data vector; and

an encoded vector data store for storing a plurality of data vectors.

21. The system of claim 20 wherein the APU is implemented on one of: SRAM, non-volatile memory, and non-destructive memory.

Technical Field

The present invention relates generally to similarity searching and, more particularly, to similarity searching of sensitive data.

Background

Users often need to transfer sensitive data between their computing devices and third-party systems for processing without compromising the security of the transferred data. Such sensitive data may be, for example, private, personal, system-critical, or business-confidential data. Examples of such sensitive data transmissions include a patient providing medical images or medical records to a doctor or hospital; an autonomous control system transferring files from its sensors to a remote processing system; and an investor transmitting proof of assets to a financial institution. It is critical that such data transmissions be kept secure and private.

Sometimes sensitive information is transmitted over the internet from a personal computing device (e.g., a computer or mobile phone) to a remote server where the information is stored. Data transfer may also occur over a private network or via a device such as a USB thumb drive. Once the data is stored on the server, the system processor may access and retrieve the data for processing.

Referring now to FIG. 1, which illustrates how sensitive data is transferred between a user device and a processing system, the figure shows a user computing device 10 having a CPU 12 connected to a data storage device 14 via a data bus 15. The computing device may transfer sensitive data from the data storage 14 across the data bus 15 and encrypt it using software on the CPU 12. Sensitive data is secured using known cryptographic methods, such as the Secure Hash Algorithm (SHA), MD5, or shared-key encryption algorithms.

The encrypted data packet is then transmitted across the network 16. Network 16 may be implemented in a variety of ways, such as: "sneaker network" 17, where data is placed on a physical device such as a USB thumb drive and taken by a person to a receiving server; a private or public wired network 19; a private or public wireless network 20; or a cloud network 21 that may include a cloud-based server 22.

The processing system 25 has a CPU 27, a memory 26 and a data bus 32. A local server 33 is connected to the processing system 25 by a data bus 24, and/or a cloud server 22 is connected to the CPU 27 via a network connection 29. A data bus may thus be internal to the processor or may take the form of a local connection or a network connection.

The encrypted data packets traverse the network 16 and are stored on the cloud server 22 or on a local server 33 attached to the processing system 25. The processing system 25 has a CPU 27 that performs processing, a local memory 26 for storing a local copy of the data for processing, and an attached server as described above. The CPU 27 retrieves and decrypts the encrypted data from the local server 33 or the cloud server 22, and then performs any required operation, such as a search. Any output is encrypted before being written to the server.

Disclosure of Invention

According to a preferred embodiment of the present invention, there is provided a system comprising: a secure in-memory unit implemented on an Associated Processing Unit (APU) for creating an encrypted vector. The in-memory unit includes a data store and an encryptor. The data store is for storing data; and the encryptor is for encrypting the data into an encrypted vector.

According to a preferred embodiment of the present invention, there is provided a system comprising: a secure in-memory unit implemented on an Associated Processing Unit (APU) for performing a secure similarity search. The in-memory unit includes: a decryptor, an encoded vector data store, and a similarity searcher. The decryptor decrypts the encrypted encoded vector into an encoded vector. The encoded vector data store is for storing a plurality of encoded search candidate vectors; and the similarity searcher performs a similarity search between the encoded search query vector and the plurality of encoded search candidate vectors.

According to a preferred embodiment of the present invention, there is provided a system comprising: a secure in-memory unit implemented on an Associated Processing Unit (APU) for performing a secure similarity search. The in-memory unit includes: a decryptor, a neural proxy hash encoder, an encoded vector data store, and a similarity searcher. The decryptor decrypts the encrypted data vector into a data vector; and the neural proxy hash encoder encodes the data vector into an encoded search data vector. The encoded vector data store is for storing a plurality of encoded search candidate vectors; and the similarity searcher performs a similarity search between the encoded search query vector and the plurality of encoded search candidate vectors.

According to a preferred embodiment of the present invention, there is provided a system comprising: a secure in-memory unit implemented on an Associated Processing Unit (APU) for secure data transfer. The in-memory unit includes: a decryptor and an encoded vector data store. The decryptor decrypts the encrypted data vector into a data vector; and the encoded vector data store is for storing a plurality of data vectors.

Furthermore, in accordance with a preferred embodiment of the present invention, the system further includes a neural proxy hash encoder and an encryptor. The neural proxy hash encoder encodes the data into an encoded vector, and the encryptor encrypts the encoded vector into an encrypted encoded vector.

Additionally in accordance with a preferred embodiment of the present invention, the neural proxy hash encoder includes a trained neural network including a plurality of layers for encoding data into a feature set.

Furthermore, in accordance with a preferred embodiment of the present invention, the trained neural network encodes at least one of: image files, audio files, and large data sets.

Additionally in accordance with a preferred embodiment of the present invention, the APU is implemented on one of: SRAM, nonvolatile, and non-destructive memory.

Furthermore, in accordance with a preferred embodiment of the present invention, the encoded vector is an encoded search query vector or an encoded search candidate vector.

Further in accordance with a preferred embodiment of the present invention, the vector data store stores the encoded search candidate vectors in columns.

Further in accordance with a preferred embodiment of the present invention, the similarity searcher performs a similarity search, in parallel, on a plurality of encoded search candidate vectors stored in columns.

Additionally in accordance with a preferred embodiment of the present invention the similarity search is a nearest neighbor search.

Drawings

The subject matter which is regarded as the invention is particularly pointed out and distinctly claimed in the concluding portion of the specification. The invention, however, both as to organization and method of operation, together with objects, features, and advantages thereof, may best be understood by reference to the following detailed description when read with the accompanying drawings in which:

FIG. 1 is a diagram of a prior art system for transferring sensitive data between a user device and a processing system;

FIG. 2A is a diagram of an encoded, encrypted search vector system;

FIG. 2B is a diagrammatic illustration of data flow in the encoded, encrypted search vector system of FIG. 2A;

FIG. 3A is a diagram of an encoded, encrypted candidate vector system;

FIG. 3B is a diagrammatic representation of the data flow in the encoded, encrypted candidate vector system of FIG. 3A;

FIG. 4A is a diagram of an encrypted candidate vector system;

FIG. 4B is a diagram of data flow in the encrypted candidate vector system of FIG. 4A;

FIG. 5A is a diagram of a vector transmission system; and

FIG. 5B is a diagram of data flow in the vector transmission system of FIG. 5A.

It will be appreciated that for simplicity and clarity of illustration, elements shown in the figures have not necessarily been drawn to scale. For example, the dimensions of some of the elements may be exaggerated relative to other elements for clarity. Further, where considered appropriate, reference numerals may be repeated among the figures to indicate corresponding or analogous elements.

Detailed Description

In the following detailed description, numerous specific details are set forth in order to provide a thorough understanding of the present invention. However, it will be understood by those skilled in the art that the present invention may be practiced without these specific details. In other instances, well-known methods, procedures, and components have not been described in detail so as not to obscure the present invention.

Applicants have appreciated that, as data moves across data buses within a system and data packets move across a network, an intercepting device known as a "sniffer" may be used to intercept such sensitive data packets or steal encryption keys. Such sniffers may be hardware or software devices placed by bad actors. Once the data is intercepted, the data payload may be attacked and, if decrypted, its security will be compromised.

Applicants have recognized that in-memory neural network encoding, in-memory encryption and decryption, and in-memory storage of encoded data may be performed on an Associated Processing Unit (APU), which may be implemented on any suitable type of memory array, such as an SRAM, non-volatile, or non-destructive memory array. An example of such an APU is the Gemini APU, commercially available from GSI Technology, Inc. Such an APU may deny sniffers access to data inside user devices and processing systems and may improve the security of data packets transmitted across a network. Applicants have also appreciated that such APU devices can be easily embedded in user devices and processing systems.

Referring to FIG. 2A, there is shown an encoded, encrypted search vector system 30 of a preferred embodiment of the present invention; FIG. 2B illustrates the flow of data in the system 30. The encoded, encrypted search vector system 30 includes a secure user computing device 31 and a secure processing system 37 connected together across a network 46. The secure user computing device 31 and the secure processing system 37 are each implemented on an APU, such as the APUs described above.

The user computing device 31 includes a data store 32, a neural proxy hash encoder 34, and a vector encryptor 35. The secure data vector data_i is unencoded, unencrypted raw data stored in the data store 32, which may be encoded by the neural proxy hash encoder 34 into a feature set fs_i.

An example of such a neural proxy hash encoder 34, which is based on binary hashing and maps data points in the original representation space to binary codes in Hamming space, is described in detail in U.S. provisional patent application 63/043,215, entitled "Hamming Space localization For Similarity Search," filed June 24, 2020, which is commonly owned by the applicant of the present invention and is incorporated herein by reference.

The neural proxy hash encoder is a neural network (NN) trained to encode data files as binary-coded feature sets. A feature set is a data representation of particular characteristics of the data being encoded. For example, if the features of interest in a dataset of human characteristics are a person's height and weight, the NN is trained to extract and encode the height and weight from the data input to it. The NN is trained by calibrating its multiple "layers" using a training dataset with known content and feature labels, and it is considered trained when it reliably extracts the features from a known dataset. The NN may also be trained to identify features in datasets, images, and sound files, so that such large and highly complex data can be reduced to a set of binary data, referred to as a feature set. Applicants have recognized that the feature set is effectively an encoding of the complex data and, thus, the trained NN can be used as an encoder.
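The trained network's architecture is not given here. As a rough, hypothetical illustration of the output stage only, the sketch below replaces the trained layers with random Gaussian hyperplanes (random-hyperplane hashing, also known as SimHash): each output bit records which side of one hyperplane the input falls on, giving a packed binary code of the kind later compared in Hamming space. The names `make_projection` and `encode`, the seed, and the code length are assumptions for illustration, not the patent's encoder.

```python
import random


def make_projection(in_dim: int, n_bits: int, seed: int = 0) -> list[list[float]]:
    """Random Gaussian hyperplanes; a hypothetical stand-in for trained NN layers."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(in_dim)] for _ in range(n_bits)]


def encode(features: list[float], planes: list[list[float]]) -> int:
    """Map a real-valued feature vector to an n-bit binary code packed into an int.

    Bit b is 1 when the vector lies on the positive side of hyperplane b.
    """
    code = 0
    for b, plane in enumerate(planes):
        dot = sum(w * x for w, x in zip(plane, features))
        if dot > 0:
            code |= 1 << b
    return code
```

Because only the sign of each projection is kept, scaling an input by a positive constant leaves its code unchanged, while negating the input flips every bit.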

The encryptor 35 may then encrypt the encoded data vector fs_i, together with any additional personal data (e.g., name and age), into an encoded and encrypted vector fse_i, using the sender's public and private keys and the recipient's public key. The vector fse_i may then be transmitted to the processing system 37 via the network 46.
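The text does not specify the cipher. The following is a toy, standard-library-only sketch of the packaging step, assuming a shared pair of secret keys in place of the public/private key scheme described above: it serializes fs_i with optional personal data, XORs it with an HMAC-SHA256 counter-mode keystream, and appends an HMAC tag (encrypt-then-MAC). The helper names `seal` and `open_sealed` are hypothetical; a real system would use a vetted public-key construction rather than this illustration.

```python
import hashlib
import hmac
import json
import secrets


def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    # HMAC-SHA256 in counter mode used as a pseudorandom keystream (toy construction).
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hmac.new(key, nonce + counter.to_bytes(8, "big"), hashlib.sha256).digest()
        counter += 1
    return bytes(out[:length])


def seal(enc_key: bytes, mac_key: bytes, code: int, n_bits: int, personal: dict) -> bytes:
    """Pack an encoded vector fs_i plus personal data, then encrypt-then-MAC it."""
    payload = json.dumps({"fs": code.to_bytes(n_bits // 8, "big").hex(), "meta": personal}).encode()
    nonce = secrets.token_bytes(16)
    ct = bytes(p ^ k for p, k in zip(payload, _keystream(enc_key, nonce, len(payload))))
    tag = hmac.new(mac_key, nonce + ct, hashlib.sha256).digest()
    return nonce + ct + tag  # the transmitted blob, standing in for fse_i


def open_sealed(enc_key: bytes, mac_key: bytes, blob: bytes) -> tuple[int, dict]:
    """Verify the tag, then decrypt and unpack (code, personal data)."""
    nonce, ct, tag = blob[:16], blob[16:-32], blob[-32:]
    expected = hmac.new(mac_key, nonce + ct, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("authentication failed")
    payload = bytes(c ^ k for c, k in zip(ct, _keystream(enc_key, nonce, len(ct))))
    obj = json.loads(payload)
    return int(obj["fs"], 16), obj["meta"]
```

The MAC check runs before decryption, so a tampered blob is rejected without its contents ever being unpacked.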

Network 46 is similar to network 16 in FIG. 1, and may be implemented in a variety of ways, such as: a "sneaker network" 47, wherein the secure user computing device 31 places the encoded and encrypted data on a physical device, such as a USB thumb drive, and the user can take the drive to a computer or "kiosk" containing a secure processing system, such as at a hospital or doctor's office; a private or public wired network 48; a private or public wireless network 49; or a cloud network 50.

The secure processing system 37 includes a data manager 38, a vector decryptor 39, an encoded vector data store 40, a secure similarity searcher 42 and a vector encryptor 44.

The data store 40 may store the encoded search candidate vectors cfs_i in its columns, where the candidate vectors cfs_i may have been previously encoded by another version of the neural proxy hash encoder 34.

The encrypted NN-encoded vector fse_i, such as one generated by the secure user computing device 31, may be decrypted by the vector decryptor 39. The decryptor 39 may then provide the resulting NN-encoded vector fs_i, as an encoded search query vector qfs_i, to the secure similarity searcher 42, which may then search for similar vectors among the NN-encoded search candidate vectors cfs_i in the columns of the data store 40.
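Once fs_i has been decrypted and used as qfs_i, the search reduces to comparing binary codes. A minimal sketch of a k-nearest-neighbor search by Hamming distance, assuming codes are packed into Python integers; the function names are illustrative, and this sequential scan is not the APU's parallel implementation.

```python
import heapq


def hamming(a: int, b: int) -> int:
    # Number of differing bits between two packed binary codes.
    return bin(a ^ b).count("1")


def knn_search(query: int, candidates: list[int], k: int = 1) -> list[tuple[int, int]]:
    """Return (distance, index) pairs for the k candidates closest to the query."""
    return heapq.nsmallest(k, ((hamming(query, c), i) for i, c in enumerate(candidates)))
```

For example, `knn_search(0b1111, [0b1111, 0b0000, 0b1010, 0b1110], k=2)` returns `[(0, 0), (1, 3)]`.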

The result of the similarity search, vector result_i, may then be encrypted by the encryptor 44 into an encrypted vector resulte_i before being stored or transmitted from the APU. The data manager 38 may then delete the encoded query vector qfs_i, or may add it to the data store 40 as a candidate vector cfs_i for future searches.
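The data manager's retain-or-delete choice after a search can be sketched as follows. `CandidateStore` and `handle_query` are hypothetical names, the nearest-neighbor lookup is a plain sequential scan rather than the APU's parallel search, and encryption of result_i is only indicated by a comment.

```python
from dataclasses import dataclass, field


def hamming(a: int, b: int) -> int:
    # Number of differing bits between two packed binary codes.
    return bin(a ^ b).count("1")


@dataclass
class CandidateStore:
    """Hypothetical stand-in for the encoded vector data store 40."""
    vectors: list[int] = field(default_factory=list)

    def nearest(self, query: int) -> int:
        # Index of the closest candidate by Hamming distance (ties go to the lowest index).
        return min(range(len(self.vectors)), key=lambda i: hamming(query, self.vectors[i]))


def handle_query(store: CandidateStore, qfs: int, retain: bool) -> int:
    """Run the search, then either drop qfs or keep it as a new candidate cfs."""
    result = store.nearest(qfs)
    if retain:
        store.vectors.append(qfs)  # kept for future searches
    return result  # would be encrypted (result_i -> resulte_i) before leaving the APU
```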

It should be noted that binary-coded vectors may be used as query vectors in a similarity search against data stores of candidate vectors that have previously been encoded in the same way, as described in U.S. patent 10,929,751, entitled "Finding K Extreme Values In Processing Time," filed February 23, 2021, and U.S. patent application 16/033,259, entitled "Natural Language Processing With KNN," filed July 12, 2018, both of which are commonly owned by the applicant of the present invention and are incorporated herein by reference.

It should be appreciated that the similarity search between the encoded binary query vector and the plurality of encoded binary candidate vectors is suited to massively parallel, in-memory processing with O(1) complexity on the APU. Such similarity searches require only the encoded feature sets. It should also be appreciated that similarity searches using sets of encoded features are less complex than similarity searches performed on the complex data itself (e.g., large data sets, image files, and sound files).
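One way to see why the search cost need not grow with the number of candidates is a bit-sliced (transposed) layout: each candidate occupies one column, and each row, or bit plane, holds the same bit position of every candidate. The sketch below is an assumption about layout, not a description of the Gemini APU's internals. It uses one Python big integer per plane so that a single XOR touches every candidate, and accumulates per-candidate mismatch counts with a bit-sliced ripple-carry increment. The main loop runs once per code bit regardless of how many candidates there are, which is the sense in which the search is O(1) in the number of candidates (assuming they all fit in the available columns); the final per-candidate readout exists only to check the result.

```python
def to_bit_planes(candidates: list[int], n_bits: int) -> list[int]:
    """Transpose codes so plane b holds bit b of every candidate (candidate j -> bit j)."""
    planes = [0] * n_bits
    for j, code in enumerate(candidates):
        for b in range(n_bits):
            if (code >> b) & 1:
                planes[b] |= 1 << j
    return planes


def parallel_hamming(planes: list[int], query: int, n_cands: int) -> list[int]:
    """Hamming distance from the query to all candidates, one bit plane at a time."""
    all_ones = (1 << n_cands) - 1
    counters: list[int] = []  # bit-sliced per-candidate counters, least significant plane first
    for b, plane in enumerate(planes):
        # Mismatch mask for bit b across all candidates at once.
        mismatch = plane ^ (all_ones if (query >> b) & 1 else 0)
        carry = mismatch
        for k in range(len(counters)):  # ripple-carry add of a 1-bit mask
            counters[k], carry = counters[k] ^ carry, counters[k] & carry
        if carry:
            counters.append(carry)
    # Readout: reassemble each candidate's count from the counter planes.
    return [sum(((c >> j) & 1) << k for k, c in enumerate(counters)) for j in range(n_cands)]
```

For example, with candidates `[0b1111, 0b0000, 0b1010, 0b1110]` and query `0b1111`, the distances come out as `[0, 4, 2, 1]`.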

It should be noted that all processing in the secure similarity search is performed using only encoded vectors and, as applicants have appreciated, encoded vectors contain only data that is convolved into an unrecoverable representation of the original data. It will be appreciated that the encoded data itself remains secure even if the security of the secure processing system 37 is compromised. Thus, a malicious actor gaining access to such a system would gain access only to the encoded feature sets, not to the original data sets, images, and sound files.

It should be noted that an encoded similarity search requires only the set of encoded features to be transmitted and used during the search. It will be appreciated that transmitting only the encoded vectors reduces the size of the transmitted file. Functions such as image searching otherwise require increased fixed and mobile bandwidth. Compared to the original image data, NN-encoded vectors may achieve a compression ratio of more than 50,000:1. For example, a 1-megapixel image may be represented by 16 million bits, while the NN-encoded vector of such an image may be represented by only 256 bits. Such a level of compression may reduce the bandwidth requirements of an image-based search by the same factor. It will be appreciated that the bandwidth reduction also translates into reduced physical memory requirements: users of thumb drives or similar portable memory devices need much less storage on such devices when using NN-encoded vectors. As the original file size increases (e.g., for higher-fidelity sound or higher-resolution images), feature set encoding yields an even greater reduction in transmission bandwidth requirements and transmission duration.
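The compression figure follows from simple arithmetic on the numbers quoted above: 16 million bits for a 1-megapixel image implies 16 bits per pixel, and dividing by a 256-bit code gives the ratio.

```python
PIXELS = 1_000_000      # 1-megapixel image
RAW_BITS = 16_000_000   # raw size quoted in the text
CODE_BITS = 256         # length of the NN-encoded vector

bits_per_pixel = RAW_BITS / PIXELS  # 16.0 bits per pixel implied
ratio = RAW_BITS / CODE_BITS        # 62500.0, i.e. 62,500:1, above 50,000:1
```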

It should be noted that sniffers may be present in user devices and processing systems and are capable of intercepting data packets on the data bus. Since hardware and software sniffers may be attached anywhere on wireless or wired networks, they are able to intercept data packets anywhere along the data transmission path.

It should be noted that each read/write operation between a processor and a server may need to be encrypted/decrypted. This requires encryption and decryption of each block of data that is retrieved from or written to the server. It should be appreciated that by storing and processing data on the APUs, the need for encryption/decryption for each memory retrieval/writing operation is reduced to a single instance of writing data from the server to the APU memory or transferring data from the APU to the server. This may reduce system complexity and data processing duration.

Applicants have appreciated that, just as encrypted, encoded search vectors may be securely transmitted between a user and a processing system, the candidate vectors on which a search is performed may also be securely transmitted.

Referring now to FIG. 3A, an encoded, encrypted candidate vector system 52 is shown, and FIG. 3B shows the data flow in system 52. Similar to the encoded, encrypted search vector system 30, the encoded, encrypted candidate vector system 52 includes a secure user computing device 31 and a secure processing system 37' that are connected together by a network 46.

Similarly, the secure data vector data_i is unencoded, unencrypted raw data stored in the data store 32 that can be encoded by the neural proxy hash encoder 34 into a feature set fs_i. The encryptor 35 may then encrypt the encoded data vector fs_i, together with any additional personal data (e.g., name and age), into an encoded and encrypted vector fse_i, using the sender's public and private keys and the recipient's public key. The vector fse_i may then be transmitted to the processing system 37' over the network 46.

The secure processing system 37' includes a data manager 38', a vector decryptor 39', an encoded vector data store 40', a secure similarity searcher 42', and a vector encryptor 44.

The encrypted NN-encoded vector fse_i (e.g., generated by the secure user computing device 31) may be decrypted by the vector decryptor 39'. In this embodiment, the decryptor 39' may store the resulting NN-encoded vector fs_i as a candidate vector cfs_i in the encoded vector data store 40'. An encoded query vector qfs_i may be input to the secure similarity searcher 42', either from the encoded vector data store 40' or as external data input from a user. The secure similarity searcher 42' may then search for similar vectors among the candidate NN-encoded vectors cfs_i stored in the columns of the data store 40', including the newly added candidate vector cfs_i.
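In the system 52 the decrypted vector is written into the store as a new candidate. Assuming, hypothetically, a column layout in which each bit plane holds one bit position of every candidate and candidate j occupies bit j of every plane, writing cfs_i into the next free column and reading it back might look like this. `add_candidate` and `read_column` are illustrative helpers, not part of the described system.

```python
def add_candidate(planes: list[int], n_cands: int, cfs: int) -> int:
    """Write cfs into the next free column (index n_cands); return the new candidate count.

    planes[b] holds bit b of every stored candidate; unused columns are assumed zero.
    """
    for b in range(len(planes)):
        if (cfs >> b) & 1:
            planes[b] |= 1 << n_cands
    return n_cands + 1


def read_column(planes: list[int], j: int) -> int:
    # Reassemble candidate j from its column, one bit per plane.
    return sum(((p >> j) & 1) << b for b, p in enumerate(planes))
```

Since a new candidate only sets bits in its own column, adding cfs_i leaves every previously stored candidate untouched and immediately available to the next search.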

The result of the similarity search, result_i, may then be encrypted by the encryptor 44 into an encrypted vector resulte_i before being stored or transmitted out of the APU. The data manager 38' may then delete the newly added encoded candidate vector cfs_i, or may retain it as a candidate vector cfs_i in the data store 40' for future searches.

Applicants have recognized that, just as encrypted encoded vectors can be securely transmitted between a user and a processing system, similarly, unencoded vectors can also be securely transmitted and then encoded in the processing system.

Referring now to FIG. 4A, an encrypted candidate vector system 54 is shown, and FIG. 4B shows the data flow in system 54. Similar to the encoded, encrypted candidate vector system 52, the encrypted candidate vector system 54 includes a secure user computing device 31' and a secure processing system 37" that are connected together by a network 46.

Similarly, the secure data vector data_i is unencoded, unencrypted raw data stored in the data store 32, which may be encrypted by the encryptor 35, together with any additional personal data (e.g., name and age), into an encrypted vector datae_i, using the sender's public and private keys and the recipient's public key. The encrypted vector datae_i may then be transmitted to the processing system 37" via the network 46.

The secure processing system 37" includes a data manager 38', a vector decryptor 39", a neural proxy hash encoder 56, an encoded vector data store 40', a secure similarity searcher 42', and a vector encryptor 44.

The encrypted data vector datae_i (e.g., generated by the secure user computing device 31') may be decrypted by the vector decryptor 39". The decryptor 39" may then provide the resulting data vector data_i to the neural proxy hash encoder 56, which encodes data_i as a binary-coded candidate vector cfs_i that may be stored in the encoded vector data store 40'. Similar to the system 52 of FIGS. 3A and 3B, an encoded query vector qfs_i may be input to the secure similarity searcher 42', either from the encoded vector data store 40' or as external data input, and the searcher may search for similar vectors among the candidate NN-encoded vectors cfs_i stored in the columns of the data store 40', including the newly added candidate vector cfs_i.
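In the system 54 encoding moves to the processing side, so codes produced by the neural proxy hash encoder 56 presumably need to be comparable with the codes already stored. The sketch below assumes, hypothetically, that the encoder is a fixed set of random hyperplanes determined by a seed, standing in for the trained network's weights. Decryption is omitted, and the illustrative `as_query` flag switches between storing the code as a candidate cfs_i and returning it for use as a query qfs_i.

```python
import random

N_BITS = 32  # hypothetical code length


def make_encoder(in_dim: int, seed: int):
    """Return a hypothetical hash encoder: random hyperplanes in place of a trained NN."""
    rng = random.Random(seed)
    hyperplanes = [[rng.gauss(0.0, 1.0) for _ in range(in_dim)] for _ in range(N_BITS)]

    def encode(x: list[float]) -> int:
        # Bit b is set when x lies on the positive side of hyperplane b.
        return sum(1 << b for b, w in enumerate(hyperplanes)
                   if sum(wi * xi for wi, xi in zip(w, x)) > 0)

    return encode


def ingest(encode, data: list[float], candidates: list[int], as_query: bool) -> int:
    """Encode decrypted raw data on the processing side, as a candidate or as a query."""
    code = encode(data)
    if not as_query:
        candidates.append(code)  # stored as a new candidate cfs_i
    return code
```

Two encoder instances built from the same seed produce identical codes, mirroring the requirement that query and candidate vectors be encoded the same way.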

The result of the similarity search, result_i, may then be encrypted by the encryptor 44 into an encrypted vector resulte_i before being stored or transmitted from the APU. The data manager 38' may then delete the newly added encoded candidate vector cfs_i, or may retain it as a candidate vector cfs_i in the data store 40' for future searches.

It should be noted that, in another preferred embodiment of the present invention (not shown), the neural proxy hash encoder 56 may encode the data vector data_i as a binary-coded search query vector qfs_i, which is then used as a query vector in the same way as the search vector qfs_i in the system 30 of FIGS. 2A and 2B.

Applicants have recognized that, just as an unencoded vector can be securely transmitted and encoded in a processing system, an unencoded vector can similarly be securely transmitted and stored in the processing system without NN encoding.

Referring now to FIG. 5A, a vector transmission system 58 is shown, and FIG. 5B illustrates the data flow in the system 58. Similar to the encrypted candidate vector system 54, the vector transmission system 58 includes a secure user computing device 31' and a secure processing system 37"' that are connected together by the network 46.

The secure data vector data_i is unencoded, unencrypted raw data stored in the data store 32, which may be encrypted by the encryptor 35, together with any additional personal data (e.g., name and age), into an encrypted vector datae_i, using the sender's public and private keys and the recipient's public key. The encrypted vector datae_i may then be transmitted to the processing system 37"' over the network 46.

The secure processing system 37 "' includes a vector decryptor 39", a data vector store 60 and a vector encryptor 44.

The encrypted data vector datae_i (e.g., generated by the secure user computing device 31') may be decrypted by the vector decryptor 39" into a data vector data_i. The decryptor 39" may then write data_i to the data vector store 60.

The data vector data_i from the data vector store 60 may then be encrypted by the encryptor 44 into an encrypted data vector datae_i before being stored or transmitted from the APU.

It should be noted that, because encryption and decryption are handled on-chip by the APU, the private and public keys are protected from sniffer attacks. It will be appreciated that even raw data encrypted using the on-chip APU encryption method is more secure than raw data encrypted using current system-based encryption processes.

While certain features of the invention have been illustrated and described herein, many modifications, substitutions, changes, and equivalents will now occur to those of ordinary skill in the art. It is, therefore, to be understood that the appended claims are intended to cover all such modifications and changes as fall within the true spirit of the invention.
