Detecting duplication using exact and fuzzy matching of cryptographic matching indices

Document No.: 864128. Publication date: 2021-03-16.

Reading note: This technology, "Detecting duplication using exact and fuzzy matching of cryptographic matching indices", was designed by A·赫尚斯, S·谢尔, C·克尔, P·V·瓦伊什纳芙, A·本-古尔, V·W·刘, and D·麦加里 on 2019-05-30. Disclosed herein are system, method, and computer program product embodiments for detecting duplicates in a cloud computing platform through exact and fuzzy matching of encrypted matching indices using encryption keys. Embodiments operate by determining a matching rule index value when a new record is received. Embodiments encrypt the matching rule index value using the customer's encryption key and a deterministic encryption method, and store the encrypted matching rule index value. Duplicate detection may be performed later by determining the ciphertext of a candidate entry using the same deterministic encryption method and comparing that ciphertext to the stored encrypted matching index.

1. A method, comprising:

receiving, by the cloud computing platform, a new record comprising one or more fields;

selecting, by the cloud computing platform, a matching rule applicable to the new record, the matching rule including a unique identifier, one or more match types, and one or more applicable fields;

computing, by the cloud computing platform, a matching index value, wherein the matching index value is a combination of one or more applicable fields received in one or more fields in the new record;

deriving, by the cloud computing platform, an encrypted matching index value, wherein the unique identifier of the matching rule is used as an initialization vector in an encryption scheme and the matching index value is used as plaintext in the encryption scheme; and

storing, by the cloud computing platform, the encrypted matching index value in an encrypted matching index column.

2. The method of claim 1, further comprising:

receiving, by the cloud computing platform, matching rule parameters comprising the one or more match types and the one or more applicable fields;

creating, by the cloud computing platform, the unique identifier based on the one or more applicable fields; and

storing, by the cloud computing platform, a customized matching rule that includes the unique identifier, the one or more match types, and the one or more applicable fields.

3. The method of claim 1, further comprising:

comparing, by the cloud computing platform, the encrypted matching index value to existing values in the encrypted matching index column to determine whether the encrypted matching index value duplicates one or more existing values; and

displaying, by the cloud computing platform, an error message if the encrypted matching index value duplicates an existing value of the one or more existing values.

4. The method of claim 1, further comprising:

storing, by the cloud computing platform, the matching index value in unencrypted form if encryption is not enabled for any of the one or more fields.

5. The method of claim 1, further comprising:

scanning, by the cloud computing platform, the encrypted matching index column to determine one or more duplicates; and

displaying, by the cloud computing platform, the one or more duplicates to a user in a network interface.

6. The method of claim 1, wherein the one or more match types may be exact or fuzzy.

7. The method of claim 1, wherein the encryption scheme is a deterministic scheme.

8. The method of claim 1, wherein the cloud computing platform is a customer relationship management platform.

9. A system, comprising:

a memory; and

at least one processor coupled to the memory and configured to:

receive, in the cloud computing platform, a new record comprising one or more fields;

select a matching rule applicable to the new record, the matching rule comprising a unique identifier, one or more match types, and one or more applicable fields;

calculate a matching index value, wherein the matching index value is a combination of the one or more applicable fields received in the one or more fields in the new record;

derive an encrypted matching index value,

wherein the unique identifier of the matching rule is used as an initialization vector in an encryption scheme, and

wherein the matching index value is used as plaintext in the encryption scheme; and

store the encrypted matching index value in an encrypted matching index column.

10. The system of claim 9, the at least one processor further configured to:

receive matching rule parameters, the matching rule parameters including the one or more match types and the one or more applicable fields;

create the unique identifier based on the one or more applicable fields; and

store a customized matching rule comprising the unique identifier, the one or more match types, and the one or more applicable fields.

11. The system of claim 9, the at least one processor further configured to:

compare the encrypted matching index value to existing values in the encrypted matching index column to determine whether the encrypted matching index value duplicates one or more existing values; and

display an error message if the encrypted matching index value duplicates an existing value of the one or more existing values.

12. The system of claim 9, the at least one processor further configured to:

scan the encrypted matching index column to determine one or more duplicates; and

display the one or more duplicates to a user in a network interface.

13. The system of claim 9, the at least one processor further configured to:

store the matching index value in unencrypted form if encryption is not enabled for any of the one or more fields.

14. The system of claim 9, wherein the match type is exact or fuzzy.

15. The system of claim 9, wherein the matching rule further comprises an indication of whether a blank field should be considered a match.

16. The system of claim 9, wherein if the match type is fuzzy, the at least one processor may determine a match using one of Jaro-Winkler, Kullback-Leibler distance, name variation, keyboard distance, Metaphone 3, or syllable alignment.

17. The system of claim 9, wherein the encryption scheme is a deterministic scheme.

18. The system of claim 9, wherein the one or more fields may be standard fields or custom fields in the cloud computing platform.

19. The system of claim 9, wherein the cloud computing platform is a customer relationship management platform.

20. A non-transitory computer-readable device having instructions stored thereon, which, when executed by at least one computing device, cause the at least one computing device to perform operations comprising:

receiving, in the cloud computing platform, a new record comprising one or more fields;

selecting a matching rule applicable to the new record, the matching rule comprising a unique identifier, one or more match types, and one or more applicable fields;

calculating a matching index value, wherein the matching index value is a combination of the one or more applicable fields received in the one or more fields in the new record;

deriving an encrypted matching index value,

wherein the unique identifier of the matching rule is used as an initialization vector in an encryption scheme, and

wherein the matching index value is used as plaintext in the encryption scheme; and

storing the encrypted matching index value in an encrypted matching index column.

Background

In general, an organization or individual may utilize a cloud computing platform to manage relationships with customers. Such a cloud computing platform may be referred to as a Customer Relationship Management (CRM) solution. CRM solutions may include various features such as contact management, sales management, and productivity tools to better track and analyze interactions with customers and potential customers. CRM solutions can accumulate large amounts of data to support these features.

Keeping this data clean, up to date, and free of duplicates optimizes and improves the performance and analytical utility of CRM solutions. However, cleaning data can present challenges. For example, when a customer needs to encrypt data with an encryption key that the customer controls for data security purposes, a CRM solution may encrypt particular data fields or entities at rest with that customer's key. Such tenant-level encryption may further complicate duplicate detection and elimination in CRM solutions.

Drawings

The accompanying drawings are incorporated herein and form a part of the specification.

Fig. 1 is a block diagram of a cloud computing system, according to some embodiments.

Fig. 2 reflects a screenshot of a duplicate detected in a cloud computing platform, according to some embodiments.

Fig. 3 reflects a screenshot of an encryption configuration screen of standard fields in a cloud computing platform, according to some embodiments.

Fig. 4 reflects a screenshot of an encryption configuration screen for a custom field in a cloud computing platform, according to some embodiments.

Fig. 5 reflects a screenshot of a matching rule in a cloud computing platform, according to some embodiments.

Fig. 6 is a flow diagram illustrating a duplicate detection method when adding new records in a cloud computing platform, according to some embodiments.

Fig. 7 is a flow diagram illustrating a method of creating and encrypting a matching index to be used for duplicate detection, according to some embodiments.

FIG. 8 is an example computer system useful for implementing various embodiments.

In the drawings, like reference characters generally refer to the same or similar elements. In addition, generally, the left-most digit(s) of a reference number identifies the figure in which the reference number first appears.

Detailed Description

The following detailed description refers to the accompanying drawings to illustrate exemplary embodiments consistent with the present disclosure. References in the detailed description to "one exemplary embodiment," "an exemplary embodiment," etc., indicate that the exemplary embodiment described may include a particular feature, structure, or characteristic, but every exemplary embodiment may not necessarily include the particular feature, structure, or characteristic. Moreover, such phrases are not necessarily referring to the same exemplary embodiment. Further, when a particular feature, structure, or characteristic is described in connection with an exemplary embodiment, it will be understood by those skilled in the relevant art that such feature, structure, or characteristic may be effected in connection with other exemplary embodiments whether or not explicitly described.

The exemplary embodiments described herein provide illustrative examples and are not intended to be limiting. Other exemplary embodiments are possible, and modifications can be made to the exemplary embodiments within the spirit and scope of the present disclosure. Therefore, the detailed description does not limit the disclosure. Rather, the scope of the disclosure is defined by the appended claims and equivalents thereof.

Embodiments may be implemented in hardware (e.g., circuitry), firmware, software, or any combination thereof. Embodiments may also be implemented as instructions stored on a machine-readable medium and read and executed by one or more processors. A machine-readable medium includes any mechanism for storing or transmitting information in a form readable by a machine (e.g., a computing device). For example, in some embodiments, a machine-readable medium includes read-only memory (ROM); random-access memory (RAM); a magnetic disk storage medium; an optical storage medium; a flash memory device; electrical, optical, acoustical, or other forms of propagated signals (e.g., carrier waves, infrared signals, digital signals, etc.); and others. Further, firmware, software, routines, and/or instructions may be described herein as performing certain actions. However, such descriptions are merely for convenience, and such actions result from computing devices, processors, controllers, or other devices executing the firmware, software, routines, and/or instructions. Further, any implementation variations may be performed by a general purpose computer, as described below.

Any reference to the term "module" should be understood to include at least one of, or any combination of, software, firmware, and hardware (such as one or more circuits, microchips, or devices, or any combination thereof). Further, one skilled in the relevant art will appreciate that each module may include one or more components within an actual device, and that each component forming a portion of the described module may operate cooperatively or independently of any other component forming a portion of the module. Conversely, multiple modules described herein may represent a single component within an actual device. Further, the components within a module may be a single device or distributed among multiple devices in a wired or wireless manner.

The following detailed description of exemplary embodiments will reveal the general nature of the disclosure sufficiently that others can, by applying knowledge of persons skilled in the relevant art, readily modify and/or customize such exemplary embodiments for various applications without undue experimentation and without departing from the spirit and scope of the present disclosure. Accordingly, such modifications are intended to fall within the meaning and range of equivalents of the exemplary embodiments based on the teachings and guidance presented herein. The phraseology or terminology herein is for the purpose of description and not of limitation, such that the terminology or phraseology of the present specification is to be interpreted by a person skilled in the relevant art in light of the teachings herein.

System, device, apparatus, method, and/or computer program product embodiments, and/or combinations and subcombinations thereof, are provided for enabling duplicate detection using an encrypted match index in a cloud computing platform.

Organizations may utilize cloud computing platforms as CRM solutions to manage relationships with customers. The cloud computing platform may allow organizations to track and analyze interactions with customers, increase sales, manage contacts, and better plan the future. The cloud computing platform may organize customer service flow and supply chain management, and may monitor social media flow to determine potential customers. By tracking interactions with customers via such a cloud computing platform, an organization may ultimately improve profitability, eliminate process inefficiencies, and/or otherwise improve organizational ability.

The cloud computing platform may store a wide variety and quantity of data fields related to organizations, sales, customers, suppliers, competitors, leads, and the like. By way of example only, the cloud computing platform may store fields related to contact information, customer preferences, social media data, customer purchase records, service records, customer interactions, marketing campaigns, sales goals, organizational goals, sales data, profitability analysis, leads/opportunities, and the like. The fields may be standard fields such as contacts, accounts, leads, and opportunities, or custom fields designed and used by an organization to suit its own particular requirements.

Due to the potentially sensitive nature of this data, the cloud computing platform may support data encryption. Encryption uses a key and an initialization vector, while translating originally readable alphanumeric data (i.e., plaintext) of a field into an encrypted form (i.e., ciphertext) that is unreadable by an entity that does not know the key and initialization vector. The cloud computing platform then stores the encrypted form of the ciphertext. When retrieving the stored data, the cloud computing platform may decrypt the ciphertext using the key and the initialization vector.

Both standard and custom fields may be encrypted. Organizations may choose between encryption methods (e.g., deterministic and/or probabilistic) to encrypt data fields. A deterministic encryption scheme always produces the same ciphertext for a given plaintext and key. A probabilistic encryption scheme introduces randomness into the encryption process, so that the same plaintext and key produce different ciphertexts on different runs.
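The contrast between the two scheme types can be sketched in a few lines of Python. This is a toy illustration only: `toy_encrypt`, `encrypt_deterministic`, and `encrypt_probabilistic` are hypothetical helpers that XOR the plaintext with a SHA-256-derived keystream. They are not secure and are not taken from the disclosure, which names standards such as AES elsewhere.

```python
import hashlib
import os


def toy_encrypt(key: bytes, iv: bytes, plaintext: bytes) -> bytes:
    """Toy XOR stream cipher for illustration only (NOT secure)."""
    stream = b""
    counter = 0
    while len(stream) < len(plaintext):
        stream += hashlib.sha256(key + iv + counter.to_bytes(4, "big")).digest()
        counter += 1
    return bytes(p ^ s for p, s in zip(plaintext, stream))


def encrypt_deterministic(key: bytes, plaintext: bytes) -> bytes:
    # Fixed IV: the same key and plaintext always yield the same ciphertext.
    fixed_iv = b"\x00" * 16
    return toy_encrypt(key, fixed_iv, plaintext)


def encrypt_probabilistic(key: bytes, plaintext: bytes) -> bytes:
    # Fresh random IV per call, prepended so the value can be decrypted later.
    iv = os.urandom(16)
    return iv + toy_encrypt(key, iv, plaintext)


key = b"hypothetical-tenant-key"
det_a = encrypt_deterministic(key, b"John Smith")
det_b = encrypt_deterministic(key, b"John Smith")
prob_a = encrypt_probabilistic(key, b"John Smith")
prob_b = encrypt_probabilistic(key, b"John Smith")

print(det_a == det_b)    # deterministic ciphertexts can be compared directly
print(prob_a == prob_b)  # probabilistic ciphertexts differ between runs
```

Because the deterministic ciphertexts are equal whenever the plaintexts are equal, they can be compared without decryption, which is the property duplicate detection on encrypted values relies on.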

Organizations also have an interest in maintaining clean data in cloud computing platforms. Clean data improves availability, prevents errors, maintains system integrity, and enhances analysis capabilities. One aspect of maintaining clean data is to eliminate duplicates in a given data set. Duplicate detection may be globally managed by running duplicate elimination jobs. Duplicate detection may also occur on a table-by-table or case-by-case basis. Duplicate detection may occur automatically when a new record is added to the cloud computing platform.

The cloud computing platform may detect duplicate records using matching rules. A matching rule may examine a particular field or fields in the cloud computing platform. Matching rules may be standard or customized (i.e., user-defined). A standard matching rule may examine predetermined fields of a given data entity to determine whether there is duplication. For example, a standard matching rule for a contact in a cloud computing platform may check a first name (FirstName), a last name (LastName), and an address (Address). Custom matching rules check user-configured fields in a customized manner. For example, a customized, user-defined matching rule for a contact may be configured to also consider the job title (Title) of the contact. Under this customized rule, a duplicate is determined only if the first name, last name, address, and job title all match. The user can write Boolean logic to specify the matching rules or specify them using another programming approach.
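A minimal sketch of how such rules might be represented follows. The `MatchingRule` class, the `is_duplicate` function, and the rule identifiers are hypothetical names chosen for illustration; the disclosure does not prescribe a data structure. The sketch assumes the simplest Boolean logic, a logical AND over exact field comparisons.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class MatchingRule:
    rule_id: str             # unique identifier of the rule (hypothetical format)
    fields: Tuple[str, ...]  # fields that must all match


def is_duplicate(rule: MatchingRule, a: Dict[str, str], b: Dict[str, str]) -> bool:
    # Logical AND across the rule's fields: every field must match exactly.
    return all(a.get(f) == b.get(f) for f in rule.fields)


standard_rule = MatchingRule("std-contact", ("FirstName", "LastName", "Address"))
custom_rule = MatchingRule(
    "custom-contact", ("FirstName", "LastName", "Address", "Title")
)

existing = {"FirstName": "John", "LastName": "Smith",
            "Address": "10 Maple Avenue", "Title": "Developer"}
incoming = {"FirstName": "John", "LastName": "Smith",
            "Address": "10 Maple Avenue", "Title": "Manager"}

print(is_duplicate(standard_rule, existing, incoming))  # titles are ignored
print(is_duplicate(custom_rule, existing, incoming))    # titles must also match
```

The two records count as duplicates under the standard rule but not under the custom rule, because only the custom rule examines the Title field.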

Duplicate detection may employ an exact-match scheme or a fuzzy-match scheme. In an exact-match scheme, only exact matches in the fields return a positive result, i.e., a match. For example, if duplicate detection checks a full name (FullName) field, "John Smith" and "John Smith" will match, but "John Smith" and "Jon Smith" will not. Fuzzy matching allows non-exact matches to be positively identified as duplicates. In the example above, using a fuzzy-match scheme, "Jon Smith" may be positively identified as a duplicate of "John Smith". Example fuzzy-matching methods include Jaro-Winkler, Kullback-Leibler distance, name variant, keyboard distance, Metaphone 3, and syllable alignment.
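To make the exact/fuzzy distinction concrete, here is a small pure-Python sketch of Jaro-Winkler similarity, one of the methods listed above. The function names and the 0.9 threshold are assumptions made for illustration; the disclosure does not specify a threshold or an implementation.

```python
def jaro(s1: str, s2: str) -> float:
    """Jaro similarity in [0, 1]."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        lo = max(0, i - window)
        hi = min(len2, i + window + 1)
        for j in range(lo, hi):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Transpositions: matched characters that appear in a different order.
    seq1 = [c for c, m in zip(s1, matched1) if m]
    seq2 = [c for c, m in zip(s2, matched2) if m]
    transpositions = sum(a != b for a, b in zip(seq1, seq2)) // 2
    return (matches / len1 + matches / len2
            + (matches - transpositions) / matches) / 3


def jaro_winkler(s1: str, s2: str, prefix_scale: float = 0.1) -> float:
    """Jaro similarity boosted by a shared prefix of up to four characters."""
    base = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1[:4], s2[:4]):
        if a != b:
            break
        prefix += 1
    return base + prefix * prefix_scale * (1 - base)


THRESHOLD = 0.9  # hypothetical cut-off for treating two values as duplicates

exact = "John Smith" == "Jon Smith"
fuzzy = jaro_winkler("John Smith", "Jon Smith") >= THRESHOLD
print(exact, fuzzy)
```

An exact comparison rejects the pair, while the similarity score clears the assumed threshold, so a fuzzy rule would flag "Jon Smith" as a likely duplicate of "John Smith".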

The cloud computing platform may also provide duplicate detection for fields encrypted in various ways, including fields encrypted via the encryption schemes described above. To support duplicate detection for these fields, the cloud computing platform may utilize auxiliary stored entries (e.g., matching indexes). To avoid storing the plaintext of otherwise encrypted fields, these auxiliary entries may also need to be encrypted. In some embodiments, enabling encryption of fields in a cloud computing platform may interfere with a duplicate detection system that checks those fields. In one non-limiting example, the cloud computing platform may store the encrypted field in a form that can be related to the original plaintext only by using the key and the initialization vector. In such an example, conventional duplicate detection may not work for encrypted fields, because a simple comparison of a matching rule's unencrypted key against an encrypted field will not result in a positive identification of duplicates. In another embodiment, the use of probabilistic encryption may prevent duplicate detection because the results of a probabilistic encryption scheme vary unpredictably. Therefore, there is a need to allow a cloud computing platform to detect duplication in encrypted fields using an encrypted matching index.

Fig. 1 is a block diagram of a system 100 according to some embodiments. The cloud computing system 100 may include a cloud computing platform 102, a user system 104, a network 106, a host application 108, a record store 110, an encryption engine 112, and a matching engine 114. The system 100 may connect the cloud computing platform 102 to the user system 104 via the network 106. As will be appreciated by one of ordinary skill in the art, the cloud computing platform 102 may be connected to a plurality of user systems 104.

The cloud computing platform 102 may be a server computer, desktop computer, laptop computer, tablet computer, mobile device, wearable electronic device, or other electronic device. The cloud computing platform 102 may also be a software platform for cloud computing. For example, the cloud computing platform 102 may be a software as a service (SaaS) cloud platform on which users subscribe to applications accessible via the internet. The cloud computing platform 102 may provide CRM-related services. Such services may allow organizations to track and analyze interactions with customers. The cloud computing platform 102 may provide any other functionality.

The user system 104 may be a person interacting with the cloud computing platform 102. The user system 104 may be a cellular phone, a smart phone, a tablet computer, a laptop computer, a desktop computer, a web browser, or any other suitable computing device. The user system 104 may be configured to connect to and communicate with the cloud computing platform 102. The user of the user system 104 may be a business owner, employee, agent, or other suitable individual that interacts with information about a business, company, non-profit, government agency, or any other suitable organization. Alternatively, a user of the user system 104 may use the cloud computing platform 102 for personal pursuits or reasons unrelated to any business or organizational goal.

Network 106 may be any network or combination of networks including the internet, a Local Area Network (LAN), a Wide Area Network (WAN), a wireless network, a cellular network, or various other types of networks as will be appreciated by one of ordinary skill in the art.

The host application 108 may be included in the cloud computing platform 102 and may be any suitable type of application. For example, the host application 108 may be a CRM application currently running on the cloud computing platform 102. In some embodiments, the host application 108 may be a web application designed to run within a web browser. In some other embodiments, the host application 108 may be a software application designed to run on a server computer, desktop computer, or other type of electronic device. The host application 108 may be designed and deployed for running on a mobile platform. The host application 108 may run in any suitable operating system environment. As one of ordinary skill in the art will appreciate, the host application 108 may be another type of software application.

The record store 110 may be a database or other type of data store. A data object may be a data item or a data collection. The record store 110 may store fields related to organizations, sales, customers, suppliers, competitors, leads, and the like. The fields stored by the record store 110 may be standard fields or custom fields. The record store 110 may also store matching indices to be utilized in detecting duplicates. The record store 110 may support encryption of data fields contained in the record store 110.

The encryption engine 112 may be used by the cloud computing platform 102 or the host application 108 to encrypt the fields in the record store 110 with deterministic, probabilistic, or other suitable encryption schemes. The encryption engine 112 may encrypt the fields in the record store 110 with an appropriate seed, key, or initialization vector. The encryption engine 112 may also encrypt matching indices used to detect duplicates in the cloud computing platform 102. When encrypting a record in the record store 110, the encryption engine 112 may retrieve the unique identifier stored in the record store 110 for the matching rule and use the unique identifier as a key or initialization vector. The encryption engine 112 may then decrypt any stored encrypted values for use. The encryption engine 112 may utilize or comply with an appropriate encryption standard or specification, such as 2TDEA, 3TDEA, AES-128, AES-192, AES-256, and the like.
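The use of a matching rule's unique identifier as the initialization vector can be sketched as follows. This is an illustration under stated assumptions, not the disclosed implementation: `toy_encrypt` is a hypothetical, insecure XOR keystream cipher standing in for a real standard such as AES, and the key, rule identifiers, and in-memory set standing in for the encrypted matching index column are all made up for the example.

```python
import hashlib


def toy_encrypt(key: bytes, iv: bytes, plaintext: bytes) -> bytes:
    """Toy XOR stream cipher for illustration only (NOT secure)."""
    stream = b""
    counter = 0
    while len(stream) < len(plaintext):
        stream += hashlib.sha256(key + iv + counter.to_bytes(4, "big")).digest()
        counter += 1
    return bytes(p ^ s for p, s in zip(plaintext, stream))


def encrypt_match_index(tenant_key: bytes, rule_id: str, match_index: str) -> bytes:
    # The rule's unique identifier serves as the initialization vector, so a
    # given rule always maps the same match index to the same ciphertext.
    return toy_encrypt(tenant_key, rule_id.encode("utf-8"),
                       match_index.encode("utf-8"))


tenant_key = b"hypothetical-tenant-key"
encrypted_column = set()  # stands in for the encrypted matching index column

first = encrypt_match_index(tenant_key, "rule-001", "johnsmith")
encrypted_column.add(first)

# A later candidate is encrypted the same way and compared as ciphertext.
candidate = encrypt_match_index(tenant_key, "rule-001", "johnsmith")
is_duplicate = candidate in encrypted_column

# The same plaintext under a different rule yields a different ciphertext.
other_rule = encrypt_match_index(tenant_key, "rule-002", "johnsmith")
print(is_duplicate, other_rule in encrypted_column)
```

Tying the initialization vector to the rule keeps ciphertexts comparable within one rule while keeping values produced under different rules distinct, and the plaintext match index is never stored.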

The matching engine 114 may determine duplicates stored in the record store 110. The matching engine 114 may also check new data received by the host application 108 before the data is inserted into the record store 110 to ensure that the insertion does not introduce duplicates. The matching engine 114 may be used by the cloud computing platform 102 or the host application 108. The matching engine 114 can determine whether there is currently duplication for the unencrypted fields in the record store 110 by comparing the unencrypted fields to the matching index created for each record in the data entity. In one embodiment, the matching engine 114 may perform two-stage matching, where the first stage performs exact matching and the second stage applies an appropriate fuzzy matching algorithm to determine duplicates. The matching engine 114 may further determine whether there are duplicates for the encrypted fields in the record store 110, as described in further detail below with reference to fig. 3 and 4.
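The two-stage approach can be sketched as below. The sketch makes several assumptions: `find_duplicates` is a hypothetical function, the 0.85 threshold is arbitrary, and the fuzzy stage uses `difflib.SequenceMatcher` from the Python standard library as a stand-in rather than any of the fuzzy algorithms named in the disclosure.

```python
from difflib import SequenceMatcher
from typing import List, Tuple


def find_duplicates(candidate: str, existing: List[str],
                    threshold: float = 0.85) -> List[Tuple[str, str]]:
    """Return (value, stage) pairs for existing values that match the candidate."""
    # Stage 1: exact matching. If anything matches exactly, stop here.
    exact = [(value, "exact") for value in existing if value == candidate]
    if exact:
        return exact
    # Stage 2: fuzzy matching, run only when no exact match was found.
    return [
        (value, "fuzzy")
        for value in existing
        if SequenceMatcher(None, candidate, value).ratio() >= threshold
    ]


stored = ["john smith", "mary jones"]
print(find_duplicates("john smith", stored))
print(find_duplicates("jon smith", stored))
print(find_duplicates("alice wong", stored))
```

Running the cheap exact comparison first means the more expensive similarity computation is only needed for candidates that have no exact counterpart.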

Fig. 2 reflects a screenshot of a duplicate detected in the cloud computing platform 102. This is merely an exemplary embodiment. Fig. 2 depicts an exemplary interface including an account 202, a new contact button 204, a duplicate notification 206, and account details 208. Fig. 2 shows an account 202, but this may be another standard or custom field within the cloud computing platform 102. The new contact button 204 is an exemplary input field that can receive input from a user, allowing the user to enter updated information for the field, such as contacts, accounts, etc. The duplicate notification 206 may be displayed when the system detects a duplicate by an appropriate method, as described in further detail below in the discussion of fig. 6 and 7. In this screenshot, for example only, matching rules may have been created for account name, phone, and owner. The duplicate notification 206 may be shown here because another record stored in the record store 110 has the account name "Global Media", the telephone number "(905) 555-. The account details 208 may display various fields related to the standard or custom data fields that are displayed.

Fig. 3 reflects a screenshot of an encryption configuration screen for standard fields in the cloud computing platform 102. This is merely an exemplary embodiment. Fig. 3 depicts an exemplary interface including an account box 302, check box(es) 304, field label(s) 306, encryption settings 308, and type selector(s) 310. Fig. 3 shows an account box 302, but this may be another standard field within the cloud computing platform 102, e.g., contact, sales, etc. The check box(es) 304 provide a mechanism for a user to enable encryption of the fields associated with the exemplary account. Enabling encryption on the standard fields may result in encrypting the data stored in the record store 110, as described below with reference to fig. 6 and 7. The field label(s) 306 are the labels of the fields associated with this example. The encryption settings 308 provide a framework for adjusting the type selector(s) 310. In this implementation, the type selector(s) 310 indicate probabilistic or deterministic encryption. Those skilled in the art will appreciate that the fields in the account box 302 may vary based on the type of standard data.

Fig. 4 reflects a screenshot of an encryption configuration screen for custom fields in the cloud computing platform 102. This is merely an exemplary embodiment. Fig. 4 depicts an exemplary interface including a field label 402, a field name 404, a description 406, help text 408, an encryption enabler 410, and a type selector 412. The field label 402, field name 404, description 406, and help text 408 may specify details about custom fields in the cloud computing platform 102. With the encryption enabler 410, a user may enable encryption for custom fields in the cloud computing platform 102. The type selector 412 may be used to switch the type of encryption, e.g., between deterministic and probabilistic. Enabling encryption of the custom field may result in encrypting data stored in the record store 110, as described below with reference to fig. 6 and 7.

Fig. 5 reflects a screenshot of a matching rule creation screen in the cloud computing platform 102. This is merely an exemplary embodiment. Fig. 5 depicts an exemplary interface including a standard selector 502, standard rule selector 504, custom selector 506, custom rule selector 508, criteria string 510, add button 512, and remove button 514. As discussed below with reference to fig. 7, the matching rules may be standard or user-defined. The standard selector 502 and the standard rule selector 504 may depict a standard matching rule configuration. The custom selector 506, custom rule selector 508, and criteria string 510 may depict custom matching rules. The add button 512 and the remove button 514 may allow a user to add or remove matching rules. The use of matching rules in determining duplicates in a cloud computing platform is described in more detail below with reference to fig. 6.

Fig. 6 is a flow diagram illustrating a duplicate detection method 600 when adding new records in a cloud computing platform, according to some embodiments. Method 600 may be performed by processing logic that may comprise hardware (e.g., circuitry, dedicated logic, programmable logic, microcode, etc.), software (e.g., instructions executed on a processing device), or a combination thereof. It should be understood that not all steps are required to implement the disclosure provided herein. Further, one of ordinary skill in the art will appreciate that some of the steps may be performed simultaneously or in a different order than shown in FIG. 6. The method 600 will be described with reference to fig. 1. However, the method 600 is not limited to this example embodiment.

At 602, the host application 108 may receive a new record from the user system 104. The new record may represent a new standard field, such as a contact, account, lead, or opportunity. The new record may also represent a new custom field previously configured within the host application 108 by a user of the user system 104. The new record may be received using appropriate HTML form data, XML, or any other suitable information transfer format. In an illustrative example, the host application 108 may receive a new contact that includes a first name ("John"), a last name ("Smith"), a mailing address ("10 Maple Avenue"), a job title ("Developer"), and a contact number ("555-. The present disclosure will utilize this example in the discussion of subsequent steps below. In one embodiment, such contacts may also be received through an automated process, such as an import mechanism or other automated propagation of data.

At 604, the host application 108 retrieves the stored matching rules associated with the new record received at 602. In the above example, the host application 108 may retrieve any matching rules from the record store 110 that are configured to detect duplicates on contact fields. In an alternative example, the host application 108 may receive matching rules for the custom data entity. Those skilled in the art will appreciate that more than one matching rule may be applicable to a given data entity depending on the configuration stored in the record store 110. Thus, the host application 108 may obtain more than one matching rule, and in this embodiment, subsequent steps may be repeated. To continue with the above contact example, the host application 108 may obtain matching rules for first name, last name, address, and title. The host application may obtain the fields to which the matching rules apply, the type of matching performed on those fields (fuzzy or exact), and other appropriate configuration information. The matching rule may also include a unique identifier stored in the record store 110 to identify the particular matching rule.
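As a rough illustration of what the host application might retrieve at 604, the sketch below models a matching rule as a small record holding a unique identifier, the data entity it applies to, and a match type per field. The class name `MatchingRule`, the identifiers `rule-001` and `rule-002`, the field names, the choice of which fields are fuzzy or exact, and the in-memory `RULES` list standing in for the record store 110 are all assumptions made for illustration; the disclosure does not prescribe a schema.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class MatchingRule:
    rule_id: str                 # unique identifier of the rule
    entity: str                  # data entity (table) the rule applies to
    match_types: Dict[str, str]  # field name -> "exact" or "fuzzy"

    @property
    def fields(self) -> List[str]:
        """Fields the rule applies to, in configuration order."""
        return list(self.match_types)


# Hypothetical stand-in for the rules kept in the record store.
RULES = [
    MatchingRule(
        "rule-001",
        "contact",
        {"first_name": "fuzzy", "last_name": "exact", "address": "fuzzy", "title": "exact"},
    ),
    MatchingRule("rule-002", "account", {"name": "exact"}),
]


def rules_for(entity: str, rules: List[MatchingRule] = RULES) -> List[MatchingRule]:
    """Return every rule configured for the given data entity."""
    return [rule for rule in rules if rule.entity == entity]


contact_rules = rules_for("contact")
print([rule.rule_id for rule in contact_rules])  # ['rule-001']
```

Because more than one rule may apply to an entity, the lookup returns a list, and the later steps would be repeated for each rule in it.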

At 606, the host application 108 computes a matching index for the new record. The nature of the matching index may vary based on the nature of the data being received. If more than one matching rule is applicable, the host application 108 may derive a different matching index corresponding to each matching rule retrieved at 604. The matching index may be a combination of received data fields. For the above example, the matching index for the first name, last name, address, and title may be determined to be "jsmithmapledeveloper", that is, "j", "smith", "maple", and "developer" concatenated. However, this example is in no way limiting, and a variety of variations may be employed to determine the matching index; these methods may vary based on the nature of the data entity in question.
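One way the combination at 606 could work is sketched below. The normalizers (first initial of the first name, letters of the last name, first alphabetic word of the address, letters of the title) are assumptions chosen so that the running example yields "jsmithmapledeveloper"; the disclosure leaves the exact derivation open, and the function and field names here are hypothetical.

```python
import re
from typing import Callable, Dict, List


def _letters(value: str) -> str:
    """Lowercase the value and keep only the letters a-z."""
    return re.sub(r"[^a-z]", "", value.lower())


def _first_word(value: str) -> str:
    """Return the first alphabetic word, e.g. the street name in '10 Maple Avenue'."""
    words = re.findall(r"[a-z]+", value.lower())
    return words[0] if words else ""


# Hypothetical per-field normalizers; a real rule may use different ones.
NORMALIZERS: Dict[str, Callable[[str], str]] = {
    "first_name": lambda value: _letters(value)[:1],  # first initial only
    "last_name": _letters,
    "address": _first_word,
    "title": _letters,
}


def compute_matching_index(record: Dict[str, str], fields: List[str]) -> str:
    """Concatenate the normalized values of the rule's applicable fields."""
    return "".join(NORMALIZERS[name](record.get(name, "")) for name in fields)


contact = {
    "first_name": "john",
    "last_name": "smith",
    "address": "10 Maple Avenue",
    "title": "developer",
}
index = compute_matching_index(contact, ["first_name", "last_name", "address", "title"])
print(index)  # jsmithmapledeveloper
```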

At 608, the host application 108 determines whether encryption is enabled on any of the fields that make up the matching index. In the above example, the host application 108 may determine from the record store 110 whether any of the first name, last name, mailing address, job title, and contact number are encrypted. In an embodiment, the host application 108 may behave differently depending on whether the encryption scheme is probabilistic or deterministic. In one embodiment, if the encryption scheme is probabilistic, duplicate detection may not work. If encryption is enabled for any of the fields, method 600 moves to 612. If encryption is not enabled for any of the fields, method 600 moves to 610.
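The difference between deterministic and probabilistic schemes can be seen in a small experiment. The sketch below uses a toy cipher (XOR with a keystream derived from HMAC-SHA256), which is an assumption made only so the example runs with the Python standard library; it is not the platform's encryption scheme and is not suitable for real use. The key, the fixed initialization vector, and the function name are likewise hypothetical.

```python
import hashlib
import hmac
import os


def toy_encrypt(key: bytes, iv: bytes, plaintext: bytes) -> bytes:
    """Toy cipher for illustration only: XOR with a keystream derived from key and IV."""
    keystream = hmac.new(key, iv, hashlib.sha256).digest()  # 32 bytes
    if len(plaintext) > len(keystream):
        raise ValueError("toy cipher handles at most 32 bytes")
    body = bytes(p ^ k for p, k in zip(plaintext, keystream))
    return iv + body  # the IV travels with the ciphertext


key = b"tenant-secret-key"
index = b"jsmithmapledeveloper"

# Deterministic: a fixed IV gives the same ciphertext every time.
fixed_iv = b"rule-001".ljust(16, b"\0")
det_a = toy_encrypt(key, fixed_iv, index)
det_b = toy_encrypt(key, fixed_iv, index)

# Probabilistic: a fresh random IV gives a different ciphertext each time.
prob_a = toy_encrypt(key, os.urandom(16), index)
prob_b = toy_encrypt(key, os.urandom(16), index)

print(det_a == det_b)    # True
print(prob_a == prob_b)  # False (barring a random IV collision)
```

With the probabilistic variant, two encryptions of the same matching index do not compare equal, which is why an equality lookup against stored ciphertext cannot find the duplicate.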

At 610, the host application 108 compares the matching index determined at 606 with the rows in the associated data entity in the record store 110. Because encryption is not enabled on the fields that make up the matching index, the record store 110 may store the matching indices in unencrypted form. Thus, for the above example, the host application 108 may perform a table scan to determine whether "jsmithmapledeveloper" is present in the associated data entity in the record store 110. The host application 108 may use any other suitable search mechanism to determine whether the matching index is a duplicate.

At 612, the host application 108 encrypts, via the encryption engine 112, the matching index determined at 606. The host application 108 may use the unique identifier of the matching rule as an initialization vector in the encryption scheme, with the matching index as the plaintext. Thus, the encryption engine 112 can derive the appropriate ciphertext from the "jsmithmapledeveloper" plaintext. The ciphertext may consist of alphanumeric characters and may be of a configurable bit length.
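A minimal sketch of step 612 follows, assuming the initialization vector is derived by hashing the rule's unique identifier. Python's standard library has no block cipher, so the sketch substitutes a keystream built from HMAC-SHA256 in counter mode; this is a stand-in chosen for runnability, not the encryption engine 112, and a real deployment would use a vetted deterministic scheme such as AES-SIV (RFC 5297). The function names, the key, and the rule identifiers are hypothetical.

```python
import hashlib
import hmac


def _keystream(key: bytes, iv: bytes, length: int) -> bytes:
    """Stretch (key, iv) into `length` pseudo-random bytes, one HMAC block per counter."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out.extend(hmac.new(key, iv + counter.to_bytes(4, "big"), hashlib.sha256).digest())
        counter += 1
    return bytes(out[:length])


def _iv_from_rule(rule_id: str) -> bytes:
    """Assumption: derive a 16-byte IV by hashing the rule's unique identifier."""
    return hashlib.sha256(rule_id.encode("utf-8")).digest()[:16]


def encrypt_matching_index(key: bytes, rule_id: str, matching_index: str) -> str:
    """Return a hex ciphertext; identical inputs always give identical output."""
    plaintext = matching_index.encode("utf-8")
    stream = _keystream(key, _iv_from_rule(rule_id), len(plaintext))
    return bytes(p ^ s for p, s in zip(plaintext, stream)).hex()


def decrypt_matching_index(key: bytes, rule_id: str, ciphertext_hex: str) -> str:
    """Invert encrypt_matching_index using the same key and rule identifier."""
    data = bytes.fromhex(ciphertext_hex)
    stream = _keystream(key, _iv_from_rule(rule_id), len(data))
    return bytes(c ^ s for c, s in zip(data, stream)).decode("utf-8")


key = b"tenant-secret-key"
c1 = encrypt_matching_index(key, "rule-001", "jsmithmapledeveloper")
c2 = encrypt_matching_index(key, "rule-001", "jsmithmapledeveloper")
c3 = encrypt_matching_index(key, "rule-002", "jsmithmapledeveloper")
print(c1 == c2)  # True: same rule, same plaintext
print(c1 == c3)  # False: a different rule yields a different IV
print(decrypt_matching_index(key, "rule-001", c1))  # jsmithmapledeveloper
```

Tying the initialization vector to the rule identifier means the same index encrypted under two different rules produces unrelated ciphertexts, while repeated encryption under one rule stays comparable by equality. Note that this XOR stand-in reveals shared prefixes between plaintexts, which is one reason it should not be used outside of a demonstration.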

At 614, the host application 108 compares the encrypted matching index determined at 612 to the column of encrypted matching indices in the associated data entity in the record store 110. For the above example, the host application 108 may perform a table scan to determine whether the ciphertext is present in the associated data entity in the record store 110. The host application 108 may use any other suitable search mechanism to determine whether the encrypted matching index is a duplicate.

At 616, the host application 108 determines whether a duplicate was found in step 610 or 614. If a duplicate record is found, method 600 proceeds to 618. If no duplicate records are found, method 600 proceeds to 620.

At 618, the host application 108 returns an error message indicating that duplicate records were found. In one embodiment, the user system 104 may be required to confirm the information or update the information, if appropriate.

At 620, the host application 108 adds the new record to the record store 110. The record may include a column that holds the matching index. If encryption is not enabled for the relevant fields, the matching index may be stored in this column in unencrypted form. If encryption is enabled, the encrypted matching index may be stored in this column.
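Steps 608 through 620 can be tied together in a short sketch. The in-memory list standing in for the record store 110, the `match_index` column name, the `DuplicateRecordError` exception, and the keyed-hash helper used in place of a deterministic encryption engine are all assumptions for illustration only.

```python
import hashlib
import hmac
from typing import Callable, Dict, List, Optional


class DuplicateRecordError(Exception):
    """Raised when the matching index already exists in the store (step 618)."""


def add_record(
    store: List[Dict[str, str]],
    record: Dict[str, str],
    matching_index: str,
    encrypt: Optional[Callable[[str], str]] = None,
) -> None:
    # 608/612: transform the index only when encryption is enabled.
    stored_value = encrypt(matching_index) if encrypt else matching_index
    # 610/614: scan the matching index column for an equal value.
    if any(row["match_index"] == stored_value for row in store):
        raise DuplicateRecordError("duplicate record found")
    # 620: persist the record together with its (possibly encrypted) index.
    store.append({**record, "match_index": stored_value})


def keyed_stand_in(index: str) -> str:
    """Deterministic keyed transform used here in place of real encryption."""
    return hmac.new(b"tenant-secret-key", index.encode("utf-8"), hashlib.sha256).hexdigest()


store: List[Dict[str, str]] = []
add_record(store, {"first_name": "john", "last_name": "smith"}, "jsmithmapledeveloper", keyed_stand_in)
try:
    add_record(store, {"first_name": "John", "last_name": "Smith"}, "jsmithmapledeveloper", keyed_stand_in)
    duplicate_rejected = False
except DuplicateRecordError:
    duplicate_rejected = True

print(duplicate_rejected)  # True
print(len(store))          # 1
```

The same function covers both branches: passing no `encrypt` callable corresponds to the unencrypted comparison at 610, and passing one corresponds to the encrypted comparison at 614.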

FIG. 7 is a flow diagram illustrating a method 700 of creating and encrypting a matching rule to be used for duplicate detection, in accordance with some embodiments. Method 700 may be performed by processing logic that may comprise hardware (e.g., circuitry, dedicated logic, programmable logic, microcode, etc.), software (e.g., instructions executed on a processing device), or a combination thereof. It should be understood that not all steps are required to implement the disclosure provided herein. Further, one of ordinary skill in the art will appreciate that some of the steps may be performed simultaneously or in a different order than shown in FIG. 7. The method 700 will be described with reference to FIG. 1. However, method 700 is not limited to this example embodiment.

At 702, the host application 108 receives matching rule parameters from a user of the user system 104. The parameters may specify the data entity (i.e., table) to which the matching rule applies, fields in the data entity to which the matching rule applies, boolean or other logic that specifies how the matching rule should behave, whether a blank or null value should be considered a match, and other suitable configuration information. The matching rule parameters may be received using appropriate HTML form data, XML, or any other suitable information transmission format. In one embodiment, the host application 108 may receive matching rule parameters that update an existing rule.

At 704, the host application 108 may create a matching rule in the record store 110. The host application 108 may allow more than one matching rule to check the same field; however, appropriate redundancy checks may be performed. The appropriate rows and columns may be added to a particular table to track matching rules. When an existing matching rule is modified, the host application 108 may perform an update to the base table instead of an insert.
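The insert-or-update behavior at 704 might look like the following sketch, which uses an in-memory SQLite database as a stand-in for the record store 110. The table name `matching_rule`, its columns, and the function name are assumptions; the disclosure does not specify a storage layout.

```python
import json
import sqlite3
from typing import List


def save_matching_rule(conn: sqlite3.Connection, rule_id: str, entity: str, fields: List[str]) -> str:
    """Insert a new rule, or update the existing row when the rule already exists."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS matching_rule ("
        "rule_id TEXT PRIMARY KEY, entity TEXT NOT NULL, fields TEXT NOT NULL)"
    )
    payload = json.dumps(fields)
    existing = conn.execute(
        "SELECT 1 FROM matching_rule WHERE rule_id = ?", (rule_id,)
    ).fetchone()
    if existing is not None:
        # Modified rule: update the base table rather than inserting a second row.
        conn.execute(
            "UPDATE matching_rule SET entity = ?, fields = ? WHERE rule_id = ?",
            (entity, payload, rule_id),
        )
        return "updated"
    conn.execute(
        "INSERT INTO matching_rule (rule_id, entity, fields) VALUES (?, ?, ?)",
        (rule_id, entity, payload),
    )
    return "inserted"


conn = sqlite3.connect(":memory:")
first = save_matching_rule(conn, "rule-001", "contact", ["first_name", "last_name"])
second = save_matching_rule(conn, "rule-001", "contact", ["first_name", "last_name", "title"])
count = conn.execute("SELECT COUNT(*) FROM matching_rule").fetchone()[0]
print(first, second, count)  # inserted updated 1
```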

At 706, the host application 108 may create a matching index column based on the currently existing entries in the specified data entity. For example, if the matching rule checks first name, last name, and job title, the host application 108 may create an appropriate matching index (e.g., "jsmithdeveloper") for each record in the data entity, i.e., each row in the table, based on the data contained in the table. The host application 108 may perform this operation on a batch, iterative, recursive, or row-by-row basis, or in any other suitable programmatic manner. The host application 108 may temporarily store the matching index for each row in memory, text, a temporary database table, or any other suitable storage medium.

At 708, the host application 108 determines whether encryption is enabled on any of the fields that the matching rule checks. The host application 108 may retrieve data from the record store 110 regarding encryption-enabled fields and data entities. The host application 108 may also obtain information about the type of encryption that is applied to the data entity. In an embodiment, the host application 108 may behave differently based on the type of encryption. If encryption is not enabled, method 700 proceeds to 710. If encryption is enabled, method 700 proceeds to 712.

At 710, the host application 108 may store the matching index column in unencrypted form in the record store 110, because encryption is neither enabled nor required for the matching index column.

At 712, the host application 108 may encrypt the entirety of the matching index column created at 706. The host application 108 may utilize the unique identifier of the matching rule as an initialization vector for the encryption scheme. The host application 108 may retrieve the matching index created in 706 from memory, text, a temporary database table, or the like. The host application 108 may encrypt the entries in the matching index column one by one, in batches, or in their entirety.

At 714, the host application 108 stores the encrypted form of the matching index column in the record store 110. The host application 108 may store the encrypted matching index in a column by batch insertion, database update, text processing, or other suitable storage method.
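Steps 706 through 714 amount to a backfill over the existing rows. The sketch below processes rows in batches, computes a plaintext index for each, encrypts it only when an `encrypt` callable is supplied, and writes the result to a `match_index` column. The row layout, the batch size, the index builder, and the keyed-hash stand-in for deterministic encryption are all assumptions for illustration.

```python
import hashlib
import hmac
from typing import Callable, Dict, List, Optional


def backfill_matching_index(
    rows: List[Dict[str, str]],
    build_index: Callable[[Dict[str, str]], str],
    encrypt: Optional[Callable[[str], str]] = None,
    batch_size: int = 2,
) -> int:
    """Populate 'match_index' on every existing row; return the number of batches."""
    batches = 0
    for start in range(0, len(rows), batch_size):
        batch = rows[start:start + batch_size]
        # 706: compute the plaintext index for each row in the batch.
        indexes = [build_index(row) for row in batch]
        # 708/712: encrypt only when encryption is enabled for the rule's fields.
        if encrypt is not None:
            indexes = [encrypt(value) for value in indexes]
        # 710/714: store the plain or encrypted index in the column.
        for row, value in zip(batch, indexes):
            row["match_index"] = value
        batches += 1
    return batches


def build_index(row: Dict[str, str]) -> str:
    """Hypothetical rule: first initial + last name + title, lowercased."""
    return (row["first_name"][:1] + row["last_name"] + row["title"]).lower()


def keyed_stand_in(index: str) -> str:
    """Deterministic keyed transform used here in place of real encryption."""
    return hmac.new(b"tenant-secret-key", index.encode("utf-8"), hashlib.sha256).hexdigest()


rows = [
    {"first_name": "John", "last_name": "Smith", "title": "Developer"},
    {"first_name": "Ann", "last_name": "Lee", "title": "Manager"},
    {"first_name": "john", "last_name": "smith", "title": "developer"},
]
batches = backfill_matching_index(rows, build_index, keyed_stand_in)
print(batches)                                          # 2
print(rows[0]["match_index"] == rows[2]["match_index"])  # True
print(rows[0]["match_index"] == rows[1]["match_index"])  # False
```

Because the transform is deterministic, the first and third rows end up with the same stored value, so existing duplicates remain detectable after the column is encrypted.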

For example, various embodiments may be implemented using one or more computer systems (e.g., computer system 800 shown in FIG. 8). Computer system 800 may be used, for example, to implement method 600 of FIG. 6 and method 700 of FIG. 7. For example, the computer system 800 may use pre-set global filters or updates to initialize the analysis system. Computer system 800 may be any computer capable of performing the functions described herein.

Computer system 800 includes one or more processors (also called central processing units, or CPUs), such as processor 804. The processor 804 is connected to a communication infrastructure or bus 806.

The one or more processors 804 may each be a Graphics Processing Unit (GPU). In one embodiment, the GPU is a processor designed as a dedicated electronic circuit that processes mathematically intensive applications. GPUs may have a parallel structure that is efficient for parallel processing of large blocks of data, such as mathematically intensive data common to computer graphics applications, images, video, and the like.

Computer system 800 also includes user input/output device(s) 803, such as a monitor, keyboard, pointing device, etc., that communicate with the communication infrastructure 806 via user input/output interface(s) 802.

Computer system 800 also includes a main or primary memory 808, such as Random Access Memory (RAM). The main memory 808 may include one or more levels of cache. The main memory 808 has stored therein control logic (i.e., computer software) and/or data.

The computer system 800 may also include one or more secondary storage devices or memories 810. Secondary memory 810 may include, for example, a hard disk drive 812 and/or a removable storage device or drive 814. Removable storage drive 814 may be a floppy disk drive, a magnetic tape drive, an optical disk drive, an optical storage device, a tape backup device, and/or any other storage device/drive.

The removable storage drive 814 may interact with a removable storage unit 818. Removable storage unit 818 includes a computer usable or readable storage device having stored thereon computer software (control logic) and/or data. Removable storage unit 818 may be a floppy disk, magnetic tape, compact disk, DVD, optical storage disk, and/or any other computer data storage device. The removable storage drive 814 reads from and/or writes to a removable storage unit 818 in a well known manner.

According to an example embodiment, secondary memory 810 may include other means, devices, or approaches for allowing computer system 800 to access computer programs and/or other instructions and/or data. Such means, devices, or approaches may include, for example, a removable storage unit 822 and an interface 820. Examples of a removable storage unit 822 and interface 820 may include a program cartridge and cartridge interface (such as that found in video game devices), a removable memory chip (such as an EPROM or PROM) and associated socket, a memory stick and USB port, a memory card and associated memory card socket, and/or any other removable storage unit and associated interface.

Computer system 800 may also include a communications or network interface 824. Communications interface 824 enables computer system 800 to communicate and interact with any combination of remote devices, remote networks, remote entities, and the like, individually and collectively referenced by reference numeral 828. For example, communication interface 824 may allow computer system 800 to communicate with remote device 828 via communication path 826, which may be wired and/or wireless and may include any combination of a LAN, a WAN, the Internet, or the like. Control logic and/or data can be transmitted to computer system 800 and from computer system 800 via communications path 826.

In one embodiment, a tangible, non-transitory device or article of manufacture comprising a tangible, non-transitory computer usable or readable medium having control logic (software) stored thereon is also referred to herein as a computer program product or program storage device. This includes, but is not limited to, computer system 800, main memory 808, secondary memory 810, and removable storage units 818 and 822, as well as tangible articles of manufacture embodying any combination of the foregoing. Such control logic, when executed by one or more data processing devices (e.g., computer system 800), causes such data processing devices to operate as described herein.

It will be apparent to one skilled in the relevant art(s) how to make and use embodiments of the present disclosure using data processing apparatus, computer systems, and/or computer architectures other than that shown in FIG. 8, based on the teachings included in the present disclosure. In particular, embodiments may operate in software, hardware, and/or operating system implementations other than those described herein.

It should be understood that the detailed description section, and not any other section, is intended to be used to interpret the claims. The other sections may set forth one or more, but not all exemplary embodiments contemplated by the inventors, and are therefore not intended to limit the present disclosure or the appended claims in any way.

The breadth and scope of the present disclosure should not be limited by any of the above-described exemplary embodiments, but should be defined only in accordance with the following claims appended hereto and their equivalents.
