Method for accessing data records of a master data management system

Document No.: 1850869. Publication date: 2021-11-16. Created on 2020-03-19 by A·卢茨·艾克夏维尔·达科斯塔, G·S·普里帕蒂, M·卡迪比, N·辛格, and A·赛斯.

Abstract: The invention relates to a method comprising: augmenting a master data management system with one or more search engines for enabling access to its data records. A data request may be received at the master data management system. A set of one or more attributes of the plurality of attributes that are referenced in the received request may be identified. A combination of one or more of the search engines of the master data management system whose search performance satisfies a current selection rule may be selected, and the request may be processed using the combination of search engines. At least a portion of the processing results may be provided, and the selection rule may be updated based on user operations on the provided results, the updated selection rule becoming the current selection rule.

1. A method for accessing data records of a master data management system, the data records including a plurality of attributes, the method comprising:

augmenting the master data management system with one or more search engines for enabling access to the data records;

receiving a request for data at the master data management system;

identifying a set of one or more attributes of the plurality of attributes that are referenced in the received request;

selecting a combination of one or more of the search engines of the master data management system whose performance for searching for values of at least a portion of the set of attributes satisfies a current selection rule;

processing the request using the combination of search engines;

providing at least a portion of the results of the processing.

2. The method of claim 1, further comprising updating the selection rule based on user operations on the provided results, the updated selection rule becoming the current selection rule, and, upon receipt of another data request, repeating the identifying, selecting, processing, and providing steps using the current selection rule.

3. The method according to claim 1 or 2, wherein the results comprise data records of the master data management system associated with respective match scores obtained by a scoring engine of the search engines, the method further comprising weighting the match scores according to the performance of components involved in providing the results, the components comprising method steps, elements used to provide the results, and at least a portion of the results, wherein the provided results comprise non-duplicate data records having weighted match scores above a predefined score threshold.

4. The method of any preceding claim, the components involved in providing the results comprising the search engines, the identifying step, and the results, the method further comprising:

assigning an engine weight to each of the search engines;

assigning an attribute weight to the set of attributes, wherein the attribute weight of an attribute indicates a confidence level with which the attribute is identified;

assigning a weight indicative of a completeness of the data record and a weight indicative of a freshness of the data record to each data record of the result;

for each data record of the result, combining the respective engine weight, attribute weight, completeness weight, and freshness weight, and weighting the score of the data record by the combined weight.

5. The method of claim 3 or 4, further comprising:

providing user parameters quantifying user operations;

for each component in at least a portion of the components, determining values of the user parameter and associated values of a component parameter describing the component; and updating the weight assigned to the component using the determined associations.

6. The method of any of claims 3 to 5, further comprising providing a look-up table associating values of the user parameters with values of the component parameters, and using the look-up table to update the weights assigned to the components.

7. The method of any of claims 3 to 5, further comprising modeling changes in the values of the user parameters as a function of the values of the component parameters using a predefined model, using the model to determine updated weights for the components, and using the updated weights to update the weights assigned to the components.

8. The method of any of claims 2 to 7, wherein a user operation of the user operations comprises an indication of a selection of a result, the indication comprising a mouse click on a displayed result of the provided results, wherein the user parameters comprise at least one of a number of clicks, a frequency of clicks, and a duration of access to a given one of the results.

9. The method of any of the preceding claims, wherein the results comprise data records of the master data management system associated with respective match scores obtained by a scoring engine of the search engines, wherein the provided results comprise non-duplicate data records having match scores above a predefined score threshold.

10. The method according to any of the preceding claims, wherein for each attribute of the set of attributes, the selection rule comprises:

for each of the search engines, determining a value of a performance parameter indicative of performance of the search engine for searching for the value of the attribute;

selecting a search engine having a performance parameter value above a predetermined performance threshold.

11. The method of claim 10, the performance parameter comprising at least one of: the number of results and the degree to which the results match the expectations.

12. The method of claim 10 or 11, the selection rule using a table that associates attributes with corresponding search engines, the updating of the selection rule comprising:

determining, for each search engine in the combination of search engines, a value of a user parameter that quantifies user operations on the provided results; and

identifying values of the user parameter that are less than a predefined threshold using the determined values associated with each search engine in the combination of search engines, and, for each identified value of the user parameter, determining the attribute of the set of attributes and the search engine associated with the identified value, and updating the table using the determined attribute and search engine.

13. The method of any preceding claim, wherein the processing of the request is performed in parallel by the combination of search engines.

14. The method of any of claims 1 to 12, wherein the combination of search engines is a ranked list of search engines, wherein processing of the request is performed successively following the ranked list until a minimum number of results is exceeded.

15. The method of any of the preceding claims, wherein identifying the set of attributes comprises inputting the received request to a predefined machine learning model and receiving a classification of the request from the machine learning model, the classification being indicative of the set of attributes.

16. The method of any preceding claim, further comprising inputting the set of attributes to a predefined machine learning model, and receiving from the machine learning model one or more search engines that can be used to search the set of attributes.

17. The method of claim 16, further comprising: receiving a training set indicative of different sets of one or more training attributes, wherein each training attribute set is labeled to indicate a search engine suitable for performing a search of the training attribute set; training a predefined machine learning algorithm using the training set, thereby generating the machine learning model.

18. The method of any preceding claim, wherein the results provided comprise data records that are filtered according to the sender of the request.

19. A computer program product comprising a computer readable storage medium having computer readable program code embodied therewith, the computer readable program code configured for accessing data records of a master data management system, the data management system comprising search engines for enabling access to the data records, the data records comprising a plurality of attributes, the computer readable program code further configured for:

receiving a request for data at the master data management system;

identifying a set of one or more attributes of the plurality of attributes that are referenced in the received request;

selecting a combination of one or more of the search engines of the master data management system whose performance for searching for values of at least a portion of the set of attributes satisfies a current selection rule;

processing the request using the combination of search engines;

providing at least a portion of the results of the processing.

20. A computer system for enabling access to data records, the data records comprising a plurality of attributes, the computer system comprising: a plurality of search engines for enabling access to the data records; a user interface configured to receive a request for data; an entity identifier configured to identify a set of one or more attributes of the plurality of attributes that are referenced in the received request; an engine selector configured to select a combination of one or more of the search engines whose performance for searching for values of at least a portion of the set of attributes satisfies a current selection rule, wherein the search engines are configured to process the request; and a result provider configured to provide at least a portion of the results of the processing.

21. The computer system of claim 20, wherein the computer system is a master data management system.

22. The computer system according to claim 20 or 21, wherein the results comprise data records of the computer system associated with respective match scores obtained by a scoring engine of the search engines, the computer system further comprising a weight provider configured to weight the match scores according to the performance of components involved in providing the results, the components comprising method steps, elements used to provide the results, and at least a portion of the results, wherein the provided results comprise non-duplicate data records having weighted match scores above a predefined score threshold.

Background

The present invention relates to the field of digital computer systems, and more particularly to a method for accessing data records of a master data management system.

Enterprise data matching involves matching and linking customer data received from different sources to create a single version of the truth. Master Data Management (MDM) based solutions work with enterprise data and perform indexing, matching, and linking of that data. The master data management system may allow access to such data. However, there is a continuing need to improve access to data in a master data management system.

Disclosure of Invention

Various embodiments provide a method, a computer system and a computer program product for accessing data records of a master data management system as described by the subject matter of the independent claims. Advantageous embodiments are described in the dependent claims. Embodiments of the present invention may be freely combined with each other if they are not mutually exclusive.

In one aspect, the invention relates to a method for accessing data records of a master data management system, the data records including a plurality of attributes. The method comprises the following steps:

augmenting the master data management system with one or more search engines for enabling access to the data records;

receiving a request for data at the master data management system;

identifying a set of one or more attributes of the plurality of attributes that are referenced in the received request;

selecting a combination of one or more of the search engines of the master data management system whose performance for searching for values of at least a portion of the set of attributes satisfies a current selection rule;

processing the request using the combination of search engines;

at least a portion of the results of the processing is provided.

In another aspect, the invention relates to a computer system for enabling access to a data record, the data record comprising a plurality of attributes, the computer system comprising a plurality of search engines for enabling access to the data record; a user interface configured to receive a data request; an entity identifier configured to identify a set of one or more attributes of the plurality of attributes that are referenced in the received request; an engine selector configured to select a combination of one or more of the search engines whose performance for searching for values of at least a portion of the set of attributes satisfies a current selection rule; wherein the search engine is configured to process the request; a result provider configured to provide at least a portion of a result of the processing.

In another aspect, the invention relates to a computer program product having computer readable program code embodied therewith, the computer readable program code configured to access data records of a master data management system, the data management system comprising search engines for enabling access to the data records, the data records comprising a plurality of attributes, the computer readable program code further configured to: receive a request for data at the master data management system; identify a set of one or more attributes of the plurality of attributes that are referenced in the received request; select a combination of one or more of the search engines of the master data management system whose performance for searching for values of at least a portion of the set of attributes satisfies a current selection rule; process the request using the combination of search engines; and provide at least a portion of the results of the processing.

Drawings

Embodiments of the invention will now be explained in more detail, by way of example only, with reference to the accompanying drawings, in which:

FIG. 1 is a flowchart of a method for accessing data records of a master data management system,

FIG. 2 is a flowchart of a method for providing the search results of a set of search engines,

FIG. 3 is a flowchart of a method for providing the search results of multiple search engines,

FIG. 4A depicts a table that includes search results from different engines, the search results being normalized and merged,

FIG. 4B depicts a table including examples of engine weights,

FIG. 4C depicts a table that includes examples of attribute weights based on the confidence with which the entity identifier identifies attribute types,

FIG. 4D depicts a table that includes examples of completeness weights,

FIG. 4E depicts a table that includes examples of freshness weights,

FIG. 4F depicts a table including result records and their associated weights and scores,

FIG. 5 is a flowchart of a method for updating the weights used to weight the match scores of the data records of results obtained by multiple search engines processing a search request,

FIG. 6A depicts a table including a user's number of clicks as a function of data record completeness,

FIG. 6B depicts a table that includes a user click score as a function of data record completeness,

FIG. 6C is a graph of the distribution of click scores as a function of data record completeness,

FIG. 7 illustrates a block diagram representation of a computer system 700 according to an example of the present disclosure,

FIG. 8 depicts a flowchart describing an example method of operation of a master data management system,

FIG. 9 depicts a schematic diagram of an example of processing a request according to the present subject matter.

Detailed Description

The descriptions of the various embodiments of the present invention have been presented for purposes of illustration, but are not intended to be exhaustive or limited to the embodiments disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments. The terminology used herein was chosen to best explain the principles of the embodiments, the practical application or technical improvement over technologies found in the marketplace, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein.

The present subject matter can enable efficient access to data stored in a master data management system and may improve the performance of the master data management system. It can reduce the number of repeated or retried search requests because it can provide the best possible results using multiple search engines, so that the user does not have to retry or reformulate the search query as is the case with other systems.

A master data management system may use a single type of search engine. With the present subject matter, different types of search engines can be used by the master data management system. The type of a search engine may be defined by the technology it uses to perform searches, such as full-text search or structured probabilistic search. For example, the additional search engines added by the present method may be of a different type than the type of search engine originally included in the master data management system. Thus, the present subject matter may provide an aggregated search and matching engine that aims to take advantage of the different capabilities of multiple search and indexing engines based on the type of input data or the type of query being made. Different indexing or search engines have different capabilities, so they work best on different types of input or different requirements. The present subject matter may enable a better way to search data by employing multiple different indexing and search engines, thereby enhancing the user experience without impacting the performance of machine-based interactions.

For example, the identifying, selecting, processing, and providing steps may be performed automatically upon receipt of a data request. In one example, the identifying, selecting, processing and providing steps may be automatically repeated upon receipt of a further data request, wherein in each repetition an updated selection rule resulting from an immediately previous execution of the method is used.

The results may include data records. Providing the data record may include displaying data indicative of the data record on a graphical user interface. For example, for each data record, a row may be displayed, where the row may be a hyperlink or link that enables the user to click on to access the detailed information of the data record.

A data record is a collection of related data items such as the name, date of birth (DOB), and category of a particular user. A record represents an entity, where an entity refers to a user, object, or concept about which information is stored in the record.
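By way of illustration only (the patent does not prescribe any storage format), such a record could be modeled as follows; the Record class and its fields are hypothetical and merely mirror the example attributes above.

from dataclasses import dataclass, field
from datetime import date
from typing import Optional

@dataclass
class Record:
    """A data record representing one entity (a user, object, or concept)."""
    record_id: str
    name: Optional[str] = None
    dob: Optional[date] = None                       # date of birth
    category: Optional[str] = None
    attributes: dict = field(default_factory=dict)   # further attribute/value pairs

# Example: a record holding information about a particular user.
r = Record(record_id="R1", name="Jane Doe", dob=date(1984, 6, 1), category="customer")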

According to one embodiment, the method further comprises updating the selection rule based on user operations on the provided results, the updated selection rule becoming the current selection rule, and, upon receiving another data request, repeating the identifying, selecting, processing, and providing steps using the current selection rule. In one example, the updating of the selection rule may be performed after a predefined time period, e.g. a period during which the method may have been executed several times, and the updating is performed based on the combined user operations on the results provided during that period. This may enable a self-improving search system based on user input and experience. The search engines whose performance for searching values of at least a portion of the set of attributes satisfies the current selection rule may be the search engines that a predefined table of the master data management system associates with at least a portion of the set of attributes. For example, the table includes a plurality of entries, where each entry i comprises a search engine SEi and the associated one or more attributes Ti that are appropriately searched by that engine. In one example, each association of Ti and SEi may be assigned an update score that can be changed or updated. The selected search engines are the search engines SEi of the table associated with one or more attributes of the attribute set; e.g., if the attribute set includes T1 and T2, the table may be searched to identify entries having T1 and T2, and the selected search engines are the search engines of those identified entries. The updating of the selection rule may comprise updating the table: e.g., if the number of clicks on displayed results that come from search engine SEx and are associated with a given searched attribute Tx is less than a threshold, the table may be updated accordingly, e.g. by deleting the association between Tx and SEx, or, if Tx and SEx are associated with an update score, by changing (e.g. lowering) the update score. For example, the deletion may be performed if the same combination of Tx and SEx was previously found at least once not to perform well, e.g., the number of clicks on the associated results is less than a threshold number, so that the associated update score is below a given threshold. In one example, the table initially contains many or all possible combinations of attributes and search engines, and non-performing entries may be removed within a predefined period of time.
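A minimal sketch of such a table and its update logic, assuming a simple in-memory structure; the entry layout, the update-score arithmetic, and the thresholds are illustrative assumptions rather than part of the claimed method.

# Each entry i associates a search engine SEi with attributes Ti that it
# searches well, plus an update score that user feedback can raise or lower.
engine_table = [
    {"engine": "SE1", "attributes": {"T1", "T2"}, "update_score": 1.0},
    {"engine": "SE2", "attributes": {"T3", "T4"}, "update_score": 1.0},
]

def select_engines(attribute_set):
    """Return the engines whose entries mention any requested attribute."""
    return [e["engine"] for e in engine_table if e["attributes"] & attribute_set]

def update_table(engine, attribute, clicks, click_threshold=10, score_floor=0.2):
    """Lower the update score of an under-performing (attribute, engine) pair
    and remove the association once its score falls below the floor."""
    for entry in engine_table:
        if entry["engine"] == engine and attribute in entry["attributes"]:
            if clicks < click_threshold:
                entry["update_score"] -= 0.1
                if entry["update_score"] < score_floor:
                    entry["attributes"].discard(attribute)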

According to one embodiment, the results include data records of the master data management system associated with respective match scores obtained by a scoring engine of the search engines, wherein the provided results include non-duplicate data records having match scores above a predefined score threshold. The match score may indicate a level or degree of match between the data record and the requested data.

This embodiment may further improve the performance of the master data management system by only providing results that meet the selection criteria of the matching score. For example, irrelevant results may not be provided to the user. This may save processing resources, such as display resources and data transmission resources to be used for irrelevant results. For example, the weighting of the scores may be performed as described in the following embodiments.

According to one embodiment, the results comprise data records of the master data management system associated with respective match scores obtained by a scoring engine of the search engines, the method further comprising weighting the match scores according to the performance of components involved in producing the results, the components comprising method steps, elements used to produce the results, and at least a portion of the results, wherein the provided results comprise non-duplicate data records having weighted match scores above a predefined score threshold. The weighting may, for example, comprise: for each data record of the result, assigning a weight to each of the components that provide or generate the data record (where the components may include the provided data record itself), combining the weights, and weighting the match score of the data record using the combined weight.

For example, the generation of search results for the received data request involves the execution of a search process (the present method may include such a search process). The search process has a plurality of process steps, where each process step may be performed by a system element such as a search engine or a scoring engine. The search process may thus have components that are process steps and/or system elements and/or the results provided thereby. Each component has a function that it performs to contribute to obtaining the search results, and each of those components may have an effect on the quality of the results obtained. For example, if a component of the search process is not functioning properly, this may affect the search results. For example, if a component is the process step that identifies an attribute in a received request, and the component is not effective in identifying a particular type of attribute, it may happen that the process step does not correctly identify that type of attribute. Thus, when a request is received for data having attributes of this type referenced therein, the results obtained may be affected because they may include irrelevant, unwanted search results for the incorrectly identified attributes. The performance of the components of the search process may contribute differently to the results obtained by the search process. This embodiment may take into account at least a portion of these contributions by weighting the match scores accordingly. For example, each of at least a portion of the components of the search process of this embodiment may be assigned a weight indicating its performance in performing its respective function. The weights may be user defined; for example, the weights may be initially defined by a user (e.g., for a first execution of the method) and may later be updated automatically with a weight update method as described herein. These weights may be used to weight the match scores. This embodiment may further improve the performance of the data management system. For example, the user may not be presented with further irrelevant results. This may save processing resources, such as display resources and data transmission resources.

Examples of components considered in the weighting of the search process may be described in the following embodiments. This embodiment may be advantageous because it identifies and weights components whose performance may have a greater impact on search results.

According to one embodiment, the components include the search engines, the identifying step, and the results. The method further comprises: assigning an engine weight to each of the search engines; assigning an attribute weight to the set of attributes, wherein the attribute weight of an attribute indicates the confidence level with which the attribute is identified; assigning a weight indicative of the completeness of the data record and a weight indicative of the freshness of the data record to each data record of the result; and, for each data record of the result, combining the corresponding engine weight, attribute weight, completeness weight, and freshness weight, and weighting the score of the data record by the combined weight. The attribute weights may be generated at the attribute level and applied to the complete result set (and all attributes) returned for the received request. This accounts for the fact that the result set is likely to be less useful if the automatically determined search entity type itself is incorrect.

The following embodiments provide a weight update method for updating weights used in accordance with the present subject matter. They enable an efficient and systematic processing of the weighting process.

According to one embodiment, the method further comprises: providing user parameters quantifying the user's operations on the provided results; for each component in at least a portion of the components, determining values of the user parameter and associated values of a component parameter describing the component; and updating the weights assigned to the components using the determined associations. For example, the component parameters may include at least one of a degree of completeness, a freshness of the data record, an ID of the search engine, and a confidence with which the attribute was identified.

For example, user operations or interactions may be monitored by an activity monitor of the master data management system. In one example, a user operation may be a user clicking on a provided result. The associated values of the user parameters and the component parameters may be provided in the form of a distribution, which may be fitted or modeled to derive the weights. For example, the distribution of click counts relative to various characteristics of the rows representing the data records (e.g., characteristics may indicate from which search engine a data record came, what the confidence of the entity-type detection was, how complete the record was, how fresh the record was, etc.) may be provided and analyzed to find the weights. For example, the embodiment may be performed for each new click, i.e., the distribution may change as each new click is fed back to the system and thus help to reassign the weights. This embodiment may enable the weights used in previous iterations of the method to be updated, and may enable the data management system to keep improving itself based on its own experience with data searches. For example, all weights used in the above embodiments may be updated. In another example, only a portion of the used weights (e.g., the completeness weights) may be updated. Updating the weights may include determining new weights and replacing the used weights with the corresponding new weights. According to this embodiment, the new weights may be determined by monitoring user activity related to the results provided to the user.

According to one embodiment, the method further comprises providing a look-up table associating values of user parameters with values of component parameters, and using the look-up table to update the weights assigned to the components.

According to one embodiment, the method further comprises using a predefined model to model changes in the user parameter values as a function of the values of the component parameters, and using the model to determine updated weights for the components and using the updated weights to update the weights assigned to the components. For example, the predefined model may be configured to receive component parameter values as inputs and output corresponding weights. This may enable accurate weighting techniques in accordance with the present subject matter.
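As one possible realization (the embodiment only requires "a predefined model"), the change of a user parameter such as click share as a function of a component parameter such as record completeness could be fitted with a least-squares polynomial; numpy, the polynomial degree, and the sample data are assumptions.

import numpy as np

# Component-parameter values (record completeness, midpoints of ranges) and the
# corresponding user-parameter values (share of clicks), in the style of FIGS. 6A-6B.
completeness = np.array([0.25, 0.45, 0.55, 0.75, 0.85])
click_share  = np.array([0.01, 0.08, 0.12, 0.25, 0.40])

# Model the change of the user parameter as a function of the component parameter.
model = np.poly1d(np.polyfit(completeness, click_share, deg=2))

def updated_weight(component_value: float) -> float:
    """Predict an updated weight for a component from its parameter value."""
    return float(max(model(component_value), 0.0))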

According to one embodiment, a user operation of the user operations comprises a mouse click on a displayed result of the provided results, wherein the user parameters comprise at least one of a number of clicks, a frequency of clicks, and a duration of accessing a given one of the results. For example, the activity monitor may use click counts and/or may check the time spent on individual results (e.g., the time until a back/resume button is used after a result is clicked) and/or it may check back-and-forth operations on the result set, and the last selected record on which the user spent more time than some threshold may be considered a 'result the user liked'.

According to one embodiment, for each attribute in the set of attributes, the selection rule comprises: for each of the search engines, determining a value of a performance parameter indicative of the performance of the search engine for searching for the value of the attribute; weighting the determined values with the respective current weights; and selecting a search engine having a performance parameter value above a predetermined performance threshold.

For example, in a first or initial execution of the method of this embodiment, the current weights may be set to 1. In another example, if the set of attributes includes three attributes ATT1, ATT2, and ATT3, the performance of each search engine, such as search engine 1 (SE1), may be evaluated. This may result in three performance parameter values per search engine, e.g. Perf_att1_SE1, Perf_att2_SE1, and Perf_att3_SE1 for SE1. The current weights of search engine SE1 may be determined from Perf_att1_SE1, Perf_att2_SE1, and Perf_att3_SE1, resulting in weights W1_SE1, W2_SE1, and W3_SE1. These weights may be used to weight the performance parameter values Perf_att1_SE1, Perf_att2_SE1, and Perf_att3_SE1. To decide whether to select search engine SE1, a combination of the weighted values may be determined, and SE1 may be selected if the combined value (e.g., the average) is above a performance threshold. In another example, each weighted performance value is compared to the performance threshold, and SE1 may be selected only if each of them is above the threshold.
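A sketch of this selection rule for the example above; the performance values, the initial weights of 1, and averaging as the combination are illustrative assumptions.

def select_engine(perf, weights, threshold=0.5):
    """perf: per-attribute performance values of one engine, e.g.
    {"ATT1": Perf_att1_SE1, ...}; weights: the current weights per attribute."""
    weighted = [perf[a] * weights.get(a, 1.0) for a in perf]
    combined = sum(weighted) / len(weighted)   # e.g. the average of the weighted values
    return combined > threshold
    # Stricter variant: return all(v > threshold for v in weighted)

# Example for search engine SE1 and the attributes ATT1..ATT3.
perf_se1 = {"ATT1": 0.7, "ATT2": 0.6, "ATT3": 0.8}
w_se1 = {"ATT1": 1.0, "ATT2": 1.0, "ATT3": 1.0}    # initial weights set to 1
print(select_engine(perf_se1, w_se1))              # True: SE1 is selected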

According to one embodiment, the performance parameter comprises at least one of: the number of results and the degree to which the results match the desired or requested content.

According to one embodiment, the selection rule uses a table that associates attributes with corresponding search engines, and the updating of the selection rule includes: determining, for each engine in the combination of search engines, a value of a user parameter quantifying user operations on the provided results; and identifying values of the user parameter that are less than a predefined threshold using the determined values associated with each search engine in the combination of search engines, and, for each identified value of the user parameter, determining the attribute of the set of attributes and the search engine associated with the identified value, and updating the table using the determined attribute and search engine. In one example, the table initially contains many or all possible combinations of attributes and search engines. For example, after a predetermined period of time, non-performing entries may be removed. For example, the user parameter may be the number of clicks on each of the provided results, i.e., there is a value of the user parameter for each displayed result. These values may be compared to a predetermined threshold (e.g., 10 clicks), and displayed results associated with values less than the threshold may be identified. Each of these identified results was obtained by a given search engine X as a result of searching for one or more attributes, such as attribute T1 of the attribute set. Thus, X and T1 may be used to update the table as described herein.

According to one embodiment, the processing of requests is performed in parallel by a combination of search engines. This may speed up the search process of the present subject matter.

According to one embodiment, the combination of search engines is an ordered list of search engines, wherein processing of the request is performed successively following the ordered list until a minimum number of results is exceeded. This may save processing resources. If the engine selection rules suggest only search engine 1 (SE1), but the actual search did not produce sufficient results, SE2 (the next engine in the ordered list) may be used.
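A minimal sketch of this sequential strategy; the engine objects with a search method and the min_results value are assumptions.

def search_ranked(engines, query, min_results=5):
    """Query the engines in ranked order and stop once enough results exist."""
    results = []
    for engine in engines:                # e.g. [SE1, SE2, ...], best engine first
        results.extend(engine.search(query))
        if len(results) > min_results:    # minimum number of results exceeded
            break
    return results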

According to one embodiment, the provided results include data records that are filtered according to the sender of the request. For example, data control rules are applied after obtaining the matching list for a given data input, providing role-based visibility and applying consent-related filters, thereby respecting privacy while providing better match quality and search flexibility.

According to one embodiment, identifying the set of attributes includes inputting the received request to a predefined machine learning model; receiving a classification of the request from the machine learning model, the classification indicative of a set of attributes.

According to one embodiment, the selection rule comprises: the set of attributes is input to a predefined machine learning model, and one or more search engines from the machine learning model that can be used to search the set of attributes are received.

According to one embodiment, the method further comprises: receiving a training set indicative of different sets of one or more attributes, wherein each attribute set is labeled to indicate a search engine suitable for performing a search of the attribute set; and training a predefined machine learning algorithm using the training set, thereby generating the machine learning model.
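A deliberately simple stand-in for this training step (the embodiment leaves the algorithm open): each labeled training attribute set votes for its engine, and prediction returns the engines seen for a set, most frequent first. A real deployment would use a proper classifier; the engine names and samples are invented.

from collections import Counter, defaultdict

# Training set: attribute sets labeled with a suitable search engine.
training_set = [
    (frozenset({"name", "dob"}), "probabilistic"),
    (frozenset({"name", "dob"}), "probabilistic"),
    (frozenset({"address"}), "free_text"),
    (frozenset({"dob", "phone"}), "free_text"),
]

def train(samples):
    """Count, per attribute set, how often each engine was the right label."""
    votes = defaultdict(Counter)
    for attrs, engine in samples:
        votes[attrs][engine] += 1
    return votes

def predict(model, attrs):
    """Return the engines suitable for this attribute set, most frequent first."""
    return [engine for engine, _ in model[frozenset(attrs)].most_common()]

model = train(training_set)
print(predict(model, {"name", "dob"}))   # ['probabilistic']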

FIG. 1 is a flowchart of a method for accessing data records of a master data management system. The data records include a plurality of attributes.

For example, the master data management system may process records received from client systems and store the data records in a central repository. The client system may communicate with the master data management system, for example, via a network connection including, for example, a Wireless Local Area Network (WLAN) connection, a WAN (wide area network) connection, a LAN (local area network) connection, or a combination thereof.

The data records stored in the central repository may have a predefined data structure, such as a data table having a plurality of columns and rows. The predefined data structure may include a plurality of attributes (e.g., each attribute represents a column of the data table). In another example, the data records may be stored in a graph database as entities having relationships. The predefined data structure may comprise a graph structure, wherein each record may be assigned to a node of the graph. Examples of attributes may be names, addresses, etc.

The master data management system may include a search engine (referred to as the initial search engine) that performs searches of the data records stored in the central repository based on a received search query, using a single technique such as probabilistic structured search. The initial search engine may be well suited for particular types of attributes but not for others. That is, the performance of the initial search engine may depend on the type of the attribute values being searched. For example, the attribute "name" may be searched well by a probabilistic search engine because of nicknames and phonetics, while an address attribute such as a city may work well with a free-text search engine because its values may be partial. To this end, in step 101, the master data management system may be augmented with one or more search engines that have access to the data records of the central repository. This results in multiple search engines, comprising the initial search engine and the added search engines. For example, each search engine of the master data management system may be associated with a respective API through which search queries may be received. This may enable the aggregated search and matching engine to utilize the different capabilities of multiple search and indexing engines based on the type of input data or the type of query being made. Different indexing or search engines have different capabilities, so they work best on different types of input or different requirements.

The master data management system may receive a data request at step 103. For example, the request may be received in the form of a search query. For example, the search query may be used to retrieve attribute values, a collection of attribute values, or any combination thereof. The search query may be, for example, an SQL query. The received request may relate to one or more attributes of the data records of the central repository. This may be performed, for example, by explicitly referencing the attributes in the request and/or indirectly referencing the attributes. For example, the search query may be a structured search in which comparison or range predicates are used to limit the values of certain attributes. The structured search may provide explicit references to attributes. In another example, the search query may be an unstructured search, such as a keyword search that filters out records that do not contain some form of specified keywords. Unstructured searches may indirectly reference attributes. In one example, the received request may include a name, an entity type, and/or a numerical and temporal expression in unstructured format.

Upon receiving the request, in step 105, an entity identifier of the master data management system may be used to identify the set of one or more attributes referenced in the received request. The identification of the attribute set can further include identifying an entity type for each attribute of at least a portion of the attribute set. For example, the received request may be analyzed, e.g., parsed, to search for the attributes whose values are searched. For example, the entity identifier may identify the name and type of an entity and the numerical and temporal expressions in user input entered as unstructured text, and map them with a certain probability to the attributes of the master data management system, which allows them to be used to perform structured searches.

The entity identifier may be, for example, a token identifier that identifies strings, values, pattern names, locations, and the like. For example, the identification of an email address may use the structure ABC@UVW.XYZ. The identification of a telephone number may be based on the fact that a telephone number is a 10-digit number. The identification of a social security number (SSN) may be based on the fact that an SSN has the structure AAA-BB-CCCC.
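The patterns mentioned above translate directly into regular expressions; this sketch is illustrative and far simpler than a production entity identifier.

import re

PATTERNS = {
    "email": re.compile(r"^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}$"),  # ABC@UVW.XYZ
    "phone": re.compile(r"^\d{10}$"),                                          # 10-digit number
    "ssn":   re.compile(r"^\d{3}-\d{2}-\d{4}$"),                               # AAA-BB-CCCC
}

def identify_attribute(token: str):
    """Map an input token to an attribute type, if any pattern matches."""
    for attribute, pattern in PATTERNS.items():
        if pattern.match(token):
            return attribute
    return None

print(identify_attribute("123-45-6789"))   # 'ssn'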

In one example, the entity identifier may use a Machine Learning (ML) model generated by an ML algorithm. The ML algorithm may be configured to read enterprise data, identify/learn portions of the data, and identify attributes. Using the ML model, the entity identifier can determine with a certain probability whether the input text can be a name or address or phone number or SSN, etc. The engine selector may also perform the selection using an ML model generated by an ML algorithm.

Using the identified set of attributes (e.g., and/or associated entity types), the engine selector of the master data management system may select a combination of one or more search engines of the master data management system at step 107. For example, the performance of each search engine of the master data management system may be evaluated to search for the value of each of the attributes. The performance of the search engine may be determined by evaluating performance parameters. The performance parameter may be, for example, the average number of results obtained by the search engine for different values of the search attribute and clicked on or used by the user. The performance parameters may alternatively or additionally include average match scores for results obtained by the search engine for different values of the search attribute and clicked on or used by the user.

Selection of a combination of one or more search engines may be performed using current selection rules. For example, a selection rule may be applied for each given attribute in the set of attributes as follows: for each of the search engines of the master data management system, a value of a performance parameter may be determined, the performance parameter being indicative of performance of the search engine for searching for the value of the given attribute. This may result in multiple values for each search engine in the search engine combination, e.g., if the set of attributes includes two attributes, each search engine may have two performance values associated with the two attributes.

For example, if the set of attributes includes a name and a birth-date attribute, the structured probabilistic search engine may get better results for this input set and may be selected accordingly. In addition, a free-text search engine may be selected, and the request may be executed using the two engines as follows: the free-text search may be performed when no results are found by the probabilistic search engine. In another example, both search engines may be used to execute the request regardless of their respective results. In another example, the set of attributes may include a year of birth and a phone number. In this case, two engines may be selected because the probabilistic search engine can handle the edit-distance value, and the year of birth may be well served by the free-text engine as part of the text of the birth date. If the received request specifically invokes AND or NOT logic, a full-text search engine may be used.
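The heuristics above might be sketched as follows; the engine names, the attribute spellings, and the fallback policy are illustrative assumptions, and real selection rules would also be updated from user feedback.

def choose_engines(attribute_set):
    """Heuristic engine choice paraphrasing the examples in the text."""
    if {"name", "dob"} <= attribute_set:
        return ["probabilistic", "free_text"]   # free text serves as a fallback
    if {"year_of_birth", "phone"} <= attribute_set:
        return ["probabilistic", "free_text"]   # both engines contribute here
    return ["full_text"]                        # e.g. requests using AND/NOT logic

def process(request, attribute_set, engines_by_name):
    """Run the first engine and fall back to the next one if it finds nothing."""
    engines = choose_engines(attribute_set)
    results = engines_by_name[engines[0]].search(request)
    if not results and len(engines) > 1:
        results = engines_by_name[engines[1]].search(request)
    return results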

After selecting the combination of search engines, the request may be processed using the combination of search engines in step 109. For example, the engine selector may decide to process data in parallel or sequentially using a combination of search engines based on pre-established heuristics. A combination of search engines is used to obtain a candidate list based on the rules of the engine selector.

In step 111, at least a portion of the results of processing the request by the combination of search engines may be provided, for example, by a results provider of the master data management system. For example, a row of data records of the result may be displayed on a graphical user interface to enable a user to access one or more data records of the result. For example, the user may perform a user operation on the provided results. The user operation may, for example, comprise a mouse click or touch gesture or another operation that enables the user to access the provided results.

The provided results may include all of the results obtained after the request is processed by the combination of search engines, or may include only a predefined portion of those results. For example, the combined search results from the search engines are aggregated and duplicates are removed, resulting in a candidate list of data records. The resulting candidate list of data records may be scored, for example, using multiple scoring engines of the master data management system. For example, depending on the attributes, a scoring function may or may not be available. Since a PME-based scorer may not be able to score all types of entities (e.g., contract-type data), multiple scoring engines are used. Of all the results obtained, one set of results might go to one scorer, while another set might go to some other scoring engine. The invocation of these scoring engines may be done in parallel to improve efficiency.
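The parallel invocation of several scoring engines could look like the following sketch; the scorer objects, their score method, and the grouping of results by scorer are assumptions.

from concurrent.futures import ThreadPoolExecutor

def score_in_parallel(result_groups, scorers):
    """result_groups: {scorer_name: [records]}; each group goes to the scoring
    engine able to handle its entity type, and the calls run concurrently."""
    scored = {}
    with ThreadPoolExecutor() as pool:
        futures = {
            name: pool.submit(scorers[name].score, records)
            for name, records in result_groups.items()
        }
        for name, future in futures.items():
            scored[name] = future.result()   # e.g. {record_id: match_score}
    return scored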

Based on the user operations performed on the provided results, the selection rule may be updated in step 113. The updated selection rule becomes the current selection rule and can thus be used for further data requests received by the master data management system. For example, steps 105 to 113 may be repeated upon receiving a request subsequent to the request received in step 103, and during this repetition the updated selection rules may be used in the selection step 107.

For example, the selection rules are initially based primarily on the capabilities/applicability of the search engines corresponding to a given set of attributes, but the selection rules continue to improve based on, for example, user clicks, feedback, and the results (quality and performance) of searches conducted so far. An alternative search engine may also be dynamically selected if the previously selected search engine did not deliver results.

FIG. 2 is a flowchart of a method for providing the search results of a set of one or more search engines. The method of FIG. 2 may be applied, for example, to the data management system of FIG. 1 (e.g., FIG. 2 may provide details of step 111 of FIG. 1) or may be applied to other search systems.

For example, the set of search engines may process search requests for data, and the search results may include data records, for example. In step 201, each data record of the result may be associated with or assigned a match score. The match scores may be obtained by one or more scoring engines. For example, the matching scores for the data records of the results may be obtained by one or more scoring engines. In the case of more than one scoring engine, the match score may be a combination (e.g., average) of the match scores obtained by the more than one scoring engine. In one example, of all the results obtained, one set of results may be processed by one scoring engine, while another set may be processed by some other scoring engine. At least a portion of the one or more scoring engines used to score the results of a given search engine may or may not be part of the given search engine.

For example, each search engine in the set of search engines may include a scoring engine configured to score the results of the respective search engine. In another example, one or more common scoring engines may be used to score the results obtained by the set of search engines. For example, each search engine in the set of search engines may be configured to connect to a scoring engine and receive the scores of the data records from that scoring engine.

The match scores may be weighted in step 203. The weighting of the match scores may be performed according to the performance of the components involved in producing the results. For example, to generate search results, a search process is performed. The search process may include process steps performed by a system element, such as a search engine, to obtain search results. The search process may thus have components that are process steps, system elements, and search results. Each of these components of the search process may have its own capabilities to perform the corresponding function. The performance of a component indicates how well the component performs its function or task. The performance of each component may be quantified by evaluating the corresponding performance parameter. This performance may affect the search results. In other words, each component of the search process contributes or impacts the quality of the obtained search results. At least a portion of these contributions may be considered by determining and assigning weights to at least a portion of the components of the search process. The weight assigned to a component may indicate (e.g., be proportional to) the performance of the component, e.g., the weight may be 0.8 if the efficiency of the method steps used to identify the attribute is 80%. In one example, a weight may be assigned to each of the components of the search process. In another example, portions of the components of the search process may be selected or identified (e.g., by the user), and those identified components may be associated with respective weights. In one example, the weights may be user-defined weights. The weighting step may result in each data record of the search results being associated with a weight of a component of the search process that resulted in the data record. The matching score of the data record may be weighted by a combination of its associated weights, e.g., the combination may be a product of the weights.

Using the weighted match score, the results may be provided in step 205 by removing duplicate data records of the results and retaining non-duplicate data records of the results having a weighted match score above a predefined score threshold. For example, the results may be displayed on a user interface, e.g., the user may see a list of rows, each row associated with a data record of the provided results.

The provided results may be operated on or used by the user. For example, the user may perform user operations on the provided results. These user operations may be monitored, for example, by an activity monitor. For example, after the list of results is shown to the user on the user interface, the activity monitor may track the user's clicks on the shown results. A click on a result row may be taken as an indication that this row is the one the user was looking for.

The user operations may be processed and analyzed, e.g., in optional step 207. For example, the distribution of click counts relative to various characteristics of the data records (e.g., from which engine a record came, what the confidence of the entity-type detection was, how complete the record was, how fresh the record was, etc.) may be analyzed. This data is captured to find correlations, and weights are calculated accordingly, based on a look-up table or derived from an equation predicted by an ML-based regression model. Thus, as each new click is fed back into the system, the distribution can change and thus help to reassign the weights. The calculated weights may be used in step 209 to update the weights used to obtain the search results; e.g., the calculated weights may replace the corresponding weights used to obtain the search results. The updated weights may then be used when providing further search results for further search requests.

FIG. 3 is a flowchart of a method for providing the search results of multiple search engines. The method of FIG. 3 may be applied, for example, to the data management system of FIG. 1; e.g., FIG. 3 may provide details of step 111 of FIG. 1. For clarity, FIG. 3 is described with reference to two search engines 1 and 2, a set of five attributes, and the examples in FIGS. 4A-4F. One search engine implements probabilistic search and the other implements free-text search. Further assume that the received request or input token is given as a name plus a date of birth (Name + DOB), and that the entity identifier identifies the first token as a name with 90% confidence, which is sent to search engine 1, and the second token as a DOB with 60% confidence, which is sent to search engine 2.

In this example, the components of the search process, such as the one performed by the method of FIG. 1, may include the search engines, the identifying step 105, and the results. Examples of the resulting data records R1 through R6 are provided in tables 401 and 402 of FIG. 4A. The results R1 through R6 of the two search engines are aggregated and their match scores normalized, resulting in the match scores of table 403.

In step 301, an engine weight may be assigned to each of the search engines. An example of engine weights is shown in FIG. 4B. For example, an initial weight of 0.5 may be assigned to search engine 1 and search engine 2.

In step 303, each attribute of the set of five attributes (name, DOB, address, identifier, and email) is assigned an attribute weight indicating the confidence level with which the attribute is identified. The attribute weights shown in FIG. 4C may be an initial set of weights that may be updated after the search request is performed. For example, as shown in FIG. 4C, the attribute weight is 0.1 for the attribute name identified with a confidence between 0% and 10%. In one example, the value of the confidence level may be used directly to obtain the attribute weight, e.g., if the confidence level is less than 10%, the attribute weight may be equal to 0.1. However, other weight determination methods may be used.

In step 305, a weight indicating the completeness of a data record and a weight indicating the freshness of a data record may be assigned to each data record of the result. FIG. 4D is a table illustrating example values of the completeness weight for a given data record. The completeness weights shown in FIG. 4D may be an initial set of weights that may be updated after the search request is performed. For example, as shown in FIG. 4D, the completeness weight of a given data record may be provided as a function of the completeness of the data record. For example, for a completeness between 10% and 20%, the completeness weight is 0.2. In one example, the value of the completeness may be used directly to obtain the completeness weight, e.g., if the completeness is less than 10%, the completeness weight may be equal to 0.1. However, other example weighting methods may be used.

The table of FIG. 4E shows example values of the freshness weight for a given data record. The freshness weights shown in FIG. 4E may be an initial set of weights that may be updated after the search request is performed. For example, as shown in FIG. 4E, a freshness weight for a given data record may be provided based on the freshness of the data record. For example, for data records with a freshness between 3 and 5 years, the freshness weight is 0.8. However, other example weighting methods may be used.

For each data record of the results, the corresponding engine weight, attribute weight, completeness weight, and freshness weight may be combined in step 307, and the score of the data record may be weighted by the combined weight. The combined weight may be, for example, the product of the four weights. The final result scores, i.e., the weighted scores, are shown in the table of FIG. 4F. Using the final scores, the results may be filtered before being provided to the user. For example, only the data records R1, R2, and R6 may be provided to the user if their final scores are above a first threshold. The table of FIG. 4F shows that for records R1, R2, and R3, the engine weight Wa is 0.5 because they come from engine 1, and for records R4, R5, and R6, the engine weight Wa is 0.5 because they come from engine 2. For records R1, R2, and R3, the attribute weight Wb (associated with the name attribute) is 0.9 because they are the result set of the entity identifier identifying the name attribute with 90% confidence. The attribute weight Wb (associated with the DOB attribute) for R4, R5, and R6 is 0.6 because they are the result set of the entity identifier identifying the DOB with 60% confidence. The completeness weight Wc is based on the completeness of each record; for example, R1 is 80% complete, so its completeness weight is 0.8. The freshness weight Wd is based on the freshness of each record; for example, R1 is fresh, i.e., its last modified date is less than 1 year old, so its freshness weight is 1. The final score may be obtained as follows: final score = initial normalized score × (A × Wa) × (B × Wb) × (C × Wc) × (D × Wd), where A, B, C, and D are weights assumed to be 1 for simplicity.
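
As a worked illustration, the following sketch computes the weighted score of record R1 under the product rule above; the normalized score of 1.0 is an assumed value, since the exact normalized scores appear only in FIG. 4A.

```python
def final_score(norm_score, wa, wb, wc, wd, A=1.0, B=1.0, C=1.0, D=1.0):
    """final score = initial normalized score x (A*Wa) x (B*Wb) x (C*Wc) x (D*Wd)."""
    return norm_score * (A * wa) * (B * wb) * (C * wc) * (D * wd)

# Record R1: engine 1 (Wa = 0.5), name identified with 90% confidence
# (Wb = 0.9), 80% complete (Wc = 0.8), less than 1 year old (Wd = 1.0).
# The normalized score of 1.0 is an assumption, not a value from FIG. 4A.
print(final_score(1.0, wa=0.5, wb=0.9, wc=0.8, wd=1.0))  # -> 0.36
```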

FIG. 5 is a flow diagram of a method for updating the weights used to weight the match scores of the data records in the results produced by multiple search engines processing a search request. For simplicity, FIG. 5 depicts the updating of the completeness weight; however, the weight update method may be used for the other weights as well. FIG. 5 may be described with reference to the example of FIG. 4.

When the results are provided to the user, the activity monitor may monitor, in step 501, the user operations performed on the provided results. For example, the activity monitor may count the number of clicks that have been performed on each data record displayed to the user. This may result in the table of FIG. 6A. FIG. 6A is a table illustrating the number of clicks performed by the user for different completeness values of the data records. For example, the user performs a mouse click on a row representing a data record with 80% completeness.

At step 503, the results of the monitored operations, as shown in FIG. 6A, may be processed or analyzed to find updated completeness weights. To this end, a look-up table as shown in FIG. 6B may be generated. The look-up table includes an association between the completeness ranges used for weighting (see FIG. 4D) and the percentage of clicks performed by the user on data records whose completeness falls in the listed range. In this example, the data shows that the user has almost never clicked on records that are less than 30% complete, while 40% of the clicks occur on records that are more than 80% complete. A new record with 60% completeness would be given a weight proportional to the 12% entry in the look-up table. For example, for data records with a completeness between 50% and 60%, the click fraction obtained from the tables of FIGS. 6A-6B is 12%; the completeness weight for the 50% to 60% completeness range would therefore become 0.12 instead of the initial weight of 0.6 (of FIG. 4D).
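
A minimal sketch of this table-driven replacement follows; only the 12% and 40% figures are taken from the text above, the remaining numbers and the range labels are illustrative.

```python
# Initial completeness weights per range (FIG. 4D style) and the click
# fractions derived from a FIG. 6B style look-up table.
initial_weights = {"0-30%": 0.2, "50-60%": 0.6, "80-100%": 0.9}
click_fractions = {"0-30%": 0.01, "50-60%": 0.12, "80-100%": 0.40}

# The click fraction simply replaces the initial weight for each range.
updated_weights = {rng: click_fractions.get(rng, w)
                   for rng, w in initial_weights.items()}
print(updated_weights["50-60%"])  # 0.12 instead of the initial 0.6
```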

In another example, as illustrated in FIG. 6C, the analysis of the user operations may be performed by modeling the click fraction as a function of the completeness. An example model 601 is shown in FIG. 6C. The model 601 may be used to determine the updated weight for a given value of the completeness. The model 601 is described by an equation that may be predicted by an ML-based regression model.
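
The regression alternative might look like the following sketch, which fits a simple polynomial to (completeness, click fraction) pairs and reads the updated weight off the fitted curve; the sample points are invented for illustration.

```python
import numpy as np

# Fit the click fraction as a function of completeness; points are invented.
completeness   = np.array([10.0, 30.0, 50.0, 70.0, 90.0])   # percent
click_fraction = np.array([0.01, 0.03, 0.12, 0.25, 0.40])

model = np.poly1d(np.polyfit(completeness, click_fraction, deg=2))
print(float(model(60)))  # predicted updated weight for a 60%-complete record
```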

The result of the method may be updated weights that can be used in place of the initial weights provided, for example, in FIG. 4, and the updated weights may be used to weight the match scores of the data records resulting from the execution of a new search request.

FIG. 7 depicts a block diagram representation of a computer system 700 according to an example of the present disclosure. The computer system 700 may, for example, be configured to perform master data management. The computer system 700 includes a master data management system 701 and one or more client systems 703. The client systems 703 may access data sources 705. The master data management system 701 may control access (read and write access, etc.) to a central repository 710. The master data management system 701 may utilize index data 711 to process fuzzy searches.

The master data management system 701 may process data records received from the client systems 703 and store the data records in the central repository 710. The client systems 703 may obtain data records, for example, from different data sources 705. The client systems 703 may communicate with the master data management system 701 via a network connection including, for example, a wireless local area network (WLAN) connection, a wide area network (WAN) connection, a local area network (LAN) connection, or a combination thereof.

The master data management system 701 may also be configured to process requests or queries for accessing data stored in the central repository 710. For example, a query may be received from a client system 703. The master data management system 701 includes an entity identifier 721 for identifying attributes or entities in a received data request. The entity identifier 721 may, for example, identify entities, numbers, names, and temporal expressions in user input entered as unstructured text and map them, with a certain probability or confidence, to attributes of the data records stored in the central repository 710, which allows them to be used to perform structured searches over those attributes. For example, the entity identifier 721 may be a token recognizer that identifies strings, values, or patterns; e.g., a token of the form ABC@UVW.xyz should be an e-mail address, a 10-digit number should be a telephone number, and a number following the AAA-BB-CCCC structure should be an SSN. The entity identifier 721 may be configured to use machine learning models to classify or map input data to attributes of data records stored in the central repository 710. The master data management system 701 further includes an engine selector 722 for selecting one or more engines suitable for executing the received search request. The engine selector 722 may decide to use one or more engines to process data in parallel or sequentially based on pre-established heuristics. For example, the rules initially used to select an engine are primarily based on the capabilities and applicability of each engine for a given set of attributes and entity types. After the initial processing of the first request, the engine selector keeps refining its rules based on the user's clicks, feedback, and the results (quality and performance) of the searches made so far. The engine selector 722 may also dynamically select an alternative engine if the previous selection of search engines did not deliver results. Based on the rules of the engine selector 722, multiple search engines may be selected and used to obtain a good candidate list. Search results from all engines are aggregated and duplicates are removed. The resulting candidate list is then scored. Multiple scoring engines are used; depending on the attributes, a given scoring function may or may not be available. In addition to the PME-based scorer, other scoring engines are used to score the search results. For example, of all the results obtained, one set of results might go to one scorer while another set might go to some other scoring engine. The invocation of these engines may be done in parallel to improve efficiency.
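
A hedged sketch of such pattern-based token recognition follows; the regular expressions, dictionary, and function name are illustrative assumptions, not the API of any particular product.

```python
import re

# Illustrative token patterns: e-mail, 10-digit phone, AAA-BB-CCCC SSN.
PATTERNS = {
    "email": re.compile(r"^[\w.+-]+@[\w-]+\.[\w.-]+$"),
    "phone": re.compile(r"^\d{10}$"),
    "ssn":   re.compile(r"^\d{3}-\d{2}-\d{4}$"),
}

def classify_token(token):
    """Return the attribute name whose pattern the token matches, or None."""
    for attribute, pattern in PATTERNS.items():
        if pattern.match(token):
            return attribute
    return None

print(classify_token("123-45-6789"))  # -> 'ssn'
print(classify_token("abc@uvw.xyz"))  # -> 'email'
```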

The master data management system 701 also includes a weight provider and results aggregator 723 for weighting and aggregating the results obtained by the search engines. Once all scorers have completed scoring, the aggregation of the results may be based on a weighted average of the scores.

The weights are derived and refined over a period of time by finding correlations between the characteristics of the search patterns and result sets and the quality of the match. The analyzer may use machine learning to identify these correlations. The characteristics of a result set used in the analysis may include (but are not limited to) at least one of: the matching engine used to obtain the score (e.g., a particular scoring engine may have a wider score range or be less reliable than other scoring engines); the certainty with which the entity identifier detected the input data type; the completeness of the record, e.g., indicating how many fields are filled; and the freshness of the data (last update date). The weights are a set of numbers used to modify the scores of the result set. The quality of the match is indicated by an analysis of the user clicks: a click on a displayed result indicates that the user considers it a better match. The quality of the match may also be based on explicit feedback on match quality collected via the UI. The analysis of the correlations is fed back to improve the weight provider 723. The weights are used to aggregate the results obtained by the search engines; records then continue to the next stage based on a comparison of their weighted scores with a threshold.
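
The weighted-average aggregation might look like the following sketch; the scorer names and weight values are illustrative.

```python
def aggregate_score(scores, weights):
    """Weighted average of the scores a record received from its scorers."""
    total = sum(weights[e] for e in scores) or 1.0
    return sum(scores[e] * weights[e] for e in scores) / total

# A record scored by two scoring engines (illustrative names and values).
print(aggregate_score({"pme": 0.8, "fuzzy": 0.6}, {"pme": 0.7, "fuzzy": 0.3}))
# -> 0.74
```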

The master data management system 701 also includes various APIs for allowing storage of and access to data in the central repository 710. For example, the master data management system 701 includes a create, read, update and delete (CRUD) API 724 for enabling access to data, such as storing new data records in the central repository 710. The master data management system 701 also includes APIs associated with the search engines it includes. FIG. 7 shows two such APIs for exemplary purposes, namely a structured search API 725 and a fuzzy search API 726.

The master data management system 701 also includes components that can filter the results to be provided to a user. For example, the master data management system 701 includes a component 727 for applying visibility rules and another component 728 for applying consent management. The master data management system 701 further includes a component 729 for applying standardization rules to data to be stored in the central repository 710. Filtering may be advantageous because data security and privacy are of paramount importance in a master data management solution. While a full-text search casts a wide net to find matches, it must be ensured that this broad reach remains inside the system and that information is not inadvertently disclosed to an unauthorized user. To this end, multiple filters check whether the querying user has access to the returned fields and whether each resulting record has the necessary associated consent from the data owner for the processing purpose stated by the user. Filtering is performed at a late stage of the search process to allow proper matching against all possible attributes. The result of the filtering may be a list of records in descending order of match score, including only those records for which the required consent is given, with only those columns that are permitted or visible to the user who initiated the search.
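
A sketch of such late-stage filtering is given below; the callbacks field_visible and has_consent stand in for the visibility-rule component 727 and consent-management component 728 and are assumptions, not a real API.

```python
def filter_results(records, user, purpose, field_visible, has_consent):
    """Drop records lacking the data owner's consent for the stated purpose
    and hide fields the querying user may not see, then sort by score."""
    kept = []
    for rec in records:
        if not has_consent(rec, purpose):
            continue  # consent filter: record may not be used for this purpose
        # Visibility filter: keep only the columns this user is allowed to see
        # (the score field is assumed to always be visible).
        visible = {f: v for f, v in rec.items()
                   if f == "score" or field_visible(user, f)}
        kept.append(visible)
    return sorted(kept, key=lambda r: r.get("score", 0.0), reverse=True)
```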

The master data management system 701 also includes an indexing, matching, scoring, and linking service 730. Each client system 703 may include an administrative search user interface (UI) 741 for submitting search queries on the data in the central repository 710. Each client system may also include services such as a messaging service 742 and a bulk load service 743.

The operation of the computer system 700 will be described in detail with reference to fig. 8.

FIG. 8 depicts a flowchart of a method describing example operations of the master data management system 701. In block 801, a free-text search may be entered in a browser, which may, for example, be an instance of the administrative search UI 741. The entity identifier 721 may receive (block 802) the free-text search request and may process the received request as described herein, e.g., with reference to FIG. 1, to identify attributes or entities. The engine selector 722 may then be used (block 803) to select the search engines appropriate for the identified attributes. As illustrated in FIG. 8, two search engines (blocks 804 and 805) are selected and used to execute the received search request. The results of the execution of the search request may be scored using the matching and scoring service of the master data management system 701 (block 806). Scoring may also use additional scoring mechanisms (block 807). The results are then aggregated and the scores are normalized (block 808). Some filters may be applied before the results are provided to the user (block 809). The filters may include, for example, at least one of visibility filters, consent-based data filters, and custom filters. The filtered results are then displayed in a browser (e.g., the browser in which the free-text search was entered) (block 810). The displayed results may be monitored (block 811) and analyzed by the user click and quality feedback analyzer. For example, the analyzer may use a machine learning model to determine weights based on the user's operations on the results. The weights may be used to update the engine selector 722 and the weight provider 723, as indicated by arrows 812 and 813. The weights provided by the weight provider 723 may then be used in the scoring block 808 in the next iteration of the method.

FIG. 9 depicts a diagram illustrating an example of processing a request according to the present subject matter. The first column 901 of FIG. 9 shows example contents of a received request or input tokens. For example, the received request may include "Robert", "Bangalore", and the number "123-45-6789". The second column 902 shows the result of the entity identification when processing the received request. For example, "Robert" is identified as a name attribute, "Bangalore" is identified as an address attribute, and the number "123-45-6789" is identified as an SSN attribute. Columns 902 and 904 indicate that the engine selector has selected the search engine "search engine 1" for processing the token "Robert", the search engine "search engine 2" for processing the token "Bangalore", and both search engines "search engine 1" and "search engine 2" for processing the token "123-45-6789". The results of processing the request are processed, e.g., aggregated, before being provided, as shown in column 905. For example, column 905 shows that the search engine "search engine 1" has found records R1, R2, and R3 when searching for "Robert", that the search engine "search engine 2" has found records R4 and R5 when searching for "Bangalore", and that the search engines "search engine 1" and "search engine 2" have found records R6 and R7, respectively, when searching for "123-45-6789". The results R1-R7 may need to be filtered using data control filters, as shown in column 906, before being provided to the user. After filtering, the results may be output to the user as shown in column 907. The date-of-birth values are filtered out of the records R1-R7, as shown in column 907, because the user who submitted the request is not allowed to access them.

It is to be understood that one or more of the above-described embodiments of the invention may be combined, as long as the combined embodiments are not mutually exclusive.

Various embodiments are specified in the following embodiments.

1. A method for accessing a data record of a primary data management system, the data record including a plurality of attributes, the method comprising:

augmenting the master data management system with one or more search engines to enable access to the data records;

receiving a request for data at a primary data management system;

identifying a set of one or more attributes of the plurality of attributes that are referenced in the received request;

selecting a combination of one or more of the search engines of the primary data management system whose performance for searching for values of at least a portion of the set of attributes satisfies a current selection rule;

processing the request using a combination of search engines;

providing at least a portion of the results of the processing.

2. The method of clause 1, further comprising updating the selection rule based on a user operation on the provided results, the updated selection rule becoming the current selection rule, and repeating the identifying, selecting, processing, and providing steps using the current selection rule upon receipt of another data request.

3. The method of clause 1, wherein the results comprise data records of the master data management system associated with respective match scores obtained by a scoring engine of the search engine, the method further comprising weighting the match scores according to performance of a component involved in providing the results, the component comprising method steps, elements for providing the results, and at least a portion of the results, wherein the provided results comprise non-duplicate data records having weighted match scores above a predefined score threshold.

4. The method of clause 3, the component comprising a search engine, an identifying step, and a result, the method further comprising:

assigning an engine weight to each of the search engines;

assigning an attribute weight to the set of attributes, wherein an attribute weight for an attribute indicates a confidence level at which the attribute is identified;

assigning a weight indicative of a completeness of the data record and a weight indicative of a freshness of the data record to each data record of the result;

combining, for each data record of the results, the corresponding engine weight, attribute weight, completeness weight, and freshness weight, and weighting the score of the data record by the combined weight.

5. The method of clause 4, further comprising:

providing user parameters quantifying user operations;

for each component in at least a portion of the components, determining a value of the user parameter and an associated value of a component parameter describing the component; and updating the weights assigned to the components using the determined associations.

6. The method of clause 5, further comprising providing a lookup table that associates values of user parameters with values of component parameters, and updating the weights assigned to the components using the lookup table.

7. The method of clause 5, further comprising modeling changes in the values of the user parameters with values of the component parameters using a predefined model, and using the model to determine updated weights for the components, and using the updated weights to update the weights assigned to the components.

8. The method of clause 5, wherein the user operations include an indication of a result selection, the indication including a mouse click on a displayed one of the provided results, wherein the user parameters include at least one of a number of clicks, a frequency of clicks, and a duration of accessing a given one of the results.

9. The method of clause 1, wherein the results include data records of the master data management system associated with respective match scores obtained by a scoring engine of the search engine, wherein the provided results include non-duplicate data records having match scores above a predefined score threshold.

10. The method of clause 1, wherein, for each attribute in the set of attributes, the selection rule comprises:

for each of the search engines, determining a value of a performance parameter indicative of performance of the search engine for searching for the value of the attribute;

selecting a search engine having a performance parameter value above a predetermined performance threshold.

11. The method of clause 10, wherein the performance parameters include at least one of: the number of results and the degree to which the results match the expectations.

12. The method of clause 10, wherein the selection rule uses a table that associates attributes to corresponding search engines, the updating of the selection rule comprising:

determining, for each search engine in the combination of search engines, a value of a user parameter that quantifies the user operations on the provided results; and

identifying values of the user parameters that are less than a predefined threshold using the determined values associated with each search engine in the combination of search engines, and for each identified value of the user parameter, determining the attributes of the set of attributes and the search engine associated with the identified value, and updating the table using the determined attributes and search engines.

13. The method of clause 1, wherein the processing of the request is performed in parallel by a combination of the search engines.

14. The method of clause 1, wherein the combination of search engines is a ranked list of search engines, wherein the processing of the request is performed sequentially following the ranked list until a minimum number of results is exceeded.

15. The method of clause 1, wherein identifying the set of attributes comprises inputting the received request to a predefined machine learning model; receiving a classification of the request from the machine learning model, the classification indicative of the set of attributes.

16. The method of clause 1, further comprising inputting the set of attributes to a predefined machine learning model, and receiving from the machine learning model one or more search engines that can be used to search the set of attributes.

17. The method of clause 16, further comprising: receiving a training set indicative of a different set of one or more training attributes, wherein each training attribute set is labeled to indicate a search engine suitable for executing the training attribute set; training a predefined machine learning algorithm using the training set, thereby generating the machine learning model.

18. The method of clause 1, wherein the provided results include data records that are filtered according to the sender of the request.

19. A method for providing search results of a search engine according to a predefined search process, the method comprising

receiving results of a search request obtained by the search engine, each of the results being associated with a match score;

for each of the results, determining a set of one or more components of the search process involved in providing the result, and assigning a predefined weight to each component of the set of components;

weighting the match scores using the weights;

providing results having a weighted match score above a predefined score threshold.

20. The method of clause 19, further comprising:

analyzing the user operation on the provided result by evaluating a user parameter quantifying the user operation;

for each component in at least a portion of the set of components, determining one or more values of a component parameter describing the component and an associated value of the user parameter; determining updated weights using the determined associations; and

replacing the weights assigned to the at least a portion of the components with the determined weights;

repeating the method for further received search results using the updated weights.

21. The method of clause 20, further comprising providing a table associating values of user parameters with values of component parameters, and using the table to update the weights assigned to the components.

22. The method of clause 20, further comprising modeling the associations between the values using a predefined model, and using the model to determine updated weights for the components, and using the updated weights to update the weights assigned to the components.

23. The method of clause 20, wherein the user operations comprise mouse clicks on displayed ones of the provided results, wherein the user parameters comprise at least one of a number of clicks, a frequency of clicks, and a duration of access to a given one of the results.

The present invention may be a system, method and/or computer program product. The computer program product may include a computer-readable storage medium (or media) having computer-readable program instructions thereon for causing a processor to perform various aspects of the present invention.

The computer readable storage medium may be a tangible device capable of retaining and storing instructions for use by the instruction execution device. The computer readable storage medium may be, for example, but is not limited to, an electronic memory device, a magnetic memory device, an optical memory device, an electromagnetic memory device, a semiconductor memory device, or any suitable combination of the foregoing. A non-exhaustive list of more specific examples of the computer-readable storage medium includes the following: a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a Static Random Access Memory (SRAM), a portable compact disc read-only memory (CD-ROM), a Digital Versatile Disc (DVD), a memory stick, a floppy disk, a mechanical coding device such as a punch card or a raised pattern in a groove with instructions recorded thereon, and any suitable combination of the foregoing. A computer-readable storage medium as used herein should not be interpreted as a transitory signal per se, such as a radio wave or other freely propagating electromagnetic wave, an electromagnetic wave propagating through a waveguide or other transmission medium (e.g., optical pulses through a fiber optic cable), or an electrical signal transmitted through a wire.

The computer-readable program instructions described herein may be downloaded from a computer-readable storage medium to a corresponding computing/processing device, or to an external computer or external storage device via a network, such as the internet, a local area network, a wide area network, and/or a wireless network. The network may include copper transmission cables, optical transmission fibers, wireless transmissions, routers, firewalls, switches, gateway computers and/or edge servers. The network adapter card or network interface in each computing/processing device receives computer-readable program instructions from the network and forwards the computer-readable program instructions for storage in a computer-readable storage medium within the respective computing/processing device.

Computer readable program instructions for carrying out operations of the present invention may be assembler instructions, Instruction Set Architecture (ISA) instructions, machine-related instructions, microcode, firmware instructions, state setting data, or source code or object code written in any combination of one or more programming languages, including an object oriented programming language such as Smalltalk, C++, or the like, and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The computer-readable program instructions may execute entirely on the computer of the user computer system, partly on the computer of the user computer system, as a stand-alone software package, partly on the computer of the user computer system and partly on a remote computer, or entirely on the remote computer or server. In the latter scenario, the remote computer may be connected to the user's computer system through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider). In some embodiments, to perform aspects of the present invention, an electronic circuit comprising, for example, a programmable logic circuit, a field programmable gate array (FPGA), or a programmable logic array (PLA), may be personalized by executing computer-readable program instructions with state information of the computer-readable program instructions.

Aspects of the present invention are described herein with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer-readable program instructions.

These computer-readable program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks. These computer-readable program instructions may also be stored in a computer-readable storage medium that can direct a computer, a programmable data processing apparatus, and/or other devices to function in a particular manner, such that the computer-readable storage medium having stored therein the instructions comprises an article of manufacture including instructions which implement aspects of the function/act specified in the flowchart and/or block diagram block or blocks.

The computer readable program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other devices to cause a series of operational steps to be performed on the computer, other programmable apparatus or other devices to produce a computer implemented process such that the instructions which execute on the computer, other programmable apparatus or other devices implement the functions/acts specified in the flowchart and/or block diagram block or blocks.

The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of instructions, which comprises one or more executable instructions for implementing the specified logical function(s). In some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
