Data caching method and device for distributed storage

Document No.: 948184. Published: 2020-10-30.

Abstract: Data caching method and device for distributed storage, designed by 陈东河 on 2020-07-10. The invention discloses a data caching method and device for distributed storage. The method comprises: in response to a data read-write access, determining whether the cache hits the data related to the access; in response to a cache hit, providing data reading and writing based on the cache and updating the cache heat table according to the access; in response to a cache miss, further determining whether the cache space is sufficient; in response to the cache space being sufficient, loading the related data into the cache, providing data reading and writing based on the cache, and updating the cache heat table according to the access; and in response to the cache space being insufficient, determining the heat of all data in the cache according to the cache heat table and continuously deleting low-heat cached data until the cache space is sufficient. The invention can improve the cache hit rate, evict hot data whose heat has expired when the cache is insufficient, and clear cache pollution.

1. A data caching method for distributed storage is characterized by comprising the following steps:

judging whether the cache hits the data related to the data read-write access or not in response to the occurrence of the data read-write access;

providing data reading and writing based on the cache in response to the cache hit, and updating a cache heat table according to the data reading and writing access;

further determining whether the cache space is sufficient in response to the cache miss;

in response to the cache space being sufficient, loading the related data into the cache, providing data reading and writing based on the cache, and updating the cache heat table according to the data read-write access;

and in response to the cache space being insufficient, determining the heat of all data in the cache according to the cache heat table, and continuously deleting low-heat cached data until the cache space is sufficient.

2. The method of claim 1, wherein the cache heat table records a last access time and a last access interval for all data, the last access interval being the difference between the last access time and the penultimate access time;

updating the cache heat table according to the data read-write access comprises: overwriting the last access interval of the related data with the difference between the timestamp of the data read-write access and the last access time of the related data recorded in the cache heat table, and then overwriting the last access time of the related data with the timestamp of the data read-write access.

3. The method of claim 2, wherein determining the heat of all data in the cache according to the updated cache heat table comprises performing the following steps for all data in the cache: determining a current time and determining an elapsed time based on a difference between the current time and the last access time; determining a heat based on a quotient of the last access interval and the elapsed time.

4. The method of claim 3, further comprising: in response to particular cached data having been accessed only once and thus having no penultimate access time, directly setting its heat to 1.

5. The method of claim 2, further comprising: writing the heat into the updated cache heat table in response to determining the heat of all data in the cache from the updated cache heat table.

6. A data caching apparatus for distributed storage, comprising:

a processor; and

a memory storing program code executable by the processor, the program code when executed performing the steps of:

judging whether the cache hits the data related to the data read-write access or not in response to the occurrence of the data read-write access;

providing data reading and writing based on the cache in response to the cache hit, and updating a cache heat table according to the data reading and writing access;

further determining whether the cache space is sufficient in response to the cache miss;

in response to the cache space being sufficient, loading the related data into the cache, providing data reading and writing based on the cache, and updating the cache heat table according to the data read-write access;

and in response to the cache space being insufficient, determining the heat of all data in the cache according to the cache heat table, and continuously deleting low-heat cached data until the cache space is sufficient.

7. The apparatus of claim 6, wherein the cache heat table records a last access time and a last access interval for all data, the last access interval being the difference between the last access time and the penultimate access time;

updating the cache heat table according to the data read-write access comprises: overwriting the last access interval of the related data with the difference between the timestamp of the data read-write access and the last access time of the related data recorded in the cache heat table, and then overwriting the last access time of the related data with the timestamp of the data read-write access.

8. The apparatus of claim 7, wherein determining the heat of all data in the cache according to the updated cache heat table comprises performing the following steps for all data in the cache: determining a current time and determining an elapsed time based on a difference between the current time and the last access time; determining a heat based on a quotient of the last access interval and the elapsed time.

9. The apparatus of claim 8, wherein the steps further comprise: in response to particular cached data having been accessed only once and thus having no penultimate access time, directly setting its heat to 1.

10. The apparatus of claim 7, wherein the steps further comprise: writing the heat into the updated cache heat table in response to determining the heat of all data in the cache from the updated cache heat table.

Technical Field

The present invention relates to the field of distributed storage, and in particular, to a data caching method and apparatus for distributed storage.

Background

A cache is a hardware or software component that stores data so that the same data can be accessed more quickly later. It mainly addresses the performance of hot-data access in high-concurrency, large-data scenarios by providing fast, high-performance data access. Caching is a principal technique for improving the read-write performance of distributed storage; the cache hit rate is an important metric affecting that performance, and how to raise it is a central research question in the field.

The LRU (least recently used) algorithm is a widely used cache algorithm. It evicts data according to the data's access history; its core idea is that "if data has been accessed recently, it is more likely to be accessed in the future." Its advantage is that it is simple to implement, and it performs well when hot-spot data exists. However, sporadic or periodic batch operations sharply reduce the LRU hit rate and cause serious cache pollution.
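The pollution effect described above can be illustrated with a minimal sketch. The `LRUCache` class, its capacity of 3 and the key names are hypothetical choices made for illustration only; they are not taken from the invention.

```python
from collections import OrderedDict


class LRUCache:
    """Minimal LRU cache keyed by name; stored values are omitted for brevity."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.items = OrderedDict()

    def access(self, key):
        """Touch `key`; return True on a hit and False on a miss."""
        if key in self.items:
            self.items.move_to_end(key)      # mark as most recently used
            return True
        if len(self.items) >= self.capacity:
            self.items.popitem(last=False)   # evict the least recently used key
        self.items[key] = None
        return False


def run_demo():
    cache = LRUCache(capacity=3)
    for _ in range(5):
        cache.access("hot")                  # a frequently accessed key
    hot_cached_before = "hot" in cache.items
    for key in ("scan1", "scan2", "scan3"):
        cache.access(key)                    # a one-off batch scan
    hot_cached_after = "hot" in cache.items
    return hot_cached_before, hot_cached_after
```

In this sketch the three one-off scan keys push the frequently used key out of the cache, even though the scan keys may never be read again.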

No effective solution currently exists for the low LRU hit rate and serious cache pollution in the prior art.

Disclosure of Invention

In view of this, an object of the embodiments of the present invention is to provide a data caching method and apparatus for distributed storage, which can improve cache hit rate, eliminate stale hot data when cache is insufficient, and eliminate cache pollution.

Based on the above object, a first aspect of the embodiments of the present invention provides a data caching method for distributed storage, including the following steps:

judging whether the cache hits data read-write access related data or not in response to the occurrence of the data read-write access;

providing data reading and writing based on the cache in response to the cache hit, and updating the cache heat table according to the data reading and writing access;

further determining whether cache space is sufficient in response to the cache miss;

loading relevant data to a cache in response to the cache space being sufficient, providing data reading and writing based on the cache, and updating a cache heat table according to data reading and writing access;

and determining the heat of all the data in the cache according to the updated cache heat table in response to the insufficient cache space, and continuously deleting the low-heat cache data until the cache space is enough.

In some embodiments, the cache heat table records a last access time and a last access interval for all data, where the last access interval is the difference between the last access time and the penultimate access time;

updating the cache heat table according to the data read-write access comprises: overwriting the last access interval of the related data with the difference between the timestamp of the data read-write access and the last access time of the related data recorded in the cache heat table, and then overwriting the last access time of the related data with the timestamp of the data read-write access.

In some embodiments, determining the heat of all data in the cache according to the updated cache heat table includes performing the following steps for all data in the cache: determining a current time and determining an elapsed time based on a difference between the current time and a last access time; the heat is determined based on the quotient of the last access interval and the elapsed time.

In some embodiments, the method further comprises: in response to particular cached data having been accessed only once and thus having no penultimate access time, directly setting its heat to 1.

In some embodiments, the method further comprises: writing the heat into the cache heat table in response to the heat of all data in the cache having been determined from the cache heat table.

A second aspect of the embodiments of the present invention provides a data caching apparatus for distributed storage, including:

a processor; and

a memory storing program code executable by the processor, the program code when executed performing the steps of:

judging whether the cache hits data read-write access related data or not in response to the occurrence of the data read-write access;

providing data reading and writing based on the cache in response to the cache hit, and updating the cache heat table according to the data reading and writing access;

further determining whether cache space is sufficient in response to the cache miss;

loading relevant data to a cache in response to the cache space being sufficient, providing data reading and writing based on the cache, and updating a cache heat table according to data reading and writing access;

and determining the heat of all the data in the cache according to the updated cache heat table in response to the insufficient cache space, and continuously deleting the low-heat cache data until the cache space is enough.

In some embodiments, the cache heat table records a last access time and a last access interval for all data, where the last access interval is the difference between the last access time and the penultimate access time;

updating the cache heat table according to the data read-write access comprises: overwriting the last access interval of the related data with the difference between the timestamp of the data read-write access and the last access time of the related data recorded in the cache heat table, and then overwriting the last access time of the related data with the timestamp of the data read-write access.

In some embodiments, determining the heat of all data in the cache according to the updated cache heat table includes performing the following steps for all data in the cache: determining a current time and determining an elapsed time based on a difference between the current time and a last access time; the heat is determined based on the quotient of the last access interval and the elapsed time.

In some embodiments, the foregoing steps further comprise: in response to particular cached data having been accessed only once and thus having no penultimate access time, directly setting its heat to 1.

In some embodiments, the foregoing steps further comprise: writing the heat into the cache heat table in response to the heat of all data in the cache having been determined from the cache heat table.

The invention has the following beneficial technical effects. The data caching method and device for distributed storage provided by the embodiments of the invention determine, in response to a data read-write access, whether the cache hits the related data; on a cache hit, provide data reading and writing based on the cache and update the cache heat table according to the access; on a cache miss, further determine whether the cache space is sufficient; when the space is sufficient, load the related data into the cache, provide data reading and writing based on the cache, and update the cache heat table according to the access; and when the space is insufficient, determine the heat of all data in the cache according to the cache heat table and continuously delete low-heat cached data until the cache space is sufficient. This technical scheme can improve the cache hit rate, evict hot data whose heat has expired when the cache is insufficient, and clear cache pollution.

Drawings

To illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings used in describing them are briefly introduced below. The drawings described below show only some embodiments of the present invention; those skilled in the art can derive other drawings from them without creative effort.

Fig. 1 is a schematic flow chart of a data caching method for distributed storage according to the present invention;

Fig. 2 is a detailed flowchart of the data caching method for distributed storage according to the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention more apparent, the following embodiments of the present invention are described in further detail with reference to the accompanying drawings.

It should be noted that the expressions "first" and "second" in the embodiments of the present invention are used only to distinguish two entities or parameters that share a name but are not the same. They are for convenience of description only, should not be construed as limiting the embodiments of the present invention, and are not explained again in the following embodiments.

In view of the foregoing, a first aspect of the embodiments of the present invention provides an embodiment of a data caching method for distributed storage, which can improve cache hit rate, eliminate stale hot data when cache is insufficient, and eliminate cache pollution. Fig. 1 is a schematic flow chart of a data caching method for distributed storage according to the present invention.

The data caching method for distributed storage, as shown in fig. 1, includes the following steps:

Step S101: in response to a data read-write access occurring, determining whether the cache hits the data related to the access;

Step S103: in response to a cache hit, providing data reading and writing based on the cache, and updating the cache heat table according to the data read-write access;

Step S105: in response to a cache miss, further determining whether the cache space is sufficient;

Step S107: in response to the cache space being sufficient, loading the related data into the cache, providing data reading and writing based on the cache, and updating the cache heat table according to the data read-write access;

Step S109: in response to the cache space being insufficient, determining the heat of all data in the cache according to the cache heat table, and continuously deleting low-heat cached data until the cache space is sufficient.
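The flow of steps S101 to S109 can be sketched roughly as follows. This is only an illustration under several assumptions that the text does not state: capacity is counted in entries rather than bytes, `load` is a hypothetical callback that fetches data from the backing store, the caller supplies the timestamp `now`, an evicted item's row is dropped from the heat table, and an item touched at the current instant is treated as maximally hot. The names `HeatCache` and `HeatEntry` are invented for this sketch.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class HeatEntry:
    last_access: float                      # timestamp of the most recent access
    last_interval: Optional[float] = None   # gap between the last two accesses


class HeatCache:
    """Sketch of steps S101-S109; capacity is an entry count (assumption)."""

    def __init__(self, capacity: int, load: Callable[[str], bytes]) -> None:
        self.capacity = capacity
        self.load = load                        # hypothetical backend fetch
        self.data: Dict[str, bytes] = {}
        self.table: Dict[str, HeatEntry] = {}   # the cache heat table

    def _touch(self, key: str, now: float) -> None:
        entry = self.table.get(key)
        if entry is None:
            self.table[key] = HeatEntry(last_access=now)
        else:
            entry.last_interval = now - entry.last_access
            entry.last_access = now

    def _heat(self, key: str, now: float) -> float:
        entry = self.table[key]
        if entry.last_interval is None:
            return 1.0                          # accessed only once
        elapsed = now - entry.last_access
        # Assumption: zero elapsed time is treated as maximally hot.
        return float("inf") if elapsed <= 0 else entry.last_interval / elapsed

    def _rank(self, key: str, now: float):
        # Lower heat is evicted first; ties go to the earlier last access.
        return (self._heat(key, now), self.table[key].last_access)

    def read(self, key: str, now: float) -> bytes:
        if key in self.data:                    # S101: cache hit?
            self._touch(key, now)               # S103: serve and update table
            return self.data[key]
        while len(self.data) >= self.capacity:  # S105: enough space?
            victim = min(self.data, key=lambda k: self._rank(k, now))  # S109
            del self.data[victim]
            del self.table[victim]
        self.data[key] = self.load(key)         # S107: load, serve, update table
        self._touch(key, now)
        return self.data[key]
```

With a capacity of 2, reading "a" at times 0 and 2 and "b" at time 1, then "c" at time 10, evicts "a" in this sketch: its heat is 2/8 = 0.25, below the heat of 1 assigned to "b".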

It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program to instruct relevant hardware to perform the processes, and the processes can be stored in a computer readable storage medium, and when executed, the processes can include the processes of the embodiments of the methods described above. The storage medium may be a magnetic disk, an optical disk, a read-only memory (ROM), a Random Access Memory (RAM), or the like. Embodiments of the computer program may achieve the same or similar effects as any of the preceding method embodiments to which it corresponds.

In some embodiments, the cache heat table records a last access time and a last access interval for all data, where the last access interval is the difference between the last access time and the penultimate access time;

updating the cache heat table according to the data read-write access comprises: overwriting the last access interval of the related data with the difference between the timestamp of the data read-write access and the last access time of the related data recorded in the cache heat table, and then overwriting the last access time of the related data with the timestamp of the data read-write access.
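The overwrite rule for the heat table can be sketched as below. The representation of a row as a `(last access time, last access interval)` tuple, the name `record_access`, and the numeric timestamps are assumptions made for illustration; the text does not prescribe a data layout.

```python
from typing import Dict, Optional, Tuple

# key -> (last access time, last access interval); the interval stays None
# until a second access has been seen.
HeatTable = Dict[str, Tuple[float, Optional[float]]]


def record_access(table: HeatTable, key: str, timestamp: float) -> None:
    """Overwrite the row for `key` with the new interval and timestamp."""
    previous = table.get(key)
    if previous is None:
        table[key] = (timestamp, None)       # first access: no interval yet
        return
    last_access, _ = previous                # the older interval is discarded
    table[key] = (timestamp, timestamp - last_access)


table: HeatTable = {}
record_access(table, "B", 15.0)   # first access
record_access(table, "B", 25.0)   # second access, 10 time units later
```

Only the two most recent accesses influence a row: a third access at time 40 would replace the stored interval of 10 with 15.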

In some embodiments, determining the heat of all data in the cache according to the updated cache heat table includes performing the following steps for all data in the cache: determining a current time and determining an elapsed time based on a difference between the current time and a last access time; the heat is determined based on the quotient of the last access interval and the elapsed time.

In some embodiments, the method further comprises: in response to particular cached data having been accessed only once and thus having no penultimate access time, directly setting its heat to 1.

In some embodiments, the method further comprises: writing the heat into the cache heat table in response to the heat of all data in the cache having been determined from the cache heat table.

The invention provides a cache implementation method for distributed storage. A high hit rate for a piece of cached data means that it is accessed many times per unit time, that is, the interval between accesses is small. However, when the workload switches, cached data whose heat was high in the past must be aged out of the cache as soon as possible. Therefore, building on the LRU algorithm, the method uses the interval between the last two accesses of the cached data to represent its average access interval: the smaller the interval, the hotter the data and the higher its probability of being hit. In addition, by recording the timestamp of the last access, cached data whose high heat has expired can be identified quickly and released. Combining the two, the heat of cached data is calculated with the following formula:

y=d/(t1-t2)

where y is the heat of the cached data, d is the interval between its last two accesses, t1 is the current time at which the eviction calculation is performed, and t2 is the timestamp of its last access. (t1-t2)/d is the number of times the cached data should theoretically have been accessed between its last access and the current time, judging by its access interval d; the larger this value, the more expected accesses did not actually happen. Its reciprocal, d/(t1-t2), is therefore taken as the heat of the cached data.
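The reasoning above can be written out explicitly. The symbol n for the theoretical number of missed accesses is introduced here only for illustration, and d > 0 and t1 > t2 are assumed.

```latex
\[
n = \frac{t_1 - t_2}{d}, \qquad y = \frac{1}{n} = \frac{d}{t_1 - t_2}
\]
\[
y > 1 \iff t_1 - t_2 < d \quad (\text{the next access is not yet due}), \qquad
y < 1 \iff t_1 - t_2 > d \quad (\text{at least one expected access was missed})
\]
```

Read this way, a heat of exactly 1 marks data whose next access is due now, which is consistent with assigning a heat of 1 to data that has been accessed only once.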

Embodiments of the invention are further illustrated below with the specific example shown in Fig. 2. When a data read-write access occurs, it is determined whether the cache is hit. On a hit, the cache heat table is updated directly. On a miss, it is determined whether the cache is full: if it is full, heat is calculated and low-heat cached data is evicted, after which the new data is added to the cache and the cache heat table is updated; if it is not full, the new data is added to the cache and the cache heat table is updated.

For example, data A, B, C, D and E are cached by the LRU algorithm on their first read-write access, as follows:

Data    Last access timestamp    Last access interval (min)
A       10:20                    -
B       10:15                    -
C       10:10                    -
D       10:05                    -
E       10:00                    -

When data B, C and D are accessed a second time, the data recorded in the cache is as follows:

Data    Last access timestamp    Last access interval (min)
A       10:20                    -
B       10:25                    10
C       10:30                    20
D       10:35                    30
E       10:00                    -

Assume that new data F is accessed at the current time, 11:00. The heat of each item currently in the cache is calculated as follows:

Cached data A and E have each been accessed only once and have no last access interval, so their heat is set to 1. The last access interval of cached data B is 10 minutes and its last access time is 10:25; with the current time at 11:00, it should theoretically have been accessed 35/10 = 3.5 times since then, but it was not, so its heat has expired: y = 10/(11:00-10:25) = 10/35 = 2/7 ≈ 0.2857. Similarly, cached data C also has expired heat, with y = 20/(11:00-10:30) = 20/30 = 2/3 ≈ 0.6667, though its expiry is milder than that of B. The last access interval of cached data D is 30 minutes and its last access time is 10:35; with the current time at 11:00, it should theoretically be accessed again at 11:05, so it is hot data that is about to be accessed, with y = 30/(11:00-10:35) = 30/25 = 6/5 = 1.2.

The heat values of the data currently in the cache are therefore ordered as follows (when heat values are equal, the item with the earlier last access time ranks lower):

[Figure BDA0002578900930000091: table of cached data ordered by heat value; image not available]

Therefore B, which has the lowest heat among the cached data, should be evicted from the cache, and the newly accessed data F should be added to the cache with a heat of 1 and a last access time of 11:00, the current time. The cache then holds A, C, D, E and F.
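The arithmetic of this example can be checked with a short script. It is a sketch only: the helper names `minutes` and `heat`, the use of exact fractions, and the dictionary layout are choices made for illustration, and the timestamps and intervals are those listed in the second table above.

```python
from fractions import Fraction
from typing import Dict, Optional, Tuple


def minutes(hhmm: str) -> int:
    """Convert 'HH:MM' to minutes since midnight."""
    hours, mins = hhmm.split(":")
    return int(hours) * 60 + int(mins)


def heat(last_access: int, interval: Optional[int], now: int) -> Fraction:
    """y = d / (t1 - t2), or 1 when the data has been accessed only once.

    Assumes now > last_access, which holds for every row of this example.
    """
    if interval is None:
        return Fraction(1)
    return Fraction(interval, now - last_access)


# data -> (last access timestamp, last access interval in minutes)
table: Dict[str, Tuple[str, Optional[int]]] = {
    "A": ("10:20", None),
    "B": ("10:25", 10),
    "C": ("10:30", 20),
    "D": ("10:35", 30),
    "E": ("10:00", None),
}
now = minutes("11:00")

heats = {k: heat(minutes(ts), d, now) for k, (ts, d) in table.items()}
# Coldest first; equal heats are ordered by the earlier last access time.
order = sorted(table, key=lambda k: (heats[k], minutes(table[k][0])))

victim = order[0]               # lowest heat is evicted
del table[victim]
table["F"] = ("11:00", None)    # newly admitted data starts with heat 1
```

Under these assumptions the coldest-first order is B, C, E, A, D, so B is the eviction victim and F takes its place.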

It can be seen from the foregoing embodiments that the data caching method for distributed storage provided by the embodiments of the present invention determines, in response to a data read-write access, whether the cache hits the related data; on a cache hit, provides data reading and writing based on the cache and updates the cache heat table according to the access; on a cache miss, further determines whether the cache space is sufficient; when the space is sufficient, loads the related data into the cache, provides data reading and writing based on the cache, and updates the cache heat table according to the access; and when the space is insufficient, determines the heat of all data in the cache according to the cache heat table and continuously deletes low-heat cached data until the cache space is sufficient. This technical scheme can improve the cache hit rate, evict hot data whose heat has expired when the cache is insufficient, and clear cache pollution.

It should be particularly noted that, steps in the embodiments of the data caching method for distributed storage described above may be mutually intersected, replaced, added, and deleted, and therefore, the data caching method for distributed storage that is transformed by these reasonable permutations and combinations shall also belong to the scope of the present invention, and shall not limit the scope of the present invention to the described embodiments.

In view of the foregoing, a second aspect of the embodiments of the present invention provides an embodiment of a data caching apparatus for distributed storage, which is capable of increasing a cache hit rate, eliminating stale hot data when a cache is insufficient, and eliminating cache pollution. The data caching device for distributed storage comprises:

a processor; and

a memory storing program code executable by the processor, the program code when executed performing the steps of:

judging whether the cache hits data read-write access related data or not in response to the occurrence of the data read-write access;

providing data reading and writing based on the cache in response to the cache hit, and updating the cache heat table according to the data reading and writing access;

further determining whether cache space is sufficient in response to the cache miss;

loading relevant data to a cache in response to the cache space being sufficient, providing data reading and writing based on the cache, and updating a cache heat table according to data reading and writing access;

and determining the heat of all the data in the cache according to the updated cache heat table in response to the insufficient cache space, and continuously deleting the low-heat cache data until the cache space is enough.

In some embodiments, the cache heat table records a last access time and a last access interval for all data, where the last access interval is the difference between the last access time and the penultimate access time;

updating the cache heat table according to the data read-write access comprises: overwriting the last access interval of the related data with the difference between the timestamp of the data read-write access and the last access time of the related data recorded in the cache heat table, and then overwriting the last access time of the related data with the timestamp of the data read-write access.

In some embodiments, determining the heat of all data in the cache according to the updated cache heat table includes performing the following steps for all data in the cache: determining a current time and determining an elapsed time based on a difference between the current time and a last access time; the heat is determined based on the quotient of the last access interval and the elapsed time.

In some embodiments, the foregoing steps further comprise: in response to particular cached data having been accessed only once and thus having no penultimate access time, directly setting its heat to 1.

In some embodiments, the foregoing steps further comprise: writing the heat into the cache heat table in response to the heat of all data in the cache having been determined from the cache heat table.

It can be seen from the foregoing embodiments that the data caching apparatus for distributed storage provided by the embodiments of the present invention determines, in response to a data read-write access, whether the cache hits the related data; on a cache hit, provides data reading and writing based on the cache and updates the cache heat table according to the access; on a cache miss, further determines whether the cache space is sufficient; when the space is sufficient, loads the related data into the cache, provides data reading and writing based on the cache, and updates the cache heat table according to the access; and when the space is insufficient, determines the heat of all data in the cache according to the cache heat table and continuously deletes low-heat cached data until the cache space is sufficient. This technical scheme can improve the cache hit rate, evict hot data whose heat has expired when the cache is insufficient, and clear cache pollution.

It should be particularly noted that the above embodiment of the data caching apparatus for distributed storage uses the embodiment of the data caching method for distributed storage to describe the working process of each module, and those skilled in the art can readily conceive of applying these modules to other embodiments of the data caching method for distributed storage. Of course, since the steps in the embodiments of the data caching method for distributed storage may be interleaved, replaced, added, and deleted, a data caching apparatus for distributed storage obtained through such reasonable permutations and combinations also falls within the scope of the present invention, and the scope of the present invention should not be limited to the described embodiment.

The foregoing is an exemplary embodiment of the present disclosure, but it should be noted that various changes and modifications could be made herein without departing from the scope of the present disclosure as defined by the appended claims. The functions, steps and/or actions of the method claims in accordance with the disclosed embodiments described herein need not be performed in any particular order. Furthermore, although elements of the disclosed embodiments of the invention may be described or claimed in the singular, the plural is contemplated unless limitation to the singular is explicitly stated.

It should be understood that, as used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly supports the exception. It should also be understood that "and/or" as used herein is meant to include any and all possible combinations of one or more of the associated listed items. The numbers of the embodiments disclosed in the embodiments of the present invention are merely for description, and do not represent the merits of the embodiments.

It will be understood by those skilled in the art that all or part of the steps for implementing the above embodiments may be implemented by hardware, or may be implemented by a program instructing relevant hardware, where the program may be stored in a computer-readable storage medium, and the above-mentioned storage medium may be a read-only memory, a magnetic disk or an optical disk, etc.

Those of ordinary skill in the art will understand that the discussion of any embodiment above is exemplary only and is not intended to imply that the scope of the disclosure of the embodiments of the invention, including the claims, is limited to these examples. Within the idea of the embodiments of the invention, technical features in the above embodiment or in different embodiments may also be combined, and there are many other variations of the different aspects of the embodiments of the invention as described above, which are not provided in detail for the sake of brevity. Therefore, any omissions, modifications, substitutions, improvements, and the like that may be made without departing from the spirit and principles of the embodiments of the present invention are intended to be included within the scope of the embodiments of the present invention.
