Data processing method and device

Document No.: 856853 Publication date: 2021-04-02

Reading note: this technology, "Data processing method and device" (数据处理方法及装置), was designed by 何孝金 on 2020-12-22. Main content: The application provides a data processing method and apparatus. The method is applied to a storage system and comprises: receiving a first operation instruction sent by a host, wherein the first operation instruction comprises a first logical block address (LBA) of a user object and a first snapshot sequence number; searching a metadata cache according to the first LBA and the first snapshot sequence number, wherein the metadata cache comprises at least one metadata value, and each metadata value comprises a second LBA of an actually written ROW object, a first identifier, and a second identifier; when a first metadata value corresponding to the first LBA and the first snapshot sequence number is obtained from the metadata cache, identifying the value of the first identifier or the value of the second identifier according to the type of the first operation instruction; and processing the data stored in the second LBA according to the identification result.

1. A data processing method, applied to a storage system, the method comprising:

receiving a first operation instruction sent by a host, wherein the first operation instruction comprises a first logical block address (LBA) of a user object and a first snapshot sequence number;

searching a metadata cache according to the first LBA and the first snapshot sequence number, wherein the metadata cache comprises at least one metadata value, and each metadata value comprises a second LBA of an actually written ROW object, a first identifier, and a second identifier;

when a first metadata value corresponding to the first LBA and the first snapshot sequence number is obtained from the metadata cache, identifying the value of the first identifier or the value of the second identifier according to the type of the first operation instruction;

and processing the data stored in the second LBA according to the identification result.

2. The method according to claim 1, wherein the identifying, according to the type of the first operation instruction, the value of the first identifier or the value of the second identifier specifically includes:

when the first operation instruction is a read operation instruction, identifying the value of the first identifier;

and when the first operation instruction is a deletion operation instruction, identifying the value of the second identifier.

3. The method according to claim 2, wherein the performing, according to the identification result, the corresponding processing on the data stored in the second LBA specifically includes:

when the value of the first identifier is 1, determining that the first metadata value belongs to an original object;

and reading the stored data from the second LBA and feeding back the data to the host.

4. The method of claim 2, further comprising:

when a first metadata value corresponding to the first LBA and a first snapshot serial number is not obtained from the metadata cache, and a second metadata value corresponding to the first LBA is obtained, judging whether the value of the first identifier is 1;

if the value is 1, determining that the second metadata value belongs to the original object;

and reading the stored data from the second LBA and feeding back the data to the host.

5. The method of claim 4, further comprising:

when a first metadata value corresponding to the first LBA and a first snapshot sequence number is not obtained from the metadata cache, and a plurality of second metadata values corresponding to the first LBA are obtained, judging whether the value of the first identifier is 1;

if the value is 0, determining that the plurality of second metadata values do not belong to the original object;

and reading the stored data from the disk according to the second LBA included in each second metadata value, comparing the data read, and feeding back the most recently written data to the host.

6. The method of claim 5, further comprising:

when a second metadata value corresponding to the first LBA is obtained from the metadata cache and the second metadata value is valid, a third metadata value corresponding to the first LBA and a first snapshot serial number is generated according to the first LBA and the first snapshot serial number;

and the first identifier and the second identifier which are included in the third metadata value are both 0.

7. The method according to claim 2, wherein the performing the corresponding processing on the data stored in the second LBA according to the identification result specifically further includes:

when the value of the second identifier is 0, determining that actually written data is not stored in the second LBA;

deleting the first metadata value.

8. The method according to claim 2, wherein the performing the corresponding processing on the data stored in the second LBA according to the identification result specifically further includes:

when the value of the second identifier is 1 and a fourth data value corresponding to the first LBA is acquired, determining whether the value of the second identifier included in the fourth data value is 1;

if the value is 0, determining that an association relationship exists between the data stored in the second LBA and the data stored in the second LBA included in the fourth data value;

not deleting the first metadata value;

and the second snapshot sequence number corresponding to the fourth data value is smaller than the first snapshot sequence number.

9. The method according to any of claims 1-8, wherein the value of the first identifier is used to indicate whether a metadata value belongs to an original object; the value of the second identifier is used to indicate whether the metadata value is stored in the disk.

10. A data processing apparatus, wherein the apparatus is applied to a storage system, the apparatus comprising:

a receiving unit, configured to receive a first operation instruction sent by a host, wherein the first operation instruction comprises a first logical block address (LBA) of a user object and a first snapshot sequence number;

a searching unit, configured to search a metadata cache according to the first LBA and the first snapshot sequence number, where the metadata cache includes at least one metadata value, and each metadata value includes a second LBA, a first identifier, and a second identifier of an actually written ROW object;

an identifying unit, configured to identify a value of the first identifier or a value of the second identifier according to a type of the first operation instruction when a first metadata value corresponding to the first LBA and the first snapshot serial number is obtained from the metadata cache;

and the processing unit is used for correspondingly processing the data stored in the second LBA according to the identification result.

Technical Field

The present application relates to the field of communications technologies, and in particular, to a data processing method and apparatus.

Background

At present, high performance Solid State Drives (SSD) are widely used. In the field of distributed storage, full SSD based distributed storage systems are also continuously released by vendors. In a distributed full flash memory system, data is written in a mode of redirection-on-write (ROW) during writing, so that the performance advantages of the SSD can be better exerted, and meanwhile, the ROW can provide better support for characteristics such as deduplication, compression, snapshot and the like.

The ROW snapshot is a lossless snapshot; compared with a traditional copy-on-write (COW) snapshot, it has little impact on host services. Current ROW snapshots are implemented with a snapshot sequence number: each time a snapshot is taken, the snapshot sequence number is incremented. When the host writes data to the storage system, the write carries the latest snapshot sequence number, and after the data is written, that sequence number is recorded in the metadata.

However, the ROW snapshot also has drawbacks: the volume of metadata to be stored is large, and the metadata must be queried before every read, which increases read latency. To address this, most vendors place a cache in front of the metadata, and the cache hit rate directly affects the host's read performance.

Currently, the metadata cache is mainly implemented with a KEY-VALUE database. The KEY consists of a user object, a logical block address (LBA) of the user object, and a snapshot sequence number; the VALUE consists of the actually written ROW object and the LBA of the actually written ROW object. When the host reads data, the corresponding metadata is found through the user object and the LBA of the user object, the actually written ROW object and its LBA are obtained from the metadata, and only then can the corresponding data be read from the disk.

In a scenario without the ROW snapshot, the above-mentioned caching method for the metadata can solve the above-mentioned problems, but when there are multiple sequence numbers of the ROW snapshot, the above-mentioned caching scheme for the metadata also brings some problems. In a scene where the ROW snapshot exists, when the host reads data, metadata which is less than or equal to an input snapshot sequence number and corresponds to a current maximum valid snapshot sequence number needs to be read.
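The visibility rule in the preceding paragraph can be illustrated with a minimal Python sketch. The function name `visible_seq` and the list-based input are hypothetical, and the sketch assumes that every sequence number passed in is valid; it is not taken from the application.

```python
def visible_seq(written_seqs, requested_seq):
    """Return the largest written sequence number <= requested_seq.

    written_seqs: sequence numbers at which this LBA was written
                  (assumed to be valid entries only).
    Returns None when nothing was written at or before requested_seq.
    """
    candidates = [s for s in written_seqs if s <= requested_seq]
    return max(candidates) if candidates else None


# An LBA written at seq0, seq1 and seq2, read with sequence number 4.
latest = visible_seq([0, 1, 2], 4)
print(latest)
```

With an exact-match cache key, this "largest value not exceeding the request" search cannot be answered by a single lookup, which is the difficulty the background goes on to describe.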

As shown in fig. 1, which is a schematic diagram of performing ROW snapshots on a source data volume, host writes are interspersed with the ROW snapshots taken on the source data volume (the marked grid cells in the figure are the areas to which the host issued write requests). Suppose that a user performs ROW snapshots and write operations on one user object and that, after a number of operations, the current snapshot sequence number is 4, that is, the storage system has performed 4 ROW snapshots on the user object.

If the user now needs to read the latest data stored in LBA1 of the user object, the host issues a read request for LBA1 that carries snapshot sequence number 4. However, as shown in fig. 1, LBA1 was written only when the snapshot sequence number was 1. Even if the metadata for snapshot sequence number 1 is in the cache, a lookup for sequence number 4 finds nothing, and the cache alone cannot establish that the metadata for sequence number 1 is also the metadata visible at sequence number 4. Therefore, the storage engine must read the metadata from the disk before it can return the latest data stored in LBA1 of the user object.
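The miss described above can be reproduced with a small Python sketch of an exact-key cache. The dictionary layout, the names `legacy_cache` and `legacy_lookup`, and the placeholder values `"obj"`, `"row_obj"` and `100` are assumptions made for illustration only.

```python
# Hypothetical model of the existing cache scheme:
# KEY   = (user object, LBA of the user object, snapshot sequence number)
# VALUE = (actually written ROW object, LBA of that ROW object)
legacy_cache = {
    ("obj", 1, 1): ("row_obj", 100),  # LBA1 was written only at seq1
}


def legacy_lookup(obj, lba, seq):
    """Exact-key lookup; None means a miss and a metadata read from disk."""
    return legacy_cache.get((obj, lba, seq))


hit = legacy_lookup("obj", 1, 1)   # the key recorded at write time
miss = legacy_lookup("obj", 1, 4)  # the key carried by the read request
print(hit, miss)
```

The entry that would answer the request is present, but the key carried by the read request does not match it.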

In practical application, the ROW snapshot is mostly used as a data backup, and when the host reads data, the original object is mostly read, but the read request carries the latest snapshot sequence number. Therefore, with existing caching schemes, the cache hit rate is very low.

Disclosure of Invention

In view of this, the present application provides a data processing method and apparatus, so as to solve the problem of a very low cache hit rate in the existing cache scheme.

In a first aspect, the present application provides a data processing method, where the method is applied to a storage system, and the method includes:

receiving a first operation instruction sent by a host, wherein the first operation instruction comprises a first logical block address (LBA) of a user object and a first snapshot sequence number;

searching a metadata cache according to the first LBA and the first snapshot sequence number, wherein the metadata cache comprises at least one metadata value, and each metadata value comprises a second LBA of an actually written ROW object, a first identifier, and a second identifier;

when a first metadata value corresponding to the first LBA and the first snapshot sequence number is obtained from the metadata cache, identifying the value of the first identifier or the value of the second identifier according to the type of the first operation instruction;

and processing the data stored in the second LBA according to the identification result.

In a second aspect, the present application provides a data processing apparatus, which is applied to a storage system, the apparatus comprising:

a receiving unit, configured to receive a first operation instruction sent by a host, wherein the first operation instruction comprises a first logical block address (LBA) of a user object and a first snapshot sequence number;

a searching unit, configured to search a metadata cache according to the first LBA and the first snapshot sequence number, where the metadata cache includes at least one metadata value, and each metadata value includes a second LBA, a first identifier, and a second identifier of an actually written ROW object;

an identifying unit, configured to identify a value of the first identifier or a value of the second identifier according to a type of the first operation instruction when a first metadata value corresponding to the first LBA and the first snapshot serial number is obtained from the metadata cache;

and the processing unit is used for correspondingly processing the data stored in the second LBA according to the identification result.

Therefore, by applying the data processing method and apparatus provided by the present application, the storage system receives a first operation instruction sent by the host, where the first operation instruction includes a first logical block address LBA and a first snapshot serial number of the user object. According to the first LBA and the first snapshot sequence number, the storage system searches a metadata cache, wherein the metadata cache comprises at least one metadata value, and each metadata value comprises a second LBA, a first identifier and a second identifier of an actually written ROW object. When the first metadata value corresponding to the first LBA and the first snapshot sequence number is acquired from the metadata cache, the storage system identifies a value of the first identifier or a value of the second identifier according to the type of the first operation instruction. And according to the identification result, the storage system correspondingly processes the data stored in the second LBA.

Therefore, the problem that the cache hit rate is very low in the existing cache scheme is solved. On the basis of ensuring the consistency of the metadata, the metadata hit rate in the ROW snapshot sequence number scene is greatly improved.

Drawings

FIG. 1 is a schematic diagram of performing ROW snapshots on a source data volume;

FIG. 2 is a flowchart of a data processing method according to an embodiment of the present application;

FIG. 3 is a structural diagram of a data processing apparatus according to an embodiment of the present application.

Detailed Description

Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The embodiments described in the following exemplary embodiments do not represent all embodiments consistent with the present application. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the present application, as detailed in the appended claims.

The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the application. As used in this application and the appended claims, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and encompasses any and all possible combinations of one or more of the corresponding listed items.

It is to be understood that although the terms first, second, third, etc. may be used herein to describe various information, such information should not be limited to these terms. These terms are only used to distinguish one type of information from another. For example, first information may also be referred to as second information, and similarly, second information may also be referred to as first information, without departing from the scope of the present application. The word "if" as used herein may be interpreted as "at the time of", "when", or "in response to determining", depending on the context.

The data processing method provided in the embodiments of the present application is explained in detail below. Referring to fig. 2, fig. 2 is a flowchart of a data processing method according to an embodiment of the present application. The method is applied to a storage system, and the data processing method provided by the embodiment of the application can comprise the following steps.

Step 210, receiving a first operation instruction sent by a host, where the first operation instruction includes a first logical block address LBA and a first snapshot sequence number of a user object.

Specifically, the host executes various types of operations on the storage system in advance. The host generates a first operation instruction, which includes a first LBA of the user object and a first snapshot sequence number.

The user object may be a storage unit such as a file, a volume, a block, and the like. The first snapshot sequence number may be a snapshot sequence number having a largest sequence number value among the current snapshot sequence numbers. For example, the storage system performs a ROW snapshot, and increments the snapshot sequence number each time a ROW snapshot is performed. Alternatively, the first snapshot sequence number may be a snapshot sequence number of a snapshot where a certain ROW snapshot is executed.

The storage system receives a first operation instruction sent by a host. In the embodiment of the present application, the first operation instruction includes a read operation instruction, that is, the host reads data stored in the storage system. The first operation instruction further comprises a deletion operation instruction, namely the host deletes the data stored by the storage system.

Step 220, searching a metadata cache according to the first LBA and the first snapshot sequence number, where the metadata cache includes at least one metadata value, and each metadata value includes a second LBA of an actually written ROW object, a first identifier (tag1), and a second identifier (tag2).

Specifically, after receiving a first operation instruction sent by the host, the storage system obtains the first LBA and the first snapshot sequence number from the first operation instruction.

The storage system searches the metadata cache according to the first LBA and the first snapshot sequence number. The metadata cache is implemented with a KEY-VALUE database. The KEY (which may also be referred to as a metadata key) includes a user object, a first LBA of the user object, and a snapshot sequence number. The VALUE (which may also be referred to as a metadata value) includes the actually written ROW object and the second LBA of the actually written ROW object.

Further, in this embodiment of the present application, the metadata value further includes a first identifier and a second identifier. The value of the first identifier is used for indicating whether the metadata value belongs to an original object or not; the value of the second identifier is used to indicate whether the metadata value is stored in disk.

The first identifier and the second identifier can be represented by 2 bits in the cache, so the cache size is not increased compared with the existing metadata cache.

As shown in fig. 1, each grid cell represents a storage area. The bottom line segment (the X-axis) indicates the LBA, the right line segment (the right Y-axis) indicates the ROW snapshot (snap), and the left line segment (the left Y-axis) indicates the snapshot sequence number (seq). In fig. 1, the LBAs range from LBA1 to LBA8, the ROW snapshots from snap1 to snap5, and the snapshot sequence numbers from seq0 to seq4. A cell marked with oblique lines indicates that a write operation instruction has been executed on that area.

The original object specifically means that after the ROW snapshot is executed at a certain time and data is written in the LBA, the data in the LBA is not modified when the ROW snapshot is executed multiple times subsequently. In fig. 1, the regions indicated by KEY (LBA1, seq1) indicate that the data written into LBA1 is the original object because the data in LBA1 was not written or modified during its subsequent multiple ROW snapshots.

For ease of description, an entry stored in the metadata cache is written below in the format (LBA, seq, tag1, tag2).
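As an illustration of the (LBA, seq, tag1, tag2) format and of holding the two identifiers in 2 bits, a minimal Python sketch follows. The class name `MetaEntry`, the helper `unpack_flags`, and the bit positions are assumptions; the application only states that 2 bits suffice, not how they are laid out.

```python
from dataclasses import dataclass

# Assumed bit layout (not specified in the application).
TAG1_ORIGINAL = 0b01  # bit 0: metadata value belongs to an original object
TAG2_ON_DISK = 0b10   # bit 1: second identifier ("stored in disk")


@dataclass(frozen=True)
class MetaEntry:
    lba: int
    seq: int
    tag1: int  # 0 or 1
    tag2: int  # 0 or 1

    def pack_flags(self) -> int:
        """Pack tag1 and tag2 into a 2-bit integer."""
        return (self.tag1 * TAG1_ORIGINAL) | (self.tag2 * TAG2_ON_DISK)


def unpack_flags(bits: int):
    """Recover (tag1, tag2) from the packed 2-bit integer."""
    return (bits & TAG1_ORIGINAL, (bits & TAG2_ON_DISK) >> 1)


entry = MetaEntry(lba=1, seq=1, tag1=1, tag2=1)  # (LBA1, seq1, 1, 1)
packed = entry.pack_flags()
print(packed, unpack_flags(packed))
```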

Step 230, when the first metadata value corresponding to the first LBA and the first snapshot serial number is obtained from the metadata cache, identifying the value of the first identifier or the value of the second identifier according to the type of the first operation instruction.

Specifically, as described in step 220, the storage system searches the metadata cache according to the first LBA and the first snapshot sequence number.

When the first metadata value corresponding to the first LBA and the first snapshot sequence number is obtained from the metadata cache, the storage system identifies the value of the first identifier or the value of the second identifier according to the type of the first operation instruction.

In one example, as shown in fig. 1, assume that a write operation instruction is executed on LBA1 when the second ROW snapshot is executed, and a metadata cache entry is recorded as (LBA1, seq1, 1, 1).

At this time, if the first LBA included in the first operation instruction received by the storage system is LBA1, and the first snapshot sequence number is seq1, the storage system obtains the corresponding metadata value. Meanwhile, the storage system identifies the value of the first identifier or the value of the second identifier according to the type of the first operation instruction.

Further, when the first operation instruction is a read operation instruction, the storage system identifies a value of the first identifier; when the first operation instruction is a delete operation instruction, the storage system identifies the value of the second identifier.
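The selection of the identifier by instruction type can be sketched in Python as follows. The function name `flag_to_check` and the string labels `"read"` and `"delete"` are hypothetical; only these two instruction types are described in this embodiment.

```python
def flag_to_check(op_type, tag1, tag2):
    """Return the name and value of the identifier relevant to op_type."""
    if op_type == "read":
        return ("tag1", tag1)   # read: identify the first identifier
    if op_type == "delete":
        return ("tag2", tag2)   # delete: identify the second identifier
    raise ValueError(f"unsupported operation type: {op_type!r}")


# Metadata value (LBA1, seq1, 1, 1) looked up for a read and for a delete.
read_choice = flag_to_check("read", 1, 1)
delete_choice = flag_to_check("delete", 1, 1)
print(read_choice, delete_choice)
```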

It should be noted that, after receiving the first operation instruction, the storage system may obtain, according to the first LBA, all metadata values corresponding to the first LBA from the metadata cache in the process of searching the metadata cache.

In one example, as shown in fig. 1, the storage system executes three write operation instructions on LBA2 while executing the first, second, and third ROW snapshots, and records a metadata cache entry each time. If the first LBA included in the first operation instruction is LBA2, then regardless of which ROW snapshot the first snapshot sequence number refers to, the storage system obtains all metadata cache entries corresponding to LBA2, that is, three entries.

In another example, as shown in fig. 1, the storage system executes a write operation instruction on LBA1 while executing the second ROW snapshot and records a metadata cache entry. If the first LBA included in the first operation instruction is LBA1, then regardless of which ROW snapshot the first snapshot sequence number refers to, the storage system obtains one entry from the metadata cache.

Of course, if the first metadata value corresponding to the first LBA and the first snapshot sequence number is stored in the metadata cache, the storage system may also obtain the corresponding first metadata value.
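One way to return both the exact match and every entry for an LBA in a single lookup is to group cache entries by LBA. The Python sketch below assumes such a grouping; the names `cache`, `record` and `lookup` and the nested-dictionary layout are hypothetical and are not described in the application.

```python
from collections import defaultdict

# lba -> {seq: (tag1, tag2)}; grouping by LBA is an assumption.
cache = defaultdict(dict)


def record(lba, seq, tag1, tag2):
    """Record a metadata cache entry (LBA, seq, tag1, tag2)."""
    cache[lba][seq] = (tag1, tag2)


def lookup(lba, seq):
    """Return (exact entry or None, all entries for lba sorted by seq)."""
    entries = cache.get(lba, {})
    exact = entries.get(seq)
    all_entries = sorted((s, t1, t2) for s, (t1, t2) in entries.items())
    return exact, all_entries


# LBA2 written at seq0, seq1 and seq2; LBA1 written once at seq1.
for s in (0, 1, 2):
    record(2, s, 0, 1)
record(1, 1, 1, 1)

exact_lba2, all_lba2 = lookup(2, 4)
exact_lba1, all_lba1 = lookup(1, 1)
print(exact_lba2, all_lba2)
print(exact_lba1, all_lba1)
```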

And step 240, performing corresponding processing on the data stored in the second LBA according to the identification result.

Specifically, the storage system identifies the value of the first identifier, or the value of the second identifier, according to the type of the first operation instruction, as described in step 230. And the storage system correspondingly processes the data stored in the second LBA according to the identification result.

Further, when the first operation instruction is a read operation instruction, the storage system identifies the value of the first identifier. If the value of the first identifier is 1, the storage system determines that the first metadata value belongs to the original object. The storage system reads the stored data directly from the second LBA and feeds back the data to the host.

In one example, as shown in fig. 1, assume that a write operation instruction is executed on LBA1 when the second ROW snapshot is executed, and a metadata cache entry is recorded as (LBA1, seq1, 1, 1). After the storage system obtains this metadata value, it identifies the value of the first identifier as 1. The storage system determines that the data stored in LBA1 belongs to the original object. The storage system reads the stored data directly from LBA1 and feeds back the data to the host.

Further, when the first operation instruction is a delete operation instruction, the storage system identifies the value of the second identifier. When the value of the second identifier is 0, the storage system determines that actually written data is not stored in the second LBA. The storage system may delete the first metadata value directly.

In another example, as shown in fig. 1, assume that the data stored in LBA3 is read when the second ROW snapshot is executed, and a metadata cache entry is recorded as (LBA3, seq1, 0, 0). After the storage system obtains this metadata value, it identifies the value of the second identifier as 0. The storage system determines that actually written data is not stored in this LBA3. The storage system deletes the metadata value without performing a delete operation on the disk.
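The two hit cases above (a read with tag1 = 1 and a delete with tag2 = 0) can be sketched in Python. The function `handle_hit`, the dictionary-based `cache` and `disk`, the ROW LBA value `100`, and the returned status string are all illustrative assumptions.

```python
def handle_hit(op_type, key, cache, disk):
    """Process a cache hit for key = (lba, seq).

    cache maps (lba, seq) -> (row_lba, tag1, tag2);
    disk maps row_lba -> stored data.
    """
    row_lba, tag1, tag2 = cache[key]
    if op_type == "read" and tag1 == 1:
        # Original object: read directly from the second LBA.
        return disk[row_lba]
    if op_type == "delete" and tag2 == 0:
        # No actually written data behind this entry: drop metadata only.
        del cache[key]
        return "metadata deleted, disk untouched"
    return None  # remaining cases are not covered by this sketch


cache = {
    (1, 1): (100, 1, 1),   # (LBA1, seq1, 1, 1), data at assumed ROW LBA 100
    (3, 1): (None, 0, 0),  # (LBA3, seq1, 0, 0), no written data of its own
}
disk = {100: b"lba1-data"}

read_result = handle_hit("read", (1, 1), cache, disk)
delete_result = handle_hit("delete", (3, 1), cache, disk)
print(read_result, delete_result)
```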

Therefore, by applying the data processing method and apparatus provided by the present application, the storage system receives a first operation instruction sent by the host, where the first operation instruction includes a first logical block address LBA and a first snapshot serial number of the user object. According to the first LBA and the first snapshot sequence number, the storage system searches a metadata cache, wherein the metadata cache comprises at least one metadata value, and each metadata value comprises a second LBA, a first identifier and a second identifier of an actually written ROW object. When the first metadata value corresponding to the first LBA and the first snapshot sequence number is acquired from the metadata cache, the storage system identifies a value of the first identifier or a value of the second identifier according to the type of the first operation instruction. And according to the identification result, the storage system correspondingly processes the data stored in the second LBA.

Therefore, the problem that the cache hit rate is very low in the existing cache scheme is solved. On the basis of ensuring the consistency of the metadata, the metadata hit rate in the ROW snapshot sequence number scene is greatly improved.

Optionally, in this embodiment of the present application, the storage system further includes the following multiple cases in the process of searching the metadata cache, which are respectively described below.

In one case, when a first metadata value corresponding to the first LBA and the first snapshot sequence number is not obtained from the metadata cache, and a second metadata value corresponding to the first LBA is obtained, the storage system determines whether the value of the first identifier is 1. If 1, the storage system determines that the second metadata value belongs to the original object. The storage system reads the stored data from the second LBA and feeds back the data to the host.

In one example, as shown in fig. 1, assume that a write operation instruction is executed on LBA1 when the second ROW snapshot is executed, and a metadata cache entry is recorded as (LBA1, seq1, 1, 1). The first operation instruction is then a read operation instruction, the first LBA is LBA1, and the first snapshot sequence number is seq4. The storage system does not obtain a first metadata value corresponding to LBA1 and seq4. As described above, the storage system obtains all metadata cache entries corresponding to LBA1. In this example, the storage system obtains one entry from the metadata cache, that is, the second metadata value is (LBA1, seq1, 1, 1).

The storage system determines whether the value of the first identifier included in the second metadata value is 1. If it is 1, the storage system determines that the data stored in LBA1 belongs to the original object. The storage system reads the stored data from LBA1 and feeds back the data to the host.
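The single-entry fallback above can be sketched in Python. The function name `read_fallback_single`, the per-LBA dictionary layout, and the ROW LBA value `100` are assumptions made for illustration.

```python
def read_fallback_single(entries_for_lba, requested_seq, disk):
    """Serve a read when (lba, requested_seq) is absent from the cache.

    entries_for_lba maps seq -> (row_lba, tag1, tag2) for one LBA.
    Returns the data if exactly one other entry exists and its
    first identifier is 1 (original object); otherwise None.
    """
    if requested_seq in entries_for_lba:
        return None  # exact hit, handled by the normal path
    if len(entries_for_lba) == 1:
        row_lba, tag1, _tag2 = next(iter(entries_for_lba.values()))
        if tag1 == 1:
            return disk[row_lba]
    return None


# (LBA1, seq1, 1, 1) is cached; the read request carries seq4.
lba1_entries = {1: (100, 1, 1)}
disk = {100: b"written-at-seq1"}
data = read_fallback_single(lba1_entries, 4, disk)
print(data)
```

Under these assumptions the read for seq4 is served from the seq1 entry without consulting the on-disk metadata, which is the hit the original-object identifier is intended to enable.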

In another case, when the first metadata value corresponding to the first LBA and the first snapshot sequence number is not obtained from the metadata cache, and the plurality of second metadata values corresponding to the first LBA are obtained, the storage system determines whether the value of the first identifier is 1. If 0, the storage system determines that the plurality of second metadata values do not belong to the original object. According to the second LBA included in each second metadata value, the storage system reads and compares the stored data from the disk and feeds back the most recently written data to the host.

In one example, as shown in fig. 1, the storage system executes three write operation instructions on LBA2 while executing the first, second, and third ROW snapshots, and records a metadata cache entry each time; the entries are, in order, (LBA2, seq0, 0, 1), (LBA2, seq1, 0, 1), and (LBA2, seq2, 0, 1). The first operation instruction is then a read operation instruction, the first LBA is LBA2, and the first snapshot sequence number is seq4. The storage system does not obtain a first metadata value corresponding to LBA2 and seq4. As described above, the storage system obtains all metadata cache entries corresponding to LBA2. In this example, the storage system obtains three entries from the metadata cache, that is, the second metadata values are (LBA2, seq0, 0, 1), (LBA2, seq1, 0, 1), and (LBA2, seq2, 0, 1).

The storage system determines whether the value of the first identifier included in each second metadata value is 1. If it is 0, the storage system determines that the data stored in LBA2 does not belong to the original object. According to the second LBA included in each second metadata value, the storage system reads the stored data from the disk, compares it, and feeds back the most recently written data to the host.
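A minimal Python sketch of this multi-entry case follows. The application does not say how the data read from disk is compared; the sketch assumes each on-disk record carries a write stamp and that the record with the largest stamp is the most recently written. The name `read_latest`, the tuple layouts, and the ROW LBA values are hypothetical.

```python
def read_latest(entries, disk):
    """Return the most recently written data among non-original entries.

    entries: list of (row_lba, seq, tag1, tag2) for one user LBA.
    disk: row_lba -> (write_stamp, payload); the write stamp is an
          assumption used to compare the records read from disk.
    """
    if not entries or any(tag1 == 1 for _, _, tag1, _ in entries):
        return None  # empty, or an original-object case handled elsewhere
    records = [disk[row_lba] for row_lba, _, _, _ in entries]
    return max(records, key=lambda rec: rec[0])[1]


# (LBA2, seq0, 0, 1), (LBA2, seq1, 0, 1), (LBA2, seq2, 0, 1)
lba2_entries = [(200, 0, 0, 1), (201, 1, 0, 1), (202, 2, 0, 1)]
disk = {200: (0, b"a"), 201: (1, b"b"), 202: (2, b"c")}
latest_data = read_latest(lba2_entries, disk)
print(latest_data)
```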

In another case, when a second metadata value corresponding to the first LBA is obtained from the metadata cache and the second metadata value is valid, the storage system generates, according to the first LBA and the first snapshot sequence number, a third metadata value corresponding to them. The first identifier and the second identifier included in the third metadata value are both 0.

In one example, as shown in fig. 1, when the storage system executes the first and third ROW snapshots, it performs two write operations on LBA3, and the metadata values recorded in the metadata cache are, in order, (LBA3, seq0, 0, 1) and (LBA3, seq3, 0, 1). At this time, the first operation instruction is a read operation instruction, the first LBA is LBA3, and the first snapshot sequence number is seq1. The storage system does not obtain a first metadata value corresponding to LBA3 and seq1. As described above, the storage system then obtains all metadata values in the metadata cache that correspond to LBA3. In this example, the storage system retrieves two metadata values from the metadata cache, that is, the second metadata values are (LBA3, seq0, 0, 1) and (LBA3, seq3, 0, 1).

The storage system determines that the obtained second metadata values are all in a valid state. The storage system generates the corresponding third metadata value, namely (LBA3, seq1, 0, 0), from LBA3 and seq1. Since the ROW snapshot that is read does not belong to the original object and the seq actually read does not match the seq included in the first operation instruction, the storage system sets both the first identifier and the second identifier to 0.
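The generation of the third metadata value can be sketched as follows. This is a sketch under assumptions: the cache is modeled as a dictionary keyed by (LBA, sequence number), the function name `generate_third_value` is hypothetical, and the validity check is passed in as a caller-supplied predicate `is_valid` because this passage does not define what makes a metadata value valid. Recording the generated value in the cache is also an assumption, suggested by the later example in which the value is obtained again.

```python
from typing import Callable, Dict, Optional, Tuple

# (LBA, seq, first identifier, second identifier)
Meta = Tuple[str, int, int, int]


def generate_third_value(
    cache: Dict[Tuple[str, int], Meta],
    lba: str,
    seq: int,
    is_valid: Callable[[Meta], bool] = lambda value: True,
) -> Optional[Meta]:
    """Create (lba, seq, 0, 0) when only other valid entries for the LBA exist."""
    if (lba, seq) in cache:
        return None  # exact hit: nothing to generate
    second_values = [value for value in cache.values() if value[0] == lba]
    if not second_values or not all(is_valid(v) for v in second_values):
        return None
    third = (lba, seq, 0, 0)  # both identifiers are set to 0
    cache[(lba, seq)] = third  # assumption: the new value is cached
    return third
```

For the LBA3 example, calling it with `lba="LBA3"` and `seq=1` on a cache holding the seq0 and seq3 entries would yield `("LBA3", 1, 0, 0)`.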

In another case, when the value of the second identifier in the first metadata value is 1 and a fourth data value corresponding to the first LBA is obtained, the storage system determines whether the value of the second identifier included in the fourth data value is 1. If the value is 0, the storage system determines that an association exists between the data stored in the second LBA of the first metadata value and the data stored in the second LBA included in the fourth data value, and does not delete the first metadata value. The second snapshot sequence number corresponding to the fourth data value is smaller than the first snapshot sequence number.

In one example, as shown in fig. 1, when the storage system executes the first and third ROW snapshots, it performs two write operations on LBA3, and the metadata values recorded in the metadata cache are, in order, (LBA3, seq0, 0, 1) and (LBA3, seq3, 0, 1).

Continuing the foregoing example, the storage system, in response to the read operation instruction, reads the data stored in LBA3 at the time of the second ROW snapshot and generates the corresponding third metadata value, that is, (LBA3, seq1, 0, 0). The storage system then receives a delete operation instruction from the host, which requests deletion of the data stored in LBA3 at the time of the first ROW snapshot. The storage system obtains the first metadata value (LBA3, seq0, 0, 1) and determines that the value of the second identifier included in it is 1. At the same time, the storage system also obtains the third metadata value (LBA3, seq1, 0, 0) corresponding to LBA3 and determines that the value of the second identifier included in it is 0. The storage system therefore determines that an association exists between the data stored at the second LBA in the first metadata value and the data stored at the second LBA in the third metadata value. That is, the data of LBA3 at the second ROW snapshot depends on the data stored in LBA3 at the first ROW snapshot. If the data stored in LBA3 at the first ROW snapshot is valid, it cannot be deleted directly; otherwise, the data of LBA3 at the second ROW snapshot would be lost.
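The delete decision in this example can be sketched as follows. The names `DeleteDecision` and `decide_delete` are hypothetical. The sketch follows the worked example: any other metadata value for the same LBA whose second identifier is 0 is treated as depending on the data to be deleted, and the ordering condition on snapshot sequence numbers is not modeled. The `NO_DEPENDENCY` outcome only reports that nothing blocks the deletion; what the storage system does next in that case is not described here.

```python
from enum import Enum
from typing import List, Tuple

# (LBA, seq, first identifier, second identifier)
Meta = Tuple[str, int, int, int]


class DeleteDecision(Enum):
    DROP_METADATA = "no written data; delete the metadata value"
    KEEP = "another entry depends on this data; do not delete"
    NO_DEPENDENCY = "no dependent entry found"


def decide_delete(first: Meta, others: List[Meta]) -> DeleteDecision:
    """Check the second identifier of the target and of its sibling entries."""
    lba, _, _, second_id = first
    if second_id == 0:
        # No actually written data is stored at the second LBA.
        return DeleteDecision.DROP_METADATA
    for other in others:
        # A sibling with second identifier 0 has no data of its own,
        # so it relies on the data referenced by `first`.
        if other[0] == lba and other[3] == 0:
            return DeleteDecision.KEEP
    return DeleteDecision.NO_DEPENDENCY
```

For the example, `decide_delete(("LBA3", 0, 0, 1), [("LBA3", 1, 0, 0), ("LBA3", 3, 0, 1)])` evaluates to `DeleteDecision.KEEP`.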

Based on the same inventive concept, the embodiment of the application also provides a data processing device corresponding to the data processing method. Referring to fig. 3, fig. 3 is a block diagram of a data processing apparatus provided in an embodiment of the present application, where the apparatus is applied to a storage system, and the apparatus includes:

a receiving unit 310, configured to receive a first operation instruction sent by a host, where the first operation instruction includes a first logical block address LBA of a user object and a first snapshot sequence number;

a searching unit 320, configured to search a metadata cache according to the first LBA and the first snapshot sequence number, where the metadata cache includes at least one metadata value, and each metadata value includes a second LBA, a first identifier, and a second identifier of an actually written ROW object;

an identifying unit 330, configured to identify, when a first metadata value corresponding to the first LBA and a first snapshot sequence number is obtained from the metadata cache, a value of the first identifier or a value of the second identifier according to a type of the first operation instruction;

a processing unit 340, configured to perform corresponding processing on the data stored in the second LBA according to the identification result.

Optionally, the identifying unit 330 is specifically configured to identify a value of the first identifier when the first operation instruction is a read operation instruction;

and when the first operation instruction is a deletion operation instruction, identifying the value of the second identifier.
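The selection made by the identifying unit can be sketched as a small dispatch on the instruction type. The function name `identifier_for` and the string labels `"read"` and `"delete"` are hypothetical; only the rule itself (read checks the first identifier, delete checks the second identifier) comes from the description.

```python
from typing import Tuple

# (LBA, seq, first identifier, second identifier)
Meta = Tuple[str, int, int, int]


def identifier_for(op_type: str, value: Meta) -> int:
    """Return the identifier that the given instruction type inspects."""
    _, _, first_identifier, second_identifier = value
    if op_type == "read":
        return first_identifier
    if op_type == "delete":
        return second_identifier
    raise ValueError(f"unsupported operation type: {op_type!r}")
```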

Optionally, the identifying unit 330 is specifically configured to determine that the first metadata value belongs to an original object when the value of the first identifier is 1;

and reading the stored data from the second LBA and feeding back the data to the host.

Optionally, the apparatus further comprises: a determining unit (not shown in the figure), configured to determine whether a value of the first identifier is 1 when a first metadata value corresponding to the first LBA and the first snapshot sequence number is not obtained from the metadata cache and a second metadata value corresponding to the first LBA is obtained;

a determining unit (not shown in the figure), configured to determine that the second metadata value belongs to an original object if the value of the first identifier is 1;

a reading unit (not shown in the figure) for reading the stored data from the second LBA;

a sending unit (not shown in the figure) for feeding back the data to the host.

Optionally, the apparatus further comprises: a determining unit (not shown in the figure), configured to determine whether a value of the first identifier is 1 when a first metadata value corresponding to the first LBA and the first snapshot sequence number is not obtained from the metadata cache, and a plurality of second metadata values corresponding to the first LBA are obtained;

a determining unit (not shown in the figure), configured to determine that the plurality of second metadata values do not belong to the original object if the value is 0;

a reading unit (not shown in the figure) for reading and comparing the stored data from the magnetic disk according to the second LBA included in each of the second metadata values;

a sending unit (not shown in the figure) for feeding back the latest written data to the host.

Optionally, the apparatus further comprises: a generating unit (not shown in the figure), configured to generate, when a second metadata value corresponding to the first LBA is obtained from the metadata cache and the second metadata value is valid, a third metadata value corresponding to the first LBA and the first snapshot sequence number according to the first LBA and the first snapshot sequence number;

wherein the first identifier and the second identifier included in the third metadata value are both 0.

Optionally, the identifying unit 330 is specifically configured to determine that actually written data is not stored in the second LBA when the value of the second identifier is 0;

deleting the first metadata value.

Optionally, the identifying unit 330 is specifically configured to, when the value of the second identifier is 1 and a fourth data value corresponding to the first LBA is acquired, determine whether a value of the second identifier included in the fourth data value is 1;

if the value is 0, determining that an association relationship exists between the data stored in the second LBA and the data stored in the second LBA included in the fourth data value;

not deleting the first metadata value;

and the second snapshot sequence number corresponding to the fourth data value is smaller than the first snapshot sequence number.

Optionally, the value of the first identifier is used to indicate whether the metadata value belongs to the original object; the value of the second identifier is used to indicate whether the metadata value is stored in the disk.
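The layout of a metadata value and the meaning of its two identifiers can be sketched as a small record type. The class and field names are hypothetical; the tuple order (LBA, sequence number, first identifier, second identifier) follows the examples such as (LBA2, seq0, 0, 1).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MetadataValue:
    lba: str                # second LBA of the actually written ROW object
    seq: int                # snapshot sequence number
    first_identifier: int   # 1 = belongs to the original object
    second_identifier: int  # 1 = stored on disk

    @property
    def belongs_to_original(self) -> bool:
        return self.first_identifier == 1

    @property
    def stored_on_disk(self) -> bool:
        return self.second_identifier == 1
```

Under this reading, (LBA2, seq0, 0, 1) is an entry that does not belong to the original object but is stored on disk.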

Therefore, with the data processing apparatus provided by the present application, the apparatus receives a first operation instruction sent by a host, where the first operation instruction includes a first logical block address LBA and a first snapshot sequence number of a user object. According to the first LBA and the first snapshot sequence number, the apparatus searches a metadata cache, where the metadata cache includes at least one metadata value, and each metadata value includes a second LBA, a first identifier, and a second identifier of an actually written ROW object. When the first metadata value corresponding to the first LBA and the first snapshot sequence number is obtained from the metadata cache, the apparatus identifies the value of the first identifier or the value of the second identifier according to the type of the first operation instruction. According to the identification result, the apparatus performs corresponding processing on the data stored in the second LBA.

This solves the problem of the very low cache hit rate in existing cache schemes: while the consistency of the metadata is ensured, the metadata hit rate in the ROW snapshot sequence number scenario is greatly improved.

The implementation process of the functions and actions of each unit in the above device is specifically described in the implementation process of the corresponding step in the above method, and is not described herein again.

For the device embodiments, since they substantially correspond to the method embodiments, reference may be made to the partial description of the method embodiments for relevant points. The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules can be selected according to actual needs to achieve the purpose of the scheme of the application. One of ordinary skill in the art can understand and implement it without inventive effort.

For the data processing device embodiment, since the content of the related method is basically similar to the method embodiment described above, the description is relatively simple, and the relevant points can be referred to the partial description of the method embodiment.

The above description is only exemplary of the present application and should not be taken as limiting the present application, as any modification, equivalent replacement, or improvement made within the spirit and principle of the present application should be included in the scope of protection of the present application.
