Storage space optimization method and system for a hyper-converged architecture

Document No.: 923454. Publication date: 2021-03-02.

Abstract: "Storage space optimization method and system for a hyper-converged architecture" (一种针对超融合架构的存储空间优化方法及系统) was designed by Zhao Jingda (赵井达), Ma Liang (马亮), Zhang Hui (张辉), and Lang Tieshan (郎铁山) on 2020-11-23. The invention provides a storage space optimization method and system for a hyper-converged architecture. Storage directories of different physical hosts in the hyper-converged architecture form a storage pool, the storage pool is abstracted into a storage domain for virtual machines, and corresponding physical storage blocks are configured in a distributed manner across the different physical hosts. A virtual data layer is constructed; it receives a write request and temporarily writes the data of the write request into a configured physical storage block. The signature of the physical storage block holding the written data is obtained and compared with existing signatures; if a physical storage block with the same signature is found, the data are compared byte by byte to verify whether the blocks are truly identical. If they are identical, the existing physical storage block is referenced and the temporarily allocated physical storage block is released. If they differ, the request data are written to the allocated physical storage block, data compression is performed, and the physical storage blocks holding newly stored data are merged and packed. The invention can significantly improve the utilization of storage space.

1. A storage space optimization method for a hyper-converged architecture, characterized by comprising the following steps:

forming a storage pool from the storage directories of different physical hosts in the hyper-converged architecture, abstracting the storage pool into a storage domain for virtual machines, and configuring corresponding physical storage blocks distributed across the different physical hosts;

constructing a virtual data layer, receiving a write request, and temporarily writing the data of the write request into a configured physical storage block;

acquiring the signature of the physical storage block into which the data was written, comparing signatures, and, if a physical storage block with the same signature is found, comparing the data byte by byte to verify whether the blocks are truly identical;

if they are identical, pointing to the existing corresponding physical storage block and releasing the temporarily allocated physical storage block;

and if they differ, writing the request data into the allocated physical storage block, performing data compression, and merging and packing the physical storage blocks that hold newly stored data.

2. The storage space optimization method for a hyper-converged architecture according to claim 1, wherein at least three physical hosts are arranged in the hyper-converged architecture, and each physical host is provided with a certain number of physical disks serving as physical storage blocks.

3. The storage space optimization method for a hyper-converged architecture according to claim 1, wherein the virtual data layer marks the content stored in each physical storage block, constructs index information, and stores the index information.

4. The storage space optimization method for a hyper-converged architecture according to claim 1, wherein, after the write request is received, an asynchronous mode is adopted: the write request is acknowledged first, and the write data processing proceeds at the same time.

5. The storage space optimization method for a hyper-converged architecture according to claim 1, wherein the specific process of comparing the signatures comprises: calculating a message authentication code with a hash algorithm over the data requested to be written and using it as the signature, and comparing whether the signature of an existing physical storage block is consistent with the signature of the newly requested data block.

6. The storage space optimization method for a hyper-converged architecture according to claim 1, wherein the specific process of performing the byte-by-byte comparison of data comprises: judging whether the index information of the data in the request is the same as the index information of the physical storage block holding stored data, and if so, treating the contents as identical.

7. The storage space optimization method for a hyper-converged architecture according to claim 1, wherein the specific process of writing the request data to the allocated physical storage block comprises: when no identical data block exists, the write request is a write of new data that is not present in the existing physical storage blocks, so a persistence operation is executed on the previously temporarily allocated physical storage block and the new data is stored.

8. A storage space optimization system for a hyper-converged architecture, characterized by comprising:

a virtual machine construction module, configured to form a storage pool from the storage directories of different physical hosts in the hyper-converged architecture, abstract the storage pool into a storage domain for virtual machines, and configure corresponding physical storage blocks distributed across the different physical hosts;

a virtual data layer building module, configured to build a virtual data layer, receive a write request, and temporarily write the data of the write request into a configured physical storage block;

a comparison module, configured to acquire the signature of the physical storage block into which the data was written, compare signatures, and, if a physical storage block with the same signature is found, compare the data byte by byte to verify whether the blocks are truly identical;

an execution module, configured to point to the existing corresponding physical storage block and release the temporarily allocated physical storage block if the comparison finds the blocks identical; and, if the comparison finds them different, to write the request data into the allocated physical storage block, perform data compression, and merge and pack the physical storage blocks that hold newly stored data.

9. A storage space optimization system for a hyper-converged architecture, characterized by comprising:

a virtual machine, used for initiating a write request;

a virtualization layer, used for abstracting the storage pool into a storage domain that provides virtual disk storage to the virtual machine;

a distributed file system, used for connecting the physical hosts and forming a storage pool from the storage directories on the physical hosts;

a plurality of physical hosts, each comprising a plurality of physical storage blocks, wherein, after the physical storage blocks form a redundant array of independent disks, they are managed through a logical volume manager, and a virtual data layer is configured between the redundant array of independent disks and the logical volume manager; the virtual data layer is used for receiving a write request and determining whether the newly requested data is identical to previously existing data; if so, the duplicate data is deleted; otherwise, the new data is stored and compressed.

10. The storage space optimization system for a hyper-converged architecture according to claim 9, wherein the virtual data layer comprises a kernel index module and a kernel processing module; the kernel index module is used for recording the index information of each data block, and is configured to acquire the signature of the physical storage block into which data was written, compare signatures, and, if a physical storage block with the same signature is found, compare the data byte by byte to verify whether the blocks are truly identical; the kernel processing module is used for providing a deduplication service and a compression service for data blocks, and is configured to point to the existing corresponding physical storage block and release the temporarily allocated physical storage block if the comparison finds the blocks identical, and, if the comparison finds them different, to write the request data into the allocated physical storage block, perform data compression, and merge and pack the physical storage blocks that hold newly stored data.

Technical Field

The invention belongs to the technical field of storage space optimization, and particularly relates to a storage space optimization method and system for a hyper-converged architecture.

Background

The statements in this section merely provide background information related to the present disclosure and may not necessarily constitute prior art.

As the hyper-converged architecture matures and is gradually adopted in production environments, its demand for data storage space has become apparent. The hyper-converged architecture is based on distributed storage technology, in which local storage disks on multiple hosts form a logical storage pool. To meet the safety and reliability requirements of hyper-converged data, data in the storage pool is usually stored as multiple copies, so the effective storage space is only the raw capacity divided by the number of copies, which wastes storage space.

To the inventors' knowledge, there is currently no storage space optimization method designed specifically for the hyper-converged architecture, but there are data deduplication methods applied to storage devices. For example, a fingerprint of each data block is recorded on the storage device and compared on every read and write; duplicate data is not written again, and the same address space is returned. Such a method considers only processing at the storage end, cannot directly provide file system support, suits the traditional storage mode, and is not suitable for data storage in a hyper-converged architecture. Moreover, because every read and write of every data block requires a comparison, efficiency suffers considerably.

Disclosure of Invention

To solve the above problems, the invention provides a storage space optimization method and system for a hyper-converged architecture, which can effectively address the reduction of effective data storage space caused by multiple copies in the hyper-converged architecture.

According to some embodiments, the invention adopts the following technical scheme:

A storage space optimization method for a hyper-converged architecture comprises the following steps:

forming a storage pool from the storage directories of different physical hosts in the hyper-converged architecture, abstracting the storage pool into a storage domain for virtual machines, and configuring corresponding physical storage blocks distributed across the different physical hosts;

constructing a virtual data layer, receiving a write request, and temporarily writing the data of the write request into a configured physical storage block;

acquiring the signature of the physical storage block into which the data was written, comparing signatures, and, if a physical storage block with the same signature is found, comparing the data byte by byte to verify whether the blocks are truly identical;

if they are identical, pointing to the existing corresponding physical storage block and releasing the temporarily allocated physical storage block;

and if they differ, writing the request data into the allocated physical storage block, performing data compression, and merging and packing the physical storage blocks that hold newly stored data.

As an alternative embodiment, at least three physical hosts are arranged in the hyper-converged architecture, and each physical host has a certain number of physical disks serving as physical storage blocks.

As an alternative embodiment, the virtual data layer marks the content stored in each physical storage block, constructs index information, and stores the index information.

As an alternative embodiment, after the write request is received, an asynchronous mode is adopted: the write request is acknowledged first, and the write data processing proceeds at the same time.
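The asynchronous mode described here can be illustrated with a minimal Python sketch. It is only an illustration under stated assumptions: the class name `AsyncWriter`, the `process` callback, the `"ACK"` return value, and the `drain` helper are all hypothetical and do not come from the patent, and a user-space thread stands in for whatever mechanism a kernel module would actually use.

```python
import queue
import threading


class AsyncWriter:
    """Acknowledge each write immediately and process it on a background thread.

    All names here are hypothetical; this is a user-space illustration only.
    """

    def __init__(self, process):
        self._process = process            # callback that does the real write work
        self._pending = queue.Queue()      # FIFO of acknowledged, unprocessed writes
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def write(self, data: bytes) -> str:
        self._pending.put(data)            # hand the data to the worker
        return "ACK"                       # reply before processing has finished

    def _run(self) -> None:
        while True:
            data = self._pending.get()
            try:
                self._process(data)
            finally:
                self._pending.task_done()

    def drain(self) -> None:
        """Block until every acknowledged write has been processed."""
        self._pending.join()
```

A caller would construct `AsyncWriter(handler)`, call `write(...)` for each request, and call `drain()` only when it needs to know that all acknowledged writes have been handled. A real implementation would also have to decide what happens to acknowledged data if processing fails; the sketch does not address that.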

As an alternative embodiment, the specific process of comparing signatures includes calculating a message authentication code as a signature using a hash algorithm for the data requested to be written, and comparing whether the signature of the existing physical storage block is consistent with the signature of the newly requested data block.
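The detailed description names MurmurHash-3 as the signature algorithm. The sketch below is a pure-Python rendering of the public 32-bit x86 variant of MurmurHash3, given only to make the signature step concrete. The choice of the 32-bit variant, the default seed of 0, and the helper name `same_signature` are assumptions; the patent does not state which variant or seed is used. MurmurHash3 is a non-cryptographic hash, so two different blocks can share a signature, which is why the method follows a signature match with a byte-by-byte check.

```python
MASK32 = 0xFFFFFFFF


def _rotl32(x: int, r: int) -> int:
    """Rotate a 32-bit value left by r bits."""
    return ((x << r) | (x >> (32 - r))) & MASK32


def murmur3_32(data: bytes, seed: int = 0) -> int:
    """MurmurHash3 (x86, 32-bit) of data, returned as an unsigned 32-bit int."""
    c1, c2 = 0xCC9E2D51, 0x1B873593
    h = seed & MASK32
    n_blocks = len(data) // 4

    # Body: mix each complete 4-byte little-endian word into the state.
    for i in range(n_blocks):
        k = int.from_bytes(data[4 * i:4 * i + 4], "little")
        k = (k * c1) & MASK32
        k = _rotl32(k, 15)
        k = (k * c2) & MASK32
        h ^= k
        h = _rotl32(h, 13)
        h = (h * 5 + 0xE6546B64) & MASK32

    # Tail: mix in the remaining 1 to 3 bytes, if any.
    tail = data[4 * n_blocks:]
    if tail:
        k = int.from_bytes(tail, "little")
        k = (k * c1) & MASK32
        k = _rotl32(k, 15)
        k = (k * c2) & MASK32
        h ^= k

    # Finalization: fold in the length and avalanche the bits.
    h ^= len(data) & MASK32
    h ^= h >> 16
    h = (h * 0x85EBCA6B) & MASK32
    h ^= h >> 13
    h = (h * 0xC2B2AE35) & MASK32
    h ^= h >> 16
    return h


def same_signature(a: bytes, b: bytes) -> bool:
    """Hypothetical helper: True when two blocks have equal signatures."""
    return murmur3_32(a) == murmur3_32(b)
```

Equal signatures only make two blocks candidates for deduplication; unequal signatures prove the blocks differ.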

As an alternative embodiment, the specific process of performing the byte-by-byte comparison of data includes: judging whether the index information of the data in the request is the same as the index information of the physical storage block holding stored data, and if so, treating the contents as identical.

As an alternative embodiment, the specific process of writing the request data to the allocated physical storage block includes: when no identical data block exists, the write request is a write of new data that is not present in the existing physical storage blocks, so a persistence operation is executed on the previously temporarily allocated physical storage block and the new data is stored.

A storage space optimization system for a hyper-converged architecture, comprising:

a virtual machine construction module, configured to form a storage pool from the storage directories of different physical hosts in the hyper-converged architecture, abstract the storage pool into a storage domain for virtual machines, and configure corresponding physical storage blocks distributed across the different physical hosts;

a virtual data layer building module, configured to build a virtual data layer, receive a write request, and temporarily write the data of the write request into a configured physical storage block;

a comparison module, configured to acquire the signature of the physical storage block into which the data was written, compare signatures, and, if a physical storage block with the same signature is found, compare the data byte by byte to verify whether the blocks are truly identical;

an execution module, configured to point to the existing corresponding physical storage block and release the temporarily allocated physical storage block if the comparison finds the blocks identical; and, if the comparison finds them different, to write the request data into the allocated physical storage block, perform data compression, and merge and pack the physical storage blocks that hold newly stored data.

A storage space optimization system for a hyper-converged architecture, comprising:

a virtual machine, used for initiating a write request;

a virtualization layer, used for abstracting the storage pool into a storage domain that provides virtual disk storage to the virtual machine;

a distributed file system, used for connecting the physical hosts and forming a storage pool from the storage directories on the physical hosts;

a plurality of physical hosts, each comprising a plurality of physical storage blocks, wherein, after the physical storage blocks form a redundant array of independent disks, they are managed through a logical volume manager, and a virtual data layer is configured between the redundant array of independent disks and the logical volume manager; the virtual data layer is used for receiving a write request and determining whether the newly requested data is identical to previously existing data; if so, the duplicate data is deleted; otherwise, the new data is stored and compressed.

As an alternative implementation, the virtual data layer includes a kernel index module and a kernel processing module. The kernel index module records the index information of each data block, and is configured to obtain the signature of the physical storage block into which data was written, compare signatures, and, if a physical storage block with the same signature is found, compare the data byte by byte to verify whether the blocks are truly identical. The kernel processing module provides a deduplication service and a compression service for data blocks, and is configured to point to the existing corresponding physical storage block and release the temporarily allocated physical storage block if the comparison finds the blocks identical, and, if the comparison finds them different, to write the request data into the allocated physical storage block, perform data compression, and merge and pack the physical storage blocks that hold newly stored data.

Compared with the prior art, the invention has the beneficial effects that:

The invention adds a virtual data layer to the hyper-converged architecture to optimize data. Based on the written data and the index information, it can quickly determine whether newly requested data is identical to existing data, and then provides a deduplication service or a compression service for physical storage blocks, so that the utilization of effective storage space can be multiplied.

The kernel processing module operates asynchronously when performing data deduplication, which further improves efficiency.

In order to make the aforementioned and other objects, features and advantages of the present invention comprehensible, preferred embodiments accompanied with figures are described in detail below.

Drawings

The accompanying drawings, which form a part of this specification, are included to provide a further understanding of the invention; the exemplary embodiments and their description serve to explain the invention and not to limit it.

FIG. 1 is a diagram of the optimized system architecture;

FIG. 2 is a flowchart of the data optimization processing.

Detailed Description

The invention is further described with reference to the following figures and examples.

It is to be understood that the following detailed description is exemplary and is intended to provide further explanation of the invention as claimed. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs.

It is noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to limit the exemplary embodiments according to the invention. As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the terms "comprises" and/or "comprising", when used in this specification, specify the presence of the stated features, steps, operations, devices, components, and/or combinations thereof.

The data optimization system is applicable to a hyper-converged architecture comprising at least three physical hosts, each provided with a certain number of physical disks; the storage of the hyper-converged architecture consists of the physical disks on each physical host (referred to below as data blocks or physical storage blocks).

As shown in FIG. 1, after a plurality of physical disks on a physical host are formed into a RAID array, block management is performed by LVM, and a GlusterFS distributed file system is built on top of LVM. GlusterFS forms a storage pool from storage directories on a plurality of physical hosts, and the virtualization layer abstracts the storage pool into a storage domain that provides virtual disk storage to the virtual machines.

The data of each virtual machine is stored in the storage with at least three copies and is distributed on three different physical hosts.
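The cost of multiple copies can be made concrete with a short calculation. The sketch below assumes that every block is stored `copies` times and, optionally, that deduplication and compression together achieve some data reduction ratio. The function names, the 12 TB figure, and the ratio of 2.0 are purely illustrative assumptions, not results reported in this document.

```python
def effective_capacity(raw_capacity: float, copies: int) -> float:
    """Usable capacity when every block is stored `copies` times."""
    if copies < 1:
        raise ValueError("copies must be at least 1")
    return raw_capacity / copies


def logical_capacity(raw_capacity: float, copies: int, reduction_ratio: float) -> float:
    """Logical data that fits, given a hypothetical reduction ratio
    (logical bytes divided by physical bytes after deduplication and compression)."""
    if reduction_ratio <= 0:
        raise ValueError("reduction_ratio must be positive")
    return effective_capacity(raw_capacity, copies) * reduction_ratio


# Illustrative numbers only: three hosts with 4 TB each, three copies.
raw_tb = 3 * 4.0
usable_tb = effective_capacity(raw_tb, copies=3)                       # 4.0 TB
with_reduction_tb = logical_capacity(raw_tb, 3, reduction_ratio=2.0)   # 8.0 TB
```

With three copies, only one third of the raw capacity holds unique data, so any reduction ratio above 1 directly offsets part of that overhead.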

A virtual data layer is added between the RAID array and LVM. It comprises two kernel modules. The kernel index module records the index information of each data block and can quickly determine whether newly requested data is identical to previously existing data. The kernel processing module provides a deduplication service and a compression service for data blocks, and works between the kernel's block interface and the actual storage device driver.
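How an index and a processing component might cooperate can be sketched in user-space Python. This is a simplified model, not the kernel modules themselves: the class `DedupStore`, its dictionaries, and the use of `zlib.crc32` as the default signature are assumptions made for illustration (the document's own signature is MurmurHash-3), and the sketch omits reference counting, overwrites, and persistence to disk.

```python
import zlib
from typing import Callable, Dict, List


class DedupStore:
    """In-memory model of signature lookup followed by byte-by-byte verification."""

    def __init__(self, signature: Callable[[bytes], int] = zlib.crc32):
        self._signature = signature            # stand-in signature function
        self.blocks: Dict[int, bytes] = {}     # physical block id -> contents
        self.index: Dict[int, List[int]] = {}  # signature -> physical block ids
        self.logical: Dict[int, int] = {}      # logical address -> physical block id
        self._next_id = 0

    def write(self, lba: int, data: bytes) -> bool:
        """Write data at logical address lba; return True if it was deduplicated."""
        # Temporarily allocate a physical block and write the data into it.
        temp_id = self._next_id
        self._next_id += 1
        self.blocks[temp_id] = data

        sig = self._signature(data)
        for candidate in self.index.get(sig, []):
            # A matching signature is only a hint; verify the actual bytes.
            if self.blocks[candidate] == data:
                self.logical[lba] = candidate   # point at the existing block
                del self.blocks[temp_id]        # release the temporary block
                return True

        # No identical block exists: keep the temporary block as the new data.
        self.index.setdefault(sig, []).append(temp_id)
        self.logical[lba] = temp_id
        return False

    def read(self, lba: int) -> bytes:
        return self.blocks[self.logical[lba]]
```

Passing a deliberately weak signature function, for example `lambda d: 0`, forces every block to collide and shows that the byte comparison, not the signature, is what guarantees correctness.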

As shown in FIG. 2, the specific process includes:

after the virtual machine initiates a data write request, the kernel processing module of the virtual data layer receives the write request and, working asynchronously, first acknowledges the write request and at the same time begins processing the written data;

corresponding physical storage blocks are allocated according to the data received in the write request, and the data is temporarily written into a data block;

a signature of the data is computed using the MurmurHash-3 algorithm, for use in the subsequent deduplication operation;

the obtained signature is provided to the kernel index module, which compares signatures and, if a data block with the same signature is found, compares the two data blocks byte by byte to verify whether they are truly identical;

once the two data blocks are confirmed to be truly identical, the upper-layer logical block is pointed at the existing corresponding data block and the allocated temporary data block is released, which completes the deduplication operation;

if the signature comparison finds no identical data block, the write request is a write of new data that is not present in the existing data blocks, so the previously temporarily allocated data block is persisted and the new data is stored;

and after the new data blocks are obtained, a data compression process is started, in which the kernel processing module merges and packs the multiple new data blocks using a parallel compression algorithm.
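The final compression and packing step can be sketched as follows. The document does not specify the parallel compression algorithm or the packed layout, so everything here is an assumption for illustration: `zlib` is used as the compressor, a thread pool supplies the parallelism, and the container format (a 4-byte block count followed by length-prefixed compressed records) is invented for this sketch.

```python
import struct
import zlib
from concurrent.futures import ThreadPoolExecutor
from typing import List


def compress_and_pack(blocks: List[bytes], workers: int = 4) -> bytes:
    """Compress blocks in parallel and merge them into one packed buffer."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() returns results in input order, so block order is preserved.
        compressed = list(pool.map(zlib.compress, blocks))

    parts = [struct.pack("<I", len(compressed))]   # header: number of blocks
    for chunk in compressed:
        parts.append(struct.pack("<I", len(chunk)))  # record length prefix
        parts.append(chunk)
    return b"".join(parts)


def unpack(packed: bytes) -> List[bytes]:
    """Inverse of compress_and_pack: recover the original blocks."""
    (count,) = struct.unpack_from("<I", packed, 0)
    offset = 4
    blocks = []
    for _ in range(count):
        (size,) = struct.unpack_from("<I", packed, offset)
        offset += 4
        blocks.append(zlib.decompress(packed[offset:offset + size]))
        offset += size
    return blocks
```

Packing several compressed blocks into one buffer avoids leaving a partly empty physical block behind each compressed one, which is one plausible reading of "merging and packing" in the step above.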

As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.

The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

The above description is only a preferred embodiment of the present invention and is not intended to limit the present invention, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Although the embodiments of the present invention have been described with reference to the accompanying drawings, they are not intended to limit the scope of the present invention, and it should be understood that those skilled in the art can make various modifications and variations, without inventive effort, on the basis of the technical solution of the present invention.
