Data consistency implementation method based on early updating

Document No. 1324098 · Published 2020-07-14

Abstract: The invention discloses a data consistency implementation method based on early updating (提前更新), belonging to the technical field of integrated circuits, created by 顾晓峰, 李青青, 虞致国 and 魏敬和 on 2020-03-24. The method adds a counter to each Cacheline of the L1 DCache and of the caches at every other level, and records the access pattern of the Cachelines that hold dirty data copies. The data copies containing dirty data are thus written back to the next level of memory in advance, while the memory is idle, instead of the Cache being flushed only when a DMA transfer is about to begin. This alleviates the latency of the Cache flush that precedes a DMA data transfer, keeps the memory fully utilized, and improves the efficiency of the DMA transfer system.

1. A data consistency implementation method, applied to a multi-core processor system, comprising: adding a counter to each Cacheline of the L1 DCache and of the caches at every other level in the multi-core processor system; recording the access pattern of the Cachelines that contain dirty data copies; and updating the data copies that contain dirty data to the next-level memory in advance while the L1 DCache and the caches at the other levels are idle.

2. The method according to claim 1, wherein the multi-core processor system comprises at least two CPUs, and updating the data copies containing dirty data to the next-level memory in advance while the L1 DCache and the caches at the other levels are idle comprises:

step 1, a first CPU requests access to a data copy, wherein the first CPU is any CPU in the multi-core processor system;

step 2, when a certain Cache is idle, the counters of its Cachelines are compared and the Cacheline with the largest counter value is selected for active write-back to the next-level memory; if, at the same time, another Cache at the same level issues an access request to the next-level memory and has not been actively written back, the next-level memory serves that access request first, wherein the certain Cache is any L1 DCache or any Cache at another level in the multi-core processor system;

step 3, the Cache receives the write-back response and actively writes the Cacheline with the largest counter value back to the next-level memory; if other Caches at the same level also hold this data copy, the dirty bits of their corresponding Cachelines are cleared to 0 and their states make the corresponding transition according to the coherence protocol;

step 4, DMA initiates an access request;

step 5, the CPU receives the access request of the DMA, begins to flush the Cache, waits until all remaining copies containing dirty data have been flushed to the main memory, and returns a response;

step 6, the DMA receives the response sent by the CPU and begins to transfer data.

3. The method according to claim 2, wherein in step 5, part of the dirty data copies have already been written back to the main memory by the first CPU in advance, before the DMA initiates the access request.

4. The method according to claim 2, wherein in step 1:

when a write miss occurs in the first CPU, after the first CPU completes the write operation, the counter of the Cacheline is set to 1 and the counters of the other Cachelines containing dirty data are each incremented by 1.

5. The method according to claim 2, wherein in step 1:

when a write hit occurs in the first CPU, the data copy in the Cacheline can be in one of two states: consistent with the next-level memory, or inconsistent with it, where inconsistency means that the data copy in the Cacheline contains dirty data:

if the data copy in the Cacheline is consistent with the next-level memory, after the first CPU completes the write operation, the counter of the Cacheline is set to 1 and the counters of the other Cachelines containing dirty data are each incremented by 1;

if the data copy in the Cacheline contains dirty data, after the first CPU completes the write operation, the counter of the Cacheline is set to 1; the other counters whose values are smaller than the original counter value of the hit Cacheline are each incremented by 1, and the remaining counters are unchanged.

6. The method according to claim 2, wherein in step 1:

when a read hit occurs for the first CPU:

if the Cacheline contains dirty data, its counter is decremented by 1 and the counter whose value is one less than this counter's original value is incremented by 1, while the remaining counters are unchanged; when the value of the counter of a Cacheline is less than or equal to 2, a CPU read request to that Cacheline leaves the counter value unchanged;

if the data copy is consistent with the next-level memory, none of the counter values change after the first CPU completes the read operation.

7. The method according to claim 2, wherein in step 1:

when a read miss occurs for the first CPU:

when the data copy exists in another Cache at the same level and that Cacheline contains dirty data: if the multi-core processor system allows dirty data copies to be shared, the first CPU reads the data copy from the peer Cache, the counter of the Cacheline in the local Cache is set to 1, and the counters of the other Cachelines containing dirty data are each incremented by 1; if the multi-core processor system does not allow dirty copies to be shared, after the first CPU completes the read operation, the counter of the Cacheline in the local Cache is set to the initial value 0 and the counters of the other Cachelines are unchanged;

if the data copy exists in another Cache at the same level and is consistent with the next-level memory, or exists only in lower-level memory, after the first CPU completes the read operation, the counter of the Cacheline in the local Cache is set to the initial value 0 and the counters of the other Cachelines containing dirty data are unchanged.

8. The method of claim 2, wherein when a Cacheline containing dirty data in any L1 DCache or any other-level Cache in the multi-core processor system is written back or invalidated, the counter of that Cacheline is cleared to 0; the counters of the other Cachelines containing dirty data whose values are greater than the original value of that counter are each decremented by 1, and the remaining counters are unchanged.

9. The method of claim 1, wherein the maximum value recordable by the counter added to the Cachelines of each L1 DCache and each other-level Cache in the multi-core processor system is the number N of Cachelines of that Cache, and the bit width of the counter is [log2(N)−1 : 0].

10. A multi-core processor system, wherein the multi-core processor system achieves data consistency by the method of any one of claims 1 to 8; the multi-core processor system comprises at least two CPUs; in the implementation, a counter is added to each Cacheline of every L1 DCache and every other-level Cache in the multi-core processor system, the access pattern of the Cachelines containing dirty data copies is recorded, and the data copies containing dirty data in the Caches are updated to the main memory in advance while the L1 DCache and the caches at the other levels are idle.

Technical Field

The invention relates to a data consistency implementation method based on early updating, belonging to the technical field of integrated circuits.

Background

At present, mainstream processors mostly adopt a hierarchical storage system, that is, multiple levels of Cache (cache memory) are inserted between the processor and the main memory to bridge the performance gap between the CPU and the main memory.

The Cache stores copies of part of the data in the main memory, and data consistency across the Cache levels is generally maintained with one of two write strategies: write-back and write-through. The write-back method writes a dirty data copy back to the main memory only when the Cacheline containing the dirty data is replaced or invalidated. This strategy reduces the number of main-memory accesses and improves system efficiency, but makes Cache coherence harder to maintain. The write-through method updates the data in the main memory every time the CPU writes to the Cache. Although this strategy effectively guarantees Cache consistency, it increases the amount of data transferred on the bus, and the long latency of main-memory write operations degrades overall system performance. Modern processors therefore mostly adopt the write-back method.

DMA (Direct Memory Access) is an efficient data transfer method in which a DMA controller moves data directly between an I/O device and the main memory, or between peripheral devices, without CPU intervention. DMA transfers, however, introduce their own data consistency problem, and researchers currently address the consistency between DMA transfers, the Caches at each level, and the main memory at both the software and the hardware level. Both the software and the hardware solutions require the Cache to be flushed before the DMA transfers data. Because a single DMA transfer moves a large amount of data and main-memory reads and writes have long latency, this pre-transfer Cache flush takes a long time, so the efficiency of the DMA cannot be fully exploited.

Disclosure of Invention

To solve the problem that the Cache flush before a DMA transfer takes a long time and prevents the DMA from reaching its full efficiency, the invention provides a data consistency implementation method based on early updating. The technical scheme is as follows:

A data consistency implementation method is applied to a multi-core processor system and comprises: adding a counter to each Cacheline of the L1 DCache and of the caches at every other level in the multi-core processor system; recording the access pattern of the Cachelines containing dirty data copies; and updating the data copies containing dirty data to the next-level memory in advance while the L1 DCache and the caches at the other levels are idle.

Optionally, the multi-core processor system comprises at least two CPUs, and updating the data copies containing dirty data in the Caches to the next-level memory in advance while the L1 DCache and the caches at the other levels are idle comprises:

step 1, a first CPU requests access to a data copy, wherein the first CPU is any CPU in the multi-core processor system;

step 2, when a certain Cache is idle, the counters of its Cachelines are compared and the Cacheline with the largest counter value is selected for active write-back to the next-level memory; if, at the same time, another Cache at the same level issues an access request to the next-level memory and has not been actively written back, the next-level memory serves that access request first, wherein the certain Cache is any L1 DCache or any Cache at another level in the multi-core processor system;

step 3, the Cache receives the write-back response and actively writes the Cacheline with the largest counter value back to the next-level memory; if other Caches at the same level also hold this data copy, the dirty bits of their corresponding Cachelines are cleared to 0 and their states make the corresponding transition according to the coherence protocol;

step 4, DMA initiates an access request;

step 5, the first CPU receives the access request of the DMA, begins to flush the Cache, waits until the remaining dirty data copies have been flushed to the main memory, and returns a response;

step 6, the DMA receives the response sent by the first CPU and begins to transfer data.
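The idle-time arbitration in steps 2 and 3 can be sketched as a small software model. This is purely an illustrative sketch, not the patented hardware: the function names (`pick_victim`, `idle_cycle`) and the dict-of-counters representation are assumptions of this sketch, with a counter value of 0 standing for a clean Cacheline.

```python
def pick_victim(counters):
    """Among dirty lines (counter > 0), pick the one with the largest
    counter value, i.e. the least recently accessed dirty copy."""
    dirty = {line: c for line, c in counters.items() if c > 0}
    if not dirty:
        return None  # nothing to write back in advance
    return max(dirty, key=dirty.get)

def idle_cycle(counters, pending_demand_request):
    """One idle cycle of a Cache: the next-level memory serves demand
    requests from a peer Cache first; otherwise the max-counter line is
    actively written back and its counter cleared."""
    if pending_demand_request:
        return "serve_demand"
    victim = pick_victim(counters)
    if victim is None:
        return "idle"
    # active write-back: counter cleared, larger dirty counters
    # shift down by 1 (the rule of claim 8)
    old = counters[victim]
    counters[victim] = 0
    for line in counters:
        if counters[line] > old:
            counters[line] -= 1
    return ("write_back", victim)
```

In this model, repeated idle cycles drain the dirty copies in age order, which is what shortens the Cache flush in step 5.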

Optionally, in step 5, part of the dirty data copies have already been written back to the main memory by the first CPU in advance, before the DMA initiates the access request.

Optionally, in step 1:

when a write miss occurs in the first CPU, after the first CPU completes the write operation, the counter of the Cacheline is set to 1 and the counters of the other Cachelines containing dirty data are each incremented by 1.

Optionally, in step 1:

when a write hit occurs in the first CPU, the data copy in the Cacheline can be in one of two states: consistent with the next-level memory, or inconsistent with it, where inconsistency means that the data copy in the Cacheline contains dirty data:

if the data copy in the Cacheline is consistent with the next-level memory, after the first CPU completes the write operation, the counter of the Cacheline is set to 1 and the counters of the other Cachelines containing dirty data are each incremented by 1;

if the data copy in the Cacheline contains dirty data, after the first CPU completes the write operation, the counter of the Cacheline is set to 1; the other counters whose values are smaller than the original counter value of the hit Cacheline are each incremented by 1, and the remaining counters are unchanged.
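The write-miss and write-hit counter updates above can be modeled as follows. This is an illustrative sketch only; the function name and its `was_dirty` argument (distinguishing a hit on an already dirty copy from a miss or a clean hit) are assumptions of the sketch, under the convention that counter 0 means a clean line and 1 means the most recently written dirty line.

```python
def on_write(counters, line, was_dirty):
    """Counter update after the first CPU completes a write to `line`."""
    if not was_dirty:
        # write miss, or write hit on a clean copy: the line becomes the
        # most recent dirty copy; every other dirty line ages by 1
        for other in counters:
            if counters[other] > 0 and other != line:
                counters[other] += 1
        counters[line] = 1
    else:
        # write hit on an already dirty copy: promote it to rank 1 and
        # age only the dirty lines that were more recent than it
        old = counters[line]
        for other in counters:
            if 0 < counters[other] < old:
                counters[other] += 1
        counters[line] = 1
```

Both branches keep the dirty counters a permutation of 1..k for k dirty lines, so "largest counter" in step 2 always identifies the oldest dirty copy.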

Optionally, in step 1:

when a read hit occurs for the first CPU:

if the Cacheline contains dirty data, its counter is decremented by 1 and the counter whose value is one less than this counter's original value is incremented by 1, while the remaining counters are unchanged; when the value of the counter of a Cacheline is less than or equal to 2, a CPU read request to that Cacheline leaves the counter value unchanged;

if the data copy is consistent with the next-level memory, none of the counter values change after the first CPU completes the read operation.
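The read-hit rule can be sketched under the same counter convention. One reading of the "less than or equal to 2" clause is that a read can promote a dirty line only as far as rank 2, leaving rank 1 reserved for writes; that interpretation, and the function name, are assumptions of this sketch.

```python
def on_read_hit(counters, line):
    """Counter update after the first CPU gets a read hit on `line`
    (0 = clean copy, larger value = older dirty copy)."""
    c = counters[line]
    if c == 0:
        return  # copy consistent with the next level: nothing changes
    if c <= 2:
        return  # counters <= 2 are left unchanged by reads
    # swap ranks with the dirty line whose counter equals c - 1
    for other in counters:
        if counters[other] == c - 1:
            counters[other] += 1
            break
    counters[line] = c - 1
```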

Optionally, in step 1:

when a read miss occurs for the first CPU:

when the data copy exists in another Cache at the same level and that Cacheline contains dirty data: if the multi-core processor system allows dirty data copies to be shared, the first CPU reads the data copy from the peer Cache, the counter of the Cacheline in the local Cache is set to 1, and the counters of the other Cachelines containing dirty data are each incremented by 1; if the multi-core processor system does not allow dirty copies to be shared, after the first CPU completes the read operation, the counter of the Cacheline in the local Cache is set to the initial value 0 and the counters of the other Cachelines are unchanged;

if the data copy exists in another Cache at the same level and is consistent with the next-level memory, or exists only in lower-level memory, after the first CPU completes the read operation, the counter of the Cacheline in the local Cache is set to the initial value 0 and the counters of the other Cachelines containing dirty data are unchanged.
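The read-miss cases can be sketched likewise. The flags `peer_dirty` and `dirty_sharing_allowed` are illustrative stand-ins for the snoop result and for whether the coherence protocol (e.g. one with an Owned-style state) permits sharing a dirty copy; they are assumptions of this sketch, not terms from the patent.

```python
def on_read_miss(counters, line, peer_dirty, dirty_sharing_allowed):
    """Counter update in the local Cache after a read miss on `line`."""
    if peer_dirty and dirty_sharing_allowed:
        # the fetched copy is still dirty: it becomes the most recent
        # dirty line locally, and the other dirty lines age by 1
        for other in counters:
            if counters[other] > 0 and other != line:
                counters[other] += 1
        counters[line] = 1
    else:
        # the copy arrives clean (the peer wrote back first, the peer
        # copies are clean, or the data came from lower-level memory):
        # counter stays at the initial value 0, others untouched
        counters[line] = 0
```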

Optionally, when a Cacheline containing dirty data in any L1 DCache or any other-level Cache in the multi-core processor system is written back or invalidated, the counter of that Cacheline is cleared to 0; the counters of the other Cachelines containing dirty data whose values are greater than the original value of that counter are each decremented by 1, and the remaining counters are unchanged.
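This write-back/invalidation rule, the counterpart of the update rules sketched earlier, closes the ranking: removing a dirty line shifts every older dirty line down one rank. The function name is again illustrative.

```python
def on_writeback_or_invalidate(counters, line):
    """Clear the counter of a dirty line that was written back or
    invalidated; dirty counters larger than its old value shift down
    by 1, so the remaining ranks stay contiguous."""
    old = counters[line]
    counters[line] = 0
    for other in counters:
        if counters[other] > old:
            counters[other] -= 1
```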

Optionally, the maximum value recordable by the counter added to the Cachelines of each L1 DCache and each other-level Cache in the multi-core processor system is the number N of Cachelines of that Cache, and the bit width of the counter is [log2(N)−1 : 0].
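For concrete sizing, the counter width implied by the [log2(N)−1 : 0] range can be computed as follows; the cache geometry in the comment is an illustrative example, not a figure from the patent.

```python
import math

def counter_width_bits(n_cachelines):
    """Bit width of the per-Cacheline counter for a Cache with N
    Cachelines (N a power of two): [log2(N)-1 : 0], i.e. log2(N) bits."""
    return int(math.log2(n_cachelines))

# e.g. a hypothetical 32 KiB cache with 64-byte Cachelines has
# N = 512 lines, so each counter would be 9 bits wide
```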

The invention also provides a multi-core processor system that achieves data consistency by the above method. The multi-core processor system comprises at least two CPUs; in the implementation, a counter is added to each Cacheline of every L1 DCache and every other-level Cache in the multi-core processor system, the access pattern of the Cachelines containing dirty data copies is recorded, and the data copies containing dirty data in the Caches are updated to the next-level memory in advance while the L1 DCache and the caches at the other levels are idle.

The invention also provides the application of the above data consistency implementation method and/or the above multi-core processor system in the technical field of integrated circuits.

The invention has the beneficial effects that:

By adding a counter to each Cacheline of every L1 DCache and every other-level Cache and recording the access pattern of the Cachelines containing dirty data copies, the invention updates the data copies containing dirty data in the Caches to the main memory in advance while the memory is idle. This alleviates the latency of the Cache flush before a DMA data transfer, fully utilizes the memory, and improves the efficiency of the DMA transfer system.

Drawings

To illustrate the technical solutions in the embodiments of the present invention more clearly, the drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without creative effort.

FIG. 1 is a flow chart of the steps described in the present invention.

Fig. 2 is a diagram of a processor system architecture for an embodiment.

FIG. 3 is a state transition diagram of the Cache coherence protocol used in the embodiment.

FIG. 4 is a flow chart of a CPU request write operation.

FIG. 5 is a flow chart of a CPU request read operation.

FIG. 6 is a flow chart of a proactive write-back by CPU0 conflicting with a write operation by CPU1.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention more apparent, embodiments of the present invention will be described in detail with reference to the accompanying drawings.

Introduction of basic terms:

ICache: instruction cache.

DCache: data cache.

Cacheline: cache line, the basic unit of storage and transfer within a Cache.

Cache: cache memory. A Cache is generally divided into several sets, each set consisting of several Cachelines. In a multi-level storage system the Caches are distinguished by level, denoted L1, L2, …, for example L1 DCache and L2 Cache.
