Advanced accurate updating device and method for non-blocking Cache replacement information table

Document No.: 829261. Publication date: 2021-03-30.

Reading note: This technology, "Advanced accurate updating device and method for non-blocking Cache replacement information table", was designed by 张骏, 田泽, 陈佳, 韩立敏, 裴希杰 and 任向隆 on 2020-12-05. The invention relates to a device and a method for the advance, accurate updating of a non-blocking Cache replacement information table. The device comprises a replacement operation module and a miss status holding register unit (MSHR), and further comprises a replacement information look-ahead calculation and control unit and target replacement information holding registers. The replacement information look-ahead calculation and control unit is connected to the replacement operation module, and each entry of the miss status holding register unit is connected to a target replacement information holding register. The invention updates the non-blocking Cache replacement information table accurately according to the actual memory access behavior and order of the application, which captures data locality better and improves Cache space utilization and hit rate.

1. A device for the advance, accurate updating of a replacement information table of a non-blocking Cache, comprising a replacement operation module and a miss status holding register unit (MSHR), characterized in that: the updating device further comprises a replacement information look-ahead calculation and control unit and target replacement information holding registers, the replacement information look-ahead calculation and control unit being connected to the replacement operation module, and each entry of the miss status holding register unit being connected to a target replacement information holding register, wherein:

the replacement information look-ahead calculation and control unit, during the access delay of the external memory, calculates in memory access order the block addresses of the replacement target Cache blocks for the multiple Cache misses that may be pending in the miss status holding register unit, and stores the calculated addresses in the miss status holding register unit;

and the target replacement information holding register stores the exact line (way) and set information of the replacement target Cache block corresponding to a Cache miss, so that the Cache data array is updated accurately.

2. The device according to claim 1, characterized in that: the replacement information look-ahead calculation and control unit comprises two modules, a replacement address look-ahead calculation control module and a replacement address look-ahead calculation module; the unit receives the miss access order information from the miss status holding register unit (MSHR) and the target block output of the replacement algorithm from the Cache replacement algorithm unit, and the replacement address look-ahead calculation module then calculates the replacement target block information for each miss in turn, in miss order, under the control of the replacement address look-ahead calculation control module.

3. The device according to claim 1, characterized in that: the block addresses of the replacement target Cache blocks for the multiple Cache misses in the miss status holding register unit (MSHR) are calculated in advance and held in the target replacement information holding registers, and the replacement information of the Cache replacement algorithm unit is updated in program order, so that the correctness of program execution is not affected whether subsequent Cache accesses hit or miss.

4. A method for the advance, accurate updating of the replacement information table of a non-blocking Cache using the device according to claim 1, characterized in that the method comprises the following steps:

1) during the access delay of the external memory, the replacement information look-ahead calculation and control unit calculates, in memory access order, the block addresses of the replacement target Cache blocks for the multiple Cache misses that may be pending in the miss status holding register unit (MSHR), and stores the calculated addresses in the miss status holding register unit;

2) the miss status holding register unit then performs the Cache replacement information update operations in sequence according to the returned data addresses, so that the content of the non-blocking Cache replacement information table is updated accurately, which reduces Cache jitter and improves the Cache hit rate;

3) if the Cache line corresponding to a target replacement information holding register is dirty, the replacement information look-ahead calculation and control unit proactively initiates a write-back operation to write the dirty data back to main memory in advance, so that the returned data for the corresponding Cache miss can be written directly into the Cache line, which reduces the total Cache miss latency.

Technical Field

The invention relates to the technical field of computer hardware, and in particular to a device and a method for the advance, accurate updating of a non-blocking Cache replacement information table.

Background

The non-blocking Cache is a key component of the on-chip memory system of a high-performance microprocessor. It buffers frequently used data and thereby hides the long latency of accesses to large off-chip main memory, and it is used in almost all types of core processing chips, such as CPUs, DSPs and GPUs. Once a microprocessor has been powered on and is running, however, the data Cache quickly fills with accessed data. Unless lines are invalidated by Cache coherence operations, there are almost no empty lines waiting to receive data returned from main memory, even though writing into an empty Cache line has the lowest latency. One straightforward approach is to find and invalidate the target replacement block in advance, according to the Cache replacement algorithm, whenever a miss occurs, and then wait for the returned main memory data to be written. However, one off-chip main memory access usually takes 100-200 core clock cycles, and multiple memory accesses may occur during that period, so invalidating Cache data in advance may destroy data locality, especially since current Cache designs are streaming, non-blocking and shared among multiple cores (and the width of one Cache data block is usually greater than the width of one data access). Existing Cache replacement schemes therefore check whether there is an idle Cache line only after data has been returned from main memory. If there is none, a target replacement block is selected according to the Cache replacement algorithm and replaced, so executing the replacement algorithm to select the target block and update the replacement information adds latency.

A non-blocking Cache in particular must tolerate multiple Load misses while still supplying data to the processor on a Cache hit. The key problem is that the data for multiple Load misses is returned serially (Load misses to the same address are merged in the MSHR), and the updates of the Cache data blocks and of the replacement information table of the Cache replacement algorithm unit (LRU) must also be serial, because each update of the LRU replacement information table builds on the result of the previous update. Thus, although a non-blocking Cache can make fuller use of external memory bandwidth on the on-chip to off-chip path while continuing to supply data to the processor core, the LRU replacement information table is still updated serially after external memory data returns from off-chip to on-chip. Clearly, if the accuracy of Cache replacement information updates is to be strictly guaranteed, the update order must match the memory access order. Such a serial update mechanism, however, lengthens the time until the processor core finally obtains its data, even for accesses that do not need external memory and could obtain their data through a Cache hit thanks to the non-blocking mechanism.

The non-blocking Cache replacement information update algorithm affects several performance metrics, such as Cache space utilization and Cache hit rate, and ultimately the overall efficiency of the Cache. Optimizing this algorithm therefore has significant theoretical and practical value for developing high-performance CPUs, DSPs and GPUs.

FIG. 1 shows the latency of a conventional non-blocking Cache access sequence. The stream of access requests to the Cache contains 8 requests, of which requests 1, 2, 5, 6 and 8 miss and requests 3, 4 and 7 hit. In a non-blocking Cache, the information for requests 1, 2, 5, 6 and 8 enters the MSHR to wait for data to return from external memory, while requests 3, 4 and 7 access the Cache directly, obtain their data and update the LRU replacement information table based on its current state. Part of the access latency and the replacement information update latency can be hidden this way, but the replacement information table is not updated accurately (see the position circled in red in FIG. 1). When request 3 updates the replacement information table, the table should reflect the update made by request 2. In practice, a missing request updates the table only once its data has returned, whereas a hit has a small latency, so request 3 updates the table in the state it had before request 1's update; the same holds for requests 4 and 7. The update order of the replacement information table therefore differs from the order in which the application actually accesses memory, which destroys data locality. In FIG. 1, the access latency and the replacement information table update latency of the hitting requests (3, 4, 7) can be hidden. When the data of a missing request returns, the replacement information table is updated and the off-chip data access of the next missing request can start, so the latency of updating the replacement information table can also be hidden. In summary, the total latency of the 8 Cache access requests in FIG. 1 is 100 × 5 = 500.
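
The ordering problem described for FIG. 1 can be reproduced with a small Python sketch. The timing model is an assumption made only for illustration: request i issues at cycle i, a hit updates the table after a hypothetical `hit_latency` of 1 cycle, and misses are served by external memory one after another at 100 cycles each, as in the text. The names `Request` and `conventional_update_order` are not from the patent.

```python
from dataclasses import dataclass


@dataclass
class Request:
    rid: int   # position in program order (1-based)
    hit: bool  # True if the request hits in the Cache


def conventional_update_order(requests, hit_latency, miss_latency):
    """Return request ids in the order they update the replacement table.

    Assumed model: hits update shortly after issue, misses update only
    when their data returns, and memory serves one miss at a time.
    """
    events = []
    mem_free = 0  # cycle at which external memory can start the next miss
    for issue_cycle, req in enumerate(requests, start=1):
        if req.hit:
            done = issue_cycle + hit_latency
        else:
            start = max(issue_cycle, mem_free)
            done = start + miss_latency
            mem_free = done
        events.append((done, req.rid))
    return [rid for _, rid in sorted(events)]


hits = {3, 4, 7}
reqs = [Request(i, i in hits) for i in range(1, 9)]
order = conventional_update_order(reqs, hit_latency=1, miss_latency=100)
total_miss_latency = 100 * sum(not r.hit for r in reqs)

print(order)               # hits 3, 4, 7 update the table before miss 1
print(total_miss_latency)  # 100 x 5
```

Under these assumptions the table is updated in the order 3, 4, 7, 1, 2, 5, 6, 8 rather than in program order, which is the mismatch the paragraph above describes.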

Therefore, if the way the replacement information table is updated in a non-blocking Cache can be optimized so that the table is updated in the true memory access order, Cache replacement will better match the behavior of the application and better capture the temporal and spatial locality of its data.

Disclosure of Invention

To solve the problems described in the background, the invention provides a device and a method for the advance, accurate updating of a non-blocking Cache replacement information table. The table is updated accurately according to the actual memory access behavior and order of the application, which captures data locality better and improves Cache space utilization and hit rate.

The technical solution of the invention is as follows. The invention relates to a device for the advance, accurate updating of a replacement information table of a non-blocking Cache, comprising a replacement operation module and a miss status holding register unit (MSHR), characterized in that: the updating device further comprises a replacement information look-ahead calculation and control unit (PLU) and target replacement information holding registers, the replacement information look-ahead calculation and control unit being connected to the replacement operation module, and each entry of the miss status holding register unit being connected to a target replacement information holding register, wherein:

the replacement information look-ahead calculation and control unit, during the access delay of the external memory, calculates in memory access order the block addresses of the replacement target Cache blocks for the multiple Cache misses that may be pending in the miss status holding register unit, and stores the calculated addresses in the miss status holding register unit;

and the target replacement information holding register stores the exact line (way) and set information of the replacement target Cache block corresponding to a Cache miss, so that the Cache data array is updated accurately.

Preferably, the replacement information look-ahead calculation and control unit comprises two modules, a replacement address look-ahead calculation control module and a replacement address look-ahead calculation module. The unit receives the miss access order information from the miss status holding register unit (MSHR) and the target block output of the replacement algorithm from the Cache replacement algorithm unit, and the replacement address look-ahead calculation module then calculates the replacement target block information for each miss in turn, in miss order, under the control of the replacement address look-ahead calculation control module.

Preferably, the block addresses of the replacement target Cache blocks for the multiple Cache misses in the miss status holding register unit (MSHR) are calculated in advance and held in the target replacement information holding registers, and the replacement information of the Cache replacement algorithm unit is updated in program order, so that the correctness of program execution is not affected whether subsequent Cache accesses hit or miss.

A method for the advance, accurate updating of the replacement information table of a non-blocking Cache, characterized in that the method comprises the following steps:

1) during the access delay of the external memory, the replacement information look-ahead calculation and control unit calculates, in memory access order, the block addresses of the replacement target Cache blocks for the multiple Cache misses that may be pending in the miss status holding register unit (MSHR), and stores the calculated addresses in the miss status holding register unit;

2) the miss status holding register unit then performs the Cache replacement information update operations in sequence according to the returned data addresses, so that the content of the non-blocking Cache replacement information table is updated accurately, which reduces Cache jitter and improves the Cache hit rate;

3) if the Cache line corresponding to a target replacement information holding register is dirty, the replacement information look-ahead calculation and control unit proactively initiates a write-back operation to write the dirty data back to main memory in advance, so that the returned data for the corresponding Cache miss can be written directly into the Cache line, which reduces the total Cache miss latency.
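
The effect of step 3 can be illustrated with a rough timing sketch in Python. Everything numeric here is hypothetical: the write-back cost of 20 cycles and the cycle at which the victim becomes known are invented for the example, and the function name `fill_time` is not from the patent. The sketch only shows why starting the write-back during the miss delay can take it off the critical path.

```python
def fill_time(return_cycle, victim_dirty, writeback_cycles,
              early_writeback, victim_known_cycle=0):
    """Cycle at which returned miss data can be written into the victim line.

    Assumed model: a dirty victim must be written back to main memory
    before the line can be overwritten by the returned data.
    """
    if not victim_dirty:
        return return_cycle
    if early_writeback:
        # The write-back starts as soon as the victim is known and
        # overlaps with the outstanding external memory access.
        return max(return_cycle, victim_known_cycle + writeback_cycles)
    # Without look-ahead, the victim is chosen and written back
    # only after the miss data has returned.
    return return_cycle + writeback_cycles


with_early = fill_time(100, True, 20, early_writeback=True,
                       victim_known_cycle=5)
without_early = fill_time(100, True, 20, early_writeback=False)
print(with_early, without_early)
```

With these assumed numbers the early write-back finishes well before the data returns, so the fill is not delayed, whereas the conventional case pays the write-back cost after the return.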

The invention provides a device and a method (PLUA) for the advance, accurate updating of the replacement information table of a non-blocking Cache. On top of a conventional non-blocking Cache, PLUA adds a replacement information look-ahead calculation and control unit (PLU) to the replacement operation module, and adds a corresponding target replacement information holding register to each entry of the miss status holding register unit (MSHR). The scheme thus supports replacement information calculation for multiple outstanding Cache misses. Before the memory data for those misses returns, it proactively calculates, in order and from the Cache miss information pending in the MSHR, the exact address of the target Cache data block into which each miss's returned data will be written. It stores these block addresses in the newly added field of the MSHR and updates the LRU replacement information at the same time, so that subsequent Cache misses calculate their replacement targets against up-to-date information. When the missing data returns, it is written directly into the corresponding Cache data block according to the target Cache block address held in the MSHR.
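
The look-ahead flow described above can be sketched in Python for a single Cache set. This is a minimal model under stated assumptions, not the patented hardware: the replacement policy is plain LRU over 4 ways, `MshrEntry.target_way` is a hypothetical name for the newly added field, and the data returning in a different order than it was requested is assumed purely to show that the stored target makes the fill independent of return order.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MshrEntry:
    block_addr: int
    target_way: Optional[int] = None  # hypothetical name for the added field


class LruOrder:
    """LRU state of one set: index 0 is the least recently used way."""

    def __init__(self, num_ways):
        self.order = list(range(num_ways))

    def touch(self, way):
        self.order.remove(way)
        self.order.append(way)

    def victim(self):
        return self.order[0]


def look_ahead_update(lru, accesses):
    """Replay accesses in program order during the miss delay.

    accesses: list of ("hit", way) or ("miss", MshrEntry) tuples.
    """
    for kind, arg in accesses:
        if kind == "miss":
            arg.target_way = lru.victim()  # precompute the replacement target
            lru.touch(arg.target_way)      # update LRU now, before data returns
        else:
            lru.touch(arg)


lru = LruOrder(4)
miss_a, miss_b = MshrEntry(0x100), MshrEntry(0x200)
look_ahead_update(lru, [("miss", miss_a), ("hit", 1), ("miss", miss_b)])

data_array = {}
for entry, data in [(miss_b, "B"), (miss_a, "A")]:  # assumed return order
    data_array[entry.target_way] = data             # fill uses the stored way

print(miss_a.target_way, miss_b.target_way, lru.order, data_array)
```

Because each miss records its victim and updates the LRU state in program order, the hit in between sees the table as it should be after the first miss, and the later fill needs no further replacement calculation.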

Because PLUA carries out Cache replacement information updating operation in sequence, the accurate updating of the content of the non-blocking Cache replacement information table can be realized, thereby reducing Cache jitter and improving the Cache hit rate. In addition, since the latency of serial update of LRU replacement information is much smaller relative to external memory access latency, this approach can hide serial LRU replacement information update latency caused by all Cache misses in the MSHR.

Drawings

FIG. 1 is a schematic diagram of replacement information update of a conventional non-blocking Cache;

FIG. 2 is a block diagram of the apparatus of the present invention;

FIG. 3 is a schematic diagram of replacement information update according to the method of the present invention.

Detailed Description

The invention provides a device for the advance, accurate updating of a replacement information table of a non-blocking Cache. The device comprises a replacement operation module, a miss status holding register unit (MSHR), a replacement information look-ahead calculation and control unit (PLU) and target replacement information holding registers. The replacement information look-ahead calculation and control unit is connected to the replacement operation module, and each entry of the miss status holding register unit is connected to a target replacement information holding register, wherein:

the replacement information look-ahead calculation and control unit, during the access delay of the external memory, calculates in memory access order the block addresses of the replacement target Cache blocks for the multiple Cache misses that may be pending in the miss status holding register unit, and stores the calculated addresses in the miss status holding register unit;

and the target replacement information holding register stores the exact line (way) and set information of the replacement target Cache block corresponding to a Cache miss, so that the Cache data array is updated accurately.

Further, the replacement information look-ahead calculation and control unit comprises two modules, a replacement address look-ahead calculation control module and a replacement address look-ahead calculation module. The unit receives the miss access order information from the miss status holding register unit (MSHR) and the target block output of the replacement algorithm from the Cache replacement algorithm unit, and the replacement address look-ahead calculation module then calculates the replacement target block information for each miss in turn, in miss order, under the control of the replacement address look-ahead calculation control module.

Furthermore, the block addresses of the replacement target Cache blocks for the multiple Cache misses in the miss status holding register unit (MSHR) are calculated in advance and held in the target replacement information holding registers, and the replacement information of the Cache replacement algorithm unit is updated in program order, so that the correctness of program execution is not affected whether subsequent Cache accesses hit or miss.

The invention also provides a method for the advance, accurate updating of the replacement information table of a non-blocking Cache, comprising the following steps:

1) during the access delay of the external memory, the replacement information look-ahead calculation and control unit calculates, in memory access order, the block addresses of the replacement target Cache blocks for the multiple Cache misses that may be pending in the miss status holding register unit (MSHR), and stores the calculated addresses in the miss status holding register unit;

2) the miss status holding register unit then performs the Cache replacement information update operations in sequence according to the returned data addresses, so that the content of the non-blocking Cache replacement information table is updated accurately, which reduces Cache jitter and improves the Cache hit rate;

3) if the Cache line corresponding to a target replacement information holding register is dirty, the replacement information look-ahead calculation and control unit proactively initiates a write-back operation to write the dirty data back to main memory in advance, so that the returned data for the corresponding Cache miss can be written directly into the Cache line, which reduces the total Cache miss latency.

The technical solution of the present invention is further described in detail with reference to the accompanying drawings and specific embodiments.

Referring to FIG. 2, the device for the advance, accurate updating of a replacement information table of a non-blocking Cache according to this embodiment comprises a replacement operation module, a miss status holding register unit (MSHR), a replacement information look-ahead calculation and control unit (PLU) and target replacement information holding registers. The replacement information look-ahead calculation and control unit is connected to the replacement operation module, and each entry of the miss status holding register unit is connected to a target replacement information holding register. During the access delay of the external memory, the replacement information look-ahead calculation and control unit calculates, in memory access order, the block addresses of the replacement target Cache blocks for the multiple Cache misses that may be pending in the miss status holding register unit, and stores the calculated addresses in the miss status holding register unit. The target replacement information holding register stores the exact line (way) and set information of the replacement target Cache block corresponding to a Cache miss, so that the Cache data array is updated accurately.

The replacement information look-ahead calculation and control unit comprises two modules, a replacement address look-ahead calculation control module and a replacement address look-ahead calculation module. The unit receives the miss access order information from the miss status holding register unit and the target block output of the replacement algorithm from the Cache replacement algorithm unit, and the replacement address look-ahead calculation module then calculates the replacement target block information for each miss in turn, in miss order, under the control of the replacement address look-ahead calculation control module.

The block addresses of the replacement target Cache blocks for the multiple Cache misses in the miss status holding register unit are calculated in advance and held in the target replacement information holding registers, and the replacement information of the Cache replacement algorithm unit is updated in program order, so that the correctness of program execution is not affected whether subsequent Cache accesses hit or miss.

Referring to FIG. 3, the Cache access sequence used to illustrate the latency of the look-ahead accurate update method of this embodiment again consists of 8 Cache access requests, of which requests 1, 2, 5, 6 and 8 miss and requests 3, 4 and 7 hit. The miss delay of request 1 is 100 CLK. With an MSHR depth of 8, the 5 Cache misses can all be held pending in the MSHR during that period while the remaining 3 hitting Cache access requests are serviced.

On a Cache miss, selecting the replacement target according to the Cache replacement algorithm and updating the replacement information has a delay of 5; on a Cache hit, returning the data and updating the replacement information has a delay of 8. The Cache replacement information calculation and update operations for all 8 consecutive Cache access requests under the PLUA algorithm therefore take 55 CLK. In other words, the accurate, in-order update of the Cache replacement information can be completed before the data of missing request 1 returns from main memory, and no calculation is needed after the actual data has returned. For the missing requests 1, 2, 5, 6 and 8, the Cache way and set replacement information calculated in advance by the PLUA algorithm is held in the corresponding target replacement information holding registers before the data returns from main memory, and is used when the data is written into the Cache. Most importantly, the replacement information used and updated by each Cache access is accurate whether the request hits or misses, because it follows exactly the order of memory access requests in the actual application. This avoids unnecessary Cache jitter and improves Cache space utilization and the Cache hit rate.

Finally, it should be noted that the above embodiments only illustrate the technical solutions of the present invention and do not limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art will understand that the technical solutions described in those embodiments may still be modified, or some of their technical features replaced by equivalents, and that such modifications or substitutions do not cause the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention.
