Selective refresh mechanism for DRAM

Document No.: 1559713 · Published: 2020-01-21

Abstract: This technology, "Selective refresh mechanism for DRAM" (用于DRAM的选择性刷新机制), was created on 2018-06-18 by F·I·阿塔拉, G·M·赖特, S·普立亚达尔西, G·M·德拉帕拉, H·W·凯恩三世, and E·赫. Its main content is as follows: The present invention relates to systems and methods for selective refresh of a cache, such as a last level cache implemented as embedded DRAM (eDRAM). A refresh bit and a reuse bit are associated with each way of at least one set of the cache. A least recently used (LRU) stack tracks the locations of the ways, wherein locations toward the most recently used location of a threshold comprise more recently used locations and locations toward the least recently used location of the threshold comprise less recently used locations. A line in a way is selectively refreshed if the location of the way is one of the more recently used locations and the refresh bit associated with the way is set, or the location of the way is one of the less recently used locations and both the refresh bit and the reuse bit associated with the way are set.

1. A method of refreshing a line of a cache, the method comprising:

associating a refresh bit and a reuse bit with each of a set of two or more ways of the cache;

associating a least recently used LRU stack with the set, wherein the LRU stack includes a location associated with each of the two or more ways, the location ranging from a most recently used location to a least recently used location;

assigning a threshold to the LRU stack, wherein locations toward the most recently used location of the threshold comprise more recently used locations and locations toward the least recently used location of the threshold comprise less recently used locations; and

selectively refreshing a line in a way of the cache if the following conditions are met:

the location of the way is one of the more recently used locations and the refresh bit associated with the way is set; or

the location of the way is one of the less recently used locations and both the refresh bit and the reuse bit associated with the way are set.

2. The method of claim 1, further comprising, when the line is reinserted into the way following a miss in the cache for the line:

associating the location of the way with one of the more recently used locations;

setting the refresh bit; and

resetting the reuse bit.

3. The method of claim 2, further comprising, when the location of the way crosses the threshold and the location of the way is one of the less recently used locations:

holding the refresh bit set if the reuse bit is set; or

resetting the refresh bit if the reuse bit is not set.

4. The method of claim 2, further comprising, upon a hit in the cache for the line, setting the reuse bit.

5. The method of claim 1, further comprising, upon a cache hit for the line, returning the line from the cache to a requestor of the line if the refresh bit is set and the reuse bit is also set.

6. The method of claim 1, further comprising, upon a cache hit for the line, if the refresh bit is not set, processing the cache hit as a cache miss and delivering a request for the line to a backing memory of the cache.

7. The method of claim 1, wherein the refresh bit is set if the location of the way crosses the threshold from one of the less recently used locations to one of the more recently used locations and the reuse bit is set.

8. The method of claim 1, wherein the refresh bit is reset if the location of the way crosses the threshold from one of the less recently used locations to one of the more recently used locations and the reuse bit is not set.

9. The method of claim 1, wherein the threshold is fixed relative to the locations of the LRU stack.

10. The method of claim 1, wherein the threshold is dynamically variable based on a value of a counter associated with the LRU stack, wherein the counter associated with a way having a cache hit is incremented.

11. The method of claim 10, wherein the counter is common to two or more ways.

12. The method of claim 1, wherein the cache is implemented as an embedded DRAM (eDRAM).

13. The method of claim 1, wherein the cache is configured as a last level cache of a processing system.

14. An apparatus, comprising:

a cache configured as a set-associative cache having at least one set and two or more ways in the at least one set;

a cache controller configured for selective refresh of lines of the at least one set, the cache controller comprising:

two or more refresh bit registers comprising two or more refresh bits, each refresh bit associated with a corresponding one of the two or more ways;

two or more reuse bit registers comprising two or more reuse bits, each reuse bit associated with a corresponding one of the two or more ways; and

a least recently used LRU stack comprising two or more locations, each location associated with a corresponding one of the two or more ways, the two or more locations ranging from a most recently used location to a least recently used location,

wherein locations toward the most recently used location of a threshold assigned to the LRU stack comprise more recently used locations and locations toward the least recently used location of the threshold comprise less recently used locations; and

wherein the cache controller is configured to selectively refresh a line in a way of the two or more ways if the following conditions are met:

the location of the way is one of the more recently used locations and the refresh bit associated with the way is set; or

the location of the way is one of the less recently used locations and both the refresh bit and the reuse bit associated with the way are set.

15. The apparatus of claim 14, wherein the cache controller is further configured to, when the line is reinserted into the way after a miss in the cache for the line:

associate the location of the way with one of the more recently used locations;

set the refresh bit; and

reset the reuse bit.

16. The apparatus of claim 15, wherein the cache controller is further configured to, when the location of the way crosses the threshold and the location of the way is one of the less recently used locations:

hold the refresh bit set if the reuse bit is set; or

reset the refresh bit if the reuse bit is not set.

17. The apparatus of claim 15, wherein the cache controller is further configured to set the reuse bit upon a hit in the cache for the line.

18. The apparatus of claim 14, wherein the cache controller is further configured to, upon a cache hit for the line, return the line from the cache to a requestor of the line if the refresh bit is set and the reuse bit is also set.

19. The apparatus of claim 14, wherein the cache controller is further configured to, upon a cache hit for the line, if the refresh bit is not set, process the cache hit as a cache miss and deliver a request for the line to a backing memory of the cache.

20. The apparatus of claim 14, wherein the cache controller is further configured to set the refresh bit if the location of the way crosses the threshold from one of the less recently used locations to one of the more recently used locations and the reuse bit is set.

21. The apparatus of claim 14, wherein the cache controller is further configured to reset the refresh bit if the location of the way crosses the threshold from one of the less recently used locations to one of the more recently used locations and the reuse bit is not set.

22. The apparatus of claim 14, wherein the threshold is fixed relative to the locations of the LRU stack.

23. The apparatus of claim 14, wherein the cache controller further comprises a counter associated with the LRU stack, and wherein the threshold is dynamically variable based on a value of the counter, and wherein the counter associated with a way having a cache hit is incremented.

24. The apparatus of claim 23, wherein the counter is common to two or more ways.

25. The apparatus of claim 14, wherein the cache is implemented as an embedded DRAM (eDRAM).

26. The apparatus of claim 14, comprising a processing system, wherein the cache is configured as a last level cache of the processing system.

27. The apparatus of claim 14 integrated into a device selected from the group consisting of: a set top box, a server, a music player, a video player, an entertainment unit, a navigation device, a personal digital assistant (PDA), a fixed location data unit, a computer, a portable computer, a tablet computer, a communications device, and a mobile telephone.

28. An apparatus, comprising:

a cache configured as a set-associative cache having at least one set and two or more ways in the at least one set;

means for tracking a location associated with each of the two or more ways of the at least one set, the location ranging from a most recently used location to a least recently used location, wherein locations toward the most recently used location of a threshold comprise more recently used locations and locations toward the least recently used location of the threshold comprise less recently used locations; and

means for selectively refreshing a line in a way of the cache if the following conditions are met:

the location of the way is one of the more recently used locations and a first means for indicating refresh associated with the way is set; or

the location of the way is one of the less recently used locations and both the first means for indicating refresh and a second means for indicating reuse associated with the way are set.

29. A non-transitory computer-readable storage medium comprising code which, when executed by a computer, causes the computer to perform operations to refresh a line of a cache, the non-transitory computer-readable storage medium comprising:

code for associating a refresh bit and a reuse bit with each of a set of two or more ways of the cache;

code for associating a least recently used LRU stack with the set, wherein the LRU stack includes a location associated with each of the two or more ways, the location ranging from a most recently used location to a least recently used location;

code for specifying a threshold for the LRU stack, wherein locations toward the most recently used location of the threshold comprise more recently used locations and locations toward the least recently used location of the threshold comprise less recently used locations; and

code for selectively refreshing a line in a way of the cache if the following conditions are met:

the location of the way is one of the more recently used locations and the refresh bit associated with the way is set; or

the location of the way is one of the less recently used locations and both the refresh bit and the reuse bit associated with the way are set.

30. The non-transitory computer-readable storage medium of claim 29, further comprising, when the line is reinserted into the way following a miss in the cache for the line:

code for associating the location of the way with one of the more recently used locations;

code for setting the refresh bit; and

code for resetting the reuse bit.

Technical Field

The disclosed aspects are directed to power management and efficiency improvements for memory systems. More specifically, exemplary aspects are directed to a selective refresh mechanism for Dynamic Random Access Memory (DRAM) to reduce power consumption and increase availability of the DRAM.

Background

DRAM systems provide a low cost data storage solution due to the simplicity of their construction. Basically, a DRAM cell consists of a switch or transistor coupled to a capacitor. DRAM systems are organized as DRAM arrays, which include DRAM cells arranged in rows (or lines) and columns. Given the simplicity of DRAM cells, the construction cost of a DRAM system is low, and high-density integration of DRAM arrays is possible. However, since capacitors are subject to leakage, the charge stored in a DRAM cell must be refreshed periodically to properly retain the information stored therein.

For the purpose of preserving the information stored therein, conventional refresh operations involve reading out each DRAM cell (e.g., row by row) in a DRAM array and immediately writing back the read out data to the corresponding DRAM cell without modification. Therefore, the refresh operation consumes power. Depending on the particular implementation of a DRAM system (e.g., Double Data Rate (DDR), low power DDR (LPDDR), embedded DRAM (eDRAM), etc., as known in the art), a minimum refresh frequency is defined, wherein if a DRAM cell is not refreshed at a frequency of at least the minimum refresh frequency, the probability of the information stored therein being corrupted increases. If a DRAM cell is accessed for a memory access operation, such as a read or write operation, then the accessed DRAM cell is refreshed as part of performing the memory access operation. To ensure that DRAM cells are refreshed at least at a rate that satisfies a minimum refresh frequency even when the DRAM cells are not accessed by a memory access operation, various dedicated refresh mechanisms may be provided for DRAM systems.

However, it has been recognized that, for example, in larger last level cache implementations such as level 3 (L3) data cache eDRAMs, the periodic refresh of each line of DRAM may be too expensive in terms of time and power to be feasible in conventional implementations. In an effort to alleviate the time consuming process, some approaches are directed to refreshing groups of two or more lines in parallel, but such approaches may also have drawbacks. For example, if the number of lines refreshed simultaneously is relatively small, the time consumed refreshing the DRAM may still be too high, which may reduce the DRAM's availability for other access requests (e.g., read/write). This is because an ongoing refresh operation may delay or prevent the DRAM from servicing an access request. On the other hand, if the number of lines to be refreshed simultaneously is large, the corresponding power consumption increases, which in turn may increase the demands on the stability of the Power Delivery Network (PDN) that supplies power to the DRAM. More complex PDNs may also reduce the routing tracks available for other lines associated with the DRAM circuitry and increase the size of the DRAM die.

Accordingly, it has been recognized that there is a need in the art to improve the refresh mechanism of a DRAM to avoid the above-described drawbacks of conventional implementations.

Disclosure of Invention

Illustrative aspects of the invention are directed to systems and methods for selective refresh of a cache, such as a last level cache of a processing system implemented as embedded DRAM (eDRAM). The cache may be configured as a set-associative cache having at least one set and two or more ways in the at least one set, and a cache controller may be provided that is configured for selective refresh of lines of the at least one set. The cache controller may include two or more refresh bit registers comprising two or more refresh bits, each refresh bit associated with a corresponding one of the two or more ways, and two or more reuse bit registers comprising two or more reuse bits, each reuse bit associated with a corresponding one of the two or more ways. The refresh and reuse bits are used to determine whether to refresh the associated line, in the following manner. The cache controller may further include a least recently used (LRU) stack comprising two or more locations, each location associated with a corresponding one of the two or more ways, the two or more locations ranging from a most recently used location to a least recently used location, wherein locations toward the most recently used location of a threshold assigned to the LRU stack comprise more recently used locations and locations toward the least recently used location of the threshold comprise less recently used locations. The cache controller is configured to selectively refresh a line in a way of the two or more ways if the location of the way is one of the more recently used locations and the refresh bit associated with the way is set, or the location of the way is one of the less recently used locations and both the refresh bit and the reuse bit associated with the way are set.

For example, one exemplary aspect is directed to a method of refreshing a line of a cache. The method comprises: associating a refresh bit and a reuse bit with each of a set of two or more ways of the cache; associating a least recently used (LRU) stack with the set, wherein the LRU stack includes a location associated with each of the two or more ways, ranging from a most recently used location to a least recently used location; and assigning a threshold to the LRU stack, wherein locations toward the most recently used location of the threshold comprise more recently used locations and locations toward the least recently used location of the threshold comprise less recently used locations. A line in a way of the cache is selectively refreshed if the location of the way is one of the more recently used locations and the refresh bit associated with the way is set, or the location of the way is one of the less recently used locations and both the refresh bit and the reuse bit associated with the way are set.

Another exemplary aspect is directed to an apparatus comprising a cache configured as a set-associative cache having at least one set and two or more ways in the at least one set, and a cache controller configured for selective refresh of lines of the at least one set. The cache controller includes: two or more refresh bit registers comprising two or more refresh bits, each refresh bit associated with a corresponding one of the two or more ways; two or more reuse bit registers comprising two or more reuse bits, each reuse bit associated with a corresponding one of the two or more ways; and a least recently used (LRU) stack comprising two or more locations, each location associated with a corresponding one of the two or more ways, the two or more locations ranging from a most recently used location to a least recently used location, wherein locations toward the most recently used location of a threshold assigned to the LRU stack comprise more recently used locations and locations toward the least recently used location of the threshold comprise less recently used locations. The cache controller is configured to selectively refresh a line in a way of the two or more ways if the location of the way is one of the more recently used locations and the refresh bit associated with the way is set, or the location of the way is one of the less recently used locations and both the refresh bit and the reuse bit associated with the way are set.

Yet another exemplary aspect is directed to an apparatus comprising a cache configured as a set-associative cache having at least one set and two or more ways in the at least one set, and means for tracking a location associated with each of the two or more ways of the at least one set, the location ranging from a most recently used location to a least recently used location, wherein locations toward the most recently used location of a threshold comprise more recently used locations and locations toward the least recently used location of the threshold comprise less recently used locations. The apparatus further comprises means for selectively refreshing a line in a way of the cache if the following conditions are met: the location of the way is one of the more recently used locations and a first means for indicating refresh associated with the way is set, or the location of the way is one of the less recently used locations and both the first means for indicating refresh and a second means for indicating reuse associated with the way are set.

Another exemplary aspect is directed to a non-transitory computer-readable storage medium comprising code which, when executed by a computer, causes the computer to perform operations to refresh a line of a cache. The non-transitory computer-readable storage medium comprises: code for associating a refresh bit and a reuse bit with each of a set of two or more ways of the cache; code for associating a least recently used (LRU) stack with the set, wherein the LRU stack comprises a location associated with each of the two or more ways, the location ranging from a most recently used location to a least recently used location; code for specifying a threshold for the LRU stack, wherein locations toward the most recently used location of the threshold comprise more recently used locations and locations toward the least recently used location of the threshold comprise less recently used locations; and code for selectively refreshing a line in a way of the cache if the following conditions are met: the location of the way is one of the more recently used locations and the refresh bit associated with the way is set, or the location of the way is one of the less recently used locations and both the refresh bit and the reuse bit associated with the way are set.

Drawings

The accompanying drawings are presented to aid in the description of aspects of the invention and are provided solely for illustration of the aspects and not limitation thereof.

FIG. 1 depicts an exemplary processing system including a cache configured with a selective refresh mechanism, according to an aspect of the present invention.

FIGS. 2A-B illustrate aspects of dynamic threshold calculation for an exemplary cache, in accordance with aspects of the present invention.

FIG. 3 depicts an exemplary method of refreshing a cache, according to an aspect of the present invention.

FIG. 4 depicts an exemplary computing device in which an aspect of the present invention may be advantageously employed.

Detailed Description

Aspects of the invention are disclosed in the following description and related drawings directed to specific aspects of the invention. Alternative aspects may be devised without departing from the scope of the invention. Additionally, well-known elements of the invention will not be described in detail or will be omitted so as not to obscure the relevant details of the invention.

The word "exemplary" is used herein to mean "serving as an example, instance, or illustration." Any aspect described herein as "exemplary" is not necessarily to be construed as preferred or advantageous over other aspects. Likewise, the term "aspects of the invention" does not require that all aspects of the invention include the discussed feature, advantage, or mode of operation.

The terminology used herein is for the purpose of describing particular aspects only and is not intended to be limiting of aspects of the invention. As used herein, the singular forms "a", "an" and "the" include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms "comprises" and/or "comprising," when used herein, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.

Further, many aspects are described in terms of sequences of actions to be performed by, for example, elements of a computing device. It should be recognized that the various actions described herein can be performed by specific circuits (e.g., application-specific integrated circuits (ASICs)) or by program instructions executed by one or more processors. Additionally, the sequences of actions described herein can be considered to be embodied entirely within any form of computer-readable storage medium storing a corresponding set of computer instructions that, when executed, would cause an associated processor to perform the functionality described herein. For each of the aspects described herein, a corresponding form of any such aspect may be described herein as, for example, "logic configured to" perform the described action.

In an illustrative aspect of the invention, a selective refresh mechanism is provided for a DRAM, such as an eDRAM implementing a last level cache, e.g., an L3 cache. The eDRAM may be integrated on the same system on chip (SoC) as the processor accessing the last level cache (although this is not a requirement). For such last level caches, it is recognized that a significant proportion of cache lines may not receive any hits after being inserted into the cache, because accesses to such cache lines may be filtered by inner-level caches closer to the processor making the access request, for example, level 1 (L1) and level 2 (L2) caches. Further, in a set-associative implementation of the last level cache, where cache lines are organized in two or more ways per set, it is also recognized that among the cache lines that do hit in the last level cache, the corresponding hits may be limited to a subset of ways comprising the most recently used ways (e.g., the 4 most recently used locations in a least recently used (LRU) stack associated with a set of an 8-way last level cache). Thus, the selective refresh mechanism described herein is directed to selectively refreshing only lines that are likely to be reused, particularly among the less recently used ways of a cache configured using DRAM technology.

In one aspect, two bits, referred to as a refresh bit and a reuse bit, are associated with each way (e.g., by augmenting the tag associated with the way with two additional bits). Further, a threshold is assigned to the LRU stack of the cache, wherein the threshold marks the partition between the more recently used locations and the less recently used locations. In one aspect, the threshold may be fixed, while in another aspect, the threshold may be varied dynamically, using a counter to track the number of ways that receive hits.
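As a rough illustration, a dynamically variable threshold of this kind might be derived from per-position hit counters. The policy below and its `min_hits` parameter are assumptions for the sketch, not details from this disclosure:

```python
def dynamic_threshold(hit_counters, min_hits=1):
    """Place the threshold just past the deepest LRU-stack position
    that still receives at least `min_hits` hits.

    hit_counters[i] = hits observed for the way at LRU position i
    (position 0 = MRU). Returns the number of positions treated as
    "more recently used". Policy and parameter are illustrative.
    """
    threshold = 1  # always treat at least the MRU position as more recently used
    for pos, hits in enumerate(hit_counters):
        if hits >= min_hits:
            threshold = pos + 1
    return threshold
```

For example, if hits concentrate in the first few positions, the threshold shrinks and fewer ways are refreshed unconditionally.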

In general, a refresh bit that is set to "1" for a way (or simply "set") indicates that the cache line stored in the associated way should be refreshed. A reuse bit that is set to "1" for a way (or simply "set") indicates that the cache line in the way has experienced at least one reuse. In an illustrative aspect, a cache line whose refresh bit is set will be refreshed while the cache line is in a way whose location is one of the more recently used locations; but if the way's location crosses the threshold into the less recently used locations, then the cache line is refreshed only if its refresh bit is set and its reuse bit is also set. This is because cache lines in less recently used ways are generally recognized as less likely to experience reuse, and therefore are not refreshed unless their reuse bit is set to indicate that such cache lines have experienced reuse.
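The two conditions above can be summarized in a small decision function. This is a minimal sketch, assuming LRU-stack positions numbered from 0 (most recently used) and a threshold giving the count of more recently used positions; the default threshold value is illustrative:

```python
def should_refresh(position, refresh_bit, reuse_bit, threshold=4):
    """Decide whether to refresh the line in a way at the given
    LRU-stack position (0 = most recently used).

    More recently used side of the threshold: refresh whenever the
    refresh bit is set. Less recently used side: refresh only if the
    refresh bit AND the reuse bit are both set.
    """
    if position < threshold:
        return refresh_bit
    return refresh_bit and reuse_bit
```

A refresh controller would evaluate this per way on each refresh pass and skip the lines for which it returns false.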

By selectively refreshing lines in this manner, power consumption related to refresh operations is reduced. Furthermore, by not refreshing particular lines that would conventionally have been refreshed, the availability of the cache for other access operations, such as read/write operations, is increased.

Referring initially to FIG. 1, an exemplary processing system 100 is illustrated in which a processor 102, cache 104, and memory 106 are representatively shown, with the understanding that various other components may be present which are not illustrated for the sake of clarity. The processor 102 may be any processing element configured to make memory access requests to a memory 106, which may be a main memory. Cache 104 may be one of several caches that exist between processor 102 and memory 106 in a memory hierarchy of processing system 100. In an example, the cache 104 may be a last level cache (e.g., a level 3 or L3 cache), with one or more higher level caches, such as a level 1 (L1) cache and one or more level 2 (L2) caches, present between the processor 102 and the cache 104, although such caches are not shown. In an aspect, the cache 104 may be configured as an eDRAM cache and may be integrated onto the same chip as the processor 102 (although this is not required). Cache controller 103 has been illustrated with dashed lines to represent logic configured to perform exemplary control operations with respect to cache 104, including managing and implementing the selective refresh operations described herein. Although the cache controller 103 has been illustrated in FIG. 1 as a wrapper around the cache 104, it should be understood that the logic and/or functionality of the cache controller 103 may be integrated in the processing system 100 in any other suitable manner without departing from the scope of the present invention.

As shown, for illustration, the cache 104 may be a set-associative cache having four sets 104a-104d. Each set 104a-104d may have multiple cache lines (also referred to as cache blocks). Eight ways w0 through w7 of cache lines have been representatively illustrated for set 104c in the example of FIG. 1. The temporal locality of cache accesses may be estimated by recording the order of the cache lines in ways w0 through w7 in stack 105c (also referred to as an LRU stack), from most recently accessed or most recently used (MRU) to least recently accessed or least recently used (LRU). For example, LRU stack 105c may be an ordered set of buffers or registers, where each entry of LRU stack 105c may include an indication of a way, ranging from MRU to LRU (e.g., in the illustrative example, each entry of LRU stack 105c may include 3 bits to point to one of the eight ways w0-w7, such that the MRU entry may point to a first way, e.g., w5, while the LRU entry may point to a second way, e.g., w3). In the illustrated example implementation, the LRU stack 105c may be provided in or be part of the cache controller 103.
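An LRU stack of this kind can be sketched as an ordered list of way indices. The class below is a behavioral model for illustration, not hardware from the disclosure:

```python
class LRUStack:
    """Behavioral model of a per-set LRU stack: order[0] is the MRU
    entry and order[-1] the LRU entry, each holding a way index."""

    def __init__(self, n_ways=8):
        self.order = list(range(n_ways))

    def touch(self, way):
        """On an access to `way`, move it to the MRU position; the
        ways it passes each shift one position toward LRU."""
        self.order.remove(way)
        self.order.insert(0, way)

    def position(self, way):
        """Return the LRU-stack position of `way` (0 = MRU)."""
        return self.order.index(way)
```

With an 8-way set, touching w5 and then w3 leaves w3 at the MRU position and w5 one position behind it, matching the ordering behavior described above.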

In an illustrative aspect, a threshold may be used to partition the entries of the LRU stack 105c, where locations toward the most recently used (MRU) location of the threshold are referred to as more recently used locations, and locations toward the least recently used (LRU) location of the threshold are referred to as less recently used locations. With such threshold assignments, lines in ways associated with the more recently used locations may generally be refreshed, while lines in ways associated with the less recently used locations may not be refreshed unless they experience reuse. In this manner, selective refresh is performed by using two bits to track whether a line is to be refreshed or not.

The two bits are representatively shown as a refresh bit 110c and a reuse bit 112c associated with each way w0-w7 of the set 104c. The refresh bit 110c and the reuse bit 112c may be configured as additional bits (not separately shown) of the tag array. More generally, in alternative examples, refresh bit 110c may be stored in any memory structure, such as a refresh bit register (not identified by a separate reference number in FIG. 1) for each way w0-w7 of set 104c, and similarly, reuse bit 112c may be stored in any memory structure, such as a reuse bit register (not identified by a separate reference number in FIG. 1) for each way w0-w7 of set 104c. Thus, for the two or more ways w0 to w7 in each set, cache controller 103 may include a corresponding number of two or more refresh bit registers comprising refresh bit 110c, and two or more reuse bit registers comprising reuse bit 112c. As previously mentioned, if the refresh bit 110c is set (e.g., to a value of "1") for a way of the set 104c, this means that the cache line in the corresponding way is to be refreshed. If the reuse bit 112c is set (e.g., to a value of "1"), this means that the corresponding line has experienced at least one reuse.

In an exemplary aspect, the cache controller 103 (or any other suitable logic) may be configured to perform an exemplary refresh operation on the cache 104 based on the states or values of the refresh bit 110c and reuse bit 112c for each way, which allows for selectively refreshing only those lines in ways of the set 104c that are likely to be reused. The description provides example functions that may be implemented in cache controller 103 to perform selective refresh operations on cache 104, and more specifically, to perform selective refresh of lines in ways w0-w7 of set 104c of cache 104. In the illustrative aspect, a line in a way is refreshed only when its associated refresh bit 110c is set, and is not refreshed when its associated refresh bit 110c is not set (or is set to a value of "0"). The following policy may be used to set/reset the refresh bit 110c and the reuse bit 112c for each line of the set 104c.

When a new cache line is inserted into cache 104, e.g., into set 104c, the corresponding refresh bit 110c is set (e.g., to the value "1"). The way of the newly inserted cache line will be in the most recently used position in LRU stack 105c. As lines are inserted into other ways, the location of this way drops from the most recently used location toward the least recently used location. Refresh bit 110c will remain set until the location associated with the way in which the line was inserted crosses the threshold in LRU stack 105c, changing from a more recently used assignment to a less recently used assignment.
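The insertion behavior just described can be sketched as follows, using a simple list-based model of the LRU stack and per-way bits (the function and variable names are illustrative assumptions, not taken from any real controller implementation):

```python
def insert_line(way, lru_stack, refresh_bits, reuse_bits):
    """Install a new cache line into `way`: set the way's refresh bit,
    clear its reuse bit, and promote the way to the MRU position
    (index 0) of the LRU stack."""
    refresh_bits[way] = True
    reuse_bits[way] = False
    lru_stack.remove(way)     # drop the way's old position...
    lru_stack.insert(0, way)  # ...and place it at MRU
```

As further lines are inserted, every other way's position naturally shifts one step toward the LRU end, matching the drift described above.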

Once the location of a way changes to a less recently used assignment, the refresh bit 110c for that way is updated based on the value of the reuse bit 112c. If the reuse bit 112c is set (e.g., to a value of "1") because, for example, the line has experienced a cache hit, the refresh bit 110c likewise remains set and the line will continue to be refreshed until the line becomes stale (i.e., until its reuse bit 112c is reset, or set to a value of "0"). On the other hand, if reuse bit 112c is not set (e.g., has a value of "0") because, for example, the line has not experienced a cache hit, then refresh bit 110c is set to "0" and the line is no longer refreshed.

On a cache miss for a line in set 104c, the line may be installed in a way of set 104c, with its refresh bit 110c set to "1" and its reuse bit 112c reset, or set to "0". The relative usage of the line is tracked by its way's location in the LRU stack 105c. As before, once the way crosses the threshold into a location in the LRU stack 105c that is assigned as less recently used, and if the line has not been reused (i.e., reuse bit 112c is "0"), the corresponding refresh bit 110c is reset, or set to "0", to avoid refreshing dead lines that have not been recently used and may not have a high reuse probability.

For a cache hit on a line in a way of set 104c, if the line's refresh bit 110c is set, its reuse bit 112c is also set, and the line is returned or passed to the requestor, e.g., processor 102. In some aspects, if the refresh bit 110c is not set (or is set to "0") for the way, a cache hit may be treated as a cache miss for the line in that way. In more detail, a line in a way whose refresh bit 110c is not set (or is set to "0") is assumed to have exceeded its refresh limit and is accordingly treated as invalid and, thus, is not returned to the processor 102. The request for the cache line that was treated as a miss is then sent to the next level of backing memory, e.g., main memory 106, so that a fresh and correct copy can be fetched into cache 104 again.
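This hit handling can be sketched under the same illustrative list model; the MRU promotion on a hit is standard LRU behavior assumed here, and all names are illustrative:

```python
def lookup_hit(way, lru_stack, refresh_bits, reuse_bits):
    """Handle a tag match on `way`. A hit on a line whose refresh bit is
    clear is treated as a miss, since the line is assumed to have
    exceeded its refresh limit; the caller would then refetch the line
    from backing memory. Returns True if the hit can be serviced."""
    if not refresh_bits[way]:
        return False            # treat as a miss
    reuse_bits[way] = True      # the line has now undergone reuse
    lru_stack.remove(way)
    lru_stack.insert(0, way)    # promote the way to MRU
    return True
```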

In an aspect, if a line is in a way whose position has crossed the threshold toward the MRU position, into one of the more recently used positions in the LRU stack 105c for set 104c (e.g., into one of the four more recently used positions), and if the reuse bit 112c is set, then the refresh bit 110c is also set, because the line has undergone reuse; therefore, the line continues to be refreshed. On the other hand, if the line crosses the threshold into the more recently used positions and its reuse bit 112c is not set, then the refresh bit 110c is reset, or set to "0", because the line has not undergone reuse and as such may have a low future reuse probability; accordingly, refreshing of the line is suspended or not performed.
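Note that the threshold-crossing cases in both directions reduce to the same update of the refresh bit from the reuse bit, which can be sketched in the illustrative model as (names are assumptions):

```python
def on_threshold_cross(way, refresh_bits, reuse_bits):
    # Whenever a way's LRU-stack position crosses the threshold (in
    # either direction), the way's refresh bit is updated from its reuse
    # bit: lines that have undergone reuse keep being refreshed, while
    # lines that have not stop being refreshed.
    refresh_bits[way] = reuse_bits[way]
```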

In some aspects, instead of the fixed threshold described above, a dynamically variable threshold may be used in conjunction with the locations of the LRU stack 105c for, e.g., set 104c of the cache 104. For example, the threshold may change dynamically based on the program phase or some other indicator.

FIG. 2A illustrates one embodiment of dynamic thresholds. The LRU stack 105c of FIG. 1 is shown, as an example, with a representative set of counters 205c, one counter associated with each way of the LRU stack 105c. The counters 205c may be sized according to implementation requirements, but typically may each be M bits in size and configured to increment each time the respective line of set 104c receives a hit. Thus, counters 205c may be used to analyze the number of hits received by the lines of set 104c. Based on the values in such counters, sampled, for example, at specified intervals, the threshold for the LRU stack 105c (based on which lines in the more recently used locations toward the MRU location may be refreshed, while lines in the less recently used locations toward the LRU location may not be refreshed, as previously discussed) may be adjusted for the next sampling interval. In an example, the highest counter 205c value is associated with the MRU position and the lowest counter 205c value is associated with the LRU position, with counter 205c values between the highest and lowest being associated with positions between the MRU and LRU positions, from more recently used assignments to less recently used assignments. Thus, if a particular counter (e.g., the one associated with way w5) has the highest value, the line in the associated way is refreshed until the counter value falls below the value associated with the w5 position of LRU stack 105c.
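The description leaves the exact mapping from sampled counter values to a new threshold open. One plausible policy, sketched here with assumed names, is to extend the refreshed region to cover every LRU-stack position whose counter recorded hits during the last sampling interval:

```python
def adjust_threshold(position_hit_counters, min_hits=1):
    """Illustrative policy sketch (an assumption, not the only mapping):
    position_hit_counters[p] holds the sampled hit count for LRU-stack
    position p. The returned threshold covers positions 0..threshold-1,
    i.e., the more recently used side that will be refreshed."""
    threshold = 0
    for pos, hits in enumerate(position_hit_counters):
        if hits >= min_hits:
            threshold = pos + 1  # extend past the deepest active position
    return threshold
```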

In some designs, it may be desirable to reduce the hardware and/or associated resources of the counters 205c of FIG. 2A. FIG. 2B illustrates another aspect in which the resources consumed by the counters used to determine the threshold for the LRU stack 105c may be reduced. The counters 210c shown in FIG. 2B illustrate grouping of such counters. For example, one of the two counters 210c may be used to track reuse among ways w4-w7, while the other of the two counters 210c may be used to track reuse among ways w0-w3. In this way, there is no need to consume a separate counter for each way. However, with the benefit of reduced resources, the analysis is coarser than the granularity that can be provided by the embodiment of FIG. 2A. Based on the two counters 210c, a decision on the threshold may be made, for example, by analyzing whether the upper or the lower half of the ways of set 104c sees more reuse.
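With only the two grouped counters, one simple decision rule is sketched below; the specific rule is an illustrative assumption (the text only says the halves are compared for reuse), and all names are hypothetical:

```python
def threshold_from_halves(upper_half_hits, lower_half_hits, num_ways=8):
    """Coarse threshold choice from the two grouped counters 210c: one
    counting reuse among ways w0-w3 and one among ways w4-w7."""
    # If the less recently used half sees more reuse, widen the refreshed
    # region to all ways; otherwise limit it to the upper half.
    return num_ways if lower_half_hits > upper_half_hits else num_ways // 2
```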

In yet another implementation, although not explicitly shown, counters may be provided for only a subset of the total number of sets of cache 104. For example, if counters N1-N4 are provided to track the upper halves of the ways of four sets out of 16 sets in an implementation of cache 104 (not corresponding to the illustration shown in FIG. 1), and counters M1-M4 are provided to track the lower halves of the ways of those four sets, the LRU threshold may be calculated as max(avg(N1…N4), avg(M1…M4)).
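The averaging expression above can be transcribed directly (the Python names are assumptions):

```python
def lru_threshold(n_counters, m_counters):
    """Compute the LRU threshold as max(avg(N1..N4), avg(M1..M4)), where
    n_counters tracks the upper halves of the sampled sets' ways and
    m_counters tracks the lower halves."""
    avg = lambda xs: sum(xs) / len(xs)
    return max(avg(n_counters), avg(m_counters))
```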

Accordingly, it should be appreciated that the exemplary aspects include various methods for performing the processes, functions and/or algorithms disclosed herein. For example, as discussed further below, method 300 of FIG. 3 is a method of refreshing a line of a cache (e.g., cache 104).

In block 302, method 300 includes associating a refresh bit and a reuse bit with each of two or more ways of a set of the cache (e.g., cache controller 103 associating refresh bit 110c and reuse bit 112c with each of ways w0-w7 of set 104c).

Block 304 includes associating a Least Recently Used (LRU) stack with the set, wherein the LRU stack includes a location associated with each of the two or more ways, the location ranging from a most recently used location to a least recently used location (e.g., LRU stack 105c of cache controller 103 associated with set 104c, with locations ranging from MRU to LRU).

Block 306 includes assigning a threshold to the LRU stack, wherein locations on the most recently used side of the threshold comprise more recently used locations, and locations on the least recently used side of the threshold comprise less recently used locations (e.g., a fixed threshold or a dynamic threshold; in FIG. 1, for example, locations toward the MRU location relative to the threshold in LRU stack 105c are shown as the more recently used locations, and locations toward the LRU location are shown as the less recently used locations).

In block 308, a line in a way of the cache may be selectively refreshed if the following condition is met: the location of the way is one of the more recently used locations and the refresh bit associated with the way is set; or the location of the way is one of the less recently used locations and both the refresh bit and the reuse bit associated with the way are set (e.g., cache controller 103 may be configured to selectively direct a refresh operation to be performed on a line in a way of the two or more ways w0-w7 of set 104c of cache 104 if the condition is met that the location of the way is one of the more recently used locations and the refresh bit 110c associated with the way is set, or the location of the way is one of the less recently used locations and both the refresh bit 110c and the reuse bit 112c associated with the way are set).
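The selection condition of block 308 can be written as a single predicate over the set's state; the list-based model and names are illustrative assumptions:

```python
def ways_to_refresh(lru_stack, refresh_bits, reuse_bits, threshold):
    """Return the ways whose lines should be refreshed: a more recently
    used position needs only the refresh bit set; a less recently used
    position needs both the refresh bit and the reuse bit set."""
    selected = []
    for pos, way in enumerate(lru_stack):
        more_recent = pos < threshold  # positions 0..threshold-1 are MRU-side
        if refresh_bits[way] and (more_recent or reuse_bits[way]):
            selected.append(way)
    return selected
```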

It should be appreciated that aspects of the present invention also include any apparatus configured to perform, or comprising means for performing, the functionality described herein. For example, according to an aspect, an exemplary apparatus includes a cache (e.g., cache 104) configured as a set-associative cache having at least one set (e.g., set 104c) and two or more ways of the at least one set (e.g., ways w0-w7). The apparatus may include means for tracking a location associated with each of the two or more ways of the at least one set (e.g., LRU stack 105c), the location ranging from a most recently used location to a least recently used location, wherein locations on the most recently used side of a threshold comprise more recently used locations and locations on the least recently used side of the threshold comprise less recently used locations. The apparatus may also include means (e.g., cache controller 103) for selectively refreshing a line in a way of the cache if the following condition is met: the location of the way is one of the more recently used locations and a first means for indicating refresh (e.g., refresh bit 110c) associated with the way is set; or the location of the way is one of the less recently used locations and both the first means for indicating refresh and a second means for indicating reuse (e.g., reuse bit 112c) associated with the way are set.

An example device that may utilize exemplary aspects of the present invention will now be discussed with respect to FIG. 4. FIG. 4 shows a block diagram of a computing device 400. The computing device 400 may correspond to an exemplary implementation of a processing system configured to perform the method 300 of FIG. 3. In FIG. 4, computing device 400 is shown to include the processor 102, the cache 104, and the cache controller 103 of FIG. 1. Cache controller 103 is configured to perform a selective refresh mechanism on cache 104 as discussed herein (further details of cache 104 shown in FIG. 1, such as sets 104a-104d and ways w0-w7, and further details of cache controller 103, such as refresh bit 110c, reuse bit 112c, and LRU stack 105c, have been omitted from this view for the sake of clarity). In FIG. 4, the processor 102 is illustratively shown coupled to the memory 106, with the cache 104 between the processor 102 and the memory 106, as described with reference to FIG. 1, although it should be understood that other memory configurations known in the art may also be supported by the computing device 400.

FIG. 4 also shows a display controller 426 that is coupled to the processor 102 and to a display 428. In some cases, computing device 400 may be used for wireless communication, and FIG. 4 accordingly shows optional blocks in dashed lines, such as a coder/decoder (CODEC) 434 (e.g., an audio and/or speech CODEC) coupled to processor 102, with a speaker 436 and a microphone 438 that may be coupled to CODEC 434; and a wireless antenna 442 coupled to a wireless controller 440, which is coupled to the processor 102. In a particular aspect, where one or more of such optional blocks are present, the processor 102, the display controller 426, the memory 106, and the wireless controller 440 are included in a system-in-package or system-on-chip device 422.

Accordingly, in a particular aspect, an input device 430 and a power supply 444 are coupled to the system-on-chip device 422. Moreover, in a particular aspect, as illustrated in FIG. 4, where one or more optional blocks are present, the display 428, the input device 430, the speaker 436, the microphone 438, the wireless antenna 442, and the power supply 444 are external to the system-on-chip device 422. However, each of the display 428, the input device 430, the speaker 436, the microphone 438, the wireless antenna 442, and the power supply 444 may be coupled to a component of the system-on-chip device 422, such as an interface or a controller.

It should be noted that although FIG. 4 generally depicts a computing device, the processor 102 and the memory 106 may also be integrated into a set top box, server, music player, video player, entertainment unit, navigation device, personal digital assistant (PDA), fixed location data unit, computer, portable computer, tablet computer, communications device, mobile phone, or other similar device.

Those of skill in the art would understand that information and signals may be represented using any of a variety of different technologies and techniques. For example, data, instructions, commands, information, signals, bits, symbols, and chips that may be referenced throughout the above description may be represented by voltages, currents, electromagnetic waves, magnetic fields or particles, optical fields or particles, or any combination thereof.

Furthermore, those of skill in the art will appreciate that the various illustrative logical blocks, modules, circuits, and algorithm steps described in connection with the aspects disclosed herein may be implemented as electronic hardware, computer software, or combinations of both. To clearly illustrate this interchangeability of hardware and software, various illustrative components, blocks, modules, circuits, and steps have been described above generally in terms of their functionality. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the overall system. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present disclosure.

The methods, sequences and/or algorithms described in connection with the aspects disclosed herein may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. A software module may reside in RAM memory, flash memory, ROM memory, EPROM memory, EEPROM memory, registers, hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art. An exemplary storage medium is coupled to the processor such that the processor can read information from, and write information to, the storage medium. In the alternative, the storage medium may be integral to the processor.

Accordingly, an aspect of the present invention can include a computer readable medium embodying a method for selective refresh of a DRAM. Accordingly, the invention is not limited to the illustrated examples, and any means for performing the functionality described herein are included in aspects of the invention.

While the foregoing disclosure shows illustrative aspects of the invention, it should be noted that various changes and modifications could be made herein without departing from the scope of the invention as defined by the appended claims. The functions, steps and/or actions of the method claims in accordance with the aspects of the invention described herein need not be performed in any particular order. Furthermore, although elements of the invention may be described or claimed in the singular, the plural is contemplated unless limitation to the singular is explicitly stated.