Electronic device and electronic system

Document No.: 1112803 | Publication date: 2020-09-29

Reading note: This technology, "Electronic device and electronic system," was designed and created by Vikas Sinha, Sean Le, Tarun Nakra, Yingying Tian, Apurva Patel, and Omar Torres on 2020-03-19. Summary: An electronic device and an electronic system are provided. According to one general aspect, the electronic device may include: a processor configured to issue a first request for a piece of data from a cache memory and a second request for the piece of data from a system memory; a cache memory configured to temporarily store a subset of data; and a memory interconnect. The memory interconnect may be configured to receive the second request for the piece of data from the system memory, determine whether the piece of data is stored in the cache memory, and cancel the second request for the piece of data from the system memory if the piece of data is determined to be stored in the cache memory.

1. An electronic device, comprising:

a processor configured to issue a first request for a piece of data from the cache memory and a second request for the piece of data from the system memory;

a cache memory configured to store a subset of data; and

a memory interconnect configured to: receive the second request for the piece of data from the system memory, determine whether the piece of data is stored in the cache memory, and cancel the second request for the piece of data from the system memory if the piece of data is determined to be stored in the cache memory.

2. The electronic device of claim 1, wherein the processor is configured to include a speculative flag in the second request for the piece of data from the system memory.

3. The electronic device of claim 1, wherein the memory interconnect is configured to:

if the piece of data is determined to be stored in the cache memory, cancel the second request for the piece of data from the system memory by issuing a cancellation response message to the processor.

4. The electronic device of claim 1, wherein the memory interconnect is configured to determine whether the piece of data is stored in the cache memory by checking a snoop filter directory.

5. The electronic device of claim 4, wherein the snoop filter directory is permitted to include false positive results but not false negative results.

6. The electronic device of claim 1, wherein the processor is configured to:

issue a third request for the piece of data from the system memory in response to receiving both a failure of the first request and a cancellation of the second request.

7. The electronic device of claim 1, wherein the memory interconnect is configured to: block, when the second request is issued, access requests to the system memory for a piece of data already stored in the cache memory.

8. The electronic device of claim 1, wherein the memory interconnect is configured to:

cancel the second request for the piece of data from the system memory if the second request arrives at the memory interconnect before a write request associated with the piece of data, the second request being earlier than the write request.

9. An electronic system, comprising:

a plurality of processors, wherein a requesting processor among the plurality of processors is configured to issue a first request for a piece of data from a cache memory system and a second request for the piece of data from a system memory;

a cache memory system including, for each processor of the plurality of processors, a portion of the cache memory system associated with the respective processor; and

a memory interconnect configured to:

facilitate cache coherency among the plurality of processors,

receive the second request for the piece of data from the system memory,

determine whether the piece of data is stored in a portion of the cache memory system that is accessible by the requesting processor, and

cancel the second request for the piece of data from the system memory if the piece of data is determined to be stored in the portion of the cache memory system that is accessible by the requesting processor.

10. The electronic system of claim 9, wherein the requesting processor is configured to include a speculative flag in the second request for the piece of data from the system memory.

11. The electronic system of claim 9, wherein the memory interconnect is configured to:

if the piece of data is determined to be stored in the portion of the cache memory system that is accessible by the requesting processor, cancel the second request for the piece of data from the system memory by issuing a cancellation response message to the requesting processor.

12. The electronic system of claim 9, wherein the memory interconnect is configured to determine whether the piece of data is stored in the portion of the cache memory system accessible to the requesting processor by checking a snoop filter directory.

13. The electronic system of claim 12, wherein the snoop filter directory is permitted to include false positive results but not false negative results.

14. The electronic system of claim 9, wherein the requesting processor is configured to:

issue a third request for the piece of data from the system memory in response to receiving both a failure of the first request and a cancellation of the second request.

15. The electronic system of claim 9, wherein the memory interconnect is configured to: block, when the second request is issued, access requests to the system memory for a piece of data already stored in the cache memory system.

16. The electronic system of claim 9, wherein the memory interconnect is configured to:

cancel the second request for the piece of data from the system memory if the second request arrives at the memory interconnect before a write request associated with the piece of data, the second request being earlier than the write request.

17. An electronic device, comprising:

memory access interface circuitry configured to receive and transmit memory access requests and responses;

a cache coherency data structure configured to indicate contents of a cache memory; and

speculative request management circuitry configured to:

receive a speculative request to system memory for a piece of data,

determine whether the piece of data is stored in at least a portion of the cache memory, and

cancel the speculative request if the piece of data is determined to be stored in the cache memory.

18. The electronic device of claim 17, wherein the speculative request management circuitry is configured to:

cancel the speculative request, at least in part, by issuing a cancellation response message to the requesting device.

19. The electronic device of claim 17, wherein the speculative request management circuitry is configured to:

determine whether the piece of data is stored in at least a portion of the cache memory by accessing the cache coherency data structure.

Technical Field

This description relates to memory operations, and more particularly, to speculative Dynamic Random Access Memory (DRAM) reads that leverage an interconnect directory, performed in parallel with cache-level searches.

Background

When a particular piece of data is shared by multiple caches and a processor modifies the value of the shared data, the change must be propagated to all other caches that hold copies of the data. This change propagation prevents the system from violating cache coherency. Notification of data changes may be accomplished by bus snooping.

Bus snooping (also called bus sniffing) is a scheme in which a coherency controller (snooper) in a cache monitors, or snoops, bus transactions, with the goal of maintaining cache coherency in a distributed shared memory system. A cache containing a coherency controller (snooper) is referred to as a snoopy cache.

All snoopers monitor every transaction on the bus. If a transaction that modifies a shared cache block occurs on the bus, all snoopers check whether their caches hold a copy of the shared block. If a cache has a copy of the shared block, the corresponding snooper performs an action to ensure cache coherency. This action may be a flush or an invalidation of the cache block. It may also involve a change of the cache block's state according to the cache coherency protocol.

When a bus transaction occurs for a particular cache block, all snoopers must snoop the bus transaction. The snoopers then query their corresponding cache tags to check whether they hold the same cache block. In most cases, a cache does not hold the block, because a well-optimized parallel program does not share much data among multiple threads. The snooper's cache tag lookup is therefore typically unnecessary work for caches that do not hold the block; yet these tag queries interfere with cache accesses by the processor and cause additional power consumption.

One way to reduce unnecessary snooping is to use a snoop filter. The snoop filter determines whether a snooper needs to check its cache tags. A snoop filter is a directory-based structure that monitors all coherency traffic in order to keep track of the coherency state of cache blocks. This means the snoop filter knows which caches hold a copy of a given cache block, and it can therefore prevent unnecessary snoops into caches that do not hold a copy. Depending on the location of the snoop filter, there are two types of filters. One is a source filter, located on the cache side, which performs filtering before coherency traffic reaches the shared bus. The other is a destination filter, located on the bus side, which blocks unneeded coherency traffic coming from the shared bus. Snoop filters are also classified as inclusive or exclusive. An inclusive snoop filter keeps track of the presence of cache blocks in the caches, whereas an exclusive snoop filter keeps track of the absence of cache blocks in the caches. In other words, a hit in an inclusive snoop filter means that the corresponding cache block is held by some cache, while a hit in an exclusive snoop filter means that no cache holds the requested cache block.
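For illustration only, the following minimal Python sketch models the kind of inclusive snoop filter directory described above; the class and method names (SnoopFilterDirectory, may_be_cached, and so on) are hypothetical and not taken from any particular implementation. A miss in the directory guarantees that no cache holds the block, so the corresponding snoop can be filtered out, whereas a stale entry only causes a harmless extra snoop.

```python
# Hypothetical sketch of an inclusive snoop filter directory (illustrative only).
class SnoopFilterDirectory:
    def __init__(self):
        # cache-block address -> set of cache ids that may hold a copy
        self._sharers = {}

    def record_fill(self, block_addr, cache_id):
        # Called when coherency traffic shows a cache bringing the block in.
        self._sharers.setdefault(block_addr, set()).add(cache_id)

    def record_eviction(self, block_addr, cache_id):
        # Evictions may be reported lazily; a stale entry yields only a
        # false positive (an unnecessary snoop), never a false negative.
        sharers = self._sharers.get(block_addr)
        if sharers is not None:
            sharers.discard(cache_id)
            if not sharers:
                del self._sharers[block_addr]

    def may_be_cached(self, block_addr):
        # Inclusive semantics: True means "possibly cached",
        # False means "definitely not cached".
        return block_addr in self._sharers

    def sharers(self, block_addr):
        return set(self._sharers.get(block_addr, ()))
```

An exclusive filter would instead track blocks known to be absent, so a hit would mean that no cache holds the block.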

Disclosure of Invention

It is an object of the present disclosure to provide a device and a system with reduced latency and efficient utilization of resources.

According to one general aspect, an apparatus may comprise: a processor configured to issue a first request for a piece of data from the cache memory and a second request for the piece of data from the system memory. The apparatus may comprise: a cache memory configured to store a subset of the data. The apparatus may include a memory interconnect. The memory interconnect may be configured to receive a second request for the piece of data from the system memory. The memory interconnect may be configured to determine whether the piece of data is stored in the cache memory. The memory interconnect may be configured to: canceling the second request for the piece of data from the system memory if the piece of data is determined to be stored in the cache memory.

According to another general aspect, a system may include: a plurality of processors, wherein a requesting processor is configured to issue a first request for a piece of data from the cache memory system and a second request for the piece of data from the system memory. The system may include: a cache memory system comprising, for each processor, a portion of the cache memory system associated with the respective processor. The system may include a memory interconnect. The memory interconnect may be configured to facilitate cache coherency among the plurality of processors. The memory interconnect may be configured to receive a second request for the piece of data from the system memory. The memory interconnect may be configured to determine whether the piece of data is stored in a portion of the cache memory system that is accessible by the requesting processor. The memory interconnect may be configured to: canceling the second request for the piece of data from the system memory if the piece of data is determined to be stored in the portion of the cache memory system.

According to another general aspect, an apparatus may include: memory access interface circuitry configured to receive and transmit memory access requests and responses. The apparatus may comprise: a cache coherency data structure configured to indicate contents of a cache memory. The apparatus may comprise: speculative request management circuitry configured to receive a speculative request to system memory for a piece of data, determine whether the piece of data is stored in at least a portion of the cache memory, and cancel the speculative request if the piece of data is determined to be stored in the cache memory.

The details of one or more implementations are set forth in the accompanying drawings and the description below. Other features will be apparent from the description and drawings, and from the claims.

A system and/or method for memory operations, and more particularly for speculative Dynamic Random Access Memory (DRAM) reads that leverage an interconnect directory in parallel with cache-level searches, is provided substantially as shown in and/or described in connection with at least one of the figures, and as set forth more fully in the claims.

According to the present disclosure, the speculative request to system memory is cancelled when the parallel request to the cache memory is predicted to succeed. Thus, a device and a system are provided that reduce latency and utilize resources efficiently.

Drawings

Fig. 1A is a block diagram of an example embodiment of a system in accordance with the disclosed subject matter.

Fig. 1B is a block diagram of an example embodiment of a system in accordance with the disclosed subject matter.

Fig. 2 is a block diagram of an example embodiment of a system in accordance with the disclosed subject matter.

Fig. 3 is a flow diagram of an example embodiment of a technique in accordance with the disclosed subject matter.

FIG. 4 is a schematic block diagram of an information handling system that may include devices formed in accordance with the principles of the disclosed subject matter.

Like reference symbols in the various drawings indicate like elements.

Detailed Description

Various example embodiments will be described more fully hereinafter with reference to the accompanying drawings, in which some example embodiments are shown. The subject matter of the present disclosure may, however, be embodied in many different forms and should not be construed as limited to the example embodiments set forth herein. Rather, these example embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the subject matter of the disclosure to those skilled in the art. In the drawings, the size and relative sizes of layers and regions may be exaggerated for clarity.

It will be understood that when an element or layer is referred to as being "on," "connected to" or "coupled to" another element or layer, it can be directly on, connected or coupled to the other element or layer or intervening elements or layers may be present. In contrast, when an element is referred to as being "directly on," "directly connected to" or "directly coupled to" another element or layer, there are no intervening elements or layers present. Like numbers refer to like elements throughout. As used herein, the term "and/or" includes any and all combinations of one or more of the associated listed items.

It will be understood that, although the terms first, second, third, etc. may be used herein to describe various elements, components, regions, layers and/or sections, these elements, components, regions, layers and/or sections should not be limited by these terms. These terms are only used to distinguish one element, component, region, layer or section from another element, component, region, layer or section. Thus, a first element, a first component, a first region, a first layer and/or a first portion discussed below could be termed a second element, a second component, a second region, a second layer and/or a second portion without departing from the teachings of the presently disclosed subject matter.

For ease of description, spatially relative terms (such as "below …," "below …," "below," "above …," "above," and the like) may be used herein to describe the relationship of one element or feature to another element or feature as illustrated in the figures. It will be understood that the spatially relative terms are intended to encompass different orientations of the device in use or operation in addition to the orientation depicted in the figures. For example, if the device in the figures is turned over, elements described as "below" or "beneath" other elements or features would then be oriented "above" the other elements or features. Thus, the exemplary term "below …" can include both an orientation of "above …" and "below …". The device may be otherwise oriented (rotated 90 degrees or at other orientations) and the spatially relative descriptors used herein interpreted accordingly.

Likewise, for ease of description, electrical terms (such as "high," "low," "pull-up," "pull-down," "1," "0," etc.) may be used herein to describe voltage levels or currents relative to other voltage levels or another element or feature as illustrated in the figures. It will be understood that electrically relative terms are intended to encompass different reference voltages of the device in use or operation in addition to the voltages and currents depicted in the figures. For example, if a device or signal in the figures is flipped or other reference voltages, currents, or charges are used, then an element described as "high" or "pull-up" will then be "low" or "pull-down" compared to the new reference voltage or current. Thus, the exemplary term "high" may include a relatively low voltage or current or both a relatively high voltage or current. The device may additionally be based on different electrical frames of reference, the electrical relative descriptors used herein interpreted accordingly.

The terminology used herein is for the purpose of describing particular example embodiments only and is not intended to be limiting of the subject matter of the present disclosure. As used herein, the singular forms also are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms "comprises" and/or "comprising," when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.

Example embodiments are described herein with reference to cross-sectional illustrations that are schematic illustrations of idealized example embodiments (and intermediate structures). As such, variations from the shapes of the illustrations as a result, for example, of manufacturing techniques and/or tolerances, are to be expected. Thus, example embodiments should not be construed as limited to the particular shapes of regions illustrated herein but are to include deviations in shapes that result, for example, from manufacturing. For example, an implanted region illustrated as a rectangle will typically have rounded corners or curved features and/or a gradient of implant concentration at its edges rather than a binary change from implanted to non-implanted region. Likewise, a buried region formed by implantation may result in some implantation in the region between the buried region and the surface where the implantation occurs. Thus, the regions illustrated in the figures are schematic in nature and their shapes are not intended to illustrate the actual shape of a region of a device and are not intended to limit the scope of the presently disclosed subject matter.

Unless otherwise defined, all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which the presently disclosed subject matter belongs. It will be further understood that terms, such as those defined in commonly used dictionaries, should be interpreted as having a meaning that is consistent with their meaning in the context of the relevant art and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein.

Hereinafter, example embodiments will be explained in detail with reference to the accompanying drawings.

Fig. 1A is a block diagram of an example embodiment of a system 100 according to the disclosed subject matter. In various embodiments, the system 100 (also referred to as an electronic system, electronic device, etc.) may include a computing device (such as, for example, a laptop computer, a desktop computer, a workstation, a system on a chip (SOC), a personal digital assistant, a smartphone, a tablet computer, and other suitable computers or virtual machines or virtual computing devices thereof).

In the illustrated embodiment, the system 100 may include a processor 102. The processor 102 may be configured to execute one or more machine-executable instructions or fragments of executable software, firmware, or a combination thereof. In the illustrated embodiment, the processor 102 may include a core, a processing unit, or a portion of a larger integrated circuit.

In the illustrated embodiment, the system 100 may include a memory cache circuit or system (also referred to as a cache, cache memory) 104. Cache 104 may be configured to temporarily store data (e.g., data 133). In the illustrated embodiment, the cache 104 may include a level 1 (L1) cache 111 and a level 2 (L2) cache 112. In such embodiments, caches 111 and 112 may be hierarchical. In various embodiments, other cache levels may be included in cache memory 104 or may be removed from cache memory 104. In various embodiments, a portion of the cache levels may be included in the processor 102 (e.g., a level 0 (L0) cache). It is to be understood that the above is merely one illustrative example, and that the disclosed subject matter is not so limited.

In the illustrated embodiment, system 100 may include a memory interconnect 106. In various embodiments, memory interconnect 106 may connect and/or manage multiple cache systems 104. While not present in this simplified illustration, such multiple cache systems 104 may be present in other embodiments. In such embodiments, the memory interconnect 106 may facilitate cache coherency.

In the illustrated embodiment, the system 100 may include a memory controller circuit 108. The memory controller 108 may be configured to interface with the system memory 110 and may send and receive messages between the system memory 110 (or another intermediate circuit) and the processor 102 (and its intermediate circuits such as the memory interconnect 106, etc.).

In the illustrated embodiment, system 100 may include a system memory 110. System memory 110 may be configured to store data (e.g., data 133). In various embodiments, system memory 110 may comprise Dynamic Random Access Memory (DRAM). Nevertheless, it is to be understood that the above is merely one illustrative example, and the disclosed subject matter is not so limited. In various embodiments, volatile and/or nonvolatile memory technology may be employed. In general, system memory 110 may store a copy of each piece of data stored in cache 104, but the copy may be old or outdated (e.g., writes to the cache may not yet have been propagated to system memory).

In various embodiments, the processor 102, cache 104, memory interconnect 106, and memory controller 108 may be included in an integrated circuit or processing unit 180 (e.g., a system on a chip (SOC)). In various embodiments, system memory 110 may be included in or integrated with other components. It is to be understood that the above is merely one illustrative example, and the disclosed subject matter is not so limited.

In the illustrated embodiment, processor 102 may desire to access a piece of data (e.g., data 133). To do so, conventionally, the processor 102 would issue a memory access request (e.g., a read request or a write request). The system 100 would check for the data in the cache 104 by first searching the nearest tier or level (e.g., L1) before proceeding to the next level (e.g., L2). If the data is still not found, a request for the data would be made to system memory 110. This series of searches wastes time, particularly when the data is critical and latency is important.

In the illustrated embodiment, the processor 102 may issue more than one request for the data; specifically, two requests issued substantially in parallel. In the illustrated embodiment, the processor 102 may issue a first request 191 for a desired piece of data from the cache 104 (as is conventionally done). That is, the processor 102 may issue a first request 191 to the cache 104 for the desired piece of data. However, in various embodiments, the processor 102 may also issue a second request 192 for the desired piece of data from the system memory 110 (the second request 192 being routed through the memory interconnect 106 and the memory controller 108). That is, the processor 102 may also issue a second request 192 to the system memory 110 for the desired piece of data.

In various embodiments, the second request 192 may be allowed to occur naturally or as would a conventional memory access to system memory 110 and cause the data 133 to be returned. However, if cache request 191 also returns data 133, then two versions of data 133 may cause problems. The cached and system-level versions may be inconsistent (i.e., have different values), the write-after-read ordering may be complex, and the completed system memory request 192 unnecessarily uses system resources. As such, it may not be desirable to allow system level memory accesses to occur without further analysis.

In the illustrated embodiment, processor 102 may speculatively issue second request 192. In such embodiments, second request 192 may include a bit or flag to indicate that second request 192 is not a normal (e.g., non-speculative or legacy) memory access, but rather a special speculative memory access requiring non-legacy handling and processing.
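As a rough illustration of this paragraph, the sketch below (hypothetical names, not the patent's implementation) shows a request structure carrying a speculative flag and a processor-side helper that issues the cache request 191 and the flagged system-memory request 192 substantially in parallel; the cache and memory_interconnect parameters stand in for the components of FIG. 1A.

```python
# Hypothetical sketch: a memory request with a speculative flag, issued in
# parallel toward the cache hierarchy and toward system memory.
from dataclasses import dataclass
import itertools

_req_ids = itertools.count()

@dataclass
class MemoryRequest:
    req_id: int
    address: int
    speculative: bool = False  # marks the request for non-conventional handling

def issue_parallel_read(address, cache, memory_interconnect):
    first = MemoryRequest(next(_req_ids), address)                     # request 191
    second = MemoryRequest(next(_req_ids), address, speculative=True)  # request 192
    cache.lookup(first)                  # conventional search of L1, L2, ...
    memory_interconnect.receive(second)  # interconnect decides to forward or cancel
    return first, second
```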

In the illustrated embodiment, the second request 192 may be received by the memory interconnect 106. This is in contrast to embodiments in which only non-speculative second requests are used, which may bypass the memory interconnect 106 and instead go to the memory controller 108. In the illustrated embodiment, the second request 192 may be routed to the memory interconnect 106 because the memory interconnect 106 has access to information needed to process the second request 192 of a speculative nature.

In the illustrated embodiment, memory interconnect 106 may include a snoop filter directory 116 and/or snoop caches (not shown but described above). In such embodiments, the snoop filter directory 116 (or a similar data structure) may be configured to indicate which data (e.g., data 133) is currently stored in the cache 104.

In such embodiments, instead of simply completing the second request 192 to the system memory 110, the memory interconnect 106 may first predict or determine whether the first request 191 to the cache 104 will, or is likely to, be successful.

In various embodiments, the memory interconnect 106 may predict or determine whether the first request 191 will or may be successful by determining whether the requested piece of data (e.g., data 133) is included or stored in the cache 104, where "stored" also covers the case in which the data has been updated or modified in the requesting processor's cache. If the requested piece of data (e.g., data 133) is determined to be included or stored in cache 104, then the first request 191 may be determined or predicted to succeed. If the requested piece of data (e.g., data 133) is determined not to be included or stored in cache 104, then the first request 191 may be determined or predicted to fail.

In various embodiments, this determination may be made by examining or using the snoop filter directory 116 or the cache coherency mechanisms/circuits of the memory interconnect 106. In various embodiments, the snoop filter directory 116 may be configured to be conservative or pessimistic. In such embodiments, the snoop filter directory 116 may be allowed to incorrectly indicate that a piece of data is in the cache 104 (i.e., false positives), but not to incorrectly indicate that a piece of data is not in the cache 104 (i.e., false negatives). In such embodiments, the memory interconnect 106 may incorrectly predict that the second request 192 should be cancelled, but will not incorrectly predict that the second request 192 should proceed. In such embodiments, the memory interconnect 106 is biased or weighted towards prediction failures that are manageable.
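A minimal sketch of the interconnect-side decision just described is shown below; it assumes a directory object with a may_be_cached() query like the one sketched in the Background section, and the remaining parameters are hypothetical stand-ins for the memory controller and the requesting processor. A directory hit (which may be a false positive) leads to cancellation, which is recoverable through a later non-speculative request; a directory miss is assumed to be exact, so forwarding to system memory is safe.

```python
# Hypothetical sketch of the conservative handling of a speculative request.
CANCEL, FORWARD = "cancel", "forward"

def handle_speculative_request(request, snoop_filter, memory_controller, processor):
    if not request.speculative:
        memory_controller.forward(request)           # conventional request: pass through
        return FORWARD
    if snoop_filter.may_be_cached(request.address):  # possibly a false positive
        processor.send_cancel_response(request.req_id)
        return CANCEL                                # recoverable via a third request
    memory_controller.forward(request)               # definitely not cached: safe to fetch
    return FORWARD
```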

In various embodiments, the memory interconnect 106 may allow the second request 192 to proceed to the system memory 110 if the first request 191 is determined or predicted to fail. The second request 192 may retrieve the data 133 from the system memory 110 and return it to the processor 102. In such an embodiment, since the first request 191 has failed, there will not be a second version of the data 133 that conflicts with the system memory 110 version, and the resources used to retrieve the data from the system memory 110 are not wasted. Instead, as described above, the speculative second request 192 will have retrieved the data with lower latency than the conventional cache-miss-then-system-request scheme.

Conversely, if the first request 191 is determined or predicted to succeed (i.e., the cache 104 is believed to include the data 133), the memory interconnect 106 may cancel the second request 192 and not allow the second request 192 to proceed to the system memory 110. In such embodiments, preventing the second request 192 from proceeding may include not passing it to the memory controller 108, or instructing the memory controller 108 not to forward the second request 192 to the system memory 110.

In such embodiments, by canceling the second request, the processor 102 may not have the problem described above of having multiple versions (from both the cache 104 and the system memory 110) of the same data. Furthermore, resources expended in retrieving data from system memory 110 may not be expended or wasted. It is to be understood that the above are merely some illustrative examples, and the disclosed subject matter is not limited thereto.

In various embodiments, memory interconnect 106 may be configured to ensure that system memory 110 response data is returned only when doing so is safe for coherency reasons. Thus, when system memory 110 response data is received, it may be forwarded directly to the processor 102, bypassing the cache 104, without any further checking in the cache 104.

In various example embodiments, the memory interconnect 106 may be further configured to prevent or cancel the speculative second request 192 when the queues (or other resources) of the memory interconnect 106, the system memory 110, and/or the cache 104 are heavily loaded. In such embodiments, the second request 192 may be delayed or throttled if the system 100 is heavily burdened. Alternatively, if the second request 192 for a piece of data from the system memory arrives at the memory interconnect 106 before a write request associated with the piece of data, the second request 192 may be cancelled; in this case, the second request 192 may be earlier than the write request.

Further, in various embodiments, the processor 102 may be configured to issue the speculative second request 192 only when the memory access is deemed critical or latency sensitive. In such embodiments, this may assist in the management of system resources. In some embodiments, the threshold for issuing the speculative second request 192 may be dynamic, such that the request 192 is made more often when system resources are plentiful. In another embodiment, the processor 102 may issue the speculative second request 192 only if it predicts a cache miss. In another embodiment, the processor 102 may issue the second request 192 as a matter of course for all or most memory accesses. It is to be understood that the above are merely some illustrative examples, and the disclosed subject matter is not limited thereto.

In the illustrated embodiment, if the memory interconnect 106 decides to cancel the second request 192, it may issue a cancel response 195. In such embodiments, the cancellation response 195 may inform the requesting processor 102 that the second request 192 has been cancelled. In such embodiments, the processor 102 may no longer wait or allocate resources to data expected to be returned due to the second request 192. In another embodiment, if the memory interconnect 106 incorrectly cancels the second request 192 (e.g., due to a misprediction or false positive), the cancellation response 195 may notify the processor 102 that a third request (shown in FIG. 1B) may be needed. It is to be understood that the above is merely one illustrative example, and the disclosed subject matter is not so limited.

In the illustrated embodiment, the first request 191 to the cache 104 completes normally or conventionally. In such an embodiment, the first request 191 may determine whether the data is in the cache 104. The first request 191 causes each cache tier or level (e.g., L1 cache 111, L2 cache 112, etc.) to be checked in turn until either the data is found or there are no more cache levels to check. In such an embodiment, the first request 191 may return either a success response or a failure response (neither shown). In various embodiments, when the second request 192 is issued in parallel with the first request 191, the first request 191 may (in the worst case) terminate at the last cache level searched. In another embodiment, the first request 191 may proceed to the system memory 110, as conventionally described.
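The level-by-level search performed by the first request 191 can be pictured with the small sketch below (illustrative only; the levels parameter is a hypothetical stand-in for the L1/L2 caches of FIG. 1A).

```python
# Hypothetical sketch of the conventional cache search for the first request.
def cache_lookup(address, levels):
    """levels: ordered cache levels, nearest first, each behaving like a dict."""
    for level in levels:
        if address in level:
            return ("hit", level[address])
    return ("miss", None)

# Example: an L1 miss followed by an L2 hit.
l1, l2 = {}, {0x1000: 42}
assert cache_lookup(0x1000, [l1, l2]) == ("hit", 42)
assert cache_lookup(0x2000, [l1, l2]) == ("miss", None)
```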

Fig. 1B is a block diagram of an example embodiment of a system 101 according to the disclosed subject matter. In various embodiments, system 101 may include a computing device, such as, for example, a laptop computer, desktop computer, workstation, system on a chip (SOC), personal digital assistant, smartphone, tablet, and other suitable computer or virtual machine or virtual computing device thereof. In various embodiments, system 101 may illustrate a multi-core version of system 100 of FIG. 1A.

In such embodiments, as described above, system 101 may include processor 102, cache system 174 (similar to cache system 104), memory interconnect 106, memory controller 108, and system memory 110. In various embodiments, integrated circuit 181 may include processor 102, cache system 174, memory interconnect 106, memory controller 108, and processor 102B. It is to be understood that the above is merely one illustrative example, and the disclosed subject matter is not so limited.

In the illustrated embodiment, the system 101 may include multiple processors, multiple processing units, or multiple cores (represented by the addition of the processor 102B). In such embodiments, each processor 102 may be associated with its own respective portion of the cache system 174.

In the illustrated embodiment, cache system 174 may include L1 caches 111 and 111B, L2 caches 112 and 112B, and a level 3 (L3) cache 113. In such embodiments, caches 111 and 112 (grouped into portion 114) may be dedicated to processor 102, whereas caches 111B and 112B (grouped into portion 114B) may be dedicated to processor 102B. In the illustrated embodiment, the cache 113 may be unified or shared between the processors. In another embodiment, cache 113 may not be present. It is to be understood that the above are merely some illustrative examples, and the disclosed subject matter is not limited thereto.

In the illustrated embodiment, when processing the speculative second request 192, the memory interconnect 106 may be configured to only check for the presence of the requested data in a cache associated with the requesting processor (e.g., processor 102). In such embodiments, caches 111, 112, and even 113 may be checked (via snoop filter 116 or other cache coherency structure), while caches 111B and 112B (associated with non-requesting processor 102B) may not be checked. Likewise, if processor 102B is the requesting processor, caches 111B, 112B, and 113 may have been checked.

In various embodiments, once the caches associated with the requesting processor (e.g., processor 102) have been checked and found not to hold the data, the other caches (e.g., caches 111B and 112B) may be checked. In such embodiments, the other caches (e.g., caches 111B and 112B) may hold the data, and since a snoop operation can return the desired data, the speculative second request 192 to system memory 110 becomes unnecessary. In another embodiment, only the presence of the data in the requester's caches (e.g., caches 111, 112, and 113) may be checked for the purpose of generating the speculative cancellation response 195.
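For illustration, a hypothetical sketch of this per-requester check is given below; it assumes a directory that can report the set of sharer caches for an address (as in the earlier sketch) and a topology map from processor id to the cache ids that processor can access, neither of which is defined by the patent.

```python
# Hypothetical sketch: only caches accessible to the requesting processor are
# considered when deciding whether to cancel the speculative request.
def cached_for_requester(address, requester_id, snoop_filter, cache_topology):
    """cache_topology example: {0: {"L1_0", "L2_0", "L3"}, 1: {"L1_1", "L2_1", "L3"}}."""
    accessible = cache_topology[requester_id]
    return bool(snoop_filter.sharers(address) & accessible)
```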

As described above, in various embodiments, the memory interconnect 106 may cancel the second request 192 and issue a cancel response 195 to the requesting processor 102. However, as described above, the memory interconnect 106 may occasionally be wrong about the presence of the data in the cache 174. In such an embodiment, the first request 191 may eventually fail. In such embodiments, the processor 102 may be configured to issue a non-speculative third request 193 to the system memory 110 for the desired data. In various embodiments, the non-speculative third request 193 may be routed through the cache 174 to account for the possibility that the data was brought into the cache system 174 (e.g., into the cache 113) after the first request 191 was issued. In another embodiment, the non-speculative third request 193 may be routed to the memory interconnect 106 or the memory controller 108 (the memory interconnect 106 does not gate or cancel non-speculative requests). In various embodiments, the non-speculative third request 193 may include a flag or bit indicating that it is non-speculative and may not be cancelled as described herein.
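The requester-side recovery path can be summarized by the following sketch (hypothetical names; issue_nonspeculative_read stands in for issuing request 193): only when the cache request has failed and a cancellation response has been received does the processor fall back to the non-speculative third request.

```python
# Hypothetical sketch of the requester's reaction to the first-request result
# and to a cancellation response for the speculative second request.
def next_step(cache_hit, cancel_received, issue_nonspeculative_read):
    if cache_hit:
        return "completed_from_cache"
    if cancel_received:
        # Misprediction (e.g., a snoop-filter false positive): retry without speculation.
        issue_nonspeculative_read()
        return "third_request_issued"
    return "waiting_for_speculative_data"  # request 192 is still in flight
```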

Fig. 2 is a block diagram of an example embodiment of a system 200 according to the disclosed subject matter. In various embodiments, system 200 may include, or may be part of, the memory interconnect circuitry described above.

In various embodiments, the system 200 may include a memory access interface circuit 202 configured to receive and transmit memory access requests and responses. In various embodiments, the memory access interface circuitry 202 may be configured to receive speculative memory access requests as described above. In another embodiment, the memory access interface circuitry 202 may be configured to send a cancellation response message as described above. In another embodiment, the memory access interface circuitry 202 may be configured to receive non-speculative memory access requests as described above. It is to be understood that the above are merely some illustrative examples, and the disclosed subject matter is not limited thereto.

In various embodiments, system 200 may include a cache coherency data structure 204. In various embodiments, the cache coherency data structure 204 may be capable of indicating whether a piece of data is currently stored in the cache system. In various embodiments, such cache coherency data structures 204 may be accessed or queried when determining whether speculative memory accesses should be allowed.

In various embodiments, cache coherency data structure 204 may comprise a snoop filter directory as described above. In another embodiment, the cache coherency data structure 204 may comprise a cache tag or circuitry for accessing a cache tag of a cache system. In another embodiment, a directory-based cache coherency mechanism may be employed. It is to be understood that the above is merely one illustrative example, and the disclosed subject matter is not so limited.

In various embodiments, system 200 may include speculative request management circuitry 206. In various embodiments, speculative request management circuitry 206 may be configured to gate speculative requests, that is, to determine whether a speculative request would be an appropriate, or at least forgivable, use of system resources (e.g., cycles, bandwidth, etc.). As described above, in such embodiments, speculative request management circuitry 206 may be configured to determine whether a requested piece of data is already stored in the cache, or whether the substantially parallel memory access to the cache system is likely to succeed. In such embodiments, the determination may be made using the cache coherency data structure 204.

In various embodiments, if speculative request management circuitry 206 determines that a speculative request is likely to be redundant, it may cancel or otherwise block the speculative request. For example, speculative request management circuitry 206 may cancel a speculative request at least in part by issuing a cancellation message to the requesting device. Conversely, if speculative request management circuitry 206 determines that the speculative request is not likely to be redundant, it may allow the speculative request to proceed.
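A minimal sketch of such gating is shown below; the coherency_directory, queue_occupancy, and occupancy_limit names are hypothetical and merely illustrate weighing redundancy against resource load.

```python
# Hypothetical sketch of gating by the speculative request management circuitry.
def gate_speculative_request(request, coherency_directory, queue_occupancy,
                             occupancy_limit=0.75):
    if coherency_directory.may_be_cached(request.address):
        return "cancel"   # likely redundant: the parallel cache request should succeed
    if queue_occupancy > occupancy_limit:
        return "cancel"   # resources heavily loaded: throttle speculation
    return "allow"        # forward toward system memory
```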

Fig. 3 is a flow diagram of an example embodiment of a technique 300 in accordance with the disclosed subject matter. In various embodiments, the technique 300 may be used or generated by a system (such as the systems of fig. 1A, 1B, and 2).

Nevertheless, it is understood that the above are merely some illustrative examples and that the disclosed subject matter is not so limited. It is to be understood that the disclosed subject matter is not limited to the order or number of acts shown by technique 300.

Block 302 illustrates: in one embodiment, as described above, the requesting processor or originating circuit may determine whether the memory access is a latency-critical access or an otherwise important access. Block 302 may also include additional prediction logic to determine whether to issue a speculative request; the decision need not be based on latency alone. If the memory access is not a latency-critical or otherwise important access, a single memory request may be made to the cache. If the memory access is a latency-critical or otherwise important access, then two memory requests may be made substantially in parallel, to both the cache and the system memory, as described above.

Block 304 illustrates: in one embodiment, as part of the first cache based request, a check may be made to determine if the data is actually in the cache system. Block 306 illustrates: in one embodiment, a cache hit or miss may occur. Block 399 shows: in one embodiment, if a cache hit occurs, a first cache based request may be used to complete the memory access.

Block 308 depicts: in one embodiment, if a cache miss occurs, a determination may be made as to whether a speculative second request has been made (per block 302). Block 307 shows: in one embodiment, as described above, if a speculative request is not made, a non-speculative request to system memory may be made. Block 309 shows: in one embodiment, if a speculative request has been made, the system may wait for the result of the request. In various embodiments, the step of block 334 may occur if the speculative request is eventually (and incorrectly) cancelled. Connector 335 connects block 309 and block 334.

Block 312 shows: in one embodiment, the speculative second request may be issued to system memory via a memory interconnect, as described above. Blocks 312 and 316 illustrate: in one embodiment, the memory interconnect may check a snoop filter directory or other cache coherency structure to determine whether the requested data is currently stored in the cache system.

Block 318 illustrates: in one embodiment, as described above, if the requested data is not in the cache system, the memory access may proceed to system memory. In such embodiments, the memory request may be fulfilled from system memory.

Block 322 shows: in one embodiment, if the data is in the cache system, it may be determined how best to provide the data to the requesting processor. In particular, it may be determined whether the data is stored in the cache associated with the requesting processor. In various embodiments, further testing may include whether the data is stored in an acceptable state (e.g., not in the Invalid state of the MESI (modified-exclusive-shared-invalid) protocol).
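For illustration, the state test mentioned above might look like the following sketch, which simply treats any MESI state other than Invalid as usable (hypothetical helper names, not part of the patent).

```python
# Hypothetical sketch of the acceptable-state check in block 322.
from enum import Enum

class MesiState(Enum):
    MODIFIED = "M"
    EXCLUSIVE = "E"
    SHARED = "S"
    INVALID = "I"

def usable_in_cache(state):
    return state is not None and state is not MesiState.INVALID

assert usable_in_cache(MesiState.SHARED)
assert not usable_in_cache(MesiState.INVALID)
```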

Block 324 depicts: in one embodiment, if the data is not in a cache associated with the requesting processor, it may be determined whether it is instead in a cache associated with another processor of the multiprocessor system. Block 324 further depicts: in one embodiment, if the result of block 316 was incorrect or too limited, and the data is also not in another processor's cache, the speculative request may proceed to system memory (block 318). Otherwise, if the data is in another cache, block 326 illustrates: in one embodiment, the data may be made available by snooping the other caches.

Block 332 illustrates: in one embodiment, as described above, if the data is available in the cache and it is not desirable to fetch it via system memory (block 318), the speculative request may be cancelled. As described above, this may include sending a cancellation response back to the requesting processor. In a preferred embodiment, there may not be a link between blocks 324 and 332; in such embodiments, if the desired data is not in the requester's cache but is in another cache, the memory interconnect may supply the data via snooping and may return a cancellation response to the processor, making a retry unnecessary.

Block 334 depicts: in one embodiment, at some later time, the first cache-based request may be completed. Block 399 shows: in one embodiment, if a cache hit occurs, the first cache-based request may be used to complete the memory access. Block 338 depicts: in one embodiment, as described above, if, on the other hand, the memory interconnect erred in cancelling the speculative request (block 332) and the data is not actually in the cache (blocks 316 and 306), then a non-speculative request may be attempted. In such embodiments, the non-speculative request may be issued to system memory.

Fig. 4 is a schematic block diagram of an information handling system 400 that may include semiconductor devices formed in accordance with the principles of the disclosed subject matter.

Referring to FIG. 4, an information handling system 400 may include one or more devices constructed in accordance with the principles of the disclosed subject matter. In another embodiment, information handling system 400 may employ or perform one or more techniques in accordance with the principles of the disclosed subject matter.

In various embodiments, information handling system 400 may include computing devices (such as, for example, laptop computers, desktop computers, workstations, servers, blade servers, personal digital assistants, smart phones, tablet computers, and other suitable computers or virtual machines or virtual computing devices thereof). In various embodiments, information handling system 400 may be used by a user (not shown).

Information handling system 400 according to the disclosed subject matter may also include a Central Processing Unit (CPU), logic, or processor 410. In some embodiments, processor 410 may include one or more functional unit blocks (FUBs) or combinational logic blocks (CLBs) 415. In such embodiments, the combinational logic blocks may include various Boolean logic operations (e.g., NAND, NOR, XOR), stabilizing logic devices (e.g., flip-flops, latches), other logic devices, or combinations thereof. These combinational logic operations may be configured in a simple or complex manner to process the input signals to achieve the desired result. It is understood that while some illustrative examples of synchronous combinational logic operations are described, the disclosed subject matter is not so limited and may include asynchronous operations or a mixture thereof. In one embodiment, the combinational logic operations may include a plurality of Complementary Metal Oxide Semiconductor (CMOS) transistors. In various embodiments, these CMOS transistors may be arranged in gates that perform logical operations; nevertheless, it is understood that other techniques may be used and are within the scope of the disclosed subject matter.

The information processing system 400 according to the disclosed subject matter may also include volatile memory 420 (e.g., Random Access Memory (RAM)). Information handling system 400 according to the disclosed subject matter may also include non-volatile memory 430 (e.g., a hard disk drive, optical memory, NAND, or flash memory). In some embodiments, volatile memory 420, non-volatile memory 430, or combinations or portions thereof, may also be referred to as "storage media". In various embodiments, the volatile memory 420 and/or nonvolatile memory 430 may be configured to store data in a semi-permanent or substantially permanent form.

In various embodiments, the information handling system 400 may include one or more network interfaces 440 configured to allow the information handling system 400 to become part of and communicate via a communication network. Examples of Wi-Fi protocols can include, but are not limited to: Institute of Electrical and Electronics Engineers (IEEE) 802.11g, IEEE 802.11n. Examples of cellular protocols may include, but are not limited to: IEEE 802.16m (also known as wireless-MAN (metropolitan area network) advanced), Long Term Evolution (LTE) advanced, enhanced data rates for GSM evolution (EDGE), evolved high speed packet access (HSPA+). Examples of wired protocols may include, but are not limited to: IEEE 802.3 (also known as Ethernet), Fibre Channel, power line communication (e.g., HomePlug, IEEE 1901). It is to be understood that the above are merely some illustrative examples, and the disclosed subject matter is not limited thereto.

The information processing system 400 according to the disclosed subject matter may also include a user interface unit 450 (e.g., a display adapter, a haptic interface, a human interface device). In various embodiments, this user interface unit 450 may be configured to receive input from a user and/or provide output to a user. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); input from the user may be received in any form, including acoustic, speech, or tactile input.

In various embodiments, the information handling system 400 may include one or more other devices or hardware components 460 (e.g., a display or monitor, a keyboard, a mouse, a camera, a fingerprint reader, a video processor). It is to be understood that the above are merely some illustrative examples, and the disclosed subject matter is not limited thereto.

The information handling system 400 according to the disclosed subject matter may also include one or more system buses 405. In such embodiments, the system bus 405 may be configured to communicatively connect the processor 410, the volatile memory 420, the non-volatile memory 430, the network interface 440, the user interface unit 450, and the one or more hardware components 460. Data processed by the processor 410 or data input from outside the non-volatile memory 430 may be stored in the non-volatile memory 430 or the volatile memory 420.

In various embodiments, information handling system 400 may include or execute one or more software components 470. In some embodiments, the software components 470 may include an Operating System (OS) and/or applications. In some embodiments, the OS may be configured to provide one or more services to applications and manage or act as an intermediary between the applications and various hardware components (e.g., processor 410, network interface 440) of the information handling system 400. In such embodiments, the information handling system 400 may include one or more native applications that may be installed locally (e.g., within the non-volatile memory 430) and configured to be executed directly by the processor 410 and to interact directly with the OS. In such embodiments, the native application may comprise pre-compiled machine executable code. In some embodiments, the native application may include a script interpreter (e.g., csh, AppleScript, AutoHotkey) or a virtual execution machine (VM) (e.g., the Java Virtual Machine, the Microsoft Common Language Runtime) configured to translate source or object code into executable code that is then executed by the processor 410.

The semiconductor devices described above may be packaged using various packaging processes. For example, a semiconductor device constructed in accordance with the principles of the disclosed subject matter may be packaged using any of the following: Package On Package (POP) technology, Ball Grid Array (BGA) technology, Chip Scale Package (CSP) technology, plastic leaded chip carrier (PLCC) technology, plastic dual in-line package (PDIP) technology, waffle die package technology, die in wafer form technology, Chip On Board (COB) technology, ceramic dual in-line package (CERDIP) technology, Plastic Metric Quad Flat Pack (PMQFP) technology, Plastic Quad Flat Pack (PQFP) technology, small outline package (SOIC) technology, Shrink Small Outline Package (SSOP) technology, Thin Small Outline Package (TSOP) technology, Thin Quad Flat Pack (TQFP) technology, system-in-package (SIP) technology, multi-chip package (MCP) technology, wafer-level fabricated package (WFP) technology, wafer-level processed stack package (WSP) technology, and other technologies as will be known to those skilled in the art.

Method steps may be performed by one or more programmable processors executing a computer program to perform functions by operating on input data and generating output. Method steps can also be performed by, and an apparatus can be implemented as, special purpose logic circuitry, e.g., an FPGA (field programmable gate array) or an ASIC (application-specific integrated circuit).

In various embodiments, a computer-readable medium may include instructions that, when executed, cause an apparatus to perform at least a portion of the method steps. In some embodiments, the computer readable medium may be included in magnetic media, optical media, other media, or a combination thereof (e.g., CD-ROM, hard drive, read-only memory, flash drive). In such embodiments, the computer-readable medium may be a tangible, non-transitory article of manufacture.

While the principles of the disclosed subject matter have been described with reference to example embodiments, it will be apparent to those skilled in the art that various changes and modifications can be made therein without departing from the spirit or scope of the disclosed concepts. Accordingly, it should be understood that the above embodiments are not limiting, but merely illustrative. Thus, the scope of the disclosed concept is to be determined by the broadest permissible interpretation of the following claims and their equivalents, and shall not be restricted or limited by the foregoing description. It is, therefore, to be understood that the appended claims are intended to cover all such modifications and changes as fall within the scope of the embodiments.
