Method, system, and medium for controlled cache injection of incoming data

Document No.: 510377 | Publication date: 2021-05-28

Note: This technique, "Method, system, and medium for controlled cache injection of incoming data," was created by Rama Krishna Govindaraju, Liqun Cheng, and Parthasarathy Ranganathan on 2015-05-05. Abstract: The present invention relates to methods, systems, and media, including computer programs encoded on computer storage media, for controlled cache injection of incoming data. The methods, systems, and apparatus include the actions of providing a request for data to an input-output device and receiving a set of memory addresses for the requested data. Additional actions include determining a subset of the memory addresses, providing a request for a processor to prefetch or inject data corresponding to the subset of memory addresses, and receiving the requested data and the set of memory addresses. Additional actions include determining that the received data includes data for the subset of memory addresses that has been requested to be prefetched or injected, storing the data for the subset of memory addresses in a cache of the processor, and storing the remaining data of the received data for the memory addresses in main memory.

1. A computer-implemented method, comprising:

providing, by a user process to an input-output device, a request for data that is not yet available, wherein the user process is a software application that uses data received from the input-output device;

receiving, by the user process and from the input-output device, a set of memory addresses for the requested data that is not yet available;

determining, by the user process, a subset of the received memory addresses of the requested not yet available data, the subset corresponding to data to be cached by a processor;

determining, by the user process, a cache level in a cache of a processor to which the subset of memory addresses is prefetched;

providing, by the user process to the processor, a request for the processor to allocate space in the cache of the processor to cache data corresponding to the subset of memory addresses at the determined cache level for the requested not yet available data;

receiving, by a memory controller, the requested data and the set of memory addresses corresponding to the requested data; and

storing, by the memory controller, data corresponding to a subset of the memory addresses of the received data for the set of memory addresses in the allocated space of the cache of the processor and storing remaining data of the received data for the set of memory addresses in a main memory.

2. The method of claim 1, wherein storing, by the memory controller, data for the subset of memory addresses in a cache of the processor and remaining data of the received data for the memory addresses in a main memory comprises:

storing, by the memory controller, data for a subset of the memory addresses in a first level cache of the processor, and storing remaining data of the received data for the memory addresses in a main memory.

3. The method of claim 1, wherein determining the subset of memory addresses for the requested data comprises:

determining a number of memory addresses to prefetch in the cache; and

selecting the determined number of memory addresses from the set of memory addresses as a subset of the memory addresses.

4. The method of claim 1, wherein determining the subset of memory addresses corresponding to data to be cached by a processor comprises:

dynamically selecting a memory address from the set of memory addresses as a subset of the memory addresses based on a user process or system behavior.

5. The method of claim 1, wherein providing the request for the processor to allocate space in the cache of the processor to cache data corresponding to the subset of memory addresses for the requested data comprises:

providing an instruction indicating a subset of the memory addresses to be cached.

6. The method of claim 1, comprising:

receiving, by the processor, the request for the processor to cache data corresponding to a subset of the memory addresses for the requested data;

selecting, by the processor, a cache of the processor from a plurality of caches; and

caching the memory address in a cache of the processor selected by the processor.

7. The method of claim 1, further comprising:

issuing a DMA write operation indicating the requested data and the memory address; and

wherein receiving, by a memory controller, the requested data and the set of memory addresses corresponding to the requested data comprises receiving the DMA write operation.

8. The method of claim 1, wherein storing, by the memory controller, data for the subset of memory addresses corresponding to the subset of memory addresses in the allocated space of the cache of the processor and remaining data of the received data for the set of memory addresses in main memory comprises:

triggering the processor to store data for a subset of the memory addresses in the cache; and

triggering storage of the remaining data of the received data for the memory address in the main memory.

9. The method of claim 1, wherein providing a request for the processor to cache data corresponding to a subset of the memory addresses for the requested data comprises:

providing a request for the processor to cache data corresponding to a subset of the memory addresses for the requested data to a particular portion of the cache.

10. The method of claim 1, comprising:

determining to prefetch a portion of the remaining data stored in the main memory;

providing, by the user process to the processor, a request for the processor to prefetch data corresponding to another subset of the memory addresses for the requested data.

11. A system, comprising:

one or more computers; and

one or more storage devices storing instructions operable, when executed by the one or more computers, to cause the one or more computers to perform operations comprising:

providing, by a user process to an input-output device, a request for data that is not yet available, wherein the user process is a software application that uses data received from the input-output device;

receiving, by the user process and from the input-output device, a set of memory addresses for the requested data that is not yet available;

determining, by the user process, a subset of the received memory addresses of the requested not yet available data, the subset corresponding to data to be cached by a processor;

determining, by the user process, a cache level in a cache of a processor to which the subset of memory addresses is prefetched;

providing, by the user process to the processor, a request for the processor to allocate space in the cache of the processor to cache data corresponding to the subset of memory addresses at the determined cache level for the requested not yet available data;

receiving, by a memory controller, the requested data and the set of memory addresses corresponding to the requested data; and

storing, by the memory controller, data corresponding to a subset of the memory addresses of the received data for the set of memory addresses in the allocated space of the cache of the processor and storing remaining data of the received data for the set of memory addresses in a main memory.

12. The system of claim 11, wherein storing, by the memory controller, data for the subset of memory addresses in a cache of the processor and remaining data of the received data for the memory addresses in a main memory comprises:

storing, by the memory controller, data for a subset of the memory addresses in a first level cache of the processor, and storing remaining data of the received data for the memory addresses in a main memory.

13. The system of claim 11, wherein determining the subset of memory addresses for the requested data comprises:

determining a number of memory addresses to prefetch in the cache; and

selecting the determined number of memory addresses from the set of memory addresses as a subset of the memory addresses.

14. The system of claim 11, wherein determining the subset of memory addresses corresponding to data to be cached by a processor comprises:

dynamically selecting a memory address from the set of memory addresses as a subset of the memory addresses based on a user process or system behavior.

15. The system of claim 11, wherein providing the request for the processor to allocate space in the cache of the processor to cache data corresponding to the subset of memory addresses for the requested data comprises:

providing an instruction indicating a subset of the memory addresses to be cached.

16. The system of claim 11, comprising:

receiving, by the processor, the request for the processor to cache data corresponding to a subset of the memory addresses for the requested data;

selecting, by the processor, a cache of the processor from a plurality of caches; and

caching the memory address in a cache of the processor selected by the processor.

17. The system of claim 11, the operations further comprising:

issuing a DMA write operation indicating the requested data and the memory address; and

wherein receiving, by a memory controller, the requested data and the set of memory addresses corresponding to the requested data comprises receiving the DMA write operation.

18. The system of claim 11, wherein storing, by the memory controller, data for the subset of memory addresses corresponding to the subset of memory addresses in the allocated space of the cache of the processor and remaining data of the received data for the set of memory addresses in main memory comprises:

triggering the processor to store data for a subset of the memory addresses in the cache; and

triggering storage of the remaining data of the received data for the memory address in the main memory.

19. The system of claim 11, wherein providing a request for the processor to cache data corresponding to a subset of the memory addresses for the requested data comprises:

providing a request for the processor to cache data corresponding to a subset of the memory addresses for the requested data to a particular portion of the cache.

20. A non-transitory computer-readable medium storing software, the software including instructions executable by one or more computers, which upon such execution cause the one or more computers to perform operations comprising:

providing, by a user process to an input-output device, a request for data that is not yet available, wherein the user process is a software application that uses data received from the input-output device;

receiving, by the user process and from the input-output device, a set of memory addresses for the requested data that is not yet available;

determining, by the user process, a subset of the received memory addresses of the requested not yet available data, the subset corresponding to data to be cached by a processor;

determining, by the user process, a cache level in a cache of a processor to which the subset of memory addresses is prefetched;

providing, by the user process to the processor, a request for the processor to allocate space in the cache of the processor to cache data corresponding to the subset of memory addresses at the determined cache level for the requested not yet available data;

receiving, by a memory controller, the requested data and the set of memory addresses corresponding to the requested data; and

storing, by the memory controller, data corresponding to a subset of the memory addresses of the received data for the set of memory addresses in the allocated space of the cache of the processor and storing remaining data of the received data for the set of memory addresses in a main memory.

Technical Field

The present disclosure relates generally to processor caches.

Background

A computer may be able to access data from a cache faster than it can access data from main memory. Thus, to speed up the processing involved in accessing data, a computer may store some portions of the data in a cache instead of main memory.

Disclosure of Invention

In general, aspects of the subject matter described in this specification may relate to processes for prefetching or actively injecting incoming input-output data. The processor may include a cache from which the processor may access data faster than it may access data from main memory. These caches may be associated with different levels. For example, a multi-core processor may include multiple caches (each associated with a particular core of the processor) at a first level (e.g., an L1 cache) and a second level (e.g., an L2 cache), as well as a last level cache (e.g., an L3 cache) that is uniformly shared by the multiple cores. Different levels may be associated with different access speeds and sharing models. For example, data in a first level cache may be accessed faster than data in a lower level cache (e.g., the last level cache).

To speed up the process involved in accessing data, the system may prefetch data into the processor's cache. Prefetching data may include accessing and storing the data in a cache before the data is used by the processor. For example, if the system determines that the processor can use data stored in a particular memory address of the main memory after one microsecond, the system may issue a prefetch instruction, access the data stored in the particular memory address of the main memory, and store the accessed data in the processor's cache before the one microsecond passes so that it is available to the processor in a timely manner with low latency. After one microsecond and when the processor needs to use the data, the processor may access the data from the cache instead of from the main memory, thereby obtaining faster execution because the processor does not have to stall waiting for the data from the main memory.
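By way of illustration only (this sketch is not part of the original disclosure), issuing a software prefetch ahead of use can be written in C with GCC/Clang's `__builtin_prefetch`. The prefetch distance, array, and function name are assumptions chosen for the example:

```c
#include <stddef.h>

/* Sum an array while prefetching PREFETCH_DISTANCE elements ahead, so
 * each element is likely already in the cache when the loop reaches
 * it.  The distance stands in for the "one microsecond" of lead time
 * in the text; tuning it is workload-dependent. */
#define PREFETCH_DISTANCE 8

long sum_with_prefetch(const long *data, size_t n) {
    long total = 0;
    for (size_t i = 0; i < n; i++) {
        if (i + PREFETCH_DISTANCE < n) {
            /* rw=0 (prefetch for read), locality=3 (keep the line as
             * close to the core as possible, e.g., in L1). */
            __builtin_prefetch(&data[i + PREFETCH_DISTANCE], 0, 3);
        }
        total += data[i];
    }
    return total;
}
```

The prefetch is only a hint: correctness does not depend on it, and the compiler or hardware may ignore it.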

In an embodiment, the system may initially inject all data received from the input-output device into the last level cache of the processor. For example, the system may assume that the processor may soon need all the data recently received by the input-output device. Thus, the system may cache all recently received data into the processor's L3 cache. However, not all received data is necessarily used soon. For example, half of the received data may be used immediately, but the other half may not be used until after one minute or more. Caching data that will not be used soon can evict data already in the cache that will be used soon, polluting the cache, which may also be referred to as polluting the cache's working set. In another example, for streaming data situations (e.g., large amounts of incoming data), inefficiencies may arise when later data overwrites earlier data before the earlier data is consumed by the processor and incorporated into the ongoing computation. Furthermore, accessing data from the last level cache may be slower than accessing data from the first level cache of the processor. Thus, data cached in the last level cache rather than the first level cache may slow down processing by the processor, although the processing is still faster than with no caching.

In another embodiment, rather than initially prefetching all of the data received from the input-output device into the last level cache of the processor, the system may prefetch a subset of the data received from the input-output device into the first level cache of the processor. The subset of data may be a prefix or header portion of the data. For example, of twenty cache lines of data received from an input-output device, the system may prefetch the first three cache lines of data and cache them in the L1 cache of a particular core of the processor. In processing data stored in the first level cache, the system may utilize just-in-time prefetching (also known as stream prefetching) to access the remainder of the data from main memory and store it in the first level cache just before the processor consumes it and assimilates it into the ongoing computation. This may be important because the first level cache may be small in size, and any injection into the first level cache may need to be controlled and limited to only the first few cache lines.

In prefetching the subset, the system may determine the amount of data that the processor is likely to use soon. For example, the first level cache may be represented as an array having 8 columns and 64 rows, each row representing a cache line, with each cell representing a memory address, and the determined amount of data may correspond to a number of rows, e.g., 1 to 64 rows. The system may then store data corresponding to the determined amount in a first level cache of the processor, and store data corresponding to a remaining portion of the received data in a main memory. Thus, the system may reduce cache pollution and increase data access speed.
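As an illustrative sketch (not part of the original disclosure), splitting an incoming buffer into a head portion for first level cache injection and a remainder for main memory might look as follows. The line size, policy, and names are assumptions; the 64-byte line matches the text's 8-column example if each cell is an 8-byte address:

```c
#include <stddef.h>

#define CACHE_LINE_BYTES 64  /* assumed line size: 8 cells x 8 bytes */

/* Decide how many leading cache lines of an incoming buffer to inject
 * into the first level cache.  The policy (a fixed prefix, capped by
 * the buffer's actual length) is a stand-in for the user-process
 * heuristics described in the text. */
size_t head_lines_to_inject(size_t total_bytes, size_t prefix_lines) {
    size_t total_lines =
        (total_bytes + CACHE_LINE_BYTES - 1) / CACHE_LINE_BYTES;
    return total_lines < prefix_lines ? total_lines : prefix_lines;
}
```

For the text's example of twenty received cache lines with a three-line prefix, this returns three; the remaining seventeen lines would go to main memory.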

In some aspects, the subject matter described herein may be embodied in methods that may include the following actions: providing, by the user process, a request for data to the input-output device, and receiving, by the user process and from the input-output device, a set of memory addresses for the requested data in response to providing the request for data to the input-output device. Additional actions include determining, by the user process, a subset of memory addresses for the requested data in response to receiving the set of memory addresses for the requested data, and providing, by the user process and to the processor, a request for the processor to prefetch data corresponding to the subset of memory addresses for the requested data in response to determining the subset of memory addresses for the requested data. Other actions include receiving, by the memory controller, the requested data and the set of memory addresses for the requested data after providing the request for the processor to prefetch data corresponding to the subset of memory addresses for the requested data. Additional actions include, in response to receiving, by the memory controller, the requested data and the set of memory addresses for the requested data, determining, by the memory controller, that the received data includes data for the subset of memory addresses that has been requested to be prefetched, and in response to that determination, storing, by the memory controller, the data for the subset of memory addresses in a first level cache of the processor and storing the remaining data of the received data for the memory addresses in the main memory.

Other versions include corresponding systems, apparatus, and computer programs, encoded on computer storage devices, configured to perform the actions of the methods.

These and other versions may each optionally include one or more of the following features. For example, in some implementations, determining the memory address for the requested data includes determining a number of memory addresses to prefetch and selecting the determined number of memory addresses from the set of memory addresses as a subset of the memory addresses.

In some aspects, providing a request for a processor to prefetch data corresponding to a subset of memory addresses for the requested data includes providing an instruction that indicates the subset of memory addresses and indicates that the subset of memory addresses is to be prefetched.

In some aspects, the additional actions include receiving a request for the processor to prefetch data corresponding to a subset of the memory addresses for the requested data, selecting a first level cache of the processor from a plurality of first level caches, and caching the memory addresses in the first level cache of the processor in response to receiving the request for the processor to prefetch data corresponding to the subset of the memory addresses for the requested data.

In some implementations, the additional action includes issuing a DMA write operation indicating the requested data and the memory address in response to receiving the requested data through the input output device, wherein receiving the requested data and the set of memory addresses for the requested data through the memory controller includes receiving the DMA write operation.

In some aspects, storing, by the memory controller, data for a subset of the memory addresses in a first level cache of the processor and remaining data of the received data for the memory addresses in the main memory comprises: the trigger processor stores data for a subset of the memory addresses in the first level cache and triggers storage of remaining data of the received data for the memory addresses in the main memory.

In some aspects, a memory controller is included in a processor.

The details of one or more implementations of the subject matter described in this specification are set forth in the accompanying drawings and the description below. Other potential features, aspects, and advantages of the subject matter will become apparent from the description, the drawings, and the claims.

Drawings

FIG. 1A is a block diagram of a system for prefetching data in a last level cache of a processor.

FIG. 1B is a block diagram of a system for prefetching data in a first level cache of a processor.

FIG. 2 is a flow diagram of an example process that may wait for incoming input-output data and issue a prefetch into a first level cache of a processor.

FIG. 3 is a flow diagram of an example process for a write transaction from an input output device into system memory, possibly with an injection into a first level or second level cache.

Like reference symbols in the various drawings indicate like elements.

Detailed Description

FIG. 1A is a block diagram of a system 100 for prefetching data in a last-level cache of a processor. The system 100 generally includes a user process 110, an input-output (I-O) device driver 120 for an I-O device 122, and a memory controller 130 as part of a processor 140. The system 100 may be a computing device, such as a server, desktop computer, laptop computer, mobile phone, or tablet computer.

User process 110 may be an application that uses data received from I-O device 122. For example, the user process 110 may be a web browser application that uses web page data received from the I-O device 122 as a network interface card. To use the data, the user process 110 may provide a request for the data to the I-O device driver 120 of the I-O device 122 (152). For example, a web browser may provide a data request for web page data for a particular web site to a network interface card driver that controls the network interface device. However, in some other examples not shown, the user process 110 may communicate with the I-O device 122 instead of the I-O device driver 120 via user space communication using a user buffer mapped to the I-O device 122 (the mapping may be implemented by the I-O device driver 120). In this case, the user process 110 may handshake directly with the I-O device 122 to send and receive data (after setup is completed during initialization by the I-O device driver 120).

The I-O device driver 120 may receive data requests from the user process 110 and control the I-O device 122 to receive data. For example, the network interface card driver may receive a request for data of a web page from a web browser and control the network interface card to obtain the data of the web page. In some embodiments, the I-O device 122 may be a network interface card, hard disk, CD-ROM drive, scanner, sound card, other kind of peripheral component interconnect express (PCIe) attached I-O device (e.g., a Solid State Drive (SSD)), non-volatile memory, or other device connected to an I-O interface. The I-O device driver 120 may provide the received data to the memory controller 130 (154). For example, the I-O device driver 120 may issue a Direct Memory Access (DMA) write operation indicating the received data.

Memory controller 130, which is part of processor 140, may receive data from I-O device driver 120 and buffer the data in the last level cache of processor 140 (156). For example, memory controller 130 may receive a DMA write operation from a network interface card driver indicating web page data and cache the web page data in the L3 cache of processor 140.

However, some or all of the data cached in the last level cache of the processor 140 may not be used soon. The cached data may therefore pollute the regular working set of the last level cache. Furthermore, accessing data from the last level cache may be slower than accessing data from the first level cache of the processor. Thus, data cached in the last level cache rather than the first level cache may slow down processing by the processor, although the processing is still faster than with no caching.

FIG. 1B is a block diagram of a system for prefetching data in a first level cache of a processor. The system 160 may generally include a user process 110, a device driver 120 for an I-O device 122, and a memory controller 130 as part of a processor 140. The system 160 may be a computing device, such as a server, desktop computer, laptop computer, or mobile phone.

User process 110 may be an application that uses data received from I-O device 122. For example, the user process 110 may be a web browser application that uses web page data received from the I-O device 122 as a network interface card. To use the data, the user process 110 may provide a data request to the I-O device driver 120 of the I-O device 122 (172). For example, a web browser may provide a data request for web page data to a network interface card driver that controls a network interface device.

In response to the data request, the user process 110 may initially receive memory addresses from the I-O device driver 120 (174). The memory addresses may indicate where data received by the I-O device 122 in response to the data request is to be stored. For example, the user process 110 may receive an indication that memory addresses Y1-Y5 are to store data received by the I-O device 122 in response to the request for web page data.

In response to receiving the memory addresses, the user process 110 may determine a subset of the memory addresses to prefetch. For example, the user process 110 may determine to prefetch the subset of memory addresses Y1-Y3 from the set of memory addresses Y1-Y5. The user process 110 may determine the subset of memory addresses to prefetch by determining a number of addresses to prefetch from the beginning of the set of memory addresses and selecting that number of addresses from the start of the set. For example, the user process 110 may determine to prefetch two, three, or four addresses from the set of memory addresses and select the first two, three, or four addresses from the set, respectively. In some implementations, the memory addresses of the set of memory addresses may be consecutive, and the first memory address may be the memory address having the lowest value. For example, the value of memory address Y5 may be the value of memory address Y1 incremented by 4.

The user process 110 may determine the number of memory addresses to prefetch. In one embodiment, the user process 110 may always use the same number. For example, for a particular user process, the number of addresses to prefetch may always be three addresses. For another particular user process, the number of addresses to prefetch may always be two addresses.

In another embodiment, user process 110 may dynamically determine the number of addresses to prefetch. For example, if user process 110 determines that data can be processed soon, user process 110 may determine to prefetch four memory addresses. In another example, if user process 110 determines that the data may not be processed soon, user process 110 may determine to prefetch two memory addresses.

In an additional example, the user process 110 may determine the number of addresses to prefetch based on the duty cycle of the processing time for incorporating each cache line into the ongoing computation. For example, for a higher duty cycle, the user process 110 may determine a higher number, and for a lower duty cycle, the user process 110 may determine a lower number.

In yet another example, the user process 110 may determine the number of addresses to prefetch as half the number of memory addresses receiving data, rounded up. For example, if data is received for a set of five memory addresses, the user process 110 may determine that the number of addresses to prefetch is half of five rounded up, resulting in three.
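The counting policies above (a fixed count, a dynamic count based on how soon the data will be processed, and half of the address count rounded up) can be sketched as follows. This is an illustration only; the function names and the specific constants are assumptions taken from the examples in the text:

```c
/* Fixed policy: a given user process always prefetches the same
 * number of addresses (three, in the text's example). */
int fixed_prefetch_count(void) { return 3; }

/* Dynamic policy: prefetch more addresses when the data will be
 * processed soon, fewer otherwise. */
int dynamic_prefetch_count(int will_process_soon) {
    return will_process_soon ? 4 : 2;
}

/* Half-rounded-up policy: half the number of addresses receiving
 * data, rounded up (integer ceiling of n/2). */
int half_rounded_prefetch_count(int n_addresses) {
    return (n_addresses + 1) / 2;
}
```

For five addresses, the half-rounded-up policy yields three, matching the example in the text.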

In some embodiments, the user process 110 may determine the subset of memory addresses to prefetch by determining a number of cache lines to prefetch, identifying the first cache lines up to the determined number, and including the memory addresses in those cache lines in the subset of memory addresses to prefetch.

In response to determining the subset of memory addresses, the user process 110 may provide a prefetch command to the processor 140 instructing the processor 140 to prefetch the determined subset of memory addresses (176). For example, the user process 110 may provide the processor 140 with a prefetch command for addresses Y1-Y3 (176). In another embodiment, the user process 110 may provide a prefetch command for a memory address, and the memory controller or another engine may instead determine the subset of the memory addresses to prefetch.

In some embodiments, the user process 110 may determine which cache level or cache partition a memory address is to be prefetched to. For example, the user process 110 may determine to prefetch a memory address into the second level cache and provide a prefetch command to prefetch the memory address into the second level cache of the processor 140 (176).

With respect to the I-O device driver 120, the I-O device driver 120 may receive data requests from the user process 110 and control the I-O devices 122 to receive the data. For example, the I-O device driver 120, in the form of a network interface card driver, may receive a request from a web browser for data for a web page and control the I-O device 122, in the form of a network interface card, to receive the data for the web page. In some embodiments, I-O device 122 may alternatively be a hard disk, CD-ROM drive, DVD-ROM, scanner, sound card, or any other peripheral I-O device.

In response to receiving the data request, the I-O device driver 120 may determine memory addresses for the data. For example, the I-O device driver 120 may determine that data received in response to the request for web page data may be stored in memory addresses Y1-Y5. The I-O device driver 120 may determine the memory addresses itself. For example, the I-O device driver 120 may determine that memory addresses Y1-Y5 are the next addresses available to store data in a circular buffer and, in response, determine that memory addresses Y1-Y5 are to store the data.
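Circular-buffer address allocation of the kind described above might be sketched as follows. This is an illustration, not the patent's implementation; slot indices stand in for the Y1-Y5 style memory addresses, and the ring size is an arbitrary assumption:

```c
#include <stddef.h>

#define RING_SLOTS 8  /* assumed number of buffer slots in the ring */

/* Assign the next `count` slot indices in the ring to a request,
 * wrapping at the end, and advance the head past them, as a driver
 * might when picking "the next addresses to store data in the
 * circular buffer". */
void next_slots(size_t *head, size_t count, size_t *out) {
    for (size_t i = 0; i < count; i++) {
        out[i] = (*head + i) % RING_SLOTS;
    }
    *head = (*head + count) % RING_SLOTS;
}
```

A real driver would also track which slots the processor has consumed before reusing them; that bookkeeping is omitted here.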

Additionally or alternatively, the I-O device driver 120 may determine the memory address in response to information from the operating system or the user process 110. For example, in response to a data request, the device driver 120 may query the operating system or user process 110 for a memory address to store data received in response to the data request and determine to store the received data in a memory address identified by the operating system or user process 110.

After determining the memory address for the data to be received, the I-O device driver 120 may receive the data and provide the received data to the memory controller 130 (178). For example, after receiving data for memory addresses Y1-Y5, the I-O device driver 120 may issue a DMA write operation indicating memory addresses Y1-Y5 and the data received for those memory addresses.

As described above, in some embodiments, user process 110 and memory controller 130 may communicate with I-O device 122, rather than I-O device driver 120, via user space communications using user buffers mapped to I-O device 122. Thus, the communications (172, 174, and 178) shown associated with the I-O device driver 120 may alternatively be associated with a user buffer.

With respect to memory controller 130, memory controller 130 may be part of processor 140 and may receive memory addresses and data for the memory addresses from I-O device driver 120 (178). For example, the memory controller 130 may receive DMA write operations issued by the I-O device driver 120 indicating memory addresses Y1-Y5 and data for the memory addresses.

In response to receiving the memory address and the data for the memory address from the I-O device driver 120, the memory controller 130 may determine whether the memory address has been cached in the first level cache of the processor 140. For example, memory controller 130 may determine whether any of memory addresses Y1-Y5 indicated in the DMA write operation have been cached in any of the L1 caches of processor 140.

Memory controller 130 may determine whether a memory address has been cached in a first level cache of processor 140 by monitoring the memory addresses cached in the caches of processor 140. For example, memory controller 130 may track which memory addresses are stored in each cache of processor 140. Memory controller 130 may also determine whether a memory address has been cached in a first level cache of processor 140 by accessing information describing what is stored in the caches of processor 140. This information may be stored in the last level cache of processor 140 in the form of a bitmap or bitmask that indicates the memory addresses corresponding to a portion of the cache.
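The bitmask idea above can be sketched as one bit per tracked cache line, set when the line is observed being cached and cleared on eviction. This is an illustration only; the function names are hypothetical, and a real memory controller would track far more state (per-core caches, tags, coherence) than a single 64-bit word.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical sketch of residency tracking: bit i set means tracked
 * line i is resident in an L1 cache. */
static uint64_t l1_resident;

static void mark_cached(unsigned line)  { l1_resident |=  (uint64_t)1 << line; }
static void mark_evicted(unsigned line) { l1_resident &= ~((uint64_t)1 << line); }
static bool is_cached(unsigned line)    { return (l1_resident >> line) & 1u; }
```

On a DMA write, the controller would call the equivalent of `is_cached` for each target line to decide between cache injection and a plain write to main memory.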

Memory controller 130 may store the data for the memory address in a cache (180). For example, memory controller 130 may determine that memory addresses Y1-Y3 are cached in the L1 cache of core 1, and in response, memory controller 130 may store data corresponding to memory addresses Y1-Y3 in the cache associated with the memory addresses. In another example, memory controller 130 may determine that memory addresses Y1-Y3 are cached in the L1 cache of core 2, and in response, memory controller 130 may store data corresponding to memory addresses Y1-Y3 in the cache associated with the memory addresses.

Memory controller 130 may store the remaining data for the memory address determined not to be cached in the cache of processor 140 in main memory (182). For example, memory controller 130 may determine that memory addresses Y4-Y5 are not cached in any L1 cache of processor 140 and store the remaining data for memory addresses Y4-Y5 in main memory.

Processor 140 may include multiple cores, each core including an associated first level cache, such as an L1 cache, and a last level cache, such as an L3 cache, shared by the multiple cores. For example, processor 140 may include two cores, each core including a central processor and an L1 cache, where the two cores share an L3 cache.

Processor 140 may receive a prefetch command from user process 110. The prefetch command may indicate one or more memory addresses to prefetch. For example, the prefetch command may indicate memory addresses Y1-Y3. In response to the prefetch command, the processor 140 may cache the memory address in a first level cache of the core that executes the prefetch command. For example, if core 1 executes a prefetch command, the processor 140 may cache memory addresses Y1-Y3 in core 1's L1 cache. In another example, if core 2 executes a prefetch command, processor 140 may cache memory addresses Y1-Y3 in core 2's L1 cache.

Different configurations of the systems 100 and 160 may be used in which the functions of the user process 110, the I-O device driver 120, the I-O device 122, the memory controller 130, the main memory 132, and the processor 140 may be combined, further separated, distributed, or interchanged. Systems 100 and 160 may be implemented in a single device or may be distributed across multiple devices.

FIG. 2 is a flow diagram of an example process 200 for waiting for incoming input-output data and sending a prefetch to a first level cache of a processor. The process 200 may be performed by the I-O device driver 120 executing on the processor 140 or by the user process 110. Process 200 may include waiting for incoming data that is needed immediately (210). For example, process 200 may wait for incoming data from a hard disk drive or a network interface card.

Process 200 may include determining whether new data has been stored in the target addresses of, for example, the first X cache lines for the incoming data (220). The new data may represent the incoming data. For example, the process may determine that the first two cache lines for the incoming data correspond to target addresses Y1-Y8 and determine whether new data representing the incoming data has been stored in those target addresses.

If process 200 determines that the new data is stored in the target addresses, process 200 may include processing the next packet from the cache and incorporating the data into the ongoing computation (230). For example, process 200 may include obtaining data for a next data packet from a target address of a cache line and processing using the obtained data.

Process 200 may include determining whether all incoming data has been processed (240). For example, process 200 may include determining whether any data packets have not yet been processed. If not all incoming data has been processed, process 200 may repeat processing the next packet until all data packets have been processed (230).

If process 200 determines that the new data is not stored in the target address, process 200 may issue a prefetch for a first number of cache lines (250). For example, process 200 may determine that the new data is not stored in the target addresses of the first number of cache lines and may issue a prefetch to allocate the cache lines in the first level cache of the processor.

Process 200 may include spin polling the first number of cache lines or going back to doing other work (260). For example, process 200 may include waiting until new data representing the incoming data is stored in a first number of cache lines, or performing other processes that do not require the incoming data.
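The flow of FIG. 2 can be sketched as a single step function with the hardware interactions stubbed out. This is a hypothetical model, not the patent's implementation: the function name and the -1 "not ready" convention are assumptions introduced only so the control flow is visible and testable.

```c
#include <stdbool.h>

/* Hypothetical sketch of one pass through FIG. 2: if the new data has
 * already landed (step 220), consume packets until none remain
 * (steps 230-240); otherwise a prefetch would be issued for the first
 * cache lines (step 250) before spin-polling or doing other work
 * (step 260), signaled here by returning -1. */
static int run_receive_step(bool data_arrived, int total_packets)
{
    if (!data_arrived) {
        /* step 250: prefetch the first cache lines (stubbed out here) */
        return -1; /* step 260: caller spin-polls or does other work */
    }
    int handled = 0;
    while (handled < total_packets) /* step 240: all data processed? */
        handled++;                  /* step 230: process next packet  */
    return handled;
}
```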

FIG. 3 is a flow diagram of an example process 300 for a write transaction from an input-output device into system memory, possibly with an injection into a first level or second level cache. Process 300 may include receiving a DMA write operation that includes a header and data (310). The header may indicate where the data is to be stored.

The process 300 may include determining one or more target addresses in memory in which to write data (320). For example, from the header, process 300 may include determining to write data to addresses Y1-Y8.

Process 300 may include determining whether to allocate one or more target addresses in a first level cache (330). For example, process 300 may include determining, with memory controller 130, whether to allocate these target addresses in any first level cache of processor 140.

If process 300 determines that one or more target addresses are to be allocated in the first level cache, process 300 may include determining a particular first level cache for the target addresses (340). For example, process 300 may include determining to allocate a target address in a first level cache of a first core or in a first level cache of a second core. In an example, if the process 300 determines to store the target address in first level caches of the plurality of cores, the process 300 may select one of the first level caches. For example, the process 300 may select the first level cache of the nearest core.

The process 300 may then inject and/or overwrite the data of the DMA write operation into a particular first level cache (350). For example, process 300 may include writing data corresponding to a target address allocated in a first level cache to the allocated address.

If process 300 determines that one or more target addresses are not allocated in the first level cache, process 300 may include determining whether one or more target addresses are allocated in the second level cache (360). For example, process 300 may include determining, with memory controller 130, whether to allocate these target addresses in any second level cache of processor 140.

If the process 300 determines that one or more target addresses are to be allocated in the second level cache, the process 300 may include determining a particular second level cache for the target addresses (360). For example, process 300 may include determining to allocate the target address in a second level cache of the first core or in a second level cache of the second core. In an example, if the process 300 determines to store the target address in second level caches of the multiple cores, the process 300 may select one of the second level caches. For example, the process 300 may select the second level cache of the nearest core.

The process 300 may then inject and/or overwrite the data of the DMA write operation into a particular second level cache (370). For example, process 300 may include writing data to an allocated address in the second level cache.

If the process 300 determines that one or more target addresses are not allocated in the second level cache, the process 300 may include storing a first portion of the data in a third level cache of the processor 140 and storing the remaining data in memory. For example, process 300 may include storing data for addresses Y1-Y3 in a third level cache of processor 140 and storing data for addresses Y4-Y8 in memory.
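The routing cascade of FIG. 3 can be sketched as a three-way decision. This is an illustration with assumed names: the allocation checks are passed in as booleans here, whereas a real memory controller would derive them from its own residency-tracking state.

```c
#include <stdbool.h>

typedef enum { DEST_L1, DEST_L2, DEST_L3_AND_MEMORY } dest_t;

/* Hypothetical sketch of the FIG. 3 routing decision: steer a DMA write
 * to a first level cache if its target addresses are allocated there
 * (step 330), else to a second level cache (step 360), else split the
 * data between the last level cache and main memory. */
static dest_t route_dma_write(bool allocated_in_l1, bool allocated_in_l2)
{
    if (allocated_in_l1)
        return DEST_L1;
    if (allocated_in_l2)
        return DEST_L2;
    return DEST_L3_AND_MEMORY;
}
```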

In an alternative embodiment of a process that caches data in a first level cache of a processor, the user process 110 may provide a data request to the I-O device driver 120. For example, a user process in the form of a media player may provide a data request for media data to an I-O device driver in the form of a DVD-ROM drive driver.

The process may continue by the I-O device driver 120 receiving the data request and determining the addresses at which to store the data. For example, the DVD-ROM drive driver may determine that data to be retrieved from the DVD-ROM device is to be stored in addresses Y10-Y20. The I-O device driver 120 may provide the determined addresses to the user process 110. For example, the DVD-ROM drive driver may indicate to the media player that addresses Y10-Y20 are to store the media data.

The process may continue by the user process 110 receiving a memory address. For example, the media player may receive an indication that memory addresses Y10-Y20 are to store data in response to a data request provided by the media player.

The process may continue by the user process 110 determining a subset of the addresses to cache. For example, the media player may determine that the data will be consumed soon, determine to prefetch the first five memory addresses, and determine that the first five memory addresses are Y10-Y14. In another embodiment, the media player may determine that the data will not be consumed soon, determine to prefetch only the first memory address, and determine to prefetch memory address Y10.
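The subset-size choice in the media player example can be sketched as a small policy function. This is hypothetical: the counts of five and one come from the example above, not from any fixed rule in the patent, and the function name is an assumption.

```c
#include <stdbool.h>

/* Hypothetical sketch of the subset choice: prefetch more addresses
 * (five in the example) when the data will be consumed soon, otherwise
 * just the first one, capped at the number of addresses available. */
static unsigned prefetch_count(bool consumed_soon, unsigned total_addresses)
{
    unsigned want = consumed_soon ? 5u : 1u;
    return want < total_addresses ? want : total_addresses;
}
```

With the Y10-Y20 example (11 addresses), a soon-to-be-consumed stream would prefetch five addresses (Y10-Y14) and an idle one would prefetch a single address (Y10).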

The process may continue by the user process 110 providing a prefetch command for a subset of addresses. For example, user process 110 may provide prefetch commands to processor 140 to prefetch memory addresses Y10-Y14.

The process may continue by the memory controller 130, which is part of the processor 140, receiving the prefetch command and, in response, caching the subset of addresses in the first level cache. For example, in response to receiving a prefetch command for memory addresses Y10-Y14, processor 140 may allocate space for memory addresses Y10-Y14 in the L1 cache of core 2 of processor 140.

The process may continue by the I-O device driver 120 receiving data for the addresses. For example, the DVD-ROM drive driver may receive media data for memory addresses Y10-Y20. The I-O device driver 120 may provide the received data as well as the memory addresses to the memory controller 130 in the processor 140. For example, upon receiving the media data from the DVD-ROM device, the DVD-ROM drive driver may issue a DMA write operation indicating memory addresses Y10-Y20 and the media data for those memory addresses.

The process may continue by the processor 140 determining whether to cache the address. For example, memory controller 130 in processor 140 may determine whether any of memory addresses Y10-Y20 are stored in any of processor 140's caches.

The process may continue by memory controller 130 in processor 140 storing data corresponding to a subset of the memory addresses in the cache of processor 140 and storing the remaining data in main memory. For example, in response to determining to store memory addresses Y10-Y14 from memory addresses Y10-Y20 in the L1 cache of core 2 of processor 140, memory controller 130 may store data for memory addresses Y10-Y14 in the L1 cache of core 2 and the remaining data for memory addresses Y15-Y20 in main memory 132.

Embodiments of the subject matter, the functional operations and the processes described in this specification can be implemented in digital electronic circuitry, in tangibly embodied computer software or firmware, in computer hardware, including the structures disclosed in this specification and their structural equivalents, or in combinations of one or more of them. Embodiments of the subject matter described in this specification can be implemented as one or more computer programs, i.e., one or more modules of computer program instructions encoded on a tangible, non-volatile program carrier for execution by, or to control the operation of, data processing apparatus. Alternatively or additionally, the program instructions may be encoded on an artificially generated propagated signal (e.g., a machine-generated electrical, optical, or electromagnetic signal) that is generated to encode information for transmission to suitable receiver apparatus for execution by a data processing apparatus. The computer storage medium may be a machine-readable storage device, a machine-readable storage substrate, a random or serial access memory device, or a combination of one or more of them.

The term "data processing apparatus" encompasses all kinds of apparatus, devices, and machines for processing data, including by way of example a programmable processor, a computer, or multiple processors or computers. An apparatus can comprise special purpose logic circuitry, e.g., an FPGA (field programmable gate array) or an ASIC (application-specific integrated circuit). The apparatus can include, in addition to hardware, code that creates an execution environment for the computer program in question, e.g., code that constitutes processor firmware, a protocol stack, a database management system, an operating system, or a combination of one or more of them.

A computer program (which may also be referred to or described as a program, software application, module, software module, script, or code) can be written in any form of programming language, including compiled or interpreted languages, or declarative or procedural languages, and it can be deployed in any form, including as a stand-alone program or as a module, component, subroutine, or other unit suitable for use in a computing environment. A computer program may, but need not, correspond to a file in a file system. A program can be stored in a portion of a file that holds other programs or data (e.g., one or more scripts stored in a markup language document), in a single file dedicated to the program in question, or in multiple coordinated files (e.g., files that store one or more modules, sub programs, or portions of code). A computer program can be deployed to be executed on one computer or on multiple computers that are located at one site or distributed across multiple sites and interconnected by a communication network.

The processes and logic flows described in this specification can be performed by one or more programmable computers executing one or more computer programs to perform functions by operating on input data and generating output. The processes and logic flows can also be performed by, and apparatus can be implemented as, special purpose logic circuitry, e.g., an FPGA (field programmable gate array) or an ASIC (application-specific integrated circuit).

Computers suitable for executing a computer program include, by way of example, general purpose or special purpose microprocessors, or both, or any other type of central processing unit. Generally, a central processing unit will receive instructions and data from a read-only memory or a random access memory or both. The essential elements of a computer are a central processing unit for performing or executing instructions and one or more memory devices for storing instructions and data. Generally, a computer will also include, or be operatively coupled to receive data from or transfer data to, or both, one or more mass storage devices for storing data, e.g., magnetic, magneto-optical, or optical disks. However, a computer need not have such devices. Moreover, the computer may be embedded in another device, e.g., a mobile telephone, a Personal Digital Assistant (PDA), a mobile audio or video player, a game console, a Global Positioning System (GPS) receiver, or a portable storage device such as a Universal Serial Bus (USB) flash drive, to name a few.

Computer-readable media suitable for storing computer program instructions and data include all forms of non-volatile memory, media, and memory devices, including by way of example semiconductor memory devices (e.g., EPROM, EEPROM, and flash memory devices); magnetic disks (e.g., internal hard disks or removable disks); magneto-optical disks; and CD-ROM and DVD-ROM disks. The processor and the memory can be supplemented by, or incorporated in, special purpose logic circuitry.

To provide for interaction with a user, embodiments of the subject matter described in this specification can be implemented on a computer having a display device, e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor, for displaying information to the user and a keyboard and a pointing device, e.g., a mouse or a trackball, by which the user can provide input to the computer. Other types of devices may also be utilized to provide for interaction with the user; for example, feedback provided to the user can be any form of sensory feedback, such as visual feedback, auditory feedback, or tactile feedback; and input from the user may be received in any form, including acoustic, speech, or tactile input. Further, the computer may interact with the user by sending and receiving documents to and from the device used by the user; for example, by sending a web page to a web browser on the user's client device in response to a request received from the web browser.

Embodiments of the subject matter described in this specification can be implemented in a computing system that includes a back end component, e.g., as a data server, or that includes a middleware component, e.g., an application server, or that includes a front end component, e.g., a client computer having a graphical user interface or a Web browser through which a user can interact with an implementation of the subject matter described in this specification, or any combination of one or more such back end, middleware, or front end components. The components of the system can be interconnected by any form or medium of digital data communication, e.g., a communication network. Examples of communication networks include a Local Area Network (LAN) and a Wide Area Network (WAN), such as the Internet.

The computing system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.

While this specification contains many specific implementation details, these should not be construed as limitations on the scope of what is claimed, but rather as descriptions of features specific to particular embodiments. Certain features that are described in this specification in the context of separate embodiments can also be implemented in combination in a single embodiment. Conversely, various features that are described in the context of a single embodiment can also be implemented in multiple embodiments separately or in any suitable subcombination. Moreover, although features may be described above as acting in certain combinations and even initially claimed as such, one or more features from a claimed combination can in some cases be excised from the combination, and the claimed combination may be directed to a subcombination or variation of a subcombination.

Similarly, while operations are depicted in the drawings in a particular order, this should not be understood as requiring that such operations be performed in the particular order shown or in sequential order, or that all illustrated operations be performed, to achieve desirable results. In some cases, multitasking and parallel processing may be advantageous. Moreover, the separation of various system components in the embodiments described above should not be understood as requiring such separation in all embodiments, and it should be understood that the program components and systems can generally be integrated together in a single software product or packaged into multiple software products.

Specific embodiments of the subject matter have been described. Other embodiments are within the scope of the following claims. For example, the actions recited in the claims can be performed in a different order and still achieve desirable results. As one example, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing may be advantageous. Other steps may be provided or steps may be eliminated from the process. Accordingly, other embodiments are within the scope of the following claims.
