Memory cache management for graphics processing

Document No.: 1694449    Publication Date: 2019-12-10

Note: This technology, "Memory cache management for graphics processing," was created by M. Imbrogno and R. D. Schmidt on 2019-05-28. The invention provides memory cache management for graphics processing. Systems, methods, and computer-readable media for managing a memory cache for graphics processing are described. The processor creates a resource group for a plurality of graphics Application Program Interface (API) resources. The processor then encodes a set command referencing the resource group within a command buffer and assigns a Data Set Identifier (DSID) to the resource group. The processor also encodes the following commands: a write command within the command buffer that causes the graphics processor to write data within a cache line and mark the written cache line with the DSID; a read command that causes the graphics processor to read the data written to the resource group; and a delete command that causes the graphics processor to notify the memory cache to delete the data stored within the cache line without flushing it to memory.

1. A non-transitory program storage device readable by a processor and comprising instructions stored thereon to cause the processor to:

creating a resource group for a plurality of graphics Application Program Interface (API) resources, wherein each graphics API resource corresponds to a memory allocation that stores data accessible by a graphics processor;

encoding a set command referencing the resource group within a command buffer, wherein the set command associates a Data Set Identifier (DSID) with the resource group;

encoding, within the command buffer, a write command referencing the resource group, the write command causing the graphics processor to write data to a cache line within a memory cache, wherein the write command associates the cache line with the DSID;

encoding a delete command that causes the graphics processor to notify the memory cache to delete data stored within the cache line without flushing to memory; and

submitting one or more command buffers including the set command, the write command, and the delete command for execution on the graphics processor.

2. The non-transitory program storage device of claim 1, wherein the instructions further cause the processor to encode a read command that causes the graphics processor to read the data written in the resource group.

3. The non-transitory program storage device of claim 2, wherein the write command and the read command are encoded within different command buffers.

4. The non-transitory program storage device of claim 2, wherein the write command and the read command are encoded within a single command buffer.

5. The non-transitory program storage device of claim 2, wherein the instructions further cause the processor to:

encoding a second set command within a second command buffer of the one or more command buffers; and

encoding the read command within the second command buffer.

6. The non-transitory program storage device of claim 1, wherein the delete command is encoded within a second command buffer.

7. The non-transitory program storage device of claim 6, wherein the command buffer is ordered within a first queue and the second command buffer is ordered within a second queue.

8. The non-transitory program storage device of claim 7, wherein the instructions further cause the processor to:

encoding a barrier update command after the write command within the command buffer in the first queue; and

encoding a barrier wait command before the delete command within the second command buffer in the second queue.

9. The non-transitory program storage device of claim 1, wherein the delete command is associated with the DSID that references the resource group, and wherein the delete command causes the graphics processor to notify the memory cache to delete data stored in the cache line associated with the DSID.

10. A system, comprising:

a memory; and

a processor operable to interact with the memory and configured to:

encoding a set command referencing a resource group within a command buffer, wherein the set command associates a Data Set Identifier (DSID) with the resource group, the resource group comprising a plurality of graphics Application Program Interface (API) resources;

encoding a write command within the command buffer that references the resource group, the write command causing a graphics processor to write data to a cache line within a memory cache, wherein the write command causes the cache line to be marked with the DSID;

encoding a delete command that references the DSID and causes the graphics processor to notify the memory cache to delete data stored within the cache line without eviction to memory; and

submitting one or more command buffers including the set command, the write command, and the delete command for execution on the graphics processor.

11. The system of claim 10, wherein the processor is further configured to:

encoding a read command that causes the graphics processor to read the data written in the resource group.

12. The system of claim 11, wherein the delete command is associated with the DSID that references the resource group, and wherein the delete command causes the cache line marked with the DSID to be deleted.

13. The system of claim 10, wherein the processor is further configured to:

encoding a second set command within a second command buffer of the one or more command buffers; and

encoding a read command within the second command buffer, wherein the read command causes the graphics processor to read the data written in the resource group.

14. The system of claim 13, wherein the delete command is encoded within the second command buffer.

15. The system of claim 13, wherein the command buffer is ordered within a first queue and the second command buffer is ordered within a second queue.

16. The system of claim 15, wherein the processor is further configured to:

encoding a barrier update command after the write command within the command buffer in the first queue; and

encoding a barrier wait command before the delete command within the second command buffer in the second queue.

17. The system of claim 10, wherein the processor is further configured to:

reusing the DSID after the graphics processor completes execution of the delete command.

18. A non-transitory program storage device readable by a processor and comprising instructions stored thereon to cause the processor to:

obtaining a write command and a delete command associated with a Data Set Identifier (DSID) from one or more command buffers, wherein the DSID is associated with a resource group comprising a plurality of graphics Application Program Interface (API) resources and a cache line in a memory cache;

executing the write command to write data within the cache line to generate a dirty cache line, wherein the write command associates the cache line with the DSID; and

executing the delete command after the write command to notify the memory cache to delete the written data within the dirty cache line, wherein the written data is deleted without flushing the dirty cache line to main memory.

19. The non-transitory program storage device of claim 18, wherein the processor is a Graphics Processing Unit (GPU) and the one or more command buffers are encoded by a Central Processing Unit (CPU).

20. The non-transitory program storage device of claim 18, wherein the delete command is associated with the DSID that references the resource group, and wherein the delete command causes the cache line associated with the DSID to be deleted.

Technical Field

The present disclosure relates generally to the field of graphics processing. More particularly, but not by way of limitation, the present disclosure relates to managing a memory cache with a graphics processor, such as a Graphics Processing Unit (GPU), by marking and deleting Data Set Identifiers (DSIDs) associated with cache lines.

Background

Computers, mobile devices, and other computing systems typically have at least one programmable processor, such as a Central Processing Unit (CPU), as well as other programmable processors dedicated to performing certain processes or functions (e.g., graphics processing). Examples of programmable processors dedicated to performing graphics processing operations include, but are not limited to, a GPU, a Digital Signal Processor (DSP), a Field Programmable Gate Array (FPGA), and/or a CPU emulating a GPU. GPUs in particular comprise multiple execution cores (also called shader cores) designed to execute the same instructions on parallel data streams, making them more efficient than general-purpose processors at processing large blocks of data in parallel. For example, the CPU acts as a host and hands off specialized parallel tasks to the GPU. In particular, the CPU may execute an application stored in system memory that includes graphics data associated with video frames. Rather than processing the graphics data itself, the CPU forwards the graphics data to the GPU for processing; the CPU is thereby freed to perform other tasks concurrently with the GPU's processing of the graphics data.

GPU processing (such as a render-to-texture pass) typically writes to and reads from a memory cache to improve performance and save power. For example, a render-to-texture pass renders a frame to a texture resource that can later be passed back to a shader for further processing. The GPU may therefore write to and/or read from the texture resource before the GPU is finished with it. Keeping the texture resource within the memory cache during this time period may improve GPU performance. However, the memory cache may not know when the GPU is finished with the texture resource. In some cases, the memory cache may flush data to system memory (e.g., DRAM) while the GPU is still using the texture resource, resulting in reduced GPU performance.

Disclosure of Invention

In one implementation, a method of associating a DSID with a write command and then using the DSID to delete the written data (if any) from the cache is disclosed. The exemplary method creates a resource group for a plurality of graphics Application Program Interface (API) resources, wherein each graphics API resource corresponds to a memory allocation for storing data accessible to a graphics processor. The exemplary method encodes a set command that references the resource group within a command buffer. The set command assigns a DSID to the resource group. A write command in the command buffer causes the graphics processor to write data within a cache line. The write command can also associate the written cache line with the DSID. A read command causes the graphics processor to read the data written to the resource group. A delete command causes the graphics processor to notify the memory cache to delete the data stored within the written cache line associated with the DSID without flushing it to memory. The processor then submits one or more command buffers including the set command, write command, read command, and delete command for execution on the graphics processor.
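The set/write/read/delete sequence above can be sketched as follows. This is a minimal illustrative model, not an actual graphics API: the `CommandBuffer` class, the command names, and the resource names are all assumptions made for the sketch.

```python
class CommandBuffer:
    """Illustrative container that records encoded commands in order."""

    def __init__(self):
        self.commands = []

    def encode(self, op, **args):
        # Record the command; a real driver would translate it to hardware form.
        self.commands.append((op, args))


# A resource group created from a few graphics API resources (names assumed).
resource_group = ("vertex_buffer", "render_texture")

cmd_buf = CommandBuffer()
dsid = 7  # DSID assigned to the resource group by the set command
cmd_buf.encode("set", group=resource_group, dsid=dsid)
cmd_buf.encode("write", group=resource_group, dsid=dsid)  # tags cache lines
cmd_buf.encode("read", group=resource_group)
cmd_buf.encode("delete", dsid=dsid)  # drop cached lines without flushing

# The command buffer would then be submitted for execution on the GPU.
encoded_ops = [op for op, _ in cmd_buf.commands]
```

The ordering matters: the delete command must be scheduled after the reads that still need the resource group's data, which is what the barrier commands in the claims enforce across queues.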

In another implementation, a system utilizing write commands and delete commands is disclosed, wherein the system includes a memory and a processor operable to interact with the memory. The processor can encode a set command referencing a resource group in a command buffer. The set command assigns a DSID to the resource group, which comprises a plurality of graphics API resources. The processor is also capable of encoding a write command within the command buffer that causes a graphics processor to write data within a cache line. The write command also causes the written cache line to be marked with the DSID. A delete command is encoded to cause the graphics processor to notify the memory cache to delete the data stored in the written cache line marked with the DSID without evicting it to memory. The processor then submits one or more command buffers including the set command, write command, and delete command for execution on the graphics processor.

In yet another implementation, a system for associating a DSID with a write command and subsequently using the DSID to delete the written data (if any) from a cache line is disclosed. The system includes a memory and a graphics processor operable to interact with the memory. The graphics processor obtains write commands and delete commands from one or more command buffers, both of which are associated with a DSID. The graphics processor executes the write command to write data in a cache line, generating a dirty cache line in the memory cache. The DSID corresponds to a resource group including a plurality of graphics API resources, and the write command marks the dirty cache line with the DSID. Thereafter, the graphics processor executes a delete command after the write command to notify the memory cache to delete the written data in the dirty cache line. The data is deleted without flushing the dirty cache line to main memory.

In yet another implementation, each of the methods described above and variations thereof may be implemented as a series of computer-executable instructions. Such instructions may use any one or more convenient programming languages. Such instructions may be collected into an engine and/or program and may be stored in any medium readable and executable by a computer system or other programmable control device.

Drawings

While certain implementations will be described in conjunction with the exemplary implementations shown herein, the disclosure is not limited to those implementations. On the contrary, all alternatives, modifications, and equivalents are included within the spirit and scope of the disclosure as defined by the appended claims. In the drawings, which are not to scale, the same reference numerals are used throughout the specification and drawings for parts and elements having the same structure, and primed reference numerals are used for parts and elements having similar function and configuration as those parts and elements having the same reference numerals without a prime.

FIG. 1 is a schematic diagram of a graphics processing path in which implementations of the present disclosure may operate.

FIG. 2 is a block diagram of a system in which implementations of the present disclosure may operate.

FIG. 3 is a block diagram of a memory virtualization architecture for managing a memory cache when allocating, associating, and deleting DSIDs for resource groups.

FIG. 4 is a specific implementation of a command buffer including DSID commands referencing a created resource group.

FIG. 5 is a specific implementation of referencing a created resource group across multiple command buffers within a command queue.

FIG. 6 is a specific implementation of referencing a created resource group across multiple command queues.

FIG. 7 depicts a flowchart showing graphics processing operations for managing memory cache for graphics processing.

FIG. 8 is a block diagram of a computing system in which implementations of the present disclosure may operate.

FIG. 9 is a block diagram of a specific implementation of software layers and architectures in which a specific implementation of the present disclosure may operate.

FIG. 10 is a block diagram of another implementation of software layers and architecture in which implementations of the present disclosure may operate.

Detailed Description

The present disclosure includes various exemplary implementations of assigning a DSID to a resource group, associating the DSID with a cache line when writing to the resource group, and subsequently deleting the DSID when the application no longer needs to access content in the resource group. In one implementation, a graphics API (e.g., OpenGL®, Direct3D®, or Metal®; OPENGL is a registered trademark of Silicon Graphics, Inc.; DIRECT3D is a registered trademark of Microsoft Corporation; and METAL is a registered trademark of Apple Inc.) allows developers and/or applications to create resource groups that include one or more resources (e.g., buffers and textures). The graphics API also allows a Central Processing Unit (CPU) to generate one or more set commands within a command buffer to obtain a DSID for the created resource group. The command buffer may also include one or more write commands that mark and/or update cache lines within the memory cache with the DSID when writing data into the resource group, one or more read commands to read data from the resource group, and/or one or more delete commands to delete a particular cache line associated with the DSID. After the CPU encodes and submits the command buffer to the GPU for execution, the graphics driver schedules the set commands, write commands, read commands, and/or delete commands within the submitted command buffer for GPU execution. When the GPU executes a delete command associated with the DSID, the GPU provides a delete hint to the memory cache to delete the data stored in the cache line that corresponds to the DSID without flushing the content to system memory. The graphics API also allows the CPU to encode commands that assign and/or delete DSIDs for resource groups across different command buffers and/or across different command queues.

In the following description, for purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of the inventive concepts. Some of the figures in the present disclosure show structures and devices in block diagram form as part of this specification to avoid obscuring the disclosed principles. In the interest of clarity, not all features of an actual implementation are described. Moreover, the language used in the present disclosure has been principally selected for readability and instructional purposes, and may not have been selected to delineate or circumscribe the inventive subject matter, resort to the claims being necessary to determine such inventive subject matter. Reference in the present disclosure to "one implementation" or "an implementation" means that a particular feature, structure, or characteristic described in connection with the implementation is included in at least one implementation of the present disclosure, and multiple references to "one implementation" or "an implementation" should not be understood as necessarily all referring to the same implementation.

Unless explicitly defined otherwise, the terms "a," "an," and "the" are not intended to refer to a singular entity, but include the general class of which a particular example may be used for illustration. Thus, use of the terms "a" or "an" can mean any number of at least one, including "a," one or more, "" at least one, "and" one or more than one. The term "or" means any of the alternatives and any combination of alternatives, including all alternatives, unless alternatives are explicitly indicated as mutually exclusive. The phrase "at least one of," when combined with a list of items, refers to a single item in the list or any combination of items in the list. The phrase does not require all of the listed items unless explicitly so limited.

As used herein, the term "kernel" in this disclosure refers to a portion of an operating system's core layer (e.g., Mac OS X™) that is typically associated with a relatively high or the highest level of security. The "kernel" is capable of performing certain tasks, such as managing hardware interactions (e.g., the use of hardware drivers) and handling interrupts for the operating system. To prevent applications or other processes in user space from interfering with the "kernel," the code of the "kernel" is typically loaded into a separate, protected area of memory. Within this context, the term "kernel" may be interchanged with the term "operating system kernel" throughout this disclosure.

The present disclosure also uses the term "compute kernel," which has another meaning and should not be confused with the terms "kernel" or "operating system kernel". In particular, the term "compute kernel" refers to a program for a graphics processor (e.g., GPU, DSP, or FPGA). In the context of graphics processing operations, programs for a graphics processor are classified as "compute kernels" or "shaders". The term "compute kernel" refers to a program of a graphics processor that performs general compute operations (e.g., compute commands), and the term "shader" refers to a program of a graphics processor that performs graphics operations (e.g., render commands).

As used herein, the term "command" in this disclosure refers to a graphics API command encoded within a data structure, such as a command buffer or command list. The term "command" may refer to a render command (e.g., for a draw call) and/or a compute command (e.g., for a dispatch call) that a graphics processor is capable of executing. Examples of commands for managing memory caches relevant to the present disclosure include: a "set command" to obtain a DSID for a created resource group; a "write command" (e.g., a render command) to write to the resource group and associate the DSID with the written cache lines; a "read command" (e.g., a render command) to read from the resource group; and a "delete command" (e.g., a render command) to notify the memory cache (e.g., via a delete hint) that it can delete the written cache lines associated with the DSID.

For the purposes of this disclosure, the term "processor" refers to a programmable hardware device capable of processing data from one or more data sources, such as memory. One type of "processor" is a general-purpose processor (e.g., a CPU) that is not customized to perform specific operations (e.g., procedures, calculations, functions, or tasks), but is constructed to perform general-purpose computing operations. Other types of "processors" are special-purpose processors that are customized to perform a particular operation (e.g., a procedure, computation, function, or task). Non-limiting examples of special purpose processors include GPUs, floating point processing units (FPUs), DSPs, FPGAs, Application Specific Integrated Circuits (ASICs), and embedded processors (e.g., Universal Serial Bus (USB) controllers).

As used herein, the term "graphics processor" refers to a special-purpose processor for performing graphics processing operations. Examples of "graphics processors" include, but are not limited to, GPUs, DSPs, FPGAs, and/or CPUs that simulate GPUs. In one or more implementations, the graphics processor may also be capable of performing non-specific operations that the general-purpose processor is capable of performing. As previously mentioned, examples of these general compute operations are compute commands associated with a compute kernel.

As used herein, the term "resource" refers to an allocation of memory space for storing data that a graphics processor, such as a GPU, can access based on a graphics API. For purposes of this disclosure, the term "resource" is synonymous with, and may also be referred to as, a "graphics API resource." Examples of graphics API resources include buffers and textures. A buffer represents an allocation of unformatted memory that may contain data, such as vertex, shader, and compute state data. A texture represents an allocation of memory for storing formatted image data. The term "resource group" refers to a data structure that contains a list of resources that are logically grouped together over a period of time. In one implementation, a resource group is an immutable list of resources: once an application creates the resource group, resources cannot be added to or deleted from it.
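The immutability described above can be sketched with a small model. The `ResourceGroup` class and the resource names are illustrative assumptions, not an actual graphics API; the point is only that the list of resources is fixed at creation.

```python
class ResourceGroup:
    """Illustrative immutable list of resources: nothing can be added or removed."""

    def __init__(self, resources):
        # Storing the list as a tuple fixes its contents at creation time.
        self._resources = tuple(resources)

    @property
    def resources(self):
        return self._resources


group = ResourceGroup(["vertex_buffer", "texture_a"])

# Any attempt to replace the resource list fails: the property has no setter.
mutated = True
try:
    group.resources = group.resources + ("texture_b",)
except AttributeError:
    mutated = False
```

Freezing the membership this way is what lets a single DSID stand in for the whole group for the lifetime of the group.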

FIG. 1 is a schematic diagram of a graphics processing path 100 in which implementations of the present disclosure may operate. FIG. 1 shows an example in which a graphics processing path 100 utilizes processor resources 110 and graphics processor resources 112. Processor resources 110 include one or more general-purpose processors (e.g., CPUs) each having one or more cores. The processor resources 110 may also include and/or communicate with memory, a microcontroller, and/or any other hardware resource that a processor may use to process commands for execution by the graphics processor resources 112. Graphics processor resources 112 include one or more graphics processors (e.g., GPUs), where each graphics processor has one or more execution cores and other computing logic for performing graphics and/or general computing operations. In other words, the graphics processor resources 112 may also encompass and/or communicate with memory (e.g., memory cache 108) and/or other hardware resources for executing programs, such as shaders or compute kernels. For example, graphics processor resources 112 can utilize a rendering pipeline to process shaders and a compute pipeline to process compute kernels.

FIG. 1 illustrates application 101 generating graphics API calls for the purpose of encoding commands for graphics processor resource 112 to execute. To generate the graphics API calls, application 101 includes code written with a graphics API. The graphics API (e.g., Metal®) represents a published and/or standardized graphics library and framework that defines functions and/or other operations that application 101 can invoke on a graphics processor. For example, the graphics API allows application 101 to control the organization, processing, and submission of render and compute commands, as well as the management of data and resources associated with those commands.

In one or more implementations, application 101 is a graphics application that calls a graphics API to communicate a description of a graphics scene. In particular, user space driver 102 receives graphics API calls from application 101 and maps the graphics API calls to operations understood and performable by graphics processor resource 112. For example, user space driver 102 may translate the API calls into commands encoded within a command buffer before being transmitted to kernel driver 103. The translation operation may involve user space driver 102 compiling shaders and/or compute kernels into commands that can be executed by graphics processor resources 112. The command buffer is then sent to kernel driver 103 to prepare the command buffer for execution on graphics processor resource 112. For example, kernel driver 103 can perform memory allocation and scheduling of command buffers to be sent to graphics processor resources 112. For purposes of this disclosure and for ease of description and explanation, user space driver 102 and kernel driver 103 are collectively referred to as graphics drivers, unless otherwise specified.

FIG. 1 shows graphics processor firmware 104 obtaining a command buffer that processor resource 110 submits for execution. Graphics processor firmware 104 may perform various operations for managing graphics processor hardware 105, including powering up graphics processor hardware 105 and/or scheduling the order in which graphics processor hardware 105 receives commands for execution. Referring to FIG. 1 as an example, graphics processor firmware 104 may be executed by a microcontroller. In particular, the microcontroller may be embedded in the same package as a graphics processor within graphics processor resource 112 and may pre-process commands for the graphics processor. In other implementations, the microcontroller is physically separate from the graphics processor.

After scheduling the commands, in FIG. 1, graphics processor firmware 104 sends a command stream to graphics processor hardware 105. Graphics processor hardware 105 then executes the commands within the command stream according to the order in which graphics processor hardware 105 receives the commands. Graphics processor hardware 105 includes multiple (e.g., numerous) execution cores and thus may execute multiple received commands in parallel. The graphics processor hardware 105 then outputs the rendered frames to the frame buffer 106. In one implementation, the frame buffer 106 is a portion of memory, such as a memory buffer, that contains a bitmap that drives the display 107. The display 107 then accesses the frame buffer 106 and converts the rendered frames (e.g., bitmaps) to a video signal (e.g., with a display controller) for display.

In one or more implementations, the graphics processing path 100 can also support creating a resource group, assigning a DSID to the created resource group, associating the DSID to a cache line with a write command, and deleting the DSID of the created resource group. In FIG. 1, application 101 may generate a graphics API call to create a resource group that logically groups resources (e.g., buffers and textures) over a period of time. The graphics API also allows the processor resource 110 to generate set commands within the command buffer to obtain and allocate DSIDs for the created resource group. The command buffer may also include a write command tagged with a DSID to write data to a resource group, a read command to read data from a resource group, and/or a delete command to delete a cache line associated with the assigned DSID.

After the processor resource 110 submits the command buffer to the graphics processor resource 112, the graphics driver schedules the set command, write command, read command, and/or delete command for execution. When the graphics processor resource 112 executes a delete command referencing the DSID, the graphics processor hardware 105 notifies the memory cache 108 that it no longer needs to store the content of a given cache line. The memory cache 108 may thus delete the content corresponding to the DSID in the given cache line. When the memory cache 108 deletes the content of a cache line, it invalidates the content stored in the cache line so that the content is not flushed to the memory mapped to the memory cache (e.g., main memory).

In one or more implementations, the memory cache 108 includes a cache controller (not shown in FIG. 1) that accesses the actual blocks of the memory cache. The delete notification sent by the graphics processor hardware 105 to the memory cache 108 acts as a delete hint. If the memory cache 108 still stores the content in the corresponding cache line and the content has not previously been evicted to system memory, the cache controller deletes the content. In some cases, the memory cache 108 has already evicted the content stored in the cache line prior to receiving the delete hint from the graphics processor hardware 105. When this occurs, the cache controller does not delete content from the memory cache 108 in response to receiving the delete notification. For the purposes of this disclosure, the term "flush" may also be referred to throughout this disclosure as "persist" or "evict."
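The delete-hint semantics above can be modeled in a few lines: the hint only removes content still resident in the cache, and it is a no-op if the line was already evicted to main memory. The class and method names are illustrative assumptions, not the actual cache controller interface.

```python
class MemoryCache:
    """Illustrative cache with DSID-tagged lines and a backing main memory."""

    def __init__(self):
        self.lines = {}        # DSID -> cached (dirty) data
        self.main_memory = {}  # data persisted to system memory

    def write(self, dsid, data):
        self.lines[dsid] = data  # dirty line tagged with the DSID

    def evict(self, dsid):
        # Normal eviction persists the dirty line to main memory.
        if dsid in self.lines:
            self.main_memory[dsid] = self.lines.pop(dsid)

    def delete_hint(self, dsid):
        # Drop the line without flushing; no-op if already evicted.
        self.lines.pop(dsid, None)


cache = MemoryCache()
cache.write(7, "frame_data")
cache.delete_hint(7)   # deleted in cache; never reaches main memory

cache.write(8, "other_data")
cache.evict(8)         # evicted before the hint arrives
cache.delete_hint(8)   # hint is then a no-op; the evicted copy survives
```

This is the power/bandwidth win the disclosure is after: transient render-to-texture data that will never be read again is dropped in the cache instead of being written back to DRAM.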

User space driver 102 is configured to manage and assign DSIDs to resource groups. DSIDs represent a set of virtual identifiers that are ultimately associated with cache lines. When user space driver 102 receives an API call to set a resource group, user space driver 102 obtains a DSID and assigns it to the created resource group. For example, user space driver 102 may initially have a total of approximately 64,000 available DSIDs to allocate to created resource groups. User space driver 102 obtains one of the available DSIDs (e.g., a DSID not assigned to another resource group) and assigns it to the created resource group. As shown in FIG. 1, user space driver 102 then passes the set command containing the DSID associated with the resource group to kernel driver 103.
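A user-space allocator for this virtual DSID pool might look like the following sketch. The class name, the pool size default, and the release method are assumptions for illustration (the text only says roughly 64,000 DSIDs are initially available).

```python
class DsidAllocator:
    """Illustrative pool of virtual DSIDs managed by a user space driver."""

    def __init__(self, pool_size=64_000):
        self.free = set(range(1, pool_size + 1))
        self.assigned = {}  # resource group name -> DSID

    def assign(self, group):
        # Take any DSID not currently assigned to another resource group.
        dsid = self.free.pop()
        self.assigned[group] = dsid
        return dsid

    def release(self, group):
        # Return the DSID to the pool, e.g. after the delete command completes.
        self.free.add(self.assigned.pop(group))


alloc = DsidAllocator()
d = alloc.assign("group0")
alloc.release("group0")
```

Releasing a DSID back to the pool corresponds to the reuse permitted by claim 17, which allows the DSID to be reused after the graphics processor completes the delete command.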

In one or more implementations, after receiving the set command from the user space driver 102, the kernel driver 103 maps the DSID received from the user space driver 102 to a hardware DSID. Thus, the DSID assigned by user space driver 102 to the created resource group serves as a virtual identifier that kernel driver 103 subsequently maps to a hardware DSID. Kernel driver 103 may also maintain other mappings between other hardware DSIDs and other DSIDs assigned to other resource groups. Kernel driver 103 may also track work (e.g., kicks) submitted to graphics processor resource 112 that utilizes the hardware DSID. Kernel driver 103 provides the hardware DSID and job tracking information to graphics processor firmware 104. Graphics processor firmware 104 may utilize the hardware DSID and job tracking information to manage operations between graphics processor hardware 105 and memory cache 108. For example, the graphics processor firmware 104 may instruct the graphics processor hardware 105 when to access the memory cache 108 for a given hardware DSID and manage when to initiate a delete notification to the memory cache 108 for a given hardware DSID.
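The virtual-to-hardware mapping maintained by the kernel driver might look like the following sketch. All names here (`KernelDriverMap`, `map_dsid`) are hypothetical; the point is only that the large virtual DSID space maps lazily onto a much smaller set of hardware DSIDs.

```python
# Illustrative model of the kernel driver's virtual-to-hardware DSID mapping.
class KernelDriverMap:
    def __init__(self, hardware_dsids):
        self.available_hw = list(hardware_dsids)  # small hardware identifier pool
        self.virtual_to_hw = {}                   # virtual DSID -> hardware DSID

    def map_dsid(self, virtual_dsid):
        # Map the virtual DSID received in a set command to a hardware DSID,
        # reusing an existing mapping if one was already established.
        if virtual_dsid not in self.virtual_to_hw:
            self.virtual_to_hw[virtual_dsid] = self.available_hw.pop(0)
        return self.virtual_to_hw[virtual_dsid]

kmap = KernelDriverMap(hardware_dsids=range(16))
hw = kmap.map_dsid(1041)   # first mapping takes the first free hardware DSID
```

This indirection is what lets the graphics API side change independently of redesigned graphics processor hardware, as noted with reference to FIG. 3.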

After the set command assigns the DSID and the hardware DSID to the created resource group, application 101 references the created resource group by tagging other commands (e.g., write commands and read commands) in the command buffer with the DSID. User space driver 102 then passes the other commands tagged with the DSID to kernel driver 103. For write commands, the user space driver 102 may also tag an address range with the DSID. In one or more implementations, the address range may have a start address and an end address and be about 128 bytes in length. The user space driver 102 is responsible for ensuring that the address ranges of the different DSIDs do not overlap. Thereafter, kernel driver 103 determines the hardware DSID that maps to the DSID and forwards the tagged commands to graphics processor resource 112 along with the hardware DSID. The graphics processor resource 112 then executes the tagged commands using the hardware DSID.

When application 101 issues a write command for a resource group, one or more cache lines in memory cache 108 that store the written contents of the resource group become associated with the DSID and the hardware DSID. The hardware DSID represents the identifier that the graphics processor hardware 105 uses to generate a notification to the memory cache 108 to delete the contents within the corresponding cache lines. A cache line, as the term is understood by those skilled in the art, refers to a block of memory that is transferred to or from the memory cache 108. A cache line may have a fixed data size when transferring data to or from the memory cache 108. For example, a cache line may be set to include a plurality of bytes and/or data words, where the entire cache line is read or written during a data transfer. The hardware DSID may correspond to or be part of a tag, index, or other address information used to identify the cache line.

When application 101 has finished using a resource group, application 101 may initiate a delete command to notify memory cache 108 that content stored in a cache line may be deleted (e.g., removed or marked as overwritable) from memory cache 108 without evicting the content to a mapped memory location in main memory. After the user space driver 102 receives a delete command for a given DSID, the user space driver 102 may add the given DSID to a list of DSIDs whose contents may be deleted from the memory cache 108. User space driver 102 then sends the DSID list to kernel driver 103. Kernel driver 103 and graphics processor firmware 104 manage and schedule the deletions for the list of DSIDs so that a deletion does not occur until graphics processor resource 112 completes the write and read commands associated with a given DSID. Once graphics processor resource 112 completes the delete operation, user space driver 102 and kernel driver 103 can release and reuse the given DSID and the mapped hardware DSID, respectively, for new set commands.
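The deferred-deletion scheduling described above can be modeled as follows. This is a deliberately simplified sketch under the assumption that the driver counts in-flight write/read work per DSID; the names are hypothetical.

```python
# Sketch: deletes are deferred until outstanding work on a DSID completes.
class DeleteScheduler:
    def __init__(self):
        self.outstanding = {}     # dsid -> count of in-flight write/read commands
        self.pending_deletes = []
        self.deleted = []         # delete hints actually issued to the cache

    def begin_work(self, dsid):
        self.outstanding[dsid] = self.outstanding.get(dsid, 0) + 1

    def finish_work(self, dsid):
        self.outstanding[dsid] -= 1
        self._drain()

    def request_delete(self, dsid):
        self.pending_deletes.append(dsid)
        self._drain()

    def _drain(self):
        # Only issue the delete hint once no write/read work references the DSID.
        still_pending = []
        for dsid in self.pending_deletes:
            if self.outstanding.get(dsid, 0) == 0:
                self.deleted.append(dsid)
            else:
                still_pending.append(dsid)
        self.pending_deletes = still_pending

sched = DeleteScheduler()
sched.begin_work(5)        # write command in flight for DSID 5
sched.request_delete(5)    # delete must wait for the write to finish
```

Only after `finish_work(5)` would the delete hint reach the memory cache, matching the ordering constraint the kernel driver and firmware enforce.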

Although fig. 1 illustrates a particular implementation of graphics processing path 100, the present disclosure is not limited to the particular implementation illustrated in fig. 1. For example, graphics processing path 100 may include other frameworks, APIs, and/or application layer services not specifically shown in fig. 1. For example, application 101 may have access to core animations to animate the view and/or user interface of application 101. Also, fig. 1 does not show all of the hardware resources and/or components (e.g., power management units or memory resources, such as system memory) available to graphics processing path 100. Additionally or alternatively, although fig. 1 shows the processor resource 110 and the graphics processor resource 112 as separate devices, other implementations may integrate the processor resource 110 and the graphics processor resource 112 on a single device (e.g., a system on a chip). The use and discussion of fig. 1 are merely examples that are convenient for description and explanation.

Fig. 2 is a block diagram of a system 200 in which implementations of the present disclosure may operate. In particular, the system 200 is capable of implementing the graphics processing path 100 shown in FIG. 1. FIG. 2 shows a system 200 that includes processor resources 110 and graphics processor resources 112. FIG. 2 shows processor threads 204A and 204B. The processor thread 204A is responsible for utilizing the command encoders 206A and 206B, and the processor thread 204B is responsible for utilizing the command encoders 206C and 206D. The command encoders 206A and 206B encode commands within the command buffer 208A, and the command encoders 206C and 206D encode commands within the command buffer 208B. In other implementations, a different number of processor threads and command encoders may be included than the two processor threads and four command encoders shown in the example of FIG. 2. The command encoders 206A-206D represent encoders that encode commands into the command buffers 208A and 208B for execution by the graphics processor resource 112. Examples of command encoder types include, but are not limited to, Blit command encoders (e.g., graphics API resource copy and graphics API resource synchronization commands), compute command encoders (e.g., compute commands), and render command encoders (e.g., render commands).

Command buffers 208A and 208B, also referred to as "command lists," represent data structures that store sequences of encoded commands for execution by graphics processor resource 112. When one or more graphics API calls commit command buffers 208A and 208B to a graphics driver (e.g., user space driver 102, shown in FIG. 1), processor resource 110 organizes command buffers 208A and 208B into command queue 210. The command queue 210 organizes the order in which command buffers 208 are sent to the graphics processor resource 112 for execution. Taking FIG. 2 as an example, command queue 210 contains command buffers 208C-208N, where command buffer 208C is at the top of command queue 210 and is the next command buffer to be sent to graphics processor resource 112 for execution. When the processor resource 110 commits command buffers 208A and 208B for execution, the processor resource 110 can no longer encode additional commands into command buffers 208A and 208B. After a command buffer 208 is committed, that command buffer becomes available to the graphics processor resource 112 for execution.
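The commit semantics above can be sketched as a minimal model: once a buffer is committed into the queue, further encoding is rejected. The class names are illustrative assumptions, not a real API.

```python
# Minimal model of command buffers and a command queue with a one-way commit.
class CommandBuffer:
    def __init__(self, label):
        self.label = label
        self.commands = []
        self.committed = False

    def encode(self, command):
        # Encoding is only legal before the buffer is committed.
        if self.committed:
            raise RuntimeError("cannot encode into a committed command buffer")
        self.commands.append(command)

class CommandQueue:
    def __init__(self):
        self.buffers = []  # order in which buffers go to the graphics processor

    def commit(self, buffer):
        buffer.committed = True
        self.buffers.append(buffer)

queue = CommandQueue()
cb = CommandBuffer("208A")
cb.encode("set DSID #1")
queue.commit(cb)
```

The queue's list order stands in for the execution order the command queue imposes on committed buffers.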

The example of FIG. 2 also illustrates the processor resource 110 and the graphics processor resource 112 in bidirectional communication with the memory controller 202. The memory controller 202 manages the flow of information to and from the system memory 212, and is sometimes responsible for maintaining the system memory itself (e.g., refresh or other functions, depending on the type of memory). As shown in FIG. 2, a single memory controller 202 performs memory control for both the processor resource 110 and the graphics processor resource 112. In another implementation, the memory controller 202 includes separate memory controllers: one memory controller for the processor resource 110 and another memory controller for the graphics processor resource 112. The memory controller 202 is in bidirectional communication with a system memory 212, which may be divided into a processor resource memory 214 and a graphics processor resource memory 216. Some implementations of system memory 212 use physically or logically separate memory for each of the processor resource 110 and the graphics processor resource 112, while other implementations share system memory 212 on a physical or logical basis.

Taking FIG. 2 as an example, processor resource 110 can generate a set command within command buffer 208 (e.g., 208C) to obtain the DSID of the created resource group. The same command buffer 208 (e.g., 208C) may also include a write command tagged with a DSID for writing data to a resource group, a read command for reading data from a resource group, and/or a delete command for deleting a cache line associated with the DSID. In another implementation, different command buffers 208 (e.g., 208C, 208D, and/or 208E) may include write commands, read commands, and delete commands that reference the same DSID. After the processor resource 110 submits the command buffer 208 to the graphics processor resource 112, the graphics driver schedules the DSID command for execution on the graphics processor resource 112.

When the graphics processor resource 112 executes a delete command associated with the DSID, the graphics processor hardware 105 notifies the memory cache 108 that it can invalidate the content stored in the corresponding cache line, so that the cache line content does not need to be evicted to the system memory 212. The graphics processor hardware 105 does not guarantee that the contents in the memory cache are actually deleted; instead, it provides the memory cache 108 with a delete hint indicating that the application has finished with the content identified by the DSID. For example, after the memory cache 108 receives the delete hint from the graphics processor hardware 105, if the memory cache 108 still contains content corresponding to the DSID, the memory cache 108 deletes the cache line (e.g., a dirty cache line) without flushing the content to the system memory 212. In some cases, when the memory cache 108 has already persisted a cache line to the system memory 212 (or specifically the graphics processor resource memory 216), the memory cache 108 does not delete the cache line corresponding to the DSID. A "dirty cache line" refers to a cache line whose contents have been modified but not yet written back to the memory (e.g., main memory or system memory 212) mapped to the memory cache. In other words, the data stored in the cache line differs from its counterpart data stored in system memory 212.
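The delete-hint semantics can be made concrete with a toy cache controller: a dirty line is dropped without write-back, and the hint is a no-op if the line was already evicted. Everything below is an illustrative assumption, not a description of real cache hardware.

```python
# Sketch of delete-hint semantics for a cache controller (hypothetical model).
class CacheController:
    def __init__(self):
        self.lines = {}          # dsid -> data still resident in the cache
        self.system_memory = {}  # dsid -> data persisted by a normal eviction

    def write(self, dsid, data):
        self.lines[dsid] = data  # dirty line tagged with the DSID

    def evict(self, dsid):
        # Normal eviction flushes the dirty line to system memory.
        self.system_memory[dsid] = self.lines.pop(dsid)

    def delete_hint(self, dsid):
        # Drop the line without flushing; a no-op if it was already evicted.
        self.lines.pop(dsid, None)

cache = CacheController()
cache.write(3, "texture data")
cache.delete_hint(3)   # dropped; the data never reaches system memory
```

The bandwidth saving comes from the first path: the write-back that a normal eviction would perform simply never happens.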

Having a graphics API that supports creating a resource group, assigning a DSID to the resource group, associating the DSID with write commands referencing the resource group, and deleting the DSID referencing the resource group can provide performance and power benefits. In particular, exposing memory cache management for graphics processing to applications may reduce bandwidth usage. For example, having the graphics processor consistently access memory cache 108 instead of system memory 212 to perform render-to-texture transfers reduces the bandwidth usage of a running application. The overall reduction in bandwidth usage translates into increased performance and reduced power consumption for bandwidth-bound operations. Issuing a delete hint to the memory cache 108 to delete the cache contents, rather than saving the contents to system memory, provides additional power savings. Consuming less power also causes the system 200 to generate less heat. In one implementation, when the system 200 generates excessive heat, it is subjected to a thermal mitigation operation that reduces the frequency and power supplied to the system 200. By doing so, the thermal mitigation operation causes the system 200 to enter a reduced performance state; generating less heat reduces the likelihood that such mitigation is needed.

Although FIG. 2 illustrates a particular implementation in which system 200 associates a DSID with a write command and subsequently uses the DSID to delete the written data (if any) from memory cache 108, the disclosure is not limited to the particular implementation illustrated in FIG. 2. For example, although FIG. 2 shows a single command queue 210, one of ordinary skill in the art will recognize that command buffers 208 may be placed in other command queues 210 not shown in FIG. 2. The use and discussion of FIG. 2 are merely examples that are convenient for description and explanation.

FIG. 3 is a block diagram of a memory virtualization architecture 300 for managing a memory cache when allocating, tagging, and deleting DSIDs for resource groups. FIG. 3 illustrates application 101 creating a resource group 302 that includes one or more resources. One or more resources within a resource group 302 may also belong to other resource groups 302 previously created by application 101. After creating resource group 302, application 101 can send a graphics API call to set resource group 302 to user space driver 102. In response to the graphics API call, the user space driver 102 assigns a DSID 304 to the resource group 302. For example, user space driver 102 may initially have a total of approximately 64,000 DSIDs available for allocation to resource groups 302. Based on the graphics API call used to set resource group 302, user space driver 102 assigns one of the available DSIDs (e.g., a DSID not assigned to another resource group) to resource group 302.

Kernel driver 103 then maps DSID 304 to hardware DSID 306. By doing so, DSID 304 serves as a virtual identifier that kernel driver 103 subsequently maps to hardware DSID 306. Having memory virtualization architecture 300 map DSID 304 to hardware DSID 306 allows the graphics API architecture that manages and assigns DSIDs 304 to be separate from and independent of the hardware architecture used to manage and assign hardware DSIDs 306. For example, if the architecture used to manage and allocate hardware DSIDs 306 for a graphics processor changes due to redesigned hardware in the graphics processor, significant modifications to the graphics API architecture may not be required.

As previously described, kernel driver 103 may also maintain other mappings between other hardware DSIDs 306 and other DSIDs 304 assigned to other resource groups 302. Graphics processor firmware 104 may receive a hardware DSID from kernel driver 103 to manage operations between the graphics processor and memory cache 108. For example, the graphics processor firmware 104 may determine when to initiate a delete notification to the memory cache 108 for a given hardware DSID. Based on the graphics processor firmware 104, the graphics processor can communicate with the memory cache 108 to access the cache line 308 associated with the hardware DSID 306. For example, the graphics processor may read, write, and/or delete content from a cache line associated with hardware DSID 306. Recall that hardware DSID 306 may correspond to or be part of a tag, index, or other address information used to identify a cache line.

After executing the set command, DSID 304 and hardware DSID 306 may become associated with cache line 308 based on a write command for resource group 302. Application 101 may generate a write command to write data to resource group 302. If the write command causes at least some data of the resource group 302 to be written into cache line 308, the write command also associates the DSID 304 and the hardware DSID 306 with the written cache line 308. In other words, a cache line 308 updated by a write command for the resource group 302 may be tagged with the DSID 304 and the hardware DSID 306. In one or more implementations, DSID 304 and hardware DSID 306 may be associated with more than one cache line 308. Thereafter, any cache lines associated with the two identifiers may later be deleted using DSID 304 and hardware DSID 306.
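The one-DSID-to-many-cache-lines tagging can be sketched with a flat dictionary. The addresses and data model below are purely illustrative assumptions.

```python
# Illustrative sketch: one DSID can tag several cache lines, and a later
# delete removes every line carrying that tag.
cache_lines = {}   # line address -> (dsid, data)

def write(dsid, address, data):
    # A write for the resource group tags the written line with the DSID.
    cache_lines[address] = (dsid, data)

def delete_dsid(dsid):
    # The delete hint drops every cache line associated with the DSID.
    for addr in [a for a, (d, _) in cache_lines.items() if d == dsid]:
        del cache_lines[addr]

write(dsid=9, address=0x100, data="tile A")
write(dsid=9, address=0x180, data="tile B")
write(dsid=2, address=0x200, data="other")
delete_dsid(9)   # removes both lines tagged with DSID 9, leaves DSID 2 alone
```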

In one or more implementations, user space driver 102 and/or kernel driver 103 may assign a default DSID 304 and/or a default hardware DSID 306, respectively (e.g., a DSID 304 and/or hardware DSID 306 having a value of zero). The default DSID 304 and the default hardware DSID 306 each represent an identifier that instructs the graphics processor to treat the resource group 302 as occupying normal cache lines within the memory cache 108. In other words, a resource group 302 assigned the default DSID 304 and/or the default hardware DSID 306 does not benefit from the tagging and deletion operations previously discussed with reference to FIGS. 1 and 2. The default DSID 304 and/or the default hardware DSID 306 may be used when the user space driver 102 and/or the kernel driver 103 do not have any DSIDs 304 and/or hardware DSIDs 306 available after receiving the set command. Additionally or alternatively, the default hardware DSID 306 may be useful when one or more resources belong to multiple resource groups 302. Within memory virtualization architecture 300, application 101 may in some cases inadvertently set two resource groups 302 having at least one resource in common. Rather than having the graphics processor assign a different hardware DSID 306 to each resource group 302, the graphics processor may classify the setting of the two resource groups 302 as a programming error and set the hardware DSIDs 306 of the two resource groups 302 to the default hardware DSID 306.
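The default-DSID fallback for overlapping resource groups might be detected as follows. This is a sketch under the assumption that a shared resource demotes both groups to DSID 0; the function name and data shapes are hypothetical.

```python
# Sketch: resource groups that share a resource both collapse to the
# default DSID 0 rather than each receiving a distinct hardware DSID.
DEFAULT_DSID = 0

def assign_dsids(resource_groups, allocator):
    """resource_groups: list of sets of resources; allocator: iterator of DSIDs."""
    assigned = {}
    for i, group in enumerate(resource_groups):
        # Detect a resource shared with any other group (the programming error).
        shared = any(group & other
                     for j, other in enumerate(resource_groups) if i != j)
        assigned[i] = DEFAULT_DSID if shared else next(allocator)
    return assigned

groups = [{"texA", "bufB"}, {"texA", "bufC"}, {"bufD"}]
result = assign_dsids(groups, allocator=iter(range(1, 100)))
```

Groups 0 and 1 share `texA` and therefore both receive the default DSID; only the disjoint group 2 gets a real identifier.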

FIG. 4 is a specific implementation of the command buffer 208 including one or more set commands 408, one or more write commands 410, one or more read commands 412, and one or more delete commands 414 that reference the created resource group. Recall that a general purpose processor (e.g., a CPU) encodes and commits command buffer 208 for execution on a graphics processor. After the general purpose processor commits the command buffer 208, the general purpose processor cannot encode additional commands into the command buffer 208. FIG. 4 shows that command buffer 208 includes set commands 408A and 408B, write commands 410A and 410B, read command 412A, and delete command 414A that reference the created resource group 302. The command buffer 208 includes three different portions 402, 404, and 406. Portion 402 represents commands that a command encoder encodes into command buffer 208; portion 404 represents commands that a different command encoder (e.g., a rendering command encoder) encodes into command buffer 208; and portion 406 represents commands that yet another command encoder encodes into command buffer 208. Each command encoder may be associated with specific graphics API resources (e.g., buffers and textures) and states (e.g., stencil states and pipeline states) for encoding commands within each portion 402, 404, and 406 of the command buffer 208.

Referring to FIG. 4, set command 408A allows a developer and/or application to indicate that at least a portion of command buffer 208 is to operate on the referenced resource group 302. Within portions 402 and 404, command buffer 208 includes a set command 408A that assigns a DSID (e.g., DSID #1) to the created resource group 302, a write command 410A that writes data to the referenced resource group 302, a read command that reads data from the referenced resource group 302, and a delete command that deletes the DSID associated with the resource group 302. The command encoder that encodes commands into portion 402 tags write command 410A with the assigned DSID (e.g., DSID #1) and the address range of resource group 302. Subsequent command encoders also inherit the DSID state information and tag commands referencing resource group 302 with the DSID (e.g., DSID #1). For example, read command 412A is also tagged with DSID #1 to identify resource group 302. Tagging read command 412A with the assigned DSID may prevent access to deleted data that could otherwise occur due to hardware-level parallelism within the same command buffer. The command encoders that encode commands into the command buffer 208 inherit the DSID state information until the delete command 414 is executed.
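The DSID-state inheritance within a single command buffer can be sketched as a small pass over the encoded commands: everything after a set command inherits its DSID until a delete clears the state. This is a simplified illustrative model, not a real encoder API.

```python
# Sketch of DSID state inheritance within one command buffer.
def encode_buffer(commands):
    current_dsid = None
    tagged = []
    for op, arg in commands:
        if op == "set":
            current_dsid = arg          # set command establishes the DSID state
            tagged.append((op, arg))
        elif op == "delete":
            tagged.append((op, current_dsid))
            current_dsid = None         # later encoders no longer inherit it
        else:                           # write/read commands inherit the state
            tagged.append((op, current_dsid))
    return tagged

buf = encode_buffer([("set", 1), ("write", None), ("read", None),
                     ("delete", None), ("set", 2), ("write", None)])
```

This reproduces the FIG. 4 pattern: the commands after delete command 414A pick up the new DSID #2 from the second set command rather than inheriting DSID #1.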

The command buffer 208 may include multiple writes to and reads from the created resource group 302. In one implementation, if the application and/or developer wishes to write to and read back from the same resource group 302 after the delete command 414, the application and/or developer issues another API call to set the resource group 302 again. Taking FIG. 4 as an example, because of delete command 414A, the command encoder that encodes commands within portion 406 does not inherit the DSID state established by set command 408A. Instead, after the delete command 414A, the command encoder encodes a second set command 408B that assigns a new DSID (e.g., DSID #2) to the resource group 302. The command encoder uses the DSID state information corresponding to the second set command 408B and tags write command 410B with the new DSID (e.g., DSID #2).

FIG. 5 is a specific implementation of referencing a created resource group across multiple command buffers 208A, 208B, and 208C within command queue 210. Multiple command buffers 208 may include DSID commands that reference the same created resource group. In FIG. 5, because DSID state information is not inherited across command buffers, each command buffer 208 includes a set command 408 before the other commands that reference the resource group. Similar to FIG. 4, at least a portion of each command buffer inherits the DSID state information associated with the set command 408. In contrast to FIG. 4, FIG. 5 shows that the write command 410A, the read command 412A, and the delete command 414A are located in different command buffers 208A, 208B, and 208C, respectively, rather than in a single command buffer 208.

As shown in FIG. 5, each command buffer 208A, 208B, and 208C includes its own set command 408A, 408B, and 408C, respectively, because DSID state information is not inherited across the command buffers 208. Specifically, within command buffer 208A, set command 408A assigns a DSID (e.g., DSID #1) to the created resource group, and write command 410A, which occurs after set command 408A, is tagged with the assigned DSID (e.g., DSID #1) to write into the resource group. Command buffer 208B, which the graphics processor executes after executing command buffer 208A, includes a set command 408B that assigns the DSID (e.g., DSID #1) to the created resource group. Read command 412A is tagged with the assigned DSID (e.g., DSID #1) to read from the resource group. Command buffer 208C, which the graphics processor executes after executing command buffer 208B, includes a set command 408C that assigns the DSID (e.g., DSID #1) to the created resource group. Delete command 414A references the assigned DSID (e.g., DSID #1) to provide a delete hint to the memory cache to delete content within the cache lines corresponding to the assigned DSID.

FIG. 6 is a specific implementation of referencing a created resource group across multiple command queues 210. In contrast to FIG. 5, FIG. 6 shows that write command 410A is located in command buffer 208A within command queue 210A, and delete command 414A is located in a different command buffer 208O in a different command queue 210B. As shown in FIG. 6, each command buffer 208A and 208O includes its own set command 408A and 408B because the different command buffers do not inherit DSID state information. The command encoder places the barrier update command 602 after the write command 410A within the command buffer 208A because the read command 412A is located in a different command queue 210. In the command buffer 208O, a barrier wait command 604 is placed before the read command 412A and the delete command 414A, ensuring that the write command 410A in command queue 210A executes before the read command 412A in command queue 210B.

In FIG. 6, an application may insert a barrier update command 602 and a barrier wait command 604 to track and manage resource dependencies across command queues 210. Resource dependencies arise when resources are produced and consumed by different commands, regardless of whether the commands are encoded into the same command queue 210 or different command queues 210. The barrier captures the work of the graphics processor up to a certain point in time. When the graphics processor encounters barrier wait command 604, the graphics processor waits until the captured work is complete (e.g., the corresponding barrier update command 602 has executed) and then continues execution.
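A simplified model of the update/wait pairing across the two queues follows; real hardware barriers are considerably more involved, and the flag-based scheme below is only an illustrative assumption.

```python
# Simplified model of a barrier update/wait pair across two command queues.
class Barrier:
    def __init__(self):
        self.updated = False

    def update(self):
        # Encoded after the write command in queue 210A.
        self.updated = True

    def ready(self):
        # Queue 210B checks this before running the read and delete commands.
        return self.updated

def run_queue_b(barrier, log):
    if not barrier.ready():
        log.append("wait")          # GPU stalls until the barrier is updated
        return False
    log.append("read+delete")
    return True

barrier, log = Barrier(), []
run_queue_b(barrier, log)   # read arrives before the write has completed
barrier.update()            # write command in queue 210A finishes
run_queue_b(barrier, log)
```

The log shows the required ordering: queue 210B stalls once, then runs its read and delete commands only after the update from queue 210A.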

Although FIGS. 4-6 illustrate particular implementations of command buffers 208 including DSID commands, the present disclosure is not limited to the particular implementations of FIGS. 4-6. For example, although FIG. 4 shows two set commands 408A and 408B, other implementations of command buffer 208 may include more than two set commands 408 or a single set command 408 referencing a resource group 302. With respect to FIG. 5, other implementations of the command buffers 208A, 208B, and 208C may each include more than one set command 408, write command 410, read command 412, or delete command 414. The use and discussion of FIGS. 4-6 are merely examples for ease of description and explanation.

FIG. 7 depicts a flowchart that illustrates a graphics processing operation 700 for managing a memory cache for graphics processing. To manage a memory cache, operation 700 can generate DSID commands that reference a resource group within a single command buffer, across multiple command buffers, or across multiple command queues. In one implementation, operation 700 may be implemented by the processor resource 110 shown in FIGS. 1 and 2. In particular, the blocks within operation 700 may be implemented by user space driver 102 and/or kernel driver 103 shown in FIG. 1. The use and discussion of FIG. 7 is merely an example for ease of explanation and is not intended to limit the present disclosure to this particular example. For example, block 702 may be optional, such that operation 700 does not perform block 702 each time operation 700 assigns, tags, and deletes a DSID for a referenced resource group.

Operation 700 may begin at block 702 and create a resource group. Operation 700 can create a resource group using various operations including, but not limited to, creating a new resource group, replicating an existing resource group, or performing a mutable copy of an existing resource group. Operation 700 then moves to block 704 and generates a set command that assigns a DSID to the created resource group. As shown in FIGS. 4-6, the set command occurs before the write command, before the read command, and/or before the delete command. In one or more implementations, operation 700 may have previously generated a set command in another command buffer (e.g., within the same command queue or a different command queue). As previously described with reference to FIG. 3, based on the set command, operation 700 may utilize a memory virtualization architecture to map the DSID to a hardware DSID. Having a memory virtualization architecture that maps DSID 304 to hardware DSID 306 allows the graphics API architecture that manages and assigns DSIDs 304 to be separate from and independent of the hardware architecture used to manage and assign hardware DSIDs 306.

At block 706, operation 700 generates, within the command buffer, a write command that references the DSID to write to the resource group. As previously described, operation 700 assigns a DSID to the resource group at block 704. Based on the DSID allocation, operation 700 associates the DSID with one or more cache lines if the write command causes at least some data of the resource group to be written into those cache lines. Operation 700 then moves to block 708 and generates a read command that references the DSID to read from the resource group. In implementations where the read command is located in the same command buffer as the set command, the read command inherits the DSID state information from the set command. Where the read command is located in a different command buffer, operation 700 may generate an additional set command (not shown in FIG. 7) before generating the read command within the different command buffer. In addition, for the case where the write command generated in block 706 is located in a different command queue than the read command generated in block 708, operation 700 may generate a barrier wait command before the read command.

After completing block 708, operation 700 may then move to block 710 and generate a delete command referencing the DSID. The delete command generates a delete hint that informs the memory cache that it may delete the content from the cache lines mapped to the DSID. In other words, operation 700 does not guarantee that the delete command causes the memory cache to delete the contents within the identified cache lines; instead, it notifies the memory cache that the application has finished with the contents identified by the DSID. For example, after the memory cache receives the delete hint from operation 700, if the memory cache still contains content corresponding to the DSID, the memory cache deletes the cache line (e.g., a dirty cache line) without flushing the content to system memory. Alternatively, where the memory cache has already persisted the cache line to system memory, the memory cache does not delete the contents corresponding to the DSID within the cache line. Once the graphics processor has finished executing the delete command, the DSID may be reassigned by subsequent set commands.
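The blocks of operation 700 can be tied together in one end-to-end sketch. Every structure below is an illustrative simplification of the set/write/read/delete lifecycle; none of the names come from a real API.

```python
# End-to-end sketch of operation 700: set, write, read, delete, then DSID reuse.
def operation_700(cache, free_dsids):
    dsid = free_dsids.pop()            # block 704: set command assigns a DSID
    cache[dsid] = "render target"      # block 706: write tags cache lines
    data = cache[dsid]                 # block 708: read from the resource group
    cache.pop(dsid, None)              # block 710: delete hint drops the lines
    free_dsids.add(dsid)               # the DSID may then be reassigned
    return data

free_dsids = {1, 2, 3}
cache = {}
result = operation_700(cache, free_dsids)
```

After the round trip, the cache holds nothing for the DSID (nothing was flushed to system memory) and the identifier is back in the free pool for the next set command.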

Exemplary hardware and software

The present disclosure may be used in and with a wide variety of electronic devices, including single-processor and multi-processor computing systems and vertical devices (e.g., cameras, gaming systems, appliances, etc.) that include single- or multi-processor computing systems. The discussion herein is made with reference to a common computing configuration found in a variety of different electronic computing devices (e.g., computers, laptops, mobile devices, etc.). This common computing configuration may have CPU resources including one or more microprocessors and graphics processing resources including one or more GPUs. Other computing systems having other (now or in the future) known or common hardware configurations are fully contemplated. While some implementations concern mobile systems that employ a minimized GPU, the hardware configurations may also exist, for example, in servers, workstations, laptops, tablets, desktop computers, gaming platforms (whether portable or not), televisions, entertainment systems, smart phones, telephones, or any other computing device, whether mobile or stationary, vertical or general purpose.

Referring to FIG. 8, the disclosed implementations can be performed by a representative computing system 800. For example, a representative computer system may act as an end-user device or any other device that generates or displays graphics. For example, computing system 800 may be implemented in an electronic device, such as a general purpose computer system, a television, a set top box, a media player, a multimedia entertainment system, an image processing workstation, a handheld device, or any device that may be coupled to or that may contain a display or presentation device as described herein. Computing system 800 may include one or more processors 805, memory 810 (810A and 810B), one or more storage devices 815, and graphics hardware 820 (e.g., including one or more graphics processors). Computing system 800 may also have device sensors 825, including one or more of: a depth sensor (such as a depth camera), a 3D depth sensor, an imaging device (such as a still and/or video-capable image capture unit), an RGB sensor, a proximity sensor, an ambient light sensor, an accelerometer, a gyroscope, any type of still or video camera, a LIDAR device, a SONAR device, a microphone, a CCD (or other image sensor), an infrared sensor, a thermometer, and the like. These and other sensors may work in conjunction with one or more GPUs, DSPs, or conventional microprocessors with appropriate programming so that the sensor outputs can be properly interpreted and/or combined and interpreted.

Returning to FIG. 8, system 800 may also include a communication interface 830, a user interface adapter 835, and a display adapter 840, all of which may be coupled via a system bus, backplane, fabric, or network 845. Memory 810 may include one or more different types of non-transitory media (e.g., solid-state, DRAM, optical, magnetic, etc.) used by processor 805 and graphics hardware 820. For example, memory 810 may include memory cache, Read Only Memory (ROM), and/or Random Access Memory (RAM). Storage 815 may include one or more non-transitory storage media including, for example, magnetic disks (fixed, floppy, and removable disks) and tape, optical media such as CD-ROMs and Digital Video Disks (DVDs), and semiconductor memory devices such as electrically programmable read-only memories (EPROMs), solid state memory devices, and electrically erasable programmable read-only memories (EEPROMs). Memory 810 and storage 815 may be used to hold media data (e.g., audio, image, and video files), preference information, device profile information, computer program instructions organized into one or more modules and written in any desired computer programming language, and any other suitable data. Such computer program code, when executed by processor 805 and/or graphics hardware 820, may implement one or more operations or processes described herein. Further, the system may employ a microcontroller (not shown) that may also execute such computer program code to implement one or more of the operations shown herein or the computer-readable media claims. In some implementations, the microcontroller may operate as a companion to graphics processor or general purpose processor resources.

Communication interface 830 may include semiconductor-based circuitry and may be used to connect computing system 800 to one or more networks. Exemplary networks include, but are not limited to: a local network, such as a USB network; a business local area network; and a wide area network, such as the Internet; and any suitable technology (e.g., wired or wireless) may be used. Communication technologies that may be implemented include cellular-based communications (e.g., LTE, CDMA, GSM, HSDPA, etc.) or other communications (e.g., Apple Lightning, Ethernet, Bluetooth, Wi-Fi, USB, Thunderbolt, FireWire, etc.). (WIFI is a registered trademark of Wi-Fi Alliance Corporation. BLUETOOTH is a registered trademark of Bluetooth SIG, Inc., and THUNDERBOLT and FIREWIRE are registered trademarks of Apple Inc.) User interface adapter 835 may be used to connect a keyboard 850, a microphone 855, a pointing device 860, a speaker 865, and other user interface devices, such as a touchpad and/or touchscreen (not shown). Display adapter 840 may be used to connect one or more displays 870.

The processor 805 may execute instructions necessary to carry out or control the operation of many functions performed by computing system 800 (e.g., evaluation, transformation, mathematical computation, or graphics program compilation). The processor 805 may, for instance, drive the display 870 and may receive user input from the user interface adapter 835 or any other user interface embodied by the system. The user interface adapter 835, for example, can take a variety of forms, such as a button, a keypad, a touchpad, a mouse, a dial, a click wheel, a keyboard, a display screen, and/or a touch screen. In addition, the processor 805 may be based on a reduced instruction set computer (RISC) or complex instruction set computer (CISC) architecture or any other suitable architecture and may include one or more processing cores. Graphics hardware 820 may be special purpose computational hardware for processing graphics and/or assisting the processor 805 in performing computational tasks. In some implementations, graphics hardware 820 may include CPU-integrated graphics and/or one or more discrete programmable GPUs. Computing system 800 (implementing one or more implementations discussed herein) may allow one or more users to control the same system (e.g., computing system 800) or another system (e.g., another computer or entertainment system) through user activity, which may include audio instructions, natural activity, and/or predetermined gestures such as hand gestures.

Various implementations within the present disclosure may use sensors such as cameras. Cameras and similar sensor systems may include an auto-focus system to accurately capture video or image data that is ultimately used for various applications such as photo applications, augmented reality applications, virtual reality applications, and games. Processing the image and performing recognition on the image received by the camera sensor (or otherwise) may be performed locally on the host device or in combination with a network-accessible resource (e.g., a cloud server accessed over the internet).

Returning to FIG. 8, device sensors 825 may capture contextual and/or environmental phenomena such as time; location information; the status of the device with respect to light, gravity, and magnetic north; and even still and video images. In addition, network-accessible information, such as weather information, may also be used as part of the context. All captured contextual and environmental phenomena may be used to provide context for user activity or information about user activity. For example, contextual information may be used as part of an analysis, performed using the techniques discussed herein, in assessing a user's gestures, expressions, or emotions.

Output from the device sensors 825 may be processed, at least in part, by processor 805 and/or graphics hardware 820, and/or by a dedicated image processing unit incorporated within or outside of computing system 800. Information so captured may be stored in memory 810 and/or storage 815, and/or in any storage accessible on an attached network. Memory 810 may include one or more different types of media used by processor 805, graphics hardware 820, and device sensors 825 to perform device functions. Storage 815 may store data such as media (e.g., audio, image, and video files); metadata for the media; computer program instructions; and graphics programming instructions and graphics resources. Memory 810 and storage 815 may be used to retain computer program instructions or code organized into one or more modules, in compiled form or written in any desired computer programming language. When executed by, for example, a microcontroller, GPU, or processor 805, such computer program code may implement one or more of the acts or functions described herein (e.g., interpreting and responding to user activity, including commands and/or gestures).

As indicated above, implementations within the present disclosure include software. As such, a description of a common computing software architecture is provided, as expressed in the layer diagram of FIG. 9. Like the hardware examples, the software architecture discussed here is not intended to be exclusive in any way, but rather illustrative. This is especially true for layer-type diagrams, which software developers tend to express in somewhat differing ways. In this case, the description begins at the bottom with the base hardware layer 995, illustrated by hardware 940, which may include memory, general purpose processors, graphics processors, microcontrollers, or other processing and/or computer hardware, such as memory controllers and dedicated hardware. Above the hardware layer is the operating system kernel layer 990, showing one example as operating system kernel 945, which is kernel software that may perform memory management, device management, and system calls. The operating system kernel layer 990 is the typical location of hardware drivers, such as a graphics processor driver. The notation employed here is generally intended to imply that software elements shown in a layer use resources from the layers below and provide services to the layers above. In practice, however, all components of a particular software element may not behave entirely in that manner.

Returning to FIG. 9, the operating system services layer 985 is exemplified by operating system services 950. Operating system services 950 may provide core operating system functions in a protected environment. In addition, the operating system services shown in operating system services layer 985 may include OpenGL/OpenCL 951, framework 952, a user-space driver 953, and a software rasterizer 954. (OPENGL is a registered trademark of Silicon Graphics International Corporation. OPENCL is a registered trademark of Apple Inc., and CUDA is a registered trademark of NVIDIA Corporation.) While most of these examples relate to graphics processor processing or graphics and/or graphics libraries, other types of services are contemplated by varying implementations of the disclosure. These particular examples also represent graphics frameworks/libraries that may operate in a lower tier of frameworks, such that developers may use shading and primitives and/or obtain fairly tightly coupled control over the graphics hardware. In addition, the particular examples named in FIG. 9 may also pass their work product on to hardware or hardware drivers, such as the graphics processor driver, for display-related material or compute operations.

With reference again to FIG. 9, OpenGL/OpenCL 951 represents examples of well-known libraries and application programming interfaces for graphics processor compute operations and graphics rendering, including 2D and 3D graphics. Framework 952 also represents a published graphics library and framework, but one generally considered lower level than OpenGL/OpenCL 951, supporting fine-grained, low-level control of the organization, processing, and submission of graphics and computation commands, as well as the management of the associated data and resources for those commands. The user-space driver 953 is software relating to the control of hardware that resides in user space for reasons typically related to the specific device or function. In many implementations, the user-space driver 953 works cooperatively with a kernel driver and/or firmware to perform the overall function of a hardware driver. The software rasterizer 954 refers generally to software used to produce graphics information, such as pixels, without specialized graphics hardware (e.g., using only the CPU). The libraries or frameworks shown within the operating system services layer 985 are only exemplary and are intended to show the general level of the layer and how it relates to other software in a sample arrangement (e.g., kernel operations generally below and higher-level application services 960 generally above). In addition, it may be useful to note that framework 952 represents a published framework/library of Apple Inc. that is known to developers in the art, and OpenGL/OpenCL 951 may represent frameworks/libraries present in current versions of software distributed by Apple Inc.

Above the operating system services layer 985 is an application services layer 980, which includes Sprite Kit 961, Scene Kit 962, Core Animation 963, Core Graphics 964, and other application services 960. The application services layer 980 represents high-level frameworks that are typically accessed directly by applications. In some implementations of the present disclosure, the application services layer 980 includes graphics-related frameworks that are high level in that they are agnostic to the underlying graphics libraries (such as those discussed with reference to the operating system services layer 985). In such implementations, these high-level graphics frameworks are intended to provide developer access to graphics functionality in a more user/developer-friendly way and to relieve developers of the work with shading and primitives. By way of example, Sprite Kit 961 is a graphics rendering and animation infrastructure provided by Apple Inc. Sprite Kit 961 may be used to animate textured images, or "sprites." Scene Kit 962 is a 3D rendering framework from Apple Inc. that supports importing, manipulating, and rendering 3D assets at a higher level than frameworks with similar capabilities, such as OpenGL. Core Animation 963 is a graphics rendering and animation infrastructure provided by Apple Inc. that may be used to animate views and other visual elements of an application. Core Graphics 964 is a two-dimensional drawing engine from Apple Inc. that provides 2D rendering for applications.

Above the application services layer 980 is the application layer 975, which may include any type of application. By way of example, FIG. 9 shows three specific applications: Photos 971 (a photo management, editing, and sharing program), Quicken 972 (a financial management program), and iMovie 973 (a movie production and sharing program). (QUICKEN is a registered trademark of Intuit Inc., and IMOVIE is a registered trademark of Apple Inc.) The application layer 975 also shows two generic applications 970 and 974, which represent the presence of any other applications that may interact with, or be part of, the inventive implementations disclosed herein. Generally, some implementations of the present disclosure employ and/or interact with applications that produce displayable and/or viewable content or that produce computational operations suited to GPU processing.

In evaluating the operating system services layer 985 and the application services layer 980, it may be useful to recognize that different frameworks have higher- or lower-level application program interfaces, even if the frameworks are represented in the same layer of the FIG. 9 diagram. The illustration of FIG. 9 serves to provide general guidance and to introduce exemplary frameworks that may be discussed later. Furthermore, some implementations of the present disclosure may imply that frameworks in the application services layer 980 make use of libraries represented in the operating system services layer 985; thus, FIG. 9 provides intellectual reinforcement for those examples. Importantly, FIG. 9 is not intended to limit, in any particular way, the types of frameworks or libraries that may be used in any particular implementation. Generally, many implementations of this disclosure relate to the ability of applications in layer 975, or frameworks in layers 980 or 985, to divide a long, continuous graphics processor task into smaller pieces. In addition, many implementations of the disclosure relate to graphics processor (e.g., GPU) driver software in the operating system kernel layer 990 and/or embodied as microcontroller firmware in the hardware layer 995; such drivers perform a scheduling function for the graphics processor resource (e.g., GPU).

FIG. 10 shows a software architecture similar to the standard architecture shown in FIG. 9. By way of distinction, the architecture of FIG. 10 shows: user-space graphics drivers 1005A and 1005B; kernel graphics drivers 1010A and 1010B in the operating system kernel 945; a microcontroller 1015, accompanied by microcontroller firmware 1020, including graphics driver firmware 1025, in the hardware layer 940; and an execution core 1030 in the hardware layer 940. The presence of multiple instances of a graphics driver (user-space graphics drivers 1005A and 1005B, kernel graphics drivers 1010A and 1010B, and graphics driver firmware 1025 in the microcontroller firmware 1020) indicates the various options for implementing the graphics driver. As a matter of technical possibility, any of the three shown drivers might independently operate as the sole graphics driver. In some implementations of the present disclosure, the overall graphics driver is implemented in a combination of kernel graphics drivers 1010A and 1010B and graphics driver firmware 1025 (e.g., in the operating system kernel 945 and the microcontroller firmware 1020, respectively). In other implementations, the overall graphics driver may be implemented by the combined effort of all three shown drivers 1005A and 1005B, 1010A and 1010B, and 1025.

At least one implementation is disclosed, and variations, combinations, and/or modifications of the implementation(s) and/or features of the implementation(s) made by a person having ordinary skill in the art are within the scope of the disclosure. Alternative implementations that result from combining, integrating, and/or omitting features of the implementation(s) are also within the scope of the disclosure. Where numerical ranges or limitations are expressly stated, such express ranges or limitations should be understood to include iterative ranges or limitations of like magnitude falling within the expressly stated ranges or limitations (e.g., from about 1 to about 10 includes 2, 3, 4, etc.; greater than 0.10 includes 0.11, 0.12, 0.13, etc.). Unless otherwise stated, the use of the term "about" means ±10% of the subsequent number.

Many other implementations will be apparent to those of skill in the art upon reviewing the above description. The scope of the invention should, therefore, be determined with reference to the appended claims, along with the full scope of equivalents to which such claims are entitled. In the appended claims, the terms "including" and "in which" are used as the plain-English equivalents of the respective terms "comprising" and "wherein."
