Memory page management method and memory page conversion method for GPU

Document No.: 435833. Publication date: 2021-12-24.

Abstract: This technology, "Memory page management method and memory page conversion method for GPU", was designed on 2021-08-30 by Zhao Xia, Tang Yuhua, Zhang Guangda, Huang Anwen, Wen Jiahui, Sun Yichun, Zhang Hongyun and Zhang Yu. The invention discloses a memory page management method and a memory page conversion method for a GPU. The memory page management method comprises the following steps: nesting a small physical page in at least one physical page, wherein the memory of the small physical page is smaller than the physical page; adding a nested page flag bit with a preset memory size to the page table entry of the TLB, wherein the nested page flag bit indicates whether the physical page corresponding to the page table entry of the current TLB is nested with a small physical page and whether a virtual address hits the physical page corresponding to the page table entry of the current TLB; and performing virtual-real address translation according to the adjusted TLB. The memory page management method and the memory page conversion method for the GPU can improve the utilization of the memory space, reduce storage cost, and reduce memory fragmentation in multitasking GPUs.

1. A memory page management method for a GPU, comprising:

nesting a small physical page in at least one physical page, wherein the memory of the small physical page is smaller than the physical page;

adding a nesting page marking bit with a preset memory size in a page table entry of the TLB, wherein the nesting page marking bit is used for indicating whether a physical page corresponding to the page table entry of the current TLB is nested with a small physical page or not and indicating whether a virtual address hits the physical page corresponding to the page table entry of the current TLB or not;

and performing virtual-real address translation according to the adjusted TLB.

2. The memory page management method for a GPU of claim 1, wherein the memory size of the physical page is 64 KB.

3. The memory page management method for a GPU of claim 2, wherein the memory size of the small physical page is any of 4KB, 8KB, 16KB and 32 KB.

4. The memory page management method for a GPU of claim 3, wherein the memory size of the nested page tag bits is 4 bits.

5. The memory page management method for a GPU of claim 4, wherein the 4 bits of the nested page flag bit are configured as 4'b0000, 4'b0001, 4'b0010, 4'b0100, or 4'b1000 according to whether the physical page is nested with a small physical page, wherein 4'b0000 indicates that the physical page is not nested with a small physical page, 4'b0001 indicates that the physical page is nested with a small physical page with a memory size of 4KB, 4'b0010 indicates that the physical page is nested with a small physical page with a memory size of 8KB, 4'b0100 indicates that the physical page is nested with a small physical page with a memory size of 16KB, and 4'b1000 indicates that the physical page is nested with a small physical page with a memory size of 32KB.

6. The method according to any of claims 1 to 5, wherein the performing virtual-real address translation according to the adjusted TLB includes:

S11, calculating the virtual page number and the in-page offset according to the input virtual address and the memory page size;

S12, searching the TLB with the virtual page number and judging whether the TLB is hit; if the TLB is not hit, proceeding to step S13; if the TLB is hit, proceeding to step S14;

S13, accessing the page table stored in the memory through the PTW to obtain the page table entry corresponding to the virtual address, putting the obtained page table entry into the TLB, and returning to step S11;

S14, judging whether the physical page corresponding to the page table entry of the hit TLB is nested with a small physical page; if not, proceeding to step S15; if so, proceeding to step S16;

S15, combining the in-page offset of the virtual address with the physical page number contained in the page table entry of the current TLB to obtain a physical address;

S16, comparing the preset bits of the in-page offset of the virtual address with the nested page marking bit contained in the page table entry of the current TLB, and determining whether the virtual address falls in the nested small physical page; if the virtual address falls in the nested small physical page, accessing the page table stored in the memory through the PTW to obtain the page table entry corresponding to the virtual address, putting the obtained page table entry into the TLB, and returning to step S11; and if the virtual address falls outside the nested small physical page, combining the in-page offset of the virtual address with the physical page number contained in the page table entry of the current TLB to obtain the physical address.

7. The method for managing memory pages for a GPU as claimed in claim 6, wherein in step S12, determining whether the TLB is hit includes:

if the TLB is fully associative mapping, sequentially comparing each page table entry in the TLB with the virtual page number, and judging whether the TLB is hit;

if the TLB is set-associative mapping, the virtual page number is used for calculating the TLB set number, and each page table entry in the obtained TLB corresponding set is sequentially compared with the corresponding bit in the virtual page number to judge whether the TLB is hit.

8. The memory page management method for a GPU of claim 6, wherein when the virtual address is 32 bits and the memory page size is 64KB, the virtual page number is [31:16] bits of the virtual address and the intra-page offset is [15:0] bits of the virtual address.

9. The memory page management method for a GPU of claim 6, wherein, when the virtual address is 32 bits and the memory page size is 64KB, the preset bits of the in-page offset in step S16 are bits [15:12].

10. A memory page conversion method for a GPU is characterized by comprising the following steps:

S21, inputting the virtual address of the memory access request;

S22, calculating the virtual page number and the in-page offset according to the input virtual address and the memory page size;

S23, searching the TLB with the virtual page number and judging whether the TLB is hit; if the TLB is not hit, proceeding to step S24; if the TLB is hit, proceeding to step S25;

S24, accessing the page table stored in the memory through the PTW to obtain the page table entry corresponding to the virtual address, putting the obtained page table entry into the TLB, and returning to step S22;

S25, judging whether the physical page corresponding to the page table entry of the hit TLB is nested with a small physical page; if not, proceeding to step S26; if so, proceeding to step S27;

S26, combining the in-page offset of the virtual address with the physical page number contained in the page table entry of the current TLB to obtain a physical address;

S27, comparing the preset bits of the in-page offset of the virtual address with the nested page marking bit contained in the page table entry of the current TLB, and determining whether the virtual address falls in the nested small physical page; if the virtual address falls in the nested small physical page, accessing the page table stored in the memory through the PTW to obtain the page table entry corresponding to the virtual address, putting the obtained page table entry into the TLB, and returning to step S22; if the virtual address falls outside the nested small physical page, combining the in-page offset of the virtual address with the physical page number contained in the page table entry of the current TLB to obtain a physical address;

the method comprises the steps that a small physical page is nested in at least one physical page of a GPU, the memory of the small physical page is smaller than the physical page, a nested page marking bit with a preset memory size is configured in a page table entry of the TLB, and the nested page marking bit is used for indicating whether the physical page corresponding to the page table entry of the current TLB is nested with the small physical page or not and indicating whether a virtual address hits the physical page corresponding to the page table entry of the current TLB or not.

Technical Field

The invention relates to the technical field of graphics processing units (GPUs), and in particular to a memory page management method and a memory page conversion method for a GPU.

Background

A Graphics Processing Unit (GPU) is a microprocessor that performs image- and graphics-related operations. Because of their computing power, GPUs are widely used in cloud computing platforms and data centers to provide users with the computation they need. Under virtual memory, the memory address generated by a memory access instruction of a task running on the GPU is called a virtual address or logical address, and the address used to access real physical memory is called a physical address. The virtual memory mechanism maps virtual addresses to physical addresses, so that a programmer need not consider where a program resides in physical memory or how much space it occupies.

Currently, GPU virtual memory generally uses paged memory management to allocate physical memory and map it to virtual memory. Specifically, the virtual memory of each task on the GPU is divided into a number of virtual memory pages, and the physical memory is divided into physical memory pages of the same size, so mapping a virtual address reduces to mapping its virtual page to a physical page and then adding the in-page offset. To speed up this mapping, modern GPUs generally employ a Memory Management Unit (MMU) and a Translation Lookaside Buffer (TLB), where the TLB is a high-speed memory that stores page table entries mapping virtual addresses to physical addresses. When a virtual address arrives, the MMU first searches the TLB. If the TLB hits, the physical address corresponding to the virtual address is returned directly; if the TLB misses, the translation is completed by a page table walk (PTW) over the page table stored in the memory of the GPU system. FIG. 1 is a schematic diagram illustrating the virtual-real address translation process for a 64KB memory page. As shown in FIG. 1, in the existing paged memory management mode a virtual address is divided into two parts according to the memory page size: bits [31:16] are the virtual page number and bits [15:0] are the in-page offset. A memory access request searches the TLB with the virtual page number, and the corresponding bits of the virtual page number are compared with the tag bits of the TLB page table entries. If they match, the TLB hits and the physical address is obtained by concatenating the physical page number with the in-page offset; if they differ, the TLB misses and the page table in GPU system memory is accessed through the PTW to perform the translation.
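The conventional flow described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the TLB and the page table are modelled as plain dictionaries keyed by virtual page number, and the names `split_virtual_address` and `translate` are hypothetical, not taken from the patent.

```python
PAGE_SHIFT = 16                      # 64KB pages: the offset is bits [15:0]
OFFSET_MASK = (1 << PAGE_SHIFT) - 1


def split_virtual_address(va):
    """Return (virtual page number, in-page offset) of a 32-bit address."""
    return (va >> PAGE_SHIFT) & 0xFFFF, va & OFFSET_MASK


def translate(va, tlb, page_table):
    """Conventional translation: TLB lookup, page table walk on a miss.

    tlb and page_table are dicts mapping a virtual page number to a
    physical page number (an assumption made for this sketch).
    """
    vpn, offset = split_virtual_address(va)
    if vpn not in tlb:               # TLB miss
        tlb[vpn] = page_table[vpn]   # stand-in for the page table walk
    # TLB hit: concatenate the physical page number with the offset
    return (tlb[vpn] << PAGE_SHIFT) | offset


tlb = {}
page_table = {0x1234: 0x00AB}
physical_address = translate(0x12345678, tlb, page_table)
```

After the first call the entry for page `0x1234` is cached in `tlb`, so a second access to the same page takes the hit path.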

The memory page size of a GPU is typically 4KB, so a 4KB memory page is usually called a small page and a memory page larger than 4KB is called a large page. Using small pages on the GPU effectively reduces the latency of transferring memory pages between the CPU and the GPU, while using large pages on the GPU effectively reduces the TLB miss rate. Because GPU tasks with different program characteristics have different requirements for large and small pages, modern GPU systems generally support memory management with multiple page sizes so as to manage the memory space more efficiently.

However, although large pages on the GPU effectively increase the TLB hit rate and improve program performance, they inevitably cause memory fragmentation, both internal and external. Taking a 64KB memory page as an example, if the current task cannot fully use the 64KB of physical memory space, a large amount of space within the page is wasted, and this unused space cannot be allocated to other tasks or to other virtual memory regions of the same task. This is called internal fragmentation. Because of page alignment requirements, if free small pages are scattered across the memory space, no large region of contiguous addresses can be allocated to a large page. This is called external fragmentation. Memory fragmentation wastes space and increases program overhead and storage cost.
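The internal fragmentation described above is simple arithmetic, sketched here with an invented example: a task that needs 20KB is given either one 64KB page or five 4KB pages. The 20KB figure and the function name `internal_fragmentation` are illustrative assumptions, not values from the patent.

```python
KB = 1024


def internal_fragmentation(used_bytes, page_size=64 * KB):
    """Bytes left unused when used_bytes is rounded up to whole pages."""
    pages = -(-used_bytes // page_size)   # ceiling division
    return pages * page_size - used_bytes


wasted_with_large_page = internal_fragmentation(20 * KB)           # one 64KB page
wasted_with_small_pages = internal_fragmentation(20 * KB, 4 * KB)  # five 4KB pages
```

With a single 64KB page, 44KB of the page stays unused, which is the kind of space the nested small page is meant to reclaim.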

Disclosure of Invention

In order to solve some or all of the technical problems in the prior art, the present invention provides a memory page management method and a memory page conversion method for a GPU.

The technical scheme of the invention is as follows:

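In a first aspect, a memory page management method for a GPU is provided, including:

nesting a small physical page in at least one physical page, wherein the memory of the small physical page is smaller than the physical page;

adding a nested page flag bit with a preset memory size to the page table entry of the TLB, wherein the nested page flag bit indicates whether the physical page corresponding to the page table entry of the current TLB is nested with a small physical page and whether a virtual address hits the physical page corresponding to the page table entry of the current TLB;

and performing virtual-real address translation according to the adjusted TLB.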

In some possible implementations, the physical page has a memory size of 64 KB.

In some possible implementations, the memory size of the small physical page is any of 4KB, 8KB, 16KB, and 32 KB.

In some possible implementations, the memory size of the nested page tag bits is 4 bits.

In some possible implementations, the 4 bits of the nested page flag bit are configured as 4'b0000, 4'b0001, 4'b0010, 4'b0100 or 4'b1000 according to whether the physical page is nested with a small physical page, wherein 4'b0000 indicates that the physical page is not nested with a small physical page, 4'b0001 indicates that the physical page is nested with a small physical page having a memory size of 4KB, 4'b0010 indicates that the physical page is nested with a small physical page having a memory size of 8KB, 4'b0100 indicates that the physical page is nested with a small physical page having a memory size of 16KB, and 4'b1000 indicates that the physical page is nested with a small physical page having a memory size of 32KB.

In some possible implementations, the performing virtual-real address translation according to the adjusted TLB includes the following steps:

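S11, calculating the virtual page number and the in-page offset according to the input virtual address and the memory page size;

S12, searching the TLB with the virtual page number and judging whether the TLB is hit; if the TLB is not hit, proceeding to step S13; if the TLB is hit, proceeding to step S14;

S13, accessing the page table stored in the memory through the PTW to obtain the page table entry corresponding to the virtual address, putting the obtained page table entry into the TLB, and returning to step S11;

S14, judging whether the physical page corresponding to the page table entry of the hit TLB is nested with a small physical page; if not, proceeding to step S15; if so, proceeding to step S16;

S15, combining the in-page offset of the virtual address with the physical page number contained in the page table entry of the current TLB to obtain a physical address;

S16, comparing the preset bits of the in-page offset of the virtual address with the nested page flag bit contained in the page table entry of the current TLB, and determining whether the virtual address falls in the nested small physical page; if the virtual address falls in the nested small physical page, accessing the page table stored in the memory through the PTW to obtain the page table entry corresponding to the virtual address, putting the obtained page table entry into the TLB, and returning to step S11; if the virtual address falls outside the nested small physical page, combining the in-page offset of the virtual address with the physical page number contained in the page table entry of the current TLB to obtain the physical address.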

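In some possible implementations, the determining whether the TLB hits in step S12 includes:

if the TLB is fully associative, sequentially comparing each page table entry in the TLB with the virtual page number to judge whether the TLB is hit;

if the TLB is set-associative, using the virtual page number to calculate the TLB set number, and sequentially comparing each page table entry in the corresponding TLB set with the corresponding bits of the virtual page number to judge whether the TLB is hit.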

In some possible implementations, when the virtual address is 32 bits and the memory page size is 64KB, the virtual page number is [31:16] bits of the virtual address and the intra-page offset is [15:0] bits of the virtual address.

In some possible implementations, when the virtual address is 32 bits and the memory page size is 64KB, the preset bits of the in-page offset in step S16 are bits [15:12].

In a second aspect, a memory page conversion method for a GPU is provided, including the following steps:

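S21, inputting the virtual address of the memory access request;

S22, calculating the virtual page number and the in-page offset according to the input virtual address and the memory page size;

S23, searching the TLB with the virtual page number and judging whether the TLB is hit; if the TLB is not hit, proceeding to step S24; if the TLB is hit, proceeding to step S25;

S24, accessing the page table stored in the memory through the PTW to obtain the page table entry corresponding to the virtual address, putting the obtained page table entry into the TLB, and returning to step S22;

S25, judging whether the physical page corresponding to the page table entry of the hit TLB is nested with a small physical page; if not, proceeding to step S26; if so, proceeding to step S27;

S26, combining the in-page offset of the virtual address with the physical page number contained in the page table entry of the current TLB to obtain a physical address;

S27, comparing the preset bits of the in-page offset of the virtual address with the nested page flag bit contained in the page table entry of the current TLB, and determining whether the virtual address falls in the nested small physical page; if the virtual address falls in the nested small physical page, accessing the page table stored in the memory through the PTW to obtain the page table entry corresponding to the virtual address, putting the obtained page table entry into the TLB, and returning to step S22; if the virtual address falls outside the nested small physical page, combining the in-page offset of the virtual address with the physical page number contained in the page table entry of the current TLB to obtain the physical address.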

The technical scheme of the invention has the following main advantages:

According to the memory page management method and the memory page conversion method for the GPU, a small physical page is nested in a larger physical page. The small physical page makes use of space in the larger physical page that would otherwise go unused, which reduces the likelihood of internal fragmentation. Nesting also prevents large numbers of free small pages from being scattered across the memory space, which reduces the likelihood of external fragmentation. Together these improve the utilization of the memory space and reduce storage cost. In addition, a nested page flag bit is added to the page table entry of the TLB, so that whether a memory access request hits the TLB and whether the physical page is nested with a small physical page can be judged accurately and efficiently, which enables virtual-real address translation based on memory page nesting.

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the drawings without creative efforts.

FIG. 1 is a schematic diagram illustrating a virtual-real address translation process corresponding to a 64KB memory page;

FIG. 2 is a flowchart illustrating a memory page management method for a GPU according to an embodiment of the present invention;

FIG. 3 is a flowchart illustrating virtual-real address translation according to an embodiment of the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention more apparent, the technical solutions of the present invention will be clearly and completely described below with reference to the specific embodiments of the present invention and the accompanying drawings. It is to be understood that the described embodiments are merely a few embodiments of the invention, and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments of the present invention without making any creative effort, shall fall within the protection scope of the present invention.

The technical solution provided by an embodiment of the present invention is described in detail below with reference to the accompanying drawings.

Referring to fig. 2, in a first aspect, an embodiment of the present invention provides a memory page management method for a GPU, where the method includes:

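nesting a small physical page in at least one physical page, wherein the memory of the small physical page is smaller than the physical page;

adding a nested page flag bit with a preset memory size to the page table entry of the TLB, wherein the nested page flag bit indicates whether the physical page corresponding to the page table entry of the current TLB is nested with a small physical page and whether a virtual address hits the physical page corresponding to the page table entry of the current TLB;

and performing virtual-real address translation according to the adjusted TLB.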

In the embodiment of the invention, a small physical page is nested in a larger physical page. The small physical page makes use of space in the larger physical page that would otherwise go unused, which reduces the likelihood of internal fragmentation. Nesting also prevents large numbers of free small pages from being scattered across the memory space, which reduces the likelihood of external fragmentation. Together these improve the utilization of the memory space and reduce storage cost. In addition, a nested page flag bit is added to the page table entry of the TLB, so that whether a memory access request hits the TLB and whether the physical page is nested with a small physical page can be judged accurately and efficiently, which enables virtual-real address translation based on memory page nesting.

Further, the memory size of the physical page in which a small physical page is nested may be 64KB. That is, only physical pages with a memory size of 64KB in the GPU system memory are selected for nesting small physical pages. The number of physical pages in which small physical pages are nested may be determined according to the actual program overhead and the memory usage requirements.

Because a small physical page must be smaller than the physical page that contains it, when the containing physical page is 64KB the memory size of the small physical page may be any one of 4KB, 8KB, 16KB and 32KB. For example, a physical page may be nested with a small physical page of 4KB or of another size, and the small physical pages nested in different physical pages may differ in size.

Further, where a 64KB physical page is nested with a small physical page of 4KB, 8KB, 16KB or 32KB, the nested page flag bit in the present application is 4 bits wide, so that whether a memory access request hits the TLB and whether the physical page is nested with a small physical page can be judged accurately and efficiently. The 4 bits may be configured to indicate whether the physical page corresponding to the page table entry of the current TLB is nested with a small physical page and whether the virtual address hits the physical page corresponding to the page table entry of the current TLB.

Optionally, the 4 bits of the nested page flag bit are configured as 4'b0000, 4'b0001, 4'b0010, 4'b0100 or 4'b1000 according to whether the physical page is nested with a small physical page, wherein 4'b0000 indicates that the physical page is not nested with a small physical page, 4'b0001 indicates that the physical page is nested with a small physical page having a memory size of 4KB, 4'b0010 indicates that the physical page is nested with a small physical page having a memory size of 8KB, 4'b0100 indicates that the physical page is nested with a small physical page having a memory size of 16KB, and 4'b1000 indicates that the physical page is nested with a small physical page having a memory size of 32KB.
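The encoding above can be written as a small lookup table. The following Python sketch is illustrative only; the names `NEST_FLAG_TO_SIZE`, `decode_nest_flag` and `encode_nest_flag` are hypothetical, and rejecting other bit patterns is an assumption, since the text lists only five valid values.

```python
KB = 1024

# One-hot 4-bit flag -> nested small-page size in bytes (0 means "not nested").
NEST_FLAG_TO_SIZE = {
    0b0000: 0,
    0b0001: 4 * KB,
    0b0010: 8 * KB,
    0b0100: 16 * KB,
    0b1000: 32 * KB,
}
SIZE_TO_NEST_FLAG = {size: flag for flag, size in NEST_FLAG_TO_SIZE.items()}


def decode_nest_flag(flag):
    """Return the nested small-page size for a 4-bit flag value."""
    if flag not in NEST_FLAG_TO_SIZE:
        # Assumption: patterns outside the five listed values are invalid.
        raise ValueError(f"invalid nested page flag: {flag:#06b}")
    return NEST_FLAG_TO_SIZE[flag]


def encode_nest_flag(small_page_size):
    """Return the 4-bit flag for a small-page size (0 for no nesting)."""
    if small_page_size not in SIZE_TO_NEST_FLAG:
        raise ValueError(f"unsupported small page size: {small_page_size}")
    return SIZE_TO_NEST_FLAG[small_page_size]
```

One consequence of the one-hot layout is that the flag value read as an integer, multiplied by 4KB, equals the nested page size.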

Further, when a virtual memory address arrives, the virtual address of the memory access request accesses the TLB to perform virtual-real address translation.

Specifically, referring to fig. 3, the virtual-real address translation according to the adjusted TLB includes the following steps:

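S11, calculating the virtual page number and the in-page offset according to the input virtual address and the memory page size;

S12, searching the TLB with the virtual page number and judging whether the TLB is hit; if the TLB is not hit, proceeding to step S13; if the TLB is hit, proceeding to step S14;

S13, accessing the page table stored in the memory through the PTW to obtain the page table entry corresponding to the virtual address, putting the obtained page table entry into the TLB, and returning to step S11;

S14, judging whether the physical page corresponding to the page table entry of the hit TLB is nested with a small physical page; if not, proceeding to step S15; if so, proceeding to step S16;

S15, combining the in-page offset of the virtual address with the physical page number contained in the page table entry of the current TLB to obtain a physical address;

S16, comparing the preset bits of the in-page offset of the virtual address with the nested page flag bit contained in the page table entry of the current TLB, and determining whether the virtual address falls in the nested small physical page; if the virtual address falls in the nested small physical page, accessing the page table stored in the memory through the PTW to obtain the page table entry corresponding to the virtual address, putting the obtained page table entry into the TLB, and returning to step S11; if the virtual address falls outside the nested small physical page, combining the in-page offset of the virtual address with the physical page number contained in the page table entry of the current TLB to obtain the physical address.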

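Further, in step S12, determining whether the TLB is hit includes:

if the TLB is fully associative, sequentially comparing each page table entry in the TLB with the virtual page number to judge whether the TLB is hit;

if the TLB is set-associative, using the virtual page number to calculate the TLB set number, and sequentially comparing each page table entry in the corresponding TLB set with the corresponding bits of the virtual page number to judge whether the TLB is hit.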
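The two hit checks, for a fully associative and for a set-associative TLB, can be sketched as follows. This is a minimal sketch under assumptions that the patent does not state: entries are dictionaries with `valid`, `tag` and `ppn` fields, the set count is a power of two, and the low bits of the virtual page number select the set while the remaining bits form the tag.

```python
def lookup_fully_associative(entries, vpn):
    """Compare the virtual page number against every valid entry."""
    for entry in entries:
        if entry["valid"] and entry["tag"] == vpn:
            return entry
    return None                       # TLB miss


def lookup_set_associative(sets, vpn):
    """Select a set from the VPN, then compare tags within that set.

    Assumption: low VPN bits index the set; the remaining bits are the tag.
    """
    num_sets = len(sets)              # assumed to be a power of two
    index = vpn % num_sets
    tag = vpn // num_sets
    for entry in sets[index]:
        if entry["valid"] and entry["tag"] == tag:
            return entry
    return None                       # TLB miss
```

In hardware the comparisons within a set usually happen in parallel; the sequential loop mirrors the wording of the text rather than a real circuit.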

Further, when the virtual address is 32 bits and the memory page size is 64KB, the virtual page number is bits [31:16] of the virtual address, the in-page offset is bits [15:0] of the virtual address, and the preset bits of the in-page offset in step S16 are bits [15:12].
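The bit ranges above translate directly into shifts and masks. The Python sketch below extracts the three fields from a 32-bit virtual address; the function name `split_fields` is hypothetical.

```python
def split_fields(va):
    """Split a 32-bit virtual address for a 64KB page size.

    Returns (virtual page number [31:16], in-page offset [15:0],
    preset bits [15:12]).
    """
    vpn = (va >> 16) & 0xFFFF
    offset = va & 0xFFFF
    preset = (va >> 12) & 0xF
    return vpn, offset, preset


vpn, offset, preset = split_fields(0x1234F010)
```

Bits [15:12] take 16 values, so they identify which of the sixteen 4KB chunks of the 64KB page an address falls in.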

Further, in step S14, whether the physical page corresponding to the page table entry of the hit TLB is nested with a small physical page may be determined from the nested page flag bit of that page table entry. Specifically, based on the configuration of the 4 bits of the nested page flag bit, if all of the bits are 0, the physical page corresponding to the page table entry of the hit TLB is not nested with a small physical page; if the bits are not all 0, the physical page corresponding to the page table entry of the hit TLB is nested with a small physical page.

In a second aspect, an embodiment of the present application further provides a memory page conversion method for a GPU, where the method includes the following steps:

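S21, inputting the virtual address of the memory access request;

S22, calculating the virtual page number and the in-page offset according to the input virtual address and the memory page size;

S23, searching the TLB with the virtual page number and judging whether the TLB is hit; if the TLB is not hit, proceeding to step S24; if the TLB is hit, proceeding to step S25;

S24, accessing the page table stored in the memory through the PTW to obtain the page table entry corresponding to the virtual address, putting the obtained page table entry into the TLB, and returning to step S22;

S25, judging whether the physical page corresponding to the page table entry of the hit TLB is nested with a small physical page; if not, proceeding to step S26; if so, proceeding to step S27;

S26, combining the in-page offset of the virtual address with the physical page number contained in the page table entry of the current TLB to obtain a physical address;

S27, comparing the preset bits of the in-page offset of the virtual address with the nested page flag bit contained in the page table entry of the current TLB, and determining whether the virtual address falls in the nested small physical page; if the virtual address falls in the nested small physical page, accessing the page table stored in the memory through the PTW to obtain the page table entry corresponding to the virtual address, putting the obtained page table entry into the TLB, and returning to step S22; if the virtual address falls outside the nested small physical page, combining the in-page offset of the virtual address with the physical page number contained in the page table entry of the current TLB to obtain the physical address.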
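The conversion flow of steps S21 to S27 can be sketched end to end in Python. This is one possible reading, not the patented implementation, and it rests on several assumptions that the text does not settle: the nested small page is assumed to occupy the highest-addressed part of the 64KB page; the TLB and page tables are dictionaries; small-page entries are keyed by `(shift, va >> shift)`; and the "return to step S22" after the second page table walk is modelled as a direct second lookup with the small page size. All names are hypothetical.

```python
LARGE_SHIFT = 16                                   # 64KB page
# Nested page flag -> log2 of the small page size (4KB, 8KB, 16KB, 32KB).
FLAG_TO_SHIFT = {0b0001: 12, 0b0010: 13, 0b0100: 14, 0b1000: 15}


def in_nested_region(offset, nest_flag):
    """S27 check. ASSUMPTION: the small page sits at the top of the 64KB
    page; the source does not specify its position."""
    small_size = 1 << FLAG_TO_SHIFT[nest_flag]
    return offset >= (1 << LARGE_SHIFT) - small_size


def translate(va, tlb, large_table, small_table):
    """Translate a 32-bit virtual address (S21) to a physical address.

    large_table: 64KB vpn -> (ppn, nest_flag)
    small_table: (shift, va >> shift) -> small-page ppn
    tlb: caches entries from both tables (assumed layout).
    """
    vpn = va >> LARGE_SHIFT                        # S22
    offset = va & ((1 << LARGE_SHIFT) - 1)
    key = (LARGE_SHIFT, vpn)
    if key not in tlb:                             # S23 miss -> S24
        tlb[key] = large_table[vpn]                # stand-in for the PTW
    ppn, nest_flag = tlb[key]
    # S25: flag all zero means no nesting; S27: address outside small page
    if nest_flag == 0 or not in_nested_region(offset, nest_flag):
        return (ppn << LARGE_SHIFT) | offset       # S26
    # S27: address falls in the nested small page, so walk again
    shift = FLAG_TO_SHIFT[nest_flag]
    small_key = (shift, va >> shift)
    if small_key not in tlb:
        tlb[small_key] = small_table[small_key]    # stand-in for the PTW
    return (tlb[small_key] << shift) | (va & ((1 << shift) - 1))


large_table = {0x1234: (0x00AB, 0b0001), 0x2000: (0x00CD, 0b0000)}
small_table = {(12, 0x1234F): 0x00777}
tlb = {}
pa_outside = translate(0x12345678, tlb, large_table, small_table)
pa_inside = translate(0x1234F010, tlb, large_table, small_table)
pa_plain = translate(0x2000FFFF, tlb, large_table, small_table)
```

Under the top-of-page assumption, the range check is equivalent to testing that the upper one to four bits of offset bits [15:12] are all ones, which is one way the comparison between the preset bits and the one-hot flag could be realised.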

Therefore, the memory page management method and the memory page conversion method for the GPU provided by the embodiment of the present invention can improve the utilization efficiency of the memory space, reduce the storage cost, and reduce the problem of memory fragmentation in the multitask GPU.

It is noted that, in this document, relational terms such as "first" and "second," and the like, may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. In addition, "front", "rear", "left", "right", "upper" and "lower" in this document are referred to the placement states shown in the drawings.

Finally, it should be noted that: the above examples are only for illustrating the technical solutions of the present invention, and not for limiting the same; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.
