Method, system, equipment and readable medium for multi-core memory consistency

Document No.: 1270913, published 2020-08-25

Reading note: This technology (Method, system, equipment and readable medium for multi-core memory consistency) was designed and created by 刘同强, 王朝辉, 李拓, 周玉龙 and 邹晓峰 on 2020-05-06. Its main content is as follows. The invention discloses a method for multi-core memory consistency, comprising the following steps: receiving a command sent by a main processor and determining whether the command hits in a main cache; in response to the command missing in the main cache, determining the type of the command; in response to the command being a read command, sending the read command to a processor in the same cluster and determining whether that processor returns a data response; in response to the same-cluster processor not returning a data response, sending the read command to an off-chip storage controller and checking the state bit of the off-chip storage controller; and in response to the state of the off-chip storage controller being unused, reading data from the off-chip memory to update the main cache data and returning the data to the main processor. The invention also discloses a system, a computer device and a readable storage medium for multi-core memory consistency. By designing a multi-core memory consistency protocol suited to the embedded field, the invention improves the performance and efficiency of data processing and reduces cost.

1. A method of multi-core memory coherency, comprising the steps of:

receiving a command sent by a main processor, and determining whether the command hits in a main cache;

in response to the command missing in the main cache, determining the type of the command;

in response to the command being a read command, sending the read command to a processor in the same cluster and determining whether the same-cluster processor returns a data response;

in response to the same-cluster processor not returning a data response, sending the read command to an off-chip storage controller and checking the state bit of the off-chip storage controller; and

in response to the state of the off-chip storage controller being unused, reading data from the off-chip memory to update the main cache data, and returning the data to the main processor.

2. The method of multi-core memory coherency of claim 1, further comprising:

in response to the command hitting in the main cache, determining the type of the command;

in response to the command being a read command, returning a data response to the main processor; and

in response to the command being a write command, updating the main cache data.

3. The method of multi-core memory coherency of claim 1, further comprising:

in response to the command being a write command, sending the write command to the off-chip storage controller;

receiving, at the main cache, a data response returned by the off-chip storage controller and replacing data in the main cache; and

writing the replaced data back to the off-chip storage controller and updating it in the off-chip memory.

4. The method of claim 1, wherein sending the read command to the same-cluster processor and determining whether the same-cluster processor returns a data response comprises:

in response to the same-cluster processor returning a data response, updating the main cache data and returning the data response to the main processor.

5. The method of multi-core memory coherency of claim 1, further comprising:

in response to the state of the off-chip storage controller being used, writing the cache data of the processor cluster that is using the off-chip memory back to the off-chip memory, and updating the state of the off-chip storage controller to unused.

6. The method of claim 1, wherein sending the read command to the processors in the same cluster comprises: sending the read command to the processors in the same cluster one after another;

the method further comprising: in response to any one of the processors in the same cluster returning a data response, stopping sending the read command to the other processors in the same cluster, updating the main cache data, and returning the data response to the main processor.

7. The method of multi-core memory coherency of claim 1, wherein reading data from the off-chip memory to update the main cache data and returning the data to the main processor further comprises: updating the state of the off-chip storage controller to used.

8. A system for multi-core memory coherency, comprising:

a main cache judging module configured to receive a command sent by a main processor and determine whether the command hits in a main cache;

a command judging module configured to determine the type of the command in response to the command missing in the main cache;

a same-cluster processor judging module configured to, in response to the command being a read command, send the read command to a processor in the same cluster and determine whether the same-cluster processor returns a data response;

an off-chip storage controller judging module configured to, in response to the same-cluster processor not returning a data response, send the read command to an off-chip storage controller and check the state bit of the off-chip storage controller; and

a processing module configured to, in response to the state of the off-chip storage controller being unused, read data from the off-chip memory to update the main cache data and return the data to the main processor.

9. A computer device, comprising:

at least one processor; and

a memory storing computer instructions executable on the processor, the instructions, when executed by the processor, implementing the steps of the method according to any one of claims 1 to 7.

10. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the steps of the method according to any one of claims 1 to 7.

Technical Field

The present invention relates to the field of multi-core processing technologies, and in particular, to a method, a system, a device, and a readable medium for multi-core memory consistency.

Background

With the development of science and technology, computers have come into wide use in many fields, and the rapid progress of processor technology has made computing ubiquitous. The processor is one of the main components of a computer and its core part: it interprets computer instructions and processes the data of computer software. As demands for computing speed and scale keep growing, the performance of a single-processor computer system cannot be increased indefinitely because of limits on chip speed and fabrication technology, which has made multiprocessor technology the natural way forward.

In a multiprocessor, caches hold both shared and private data. Private data is used by a single processor, whereas shared data is used by several processors. In essence, processors communicate by reading and writing shared data, which makes cache coherency an essential technique for multiprocessors.

The Illinois protocol, MESI, is a widely used cache coherency protocol that supports a write-back policy. MESI is an acronym for the four states a cache line can be in: Modified, Exclusive, Shared, and Invalid. Every cache line in a multiprocessor system is in exactly one of these four states. The MESI protocol is a state machine that handles requests from the local processor as well as information broadcast on the bus.
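For reference, the conventional MESI transitions for a single cache line can be sketched in Python as below. This models the textbook protocol described in the preceding paragraph, not the invention's scheme; the event names and the `others_have_copy` flag are illustrative assumptions.

```python
from enum import Enum


class State(Enum):
    MODIFIED = "M"
    EXCLUSIVE = "E"
    SHARED = "S"
    INVALID = "I"


def next_state(state: State, event: str, others_have_copy: bool = False) -> State:
    """Return the next MESI state of one cache line.

    "local_read" / "local_write" come from this cache's own processor;
    "bus_read" / "bus_write" are requests snooped from other caches.
    others_have_copy only matters for a local read miss.
    """
    if event == "local_read":
        if state is State.INVALID:
            # Read miss: Exclusive if no other cache holds the line, else Shared.
            return State.SHARED if others_have_copy else State.EXCLUSIVE
        return state  # M, E and S can all satisfy a read locally
    if event == "local_write":
        # Every local write ends in Modified; from S or I the other copies are
        # invalidated over the bus first, while E -> M needs no bus traffic.
        return State.MODIFIED
    if event == "bus_read":
        # Another cache reads the line: M writes its data back, M and E drop to S.
        return State.INVALID if state is State.INVALID else State.SHARED
    if event == "bus_write":
        # Another cache is about to write: any copy held here becomes Invalid
        # (an M copy is written back first).
        return State.INVALID
    raise ValueError(f"unknown event: {event}")
```

A cache that holds a line in state E can therefore upgrade to M without any bus transaction, which is the main advantage MESI offers over a plain three-state protocol.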

Disclosure of Invention

In view of this, embodiments of the present invention provide a method, a system, a device, and a readable medium for multi-core memory consistency. By means of a multi-core memory consistency protocol designed for the embedded field, they improve the performance and efficiency of data processing and reduce cost.

In view of the above, one aspect of the embodiments of the present invention provides a method for multi-core memory consistency, comprising the following steps: receiving a command sent by a main processor and determining whether the command hits in a main cache; in response to the command missing in the main cache, determining the type of the command; in response to the command being a read command, sending the read command to a processor in the same cluster and determining whether that processor returns a data response; in response to the same-cluster processor not returning a data response, sending the read command to an off-chip storage controller and checking the state bit of the off-chip storage controller; and in response to the state of the off-chip storage controller being unused, reading data from the off-chip memory to update the main cache data and returning the data to the main processor.

In some embodiments, the method further comprises: in response to the command hitting in the main cache, determining the type of the command; in response to the command being a read command, returning a data response to the main processor; and in response to the command being a write command, updating the main cache data.

In some embodiments, the method further comprises: in response to the command being a write command, sending the write command to the off-chip storage controller; receiving, at the main cache, a data response returned by the off-chip storage controller and replacing data in the main cache; and writing the replaced data back to the off-chip storage controller, which updates it in the off-chip memory.

In some embodiments, sending the read command to the same-cluster processor and determining whether it returns a data response comprises: in response to the same-cluster processor returning a data response, updating the main cache data and returning the data response to the main processor.

In some embodiments, the method further comprises: in response to the state of the off-chip storage controller being used, writing the data of the off-chip storage controller into the off-chip storage, and checking the state of the off-chip storage controller again.

In some embodiments, sending the read command to the same-cluster processors comprises: sending the read command to the processors in the same cluster one after another;

the method further comprises: in response to any one of the processors in the same cluster returning a data response, stopping sending the read command to the other processors in the same cluster, updating the main cache data, and returning the data response to the main processor.

In some embodiments, reading data from the off-chip memory to update the main cache data and returning the data to the main processor further comprises: updating the state of the off-chip storage controller to used.

In another aspect of the embodiments of the present invention, a system for multi-core memory consistency is further provided, comprising: a main cache judging module configured to receive a command sent by the main processor and determine whether the command hits in the main cache; a command judging module configured to determine the type of the command in response to the command missing in the main cache; a same-cluster processor judging module configured to, in response to the command being a read command, send the read command to a processor in the same cluster and determine whether that processor returns a data response; an off-chip storage controller judging module configured to, in response to the same-cluster processor not returning a data response, send the read command to the off-chip storage controller and check the state bit of the off-chip storage controller; and a processing module configured to, in response to the state of the off-chip storage controller being unused, read data from the off-chip memory to update the main cache data and return the data to the main processor.

In another aspect of the embodiments of the present invention, there is also provided a computer device, including: at least one processor; and a memory storing computer instructions executable on the processor, the instructions being executable by the processor to implement the method steps as above.

In a further aspect of the embodiments of the present invention, a computer-readable storage medium is also provided, which stores a computer program that, when executed by a processor, implements the above method steps.

The invention has the following beneficial technical effects: by designing a multi-core memory consistency protocol suited to the embedded field, it improves the performance and efficiency of data processing and reduces cost.

Drawings

To illustrate the technical solutions in the embodiments of the present invention or in the prior art more clearly, the drawings used in describing the embodiments or the prior art are briefly introduced below. The drawings described below show only some embodiments of the present invention; those of ordinary skill in the art can derive other embodiments from them without creative effort.

FIG. 1 is a diagram illustrating an embodiment of a method for multi-core memory coherency according to the present invention;

FIG. 2 is a block diagram of a method for multi-core memory coherency according to an embodiment of the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention more apparent, the following embodiments of the present invention are described in further detail with reference to the accompanying drawings.

It should be noted that all expressions using "first" and "second" in the embodiments of the present invention serve to distinguish two non-identical entities or parameters that share the same name. "First" and "second" are used merely for convenience of description and should not be construed as limiting the embodiments of the present invention; this is not repeated in the following embodiments.

In view of the above objects, a first aspect of the embodiments of the present invention provides an embodiment of a method for multi-core memory coherency. FIG. 1 is a diagram illustrating an embodiment of a method for multi-core memory coherency according to the present invention. As shown in FIG. 1, the embodiment of the present invention includes the following steps:

S1, receiving a command sent by the main processor, and determining whether the command hits in the main cache;

S2, in response to the command missing in the main cache, determining the type of the command;

S3, in response to the command being a read command, sending the read command to the processors in the same cluster and determining whether they return a data response;

S4, in response to the same-cluster processors not returning a data response, sending the read command to the off-chip storage controller and checking the state bit of the off-chip storage controller; and

S5, in response to the state of the off-chip storage controller being unused, reading data from the off-chip memory to update the main cache data and returning the data to the main processor.
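The branching in steps S1 to S5 can be sketched as the following Python function, which only reports the path a command takes. This is a minimal sketch: the parameter names and path labels are hypothetical, and the write-miss and used-state branches follow the further embodiments in this description rather than S1 to S5 themselves.

```python
def handle_command(is_write: bool, main_cache_hit: bool,
                   peer_responded: bool, offchip_state_used: bool) -> str:
    """Return which path one command takes (hypothetical path labels)."""
    # S1: the main cache checks whether the command hits.
    if main_cache_hit:
        # Hit embodiment: reads return data, writes update the cached line.
        return "write hit" if is_write else "read hit"
    # S2: on a miss, determine the command type.
    if is_write:
        # Write-miss embodiment: fetch from the off-chip controller, write back the victim.
        return "write miss via off-chip controller"
    # S3: read miss, so ask the same-cluster processor(s) for the data.
    if peer_responded:
        return "read miss served by same cluster"
    # S4: no response, so the off-chip storage controller checks its state bit.
    if offchip_state_used:
        # Used-state embodiment: another cluster gives the line up first.
        return "read miss, other cluster written back first"
    # S5: state bit unused: read off-chip memory, fill the main cache, return data.
    return "read miss served by off-chip memory"
```

Only a read miss consults the same-cluster processors and the controller's state bit; hits complete in the main cache without any further traffic.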

In this embodiment, FIG. 2 is a block diagram illustrating an embodiment of a method for multi-core memory consistency according to the present invention. As shown in FIG. 2, the architecture comprises multi-core RISC-V processor clusters and off-chip storage. The multi-core chip uses RISC-V as its instruction set and internally contains several processor clusters of 2 cores each, a cache for each processor, and an off-chip storage DRAM controller. The cache state transition rules are as follows: an invalid line becomes valid on a local read or write and stays invalid on a remote read or write; a valid line becomes invalid on a remote read or write and stays valid on a local read or write. The state transition rules for the off-chip storage DRAM controller are: the unused state changes to the used state on a read operation, and the used state changes to the unused state on a write operation.
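The two-state rules above can be sketched as follows. Assumptions: a cache line is modeled as a boolean (True = valid), the controller's status bit as a boolean (True = used), and `apply_access` is a hypothetical helper that applies one access to every cache's copy of a single address. Under these rules any access leaves exactly one cache holding a valid copy, which is the property the description relies on.

```python
def cache_next(valid: bool, op: str) -> bool:
    """Next state of one cache line under the two-state rule (True = valid)."""
    if op in ("local_read", "local_write"):
        return True   # a local read/write makes (or keeps) the line valid
    if op in ("remote_read", "remote_write"):
        return False  # a remote read/write makes (or keeps) the line invalid
    raise ValueError(f"unknown op: {op}")


def offchip_status_next(used: bool, op: str) -> bool:
    """Next value of the off-chip DRAM controller's status bit (True = used).

    Assumption: a read always leaves the bit used and a write always leaves
    it unused, so the current value does not affect the result; the text
    only states the unused -> used and used -> unused cases.
    """
    if op == "read":
        return True
    if op == "write":
        return False
    raise ValueError(f"unknown op: {op}")


def apply_access(valid: list[bool], cpu: int, kind: str) -> list[bool]:
    """Apply a 'read' or 'write' by `cpu` to every cache's copy of one address."""
    return [cache_next(v, ("local_" if i == cpu else "remote_") + kind)
            for i, v in enumerate(valid)]


copies = [False, False, False, False]      # one address, four caches
copies = apply_access(copies, 0, "read")   # [True, False, False, False]
copies = apply_access(copies, 2, "write")  # [False, False, True, False]
```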

In this embodiment, processor 0 sends a read command to its cache0. The cache0 controller queries the local cache; on a miss, it sends a request to processor 1 and checks whether a data response comes back. If not, cache0 sends a read request to the off-chip DRAM controller, which checks whether the status bit is 0, that is, whether the state is unused. If it is 0, the controller reads the off-chip DRAM data, returns a data response to the cache0 controller, and updates the corresponding status bit. Cache0 writes the data into the cache and returns it to processor 0, and processing ends. This scheme ensures that at any given time only one processor holds valid data for a given address, which greatly reduces conflict scenarios, that is, the handling of multiple requests to the same address. Read and write requests to the same address are executed serially without interruption.
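The read-miss path just described can be sketched in Python under stated assumptions: a cache is a dict holding only its valid lines (address to data), `OffChipController` and `processor0_read` are hypothetical names, and the used-state case is left to the separate embodiment described later, so this sketch simply raises for it.

```python
from dataclasses import dataclass, field


@dataclass
class OffChipController:
    memory: dict[int, int]                               # off-chip DRAM contents
    used: dict[int, bool] = field(default_factory=dict)  # status bit per address

    def read(self, addr: int) -> int:
        if self.used.get(addr, False):
            # Status bit 1 (used) belongs to a separate embodiment.
            raise NotImplementedError("status bit is 1 (used)")
        self.used[addr] = True       # status bit 0 -> 1 after the read
        return self.memory[addr]


def processor0_read(cache0: dict[int, int], cache1: dict[int, int],
                    ctrl: OffChipController, addr: int) -> int:
    """Read issued by processor 0 through cache0."""
    if addr in cache0:               # local hit
        return cache0[addr]
    if addr in cache1:               # processor 1 returns a data response
        cache0[addr] = cache1.pop(addr)
        return cache0[addr]
    data = ctrl.read(addr)           # no response: ask the off-chip controller
    cache0[addr] = data              # cache0 updates the line ...
    return data                      # ... and returns the data to processor 0
```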

In some embodiments of the invention, the method further comprises: in response to the command hitting in the main cache, determining the type of the command; in response to the command being a read command, returning a data response to the main processor; and in response to the command being a write command, updating the main cache data.

In this embodiment, processor 0 sends a read command to its own cache0; the cache0 controller queries the local cache and, on a hit, returns a data response directly and ends processing. Likewise, when processor 0 sends a write command to cache0 and the lookup hits, the controller updates the data directly and ends processing.
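The hit path is the shortest case. A minimal sketch, assuming the cache is a dict of valid lines and using a hypothetical `on_command` helper that reports `(hit, data)` so that a miss can be handed on to the miss handling:

```python
from typing import Optional, Tuple


def on_command(cache0: dict[int, int], addr: int, is_write: bool,
               value: int = 0) -> Tuple[bool, Optional[int]]:
    """Handle one command in the cache0 controller; return (hit, data).

    On a hit the command completes here. On a miss, (False, None) is
    returned and the caller continues with the miss handling.
    """
    if addr not in cache0:
        return False, None
    if is_write:
        cache0[addr] = value     # write hit: update the line in place and finish
        return True, value
    return True, cache0[addr]    # read hit: return the data response directly
```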

In some embodiments of the invention, the method further comprises: in response to the command being a write command, sending the write command to the off-chip storage controller; receiving, at the main cache, a data response returned by the off-chip storage controller and replacing data in the main cache; and writing the replaced data back to the off-chip storage controller, which updates it in the off-chip memory.

In this embodiment, processor 0 sends a write command to its own cache0, and the cache0 controller queries the local cache. On a miss, it sends a request command to the off-chip DRAM controller, which returns the corresponding data to the cache0 controller. The cache0 controller updates the data and writes the replaced data back to the off-chip DRAM controller, which updates it in the off-chip DRAM, and processing ends.
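A sketch of the write-miss path follows. It relies on several assumptions that the text does not state. Lines are lists of words. A full cache replaces its oldest-inserted line (FIFO). Fetching a line is treated as a read that sets its status bit to used, and writing back the replaced line is treated as a write that clears the bit. All class and method names are hypothetical.

```python
from collections import OrderedDict


class OffChipController:
    def __init__(self, memory: dict[int, list[int]]):
        self.memory = memory                 # line address -> list of words
        self.used: dict[int, bool] = {}      # status bit per line

    def fetch(self, line: int) -> list[int]:
        self.used[line] = True               # assumption: a fetch counts as a read
        return list(self.memory[line])       # copy, so the cache owns its data

    def write_back(self, line: int, data: list[int]) -> None:
        self.memory[line] = data             # update the off-chip DRAM
        self.used[line] = False              # a write turns the bit back to unused


class Cache:
    def __init__(self, capacity: int):
        self.capacity = capacity
        self.lines = OrderedDict()           # line address -> words, oldest first

    def write(self, line: int, offset: int, value: int,
              ctrl: OffChipController) -> None:
        if line not in self.lines:
            data = ctrl.fetch(line)                       # miss: request the line
            if len(self.lines) >= self.capacity:          # full: replace oldest line
                victim, victim_data = self.lines.popitem(last=False)
                ctrl.write_back(victim, victim_data)      # write replaced data back
            self.lines[line] = data                       # cache0 updates the line
        self.lines[line][offset] = value                  # apply the write
```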

In some embodiments of the present invention, sending the read command to the same-cluster processor and determining whether it returns a data response comprises: in response to the same-cluster processor returning a data response, updating the main cache data and returning the data response to the main processor.

In this embodiment, processor 0 sends a read command to its cache0, and the cache0 controller queries the local cache. On a miss, it sends a request command to processor 1 and checks whether a data response is returned. If so, cache0 writes the data into the cache, returns the data response to processor 0, and updates the corresponding status bit; the cache1 controller invalidates its corresponding status bit, and processing ends.
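The same-cluster response path can be sketched as follows. The sketch assumes caches are dicts holding only their valid lines, and removing the entry from `cache1` stands in for the cache1 controller invalidating its status bit. `read_via_peer` is a hypothetical name.

```python
from typing import Optional


def read_via_peer(cache0: dict[int, int], cache1: dict[int, int],
                  addr: int) -> Optional[int]:
    """Processor 0 read satisfied by processor 1's cache if possible.

    Returns None when processor 1 has no data response, i.e. the off-chip
    path is next.
    """
    if addr in cache0:
        return cache0[addr]           # local hit
    data = cache1.pop(addr, None)     # cache1 responds and invalidates its copy
    if data is None:
        return None                   # no data response from processor 1
    cache0[addr] = data               # cache0 stores the line and marks it valid
    return data                       # data response to processor 0
```

Handing the line over this way, instead of letting both caches keep a copy, preserves the single-valid-copy property.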

In some embodiments of the invention, the method further comprises: in response to the state of the off-chip storage controller being used, writing the data of the off-chip storage controller into the off-chip storage, and checking the state of the off-chip storage controller again.

In this embodiment, processor 0 sends a read command to its own cache0, and the cache0 controller queries the local cache. On a miss, it sends a request command to processor 1 and checks whether a data response is returned. If not, cache0 sends a read request to the off-chip storage DRAM controller, which checks whether the status bit is 0, that is, whether the state is unused. If the status bit is 1 (used), the controller sends an invalidate command to cluster 1. The controllers of cache2 and cache3 in cluster 1 check their own status bits; a cache that holds the address named in the invalidate request writes its data back to the off-chip storage DRAM controller and invalidates the corresponding status bit. The off-chip storage DRAM controller updates the data in the off-chip DRAM and returns a data response to the cache0 controller; cache0 writes the data into the cache and returns a data response to processor 0, and processing ends.
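The used-state path can be sketched as below for two clusters of two caches each. Assumptions: caches are dicts of valid lines, and the controller keeps one status bit per address. A single bit does not say which cluster holds the line, so the sketch sends the invalidate to every cluster other than the requester; with two clusters this is cluster 1, as in the text. All names are hypothetical.

```python
Cache = dict[int, int]  # valid lines only: address -> data


class OffChipController:
    def __init__(self, memory: dict[int, int]):
        self.memory = memory                  # off-chip DRAM contents
        self.used: dict[int, bool] = {}       # status bit per address

    def write_back(self, addr: int, data: int) -> None:
        self.memory[addr] = data              # update the off-chip DRAM
        self.used[addr] = False               # write: used -> unused

    def read(self, addr: int, clusters: list[list[Cache]], requester: int) -> int:
        if self.used.get(addr, False):
            # Status bit 1: invalidate the line in the other cluster(s).
            # Assumption: broadcast to every cluster except the requester.
            for cid, caches in enumerate(clusters):
                if cid == requester:
                    continue
                for cache in caches:
                    if addr in cache:         # holds the address: write back, invalidate
                        self.write_back(addr, cache.pop(addr))
        self.used[addr] = True                # read: unused -> used
        return self.memory[addr]


def cpu_read(clusters: list[list[Cache]], cid: int, core: int,
             addr: int, ctrl: OffChipController) -> int:
    """Read by core `core` of cluster `cid` through its own cache."""
    own = clusters[cid][core]
    if addr in own:                                   # local hit
        return own[addr]
    for i, peer in enumerate(clusters[cid]):          # same-cluster request
        if i != core and addr in peer:
            own[addr] = peer.pop(addr)                # peer responds and invalidates
            return own[addr]
    own[addr] = ctrl.read(addr, clusters, cid)        # off-chip controller
    return own[addr]


# cache0, cache1 form cluster 0; cache2, cache3 form cluster 1.
clusters: list[list[Cache]] = [[{}, {}], [{}, {}]]
ctrl = OffChipController({0x80: 0})
cpu_read(clusters, 1, 0, 0x80, ctrl)      # processor 2 reads: bit becomes used
clusters[1][0][0x80] = 99                 # processor 2 then writes (a local hit)
cpu_read(clusters, 0, 0, 0x80, ctrl)      # processor 0 read returns 99
```

In the usage lines, the write by processor 2 stays in cache2 until processor 0's read forces the write-back, so processor 0 receives the updated value rather than the stale DRAM contents.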

In some embodiments of the invention, sending the read command to the same-cluster processors comprises sending the read command to the processors in the same cluster one after another. The method further comprises: in response to any one of the processors in the same cluster returning a data response, stopping sending the read command to the remaining processors in the cluster, updating the main cache data, and returning the data response to the main processor. If none of the processors in the same cluster returns a data response, the method proceeds to the subsequent step of sending the read command to the off-chip storage controller and checking its state bit.
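Sequential querying can be sketched as a loop that stops at the first data response. The clusters in this embodiment have two cores, so only one peer is ever queried; the loop shows the general case. Assumptions: caches are dicts of valid lines, the responding cache invalidates its copy (as in the same-cluster embodiment described earlier), and `query_same_cluster` is a hypothetical name.

```python
from typing import Optional, Tuple


def query_same_cluster(peers: list[dict[int, int]],
                       addr: int) -> Tuple[Optional[int], int]:
    """Send the read to same-cluster caches one at a time.

    Returns (data, number_of_caches_queried). Stops at the first cache that
    returns a data response; data is None when none of them responds, in
    which case the caller moves on to the off-chip storage controller.
    """
    queried = 0
    for peer in peers:
        queried += 1
        if addr in peer:
            return peer.pop(addr), queried   # responder invalidates its copy
    return None, queried
```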

In some embodiments of the present invention, reading data from the off-chip memory to update the main cache data and returning the data to the main processor further comprises: updating the state of the off-chip storage controller to used.

It should be noted that the steps in the above embodiments of the method for multi-core memory consistency may be interchanged, replaced, added, or deleted. Methods for multi-core memory consistency obtained through such reasonable permutations and combinations therefore also fall within the protection scope of the present invention, and the protection scope should not be limited to the embodiments.

In view of the above object, a second aspect of the embodiments of the present invention provides a system for multi-core memory consistency, comprising: a main cache judging module configured to receive a command sent by the main processor and determine whether the command hits in the main cache; a command judging module configured to determine the type of the command in response to the command missing in the main cache; a same-cluster processor judging module configured to, in response to the command being a read command, send the read command to a processor in the same cluster and determine whether that processor returns a data response; an off-chip storage controller judging module configured to, in response to the same-cluster processor not returning a data response, send the read command to the off-chip storage controller and check the state bit of the off-chip storage controller; and a processing module configured to, in response to the state of the off-chip storage controller being unused, read data from the off-chip memory to update the main cache data and return the data to the main processor.

In view of the above object, a third aspect of the embodiments of the present invention provides a computer device, including: at least one processor; and a memory storing computer instructions executable on the processor, the instructions being executable by the processor to implement the method steps as above.

The invention also provides a computer readable storage medium storing a computer program which, when executed by a processor, performs the method as above.

Finally, it should be noted that those of ordinary skill in the art will appreciate that all or part of the processes of the methods in the above embodiments can be implemented by a computer program instructing the relevant hardware. The program for the method of multi-core memory consistency can be stored in a computer-readable storage medium and, when executed, can include the processes of the method embodiments described above. The storage medium may be a magnetic disk, an optical disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), or the like. The computer program embodiments can achieve the same or similar effects as any of the corresponding method embodiments above.

Furthermore, the methods disclosed according to the embodiments of the present invention may also be implemented as a computer program executed by a processor, which may be stored in a computer-readable storage medium. When the computer program is executed by the processor, it performs the above functions defined in the methods disclosed in the embodiments of the present invention.

Further, the above method steps and system elements may also be implemented using a controller and a computer readable storage medium for storing a computer program for causing the controller to implement the functions of the above steps or elements.

Further, it should be appreciated that the computer-readable storage media (e.g., memory) herein can be either volatile memory or nonvolatile memory, or can include both volatile and nonvolatile memory. By way of example, and not limitation, nonvolatile memory can include Read-Only Memory (ROM), Programmable ROM (PROM), Erasable Programmable ROM (EPROM), Electrically Erasable Programmable ROM (EEPROM), or flash memory. Volatile memory can include Random Access Memory (RAM), which can act as external cache memory. By way of example and not limitation, RAM is available in a variety of forms such as Static RAM (SRAM), Dynamic RAM (DRAM), Synchronous DRAM (SDRAM), Double Data Rate SDRAM (DDR SDRAM), Enhanced SDRAM (ESDRAM), SyncLink DRAM (SLDRAM), and Direct Rambus RAM (DRRAM). The storage devices of the disclosed aspects are intended to comprise, without being limited to, these and other suitable types of memory.

Those of skill would further appreciate that the various illustrative logical blocks, modules, circuits, and algorithm steps described in connection with the disclosure herein may be implemented as electronic hardware, computer software, or combinations of both. To clearly illustrate this interchangeability of hardware and software, various illustrative components, blocks, modules, circuits, and steps have been described above generally in terms of their functionality. Whether such functionality is implemented as software or hardware depends upon the particular application and design constraints imposed on the overall system. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the disclosed embodiments of the present invention.

The various illustrative logical blocks, modules, and circuits described in connection with the disclosure herein may be implemented or performed with the following components designed to perform the functions herein: a general purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, discrete gate or transistor logic, discrete hardware components, or any combination of these components. A general purpose processor may be a microprocessor, but in the alternative, the processor may be any conventional processor, controller, microcontroller, or state machine. A processor may also be implemented as a combination of computing devices, e.g., a combination of a DSP and a microprocessor, a plurality of microprocessors, one or more microprocessors in conjunction with a DSP, and/or any other such configuration.

The steps of a method or algorithm described in connection with the disclosure herein may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. A software module may reside in RAM memory, flash memory, ROM memory, EPROM memory, EEPROM memory, registers, hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art. An exemplary storage medium is coupled to the processor such that the processor can read information from, and write information to, the storage medium. In the alternative, the storage medium may be integral to the processor. The processor and the storage medium may reside in an ASIC. The ASIC may reside in a user terminal. In the alternative, the processor and the storage medium may reside as discrete components in a user terminal.

In one or more exemplary designs, the functions may be implemented in hardware, software, firmware, or any combination thereof. If implemented in software, the functions may be stored on or transmitted over a computer-readable medium as one or more instructions or code. Computer-readable media include both computer storage media and communication media, including any medium that facilitates transfer of a computer program from one place to another. A storage medium may be any available medium that can be accessed by a general-purpose or special-purpose computer. By way of example, and not limitation, such computer-readable media can comprise RAM, ROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a general-purpose or special-purpose computer, or a general-purpose or special-purpose processor. Also, any connection is properly termed a computer-readable medium. For example, if the software is transmitted from a website, server, or other remote source using a coaxial cable, fiber optic cable, twisted pair, Digital Subscriber Line (DSL), or wireless technologies such as infrared, radio, and microwave, then the coaxial cable, fiber optic cable, twisted pair, DSL, or wireless technologies such as infrared, radio, and microwave are included in the definition of medium. Disk and disc, as used herein, include Compact Disc (CD), laser disc, optical disc, Digital Versatile Disc (DVD), floppy disk, and Blu-ray disc, where disks usually reproduce data magnetically, while discs reproduce data optically with lasers. Combinations of the above should also be included within the scope of computer-readable media.

The foregoing is an exemplary embodiment of the present disclosure, but it should be noted that various changes and modifications could be made herein without departing from the scope of the present disclosure as defined by the appended claims. The functions, steps and/or actions of the method claims in accordance with the disclosed embodiments described herein need not be performed in any particular order. Furthermore, although elements of the disclosed embodiments of the invention may be described or claimed in the singular, the plural is contemplated unless limitation to the singular is explicitly stated.

It should be understood that, as used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that "and/or" as used herein is meant to include any and all possible combinations of one or more of the associated listed items.

The numbers of the embodiments disclosed in the embodiments of the present invention are merely for description, and do not represent the merits of the embodiments.

It will be understood by those skilled in the art that all or part of the steps for implementing the above embodiments may be implemented by hardware, or may be implemented by a program instructing relevant hardware, and the program may be stored in a computer-readable storage medium, and the above-mentioned storage medium may be a read-only memory, a magnetic disk or an optical disk, etc.

Those of ordinary skill in the art will understand that: the discussion of any embodiment above is meant to be exemplary only, and is not intended to intimate that the scope of the disclosure, including the claims, of embodiments of the invention is limited to these examples; within the idea of an embodiment of the invention, also technical features in the above embodiment or in different embodiments may be combined and there are many other variations of the different aspects of the embodiments of the invention as described above, which are not provided in detail for the sake of brevity. Therefore, any omissions, modifications, substitutions, improvements, and the like that may be made without departing from the spirit and principles of the embodiments of the present invention are intended to be included within the scope of the embodiments of the present invention.
