Method and system for realizing memory set optimization strategy


Reading note: This technology, "Method and system for realizing memory set optimization strategy" (一种memory set的优化策略的实现方法及系统), was designed and created by Li Changlin (李长林) and Yu Hongbin (余红斌) on 2021-07-15. Its main content is as follows: The invention relates to the technical field of microprocessors, and in particular to a method and system for implementing a memory set optimization strategy, comprising a store_buffer module, a pipeline module, a stream_model_detector module and a WCB module. It solves the following problems of the prior art: because of a miss, a reload request must be sent to L2, L2 then returns the reload data, and finally the store data and the reload data are written into the D_cache together, so the latency is too long; because a request is sent to L2 even though the reloaded data is never actually used, the bandwidth of L2 requests is increased for no reason; and the D_cache is polluted, the amount of polluted data is large, and the hit rate of the D_cache drops sharply. The invention reduces latency, lowers the bandwidth of L1 requests to L2, does not pollute the L1 cache, has no negative effect on the D_cache hit rate, and has strong market application prospects.

1. A method for implementing a memory set optimization strategy is characterized by comprising the following steps:

S1, after initialization is completed, a store instruction enters the store_buffer module, the store_buffer sends out a read request, and the pipeline is started;

S2, judging whether the D_cache hits in the M/E state, hits in the S state, or misses, and carrying out the corresponding operation according to the state;

S3, the miss queue entry sends a reload request to L2, and while L2 returns the E-state reload data, the store_buffer entry that hit in the S state or missed is woken up in the store_buffer;

S4, the reload data is backfilled into a refill_buffer, the refill goes on the pipeline, and the store_buffer is checked for store data belonging to the same cacheline;

S5, the store data is merged with the reload data and written into the D_cache, and the store_buffer and refill_buffer entries are deallocated at the same time;

S6, after the whole cacheline is full, the store data is written into L2, and after the write to L2 is confirmed, the WCB entry is deallocated.

2. The method for implementing a memory set optimization strategy as claimed in claim 1, wherein, if the M/E state is hit in the D_cache, the store_buffer issues a write request, writes the data into the D_cache, and modifies the status of the cacheline to the E state.

3. The method for implementing a memory set optimization strategy as claimed in claim 1, wherein, if the hit in the D_cache is in the S state, an entry is requested to be allocated in the miss queue.

4. The method for implementing a memory set optimization strategy as claimed in claim 1, wherein, if there is no hit in the D_cache and the D_cache is already in the stream_model mode, the data is written directly into the WCB after the whole cacheline of the store is full, and if the D_cache is not in the stream_model mode, an entry is requested to be allocated in the miss queue.

5. The method for implementing a memory set optimization strategy as claimed in claim 1, wherein the refill on the pipeline gives the awakened store_buffer entry a merge window of N clock cycles, where N is a natural number.

6. The method for implementing a memory set optimization strategy as claimed in claim 1, wherein, after a whole cacheline is full, the data is written into the WCB, the WCB writes the store data into L2, the WCB receives the ack response from L2 confirming that the data has been written into L2, and the WCB entry is then deallocated.

7. A system for implementing a memory set optimization strategy, used for implementing the method for implementing a memory set optimization strategy as claimed in any one of claims 1 to 6, characterized by comprising a store_buffer module, a pipeline module, a stream_model_detector module and a WCB module.

8. The system for implementing a memory set optimization strategy as claimed in claim 7, wherein the store_buffer module writes the store data into the store_buffer after the store queue completes the sta and std operations and the graduation module confirms that the store instruction can be written into memory, and the store_buffer module determines, according to the hit state of the D_cache and the stream_model mode, whether to write the store data directly into the D_cache, to request allocation of a missq entry, or to write directly into the WCB after the whole cacheline is full.

9. The system for implementing a memory set optimization strategy as claimed in claim 7, wherein the pipeline module is responsible for data access after an instruction goes on the pipeline; if a miss causes an entry to be allocated in the missq, then when the refill goes on the pipeline and the store_buffer holds store data with a matching address, the two pieces of data are merged and written into the D_cache.

10. The system for implementing a memory set optimization strategy as claimed in claim 7, wherein the stream_model_detector module is responsible for detecting the stream_model mode and deciding to enter, exit, or remain in the stream_model mode, and the WCB module is responsible for writing data into L2.

Technical Field

The invention relates to the technical field of microprocessors, in particular to a method and a system for realizing a memory set optimization strategy.

Background

For a memory set program, data at consecutive addresses needs to be written into memory, and such stores will most likely miss in the D_cache. If each miss is handled by a normal reload and the reloaded data is then written into the D_cache, the data is never used by subsequent instructions, so the D_cache is polluted, the amount of polluted data is large, and the hit rate of the D_cache drops sharply.

In the prior art, because of the miss, a reload request must be sent to L2, L2 then returns the reload data, and finally the store data and the reload data are written into the D_cache together, which makes the latency too long. Because a request is sent to L2 even though the reloaded data is never actually used, the bandwidth of L2 requests is increased for no reason. Furthermore, the D_cache is polluted, the amount of polluted data is large, and the hit rate of the D_cache drops sharply. A method and system for implementing a memory set optimization strategy are therefore provided.

Disclosure of Invention

Aiming at the defects of the prior art, the invention discloses a method and a system for implementing a memory set optimization strategy, which solve the following problems of the prior art: because of a miss, a reload request must be sent to L2, L2 then returns the reload data, and finally the store data and the reload data are written into the D_cache together, so the latency is too long; because a request is sent to L2 even though the reloaded data is never actually used, the bandwidth of L2 requests is increased for no reason; and the D_cache is polluted, the amount of polluted data is large, and the hit rate of the D_cache drops sharply.

The invention is realized by the following technical scheme:

In a first aspect, the invention discloses a method for implementing a memory set optimization strategy, which comprises the following steps:

S1, after initialization is completed, a store instruction enters the store_buffer module, the store_buffer sends out a read request, and the pipeline is started;

S2, judging whether the D_cache hits in the M/E state, hits in the S state, or misses, and carrying out the corresponding operation according to the state;

S3, the miss queue entry sends a reload request to L2, and while L2 returns the E-state reload data, the store_buffer entry that hit in the S state or missed is woken up in the store_buffer;

S4, the reload data is backfilled into a refill_buffer, the refill goes on the pipeline, and the store_buffer is checked for store data belonging to the same cacheline;

S5, the store data is merged with the reload data and written into the D_cache, and the store_buffer and refill_buffer entries are deallocated at the same time;

S6, after the whole cacheline is full, the store data is written into L2, and after the write to L2 is confirmed, the WCB entry is deallocated.

Further, in the method, if the M/E state is hit in the D_cache, the store_buffer then issues a write request, writes the data into the D_cache, and modifies the status of the cacheline to the E state.

Further, in the method, if the hit in the D_cache is in the S state, an entry is requested to be allocated in the miss queue.

Furthermore, in the method, if there is no hit in the D_cache and the D_cache is already in the stream_model mode, the data is written directly into the WCB after the whole cacheline of the store is full; if the D_cache is not in the stream_model mode, an entry is requested to be allocated in the miss queue.

Further, in the method, the refill on the pipeline gives the awakened store_buffer entry a merge window of N clock cycles, where N is a natural number.

Furthermore, in the method, after the whole cacheline is full, the data is written into the WCB, the WCB writes the store data into L2, the WCB receives the ack response from L2 confirming that the data has been written into L2, and the WCB entry is then deallocated.
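The decision flow described above can be summarized in a minimal behavioral sketch in C. This is an illustration only, not the patented hardware: the interface names (dcache_lookup, missq_alloc, wcb_push, in_stream_model_mode, cacheline_full) and the line-state encoding are assumptions introduced for this sketch.

#include <stdbool.h>

typedef enum { LINE_M, LINE_E, LINE_S, LINE_MISS } line_state_t;

/* Assumed interfaces to the D_cache, miss queue, WCB and mode detector. */
extern line_state_t dcache_lookup(unsigned long pa);
extern void dcache_write(unsigned long pa, const void *data);
extern void missq_alloc(unsigned long pa);        /* triggers a reload from L2 */
extern void wcb_push(unsigned long pa, const void *line);
extern bool in_stream_model_mode(void);
extern bool cacheline_full(unsigned long pa);     /* whole line gathered in store_buffer */

/* Called when a store_buffer entry issues its read request (steps S1/S2). */
void handle_store(unsigned long pa, const void *data)
{
    switch (dcache_lookup(pa)) {
    case LINE_M:
    case LINE_E:
        dcache_write(pa, data);       /* hit in M/E: write and set the E state */
        break;
    case LINE_S:
        missq_alloc(pa);              /* hit in S: reload to gain ownership */
        break;
    case LINE_MISS:
        if (in_stream_model_mode()) {
            if (cacheline_full(pa))   /* stream mode: no reload is issued at all */
                wcb_push(pa, data);   /* full line goes straight to the WCB */
        } else {
            missq_alloc(pa);          /* normal miss path */
        }
        break;
    }
}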

In a second aspect, the invention discloses a system for implementing a memory set optimization strategy, which is used for implementing the method for implementing a memory set optimization strategy of the first aspect, and which comprises a store_buffer module, a pipeline module, a stream_model_detector module and a WCB module.

Furthermore, the store_buffer module writes the store data into the store_buffer after the store queue completes the sta and std operations and the graduation module confirms that the store instruction can be written into memory, and the store_buffer module determines, according to the hit state of the D_cache and the stream_model mode, whether to write the store data directly into the D_cache, to request allocation of a missq entry, or to write directly into the WCB after the whole cacheline is full.

Furthermore, the pipeline module is responsible for data access after an instruction goes on the pipeline; if a miss causes an entry to be allocated in the missq, then when the refill goes on the pipeline and the store_buffer holds store data with a matching address, the two pieces of data are merged and then written into the D_cache.

Furthermore, the stream_model_detector module is responsible for detecting the stream_model mode and deciding to enter, exit, or remain in the stream_model mode, and the WCB module is responsible for writing data into L2.
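The merge performed by the pipeline module on a refill can be illustrated with a short C sketch. The 64-byte cacheline size, the byte-valid mask, and the interface names (dcache_fill_line, dealloc_store_buffer, dealloc_refill_buffer) are assumptions made for illustration, not details taken from the patent.

typedef struct {
    unsigned long line_pa;             /* physical address of the cacheline */
    unsigned char data[64];            /* assumed 64-byte cacheline */
    unsigned char valid[64];           /* nonzero where store data is present */
} store_buffer_entry_t;

typedef struct {
    unsigned long line_pa;
    unsigned char data[64];            /* reload data returned by L2 */
} refill_buffer_t;

extern void dcache_fill_line(unsigned long pa, const unsigned char *data);
extern void dealloc_store_buffer(store_buffer_entry_t *sb);
extern void dealloc_refill_buffer(refill_buffer_t *rb);

/* Overlay the store bytes on the reload data, write the merged line into
 * the D_cache, and free both entries (step S5). */
void merge_and_fill(refill_buffer_t *rb, store_buffer_entry_t *sb)
{
    for (int i = 0; i < 64; i++)
        if (sb->valid[i])
            rb->data[i] = sb->data[i];
    dcache_fill_line(rb->line_pa, rb->data);
    dealloc_store_buffer(sb);
    dealloc_refill_buffer(rb);
}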

The invention has the beneficial effects that:

The invention reduces latency, lowers the bandwidth of L1 requests to L2, does not pollute the L1 cache, has no negative effect on the hit rate of the D_cache, and has strong market application prospects.

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. It is obvious that the drawings in the following description show only some embodiments of the present invention, and that those skilled in the art can derive other drawings from them without creative effort.

FIG. 1 is a flow chart of a method of implementing a memory set optimization strategy;

FIG. 2 is a block diagram of a system for implementing a memory set optimization strategy;

FIG. 3 is a diagram of the detection of entry into and exit from the stream_model mode.

Detailed Description

In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described below clearly and completely with reference to the drawings. It is obvious that the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments that a person skilled in the art can derive from the embodiments given herein without creative effort shall fall within the protection scope of the present invention.

Example 1

Referring to FIG. 1, the invention discloses a method for implementing a memory set optimization strategy, which comprises the following steps:

S1, after initialization is completed, a store instruction enters the store_buffer module, the store_buffer sends out a read request, and the pipeline is started;

S2, judging whether the D_cache hits in the M/E state, hits in the S state, or misses, and carrying out the corresponding operation according to the state;

S3, the miss queue entry sends a reload request to L2, and while L2 returns the E-state reload data, the store_buffer entry that hit in the S state or missed is woken up in the store_buffer;

S4, the reload data is backfilled into a refill_buffer, the refill goes on the pipeline, and the store_buffer is checked for store data belonging to the same cacheline;

S5, the store data is merged with the reload data and written into the D_cache, and the store_buffer and refill_buffer entries are deallocated at the same time;

S6, after the whole cacheline is full, the store data is written into L2, and after the write to L2 is confirmed, the WCB entry is deallocated.

In this embodiment, if the M/E state is hit in the D_cache, the store_buffer then issues a write request, writes the data into the D_cache, and modifies the status of the cacheline to the E state.

In this embodiment, if the hit in the D_cache is in the S state, an entry is requested to be allocated in the miss queue.

In this embodiment, if there is no hit in the D_cache and the D_cache is already in the stream_model mode, the data is written directly into the WCB after the whole cacheline of the store is full; if the D_cache is not in the stream_model mode, an entry is requested to be allocated in the miss queue.

In this embodiment, the refill on the pipeline gives the awakened store_buffer entry a merge window of N clock cycles, where N is a natural number.

In this embodiment, after the whole cacheline is full, the data is written into the WCB, the WCB writes the store data into L2, the WCB receives the ack response from L2 confirming that the data has been written into L2, and the WCB entry is then deallocated.
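The WCB entry lifecycle described above can be sketched as a small state machine in C. The state names and the l2_write_line/ack interface are assumptions for illustration; the embodiment only specifies that the full line is written to L2 and the entry is deallocated after the ack arrives.

typedef enum { WCB_FILLING, WCB_WAIT_ACK, WCB_FREE } wcb_state_t;

typedef struct {
    wcb_state_t   state;
    unsigned long line_pa;
    unsigned char data[64];            /* assumed 64-byte cacheline */
} wcb_entry_t;

extern void l2_write_line(unsigned long pa, const unsigned char *data);

/* Called when the whole cacheline in the entry has been filled. */
void wcb_on_line_full(wcb_entry_t *e)
{
    l2_write_line(e->line_pa, e->data);   /* issue the write to L2 */
    e->state = WCB_WAIT_ACK;              /* hold the entry until L2 confirms */
}

/* Called when the ack response arrives from L2. */
void wcb_on_l2_ack(wcb_entry_t *e)
{
    e->state = WCB_FREE;                  /* safe to deallocate the entry now */
}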

This embodiment solves the problem that, in the prior art, because of a miss, a reload request must be sent to L2, L2 then returns the reload data, and finally the store data and the reload data are written into the D_cache together, which makes the latency too long.

This embodiment also solves the problems that, because a request is sent to L2 even though the reloaded data is never actually used, the bandwidth of L2 requests is increased for no reason, and that the D_cache is polluted, the amount of polluted data is large, and the hit rate of the D_cache drops sharply.

Example 2

Referring to FIG. 2, this embodiment is mainly implemented in the store_buffer module, the pipeline module, the stream_model_detector module, and the WCB module.

The store_buffer module in this embodiment is configured to write the store data into the store_buffer after the store queue completes the sta and std (store-address and store-data) operations and the graduation module confirms that the store instruction can be written into memory.

According to the hit state of the D_cache and the stream_model mode, it determines whether to write the store data directly into the D_cache, to request allocation of a missq entry, or to write the store data directly into the WCB after the whole cacheline is full.

The pipeline module in this embodiment is responsible for data access after an instruction goes on the pipeline; if a miss causes an entry to be allocated in the missq, then when the refill goes on the pipeline and the store_buffer holds store data with a matching address, the two pieces of data are merged and then written into the D_cache.

In this embodiment, the stream_model_detector module is responsible for detecting the stream_model mode and deciding to enter, exit, or remain in the stream_model mode.

The WCB module of this embodiment is responsible for writing data into L2.

Example 3

Referring to FIG. 3, this embodiment discloses the detection of entry into and exit from the stream_model mode.

The entry condition in this embodiment is: the stores cover N consecutive complete cachelines, where N is a configurable value.

The exit conditions in this embodiment are as follows (a behavioral sketch of the detector is given after the list):

1. the address of the store does not satisfy the continuity requirement;

2. the stores move on to another cacheline before the size of the stores has filled the entire current cacheline;

3. a load on the pipeline has pa[39:9] equal to stream_pa[39:9]. The stream_model mode is exited when any one of the above conditions is satisfied.
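The entry and exit rules can be captured in a minimal C sketch of the stream_model_detector. The structure fields, the 64-byte cacheline size, and the STREAM_ENTER_N default are illustrative assumptions; only the N-consecutive-full-cachelines entry condition, the three exit conditions, and the pa[39:9] comparison come from this embodiment.

#include <stdbool.h>
#include <stdint.h>

#define STREAM_ENTER_N 4                   /* N is configurable; 4 is an assumed example */
#define REGION(pa) (((pa) >> 9) & ((1ULL << 31) - 1))   /* extracts pa[39:9] */

typedef struct {
    bool     active;                       /* currently in stream_model mode */
    uint64_t next_line_pa;                 /* expected address of the next cacheline */
    uint64_t stream_pa;                    /* address of the current stream cacheline */
    int      full_lines;                   /* consecutive complete cachelines seen */
} stream_detector_t;

/* Called once per cacheline that the stores touch. */
void on_store_line(stream_detector_t *d, uint64_t pa, bool line_full)
{
    if (!line_full) {
        d->active = false;                 /* exit condition 2: partially filled line */
        d->full_lines = 0;
    } else if (pa == d->next_line_pa) {
        if (++d->full_lines >= STREAM_ENTER_N)
            d->active = true;              /* entry condition met */
    } else {
        d->active = false;                 /* exit condition 1: continuity broken */
        d->full_lines = 1;                 /* this full line starts a new candidate stream */
    }
    d->next_line_pa = pa + 64;             /* assumed 64-byte cacheline */
    d->stream_pa = pa;
}

/* Called for every load that goes on the pipeline. */
void on_load(stream_detector_t *d, uint64_t pa)
{
    if (d->active && REGION(pa) == REGION(d->stream_pa))
        d->active = false;                 /* exit condition 3: load hits the stream region */
}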

In this embodiment, an implementation that does not perform this store-address detection and, like a normal store instruction, first acquires the exclusive attribute of the cacheline and then writes the store data into the D_cache still falls within the protection scope of this embodiment.

In conclusion, the invention reduces latency, lowers the bandwidth of L1 requests to L2, does not pollute the L1 cache, has no negative effect on the hit rate of the D_cache, and has strong market application prospects.

The above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.
