Decoupling trace data streams using cache coherency protocol data

Document No.: 1205467. Published: 2020-09-01.

Abstract: "Decoupling trace data streams using cache coherency protocol data" is by J. Mola, dated 2019-01-09. Cache coherency protocol (CCP) data is used to decouple trace data streams. One or more trace data streams include cache activity trace data and CCP trace data relating to execution of a plurality of threads. The cache activity trace data includes inter-thread data dependencies comprising dependent cache activity trace entries, each of which records a corresponding memory access by a corresponding thread in reliance on CCP dependencies between the traced threads. The inter-thread data dependencies are removed to create, for each of the plurality of threads, independent cache activity trace data that enables each thread to be replayed independently. For each dependent cache activity trace entry, the removal includes (i) identifying, based on the CCP dependencies between the traced threads, a corresponding value of the corresponding memory access by the corresponding thread, and (ii) recording the corresponding value of the corresponding memory access on behalf of the corresponding thread.

1. A computer system, comprising:

one or more processors; and

one or more computer-readable media having stored thereon computer-executable instructions executable by the one or more processors to cause the computer system to decouple trace data streams using cache coherency protocol (CCP) data, the computer-executable instructions comprising instructions executable to cause the computer system to perform at least the following:

identifying, within a trace of a previous execution of an application, one or more trace data streams that log execution of a plurality of threads during the previous execution of the application, the one or more trace data streams comprising cache activity trace data and CCP trace data, the cache activity trace data comprising one or more inter-thread data dependencies, the inter-thread data dependencies comprising one or more dependent cache activity trace entries, each dependent cache activity trace entry recording a corresponding memory access by a corresponding thread in reliance on CCP dependencies between the traced threads; and

removing the one or more inter-thread data dependencies to create, for each thread of the plurality of threads, independent cache activity trace data that enables each thread to be independently replayed, including performing, for each dependent cache activity trace entry, at least the following:

identifying, based on the CCP dependencies between the traced threads, a corresponding value of the corresponding memory access by the corresponding thread; and

recording the corresponding value of the corresponding memory access on behalf of the corresponding thread.

2. The computer system of claim 1, wherein identifying, for at least one dependent cache activity trace entry, the corresponding value of the corresponding memory access by the corresponding thread comprises:

replaying the corresponding thread to the point of the corresponding memory access; and

replaying, based on the CCP dependencies between the traced threads, at least one other thread on which the corresponding memory access depends, in order to identify the corresponding value.

3. The computer system of claim 1, wherein identifying, for at least one dependent cache activity trace entry, the corresponding value of the corresponding memory access by the corresponding thread comprises:

skipping replay of the corresponding memory access while replaying the corresponding thread; and

replaying, based on the CCP dependencies between the traced threads, at least one other thread on which the corresponding memory access depends, in order to identify the corresponding value.

4. The computer system of claim 1, wherein identifying, for at least one dependent cache activity trace entry, the corresponding value of the corresponding memory access by the corresponding thread comprises:

identifying, based on the CCP dependencies between the traced threads, that the corresponding memory access by the corresponding thread is dependent on a memory operation of another thread; and

identifying the corresponding value based on cache activity trace data for the other thread.

5. The computer system of claim 1, wherein removing the one or more inter-thread data dependencies comprises:

identifying, based on the CCP dependencies between the traced threads, a system of inequalities expressing the one or more inter-thread data dependencies; and

identifying an ordering of memory access events based on solving the system of inequalities.

6. The computer system of claim 1, wherein removing the one or more inter-thread data dependencies comprises:

identifying, based on the CCP dependencies between the traced threads, a directed graph expressing the one or more inter-thread data dependencies; and

identifying an ordering of memory access events based on the one or more inter-thread data dependencies in the directed graph.

7. The computer system of claim 1, wherein removing the one or more inter-thread data dependencies comprises iteratively replaying one or more portions of the plurality of threads, wherein each iteration includes replacing at least one dependent cache activity trace entry with an independent cache activity trace entry that directly stores a memory value.

8. The computer system of claim 1, wherein recording the corresponding value of the corresponding memory access on behalf of the corresponding thread comprises recording the corresponding value into a data packet that identifies at least one of:

a processing unit that executed the corresponding thread; or

a socket containing the processing unit that executed the corresponding thread.

9. The computer system of claim 1, wherein recording the corresponding value of the corresponding memory access on behalf of the corresponding thread comprises: recording the corresponding value in a data packet that is part of a data stream that is unique to the processing unit that executed the corresponding thread.

10. The computer system of claim 1, wherein said cache activity trace data is based on at least a subset of processor data cache misses caused by said plurality of threads during said previous execution of said application, and wherein said CCP trace data is based on at least a subset of cache coherency state changes that occur as a result of said processor data cache misses caused by said plurality of threads during said previous execution of said application.

11. The computer system of claim 1, wherein removing the one or more inter-thread data dependencies comprises resolving an ambiguity based at least in part on:

replaying a first path of execution using a first starting condition; and

replaying a second path of execution using a second starting condition,

wherein the first starting condition and the second starting condition are different options selected from among multiple options of the ambiguity.

12. The computer system of claim 11, wherein the first path of execution converges with the second path of execution.

13. A method for decoupling trace data streams using cache coherency protocol (CCP) data, the method implemented at a computer system comprising one or more processors, the method comprising:

identifying, within a trace of a previous execution of an application, one or more trace data streams that log execution of a plurality of threads during the previous execution of the application, the one or more trace data streams comprising cache activity trace data and CCP trace data, the cache activity trace data comprising one or more inter-thread data dependencies, the inter-thread data dependencies comprising one or more dependent cache activity trace entries, each dependent cache activity trace entry recording a corresponding memory access by a corresponding thread in reliance on CCP dependencies between the traced threads; and

removing the one or more inter-thread data dependencies to create, for each thread of the plurality of threads, independent cache activity trace data that enables each thread to be independently replayed, including performing, for each dependent cache activity trace entry, at least the following:

identifying, based on the CCP dependencies between the traced threads, a corresponding value of the corresponding memory access by the corresponding thread; and

recording the corresponding value of the corresponding memory access on behalf of the corresponding thread.

14. The method of claim 13, wherein identifying, for at least one dependent cache activity trace entry, the corresponding value of the corresponding memory access by the corresponding thread comprises:

replaying the corresponding thread to the point of the corresponding memory access; and

replaying, based on the CCP dependencies between the traced threads, at least one other thread on which the corresponding memory access depends, in order to identify the corresponding value.

15. A computer program product comprising one or more computer-readable media having stored thereon computer-executable instructions executable by one or more processors to cause a computer system to decouple trace data streams using cache coherency protocol (CCP) data, the computer-executable instructions comprising instructions executable to cause the computer system to perform at least the following:

identifying, within a trace of a previous execution of an application, one or more trace data streams that log execution of a plurality of threads during the previous execution of the application, the one or more trace data streams comprising cache activity trace data and CCP trace data, the cache activity trace data comprising one or more inter-thread data dependencies, the inter-thread data dependencies comprising one or more dependent cache activity trace entries, each dependent cache activity trace entry recording a corresponding memory access by a corresponding thread in reliance on CCP dependencies between the traced threads; and

removing the one or more inter-thread data dependencies to create, for each thread of the plurality of threads, independent cache activity trace data that enables each thread to be independently replayed, including performing, for each dependent cache activity trace entry, at least the following:

identifying, based on the CCP dependencies between the traced threads, a corresponding value of the corresponding memory access by the corresponding thread; and

recording the corresponding value of the corresponding memory access on behalf of the corresponding thread.
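Claim 6 above recites a directed graph of inter-thread data dependencies from which an ordering of memory access events is identified. The following is a minimal sketch of that idea, assuming the dependencies have already been extracted into a mapping from each event to the events that must precede it. The event labels and the `order_events` helper are hypothetical and are not taken from the claims; a topological sort from Python's standard `graphlib` module (Python 3.9 or later) stands in for whatever ordering technique an actual embodiment uses.

```python
from graphlib import CycleError, TopologicalSorter


def order_events(happens_before):
    """Return one ordering of events that respects every dependency edge.

    `happens_before` maps each event (hypothetical labels here) to the set
    of events that must come before it.
    """
    return list(TopologicalSorter(happens_before).static_order())


# Hypothetical dependencies: T2's read relies on T1's, and T3's on T2's.
deps = {
    "T1:read A": set(),
    "T2:read A": {"T1:read A"},
    "T3:read A": {"T2:read A"},
}
print(order_events(deps))  # ['T1:read A', 'T2:read A', 'T3:read A']

# A cyclic graph has no valid ordering; the cycle must be resolved first.
try:
    order_events({"x": {"y"}, "y": {"x"}})
except CycleError:
    print("cycle detected")
```

A topological sort yields only one of possibly many valid orderings, and it fails outright when the graph contains a cycle, so this sketch does not show how such cycles would be removed.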

Background

When writing code during development of a software application, developers often spend a significant amount of time "debugging" the code to find runtime and other source code errors. In doing so, a developer may take several approaches to reproduce and locate source code errors, such as observing the behavior of a program based on different inputs, inserting debugging code (e.g., to print variable values, to trace execution branches, etc.), temporarily removing portions of code, and so forth. Tracking down runtime errors to pinpoint code bugs can occupy a significant portion of application development time.

To assist developers in the code debugging process, many types of debugging applications ("debuggers") have been developed. These tools provide developers with the ability to trace, visualize, and alter the execution of computer code. For example, a debugger may visualize the execution of code instructions, may present code variable values at different times during code execution, may enable a developer to alter code execution paths, and/or may enable a developer to set "breakpoints" and/or "watchpoints" on code elements of interest (which, when reached during execution, cause execution of the code to be suspended).

An emerging form of debugging application enables "time travel," "reverse," or "historic" debugging. With "time travel" debugging, a bit-accurate trace of the execution of a program (e.g., executable entities such as threads) is recorded by a trace application into one or more trace files. These bit-accurate traces can then be used to replay execution of the program for both forward and backward analysis. For example, a "time travel" debugger may enable a developer to set forward breakpoints/watchpoints (like conventional debuggers) as well as reverse breakpoints/watchpoints.

Some "time travel" debugging tools reduce the overhead of recording a trace (e.g., processor and memory overhead, and output trace file size) by using a processor's shared cache, together with its cache coherency protocol (CCP), to determine which data should be logged to the trace file. Doing so can reduce trace file size by several orders of magnitude compared to prior approaches, thereby significantly reducing the overhead of trace recording.

Summary of the Invention

Embodiments herein improve the replay performance of traces recorded using a processor's cache coherency protocol (CCP) by decoupling the trace data dependencies within those CCP-based traces, so that the traces can be replayed in a thread-independent manner. In particular, when a trace is logged using the CCP, a value read by one thread may actually have been logged in connection with another thread, or may otherwise be obtainable from execution of that other thread. For example, the trace recorder may log a particular value stored at a particular memory location in connection with a first thread that initially read the value from that memory location. Alternatively, if the first thread wrote the value to the memory location, the trace recorder may choose not to log the value, because it can be obtained by replaying the first thread. When a second thread later reads the same value from the same memory location via the shared cache, the recorder may use CCP data to avoid logging the value again for the second thread. The second thread thus becomes dependent on the first thread during trace replay.
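To make the dependency concrete, here is a toy sketch of the logging decisions described above. It is an illustration under simplifying assumptions, not the recorder of any embodiment: the `ToyRecorder` and `TraceEntry` names are hypothetical, and a plain dictionary stands in for the cache line state that real CCP data would supply.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TraceEntry:
    thread: int
    address: int
    value: Optional[int] = None       # set when the value is logged directly
    depends_on: Optional[int] = None  # set when another thread's trace supplies it


class ToyRecorder:
    """Logs a value once per address; later readers only get a dependency."""

    def __init__(self):
        self.owner = {}    # address -> thread whose trace can supply the value
        self.entries = []  # logged trace entries, in recording order

    def write(self, thread, address):
        # A write is reproduced by replaying the writer, so no value is logged.
        self.owner[address] = thread

    def read(self, thread, address, value):
        owner = self.owner.get(address)
        if owner is None:
            # First access to this address: log the value itself.
            self.owner[address] = thread
            self.entries.append(TraceEntry(thread, address, value=value))
        elif owner != thread:
            # Value is obtainable via another thread: log only the dependency.
            self.entries.append(TraceEntry(thread, address, depends_on=owner))
        # Otherwise the thread re-reads a value it already has; log nothing.


rec = ToyRecorder()
rec.read(thread=1, address=0x10, value=42)  # logged with its value
rec.read(thread=2, address=0x10, value=42)  # logged as dependent on thread 1
```

In this sketch the second entry carries no value of its own, which is why thread 2 cannot be replayed without first consulting thread 1.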

While recording traces in this manner can significantly improve recording performance and reduce trace file size, it limits the debugger's ability to parallelize replay of the traced program when consuming the trace file(s), which can significantly reduce replay performance for multi-threaded applications. Embodiments therefore process these CCP-based traces after they have been recorded in order to eliminate the inter-thread dependencies. This includes identifying the dependencies, determining the appropriate values, and recording new trace data stream(s) and/or augmenting existing trace data stream(s) so that they can be replayed in a thread-independent manner.
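The post-processing step can be sketched as a single pass that replaces each dependent entry with one that stores its value directly. This is a simplified illustration: the dictionary-based entry layout and the `decouple` function are hypothetical, the entries are assumed to already be in a valid global order, and each depended-on value is assumed to be recoverable from an earlier entry rather than by actually replaying the other thread.

```python
from collections import defaultdict


def decouple(entries):
    """Build per-thread streams whose entries all store their values directly.

    Each entry is a dict with 'thread' and 'address', plus either 'value'
    (logged directly) or 'depends_on' (the thread whose trace supplies it).
    """
    known = {}                   # (thread, address) -> value seen by that thread
    streams = defaultdict(list)  # thread -> independent trace entries
    for entry in entries:
        thread, address = entry["thread"], entry["address"]
        if "value" in entry:
            value = entry["value"]
        else:
            # Identify the value via the thread this access depends on.
            value = known[(entry["depends_on"], address)]
        known[(thread, address)] = value
        # Record the value on behalf of the accessing thread.
        streams[thread].append({"address": address, "value": value})
    return dict(streams)


trace = [
    {"thread": 1, "address": 0x10, "value": 42},
    {"thread": 2, "address": 0x10, "depends_on": 1},
    {"thread": 3, "address": 0x10, "depends_on": 2},
]
independent = decouple(trace)
print(independent[3])  # [{'address': 16, 'value': 42}]
```

Once every stream stores its own values, each thread's stream can be handed to a separate replay worker without consulting the others.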

Drawings

In order to describe the manner in which the above-recited and other advantages and features of the invention can be obtained, a more particular description of the invention briefly described above will be rendered by reference to specific embodiments thereof which are illustrated in the appended drawings. Understanding that these drawings depict only typical embodiments of the invention and are not therefore to be considered to be limiting of its scope, the invention will be described and explained with additional specificity and detail through the use of the accompanying drawings, in which:

FIG. 1 illustrates an example computing environment that facilitates recording and replay of "bit accurate" traces of code execution via a shared cache using Cache Coherency Protocol (CCP) data;

FIG. 2 illustrates an example of a shared cache;

FIG. 3 illustrates a flow diagram of an example method for performing cache-based trace recording using CCP data;

FIG. 4A illustrates an example shared cache with each cache line thereof extended with one or more additional accounting bits;

FIG. 4B illustrates an example of a shared cache that includes one or more reserved cache lines for storing accounting bits that apply to conventional cache lines;

FIG. 5 illustrates an example of set-associative mapping;

FIG. 6A illustrates an example application of the method of FIG. 3;

FIG. 6B illustrates an example decoupled trace data stream corresponding to the example of FIG. 6A;

FIG. 7 illustrates an example of CCP-based logging, where the CCP allows logging of only one of the readers and the total count of the readers;

FIG. 8 illustrates an example of developing a system of inequalities for the CCP-based logging of FIG. 7;

FIG. 9 illustrates an example of a graph that may be developed from the CCP-based logging of FIG. 7;

FIG. 10 illustrates example decoupled trace data streams corresponding to the CCP-based logging of FIG. 7;

FIGS. 11A and 11B illustrate updates to the graph of FIG. 7 as cycles are removed; and

FIG. 12 illustrates a flow chart of an example method for decoupling trace data streams using CCP data.

Embodiments include methods, systems, and computer program products for decoupling trace data streams using CCP data. These embodiments include identifying, within a trace of a previous execution of an application, one or more trace data streams that log execution of a plurality of threads during the previous execution of the application. The one or more trace data streams include cache activity trace data and CCP trace data. The cache activity trace data includes one or more inter-thread data dependencies comprising one or more dependent cache activity trace entries, each of which records a corresponding memory access by a corresponding thread in reliance on CCP dependencies between the traced threads. The one or more inter-thread data dependencies are removed to create, for each of the plurality of threads, independent cache activity trace data that enables each thread to be replayed independently. For each dependent cache activity trace entry, the removal includes (i) identifying, based on the CCP dependencies between the traced threads, a corresponding value of the corresponding memory access by the corresponding thread, and (ii) recording the corresponding value of the corresponding memory access on behalf of the corresponding thread.

This summary is provided to introduce a selection of concepts in a simplified form that are further described below in the detailed description. This summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used as an aid in determining the scope of the claimed subject matter.
