Encoding of inputs to branch prediction circuits

Document No.: 1327708; Publication date: 2020-07-14

Abstract: This technology, "Encoding of inputs to branch prediction circuits", was created by Thomas Christopher Grocutt and Yasuo Ishii on 2018-10-19. A data processing apparatus includes: branch prediction circuitry adapted to store at least one branch prediction state entry associated with an instruction stream; input circuitry to receive at least one input to generate a new branch prediction state entry, wherein the at least one input comprises a plurality of bits; and encoding circuitry adapted to encode at least some of the plurality of bits based on a value associated with a current execution environment in which the stream of instructions is being executed. This prevents potential attacks that take advantage of the ability of branch prediction entries trained by one execution environment to be used as the basis for branch prediction by another execution environment.

1. A data processing apparatus comprising:

branch prediction circuitry adapted to store at least one branch prediction state entry associated with an instruction stream;

input circuitry to receive at least one input to generate a new branch prediction state entry, wherein the at least one input comprises a plurality of bits; and

encoding circuitry adapted to perform an encoding operation to encode at least some of the plurality of bits based on a value associated with a current execution environment in which the stream of instructions is being executed.

2. The data processing apparatus of claim 1, wherein the encoding operation comprises encoding at least some of the plurality of bits based on a value indicating a current execution permission for the instruction stream being executed.

3. The data processing apparatus according to any one of claims 1 and 2, wherein

the encoding operation includes encoding the at least some of the plurality of bits using a key, wherein the key is based on the current execution environment in which the instruction stream is being executed.

4. The data processing apparatus according to claim 3, wherein

the encoding operation includes rearranging or toggling the at least some of the plurality of bits using the key.

5. The data processing apparatus according to any one of claims 3 and 4, wherein:

the at least one input comprises an indication of an instruction address of a branch instruction;

the branch prediction circuitry is adapted to receive a query value and perform a search using the query value, the query value comprising an indication of an instruction address of an instruction; and

the encoding circuitry is adapted to perform the encoding operation on at least some of the plurality of bits of the query value using the key prior to the search.

6. The data processing apparatus according to claim 5, wherein the encoding circuitry is adapted to recalculate the value of the key associated with the current execution environment and to perform the encoding operation on the at least some of the plurality of bits of the query value using the recalculated value of the key.

7. The data processing apparatus according to any one of claims 3 and 4, wherein:

the at least one input comprises an indication of a target address of a branch instruction;

the branch prediction circuitry is adapted to receive a query value and perform a search using the query value, the query value comprising an indication of an instruction address of an instruction; and

the apparatus comprises reverse encoding circuitry to perform a reverse encoding operation on an output of the branch prediction circuitry, the output being provided in response to receiving the query value.

8. The data processing apparatus of claim 7, wherein

the reverse encoding circuitry is adapted to recalculate a value of the key associated with the current execution environment and perform the reverse encoding operation using the recalculated value of the key.

9. The data processing apparatus of any of claims 3 to 8, wherein

the key is further based on any combination of one or more key input values indicative of at least one of:

an exception level, a privilege level, an ASID, a VMID, a non-secure/secure (NS) state, a physical processor core number or logical core number on which the instruction stream is executing, one or more software-writable registers, and a previously generated random number.

10. The data processing apparatus of claim 9, wherein

the previously generated random number comprises at least one of:

a per-logical-processor element;

a per-physical-processor element; and

a system-wide element.

11. The data processing apparatus according to any of claims 9 and 10, wherein the key is based on a one-way transformation applied to the one or more key input values.

12. The data processing apparatus of any of the preceding claims, wherein:

the instruction stream is executable in one of a plurality of execution environments adapted to execute at a lowest execution permission; and

the encoding circuitry is adapted to perform the encoding operation further based on an identifier of one of the plurality of execution environments that is executing the stream of instructions.

13. The data processing apparatus according to any preceding claim, comprising:

monitoring circuitry adapted to detect a rate of any combination of instruction fetch faults and instruction decode faults while the instruction stream is being executed in a speculative state, and to raise an exception or generate an error response in response to the rate satisfying a predetermined condition.

14. The data processing apparatus according to any preceding claim, wherein the branch prediction circuitry comprises a branch target prediction structure comprising a plurality of branch target entries, each branch target entry specifying at least one branch target address; and

the encoding circuitry comprises encryption circuitry to encrypt at least a portion of a new branch target entry to be written to the branch target prediction structure using an encryption key associated with the current execution environment.

15. The data processing apparatus according to claim 14, wherein each branch target entry specifies tag information and branch data specifying at least the branch target address; and

the apparatus comprises branch target prediction circuitry to perform a branch target prediction lookup for an instruction fetch address associated with the current execution environment, the branch target prediction lookup comprising determining whether any of a subset of branch target entries of the branch target prediction structure specifies tag information corresponding to a target tag determined for the instruction fetch address.

16. The data processing apparatus according to claim 15, wherein a value of the target tag is reusable in more than one execution environment.

17. The data processing apparatus according to any of claims 15 and 16, wherein the encryption circuitry is configured to encrypt at least a portion of the tag information of the new branch target entry using the encryption key.

18. The data processing apparatus according to claim 17, wherein the encryption circuitry is configured to encrypt at least a portion of the branch data of the new branch target entry using the encryption key; and

the apparatus comprises decryption circuitry to decrypt at least a portion of the branch data of one of the subset of branch target entries identified in the branch target prediction lookup as specifying tag information corresponding to the target tag.

19. The data processing apparatus according to any of claims 14 to 18, wherein the encryption key comprises a static key fixed for the current execution environment.

20. The data processing apparatus according to claim 19, wherein the static key of the current execution environment depends on a common key shared between at least two execution environments of a plurality of execution environments and at least one identifier specific to the current execution environment.

21. The apparatus of any of claims 14 to 18, wherein the encryption key comprises a dynamic key that is variable for the current execution environment.

22. The apparatus of claim 21, comprising key generation circuitry to generate an updated encryption key for the current execution environment.

23. The apparatus of any of claims 15 to 18, comprising a region table comprising a plurality of region entries, each region entry mapping branch context information to a region identifier, the region identifier comprising fewer bits than the branch context information, the branch context information comprising at least one identifier associated with a corresponding execution environment.

24. The apparatus of claim 23, wherein a target tag for the instruction fetch address comprises a target region identifier mapped by the region table to the branch context information associated with the instruction fetch address.

25. The apparatus according to any of claims 23 and 24, wherein each region entry specifies an encryption key associated with the corresponding execution environment.

26. The apparatus according to any of claims 23 to 25, wherein, when a mapping provided by a given region entry of the region table is updated, the branch target prediction circuitry is configured to trigger updating of an encryption key associated with the execution environment associated with the given region entry after the mapping update.

27. The apparatus according to any of claims 14 to 26, wherein the branch target prediction circuitry is configured to determine the target tag from the instruction fetch address and a history of branch results for previous branch instructions preceding the instruction at the instruction fetch address.

28. A data processing apparatus comprising:

a storage unit to store at least one branch prediction state entry associated with an instruction stream;

a receiving unit to receive at least one input to generate a new branch prediction state entry, wherein the at least one input comprises a plurality of bits; and

an encoding unit to encode at least some of the plurality of bits of the at least one input based on a value associated with a current execution environment in which the instruction stream is being executed.

29. A method, comprising:

storing at least one branch prediction state entry associated with the instruction stream;

receiving at least one input to generate a new branch prediction state entry, wherein the at least one input comprises a plurality of bits; and

encoding at least some of the plurality of bits based on a value associated with a current execution environment in which the stream of instructions is being executed.

Technical Field

The present technology relates to the field of data processing. More particularly, it relates to branch prediction.

Background

The data processing apparatus may have branch prediction circuitry for predicting the outcome of a branch instruction before it is actually executed. By predicting the branch outcome before the branch instruction is actually executed, subsequent instructions following the branch may begin to be fetched and speculatively executed before the branch instruction execution completes, thereby preserving performance if the prediction is correct, as subsequent instructions may be executed earlier than if they were fetched only after the outcome of the branch is actually known.

Disclosure of Invention

At least some examples provide a data processing apparatus comprising:

branch prediction circuitry adapted to store at least one branch prediction state entry associated with an instruction stream;

input circuitry to receive at least one input to generate a new branch prediction state entry, wherein the at least one input comprises a plurality of bits; and

encoding circuitry adapted to perform an encoding operation to encode at least some of the plurality of bits based on a value associated with a current execution environment in which the instruction stream is being executed.

At least some examples provide a data processing apparatus comprising:

a storage unit to store at least one branch prediction state entry associated with an instruction stream;

a receiving unit to receive at least one input to generate a new branch prediction state entry, wherein the at least one input comprises a plurality of bits; and

an encoding unit to encode at least some of the plurality of bits of the at least one input based on a value associated with a current execution environment in which the instruction stream is being executed.

At least some examples provide a method comprising:

storing at least one branch prediction state entry associated with the instruction stream;

receiving at least one input to generate a new branch prediction state entry, wherein the at least one input comprises a plurality of bits; and

encoding at least some of the plurality of bits based on a value associated with a current execution environment in which the instruction stream is being executed.

Drawings

Further aspects, features and advantages of the present technology will become apparent from the following description of examples, which is to be read in connection with the accompanying drawings.

FIG. 1 schematically shows an example of a data processing apparatus with a branch predictor;

FIG. 2 shows an example of an encoding circuit for encoding a portion of the input to the branch prediction circuit;

FIGS. 3A-3C illustrate examples of encoding a portion of an input based on a key associated with a current execution context;

FIG. 4A illustrates an example of applying a reverse encoding operation to an encoded target address output by branch prediction circuitry based on a recalculated key associated with a current execution context;

FIG. 4B illustrates an example of applying an encoding operation to an instruction address used as part of a query to search branch prediction circuitry;

FIG. 5 illustrates an example of generating a key based on a plurality of identifiers associated with a current execution context;

FIG. 6 shows an example of a monitoring circuit for detecting instruction fetch failure rates;

FIG. 7 shows an example where an increase in instruction fetch failure rate of 20% or more triggers an error response;

FIG. 8 illustrates another example of a branch predictor including a branch target buffer and a branch direction predictor;

FIG. 9 illustrates a BTB table format for comparison, in which context information identifying a given execution context is specified in the tag information of each BTB entry;

FIG. 10 illustrates an alternative embodiment in which a region table is used to compress context information into shorter region identifiers that are used as tag information in the BTB table;

FIG. 11 illustrates a potential security problem that may arise in a system using such a region table, where an attacker may reuse the same region identifier from the region table with different execution scenarios;

FIG. 12 illustrates an example of encrypting branch information prior to storing the branch information in a branch target prediction structure and decrypting the branch information when reading the branch information from the branch target prediction structure based on an encryption key associated with a corresponding execution context;

FIG. 13 shows an example of entries of a BTB table and a region table according to the example of FIG. 12;

FIG. 14 shows an example of changing an encryption key when updating a region table entry;

FIGS. 15 and 16 illustrate corresponding examples of using encryption to protect multi-target indirect branch predictors from such attacks;

FIG. 17 is a flow diagram illustrating a method of performing a branch target prediction lookup; and

FIG. 18 is a flow diagram illustrating a method of generating a target tag value for a lookup based on a region table.

Detailed Description

The processing circuitry may perform data processing in one of a plurality of execution environments. For example, each execution environment may correspond to a different software process executed by the processing circuitry, software of a different privilege level (e.g., applications and operating systems), a different portion of a given software process, a different virtual machine executing on the processing circuitry, and so forth. Branch prediction circuitry may be provided for storing at least one branch prediction state entry associated with the instruction stream. The branch prediction state entry may specify the predicted nature of the branch, such as the predicted taken or not-taken outcome, or the predicted branch target address. Branch prediction circuitry may be used to predict the outcome of a branch instruction before it is actually executed, so that subsequent instructions are fetched and speculatively executed earlier based on the predicted branch nature. Input circuitry may be provided for receiving at least one input to generate a new branch prediction state entry, wherein the at least one input comprises a plurality of bits. For example, the input may specify at least a portion of an instruction address of a branch instruction to which the new entry is to be allocated to the branch prediction circuitry, and/or an actual branch target address of the branch instruction which may be used as the predicted branch target address at a future occasion. To reduce the amount of memory required, some embodiments may store the branch target address indirectly, such as by specifying it as an offset from the address of the branch instruction.

Branch prediction mechanisms are generally regarded as performance-enhancing mechanisms whose mispredictions are not critical to the security of data processed by the system, but merely affect the level of performance achieved. Therefore, security measures are not typically used to protect the contents of the branch predictor.

The present technology provides encoding circuitry for performing an encoding operation to encode at least some of a plurality of bits of an input received by the input circuitry, based on a value associated with a current execution environment in which an instruction stream is being executed. The encoded version of the input may then be used to form a new branch prediction state entry to be allocated to the branch prediction circuitry.

Counterintuitively, it has been recognised that branch prediction circuitry may provide a route by which an attacker can bypass security protections provided on the processing circuitry that restrict one execution environment from accessing data associated with another execution environment. This is because the branch prediction circuitry may allow a branch prediction state entry allocated by one execution environment to be accessed from a different execution environment, so that branch prediction state allocated to the branch prediction circuitry by a first execution environment can be used to control the behaviour of branches executed in a second execution environment. Previously, this would have been regarded as merely a performance issue: if the second execution environment hits against the wrong entry, allocated by a different context, the misprediction can be identified later, once the actual branch outcome is found not to match the prediction when the branch is executed in the second execution environment. However, it has been recognised that instructions incorrectly speculatively executed due to a mispredicted branch may still influence data in a cache or other non-architectural storage structure used by the data processing apparatus, and an attacker may exploit this to attempt to obtain information about potentially sensitive data accessible to the second execution environment.

By providing encoding circuitry which, when generating a new branch prediction state entry, encodes at least some bits of the input to the branch prediction circuitry based on a value associated with the current execution environment, the branch predictor behaves differently even when two different execution environments supply the same input, because the encoding depends on a value specific to each execution environment. This makes it difficult for one execution context to train the branch prediction circuitry with malicious branch information in an attempt to trick another execution context into reusing the same branch prediction state entry, and so reduces the risk of the type of attack described above. The benefit of such encoding circuitry is unexpected, because branch predictors are not usually regarded as posing a security risk; rather, they are regarded as purely performance-enhancing measures.

In some examples, the encoding operation may comprise encoding at least some of the plurality of bits of the input based on a value indicating the current execution permission of the instruction stream being executed. This allows execution environments with different execution permissions to encode the input in different ways. It therefore becomes more difficult for an attacker operating at one execution permission to control the branching of a victim execution environment operating at a different execution permission to a desired target address, in an attempt to expose data that is accessible to the victim but not to the attacker, because it is hard for the attacker to guess which input values, encoded with the value associated with the attacker's environment, would match a target input encoded with the different value associated with the victim's environment.

The encoding operation applied by the encoding circuitry may comprise any operation that changes the values of at least some bits of the input based on a value associated with the current execution environment. Note that merely concatenating the value associated with the current execution environment alongside the input received by the input circuitry would not be regarded as encoding at least some bits of the input, since in that case all bits of the input would still retain their original values. Thus, in general, the encoding alters the value of at least one bit using a transformation defined by the value associated with the current execution environment.

For example, the encoding operation may include encoding at least some of the bits using a key, where the key is based on the current execution environment in which the instruction stream is being executed. The encoding operation may include rearranging at least some of the bits (e.g., shifting or otherwise reordering the bits) using the key. Further, the encoding operation may include toggling at least some bits of the input using the key. For example, the toggling may be implemented by applying an XOR (exclusive OR) between at least some of the bits of the input and a key derived from a value associated with the current execution environment; an XOR is efficient in terms of both performance and hardware. Alternatively, the encoding operation may comprise a hash function applied to at least some bits of the input based on the key. The encoding operation may be reversible (e.g. applying a second XOR with the same key to the result of a previous XOR restores the original input), or it may be a one-way hash function.
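As an illustration of the kind of lightweight, key-controlled transform described above, the following C sketch (a hypothetical model, not taken from any claim) toggles the input bits with an environment-derived key using an XOR and then rearranges them with a key-derived rotation; the function names, 64-bit widths and example key values are assumptions made for illustration.

#include <stdint.h>
#include <stdio.h>

/* Rotate left by r bits (r masked to 0..63 to avoid undefined shifts). */
static uint64_t rotl64(uint64_t v, unsigned r)
{
    r &= 63u;
    return r ? (v << r) | (v >> (64u - r)) : v;
}

/* Hypothetical encoding operation: toggle bits with the key (XOR),
 * then rearrange them with a rotation amount taken from the key. */
static uint64_t encode_bits(uint64_t input, uint64_t env_key)
{
    uint64_t toggled = input ^ env_key;
    return rotl64(toggled, (unsigned)(env_key & 0x3Fu));
}

int main(void)
{
    uint64_t branch_addr = 0x0000004000123ABCull;  /* example input bits   */
    uint64_t key_env_a   = 0x9E3779B97F4A7C15ull;  /* key of environment A */
    uint64_t key_env_b   = 0xC2B2AE3D27D4EB4Full;  /* key of environment B */

    /* The same input encodes differently under different environments,
     * so prediction state trained by A is not directly reusable by B. */
    printf("env A: %016llx\n", (unsigned long long)encode_bits(branch_addr, key_env_a));
    printf("env B: %016llx\n", (unsigned long long)encode_bits(branch_addr, key_env_b));
    return 0;
}

Because both steps are controlled by the key, this transform is reversible when reversibility is wanted (rotate back, then XOR again), or it could be replaced by a one-way hash as noted above.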

The input to the branch prediction circuit may include a plurality of pieces of information. The encoding may be applied to different parts of the input. The entire input need not be encoded.

For example, the at least one input may comprise an indication of an instruction address of a branch instruction, which may be used to form a new branch prediction state entry, thereby providing a prediction of the branch instruction. When querying the branch prediction circuit, the branch prediction circuit may receive a query value including an indication of an instruction address of an instruction for which branch prediction is to be performed and perform a search using the query value. For example, the search may identify whether the branch prediction circuit stores any branch prediction state entries associated with one or more instructions corresponding to the instruction address specified by the query value. For example, each entry may include some tag information or other state data, enabling the search to identify whether the query provided to the branch prediction circuit matches the entry. If the query misses, and it is subsequently determined that the instruction address indicated by the query corresponds to a branch instruction, a new branch prediction state entry may be allocated to the branch prediction circuitry, specifying the tag information or other state corresponding to the query that caused the miss, and specifying the actual branch information determined for the branch instruction.

In one example, prior to performing the search, the encoding circuitry may perform the encoding operation on at least some of the plurality of bits of the query value, using a key derived from one or more values associated with the current execution environment. Thus, different execution environments encode the same query in different ways based on their environment-specific keys, which alters the mapping between the input query and the entry of the branch prediction circuitry returned as a match. This makes it harder for an attacker to predict which input values should be supplied as queries when training the branch predictor, in an attempt to trick a branch associated with some other query value in a different execution environment into using prediction state allocated by the attacker's execution environment. One advantage of applying the encoding to the query value (rather than to the predicted branch state, such as the predicted branch target address) is that encoding the query input of the branch prediction circuitry may be enough to thwart the attacker, so no decoding has to be applied when reading branch state information out of the branch predictor, since the predicted branch information can still be stored in the branch prediction circuitry in the clear. This can improve performance by avoiding an additional timing path through decoding circuitry at the output of the branch predictor.
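A minimal sketch of how an encoded query might drive the lookup itself is given below, assuming a small direct-mapped table and an XOR-style environment-specific transform; the table size, field layout and function names are illustrative assumptions rather than details taken from the text. Only the query side is encoded, so the stored target remains in the clear and no decode step is needed on the output.

#include <stdint.h>
#include <stdbool.h>
#include <stdio.h>

#define NUM_ENTRIES 1024u            /* assumed table size (power of two) */

struct bp_entry {
    bool     valid;
    uint64_t tag;                    /* derived from the encoded query    */
    uint64_t target;                 /* predicted target, stored in clear */
};

static struct bp_entry table[NUM_ENTRIES];

/* Hypothetical environment-specific transform applied to the query. */
static uint64_t encode_query(uint64_t fetch_addr, uint64_t env_key)
{
    return fetch_addr ^ env_key;
}

/* Allocation after the actual branch outcome is known. */
static void allocate(uint64_t fetch_addr, uint64_t env_key, uint64_t target)
{
    uint64_t q = encode_query(fetch_addr, env_key);
    struct bp_entry *e = &table[q % NUM_ENTRIES];
    e->valid = true;
    e->tag = q;
    e->target = target;
}

/* Lookup: the encoded query selects the index and supplies the tag, so
 * two environments querying the same fetch address generally touch
 * different entries and/or tags. */
static bool predict(uint64_t fetch_addr, uint64_t env_key, uint64_t *target)
{
    uint64_t q = encode_query(fetch_addr, env_key);
    const struct bp_entry *e = &table[q % NUM_ENTRIES];
    if (e->valid && e->tag == q) {
        *target = e->target;
        return true;
    }
    return false;
}

int main(void)
{
    uint64_t key_a = 0x1111111111111111ull, key_b = 0x2222222222222222ull;
    uint64_t target = 0;

    allocate(0x400100u, key_a, 0x400800u);                          /* environment A trains        */
    printf("A hit: %d\n", (int)predict(0x400100u, key_a, &target)); /* 1: A's own entry matches    */
    printf("B hit: %d\n", (int)predict(0x400100u, key_b, &target)); /* 0: B's encoding differs     */
    return 0;
}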

In other examples, the at least one input whose bits are encoded by the encoding circuitry to form the new branch prediction state entry may include an indication of a target address of the branch instruction (also referred to as the "branch target address" or "branch target offset"). In this case, the apparatus may further include reverse encoding circuitry (or decoding circuitry) to perform a reverse encoding operation on an output of the branch prediction circuitry, the output being provided in response to a search of the branch prediction circuitry triggered by a query value indicative of an instruction address. The reverse encoding operation may be any operation that reverses the effect of the encoding operation applied by the encoding circuitry, so as to recover the original values of the bits that were transformed by the encoding operation. In this case, encoding the predicted branch state does not change the mapping between the query value and which entry of the branch prediction circuitry is accessed; instead, the predicted branch state itself is encoded in an environment-specific manner based on the key associated with the current execution environment. Hence, even if a second execution environment hits against a branch prediction state entry trained by a first execution environment, the resulting branch prediction may differ from the prediction that would be made if the same entry were accessed from the first execution environment. This makes it harder for an attacker to control, by maliciously training the branch predictor, the locations to which branches in a different execution environment are taken. This may improve performance by reducing the chance of encountering a wrong branch prediction state entry.

The reverse encoding circuitry may recalculate the value of the key associated with the current execution environment and perform the reverse encoding operation using the recalculated key value. Thus, if the current execution environment has changed between the time the branch prediction state entry was allocated and the time it is accessed to make a prediction, the reverse encoding operation yields information different from the predicted branch state that was supplied when the entry was allocated, because the key is recalculated from the value associated with the (new) current execution environment.
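The target-address variant and its reverse encoding can be modelled in a few lines of C, shown below. XOR is used because it is its own inverse, which is an assumption for this sketch rather than the only possible choice; the key values are arbitrary examples.

#include <stdint.h>
#include <stdio.h>

/* Hypothetical: the predicted target is stored encoded with the key of
 * the allocating environment and reverse-encoded with the key
 * recomputed for the environment performing the prediction. */
static uint64_t encode_target(uint64_t target, uint64_t env_key)
{
    return target ^ env_key;
}

static uint64_t reverse_encode_target(uint64_t stored, uint64_t current_env_key)
{
    return stored ^ current_env_key;   /* the same XOR undoes the encoding */
}

int main(void)
{
    uint64_t real_target = 0x0000004000456DE0ull;
    uint64_t key_trainer = 0x1234567890ABCDEFull;  /* environment that allocated the entry  */
    uint64_t key_other   = 0x0F1E2D3C4B5A6978ull;  /* environment that later hits the entry */

    uint64_t stored = encode_target(real_target, key_trainer);

    /* Same environment: the recomputed key matches, so the original
     * target is recovered. */
    printf("same env:  %016llx\n",
           (unsigned long long)reverse_encode_target(stored, key_trainer));

    /* Different environment: the recomputed key differs, so the
     * prediction is garbled rather than the trainer-chosen address. */
    printf("other env: %016llx\n",
           (unsigned long long)reverse_encode_target(stored, key_other));
    return 0;
}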

It is also possible to apply the encoding operation to both the instruction address and the target address of the branch, so as to combine the two approaches.

The key associated with the current execution environment may be based on a combination of one or more identifiers associated with the current execution environment. For example, the key may be based on any combination of one or more of the following:

exception level (to distinguish between different modes of operation, e.g., user mode, kernel mode, hypervisor mode);

privilege level (to distinguish between different execution permissions);

ASID (address space ID, to distinguish between different application-level execution contexts);

VMID (virtual machine ID, to distinguish between different operating-system or virtual-machine-level execution environments, or between applications with the same ASID running under the control of different operating systems or virtual machines);

NS (non-secure/secure state, indicating the current security state of the device);

physical processor core number (to distinguish processes executing on different processor cores provided in hardware);

logical core number (to distinguish execution environments executing on different logical partitions of a shared processor core provided in hardware); and

one or more software-writable registers (so that software can provide a further input for deriving the key, giving additional variation to the key; this may make it harder, for example, for a process that knows the context identifiers (such as the ASID or VMID) of a process executing under it to predict the key used by that process).

In addition, the key may be generated based on a previously generated random number. This provides further variation in the key generated for a given combination of identifiers associated with the current execution environment, making it harder for an attacker who has identified the key used by one device (or the function used to derive it) to apply that knowledge to other devices, which may use different random numbers. The random number may include at least one of: a per-logical-processor element, a per-physical-processor element, and a system-wide element. At least a portion of the previously generated random number may be generated at boot time, so that the random number changes each time the data processing apparatus is booted, providing further resistance to tampering with the key associated with a given execution environment. For example, a hardware random number generator or pseudo-random number generator may be triggered to generate a new random number each time the device boots. At least a portion of the previously generated random number may be pseudo-random; true randomness is not required.
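The following C sketch models one way such a key could be derived, mixing the environment identifiers listed above with a boot-time random number through a one-way-style mixer. The splitmix64-style finalizer, the struct layout and all constants are illustrative assumptions; real hardware may use a quite different transform.

#include <stdint.h>
#include <stdio.h>

/* One-way-style mixing step (a splitmix64 finalizer used as a stand-in). */
static uint64_t mix64(uint64_t x)
{
    x ^= x >> 30; x *= 0xBF58476D1CE4E5B9ull;
    x ^= x >> 27; x *= 0x94D049BB133111EBull;
    x ^= x >> 31;
    return x;
}

struct env_ids {
    uint8_t  exception_level;   /* e.g. EL0..EL3              */
    uint8_t  non_secure;        /* NS state                   */
    uint16_t asid;              /* address space identifier   */
    uint16_t vmid;              /* virtual machine identifier */
    uint8_t  phys_core;         /* physical core number       */
    uint8_t  logical_core;      /* logical core number        */
    uint64_t sw_register;       /* software-writable input    */
};

/* Derive an environment-specific key from the identifiers and a random
 * number regenerated at each boot. */
static uint64_t derive_key(const struct env_ids *e, uint64_t boot_random)
{
    uint64_t k = boot_random;
    k = mix64(k ^ e->exception_level);
    k = mix64(k ^ ((uint64_t)e->non_secure << 8));
    k = mix64(k ^ e->asid);
    k = mix64(k ^ ((uint64_t)e->vmid << 16));
    k = mix64(k ^ ((uint64_t)e->phys_core << 32));
    k = mix64(k ^ ((uint64_t)e->logical_core << 40));
    k = mix64(k ^ e->sw_register);
    return k;
}

int main(void)
{
    struct env_ids app = { 0, 1, 42, 7, 1, 0, 0xDEADBEEFull };
    uint64_t boot_random = 0x243F6A8885A308D3ull;  /* new value at each boot */
    printf("key = %016llx\n", (unsigned long long)derive_key(&app, boot_random));
    return 0;
}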

In some embodiments, the instruction stream may be executed in one of a plurality of execution environments adapted to execute at the lowest execution permission (the least privileged execution permission level). For example, an execution environment at the lowest execution permission may comprise an application or a sub-portion of an application. In some examples, the encoding circuitry may perform the encoding operation further based on an identifier of the one of the plurality of execution environments, executing at the lowest execution permission, that is executing the instruction stream. This allows different pieces of application-level software (which may share the same address translation regime, and might therefore otherwise be expected to share the mapping of branch predictor inputs to branch predictor entries) to use different encodings of the branch predictor inputs, reducing the risk of one application, or one portion of an application, mounting the form of attack described above against another application or portion of an application sharing the same address translation regime.

Monitoring circuitry may be provided for detecting a rate of instruction fetch or decode faults while the instruction stream is being executed in a speculative state, and for raising an exception or generating an error response when the detected rate of instruction fetch and/or decode faults meets a predetermined criterion (e.g. rises beyond a predetermined threshold). For example, the threshold may be a rate at least 20% higher than the previously observed rate, or 20% higher than the average rate of other applications. This provides a technique for detecting attacks of the form described above: if the number of instruction fetch faults increases, one possible explanation is that an attacker is attempting to train the branch predictor to trick other code into executing instructions from inappropriate branch addresses, which are then detected as mispredictions. By triggering an exception or error response when the fault rate of speculative instruction fetch or decode rises, a warning of a potential attack can be given. How software chooses to respond to such a warning may depend on the particular software being executed, but monitoring circuitry which triggers an interrupt/error response when an abnormally high instruction fetch fault rate is detected provides a hardware framework enabling software to respond in whatever way is appropriate to that software.
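As a rough model of such monitoring circuitry, the sketch below counts speculative fetch/decode faults over a sampling window and signals an alarm when the fault rate rises at least 20% above the rate seen in the previous window. The window size and the exact comparison are assumptions; only the 20% figure comes from the example in the text.

#include <stdint.h>
#include <stdbool.h>

struct fault_monitor {
    uint64_t fetches_in_window;   /* speculative fetches seen this window     */
    uint64_t faults_in_window;    /* fetch/decode faults seen this window     */
    double   previous_rate;       /* fault rate of the last completed window  */
};

/* Called once per speculative fetch; returns true if an exception or
 * error response should be raised. */
bool monitor_update(struct fault_monitor *m, bool faulted)
{
    m->fetches_in_window++;
    if (faulted)
        m->faults_in_window++;

    if (m->fetches_in_window < 4096u)   /* assumed sampling window length */
        return false;

    double rate  = (double)m->faults_in_window / (double)m->fetches_in_window;
    bool   alarm = (m->previous_rate > 0.0) && (rate >= 1.2 * m->previous_rate);

    /* Start a new window. */
    m->previous_rate     = rate;
    m->fetches_in_window = 0;
    m->faults_in_window  = 0;
    return alarm;
}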

In one example, a branch prediction circuit may include a branch target prediction structure including a plurality of branch target entries, each specifying at least one branch target address. The encoding circuitry may include encryption circuitry to encrypt at least a portion of the new branch target entry to be written to the branch target prediction structure using an encryption key associated with the current execution environment.

Each branch target entry may specify tag information. The apparatus may have branch target prediction circuitry to perform a branch target prediction lookup for an instruction fetch address associated with the current execution environment. The branch target prediction lookup may comprise determining whether any of a subset of branch target entries specifies tag information corresponding to a target tag determined for the instruction fetch address. The subset of branch target entries looked up in the branch target prediction lookup may comprise all branch target entries of the branch target prediction structure in a fully associative cache implementation, or only some of the branch target entries in a set-associative implementation. In a set-associative implementation, for example, the subset of branch target entries to be looked up may be selected based on a portion of the address of the given branch instruction.

In general, the target tag may be determined in some manner based on some property of the instruction fetch address, the current execution environment in which the fetch address appears, or some history of recent operation of the processing circuitry (indicating some property of the current execution point represented by the instruction fetch address). The particular label used may vary depending on the type of branch target prediction structure implemented. For example, the target tag may be derived from an instruction fetch address or one or more identifiers associated with the current execution environment, and may also be based on a past history of branch results that resulted in the instruction(s) identified by the instruction fetch address.

The value of the target tag may be reusable in more than one execution environment; that is, the target tag may not be unique to a particular execution environment. This may be because, for example, the particular branch target prediction structure uses a tag that is completely independent of identifiers associated with the current execution environment, or because, although the tag includes at least a portion derived from a value associated with the current execution environment, the tag information and target tag are based on a compressed version of one or more execution environment identifiers (to save circuit area in the tag information storage of the branch target prediction structure), so that a given value of the compressed identifier may be reused from one execution environment to another.

Thus, there is no guarantee that a target tag used in one execution environment will not match tag information allocated to entries of the branch target prediction structure following execution of branch instructions associated with a different execution environment. This can result in false positive hits in the branch target prediction structure, which may sometimes return the wrong branch target address, so that the branch misprediction causes the wrong instructions to be executed after the branch. While such false positive hits can reduce processing performance, branch misprediction resolution circuitry will typically already be provided to handle mispredictions, by triggering the processor to flush pipelined instructions subsequent to the mispredicted branch and to resume instruction fetching from the correct processing path once the branch outcome is resolved. Therefore, false positive hits caused by reuse of tag values across multiple execution environments have not generally been considered a major problem, since they can be resolved in a similar way to other causes of branch misprediction, which, although affecting performance, are not normally regarded as a risk to data security.

However, it has been recognised that such false positive hits in the branch target prediction circuitry can in fact open a vulnerability in the security of data being processed by the data processing apparatus. The apparatus may restrict access to some data to particular execution environments, for example using a privilege-based data access protection scheme. False positive hits in the branch target prediction circuitry may allow such security mechanisms to be circumvented, so that a first execution environment controlled by an attacker can gain information about sensitive data that is accessible to a second execution environment but not to the first. This is surprising, because branch prediction mechanisms are normally regarded as performance-enhancing mechanisms whose mispredictions are not critical to the security of data processed by the system, but merely affect the level of performance achieved.

Although the branch prediction may eventually be determined to be incorrect, because the actual target address of the branch executed by the second execution environment does not match the target address obtained from the false positive hit, and although any architectural state associated with the incorrectly speculatively executed instructions can be rewound to previous correct state values to reverse their architectural effects, the incorrectly speculatively executed instructions may also have changed non-architectural processor state, such as the contents of a cache or translation lookaside buffer (TLB), which may persist after the misprediction has been resolved. An attacker may be able to exploit these persistent non-architectural effects to gain information about sensitive data accessible to the second execution environment.

Thus, at least a portion of a new branch target entry to be written to the branch target prediction structure may be encrypted using an encryption key associated with the execution environment for which the branch information is allocated. The tag information of the new branch target entry, or the branch data of the new branch target entry (specifying at least the predicted branch target address), or both, may be encrypted. If the tag information is at least partially encrypted using the encryption key when the new branch target entry is allocated, then during a branch target prediction lookup either: the encryption circuitry may encrypt the target tag determined for the instruction fetch address using the (recalculated) encryption key associated with the current execution environment, and the branch target prediction circuitry may compare the encrypted target tag with the tag information of the subset of branch target entries to identify whether any of them specifies tag information corresponding to the target tag; or the encrypted tag information stored in the looked-up entries may be decrypted and compared with the (unencrypted) target tag. If the branch data is encrypted when a new entry is allocated to the branch prediction structure, the apparatus may also have decryption circuitry to decrypt the encrypted portion of the branch data of the one of the subset of branch target entries identified in the branch target prediction lookup as specifying tag information corresponding to the target tag.

In one particular example, when, in a branch target prediction lookup, none of the looked-up subset of branch target entries specifies tag information corresponding to the target tag, and the instruction fetch address specifies a block of one or more instructions containing a branch instruction, the encryption circuitry may encrypt the actual branch information determined for that branch instruction using the encryption key associated with the current execution environment, and the branch target prediction circuitry may allocate a branch target entry to the branch target prediction structure specifying the encrypted branch information and tag information corresponding to the target tag. On the other hand, on a lookup hit, when one of the subset of branch target entries does specify tag information corresponding to the target tag, the branch information stored in that entry may be decrypted using the encryption key associated with the current execution environment and then output as the predicted branch information for the given branch instruction.
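A compact C model of this allocate/lookup flow is sketched below, with XOR standing in for whatever cipher the encryption circuitry actually implements; the table geometry and field widths are assumptions. A false positive hit from another environment decrypts with the wrong key and therefore yields a scrambled target rather than the one chosen by the allocating environment.

#include <stdint.h>
#include <stdbool.h>

#define BTB_ENTRIES 512u

struct btb_entry {
    bool     valid;
    uint32_t tag;               /* tag value may be reusable across environments */
    uint64_t enc_branch_data;   /* encrypted target address / branch attributes  */
};

static struct btb_entry btb[BTB_ENTRIES];

/* On a miss for a resolved branch: encrypt the branch data with the
 * current environment's key and allocate the entry. */
void btb_allocate(uint32_t index, uint32_t tag,
                  uint64_t branch_data, uint64_t env_key)
{
    struct btb_entry *e = &btb[index % BTB_ENTRIES];
    e->valid = true;
    e->tag = tag;
    e->enc_branch_data = branch_data ^ env_key;     /* encrypt on allocation */
}

/* On a lookup: a tag match (possibly a false positive) returns the
 * branch data decrypted with the current environment's key. */
bool btb_lookup(uint32_t index, uint32_t tag,
                uint64_t env_key, uint64_t *branch_data)
{
    const struct btb_entry *e = &btb[index % BTB_ENTRIES];
    if (!e->valid || e->tag != tag)
        return false;
    *branch_data = e->enc_branch_data ^ env_key;    /* decrypt on hit */
    return true;
}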

Thus, in this example, the branch information in the branch target prediction structure is protected by the encryption key associated with the execution environment that supplied it: if one execution environment allocates branch information, it is encrypted using the key associated with that environment, and if a false positive hit later occurs when another execution environment reuses the same tag information, the branch information is decrypted using the key associated with the other environment, and so will not indicate the same branch target address as that originally provided by the environment which allocated the entry. Since branch predictors are generally regarded as pure performance enhancements that do not affect the security or integrity of data, the usefulness of encryption in a branch predictor is surprising; but by encrypting the branch information using execution-environment-specific keys, attacks of the type described above become more difficult, because it is much harder for an attacker to control where another execution environment branches to when the attacker does not know the key associated with each execution environment.

In some examples, the encryption circuit and the decryption circuit may comprise separate circuits. For example, in some cases, an operation applied to decrypt encrypted branch information may be different from an operation applied to encrypt branch information, and thus separate encryption and decryption methods may be applied. Alternatively, the encryption and decryption operations may actually be the same operation. For example, the encryption operation may include applying a reversible operation (e.g., XOR) to the branch information and the encryption key, and the decryption operation may include applying the same reversible operation (e.g., XOR) to the encrypted branch information and the encryption key. Thus, in some examples, the encryption circuit and the decryption circuit may both correspond to the same circuit provided in hardware (it is not necessary to provide two separate circuit units). Alternatively, separate encryption and decryption circuits may be provided to allow decryption of branch information for one entry of the branch target prediction structure to be performed in parallel with encryption of branch information for another entry, even if the encryption and decryption operations are the same.

In some examples, the encryption key may comprise a static key that is fixed for the current execution environment. Thus, each execution environment may be associated with a fixed key that never changes. This may still provide sufficient security, since it remains difficult for an attacker to predict the outcome of decrypting a value, encrypted with one key, using a different key when one or both of the keys are unknown. While a separate environment-specific key could be stored for every execution environment in a storage structure, this could require a large amount of storage, since the number of execution environments may be large. A simpler way of determining the static key for the current execution environment is to derive it from a common key shared between multiple execution environments and at least one environment identifier specific to the current execution environment. For example, the common key may be a previously generated random number as described above, which may be hashed or otherwise modified based on the identifier(s) of the current execution environment (e.g. the ASID, VMID, etc. described above).

In another approach, the encryption key may comprise a dynamic key that is variable for the current execution environment. Thus, in addition to varying between environments, the encryption key for a particular environment may be changed from time to time. This provides greater security, because it reduces the opportunity for an attacker to derive the key of a given environment by observing the behaviour of the data processing system over a period of time. The apparatus may therefore comprise key generation circuitry for generating an updated encryption key for the current execution environment. For example, the key generation circuitry may comprise a random or pseudo-random number generator, such as a linear feedback shift register, to generate a new random or pseudo-random value for the encryption key of a given execution environment when required. The timing of key updates may be arbitrary, may be in response to some predetermined event, or may occur on expiry of a certain period of time or number of branch prediction events. In some examples, the tag information in each branch target entry may be stored unencrypted, so that no encryption or decryption is applied to the tag information. This simplifies the tag comparison, and encrypting the branch information alone may be sufficient to reduce the probability of the attack described above.
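One way key generation circuitry of this kind might be modelled is shown below: a 64-bit Galois LFSR advanced a fixed number of steps to produce an updated key for an environment. The feedback polynomial and step count are illustrative assumptions.

#include <stdint.h>
#include <stdio.h>

/* One Galois-LFSR step with taps at bits 64, 63, 61, 60 (a commonly
 * quoted maximal-length polynomial). */
static uint64_t lfsr_next(uint64_t state)
{
    uint64_t lsb = state & 1u;
    state >>= 1;
    if (lsb)
        state ^= 0xD800000000000000ull;
    return state;
}

/* Generate an updated dynamic key from the current one. */
uint64_t generate_updated_key(uint64_t current_key)
{
    uint64_t s = current_key ? current_key : 1u;   /* LFSR state must be non-zero */
    for (int i = 0; i < 64; i++)                   /* assumed number of steps     */
        s = lfsr_next(s);
    return s;
}

int main(void)
{
    uint64_t key = 0xACE1ACE1ACE1ACE1ull;
    /* e.g. refresh the key when the corresponding region entry is re-mapped */
    key = generate_updated_key(key);
    printf("updated key = %016llx\n", (unsigned long long)key);
    return 0;
}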

Alternatively, the tag information may be encrypted in addition to (or instead of) the branch information indicating the branch target address. Thus, when an instruction fetch address is identified as referring to a block of instructions that includes a branch and there is a miss in the branch target prediction lookup, the target tag may be encrypted using the encryption key associated with the current execution environment, and the encrypted target tag may be specified as the tag information of the allocated branch target entry. In a subsequent branch target prediction lookup, the decryption circuitry may decrypt the tag information of each of the subset of branch target entries, and the branch target prediction circuitry may compare the decrypted tag information with the target tag. Alternatively, the target tag encrypted for the current lookup may be compared directly with the encrypted tag information stored in the branch target prediction structure to identify whether the tags match, avoiding the need for decryption.

In some cases, the tag information and the branch information may be encrypted and decrypted separately, with one encryption applied to the tag information and another applied to the branch information; in that case any of the methods of handling the tag encryption (comparing in encrypted or decrypted form, as required) may be used. In other examples, however, the tag information and the branch information may be encrypted together in a single encryption scheme. This can provide additional security, because it may be harder to break the encryption applied to the branch information if the encrypted value depends not only on the branch information and the encryption key of the particular environment, but also on the tag information provided along with the branch information (which adds entropy to the encryption of the branch information). However, this may be slower, because the decryption circuitry may need to decrypt the whole block of tag and branch information for each looked-up entry before the decrypted tag can be compared with the target tag to determine whether there is a matching entry and, if so, the decrypted branch information output.

Thus, in summary, the method used for encryption/decryption and the degree of encryption of the branching information and the tokens may depend on the required compromise between performance and security.

In general, any information indicating the branch target address may be used as the branch information. In some examples, the branch information (in its unencrypted form, before encryption or after decryption) may explicitly indicate the branch target address. The branch target address may be indicated as an absolute address, or as a relative address using an offset from the instruction fetch address for which the branch target prediction lookup is performed. In other examples, the branch target address may not be directly identifiable from the branch information itself, but the branch information may provide a pointer to another structure that identifies the branch target address, or may be used to calculate the branch target address. In general, therefore, the branch information may comprise any information that allows the predicted branch target address to be determined.

In some examples, the branch information may not specify any other information than the branch target address. However, in some examples, the branch information may also indicate at least one other piece of branch information representing certain prediction properties of branch instructions in one or more instruction blocks identified by the instruction fetch address. For example, the additional information may specify whether the branch is a conditional branch instruction, whether the branch target address should be predicted using some other branch target predictor separate from the branch target prediction structure (e.g., a branch target predictor for predicting polymorphic branches whose target address varies according to past processing results prior to the branch), or whether the branch represents a function call or a function return. Such additional branch information may be encrypted/decrypted along with information identifying the branch target address.

Although the target tag may depend on a range of properties of the instruction fetch address, in one example the branch target prediction circuitry may determine the target tag based on at least one environment identifier associated with the current execution environment. This helps to avoid false positive hits between predictions made for the same address in different execution environments. For example, the environment identifiers may include a virtual machine identifier identifying a virtual machine associated with the address and/or a process identifier identifying a process associated with the address.

If the target tag depends on at least one execution environment identifier associated with the current execution environment, one might expect that target tag values would not be reused between execution environments. In practice, however, while the data processing apparatus may execute a large number of different execution environments, the number of environments that have information stored in the branch target prediction structure at any one time may be much smaller. Representing the full environment identifiers in the tag information of each entry would therefore require a large number of bits, much of which is in practice redundant: for the purpose of the branch target prediction lookup, the current execution environment only needs to be distinguished from the other environments that currently have branch information cached in the branch target prediction structure, not from environments that have no branch information represented there. Hence, storing the full environment identifiers could unnecessarily increase the size of each branch target entry and require a larger number of comparators to compare the individual bits of the tag information with the target tag, increasing circuit area.

Thus, to reduce circuit area and power consumption, some implementations may provide a region table having a plurality of region entries, each region entry mapping branch context information to a region identifier, the region identifier having fewer bits than the branch context information. The branch context information may include at least one identifier associated with the corresponding execution environment. In performing a branch target prediction lookup, a target tag may be determined based on a target region identifier mapped by a region table to branch context information (including at least one identifier) associated with a current instruction fetch address. The tag information for each branch target entry may specify a region identifier in place of the identifier(s) of the execution environment. Thus, the region table effectively allows a larger set of execution environment identifiers and any other information used to identify branch context information to be compressed into a shorter identifier that is used as a marker in the branch target prediction structure to save area.
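The compression performed by the region table can be illustrated with the C sketch below, in which the ASID, VMID and the upper bits of the fetch address are mapped to a small region identifier that would be used as the BTB tag, and each region entry also carries the per-environment encryption key, as described in the paragraphs that follow. The table size, the round-robin replacement and the field widths are assumptions for illustration.

#include <stdint.h>
#include <stdbool.h>

#define REGION_ENTRIES 16u

struct region_entry {
    bool     valid;
    uint16_t asid;
    uint16_t vmid;
    uint32_t addr_high;     /* upper portion of the instruction fetch address */
    uint64_t enc_key;       /* encryption key for this environment            */
};

static struct region_entry region_table[REGION_ENTRIES];
static unsigned next_victim;

/* Return the short region identifier used as tag information in the
 * BTB, allocating a region entry (and a fresh key) when the branch
 * context information is not yet present. */
unsigned region_lookup(uint16_t asid, uint16_t vmid, uint32_t addr_high,
                       uint64_t (*new_key)(void))
{
    for (unsigned i = 0; i < REGION_ENTRIES; i++) {
        const struct region_entry *e = &region_table[i];
        if (e->valid && e->asid == asid && e->vmid == vmid &&
            e->addr_high == addr_high)
            return i;                            /* hit: reuse region id */
    }

    /* Miss: reuse an existing region entry for the new context and
     * generate a fresh key, so that stale BTB entries still tagged
     * with this region id no longer decrypt to meaningful targets. */
    unsigned victim = next_victim;
    next_victim = (next_victim + 1u) % REGION_ENTRIES;
    region_table[victim] = (struct region_entry){
        .valid = true, .asid = asid, .vmid = vmid,
        .addr_high = addr_high, .enc_key = new_key(),
    };
    return victim;
}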

With a region table of this kind, the number of region entries may be limited, so that when a new execution environment is encountered for which no region entry has yet been allocated, a region entry previously allocated to a different execution environment may need to be reused for the current execution environment. When a region entry is reused, the corresponding region identifier may still be in use as the tag of some entries in the branch target prediction structure. While such stale entries could be invalidated from the branch target prediction structure to prevent false positive hits, performing the invalidation can be expensive in performance and complexity, since it may require special circuitry to walk the branch target prediction structure and evict the entries tagged with the reused region identifier. In practice this cost may not be justified, because the stale entries are in any case expected to be evicted eventually if they cause branch mispredictions. Hence, implementations using a region table may not invalidate branch target entries on a region table update, which makes the branch target prediction structure prone to false positive hits. The encryption/decryption of branch information described above is therefore particularly useful for improving the security of a branch target predictor that uses a region table.

Although the encryption key associated with each execution environment could be stored in a separate storage structure, when a region table is provided, each region entry may instead specify the encryption key associated with the corresponding execution environment. The encryption key can then be read from the region table at the same time as the target region identifier is looked up, avoiding the need for a separate lookup of another storage structure. This approach also means that encryption keys need not be stored for all execution environments: keys need only be maintained for the particular execution environments currently mapped to region identifiers by the region table. For other execution environments no encryption key needs to be maintained, since they do not currently have branch information in the branch target prediction structure. This also reduces the number of keys that need to be stored.

The encryption key associated with a given execution environment may be updated when the mapping provided by a given region entry of the region table is updated. That is, when a region entry of the region table has to be reassigned to a different execution environment, the encryption key specified in that region entry may also be updated, so that a new key is generated for the execution environment associated with the region entry after the mapping update. This prevents the old key associated with one environment from continuing to be used by the new environment, ensuring that each environment uses a different key.

In some implementations, the branch context information mapped to the region identifier by the region table may include one or more execution environment identifiers that identify the current execution environment, but may not include any other information.

However, in other approaches, the branch context information may also depend on other information. For example, the branch context information may also include a portion of the instruction fetch address for which a previous branch target prediction lookup caused the given region entry to be allocated in the region table. Thus, a portion of the instruction fetch address is used to look up the region table and identify the corresponding region identifier, which avoids the need to store that portion of the instruction fetch address as tag information in each branch target entry of the branch target prediction structure. In general, the most significant portion of the instruction fetch address is the same for a large number of the fetch addresses used by a given execution environment, and the number of different values of this most significant portion across all instruction fetch addresses used within a given time frame in a particular execution environment may be relatively low. Thus, by representing this portion of the instruction fetch address in the region table and compressing it together with the execution environment identifier into a shorter region identifier, it is possible to reduce the amount of tag storage required per branch target entry and the amount of comparison logic needed to compare the target tag with the stored tag information.

Other forms of branch target prediction structure may use tags that are independent of the execution environment identifiers associated with a given execution environment. For example, one type of branch target prediction structure may be used to predict the target addresses of branch instructions whose behaviour varies depending on previously executed branches. For example, the branch target address of a given branch instruction may be calculated as one of a range of different possible branch target addresses, e.g. depending on the results of previous conditional instructions, and those conditional instructions may themselves depend on the past history of branch outcomes. For such a branch target prediction structure, the tag may depend on the instruction fetch address and on the history of outcomes of branches preceding the instruction fetch address, but may be independent of the context identifier(s) of the current execution context. With this type of tag, a given tag value may therefore be reusable in multiple execution environments, because a given instruction address and branch outcome history could arise in different environments. Again, this may give an attacker a way to fill the branch target prediction structure with branch information for a given branch history that is expected to arise in a victim context to be attacked, in an attempt to force the victim context to branch to an instruction address controlled by the attacker context due to a false positive hit between the different contexts. Such an attack can be made more difficult by using encryption (and decryption, where required) of the branch information and/or tag information in the branch target prediction structure as described above, because decrypting the encrypted branch information using the encryption key of the wrong context is less likely to result in an address known to or controllable by the attacker.
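
To illustrate why such a tag can recur across execution environments, the following C sketch forms a tag from the instruction fetch address and a global history of taken/not-taken outcomes only; the history width, the folding scheme and the names used are assumptions of this sketch rather than details of the described apparatus.

```c
#include <stdint.h>

/* Global history register: one bit per recent branch outcome, newest in bit 0. */
struct branch_history {
    uint32_t ghr;
};

static void history_update(struct branch_history *h, int taken)
{
    h->ghr = (h->ghr << 1) | (taken ? 1u : 0u);
}

/* Tag for a history-based target predictor: a fold of the fetch address and
 * the history. Note that nothing here depends on which execution context is
 * running, which is why the same tag value can arise in different contexts. */
static uint32_t multi_target_tag(uint64_t fetch_addr, const struct branch_history *h)
{
    uint32_t a = (uint32_t)(fetch_addr >> 2);
    return (a ^ (a >> 13) ^ h->ghr) & 0xffff;   /* 16-bit tag, illustrative width */
}
```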

Fig. 1 schematically shows an example of a data processing apparatus 2 having a processing pipeline comprising a number of pipeline stages. The pipeline includes a branch predictor 4 for predicting the outcomes of branch instructions and generating a series of fetch addresses for instructions to be fetched. The fetch stage 6 fetches the instructions identified by the fetch addresses from the instruction cache 8. The decode stage 10 decodes the fetched instructions to generate control information for controlling subsequent stages of the pipeline. The rename stage 12 performs register renaming to map the architectural register specifiers identified by the instructions to physical register specifiers identifying registers 14 provided in hardware. Register renaming is useful for supporting out-of-order execution because it can eliminate hazards between instructions that specify the same architectural register by mapping them to different physical registers in the hardware register file, increasing the likelihood that instructions can be executed in a different order from the program order in which they were fetched from the cache 8; this can improve performance by allowing a later instruction to execute while an earlier instruction waits for its operands to become available. The ability to map architectural registers to different physical registers also helps with rewinding the architectural state in the event of a branch misprediction. The issue stage 16 queues instructions waiting to be executed until the operands required to process the instructions are available in the registers 14. The execute stage 18 executes the instructions to perform corresponding processing operations. The write back stage 20 writes the results of the executed instructions back to the registers 14.

The execute stage 18 may include a number of execution units, such as a branch unit 21 for evaluating whether branch instructions were correctly predicted, an ALU (arithmetic logic unit) 22 for performing arithmetic or logical operations, a floating point unit 24 for performing operations using floating point operands, and a load/store unit 26 for performing load operations to load data from the memory system into the registers 14 or store operations to store data from the registers 14 into the memory system. In this example, the memory system includes a level one instruction cache 8, a level one data cache 30, a level two cache 32 shared between data and instructions, and main memory 34, but it should be understood that this is just one example of a possible memory hierarchy and that other implementations may have further levels of cache or a different arrangement.

FIG. 2 shows an example of providing input to the branch predictor 4 to generate a new branch prediction state entry to be allocated. The input comprises a number of bits; in this example it specifies the instruction address of an instruction identified as a branch, and a branch target address (target address) indicating the address to branch to when the branch is taken. Encoding circuitry 52 is provided for performing an encoding operation on at least some bits of the input, based on a value 54 associated with the current execution environment in which the instruction stream is being executed. The encoded input value resulting from the encoding operation is then supplied to the branch prediction circuitry 4. For example, the encoding may be applied to all or part of one or both of the input instruction address and the target address.

Figures 3A to 3C show different examples of performing the encoding operation (in examples 3A and 3B it is applied to a portion of the target address of a branch, although it could also be applied to the instruction address as in example 3C). As shown in Figs. 3A to 3C, some bits of the input address may be removed before the encoding operation is applied to the remaining bits. In Fig. 3A, a rearrangement of the relative order of the remaining bits is performed based on a key derived from a value associated with the current execution environment (or current execution permission). For example, a rotation by a number of bit positions specified by the key may be performed, or some other reordering of the bits. In the example of Fig. 3B, the encoding operation is performed as an XOR of the key with the selected input bits, the result being sent to the branch predictor. XOR is an example of a reversible encoding operation, for which the reverse encoding operation can be performed by a further XOR with the corresponding key. In the example of Fig. 3C, a hash function may be applied to the selected bits of the input based on the key. Note that the key need not contain the same number of bits as the selected bits being encoded. In general, regardless of the particular form of encoding applied, the values of at least some bits of the input are changed before the encoded input is provided to the branch prediction circuitry. In the examples of Figs. 3A and 3B, which illustrate encoding of the target address, the removed bits may be bits whose values can be derived or are known when the reverse encoding is applied. For example, if instructions are aligned to their size, one or more low order bits of the address may be removed because they are always zero. Similarly, if the target address is represented as an offset from the instruction address, the most significant bits may always be zero and so can be removed before applying the encoding, since they do not need to be restored when the reverse encoding is applied. In contrast, for encoding of the instruction address there is no need to ensure that the removed bits can be recovered, since this approach does not require a reverse encoding, and so the transformation may be a one-way transformation.
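
The three forms of encoding can be sketched in C as follows; this is purely illustrative, assuming a 32-bit selection of input bits, a rotation as the particular rearrangement, and generic mixing constants standing in for whatever keyed hash an implementation actually uses.

```c
#include <stdint.h>

/* Rearrangement (in the style of FIG. 3A): rotate the selected bits right by
 * an amount derived from the key. Rotation is one possible rearrangement. */
static uint32_t encode_rotate(uint32_t bits, uint32_t key)
{
    unsigned r = key & 31;
    return (bits >> r) | (bits << ((32 - r) & 31));
}

/* XOR (in the style of FIG. 3B): reversible, since XORing again with the same
 * key recovers the original bits. */
static uint32_t encode_xor(uint32_t bits, uint32_t key)
{
    return bits ^ key;
}

/* Keyed hash (in the style of FIG. 3C): a one-way mixing of the bits with the
 * key. The constants are from a generic 32-bit integer mixer, chosen here for
 * illustration only. */
static uint32_t encode_hash(uint32_t bits, uint32_t key)
{
    uint32_t x = bits ^ key;
    x ^= x >> 16; x *= 0x7feb352du;
    x ^= x >> 15; x *= 0x846ca68bu;
    x ^= x >> 16;
    return x;
}
```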

Figs. 4A and 4B illustrate two examples of querying the branch prediction circuitry to retrieve predicted branch information in response to a query specifying a given instruction address. Fig. 4A may be used in an example where the encoding circuitry 52 of Fig. 2 applies the encoding operation to at least some bits of the target address of a branch. In this example, a query specifying the instruction address for which a branch prediction is to be made is provided to the branch prediction circuitry. The branch prediction circuitry 4 performs a lookup of its storage structure to identify whether any entry matches the query and, if so, retrieves and outputs the encoded target address (which was generated by the encoding circuitry 52 based on the value 54 associated with the execution environment that allocated the entry). Reverse encoding circuitry 56 (which may be the same as the encoding circuitry 52 in some embodiments, or different, depending on the encoding/decoding algorithm implemented) applies a reverse encoding operation to the encoded target address based on a recalculated key 58. The recalculated key 58 is formed in the same way as the value 54 associated with the current execution environment used in Fig. 2, except that it is recalculated from the parameters associated with the current execution environment at the time the query of the branch prediction circuitry is performed, rather than at the time the matching branch prediction entry was written to the branch prediction circuitry (as in Fig. 2). Thus, if a matching entry is accessed in response to a query triggered by the same execution environment that allocated the entry, the resulting target address output by the reverse encoding circuitry 56 will be the same as the target address originally provided as input in Fig. 2. However, if the matching entry is accessed from an execution environment different from the one that allocated it, the decoded target address 60 will differ from the originally allocated target address. The encoding/reverse encoding algorithm and/or the key (or the method used to generate the key) may be chosen to make it difficult for an attacker to predict which alternative target address value should be supplied as input in Fig. 2 so that, when reverse encoded using a different key from the one used to encode it, the reverse encoded address 60 matches the branch target address to which the attacker wishes the other execution environment to branch.
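
A minimal C sketch of this allocate-then-query flow, using XOR as the reversible encoding of Fig. 3B; the key and address values are made up for illustration and the function names are not taken from the described apparatus.

```c
#include <stdint.h>
#include <stdio.h>

/* Encoding at allocation time (FIG. 2), using the allocating environment's key. */
static uint64_t encode_target(uint64_t target, uint64_t env_key)
{
    return target ^ env_key;
}

/* Reverse encoding at lookup time (FIG. 4A), using a key recalculated for the
 * environment performing the query. */
static uint64_t decode_target(uint64_t stored, uint64_t env_key)
{
    return stored ^ env_key;
}

int main(void)
{
    uint64_t target    = 0x0000004000123440ull;   /* illustrative branch target */
    uint64_t key_alloc = 0x9e3779b97f4a7c15ull;   /* key of allocating environment */
    uint64_t key_other = 0xc2b2ae3d27d4eb4full;   /* key of a different environment */

    uint64_t stored = encode_target(target, key_alloc);

    /* Same environment queries: the original target is recovered. */
    printf("same env : %#llx\n", (unsigned long long)decode_target(stored, key_alloc));
    /* Different environment queries: the result is garbage, not attacker-chosen. */
    printf("other env: %#llx\n", (unsigned long long)decode_target(stored, key_other));
    return 0;
}
```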

FIG. 4B illustrates an alternative approach for querying the branch prediction circuitry 4, in an example where the encoding circuitry 52 applies the encoding operation to the instruction address (or other tag information used to locate a matching entry in the branch prediction circuitry 4). In this case, when the branch prediction circuitry 4 is queried to perform a search for branch information, corresponding encoding circuitry 52 (which may be the same physical circuit used in Fig. 2, or a different circuit) applies the same encoding operation to the queried instruction address, e.g. a one-way hash of the instruction address according to a key, based on a recalculated key 58 recalculated (at the time the branch predictor is queried) from the identifiers associated with the current execution environment. With this approach it is not necessary to use a reversible operation, since the data output from the branch prediction circuitry 4 does not need to be decoded; instead, resistance to attack is provided by scrambling the mapping between the query input to the branch predictor and the locations in the branch predictor that are considered to match the query. The branch prediction circuitry 4 is therefore searched based on the hashed query information, and the target address 62 output from a matching entry can be output in unencoded form and used as the predicted branch target address for the branch represented by the instruction address provided as the query input, without any reverse encoding being applied.

FIG. 5 illustrates an example of forming a key from a number of identifiers associated with the current execution environment. These identifiers may be retrieved from one or more control registers associated with the processing pipeline 2 which specify properties of the current execution environment. For example, the key may be based on any of:

an exception level 79;

an execution permission level 80;

an address space ID (ASID) 81;

a virtual machine ID (VMID) 82;

a secure state 83;

a physical core number 84;

a logical core number 85;

a random value 86, which may be a true random number or a pseudo-random number. The (pseudo) random number may be derived from at least one of: a per-logical-processor (pseudo) random number 89 that is different for each logical processor; a per-physical-processor (pseudo) random number 90 that is different for each physical processor core; and a system (pseudo) random number 91, shared among all logical or physical processor cores in the data processing system but specific to that system, so as to reduce the chance that a key cracked on one system can be reused on another system (or any combination of these). Each of these elements 89, 90, 91 of the random number may be updated each time the data processing apparatus 2 is booted;

one or more software-writable register values 87 that can be written to registers 14 under software control to provide further entropy for the encoding operation;

a container ID 88 (an identifier distinguishing different portions of the execution environment at the lowest privilege level, which has the most restricted access privileges).

Of course, not all of these parameters need to be used in a particular implementation. In general, by generating the key for the encoding operation (and, where needed, the reverse encoding operation) based on one or more of the identifiers 80-85, 88 associated with the current execution environment, and optionally on further parameters such as random numbers or software-defined values, it is unlikely that two different execution environments with different privilege levels will share the same key, and it is therefore difficult for an attacker to train the branch predictor in one execution environment so as to entice an execution environment with greater data access privileges to branch to malicious code that could cause secret data to be exposed. Furthermore, by including a (pseudo) random value in the key generation process, it is harder for an attacker to determine what the key is. In particular, the use of random values means that even if an attacker has full access to one system, any information obtained by reverse engineering cannot be used to predict the keys used on another system, since the (pseudo) random values will differ. Similarly, it may be desirable to generate a different key for each execution environment on each logical processor core. For complexity and performance reasons it may not be desirable to provide a separate (pseudo) random number generator for each logical processor; in that case the same result can be achieved by combining a per-physical-processor or system-level (pseudo) random number with the logical and/or physical core number in the key generation process. In some embodiments, the key generation process may include hashing the various key inputs together. The hashing algorithm used may be a (secure) one-way hash.

That is, in some examples the key may be based on a one-way transformation applied to at least one key input parameter, where the at least one key input parameter includes at least one value associated with the current execution environment (e.g. the ASID, VMID or exception level discussed above), but may also include other inputs such as a random number or software-writable values. Generating the key using a one-way transformation means that even if an attacker can observe all but one of the inputs, knows the algorithm, and can observe some of the generated keys, they cannot calculate what the missing input (e.g. a random number) is, which in turn means that they cannot predict the keys of other execution environments.
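
A sketch of such a key derivation in C, combining some of the identifiers listed above with per-boot random values; the field widths, the particular fields chosen, and the one-way mixer (generic splitmix64-style constants) are assumptions of this sketch rather than a specified algorithm.

```c
#include <stdint.h>

/* Illustrative inputs to key generation. */
struct key_inputs {
    uint8_t  exception_level;
    uint16_t asid;
    uint16_t vmid;
    uint8_t  secure_state;
    uint8_t  logical_core;
    uint64_t per_core_random;    /* regenerated at each boot */
    uint64_t system_random;      /* shared by all cores, specific to this system */
    uint64_t software_value;     /* optional extra entropy written by software */
};

/* Generic 64-bit one-way mixing step, standing in for whatever (secure)
 * one-way hash an implementation actually uses. */
static uint64_t mix64(uint64_t x)
{
    x ^= x >> 30; x *= 0xbf58476d1ce4e5b9ull;
    x ^= x >> 27; x *= 0x94d049bb133111ebull;
    x ^= x >> 31;
    return x;
}

static uint64_t derive_env_key(const struct key_inputs *in)
{
    uint64_t k = in->system_random;
    k = mix64(k ^ in->per_core_random);
    k = mix64(k ^ ((uint64_t)in->vmid << 32) ^ in->asid);
    k = mix64(k ^ ((uint64_t)in->exception_level << 8) ^ in->secure_state);
    k = mix64(k ^ ((uint64_t)in->logical_core << 16) ^ in->software_value);
    return k;   /* hard to invert without knowing the random inputs */
}
```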

Fig. 6 shows an example in which monitoring circuitry 95 is provided for monitoring the rate of instruction fetch faults encountered by the fetch stage 6 and/or instruction decode faults encountered by the decode stage 10. The fetch fault rate and the decode fault rate may be monitored individually or as a combined rate. Furthermore, in some examples only one of these types of fault may be monitored (e.g. only fetch faults, or only decode faults). Although fetch and decode faults can occur for many reasons, one of them is a misprediction by the branch predictor 4. If an attacker attempts to launch an attack using the branch predictor (as described above), branch mispredictions may become more frequent. The rate of instruction fetch and decode faults can therefore be used as an indicator providing a hint that an attack is being mounted. The monitoring circuitry 95 may trigger a fault handling response (e.g. raising an interrupt or exception) if the rate of instruction fetch or decode faults within a given time period is detected to increase by more than a threshold. Fig. 7 shows a graph of the instruction fetch fault rate over successive time periods of duration T. As shown in Fig. 7, a fault handling response may be triggered if the instruction fetch fault rate increases by more than a threshold amount (e.g. 20%) from one time period to the next. Those skilled in the art will appreciate that there are a variety of ways of detecting an attack using a fault rate (e.g. comparison against a predetermined threshold, or against the fault rate of a previously executed program) within the scope of the present invention. How software chooses to respond to such exceptions may vary, but this provides a means of signalling to the software that an attack is in progress.
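
The period-to-period comparison can be sketched as follows; the 20% figure comes from the example above, while the counter handling and interface are assumptions of this sketch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Fault counts sampled once per tracking period of duration T. */
struct fault_monitor {
    uint64_t prev_faults;    /* fetch and/or decode faults in the previous period */
    bool     have_prev;
};

/* Returns true if a fault handling response (e.g. an interrupt or exception)
 * should be raised because the fault count grew by more than threshold_pct
 * from one period to the next. */
static bool fault_monitor_step(struct fault_monitor *m,
                               uint64_t faults_this_period,
                               unsigned threshold_pct /* e.g. 20 */)
{
    bool trigger = false;
    if (m->have_prev && m->prev_faults > 0) {
        uint64_t limit = m->prev_faults + (m->prev_faults * threshold_pct) / 100;
        trigger = faults_this_period > limit;
    }
    m->prev_faults = faults_this_period;
    m->have_prev = true;
    return trigger;
}
```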

Fig. 8 schematically illustrates another example of the branch predictor 4, which includes a branch direction predictor (BDP) 140 for predicting whether branch instructions are taken, a branch target buffer (BTB) 142 for predicting the target address to which a branch instruction will redirect execution in the event that it is taken, and a fetch queue 144 for queuing fetch addresses identifying blocks of program instructions to be fetched from the cache 8 (note that in some cases the fetch queue could be regarded as part of the fetch stage 6 rather than part of the branch predictor 4, but the functionality would be the same). The addresses placed in the fetch queue 144 represent addresses of blocks of instructions to be fetched from the instruction cache 8, and are derived from previous predictions of the branch predictor. The unit of instructions fetched from the cache 8 in one block may be referred to as a "fetch block" and may have a particular default size, e.g. 16, 32 or 64 bytes, although when the start address from which fetching is performed is not aligned with a natural fetch block boundary, a partial fetch block of less than the default size may be fetched.

The BDP 140, which may also be referred to as a branch history buffer or branch outcome predictor, records branch history information used for predicting the taken/not-taken outcome of branch instructions. Any known taken/not-taken prediction scheme may be used for the BDP 140; for example, gshare and TAGE are examples of known branch direction prediction algorithms.

The BTB 142 maintains prediction information for a number of branch instructions, identified by tag information corresponding to a portion of the instruction address of the branch instruction (and possibly also to other information, such as context identifiers or other identifiers of the current execution environment). The prediction information may indicate the target address of the branch, as well as other information such as the instruction address of the corresponding branch instruction (its program counter, or PC), attributes of the branch (e.g. whether it is indirect, unconditional, a function call, a function return, etc.), or other information for predicting the outcome of the branch as described below.

The branch predictor 4 also includes a multi-target branch target predictor 146, which is a special kind of branch target buffer used for predicting the target addresses of branch instructions that are polymorphic, i.e. whose branch target address varies from time to time, so that different instances of execution of the branch instruction at the same instruction fetch address may lead to different target addresses depending on the outcomes of instructions executed before the branch. Both the BTB 142 and the multi-target indirect branch target predictor 146 are examples of branch target prediction structures. In addition, the branch predictor 4 includes a region table 148, which is used to compress context identifiers into a shorter region identifier to be used as tag information for the BTB 142.

When the fetch stage 6 starts a new fetch for a given fetch address to fetch a block of instructions from the instruction cache 8, the branch predictor 4 also looks up that fetch address in the BDP 140, the BTB 142, and the multi-target branch target predictor 146. When the BTB 142 holds predicted branch information for a block including the program counter address represented by the current fetch address, that information is read from the BTB 142, while the BDP 140 provides the taken/not-taken prediction. The lookups may be controlled by branch prediction control logic 150. When the BDP 140 predicts the branch as not taken, the branch prediction control logic 150 selects as the next fetch address an incremented version of the current fetch address, incremented by a certain stride by the adder 152, so that the next fetch address used in the next cycle follows on sequentially from the current fetch address. On the other hand, if the BDP 140 predicts the branch as taken, the branch prediction control logic 150 selects the predicted branch target output by the BTB 142 as the next fetch address. For some instruction fetch addresses the BTB 142 may output an attribute indicating that the address refers to a block of instructions containing a branch previously detected as being polymorphic, in which case the polymorphic branch attribute controls the branch prediction control logic 150 to select the branch target address output by the multi-target branch target predictor 146 as the next fetch address, instead of the output of the BTB 142. Unlike the BTB 142, whose predictions are made independently of branch history, the multi-target branch target predictor bases its predicted target address on the history of outcomes of earlier branches preceding the current execution point identified by the current instruction fetch address.
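
The selection made by the branch prediction control logic 150 can be summarised by the following sketch, assuming an illustrative fetch block size and struct layout; none of the names are taken from the described apparatus.

```c
#include <stdbool.h>
#include <stdint.h>

#define FETCH_BLOCK_SIZE 32u    /* illustrative fetch block size in bytes */

struct btb_result {
    bool     hit;
    bool     polymorphic;       /* branch previously detected as polymorphic */
    uint64_t target;
};

/* Mirrors the choice described above: sequential if predicted not taken or on
 * a BTB miss, otherwise the BTB target, or the multi-target predictor's target
 * when the BTB flags the branch as polymorphic. */
static uint64_t next_fetch_address(uint64_t current_fetch,
                                   bool bdp_predict_taken,
                                   const struct btb_result *btb,
                                   uint64_t multi_target_prediction)
{
    if (!bdp_predict_taken || !btb->hit)
        return current_fetch + FETCH_BLOCK_SIZE;   /* adder 152 path */
    if (btb->polymorphic)
        return multi_target_prediction;            /* predictor 146 path */
    return btb->target;                            /* BTB 142 path */
}
```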

The next fetch address output by the branch prediction control logic 150 is allocated to the fetch queue 144 so that, when the address reaches the front of the queue, the corresponding block of instructions is fetched from the instruction cache 8 by the fetch stage 6. In the next processing cycle the next fetch address becomes the current fetch address, triggering another lookup of the branch predictor 4 for that fetch address. This process continues cycle by cycle to step through the program code being executed. If a misprediction is detected at the branch unit 21, because the actual outcome of a branch instruction differs from the predicted outcome generated by the branch predictor 4, a signal is sent back to the fetch stage 6 to reset the fetch queue and resume fetching from the actual branch target address, and the contents of the various prediction structures 140, 142, 146 are updated based on the actual outcome of the branch to increase the likelihood that future predictions will be correct.

In summary, the BTB 142 (also known as a branch target address cache or BTAC) is the component of the branch predictor 4 used to identify the predicted target address of a branch. The BTB is thus effectively a small cache of entries, each providing the branch location (the program counter, or branch instruction address, of the branch instruction), the predicted target address of the branch, and possibly other attributes, such as whether the branch is conditional and whether it represents a function call or function return. Since different execution contexts (e.g. different processes executed by the processor 2, or different virtual machines) may use the same virtual address to refer to different branches, each BTB entry may be tagged with context information (e.g. a process identifier and/or a virtual machine identifier) in order to avoid unwanted address conflicts.

Fig. 9 illustrates an implementation of the BTB in an embodiment which does not use the region table 148. In this example, the BTB 142 includes a number of entries 156, each including a tag portion 158 providing tag information used to identify whether the entry relates to the current fetch address in a BTB lookup, and a data portion 160 providing predicted branch information including the branch target address 166 and any other information 168 associated with the corresponding branch. In this example, the tag portion 158 specifies as tag information one or more execution environment identifiers 162 identifying the execution environment (context) in which the corresponding branch was executed, and the instruction fetch address 164 corresponding to the branch (the program counter of the instruction block containing the branch). The data portion includes the branch target address 166 and other information 168, such as attributes specifying whether the branch is conditional, a function call, a function return, and so on.

In some implementations, the data portion 160 may also include the least significant bits of the branch address 164 (although this is not shown in Fig. 9). This can be useful in a superscalar processor executing multiple instructions per cycle, where the branch predictor 4 may need to predict for a whole block of instructions in parallel, so that each entry maps to one instruction block. The least significant bits of the instruction fetch address may be excluded from the tag portion 158 so that any instruction within the block can match the entry. However, even when an instruction fetch address within the block represented by a given entry is input, if that fetch address comes after the address of the last branch occurring in the block associated with the entry, then no branch occurs after that instruction fetch address, and the branch prediction associated with that branch should not be acted upon. Thus, by including the least significant bits of the branch address in the data portion 160 of the branch target entry 156, it is possible to determine whether the prediction represented by a matching entry should be acted upon for the current instruction fetch address looked up in the BTB 142.
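
A minimal sketch of such an entry and of the check just described; all field widths, the block size and the names are illustrative assumptions rather than details of the described apparatus.

```c
#include <stdbool.h>
#include <stdint.h>

/* One BTB entry in the style of FIG. 9 (no region table). */
struct btb_entry {
    /* tag portion 158 */
    uint32_t context_id;        /* execution environment identifier(s) 162 */
    uint64_t fetch_block_tag;   /* instruction fetch address bits 164 */
    /* data portion 160 */
    uint64_t target;            /* branch target address 166 */
    uint8_t  branch_offset;     /* least significant bits of the branch address */
    bool     is_conditional;    /* examples of other attributes 168 */
    bool     is_call;
    bool     is_return;
};

/* Even on a tag match, only act on the prediction if the fetch address does
 * not fall after the position of the last branch recorded for the block. */
static bool prediction_applies(const struct btb_entry *e, uint64_t fetch_addr,
                               unsigned block_offset_bits /* e.g. 5 for 32 bytes */)
{
    uint64_t offset_in_block = fetch_addr & ((1ull << block_offset_bits) - 1);
    return offset_in_block <= e->branch_offset;
}
```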

The cache provided for the BTB may be implemented in different ways. In some examples the cache may be fully associative, so that the branch information for a given branch can be placed anywhere in the BTB. In practice, however, a set-associative cache implementation may be more efficient; in that case the locations at which branch information for a given branch may be stored are limited to a particular set (identified based on the instruction fetch address of the instruction block containing the branch), to reduce the number of entries that have to be looked up for a given fetch address during a branch target buffer lookup.

To be able to store enough branch information to provide sufficiently high performance, a BTB may typically have a relatively large number of entries, e.g. of the order of thousands. In practice, however, the number of contexts that have information stored in the BTB at a given time may be much smaller, e.g. around 10, since a single execution context may cache branch information for many instructions of that context. Furthermore, many of the branch instructions associated with a given context may share the same value for the more significant portion of the branch instruction address 164. This means that the tag information 158, if implemented in the manner shown in Fig. 9, may include a large amount of redundant information, since explicitly indicating the complete context identifier and the complete branch instruction address may require a relatively large number of bits in the tag 158, which increases the circuit area required by the BTB 142 as well as the number of comparators required for the tag comparisons during a lookup of the BTB 142.

As shown in Fig. 10, to improve the area efficiency of the BTB, the branch predictor 4 may use the region table 148 to compress this redundant information into a shorter tag value. In the example of Fig. 10, each entry 156 of the BTB again has a tag portion 158 and a data portion 160, and the data portion 160 is the same as in Fig. 9. However, instead of representing the execution context identifiers 162 and the full branch instruction address tag 164 in the tag 158 within the BTB 142, the tag portion merely specifies a region identifier 171 and a lower portion 169 of the tag portion of the branch instruction address, with the region identifier 171 pointing to a corresponding region entry 170 in the region table which specifies the context identifiers 162 and the upper portion 167 of the branch instruction address. Note that the region identifier is not indicated explicitly in each region entry 170 of Fig. 10, but is implicit in the index of the corresponding region entry 170 (e.g. the first entry in the region table may be associated with region identifier #0, the next with region identifier #1, and so on). The region table 148 can be much smaller than the BTB 142, e.g. with 10 to 100 entries, for example 32 in the example of Fig. 10. When looking up the BTB 142, the one or more context identifiers identifying the current execution context and the upper bits (e.g. bits 48 to 21) of the program counter are looked up in the region table to identify the corresponding region identifier, and the region identifier and the lower tag bits (e.g. bits 20 to 16) of the program counter are then used as the tag information for looking up the BTB 142. Note that the least significant bits of the program counter (e.g. bits 15 to 0) are not used in the tag at all, as these are the bits used to index into the corresponding set of the set-associative BTB 142. The region table 148 therefore allows the size of the tag portion 158 of each BTB entry 156 to be reduced. This exploits the fact that, for looking up the BTB 142, the current execution context does not need to be distinguished from every other execution context, only from those other execution contexts that currently have branch information in the BTB 142.
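
The bit-field split quoted above (bits 15:0 for the index, 20:16 for the lower tag, 48:21 plus the context identifiers compressed via the region table) can be sketched as follows; the packing of the branch context information into a single word and the struct names are assumptions made for illustration.

```c
#include <stdint.h>

struct btb_tag {
    uint8_t region_id;   /* from the region table, e.g. 5 bits for 32 entries */
    uint8_t low_tag;     /* PC bits 20:16 */
};

/* Branch context information fed to the region table: the context
 * identifier(s) together with PC bits 48:21 (28 bits). */
static uint64_t make_branch_context_info(uint64_t pc, uint64_t context_id)
{
    uint64_t upper = (pc >> 21) & ((1ull << 28) - 1);
    return (context_id << 28) | upper;
}

/* The BTB tag is the region identifier returned by the region table plus the
 * lower tag bits of the program counter. */
static struct btb_tag make_btb_tag(uint64_t pc, uint8_t region_id)
{
    struct btb_tag t = {
        .region_id = region_id,
        .low_tag   = (uint8_t)((pc >> 16) & 0x1f),   /* PC[20:16] */
    };
    return t;
}

/* PC bits 15:0 select the set of the set-associative BTB. */
static uint32_t make_btb_index(uint64_t pc)
{
    return (uint32_t)(pc & 0xffff);
}
```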

However, as shown in Fig. 11, when the BTB 142 is implemented using the region table 148, this can lead to false hits in which the branch predictor 4 determines that a branch from one execution context matches branch information in a BTB entry allocated by a different execution context. This can be seen by comparing Figs. 10 and 11, which show the contents of the BTB in one example before and after an update of the region table 148. In Fig. 10, the BTB currently includes branch information for three branches located at addr1, addr2 and addr3, represented by entries 156-1, 156-2 and 156-3 respectively. The branches represented by entries 156-1 and 156-3 are associated with process 1, represented by region table entry #A, while the branch represented by BTB entry 156-2 is associated with process 2, represented by region table entry #B.

As shown in Fig. 11, a third process then executes a branch instruction and needs to allocate information to the BTB 142, but process 3 does not currently have any entry allocated to it in the region table 148. If all the region table entries are already occupied, this requires a region entry 170 to be evicted and reallocated to the new process; for example, the region entry 170-1 previously allocated to process 1 may be updated so that it now provides the context identifier and upper address bits associated with process 3. The branch target buffer entry 156-1 may also be updated to replace the previous branch 1 with the new branch associated with process 3. However, the other BTB entry 156-3, which specifies the region identifier A of the updated region table entry, may not be invalidated at this stage and may continue to provide the information for branch 3 previously associated with process 1. Invalidation logic for walking the BTB 142 to eliminate such stale branches when the region table is updated is typically not provided, because it would require additional circuitry and because such incorrect branch information can in any case be detected if a prediction is made based on it: at the execute stage 18, if branch prediction state from the wrong execution context is used, the actual outcome of the branch will differ from the prediction, and invalidation of the BTB entry concerned can then be triggered. Although this causes some loss of performance, in practice mispredictions can occur for other reasons unrelated to reuse of region identifiers, so this is not a major problem, as the same misprediction resolution logic in the processing pipeline 2 can be reused. Thus, in typical BTB implementations using a region table 148, entries whose region identifiers map to an updated region table entry may be allowed to persist with stale branch information allocated by a different execution context, as illustrated by the example entry 156-3 shown in Fig. 11.

Thus, after a region table entry has been updated, old entries of the BTB 142 may be hit by subsequent branch prediction lookups from the new process assigned to the updated region table entry, resulting in false hits on branch information previously allocated by a different execution context. Previously, this would have been regarded only as a performance issue, not a security issue. However, it has been recognised that an attacker could use this property of the BTB to control the speculative execution of another execution context which is not under the attacker's control, in order to expose information about secret data accessible to that context. This is possible if the following two conditions are met:

condition 1: process a may use a target address provided in BTB142 by another process B.

Condition 2: process B may control the target of the BTB entry that process a accesses.

In the example of Fig. 11 above, process 3 can use the target address provided by process 1, since the region identifier used as tag information in the BTB is reused between the different contexts (condition 1). In addition, process 1 can control the target of the BTB entry accessed by process 3, because it can execute a branch whose address shares the tag bits [20:16] with a branch in process 3, using a chosen target address, so as to allocate that target address to the entry of the BTB 142 which will later be hit by process 3 (condition 2). More generally, these conditions can arise in any branch target prediction structure in which the value of the tag information 158 can be reused between multiple execution contexts. The region table is one reason why this may occur, but another reason may simply be that the tag information does not depend on an identifier of the current execution context (current execution environment).

The two conditions described above can be used as the basis for the following attack. First, an attacker controlling process 1 may execute a branch instruction so as to allocate a BTB entry specifying a branch target address which maps to some malicious sequence of instructions designed, when executed by the victim process 3, to control the victim process 3 to perform operations which may expose secret data to the attacker. After the region table has been updated and the region identifier previously used by attacker process 1 has been reassigned to victim process 3, the victim process 3 then executes an instruction from an address matching the tag data of the stale BTB entry allocated by attacker process 1, and so a branch prediction is made based on the information provided by the attacker. This results in speculative execution of a sequence of instructions from the attacker-supplied branch target address, i.e. instructions chosen by the attacker, which are used to trick the victim process 3 into exposing secret data by leaving a footprint in non-architectural state (e.g. the data cache). For example, the instructions may include a memory access instruction which computes its target memory address using the secret information the attacker wishes to obtain, so that which data is loaded into the caches 30, 32 by the memory access instruction depends on the secret. Even though the misprediction is eventually identified, so that the architectural state in the registers 14 of the processing pipeline 2 is rewound to the point before the victim process 3 mispredicted the branch and speculatively executed the instruction sequence, the data loaded from memory by the incorrectly speculated instructions may still reside in the caches 30, 32. Thus, after switching back to attacker process 1, the attacker may attempt to access each of the possible addresses that could result from computing the target address based on the secret data. When performing these memory accesses, attacker process 1 can measure performance data such as the execution time of load instructions or counts of cache misses, and from such performance side-channel information the attacker can determine whether the victim process 3 placed data from a given address in the cache, which can be used to deduce properties of the secret data that the victim process 3 can access but attacker process 1 cannot.

For such an attack to succeed, the two conditions described above need to be met. Fig. 12 illustrates a technique for breaking the second condition, i.e. preventing such attacks by making it difficult for an attacker to control the branch target address that the victim context will use when a false positive hit occurs. The branch target prediction structures 142, 146 are provided with: encryption circuitry 174 for encrypting branch information to be written to the branch target prediction structure based on an encryption key associated with the current execution context; and decryption circuitry 176 for decrypting branch information read from the branch target prediction structure based on the encryption key associated with the current execution context. Key generation circuitry 179 (e.g. a linear feedback shift register or other random number generator) may generate the keys for each context from time to time. Branch target prediction circuitry 178 (which may correspond to the branch prediction control logic 150 of Fig. 8 and any cache access circuitry associated with the branch target prediction structures 142, 146 for generating the target tag value and looking up branch target entries to identify branch information for a given instruction fetch address) may generate a target tag from the instruction fetch address (e.g. using the region table 148) and control the branch target prediction structure to output the encrypted branch information if there is a hit in the branch target prediction structure. If there is a miss and a branch is subsequently executed by the execute stage 18, the actual branch information for that branch is encrypted by the encryption circuitry 174 and written to the branch target prediction structure together with tag information identifying the branch, under control of the branch target prediction circuitry. The encryption and decryption circuits 174, 176 are shown as separate circuits in Fig. 12, but may be the same circuit (e.g. an XOR circuit).

As described above, an alternative to encrypting the branch information is to encrypt a portion of the target tag, in which case the decryption circuitry 176 may not be needed.

FIG. 13 shows an example of the contents of the region table 148 and the branch target buffer 142 when encryption/decryption is applied. In this example, each region table entry 170 stores an encryption key 180 associated with the corresponding execution context represented by that region table entry 170. Each time the corresponding region table entry 170 is updated, the key 180 may be generated as a random or pseudo-random number by a linear feedback shift register (LFSR) or other random number generator 179, so that the key differs for the different execution contexts that reuse the same region table entry. When a lookup of the BTB 142 misses, fetching continues sequentially beyond the current fetch address, but if a branch from the fetched instruction block is then executed at the execute stage 18, the actual branch information is determined and may be allocated to a new branch entry 156 of the BTB 142. When a new BTB entry is allocated, the branch information is encrypted using the corresponding encryption key 180 stored in the region table 148 for the current execution context. Optionally, the tag information 158 may also be encrypted using the key, although this is not essential; likewise, it is not essential for all of the branch information 160 to be encrypted, so in some cases only some of the branch information may be encrypted.

On a branch prediction lookup which generates a hit in the BTB 142 (i.e. the tag information 158 matches the target tag generated for the current instruction fetch address), rather than the branch information 160 simply being output directly from the BTB 142, the branch information is first decrypted using the corresponding encryption key 180 associated with the current execution context as defined by the region table 148, and the decrypted branch information is then used by the branch prediction logic 150 to derive the next fetch address. The encryption and decryption used to protect the BTB contents may be implemented using secret-key or public-key cryptography. Since branch target address prediction may lie on a critical timing path of the processing pipeline, a relatively lightweight encryption method (e.g. an XOR of the branch information with the key 180) may be preferred over, for example, more complex multi-round encryption.
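
A minimal sketch of this lightweight XOR scheme, assuming the key 180 is read from the current context's region entry; the structure layouts, the packing of the branch information into a single word, and the function names are illustrative assumptions.

```c
#include <stdint.h>

struct region_entry { uint64_t context_info; uint64_t key; };    /* key 180 */
struct btb_entry    { uint64_t tag; uint64_t enc_branch_info; };  /* data 160, encrypted */

/* Lightweight encryption as suggested above: XOR with the per-region key 180.
 * The same operation performs decryption. */
static uint64_t crypt_branch_info(uint64_t branch_info, uint64_t region_key)
{
    return branch_info ^ region_key;
}

/* Allocation path: encrypt the resolved branch information with the key of
 * the current context's region entry before writing it to the BTB. */
static void btb_allocate(struct btb_entry *e, uint64_t tag,
                         uint64_t branch_info, const struct region_entry *r)
{
    e->tag = tag;
    e->enc_branch_info = crypt_branch_info(branch_info, r->key);
}

/* Lookup path: on a tag hit, decrypt with the key currently held by the
 * region entry. If the region entry has been reassigned and rekeyed since
 * allocation, the decrypted value is garbage rather than an attacker-chosen
 * target. */
static uint64_t btb_read(const struct btb_entry *e, const struct region_entry *r)
{
    return crypt_branch_info(e->enc_branch_info, r->key);
}
```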

As shown in Fig. 14, this technique defeats the second condition of the attack described above: by encrypting the branch information with a key associated with the execution context that allocated it, if the same BTB entry 156-3 is later accessed from a new execution context and decrypted with the new context's key, the result is garbage rather than the target address allocated by the original context, making it difficult for an attacker to push a malicious target into the BTB and cause a different execution context to execute malicious code from a location known to the attacker. For example, in Fig. 14 the attacker controlling process 1 has allocated entry 156-3, and the region entry 170-1 is then updated to point to process 3, as shown in Fig. 11. On updating the region table, a new key 180 is generated for region table entry 170-1, so if there is a subsequent hit on BTB entry 156-3 during execution from the new execution context 3, decrypting the branch information with the new key will not yield the branch target address originally provided by process 1. The old information in the branch target entry was encrypted with an outdated key (key 1 is no longer available in the region table), whereas region table entry #A now has the new key. To mount the type of attack described above and control the victim process 3 to jump to a malicious target address T, the attacker would need to supply a different target address T' such that Dec(Enc(T', key 1), new key) = T, and provided the encryption key has enough bits, the attacker cannot feasibly predict the value of T' required to force the victim process 3 to branch to T without knowing both the old and the new key.

Although Figs. 13 and 14 show examples in which the key associated with each execution context is cached in the region table 148, this is not essential, and the key associated with each execution context could instead be stored in a separate storage structure.

In the examples of Figs. 13 and 14, the key is a one-time dynamic key, since it is updated each time a region table replacement occurs. This makes an attack more difficult because the attacker would need to identify the key and use it to read secret information before the key's period of use ends, reducing the probability that the attacker can obtain information about the key by observing a series of branch predictions made by the BTB in the time available before the key is updated. However, in other implementations a static key for each execution context could be used instead of a dynamic key assigned per region table entry, so that the same key is used for a given execution context throughout the lifetime of that context. In that case, an efficient way to generate a key for each context while reducing storage requirements may be to provide a common key shared between all execution contexts, and to derive a context-specific key for each context by hashing the common key with a context identifier associated with that context.
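
A sketch of the static-key alternative just described; the specific mixing function (generic 64-bit finalizer constants) is an illustrative stand-in for whatever one-way hash an implementation would actually use.

```c
#include <stdint.h>

/* Generic one-way 64-bit mixer, illustrative only. */
static uint64_t mix64(uint64_t x)
{
    x ^= x >> 33; x *= 0xff51afd7ed558ccdull;
    x ^= x >> 33; x *= 0xc4ceb9fe1a85ec53ull;
    x ^= x >> 33;
    return x;
}

/* Derive a static per-context key from a single common key and the context
 * identifier, avoiding the need to store a separate key for every context. */
static uint64_t derive_context_key(uint64_t common_key, uint64_t context_id)
{
    return mix64(common_key ^ mix64(context_id));
}
```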

Figs. 15 and 16 show examples of the use of the multi-target indirect branch predictor 146. For polymorphic branches, whose target address changes from time to time during the program, the BTB 142, which provides a single target address per branch program counter address, may not give reliable predictions, and so a separate prediction structure 146 may be provided. For such polymorphic branches, the branch information in the BTB 142 may be updated to provide an indicator that the BTB's predicted target address should not be used, and that the multi-target indirect branch predictor 146 should instead be used to predict the branch target address for these branches. The multi-target indirect branch predictor includes a number of branch target entries 190, each including information 192 indicating the target address predicted for the branch and tag information 194 based on a history of outcomes of previous branches (e.g. a sequence of bits indicating whether previous branches were taken or not taken). Multiple entries may therefore be allocated in the multi-target indirect branch predictor 146 for the same branch instruction, the entries corresponding to different branch histories leading up to that branch. The tag information 194 also depends on the instruction fetch address of the instruction block containing the branch. Unlike the BTB 142, the tag information does not include the context identifiers of the corresponding execution context or the upper bits of the program counter address. Since the tag associated with each entry of the multi-target indirect branch predictor 146 is independent of the current execution context, this again means that tag values may be reused across multiple execution contexts. If an attacker controlling process 2 knows the access pattern (branch history) with which a particular indirect branch in victim process 1 is reached, the attacker can exploit a security problem similar to the one described above by pushing in a malicious target address tagged with the known branch history pattern (e.g. tag A in Fig. 14), which will then be accessed by process 1. The assumption here is that the attacker can deliberately provoke tag conflicts in the multi-target indirect branch predictor 146.

To prevent this vulnerability from being exploited, the encryption keys 180 of the region table may also be used to encrypt the contents of the multi-target indirect branch predictor 146, as shown in Fig. 16. Thus, when branch information is allocated to an entry 190 of the multi-target indirect branch predictor 146, the encryption key 180 for the current execution context is read from the region table and used to encrypt the branch information 192, and optionally also the tag 194, of the corresponding entry 190. When looking up the multi-target indirect branch predictor 146, the encryption key 180 for the current execution context is used to decrypt the branch information 192, so that if the current execution context hits on an entry allocated by a previous execution context, decryption using a different key from the one used to encrypt the data results in garbage being output, different from the address originally provided by the previous execution context.

Thus, even in implementations which do not use the region table 148, if the branch predictor uses the multi-target indirect branch predictor 146 or another prediction structure whose tags are independent of the current execution context, encrypting the contents of the branch predictor can help prevent an attacker from using false positive hits in the branch prediction structure to control a victim process to execute malicious code intended to expose secret data.

FIG. 17 is a flow diagram illustrating a method of performing a branch target prediction lookup and of updating branch information in the branch target prediction structures 142, 146. It should be understood that not all of the branch prediction features that may be performed are shown in Fig. 17 (for example, the branch direction prediction by the BDP 140 is not shown, nor are the steps taken to cancel speculatively executed instructions and rewind processor state in the event of a misprediction, which may be performed according to any known branch prediction technique).

At step 200, a target tag is obtained for the instruction fetch address for which the branch predictor lookup is to be performed. The particular nature of the target tag depends on how the branch predictor structure is implemented. For example, for a BTB implemented using the region table 148 as described above, the target tag may be determined by the region table based on the context identifiers of the current execution context and a portion of the instruction fetch address; this approach is discussed in more detail below with reference to Fig. 18. Alternatively, for the multi-target indirect branch predictor 146, the target tag is based on the instruction fetch address and the history of previous taken/not-taken outcomes. Other ways of generating the target tag may also be used.

At step 202, the branch prediction control logic 150 controls the branch target prediction structure 142, 146 to look up a subset of its branch target entries 156, 190. For example, the subset of entries may be selected based on the instruction address, or in a fully associative cache implementation the subset may comprise all entries of the branch target prediction structure. The branch prediction circuitry determines whether any of the selected subset of entries specifies tag information corresponding to the target tag obtained at step 200 for the given branch instruction. If none of the subset of branch target entries specifies tag information corresponding to the target tag, the lookup misses in the branch target prediction structure and the branch prediction control logic 150 outputs the incremented version of the current fetch address from the adder 152 as the next fetch address. Once the corresponding block of instructions has been decoded, the decode stage 10 determines at step 203 whether any of the instructions in the block is a branch instruction. If not, the method ends, since no branch information needs to be updated. If the fetched/decoded instruction block does include a branch instruction, then at step 204 a victim entry is selected from the looked-up subset of branch target entries. For example, if any of the subset of branch target entries is currently invalid, an invalid entry may be selected as the victim entry. If all of the subset of branch target entries are currently valid, one of the valid entries is evicted to make way for the new branch information. Any eviction policy may be used to select the victim entry (e.g. round robin or least recently used).

At step 206, once the actual branch information for the given branch instruction has been resolved by the execute stage 18, that actual branch information is encrypted using the encryption key associated with the current execution context. The branch information may include information for deriving or specifying the branch target address, and may also include other information about the branch as described above. The encryption key may be read from the region table 148 or from a separate storage structure. In some cases the target tag may also be encrypted. At step 208, the encrypted branch information and the (optionally encrypted) tag information determined based on the target tag are written to the victim entry selected at step 204.

If at step 202 there is a hit in the branch target prediction structure 142, 146, with one of the looked-up subset of branch target entries specifying tag information corresponding to the target tag, then at step 210 the encryption key associated with the current execution context is used to decrypt the branch information stored in the matching entry. At step 212 the decrypted branch information is output as the prediction for the given branch instruction. The branch target address derived from the decrypted branch information is allocated to the fetch queue 144 to control subsequent instruction fetching, and other predicted properties of the branch instruction specified by the decrypted branch information may control other aspects of the processing pipeline. Once the branch reaches the execute stage, the actual branch information is determined for the branch instruction and it can be determined whether the prediction was correct. If there was a misprediction, the instructions following the branch instruction may be flushed from the pipeline and a signal sent to the fetch stage 6 to resume fetching instructions from the correct target address of the branch (if the branch should have been taken) or from the sequential address following the instruction address of the branch (if the branch should not have been taken). The branch predictor 4 may also be updated to correct the branch information stored in the branch target prediction structures 142, 146 based on the actual branch outcome, so that a subsequent prediction for the same instruction fetch address is more likely to be correct.

FIG. 18 is a flowchart illustrating in more detail the method of obtaining the target tag at step 200 of Fig. 17, in the case where the branch target prediction structure is the BTB 142 which uses the region table 148 to compress the tag portion of each entry. At step 220, the region table 148 is looked up based on branch context information associated with the current instruction fetch address. For example, the branch context information may include one or more context identifiers identifying the execution context executing the branch, and may also include the upper branch instruction address bits of the instruction fetch address. At step 222, the branch prediction control circuitry determines whether the region table includes a matching region entry 170 whose stored branch context information matches the branch context information provided for the given branch instruction. If so, there is a hit in the region table and at step 224 the target tag is determined to comprise the region identifier associated with the matching region entry and one or more lower bits of the tag portion of the instruction address (e.g. bits 20:16 in the example of Fig. 13). Also, at step 226, the encryption key corresponding to the current execution context is returned from the region table by reading the key from the matching region entry.

If at step 222 the region table lookup misses, such that no region table entry has branch context information matching the branch context information provided for the current instruction fetch address, then at step 230 a victim region table entry is selected, e.g. an invalid region table entry that has not yet been mapped to a particular context, or, if there is no invalid entry, a valid region table entry to be evicted. At step 230, the victim region table entry may be selected using an eviction policy such as LRU or round-robin. At step 232, the encryption key 180 stored in the victim region table entry is updated.
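
Continuing the illustrative region_entry_t sketch above, the miss path at steps 230-232 might be modelled as follows. The round-robin victim pointer and the placeholder random-number source are assumptions for the example; the embodiment only requires that a victim entry is selected and that its stored encryption key is updated.

#include <stdlib.h>

/* Steps 230-232: on a region table miss, pick a victim entry (an invalid
 * entry if one exists, otherwise by round-robin here for simplicity), then
 * install the new context mapping with a freshly generated key. Any stale
 * BTB entries encrypted under the old key will no longer decrypt to
 * meaningful branch information for the new context. */
static unsigned allocate_region_entry(region_entry_t table[NUM_REGION_ENTRIES],
                                      uint64_t ctx_id, uint64_t upper_addr)
{
    static unsigned next_victim = 0;
    unsigned victim = NUM_REGION_ENTRIES;

    for (unsigned i = 0; i < NUM_REGION_ENTRIES; i++) {
        if (!table[i].valid) { victim = i; break; }
    }
    if (victim == NUM_REGION_ENTRIES) {
        victim = next_victim;                     /* simple round-robin eviction */
        next_victim = (next_victim + 1) % NUM_REGION_ENTRIES;
    }

    table[victim].valid      = true;
    table[victim].ctx_id     = ctx_id;
    table[victim].upper_addr = upper_addr;
    /* Placeholder key generation; real hardware would use a suitable RNG. */
    table[victim].ctx_key    = ((uint64_t)rand() << 32) ^ (uint64_t)rand();
    return victim;
}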

In summary, branch information in the branch target prediction structure is encrypted based on an encryption key associated with the execution context which caused that branch information to be allocated to the branch target prediction structure. On a lookup of the branch target prediction structure which hits, the branch information from the matching entry is decrypted using the encryption key associated with the current execution context. This is particularly useful for branch target prediction structures which use tag information whose values can be reused in multiple execution contexts, because the encryption and decryption make it more difficult for an attacker to gain access to sensitive data from another execution context by exploiting a false positive hit on an entry of the branch target prediction structure in a different context from the context in which the entry was allocated.

It should be understood that the specific examples shown in FIGS. 8-18 are only one way of implementing a branch predictor. More generally, when generating a new branch prediction entry, a portion of the inputs to the branch predictor may be encoded based on a value associated with the current execution environment; and when querying the branch predictor with a given query input, the query input may be encoded using a value associated with the execution environment that triggered the query, or the output of the branch predictor may be reverse-encoded or decoded using a value associated with the most recent execution environment, or both. This makes it more difficult for an attacker to guess what state would have to be trained into the branch predictor in order to control a victim process, by using false hits in the branch predictor between different execution environments to steer branches to particular addresses.
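
As a generic illustration of this broader idea, the sketch below encodes branch predictor input bits with a value derived from the current execution environment before the prediction structures are indexed or tagged. The rotate-and-XOR mixing function and the choice of inputs folded into the context value are assumptions for the example; any encoding keyed on the execution environment would serve the same purpose.

#include <stdint.h>

/* The context value might be derived, for example, from identifiers such as
 * the ASID, VMID or exception level, optionally combined with a per-boot
 * random number via a one-way function (illustrative choices only). */
static inline uint64_t rotl64(uint64_t x, unsigned r)
{
    return (x << r) | (x >> (64u - r));
}

/* Encode the input bits (e.g. index/tag bits of the query or of a new
 * entry) using the context value. Entries trained under one context value
 * will not match, or will decode harmlessly, when accessed under another. */
static uint64_t encode_predictor_input(uint64_t input_bits, uint64_t ctx_value)
{
    return rotl64(input_bits ^ ctx_value, 17) ^ ctx_value;
}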

The following clauses list further example arrangements:

(1) a data processing apparatus comprising:

branch prediction circuitry adapted to store at least one branch prediction state entry associated with an instruction stream;

input circuitry to receive at least one input to generate a new branch prediction state entry, wherein the at least one input comprises a plurality of bits; and

encoding circuitry adapted to encode at least some of the plurality of bits based on a value associated with a current execution environment in which the instruction stream is being executed.

(2) The data processing apparatus according to item (1), wherein

The encoding circuit is adapted to encode at least some of the bits by using a key, wherein the key is based on a current execution permission for the instruction stream being executed.

(3) The data processing apparatus according to item (2), wherein

The encoding circuit is adapted to encode at least some of the plurality of bits by rearranging the at least some of the plurality of bits using the key.

(4) The data processing apparatus according to item (2), wherein

The encoding circuit is adapted to encode at least some of the plurality of bits by toggling at least some of the plurality of bits using a key.

(5) The data processing apparatus according to item (2), wherein

The encoding circuit is adapted to encode at least some of the plurality of bits by performing a hash function using the key.

(6) The data processing apparatus according to item (5), wherein

The hash function is invertible.

(7) The data processing apparatus according to item (2), wherein

The encoding circuit is adapted to encode at least some of the plurality of bits by performing an XOR operation using the key.

(8) The data processing apparatus according to item (5), wherein

The hash function is a one-way hash function.

(9) The data processing apparatus according to any one of items (2) to (8), wherein

The at least one input comprises an indication of an instruction address of a branch instruction;

the branch prediction circuit is adapted to receive a query value containing an indication of an instruction address of the branch instruction and to perform a search using the query value; and is

The encoding circuit is adapted to encode at least some bits of the query value using the key prior to the search.

(10) The data processing apparatus according to clause (9), wherein the encoding circuit is adapted to recalculate the value of the key associated with the current execution environment and to perform the encoding operation on at least some of the bits of the query value using the recalculated value of the key.

(11) The data processing apparatus according to any one of items (2) to (10), wherein

The at least one input comprises an indication of a target address of a branch instruction;

the branch prediction circuit is adapted to receive a query value containing an indication of an instruction address of the branch instruction and to perform a search using the query value; and

the apparatus includes a decoding circuit to perform decoding on an output produced by the branch prediction circuit in response to receiving the query value.

(12) The data processing apparatus according to item (11), wherein

Decoding involves recalculating the value of the key and then performing the decoding function.

(13) The data processing apparatus according to any one of items (1) to (12), wherein

The key is further based on any combination of values indicative of: the exception level, privilege level, ASID, VMID, NS, physical processor core number, and logical core number on which the instruction stream is executed, and one or more software-writable registers.

(14) The data processing apparatus according to any one of items (1) to (13), wherein

The key is further based on a previously generated random number.

(15) The data processing apparatus according to clause (14), wherein

The previously generated random number includes at least one of:

a per-logical-processor element;

a per-physical-processor element; and

a system-wide element.

(16) The data processing apparatus according to item (15), wherein

At least a portion of the previously generated random number is generated at startup.

(17) The data processing apparatus according to any one of items (14) to (16), wherein

At least a portion of the previously generated random number is pseudo-random.

(18) The data processing apparatus according to any of items (1) to (17), wherein the key comprises at least one value associated with a current execution environment or a current execution permission based on a one-way transformation applied to at least one key input parameter.

(19) The data processing apparatus according to any one of items (1) to (17), wherein

The instruction stream may be executed in one of a plurality of execution environments adapted to execute at a minimum execution permission;

the encoding circuit is adapted to encode at least some of the plurality of bits further based on an identifier of the one of the plurality of execution environments in which the instruction stream is being executed.

(20) The data processing apparatus according to any one of items (1) to (19), comprising:

a monitor circuit adapted to detect a ratio of any combination of instruction fetch faults and instruction decode faults occurring while the instruction stream is being executed speculatively; and

to cause an interrupt or generate an error response in response to the ratio increasing beyond a predetermined threshold.

(21) The data processing apparatus according to clause (20), wherein

The predetermined threshold is at least 20% higher than the previous ratio.

(22) A data processing apparatus comprising:

means for storing at least one branch prediction state entry associated with an instruction stream;

means for receiving at least one input to generate a new branch prediction state entry, wherein the at least one input comprises a plurality of bits; and

means for encoding at least some of the plurality of bits of the at least one input based on a value indicating a current execution permission of the instruction stream being executed.

(23) A method, comprising:

storing at least one branch prediction state entry associated with the instruction stream;

receiving at least one input to generate a new branch prediction state entry, wherein the at least one input comprises a plurality of bits; and

encoding at least some of the plurality of bits based on a value indicating a current execution permission of the instruction stream being executed.

(24) An apparatus, comprising:

processing circuitry for performing data processing in one of a plurality of execution contexts;

a branch target prediction structure comprising a plurality of branch target entries, each branch target entry specifying branch information indicative of at least one branch target address;

an encryption circuit to encrypt branch information to be written to the branch target prediction structure using an encryption key associated with a current execution context; and

decryption circuitry to decrypt branch information read from the branch target prediction structure using an encryption key associated with the current execution context.

(25) The apparatus according to (24), wherein each branch target entry specifies tag information; and is

The apparatus includes branch target prediction circuitry to perform a branch target prediction lookup on an instruction fetch address associated with a current execution context, the branch target prediction lookup including determining whether any of a subset of branch target entries of a branch target prediction structure specifies tag information corresponding to a target tag determined for the instruction fetch address.

(26) The apparatus of clause (25), wherein a value of the target tag is reusable in more than one of the plurality of execution contexts.

(27) The apparatus of any of clauses (25) and (26), wherein, when none of the subset of branch target entries specifies tag information corresponding to the target tag, and the instruction fetch address specifies a block of at least one instruction containing a branch instruction, the encryption circuit is configured to encrypt actual branch information of the branch instruction using the encryption key associated with the current execution context, and the branch target prediction circuit is configured to allocate a branch target entry of the branch target prediction structure specifying the encrypted branch information and specifying tag information corresponding to the target tag.

(28) The apparatus according to any one of (25) to (27), wherein, when one of the subset of branch target entries specifies tag information corresponding to the target tag, the decryption circuitry is configured to decrypt the branch information stored in that branch target entry using the encryption key associated with the current execution context, and the branch target prediction circuitry is configured to output the decrypted branch information as predicted branch information for the instruction fetch address.

(29) The apparatus according to any of (24) to (28), wherein the encryption key comprises a static key that is fixed for a current execution context.

(30) The apparatus of clause (29), wherein the static key of the current execution context depends on a common key shared between at least two contexts of the plurality of execution contexts and at least one context identifier specific to the current execution context.

(31) The apparatus according to any one of (24) to (30), wherein the encryption key comprises a dynamic key that is variable for a current execution context.

(32) The apparatus of clause (31), including key generation circuitry to generate an updated encryption key for the current execution context.

(33) The apparatus of clause (25), wherein, when none of the subset of branch target entries specifies tag information corresponding to the target tag, and the instruction fetch address specifies a block of at least one instruction containing a branch instruction, the encryption circuitry is configured to encrypt the target tag using the encryption key associated with the current execution context, and the branch target prediction circuitry is configured to designate the encrypted target tag as the tag information of the allocated branch target entry; and is

In the branch target prediction lookup, the decryption circuitry is configured to decrypt the tag information of each of the subset of branch target entries, and the branch target prediction circuitry is configured to compare the decrypted tag information with the target tag.

(34) The apparatus according to any one of (24) to (33), wherein the branch information further indicates at least one piece of branch information other than the branch target address.

(35) The apparatus of clause (25), wherein the branch target prediction circuit is configured to determine the target tag based on at least one context identifier associated with the current execution context.

(36) The apparatus of clause (25), including a region table including a plurality of region entries, each region entry mapping branch context information to a region identifier, the region identifier including fewer bits than the branch context information, the branch context information including at least one context identifier associated with a corresponding execution context.

(37) The apparatus of clause (36), wherein the target tag for the instruction fetch address comprises a target region identifier mapped by the region table to branch context information associated with the instruction fetch address.

(38) The apparatus of any of clauses (36) and (37), wherein each region entry specifies an encryption key associated with the corresponding execution context.

(39) The apparatus according to any of (36) to (38), wherein, when the mapping provided by a given region entry of the region table is updated, the branch target prediction circuitry is configured to trigger an update of the encryption key associated with the execution context associated with the given region entry after the mapping update.

(40) The apparatus of any of clauses (36) to (39), wherein the branch context information for the given region entry further comprises a portion of an instruction fetch address for which a previous branch target prediction lookup caused the given region entry to be assigned to the region table.

(41) The apparatus of clause (25), wherein the branch target prediction circuit is configured to determine the target tag based on the instruction fetch address and a history of branch results for previous branch instructions preceding the instruction at the instruction fetch address.

(42) An apparatus, comprising:

means for performing data processing in one of a plurality of execution contexts;

means for storing branch target entries of a branch target prediction structure, each branch target entry specifying branch information indicating at least one branch target address;

means for encrypting branch information to be written to the means for storing using an encryption key associated with the current execution context; and

means for decrypting branch information read from the means for storing using an encryption key associated with the current execution context.

(43) A method, comprising:

performing data processing in one of a plurality of execution contexts;

storing branch target entries of a branch target prediction structure, each branch target entry specifying branch information indicative of at least one branch target address;

encrypting branch information to be written to the branch target prediction structure using an encryption key associated with the current execution context; and

the branch information read from the branch target prediction structure is decrypted using an encryption key associated with the current execution context.

In this application, the words "configured to" are used to indicate that an element of an apparatus has a configuration capable of performing the defined operation. In this context, a "configuration" means an arrangement or manner of interconnection of hardware or software. For example, the apparatus may have dedicated hardware which provides the defined operations, or a processor or other processing device may be programmed to perform the function. "Configured to" does not imply that the apparatus element needs to be changed in any way in order to provide the defined operation.

Although illustrative embodiments of the present invention have been described in detail herein with reference to the accompanying drawings, it is to be understood that the invention is not limited to those precise embodiments, and that various changes and modifications can be effected therein by one skilled in the art without departing from the scope and spirit of the invention as defined by the appended claims.
