Techniques for efficiently operating a processing system based on instruction-based energy characteristics and machine learning

Document No.: 1464892    Publication date: 2020-02-21

Note: This technique, "Techniques for efficiently operating a processing system based on instruction-based energy characteristics and machine learning", was created by S·伊德甘吉, M·西乌, 顾俊, J·雷利, M·帕特尔, R·塞尔瓦桑 and E·库巴勒萨卡 on 2019-08-09. Abstract: Techniques for efficiently operating a processing system based on instruction-based energy characteristics and machine learning are disclosed. An integrated circuit, such as a graphics processing unit (GPU), includes a dynamic power controller for adjusting an operating voltage and/or operating frequency. The controller may receive the current power used by the integrated circuit and a predicted power determined from instructions pending in a plurality of processors. The controller determines the adjustments to the operating voltage and/or operating frequency needed to minimize the difference between the current power and the predicted power. An in-system reinforcement learning mechanism is included to self-adjust the controller parameters.

1. A GPU, comprising:

a plurality of parallel processors, each parallel processor comprising an instruction decoder that decodes an instruction prior to execution and signals an expected power usage for execution of the instruction;

a power controller that adjusts power provided to the parallel processors at least partially in response to the signaled expected power usage; and

a neural network that adapts the adjustments performed by the power controller,

wherein the neural network performs reinforcement learning to mitigate peak power scenarios.

2. The GPU of claim 1, wherein the instruction decoder comprises an instruction look ahead predecoder that predecodes instructions to be executed in the future and estimates how much power will be used to execute the instructions, and each of the parallel processors comprises an additional instruction decoder that decodes the instructions for execution.

3. The GPU of claim 1, wherein the power controller is configured to reduce power to avoid peak power overshoot.

4. The GPU of claim 1, wherein the neural network controls the parallel processors to modify their instruction execution to reduce peak power requirements.
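The look-ahead predecoding of claims 1–2 can be illustrated with a toy estimator that maps each upcoming opcode to an expected energy cost and converts a window of such costs into an expected power level. The opcode names, energy values, and the one-instruction-per-cycle assumption below are hypothetical illustrations, not taken from the patent:

```python
# Hypothetical per-opcode energy table (arbitrary units); a real GPU would
# characterize such values per instruction class during silicon bring-up.
OPCODE_ENERGY = {
    "FADD": 1.0,   # scalar float add
    "FMUL": 1.5,   # scalar float multiply
    "FFMA": 2.0,   # fused multiply-add
    "LDG":  4.0,   # global memory load
    "NOP":  0.1,
}

def predecode_expected_power(instruction_window, cycle_time=1.0):
    """Estimate power for a window of upcoming instructions.

    Power = total estimated energy / time to execute the window,
    assuming one instruction issues per cycle.
    """
    total_energy = sum(OPCODE_ENERGY.get(op, 1.0) for op in instruction_window)
    return total_energy / (len(instruction_window) * cycle_time)

window = ["FFMA", "FFMA", "LDG", "FADD"]
print(predecode_expected_power(window))  # (2.0+2.0+4.0+1.0)/4 = 2.25
```

A signal derived from such an estimate is what the claimed power controller would consume ahead of actual execution.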

5. A processing system, comprising:

a plurality of processors; and

a power management circuit configured to:

compare the power used by the plurality of processors to an estimated predicted power to be used by the plurality of processors in the future;

generate one or more control signals to vary operating power of the plurality of processors in response to the comparison; and

perform reinforcement learning to adapt the generation of the one or more control signals.

6. The processing system of claim 5, wherein the reinforcement learning uses actor-critic machine learning or Q-learning.

7. The processing system of claim 5, wherein the power management circuit comprises a proportional-integral-derivative (PID) controller configured to receive a difference result of the comparison and generate the one or more control signals to change an operating frequency and/or voltage of the processing system.

8. The processing system of claim 7, wherein the reinforcement learning modifies operating coefficients of the PID controller.

9. The processing system of claim 7, wherein the PID controller generates the one or more control signals every clock cycle, and the reinforcement learning modifies operating coefficients of the PID controller at intervals longer than a clock cycle.
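The two-rate scheme of claims 7–9 — a PID controller acting every clock cycle while a learning mechanism retunes its coefficients far less often — can be sketched as follows. The "coefficient update" here is a deliberately trivial stand-in for the patent's reinforcement learning, and all gains and intervals are illustrative:

```python
class PIDController:
    """Discrete PID: output adjusts operating frequency/voltage each 'cycle'."""
    def __init__(self, kp, ki, kd):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.integral = 0.0
        self.prev_error = 0.0

    def step(self, error):
        # error = current measured power - predicted power
        self.integral += error
        derivative = error - self.prev_error
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative

def run(pid, errors, update_interval=100):
    """Fast loop: PID output every cycle. Slow loop: coefficient tweak
    every `update_interval` cycles (placeholder for reinforcement learning)."""
    outputs = []
    for cycle, err in enumerate(errors):
        outputs.append(pid.step(err))
        if cycle % update_interval == update_interval - 1:
            pid.kp *= 0.99  # placeholder update; the patent uses RL here
    return outputs

pid = PIDController(kp=0.8, ki=0.05, kd=0.1)
print(run(pid, [1.0, 0.5, 0.2]))
```

The key structural point matching claim 9 is that `step` runs at cycle granularity while the coefficient update fires only at multi-cycle intervals.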

10. The processing system of claim 5, wherein:

each processor is configured to pre-decode instructions received by the respective processor and to generate a signal based on the pre-decoded instructions, the signal estimating a power consumption level of the instructions to be processed; and

the power management circuit is further configured to aggregate the signals from each processor indicative of the power consumption level and to predict power based on the aggregated signals.

11. The processing system of claim 5, wherein the power management circuit is configured to adjust an operating frequency and an operating voltage of the processing system based on the one or more generated control signals.

12. The processing system of claim 5, wherein the power management circuit is configured to control the plurality of processors to delay execution of one or more pending instructions based on the generated one or more control signals.

13. The processing system of claim 5, wherein the power management circuit comprises an analog-to-digital converter (ADC) configured to receive an input signal representative of a voltage and/or current provided to the processing system and to generate a digital output signal representative of the power being used by the processing system based on the input signal.

14. The processing system of claim 5, wherein the plurality of processors and the power management circuit are disposed on a same substrate.

15. A method of dynamically controlling voltage and frequency settings of a plurality of processors configured to receive and execute instructions, the method comprising:

measuring a power currently used by the processors;

predicting a power to be used by the processors by examining instructions to be executed by the processors;

determining an error between the current power and the predicted power;

generating one or more signals that change an operating frequency and/or an operating voltage to reduce the error; and

performing reinforcement learning to adapt generation of the one or more signals.

16. The method of claim 15, wherein generating the one or more signals comprises: using a proportional-integral-derivative (PID) controller to generate the one or more signals that vary an operating frequency and/or an operating voltage provided to the processors based on the error.

17. The method of claim 16, wherein the PID controller generates the one or more signals at intervals corresponding to clock cycles, and the reinforcement learning modifies coefficients of the PID controller at intervals exceeding the intervals corresponding to clock cycles.

18. The method of claim 16, wherein the reinforcement learning modifies coefficients of the PID controller.

19. The method of claim 15, further comprising:

generating, for each processor, a signal representative of a power consumption level of instructions to be executed in a clock cycle; and

determining the predicted power by aggregating the signals representing power consumption levels of pending instructions in respective clock cycles.
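The aggregation described in claims 10 and 19 amounts to summing, for each upcoming clock cycle, the power estimates reported by every processor. A minimal sketch, with illustrative numbers:

```python
def predict_total_power(per_processor_estimates):
    """Aggregate per-processor power estimates cycle by cycle.

    per_processor_estimates: list of lists, one inner list per processor,
    indexed by upcoming clock cycle. Returns the predicted total power
    for each cycle (sum across processors).
    """
    return [sum(cycle_vals) for cycle_vals in zip(*per_processor_estimates)]

# Two processors, each reporting estimates for the next three cycles:
sm0 = [2.0, 1.0, 4.0]
sm1 = [1.5, 3.0, 0.5]
print(predict_total_power([sm0, sm1]))  # [3.5, 4.0, 4.5]
```

The resulting per-cycle totals are what the claimed power management circuit compares against measured power.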

20. A controller, comprising:

a power monitor configured to monitor voltage and/or current supplied to a plurality of processors;

circuitry configured to receive signals from the plurality of processors representative of estimated energy consumption for future execution of instructions and to predict power used by the processors to execute the instructions in the future based on an aggregation of the received signals; and

a self-learning and self-adjusting controller configured to control power provided to the processors in response to the monitored voltage and/or current and the predicted power.

21. The controller of claim 20, wherein the self-learning and self-adjusting controller comprises:

a proportional-integral-derivative (PID) controller configured to determine power to be provided to the processors based on coefficients; and

at least one neural network configured to perform reinforcement machine learning to modify the coefficients.

22. The controller of claim 20, wherein the self-learning and self-adjusting controller is disposed on the same substrate as the processors.

23. The controller of claim 20, wherein the self-learning and self-adjusting controller is further configured to send a signal to the plurality of processors to delay execution of at least one instruction based on the determined power to be provided to the processors.

24. The controller of claim 20, wherein the self-learning and self-adjusting controller comprises:

a proportional-integral-derivative (PID) controller configured to determine power to be provided to the processors based on coefficients; and

at least one neural network configured to perform actor-critic reinforcement machine learning to modify the coefficients based on a difference between energy provided to the plurality of processors and estimated energy consumption for future execution of instructions.

25. The controller of claim 20, wherein the self-learning and self-adjusting controller comprises:

a plurality of proportional-integral-derivative (PID) controllers, each PID controller generating a control signal based on a different metric affecting the operation of the plurality of processors; and

a plurality of neural networks, each neural network configured to perform reinforcement machine learning to modify coefficients of a respective PID controller,

wherein power provided to the processors is controlled based on a sum of the outputs of the PID controllers.
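Claim 25's arrangement — several controllers, each watching a different metric, with their outputs summed into one power-control signal — can be sketched with proportional-only controllers for brevity (the claim describes full PID controllers, each tuned by its own neural network). The metric names and gains below are hypothetical:

```python
class MetricController:
    """Proportional-only stand-in for one of claim 25's per-metric PID
    controllers; its gain is the kind of coefficient an RL agent could tune."""
    def __init__(self, gain):
        self.gain = gain

    def step(self, error):
        return self.gain * error

# Hypothetical metrics affecting processor operation, one controller each.
controllers = {
    "power_error":   MetricController(0.5),
    "temperature":   MetricController(0.2),
    "voltage_droop": MetricController(1.0),
}

def control_signal(metric_errors):
    """Per claim 25: the summed controller outputs drive the power adjustment."""
    return sum(controllers[m].step(e) for m, e in metric_errors.items())

print(control_signal({"power_error": 2.0, "temperature": 1.0, "voltage_droop": 0.1}))
```

Summing the outputs lets each controller specialize on one disturbance while a single actuator (the voltage/frequency setting) receives their combined correction.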

Technical Field

This technology relates to integrated circuit power management, and more particularly to dynamic voltage and/or frequency adjustment in integrated circuits for power management. More specifically, the techniques relate to managing power by sensing upcoming processor instructions and employing an in-system reinforcement learning mechanism to predict and improve power control system parameters. Example use cases include power management of the massively parallel processors of a GPU.

Background

Graphics processing units (GPUs) have become ubiquitous. Beyond graphics, GPUs are now widely used in applications that benefit from intensive computing operations, including, for example, artificial intelligence, real-time pattern recognition, and autonomous vehicle control, among myriad other applications.

Many GPUs are massively parallel, meaning that they contain many computing elements (e.g., programmable streaming multiprocessors ("SMs")). This massively parallel architecture allows developers to break complex computations into smaller parallel parts that complete faster because they execute simultaneously.

Digital circuits typically consume more power when running fast, just as a dancer expends more energy dancing fast than dancing slowly. In most integrated circuits, the speed of operation is controlled by a clock circuit that sends "beats" (clock signals) to the various circuits on the chip. Like dancers following a metronome, these circuits synchronize their operation to the clock signal. The faster the beat, the faster the circuits run and the more power they consume. Also, the faster the beat, the more "moves" the circuits can perform in a given time.

The supply voltage also affects speed and power consumption. Digital logic circuits are built from transistors (e.g., MOSFETs), which are integral parts of modern high-speed processors. A transistor operates (switches) faster as its supply voltage rises. One way to think of this: switching a MOSFET involves charging and discharging a capacitance. Increasing the DC voltage powering the MOSFET causes that capacitance to charge faster, increasing the transistor's switching speed, much as higher water pressure fills a keg faster. The fastest speed at which the MOSFET can be clocked may therefore depend on the supply voltage.

However, operating a circuit at a higher voltage causes it to consume more power even as its potential operating speed increases. Generally, the more power a circuit consumes, the more heat it generates. While fans and heat sinks are typically used to remove excess heat, wastefully generated heat represents power that could otherwise be used for computation. For mobile applications such as autonomous vehicles and portable devices, wasted power may unnecessarily shorten battery life. Even for stationary applications, such as servers and desktop computers, wasting energy is environmentally unfriendly and increases operating costs.

While the most straightforward way to save power is to reduce the circuit's operating speed, this adversely affects the number of operations per second the circuit can perform, and thus the speed at which it can complete complex calculations. In many demanding applications, including but not limited to autonomous vehicles, user-interactive servers, and other computing devices, high-speed performance is important for safety-critical calculations and other calculations that must be performed in real time or near real time.

Thus, there is a tradeoff between operating speed and power consumption. To increase the number of operations per second, the rate (frequency) of the clock signal can be increased (if needed to support higher clock rates, the supply voltage can also be increased), at the cost of additional power consumption. To save power, the clock signal rate (and the supply voltage, if needed) can be reduced at the expense of reduced processing speed.

To manage this tradeoff, some GPUs and other processors dynamically control the clock rate (and, in some cases, the power supply voltage needed to support higher clock rates) depending on computational load, allowing circuits to slow down when processing demands are light and speed up when more computational speed is needed. Such dynamic control can reduce overall power consumption and corresponding heat output while maximizing speed performance.
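Such load-dependent selection of a clock rate and matching supply voltage can be sketched as a table of operating points. The frequencies, voltages, and selection rule below are illustrative only, not taken from any real part:

```python
# Illustrative (frequency MHz, voltage V) operating points; real parts
# expose a vendor-defined table of such performance states.
OPERATING_POINTS = [(600, 0.70), (1200, 0.85), (1800, 1.00)]

def select_operating_point(utilization):
    """Pick the slowest operating point that can cover the current load.

    utilization: fraction (0..1) of the fastest point's throughput needed.
    """
    fmax = OPERATING_POINTS[-1][0]
    for freq, volt in OPERATING_POINTS:
        if freq >= utilization * fmax:
            return freq, volt
    return OPERATING_POINTS[-1]

print(select_operating_point(0.30))  # (600, 0.7)
print(select_operating_point(0.90))  # (1800, 1.0)
```

Choosing the slowest point that still meets demand is what saves power: the higher-voltage, higher-frequency points are engaged only when the load requires them.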

While much work has been done in this area, further improvements are needed to provide an adaptive yet tunable system that can handle sudden load steps and load releases without impacting overall performance.
