System-on-chip and integrated circuit with multi-core out-of-phase processing

Document No.: 105100  Published: 2021-10-15

Reading note: this technology, System-on-chip and integrated circuit with multi-core out-of-phase processing, was designed on 2021-07-30 by an inventor whose name is not disclosed. Main content: The invention provides a system-on-chip and an integrated circuit for multi-core out-of-phase processing. The system-on-chip comprises: a clock module for outputting a clock signal; at least two cores supporting asynchronous processing; and a delay connection module for transmitting the clock signal output by the clock module to the cores in different phases. Because the delay connection module delivers the clock signal to the at least two cores in different phases, the moments at which different cores perform task processing on the clock signal are staggered, and the surge noise that would otherwise be triggered simultaneously is dispersed to different positions within the clock period. This effectively reduces the surge peak and the noise peak of an SOC, especially an ultra-high-power SOC.

1. A system-on-chip for multi-core out-of-phase processing, comprising:

a clock module for outputting a clock signal;

at least two cores supporting asynchronous processing;

and a delay connection module for transmitting the clock signal output by the clock module to the cores in different phases.

2. The system-on-chip for multi-core out-of-phase processing according to claim 1, wherein the delay connection module comprises at least two connection lines; one end of each connection line is connected to the output end of the clock module, and the other end is connected to the clock input end of a core; the transmission path lengths of the connection lines connecting different cores are different.

3. The system-on-chip for multi-core out-of-phase processing according to claim 1, wherein the delay connection module comprises a delay-locked loop; the delay-locked loop is used for receiving the clock signal output by the clock module and outputting at least two phase-separated clock signals to different cores.

4. The system-on-chip for multi-core out-of-phase processing according to claim 1, wherein the delay connection module comprises at least two adjustable delay lines; one end of each adjustable delay line is connected to the output end of the clock module, and the other end is connected to the clock input end of a core; the adjustable delay lines connecting different cores are set to different delay phases.

5. The system-on-chip for multi-core out-of-phase processing according to claim 1, wherein the delay connection module comprises a first connection line and a second connection line; one end of the first connection line is connected to the output end of the clock module, and the other end is connected to the clock input end of a first core; one end of the second connection line is connected to the output end of the clock module, and the other end is connected to the clock input end of a second core through an inverter.

6. The system-on-chip for multi-core out-of-phase processing according to claim 1, wherein the delay connection module is configured to transmit the clock signal output by the clock module to each core with different phases based on clock skew compensation and/or PVT compensation;

the clock skew compensation refers to the compensation of the difference of the time used by the clock signal to reach each core;

the PVT compensation refers to compensation of offset caused by power, voltage and temperature to clock signal transmission.

7. The system-on-chip for multi-core out-of-phase processing according to any one of claims 1 to 6, wherein the phase difference between the clock signals obtained by adjacent cores through the delay connection module is π.

8. The system-on-chip for multi-core out-of-phase processing according to any one of claims 1 to 6, wherein the number of cores is N; the phase difference between the clock signals obtained by the i-th core and the (i+1)-th core through the delay connection module is 2π/N.

9. An integrated circuit comprising an on-board voltage regulator, a power delivery network, and the system-on-chip for multi-core out-of-phase processing as recited in any one of claims 1 to 8;

the on-board voltage regulator supplies power to the system-on-chip for multi-core out-of-phase processing through the power delivery network;

the power delivery network comprises at least two delivery layers, and each delivery layer is provided with a decoupling capacitor for filtering surges.

Technical Field

The invention relates to the technical field of integrated circuits, in particular to a system-on-chip and an integrated circuit for multi-core out-of-phase processing.

Background

In modern IC power supply designs, the power supply system typically has an on-board voltage regulator (VR) that supplies power to the package power pins. The power is then delivered to the chip through the power delivery network (PDN) of the package, and to the transistors on the die through the on-die layers of the power delivery network.

In large chip designs, such as AI accelerator SOCs and general-purpose graphics processing units (GPGPUs), ultra-high SOC power of 300 W to 600 W is used. This makes PDN design very challenging. In particular, when the chip starts running from idle, the surge can generate large AC noise on the power supply. Meanwhile, every element in an IC digital circuit operates on a clock signal. Around the clock sampling edge, the elements therefore switch simultaneously and produce an instantaneous high surge, resulting in large current peaks and AC power supply noise on the PDN.

Therefore, high surge and noise at ultra-high SOC power is an important problem to be solved in the industry.

Disclosure of Invention

The invention provides a system-on-chip and an integrated circuit for multi-core out-of-phase processing, which overcome the high surge peak and high noise found in SOCs, especially ultra-high-power SOCs, and achieve the technical effects of spreading power draw over the whole clock period and reducing the surge peak.

The invention provides a system-on-chip for multi-core out-of-phase processing, which comprises:

a clock module for outputting a clock signal;

at least two cores supporting asynchronous processing;

and a delay connection module for transmitting the clock signal output by the clock module to the cores in different phases.

According to the system-on-chip for multi-core out-of-phase processing provided by the invention, the delay connection module comprises at least two connection lines; one end of each connection line is connected to the output end of the clock module, and the other end is connected to the clock input end of a core; the transmission path lengths of the connection lines connecting different cores are different.

According to the system-on-chip for multi-core out-of-phase processing provided by the invention, the delay connection module comprises a delay-locked loop; the delay-locked loop is used for receiving the clock signal output by the clock module and outputting at least two phase-separated clock signals to different cores.

According to the system-on-chip for multi-core out-of-phase processing provided by the invention, the delay connection module comprises at least two adjustable delay lines; one end of each adjustable delay line is connected to the output end of the clock module, and the other end is connected to the clock input end of a core; the adjustable delay lines connecting different cores are set to different delay phases.

According to the system-on-chip for multi-core out-of-phase processing provided by the invention, the delay connection module comprises a first connection line and a second connection line; one end of the first connection line is connected to the output end of the clock module, and the other end is connected to the clock input end of a first core; one end of the second connection line is connected to the output end of the clock module, and the other end is connected to the clock input end of a second core through an inverter.

According to the system-on-chip for multi-core out-of-phase processing provided by the invention, the delay connection module is used for transmitting the clock signal output by the clock module to each core in different phases on the basis of clock skew compensation and/or PVT compensation;

the clock skew compensation refers to the compensation of the difference of the time used by the clock signal to reach each core;

the PVT compensation refers to compensation of offset caused by power, voltage and temperature to clock signal transmission.

According to the system-on-chip for multi-core out-of-phase processing provided by the invention, the phase difference between the clock signals obtained by adjacent cores through the delay connection module is π.

According to the system-on-chip for multi-core out-of-phase processing provided by the invention, the number of cores is N; the phase difference between the clock signals obtained by the i-th core and the (i+1)-th core through the delay connection module is 2π/N.

The invention also provides an integrated circuit, which comprises an on-board voltage regulator, a power delivery network and the above system-on-chip for multi-core out-of-phase processing;

the on-board voltage regulator supplies power to the system-on-chip for multi-core out-of-phase processing through the power delivery network;

the power delivery network comprises at least two delivery layers, and each delivery layer is provided with a decoupling capacitor for filtering surges.

According to the system-on-chip and the integrated circuit for multi-core out-of-phase processing, the delay connection module transmits the clock signal output by the clock module to at least two cores supporting asynchronous processing in different phases. The clock phases of different cores therefore differ, the moments at which different cores perform task processing on the clock signal are staggered, and the surge noise that would otherwise be triggered simultaneously is dispersed to different positions within the clock period. The surge peak and the noise peak of the SOC, especially an ultra-high-power SOC, are thereby effectively reduced.

Drawings

In order to more clearly illustrate the technical solutions of the present invention or the prior art, the drawings needed for the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and those skilled in the art can also obtain other drawings according to the drawings without creative efforts.

FIG. 1 is a first schematic structural diagram of the system-on-chip for multi-core out-of-phase processing provided by the present invention;

FIG. 2 is a second schematic structural diagram of the system-on-chip for multi-core out-of-phase processing provided by the present invention;

FIG. 3 is a third schematic structural diagram of the system-on-chip for multi-core out-of-phase processing provided by the present invention;

FIG. 4 is a fourth schematic structural diagram of the system-on-chip for multi-core out-of-phase processing provided by the present invention;

FIG. 5 is a fifth schematic structural diagram of the system-on-chip for multi-core out-of-phase processing provided by the present invention;

FIG. 6 is a schematic structural diagram of an integrated circuit provided by an embodiment of the present invention;

FIG. 7 is a first schematic diagram of an adjustable delay line provided by an embodiment of the present invention;

FIG. 8 is a second schematic diagram of an adjustable delay line provided by an embodiment of the present invention.

Reference numerals:

100: clock module; 200: core; 300: delay connection module;

301: connection line; 302: delay-locked loop; 303: adjustable delay line.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention clearer, the technical solutions of the present invention will be clearly and completely described below with reference to the accompanying drawings, and it is obvious that the described embodiments are some, but not all embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

The system-on-chip for multi-core out-of-phase processing of the present invention is described below in conjunction with FIGS. 1-4.

As shown in FIG. 1, an embodiment of the present invention provides a system-on-chip for multi-core out-of-phase processing, including:

a clock module 100 for outputting a clock signal;

at least two cores 200 supporting asynchronous processing;

a delay connection module 300 for transmitting the clock signals output by the clock module 100 to the cores 200 with different phases.

In fig. 1, different types of arrows indicate that the phases of the clock signals transmitted to different cores 200 are different.

In this embodiment, the clock module 100 may be formed by one or more phase-locked loops.

The at least two cores 200 supporting asynchronous processing in this embodiment refer to at least two cores 200 supporting asynchronous processing at different clock signal phases.

Further, in this embodiment:

the delay connection module 300 is configured to transmit the clock signal output by the clock module 100 to each core 200 in different phases based on clock skew compensation and/or PVT compensation;

the clock skew compensation refers to compensation of a difference in time taken for a clock signal to reach each core 200;

the PVT compensation refers to compensation of offset caused by power, voltage and temperature to clock signal transmission.

The clock skew compensation and/or the PVT compensation can be realized as follows:

Step 1: build a simulation platform for the system-on-chip (SOC);

Step 2: input the structural parameters and/or functional parameters of the delay connection module 300 to the simulation platform;

Step 3: adjust the structural parameters and/or functional parameters of the delay connection module 300 by feedback according to the phase difference of each core clock signal output by the simulation platform, and repeat step 2 until the phase difference of each core clock signal output by the simulation platform meets the target phase difference.
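The feedback loop in steps 2 and 3 can be sketched as follows. This is only a minimal illustration: `simulate_phases` is a hypothetical stand-in for the simulation platform (it simply adds a fixed per-core skew to the programmed delay), and the names `calibrate`, `skew_ps`, `period_ps` and the tolerance are assumptions, not part of the source.

```python
import math

def simulate_phases(delays_ps, period_ps, skew_ps):
    """Hypothetical stand-in for the SOC simulation platform (step 1):
    each core sees its programmed delay plus a fixed clock skew."""
    return [2 * math.pi * ((d + s) % period_ps) / period_ps
            for d, s in zip(delays_ps, skew_ps)]

def wrap(angle):
    """Wrap an angle difference into (-pi, pi]."""
    return math.atan2(math.sin(angle), math.cos(angle))

def calibrate(target_phases, period_ps, skew_ps, tol=1e-3, max_iter=50):
    """Adjust per-core delays until every simulated phase meets its target."""
    delays = [0.0] * len(target_phases)
    for _ in range(max_iter):
        # Step 2: feed the current delay settings to the simulation.
        phases = simulate_phases(delays, period_ps, skew_ps)
        errors = [wrap(t - p) for t, p in zip(target_phases, phases)]
        if max(abs(e) for e in errors) < tol:
            return delays
        # Step 3: feed the phase error back into the delay settings.
        delays = [(d + e * period_ps / (2 * math.pi)) % period_ps
                  for d, e in zip(delays, errors)]
    raise RuntimeError("calibration did not converge")

if __name__ == "__main__":
    targets = [0.0, math.pi / 2, math.pi, 3 * math.pi / 2]
    skew = [12.0, -7.0, 30.0, 5.0]  # assumed skew values in picoseconds
    print(calibrate(targets, 1000.0, skew))
```

A real flow would replace `simulate_phases` with the actual simulation platform and would adjust whatever structural or functional parameters the chosen delay connection module exposes.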

The setting of the target phase difference can be performed according to the number of cores 200 and the relative position of the cores 200, and the present embodiment provides three possible ways of setting the target phase difference as follows.

The first target phase difference is set based on the relative positions of the cores 200: the phase difference between the clock signals obtained by adjacent cores 200 through the delay connection module 300 is π, that is, the clock signals of adjacent cores 200 are inverted relative to each other. This arrangement maximizes the difference in surge generation time between adjacent cores 200 and disperses the surges that would otherwise be concentrated across the clock signal cycle, thereby reducing the surge and noise peaks.

The second target phase difference is set based on the number of cores 200: the number of cores 200 is N, and the phase difference between the clock signals obtained by the i-th core 200 and the (i+1)-th core 200 through the delay connection module 300 is 2π/N. This arrangement evenly distributes the surge generation instants of a plurality of cores 200 (typically more than 2) throughout the period of the clock signal, reducing the surge and noise peaks from a global perspective.

The third target phase difference is set based on both the number of cores 200 and their relative positions. It is intended for cases where the number of cores 200 is so large that neither of the first two schemes reduces the surge peak effectively: with the first scheme, the instantaneous surge peak is still large after adjacent cores are inverted; with the second, the evenly divided phase difference becomes so small that PVT errors and clock skew spoil the surge reduction. In this scheme, the cores 200 are divided into a set number of groups, and the target phase difference is set group by group.

For example, the phase difference value of the adjacent cores 200 in a group may be set to pi, and the phase difference value of the adjacent group may be set to pi

This arrangement enables surge peak optimization for SOCs with a greater number of cores 200.
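The three ways of setting the target phase can be illustrated with a short sketch. It assumes the cores are indexed along a linear chain so that core i and core i+1 are neighbors, and it assumes even spacing of 2π/N for the second scheme. The function names and the `group_step` parameter (the offset between adjacent groups) are hypothetical; the source does not give a value for the inter-group offset here.

```python
import math

def alternating_phases(n_cores):
    """Scheme 1: adjacent cores are pi apart (every other core inverted)."""
    return [(i % 2) * math.pi for i in range(n_cores)]

def evenly_spaced_phases(n_cores):
    """Scheme 2: core i gets phase i * 2*pi / N (assumed even spacing)."""
    return [i * 2 * math.pi / n_cores for i in range(n_cores)]

def grouped_phases(n_cores, group_size, group_step):
    """Scheme 3: pi between neighbors inside a group, and each successive
    group shifted by group_step radians (a hypothetical parameter)."""
    phases = []
    for i in range(n_cores):
        within_group = ((i % group_size) % 2) * math.pi
        group_offset = (i // group_size) * group_step
        phases.append((within_group + group_offset) % (2 * math.pi))
    return phases

if __name__ == "__main__":
    print(alternating_phases(4))
    print(evenly_spaced_phases(4))
    print(grouped_phases(8, 4, math.pi / 4))
```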

The beneficial effect of this embodiment lies in:

In the present embodiment, the delay connection module 300 transmits the clock signal output by the clock module 100 to at least two cores 200 supporting asynchronous processing in different phases, so that the clock phases of different cores 200 differ. This staggers the moments at which different cores 200 perform task processing on the clock signal and disperses the surge noise that would otherwise be triggered simultaneously to different positions within the clock period, effectively reducing the surge peak and the noise peak in the SOC, especially an ultra-high-power SOC.

According to the above embodiments, four specific configurations of the delay connection module 300 are provided in this embodiment.

As shown in FIG. 2, in a first configuration, the delay connection module 300 includes at least two connection lines 301; one end of each connection line 301 is connected to the output end of the clock module 100, and the other end is connected to the clock input end of a core 200; the transmission path lengths of the connection lines 301 connecting different cores 200 are different.

In a second configuration, as shown in FIG. 3, the delay connection module 300 includes a delay-locked loop 302; the delay-locked loop 302 is configured to receive the clock signal output by the clock module 100 and output at least two phase-separated clock signals to different cores 200.

The delay-locked loop 302 may also incorporate a phase interpolator to better implement the delay.

In a third configuration, as shown in FIG. 4, the delay connection module 300 includes at least two adjustable delay lines 303; one end of each adjustable delay line 303 is connected to the output end of the clock module 100, and the other end is connected to the clock input end of a core 200; the adjustable delay lines 303 connecting different cores 200 are set to different delay phases.

The adjustable delay line 303 needs to be calibrated to compensate for PVT errors and clock skew, so as to achieve better delay.

In this embodiment, the adjustable delay line 303 may be implemented by a combinational logic circuit, specifically:

the adjustable delay line 303 may be formed by a connection line with delay cells that may be selected from inverters or flip-flops depending on the required accuracy.

As shown in fig. 7 and 8, the adjustable delay line 303 may also be implemented by an LC circuit or an inverter combination circuit. The LC circuit includes a predetermined number of inductors and capacitors (the capacitors may be either polar capacitors or nonpolar capacitors). The inverter combination circuit includes a plurality of inverters connected in series and externally connected to a control voltage Vcc, and the inverter combination circuit has a delayed input signal (i.e., a clock signal output by the clock module 100) as an input and a delayed output signal (i.e., a clock signal set to different delay phases) as an output.

In a fourth configuration, as shown in FIG. 5, the delay connection module 300 includes a first connection line and a second connection line; one end of the first connection line is connected to the output end of the clock module 100, and the other end is connected to the clock input end of the first core 200; one end of the second connection line is connected to the output end of the clock module 100, and the other end is connected to the clock input end of the second core 200 through an inverter.

The beneficial effect of this embodiment lies in:

With the first configuration, the SOC surge peak and noise peak are reduced purely through the placement of the cores 200 and the routing of the connection lines 301, which keeps the process flow and cost as simple as possible.

By adopting the second or third configuration scheme, the SOC surge peak value and the noise peak value can be reduced on the premise of not increasing the complexity of chip design.

According to any of the embodiments described above, in this embodiment:

This embodiment presents a clock control method that spreads power draw over the clock cycle to reduce surges and di/dt.

In the SOC, the main core area is a digital circuit in which data is triggered synchronously by the clock. At each sampling clock edge, all flip-flops sample their input data and the combinational logic evaluates; the circuit then waits for the next sampling clock edge to do so again. Current draw therefore peaks at a clock edge and is much lower between two clock edges.

A GPGPU or AI computation IC has many computational cores, and the cores may be asynchronous to one another. The present embodiment introduces a clock delay control mechanism that spreads the clock sampling edges of all cores 200, especially neighboring cores 200, across one clock cycle. This averages the large current peak at the clock edge into several smaller current peaks within the clock cycle, reducing the current switching peak (or di/dt) and hence the power supply noise.
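The averaging effect described above can be illustrated with a toy model. The sketch below assumes each core draws one triangular current pulse starting at its own clock edge. The pulse shape, its width (10% of the period), the unit peak and the function names are all illustrative assumptions rather than anything specified in the source.

```python
import math

def core_current(t, phase, period=1.0, width=0.1, peak=1.0):
    """Toy model: a triangular current pulse of the given width that
    starts at the core's clock edge (phase given in radians)."""
    edge = phase / (2 * math.pi) * period
    dt = (t - edge) % period
    if dt >= width:
        return 0.0
    half = width / 2
    return peak * (1 - abs(dt - half) / half)

def peak_total_current(phases, samples=1000):
    """Largest summed current over one clock period, sampled uniformly."""
    return max(sum(core_current(k / samples, p) for p in phases)
               for k in range(samples))

if __name__ == "__main__":
    n = 4
    in_phase = [0.0] * n
    spread = [i * 2 * math.pi / n for i in range(n)]
    print("in phase:", peak_total_current(in_phase))
    print("spread  :", peak_total_current(spread))
```

In this toy model the pulses of the four spread cores do not overlap, so the summed peak falls from about four single-core peaks to about one. Real current waveforms are wider and less regular, so the actual reduction would be smaller.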

The SOC includes a number of computational cores. The cores communicate through a network on chip (NOC) and are all asynchronous to one another. Therefore, if the sampling clock phases of the cores 200 can be staggered, the current peaks of the individual cores 200 no longer coincide and the combined peak is effectively reduced. The clock of the SOC cores 200 is generated by one or more phase-locked loops and then distributed to the cores 200 through a clock distribution network. The clock distribution network can therefore be used to control the clock skew between cores 200.

One solution is to use a carefully designed clock distribution tree whose clock skew and propagation delay are guaranteed through power, voltage and temperature simulations. Adjacent cores 200 are placed in opposite clock phases, so that the power surges of adjacent cores are spread 180 degrees apart in the clock cycle, maximizing the current averaging effect. Across all cores 200, the sampling clock phases are spread over a full clock cycle to minimize surge. This approach requires a large amount of simulation and clock tree adjustment. For large chips it is sometimes difficult to spread the phases over a full clock cycle, and it is difficult to manage power, voltage and temperature variations in the clock distribution tree design.

The other solution builds on the first by adding a clock adjustment mechanism to tune the clock phase. A delay-locked loop, or an adjustable delay line 303 with calibration, can do this. The delay-locked loop may output clocks of the same frequency with phases of 0, 90, 180 and 270 degrees, or any number of set phase angles such as 10, 40, 70, 100, 130, 160, 190, 220, 250, 280, 310 and 340 degrees. Finer phase steps can be programmed when a calibrated delay line is used.
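To make the phase angles concrete, the following sketch converts a phase offset in degrees into a time delay for a given clock frequency. The 1 GHz clock in the example is an assumption chosen for illustration, and `phase_to_delay_ps` is a hypothetical helper name.

```python
def phase_to_delay_ps(phase_deg, clock_hz):
    """Time delay in picoseconds that corresponds to a phase offset."""
    period_ps = 1e12 / clock_hz
    return (phase_deg % 360) / 360 * period_ps

# The two example phase sets mentioned in the text.
quadrature = [0, 90, 180, 270]
twelve_phase = list(range(10, 360, 30))  # 10, 40, ..., 340 degrees

if __name__ == "__main__":
    clock_hz = 1e9  # assumed 1 GHz core clock
    for deg in quadrature + twelve_phase:
        print(f"{deg:3d} deg -> {phase_to_delay_ps(deg, clock_hz):6.1f} ps")
```

At the assumed 1 GHz, the 30 degree step of the second phase set corresponds to roughly 83 ps, which gives a sense of the delay resolution such a scheme would need.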

When a delay locked loop is used, it may generate and transmit a PVT compensated phase offset clock to the SOC core 200.

With delay lines, the delay varies with PVT, requiring a calibration scheme to adjust the delay setting to compensate for PVT variations.
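One simple form such a calibration could take is sketched below: given a measured per-tap delay at the current operating condition, choose the number of delay taps that best approximates the target delay. The tap-based structure, the function name `taps_for_delay` and the example per-tap delays are assumptions for illustration. The source does not specify the calibration scheme.

```python
def taps_for_delay(target_ps, tap_delay_ps, n_taps):
    """Number of delay taps whose total delay is closest to target_ps,
    clamped to the number of taps available in the line."""
    if tap_delay_ps <= 0:
        raise ValueError("tap_delay_ps must be positive")
    taps = round(target_ps / tap_delay_ps)
    return max(0, min(n_taps, taps))

if __name__ == "__main__":
    target = 250.0  # e.g. a 90 degree shift at an assumed 1 GHz clock
    # Assumed per-tap delays at two operating conditions.
    print(taps_for_delay(target, 10.0, 64))   # nominal condition
    print(taps_for_delay(target, 12.5, 64))   # slower condition
```

When the per-tap delay drifts with PVT, re-running the selection with the newly measured value keeps the programmed phase near its target.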

The beneficial effect of this embodiment lies in:

The purpose of this embodiment is to reduce the AC current peaks at the flip-flop clock sampling edges by spreading the clock sampling edges of all cores 200 over the entire clock cycle. In addition, this embodiment shifts neighboring cores 200 to opposite clock phases (a 180 degree offset), which reduces the AC noise peak by 6 dB and reduces the surge (di/dt), thereby effectively easing the PDN design burden.
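The 6 dB figure is consistent with a halving of the peak amplitude. As a rough worked check, assume two identical cores whose current pulses no longer overlap once they are 180 degrees apart (an idealization not stated in the source), and let $I_0$ denote the peak current of a single core (a symbol introduced here for illustration):

```latex
\[
\Delta
  = 20\log_{10}\frac{I_{\text{peak, in phase}}}{I_{\text{peak, opposite phase}}}
  = 20\log_{10}\frac{2I_0}{I_0}
  = 20\log_{10}2
  \approx 6.02\ \text{dB}
\]
```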

The embodiment of the invention also provides an integrated circuit, which comprises an on-board voltage regulator, a power delivery network and the above system-on-chip for multi-core out-of-phase processing;

the on-board voltage regulator supplies power to the system-on-chip for multi-core out-of-phase processing through the power delivery network;

the power delivery network comprises at least two delivery layers, and each delivery layer is provided with a decoupling capacitor for filtering surges.

FIG. 6 shows an example integrated circuit. The power delivery network shown in the figure includes 3 delivery layers, namely a printed circuit board layer, an organic substrate layer, and an SOC IC layer (i.e., a system-on-chip integrated circuit layer); in some preferred embodiments, a silicon interposer is also disposed between the organic substrate layer and the SOC IC layer. Further improvements of the present embodiment in an integrated circuit are described below, using the integrated circuit with a silicon interposer as an example.

The printed circuit board layer is connected to the package (organic substrate) layer through a ball grid array interface; the package layer and the silicon interposer are connected by C4 (controlled collapse chip connection) bumps; the silicon interposer and the SOC IC layer are connected by a microbump (uBump) interface.

The on-board voltage regulator VR is electrically connected to the printed circuit board layer and supplies power to the system-on-chip for multi-core out-of-phase processing.

Decoupling capacitors are arranged on the printed circuit board layer, the package layer and the silicon interposer to filter noise. The decoupling capacitors work together with the system-on-chip for multi-core out-of-phase processing to further improve the noise filtering effect.

The beneficial effect of this embodiment lies in:

In the integrated circuit described above, power delivery depends primarily on the PDN design of the board, package and die. When only decoupling capacitors are used for filtering, much of the PDN is far from the chip, so its effectiveness at high frequencies is low and the filtering falls short of expectations. In other words, a large-scale integrated circuit consumes a lot of power and runs at a clock frequency in the GHz range, and the limitations of the power delivery system and capacitor design mean the resulting high-frequency noise cannot be filtered out, so noise appears on the chip.

In the present embodiment, the SOC with clock phase delay and the decoupling capacitors are combined, reducing peaks at high frequencies and at low frequencies respectively.

Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.
