High-throughput and low-delay PHY (physical layer) interface circuit device of DDR5 SDRAM (synchronous dynamic random access memory)

Document No.: 153077 Publication date: 2021-10-26

Reading note: This technology, "A high-throughput, low-latency PHY interface circuit device for DDR5 SDRAM", was designed by 李康, 陆少强, 史江义, 潘伟涛, 荣卓尔 and 陈嘉伟 on 2021-06-24. Main content: The invention belongs to the technical field of chip design and discloses a multi-PHY interface circuit device for DDR5 SDRAM, composed of modules for frequency ratio conversion, DFI address command and data read/write, initialization training calibration, address command sending and data transceiving, and configuration. The device provides high-data-rate, low-latency access to multiple memory devices in support of the standard DDR5 protocol. The initialization training calibration module trains each path to its optimal transmission state to achieve low latency, while the high-speed parallel-to-serial conversion performed by the address command sending and data transceiver modules, together with the high-speed clock PLL module, supports DDR5 high-data-rate transmission. The configuration module uses configurable registers to set up the data read/write and data transceiver modules, forming a flexible parallel multi-memory-channel structure for high-throughput transmission. The configuration module can also configure the frequency ratio conversion module for 1:1, 1:2 and 1:4 frequency ratio operation, thereby supporting controllers with different DFI interface frequencies.

1. A high-throughput, low-latency PHY interface circuit device for DDR5 SDRAM, comprising:

the frequency ratio conversion module, the DFI address command module and the address command sending module are electrically connected in sequence; the frequency ratio conversion module, the DFI data reading and writing module and the data transceiving module are electrically connected in sequence; the SDRAM initialization training calibration module is respectively and electrically connected with the DFI address command module and the DFI data read-write module; the high-speed clock PLL module is respectively and electrically connected with the address command sending module and the data transceiving module; the configuration module is respectively and electrically connected with the frequency ratio conversion module, the DFI address command module, the initialization training calibration module, the DFI data reading and writing module, the address command sending module, the high-speed clock PLL module and the data transceiving module;

the frequency ratio conversion module is used for processing conversion operation of different frequency ratio modes of DFI addresses/commands and data at the controller end, and comprises the steps of converting the DFI addresses and commands in 1:1, 1:2 and 1:4 modes into internally fixed 1:2 mode DFI addresses and commands; converting the DFI write data of 1:1, 1:2 and 1:4 modes into internally fixed 1:2 mode DFI write data; meanwhile, DFI read data of an internal fixed 1:2 mode is converted into DFI read data of 1:1, 1:2 and 1:4 modes, and finally the transmission from the fixed 1:2 mode to SDRAM is processed inside the PHY;

the DFI address command module is used for organizing and arranging different phase numbers of DFI addresses and command signals into 4-bit data lines, and the organizing and arranging modes comprise DDR5 single-cycle commands and DDR5 double-cycle commands; distributing every 4 lines to output to 1 address command sending module according to the number of addresses and command pins needing to be processed by the SDRAM; simultaneously generating a delay unit control signal, a command/data sending clock, a command/data sending initial enabling control signal, a command FIFO reading initial enabling signal, a data transceiving module control clock and controlling the transmission process of an address command;

the initialization training calibration module is used for generating an initialization sequence that conforms to the DDR5 SDRAM standard specification and sending it to each address command sending module and each data transceiver module to initialize the SDRAM, so that SDRAM transactions can be accessed correctly in the normal transaction mode; performing delay training on each address, command and data path during initialization to obtain the optimal sampling-center delay, adjusting the delay calibration unit of each path, and calibrating the timing skew between the clock signal and the data strobe, so that data can be sampled correctly and low-latency transmission is achieved;

the DFI data read-write module is used, during a write operation, for arranging and distributing the DFI write data and write data masks into a plurality of groups according to whether the SDRAM data lines are configured as X8 or X16 and whether a plurality of SDRAMs are connected, wherein one group comprises 8 write data lines and 1 data mask line, each 4 bits wide; arranging, according to the DFI write data enable signal, the enable signals for the different SDRAM burst modes, setting the drive strength value of each data signal, and appending these two signals to each 4-bit data line to expand it into a 12-bit data line for output; generating one 12-bit write data strobe line according to the write data enable signal; and generating a write data delay information control signal, a write data transmission clock and initial enable signal, and a write command FIFO clock and initial enable signal; and is used, during a read operation, for transferring the 8 data lines and 1 data bus inversion line, generating a read data valid signal, arranging the read data and outputting it to the frequency ratio conversion module, and finally sending the DFI read data to the controller; meanwhile, during a read command, generating, according to the DFI read data enable signal, a read data sampling gate signal, a read data clock and initial enable signal, and a read command FIFO clock and initial enable signal, so that the data transceiver module can correctly sample the SDRAM data;

the address command sending module is used for processing the 4 lines of 4-bit addresses and commands transmitted by the DFI address command module, performing 4-bit parallel-to-serial conversion and applying the IO interface characteristics to each line, and finally outputting the addresses and commands to the address and command pins of the SDRAM; meanwhile, the delay information control signal is processed through the command FIFO module and the delay unit module, ensuring that the address/command timing output to the SDRAM is correct;

the high-speed clock PLL module is used for generating a 4-frequency-doubled high-speed clock by taking the DFI clock as a reference clock, outputting the 4-frequency-doubled high-speed clock to the address command sending module and the data receiving and sending module to realize serial-parallel conversion operation and generate an SDRAM clock, and ensuring the requirement of DDR5SDRAM on high speed by embedding a high-speed PLL;

the data transceiver module is used, during a data write, for performing 12-bit parallel-to-serial conversion and applying the IO interface characteristics to each of the 8 write data lines, 1 write data mask line and 1 write data strobe line transmitted by the DFI data read-write module; processing the write data delay information control signal through the write command FIFO module and the write delay unit module of each line, and controlling the delayed transmission during the write data period, so that the data output to the SDRAM conforms to the standard burst timing of the SDRAM; during a read operation, processing the read data sampling gate signal through 2 write command FIFOs and delay units; correctly capturing, through the IO interface unit and according to the sampling gate signal, the serial data of the 8 data pins and 1 data bus inversion pin of the SDRAM; meanwhile, converting the serial data into 4-bit parallel data through 2 odd/even data FIFOs, and finally outputting the 8 data lines and 1 data bus inversion line to the controller port;

and the configuration module is used for writing the values of its internal registers through an external low-speed APB configuration interface, and for configuring the multiple working modes of the whole PHY in the other modules according to those register values.

2. The high throughput, low latency PHY interface circuit apparatus for a DDR5SDRAM of claim 1, wherein the configuration module is further configured to configure the module to form a flexible multiple data channel to enable parallel operation of multiple memories, the method comprising:

each data transceiver module processes 8 data lines; by setting the internal multi-channel registers through the configuration module, memories of various data bit widths are connected in multiples of 8, the minimum configuration being X8; when two X16 memories are connected, the device must be configured with 4 data transceiver modules and 4 DFI data read-write modules; meanwhile, by configuring the register values of each module, the assigned data bits of dfi_wrdata/dfi_rddata are connected to that module, so that flexible parallel operation of a plurality of SDRAMs is realized.
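As a rough illustration of the channel arithmetic in this claim, the Python sketch below counts how many 8-bit data transceiver / DFI data read-write module pairs a given set of memories needs. The function names and the contiguous bit assignment are assumptions made for illustration; the claim only states that widths are multiples of 8 and that two X16 memories need 4 modules of each kind.

```python
def modules_needed(sdram_widths):
    """Number of 8-bit data transceiver modules (and, equally, DFI data
    read-write modules) needed for the listed SDRAM data widths."""
    for width in sdram_widths:
        if width <= 0 or width % 8 != 0:
            raise ValueError("each SDRAM width must be a positive multiple of 8")
    return sum(sdram_widths) // 8


def assign_dfi_bits(sdram_widths):
    """Map each module index to the (low, high) data-bit range it serves.

    Assumption: modules take contiguous 8-bit slices in ascending order.
    The real mapping is set through configuration registers.
    """
    count = modules_needed(sdram_widths)
    return {m: (8 * m, 8 * m + 7) for m in range(count)}


if __name__ == "__main__":
    print(modules_needed([16, 16]))   # two X16 memories
    print(assign_dfi_bits([16, 16]))
```

For two X16 memories this gives 4 modules, matching the configuration stated in the claim.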

3. The high-throughput, low-latency PHY interface circuit arrangement for DDR5SDRAM of claim 1, wherein the high-throughput, low-latency PHY interface circuit arrangement for DDR5SDRAM further comprises a control and command transfer unit consisting essentially of a DFI address command module and an address command issue module;

the address command sending module consists of an address command parallel-serial conversion module, an address command IO port module, a local calibration delay unit and an address command control FIFO module; the DFI address command module of the control unit includes two modes of operation: the double cycle command in the DDR5 command truth table is two adjacent SDRAM CK cycles, the first cycle is used for sending command operation, and the second cycle is used for sending address value; the arrangement mode of the DFI address command module adopts { P1 high order, P0 high order, P1 low order, P0 low order } to organize 4bit data, at this time, effective commands P0 low order, P1 low order occupy the first SDRAM period, P0 high order, P1 high order occupy the second SDRAM period, and the low order and the high order of P0 and P1 are different;

following the DFI 5.0 protocol, for the DFI address commands sent by the controller: for DDR5 double-cycle commands, which comprise activate, write pattern (WRP), mode register write, mode register read, write and read operations, the low 14 bits of dfi_address_P0/P1 hold the address command of the first cycle and the high 14 bits hold the value of the second cycle, so that the values of both the first and second cycles can be defined; for DDR5 single-cycle commands, which comprise the CA reference voltage command, refresh all, refresh same bank, precharge all, precharge same bank, precharge bank group, self-refresh entry, power-down mode exit, no-operation/deselect and the multi-purpose command (MPC), the low 14 bits of dfi_address_P0/P1 are set to the command value and the high 14 bits are set to an invalid value, which realizes the single-cycle command, namely the first cycle is valid and the second cycle is invalid.
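The bit layout described in this claim can be sketched in Python as follows. This is a behavioral illustration only: the function names are hypothetical, the placeholder used for the "invalid value" of a single-cycle command is an assumption (the text does not give it), and treating bit 0 of each 4-bit word as the first bit serialized is likewise an assumption.

```python
CA_BITS = 14               # bits per command/address cycle, per the text
CA_MASK = (1 << CA_BITS) - 1


def split_cycles(dfi_address):
    """Split one 28-bit dfi_address phase word into (first, second) cycle values."""
    return dfi_address & CA_MASK, (dfi_address >> CA_BITS) & CA_MASK


def single_cycle(command_value, invalid_value=0):
    """Build a phase word for a single-cycle command.

    invalid_value is a placeholder for the unspecified 'invalid value'.
    """
    return ((invalid_value & CA_MASK) << CA_BITS) | (command_value & CA_MASK)


def arrange_lines(dfi_address_p0, dfi_address_p1):
    """Return one 4-bit word per CA bit, ordered {P1 high, P0 high, P1 low, P0 low}."""
    p0_low, p0_high = split_cycles(dfi_address_p0)
    p1_low, p1_high = split_cycles(dfi_address_p1)
    lines = []
    for bit in range(CA_BITS):
        word = (((p1_high >> bit) & 1) << 3
                | ((p0_high >> bit) & 1) << 2
                | ((p1_low >> bit) & 1) << 1
                | ((p0_low >> bit) & 1))
        lines.append(word)
    return lines


if __name__ == "__main__":
    # P0 carries a second-cycle bit, P1 carries a first-cycle bit, both on CA bit 0.
    print(arrange_lines(1 << CA_BITS, 1)[0])
```

With this ordering the two low-order values occupy bits 0 and 1 of each word and the two high-order values occupy bits 2 and 3, in line with the first-period/second-period split described above.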

4. The high-throughput, low-latency PHY interface circuit apparatus for a DDR5SDRAM of claim 2, wherein the structure of the control and command transfer unit comprises:

first, the address command dfi_address_P0/P1 following the DFI 5.0 protocol passes through the DFI address command module, where it is classified as single-cycle or double-cycle according to the operation mode of the DDR5 command and arranged accordingly; 4 output lines are allocated to each address command sending module according to the number of DDR5 SDRAM pins, the 4 lines corresponding to 4 SDRAM pin positions; the 4-bit data undergoes parallel-to-serial conversion and is read out on the calibrated clock, after which the address command IO port module performs delay-chain optimization, logic-level conversion between the PHY voltage and the SDRAM voltage, and application of the correct drive strength; the address command control FIFO is mainly used for transmitting delay information, which controls the delay unit and thereby the delay of the address command, ensuring correct timing.

5. The high-throughput, low-latency PHY interface circuit arrangement for DDR5SDRAM of claim 1, wherein the high-throughput, low-latency PHY interface circuit arrangement for DDR5SDRAM further comprises a data transfer unit, the data transfer unit consisting of a DFI data read-write module, a data transceiver module;

the DFI data read-write module is used for controlling a write channel and a read channel; the data write operation includes: the data written in by the DFI firstly form a parallel data line according to every 4 bits, 8 parallel data lines are sent to a DFI data reading and writing module, each parallel data line generates a 4-bit data enable burst mode control signal in the module according to a DFI write data enable signal, a drive strength value of each 4-bit data is set, and finally the 3 groups of data are combined into a 12-bit data line; is sent to the data transceiver module to perform parallel-to-serial conversion into serial output by using an internal high-speed clock; the data enable burst mode control signal is used for obtaining burst data operation by controlling enable signals of different data at an IO port, and the driving strength value of the data is used for controlling the driving of pull-up and pull-down resistors of the data at the IO port so as to improve the integrity of the signal; meanwhile, the local calibration delay unit is used for carrying out optimal delay calibration output on the write data so as to ensure that the SDRAM time sequence requirement is met;

the data read operation includes: in the path from the SDRAM read operation to the DFI read data output, the DFI read data enable signal first generates an SDRAM read data sampling gate signal in the DFI data read-write module; the DQS and DQS_N data strobe signals output by the SDRAM are used as a double-edge sampling clock in the IO port sampling circuit, and the serial data read from the SDRAM is sampled on both edges while the read data sampling gate signal is asserted; the 1-bit serial data is converted into 4-bit parallel data through an odd serial-to-parallel FIFO module and an even serial-to-parallel FIFO module, sent to the frequency ratio conversion module, and finally sent to the controller through the DFI (DDR PHY Interface) protocol; meanwhile, the data de-skew and local calibration delay units ensure that the read data is sampled correctly.
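The odd/even serial-to-parallel step of the read path can be pictured with a short Python sketch. It is a behavioral model under stated assumptions: samples are taken to alternate between the two strobe edges starting with the one routed to the even FIFO, and the first sample lands in bit 0 of each 4-bit word. Neither ordering is specified in the text, and the function name is hypothetical.

```python
def deserialize(samples):
    """Convert one pin's double-edge serial samples into 4-bit parallel words.

    samples: list of 0/1 values in capture order.
    """
    even_fifo = samples[0::2]   # assumption: first edge feeds the even FIFO
    odd_fifo = samples[1::2]    # assumption: second edge feeds the odd FIFO
    pairs = min(len(even_fifo), len(odd_fifo))
    words = []
    # Each 4-bit word consumes two entries from each FIFO.
    for i in range(0, pairs - 1, 2):
        bits = (even_fifo[i], odd_fifo[i], even_fifo[i + 1], odd_fifo[i + 1])
        words.append(bits[0] | bits[1] << 1 | bits[2] << 2 | bits[3] << 3)
    return words


if __name__ == "__main__":
    print(deserialize([1, 0, 1, 1, 0, 0, 0, 1]))
```

Splitting the stream by edge lets each FIFO run at half the sample rate, which is one plausible reason for the odd/even arrangement.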

6. The high-throughput, low-latency PHY interface circuit apparatus for a DDR5SDRAM of claim 4, wherein the hardware configuration of the data transfer unit comprises:

a command following the DFI 5.0 protocol first passes through the DFI data read-write module, in which the write channel module processes write data such as dfi_wrdata_P0/P1, distributes it onto 8 data lines each 4 bits wide, and generates the corresponding burst and clock control signals according to dfi_wrdata_en/cs; the read channel module is mainly used for generating the read data sampling gate signal, which ensures that the read data is sampled at the SDRAM IO interface, for transferring the dfi_rddata_P0/P1 read data, and for generating the corresponding handshake signals, including dfi_rddata_valid; the data transceiver module sends the write data to the SDRAM pins after parallel-to-serial conversion and delay calibration, and correctly samples the read data with a sampler, obtaining 4-bit read data through data synchronization and alignment and the odd/even serial-to-parallel FIFOs, and passing it back to the preceding DFI data read-write module for final output to the controller.

7. The high-throughput, low-latency PHY interface circuit apparatus for a DDR5SDRAM of claim 4, wherein the data transfer unit supports DDR5 operations with a frequency ratio and variable frequency for the PHY, comprising:

firstly, the conditions of a controller end 1:2 and an internal fixed end 1:2 are not required to be converted, and seamless transmission is directly carried out; for the controller terminal 1:1 and the internal fixed 1:2, where the DFI clock frequency and the SDRAM frequency are the same and 1 cycle transmits one data of P0 phase, if it is to be converted into the internal fixed 1:2 mode, the data of the controller terminal is maintained for two clock cycles, corresponding to one clock of the 1:2 mode, and the two data of the two cycles of the 1:1 mode are respectively placed to the P0 and P1 phases of the 1:2 mode; for controller terminal 1:4 and internal fixed 1:2, the DFI clock frequency and SDRAM frequency are a 1:4 ratio and 1 cycle sends data of one P0, P1, P2, P3 phase, if to convert to internal fixed 1:2 mode, 1 cycle of controller terminal data, corresponding to 2 clocks of 1:2 mode, and 4 phase data of this 1:4 mode are placed to the P0, P1 phase of the first cycle and the P0, P1 phase of the second cycle of 1:2 mode, respectively;

for the read/write conversion between an external 1:1 ratio and the internal 1:2 ratio, 4 registers and a control counter are needed; under control of the counter, the external dfi_wrdata_P0 is registered into the internal P0 channel in the first DfiCtrClk cycle and into the internal P1 channel in the second cycle, and the two following registers then register the temporarily stored data together for one beat to form dfi_wrdata_internal_P0/P1; the read channel is simpler: under control of the counter, the internal P0 and P1 are driven through a port register onto the external P0 in two adjacent cycles;

when the external frequency ratio is 1:4 and the internal frequency ratio is 1:2, the write channel selects dfi_wrdata_P0/P2 as the internal P0 and dfi_wrdata_P1/P3 as the internal P1; the conversion of the read data channel needs 8 registers: in the first internal DfiClk cycle the internal P0 is assigned to the external P0 and the internal P1 to the external P1; in the next beat the internal P0 is assigned to the external P2 and the internal P1 to the external P3; finally, under control of a counter, the external P0, P1, P2 and P3 are registered and output together in one DfiCtrClk cycle;

within the PHY, 4 clocks are used: the DFI clock DFICtrlClk input by the controller, the internal fixed 1:2 mode DFI clock DfiClk, the PLL high-speed clock pclk and the SDRAM clock CK, related as follows: the internal fixed 1:2 mode DFI clock runs at 1/4 the frequency of the PLL high-speed clock, the SDRAM clock runs at 1/2 the frequency of the PLL high-speed clock, and the DFI clock input by the controller is determined by the frequency ratio; the internal high-speed clock is generated by the high-speed clock PLL module, which multiplies the internal fixed 1:2 mode DFI clock, used as the reference, by 4; it is used by the address command sending module and the data transceiver module for parallel-to-serial conversion, serial-to-parallel conversion and SDRAM clock generation.
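The clock relationships stated in this claim reduce to simple arithmetic, sketched below in Python. The function name and the dictionary keys are illustrative, and the example frequency is an assumption chosen so that the SDRAM clock corresponds to a 6400 MT/s double-data-rate interface.

```python
def phy_clocks(pclk_mhz, ratio):
    """Derive the four PHY clock frequencies (MHz) from the PLL clock.

    ratio is N in the controller's 1:N DFI frequency ratio (1, 2 or 4),
    i.e. the number of SDRAM clock cycles per controller DFI clock cycle.
    """
    if ratio not in (1, 2, 4):
        raise ValueError("frequency ratio must be 1:1, 1:2 or 1:4")
    ck = pclk_mhz / 2          # SDRAM clock CK is 1/2 of pclk
    dfi_clk = pclk_mhz / 4     # internal fixed 1:2 mode DfiClk is 1/4 of pclk
    dfi_ctrl_clk = ck / ratio  # controller DFI clock follows the chosen ratio
    return {"pclk": pclk_mhz, "CK": ck, "DfiClk": dfi_clk, "DFICtrlClk": dfi_ctrl_clk}


if __name__ == "__main__":
    print(phy_clocks(6400, 4))
```

Note that in 1:2 mode the controller clock and the internal DfiClk coincide, which is why that case needs no conversion.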

8. The high-throughput, low-latency PHY interface circuit apparatus for a DDR5SDRAM of claim 1, wherein the high-throughput, low-latency PHY interface circuit apparatus for a DDR5SDRAM further comprises an initialization and training unit, the initialization and training unit comprising:

the DDR5 initialization unit is used for generating an initialization sequence which is in accordance with the DDR5SDRAM standard protocol JESD79-5, and ensuring that the memory can carry out a normal transaction mode;

the training unit is used for training and calibrating the system and comprises a 1D training mode and a 2D training mode; the 1D training is for delay optimization at 1 voltage provided by DFI, by training firmware to adjust local calibration delay units of each clock, command, address, data path to compensate for delays including board level and DRAM delays, etc.; the 2D training is to calculate the area passing the operation to form a 2D eye diagram for all read-write tests of each pair of voltage and delay after the 1D training, and analyze the margin of the 2D eye diagram to optimize the optimal voltage and delay points;

the initialization of the whole PHY is to combine SDRAM initialization and training calibration initialization into a flow, the initialization process is executed by PHY initialization firmware, and the hardware architecture of the DDR5PHY training calibration comprises:

firstly, a special training state machine block is used for controlling the whole training process and comprises two parts of training core initialization, and data of the special training state machine block is interacted with a control register block to control other modules; training data is sent to a DFI address command module and a DFI data read-write module through a multiplexer to be switched into a normal access function of a training core; during the write training, random write data is generated by training write data generation and is sent to the read data comparison module during the read training, and the command module generates a combination of corresponding read-write command operations to reflect real operations to adjust the optimal delay.

9. A method of controlling a high throughput, low latency PHY interface circuit arrangement for a DDR5SDRAM operating the high throughput, low latency PHY interface circuit arrangement of DDR5SDRAM of any one of claims 1 to 7, wherein the method of controlling the high throughput, low latency PHY interface circuit arrangement of DDR5SDRAM comprises the steps of:

step one, after the device is started, the initialization training calibration module initializes the DDR5 SDRAM and sets the delay of each path through training, so as to reach an optimal low-latency working state;

step two, during normal operation, the control, address and data signals of the DFI 5.0 interface of the on-chip controller first pass through the frequency ratio conversion module, which handles conversion between systems with different frequency ratios; the address and control signals are then processed by the DFI address command module and the address command sending module into control and address timing conforming to DDR5;

step three, data is transferred through the DFI data read-write module and the data transceiver module, which are divided into a read channel and a write channel to perform efficient read and write operations on the DDR5 memory devices, achieving high-speed, low-latency transmission between the on-chip DDR controller and the SDRAM.

10. An information data processing terminal, characterized in that the information data processing terminal is adapted to implement a high throughput, low latency PHY interface circuit arrangement of a DDR5SDRAM as claimed in any one of claims 1 to 7.

Technical Field

The invention belongs to the technical field of chip design, and particularly relates to a high-throughput and low-delay PHY interface circuit device of a DDR5 SDRAM.

Background

With the rapid development of artificial intelligence and big data technology in the 21st century, intelligent application scenarios need to process and transmit massive amounts of data, and data center servers in particular increasingly demand storage devices with higher speed, larger capacity and higher throughput. Double data rate synchronous dynamic random access memory (DDR SDRAM), the main memory of computers, has been continuously updated and iterated; the fifth generation, DDR5, offers high capacity, high data rate and low latency to support application scenarios such as servers, 5G communication, multi-camera systems and automobiles. Because different manufacturers take on different parts of a DDR memory system, integrating DDR into an SoC requires three parts to work together: the DDR controller, the DDR PHY interface and the DDR memory devices. The DDR5 protocol has only recently been released, and it has several features that distinguish it from DDR4 memory and deliver higher performance and flexibility. These new features include time-division multiplexing of address and command, higher transmission rates, double-cycle command operation and larger capacity configurations, which meet the high-speed, high-throughput needs of server applications.

As memory generations iterate and data rates double, the PHY interface must deliver high-rate, high-bandwidth transmission, and the PHY and the controller communicate according to the DFI specification. Recent PHY research includes a BIST data-channel test vector generation method for DDR2/3 PHYs that uses pseudo-random numbers to meet the test requirements; an output drive circuit structure for a DDR3 PHY SSTL15 interface comprising a pre-driver core and a post-driver core; and, in the physical implementation of DDR PHYs, a clock tree optimization method that reduces delay skew between DDR data signals. In 2014 Xilinx introduced a high-performance DDR4 memory solution for All Programmable UltraScale devices with data rates of up to 2400 Mb/s, but the Xilinx IP does not use the standard DFI 4.0 protocol to separate the controller from the physical PHY, so controllers based on the DFI standard cannot be connected to it on an FPGA, and it is not well suited to expansion and IP integration.

In summary, most current research focuses on areas such as access scheduling optimization in the controller, devices for sending and receiving memory data, pseudo-random data testing of DDR, low-power management control circuits and slot core maintenance devices, and few patents address the implementation of a complete PHY architecture. Although the DDR5 and DFI 5.0 protocols define functions and timing, they provide no specific implementation; the invention therefore fills the gap in PHY devices that connect a standard DFI 5.0 controller to DDR5 memory, and offers high throughput and low latency.

DDR5 SDRAM provides high-capacity, high-rate data storage for server-oriented applications, and the DDR5 PHY interface provides the high-rate, high-throughput, low-latency connection between a controller integrated on a system-on-chip and the DDR5 memory devices. A high-performance DDR5 PHY and the on-chip controller follow the latest DFI 5.0 standard transmission specification, which facilitates efficient IP reuse and interfacing with the different supported memory devices. To meet the high bandwidth requirements of massive data, a configurable DDR5 PHY can connect to a plurality of DDR5 memory devices to form a flexible multi-memory-channel structure. Because data access is multi-mode, a high-performance PHY must support the burst-8 to burst-32 data reads and writes of the latest DDR5 memory devices and the large 8G to 64G addressing range, in addition to operations such as refresh, precharge and mode register configuration. The latest DDR5 specification from the JEDEC Solid State Technology Association defines rates from 3200 MT/s to 6400 MT/s, so the PHY must provide higher and variable operating frequencies, while improving data transfer reliability and high-speed signal integrity through training and calibration of delay and voltage so that the PHY operates optimally. The key technical challenges in realizing a high-performance DDR5 PHY are as follows:

1. The connection interface between the high-performance PHY and the on-chip controller needs to support the latest DFI 5.0 standard specification.

2. Free switching between the single-cycle and double-cycle command operations of the latest DDR5 standard specification.

3. Read/write control of the latest DDR5 data bursts from 8 to 32, so as to realize the multiple working modes of DDR5.

4. Support for different frequency ratios and variable frequencies, to meet the high data rate requirements of DDR5.

5. High-bandwidth data transfer requires that the PHY be configurable to connect multiple DDR5 memory devices and form flexible multi-memory channels.

6. Delay calibration training for each command, address and data channel during initialization, to ensure that the memory system operates in an optimal state.

7. The signal integrity of high-speed transmission requires impedance calibration, so as to reduce reflections of the high-speed signals and support higher transmission rates.

Through the above analysis, the problems and defects of the prior art are as follows:

(1) The existing Xilinx IP does not use the standard DFI 4.0 protocol to separate the controller from the physical PHY, so controllers based on the DFI standard cannot be connected to it on an FPGA, and it is not well suited to expansion and IP integration.

(2) Few patents in current research address the implementation of a complete PHY architecture, and no specific implementation device has been provided.

Disclosure of Invention

In order to solve the problems in the prior art, the invention provides a high-throughput and low-delay PHY interface circuit device of DDR5 SDRAM.

The invention is realized as follows: a high-throughput, low-latency PHY interface circuit device for DDR5 SDRAM comprises a frequency ratio conversion module, a DFI address command module, an initialization training calibration module, a DFI data read-write module, an address command sending module, a high-speed clock PLL module, a data transceiver module and a configuration module.

The frequency ratio conversion module, the DFI address command module and the address command sending module are electrically connected in sequence; the frequency ratio conversion module, the DFI data reading and writing module and the data transceiving module are electrically connected in sequence; the SDRAM initialization module is respectively electrically connected with the DFI address command module and the DFI data read-write module; the high-speed clock PLL module is respectively and electrically connected with the address command sending module and the data transceiving module; the configuration module is respectively and electrically connected with the frequency ratio conversion module, the DFI address command module, the initialization training calibration module, the DFI data reading and writing module, the address command sending module, the high-speed clock PLL module and the data transceiving module.

The frequency ratio conversion module is used for converting between the different frequency ratio modes of the DFI addresses/commands and data at the controller end, which comprises: converting the DFI addresses and commands of the 1:1, 1:2 and 1:4 modes into the internally fixed 1:2 mode DFI addresses and commands; converting the DFI write data of the 1:1, 1:2 and 1:4 modes into the internally fixed 1:2 mode DFI write data; and converting the DFI read data of the internally fixed 1:2 mode into DFI read data of the 1:1, 1:2 and 1:4 modes. Inside the PHY, all transfers to the SDRAM are handled in the fixed 1:2 mode.

The DFI address command module is used for arranging the DFI address and command signals of the different phases into 4-bit data lines, where the arrangement covers DDR5 single-cycle commands and DDR5 double-cycle commands; for distributing the lines, 4 lines per address command sending module, according to the number of address and command pins the SDRAM requires; and for generating a delay unit control signal, a command/data sending clock, a command/data sending initial enable control signal, a command FIFO read initial enable signal and a data transceiver module control clock, thereby controlling the transmission of the address command.

The initialization training calibration module is used for generating an initialization sequence that conforms to the DDR5 SDRAM standard specification and sending it to each address command sending module and each data transceiver module to initialize the SDRAM, so that the SDRAM can be accessed correctly in normal transaction mode; and for performing delay training on each address, command and data path during initialization to obtain the optimal sampling-center delay, adjusting the delay calibration unit of each path, and calibrating the timing skew between the clock signal and the data strobe, so that data can be sampled correctly.

The DFI data read-write module is used, during a write operation, for arranging and distributing the DFI write data and write data masks into several groups according to whether the SDRAM data lines are configured as X8 or X16 and whether several SDRAMs are connected, where one group comprises 8 write data lines and 1 data mask line of 4 bits each; for deriving the SDRAM burst-mode control from the DFI write data enable signal, setting the drive strength value of each data signal, and combining these two signals with each 4-bit data line to form a 12-bit data line for output; for generating one 12-bit write data strobe line from the write data enable signal; and for generating a write data delay information control signal, a write data sending clock and initial enable signal, and a write command FIFO clock and initial enable signal. During a read operation, the module transfers the 8 data lines and 1 data bus inversion line, generates a read data valid signal, arranges the read data and outputs it to the frequency ratio conversion module, which finally sends the DFI read data to the controller; meanwhile, during a read command, it generates a read data sampling gating signal, a read data clock and initial enable signal, and a read command FIFO clock and initial enable signal from the DFI read data enable signal, so that the address command sending module can correctly sample the SDRAM read data.

The address command sending module is used for processing the 4 lines of 4-bit addresses and commands transmitted by the DFI address command module, performing 4-bit parallel-to-serial conversion and IO interface conditioning on each line, and finally outputting the addresses and commands to the address and command pins of the SDRAM; meanwhile, the delay information control signal is processed through the command FIFO module and the delay unit module, which ensures that the address/command timing output to the SDRAM is correct.

The high-speed clock PLL module is used for generating a high-speed clock at 4 times the frequency of the DFI clock, which serves as its reference clock, and outputting it to the address command sending module and the data transceiver module for parallel-to-serial and serial-to-parallel conversion and for generating the SDRAM clock; the high-speed clock is realized by an embedded high-speed PLL, which meets the high-speed requirement of DDR5 SDRAM.

The data transceiver module is used, during data writing, for performing 12-bit parallel-to-serial conversion and IO interface conditioning on each of the 8 write data lines, 1 write data mask line and 1 write data strobe line transmitted by the DFI data read-write module; for processing the write data delay information control signal through the write command FIFO module and the write delay unit module of each line, and controlling the delayed transmission during the write data period, so that the data output to the SDRAM conforms to the standard burst timing of the SDRAM. During a read operation, the module processes the read data sampling gating signals through 2 write command FIFOs and delay units; correctly captures, through the IO interface unit and according to the sampling gating signal, the serial data of the 8 data pins and 1 data bus inversion pin of the SDRAM; and converts the serial data into 4-bit parallel data through 2 odd/even data FIFOs, finally outputting the 8 data lines and 1 data bus inversion line toward the controller port.

The configuration module is used for writing the values of the internal registers through an external low-speed APB configuration interface; the other modules configure the working modes of the whole PHY according to these register values.

Further, the configuration module can form flexible multiple data channels to realize the parallel operation of a plurality of memories, specifically as follows:

One data transceiver module processes 8 data lines. By setting the internal multi-channel register through the configuration module, memories whose total data bit width is a multiple of 8 can be connected, the minimum configuration being X8; for example, when two X16 memories are connected, the device must be configured with 4 data transceiver modules and 4 DFI data read-write modules. Meanwhile, the register values of each module are configured so that the allocated data bits of dfi_wrdata/dfi_rddata are connected to that module, thereby realizing flexible parallel operation of a plurality of SDRAMs.
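The module-count rule above can be sketched as follows. This is a minimal illustration, assuming each data transceiver module (with its paired DFI data read-write module) serves exactly 8 data lines; the function name `plan_data_channels` and the contiguous assignment of data bits are hypothetical and are not taken from the invention.

```python
def plan_data_channels(sdram_widths):
    """Return (module_count, slices) for a list of SDRAM data widths (8 or 16).

    Each slice is (sdram_index, lo_bit, hi_bit): the range of data lines that
    one 8-line data transceiver module would serve. Assumption: data lines are
    assigned contiguously in the order the SDRAMs are listed.
    """
    slices = []
    bit = 0
    for idx, width in enumerate(sdram_widths):
        if width not in (8, 16):
            raise ValueError("only X8 and X16 devices are described in the text")
        for _ in range(width // 8):
            slices.append((idx, bit, bit + 7))
            bit += 8
    return len(slices), slices


# Two X16 memories need 4 module pairs, matching the example in the text.
count, slices = plan_data_channels([16, 16])
print(count, slices)
```

In a real device the slice boundaries would be held in the configuration registers rather than computed; the sketch only shows why the module count scales with the total data width divided by 8.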

Further, the high-throughput and low-delay PHY interface circuit device of the DDR5 SDRAM also comprises a control and command transmission unit, which mainly comprises the DFI address command module and the address command sending module.

The address command sending module consists of an address command parallel-serial conversion module, an address command IO port module, a local calibration delay unit and an address command control FIFO module. The DFI address command module of the control unit supports two modes of operation, corresponding to the single-cycle commands and double-cycle commands in the DDR5 command truth table. A double-cycle command occupies two adjacent SDRAM CK cycles: the first cycle sends the command operation and the second cycle sends the address value. The DFI address command module arranges the 4-bit data as {P1 high order, P0 high order, P1 low order, P0 low order}; the valid command bits P0 low order and P1 low order then occupy the first SDRAM cycle, P0 high order and P1 high order occupy the second SDRAM cycle, and the low-order and high-order data of P0 and P1 may differ.

Following the DFI5.0 protocol, the DFI address commands sent by the controller are handled as follows. For DDR5 double-cycle commands, which comprise activate, write pattern (WRP), mode register write, mode register read, write and read, the low 14 bits of dfi_address_P0/P1 carry the address command of the first cycle and the high 14 bits carry the value of the second cycle, so the values of the first and second cycles can be defined independently. For DDR5 single-cycle commands, which comprise the CA reference voltage command, refresh all, refresh same bank, precharge all, precharge same bank, precharge bank group, self-refresh entry, power saving mode exit, no operation/deselect and the MPC multi-purpose command, the low 14 bits of dfi_address_P0/P1 are set to the command value and the high 14 bits are set to an invalid value, which realizes a single-cycle command: the first cycle is valid and the second cycle is invalid.
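The {P1 high order, P0 high order, P1 low order, P0 low order} arrangement described above can be sketched in a few lines. This is only an illustration under stated assumptions: each phase value is treated as a 28-bit integer (low 14 bits for the first cycle, high 14 bits for the second), bit 0 of each 4-bit word is taken to be the P0 low-order bit, and the function name `pack_ca_words` is hypothetical.

```python
CA_WIDTH = 14  # bits per cycle for the address/command value, as in the text


def pack_ca_words(dfi_address_p0, dfi_address_p1):
    """Build one 4-bit word per address/command pin.

    Word layout (assumed bit positions): bit 3 = P1 high, bit 2 = P0 high,
    bit 1 = P1 low, bit 0 = P0 low.
    """
    mask = (1 << CA_WIDTH) - 1
    p0_lo = dfi_address_p0 & mask
    p0_hi = (dfi_address_p0 >> CA_WIDTH) & mask
    p1_lo = dfi_address_p1 & mask
    p1_hi = (dfi_address_p1 >> CA_WIDTH) & mask
    words = []
    for pin in range(CA_WIDTH):
        word = ((((p1_hi >> pin) & 1) << 3)
                | (((p0_hi >> pin) & 1) << 2)
                | (((p1_lo >> pin) & 1) << 1)
                | ((p0_lo >> pin) & 1))
        words.append(word)
    return words


# Pin 0 set in the first-cycle value of P0 and in the second-cycle value of P1.
words = pack_ca_words(0x1, 0x1 << CA_WIDTH)
print(words[0])  # 9, i.e. 0b1001
```

The order in which the serializer shifts these four bits out is not modeled here; it depends on the parallel-to-serial circuit in the address command sending module.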

Further, the structure of the control and command transmission unit includes:

First, the address command dfi_address_P0/P1 conforming to the DFI5.0 protocol passes through the DFI address command module, where it is classified as single-cycle or double-cycle according to the DDR5 command type and arranged accordingly; the output is distributed, 4 lines per address command sending module, according to the number of DDR5 SDRAM pins, and the 4 lines correspond to 4 SDRAM pins. Each 4-bit word undergoes parallel-to-serial conversion and is read out by the clock after calibration; the address command IO port module then provides delay chain optimization, logic level conversion between the PHY voltage and the SDRAM voltage, and the correct drive strength. The address command control FIFO mainly carries delay information, which is used to control the delay unit and thereby the delay of the address command, ensuring correct timing.

Further, the high-throughput and low-delay PHY interface circuit device of the DDR5 SDRAM further comprises a data transmission unit, which consists of the DFI data read-write module and the data transceiver module.

The DFI data read-write module is used for controlling a write channel and a read channel. The data write operation includes: the DFI write data is first divided into parallel data lines of 4 bits each, and 8 parallel data lines are sent to the DFI data read-write module; for each parallel data line, the module generates a 4-bit data enable burst-mode control signal according to the DFI write data enable signal and sets a drive strength value for each 4-bit data word, and finally combines these 3 groups of data into one 12-bit data line, which is sent to the data transceiver module and converted from parallel to serial output using the internal high-speed clock. The data enable burst-mode control signal realizes burst data operation by controlling the enable signals of the individual data bits at the IO port, and the drive strength value controls the pull-up and pull-down resistor drive of the data at the IO port to improve signal integrity; meanwhile, the local calibration delay unit applies the optimal calibrated delay to the write data output to ensure that the SDRAM timing requirements are met.
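As an illustration of how the three 4-bit groups might be combined into one 12-bit data line, consider the sketch below. The field order (drive strength in the top nibble, burst enable in the middle, data in the bottom) and the function names are assumptions made for the example; the text does not specify the bit layout.

```python
def build_write_line(data4, burst_enable4, drive_strength4):
    """Combine three 4-bit groups into one 12-bit word (assumed field order)."""
    for name, value in (("data4", data4),
                        ("burst_enable4", burst_enable4),
                        ("drive_strength4", drive_strength4)):
        if not 0 <= value <= 0xF:
            raise ValueError(f"{name} must fit in 4 bits")
    return (drive_strength4 << 8) | (burst_enable4 << 4) | data4


def split_write_line(word12):
    """Inverse of build_write_line: recover (data, burst enable, drive strength)."""
    return word12 & 0xF, (word12 >> 4) & 0xF, (word12 >> 8) & 0xF


line = build_write_line(0xA, 0xF, 0x3)
print(hex(line))  # 0x3fa
```

Keeping the enable and drive strength bits alongside each data bit lets the IO port decide, per serialized bit, whether to drive the pin and how strongly, which is how the text describes burst control at the IO port.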

The data read operation includes: on the path from the SDRAM read operation to the DFI read data output, the DFI read data enable signal first generates the SDRAM read data sampling gating signal in the DFI data read-write module; the DQS and DQS_N data strobe signals output by the SDRAM serve as the double-edge sampling clock in the IO port sampling circuit, and the serial data read from the SDRAM is sampled on both edges while the read data sampling gating signal is asserted; the 1-bit serial data is converted into 4-bit parallel data through an odd serial-to-parallel FIFO module and an even serial-to-parallel FIFO module, sent to the frequency ratio conversion module, and finally sent to the controller through the DFI (DDR PHY Interface) protocol; meanwhile, the data de-skew and local calibration delay units ensure that the read data is sampled correctly.
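The odd/even serial-to-parallel step can be modeled in software as below. This is a behavioral sketch only: it assumes that even-indexed bits are captured on one DQS edge and odd-indexed bits on the other, and that the first serial bit becomes bit 0 of the 4-bit word. The function name `deserialize_read` is hypothetical.

```python
from collections import deque


def deserialize_read(serial_bits):
    """Convert a serial bit stream into 4-bit words via even/odd FIFOs."""
    even_fifo, odd_fifo = deque(), deque()
    for i, bit in enumerate(serial_bits):
        # Alternate edges: even-indexed bits go to one FIFO, odd to the other.
        (even_fifo if i % 2 == 0 else odd_fifo).append(bit & 1)
    words = []
    # Each 4-bit word needs two entries from each FIFO.
    while len(even_fifo) >= 2 and len(odd_fifo) >= 2:
        e0, o0 = even_fifo.popleft(), odd_fifo.popleft()
        e1, o1 = even_fifo.popleft(), odd_fifo.popleft()
        words.append(e0 | (o0 << 1) | (e1 << 2) | (o1 << 3))
    return words


words = deserialize_read([1, 0, 1, 1, 0, 0, 1, 0])
print(words)  # [13, 4]
```

Splitting the stream across two FIFOs halves the rate each FIFO must sustain, which is the usual motivation for an odd/even arrangement in a double-edge sampling path.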

Further, the hardware structure of the data transmission unit includes:

Signals conforming to the DFI5.0 protocol first pass through the DFI data read-write module. The write channel module processes write data such as dfi_wrdata_P0/P1, distributes it into 8 data lines, each 4 bits wide, and generates the corresponding burst and clock control signals according to dfi_wrdata_en/cs. The read channel module mainly generates the read data sampling gating signal, which ensures that data is captured at the SDRAM IO interface during a read; it also transfers the dfi_rddata_P0/P1 read data and generates the corresponding handshake signals, including dfi_rddata_valid. The data transceiver module that follows sends the write data to the SDRAM pins after parallel-to-serial conversion and delay calibration; the read data is correctly sampled by the sampler, converted into 4-bit read data through data de-skew and the odd/even serial-to-parallel FIFOs, passed back to the preceding DFI data read-write module, and finally output to the controller.

Further, the data transmission unit supports the various operations of DDR5, and the PHY supports frequency ratio conversion and variable-frequency operation, including:

First, when the controller end uses 1:2 and the internal fixed mode is 1:2, no conversion is required and data is passed through directly. When the controller end uses 1:1 and the internal fixed mode is 1:2, the DFI clock frequency equals the SDRAM frequency and each cycle carries one data item of the P0 phase; to convert to the internal fixed 1:2 mode, the controller data is collected over two clock cycles, which correspond to one clock of the 1:2 mode, and the two data items of the two 1:1 cycles are placed in the P0 and P1 phases of the 1:2 mode respectively. When the controller end uses 1:4 and the internal fixed mode is 1:2, the ratio of the DFI clock frequency to the SDRAM frequency is 1:4 and each cycle carries data of the P0, P1, P2 and P3 phases; to convert to the internal fixed 1:2 mode, one cycle of controller data corresponds to 2 clocks of the 1:2 mode, and the 4 phases of the 1:4 mode are placed in the P0 and P1 phases of the first cycle and the P0 and P1 phases of the second cycle of the 1:2 mode respectively.

When converting reads and writes between the external 1:1 and internal 1:2 modes, 4 registers and a control counter are needed. Under control of the counter, the external dfi_wrdata_P0 is registered to the internal P0 channel in the first DfiCtrClk cycle and to the internal P1 channel in the second cycle; the following two registers then register the temporarily stored data together for one more beat to produce dfi_wrdata_internal_P0/P1. The read channel is relatively simple: under control of the counter, port registers drive the internal P0 and P1 onto the external P0 in two adjacent cycles.

When the external frequency ratio is 1:4 and the internal frequency ratio is 1:2, the write channel selects dfi_wrdata_P0/P2 as the internal P0 and dfi_wrdata_P1/P3 as the internal P1. Conversion of the read data channel requires 8 registers: in the first internal DfiClk cycle the internal P0 is registered to the external P0 and the internal P1 to the external P1; in the next beat the internal P0 is registered to the external P2 and the internal P1 to the external P3; finally, under control of the counter, the external P0, P1, P2 and P3 are registered and output together in one DfiCtrClk cycle.
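The phase mapping described above can be captured in a small behavioral model. The sketch below ignores the registers, counter and clocking and only shows which external phase lands in which internal phase; the function names are hypothetical, and rejecting an odd number of 1:1 cycles is a simplification of this example rather than something the text specifies.

```python
def write_1to1_to_1to2(ext_p0_per_cycle):
    """Pair two consecutive 1:1 cycles into one internal 1:2 cycle (P0, P1)."""
    if len(ext_p0_per_cycle) % 2:
        raise ValueError("sketch expects an even number of 1:1 cycles")
    return [(ext_p0_per_cycle[i], ext_p0_per_cycle[i + 1])
            for i in range(0, len(ext_p0_per_cycle), 2)]


def write_1to4_to_1to2(ext_cycles):
    """Split each 1:4 cycle (P0, P1, P2, P3) into two internal 1:2 cycles."""
    internal = []
    for p0, p1, p2, p3 in ext_cycles:
        internal.append((p0, p1))  # first internal cycle
        internal.append((p2, p3))  # second internal cycle
    return internal


def read_1to2_to_1to1(int_cycles):
    """Drive internal P0 then P1 onto the external P0 in adjacent cycles."""
    return [phase for pair in int_cycles for phase in pair]


def read_1to2_to_1to4(int_cycles):
    """Merge two internal 1:2 cycles into one external 1:4 cycle."""
    if len(int_cycles) % 2:
        raise ValueError("sketch expects an even number of internal cycles")
    return [int_cycles[i] + int_cycles[i + 1]
            for i in range(0, len(int_cycles), 2)]


internal = write_1to4_to_1to2([("a", "b", "c", "d")])
print(internal)  # [('a', 'b'), ('c', 'd')]
```

Because every controller-side ratio is first reduced to the same (P0, P1) stream, the rest of the PHY only ever has to implement the 1:2 data path.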

Inside the PHY, 4 clocks are used: the controller input DFI clock DFICtrlClk, the internal fixed 1:2 mode DFI clock DfiClk, the PLL high-speed clock pclk, and the SDRAM clock CK. Their relationship is as follows: the internal fixed 1:2 mode DFI clock runs at 1/4 the frequency of the PLL high-speed clock, the SDRAM clock runs at 1/2 the frequency of the PLL high-speed clock, and the controller input DFI clock is determined by the frequency ratio. The internal high-speed clock is generated by the high-speed clock PLL module, which takes the internal fixed 1:2 mode DFI clock as its reference and multiplies it by 4; it is used by the address command sending module and the data transceiver module for parallel-to-serial and serial-to-parallel conversion and for generating the SDRAM clock.
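The clock relationships above reduce to simple arithmetic, sketched below. The function name `phy_clocks` and the 4800 MHz example value are illustrative assumptions; the ratios themselves (DfiClk = pclk/4, CK = pclk/2, controller clock = CK divided by the frequency ratio) follow the text.

```python
from fractions import Fraction


def phy_clocks(pclk_mhz, controller_ratio):
    """Derive the four PHY clock frequencies in MHz from pclk.

    controller_ratio is 1, 2 or 4 for the 1:1, 1:2 and 1:4 DFI modes.
    """
    if controller_ratio not in (1, 2, 4):
        raise ValueError("controller_ratio must be 1, 2 or 4")
    pclk = Fraction(pclk_mhz)
    ck = pclk / 2        # SDRAM clock is 1/2 of the PLL high-speed clock
    dfi_clk = pclk / 4   # internal fixed 1:2 mode DFI clock is 1/4 of pclk
    dfi_ctrl_clk = ck / controller_ratio  # controller DFI clock vs. SDRAM clock
    return {"pclk": pclk, "CK": ck, "DfiClk": dfi_clk, "DFICtrlClk": dfi_ctrl_clk}


clocks = phy_clocks(4800, 4)
print({name: int(freq) for name, freq in clocks.items()})
```

Note that in the 1:2 controller mode DFICtrlClk and DfiClk come out equal, which is consistent with the statement that this mode needs no conversion.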

Further, the high-throughput, low-latency PHY interface circuit device of the DDR5 SDRAM further includes an initialization and training unit, the initialization and training unit including:

The DDR5 initialization unit, which is used for generating an initialization sequence that conforms to the DDR5 SDRAM standard protocol JESD79-5, ensuring that the memory can operate in normal transaction mode.

The training unit, which is used for training and calibrating the system and comprises a 1D training mode and a 2D training mode. The 1D training performs delay optimization at one voltage provided by the DFI: the training firmware adjusts the local calibration delay units of each clock, command, address and data path to compensate for delays such as board-level and DRAM delays. The 2D training, performed after the 1D training, runs complete read and write tests for each pair of voltage and delay, computes the region in which the operations pass to form a 2D eye, and analyzes its margin to select the optimum voltage and delay point.
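The idea of choosing a delay at the center of the passing window, and of picking a voltage and delay point from a 2D eye, can be illustrated as follows. This is a deliberately simplified sketch: the pass/fail results are assumed to be already collected, the 2D selection only compares window widths per voltage row instead of analyzing full two-dimensional margin, and all function names are hypothetical.

```python
def longest_pass_window(results):
    """Return (start, length) of the longest run of passing settings."""
    best_start, best_len, start = 0, 0, None
    for i, ok in enumerate(list(results) + [False]):  # sentinel closes last run
        if ok and start is None:
            start = i
        elif not ok and start is not None:
            if i - start > best_len:
                best_start, best_len = start, i - start
            start = None
    return best_start, best_len


def train_1d(results):
    """Pick the delay index at the center of the widest passing window."""
    start, length = longest_pass_window(results)
    return None if length == 0 else start + (length - 1) // 2


def train_2d(eye):
    """Pick (voltage_index, delay_index) from eye[voltage][delay] pass flags."""
    best = None
    for v, row in enumerate(eye):
        start, length = longest_pass_window(row)
        if length and (best is None or length > best[2]):
            best = (v, start + (length - 1) // 2, length)
    return None if best is None else best[:2]


eye = [
    [False, True, True, False, False],
    [True, True, True, True, True],
    [False, False, True, False, False],
]
print(train_1d(eye[0]), train_2d(eye))
```

Centering the delay in the passing window maximizes the margin on both sides, which is why the text refers to the optimal sampling-center delay.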

The initialization of the whole PHY combines SDRAM initialization and training calibration into one flow, which is executed by the PHY initialization firmware. The hardware architecture of the DDR5 PHY training calibration comprises:

First, a dedicated training state machine block controls the whole training process, which comprises two parts, training and core initialization, and exchanges data with the control register block to control the other modules. Training data is sent to the DFI address command module and the DFI data read-write module through a multiplexer, which switches between the training core and the normal access function. During write training, random write data is produced by the training write data generator and is also passed to the read data comparison module for use during read training; the command module generates the corresponding combinations of read and write command operations to reflect real operation and adjust the optimal delay.

Another object of the present invention is to provide a control method that uses the high-throughput, low-delay PHY interface circuit device of the DDR5 SDRAM, the control method comprising the following steps:

Step one, after the device is started, the initialization training calibration module initializes the DDR5 and sets the delay of each path through training, so as to reach an optimal working state;

Step two, during normal operation, the control, address and data signals of the DFI5.0 interface of the on-chip controller first pass through the frequency ratio conversion module, which handles the conversion between systems with different frequency ratios; the address and control signals are processed by the DFI address command module and the address command sending module to form control and address timing conforming to DDR5;

Step three, data is transferred through the DFI data read-write module and the data transceiver module, where it is divided into a read channel and a write channel and processed to perform efficient read and write operations on the DDR5 memory chips, thereby realizing high-speed, high-throughput and low-delay transmission between the on-chip DDR controller and the SDRAM.

It is a further object of the invention to provide a computer device comprising a memory and a processor, the memory storing a computer program which, when executed by the processor, causes the processor to perform the steps of:

after the device is started, the initialization training calibration module initializes the DDR5 and sets the delay of each path through training, so as to reach an optimal working state; during normal operation, the control, address and data signals of the DFI5.0 interface of the on-chip controller first pass through the frequency ratio conversion module, which handles the conversion between systems with different frequency ratios; the address and control signals are processed by the DFI address command module and the address command sending module to form control and address timing conforming to DDR5; data is transferred through the DFI data read-write module and the data transceiver module, where it is divided into a read channel and a write channel and processed to perform efficient read and write operations on the DDR5 memory chips, thereby realizing high-speed, high-throughput and low-delay transmission between the on-chip DDR controller and the SDRAM.

Another object of the present invention is to provide an information data processing terminal for realizing a high-throughput, low-delay PHY interface circuit device of the DDR5 SDRAM.

Taking all the above technical schemes together, the invention has the following advantages and positive effects: to realize high-speed, high-throughput and low-delay transmission between an on-chip DDR controller and the SDRAM in the server field, the invention provides a high-speed PHY interface circuit device that connects DDR5 memory chips to a controller and efficiently supports the multiple working modes of DDR5 SDRAM.

The invention provides a concrete implementation of a DDR5 PHY device through specific hardware circuit techniques, and realizes a high-throughput, low-delay connection between a standard DFI5.0 controller and DDR5 memory chips. To support the various command operations of the DDR5 standard protocol, the invention designs a single-cycle command mode and a double-cycle command mode in the control unit, which can switch the transmission access automatically according to the DFI5.0 protocol; the address transmission unit performs 4-bit parallel-to-serial conversion on the address commands, so that each period of the internal high-speed clock pclk controls half a period of the SDRAM clock, which allows the address command values of two adjacent cycles to differ.

The data transmission unit realizes multiple data read-write working modes: during transmission it controls burst operation at the IO port using the data enable signal, and supports BC8, BL16, BL32 and continuous burst operations for accessing the DDR5 memory chips. The data transmission unit generates the read data sampling gating signal according to the DFI read signal, uses it to sample the DDR5 read data correctly, and optimizes the read data timing through data de-skew. The frequency ratio conversion module supports multiple working frequencies: for systems of different frequencies, registers, multiplexers and a control counter realize efficient conversion between the 1:1, 1:4 and 1:2 frequency ratio modes, ensuring correct timing in each mode.

To support the high data rate required by the SDRAM, the invention generates a higher-frequency internal clock with the internal 4x frequency-multiplying high-speed PLL module, which provides the frequency needed for rising-edge operation of address commands and double-edge operation of data, and generates the SDRAM clock. To meet the high bandwidth requirement, the register module is conveniently configured through the APB interface, the numbers of address command sending modules and data transceiver modules are controlled according to the bit width requirement, and connecting a plurality of SDRAMs to realize multiple PHYs is supported. In the data IO processing unit, the signal characteristic optimizations at the IO interface comprise delay chain optimization, cross-voltage-domain logic conversion between VDD and VDDQ, adjustment of the pull-up and pull-down resistors, and correct control of the drive strength, which improve the signal integrity of the high-speed PHY. The training unit of the invention includes 1D training and 2D training and adjusts the delay unit parameters of each path during initialization, so that the PHY achieves optimal access.

The invention provides a high-performance DDR5 PHY interface with high throughput and low delay for server applications, which supports the various command operations and frequency ratio conversions of DDR5. The conversion from DFI5.0 addresses/commands to address/command transmission conforming to the DDR5 SDRAM standard protocol is realized through the DFI address command module and a plurality of address command sending modules; transfer between DFI data read-write operations and DDR5 SDRAM data read-write operations is realized through the DFI data read-write module and the data transceiver modules; the frequency ratio conversion module converts between the controller-end DFI frequency ratios of 1:1, 1:4 and 1:2 and the internal fixed 1:2 mode; the high-speed clock PLL module generates an internal high-speed clock at 4 times the frequency of the DFI clock, and the relationship among the internal clocks of the PHY is managed; the initialization training calibration module generates an initialization sequence conforming to the DDR4/DDR5 SDRAM standard specifications and obtains the optimal delay for each data, command and address path through training.

Drawings

In order to illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings used in the embodiments are briefly described below. The drawings described below show only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without creative effort.

Fig. 1 is a flowchart of a control method for a high-throughput, low-latency PHY interface circuit device for DDR5 SDRAM according to an embodiment of the present invention.

Fig. 2 is a block diagram of a high-throughput, low-latency PHY interface circuit device for DDR5 SDRAM according to an embodiment of the present invention.

In the figure: 1. frequency ratio conversion module; 2. DFI address command module; 3. initialization training calibration module; 4. DFI data read-write module; 5. address command sending module; 6. high-speed clock PLL module; 7. data transceiver module; 8. configuration module.

Fig. 3 is a block diagram of the top-level structure of a DDR5 PHY according to an embodiment of the present invention.

Fig. 4 is a block diagram of a control and command transmission unit according to an embodiment of the present invention.

Fig. 5 is a block diagram of a data transmission unit according to an embodiment of the present invention.

Fig. 6 is a schematic diagram of the conversion of the frequency ratio 1:1 provided by the embodiment of the present invention.

Fig. 7 is a schematic diagram of the conversion of the frequency ratio 1:4 provided by the embodiment of the present invention.

Fig. 8 is a schematic diagram of an initialization and training unit architecture according to an embodiment of the present invention.

Fig. 9 is a block diagram of a PHY subsystem architecture connected to an X8 8G DDR5 SDRAM according to an embodiment of the present invention.

Fig. 10 is a flow chart of an address command channel according to an embodiment of the invention.

Fig. 11 is a flowchart of an initialization training calibration according to an embodiment of the present invention.

Fig. 12 is a flow chart of data write channel transmission according to an embodiment of the present invention.

Fig. 13 is a flow chart of data read channel transmission according to an embodiment of the present invention.

Fig. 14 is an implementation diagram of a DDR5 write operation double cycle command provided by an embodiment of the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail with reference to the following embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.

In view of the problems of the prior art, the present invention provides a high-throughput, low-latency PHY interface circuit device for DDR5 SDRAM, which is described in detail below with reference to the accompanying drawings.

As shown in Fig. 1, the method for controlling the high-throughput and low-latency PHY interface circuit device of DDR5 SDRAM according to the embodiment of the present invention includes the following steps:

S101, after the device is started, the initialization training calibration module initializes the DDR5 and sets the delay of each path through training, so as to reach an optimal working state;

S102, during normal operation, the control, address and data signals of the DFI5.0 interface of the on-chip controller first pass through the frequency ratio conversion module, which handles the conversion between systems with different frequency ratios; the address and control signals are processed by the DFI address command module and the address command sending module to form control and address timing conforming to DDR5;

S103, data is transferred through the DFI data read-write module and the data transceiver module, where it is divided into a read channel and a write channel and processed to perform efficient read and write operations on the DDR5 memory chips, thereby realizing high-speed, high-throughput and low-delay transmission between the on-chip DDR controller and the SDRAM.

As shown in Fig. 2, the high-throughput and low-latency PHY interface circuit device for DDR5 SDRAM according to an embodiment of the present invention includes a frequency ratio conversion module 1, a DFI address command module 2, an initialization training calibration module 3, a DFI data read-write module 4, an address command sending module 5, a high-speed clock PLL module 6, a data transceiver module 7, and a configuration module 8.

The technical solution of the present invention will be further described with reference to the following examples.

1. In order to realize high-speed, high-throughput and low-delay efficient transmission between a DDR controller and an SDRAM (synchronous dynamic random access memory) on a chip in the field of servers, the invention provides a high-speed PHY (physical layer) interface circuit device capable of connecting DDR5 memory particles and a controller, and the high-speed PHY interface circuit device can efficiently support multiple working modes of a DDR5 SDRAM.

The high-throughput, low-delay PHY interface circuit device for DDR5 SDRAM comprises a frequency ratio conversion module 1, a DFI address command module 2, an initialization training calibration module 3, a DFI data read-write module 4, an address command sending module 5, a high-speed clock PLL module 6, a data transceiver module 7 and a configuration module 8. Fig. 3 is a block diagram of the DDR5 PHY interface circuit implemented in accordance with the present invention. After the device is started, the initialization training calibration module initializes the DDR5 SDRAM and sets the delay of each path through training so as to reach an optimal working state. During normal operation, the DFI 5.0 signals of the on-chip controller first pass through the frequency ratio conversion module, which converts control, address and data signals between systems with different frequency ratios. The address and control signals are processed by the DFI address command module and the address command sending module to form a control and address timing sequence conforming to DDR5. Data is transmitted through the DFI data read-write module and the data transceiver module, where it is divided into a read channel and a write channel and processed to perform efficient read and write operations on the DDR5 memory particles, so that high-speed, high-throughput and low-delay transmission between the on-chip DDR controller and the SDRAM is realized.

The frequency ratio conversion module 1, the DFI address command module 2 and the address command sending module 5 are electrically connected in sequence; the frequency ratio conversion module 1, the DFI data read-write module 4 and the data transceiver module 7 are electrically connected in sequence; the initialization training calibration module 3 is electrically connected to the DFI address command module 2 and the DFI data read-write module 4, respectively; the high-speed clock PLL module 6 is electrically connected to the address command sending module 5 and the data transceiver module 7, respectively; the configuration module 8 is electrically connected to each of the modules 1 to 7.

The frequency ratio conversion module 1 handles conversion between the different frequency ratio modes of the DFI addresses/commands and data at the controller end. It converts DFI addresses and commands in 1:1, 1:2 or 1:4 mode into internally fixed 1:2 mode DFI addresses and commands; converts DFI write data in 1:1, 1:2 or 1:4 mode into internally fixed 1:2 mode DFI write data; and converts internally fixed 1:2 mode DFI read data back into 1:1, 1:2 or 1:4 mode DFI read data. As a result, the PHY internally handles only the transfer between the fixed 1:2 mode and the SDRAM.

The DFI address command module 2 organizes the DFI address and command signals, which may have different numbers of phases, into 4-bit data lines; the arrangement modes include the DDR5 single-cycle command and the DDR5 double-cycle command. According to the number of address and command pins of the SDRAM to be driven, every 4 lines are assigned to 1 address command sending module. The module also generates the delay unit control signal, the command/data sending clock, the command/data sending initial enable control signal, the command FIFO read initial enable signal and the data transceiver module control clock, so as to control the transmission of address commands.

The initialization training calibration module 3 generates an initialization sequence conforming to the DDR5 SDRAM standard specification and sends it to each address command sending module and each data transceiver module to initialize the SDRAM, so that the SDRAM can be accessed correctly in the normal transaction mode. During initialization it also performs delay training on each address, command and data path to obtain the optimal sampling-center delay, and adjusts the delay calibration unit of each path to calibrate the timing skew between the clock signal and the data strobe, so that data can be sampled correctly.
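The search for an optimal sampling-center delay can be illustrated with a short sketch. It assumes, purely for illustration, that training sweeps the delay taps of one path, records a pass/fail flag per tap, and then picks the middle of the widest passing window; the function name `best_delay_tap` and the sweep data are hypothetical and are not taken from the patent.

```python
def best_delay_tap(pass_flags):
    """Return the centre tap of the widest run of passing taps, or None."""
    best_start, best_len = None, 0
    start = None
    # A trailing False acts as a sentinel that closes a run ending at the last tap.
    for tap, ok in enumerate(list(pass_flags) + [False]):
        if ok and start is None:
            start = tap                      # a passing window opens here
        elif not ok and start is not None:
            if tap - start > best_len:       # keep the widest window seen so far
                best_start, best_len = start, tap - start
            start = None
    if best_start is None:
        return None                          # no tap passed at all
    return best_start + (best_len - 1) // 2  # centre of the window (rounded down)


# Hypothetical sweep result for one path: taps 2..6 pass, tap 8 passes in isolation.
sweep = [False, False, True, True, True, True, True, False, True, False]
print(best_delay_tap(sweep))  # prints 4
```

Choosing the window centre rather than the first passing tap leaves equal margin on both sides of the sampling point.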

The DFI data read-write module 4 works as follows. During a write operation, it arranges and distributes the DFI write data and write data masks into several groups, according to whether the SDRAM data lines are configured as X8 or X16 and whether multiple SDRAMs are connected; one group contains 8 write data lines and 1 data mask line, each 4 bits wide. It also arranges the DFI write data enable signal into the different SDRAM burst modes, sets the drive strength value of each data signal, and appends these two signals to each 4-bit data line, expanding it into a 12-bit data line for output. It generates one 12-bit write data strobe line from the write data enable signal, and generates the write data delay information control signal, the write data transmission clock and initial enable signal, and the write command FIFO clock and initial enable signal. During a read operation, it transfers 8 data lines and 1 bus data inversion line, generates the read data valid signals, arranges the read data and outputs it to the frequency ratio conversion module, which finally sends the DFI read data to the controller. During the read command period it also generates, from the DFI read data enable signal, the read data sampling gating signal, the read data clock and initial enable signal, and the read command FIFO clock and initial enable signal, so that the data transceiver module can correctly sample the SDRAM read data.

The address command sending module 5 processes the 4 lines of 4-bit addresses and commands transmitted by the DFI address command module: it performs 4-bit parallel-to-serial conversion on each line, applies the IO interface characteristics, and finally outputs the addresses and commands to the address and command pins of the SDRAM. The delay information control signal is processed by the command FIFO module and the delay unit module, which ensures that the address/command timing output to the SDRAM is correct.

The high-speed clock PLL module 6 uses the DFI clock as a reference clock to generate a high-speed clock at 4 times its frequency, and outputs this clock to the address command sending module and the data transceiver module, where it is used for parallel-serial and serial-parallel conversion and for generating the SDRAM clock. The high-speed clock is produced by an embedded high-speed PLL, which meets the high-speed requirement of DDR5 SDRAM.

The data transceiver module 7 works as follows. During a data write, it performs 12-bit parallel-to-serial conversion and applies the IO interface characteristics on each of the 8 write data lines, 1 write data mask line and 1 write data strobe line transmitted by the DFI data read-write module. The write command FIFO module and write delay unit module of each line process the write data delay information control signal and control the transmission delay during the write data period, so that the data output to the SDRAM conforms to the SDRAM burst timing standard. During a read operation, the read data sampling gating signal is processed by 2 write command FIFOs and delay units; the IO interface unit correctly captures the serial data on the 8 data pins and 1 bus data inversion pin of the SDRAM according to the sampling gating signal; the serial data is converted into 4-bit parallel data by 2 data FIFOs (odd and even); and finally 8 data lines and 1 bus data inversion line are output towards the controller port.

The configuration module 8 allows the values of its internal registers to be written through an external low-speed APB configuration interface; the register values are then used by the other modules to configure the various operating modes of the entire PHY.

1.1 Control and command transfer unit

The control and command transmission unit mainly comprises the DFI address command module and the address command sending module, where the address command sending module consists of an address command parallel-serial conversion module, an address command IO port module, a local calibration delay unit and an address command control FIFO module. To support the various operation modes of DDR5, the DFI address command module of the control unit provides two operation modes, corresponding to the single-cycle commands and the double-cycle commands in the DDR5 command truth table. A double-cycle command occupies two adjacent SDRAM CK cycles: the first cycle sends the command operation and the second cycle sends the address value. To support DDR5 double-cycle commands, the DFI address command module arranges the 4-bit data as {P1 high bits, P0 high bits, P1 low bits, P0 low bits}. The valid command bits, P0 low and P1 low, then occupy the first SDRAM cycle, and P0 high and P1 high occupy the second SDRAM cycle; the low bits and the high bits of P0 and P1 are different.
Following the DFI 5.0 protocol, the DFI address commands sent by the controller end include, as DDR5 double-cycle commands, the activate operation, write pattern (WRP), mode register write, mode register read, write and read operations. For a double-cycle command, the lower 14 bits of the DFI address P0/P1 must hold the address command of the first cycle and the upper 14 bits must hold the value of the second cycle, so that the values of both cycles are well defined. The DDR5 single-cycle commands include the CA reference voltage command, refresh all, refresh same bank, precharge all, precharge same bank, precharge bank group, self-refresh entry, power-saving mode exit, no operation/deselect and the MPC multi-purpose command. For a single-cycle command, only the lower 14 bits of the DFI address P0/P1 need to hold the command value, and the upper 14 bits are set to an invalid value. For example, if a signal is active low, its invalid state is high level, so for a single-cycle command dfi_address_P0/P1 holds low level (valid) in the lower 14 bits and high level (invalid) in the upper 14 bits. The first cycle is thus valid and the second cycle invalid, which is how the single-cycle command is implemented.
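The split of one 28-bit DFI address phase word into a first-cycle half and a second-cycle half can be sketched as follows. This is a minimal illustration, not the patented circuit: the helper names are hypothetical, and the all-ones "invalid" pattern assumes active-low signalling as in the example given in the text.

```python
CA_BITS = 14                       # width of one command/address cycle
CA_MASK = (1 << CA_BITS) - 1
INVALID_HALF = CA_MASK             # assumption: active-low signals, so all-ones means "invalid"


def pack_two_cycle(first_cycle, second_cycle):
    """Lower 14 bits carry the first cycle, upper 14 bits carry the second cycle."""
    return ((second_cycle & CA_MASK) << CA_BITS) | (first_cycle & CA_MASK)


def pack_single_cycle(command):
    """Only the lower 14 bits are meaningful; the upper half is held at the invalid level."""
    return (INVALID_HALF << CA_BITS) | (command & CA_MASK)


def unpack(word):
    """Return (first_cycle, second_cycle) from a 28-bit phase word."""
    return word & CA_MASK, (word >> CA_BITS) & CA_MASK


word = pack_two_cycle(0x0123, 0x2ABC)
print(unpack(word) == (0x0123, 0x2ABC))               # prints True
print(hex(unpack(pack_single_cycle(0x001F))[1]))      # prints 0x3fff
```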

Fig. 4 shows the structure of the control and command unit. First, the address commands dfi_address_P0/P1 following the DFI 5.0 protocol pass through the DFI address command module, where they are classified as single-cycle or double-cycle according to the DDR5 command mode and arranged accordingly. According to the number of DDR5 SDRAM pins, every 4 lines are assigned to one address command sending module, and the 4 lines correspond to 4 SDRAM pins. The 4-bit data of each line is then converted from parallel to serial using the calibrated read clock, after which the address command IO port module performs delay-chain optimization, logic-level conversion between the PHY voltage and the SDRAM voltage, and drive-strength adjustment. The address command control FIFO mainly carries delay information, which controls the delay unit and thereby the delay of the address command, ensuring correct timing.
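The double-cycle arrangement and the grouping of lines into sending modules can be sketched in a few lines. The sketch assumes that the braces denote a most-significant-first concatenation, that each CA pin gets one 4-bit line, and that only the 14 CA pins are considered; the function names are hypothetical.

```python
CA_BITS = 14  # CA[13:0]


def bit(word, i):
    """Extract bit i of an integer."""
    return (word >> i) & 1


def two_cycle_lines(p0, p1):
    """Build one 4-bit line per CA pin as {P1 high, P0 high, P1 low, P0 low}.

    p0 and p1 are 28-bit dfi_address phase words: the low 14 bits hold the
    first-cycle value and the high 14 bits hold the second-cycle value.
    """
    lines = []
    for pin in range(CA_BITS):
        p0_lo, p1_lo = bit(p0, pin), bit(p1, pin)
        p0_hi, p1_hi = bit(p0, pin + CA_BITS), bit(p1, pin + CA_BITS)
        lines.append((p1_hi << 3) | (p0_hi << 2) | (p1_lo << 1) | p0_lo)
    return lines


def group_by_module(lines, lines_per_module=4):
    """Assign every 4 lines to one address command sending module."""
    return [lines[i:i + lines_per_module] for i in range(0, len(lines), lines_per_module)]


p0 = (1 << 14) | 1      # pin 0 set in both halves of phase P0
p1 = 1 << 1             # pin 1 set in the low half of phase P1
lines = two_cycle_lines(p0, p1)
print(lines[0], lines[1])                       # prints 5 2
print([len(g) for g in group_by_module(lines)])  # prints [4, 4, 4, 2]
```

A single-cycle command fits the same arrangement when the high halves are held at the invalid level.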

1.2 Data transfer unit

The data transmission unit consists of the DFI data read-write module and the data transceiver module; the DFI data read-write module controls the write channel and the read channel. A data write proceeds as follows. The DFI write data is first grouped into parallel data lines of 4 bits each, and 8 such parallel data lines are sent to the DFI data read-write module. Inside the module, a 4-bit data-enable burst mode control signal is generated for each parallel data line from the DFI write data enable signal, a drive strength value is set for each 4-bit data item, and these 3 groups of 4 bits are combined into one 12-bit data line. The line is sent to the data transceiver module, which uses the internal high-speed clock to convert it from parallel to serial output. The data-enable burst mode control signal produces burst data operation by controlling the enable of the individual data bits at the IO port, and the drive strength value controls the pull-up and pull-down resistor drive at the IO port to improve signal integrity. The local calibration delay unit delays the write data output by the optimal calibrated amount, so that the SDRAM timing requirements are met.
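How three 4-bit groups make up one 12-bit write line can be shown with a small sketch. The field order used here ({drive, enable, data}, data in the low bits) is an assumption made for illustration; the text only states that the three groups are combined. The function names are hypothetical.

```python
def pack_write_line(data4, enable4, drive4):
    """Combine 4-bit data, 4-bit burst enable and 4-bit drive value into 12 bits."""
    for value in (data4, enable4, drive4):
        if not 0 <= value <= 0xF:
            raise ValueError("each field must fit in 4 bits")
    # Assumed layout: bits [3:0] data, bits [7:4] enable, bits [11:8] drive value.
    return (drive4 << 8) | (enable4 << 4) | data4


def split_write_line(line12):
    """Recover (data, enable, drive) from a 12-bit line."""
    return line12 & 0xF, (line12 >> 4) & 0xF, (line12 >> 8) & 0xF


line = pack_write_line(0xA, 0xF, 0x3)
print(hex(line))                 # prints 0x3fa
print(split_write_line(line))    # prints (10, 15, 3)
```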

A data read, from the SDRAM read operation to the DFI read data output, proceeds as follows. First, the DFI data read-write module generates the SDRAM read data sampling gating signal from the DFI read data enable signal. The DQS and DQS_N data strobe signals output by the SDRAM serve as the double-edge sampling clock in the IO port sampling circuit, and the serial data from the SDRAM is sampled on both edges while the read data sampling gating signal is active. The 1-bit serial data is converted into 4-bit parallel data by an odd serial-to-parallel FIFO module and an even serial-to-parallel FIFO module, sent to the frequency ratio conversion module, and finally delivered to the controller end through the DFI (DDR PHY Interface) protocol. The data deskew unit and the local calibration delay unit ensure that the read data is sampled correctly.
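The odd/even serial-to-parallel step can be modelled in software as follows. The sketch assumes that samples taken on one strobe edge go to the even FIFO and samples on the other edge go to the odd FIFO, and that the first received beat lands in bit 0 of the 4-bit word; both choices are illustrative assumptions, as is the function name.

```python
def deserialize_odd_even(serial_bits):
    """Convert a double-edge-sampled bit stream into 4-bit parallel words."""
    if len(serial_bits) % 4:
        raise ValueError("expected a multiple of 4 beats")
    even_fifo = serial_bits[0::2]   # beats 0, 2, 4, ... (assumed: one strobe edge)
    odd_fifo = serial_bits[1::2]    # beats 1, 3, 5, ... (assumed: the other strobe edge)
    words = []
    for i in range(0, len(even_fifo), 2):
        # Re-interleave two entries from each FIFO in arrival order.
        beats = [even_fifo[i], odd_fifo[i], even_fifo[i + 1], odd_fifo[i + 1]]
        words.append(sum(b << k for k, b in enumerate(beats)))  # first beat in bit 0
    return words


print(deserialize_odd_even([1, 0, 1, 1, 0, 0, 0, 1]))  # prints [13, 8]
```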

Fig. 5 shows the hardware structure of the data transmission unit. Data signals conforming to the DFI 5.0 protocol first pass through the DFI data read-write module. The write channel module processes write data such as dfi_wrdata_P0/P1, distributes it over 8 data lines, each 4 bits wide, and generates the corresponding burst and clock control signals from dfi_wrdata_en/cs. The read channel module mainly generates the read data sampling gating signal, which ensures that data is captured only when read data is present at the SDRAM IO interface; it also transfers the dfi_rddata_P0/P1 read data and generates the corresponding handshake signals, including dfi_rddata_valid. In the data transceiver module that follows, write data is sent to the SDRAM pins after parallel-to-serial conversion and delay calibration; read data is captured by the sampler, converted into 4-bit read data through data deskew and the odd/even serial-to-parallel FIFOs, returned to the DFI data read-write module, and finally output to the controller end.

To support the various operations of DDR5, the PHY must support frequency ratio conversion and frequency change. When the controller end uses 1:2 and the internal fixed ratio is 1:2, no conversion is needed and the signals pass straight through. When the controller end uses 1:1, the DFI clock frequency equals the SDRAM frequency and each cycle carries one data item in phase P0. To convert to the internal fixed 1:2 mode, the controller-end data is collected over two clock cycles, which correspond to exactly one clock of the 1:2 mode, and the two data items of the two 1:1 cycles are placed in phases P0 and P1 of the 1:2 mode, respectively. When the controller end uses 1:4, the ratio of the DFI clock frequency to the SDRAM frequency is 1:4 and each cycle carries data in phases P0, P1, P2 and P3. To convert to the internal fixed 1:2 mode, one cycle of controller-end data corresponds to exactly 2 clocks of the 1:2 mode, and the 4 phases of the 1:4 mode are placed in phases P0 and P1 of the first cycle and phases P0 and P1 of the second cycle of the 1:2 mode, respectively.

As shown in fig. 6, conversion between external 1:1 and internal 1:2 for reads and writes needs 4 registers and a control counter. Under control of the counter, the external dfi_wrdata_P0 is registered into the internal P0 channel on the first DfiCtrlClk cycle and into the internal P1 channel on the second cycle, and two further registers then register the buffered data simultaneously as dfi_wrdata_internal_P0/P1. The read channel is simpler: under control of the counter, the internal P0 and P1 are driven onto the external P0 in two adjacent cycles through a port register. Conversion between the external 1:4 ratio and the internal 1:2 ratio is shown in fig. 7. The write channel selects dfi_wrdata_P0/P2 for the internal P0 and dfi_wrdata_P1/P3 for the internal P1. The read data channel needs 8 registers: in the first internal DfiClk cycle the internal P0 is stored for the external P0 and the internal P1 for the external P1; in the next cycle the internal P0 is stored for the external P2 and the internal P1 for the external P3; finally, under control of the counter, the external P0, P1, P2 and P3 are registered and output simultaneously in one DfiCtrlClk cycle.
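The phase repacking performed by the frequency ratio conversion can be modelled at the level of data values, ignoring registers and clocks. In this sketch each element of a list is the tuple of phase values seen in one clock cycle; the function names and the string encoding of the ratios are hypothetical.

```python
def write_to_internal(cycles, ratio):
    """Repack controller-side phase tuples into internal (P0, P1) tuples per DfiClk cycle."""
    if ratio == "1:2":
        return [tuple(c) for c in cycles]                 # pass straight through
    if ratio == "1:1":
        if len(cycles) % 2:
            raise ValueError("1:1 mode needs an even number of cycles")
        # Two consecutive external P0 values become internal P0 and P1.
        return [(cycles[i][0], cycles[i + 1][0]) for i in range(0, len(cycles), 2)]
    if ratio == "1:4":
        out = []
        for p0, p1, p2, p3 in cycles:
            out += [(p0, p1), (p2, p3)]                   # one external cycle -> two internal cycles
        return out
    raise ValueError("unsupported ratio")


def read_to_external(internal, ratio):
    """Inverse direction: internal (P0, P1) tuples back to controller-side phases."""
    if ratio == "1:2":
        return [tuple(c) for c in internal]
    if ratio == "1:1":
        return [(p,) for pair in internal for p in pair]  # P0 then P1 on the external P0
    if ratio == "1:4":
        if len(internal) % 2:
            raise ValueError("1:4 mode needs an even number of internal cycles")
        return [tuple(internal[i]) + tuple(internal[i + 1]) for i in range(0, len(internal), 2)]
    raise ValueError("unsupported ratio")


print(write_to_internal([("a",), ("b",)], "1:1"))        # prints [('a', 'b')]
print(write_to_internal([(0, 1, 2, 3)], "1:4"))          # prints [(0, 1), (2, 3)]
print(read_to_external([(0, 1), (2, 3)], "1:4"))         # prints [(0, 1, 2, 3)]
```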

Four clocks are used inside the PHY: the DFI clock DfiCtrlClk input by the controller, the internal fixed 1:2 mode DFI clock DfiClk, the PLL high-speed clock pclk and the SDRAM clock CK. They are related as follows: the internal fixed 1:2 mode DFI clock runs at 1/4 of the PLL high-speed clock frequency, the SDRAM clock runs at 1/2 of the PLL high-speed clock frequency, and the controller input DFI clock is determined by the frequency ratio. The internal high-speed clock is generated by the high-speed clock PLL module, which multiplies the internal fixed 1:2 mode DFI clock, used as the reference, by 4. It is used by the address command sending module and the data transceiver module for parallel-serial conversion, serial-parallel conversion and SDRAM clock generation.
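The clock relationships stated above can be checked with a short calculation. The sketch derives all four clocks from the SDRAM clock CK and the controller frequency ratio; the function name and dictionary keys are illustrative labels, and the 2400 MHz value is simply an example CK frequency (the CK rate of a DDR5-4800 device).

```python
from fractions import Fraction


def phy_clocks(ck_mhz, ratio):
    """Return the four PHY clock frequencies in MHz for a given CK and DFI frequency ratio."""
    divider = {"1:1": 1, "1:2": 2, "1:4": 4}[ratio]
    ck = Fraction(ck_mhz)
    dfi_clk = ck / 2            # internal fixed 1:2 mode DFI clock
    pclk = dfi_clk * 4          # PLL multiplies the internal DFI clock by 4
    return {
        "DfiCtrlClk": ck / divider,   # controller-side DFI clock, set by the ratio
        "DfiClk": dfi_clk,
        "pclk": pclk,
        "CK": pclk / 2,               # SDRAM clock is half the PLL clock
    }


clocks = phy_clocks(2400, "1:4")
print({name: int(freq) for name, freq in clocks.items()})
# prints {'DfiCtrlClk': 600, 'DfiClk': 1200, 'pclk': 4800, 'CK': 2400}
```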

1.3 Initialization and training unit

The application scenarios of the PHY raise several issues: controllers or DDR5 particles from different manufacturers differ in speed performance; when the PHY connects multiple SDRAMs, a daisy-chain topology is used, which needs fewer pins than a star topology but causes the address/control lines to arrive at each particle at different times; and differences at the actual board level, such as connection defects, cause end-to-end signal times to differ. As a result, the DDR5 PHY as a whole may not work in its optimal state, and read/write errors may even occur. A high-performance DDR5 PHY can therefore train continuously at different frequencies and voltages after initialization, according to the actual conditions, to set each address/data pin line to its actual optimal state, so that data reads and writes are sampled in the middle of the 2D eye diagram.

The DDR5 initialization unit generates an initialization sequence conforming to the DDR5 SDRAM standard JESD79-5, so that the memory can enter the normal transaction mode.

The training unit performs training and calibration in a 1D (one-dimensional) training mode and a 2D (two-dimensional) training mode. 1D training optimizes delay at the single voltage provided by DFI: the training firmware adjusts the local calibration delay unit of each clock, command, address and data path to compensate for delays such as board-level and DRAM delays. 2D training, performed after 1D training, runs the complete read and write test for each pair of voltage and delay, computes the region in which the operations pass to form a 2D eye, and analyzes its margin to select the optimal voltage and delay point. The initialization of the whole PHY combines SDRAM initialization and training calibration into one flow, which is executed by the PHY initialization firmware. Fig. 8 shows the DDR5 PHY training calibration hardware architecture. A dedicated training state machine block controls the whole training process; it comprises two major parts, the training core and initialization, whose data interacts with a control register block to control the other modules. The training data is sent to the DFI address command module and the DFI data read-write module through a multiplexer, which switches between the normal access path and the training core. During write training, the training write data generator produces random write data, which is also sent to the read data comparison module for read training, and the command module generates combinations of the corresponding read and write commands, which reflect actual operation, in order to adjust to the optimal delay.
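The margin analysis of a 2D eye can be illustrated with a small sketch. It assumes that the result of 2D training is a grid of pass/fail flags indexed by voltage step and delay step, and it picks the passing point with the largest symmetric margin in all four directions; this particular margin metric and the function name are illustrative choices, not the patent's method.

```python
def eye_center(grid):
    """Return ((voltage_step, delay_step), margin) of the passing point with the largest margin.

    grid[v][d] is True when the read/write test passed at voltage step v and delay step d.
    The margin is the number of steps that also pass in all four directions around the point.
    """
    rows, cols = len(grid), len(grid[0])
    best, best_margin = None, -1
    for v in range(rows):
        for d in range(cols):
            if not grid[v][d]:
                continue
            margin = 0
            while True:
                m = margin + 1
                points = [(v - m, d), (v + m, d), (v, d - m), (v, d + m)]
                if all(0 <= r < rows and 0 <= c < cols and grid[r][c] for r, c in points):
                    margin = m          # the eye is still open m steps away
                else:
                    break
            if margin > best_margin:
                best, best_margin = (v, d), margin
    return best, best_margin


# Hypothetical 5x5 eye: 1 = pass, 0 = fail.
eye = [
    [0, 0, 1, 0, 0],
    [0, 1, 1, 1, 0],
    [1, 1, 1, 1, 1],
    [0, 1, 1, 1, 0],
    [0, 0, 1, 0, 0],
]
grid = [[bool(x) for x in row] for row in eye]
print(eye_center(grid))  # prints ((2, 2), 2)
```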

2. Detailed description of the preferred embodiments

For a better understanding of the present invention, consider the following embodiment. As shown in fig. 9, the DDR5 PHY is connected to an 8G DDR5 SDRAM configured with an X8 bit width. The address command channel has CK_t, CK_c, CS_n, CA[13:0], RESET_n, TDQS_t, TDQS_c, ALERT_n, TEN, MIR, CAI, CA_ODT, LBDQ, LBDQS and ZQ, 28 bits in total, so 8 address command sending modules need to be configured, one of which is an extra address module used for a voltage pin of the SDRAM. Because the data is 8 bits wide, 1 data transceiver module and 1 DFI data read-write module need to be configured. The DDR5 data channel has DQ[7:0], DM_n, DQS_t and DQS_c; the corresponding controller-end DFI read and write data signals, dfi_wrdata_P0/P1 and dfi_rddata_P0/P1, are all 16 bits wide. The invention is described in detail below.
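The module count in this embodiment can be reproduced with simple arithmetic. The sketch assumes that each address command sending module serves 4 pin bits and that the figure of 8 modules is reached as 7 modules for the 28 signal bits plus 1 extra module for the voltage pin; that reading of the text is an assumption.

```python
import math

# Address/command channel signals of the X8 embodiment and their bit widths.
pins = {
    "CK_t": 1, "CK_c": 1, "CS_n": 1, "CA": 14, "RESET_n": 1,
    "TDQS_t": 1, "TDQS_c": 1, "ALERT_n": 1, "TEN": 1, "MIR": 1,
    "CAI": 1, "CA_ODT": 1, "LBDQ": 1, "LBDQS": 1, "ZQ": 1,
}

LINES_PER_MODULE = 4                      # each sending module handles 4 lines
total_bits = sum(pins.values())
signal_modules = math.ceil(total_bits / LINES_PER_MODULE)
total_modules = signal_modules + 1        # assumption: one extra module for the voltage pin

print(total_bits, signal_modules, total_modules)  # prints 28 7 8
```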

The processing flow steps of the DDR5PHY control channel are as follows:

Step 1: The externally input DFI addresses and commands are in 1:1, 1:2 or 1:4 mode and are accompanied by a frequency ratio indication signal; the input clock is the controller-end clock DfiCtrlClk.

Step 2: The frequency ratio conversion method is selected according to the frequency ratio indication signal; the 1:1 and 1:4 modes require conversion, and the internal fixed 1:2 ratio clock is DfiClk.

Step 3: In the frequency ratio conversion module, the input DFI signal is converted into the internal fixed 1:2 mode. In 1:1 mode, the DFI input control clock is 2 times faster than the internal fixed DFI clock, so the input DFI signal undergoes serial-to-parallel conversion into 2 phases, i.e. the 1 phase P0 is converted into the 2 phases P0 and P1. In 1:2 mode, no conversion is needed and the signal passes straight through. In 1:4 mode, the DFI input control clock is 2 times slower than the internal fixed DFI clock, so the input DFI signal undergoes parallel 4-phase to serial 2-phase conversion, i.e. the 4 phases P0, P1, P2 and P3 are converted into the 2 phases P0 and P1.

Step 4: The fixed internal 1:2 mode is obtained after the frequency ratio conversion; all signals associated with the frequency ratio are now in the DfiClk clock domain, and only phases P0 and P1 are valid.

Step 5: The operation mode is determined from the PHY configuration. For compatibility with DDR5 operation, 3 operation modes can be selected in the present invention, and each operation mode corresponds to a different address arrangement.

Step 6: The DFI address command module 2 organizes the DFI address and command signals, which may have different numbers of phases, into 4-bit data lines using the 2N (slowed-down) mode arrangement {P1, P1, P0, P0}, the DDR5 single-cycle command mode arrangement {default value, P1, P0}, and the DDR5 double-cycle command mode arrangement {P1 high, P0 high, P1 low, P0 low}, so that the multiple working modes of DDR5 SDRAM can be supported.

Step 7: The arranged address commands form 4-bit parallel data; each sending module receives 4 lines of parallel data and the IO port enable signal corresponding to each address command.

Step 8: The address command parallel-serial conversion module uses the calibrated clock to serialize the 4-bit data lines transmitted by the DFI address command module 2.

Step 9: The address command IO port module handles the input/output port characteristics; the delay chain is broken by a register to optimize the delay path.

Step 10: The address command is transferred across logic levels from the VDD voltage domain to the VDDQ voltage domain, which facilitates low-power management.

Step 11: The output impedance is calibrated by adjusting the pull-up and pull-down resistors.

Step 12: The address command is finally sent out to the SDRAM pins; for a bidirectional port, the output direction sends the command and the input direction samples internal data during loopback testing.

The address command channel flow is shown in fig. 10, and the initial training calibration flow is shown in fig. 11.

After a write command has been sent to the SDRAM, write data must be provided once the CWL write latency has elapsed. The PHY converts the DFI write data into write data that conforms to the SDRAM timing. The processing flow of the data write channel comprises the following steps:

Step 1: The externally input DFI write data signals comprise data, data mask, write data select and write data enable; the data may be at any of the frequency ratios, and the input clock is DfiCtrlClk.

Step 2: The frequency ratio mode is checked to decide which conversion is to be performed.

Step 3: In the frequency ratio conversion module, serial-to-parallel conversion is used if the frequency ratio is 1:1, no conversion is performed if it is 1:2, and parallel-to-serial conversion is used if it is 1:4, so that the data ends up in the fixed 1:2 mode.

Step 4: The fixed 1:2 mode is obtained after frequency ratio conversion; the write data is now in the DfiClk clock domain and only phases P0 and P1 are valid.

Step 5: One period of dfi_wrdata_P0 and dfi_wrdata_P1 is divided into a group of 8 data lines, each 4 bits wide.

Step 6: These 8 data lines are sent to TxDataLn0[3:0] through TxDataLn7[3:0] respectively, plus one data mask line TxDatLn8[3:0]; the TxCmdReqWrDest data strobe signal is dfi_wrdata_cs, and TxCmdReqWr is the data enable signal.

Step 7: A 4-bit burst data enable signal is generated from the TxCmdReqWr data enable signal to control the enable of each data bit, and a 4-bit drive value is also generated.

Step 8: These three kinds of 4-bit data are combined into the 12-bit data TxDatEnWk_ln0[11:0] to TxDatEnWk_ln8[11:0]; the TxCmdReqWrDest data strobe signal is also used to generate the data strobe DQS signal TxDatEnWk_ln9[11:0].

Step 9: The 10 data lines are sent to the data transceiver module 7 for further processing.

Step 10: The data transceiver module processes the 10 data lines. Using the delay-calibrated clock as the read clock of the conversion FIFO, it converts each 12-bit line into 3 serial bit streams: the data TxDqVal_ln0, the enable TxDqEn_ln0 and the drive value TxDqWk_ln0.

Step 11: The data is then sent to the IO port processing module. First, the enable signal TxDqEn_ln0 and the data TxDqVal_ln0 pass through a register that breaks the delay chain, which optimizes timing so that the setup and hold times are met.

Step 12: Next, with the write data clock as the enable signal, the burst type is controlled to be burst 4, burst 8 or continuous burst.

Step 13: Because the voltage domain of the SDRAM spans VSSQ to VDDQ, logic-level conversion from the VDD voltage domain to the VDDQ voltage domain is also required before the signal is output to the pin, and the drive strength must be correct.

Step 14: Finally, the pull-up and pull-down resistors are adjusted. Taking 120 ohms for the pull-up and pull-down resistors as the reference, the resulting signal output impedance is 60 ohms, which effectively reduces signal reflection and improves signal integrity.
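The 120 ohm and 60 ohm figures in the last step are consistent with two 120 ohm legs acting in parallel. The sketch below only checks that arithmetic; treating the output stage as two parallel legs is an assumption made for illustration, and the helper name is hypothetical.

```python
from fractions import Fraction


def parallel(*ohms):
    """Equivalent resistance of resistors connected in parallel (exact arithmetic)."""
    return 1 / sum(Fraction(1) / Fraction(r) for r in ohms)


print(parallel(120, 120))  # prints 60
```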

After a read command has been sent to the SDRAM, data appears at the SDRAM port once the RL read latency has elapsed. The PHY samples this data accurately, converts it into data conforming to the DFI protocol and sends it to the controller end. The processing flow of the data read channel comprises the following steps:

step 1: according to SDRAM read delay, external read DFI signals DFI _ rddata _ en _ P0/P1/P2/P3, DFI _ rddata _ cs _ P0/P1/P2/P3 are received.

Step 2: and judging the frequency ratio of the DFI reading control signals to determine the operation needing conversion.

And step 3: if the ratio is 1:1, serial-to-parallel conversion is needed, if the ratio is 1:2, conversion is not needed, and if the ratio is 1:4, parallel-to-serial conversion is needed; the steps are converted into a fixed 1:2 mode, namely, dfi _ rddata _ en _ internal _ P0/P1 and dfi _ rddata _ cs _ internal _ P0/P1;

and 4, step 4: in the DFI data read-write module 4, the two read data control signals are used to generate a read data gating signal rxpadstandard, so that data is only sampled when a read command is available;

and 5: read data DQ0 to DQ7, bus data inversion DBI _ N, data strobe DQs, and DQs _ N appear on a pin that receives an external SDRAM in the data transceiver module 7;

step 6: sampling data strobing DQS and DQS _ N through a DQSRX sampler, sampling data DQ0 to DQ7 and bus data inversion DBI _ N through a data RX sampler, and ensuring that the sampled data is read data by using a read data gating signal RxPasstndby;

and 7: the sampled data is serial data, and 4BIT parallel data are output by two FIFOs of odd serial to parallel 4BIT and even serial to parallel 4 BIT;

and 8: then, combining the 4bit data in the DFI data read-write module 4 to form a DFI _ rddata _ internal _ W0/W1 signal and a DFI _ rddata _ dbi _ internal _ W0/W1, and generating a corresponding read data valid signal DFI _ rddata _ changed _ internal _ W0/W1 for indicating read data output;

and step 9: judging the frequency ratio and determining whether the read data needs to be converted;

step 10: if 1:1, read data needs to be deserialized, if 1:2, no conversion is needed, and if 1:4, a deserialization operation is needed.

Step 11: the data finally output to the controller end are dfi _ rddata _ W0/W1/W2/W3, dfi _ rddata _ valid _ W0/W1/W2/W3, and dfi _ rddata _ dbi _ W0/W1/W2/W3, so that the transmission correctness of the read data from the SDRAM to the controller is realized.

The data write channel transfer flow is shown in fig. 12, and the data read channel transfer flow is shown in fig. 13.

The initialization training calibration module generates an initialization sequence conforming to DDR5 and trains the data, command, address and clock paths to reach the optimal transmission state. The DDR5 initialization proceeds as follows. (1) Power up the supplies VPP, VDDQ, VDD, VSS and VSSQ; during the power-up ramp, VPP must rise at the same time as or earlier than VDD. After power-up is complete, the DQ, DQS_t and DQS_c levels must be between VSSQ and VDDQ, and the CS_n, CK_t, CK_c and CA input signal levels must be between VSS and VDD, to avoid latch-up. (2) The reset signal RESET_n is held low for at least 200 us, after which NOP commands are used as spacing. (3) During configuration only the MRR, MRW, MPC and VREFCA commands are supported; the default values of the mode registers MR0, MR6, MR32, MR33, MR34, MR35, MR10, MR11 and MR23 are configured in turn. (4) After any training or calibration of timing parameters, DDR5 initialization is complete once tZQLAT has elapsed.

The training calibration is a delay training calibration with a 1D (one-dimensional) training mode and a 2D (two-dimensional) training mode. The 1D training steps are: device initialization, command/address training, read gate training, fine write leveling training, read data deskew, read good training, coarse write leveling training, write training, read training, and maximum read latency training. The 2D training steps are: device initialization, read voltage eye training, and write voltage eye training. As shown in fig. 11, the delay training calibration is combined with SDRAM initialization, and the initialization process of the PHY is as follows:

(A) power up VDD, VDDQ and VAA: VDD is the internal operating voltage of the PHY, and VDDQ and VAA are the voltages of the SDRAM data pins and address/command pins, respectively. Power-up is the first initialization step; until the supplies are stable, all outputs are in an unknown state and the inputs are don't-care;

(B) start the clock and reset the PHY: this step resets the PHY, not the SDRAM;

(C) initialize the PHY configuration: because the PHY can support multiple SDRAM standards and multiple combined configurations, this step determines which memory specification and link configuration the current system uses;

(D) load the firmware program image into the instruction memory SRAM: the training flow is written in the high-level C language and compiled into machine code, and the firmware program image must be loaded into the instruction memory before training;

(E) set the PHY input clock to the desired frequency: according to the DFI frequency ratio mode and the DFI input clock frequency, set the fixed clock inside the DFI and the clock of the high-speed clock PLL module to the required operating clocks;

(F) write the message block to the training firmware so that the firmware can run: write the steps the firmware is to execute and configure the register parameters so that the firmware can start running;

(G) run the 1D and 2D training items: run the 1D and 2D training items repeatedly until the optimal delay and voltage are reached;

(H) read the message block results: once the training firmware has finished, it transfers the training results from the data memory to the configuration registers; training can also be repeated at other clock frequencies to obtain different delay and voltage results, which facilitates later optimization analysis;

(I) load the PHY initialization engine image: load the initialization training sequence to be run into the PHY initialization engine registers, indicating the content of the currently running sequence;

(J) initialize the SDRAM through the DFI: strictly run the initialization sequence of JESD79-4 or JESD79-5, according to whether the memory is DDR4 or DDR5, and configure the SDRAM;

(K) ready for normal operating mode: after waiting the necessary time, the PHY can enter normal operating mode, with its inputs and outputs controlled by the DFI bus.
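Steps (A) to (K) form a strictly ordered sequence, which can be modeled as a small sequencer that rejects out-of-order steps. This is only an illustrative sketch: the enum member names and the `PhyInitSequencer` class are hypothetical and are not taken from the source.

```python
from enum import IntEnum


class PhyInitStep(IntEnum):
    """Steps (A) to (K) of the PHY initialization flow, in order."""
    POWER_UP = 0             # (A)
    RESET_PHY = 1            # (B)
    CONFIGURE_PHY = 2        # (C)
    LOAD_FIRMWARE = 3        # (D)
    SET_CLOCK = 4            # (E)
    WRITE_MESSAGE_BLOCK = 5  # (F)
    RUN_TRAINING = 6         # (G)
    READ_RESULTS = 7         # (H)
    LOAD_INIT_ENGINE = 8     # (I)
    INIT_SDRAM = 9           # (J)
    NORMAL_MODE = 10         # (K)


class PhyInitSequencer:
    """Tracks progress and enforces that steps run in the documented order."""

    def __init__(self) -> None:
        self.completed = []

    @property
    def ready(self) -> bool:
        """True once every step, including (K), has been run."""
        return len(self.completed) == len(PhyInitStep)

    def run(self, step: PhyInitStep) -> None:
        if self.ready:
            raise RuntimeError("initialization already complete")
        expected = PhyInitStep(len(self.completed))
        if step != expected:
            raise RuntimeError(f"expected {expected.name}, got {step.name}")
        self.completed.append(step)
```

Iterating `for step in PhyInitStep: sequencer.run(step)` walks the whole flow; attempting, say, `RUN_TRAINING` before the firmware is loaded raises an error.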

The advantage of the invention is that it provides a high-performance DDR5 PHY interface with low latency and high throughput for server applications, supporting the various DDR5 command operations and high-frequency conversion. The DFI address command module and the plurality of address command sending modules convert DFI 5.0 address/command signals into address/command transmission conforming to the DDR5 SDRAM standard protocol; the DFI data read-write module and the data transceiver modules convert between DFI data read/write operations and DDR5 SDRAM data read/write operations; the frequency ratio conversion module converts between the controller-side DFI frequency ratios of 1:1, 1:2 and 1:4 and the internal fixed 1:2 mode; the high-speed clock PLL module generates an internal high-speed clock at 4 times the DFI clock frequency and manages the relationship among the PHY's internal clocks; and the initialization training calibration module generates an initialization sequence conforming to the DDR4/DDR5 SDRAM standard specifications and, through training, finds the optimal delay for each data, command and address path.

The key points of the invention, and the points for which protection is sought, are as follows:

The invention provides a concrete implementation of a DDR5 PHY device using specific hardware circuit techniques, connecting a standard DFI 5.0 controller to DDR5 memory devices with high throughput and low latency. To support the various command operations of the DDR5 standard protocol, the invention designs a single-cycle command mode and a two-cycle command mode in the control unit, and switches the transmission path automatically according to the DFI 5.0 protocol.

The address transmission unit of the invention performs 4-bit parallel-to-serial conversion of the address command, so that each period of the internal high-speed clock pclk controls half a period of the SDRAM clock, which ensures that the address command values of two adjacent periods can differ. The data transmission unit implements multiple data read/write operating modes, uses a data enable signal to control burst operation at the IO port, and supports BC8, BL16, BL32 and continuous burst operations for accessing DDR5 memory devices.
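The 4-bit parallel-to-serial conversion can be pictured as follows: each 4-bit word is shifted out one bit per pclk period, so one word spans two SDRAM clock periods. The sketch below is a behavioral illustration only; the function names are hypothetical, and holding each address/command value for two pclk slots (one full SDRAM clock) is an assumption, not something the source states.

```python
from typing import Iterable, List, Sequence, Tuple


def ca_word_for_two_cycles(first: int, second: int) -> Tuple[int, int, int, int]:
    """Build one 4-bit word for a single CA line.

    Assumption: each value is held for two pclk slots, i.e. one full
    SDRAM clock period, so two adjacent SDRAM cycles can carry different values.
    """
    return (first, first, second, second)


def serialize_4bit(words: Iterable[Sequence[int]]) -> List[int]:
    """Shift out 4-bit parallel words, one bit per pclk period."""
    stream: List[int] = []
    for word in words:
        if len(word) != 4:
            raise ValueError("each parallel word must be exactly 4 bits")
        stream.extend(word)
    return stream
```

For instance, `serialize_4bit([ca_word_for_two_cycles(1, 0)])` yields the pclk-rate stream `[1, 1, 0, 0]`: the line is high for the first SDRAM clock and low for the second.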

The data transmission unit generates the read-data sampling according to the DFI read signal, samples the DDR5 read data correctly using a sampling gate signal, and optimizes the read-data timing through data deskew. The frequency ratio module supports multiple operating frequencies: for systems of different frequencies, registers, multiplexers and control counters implement efficient conversion among the 1:1, 1:2 and 1:4 frequency ratio modes, ensuring correct timing in each mode.

To meet the high data rate requirement of the SDRAM, the invention uses an internal 4x frequency-multiplying high-speed PLL module to generate a higher internal clock, which provides the frequency needed for rising-edge operation of the address command and double-edge operation of the data, and which is used to generate the SDRAM clock. To meet the high bandwidth requirement, the register module is configured through the APB interface, and the number of address command sending modules and data transceiver modules is controlled according to the bit-width requirement, so that multiple SDRAMs can be connected and the multi-PHY function is realized.

The data IO processing unit of the invention optimizes the signal characteristics of the IO interface by optimizing the delay chain, the logic level conversion across the VDD and VDDQ voltage domains, the adjustment of the pull-up and pull-down resistors, and the correct control of drive strength, which improves the signal integrity of the high-speed PHY. The training unit includes 1D training and 2D training and, during initialization, adjusts the delay-unit parameters of each path so that the PHY achieves optimal access.
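The source does not give the training algorithm itself. One common way to choose a delay setting is to sweep the delay taps, record pass/fail for each, and pick the center of the widest passing window; a 2D pass repeats this for each reference voltage and keeps the voltage with the widest window. The sketch below illustrates that generic technique under those assumptions; all function names are hypothetical.

```python
from typing import Dict, Sequence, Tuple


def longest_pass_window(passes: Sequence[bool]) -> Tuple[int, int]:
    """Return (start, length) of the longest run of passing delay taps."""
    best_start, best_len = 0, 0
    run_start = None
    # A trailing False sentinel closes a run that reaches the last tap.
    for i, ok in enumerate(list(passes) + [False]):
        if ok:
            if run_start is None:
                run_start = i
        elif run_start is not None:
            if i - run_start > best_len:
                best_start, best_len = run_start, i - run_start
            run_start = None
    return best_start, best_len


def train_1d(passes: Sequence[bool]) -> int:
    """Pick the delay tap at the center of the widest passing window."""
    start, length = longest_pass_window(passes)
    if length == 0:
        raise ValueError("no passing delay setting found")
    return start + (length - 1) // 2


def train_2d(grid: Dict[int, Sequence[bool]]) -> Tuple[int, int]:
    """Pick the (vref, delay tap) pair whose passing window is widest.

    `grid` maps each reference-voltage setting to its delay sweep results.
    """
    best = None
    for vref, passes in grid.items():
        start, length = longest_pass_window(passes)
        if length and (best is None or length > best[2]):
            best = (vref, start + (length - 1) // 2, length)
    if best is None:
        raise ValueError("no passing setting found at any voltage")
    return best[0], best[1]
```

Centering in the widest window, rather than taking the first passing tap, leaves the most margin on both sides against drift in voltage and temperature.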

Fig. 14 shows an implementation of a DDR5 write two-cycle command. In the first cycle CS_n is pulled low and the CA bus carries the bank address; in the next cycle CS_n is pulled high and the CA bus carries the column address. The DFI data read-write module therefore arranges the address sequence with the lower bits in the first cycle and the higher bits in the second cycle, realizing the write operation from the DFI 5.0 dfi_address[27:0] to CA[13:0]. After a write delay of 9 cycles, dfi_address[15:0] gives 4 data items, and the PHY conversion completes the burst-16 write to DDR5.
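The mapping from a 28-bit dfi_address to two 14-bit CA transfers can be sketched as below. This is an illustration under an explicit assumption: the lower 14 bits are sent in the first cycle (CS_n low) and the upper 14 bits in the second cycle (CS_n high). The function name and the `(cs_n, ca)` tuple layout are hypothetical.

```python
from typing import List, Tuple

CA_WIDTH = 14                      # width of CA[13:0]
CA_MASK = (1 << CA_WIDTH) - 1
DFI_ADDRESS_WIDTH = 28             # width of dfi_address[27:0]


def split_two_cycle_command(dfi_address: int) -> List[Tuple[int, int]]:
    """Split dfi_address[27:0] into two (cs_n, ca) transfers.

    Assumption: low half first with CS_n = 0, high half second with CS_n = 1.
    """
    if not 0 <= dfi_address < (1 << DFI_ADDRESS_WIDTH):
        raise ValueError("dfi_address must fit in 28 bits")
    first_ca = dfi_address & CA_MASK
    second_ca = (dfi_address >> CA_WIDTH) & CA_MASK
    return [(0, first_ca), (1, second_ca)]
```

A single-cycle command would need only the first transfer; the two-cycle form is what lets a 14-bit CA bus carry the full 28-bit DFI address.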

The above embodiments may be implemented wholly or partially in software, hardware, firmware, or any combination thereof. When implemented wholly or partially in software, they may take the form of a computer program product that includes one or more computer instructions. When the computer program instructions are loaded or executed on a computer, the flows or functions according to the embodiments of the invention are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable device. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, they may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wire (e.g., coaxial cable, optical fiber, Digital Subscriber Line (DSL)) or wirelessly (e.g., infrared, radio, microwave). The computer-readable storage medium can be any available medium that a computer can access, or a data storage device, such as a server or data center, that integrates one or more available media. The available medium may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., Solid State Disk (SSD)), among others.

The above description covers only specific embodiments of the invention and is not to be construed as limiting its scope of protection; all modifications, equivalents and improvements made within the spirit and principles of the invention are intended to fall within the scope of protection defined by the appended claims.
