Low-latency clock domain structure of a switching chip

Document No.: 1711808   Published: 2019-12-13

Reading note: this technology, "Low-latency clock domain structure of a switching chip" (一种低时延的交换芯片时钟域结构), was designed and created by 李沛杰, 吕平, 刘勤让, 沈剑良, 刘冬培, 陈艇, 张文建, 张丽, 董春雷 and 张霞 on 2019-08-30. Main content: The invention provides a low-latency clock domain structure for a switching chip. The structure comprises, connected in sequence: an ingress SerDes RX unit, a first CDC unit, an ingress port RX, a second CDC unit, a core switching unit, a third CDC unit, an egress port TX, a fourth CDC unit and an egress SerDes TX unit. The first CDC unit moves data received by the ingress SerDes RX unit from the SerDes RX clock domain into the core switching clock domain, where the SerDes RX clock domain is the clock domain in which the SerDes RX unit operates and the core switching clock domain is the clock domain in which the core switching unit operates. The ingress port RX, the core switching unit and the egress port TX all operate in the same clock domain. The fourth CDC unit moves data output by the egress port TX from the core switching clock domain into the SerDes TX clock domain, i.e. the clock domain in which the egress SerDes TX unit operates. The invention aims to provide a clock domain architecture for general-purpose switching chips that reduces switching latency and makes it easier to improve chip performance.

1. A low-latency switching chip clock domain structure, comprising, connected in sequence: an ingress SerDes RX unit, a first CDC unit, an ingress port RX, a second CDC unit, a core switching unit, a third CDC unit, an egress port TX, a fourth CDC unit and an egress SerDes TX unit;

the first CDC unit is configured to move data received by the ingress SerDes RX unit from a SerDes RX clock domain into a core switching clock domain, the SerDes RX clock domain being the clock domain in which the SerDes RX unit operates and the core switching clock domain being the clock domain in which the core switching unit operates;

the ingress port RX, the core switching unit and the egress port TX operate in the same clock domain;

the fourth CDC unit is configured to move data output by the egress port TX from the core switching clock domain into a SerDes TX clock domain, the SerDes TX clock domain being the clock domain in which the egress SerDes TX unit operates.

2. The clock domain structure of claim 1, wherein the second CDC unit and the third CDC unit are both synchronous storage units.

3. The switching chip clock domain structure of claim 1, wherein the first CDC unit, the second CDC unit, the third CDC unit and the fourth CDC unit each employ an elastic buffer structure, each CDC unit comprising an asynchronous FIFO unit and an elastic buffer add/delete monitoring and processing unit;

the elastic buffer add/delete monitoring and processing unit is configured to monitor the empty/full state alarm signals generated by the asynchronous FIFO unit and to perform the corresponding read or write operation according to the monitored alarm signal.

4. The switching chip clock domain structure of claim 3, wherein the empty/full state alarm signals comprise a prog_empty indication signal, a prog_full indication signal and a full indication signal.

5. The switching chip clock domain structure of claim 4, wherein the elastic buffer add/delete monitoring and processing unit comprises a prog_empty alarm elimination module and a prog_full alarm elimination module;

the prog_empty alarm elimination module is configured to monitor the prog_empty indication signal and, when the prog_empty indication signal is detected, to add an idle byte according to the characteristics of the communication protocol while triggering a pause of read_en;

the prog_full alarm elimination module is configured to monitor the prog_full indication signal and, when the prog_full indication signal is detected, to delete an idle byte according to the characteristics of the communication protocol while triggering a pause of write_en.

Technical Field

The invention relates to the technical field of switching chips, and in particular to a low-latency clock domain structure for a switching chip.

Background

Switch fabrics of hardware-implemented switching chips generally fall into three categories: CrossBar-based matrix switch fabrics, time-division-multiplexed bus switch fabrics, and Central-Buffer-based shared-memory switch fabrics (e.g., Wang, Research on high-speed shared-memory-based switch fabrics [D], University of Science and Technology, 2008). A switching chip built on a CrossBar matrix fabric forwards data with high switching efficiency and is easy to implement, but it scales poorly and is hard to manage. Bus-based and Central-Buffer-based fabrics scale well, but achieving high switching efficiency with them requires a higher forwarding bandwidth, which is a major challenge for the chip process. As applications develop and switching chip throughput grows, chips based on a single fabric can no longer keep up with the growth in switching scale, so the advantages of several fabrics are increasingly combined. The most common combination pairs a CrossBar matrix with a Central-Buffer shared memory; to some extent this compensates for the expansion demanded by switching scale while respecting process limits, and achieves efficient switching. In every one of these fabrics, however, the clock domain structure is chosen mainly for forwarding performance and system scalability, and the low-latency requirement of the switching system is rarely considered.

Disclosure of Invention

To remedy the drawback that the clock domain structures of existing switching chips contain many clock domain crossing logic blocks, which increases latency, the invention provides a low-latency clock domain structure for a switching chip.

The invention provides a low-latency clock domain structure for a switching chip, comprising, connected in sequence: an ingress SerDes RX unit, a first CDC unit, an ingress port RX, a second CDC unit, a core switching unit, a third CDC unit, an egress port TX, a fourth CDC unit and an egress SerDes TX unit;

the first CDC unit is configured to move data received by the ingress SerDes RX unit from a SerDes RX clock domain into a core switching clock domain, the SerDes RX clock domain being the clock domain in which the SerDes RX unit operates and the core switching clock domain being the clock domain in which the core switching unit operates;

the ingress port RX, the core switching unit and the egress port TX operate in the same clock domain;

the fourth CDC unit is configured to move data output by the egress port TX from the core switching clock domain into a SerDes TX clock domain, the SerDes TX clock domain being the clock domain in which the egress SerDes TX unit operates.

Further, the second CDC unit and the third CDC unit are both synchronous storage units.

Further, the first CDC unit, the second CDC unit, the third CDC unit and the fourth CDC unit all employ an elastic buffer structure, and each CDC unit comprises an asynchronous FIFO unit and an elastic buffer add/delete monitoring and processing unit;

the elastic buffer add/delete monitoring and processing unit is configured to monitor the empty/full state alarm signals generated by the asynchronous FIFO unit and to perform the corresponding read or write operation according to the monitored alarm signal.

Further, the empty/full state alarm signals comprise a prog_empty indication signal, a prog_full indication signal and a full indication signal.

Further, the elastic buffer add/delete monitoring and processing unit comprises a prog_empty alarm elimination module and a prog_full alarm elimination module;

the prog_empty alarm elimination module is configured to monitor the prog_empty indication signal and, when the prog_empty indication signal is detected, to add an idle byte according to the characteristics of the communication protocol while triggering a pause of read_en;

the prog_full alarm elimination module is configured to monitor the prog_full indication signal and, when the prog_full indication signal is detected, to delete an idle byte according to the characteristics of the communication protocol while triggering a pause of write_en.

The beneficial effects of the invention are as follows:

1. The invention modifies the clock domain structure of existing switching chips so that the range of the core processing logic is enlarged as far as possible and, where the process allows, the clock frequency of the core processing logic can be raised as far as possible, thereby reducing switching latency as much as possible;

2. Besides reducing switching latency, the invention simplifies the clock domain structure of the whole switching chip: when the switching chip architecture is updated or the process changes, only the SerDes-related clock structure needs to be updated, while the clock structure of the core switching unit's ports and switching logic need not be adjusted. The clock domain structure provided by the invention therefore gives the switching chip architecture a degree of robustness and strong inheritability;

3. The invention is a general-purpose low-latency switching chip clock domain structure that is, in principle, independent of the structure of the core switching unit and applicable to switching chips built on various conventional switch fabrics;

4. The invention moves clock domain crossing to the edge of the switching chip, which makes the chip structure clearer, simplifies its module partitioning and reduces dependence on externally integrated third-party SerDes IP.

Drawings

Fig. 1 is a schematic structural diagram of a low-latency switching chip clock domain structure according to an embodiment of the present invention;

Fig. 2 is a schematic structural diagram of a CDC unit employing an elastic buffer structure according to an embodiment of the present invention;

Fig. 3 is a schematic diagram of the basic structure of a prior-art switching chip;

Fig. 4 is a schematic diagram of a conventional prior-art switching chip structure;

Fig. 5 is a schematic diagram of the clock domain structure of a common prior-art switching chip.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly described below with reference to the accompanying drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

During data processing, a switching chip must perform serial-to-parallel conversion of received data, processing of received protocol data frames, data frame forwarding, processing of transmitted protocol frames, parallel-to-serial conversion of transmitted data, and so on; these processes determine the latency of the whole chip. The switching chip can be abstracted as shown in fig. 3, which shows that its basic structure has three parts: the SerDes part, which mainly performs serial-to-parallel and parallel-to-serial conversion of data; the port part, which mainly receives and sends protocol data; and the core switching part, which mainly forwards data frames. Clocking in conventional switching chips is designed mainly around this structure. Generally, the core switching logic is a single clock domain; the SerDes serializes and deserializes transmitted and received data, with its clocks split into a receive-direction clock and a transmit-direction clock; and the port part processes protocol-related data frames. The clock domain structure directly affects the processing latency of the switching chip.

As the structure of a conventional switching chip (fig. 3) shows, the latency of a switching chip is mainly determined by the clock domains of the core switching processing, the SerDes processing and the port processing. In existing switching chips, the port and the SerDes are typically combined into port/channel processing logic; as a result, the port clock is provided by the SerDes, and the clock domain structure is as shown in fig. 4.

Generally, the SerDes clock source is an external crystal oscillator or the output of a clock processing chip. The SerDes RX outputs the receive-side clock through its internal CDR (Clock and Data Recovery), and the SerDes TX provides the transmit-side clock through its internal PLL (Phase-Locked Loop). The port is divided into a port RX and a port TX; because data is exchanged between the port and the SerDes, the clocks of both the port TX and the port RX are local clocks provided directly by the SerDes TX. On the receive side, the clock output by the SerDes RX is the clock recovered by the CDR, so a clock domain crossing exists between the recovered receive clock and the local clock during data processing, and a CDC (Clock Domain Crossing) unit must be placed between the SerDes RX and the port RX to handle the interaction between the asynchronous clocks. Similarly, the core switching clock is generally provided by an independent PLL inside the chip, which makes it asynchronous to the port operating clock, so a CDC unit is also placed between the core switching logic and the port RX and between the core switching logic and the port TX to handle the interaction between the asynchronous clocks.

Following the data processing flow, the end-to-end data path of a switching chip with the above structure is divided into four clock domain segments, as shown in fig. 5. The SerDes RX clock domain of the ingress port crosses into the SerDes TX clock domain of the ingress port (its working clock) through CDC 1; because the data flow on both sides of CDC 1 is uninterrupted and back-pressure logic is generally not provided, CDC 1 must perform clock compensation between the differently sourced clocks to absorb the frequency offset between them. The SerDes TX clock domain of the ingress port then crosses into the core switching clock domain through CDC 2; because both sides of CDC 2 process data packets and flow control and back-pressure logic are generally provided, CDC 2 can absorb the frequency offset between the asynchronous clocks on a per-packet basis or by back-pressure. The core switching clock domain crosses into the SerDes TX clock domain of the egress port (its working clock) through CDC 3, which for the same reason absorbs the frequency offset on a per-packet basis or by back-pressure. No clock domain crossing occurs between the egress port and its SerDes TX.

By analyzing the data processing flow of the switching chip, the invention provides a low-latency switching chip clock domain structure. The embodiment of the invention provides a low-latency clock domain structure of a switching chip, comprising, connected in sequence: an ingress SerDes RX unit, a first CDC unit, an ingress port RX, a second CDC unit, a core switching unit, a third CDC unit, an egress port TX, a fourth CDC unit and an egress SerDes TX unit. The first CDC unit moves data received by the ingress SerDes RX unit from the SerDes RX clock domain into the core switching clock domain; the ingress port RX, the core switching unit and the egress port TX operate in the same clock domain; and the fourth CDC unit moves data output by the egress port TX from the core switching clock domain into the SerDes TX clock domain.
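The stage order and clock domain assignment described above can be checked with a small Python model. It is only a sketch: the stage names follow the text, but the domain labels (`rx_recovered`, `ingress_serdes_tx`, `egress_serdes_tx`, `core`) are hypothetical names chosen here. An asynchronous CDC is needed wherever two adjacent stages sit in different domains, so the model counts those boundaries for the prior-art assignment (ports clocked by their SerDes TX) and for the proposed one (ports clocked by the core).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    name: str
    domain: str  # hypothetical label for the clock domain the stage runs in


def async_crossings(path):
    """Return (from_stage, to_stage) pairs whose clock domains differ."""
    return [(a.name, b.name) for a, b in zip(path, path[1:]) if a.domain != b.domain]


# Prior art: each port runs on its SerDes TX (local) clock, the core on its own PLL.
PRIOR_ART = [
    Stage("ingress SerDes RX", "rx_recovered"),
    Stage("ingress port RX", "ingress_serdes_tx"),
    Stage("core switching", "core"),
    Stage("egress port TX", "egress_serdes_tx"),
    Stage("egress SerDes TX", "egress_serdes_tx"),
]

# Proposed: ports share the core switching clock; crossings only at the SerDes edges.
PROPOSED = [
    Stage("ingress SerDes RX", "rx_recovered"),
    Stage("ingress port RX", "core"),
    Stage("core switching", "core"),
    Stage("egress port TX", "core"),
    Stage("egress SerDes TX", "egress_serdes_tx"),
]

if __name__ == "__main__":
    for label, path in (("prior art", PRIOR_ART), ("proposed", PROPOSED)):
        crossings = async_crossings(path)
        print(f"{label}: {len(crossings)} asynchronous crossings")
        for src, dst in crossings:
            print(f"  {src} -> {dst}")
```

The prior-art path yields three asynchronous crossings (the positions of CDC 1, CDC 2 and CDC 3), while the proposed path yields two (CDC 1 and CDC 4). CDC 2 and CDC 3 still exist in the proposed structure, but only as synchronous storage, which this model does not count as a crossing.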

Specifically, as shown in fig. 1, clock domain handling in this structure follows the data forwarding path, which can be described as follows: serial data received at the ingress of the switching chip is processed by the ingress SerDes RX unit (clock and data recovery, serial-to-parallel conversion, etc.) and handed to the ingress port RX; the ingress port RX passes the processed data packets to the core switching unit for forwarding; the core switching unit forwards the packets to the egress port TX; and the egress port TX passes the outgoing packets and other data to the egress SerDes TX unit, which processes them (including parallel-to-serial conversion) and transmits them.

Corresponding to this forwarding flow, the low-latency switching chip clock domain structure of the embodiment can be described as follows. After data received at the chip ingress passes through the ingress SerDes RX unit, it crosses from the SerDes RX clock domain into the core switching clock domain through the first CDC unit (CDC 1 in fig. 1); because the data flow on both sides of CDC 1 is uninterrupted and back-pressure logic is generally not provided, CDC 1 must perform clock compensation between the differently sourced clocks to absorb the frequency offset between them. In the embodiment, the port processing logic and the core switching logic operate in the same clock domain, so the second CDC unit (CDC 2 in fig. 1) and the third CDC unit (CDC 3 in fig. 1) do not perform clock domain crossing and are implemented with synchronous storage units instead. After data received and processed by the ingress port RX has been forwarded by the core switching unit, the data at the egress port TX crosses from the core switching clock domain into the SerDes TX clock domain through the fourth CDC unit (CDC 4 in fig. 1); for the same reason as CDC 1, CDC 4 must perform clock compensation between the differently sourced clocks to absorb the frequency offset between them.

Here, the SerDes RX clock domain is the clock domain in which the SerDes RX unit operates, the core switching clock domain is the clock domain in which the core switching unit operates, and the SerDes TX clock domain is the clock domain in which the egress SerDes TX unit operates.
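Why CDC 1 and CDC 4 need clock compensation follows from simple arithmetic. As a sketch under assumed numbers (a 312.5 MHz nominal parallel clock and a ±100 ppm tolerance on each reference, neither taken from the patent), two free-running clocks at opposite ends of that tolerance differ by 200 ppm. A FIFO that is written and read every cycle without back-pressure then gains or loses one word roughly every 10^6 / 200 = 5000 cycles, about 62.5 words per millisecond, so any finite FIFO eventually overflows or underflows unless idle symbols are added or deleted.

```python
def drift_words(f_nominal_hz, ppm_write, ppm_read, seconds):
    """Net change in FIFO occupancy after `seconds`, assuming one word is written
    per write-clock cycle and one word is read per read-clock cycle, with no
    back-pressure and no idle insertion or deletion."""
    f_write = f_nominal_hz * (1 + ppm_write * 1e-6)
    f_read = f_nominal_hz * (1 + ppm_read * 1e-6)
    return (f_write - f_read) * seconds


def cycles_per_slip(ppm_write, ppm_read):
    """Approximate number of clock cycles for the occupancy to drift by one word."""
    offset_ppm = abs(ppm_write - ppm_read)
    if offset_ppm == 0:
        return float("inf")  # identical frequencies never drift
    return 1e6 / offset_ppm


if __name__ == "__main__":
    # Hypothetical figures: 312.5 MHz nominal, writer +100 ppm, reader -100 ppm.
    print(drift_words(312.5e6, 100, -100, 1e-3))  # words gained per millisecond
    print(cycles_per_slip(100, -100))             # cycles per one-word slip
```

The same arithmetic explains why CDC 2 and CDC 3 in the prior-art structure can rely on packet gaps or back-pressure instead: there the writer can be throttled, so the drift never accumulates.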

In the clock domain structure of the embodiment, the placement of the CDC units, i.e. the first and fourth CDC units, is one of the keys of the structure. The SerDes mainly performs serial-to-parallel conversion of received data and parallel-to-serial conversion of transmitted data, and the parallel data is received directly by the receive side, or generated directly by the transmit side, of the PCS (Physical Coding Sublayer). In this structure, therefore, the first CDC unit is placed after the serial-to-parallel conversion of the SerDes RX and before the ingress port RX, and the fourth CDC unit is placed after the egress port TX and before the parallel-to-serial conversion of the SerDes TX.

On the basis of the above embodiment, the first, second, third and fourth CDC units all employ an elastic buffer structure, and each CDC unit comprises an asynchronous FIFO unit and an elastic buffer add/delete monitoring and processing unit; the elastic buffer add/delete monitoring and processing unit monitors the empty/full state alarm signals generated by the asynchronous FIFO unit and performs the corresponding read or write operation according to the monitored alarm signal.

As one possible implementation, the elastic buffer add/delete monitoring and processing unit comprises a prog_empty alarm elimination module and a prog_full alarm elimination module. The prog_empty alarm elimination module monitors the prog_empty indication signal and, when it is asserted, adds an idle byte according to the characteristics of the communication protocol while triggering a pause of read_en; the prog_full alarm elimination module monitors the prog_full indication signal and, when it is asserted, deletes an idle byte according to the characteristics of the communication protocol while triggering a pause of write_en.

Specifically, the CDC units in the embodiment all adopt an elastic buffer structure which, as shown in fig. 2, mainly comprises an asynchronous FIFO unit and an elastic buffer add/delete monitoring and processing unit. The asynchronous FIFO unit stores the uninterrupted data stream, and the monitoring and processing unit monitors the storage state and implements the elastic buffer read/write control logic.
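The patent does not say how the asynchronous FIFO unit itself is built. A common construction, assumed here rather than taken from the source, passes the read and write pointers between clock domains in Gray code, so that only one bit changes per increment, and derives the empty/full and programmable thresholds from the pointer difference. The Python sketch below shows only that bookkeeping; `ADDR_BITS`, `PROG_EMPTY_LEVEL` and `PROG_FULL_LEVEL` are hypothetical parameters.

```python
ADDR_BITS = 4                          # hypothetical: 16-entry FIFO
DEPTH = 1 << ADDR_BITS
PTR_MASK = (1 << (ADDR_BITS + 1)) - 1  # pointers carry one extra wrap-around bit
PROG_EMPTY_LEVEL = 4                   # hypothetical almost-empty threshold (words)
PROG_FULL_LEVEL = DEPTH - 4            # hypothetical almost-full threshold (words)


def bin_to_gray(b):
    """Binary to reflected Gray code: adjacent values differ in exactly one bit."""
    return b ^ (b >> 1)


def gray_to_bin(g):
    """Inverse of bin_to_gray (XOR of all right shifts of g)."""
    b = 0
    while g:
        b ^= g
        g >>= 1
    return b


def occupancy(wr_ptr, rd_ptr):
    """Words stored, from binary pointers that include the wrap-around bit."""
    return (wr_ptr - rd_ptr) & PTR_MASK


def status(wr_ptr, rd_ptr):
    """Empty/full state flags derived from the current occupancy."""
    level = occupancy(wr_ptr, rd_ptr)
    return {
        "empty": level == 0,
        "prog_empty": level <= PROG_EMPTY_LEVEL,
        "prog_full": level >= PROG_FULL_LEVEL,
        "full": level == DEPTH,
    }


if __name__ == "__main__":
    # A write pointer that has wrapped (35 & PTR_MASK == 3) against a read pointer at 30.
    print(occupancy(35 & PTR_MASK, 30))
    print(status(16, 0))
```

In hardware, each side would pass the other side's Gray-coded pointer through a multi-flip-flop synchronizer and convert it back to binary before computing the level; the sketch leaves synchronization out and shows only the pointer arithmetic.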

The processing of each CDC unit may be described as follows:

The asynchronous FIFO unit receives data from the SerDes or the port TX. Because the clocks on its two sides have a frequency offset, the FIFO eventually raises an empty/full state alarm and generates the corresponding alarm signal; the elastic buffer add/delete monitoring and processing unit monitors these alarm signals and performs the corresponding read or write operation. Specifically, the empty/full state alarm signals generated by the asynchronous FIFO unit include an almost-empty prog_empty indication signal, an empty indication signal, an almost-full prog_full indication signal and a full indication signal; if the CDC of the switching chip is properly designed, the empty and full states never occur.

When the write-side clock of the asynchronous FIFO unit is faster than the read-side clock, the FIFO gradually approaches the prog_full state over time. When the FIFO asserts prog_full, the prog_full alarm elimination module of the monitoring and processing unit detects it, deletes an idle byte according to the characteristics of the communication protocol and triggers one pause of write_en, so that within a certain time the read side drains more of the FIFO than is written and prog_full is cleared. When the write-side clock is slower than the read-side clock, the FIFO gradually approaches the prog_empty state over time. When the FIFO asserts prog_empty, the prog_empty alarm elimination module detects it, adds an idle byte according to the characteristics of the communication protocol and triggers one pause of read_en, so that within a certain time the write side fills more of the FIFO than is read and prog_empty is cleared.
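The add/delete behaviour described above can be sketched as a cycle-level Python model. Several details are assumptions made here rather than taken from the patent: the idle symbol is a placeholder string `IDLE`, the depth (16) and thresholds (prog_empty at 4 words, prog_full at 12 words) are arbitrary, the buffer starts half full of idles, and idles are only added or deleted next to an existing idle so that data words are never disturbed (a stand-in for "according to the characteristics of the communication protocol"). Clock periods are integer time units, so `write_period=999, read_period=1000` models a write clock about 0.1% faster than the read clock.

```python
from collections import deque

IDLE = "IDLE"  # hypothetical idle symbol; real protocols define their own idle/skip codes


class ElasticBuffer:
    """FIFO plus prog_empty/prog_full handling by idle insertion and deletion."""

    def __init__(self, depth=16, prog_empty_level=4, prog_full_level=12):
        self.depth = depth
        self.prog_empty_level = prog_empty_level
        self.prog_full_level = prog_full_level
        self.fifo = deque([IDLE] * (depth // 2))  # assumed start-up: half full of idles
        self.inserted = 0  # idles added on the read side
        self.deleted = 0   # idles dropped on the write side

    def write(self, word):
        """One write-clock cycle carrying `word`."""
        if len(self.fifo) >= self.prog_full_level and word == IDLE:
            self.deleted += 1  # prog_full: delete this idle, i.e. pause write_en once
            return
        if len(self.fifo) >= self.depth:
            raise OverflowError("FIFO full")
        self.fifo.append(word)

    def read(self):
        """One read-clock cycle; returns the word presented downstream."""
        at_idle = bool(self.fifo) and self.fifo[0] == IDLE
        if len(self.fifo) <= self.prog_empty_level and at_idle:
            self.inserted += 1  # prog_empty: add an idle, i.e. pause read_en once
            return IDLE
        if not self.fifo:
            raise RuntimeError("FIFO empty")
        return self.fifo.popleft()


def simulate(write_period, read_period, n_writes, idle_every=8):
    """Process write and read clock edges in time order.

    The hypothetical input stream sends an idle every `idle_every` words and
    numbered data words otherwise; returns the buffer and the data received.
    """
    buf = ElasticBuffer()
    received = []
    t_write = t_read = 0
    writes = seq = 0
    while writes < n_writes:
        if t_write <= t_read:
            if writes % idle_every == 0:
                buf.write(IDLE)
            else:
                buf.write(seq)
                seq += 1
            writes += 1
            t_write += write_period
        else:
            word = buf.read()
            if word != IDLE:
                received.append(word)
            t_read += read_period
    return buf, received


if __name__ == "__main__":
    for wp, rp in ((999, 1000), (1000, 999)):
        buf, rx = simulate(wp, rp, 20000)
        print(f"write={wp} read={rp}: deleted={buf.deleted} "
              f"inserted={buf.inserted} data words received={len(rx)}")
```

With the faster writer the model only deletes idles, and with the faster reader it only inserts them; in both cases every data word arrives once and in order, and the occupancy stays between the thresholds instead of reaching empty or full.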

Compared with the clock domain structure of a general switching chip, the low-latency structure of the embodiment reduces the number of clock domains and moves clock domain crossing to the edge of the chip as far as possible, which enlarges the core logic handled in a single clock domain and allows the clock frequency of the core processing logic to be raised to some extent. The newly added clock domain crossing in the fourth CDC unit adds some latency, but it sits at the edge of the chip, where the latency of the SerDes transceiver channel is dominated by serial-to-parallel and parallel-to-serial conversion and is relatively fixed. Overall, the higher core clock frequency therefore still reduces the switching latency of the chip to some extent.
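The trade-off in the previous paragraph can be put into numbers. The sketch below uses made-up figures (N = 100 cycles of port and core logic, a C = 4 cycle penalty for the added CDC 4, and core clocks of 500 MHz and 625 MHz). It treats the SerDes latency as a fixed term that cancels out, charges the CDC penalty in core-clock cycles for simplicity, and ignores any cycles saved by turning CDC 2 and CDC 3 into synchronous storage, which would if anything favour the proposed structure further. Under those assumptions the added crossing pays off whenever the core clock can be raised above the break-even frequency f_old · (N + C) / N.

```python
def latency_ns(cycles, clock_mhz):
    """Latency of a pipeline segment: cycles x period, where period = 1000 / MHz ns."""
    return cycles * 1000.0 / clock_mhz


def breakeven_clock_mhz(core_cycles, extra_cdc_cycles, old_clock_mhz):
    """Core clock at which the extra CDC cycles cost exactly what the faster clock
    saves: (core_cycles + extra_cdc_cycles) / f_new == core_cycles / f_old."""
    return old_clock_mhz * (core_cycles + extra_cdc_cycles) / core_cycles


if __name__ == "__main__":
    # Hypothetical numbers for illustration only.
    before = latency_ns(100, 500)
    after = latency_ns(100, 625) + latency_ns(4, 625)
    print(f"before: {before:.1f} ns, after: {after:.1f} ns")
    print(f"break-even core clock: {breakeven_clock_mhz(100, 4, 500):.1f} MHz")
```

With these illustrative figures the port-plus-core latency falls from 200 ns to about 166.4 ns, and any core clock above 520 MHz would still come out ahead despite the extra crossing.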

The low-latency switching chip clock domain structure provided by the invention has the following advantages:

1. The invention modifies the clock domain structure of existing switching chips so that the range of the core processing logic is enlarged as far as possible and, where the process allows, the clock frequency of the core processing logic can be raised as far as possible, thereby reducing switching latency as much as possible;

2. Besides reducing switching latency, the invention simplifies the clock domain structure of the whole switching chip: when the switching chip architecture is updated or the process changes, only the SerDes-related clock structure needs to be updated, while the clock structure of the core switching unit's ports and switching logic need not be adjusted. The clock domain structure provided by the invention therefore gives the switching chip architecture a degree of robustness and strong inheritability;

3. The invention is a general-purpose low-latency switching chip clock domain structure that is, in principle, independent of the structure of the core switching unit and applicable to switching chips built on various conventional switch fabrics, such as the currently common CrossBar-based fabrics and combined CrossBar/Central Buffer fabrics, as well as Central-Buffer-based and bus-based fabrics;

4. The invention moves clock domain crossing to the edge of the switching chip, which makes the chip structure clearer, simplifies its module partitioning and reduces dependence on externally integrated third-party SerDes IP.

Finally, it should be noted that the above embodiments are only intended to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art will understand that the technical solutions described in the foregoing embodiments may still be modified, or some of their technical features may be equivalently replaced, and such modifications or substitutions do not cause the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention.
