Data synchronous redundancy system and control method thereof

Document No.: 1963610 Publication date: 2021-12-14

Reading note: This technology, "Data synchronous redundancy system and control method thereof", was designed and created by 邵忠俊, 李彬, 王文伟, 张博, 艾小强 and 张有波 on 2021-09-03. Main content: The invention belongs to redundancy systems and control methods thereof. To solve the technical problem that the fault-tolerant computers currently used in industrial fields are inflexible in use and still cannot guarantee safe and reliable data transmission, a data synchronization redundancy system and a control method thereof are provided. The system adopts a 2-by-2-out-of-2 safety computer structure: two computer systems run simultaneously, one holding the control right and the other the monitoring right. Two channels are provided in each computer system; they use the same hardware design, collect the same data and run the same software. During data transmission, a computer system outputs only when its two channels are consistent; otherwise the control right is handed to the other computer system, which realizes redundancy of the hardware structure. The two computer systems can be synchronized by a combination of signal lamp synchronization and data synchronization.

1. A data synchronization redundancy system, characterized by: comprising two computer systems in communication with each other;

the computer system comprises a safety output board, a service board, a command channel and a monitoring channel which are communicated with each other, a first fault detection board communicated with the command channel and a second fault detection board communicated with the monitoring channel;

the command channel and the monitoring channel adopt the same hardware design, collect the same data and run the same software;

the first fault detection board and the second fault detection board are respectively used for recording fault conditions of the command channel and the monitoring channel and sending the fault conditions to the safety output board;

the safety output board is used for receiving fault conditions sent by the first fault detection board and the second fault detection board and transmitting data with external equipment according to the fault conditions;

the service board is used for switching the state control output of the two computer systems.

2. A data synchronization redundancy system according to claim 1, characterized by: the command channel adopts a Windows operating system, and the monitoring channel adopts a VxWorks operating system.

3. A data synchronization redundancy system according to claim 1 or 2, characterized by: the command channel and the monitoring channel communicate with each other through CCDL or Ethernet.

4. A method of controlling a data synchronous redundant system according to any one of claims 1 to 3, comprising the steps of:

S1, self-checking

S1.1, sending a group of identical preset digital signals through the command channel and the monitoring channel of a computer system, respectively;

S1.2, initial synchronization of the command channel and the monitoring channel

causing the command channel and the monitoring channel to perform double-handshake synchronization, and executing step S1.3 if the waiting time of the two handshakes is less than or equal to a first preset time; otherwise, the first fault detection board and the second fault detection board record an initial synchronization fault, and the corresponding computer system gives up the control right;

S1.3, periodic synchronization of the command channel and the monitoring channel

at the start of each period of the preset digital signal, causing the command channel and the monitoring channel to perform a synchronization handshake, and continuing to transmit data if the waiting time of the two handshakes is less than or equal to a second preset time; otherwise, adding 1 to the periodic synchronization failure counters of the first fault detection board and the second fault detection board, and restarting the period of the preset digital signal;

during the execution of step S1.3, if the value of the periodic synchronization failure counter reaches a preset value, transmission is stopped and the corresponding computer system gives up the control right;

S1.4, data synchronization

S1.4.1, adding synchronization frame information to the data, wherein the synchronization frame information is the period information of the data;

S1.4.2, when data is exchanged between the processor board of the command channel and the processor board of the monitoring channel, comparing the synchronization frame information of the data in the command channel and the monitoring channel, and continuing to execute the periodic task if they are consistent;

otherwise, comparing the synchronization frame numbers corresponding to the synchronization frame information in the command channel and the monitoring channel, replacing the synchronization frame information having the larger synchronization frame number with the synchronization frame information having the smaller synchronization frame number, adding 1 to the data synchronization error counters of the first fault detection board and the second fault detection board, and continuing to transmit data;

or, comparing the synchronization frame numbers corresponding to the synchronization frame information in the command channel and the monitoring channel, causing the channel with the larger synchronization frame number to wait for the channel with the smaller synchronization frame number, and, if the waiting time exceeds three periods, adding 1 to the data synchronization error counters of the first fault detection board and the second fault detection board and restarting the data period;

during the execution of step S1.4.2, if the value of the data synchronization error counter reaches a preset value, data transmission is stopped and the corresponding computer system gives up the control right;

S2, computer system selection

if neither computer system gives up the control right after the self-checking of step S1, selecting one computer system to perform data transmission according to a preset instruction;

if either computer system gives up the control right after the self-checking of step S1, the other computer system performs data transmission;

if both computer systems give up the control right after the self-checking of step S1, both computer systems stop data transmission and are subjected to troubleshooting;

S3, real-time synchronization during operation of the redundancy system

transmitting data to the external device through the computer system determined in step S2, and repeatedly performing steps S1.2 to S1.4 during the transmission.

5. The method for controlling a data synchronization redundancy system according to claim 4, wherein in step S3, the step of transmitting the data to the external device through the computer system determined in step S2 further comprises fault tolerance detection before transmitting to the external device:

the data to be transmitted from the computer system to the external device in step S3 is sent through the CPCI bus of the command channel and the CPCI bus of the monitoring channel, respectively, to the CCQI and CCQI logics in the programmable logic module; after the CCQI and CCQI receive the data of the corresponding channel, the flag information bit of the received data is changed once; when the received-data flag information bits of the CCQI and CCQI are consistent, the synchronization frame information of the data in the CCQI and CCQI is compared, and if the synchronization frame information is consistent, the data contents are compared; if they are not consistent and neither computer system gave up the control right in the self-checking of step S1, data transmission from this computer system to the external device is stopped and data transmission is performed by the other computer system; otherwise, data transmission is stopped and both computer systems are subjected to troubleshooting.

6. The method as claimed in claim 5, wherein comparing the data content specifically comprises: setting a comparison threshold for a miscompare counter of the safety output board; if the data contents are consistent, subtracting 1 from the miscompare counter, and no longer subtracting 1 once the value of the miscompare counter is zero; otherwise, adding 1 to the miscompare counter, and when the value of the miscompare counter is greater than the comparison threshold, the computer system stops data transmission.

Technical Field

The invention belongs to a redundancy system and a control method thereof, and particularly relates to a data synchronization redundancy system and a control method thereof.

Background

Across industrial fields, and especially in high-safety industries such as aerospace, rail transit, banking and power plants, increasingly strict safety requirements are placed on the electronic systems in use. Fault-tolerant computers have emerged in response, replacing dual-computer hot-standby products, and are now widely used in high-safety fields. Although existing fault-tolerant computers offer a degree of reliability and fault tolerance, they are inflexible in use and still cannot guarantee safe and reliable data transmission.

Disclosure of Invention

To solve the technical problem that the fault-tolerant computers currently used in industrial fields are inflexible in use and still cannot guarantee safe and reliable data transmission, the invention provides a data synchronization redundancy system and a control method thereof. Starting from three aspects, namely hardware architecture, operating system and control algorithm, the invention maximizes the redundancy of the system and ensures the reliability of the redundancy system's output.

In order to achieve the purpose, the invention provides the following technical scheme:

A data synchronization redundancy system, characterized by comprising two computer systems in communication with each other;

the computer system comprises a safety output board, a service board, a command channel and a monitoring channel which communicate with each other, a first fault detection board communicating with the command channel, and a second fault detection board communicating with the monitoring channel;

the command channel and the monitoring channel adopt the same hardware design, collect the same data and run the same software;

the first fault detection board and the second fault detection board are respectively used for recording fault conditions of the command channel and the monitoring channel and sending the fault conditions to the safety output board;

the safety output board is used for receiving fault conditions sent by the first fault detection board and the second fault detection board and transmitting data with external equipment according to the fault conditions;

the service board is used for switching the state control output of the two computer systems.

Further, the command channel adopts a Windows operating system, and the monitoring channel adopts a VxWorks operating system.

Further, the command channel and the monitoring channel communicate with each other through CCDL or Ethernet.

The invention also provides a control method of the data synchronization redundancy system, which is characterized by comprising the following steps:

S1, self-checking

S1.1, sending a group of identical preset digital signals through the command channel and the monitoring channel of a computer system, respectively;

S1.2, initial synchronization of the command channel and the monitoring channel

causing the command channel and the monitoring channel to perform double-handshake synchronization, and executing step S1.3 if the waiting time of the two handshakes is less than or equal to a first preset time; otherwise, the first fault detection board and the second fault detection board record an initial synchronization fault, and the corresponding computer system gives up the control right;

S1.3, periodic synchronization of the command channel and the monitoring channel

at the start of each period of the preset digital signal, causing the command channel and the monitoring channel to perform a synchronization handshake, and continuing to transmit data if the waiting time of the two handshakes is less than or equal to a second preset time; otherwise, adding 1 to the periodic synchronization failure counters of the first fault detection board and the second fault detection board, and restarting the period of the preset digital signal;

during the execution of step S1.3, if the value of the periodic synchronization failure counter reaches a preset value, transmission is stopped and the corresponding computer system gives up the control right;

S1.4, data synchronization

S1.4.1, adding synchronization frame information to the data, wherein the synchronization frame information is the period information of the data;

S1.4.2, when data is exchanged between the processor board of the command channel and the processor board of the monitoring channel, comparing the synchronization frame information of the data in the command channel and the monitoring channel, and continuing to execute the periodic task if they are consistent;

otherwise, comparing the synchronization frame numbers corresponding to the synchronization frame information in the command channel and the monitoring channel, replacing the synchronization frame information having the larger synchronization frame number with the synchronization frame information having the smaller synchronization frame number, adding 1 to the data synchronization error counters of the first fault detection board and the second fault detection board, and continuing to transmit data;

or, comparing the synchronization frame numbers corresponding to the synchronization frame information in the command channel and the monitoring channel, causing the channel with the larger synchronization frame number to wait for the channel with the smaller synchronization frame number, and, if the waiting time exceeds three periods, adding 1 to the data synchronization error counters of the first fault detection board and the second fault detection board and restarting the data period;

during the execution of step S1.4.2, if the value of the data synchronization error counter reaches a preset value, data transmission is stopped and the corresponding computer system gives up the control right;

S2, computer system selection

if neither computer system gives up the control right after the self-checking of step S1, selecting one computer system to perform data transmission according to a preset instruction;

if either computer system gives up the control right after the self-checking of step S1, the other computer system performs data transmission;

if both computer systems give up the control right after the self-checking of step S1, both computer systems stop data transmission and are subjected to troubleshooting;

S3, real-time synchronization during operation of the redundancy system

transmitting data to the external device through the computer system determined in step S2, and repeatedly performing steps S1.2 to S1.4 during the transmission.
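By way of illustration only, the periodic synchronization check of step S1.3 and the selection of step S2 might be sketched as follows. This is a minimal sketch under stated assumptions, not the implementation of the invention: the names `PeriodicSync`, `on_period_start` and `select_system`, the use of seconds as the time unit, and the string return values are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PeriodicSync:
    """Hypothetical model of the S1.3 periodic synchronization failure counter."""
    max_wait: float       # the "second preset time" (assumed to be in seconds)
    max_failures: int     # the "preset value" of the failure counter
    failures: int = 0
    has_control: bool = True

    def on_period_start(self, handshake_wait: float) -> str:
        """Evaluate one period, given the measured wait of the two handshakes."""
        if not self.has_control:
            return "stopped"
        if handshake_wait <= self.max_wait:
            return "transmit"            # handshake in time: keep transmitting
        self.failures += 1               # handshake too slow: count a failure
        if self.failures >= self.max_failures:
            self.has_control = False     # counter reached the preset value
            return "stopped"
        return "restart_period"


def select_system(i_gave_up: bool, ii_gave_up: bool,
                  preferred: str = "I") -> Optional[str]:
    """Step S2: return "I", "II", or None when both systems gave up control."""
    if i_gave_up and ii_gave_up:
        return None                      # stop transmission and troubleshoot
    if i_gave_up:
        return "II"
    if ii_gave_up:
        return "I"
    return preferred                     # the "preset instruction" (assumed)
```

For example, with `max_failures=2` a first late handshake restarts the period and a second one makes the computer system give up the control right, after which `select_system` hands transmission to the other computer system.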

Further, in step S3, the transmitting data to the external device through the computer system determined in step S2 further includes fault tolerance detection before transmitting to the external device:

the data to be transmitted from the computer system to the external device in step S3 is sent through the CPCI bus of the command channel and the CPCI bus of the monitoring channel, respectively, to the CCQI and CCQI logics in the programmable logic module; after the CCQI and CCQI receive the data of the corresponding channel, the flag information bit of the received data is changed once; when the received-data flag information bits of the CCQI and CCQI are consistent, the synchronization frame information of the data in the CCQI and CCQI is compared, and if the synchronization frame information is consistent, the data contents are compared; if they are not consistent and neither computer system gave up the control right in the self-checking of step S1, data transmission from this computer system to the external device is stopped and data transmission is performed by the other computer system; otherwise, data transmission is stopped and both computer systems are subjected to troubleshooting.

Further, comparing the data content specifically comprises: setting a comparison threshold for a comparison error counter of the safety output board; if the data contents are consistent, subtracting 1 from the comparison error counter, and no longer subtracting 1 once the value of the comparison error counter is zero; otherwise, adding 1 to the comparison error counter, and when the value of the comparison error counter is greater than the comparison threshold, the computer system stops data transmission.
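The comparison error counter described above behaves like a saturating up/down counter. The following is a minimal sketch of that behavior; the class name `MiscompareCounter` and the boolean return convention are assumptions made for illustration, and in the invention the counter resides in the safety output board rather than in software of this form.

```python
class MiscompareCounter:
    """Hypothetical software model of the comparison error counter."""

    def __init__(self, threshold: int) -> None:
        self.threshold = threshold   # the "comparison threshold"
        self.value = 0

    def update(self, contents_match: bool) -> bool:
        """Record one comparison; return False once transmission must stop."""
        if contents_match:
            if self.value > 0:       # never count below zero
                self.value -= 1
        else:
            self.value += 1
        return self.value <= self.threshold
```

With a threshold of 2, isolated mismatches are absorbed and decay again on subsequent matches; transmission stops only when mismatches accumulate faster than matches and the counter exceeds 2.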

Compared with the prior art, the invention has the beneficial effects that:

1. The data synchronization redundancy system adopts a 2-by-2-out-of-2 safety computer structure, which provides the necessary hardware platform for the system's high reliability requirements. The two computer systems run simultaneously, one holding the control right and the other the monitoring right; if one computer system fails, the other takes over control.

Two channels are provided in each computer system. The two channels use the same hardware design, collect the same data and run the same software. During data transmission, a computer system outputs only when its two channels are consistent; otherwise the control right is handed to the other computer system, which realizes redundancy of the hardware structure.

2. The command channel adopts a Windows operating system and the monitoring channel adopts a VxWorks operating system. Two different operating systems process the same signals and verify each other, which further improves the reliability and safety of the system and minimizes faults of a single computer system caused by unstable factors.

3. Before formal data transmission, the two channels in a computer system are self-checked, and synchronization detection continues in real time during data transmission. The two channels achieve task synchronization through signal lamp synchronization and data synchronization, and synchronization between the channels is realized by a double-handshake synchronization algorithm that is mainly software-based and combines software and hardware, which makes the comparison of sampled data more effective.

4. When data is transmitted to the external device through the safety output board, the validity of the synchronization frame information and of the data content is compared once more, further ensuring the validity of the data.

5. Fault-tolerant logic is added to the comparison of data contents, so that within the allowable range the equipment does not perform unnecessary output switching or raise false fault reports because of internal or external disturbances.

Drawings

FIG. 1 is a schematic diagram of an embodiment of a data synchronization redundancy system according to the present invention;

FIG. 2 is a diagram of a software system architecture for a command channel and a monitor channel in accordance with the present invention;

FIG. 3 is a diagram of the signal lamp synchronization mechanism employed in the present invention;

FIG. 4 is a diagram illustrating the configuration of the fault registers of the first fault detection board and the second fault detection board of FIG. 1.

Detailed Description

The technical solution of the present invention will be clearly and completely described below with reference to the embodiments of the present invention and the accompanying drawings, and it is obvious that the described embodiments do not limit the present invention.

The design of the data synchronization redundancy system rests on two main considerations: structural redundancy and information redundancy. In addition, to handle low-probability random faults in the system hardware or software, the invention adds supplementary fault-tolerance techniques. The recovery strategy combines forward recovery and backward recovery to restore the system to a consistent, correct state and to compensate for inconsistencies in the current state. The system can identify and assess such faults and disregard them while still ensuring correct output, which further improves the reliability of the system.

Architecturally, the data synchronization redundancy system adopts a 2-by-2-out-of-2 structure. Two computer systems, the I-end computer system and the II-end computer system, run simultaneously and communicate with each other. One computer system holds the control right and the other holds only the monitoring right; if the I-end computer system fails, the II-end computer system obtains the control right. Each of the two computer systems comprises two channels, namely a command channel and a monitoring channel, and further comprises a safety output board, a first fault detection board communicating with the command channel and a second fault detection board communicating with the monitoring channel. The two channels use the same hardware design, collect the same data and run the same software. The processor module in each channel accesses the interface modules through the local CPCI bus, and the modules are installed as plug-in boards. The two channels can communicate through CCDL or Ethernet. In this dual-redundancy design, the input signals and the operation results of the two channels are each compared synchronously, and the computer system outputs only when the two operation results are identical. Otherwise, the control right is handed over and the other computer system, which has the same structure, outputs its result, while the local system raises an alarm signal and immediately outputs the safety state. The first fault detection board and the second fault detection board record the fault conditions of the command channel and the monitoring channel, respectively, and send them to the safety output board. The safety output board receives the fault conditions sent by the first fault detection board and the second fault detection board and exchanges data with the external device according to those fault conditions.

To further improve system reliability, the command channel adopts a Windows operating system and the monitoring channel adopts a VxWorks operating system. Two different operating systems process the same signals and verify each other, which further improves the reliability and safety of the system.

Referring to FIG. 1, the system of the present invention is described in detail by taking a data synchronization redundancy system for vehicle-mounted equipment as an example. The main functions of the processor module are: (1) reading and writing the data of each board through the CPCI bus interface; (2) exchanging and comparing data between boards, obtaining the data of the other processor module in the same computer system through Ethernet, and monitoring input and output data; (3) synchronizing the processing flow; (4) other functions: detecting power supply overvoltage and overcurrent, monitoring the temperature of sensitive chips, and performing the corresponding operations according to the monitoring data; scanning the current working state at fixed intervals; providing a reset button that resets all circuit boards in the channel; and providing status indicator lamps so that the state of the equipment (startup, fault and so on) can be read quickly and conveniently.

The first fault detection board and the second fault detection board are the hardware basis for synchronizing the two channels; they generate the synchronization clock, monitor states, record data and manage power supplies. The service board is mainly responsible for detecting the health states of the two channels and for storing service information, fault diagnosis and error information and the like; it also handles communication between the two computer systems and controls output switching according to the states of the computer systems. The safety output board performs output signal conditioning, output channel self-check, channel state indication and output signal voting. The digital input board performs channel self-check, reads and stores input switching-value signals, and indicates channel state. The MVB board connects to the MVB bus and converts physical layer signals; protocol conversion is performed by communication protocol software running on the MVB board, which enables cross-bus data transmission. The MVB board has two MVB bus interfaces that can be redundant with each other. Its bus controller serves as the interface unit between the device circuitry and the physical layer, supports the process data, message data, monitoring data and bus management functions specified in the IEC 61375-1 standard, and provides a maintenance and data transmission interface. The MVB board also has an Ethernet module supporting remote access, a real-time clock, power management and hardware watchdog modules, and a JTAG emulator debugging interface. The communication board receives serial port data and Ethernet data from peripheral equipment and sends serial port data and Ethernet data to peripheral equipment. The safety output board is connected to the vehicle through the vehicle input and output interface to transmit information.

The processor module sends or reads back the fault information and working state of each board through the CPCI bus. Initial synchronization and periodic synchronization fault information, inter-channel data synchronization fault information, data monitoring fault information, channel self-check fault information and the like are detected by the processor module and written to the corresponding fault register of the module. Watchdog overflow fault information, channel safety output comparison faults and the channel fault logic information output by the module are generated by logic circuits and recorded directly in the fault register. The processor module reads the working state of the channel, including fault information and channel health information, from the fault status register of the module.
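A fault register of this kind is commonly modeled as a set of bit flags. The sketch below illustrates the idea only: the flag names follow the fault types listed above, but the bit positions, the `Fault` enumeration and the `channel_healthy` helper are assumptions, since the actual register layout is the one shown in FIG. 4.

```python
from enum import IntFlag, auto


class Fault(IntFlag):
    """Hypothetical bit layout; the real layout is defined by FIG. 4."""
    # Detected by the processor module and written to the register
    INITIAL_SYNC = auto()
    PERIODIC_SYNC = auto()
    DATA_SYNC = auto()
    DATA_MONITORING = auto()
    CHANNEL_SELF_CHECK = auto()
    # Generated by logic circuits and recorded directly
    WATCHDOG_OVERFLOW = auto()
    OUTPUT_COMPARE = auto()


def channel_healthy(register: Fault) -> bool:
    """A channel is treated as healthy only when no fault bit is set."""
    return register == Fault(0)
```

A register value can then be built up with `reg |= Fault.PERIODIC_SYNC` and queried with `Fault.PERIODIC_SYNC in reg`.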

After the system starts, the I-end computer system is set as the master and the II-end computer system as the monitor (the I end and the II end are logical designations set in advance by the controller). The two channels in each computer system work simultaneously: they perform synchronization, read input signals, exchange input data, compare input data, compute, exchange output results and compare output results, and finally the master computer system outputs the control command.

If the comparison results of the two channels of the I-end computer system are consistent, the I-end computer system has no fault in this working period, and this signal is sent to the safety output board as one input of the channel fault logic, where it takes part in the fault logic operation. If the other parts of the I-end computer system (power supply monitoring, self-test, watchdog and so on) are also working normally, the I-end computer system is in a healthy state and outputs the control command to the external device. Meanwhile, the II-end computer system performs the same work as the I-end computer system but does not output its result.

If the comparison results of the two channels of the I-end computer system are inconsistent, or other faults exist, the I-end computer system outputs the safety state, sends a local fault signal to the II-end computer system through the service board and raises an alarm signal. After the II-end computer system receives the fault signal of the I-end computer system, if the II-end computer system is in a healthy state, it takes over the master control right, which meets the requirement of continued operation after a first fault; otherwise, both computer systems output the safety state and raise an alarm signal, which meets the requirement of fail-safe behavior after a second fault.
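The takeover logic of the two preceding paragraphs can be summarized in a short sketch. It is illustrative only: `EndStatus`, `Decision` and `arbitrate` are hypothetical names, the health inputs are limited to those named in the text, and the case in which only the II-end computer system is faulty is not described in the source, so the sketch assumes the I end simply keeps control without an alarm.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class EndStatus:
    """Health inputs of one computer system (hypothetical grouping)."""
    comparison_ok: bool          # the two channels' results agree
    power_ok: bool = True
    self_test_ok: bool = True
    watchdog_ok: bool = True

    @property
    def healthy(self) -> bool:
        return (self.comparison_ok and self.power_ok
                and self.self_test_ok and self.watchdog_ok)


@dataclass
class Decision:
    controller: Optional[str]    # "I", "II", or None (both in safety state)
    alarm: bool


def arbitrate(end_i: EndStatus, end_ii: EndStatus) -> Decision:
    if end_i.healthy:
        # Assumption: a fault of the II end alone raises no alarm here.
        return Decision("I", alarm=False)
    if end_ii.healthy:
        return Decision("II", alarm=True)   # first fault: II takes over
    return Decision(None, alarm=True)       # second fault: safety state
```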

The core architecture of the two computer systems follows the hardware structure of the product and is kept fully consistent. The software system structure of the command channel and the monitoring channel is shown in FIG. 2. The two channels are controlled synchronously and confirm with each other by communication throughout the process. The overall synchronization control method is as follows. Initial synchronization is performed first and checked for correctness; if it is correct, periodic synchronization is performed and checked for correctness. If periodic synchronization is correct, the input data is handled: because the two channels require consistent input data (without an exchange, an occasional transient problem in one channel would leave the two channels with inconsistent input data), each channel performs the comparison within its own channel, sends the comparison result to the other channel, and compares the other channel's result within its own channel. The input data is then exchanged and data synchronization is performed, checking whether the synchronization frame is correct and whether the data is consistent. If so, the output data is calculated and exchanged. If any check in this process fails, the corresponding synchronization fault handling is performed, and the cause of the fault can be investigated by manual intervention. If all checks pass, the data is output. At output time, the correctness of the synchronization frame and the consistency of the output data are checked again: if the synchronization frame is incorrect, synchronization fault handling is performed; if the data is inconsistent, data monitoring fault handling is performed; and if both are correct, the information is sent to the safety output board, which sends it to the external device.

Based on the idea of information redundancy (extra information bits added to the data are used to detect or correct errors in information during operation or transmission, thereby achieving fault tolerance), synchronization between channels is realized by a double-handshake synchronization algorithm that is mainly software-based and combines software and hardware. The two synchronization passes ensure that the working flows of the two channels in a computer system stay synchronized. Consistency of the two channels' data is achieved through the data synchronization flow: the processor modules of the two channels add periodic frame information (the period information of the data) at every data exchange, which ensures that, after losing synchronization, a channel resynchronizes with the other channel in the same period.
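As an illustration of the data synchronization flow, the first resynchronization option of step S1.4.2 (replace the larger synchronization frame number with the smaller one and count an error) can be sketched as below. The function name `reconcile_sync_frames` and the representation of the periodic frame information as an integer frame number are assumptions made for the example.

```python
from typing import Tuple


def reconcile_sync_frames(cmd_frame: int, mon_frame: int,
                          error_count: int) -> Tuple[int, int, int]:
    """Return (cmd_frame, mon_frame, error_count) after one data exchange."""
    if cmd_frame == mon_frame:
        # Frames agree: continue the periodic task unchanged.
        return cmd_frame, mon_frame, error_count
    # Frames differ: fall back to the smaller frame number on both
    # channels and add 1 to the data synchronization error counter.
    smaller = min(cmd_frame, mon_frame)
    return smaller, smaller, error_count + 1
```

The caller would compare the returned error count against the preset value and give up the control right once it is reached.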

The synchronization of the two computer systems in the application can combine signal lamp synchronization and data synchronization. Signal lamp synchronization: the two computer systems synchronize by lighting and extinguishing each other's signal lamps in a crossed manner, that is, by "setting" and "resetting" a specific signal storage unit. This method achieves task cycle synchronization; it is the most widely applied synchronization technology and is also called "signal handshake". The synchronization mechanism is shown in fig. 3. Data synchronization: a synchronization frame is added whenever a data frame is transmitted, which solves the problem of resynchronization after the system falls out of step.

The two channels in a computer system achieve task synchronization by means of signal lamp synchronization (comprising initial synchronization and periodic synchronization) and data synchronization. Synchronization between the channels is realized by the software-led, combined software/hardware double-handshake synchronization algorithm.

Synchronization between the channels is completed by a hardware timer combined with the signal lamps, under the management of the system synchronization program. As described above, each side lights and extinguishes the other side's signal lamp, that is, sets and resets a specific signal storage unit, which synchronizes the task period. Within a computer system, signal lamp synchronization is implemented in software by performing "set" and "reset" operations on the corresponding registers in the first fault detection board and the second fault detection board of the two channels and comparing the results.

The specific synchronous control method comprises the following steps:

1. Self-test

Before synchronous detection is carried out on formally transmitted data, the two channels are self-checked: the same group of preset digital signals is sent through the command channel and through the monitoring channel of the computer system, and the following checks are made:

(1) Initial synchronization of the command channel and the monitoring channel

The command channel and the monitoring channel perform double-handshake synchronization. If the waiting time of the two handshakes is less than or equal to 2 s, periodic synchronization is executed next; otherwise, the first fault detection board and the second fault detection board record an initial synchronization fault, and the corresponding computer system gives up the control right.

(2) Periodic synchronization of the command channel and the monitoring channel

At the start of each period of the preset digital signal, the command channel and the monitoring channel perform a synchronous handshake. If the waiting time of the two handshakes is less than or equal to 200 μs, data transmission continues; otherwise, the periodic synchronization failure counters of the first fault detection board and the second fault detection board are incremented by 1, and the period of the preset digital signal is restarted.

During periodic synchronization, if the value of the periodic synchronization failure counter reaches a preset value, transmission is stopped and the corresponding computer system gives up the control right.

(3) Data synchronization

Synchronous frame information, which is the period information of the data, is added to the data. Each time data are exchanged between the processor board of the command channel and the processor board of the monitoring channel, the synchronous frame information of the data in the two channels is compared. If it is consistent, the periodic task continues.

Otherwise, the synchronous frame numbers corresponding to the synchronous frame information in the command channel and the monitoring channel are compared, the synchronous frame information with the larger synchronous frame number is replaced with the synchronous frame information with the smaller synchronous frame number, the data synchronization error counters of the first fault detection board and the second fault detection board are incremented by 1, and data transmission continues.

Alternatively, the synchronous frame numbers in the command channel and the monitoring channel are compared and the channel with the larger synchronous frame number waits for the channel with the smaller one; if the waiting time exceeds three periods, the data synchronization error counters of the first fault detection board and the second fault detection board are incremented by 1, and the data period is restarted.

During data synchronization, if the value of the data synchronization error counter reaches a preset value, data transmission is stopped and the corresponding computer system gives up the control right.

2. Computer system selection

The switching of the output control right between the two computer systems is determined by three signals: the externally given control right, the state of the I-side computer system and the state of the II-side computer system. If neither the I-side nor the II-side computer system has a fault, the externally set control right determines which computer system is the output (master) end. If one of the two computer systems has a fault and the other is normal, the output is taken from the computer system without the fault, regardless of which computer system the external control right designates. When the master computer system has a fault, the system immediately gives an alarm signal.

3. Real-time synchronization in redundant system operation

Data are transmitted to the external equipment through the computer system determined in step 2, and steps 1 and 2 are executed repeatedly during transmission so that the formally transmitted data remain synchronized:

Initial synchronization: after the system finishes initialization and power-on self-test and before it enters the periodic task, the two channels of the computer system must be initially synchronized. The maximum waiting time of the two handshakes of the initial synchronization is 2 seconds, and the watchdog timer must be cleared during this process. If the initial synchronization fails, the software no longer executes the periodic task, an initial synchronization fault is recorded in the fault registers of the first fault detection board and the second fault detection board, and the control right of the system is handed over.

Periodic synchronization: at the start of each working period, the two channels must perform a synchronous handshake, with a maximum allowable waiting time of 200 microseconds. After the input data and the calculation result of the other side are read, the consistency of the synchronous frames is compared. The watchdog timer is not cleared during periodic synchronization. If the periodic synchronization of the two channels fails, the periodic synchronization failure counter is incremented by 1 and the task period is restarted. If synchronization fails 10 times, the periodic task is no longer executed, a periodic synchronization fault is recorded in the fault register, and the control right of the system is handed over.

The difference between the two synchronization modes is as follows. Initial synchronization solves the problem of starting the two machines in step, and the maximum waiting time of its two handshakes is 2 seconds. Periodic synchronization solves the synchronization problem of the application task; its maximum allowable waiting time is 200 microseconds, and an error is reported once the fault register has recorded 10 failures. The repetition counts and waiting times are reference values based on design experience and can be adjusted according to user requirements.
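The timeout and retry rules above can be sketched in a few lines of Python. This is only an illustrative model, not the patented implementation: the class `SyncMonitor`, its method names and the returned action strings are hypothetical, handshake waiting times are passed in as plain numbers instead of being measured by a hardware timer, and the failure counter is assumed to be cumulative (the text does not say whether it is ever reset).

```python
from dataclasses import dataclass
from typing import Optional

INITIAL_SYNC_TIMEOUT_S = 2.0      # maximum wait for the two initial handshakes
PERIODIC_SYNC_TIMEOUT_S = 200e-6  # maximum wait at the start of each period
MAX_PERIODIC_FAILURES = 10        # failures tolerated before control is handed over


@dataclass
class SyncMonitor:
    """Hypothetical helper that tracks handshake outcomes for one computer system."""
    periodic_failures: int = 0
    fault: Optional[str] = None  # name of the recorded fault, if any

    def initial_sync(self, wait_s: float) -> bool:
        """Return True if the initial handshakes finished within the limit."""
        if wait_s <= INITIAL_SYNC_TIMEOUT_S:
            return True
        self.fault = "initial_sync"  # recorded in the fault register
        return False

    def periodic_sync(self, wait_s: float) -> str:
        """Return the action to take for this task period."""
        if self.fault is not None:
            return "release_control"
        if wait_s <= PERIODIC_SYNC_TIMEOUT_S:
            return "continue"
        self.periodic_failures += 1  # assumption: counter is never reset
        if self.periodic_failures >= MAX_PERIODIC_FAILURES:
            self.fault = "periodic_sync"
            return "release_control"
        return "restart_period"
```

Because the limits are reference values, they are kept as module-level constants so that a user could adjust them without touching the logic.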

Data synchronization means that the processor boards of the two channels add synchronous frame information (the synchronous frame is the period information of the data) to every data exchange, so that a channel can resynchronize with the other channel within the same period after the system falls out of synchronization. When a data synchronization error occurs (that is, a synchronous frame comparison error), either of the following two methods can be used to handle the out-of-step condition: (1) the larger of the two synchronous frame numbers is forcibly changed to the smaller one, the data synchronization error counter is incremented by 1, and the task continues; (2) the channel with the larger periodic frame waits for the channel with the smaller periodic frame, and if synchronization still fails after waiting 3 periods, the data synchronization error counter is incremented by 1 and the task period is restarted. If synchronization fails 10 times, the periodic task is no longer executed, a data synchronization fault is recorded in the fault register, and the control right of the system is handed over. The two handling methods for an out-of-step synchronous frame can be modified according to user requirements.
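The two recovery methods can be modelled as follows. This is a minimal sketch under stated assumptions: the function names are hypothetical, synchronous frame numbers are treated as plain integers, and in method (2) the frame numbers later reported by the lagging channel are supplied as a list, one entry per period.

```python
MAX_WAIT_PERIODS = 3        # periods the leading channel waits in method (2)
MAX_DATA_SYNC_ERRORS = 10   # errors tolerated before control is handed over


def force_smaller(cmd_frame, mon_frame, errors):
    """Method (1): align both channels on the smaller frame number."""
    if cmd_frame == mon_frame:
        return cmd_frame, mon_frame, errors
    smaller = min(cmd_frame, mon_frame)
    return smaller, smaller, errors + 1


def wait_for_lagging(leading_frame, lagging_frames, errors):
    """Method (2): the leading channel waits up to MAX_WAIT_PERIODS periods.

    lagging_frames holds the frame number seen on the slower channel in
    each successive period (hypothetical input used for illustration).
    """
    for frame in lagging_frames[:MAX_WAIT_PERIODS]:
        if frame == leading_frame:
            return "continue", errors
    return "restart_period", errors + 1


def must_release_control(errors):
    """True once the data synchronization error counter reaches its limit."""
    return errors >= MAX_DATA_SYNC_ERRORS
```

For example, `force_smaller(7, 5, 0)` aligns both channels on frame 5 and raises the error count to 1, while `wait_for_lagging(7, [5, 6, 7], 0)` lets the slower channel catch up within three periods without counting an error.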

As shown in fig. 4, the fault registers in the first fault detection board and the second fault detection board of the present invention can each record an initial synchronization fault, a periodic synchronization fault, a data monitoring fault, a self-checking fault (BIT fault of each board card in the computer system), a power monitoring fault (overvoltage or overcurrent of the power signal), a watchdog overflow and a safety comparison fault. Each fault is recorded, held temporarily in a latch, and then used to drive the corresponding channel fault logic output. When any fault input exceeds the preset value, the control right of the computer system can be given up and the system is disconnected from the outside. Among all fault inputs, the power monitoring fault has the highest priority: when the power supply fails, an early warning is needed so that the processor module has time to store the current state information.
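A latched fault register of this kind can be sketched as a set of bit flags. The bit positions, the class `FaultRegister` and its property names are assumptions made for illustration; the source only lists the fault types and states that the power monitoring fault has the highest priority. The sketch also assumes that a fault is latched only after its own threshold has already been exceeded.

```python
from enum import IntFlag


class Fault(IntFlag):
    # Bit assignments are hypothetical; fig. 4 is not reproduced here.
    INITIAL_SYNC = 1
    PERIODIC_SYNC = 2
    DATA_MONITORING = 4
    SELF_CHECK = 8
    POWER_MONITORING = 16
    WATCHDOG_OVERFLOW = 32
    SAFETY_COMPARISON = 64


class FaultRegister:
    """Latches faults: once recorded, a fault stays set."""

    def __init__(self):
        self._latched = Fault(0)

    def record(self, fault: Fault) -> None:
        self._latched |= fault

    @property
    def channel_fault(self) -> bool:
        """Channel fault logic output: true if any fault is latched."""
        return bool(self._latched)

    @property
    def power_warning(self) -> bool:
        """Highest-priority check: the power fault is handled first."""
        return bool(self._latched & Fault.POWER_MONITORING)
```

A caller would test `power_warning` before anything else so that the processor module can save its state, and only then act on `channel_fault`.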

The hardware redundancy structure and the software redundancy structure of the system work together so that the system operates with high reliability. However, this high reliability makes the product extremely "sensitive" to the input signal and to the product itself, so transient errors in the output of the product are very likely. To eliminate this "sensitivity", a software fault-tolerance technique, the fault-tolerant counter, is added to the safety output board of each computer system, which further increases the reliability of the product.

The synchronized data of the two channels of a computer system enter the CCQI and CCQII logic in the programmable logic module through their respective CPCI buses. After CCQI and CCQII receive the periodic frame and the valid data of the corresponding channel, the received-data flag bit changes from '0' to '1' (detected from the write signal) or from '1' to '0', which indicates that the logic has received the data. When the received-data flag bits of the two channels are consistent, that is, both '1' or both '0' (compared in the data flag register), the periodic frames are compared (in the periodic frame information comparison register). If the two periodic frames are the same, the two groups of data belong to the same moment; if they differ, an error is considered to have occurred and the corresponding counter is incremented by 1. Once it is judged that no data have been lost, the data of the two channels are compared in the CPU output data comparison register. If the number of errors is below the allowable value and the output permission signal is valid, the control channel keeps outputting; otherwise, the channel switches to outputting the safe state and control automatically passes to the other computer system. When the data are consistent, the counter is decremented by 1 until it reaches 0; when the data are inconsistent, the counter is incremented by 1. Once the counter exceeds the allowable value, the channel stops outputting and a fault signal is fed back to the first fault detection board and the second fault detection board; the processor module, after acquiring this information, informs the other computer system and switches the control right, thereby realizing the fault-tolerant strategy.
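The fault-tolerant counter behaves like a leaky error counter: mismatches raise it, matches lower it back toward zero, and output stops only when it exceeds the allowable value. A minimal sketch follows; the class name and the `allowed` parameter are hypothetical, and the real logic runs in the programmable logic module, not in software of this form.

```python
class FaultTolerantCounter:
    """Tolerates transient mismatches between the two channels' output data."""

    def __init__(self, allowed: int):
        self.allowed = allowed        # allowable value (user-defined)
        self.count = 0
        self.output_enabled = True

    def update(self, data_match: bool) -> bool:
        """Feed one comparison result; return whether output stays enabled."""
        if not self.output_enabled:
            return False              # output stays off once stopped
        if data_match:
            self.count = max(0, self.count - 1)   # decrement down to 0
        else:
            self.count += 1
            if self.count > self.allowed:
                # Channel stops outputting; a fault signal would be fed
                # back to the fault detection boards at this point.
                self.output_enabled = False
        return self.output_enabled
```

With `allowed=2`, an isolated mismatch followed by matching data leaves the output enabled, whereas three consecutive mismatches disable it.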

The inputs of the channel fault logic are discrete quantities. To improve reliability, each is encoded with two bits: bit[1:0] = 10 represents logic 0, and bit[1:0] = 01 represents logic 1.
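The benefit of the two-bit encoding is that a stuck line (both bits 0 or both bits 1) can never be mistaken for a valid level. A minimal decoding sketch follows; the function name is hypothetical, and returning `None` for the two unused codes is an assumption, since the source does not say how invalid codes are handled.

```python
def decode_discrete(bits):
    """Decode a two-bit discrete input: 0b10 -> logic 0, 0b01 -> logic 1.

    Returns None for the unused codes 0b00 and 0b11 (assumed to be
    treated as invalid input).
    """
    if bits == 0b10:
        return 0
    if bits == 0b01:
        return 1
    return None
```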

The following table is a truth table for control switching logic.

TABLE 1 truth table of control power switching logic

Notes: SysErr = 1 indicates a fault of this computer system, raised when the channel fault logic of either of its channels reports a fault; AnSysErr = 1 indicates a fault of the other computer system; MasSel = 1 indicates that the external control right input designates this system as the control end; OutEN = 1 indicates that the output of this system is valid. All 0 and 1 values in the table are logical 0 and 1.
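The switching rule stated in step 2 can be expressed as a small function of the three inputs named in the notes. This sketch follows the prose rule only and is not a transcription of Table 1; the function name is hypothetical, and the result for a faulty system (including the case where both systems are faulty) is assumed to be 0, consistent with the output being cut off whenever the system itself has failed.

```python
def out_en(sys_err, an_sys_err, mas_sel):
    """Return OutEN (1 = this system's output is valid) for one computer system."""
    if sys_err:
        return 0          # a faulty system never drives the output
    if an_sys_err:
        return 1          # the healthy system outputs regardless of MasSel
    return 1 if mas_sel else 0   # both healthy: external control right decides
```

For instance, `out_en(0, 1, 0)` evaluates to 1: this system takes over the output even though the external control right designated the other, faulty system.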

When the computer system needs to output the safe state, the communication path between the computer system and the external signal must be disconnected. When the computer system has failed (that is, the output permission signal is invalid), the relay is opened and the path between internal data and external signals is cut off; otherwise, the relay is closed and data are output.

The above description is only an embodiment of the present invention, and is not intended to limit the scope of the present invention, and all equivalent structural changes made by using the contents of the present specification and the drawings, or applied directly or indirectly to other related technical fields, are included in the scope of the present invention.
