
  1. (Department of Electronics Engineering)
  2. (Department of Information and Communications Engineering, Sun Moon University, 70, Sunmoon-ro 221 beon-gil, Tangjeong-myeon, Asan, Chungnam 31460, Korea)



Clock synchronization, white rabbit, mixed-mode clock manager, synchronous ethernet, FPGA, 1000BASE-T

I. INTRODUCTION

Clock synchronization has been extensively studied to meet the timing requirements of networked and distributed systems in applications such as telecommunications, measurement and control, and high-energy physics. To achieve higher accuracy, IEEE 1588-2008 (1) standardized the Precision Time Protocol (PTP), which replaces the Network Time Protocol (NTP) where sub-microsecond accuracy is required. In 2009, the European Organization for Nuclear Research (CERN) proposed White Rabbit (WR) (2) to provide sub-nanosecond accuracy for synchronizing particle accelerator equipment. Since then, various applications (3-5), including accelerators, synchrotrons, neutrino detectors, cosmic ray detectors, and national time laboratories, have sought to exploit the performance of WR.

A number of studies on WR implementation have been reported in recent years. Some notable works are summarized as follows. The National Institute of Standards and Technology (NIST) evaluated WR-based time and frequency transfer within its own campus and verified the calibration procedure (6), with the aim of improving the accuracy and precision of a real-time realization of Coordinated Universal Time (UTC). Universidad de Granada evaluated the influence of the system frequency on the overall clock synchronization accuracy (7). The authors modified the WR PTP core (WRPC) and the Ethernet physical layer to implement a programmable-frequency solution on a WR-LEN board. The paper shows that the higher system frequency of 250 MHz provides a slightly better synchronization accuracy of 97.619 ps (peak-to-peak clock skew) and double the link bandwidth. National Instruments presented an improvement of IEEE 1588 synchronization accuracy in 1000BASE-T systems (8). Some of the clock-domain crossing errors were compensated by using a digital dual-mixer time difference (DDMTD) circuit, based on a better understanding of the clock relationships in physical layer transceivers. Note that this is the only attempt reported so far to achieve sub-nanosecond accuracy over copper media. Although the test results show a high accuracy of 460 ps peak-to-peak, frequency synchronization, which is one of the main ideas of WR, was not implemented. Sun Moon University presented a nanosecond-accuracy clock synchronization circuit for IEEE 1588 using tapped delay lines (9). Experimental results show that the two nodes share synchronized timing with an error between -0.74 ns and 0.89 ns.

This paper presents a White Rabbit implementation over copper media (1000BASE-T) using mixed-mode clock managers (MMCMs) and a frequency transfer technique based on Synchronous Ethernet (SyncE) (10). Frequency synchronization is achieved by using clock signals generated from the phase-locked loop (PLL) in the physical layer transceiver. The clock synthesis circuit implemented in a Xilinx 7 series FPGA replaces the voltage-controlled crystal oscillator (VCXO) and digital-to-analog converter (DAC) components outside the FPGA, which makes the system simpler and lower in cost. Measurement results show that a clock synchronization accuracy better than 100 ps has been achieved.

The rest of this paper is organized as follows. Section II presents the architecture and functions of the proposed clock synchronization circuit for 1000BASE-T White Rabbit. Section III describes in detail a clock synthesis circuit using adders and an MMCM implemented in a Xilinx FPGA. Section IV explains the clock relationships among the master and slave devices and the 1000BASE-T transceivers used to transfer the reference frequency. Section V shows the test setup and measurement results. Section VI concludes the paper.

II. ARCHITECTURE OF THE PROPOSED 1000BASE-T WHITE RABBIT CIRCUIT

White Rabbit is one of the most advanced clock synchronization technologies, providing large distributed systems with sub-nanosecond accuracy. As mentioned earlier, CERN proposed WR in 2009 in order to synchronize thousands of devices in sub-atomic particle acceleration facilities distributed over 20 km with very small time error. WR uses SyncE to lock the clock frequencies at distant nodes and the PTP of IEEE 1588 to share timing information. A two-way exchange of PTP messages allows precise adjustment of the clock phase and offset. The link delay is known precisely from accurate hardware timestamps and the calculation of the delay asymmetry. A DDMTD using two digital mixers with the same offset clock measures the frequency and phase of the Ethernet media-dependent interface (MDI) clock in relation to the PTP clock domain.
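As a minimal sketch of the two-way exchange described above, the following Python fragment computes the slave offset and the one-way delay from four timestamps. The names `PtpTimestamps`, `offset_and_delay`, and `asymmetry_ps` are illustrative assumptions and are not taken from WRPC; the relations are the usual IEEE 1588 delay request-response formulas, with a single asymmetry term standing in for the calibrated link asymmetry.

```python
from dataclasses import dataclass


@dataclass
class PtpTimestamps:
    """Four timestamps of one two-way exchange, in picoseconds."""
    t1: int  # master sends SYNC (master clock)
    t2: int  # slave receives SYNC (slave clock)
    t3: int  # slave sends the delay request (slave clock)
    t4: int  # master receives the delay request (master clock)


def offset_and_delay(ts: PtpTimestamps, asymmetry_ps: int = 0):
    """Return (slave offset from master, master-to-slave delay) in ps.

    asymmetry_ps is assumed to be (master-to-slave delay) minus
    (slave-to-master delay); zero models a symmetric link.
    """
    # The slave turnaround time (t3 - t2) is removed from the round trip.
    round_trip = (ts.t4 - ts.t1) - (ts.t3 - ts.t2)
    delay_ms = (round_trip + asymmetry_ps) / 2
    offset = (ts.t2 - ts.t1) - delay_ms
    return offset, delay_ms


# Hypothetical exchange: 500 ns symmetric link, slave ahead by 250 ps.
example = PtpTimestamps(t1=1_000_000, t2=1_500_250,
                        t3=2_000_000, t4=2_499_750)
offset_ps, delay_ps = offset_and_delay(example)
```

With the hypothetical timestamps above, the slave would subtract `offset_ps` from its local time to align with the master.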

Fig. 1. Overall architecture of the proposed circuit using MMCMs and SyncE for White Rabbit over 1000BASE-T.

../../Resources/ieie/JSTS.2020.20.4.320/fig1.png

Fig. 1 shows the overall architecture of the proposed clock synchronization circuit using a modified WRPC. Clock synthesis circuits and a Synchronous Ethernet frequency transfer unit (FTU) are integrated in the FPGA to run WR over copper media. The former synthesize a main 125 MHz reference clock (frequency-synchronized to the receive clock) and an offset, or helper-loop, frequency for the DDMTD phase detectors. The clock synthesis circuit implemented inside the FPGA replaces an external VCXO tuned by a DAC. In many WR designs reported so far, such as SPEC, SVEC, and SPEXI, the soft PLL requires two external VCXOs controlled by DACs. Section III describes the clock synthesis process and circuit structure in detail. The SyncE FTU converts the WR message manager interface into RGMII using a fast 250 MHz clock for double data rate (DDR) transfer. Section IV describes the proposed frequency transfer strategy and the detailed operation of the SyncE FTU.

The WR message manager forms Ethernet frames from the packets that the WRPC sends out for low-level communication. It also decodes the data stream received from the PHY into high-level packets. The DMA engine implements direct memory access (DMA) by pushing PTP packets directly to the WR message manager and fetching received messages from the arbiter. It takes transmission requests from the soft-core processor and signals when a new packet has been received from the Ethernet media access control (MAC). The Ethernet frame arbiter allows both the DMA engine and a generic MAC to access the data interface of the WR message manager. Since this interface is a Wishbone bus operating in pipelined mode, the module is a very simple Wishbone interconnect. LatticeMico32 (LM32) is a 32-bit, big-endian, Harvard-architecture soft-core processor optimized for FPGA chips. The original LM32, written in Verilog, has been configured to control the synchronization and operation of all modules inside the WR PTP core. The processor can pass bytes for sending, get received characters, and configure a baud rate through a Wishbone interface, so that system information from the WR PTP daemon can be output to the user console. The soft phase-locked loop synchronizes the frequency of the local reference clock (125 MHz) to the receive clock recovered from the data stream. It consists of two PLLs, for the helper and main loops; in hardware, however, it contains only the measurement circuits (DDMTD phase detectors), which provide the parameters required by the software algorithm through Wishbone. The dual-port RAM (DPRAM) serves as data and instruction memory for the LM32 and, at the same time, as packet data memory for the DMA engine. The LM32 has two Wishbone master interfaces, one for the instruction memory and one for the data memory. The modules communicate with one another via the pipelined Wishbone interface.

The proposed circuit uses Ethernet over copper media (1000BASE-T) in order to provide home and office networks with sub-nanosecond accuracy. Note that WR uses Ethernet over optical media (1000BASE-X), since it was developed to synchronize thousands of accelerator devices distributed over distances of tens of kilometers. The work of Greenstreet et al. (8) is the only attempt reported so far to achieve sub-nanosecond accuracy over copper media. That work, however, is not a WR implementation but an IEEE 1588 improvement that uses DDMTD and reduces the number of clock-domain crossings.

III. CLOCK SYNTHESIZER BASED ON MIXED-MODE CLOCK MANAGER

Most WR implementations use two external VCXOs controlled by DACs for the soft PLL. One generates a reference clock (frequency-synchronized to the physical layer clock), while the other outputs an offset frequency ($f_{PLL}=\frac{N}{N+1}f_{clkA}$) for the DDMTD phase detectors. These components, however, make the overall system expensive and complex, since everything except the VCXOs and DACs is integrated inside a single FPGA. In particular, a very stable external VCXO that provides frequency control within several ppm is costly.
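The arithmetic behind the offset frequency can be checked with a short Python sketch. The helper name `ddmtd_parameters` and the value $N = 2^{14}$ are hypothetical choices for illustration only; the paper does not state the value of N used. The 125 MHz input matches the reference clock described in Section II.

```python
from fractions import Fraction


def ddmtd_parameters(f_clk_hz: int, n: int):
    """Derive DDMTD quantities from f_PLL = N / (N + 1) * f_clkA.

    Returns (f_pll, f_beat, stretch, resolution_s) as exact fractions.
    """
    f_clk = Fraction(f_clk_hz)
    f_pll = f_clk * n / (n + 1)     # offset (helper) frequency
    f_beat = f_clk - f_pll          # beat frequency, equal to f_clk / (n + 1)
    stretch = f_clk / f_beat        # time magnification, equal to n + 1
    # One period of f_pll in the stretched domain maps back to this much
    # input-referred time: (1 / f_pll) / stretch = 1 / (n * f_clk).
    resolution_s = 1 / (n * f_clk)
    return f_pll, f_beat, stretch, resolution_s


# Hypothetical example: 125 MHz reference, N = 2**14.
f_pll, f_beat, stretch, resolution_s = ddmtd_parameters(125_000_000, 2**14)
resolution_ps = float(resolution_s * 10**12)
```

Under these assumed values the phase difference is magnified by a factor of N + 1 = 16385, so one counter tick of the offset clock corresponds to roughly half a picosecond of input phase.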

A clock synthesis circuit implemented in the FPGA to replace the external VCXO and DAC considerably reduces the complexity and cost of the overall system. Although digital clock synthesizer circuits have been studied in recent decades (11,12), most papers describe the clock synthesizer as a small part of a PLL or a time-to-digital converter (TDC). Moreover, synthesizable clock synthesizers are very rare, since very accurate frequency control usually requires full- or semi-custom design. A fully synthesizable clock generator with adequate performance must be described in a hardware description language and implemented in the FPGA.

Fig. 2. Digital dual-mixer time difference (DDMTD) with a phase-locked loop (PLL), two DFFs, a deglitcher, and counters.

../../Resources/ieie/JSTS.2020.20.4.320/fig2.png

Fig. 2 shows the structure of the digital dual-mixer time difference circuit. It consists of a phase-locked loop (PLL), two DFFs, a deglitcher and pulse-shaping circuit, and counters for phase difference averaging. Note that the DDMTD presented here is fully synthesizable, since the entire circuit is written in Verilog. It computes the phase difference between clkA and clkB by sampling them with the offset frequency $f_{\mathrm{PLL}}$. The deglitcher eliminates the glitches around signal transitions in the DFF outputs that are caused by jittery clock inputs. The counters then determine whether the signal stays constantly high or low for a configured number of clock cycles.
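The deglitching idea can be illustrated with a small behavioral model in Python. This is a sketch under simplifying assumptions and not the Verilog circuit of Fig. 2: the function names `deglitch` and `rising_edge_index`, the threshold `stable_cycles`, and the sample streams are all hypothetical. The output level changes only after the sampled input has held the new value for `stable_cycles` consecutive samples of the offset clock.

```python
def deglitch(samples, stable_cycles):
    """Suppress glitches in a sampled bit stream.

    The output follows the input only after the input has differed from
    the current output level for stable_cycles consecutive samples.
    """
    out = []
    level = samples[0] if samples else 0
    run = 0  # consecutive samples that disagree with the current level
    for s in samples:
        if s == level:
            run = 0
        else:
            run += 1
            if run >= stable_cycles:
                level = s
                run = 0
        out.append(level)
    return out


def rising_edge_index(levels):
    """Index of the first 0 -> 1 transition, or None if there is none."""
    for i in range(1, len(levels)):
        if levels[i - 1] == 0 and levels[i] == 1:
            return i
    return None


# Hypothetical DFF outputs for clkA and clkB with glitches near the edges.
raw_a = [0, 0, 1, 0, 1, 1, 1, 1, 0, 1, 1, 1]
raw_b = [0, 0, 0, 0, 0, 1, 0, 1, 1, 1, 1, 1]
edge_a = rising_edge_index(deglitch(raw_a, stable_cycles=3))
edge_b = rising_edge_index(deglitch(raw_b, stable_cycles=3))
phase_counts = edge_b - edge_a  # phase difference in offset-clock cycles
```

Averaging `phase_counts` over many beat periods would play the role of the phase difference averaging counters mentioned above.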

Fig. 3. Block diagram of mixed-mode clock manager (MMCM) in a Xilinx Kintex 7 FPGA for clock synthesis.

../../Resources/ieie/JSTS.2020.20.4.320/fig3.png

Fig. 3 shows a detailed view of the MMCM, the clock synthesis resource of Xilinx 7 series FPGAs. Input multiplexers select the reference and feedback clocks from either the dedicated clock buffer outputs or the interconnect. Each clock input has a programmable counter divider (D). The phase-frequency detector (PFD) compares both the phase and the frequency of the rising edges of the two clocks. The PFD output drives the charge pump (CP) and loop filter (LF) to generate a reference voltage for the VCO: the PFD produces an up or down signal to the CP and LF to determine whether the VCO should operate at a higher or lower frequency. The VCO produces eight output phases and one variable phase for fine phase shifting. A special counter (M) is also provided for fractional divide operation. M controls the feedback clock of the MMCM, allowing a wide range of frequency synthesis.
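The frequency relation implied by the D and M counters can be sketched in Python as follows. The output divider `o` is not described in the text above and is included here as an assumption, as are the function name `mmcm_output_hz` and the VCO limits, which depend on the device speed grade and should be checked against the device data sheet.

```python
from fractions import Fraction

# Assumed VCO operating range; the real limits depend on the speed grade.
VCO_MIN_HZ = 600_000_000
VCO_MAX_HZ = 1_200_000_000


def mmcm_output_hz(f_in_hz, m, d, o):
    """Output frequency for multiplier m, input divider d, output divider o.

    f_vco = f_in * m / d and f_out = f_vco / o. Raises ValueError when the
    VCO frequency falls outside the assumed operating range.
    """
    f_vco = Fraction(f_in_hz) * Fraction(m) / Fraction(d)
    if not (VCO_MIN_HZ <= f_vco <= VCO_MAX_HZ):
        raise ValueError(f"VCO frequency {float(f_vco):.0f} Hz out of range")
    return f_vco / Fraction(o)


# 125 MHz input, VCO at 1 GHz: a 125 MHz reference and a 250 MHz DDR clock.
ref_clk = mmcm_output_hz(125_000_000, m=8, d=1, o=8)
ddr_clk = mmcm_output_hz(125_000_000, m=8, d=1, o=4)
```

The example settings are hypothetical; they only show that the 125 MHz reference and the 250 MHz fast clock mentioned in Section II can be derived from a single VCO frequency.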

IV. FREQUENCY TRANSFER USING SYNCHRONOUS ETHERNET TECHNOLOGY

The sub-nanosecond accuracy of WR is based on frequency synchronization using Synchronous Ethernet, an ITU-T standard for computer networking that facilitates the transfer of clock signals over the Ethernet physical layer. Most WR applications are concentrated in scientific fields that require very long distances, which is why they use optical media. To the best of our knowledge, no WR implementation over copper media has been reported.

Fig. 4. Frequency transfer through 1000BASE-T physical layer based on Synchronous Ethernet.

../../Resources/ieie/JSTS.2020.20.4.320/fig4.png

Fig. 4 shows the proposed frequency transfer scheme between the SyncE master and slave nodes. The difference from the existing PTP method is that the clocks are syntonized by the SyncE FTUs to improve the synchronization accuracy. First, the two 1000BASE-T transceivers are forced to be link master and link slave by writing the proper values into the PHY internal registers, instead of leaving the roles to be decided by the auto-negotiation procedure. The SyncE master generates a reference clock from a local oscillator clock signal using an internal PLL. The SyncE slave then recovers the master's reference clock from the incoming SYNC signals. The transceiver sends the RGMII clock to the SyncE FTU for the clock synthesis circuit. The recovered clock is syntonized but not synchronized to the clock in the master node, which means that the frequencies are the same but the phases differ.
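The distinction between syntonized and synchronized clocks can be made concrete with a small numerical sketch. The function name `skew_fs`, the initial phase of 1.5 ns, and the 10 ppm period error are hypothetical values chosen only for illustration; integer femtoseconds are used so that the arithmetic is exact.

```python
def skew_fs(n_cycles: int, master_period_fs: int, slave_period_fs: int,
            phase_fs: int = 0) -> int:
    """Slave edge time minus master edge time after n_cycles, in fs."""
    return n_cycles * (slave_period_fs - master_period_fs) + phase_fs


MASTER_PERIOD_FS = 8_000_000      # 125 MHz reference clock, 8 ns period
CYCLES_PER_SECOND = 125_000_000   # number of cycles in one second
INITIAL_PHASE_FS = 1_500_000      # hypothetical 1.5 ns phase difference

# Syntonized slave: equal periods, so the skew stays at the initial phase
# and only a constant phase correction remains to be applied by PTP.
syntonized = skew_fs(CYCLES_PER_SECOND, MASTER_PERIOD_FS,
                     MASTER_PERIOD_FS, INITIAL_PHASE_FS)

# Free-running slave whose period is 10 ppm longer (hypothetical value).
slave_period_fs = MASTER_PERIOD_FS + MASTER_PERIOD_FS * 10 // 1_000_000
free_running = skew_fs(CYCLES_PER_SECOND, MASTER_PERIOD_FS,
                       slave_period_fs, INITIAL_PHASE_FS)
```

In this sketch the free-running slave accumulates about 10 microseconds of additional skew per second, whereas the syntonized slave keeps a constant phase difference, which is why frequency transfer is applied before the phase is adjusted.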

NetFPGA-1G-CML is one of the most popular prototyping boards for computer network devices. It provides a Xilinx Kintex-7 FPGA and four 1 Gb/s Ethernet interfaces, which enables the development of a single-port Gigabit Ethernet device or a 4-port Ethernet switch. It is worth noting that NetFPGA-1G-CML is much more cost-effective than the SPEC board developed by CERN. The FPGA-based system allows users to develop designs that process packets at line rate, a capability generally not afforded by software-based approaches.

Four Realtek RTL8211E Ethernet transceivers interface the network connections via on-board RJ-45 connectors. Note that the RTL8211E provides a PLL clock output, a feature not generally supported by commercial Ethernet transceivers. The PLL clock output is used in the SyncE FTU for clock synthesis. IEEE 802.3 is in the process of defining the generation of the RX and TX timestamp triggers in the generic reconciliation sublayer to support improved synchronization accuracy. The standard committee is also defining transceiver delay measurements to report the latencies between the MDI and GMII in both directions.

Fig. 5. Measurement environment for the clock synchronization accuracy and precision.

../../Resources/ieie/JSTS.2020.20.4.320/fig5.png

The SyncE FTU connects the Ethernet interface of the WRPC and the RTL8211E transceiver. In packet transmission, the message data in the 8-bit Ten Bit Interface (TBI) is decoded to detect special characters and then converted into double data rate (DDR) 4-bit RGMII, which halves the pin count. In packet reception, the message data in 4-bit RGMII is converted into single data rate (SDR) and then encoded using the 8B/10B scheme to form the 8-bit TBI. To meet the timing constraint, the skew among the data and control signals has been minimized, and the signals are held at a phase difference of 90 degrees from the reference clock RXC.
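The SDR-to-DDR width conversion can be sketched in Python as a byte-to-nibble mapping. The function names `sdr_to_ddr` and `ddr_to_sdr` are hypothetical, and the sketch assumes the usual RGMII convention that bits 3..0 are transferred on the rising clock edge and bits 7..4 on the falling edge; the 8B/10B coding and the special character detection performed by the SyncE FTU are deliberately left out.

```python
def sdr_to_ddr(data: bytes):
    """Split each byte into two 4-bit nibbles for DDR transfer.

    Assumed ordering: low nibble on the rising edge, high nibble on the
    falling edge of the same clock cycle.
    """
    nibbles = []
    for b in data:
        nibbles.append(b & 0x0F)         # rising edge: bits 3..0
        nibbles.append((b >> 4) & 0x0F)  # falling edge: bits 7..4
    return nibbles


def ddr_to_sdr(nibbles) -> bytes:
    """Rebuild bytes from (rising, falling) nibble pairs."""
    if len(nibbles) % 2:
        raise ValueError("DDR stream must contain an even number of nibbles")
    pairs = zip(nibbles[0::2], nibbles[1::2])
    return bytes(lo | (hi << 4) for lo, hi in pairs)


# End of an Ethernet preamble followed by the start-of-frame delimiter.
tx_nibbles = sdr_to_ddr(b"\x55\xd5")
rx_bytes = ddr_to_sdr(tx_nibbles)
```

Because two nibbles are transferred per clock cycle, four data pins carry the same throughput as eight SDR pins, which is the pin-count saving noted above.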

V. MEASUREMENT RESULTS

The synchronization accuracy and precision of the 1000BASE-T WR circuit have been evaluated in the measurement environment shown in Fig. 5. Two NetFPGA-1G-CML boards exchange WR messages via a direct Gigabit Ethernet connection. The unshielded twisted pair category-5 cable between them has the maximum length of 100 m. The master takes an accurate and stable reference clock from a function generator and broadcasts its own timing information in SYNC messages. On receiving the messages, the slave estimates the phase and frequency offsets to synchronize its local clock to the master's timing. A digital phosphor oscilloscope with a sampling rate of 20 GS/s records the clock skew between the pulse per second (PPS) output signals of the two nodes for 12 days.

Fig. 6. Oscilloscope screenshot that records the clock skew distribution of the slave from the master’s reference clock.

../../Resources/ieie/JSTS.2020.20.4.320/fig6.png

As depicted in Fig. 6, the oscilloscope screenshot indicates that the rising edges of PPS$_{\mathrm{SLAVE}}$ are distributed over the range from -50.27 ps to 47.83 ps relative to the rising edge of PPS$_{\mathrm{MASTER}}$. As a final result, the synchronization accuracy, or peak-to-peak clock skew, is 98.10 ps, and the precision, or standard deviation, is 10.78 ps. Notably, the histogram is built from more than a million PPS samples, which is far more than in most previous works. A mathematical model for the precision could be obtained by modifying the offset equation of WR over optical fiber (13). An analysis of copper transceiver latencies and cable delays, and a comparison with the measurement results, are left for further research.
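The reported figures can be cross-checked with a few lines of Python. The function names are hypothetical, and apart from the two extreme values quoted above, the skew samples in the example are invented purely for illustration and are not measured data. The choice of the population standard deviation is also an assumption, since the text does not state which estimator the oscilloscope uses.

```python
import statistics

SECONDS_PER_DAY = 86_400


def pps_sample_count(days: int) -> int:
    """Number of PPS edges in a measurement lasting the given days."""
    return days * SECONDS_PER_DAY


def skew_statistics(skews_ps):
    """Return (peak-to-peak, standard deviation) of the skews in ps."""
    peak_to_peak = max(skews_ps) - min(skews_ps)
    std_dev = statistics.pstdev(skews_ps)
    return peak_to_peak, std_dev


# A 12-day measurement yields 1,036,800 PPS edges, one per second.
samples = pps_sample_count(12)

# Reported extremes plus a few made-up intermediate values.
p2p, sigma = skew_statistics([-50.27, -3.1, 0.4, 5.2, 47.83])
```

The peak-to-peak value of this toy data set reproduces the 98.10 ps accuracy because it is fixed by the reported extremes; the standard deviation of the toy data does not correspond to the measured 10.78 ps.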

VI. CONCLUSIONS

A circuit implementation of White Rabbit over a 1000BASE-T network has been presented, with a clock synthesis circuit implemented in a commercial FPGA device and a frequency transfer strategy based on Synchronous Ethernet. Implemented on a Xilinx Kintex-7 FPGA, the proposed circuit was verified to achieve a peak-to-peak clock skew of less than 100 ps. Moreover, this technique uses Gigabit Ethernet connections over existing copper media without any additional hardware resource or modification. This makes it a promising solution for cost-effective yet high-accuracy clock synchronization in future applications.

ACKNOWLEDGMENTS

This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (NRF-2017R1C1B5018418).

REFERENCES

1. 2008, IEEE Standard for a Precision Clock Synchronization Protocol for Networked Measurement and Control Systems, IEEE Std. 1588-2008
2. Moreira P., Oct 2009, White rabbit: Sub-nanosecond timing distribution over Ethernet, Precision Clock Synchronization for Measurement, Control, and Communication, 2009, ISPCS 2009, 3rd IEEE International Sym. on, 12-16, pp. 1-5
3. Jiménez-López M., Feb 2019, A Fully Programmable White-Rabbit Node for the SKA Telescope PPS Distribution System, Instrumentation and Measurement, IEEE Trans. on, Vol. 68, No. 2, pp. 632-641
4. Ramos F., Gutiérrez-Rivas J., López-Jiménez J., Caracuel B., Díaz J., May 2018, Accurate Timing Networks for Dependable Smart Grid Applications, Industrial Informatics, IEEE Transactions on, Vol. 14, No. 5, pp. 2076-2084
5. de la Morena C., Jan 2018, Fully Digital and White Rabbit-Synchronized Low-Level RF System for LIPAc, Nuclear Science, IEEE Trans. on, Vol. 65, No. 1, pp. 514-522
6. Savory J., Sherman J., Romisch S., May 2018, White Rabbit-Based Time Distribution at NIST, Frequency Control Symposium, 2018. IFCS 2018, IEEE International, 21-24, pp. 1-5
7. Girela-López F., Torres-González F., Díaz J., Ultra-accurate Ethernet time-transfer with programmable carrier-frequency based on White Rabbit solution, Precision Clock Synchronization for Measurement, Control, and Communication, 2017, ISPCS 2017, 11th IEEE International Symposium on, pp. 36-41
8. Greenstreet R., Zepeda A., Improving IEEE 1588 synchronization accuracy in 1000BASE-T systems, Precision Clock Synchronization for Measurement, Control, and Communication, 2015, ISPCS 2015, 9th IEEE International Sym., pp. 1-6
9. Han J., Shin C., Dec 2016, A nanosecond-accuracy clock synchronization circuit for IEEE 1588-2008 using tapped delay, Electronics Express, IEICE, Vol. 13, No. 23, pp. 1-6
10. 2019, G.8261: Timing and Synchronization Aspects in Packet Networks, ITU-T Recommendation
11. Yuan C., Shekhar S., Sep 2019, A Supply-Noise-Insensitive Digitally-Controlled Oscillator, Circuits and Systems I: Regular Papers, IEEE Transactions on, Vol. 66, No. 9, pp. 3414-3422
12. Cadeddu S., Aug 2017, A Time-to-Digital Converter Based on a Digitally Controlled Oscillator, Nuclear Science, IEEE Transactions on, Vol. 64, No. 8, pp. 2441-2448
13. Lipinski M., Włostowski T., Serrano J., Alvarez P., Sep 2011, White Rabbit: a PTP Application for Robust Sub-nanosecond Synchronization, Precision Clock Synchronization for Measurement, Control, and Communication, 2011, ISPCS 2011, 5th IEEE Int. Sym. on, pp. 25-30

Author

Jiho Han
../../Resources/ieie/JSTS.2020.20.4.320/au1.png

received the B.S., M.S., and Ph.D. degrees in Electrical Engineering and Computer Science from Seoul National Univ. in 2002, 2004, and 2009, respectively.

He has been an assistant professor in Sun Moon Univ. since 2014. His research interests include clock synchronization and carrier-grade Ethernet.

Changyong Shin
../../Resources/ieie/JSTS.2020.20.4.320/au2.png

received the B.S and M.S degrees from Yonsei Univ. in 1993 and 1995, respectively, and the Ph.D degree from the Univ. of Texas at Austin, in 2006, all in Electrical Engineering.

From 1995 to 2001 and from 2007 to 2014, he was with LG Electronics and with SAIT, respectively. Since 2014, he has been with the Department of Information and Communications Engineering at Sun Moon Univ.

His research interests include wireless communications and signal processing for communications.