Nanosecond Latency Monitoring
By Fergal Toomey, Chief Scientist and Co-Founder at Corvil
Trading speed continues its relentless growth: where once the millisecond was the basic unit of measurement, many participants now speak exclusively in terms of microseconds. A recent press release from Corvil, describing how the firm is helping Nomura to measure latency with nanosecond granularity, raised eyebrows, as it appears to presage an imminent breach of the microsecond barrier.
The truth is that components of the trading loop are now using hardware-based technologies to achieve latencies in the low microsecond range. This necessitates the use of nanosecond-level measurement to gauge their performance. Nomura's NXT Direct DMA service, which provides market access to Nomura's clients with less than 3 microseconds of latency, is an example of this trend.
Nanosecond-level measurement is proving particularly important for co-located trading systems, where component latencies of a few microseconds can have a material impact on end-to-end performance. Co-location reduces distance-related latency, making the relative impact of system latency all the more important. Designers working on performance improvements for these components today literally think in fractions of a microsecond. As well as co-located DMA, a second area that I believe will soon benefit from nanosecond-level measurement is tick-to-order latency for co-located algorithmic trading services. Tick-to-order latency measures how quickly a trading system can respond with orders to opportunities in the market as they become visible via market data feed updates, or 'ticks'. It is a key performance metric for high-frequency trading, and designers using FPGA-based technologies are already targeting values in the low microseconds.
Nanosecond measurement will also offer traders a new level of insight into latency within the exchange itself. Many traders already monitor market response times, such as the time taken to acknowledge an order, or for a placed order to become visible in the market data feed. These measurements can help reveal why a particular order-request was not successful, for example. Response times are now well into the sub-millisecond range in many cases, even if they haven't quite reached the single-digit microsecond level yet. But traders are often interested in monitoring small changes in response time and how they might affect the behaviour of their strategies – after all, it is relative speed (rather than absolute response time) that determines trading success. With nanosecond-level measurements, truly tiny latency differences start to become visible. For example, it takes about 5 nanoseconds for a signal to travel one metre through cable – so even changing a cable length by a few metres within the exchange will produce effects that traders can observe.
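To put that figure in context, here is a back-of-the-envelope calculation. This is a minimal sketch: the two-thirds-of-light-speed velocity factor is a typical assumed value for copper cable, not a figure from any particular specification.

    # Rough propagation-delay arithmetic for a signal travelling in cable.
    # Assumes a velocity factor of ~2/3 the speed of light (typical for copper).
    C = 299_792_458         # speed of light in vacuum, metres per second
    VELOCITY_FACTOR = 0.66  # assumed typical value for copper cable

    def cable_delay_ns(length_m: float) -> float:
        """One-way propagation delay in nanoseconds."""
        return length_m / (C * VELOCITY_FACTOR) * 1e9

    print(cable_delay_ns(1.0))  # ~5.05 ns per metre
    print(cable_delay_ns(3.0))  # re-routing 3 m of cable shifts latency by ~15 ns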
A nanosecond is such a short period of time that we are often asked how it is possible to measure at this level of granularity. It is worth explaining some aspects of the answer here, especially because there are obvious approaches that do not actually work. The starting point for any latency measurement is the creation of timestamps for events of interest, such as the arrival of a market data update into a trading system or the transmission of a resulting order-request back towards the exchange. To achieve nanosecond precision these timestamps must be generated in hardware using specialized time-stamping equipment. Software timestamps, while easy to generate, are unfortunately not precise enough to consistently support even microsecond-level accuracy. This is partly because it is surprisingly difficult to access high resolution time information from software without careful, platform-dependent design. And it's partly because events detected in software can be ambiguous due to process-scheduling and buffering effects. Hardware time-stamping avoids these problems by triggering off well-defined low-level events (such as arrival/transmission of a network packet) and by ensuring fast access to a high resolution timer when the event occurs.
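A quick software experiment makes the scheduling-and-access problem concrete. The sketch below, which assumes Python's time.perf_counter_ns (a convenient stand-in for whatever high-resolution clock a platform exposes), simply takes back-to-back timestamps and inspects the gaps between them: the minimum gap reveals the cost of the clock read itself, while occasional large gaps betray process-scheduling interruptions.

    # Why software timestamps struggle at nanosecond granularity: even
    # back-to-back clock reads show jitter from call overhead and scheduling.
    import time

    samples = [time.perf_counter_ns() for _ in range(100_000)]
    deltas = [b - a for a, b in zip(samples, samples[1:])]

    print("min gap (ns):", min(deltas))  # clock-read overhead: tens of ns
    print("max gap (ns):", max(deltas))  # scheduling stalls: often microseconds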
To calculate latency values, at least two timestamps for different events must be compared. Best results are achieved when these timestamps come from the same clock, since otherwise we have to worry about keeping clocks in sync. Even when using a single clock, the clock-speed relative to 'real time' might be fast or slow, and can change due to temperature fluctuations. Clearly a clock that is running fast will produce latency measurements that are too long. In practice, most crystal-based clocks used in computer systems have speeds that are accurate to within a few tens of parts per million when used in a temperature-controlled data center environment. This is sufficient to allow nanosecond-level measurement for short latency values; for example, a measured latency of 1 millisecond will be accurate to within a few tens of nanoseconds. One might try to eliminate clock-speed inaccuracy by synchronizing the clock to a more stable external time source, but it turns out that most techniques for synchronizing clocks struggle to achieve accuracy in the tens-of-nanoseconds range. For measuring short latency values, you will normally do best by letting the clock run free.
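The arithmetic behind that claim is simple: the error contributed by clock-speed inaccuracy scales with the length of the interval being measured. A sketch:

    # Clock-drift error is proportional to the measured interval:
    #   error = latency * drift_ppm / 1e6
    def drift_error_ns(latency_ns: float, drift_ppm: float) -> float:
        return latency_ns * drift_ppm / 1e6

    # A clock accurate to 20 parts per million measuring a 1 ms latency
    # is off by only 20 ns ...
    print(drift_error_ns(1_000_000, 20))      # 20.0
    # ... but the same clock accumulates 20 microseconds of error over 1 s.
    print(drift_error_ns(1_000_000_000, 20))  # 20000.0

This is why a free-running clock serves well for short latencies, while synchronization only becomes the dominant concern when intervals grow long or timestamps come from different clocks.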
Once precise, comparable timestamps are available for different events, the final step is to stitch them together to calculate useful latency metrics such as order response times, network transit times or tick-to-order latency values. This requires understanding the business transactions of interest, and using tools that can rapidly search streams of events to find the meaningful transactions within them. The process can be challenging in environments where data rates are high, multiple different protocols are in use, and data is transformed many times as it passes along the trading loop. But it is a crucial step in extracting real value from precision time-stamped event data.
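As an illustration of what 'stitching' means in practice, the sketch below pairs each outbound order with the most recent market-data tick for the same instrument to produce tick-to-order latencies. The event format and field names are invented for the example; real deployments must decode multiple protocols and handle far messier correlation.

    # Stitch timestamped events into a business metric: pair each order
    # with the latest preceding tick for its instrument.
    from typing import Iterable, Iterator, Tuple

    Event = Tuple[int, str, str]  # (timestamp_ns, kind "tick"/"order", instrument)

    def tick_to_order_latencies(events: Iterable[Event]) -> Iterator[Tuple[str, int]]:
        last_tick = {}  # instrument -> timestamp of most recent tick
        for ts, kind, instrument in events:
            if kind == "tick":
                last_tick[instrument] = ts
            elif kind == "order" and instrument in last_tick:
                yield instrument, ts - last_tick[instrument]

    events = [
        (1_000, "tick", "ACME"),
        (4_200, "order", "ACME"),  # responds 3.2 microseconds after the tick
    ]
    print(list(tick_to_order_latencies(events)))  # [('ACME', 3200)]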