Dialing market data latency monitoring up to 11 - reducing monitoring interference with diagnostics

By Kevin Gilchrist - Managing Director - US Business Development, NYSE Euronext
Latency measurement is a tricky beast. For this first blog entry I will step away from the already well-covered network angle and take a closer look at the server side. I’ll also throw in some references to unlucky Austrian cats. Putting aside that latency terminology is ill-defined (one of the raisons d’être for latencystats.com), empirical latency measurement is getting tougher as we get closer to physical limits. So much so that many high-frequency trading (HFT) firms have recently highlighted their inability to directly monitor the performance of what occurs inside a trading server.
This is because, in ultra-low-latency situations, most if not all of the processing (parse market data + normalize + signal generation + dispatch order) occurs on one server (see diagram).

Doing anything across more than one server introduces network hops and therefore additional latency. This makes it hard to do the usual out-of-band monitoring. The Financial Information Exchange (FIX) order messaging to and from the exchange can still be monitored, but what is hard to measure is how the market data and trade signal generation are processed. There is no output for each market message received, and an order signal is only generated if the strategy’s market conditions are met. This in turn means performance cannot be gauged simply by comparing the volume of data going in with a corresponding volume of transformed data (trade signals) coming out.
Configuring the software to log performance information itself introduces latency, which raises the dilemma: how do you manage what you can’t measure? Now we come to the kitty. Students of physics will see the situation as analogous to Schrödinger’s cat, i.e. the act of observation interferes with the state of what you are trying to observe (in the case of Schrödinger, the life of said moggy was on the line).
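One rough way to get a feel for this interference is to time the same piece of work with and without logging switched on. The Python sketch below is purely illustrative: the handler functions, the logger name and the in-memory log sink are all assumptions made for the example, and an interpreted language is nowhere near the speed of a production trading stack. It only shows the shape of the comparison.

```python
import io
import logging
import time


def make_handlers():
    """Build two hypothetical handlers doing identical work, one with logging."""
    log = logging.getLogger("probe_effect_demo")  # hypothetical logger name
    log.setLevel(logging.INFO)
    log.propagate = False
    # Log to an in-memory buffer so disk speed does not dominate the comparison.
    log.addHandler(logging.StreamHandler(io.StringIO()))

    def plain(payload: bytes) -> int:
        # Stand-in for "parse + normalize + signal generation".
        return sum(payload)

    def logged(payload: bytes) -> int:
        result = sum(payload)
        log.info("processed %d bytes", len(payload))
        return result

    return plain, logged


def mean_ns_per_message(handler, payloads, clock=time.perf_counter_ns) -> float:
    """Average wall-clock nanoseconds the handler spends per payload."""
    if not payloads:
        raise ValueError("payloads must not be empty")
    start = clock()
    for payload in payloads:
        handler(payload)
    return (clock() - start) / len(payloads)


if __name__ == "__main__":
    plain, logged = make_handlers()
    payloads = [bytes(range(64))] * 100_000
    base = mean_ns_per_message(plain, payloads)
    with_log = mean_ns_per_message(logged, payloads)
    print(f"no logging: {base:.0f} ns/msg, with logging: {with_log:.0f} ns/msg")
```

The absolute numbers will vary from machine to machine and run to run; the point is only that the gap between the two figures is the cost of the observation itself.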
To avoid this observational interference (and not get a call from the ASPCA, really flogging this cat analogy huh?) one can instead capture a sizable amount of production data and then replay it at real speed, or at multiples of real time, under different software configurations. Logging is turned on during these tests and the differences are observed. The results do not reflect true production speed, but they may be indicative of relative improvements.
During these replay tests, I’ve also heard that the results are so sensitive that even the temperature of the server makes a difference at these levels. Basically, the cooler the server the better.
It would be interesting to hear readers’ observations on the matter, particularly views on how to monitor server-side latency effectively without invalidating or impacting the results.
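The replay approach described above can be sketched in a few lines. This is a minimal sketch under stated assumptions, not a description of any particular firm's tooling: the capture format (a capture timestamp in nanoseconds plus a raw payload), the `replay` and `summarize` functions and the `handler` callback are all hypothetical names introduced for illustration.

```python
import statistics
import time
from typing import Callable, Dict, Iterable, List, Tuple

# Hypothetical capture record: (capture timestamp in ns, raw message bytes).
Message = Tuple[int, bytes]


def replay(
    messages: Iterable[Message],
    handler: Callable[[bytes], object],
    speed: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], int] = time.perf_counter_ns,
) -> List[int]:
    """Feed captured messages to handler, keeping the original gaps
    between messages divided by speed (2.0 = twice real time).
    Returns the time spent inside handler for each message, in ns."""
    if speed <= 0:
        raise ValueError("speed must be positive")
    latencies: List[int] = []
    first_capture = None
    start = clock()
    for capture_ns, payload in messages:
        if first_capture is None:
            first_capture = capture_ns
        # When this message is due, relative to the start of the replay.
        due = start + int((capture_ns - first_capture) / speed)
        remaining = due - clock()
        if remaining > 0:
            sleep(remaining / 1e9)
        t0 = clock()
        handler(payload)
        latencies.append(clock() - t0)
    return latencies


def summarize(latencies: List[int]) -> Dict[str, float]:
    """Median and worst-case handler time for one replay run."""
    return {"median_ns": statistics.median(latencies), "max_ns": max(latencies)}
```

Running `replay` once per software configuration over the same capture and comparing the `summarize` output gives the relative comparison described above. Note that `time.sleep` is far too coarse for microsecond-level pacing, so a serious harness would need a tighter timing mechanism; the `sleep` and `clock` parameters are injectable here mainly so the logic can be checked deterministically.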
