Millionths of a Second Can Cost Millions of Dollars

A New Way to Track Network Delays




Computer scientists have developed an inexpensive way to diagnose delays in data center networks as short as tens of millionths of a second, delays that can lead to multimillion-dollar losses for investment banks running automated stock trading systems. Similar delays can slow parallel processing in high-performance cluster computing applications run by Fortune 500 companies and universities.

University of California, San Diego and Purdue University computer scientists presented this work on August 20, 2009 at SIGCOMM, the premier networking conference.

The new approach offers the possibility of diagnosing fine-grained delays, down to tens of microseconds, and packet loss as infrequent as one in a million at every router within a data center network. (One microsecond is one millionth of a second.) The solution could be implemented in today’s router designs with almost zero cost in terms of router hardware and with no performance penalty. The UC San Diego and Purdue University computer scientists call their invention the Lossy Difference Aggregator.

“This is stuff the big traders will be interested in,” said George Varghese, a computer science professor at the UC San Diego Jacobs School of Engineering and an author on the SIGCOMM paper, “but more importantly, the router vendors for whom such trading markets are an important vertical.”

If an investment bank’s algorithmic stock trading program reacts to information on cheap stocks from an incoming market data feed just 100 microseconds earlier than the competition, it can buy millions of shares and bid up the price of the stock before its competitors’ programs can react, the computer scientists say.

While the network links between Wall Street and investment banks’ data centers are short, optimized and well monitored, the performance of the routers within the data centers that run automated stock trading systems is difficult and expensive to monitor. Delays in these routers, also known as latencies, can add hundreds of microseconds, potentially leading to millions of dollars in lost opportunities.

“Every investment banking firm knows the importance of microsecond network delays. Because routers today aren’t capable of tracking delays through them at microsecond time scales, exchanges such as the London Stock Exchange use specially crafted external boxes to track delays at various key points in the data center network,” said Alex Snoeren, a computer science professor at the UC San Diego Jacobs School of Engineering and an author on the SIGCOMM paper.

But these external systems are generally too large and expensive to be added to every router in a data center network running an automated stock trading system. This makes it difficult for the network managers to identify and locate problematic routers before they cost the company large amounts of money, the computer scientists say.

“Our hope is that this approach will allow router vendors to add fine scale delay and loss tracking, at almost zero cost to router performance, perhaps obviating the desire for expensive external network monitoring boxes at every router,” said Ramana Kompella, the first author on the SIGCOMM paper and a computer science professor at Purdue University. Kompella earned his Ph.D. in computer science at UC San Diego in 2007.

The SIGCOMM 2009 paper presents simulations and proof-of-concept code for measuring latencies down to tens of microseconds and losses that occur once every million packets.

“The next step would be to build the hardware implementation; we are looking into that,” said Kompella, who plans to continue pioneering research in fault diagnosis at Purdue.

This work highlights a fundamental shift happening across the Internet. As computer programs—rather than humans—increasingly respond to streams of information moving across computer networks in real time, millionths of seconds matter. Algorithmic stock trading systems are just one example. Extra microseconds of delay can also mean slower response times across clustered-computing platforms, which can slow down computation-intensive research, such as drug discovery projects.
