From the July 01, 2012 issue of Futures Magazine

Big Data: Manage it, don’t drown in it

Collection and distribution

Data access and distribution are the most straightforward of these challenges, but intelligent strategies can still pay off in cost savings and optimized performance that translate into competitive advantages.

Data access can be via a direct connection or through an aggregator that maintains its own direct connections (see “Buy vs. build,” below). For algorithmic traders, the latency introduced by anything less than a direct connection may be a deal-breaker. On the other hand, direct connectivity does not scale easily, and its speed comes at a price: each new connection requires parsing a new data protocol, so development and maintenance costs can quickly become prohibitive for companies that need multiple connections.

Aggregated connections, by contrast, offer a single parser that obtains data from multiple sources, packaged and ready to plug into a firm’s workflow. The trade-off may be some added latency, though often only on the order of microseconds. Moreover, real-time data delivers its full value only with historical context, and that need adds up-front purchase costs and data storage burdens. Aggregated connectivity may offer a way to sidestep both the mechanics of archiving and a significant data purchase cost. For a market data provider, customer requirements may call for a middle ground: direct connectivity to key markets is essential, while ancillary data sourced from third-party aggregators is perfectly acceptable.
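To make the protocol-maintenance trade-off concrete, here is a minimal Python sketch, assuming two hypothetical venue message formats (`venue_a`, `venue_b`) and a common `Quote` record. None of these names or formats come from a real exchange feed; real protocols are far more complex. A firm connecting directly maintains one parser per venue, as in the `PARSERS` table below, while an aggregator does that work upstream and hands over data already in a single format.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class Quote:
    """Common record every venue's data is normalized into (hypothetical)."""
    symbol: str
    bid: float
    ask: float


def parse_venue_a(raw: str) -> Quote:
    # Hypothetical venue A format: pipe-delimited "SYMBOL|BID|ASK"
    symbol, bid, ask = raw.split("|")
    return Quote(symbol, float(bid), float(ask))


def parse_venue_b(raw: str) -> Quote:
    # Hypothetical venue B format: unordered key=value pairs separated by ";"
    fields = dict(pair.split("=", 1) for pair in raw.split(";"))
    return Quote(fields["sym"], float(fields["bid"]), float(fields["ask"]))


# Each additional direct connection means another entry here,
# plus the development and maintenance work behind its parser.
PARSERS: Dict[str, Callable[[str], Quote]] = {
    "venue_a": parse_venue_a,
    "venue_b": parse_venue_b,
}


def normalize(venue: str, raw: str) -> Quote:
    """Route a raw message to its venue's parser and return a common Quote."""
    try:
        parser = PARSERS[venue]
    except KeyError:
        raise ValueError(f"no parser registered for {venue!r}") from None
    return parser(raw)
```

For example, `normalize("venue_a", "ABC|10.5|10.75")` and `normalize("venue_b", "ask=10.75;sym=ABC;bid=10.5")` both return `Quote(symbol='ABC', bid=10.5, ask=10.75)`, so downstream code never needs to know which venue a quote came from.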

Whether connectivity is direct or aggregated, an individual or organization that must redistribute data faces additional hurdles. Quick and reliable redistribution to a user base demands server-farm performance, and the hardware costs and scalability issues can be significant. Part of the solution is straightforward: farms simply must have the capacity to serve clients effectively. The other part, hardware optimization, is where dealing with Big Data intelligently can yield a competitive edge. With flat file storage, solid-state drives (SSDs), multithreading and other methods, redistribution can become more sophisticated and less a matter of brute-force server farm expansion.
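As one illustration of the multithreading idea, here is a minimal Python sketch, assuming a simple fan-out design in which every client has its own thread and message queue. The functions `redistribute` and `client_worker` and the in-memory `delivered` lists are hypothetical stand-ins for real network sends, not a description of any particular vendor's system. Python threads are used for readability: in CPython they overlap I/O-bound work such as socket writes but do not run CPU-bound code in parallel.

```python
import queue
import threading
from typing import List

SENTINEL = None  # placed on each queue to tell that client's worker to stop


def client_worker(inbox: "queue.Queue", delivered: List[str]) -> None:
    # One thread per client: a slow consumer waits on its own queue
    # instead of holding up delivery to everyone else.
    while True:
        msg = inbox.get()
        if msg is SENTINEL:
            break
        delivered.append(msg)  # stand-in for sending the message over the network


def redistribute(messages: List[str], n_clients: int) -> List[List[str]]:
    """Fan each message out to every client's queue; return what each client got."""
    inboxes = [queue.Queue() for _ in range(n_clients)]
    delivered: List[List[str]] = [[] for _ in range(n_clients)]
    threads = [
        threading.Thread(target=client_worker, args=(inboxes[i], delivered[i]))
        for i in range(n_clients)
    ]
    for t in threads:
        t.start()
    for msg in messages:
        for inbox in inboxes:
            inbox.put(msg)
    for inbox in inboxes:
        inbox.put(SENTINEL)
    for t in threads:
        t.join()  # wait until every client has drained its queue
    return delivered
```

For example, `redistribute(["q1", "q2"], 3)` delivers both messages, in order, to each of three clients. The queues here are unbounded, so a client that falls behind consumes memory rather than slowing the others; a production design would likely bound them and decide how to handle a lagging consumer.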
