Our world is generating data faster than ever before. In 2010, 5 Exabytes (10^18 bytes, or 1 billion Gigabytes) of data were created every two days, exceeding the total amount of information created by human beings from the dawn of civilization to 2003 [1]. By 2020, over 40 Zettabytes (10^21 bytes) of data will have been created, replicated, and consumed [2]. With this overwhelming amount of data pouring into our lives from anywhere, at any time, and on any device, we are undoubtedly entering the era of Big Data.
Big data, with its promise of valuable insights for better decision making, has recently attracted significant interest from both academia and industry. The voluminous data are generated by a variety of users and devices, and must be stored and processed in powerful datacenters. As such, there is a strong demand for an unimpeded network infrastructure that gathers the geo-distributed and rapidly generated data and moves them to datacenters for effective knowledge discovery. This express network should also extend seamlessly to interconnect multiple datacenters, and to interconnect the server nodes within a datacenter. In this article, we take a close look at the unique challenges in building such a network infrastructure for big data. Our study covers the entire network highway: the access networks that connect data sources, the Internet backbone that bridges them to remote datacenters, and the dedicated networks among datacenters and within a datacenter. We also present two case studies of real-world big data applications that are empowered by networking, highlighting interesting and promising future research directions.
Big data brings big value. With advanced big data analytics, insights can be acquired to enable better decision making in critical areas such as health care, economic productivity, energy, and natural disaster prediction, to name but a few. For example, by collecting and analyzing flu-related keyword searches, Google developed the Flu Trends service to detect regional flu outbreaks in near real-time. Specifically, Google Flu Trends collected weekly historical search frequency data for 50 million common keywords from 2003 to 2008. A linear model was then used to compute the correlation coefficient between each keyword's search history and the actual influenza-like illness history obtained from the Centers for Disease Control and Prevention (CDC) in the US. The keywords with the highest correlation coefficients were selected, and their current search frequencies aggregated to predict future flu outbreaks in the US. With big data from keyword searches, Google Flu Trends is able to detect flu outbreaks over a week earlier than the CDC, which can significantly reduce the losses caused by the flu and even save lives. Another example comes from the United Parcel Service (UPS), which equips its vehicles with sensors to track their speed and location. With the sensed data, UPS has optimized its delivery routes and cut its fuel consumption by 8.4 million gallons in 2011 [3]. It has been reported that big data analytics is among the top five catalysts expected to increase US productivity and raise GDP in the coming years.
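The keyword-selection step described above can be sketched in a few lines. The following is an illustrative toy example, not Google's actual pipeline: it ranks keywords by their Pearson correlation with a CDC influenza-like-illness (ILI) series and aggregates the top-ranked ones into a single flu-activity signal. All data values, keyword names, and the function names are hypothetical.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def select_and_aggregate(search_history, cdc_ili, top_k=2):
    """Pick the top_k keywords whose weekly search frequencies correlate
    best with the CDC ILI series; return them plus a signal that
    averages their frequencies week by week."""
    ranked = sorted(search_history,
                    key=lambda kw: pearson(search_history[kw], cdc_ili),
                    reverse=True)
    chosen = ranked[:top_k]
    weeks = len(cdc_ili)
    signal = [sum(search_history[kw][t] for kw in chosen) / top_k
              for t in range(weeks)]
    return chosen, signal

# Toy weekly data: the flu-related keywords track the ILI curve,
# while an unrelated keyword ("weather") does not.
cdc = [1.0, 2.0, 4.0, 3.0, 1.5]
searches = {
    "flu symptoms": [10, 21, 39, 31, 14],    # highly correlated
    "cough medicine": [12, 18, 35, 28, 16],  # correlated
    "weather": [50, 48, 52, 49, 51],         # uncorrelated
}
keywords, signal = select_and_aggregate(searches, cdc, top_k=2)
print(keywords)
```

In the real service, the aggregated signal of the selected keywords feeds a fitted linear model rather than a plain average, but the ranking-by-correlation idea is the same.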
Data explosion, which has been a continuous trend since the 1970s, is not news. The Internet has long grown alongside this explosion, and has indeed greatly contributed to it. The three V's (volume, variety, and velocity) of today's big data, however, are unprecedented. It remains largely unknown whether the Internet and related networks can keep up with the rapid growth of big data.