Snowbird and the Big Data Avalanche

Snowbird: Navigating the Research Slopes

As I write this column, a late spring snow has settled over Seattle, covering my freshly mown lawn. This prompted me to think about the upcoming CRA Conference at Snowbird, Utah. Every two years, the chairs of the Ph.D.-granting departments of computer science and engineering, as well as the leaders of government and industrial laboratories, gather at Snowbird to discuss all aspects of the state of computing: research, education, recruiting, diversity and inclusion, government and industrial policies, and collaboration. The Snowbird meeting is a great opportunity for networking (the social kind): meeting new and old friends, exchanging ideas and experiences, and sharing best practices.

This year’s Snowbird conference is being organized by J. Strother Moore (University of Texas at Austin) and Marek Rusinkiewicz (Telcordia Technologies), together with a very capable organizing committee. It will be held July 13-15; watch http://www.cra.org/Activities/snowbird/2008/index.html for details. I look forward to seeing many of you there, even without the snow!

The Big Data Avalanche

The old joke whose punch line says, “If you have to ask how much it costs, you can’t afford it,” has some relevance to big data. If you have to ask how big your data really is, you aren’t paying attention to how fast it is growing. One need only reflect on the fact that today’s inexpensive digital music players have more storage capacity than yesteryear’s supercomputers.

Big data has long been a personal interest of mine—from multiple perspectives. I first watched my scientific collaborators struggle to process scientific data from expensive instruments. Now, high-resolution sensors, inexpensive, large-scale storage and their diverse applications (from digital cameras to environmental monitors to scientific instruments) are changing how we record social interactions and cultural history and how we explore our world. In turn, the burgeoning volumes of data pose both opportunities and challenges for data provenance and curation, for analysis and processing, and for storage and retrieval.

These issues exemplify the breadth and depth of computing, its broad societal impact and the opportunities for multidisciplinary collaboration. As a personal example, I am a member of the electronic records advisory committee for the National Archives, which preserves the records of the federal government, including all Presidential records. The growth of Presidential email and other electronic records, together with rapidly changing email formats, storage technologies and all of the associated privacy, confidentiality and national security issues, makes the preservation and organization of records far more daunting than I could ever have imagined.

Hence, I am excited to have just returned from a “big data” meeting in Silicon Valley, which was organized as a Computing Community Consortium (CCC) event under CRA auspices; see http://www.cra.org/ccc/home.article.bigdata.html for details. This visioning workshop brought together academia, government and industry to discuss research opportunities in an exciting area of great change. The talks ran the gamut of topics, from extracting insights from data via social network analysis, to electronic laboratory notebooks and research provenance, to large-scale infrastructure for Internet search and scientific data storage.

This meeting was but the first in a series that the CCC and CRA will sponsor over the coming months. Watch the CRA website for details.

Dan Reed, CRA’s Board Chair, is Microsoft’s Scalable and Multicore Computing Strategist. Contact him at Daniel.Reed [@] microsoft.com or visit his blog at www.hpcdan.org.

Computing Research News