This article was published in the May 2012 issue

NSF Leads Federal Big Data Initiative

By Farnam Jahanian and Suzi Iacono

On March 29, the White House Office of Science and Technology Policy (OSTP) launched a federal Big Data Research and Development Initiative (BDRDI). By improving our ability to extract knowledge and insights from large and complex collections of digital data, this initiative promises to solve some of the Nation’s most pressing challenges—in science, education, government, medicine, commerce and national security—laying the foundations for U.S. competitiveness for many decades to come.

Across the U.S. government today, agencies recognize that research and education communities are undergoing a profound transformation with the use of large-scale, diverse, and high-resolution data sets that allow for data-intensive decision-making at a level never before imagined. This initiative will both help to accelerate discovery and innovation and support their transition into practice to benefit society. As the President’s Council of Advisors on Science and Technology (PCAST) noted in its 2010 review of the Networking and Information Technology Research and Development (NITRD) program, the pipeline from data to knowledge to action has tremendous potential to transform all areas of national priority.1

The cornerstone of this initiative is a joint NSF-NIH solicitation, Core Technologies and Techniques for Advancing Big Data Science & Engineering, or Big Data. It aims to advance the core scientific and technological means of managing, analyzing, visualizing, and extracting useful information from large, diverse, distributed and heterogeneous data sets. Specifically, the program will focus on foundational research in three areas:

  1. Data collection and management: Novel approaches and new tools are required to deal with massive amounts of often heterogeneous and complex data coming from multiple sources—such as those generated by observational systems in various sciences, simulations, and models across many scientific fields, as well as those created in transactional and longitudinal data systems in social and commercial domains.
  2. Data analytics: Significant impacts will result from advances in analysis, simulation, modeling, visualization, and interpretation to facilitate discovery of phenomena, to infer causality of events, to enable prediction, and to recommend action.
  3. E-science collaboration environments: To allow for broad communities of scientists, engineers, and analysts to have access to diverse data and the best inferential and visualization tools, a comprehensive “big data” cyberinfrastructure is necessary.

The Big Data program opens enormous opportunities for creating new knowledge from large-scale data across all disciplines. It is one component in NSF’s long-term strategy to address national big data challenges, which include advances in foundational techniques and technologies to derive knowledge from data; cyberinfrastructure to manage, curate, and serve data to science and engineering research and education communities; new approaches to education and workforce development; and a comprehensive program to support multi-disciplinary teams and communities in making advances on the complex grand-challenge science and engineering problems of a computation- and data-intensive world.

The formulation of this initiative is the result of a thriving ecosystem that includes the research community, the private sector, and the science agencies in the federal government. The computing community contributed significantly to this initiative through a number of influential white papers, many of which are included in the series Data Analytics: From Data to Knowledge to Action, posted on the Computing Community Consortium (CCC) website.2 This series highlights the importance of advances in big data to areas of national priority, including healthcare, new biology, science and engineering, cyber and national security, new transportation, education, and the smart grid. Several overview papers point out the challenges and opportunities as well as the path from inchoate data to discovery made possible through new methods and approaches.

Over a year ago, under the auspices of the National Science and Technology Council, OSTP chartered an interagency Big Data Senior Steering Group to develop a research, education, and infrastructure agenda as well as a plan for how the agencies can cooperate to achieve our Nation’s long-term goals. The Big Data committee is co-chaired by NSF and NIH, with members from DARPA, DOD OSD, DHS, DOE-Science, HHS, NARA, NASA, NIST, NOAA, NSA, and USGS. The interagency initiative announced last week is the culmination of the first year of a multi-year effort.

To summarize, the Big Data initiative aims to accelerate the progress of scientific discovery and innovation through advances in deriving knowledge from data; develop the next generation of big data scientists, engineers, and educators; facilitate scalable data infrastructure; and promote economic growth and improved health and quality of life.

We invite you to participate in this exciting new opportunity for the CISE community!

Farnam Jahanian is Assistant Director for Computer and Information Science and Engineering (CISE) at NSF. Suzi Iacono is Senior Science Advisor for CISE.


1 See Designing a Digital Future: Federally Funded Research and Development in Networking and Information Technology, Executive Office of the President, December 2010.

2 See the Computing Community Consortium White Papers website.



In addition to the $25 million joint NSF-NIH solicitation and CISE’s $10 million Expeditions in Computing award, participating agencies announced several new investments as part of the Big Data R&D Initiative.

The Department of Defense said that it is “placing a big bet on big data,” unveiling a $60 million “Data to Decisions” effort in support of new research projects across the full spectrum of data to decisions, autonomy, and human systems.

The goal is to harness and utilize big data in new and unconventional ways, together with sensing, perception, and decision support, to create truly autonomous systems that “go well beyond tethered joysticks.” In addition to a funding opportunity announcement, the DoD plans to run several prize competitions in the coming months.

DARPA announced the XDATA program, providing $25 million for projects that develop computational techniques and tools for analyzing large volumes of structured as well as unstructured data. Central challenges to be addressed through XDATA projects include “scalable algorithms for processing imperfect data in distributed data stores and effective human-computer interaction tools that are rapidly customizable to facilitate visual reasoning for diverse missions.” The program envisions open source software toolkits for flexible software development and, ultimately, processing of large volumes of data for use in targeted defense applications.


• NIH made available through Amazon Web Services (AWS) 200 terabytes of data from the 1000 Genomes Project, constituting “the world’s largest set of data on human genetic variation.”

• The Department of Energy Office of Science launched a $25 million Scalable Data Management, Analysis, and Visualization Institute, spanning six national laboratories and seven universities, as part of its Scientific Discovery Through Advanced Computing (SciDAC) Program. The institute’s objective is to develop new and improved tools to help scientists manage and visualize data.

• And the U.S. Geological Survey unveiled the latest awardees of its John Wesley Powell Center for Analysis and Synthesis. A total of eight projects are being funded with a focus on improving our understanding of earth system science through big data, including “species response to climate change, earthquake recurrence rates, and the next generation of ecological indicators.”

To learn more, visit

—Erwin P. Gianchandani