Big Data Regional Hubs Industry-Academic Collaboration


As a result of the CCC / CRA Industry Academic Survey, conducted in spring of 2015, and the CCC Industry Roundtable Discussion held on July 24, 2015, the CCC sponsored a program on Industry-Academic Collaboration. The goal of this program was to catalyze and foster partnerships between industry and academic research by creating mechanisms for early career researchers in academia and industry representatives to interact and explore ways to work together. This program enabled shared learning and perspectives, access to problems and data, career training, and opportunities for long-term partnerships.

To foster these partnerships, the CCC implemented this program through the Big Data Regional Hubs. In 2016, the BD Hubs were charged with ensuring participation from diverse institutions and strengthening partnerships based on their specific regional needs. The CCC asked each Hub to create a plan aligned with its region’s needs and opportunities to foster partnerships, one that would enhance the research of early career academic researchers and benefit the Innovation Hub. The plans focused on early career researchers, included metrics for success, and articulated how they would create long-term partnerships that persist beyond the initial activities. Featured activities included workshops, faculty internships, student internships, site visits, hackathons, and lecture series.

Below you can find more information about each of the individual hubs and their programming to foster industry-academic collaboration. You can also learn more about the CCC’s efforts to promote industry-academic collaborations in The Future of Computing Research: Industry-Academic Collaborations white paper.

Northeast BD Hub

The Northeast BD Hub is coordinated by Columbia University. The hub’s projects included:

  • The Young Innovator Internship – gave eight “Young Innovators” (MS students with one year until graduation or PhD students with two years or less until graduation) an opportunity to work on big data analytics with small companies, local government agencies, and NGOs. The graduate students who participated in this program were:
    • Justin Cole of Carnegie Mellon University, who served as a research associate on Smart Cities at the MetroLab Network
    • Tom Effland of Columbia University, who built and tested data parsing prototypes at TextIQ
    • Debopriya Ghosh of Rutgers University – Newark, who applied big data management and analytics to identify gaps in cancer care and enhance patient outcomes with HealthEC
    • Kenneth Graves of Teachers College, who produced pilot visualizations of big data collected from the NYC Foundation for Computer Science Education (CSNYC)’s CS4All initiative
    • Shupeng Gui of University of Rochester, who used analytics to improve fuel efficiency at Vnomics
    • Oliver Hao of University of Pittsburgh, who investigated innovative methods in financial event extraction at Agolo
    • Andrew Satz of Columbia University, who served as a data science consultant at Synergic Partners
    • Logan Wells of SUNY Oswego, who helped develop and improve functional assessment products with Motion Intelligence

    Learn about one participant’s experience in the program on the Northeast Hub blog.

  • The Knowledge Exchange – Yakov Bart of Northeastern University delivered a lecture on prospective meta-analysis in marketing (PMM) at Integral Ad Science on September 14, 2016.
  • The Enabling Seamless Data Sharing in Industry and Academia Workshop – held at Drexel University September 29-30, this cross-sector workshop addressed best practices for data sharing and related concerns. Participants from across industry, academia, government, and non-profits delivered “TED talk”-style presentations and took part in breakout sessions, where they shared the challenges they have experienced in sharing data and proposed potential solutions to these problems. Read about the workshop experience on the hub’s blog.

Contact Information:

  • René Baston, Executive Director of the Northeast BD Hub, rb70 [at]
  • Katie Naum, Program Coordinator of the Northeast BD Hub, ken2115 [at]

South BD Hub

The South BD Hub is coordinated by Georgia Tech and the University of North Carolina. The hub’s projects included:

  • DataStart – The Southern Startup Internship Program in Data Science (DataStart) partnered students from the South Region with data-related startup companies. The program provided six graduate student fellowships at six different startup companies in the southern United States from approximately June 1st to August 31st, 2016. The 2016 DataStart Fellows and companies were:
    • Student: Samia Ansari, University of Georgia

      Host company: Sartography, Staunton, VA

      Ansari, a student in the professional science master’s program in biomanufacturing and bioprocessing, characterized the representation of women and racial minorities in cancer research conducted between 2002 and 2012. She also worked to identify trends in the participation of women and minorities in cancer trials during that time.

    • Student: Lucy D’Agostino McGowan, Vanderbilt University

      Host company:, Nashville, TN

      D’Agostino McGowan, a PhD student in biostatistics, incorporated raw data streams from Google Analytics, Slack and other sources to create a foundation for predictive modeling for, a company that assembles remotely-managed freelance software development teams for companies worldwide.

    • Student: Aziz Eram, University of Arkansas at Little Rock

      Host company: Black Oak Analytics, Little Rock, AR

      Eram, a student in the master’s program in information quality, developed and tested a general approach to the problem of cleansing and standardizing information obtained from free text fields that reference the same product or service—for example, information about store inventory that is entered into a system manually by an employee.

    • Student: Zhengqian Jiang, Florida State University

      Host company: NPGroup Inc., Tallahassee, FL

      Jiang, a student in the department of industrial and manufacturing engineering, assisted NPGroup Inc. in developing and commercializing a sensor system for wind turbines that can accurately detect loads that go undetected in the models typically used by inflow sensors.

    • Student: Jonathan Ortiz, University of Texas at Austin

      Host company:, inc., Austin, TX

      Ortiz, a student in the professional data analytics program, worked with, an Austin-based stealth technology company headed by Brett Hurt, a serial entrepreneur who has led several successful big data startups, including Bazaarvoice and Coremetrics (now IBM Customer Analytics). For more on Ortiz’s work, see this article in GCN or his blog post on the South Hub website.

    • Student: Ashok Vardhan, George Mason University

      Host company: MetiStream, McLean, VA

      Vardhan, who is pursuing a master’s in data analytics engineering, helped develop a healthcare data conversion solution called Ember, which bridges the gap between the existing Health Level Seven International (HL7) version 2.x (HL7 V2) healthcare standards and the emerging next generation international specification called Fast Healthcare Interoperability Resources (FHIR).

  • PEPI – Through the Program to Empower Partnerships with Industry (PEPI), the South Big Data Hub provided funding to support early career faculty, research scientists, and postdocs in data-intensive fellowships with industry. The nine 2016 PEPI Fellows were:
    • Dr. Gerard Dumancas, Assistant Professor of Chemistry at Louisiana State University–Alexandria. Project Title: Use of data analytics for moisture prediction during tablet granulation. Host company: GlaxoSmithKline
    • Dr. Ragib Hasan, Assistant Professor in the Department of Computer and Information Sciences at the University of Alabama at Birmingham.  Project Title: Big Data Analytics for CyberCrime: A Scalable Authorship Attribution Approach Towards Real-time Detection and Classification of Malicious Emails. Host company: PhishMe, Inc.
    • Dr. Xia (Ben) Hu, Assistant Professor in the Department of Computer Science & Engineering, College of Engineering, Texas A&M University. Project Title: Anomaly Detection in Large Graphs. Host company: UnitedHealthcare
    • Dr. Meng Li, Assistant Professor, Department of Statistical Science, Duke University. Project Title: Statistical Analysis on Amyotrophic Lateral Sclerosis. Host company: Biogen
    • Dr. Yongchao Liu, Research Scientist II, School of Computational Science & Engineering, Georgia Institute of Technology. Project Title: Compressive Computing for Big Data Applications Based on Accelerators. Host company: Accelogic
    • Dr. David Gotz, Assistant Professor, University of North Carolina. Project Title: Applying Scalable Temporal Visual Analytics Methods to Complex Health Analytics Challenges. Host company: Allscripts
    • Dr. Soomin Lee, Postdoctoral Associate, Duke University. Project Title: Optimal Parallel Methods to Speed Up Big Data Processing. Host company: McKesson
    • Dr. Turgay Ayer, Assistant Professor of Industrial and Systems Engineering at Georgia Tech. Project Title: Sequential blood pressure monitoring based on data from wearables. Host company: UnitedHealthcare
    • Dr. Hui Wang, Assistant Professor, Florida State University. Project Title: Big Data Analytics for Cost-Effective Load Control in Wind Farm Based On Distributed ITOFPress Sensors. Host company: Nanotechnology Patronas Group
  • Data Infrastructure for Materials and Advanced Manufacturing Workshop – Industry and academic researchers connected to discuss the current state of the data infrastructure supporting the accelerated insertion of new and advanced materials into commercial products.
  • Applications of Analytics and Machine Learning in Energy Workshop – Industry and academic researchers convened to address the application of analytics and machine learning in the domains of Energy: Power, Smart Grid, etc., as well as Big Data and Data Science.
  • High Impact Applications of Data Science in Precision Medicine, Health Analytics, and Health Disparities Workshop – Industry and academic researchers met to assess high impact applications of Data Science in Precision Medicine, Health Analytics, and Health Disparities.

Contact Information:

Midwest BD Hub

The Midwest BD Hub (also called SEEDCorn) is coordinated by the University of Illinois at Urbana-Champaign. The hub’s projects included:

  • The Early Career Big Data Summit – hosted by the University of North Dakota April 6-8, 2016. The summit provided a venue for early career big data researchers (graduate students, postdocs, and pre-tenure faculty) to connect with industry, third-sector volunteer groups, and established researchers. Events included multiple industry panel discussions, researcher lightning talks, and a hands-on application hack-a-thon.
  • “Data Quality and Informal Data-An Oxymoron” Workshop – an interdisciplinary workshop to engage industry and early career researchers was held September 28-29 at Indiana University. The workshop was designed to give early career researchers and others an introduction to the latest developments and emergent issues in data quality. Nine early career researchers received “travel fellowships” to attend.
  • Travel grants to the SEEDCorn All-Hands Meeting – held in May 2016 at Iowa State University, the Midwest Hub’s All-Hands Meeting invited attendees from academia, industry, and government to discuss and mobilize around data-driven R&D.
  • Midwest Big Data Summer School – a week-long ‘short course’ in data science ran June 20-24 at Iowa State University as the “Inaugural Midwest Big Data Summer School.” There were several research organizations represented, as well as industry and federal government.
  • “Midwest Big Data Opportunities and Challenges” Workshop – this workshop held at the University of Chicago was “designed to bring together junior researchers from top universities and industry leaders to discuss active areas of research and development in Big Data.” The meeting was intended to allow local industry leaders to discuss current big data infrastructures and challenges; give junior researchers an opportunity to discuss current research; and help regional practitioners and researchers connect and identify areas of potential collaboration. View the workshop webpage here.
  • “Food and Data Workshop: Interoperability through the Food Pipeline” Workshop – held at the University of Illinois at Urbana-Champaign (UIUC) in September 2016. Concerned with pressing national and global challenges in food security and public health, this meeting considered data optimization across the food pipeline and the relationship between data and food writ large. There was a particular focus on questions of interoperable data ontologies, privacy, and analytic insights, with talks by researchers from academia and industry based at regional, national, and international organizations.
  • The “Interdisciplinary Workshops of Big Data in Healthcare Outcome and Workflow for Early Career Researchers” Workshop – held at the University of Wisconsin, Milwaukee, in September 2016. This award supported one of a series of four summer workshops, which were developed to consider big data approaches for improved healthcare outcomes and enhanced efficiency in healthcare workflows. View the workshop webpage here.
  • Big Data for Health and Medicine Workshop – held at the University of Nebraska – Omaha, in August 2016, this workshop brought together industry and non-profit organizations from Omaha and the surrounding region for knowledge sharing, and to develop new collaborations and mentorship opportunities. The goal of the workshop was to encourage discussion on challenges facing health-related industries with regards to data collection, gathering, storage, and analysis.
  • “Midwest Workshop on Big Neuroscience Data, Tools, Protocols & Services” – Researchers from Midwest academic institutions, industry, and governmental partners attended the September workshop held by the University of Michigan, which was designed to build an active and collaborating Midwest Neuroscience Community network. Students, trainees, fellows, and junior investigators were encouraged to participate, and 43 “trainee scholarships” were awarded to support their attendance. In addition to the new ACNN website, the team has developed a web form to collect sharable resources, including highly scalable APIs; algorithms, methods, and techniques; and education and training opportunities. Based on the array of participants, the PI and awardee team anticipate the formation of new collaborations on the development of software tools, services, learning materials, and end-to-end pipeline workflows. View the workshop webpage here.
  • “Data Science for Food, Energy and Water” workshop – part of the 2016 SIGKDD meeting, held on August 14 in San Francisco. MBDH provided travel support for 22 early career participants with CCC seed funds. The workshop was premised on the need for coordinated efforts to address data science challenges in the security of food, energy, and water (F-E-W). It aimed to introduce this emerging area to the “Knowledge Discovery in Data and data mining” community and to sow interest in the application of technology from the multi-disciplinary KDD field. The talks from the workshop are available on YouTube.
  • Research Awards – small awards were provided for two research projects developed by Early Career researchers at UIUC who are working on topics directly pertinent to the MBDH and the Midwest Big Data community.

Contact Information:

  • Melissa Cragin, Executive Director of the Midwest BD Hub, cragin [at]
  • Klara Nahrstedt, University of Illinois at Urbana-Champaign, klara [at]

West BD Hub

The West BD Hub is coordinated by the University of Washington, UC Berkeley and the San Diego Supercomputer Center, UC San Diego. The hub’s projects included:

  • The Collaboratory Faire and Cross-Sector Collaboration Working Group – As part of the West Hub’s first All Hands Meeting, participants showcased how they have built upon another group’s tool/data or how others have built upon their open source tools/data. The highlights of the Faire were captured on video and/or print to leverage partnerships with the USC Annenberg School and for the workshop on data science journalism.
  • The Workshop on Data Hackathon Best Practices – As part of the first International Data Week, the WBDIH partnered with the Research Data Alliance (RDA) and the Federation of Earth Science Information Partners (ESIP) to host the “Data Hackathons: Lessons Learned & Best Practices” workshop in the West Region state of Colorado. The hands-on, interactive workshop and the resources produced were designed to empower community leaders with lessons learned and best practices from hackathon organizers and participants.
  • The Science of Data-Driven Storytelling Workshop – The West Hub partnered with to provide a workshop to educate data scientists on data journalism and best practices for facilitating and pitching public-facing data narratives. View the workshop website here.

Contact Information: