This article is published in the November 2016 issue.

South Big Data Hub DataStart Highlights

As a result of the CCC / CRA Industry Academic Survey, conducted in spring of 2015, and the CCC Industry Roundtable Discussion held on July 24, 2015, the CCC partnered with the four NSF-sponsored Big Data Regional Innovation Hubs (BD Hubs) for a program on industry-academic collaboration. Each Hub is charged with addressing region-specific big data challenges. Areas of emphasis for the South BD Hub include coastal hazards, industrial big data, and health analytics, among others.

As one of its CCC-sponsored activities, the South BD Hub ran the DataStart internship program, which paired graduate students from the South Regional Innovation Hub with data-related startup companies for three months.

The program had three primary goals:

  • Provide talented students from the southern United States with opportunities to apply their classroom knowledge to data science problems in real-world settings.
  • Build capacity for data science and big data analytics within the entrepreneurial business community in the southern United States. Many new ideas and innovations in data science will come from early-stage startup companies. The South Big Data Hub must engage and support this entrepreneurial sector if it is to help grow the regional and national data-driven economy.
  • Expand the South Big Data Hub and larger data science community by fostering a network of entrepreneurs and smaller companies that utilize data science and analytics.

After a comprehensive review process, the South BD Hub chose six students from different universities to work onsite with their host companies from June 1st to August 31st, 2016.

The 2016 DataStart Fellows and companies were:

Student: Samia Ansari, University of Georgia
Host company: Sartography, Staunton, VA
Ansari, a student in the professional science master’s program in biomanufacturing and bioprocessing, characterized the representation of women and racial minorities in cancer research conducted between 2002 and 2012. She also worked to identify trends in the participation of women and minorities in cancer trials during that time.

Student: Lucy D’Agostino McGowan, Vanderbilt University
Host company:, Nashville, TN
D’Agostino McGowan, a PhD student in biostatistics, incorporated raw data streams from Google Analytics, Slack and other sources to create a foundation for predictive modeling for, a company that assembles remotely-managed freelance software development teams for companies worldwide.

Student: Aziz Eram, University of Arkansas at Little Rock
Host company: Black Oak Analytics, Little Rock, AR
Eram, a student in the master’s program in information quality, developed and tested a general approach to the problem of cleansing and standardizing information obtained from free text fields that reference the same product or service—for example, information about store inventory that is entered into a system manually by an employee.

Student: Zhengqian Jiang, Florida State University
Host company: NPGroup Inc., Tallahassee, FL
Jiang, a student in the department of industrial and manufacturing engineering, assisted NPGroup Inc. in developing and commercializing a sensor system for wind turbines that can accurately detect loads that go undetected in the models typically used by inflow sensors.

Student: Jonathan Ortiz, University of Texas at Austin
Host company:, inc., Austin, TX
Ortiz, a student in the professional data analytics program, worked with, an Austin-based stealth technology company headed by Brett Hurt, a serial entrepreneur who has led several successful big data startups, including Bazaarvoice and Coremetrics (now IBM Customer Analytics). For more on Ortiz’s work, see this article in GCN or his blog post on the South Hub website.

Student: Ashok Vardhan, George Mason University
Host company: MetiStream, McLean, VA
Vardhan, who is pursuing a master’s in data analytics engineering, helped develop a healthcare data conversion solution called Ember, which bridges the gap between the existing Health Level Seven International (HL7) version 2.x (HL7 V2) healthcare standards and the emerging next-generation international specification called Fast Healthcare Interoperability Resources (FHIR).

The South BD Hub recently held a workshop for the DataStart participants to share their experiences and research. Learn more about DataStart and the South BD Hub’s programs on their website. Stay tuned to the CCCBlog for updates from the other programs sponsored through the industry-academic collaborations.