A Conversation on Data Science

By Steve Heller, Two Sigma

CRA brings together people from academia, government labs, and industrial labs. For me, coming from industry, CRA’s Conference at Snowbird is an unbeatable opportunity to take the pulse of academia. One of the hot topics in 2016 was the Data Science juggernaut. I was glad to join Barbara Ryder (Chair) and Lise Getoor (Co-Chair) in organizing the panel session: Data Science in the 21st Century, which was well attended and full of energy and ideas. After CRA’s Committee on Data Science (Lise Getoor, Chair, David Culler, Eric de Sturler, David Ebert, Mike Franklin, and H.V. Jagadish) published the bulletin article Computing Research and the Emerging Field of Data Science, David Culler and I sat down for a follow-up chat: A Conversation on Data Science.

Below is an excerpt:

Steve Heller: All the examples in statistics textbooks are couched in an application domain. Could we view data science as computer science’s coming of age, in the sense that it’s intrinsically tied to serving other disciplines (which we call CS+X)?

David Culler: Lovely question; let’s take it apart a little bit.

The highest goal of computer science is universality, and it’s our greatest strength and also our greatest weakness. When we really have tackled a problem, the solution is applicable very broadly. Statistics has a different character. Statistics departments have always been deeply embedded in domains.

So there are very different kinds of cultural structures, and it’s not that one is better—the two are really quite different. Michael Jordan explains that statisticians kind of embed themselves, whereas computer scientists are often in the role of creating the platform.

Embedding works so well for statistics because, very often, the challenges of a particular application domain get brought back inside the field and push the envelope of statistics: for example, the understanding that, to create models in one domain, you need to idealize data in terms of different distributions than the ones you’ve been using in other domains. So I think that it reveals part of the difference in the nature of the two fields.

To view the entire conversation, click here.
