How Much is a Triple?

The following Great Innovative Idea is from Prof. Dr. Heiko Paulheim, Chair for Data Science and Program Director of the Mannheim Master in Data Science at the University of Mannheim. Paulheim was one of the CCC Blue Sky winners at the International Semantic Web Conference (ISWC) in October 2018 for his paper, How Much is a Triple?

The Idea

Knowledge graphs, i.e., structured collections of knowledge in the form of a graph, are now widely created and used, both commercially by major Internet companies such as Google and Facebook and as open-source projects by researchers, such as DBpedia or Wikidata. While there is a large body of work on how to create and refine those knowledge graphs, as well as various metrics that analyze their coverage and data quality, one dimension has not been analyzed so far: the cost of knowledge graph creation.

Estimating the cost of knowledge graph creation (and thereby the average cost of a single triple in a knowledge graph) is possible, but not easy in all cases. For some knowledge graphs, such as Cyc, the total development cost is known, so that the computation is straightforward. For others, such as Freebase or DBpedia, approximations are needed. To estimate the total cost, we draw upon research works that estimate the creation cost of Wikipedia, as well as analyses of software development efforts.
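As a rough illustration of the arithmetic described above, the average cost of a triple is the total creation cost divided by the number of triples. The sketch below is not the computation from the paper: the class name, the helper function, and all figures are hypothetical placeholders. It also assumes that an unknown total cost can be approximated as person-hours times an hourly rate.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class KnowledgeGraphCost:
    """Hypothetical container for a knowledge graph's cost figures."""
    name: str
    total_cost_usd: float  # known or approximated total creation cost
    num_triples: int       # number of triples in the graph

    def cost_per_triple(self) -> float:
        """Average cost of a single triple: total cost / number of triples."""
        if self.num_triples <= 0:
            raise ValueError("num_triples must be positive")
        return self.total_cost_usd / self.num_triples


def approximate_total_cost(person_hours: float, hourly_rate_usd: float) -> float:
    """Assumed approximation for graphs whose total cost is not published:
    estimated effort multiplied by an assumed hourly rate."""
    return person_hours * hourly_rate_usd


# Placeholder numbers only; they do not describe any real knowledge graph.
known = KnowledgeGraphCost("ExampleKG-A", total_cost_usd=1_000_000.0, num_triples=4_000_000)
approximated = KnowledgeGraphCost(
    "ExampleKG-B",
    total_cost_usd=approximate_total_cost(person_hours=10_000, hourly_rate_usd=50.0),
    num_triples=2_000_000,
)

for kg in (known, approximated):
    print(f"{kg.name}: ${kg.cost_per_triple():.4f} per triple")
```

Every input here is an assumption that can be swapped out, which is exactly where alternative estimates would differ: a different effort estimate or hourly rate changes the resulting cost per triple proportionally.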

Impact

While software development cost is well understood, attaching a price tag to a knowledge graph is difficult. The impact of this work is two-fold. First, it is the first work that shows how the cost of axioms in knowledge graphs could be quantified. At the same time, the assumptions made in the computation invite discussion: at each step, different assumptions could be made, with potentially different outcomes. Hence, I hope that many people will question those assumptions and thereby be inspired to come up with interesting alternatives.

Second, understanding the cost of knowledge graphs allows for new considerations, e.g., analyzing the relation between cost and data quality, quantifying the gain of (semi-)automating certain steps in the knowledge graph creation process, and so on.

Other Research

Heiko’s main research focus is on knowledge graph creation and refinement. His group has conducted several studies on improving the quality of knowledge graphs such as DBpedia (http://dbpedia.org/), most prominently the SDType algorithm for filling in missing instance types, which has since been integrated into DBpedia. Furthermore, the group creates new large-scale knowledge graphs, such as WebIsALOD (http://webisa.webdatacommons.org/) and DBkWik (http://dbkwik.webdatacommons.org/), which complement existing knowledge graphs by covering different domains and entities.

Researcher’s Background

Heiko obtained his Ph.D. in computer science at the Technical University of Darmstadt in 2011. His Ph.D. was on ontology-based application integration, an area in which he had been doing research since 2006. Heiko has held the Chair for Data Science at the University of Mannheim since 2017.