Datasets First! A Bottom-up Data Linking Paradigm

The following great innovative idea comes from Konstantin Todorov, an Associate Professor at the University of Montpellier and a researcher in the FADO group at the LIRMM laboratory. Todorov was one of the winners of the Blue Sky Awards at ISWC 2019 for his paper "Datasets First! A Bottom-up Data Linking Paradigm."

The Idea

Data linking is understood as the task of establishing typed links between entities across different knowledge graphs with the help of automatic link discovery systems. We argue that the current generic approach to developing data linking solutions has reached its limits, and that a paradigm shift in the way we look at this task needs to take place. We propose enabling the development of data-centric approaches for bottom-up linking: instead of trying to fit a generic top-down solution to every linking problem and dataset type, we should first gain a better understanding of the underlying data and then apply a targeted solution best suited to the particular knowledge graphs at hand. After years of research in the field, the time has come to step back and look at what we can learn from the large number of existing cross-dataset links and linking systems. We redefine the linking task as (1) the automatic identification, via machine learning techniques, of the linking problem type(s) that two knowledge graphs manifest, and (2) the application of an automatically generated atomic linking solution, or combination of such solutions, best suited to the graphs at hand.
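To make the two-step formulation above more concrete, here is a minimal sketch in Python. Everything in it is a hypothetical illustration, not the paper's method: the toy dictionary "graphs", the profile features, the two problem types (`label_matching`, `date_matching`), the hand-written rules standing in for a learned classifier, and the intersection rule used to combine atomic linkers are all assumptions made for the example.

```python
from typing import Callable, Dict, List, Set, Tuple

# Hypothetical toy "knowledge graph": entity IRI -> {property: literal value}
Graph = Dict[str, Dict[str, str]]
# A candidate link between a source entity and a target entity
Link = Tuple[str, str]


def profile(g1: Graph, g2: Graph) -> Dict[str, bool]:
    """Hypothetical dataset profile: which properties the two graphs share."""
    props1 = {p for entity in g1.values() for p in entity}
    props2 = {p for entity in g2.values() for p in entity}
    shared = props1 & props2
    return {"shared_label": "label" in shared, "shared_date": "date" in shared}


def classify_problem(features: Dict[str, bool]) -> Set[str]:
    """Stand-in for a learned classifier: maps a profile to problem types."""
    types = set()
    if features["shared_label"]:
        types.add("label_matching")
    if features["shared_date"]:
        types.add("date_matching")
    return types


def match_labels(g1: Graph, g2: Graph) -> Set[Link]:
    """Atomic linker: case- and whitespace-insensitive label equality."""
    return {
        (a, b)
        for a, ea in g1.items()
        for b, eb in g2.items()
        if "label" in ea and "label" in eb
        and ea["label"].strip().lower() == eb["label"].strip().lower()
    }


def match_dates(g1: Graph, g2: Graph) -> Set[Link]:
    """Atomic linker: exact equality of the date literal."""
    return {
        (a, b)
        for a, ea in g1.items()
        for b, eb in g2.items()
        if "date" in ea and "date" in eb and ea["date"] == eb["date"]
    }


# Hypothetical library of atomic linking solutions, keyed by problem type
ATOMIC_LINKERS: Dict[str, Callable[[Graph, Graph], Set[Link]]] = {
    "label_matching": match_labels,
    "date_matching": match_dates,
}


def link(g1: Graph, g2: Graph) -> Set[Link]:
    """Bottom-up linking: profile the data first, then run only fitting linkers."""
    problem_types = classify_problem(profile(g1, g2))
    if not problem_types:
        return set()
    results: List[Set[Link]] = [
        ATOMIC_LINKERS[t](g1, g2) for t in sorted(problem_types)
    ]
    # Combine the selected atomic solutions: keep only links all of them agree on
    return set.intersection(*results)


if __name__ == "__main__":
    g1 = {"a:1": {"label": "Bolero", "date": "1928"},
          "a:2": {"label": "La Mer", "date": "1905"}}
    g2 = {"b:1": {"label": "bolero", "date": "1928"},
          "b:2": {"label": "La mer", "date": "1903"}}
    print(link(g1, g2))  # {('a:1', 'b:1')}
```

Intersection is only one possible way to combine atomic solutions; a union, a majority vote, or a weighted score would be equally plausible choices in this sketch, and the point is simply that the selection of linkers is driven by a profile of the data rather than fixed in advance.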

Impact

The hypothesis is that a data-centric approach will improve both the efficiency and the effectiveness of the linking process. In addition, channelling the so far decentralised, top-down efforts on the data linking problem will foster the application of linked data technologies within and across an even wider variety of domains, and will ultimately free the domain expert from the technological burden.

Other Research

My research lies in the field of Artificial Intelligence, particularly knowledge representation, Web data linking, knowledge graphs and Web search, with applications in the cultural heritage, social science and agro-bio domains (e.g. http://data.doremus.org/). Recently, I have taken an interest in the problem of modelling and analysing online discourse data. This includes the extraction, verification and consolidation of information about claims on controversial topics, together with the associated viewpoints, sources and web documents, and their structuring into knowledge graphs to support the understanding and analysis of societal debates (https://data.gesis.org/claimskg/site/#about).

Researcher’s Background

I have been an Associate Professor of Computer Science at the University of Montpellier (France) since 2012. From 2009 to 2012, I was a postdoctoral fellow at the MAS Laboratory at the Ecole Centrale Paris (France), where I worked on multimedia semantics. I received a PhD in Cognitive Science from the University of Osnabrueck (Germany) in 2010. Several years before that, I obtained an MSc in Applied Mathematics (Statistical Learning) from the University of Provence in Marseille.