A family of metrics connecting Jaccard distance to normalized information distance

Bjørn Kjos-Hanssen (University of Hawaii at Manoa)

17-Nov-2020, 23:00-00:00 (3 years ago)

Abstract: Distance metrics are used in a wide variety of scientific contexts. In a 2001 paper in the journal Bioinformatics, M. Li, Badger, Chen, Kwong, and Kearney introduced an information-based sequence distance. It is analogous to the famous Jaccard distance on finite sets. Soon thereafter, M. Li, Chen, X. Li, Ma and Vitányi (2004) rejected that distance in favor of what they call the normalized information distance (NID). Raff and Nicholas (2017) proposed a return to the Bioinformatics distance, based on further application-informed criteria.

We attempt to shed some light on this "dispute" by showing that the Jaccard distance and the NID analogue form the extreme points of the set of metrics within a family of semimetrics studied by Jiménez, Becerra, and Gelbukh (2013).

The NID is based on Kolmogorov complexity, and Terwijn, Torenvliet and Vitányi (2011) showed that it is neither upper semicomputable nor lower semicomputable. Our result gives a 2-dimensional family including the NID as an extreme point. It would be interesting to know if any of these functions are semicomputable.

logic

Audience: researchers in the topic


Computability theory and applications

Series comments: Description: Computability theory, logic

The goal of this endeavor is to run a seminar on the platform Zoom on a weekly basis, perhaps with alternating time slots each of which covers at least three out of four of Europe, North America, Asia, and New Zealand/Australia. While the meetings are always scheduled for Tuesdays, the timezone varies, so please refer to the calendar on the website for details about individual seminars.

Organizers: Damir Dzhafarov*, Vasco Brattka*, Ekaterina Fokina*, Ludovic Patey*, Takayuki Kihara, Noam Greenberg, Arno Pauly, Linda Brown Westrick
*contact for this listing

Export talk to