Two-mode clustering in a fuzzy setting: methods and cluster validity indices

Dr Maria Brigida Ferraro (Sapienza University of Rome)

Thu May 30, 13:00-14:00
Lecture held in STEM 3.1.

Abstract: The aim of clustering is to find a partition of the rows (e.g. objects) of a data matrix based on the values assumed on a set of variables (columns). Two objects belong to the same cluster if the corresponding rows are close to each other according to a certain metric based on all the variables. However, it can be reasonable to seek clusters such that objects assigned to the same cluster are close to each other with respect to a subset of variables. The research interest can also be reversed, i.e., the goal is to find clusters of variables close to each other in terms of a subset of objects. Standard clustering algorithms are not adequate to accomplish these tasks. For this purpose, two-mode clustering methods have been introduced. Two-mode clustering consists in simultaneously partitioning modes (e.g., objects and variables) of an observed two-mode data matrix.

In the literature, two-mode clustering methods have been extensively studied and extended along various directions. Most of them are based on the classical approach to clustering, i.e., each object (or variable) is either assigned to a cluster or not. A more powerful and flexible exploratory approach is obtained by introducing fuzziness in the clustering process. In this case, the objects (or the variables) are no longer either assigned or not to the clusters, but belong to the clusters with so-called (fuzzy) membership degrees taking values in the interval [0,1]. A high membership degree, close to 1, indicates that an object (or variable) is strongly assigned to a cluster, i.e., that it is very close to the corresponding cluster prototype.
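As a rough illustration of membership degrees, the sketch below computes them with the standard one-mode fuzzy k-means update, in which memberships decay with the distance to each prototype. It is not the speaker's two-mode method; the function name `fuzzy_memberships`, the fuzzifier value `m=2.0`, and the toy data are assumptions made for this example.

```python
import numpy as np

def fuzzy_memberships(X, prototypes, m=2.0):
    """Standard fuzzy k-means membership degrees (one-mode case).

    X: (n, p) data matrix; prototypes: (k, p) cluster prototypes.
    m > 1 is the fuzzifier (assumed value, not taken from the talk).
    Returns an (n, k) matrix U whose rows sum to 1.
    """
    X = np.asarray(X, dtype=float)
    prototypes = np.asarray(prototypes, dtype=float)
    # Squared Euclidean distance from every object to every prototype.
    d2 = ((X[:, None, :] - prototypes[None, :, :]) ** 2).sum(axis=2)
    d2 = np.maximum(d2, 1e-12)  # guard against division by zero at a prototype
    inv = d2 ** (-1.0 / (m - 1.0))
    return inv / inv.sum(axis=1, keepdims=True)

# Toy data (hypothetical): two objects near the first prototype,
# one on the second prototype, one exactly halfway between them.
X = np.array([[0.0, 0.0], [0.1, 0.0], [1.0, 1.0], [0.5, 0.5]])
prototypes = np.array([[0.0, 0.0], [1.0, 1.0]])
U = fuzzy_memberships(X, prototypes)
```

Objects lying on a prototype receive a membership degree close to 1 for that cluster, while the object equidistant from both prototypes is shared equally between them.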

Starting from the Double k-Means, we propose a class of two-mode clustering algorithms in a fuzzy framework, including some robust proposals, taking into account that, in this case, different kinds of outliers exist and should be considered. In addition, in order to evaluate the two fuzzy partitions and to choose the optimal numbers of clusters, new cluster validity indices are introduced. The proposed measures are defined in terms of the compactness within each cluster and separation between clusters. Starting from some well-known indices in standard fuzzy clustering, some generalizations to the two-mode case are addressed. The adequacy of the proposals is checked by means of simulation and real-case studies.
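To make the compactness/separation idea concrete, the sketch below computes the Xie-Beni index, one well-known validity index from standard (one-mode) fuzzy clustering. The abstract does not say which indices are generalized, so this choice is an assumption, as are the function name `xie_beni`, the fuzzifier `m=2.0`, and the toy data.

```python
import numpy as np

def xie_beni(X, U, prototypes, m=2.0):
    """Xie-Beni index for a one-mode fuzzy partition (lower is better).

    X: (n, p) data; U: (n, k) membership degrees; prototypes: (k, p), k >= 2.
    """
    X = np.asarray(X, dtype=float)
    U = np.asarray(U, dtype=float)
    prototypes = np.asarray(prototypes, dtype=float)
    n = X.shape[0]
    # Compactness: membership-weighted squared distances to the prototypes.
    d2 = ((X[:, None, :] - prototypes[None, :, :]) ** 2).sum(axis=2)
    compactness = ((U ** m) * d2).sum()
    # Separation: smallest squared distance between two distinct prototypes.
    sep = ((prototypes[:, None, :] - prototypes[None, :, :]) ** 2).sum(axis=2)
    np.fill_diagonal(sep, np.inf)
    return compactness / (n * sep.min())

# Toy data (hypothetical): two tight, well-separated groups.
X = np.array([[0.0, 0.0], [0.0, 1.0], [4.0, 0.0], [4.0, 1.0]])
U = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]])
prototypes = np.array([[0.0, 0.5], [4.0, 0.5]])
xb = xie_beni(X, U, prototypes)
```

In practice such an index would be evaluated over a range of candidate numbers of clusters, and the value minimizing it would be preferred; a two-mode version would additionally need to account for the partition of the variables.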

machine learning, microeconomic theory, probability, statistics theory

Audience: researchers in the discipline

(ED-3S) Essex Data Science Seminar Series

Series comments: You can download the calendar from the following link: researchseminars.org/seminar/Essex-DataScience/ics

The ED-3S has a sibling seminar, MESS: Mathematics Essex Seminar Series. Find more info here: researchseminars.org/seminar/EssexMaths

Organizer:	Jianya Lu*
	*contact for this listing
