FINCH
A modified version of the FINCH algorithm, used to build the h-NNE hierarchy.
- hnne.finch_clustering.FINCH(data: ndarray, initial_rank: ndarray | None = None, distance: str = 'cosine', ensure_early_exit: bool = True, verbose: bool = True, ann_threshold: int = 40000, random_state: int | None = None)
FINCH clustering algorithm.
- Parameters:
data (array, shape (n_samples, n_features)) – Input matrix with one sample (feature vector) per row.
initial_rank (array, shape (n_samples, 1) (optional)) – Precomputed integer indices of each sample's first (nearest) neighbor.
distance (str (default 'cosine')) – One of ['cityblock', 'cosine', 'euclidean', 'l1', 'l2', 'manhattan']. 'cosine' is recommended.
ensure_early_exit (bool (default True)) – May help on large, high-dimensional datasets: ensures the purity of merges and helps the algorithm exit early.
verbose (bool (default True)) – Print verbose output.
ann_threshold (int (default 40000)) – Data size threshold above which nearest neighbors are approximated with ANN (approximate nearest neighbor) search instead of being computed exactly.
random_state (Optional[int] (default None)) – An optional random state for reproducibility. It fixes the random state of the ANN search.
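To make the role of the first-neighbor indices concrete, the following is a minimal sketch of a single FINCH-style partition in plain Python, not the library's implementation. It assumes an exact (brute-force) neighbor search with cosine distance, which is only practical for small inputs. The helper names cosine_distance, first_neighbors and first_neighbor_partition are hypothetical and do not exist in hnne. The first-neighbor list it computes is the kind of information that initial_rank is described as carrying.

```python
import math


def cosine_distance(a, b):
    """Cosine distance 1 - cos(a, b) between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def first_neighbors(data):
    """Index of the nearest other sample for every sample (exact search)."""
    n = len(data)
    ranks = []
    for i in range(n):
        best = min(
            (j for j in range(n) if j != i),
            key=lambda j: cosine_distance(data[i], data[j]),
        )
        ranks.append(best)
    return ranks


def first_neighbor_partition(ranks):
    """Label samples by connected component of the first-neighbor graph.

    Linking every sample to its first neighbor also connects samples that
    share a first neighbor, so the components match the FINCH linking rule.
    """
    parent = list(range(len(ranks)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in enumerate(ranks):
        parent[find(i)] = find(j)

    # Relabel component roots as 0, 1, 2, ... in order of first appearance.
    roots = {}
    return [roots.setdefault(find(i), len(roots)) for i in range(len(ranks))]


# Illustrative data: two groups of directions in the plane.
data = [[1.0, 0.0], [0.9, 0.1], [1.0, 0.2],
        [0.0, 1.0], [0.1, 0.9], [0.2, 1.0]]
ranks = first_neighbors(data)
labels = first_neighbor_partition(ranks)
```

FINCH repeats this step on the cluster means to obtain successively coarser partitions. For inputs larger than ann_threshold, the exact search above would be replaced by an approximate one.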
- Returns:
c (array of shape (n_samples, n_partitions)) – Matrix of labels indicating the cluster each sample belongs to. There is one column per partition.
num_clust (array of shape (n_partitions)) – Number of clusters per partition.
partition_clustering (list of arrays of shapes equal to the values of num_clust) – List of arrays with labels indicating, for each level, the cluster each centroid belongs to.
lowest_level_centroids (array of shape (num_clust[0], n_features)) – The feature coordinates of the lowest level centroids.
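The relationship between the returned values can be illustrated with a small sketch. It assumes that labels in each column of c are contiguous integers starting at 0 and that a centroid is the mean of its cluster's members. Both are assumptions made for illustration, not statements about the library's internals. The helpers count_clusters and cluster_means are hypothetical names, and the label matrix below is hand-written rather than produced by FINCH.

```python
def count_clusters(c):
    """Number of distinct labels in each column of the label matrix c."""
    n_partitions = len(c[0])
    return [len({row[p] for row in c}) for p in range(n_partitions)]


def cluster_means(data, labels):
    """Mean feature vector of each cluster, ordered by label 0..k-1."""
    k = max(labels) + 1
    dim = len(data[0])
    sums = [[0.0] * dim for _ in range(k)]
    counts = [0] * k
    for point, label in zip(data, labels):
        counts[label] += 1
        for d, value in enumerate(point):
            sums[label][d] += value
    return [[s / counts[label] for s in sums[label]] for label in range(k)]


# Hand-written example: 6 samples, 2 partitions (fine, then coarse).
data = [[0, 0], [2, 0], [10, 0], [12, 0], [0, 10], [0, 12]]
c = [[0, 0], [0, 0], [1, 0], [1, 0], [2, 1], [2, 1]]

num_clust = count_clusters(c)                  # one entry per partition
first_column = [row[0] for row in c]           # finest partition
centroids = cluster_means(data, first_column)  # shape (num_clust[0], n_features)
```

Under these assumptions, centroids plays the role of lowest_level_centroids: it has one row per cluster of the first partition and one column per feature.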
References
The code implements the FINCH algorithm described in our CVPR 2019 paper:
[1] Sarfraz et al., "Efficient Parameter-free Clustering Using First Neighbor Relations", CVPR 2019. https://openaccess.thecvf.com/content_CVPR_2019/papers/Sarfraz_Efficient_Parameter-Free_Clustering_Using_First_Neighbor_Relations_CVPR_2019_paper.pdf
Original code author: M. Saquib Sarfraz (saquib.sarfraz@kit.edu), Karlsruhe Institute of Technology (KIT)