HNNE

The main class of h-NNE.

class hnne.HNNE(n_components: int = 2, metric: str = 'cosine', radius: float = 0.4, ann_threshold: int = 40000, preliminary_embedding: str = 'pca', random_state: int | None = None)

Hierarchical 1-Nearest Neighbor graph based Embedding

A fast hierarchical dimensionality reduction algorithm.

Parameters:

n_components (int (default 2)) – The dimension of the target space of the projection.
metric (str (default 'cosine')) – The metric used to compute the distances when forming the h-nne hierarchy levels. Its value should be supported by both sklearn and pynndescent. Some possible values: ‘cityblock’, ‘cosine’, ‘euclidean’, ‘l1’, ‘l2’, ‘manhattan’.
radius (float (default 0.4)) – The radius used to place points around centroids, expressed as a fraction of the distance between nearest-neighbor anchors. The theoretical value which guarantees no overlaps between anchor points is 0.2, but larger values such as the default provide denser visualizations in practice with minimal loss in performance.
ann_threshold (int (default 40000)) – The threshold above which approximate nearest neighbors are computed instead of exact nearest neighbors when building the levels of h-nne.
preliminary_embedding (str (default 'pca')) – The preliminary embedding used to initialize h-nne. In terms of embedding quality, pca > pca_centroids > random_linear; in terms of speed the order is reversed, with random_linear the fastest and pca the slowest.
random_state (Optional[int] (default None)) – An optional random state for reproducibility purposes. It fixes the state of PCA and ANN.
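
To make the role of radius more concrete, the following is a simplified, illustrative sketch of scaling points around an anchor by a fraction of the distance to that anchor's nearest neighboring anchor. It is not the library's placement code: the function names nearest_anchor_distance and place_around_anchor are hypothetical, the data is made up, and the offsets are assumed to be already normalized to the unit disc.

```python
import math


def nearest_anchor_distance(anchors, i):
    """Distance from anchor i to its closest other anchor."""
    return min(
        math.dist(anchors[i], other)
        for j, other in enumerate(anchors)
        if j != i
    )


def place_around_anchor(anchors, i, unit_offsets, radius=0.4):
    """Place offsets (assumed to lie in the unit disc) around anchor i.

    Each offset is scaled by radius * (distance to the nearest other anchor).
    """
    scale = radius * nearest_anchor_distance(anchors, i)
    ax, ay = anchors[i]
    return [(ax + scale * dx, ay + scale * dy) for dx, dy in unit_offsets]


# Three hypothetical 2D anchors; the nearest neighbor of anchor 0 is 4 units away.
anchors = [(0.0, 0.0), (10.0, 0.0), (0.0, 4.0)]
unit_offsets = [(1.0, 0.0), (0.0, -1.0), (-0.5, 0.5)]
placed = place_around_anchor(anchors, 0, unit_offsets, radius=0.4)
```

In this toy setting a smaller radius keeps each group tighter around its anchor, while a larger one spreads it out toward the neighboring anchors.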

min_size_top_level

The minimum number of centroids existing on the top level of the hierarchy. To achieve this minimum, the top levels which have fewer centroids are removed.

Type: int (default 3)
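
The trimming rule described above can be sketched as follows. This is a minimal illustration under stated assumptions rather than the library's code: trim_top_levels is a hypothetical helper, the level sizes are made up, levels are assumed to be ordered from bottom to top, and the guard that always keeps at least one level is an assumption of the sketch.

```python
def trim_top_levels(level_sizes, min_size_top_level=3):
    """Drop top levels that have fewer centroids than min_size_top_level.

    level_sizes is assumed to be ordered from the bottom level to the top one.
    """
    trimmed = list(level_sizes)
    # Remove levels from the top while the current top level is too small.
    while len(trimmed) > 1 and trimmed[-1] < min_size_top_level:
        trimmed.pop()
    return trimmed


# Hypothetical number of centroids per level, bottom to top.
level_sizes = [5000, 1200, 300, 70, 15, 2, 1]
trimmed_sizes = trim_top_levels(level_sizes)
```

Here the two top levels (with 2 and 1 centroids) are removed, so the level with 15 centroids becomes the new top level.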

hierarchy_parameters

An object holding the parameters which encode the h-nne hierarchy. They are saved during fitting and can be reused both when projecting new points and when projecting again with different parameters, e.g. n_components.

Type: Optional[HierarchyParameters]

References

The code implements the h-NNE algorithm described in our CVPR 2022 paper: [1] M. Saquib Sarfraz*, Marios Koulakis*, Constantin Seibold, Rainer Stiefelhagen. Hierarchical Nearest Neighbor Graph Embedding for Efficient Dimensionality Reduction. CVPR 2022. https://openaccess.thecvf.com/content/CVPR2022/papers/Sarfraz_Hierarchical_Nearest_Neighbor_Graph_Embedding_for_Efficient_Dimensionality_Reduction_CVPR_2022_paper.pdf

The code is for academic purposes only. The code or its re-implementation should not be used for commercial purposes. Please contact the authors below for licensing information: Marios Koulakis (marios.koulakis@gmail.com) and M. Saquib Sarfraz (saquibsarfraz@gmail.com), Karlsruhe Institute of Technology (KIT).

fit(X: ndarray, y: ndarray | None = None, override_n_components: int | None = None, verbose: bool = False, skip_hierarchy_building_if_done: bool = True)

Build an h-nne hierarchy based on X and use it to project X.

Parameters:

X (array, shape (n_samples, n_features)) – The data to project.
y (array, shape (n_samples, )) – Ignored.
override_n_components (Optional[int] (default None)) – If given, overrides the dimension of the target space of the projection.
verbose (bool (default False)) – If true, print info and progress messages.
skip_hierarchy_building_if_done (bool (default True)) – If true, the h-nne hierarchy is built only on the first run of fit. Warning: to project a new dataset with the same HNNE object, set this to false.

fit_transform(X: ndarray, y: ndarray | None = None, override_n_components: int | None = None, verbose: bool = False, skip_hierarchy_building_if_done: bool = True)

Build an h-nne hierarchy based on X and use it to project X.

Parameters:

X (array, shape (n_samples, n_features)) – The data to project.
y (array, shape (n_samples, )) – Ignored.
override_n_components (Optional[int] (default None)) – If given, overrides the dimension of the target space of the projection.
verbose (bool (default False)) – If true, print info and progress messages.
skip_hierarchy_building_if_done (bool (default True)) – If true, the h-nne hierarchy is built only on the first run of fit. Warning: to project a new dataset with the same HNNE object, set this to false.
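
As a usage sketch of the parameters above, the hypothetical helper below (project_twice is not part of the library) projects one dataset to two and then three dimensions, and afterwards projects a different dataset with the same object. It assumes the hnne package is installed and that fit_transform returns the projection as an array of shape (n_samples, n_components), following the scikit-learn convention.

```python
def project_twice(X, X_new, random_state=0):
    """Project X to 2D and 3D, then project a different dataset X_new."""
    # Third-party package; imported lazily so that defining this helper
    # does not require hnne to be installed.
    from hnne import HNNE

    hnne = HNNE(n_components=2, metric="cosine", random_state=random_state)

    # The first call builds the hierarchy from X and projects X to 2 dimensions.
    projection_2d = hnne.fit_transform(X)

    # With the default skip_hierarchy_building_if_done=True the hierarchy
    # built above is reused; only the target dimension is overridden.
    projection_3d = hnne.fit_transform(X, override_n_components=3)

    # A different dataset needs a fresh hierarchy, hence the explicit False.
    projection_new = hnne.fit_transform(
        X_new, skip_hierarchy_building_if_done=False
    )

    return projection_2d, projection_3d, projection_new
```

Both arguments are expected to be arrays of shape (n_samples, n_features). Reusing the hierarchy for the second projection avoids rebuilding it when only the target dimension changes.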

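set_fit_request(*, override_n_components: bool | None | str = '$UNCHANGED$', skip_hierarchy_building_if_done: bool | None | str = '$UNCHANGED$', verbose: bool | None | str = '$UNCHANGED$') → HNNE

Request metadata passed to the fit method.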

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see the scikit-learn User Guide on how the routing mechanism works.

The options for each parameter are:

True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to fit.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
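
The four options and the UNCHANGED default can be illustrated with a toy model in plain Python. This is only a sketch of the semantics described above, not scikit-learn's implementation: update_requests and route_to_fit are hypothetical helpers, the UNCHANGED constant is a stand-in for sklearn.utils.metadata_routing.UNCHANGED, the alias target_dim is made up, and ValueError stands in for whatever error scikit-learn actually raises.

```python
UNCHANGED = "$UNCHANGED$"  # stand-in for sklearn.utils.metadata_routing.UNCHANGED


def update_requests(current, **changes):
    """Return a copy of current with every non-UNCHANGED option applied."""
    updated = dict(current)
    for name, option in changes.items():
        if option != UNCHANGED:
            updated[name] = option
    return updated


def route_to_fit(requests, provided):
    """Select the keyword arguments a meta-estimator would forward to fit."""
    forwarded = {}
    for name, option in requests.items():
        if option is True:
            # Requested: forwarded if the user provided it, ignored otherwise.
            if name in provided:
                forwarded[name] = provided[name]
        elif option is False:
            # Not requested: never forwarded.
            continue
        elif option is None:
            # Not requested: providing it anyway is an error.
            if name in provided:
                raise ValueError(f"{name!r} was provided but not requested")
        else:
            # A string alias: the user provides the value under the alias.
            if option in provided:
                forwarded[name] = provided[option]
    return forwarded


requests = {
    "override_n_components": None,
    "skip_hierarchy_building_if_done": None,
    "verbose": None,
}
# Change two requests; the third keeps its existing value (UNCHANGED).
requests = update_requests(
    requests,
    override_n_components="target_dim",
    skip_hierarchy_building_if_done=UNCHANGED,
    verbose=True,
)
fit_kwargs = route_to_fit(requests, {"verbose": True, "target_dim": 3})
```

In this model fit_kwargs holds verbose under its own name and the value given as target_dim under override_n_components, while providing skip_hierarchy_building_if_done would raise an error because its request is still None.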

Added in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters:

override_n_components (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for override_n_components parameter in fit.
skip_hierarchy_building_if_done (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for skip_hierarchy_building_if_done parameter in fit.
verbose (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for verbose parameter in fit.

Returns:

self – The updated object.

Return type:

object

set_transform_request(*, ann_point_combination_threshold: bool | None | str = '$UNCHANGED$', verbose: bool | None | str = '$UNCHANGED$') → HNNE

Request metadata passed to the transform method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see the scikit-learn User Guide on how the routing mechanism works.

The options for each parameter are:

True: metadata is requested, and passed to transform if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to transform.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters:

ann_point_combination_threshold (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for ann_point_combination_threshold parameter in transform.
verbose (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for verbose parameter in transform.

Returns:

self – The updated object.

Return type:

object