HNNE
The main class of h-NNE.
- class hnne.HNNE(n_components: int = 2, metric: str = 'cosine', radius: float = 0.4, ann_threshold: int = 40000, preliminary_embedding: str = 'pca', random_state: int | None = None)
Hierarchical 1-Nearest Neighbor graph based Embedding
A fast hierarchical dimensionality reduction algorithm.
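The name refers to a hierarchy built from 1-nearest-neighbor graphs. The sketch below illustrates that idea under simplifying assumptions. Each point is linked to its nearest neighbor. The connected components of that graph become the clusters of the next level. Their centroids are then clustered the same way until one cluster remains. It uses brute-force Euclidean distances and omits the configured metric, approximate neighbor search and the embedding step. It is an illustration, not the hnne implementation, and all function names are hypothetical.

```python
import math


def one_nn_partition(points):
    """Cluster points as connected components of the 1-nearest-neighbor graph.

    Brute-force Euclidean search; illustration only, not the hnne code.
    """
    n = len(points)
    parent = list(range(n))  # union-find forest over point indices

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(n):
        nearest = min(
            (j for j in range(n) if j != i),
            key=lambda j: math.dist(points[i], points[j]),
        )
        parent[find(i)] = find(nearest)  # edge i -> nearest joins components

    roots = {}  # component root -> consecutive cluster id
    return [roots.setdefault(find(i), len(roots)) for i in range(n)]


def centroids(points, labels):
    """Mean of the points in each cluster, indexed by cluster label."""
    k = max(labels) + 1
    dim = len(points[0])
    sums = [[0.0] * dim for _ in range(k)]
    counts = [0] * k
    for p, c in zip(points, labels):
        counts[c] += 1
        for d in range(dim):
            sums[c][d] += p[d]
    return [[s / counts[c] for s in sums[c]] for c in range(k)]


def build_hierarchy(points):
    """Repeat the 1-NN partition on centroids until one cluster remains.

    Every cluster has at least two members, so each level is at most half
    the size of the previous one and the loop terminates.
    """
    levels = []
    current = [list(p) for p in points]
    while len(current) > 1:
        labels = one_nn_partition(current)
        levels.append(labels)
        current = centroids(current, labels)
    return levels


points = [(0, 0), (0, 1), (5, 5), (5, 6), (10, 0), (10, 1)]
levels = build_hierarchy(points)
print([max(labels) + 1 for labels in levels])  # clusters per level: [3, 1]
```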
- Parameters:
n_components (int (default 2)) – The dimension of the target space of the projection.
metric (str (default 'cosine')) – The metric used to compute the distances when forming the h-nne hierarchy levels. Its value should be supported by both sklearn and pynndescent. Some possible values: ‘cityblock’, ‘cosine’, ‘euclidean’, ‘l1’, ‘l2’, ‘manhattan’.
radius (float (default 0.45)) – The radius used to place points around centroids, expressed as a fraction of the distance between nearest-neighbor anchors. Although the theoretical value that guarantees no overlap between anchor points is 0.2, a value of 0.45 gives denser visualizations in practice with minimal loss in performance.
ann_threshold (int (default 40000)) – A threshold above which approximate nearest neighbors are computed instead of exact nearest neighbors when building the levels of the h-nne hierarchy.
preliminary_embedding (str (default 'pca')) – The preliminary embedding used to initialize h-nne. In terms of quality of the result, pca > pca_centroids > random_linear; in terms of speed the order is reversed, with random_linear the fastest and pca the slowest.
random_state (Optional[int] (default None)) – An optional random state for reproducibility. It fixes the random state of PCA and of the approximate nearest neighbor search.
- min_size_top_level
The minimum number of centroids on the top level of the hierarchy. To enforce this minimum, top levels with fewer centroids are removed.
- Type:
int (default 3)
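The pruning rule can be pictured on a list of level sizes. The sketch assumes the hierarchy is summarized as the number of centroids per level, ordered from finest to coarsest. prune_top_levels is a hypothetical helper, not part of hnne, and always keeping at least one level is an assumption of the sketch.

```python
def prune_top_levels(level_sizes, min_size_top_level=3):
    """Drop top levels with fewer than min_size_top_level centroids.

    level_sizes holds the number of centroids per level, finest level first.
    Hypothetical helper for illustration; at least one level is always kept.
    """
    pruned = list(level_sizes)
    while len(pruned) > 1 and pruned[-1] < min_size_top_level:
        pruned.pop()  # remove the current top (coarsest) level
    return pruned


print(prune_top_levels([5000, 900, 160, 30, 6, 2, 1]))  # [5000, 900, 160, 30, 6]
```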
- hierarchy_parameters
An object holding the parameters that encode the h-nne hierarchy. They are saved during fitting and can be reused both when projecting new points and when projecting again with different parameters, e.g. n_components.
- Type:
Optional[HierarchyParameters]
References
The code implements the h-NNE algorithm described in our CVPR 2022 paper: [1] M. Saquib Sarfraz*, Marios Koulakis*, Constantin Seibold, Rainer Stiefelhagen. Hierarchical Nearest Neighbor Graph Embedding for Efficient Dimensionality Reduction. CVPR 2022. https://arxiv.org/abs/2203.12997
The code is for academic purposes only. Neither the code nor any re-implementation of it should be used commercially. Please contact the authors below for licensing information:
Marios Koulakis (marios.koulakis@gmail.com)
M. Saquib Sarfraz (saquibsarfraz@gmail.com)
Karlsruhe Institute of Technology (KIT)
- fit(X: ndarray, y: ndarray | None = None, override_dim: int | None = None, verbose: bool = False, skip_hierarchy_building_if_done: bool = True)
Build an h-nne hierarchy based on X and use it to project X.
- Parameters:
X (array, shape (n_samples, n_features)) – The data to project.
y (array, shape (n_samples, )) – Ignored.
override_dim (Optional[int] (default None)) – If set, overrides the dimension of the target space given by n_components.
verbose (bool (default False)) – If True, print info and progress messages.
skip_hierarchy_building_if_done (bool (default True)) – If True, the h-nne hierarchy is built only on the first run of fit and reused afterwards. Warning: to project a new dataset with the same HNNE object, set this to False.
- fit_transform(X: ndarray, y: ndarray | None = None, override_dim: int | None = None, verbose: bool = False, skip_hierarchy_building_if_done: bool = True)
Build an h-nne hierarchy based on X and use it to project X.
- Parameters:
X (array, shape (n_samples, n_features)) – The data to project.
y (array, shape (n_samples, )) – Ignored.
override_dim (Optional[int] (default None)) – If set, overrides the dimension of the target space given by n_components.
verbose (bool (default False)) – If True, print info and progress messages.
skip_hierarchy_building_if_done (bool (default True)) – If True, the h-nne hierarchy is built only on the first run of fit and reused afterwards. Warning: to project a new dataset with the same HNNE object, set this to False.
- set_fit_request(*, override_dim: bool | None | str = '$UNCHANGED$', skip_hierarchy_building_if_done: bool | None | str = '$UNCHANGED$', verbose: bool | None | str = '$UNCHANGED$') → HNNE
Request metadata passed to the fit method.
Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.
The options for each parameter are:
- True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.
- False: metadata is not requested and the meta-estimator will not pass it to fit.
- None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
- str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
Added in version 1.3.
Note
This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.
- Parameters:
override_dim (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for override_dim parameter in fit.
skip_hierarchy_building_if_done (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for skip_hierarchy_building_if_done parameter in fit.
verbose (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for verbose parameter in fit.
- Returns:
self – The updated object.
- Return type:
object
- set_transform_request(*, ann_point_combination_threshold: bool | None | str = '$UNCHANGED$', verbose: bool | None | str = '$UNCHANGED$') → HNNE
Request metadata passed to the transform method.
Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.
The options for each parameter are:
- True: metadata is requested, and passed to transform if provided. The request is ignored if metadata is not provided.
- False: metadata is not requested and the meta-estimator will not pass it to transform.
- None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
- str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
Added in version 1.3.
Note
This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.
- Parameters:
ann_point_combination_threshold (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for ann_point_combination_threshold parameter in transform.
verbose (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for verbose parameter in transform.
- Returns:
self – The updated object.
- Return type:
object