API

SDR

class pySDR.SDR.SDR(path, data)

Class for applying either dimensionality reduction (DR) or sharpened dimensionality reduction (SDR).

Parameters

path (str): Storage path for the outputs of the LGC algorithm.
data (np.ndarray): Data to apply either DR or SDR to.

apply_DR(seed=None, **kwargs)

Function that applies dimensionality reduction to the currently loaded dataset.

Note

The currently available dimensionality reduction techniques are:

KLLE, NPE, Kernel LTSA, Linear LTSA, Hessian LLE, Laplacian Eigenmaps, LPP, Diffusion Map, Isomap, Landmark Isomap, MDS, LMDS, SPE, Kernel PCA, PCA, RP, Factor Analysis, tSNE & Manifold Sculpting from the Tapkee library. Please consult the Tapkee documentation for the appropriate keyword arguments for each DR method.
UMAP from umap-learn. NB: Only the keyword arguments seed, num_neighbors, target_dimension, metric, umap_init and min_dist are currently implemented. Please consult the umap-learn documentation for more information about their meaning.
LTSA from sklearn. One can also use the sklearn backend for applying LLE or Hessian LLE by setting backend='sklearn'.
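The UMAP note above uses pySDR's own keyword names. Purely as an illustration, the hypothetical helper below shows how those names plausibly correspond to the argument names of umap-learn's `UMAP` constructor (`random_state`, `n_neighbors`, `n_components`, `metric`, `init`, `min_dist`). The mapping is inferred from the getter and setter descriptions in this reference and is an assumption, not pySDR's own code.

```python
# Hypothetical helper: translate pySDR-style UMAP keyword names into the
# argument names used by umap-learn's UMAP constructor. This correspondence
# is an assumption inferred from the API reference, not pySDR's source.
PYSDR_TO_UMAP = {
    "seed": "random_state",
    "num_neighbors": "n_neighbors",
    "target_dimension": "n_components",
    "metric": "metric",
    "umap_init": "init",
    "min_dist": "min_dist",
}


def to_umap_kwargs(**kwargs):
    """Return umap-learn style arguments; reject keywords pySDR does not support for UMAP."""
    unsupported = set(kwargs) - set(PYSDR_TO_UMAP)
    if unsupported:
        raise ValueError(f"unsupported UMAP keyword(s): {sorted(unsupported)}")
    return {PYSDR_TO_UMAP[key]: value for key, value in kwargs.items()}


print(to_umap_kwargs(num_neighbors=15, target_dimension=2, min_dist=0.1))
```

Any keyword outside the six listed in the note raises a `ValueError`, mirroring the restriction stated above.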

The DR method can be set by providing, e.g. method='LMDS'. By default this function will apply RP (i.e. random projection).
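Because RP is the default, it may help to see what a random projection does. The sketch below is a generic Gaussian random projection written in NumPy, not Tapkee's implementation; the function name `random_projection` and the 1/sqrt(target_dimension) scaling are assumptions chosen for illustration.

```python
import numpy as np


def random_projection(data, target_dimension=2, seed=None):
    """Project (n_samples, n_features) data onto a random Gaussian basis.

    Generic sketch of random projection (RP); not Tapkee's exact code.
    """
    rng = np.random.default_rng(seed)
    n_features = data.shape[1]
    # Entries drawn from N(0, 1/target_dimension) so that squared norms are
    # preserved in expectation after projection.
    projection = rng.normal(0.0, 1.0 / np.sqrt(target_dimension),
                            size=(n_features, target_dimension))
    return data @ projection


X = np.random.default_rng(0).normal(size=(100, 50))
Y = random_projection(X, target_dimension=2, seed=42)
print(Y.shape)  # (100, 2)
```

Passing the same `seed` reproduces the same projection matrix, which is the role the `seed` argument plays for randomized DR methods.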

Parameters

seed (int, default=None): Random seed to use for the given DR method. By default, no random seed is set.
**kwargs: Additional keyword arguments specific to the chosen DR method.

Returns

data (np.ndarray, shape (n_samples, target_dimension)): The reduced dataset.

apply_LGC(alpha, T=10, k=100)

Function that applies local gradient clustering (LGC) to the currently loaded dataset.

Parameters

alpha (float): Learning rate of the LGC algorithm.
T (int, default=10): Number of clustering iterations LGC performs.
k (int, default=100): Number of nearest neighbors to consider when computing the local gradient.

Returns

data (np.ndarray, shape (n_samples, n_features)): The clustered dataset.
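The exact LGC update rule is not spelled out in this reference. For intuition only, the sketch below implements a simplified, mean-shift-style stand-in: in each of `T` iterations every point moves a fraction `alpha` of the way toward the mean of its `k` nearest neighbors. The function name `sharpen_sketch`, the brute-force distance computation and the uniform neighbor weights are all assumptions; this is not pySDR's LGC algorithm.

```python
import numpy as np


def sharpen_sketch(data, alpha, T=10, k=100):
    """Simplified neighbor-averaging sharpening (a stand-in for LGC, not LGC itself)."""
    points = np.asarray(data, dtype=float).copy()
    k = min(k, len(points))  # cannot use more neighbors than there are points
    for _ in range(T):
        # Brute-force pairwise squared distances, shape (n_samples, n_samples).
        diff = points[:, None, :] - points[None, :, :]
        sq_dist = np.einsum("ijk,ijk->ij", diff, diff)
        # Indices of the k nearest neighbors of every point (the point itself included).
        neighbors = np.argsort(sq_dist, axis=1)[:, :k]
        local_means = points[neighbors].mean(axis=1)
        # Move each point a fraction alpha toward its local mean.
        points += alpha * (local_means - points)
    return points


rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (50, 5)), rng.normal(8, 1, (50, 5))])
X_sharp = sharpen_sketch(X, alpha=0.5, T=5, k=10)
```

Each update is a convex combination of existing points, so the output keeps the input's shape and never leaves its bounding box; points contract toward dense regions, which is the "sharpening" effect that SDR exploits before DR.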

LGC

pySDR.LGC.sharpening_for_dr(data, alpha, T=10, k=100)

Function that applies sharpening by means of local gradient clustering (LGC) to the provided data.

Note

This function serves as a Python interface between pySDR and SDR. Please use the pySDR.SDR.SDR class for SDR unless you only want to sharpen the high-dimensional data.

Parameters

data (np.ndarray, shape (n_samples, n_features)): High-dimensional data to sharpen.
alpha (float): Learning rate of the LGC algorithm.
T (int, default=10): Number of clustering iterations LGC performs.
k (int, default=100): Number of nearest neighbors to consider when computing the local gradient.

Returns

data (np.ndarray, shape (n_samples, n_features)): The clustered dataset.

DR

class pySDR.DR.DR(**kwargs)

Class for applying dimensionality reduction.

Note

This class serves as a common DR interface for the Tapkee DR methods as well as scikit-learn and umap-learn. Using this class directly for DR is not recommended; use pySDR.SDR.SDR without an apply_LGC() call instead.

Parameters

**kwargs: Keyword arguments specifying the DR method to use and its configuration. Note that a method must always be provided.

apply_DR(data, filepath)

Function that applies DR to the provided data and stores the results in a *.txt file.

Parameters

data (np.ndarray, shape (n_samples, n_features)): Feature space data.
filepath (str): File to which the results are written with 6 significant digits (i.e. roughly float32 precision).
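Writing values with six significant digits corresponds to the printf-style format `%.6g`. The snippet below shows how such a file could be written and read back with NumPy and how small the resulting rounding error is; the whitespace-separated layout without a header is an assumption about pySDR's output files.

```python
import os
import tempfile

import numpy as np

reduced = np.array([[0.123456789, -12345.6789], [3.14159265, 2.71828183]])

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "reduced.txt")
    # Six significant digits per value, whitespace-separated (assumed layout).
    np.savetxt(path, reduced, fmt="%.6g")
    loaded = np.loadtxt(path)

# With six significant digits the relative rounding error is at most about 5e-6.
max_rel_error = np.max(np.abs(loaded - reduced) / np.abs(reduced))
print(max_rel_error)
```

For comparison, float32 carries roughly seven decimal digits, which is why six significant digits is described as roughly float32 precision.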

get(arg)

General interface for all getter methods.

Parameters

arg (str): Name of the setting to retrieve.

Returns

value: The value of the setting.

get_backend()

Function that gets the backend of the DR method.

Returns

backend (str): Backend of the DR method (one of tapkee, sklearn or umap-learn).

get_gaussian_kernel_width()

Function that gets the value of the gaussian_kernel_width parameter of the DR method. Used by the Laplacian Eigenmaps, LPP and Diffusion Map algorithms in the Tapkee library.

Returns

gaussian_kernel_width (float): Width of the Gaussian kernel used by the DR method.
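To illustrate what the kernel width controls, the sketch below computes heat-kernel weights of the form exp(-d^2 / w) for a few neighbor distances d. Whether Tapkee divides by the width, by twice the width or by its square is an assumption here and should be checked against the Tapkee documentation; only the qualitative effect (a larger width gives slower decay) is the point.

```python
import numpy as np


def gaussian_weights(distances, gaussian_kernel_width):
    """Heat-kernel weights exp(-d**2 / width) (assumed parametrisation)."""
    distances = np.asarray(distances, dtype=float)
    return np.exp(-distances**2 / gaussian_kernel_width)


d = np.array([0.0, 0.5, 1.0, 2.0])
narrow = gaussian_weights(d, 0.5)  # weights drop off quickly with distance
wide = gaussian_weights(d, 5.0)    # weights stay closer to 1 for far neighbors
print(narrow, wide)
```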

get_landmark_ratio()

Function that gets the value of the landmark_ratio parameter of the landmark algorithms LMDS and Landmark Isomap.

Returns

landmark_ratio (float, in [0, 1]): Ratio of landmark points used by the DR method.
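A landmark ratio is a fraction of the samples. The sketch below shows one plausible way such a ratio becomes a set of landmark indices; the hypothetical `select_landmarks` function, its rounding rule and the uniform random selection are assumptions, not necessarily what Tapkee's LMDS or Landmark Isomap do.

```python
import numpy as np


def select_landmarks(n_samples, landmark_ratio, seed=None):
    """Pick round(landmark_ratio * n_samples) distinct random sample indices (assumed scheme)."""
    if not 0.0 < landmark_ratio <= 1.0:
        raise ValueError("landmark_ratio must lie in (0, 1]")
    n_landmarks = max(1, int(round(landmark_ratio * n_samples)))
    rng = np.random.default_rng(seed)
    # Sample without replacement so every landmark is a distinct data point.
    return np.sort(rng.choice(n_samples, size=n_landmarks, replace=False))


idx = select_landmarks(1000, 0.2, seed=0)
print(len(idx))  # 200
```

Smaller ratios make the landmark methods cheaper, since the expensive eigen-decomposition runs on the landmarks only, at the cost of a coarser approximation.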

get_max_iteration()

Function that gets the value of the max_iteration parameter of the DR method. Used by:

SPE
Factor Analysis
Manifold Sculpting

Returns

max_iteration (int): Maximum number of iterations the DR method can perform.

get_method()

Function that gets the DR method ID.

Returns

method (int): ID of the DR method that is currently set. The correspondence between ID and DR method is as follows: KLLE = 0, NPE = 1, Kernel LTSA = 2, Linear LTSA = 3, Hessian LLE = 4, Laplacian Eigenmaps = 5, LPP = 6, Diffusion Map = 7, Isomap = 8, Landmark Isomap = 9, MDS = 10, LMDS = 11, SPE = 12, Kernel PCA = 13, PCA = 14, RP = 15, Factor Analysis = 16, tSNE = 17, Manifold Sculpting = 18, UMAP = 19 & LTSA = 20.
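Since get_method() returns an integer, a lookup table makes the result readable. The dictionary below simply transcribes the ID table above; the names `DR_METHOD_IDS`, `METHOD_NAMES` and `method_name` are hypothetical and not part of pySDR.

```python
# Transcription of the documented ID table: method name -> integer ID.
DR_METHOD_IDS = {
    "KLLE": 0, "NPE": 1, "Kernel LTSA": 2, "Linear LTSA": 3, "Hessian LLE": 4,
    "Laplacian Eigenmaps": 5, "LPP": 6, "Diffusion Map": 7, "Isomap": 8,
    "Landmark Isomap": 9, "MDS": 10, "LMDS": 11, "SPE": 12, "Kernel PCA": 13,
    "PCA": 14, "RP": 15, "Factor Analysis": 16, "tSNE": 17,
    "Manifold Sculpting": 18, "UMAP": 19, "LTSA": 20,
}

# Reverse lookup: integer ID -> method name.
METHOD_NAMES = {method_id: name for name, method_id in DR_METHOD_IDS.items()}


def method_name(method_id):
    """Return the human-readable name for a DR method ID."""
    return METHOD_NAMES[method_id]


print(method_name(15))  # RP
```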

get_metric()

Function that gets the value of the metric parameter of UMAP.

Returns

metric (str): Metric parameter of the UMAP algorithm.

get_min_dist()

Function that gets the value of the min_dist parameter of UMAP.

Returns

min_dist (float, in [0, 1]): Value of UMAP's min_dist parameter, which controls how tightly UMAP packs points together.

get_num_neighbors()

Function that gets the value of the num_neighbors parameter of the DR method. Used by:

KLLE
NPE
Kernel LTSA
Linear LTSA
Hessian LLE
Laplacian Eigenmaps
LPP
Isomap
Landmark Isomap
Manifold Sculpting
LTSA

Returns

num_neighbors (int): Number of nearest neighbors used by the DR method.

get_sne_perplexity()

Function that gets the value of the sne_perplexity parameter of tSNE.

Returns

sne_perplexity (float): Perplexity parameter of tSNE.
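Perplexity can be read as an effective number of neighbors: for a conditional distribution P it equals 2^H(P), where H is the Shannon entropy in bits. The sketch below computes it for a Gaussian conditional over squared neighbor distances. The explicit bandwidth `sigma` is illustrative only, since tSNE itself searches for the per-point bandwidth that matches the requested perplexity.

```python
import numpy as np


def perplexity(sq_distances, sigma):
    """Perplexity 2**H of p_j proportional to exp(-d_j**2 / (2 * sigma**2))."""
    logits = -np.asarray(sq_distances, dtype=float) / (2.0 * sigma**2)
    logits -= logits.max()  # shift for numerical stability; p is unchanged
    p = np.exp(logits)
    p /= p.sum()
    nonzero = p[p > 0]
    entropy_bits = -np.sum(nonzero * np.log2(nonzero))
    return 2.0 ** entropy_bits


sq_d = np.arange(10, dtype=float) ** 2
print(perplexity(sq_d, 0.1), perplexity(sq_d, 10.0))  # small sigma -> close to 1
```

A uniform distribution over k neighbors has perplexity exactly k, which is why typical perplexity values are chosen on the scale of a neighborhood size.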

get_sne_theta()

Function that gets the value of the sne_theta parameter of tSNE.

Returns

sne_theta (float): Theta parameter of the tSNE algorithm.
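In Barnes-Hut tSNE, theta trades accuracy for speed: a cell of the space-partitioning tree is summarised by its center of mass when the cell is small relative to its distance from the query point. The hypothetical function below sketches that acceptance test using the usual Barnes-Hut criterion (cell width divided by distance compared with theta); treating this as Tapkee's exact internal rule is an assumption.

```python
def can_summarize(cell_width, distance_to_center_of_mass, sne_theta):
    """Barnes-Hut test: summarise the cell as one point if width / distance < theta."""
    # A zero distance would divide by zero; such a cell is never summarised.
    if distance_to_center_of_mass <= 0.0:
        return False
    # With sne_theta = 0 the test never passes, which gives exact (slow) tSNE.
    return cell_width / distance_to_center_of_mass < sne_theta


print(can_summarize(1.0, 10.0, 0.5))  # True: far-away cell, approximation accepted
```

Larger theta values accept more approximations and run faster; theta = 0 disables the approximation entirely.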

get_squishing_rate()

Function that gets the value of the squishing_rate parameter of the Manifold Sculpting algorithm.

Returns

squishing_rate (float): Squishing rate parameter of the Manifold Sculpting algorithm.

get_target_dimension()

Function that gets the value of the target_dimension parameter of the DR method. NB: umap-learn and scikit-learn call this parameter n_components.

Returns

target_dimension (int): Number of dimensions the DR method reduces to.

get_umap_init()

Function that gets the value of the init parameter of UMAP.

Returns

init (str): Initialization used by the UMAP algorithm.

set(**kwargs)

General interface for all setter methods.

Parameters

**kwargs: Keyword arguments specifying the DR configuration settings to set.
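The generic get(arg) and set(**kwargs) methods are described as front ends to the individual getters and setters. A common way to build such front ends in Python is name-based dispatch with `getattr`; the toy class below illustrates that pattern. `ToyDRConfig` is hypothetical and is not pySDR's implementation.

```python
class ToyDRConfig:
    """Hypothetical stand-in showing get/set dispatch to named accessors."""

    def __init__(self):
        self._target_dimension = 2
        self._num_neighbors = 10

    def get_target_dimension(self):
        return self._target_dimension

    def set_target_dimension(self, target_dimension):
        self._target_dimension = int(target_dimension)

    def get_num_neighbors(self):
        return self._num_neighbors

    def set_num_neighbors(self, num_neighbors):
        self._num_neighbors = int(num_neighbors)

    def get(self, arg):
        # Route get("x") to get_x(); unknown names raise AttributeError.
        return getattr(self, f"get_{arg}")()

    def set(self, **kwargs):
        # Route set(x=value) to set_x(value) for every keyword given.
        for name, value in kwargs.items():
            getattr(self, f"set_{name}")(value)


cfg = ToyDRConfig()
cfg.set(target_dimension=3, num_neighbors=15)
print(cfg.get("target_dimension"), cfg.get("num_neighbors"))  # 3 15
```

Dispatching by name keeps the generic methods in sync with the specific accessors: adding a new get_/set_ pair automatically makes it reachable through get and set.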

set_backend(backend)

Function that sets the backend of the DR method.

Parameters

backend (str): Backend of the DR method (one of tapkee, sklearn or umap-learn).

set_gaussian_kernel_width(gaussian_kernel_width)

Function that sets the value of the gaussian_kernel_width parameter of the DR method. Used by the Laplacian Eigenmaps, LPP and Diffusion Map algorithms in the Tapkee library.

Parameters

gaussian_kernel_width (float): Width of the Gaussian kernel to be used by the DR method.

set_landmark_ratio(landmark_ratio)

Function that sets the value of the landmark_ratio parameter of the landmark algorithms LMDS and Landmark Isomap.

Parameters

landmark_ratio (float, in [0, 1]): Ratio of landmark points to be used by the DR method.

set_max_iteration(max_iteration)

Function that sets the value of the max_iteration parameter of the DR method. Used by:

SPE
Factor Analysis
Manifold Sculpting

Parameters

max_iteration (int): Maximum number of iterations the DR method can perform.

set_method(method)

Function that sets the DR method ID.

Parameters

method (int): ID of the DR method. The correspondence between ID and DR method is as follows: KLLE = 0, NPE = 1, Kernel LTSA = 2, Linear LTSA = 3, Hessian LLE = 4, Laplacian Eigenmaps = 5, LPP = 6, Diffusion Map = 7, Isomap = 8, Landmark Isomap = 9, MDS = 10, LMDS = 11, SPE = 12, Kernel PCA = 13, PCA = 14, RP = 15, Factor Analysis = 16, tSNE = 17, Manifold Sculpting = 18, UMAP = 19 & LTSA = 20.

set_metric(metric)

Function that sets the value of the metric parameter of UMAP.

Parameters

metric (str): Metric parameter of the UMAP algorithm.

set_min_dist(min_dist)

Function that sets the value of the min_dist parameter of UMAP.

Parameters

min_dist (float, in [0, 1]): Value of UMAP's min_dist parameter, which controls how tightly UMAP packs points together.

set_num_neighbors(num_neighbors)

Function that sets the value of the num_neighbors parameter of the DR method. Used by:

KLLE
NPE
Kernel LTSA
Linear LTSA
Hessian LLE
Laplacian Eigenmaps
LPP
Isomap
Landmark Isomap
Manifold Sculpting
LTSA

Parameters

num_neighbors (int): Number of nearest neighbors to be used by the DR method.

set_seed(seed)

Function that sets the random seed of the DR method instance.

Parameters

seed (int): Random seed.

set_sne_perplexity(sne_perplexity)

Function that sets the value of the sne_perplexity parameter of tSNE.

Parameters

sne_perplexity (float): Perplexity parameter of tSNE.

set_sne_theta(sne_theta)

Function that sets the value of the sne_theta parameter of tSNE.

Parameters

sne_theta (float): Theta parameter of the tSNE algorithm.

set_squishing_rate(squishing_rate)

Function that sets the value of the squishing_rate parameter of the Manifold Sculpting algorithm.

Parameters

squishing_rate (float): Squishing rate parameter of the Manifold Sculpting algorithm.

set_target_dimension(target_dimension)

Function that sets the value of the target_dimension parameter of the DR method. NB: umap-learn and scikit-learn call this parameter n_components.

Parameters

target_dimension (int): Number of dimensions the DR method should reduce to.

set_umap_init(umap_init)

Function that sets the value of the init parameter of UMAP.

Parameters

umap_init (str): Initialization to be used by the UMAP algorithm.