API

SDR

class pySDR.SDR.SDR(path, data)

Class for applying either DR or SDR.

Parameters

path : str

Storage path for the outputs of the LGC algorithm.

data : np.ndarray

Data to apply either DR or SDR on.

apply_DR(seed=None, **kwargs)

Function that applies dimensionality reduction to the currently loaded dataset.

Note

The currently available dimensionality reduction techniques are:

  • KLLE, NPE, Kernel LTSA, Linear LTSA, Hessian LLE, Laplacian Eigenmaps, LPP, Diffusion Map, Isomap, Landmark Isomap, MDS, LMDS, SPE, Kernel PCA, PCA, RP, Factor Analysis, tSNE & Manifold Sculpting from the Tapkee library. Please consult the Tapkee documentation for the appropriate keyword arguments for each DR method.

  • UMAP from umap-learn. NB: Only the keyword arguments seed, num_neighbors, target_dimension, metric, umap_init and min_dist are currently implemented. Please consult the umap-learn documentation for more information about their meaning.

  • LTSA from sklearn. One can also use the sklearn backend for applying LLE or Hessian LLE by setting backend='sklearn'.

The DR method can be set by providing, e.g. method='LMDS'. By default this function will apply RP (i.e. random projection).

Parameters

seed : int, default = None

Random seed to use for the given DR method. By default no random seed will be set.

**kwargs :

Additional DR method specific keyword arguments.

Returns

data : np.ndarray, shape (n_samples, target_dimension)

The reduced dataset.
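
To make the default method concrete, the following is a minimal sketch of a Gaussian random projection (RP) written with the standard library only. It is not the Tapkee implementation that apply_DR() calls; the function name random_projection and the 1/sqrt(target_dimension) scaling are assumptions chosen for illustration.

```python
import random


def random_projection(data, target_dimension=2, seed=None):
    """Project rows of `data` onto `target_dimension` random Gaussian directions.

    Illustrative stand-in only; pySDR delegates RP to Tapkee.
    """
    # seed=None lets Python seed from the OS, i.e. no fixed random seed.
    rng = random.Random(seed)
    n_features = len(data[0])
    scale = 1.0 / target_dimension ** 0.5
    # projection[f][d] is the weight of input feature f in output dimension d.
    projection = [
        [rng.gauss(0.0, 1.0) * scale for _ in range(target_dimension)]
        for _ in range(n_features)
    ]
    return [
        [sum(x * projection[f][d] for f, x in enumerate(row))
         for d in range(target_dimension)]
        for row in data
    ]
```

Passing the same seed twice yields the same projection, which mirrors the role of the seed argument described above.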

apply_LGC(alpha, T=10, k=100)

Function that applies local gradient clustering (LGC) to the currently loaded dataset.

Parameters

alpha : float

Learning rate of the LGC algorithm.

T : int, default = 10

Number of clustering iterations LGC takes.

k : int, default = 100

Number of nearest neighbors to consider for computing the local gradient.

Returns

data : np.ndarray, shape (n_samples, n_features)

The clustered dataset.
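
A typical SDR run would first sharpen the data with apply_LGC() and then reduce it with apply_DR(). The sketch below uses only the signatures documented above. The helper name sharpen_and_reduce, the alpha value and the choice of LMDS are illustrative assumptions, and target_dimension is assumed to be accepted as a keyword argument for LMDS.

```python
def sharpen_and_reduce(data, path, alpha=0.1, seed=42):
    """Sharpen `data` with LGC, then embed it in 2D with LMDS.

    `data` is expected to be an np.ndarray of shape (n_samples, n_features).
    """
    # Imported lazily so this helper can be defined where pySDR is absent.
    from pySDR.SDR import SDR

    sdr = SDR(path, data)
    # T and k are the documented defaults, spelled out for clarity.
    sdr.apply_LGC(alpha, T=10, k=100)
    # Assumption: apply_DR() then operates on the sharpened dataset.
    return sdr.apply_DR(seed=seed, method="LMDS", target_dimension=2)
```

Leaving out the apply_LGC() call turns the same helper into plain DR.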

LGC

pySDR.LGC.sharpening_for_dr(data, alpha, T=10, k=100)

Function that applies sharpening by means of local gradient clustering (LGC) to the dataset passed in as data.

Note

This function serves as a Python interface between pySDR and SDR. Please use the pySDR.SDR.SDR class for SDR unless your objective is to just sharpen the high dimensional data.

Parameters

data : np.ndarray, shape (n_samples, n_features)

Data to sharpen.

alpha : float

Learning rate of the LGC algorithm.

T : int, default = 10

Number of clustering iterations LGC takes.

k : int, default = 100

Number of nearest neighbors to consider for computing the local gradient.

Returns

data : np.ndarray, shape (n_samples, n_features)

The clustered dataset.
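
To illustrate how alpha, T and k interact, here is a simplified, pure-Python stand-in for the sharpening step. It is not the LGC implementation shipped with pySDR: the local gradient is approximated by the mean-shift vector towards the k nearest neighbors, and the function name sharpen_sketch is hypothetical.

```python
import math


def sharpen_sketch(data, alpha, T=10, k=100):
    """Move every point towards the mean of its k nearest neighbors, T times.

    Simplified stand-in for LGC; the real algorithm estimates a density
    gradient rather than using this mean-shift approximation.
    """
    points = [[float(x) for x in row] for row in data]
    # A point cannot have more neighbors than there are other points.
    k = min(k, len(points) - 1)
    if k < 1:
        return points
    for _ in range(T):
        updated = []
        for i, p in enumerate(points):
            # Brute-force k nearest neighbors by Euclidean distance.
            others = (q for j, q in enumerate(points) if j != i)
            neighbors = sorted(others, key=lambda q: math.dist(p, q))[:k]
            mean = [sum(column) / k for column in zip(*neighbors)]
            # alpha scales the step along the approximate local gradient.
            updated.append([x + alpha * (m - x) for x, m in zip(p, mean)])
        points = updated
    return points
```

In this sketch a larger alpha or T contracts clusters more strongly, while k sets how local the neighborhood is. The output keeps the input shape (n_samples, n_features).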

DR

class pySDR.DR.DR(**kwargs)

Class for applying dimensionality reduction.

Note

This class serves as a common DR interface for the Tapkee DR methods as well as for sklearn and umap-learn. Using this class directly for DR is not recommended. Please use pySDR.SDR.SDR without an apply_LGC() call instead.

Parameters

**kwargs

A number of keyword arguments specifying the DR method to use and its configuration. Note that a method must always be provided.

apply_DR(data, filepath)

Function that applies DR on the data provided and stores the results in a *.txt file.

Parameters

data : np.ndarray, shape (n_samples, n_features)

Feature space data.

filepath : str

File to store the results in, written with 6 significant digits (i.e. roughly float32 precision).
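
The effect of writing 6 significant digits can be sketched with Python's %g-style formatting. The helper names and the space-separated, one-row-per-line layout below are assumptions for illustration; the actual layout of the *.txt file written by apply_DR() is not specified here.

```python
def format_embedding(rows):
    """Render each row as space-separated values with 6 significant digits."""
    lines = (" ".join(f"{value:.6g}" for value in row) for row in rows)
    return "\n".join(lines) + "\n"


def save_embedding(rows, filepath):
    """Write the formatted rows to `filepath` as plain text."""
    with open(filepath, "w", encoding="utf-8") as handle:
        handle.write(format_embedding(rows))
```

For example, 3.14159265 is stored as 3.14159 and 123456.789 as 123457, so values read back from the file agree with the originals only to about six digits.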

get(arg)

General interface for all getter methods.

Parameters

arg : str

A string specifying the setting that needs to be retrieved.

Returns

value :

The value of the setting.

get_backend()

Function that gets the backend of the DR method.

Returns

backend : str

Backend of the DR method (can be either tapkee, sklearn or umap-learn).

get_gaussian_kernel_width()

Function that gets the value of the gaussian_kernel_width parameter of the DR method. Used by the Laplacian Eigenmaps, LPP and Diffusion Map algorithms in the Tapkee library.

Returns

gaussian_kernel_width : float

Width of the Gaussian kernel used by the DR method.

get_landmark_ratio()

Function that gets the value of the landmark_ratio parameter of the landmark algorithms LMDS and Landmark Isomap.

Returns

landmark_ratio : float, in [0, 1]

Ratio of landmark points that is used by the DR method.

get_max_iteration()

Function that gets the value of the max_iteration parameter of the DR method. Used by:

  • SPE

  • Factor Analysis

  • Manifold Sculpting

Returns

max_iteration : int

Maximum number of iterations that can be reached by the DR method.

get_method()

Function that gets the DR method ID.

Returns

method : int

ID of the DR method that is currently set. The correspondence between ID and DR method is as follows: KLLE = 0, NPE = 1, Kernel LTSA = 2, Linear LTSA = 3, Hessian LLE = 4, Laplacian Eigenmaps = 5, LPP = 6, Diffusion Map = 7, Isomap = 8, Landmark Isomap = 9, MDS = 10, LMDS = 11, SPE = 12, Kernel PCA = 13, PCA = 14, RP = 15, Factor Analysis = 16, tSNE = 17, Manifold Sculpting = 18, UMAP = 19 & LTSA = 20.
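
The ID table above can be kept at hand as a small lookup. The names METHOD_NAMES, METHOD_IDS and method_name below are illustrative and not part of pySDR; only the ID values are taken from the list above.

```python
# Position in this list equals the method ID documented above.
METHOD_NAMES = [
    "KLLE", "NPE", "Kernel LTSA", "Linear LTSA", "Hessian LLE",
    "Laplacian Eigenmaps", "LPP", "Diffusion Map", "Isomap",
    "Landmark Isomap", "MDS", "LMDS", "SPE", "Kernel PCA", "PCA", "RP",
    "Factor Analysis", "tSNE", "Manifold Sculpting", "UMAP", "LTSA",
]

# Reverse lookup: method name -> integer ID.
METHOD_IDS = {name: index for index, name in enumerate(METHOD_NAMES)}


def method_name(method_id):
    """Translate an ID as returned by get_method() into a readable name."""
    return METHOD_NAMES[method_id]
```

With this lookup, an ID of 15 translates to RP, the default method of pySDR.SDR.SDR.apply_DR().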

get_metric()

Function that gets the value of the metric parameter of UMAP.

Returns

metric : str

Metric parameter of the UMAP algorithm.

get_min_dist()

Function that gets the value of the min_dist parameter of UMAP.

Returns

min_dist : float, in [0, 1]

Value of the min_dist parameter of UMAP controlling how tightly UMAP packs points together.

get_num_neighbors()

Function that gets the value of the num_neighbors parameter of the DR method. Used by:

  • KLLE

  • NPE

  • Kernel LTSA

  • Linear LTSA

  • Hessian LLE

  • Laplacian Eigenmaps

  • LPP

  • Isomap

  • Landmark Isomap

  • Manifold Sculpting

  • LTSA

Returns

num_neighbors : int

Number of nearest neighbors used by the DR method.

get_sne_perplexity()

Function that gets the value of the sne_perplexity parameter of tSNE.

Returns

sne_perplexity : float

Perplexity parameter of tSNE.

get_sne_theta()

Function that gets the value of the sne_theta parameter of tSNE.

Returns

sne_theta : float

Theta parameter of the tSNE algorithm.

get_squishing_rate()

Function that gets the value of the squishing_rate parameter of the Manifold Sculpting algorithm.

Returns

squishing_rate : float

Squishing rate parameter of the Manifold Sculpting algorithm.

get_target_dimension()

Function that gets the value of the target_dimension parameter of the DR method. NB: umap-learn and scikit-learn call this parameter n_components.

Returns

target_dimension : int

Number of dimensions the DR method will reduce to.
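
Because the backends use different parameter names, the pySDR keywords have to be translated before they reach the backend. The sketch below shows one way this could look for UMAP. The right-hand names are umap-learn's own constructor parameters, while the mapping table and the helper to_umap_kwargs are assumptions and not pySDR's actual code.

```python
# pySDR keyword -> umap-learn keyword (the six keywords listed for UMAP).
UMAP_KEYWORDS = {
    "seed": "random_state",
    "num_neighbors": "n_neighbors",
    "target_dimension": "n_components",
    "metric": "metric",
    "umap_init": "init",
    "min_dist": "min_dist",
}


def to_umap_kwargs(**kwargs):
    """Rename pySDR-style keywords to the names umap-learn expects."""
    unknown = sorted(set(kwargs) - set(UMAP_KEYWORDS))
    if unknown:
        # Anything outside the supported set is rejected early.
        raise TypeError(f"unsupported UMAP keyword(s): {', '.join(unknown)}")
    return {UMAP_KEYWORDS[name]: value for name, value in kwargs.items()}
```

Calling to_umap_kwargs(target_dimension=2, num_neighbors=15) would produce a dictionary with the keys n_components and n_neighbors.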

get_umap_init()

Function that gets the value of the init parameter of UMAP.

Returns

init : str

Initialization used by the UMAP algorithm.

set(**kwargs)

General interface for all setter methods.

Parameters

**kwargs

A number of keyword arguments specifying the configuration of the DR method that need to be set.
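
One way to picture the generic get(arg) and set(**kwargs) interfaces is as thin dispatchers over the named get_* and set_* methods. The class below is a hypothetical stand-in with only two settings. Whether pySDR.DR.DR dispatches this way internally, and which exceptions it raises, are assumptions.

```python
class SettingsSketch:
    """Toy configuration object with per-setting and generic accessors."""

    def __init__(self, **kwargs):
        self._values = {}
        self.set(**kwargs)

    def set_num_neighbors(self, num_neighbors):
        self._values["num_neighbors"] = int(num_neighbors)

    def get_num_neighbors(self):
        return self._values["num_neighbors"]

    def set_min_dist(self, min_dist):
        # The documented range for min_dist is [0, 1].
        if not 0.0 <= min_dist <= 1.0:
            raise ValueError("min_dist must lie in [0, 1]")
        self._values["min_dist"] = float(min_dist)

    def get_min_dist(self):
        return self._values["min_dist"]

    def set(self, **kwargs):
        """Forward every keyword to the matching set_<name> method."""
        for name, value in kwargs.items():
            setter = getattr(self, f"set_{name}", None)
            if setter is None:
                raise KeyError(f"unknown setting: {name}")
            setter(value)

    def get(self, arg):
        """Forward to the matching get_<arg> method."""
        getter = getattr(self, f"get_{arg}", None)
        if getter is None:
            raise KeyError(f"unknown setting: {arg}")
        return getter()
```

With this layout, set(num_neighbors=15, min_dist=0.1) is equivalent to calling set_num_neighbors(15) followed by set_min_dist(0.1), so any validation lives in one place.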

set_backend(backend)

Function that sets the backend of the DR method.

Parameters

backend : str

Backend of the DR method (can be either tapkee, sklearn or umap-learn).

set_gaussian_kernel_width(gaussian_kernel_width)

Function that sets the value of the gaussian_kernel_width parameter of the DR method. Used by the Laplacian Eigenmaps, LPP and Diffusion Map algorithms in the Tapkee library.

Parameters

gaussian_kernel_width : float

Width of the Gaussian kernel to be used by the DR method.

set_landmark_ratio(landmark_ratio)

Function that sets the value of the landmark_ratio parameter of the landmark algorithms LMDS and Landmark Isomap.

Parameters

landmark_ratio : float, in [0, 1]

Ratio of landmark points that needs to be used by the DR method.
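
As a rough illustration of what the ratio means, the sketch below turns it into a concrete set of landmark indices. It assumes that the number of landmarks is simply the ratio multiplied by the number of samples and that landmarks are drawn uniformly at random. The helper choose_landmarks is hypothetical, and Tapkee's own selection may differ.

```python
import random


def choose_landmarks(n_samples, landmark_ratio, seed=None):
    """Pick round(landmark_ratio * n_samples) distinct sample indices."""
    if not 0.0 <= landmark_ratio <= 1.0:
        raise ValueError("landmark_ratio must lie in [0, 1]")
    # Keep at least one landmark so the embedding has a reference point.
    n_landmarks = max(1, round(landmark_ratio * n_samples))
    rng = random.Random(seed)
    return sorted(rng.sample(range(n_samples), n_landmarks))
```

A ratio of 1 selects every sample as a landmark, while smaller ratios trade accuracy for speed.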

set_max_iteration(max_iteration)

Function that sets the value of the max_iteration parameter of the DR method. Used by:

  • SPE

  • Factor Analysis

  • Manifold Sculpting

Parameters

max_iteration : int

Maximum number of iterations that can be reached by the DR method.

set_method(method)

Function that sets the DR method ID.

Parameters

method : int

ID of the DR method. The correspondence between ID and DR method is as follows: KLLE = 0, NPE = 1, Kernel LTSA = 2, Linear LTSA = 3, Hessian LLE = 4, Laplacian Eigenmaps = 5, LPP = 6, Diffusion Map = 7, Isomap = 8, Landmark Isomap = 9, MDS = 10, LMDS = 11, SPE = 12, Kernel PCA = 13, PCA = 14, RP = 15, Factor Analysis = 16, tSNE = 17, Manifold Sculpting = 18, UMAP = 19 & LTSA = 20.

set_metric(metric)

Function that sets the value of the metric parameter of UMAP.

Parameters

metric : str

Metric parameter of the UMAP algorithm.

set_min_dist(min_dist)

Function that sets the value of the min_dist parameter of UMAP.

Parameters

min_dist : float, in [0, 1]

Value of the min_dist parameter of UMAP controlling how tightly UMAP packs points together.

set_num_neighbors(num_neighbors)

Function that sets the value of the num_neighbors parameter of the DR method. Used by:

  • KLLE

  • NPE

  • Kernel LTSA

  • Linear LTSA

  • Hessian LLE

  • Laplacian Eigenmaps

  • LPP

  • Isomap

  • Landmark Isomap

  • Manifold Sculpting

  • LTSA

Parameters

num_neighbors : int

Number of nearest neighbors to be used by the DR method.

set_seed(seed)

Function that sets the random seed of the DR method instance.

Parameters

seed : int

Random seed to use.

set_sne_perplexity(sne_perplexity)

Function that sets the value of the sne_perplexity parameter of tSNE.

Parameters

sne_perplexity : float

Perplexity parameter of tSNE.

set_sne_theta(sne_theta)

Function that sets the value of the sne_theta parameter of tSNE.

Parameters

sne_theta : float

Theta parameter of the tSNE algorithm.

set_squishing_rate(squishing_rate)

Function that sets the value of the squishing_rate parameter of the Manifold Sculpting algorithm.

Parameters

squishing_rate : float

Squishing rate parameter of the Manifold Sculpting algorithm.

set_target_dimension(target_dimension)

Function that sets the value of the target_dimension parameter of the DR method. NB: umap-learn and scikit-learn call this parameter n_components.

Parameters

target_dimension : int

Number of dimensions the DR method needs to reduce to.

set_umap_init(umap_init)

Function that sets the value of the init parameter of UMAP.

Parameters

umap_init : str

Initialization used by the UMAP algorithm.