API

SDR Optimization Routines

SHARC.optimization_routines.optimize_DR(X, labels=None, num_samples=None, methods=['LMDS'], metric=None, storage_path='./', param_grid='./settings_DR.json', verbose=True, seed=None)

Function that finds the optimal parameter set for each DR method given a parameter grid.

Parameters

X : array-like, shape (n_samples, n_features)

An array containing the data that needs to be projected.

labels : array-like, shape (n_samples,), default=None

An array containing the labels (as numeric values) corresponding to each sample in X. Be sure to provide it when it is used by the optimization metric.

num_samples : int, default=None (optional)

Size of the random subset of samples that will be used to find the optimal DR parameters. If None, all samples will be used. Beware that for large datasets this may significantly slow down the optimization procedure! As a general recommendation, one should not use significantly more than 10000 samples.

methods : list, default=["LMDS"] (optional)

A list with the names of the DR methods to optimize, as strings.

metric : metrics.Metrics instance, default=None (optional)

A metrics.Metrics instance with a metric_total method, which will be called to evaluate the DR performance for a given parameter set. If not provided, metrics.DR_MetricsV1 will be initialized and used with its default parameters.

storage_path : str, default="./" (optional)

Path to the folder in which temporary files and results will be stored.

param_grid : str, default="./settings_DR.json" (optional)

The path to a JSON file containing a compact parameter grid for each method provided in methods.

verbose : bool, default=True (optional)

Controls the verbosity.

seed : int, default=None (optional)

Random seed used both by the projection technique and for selecting the random subset of num_samples samples.

Returns

best_params : dict

Dictionary containing the best parameter set for each DR method specified in methods.

best_scores : list

List containing the best total score for each DR method specified in methods. The scores are computed by calling metric.metric_total.
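A minimal usage sketch (the data, labels, and grid file below are placeholders; ./settings_DR.json is assumed to define a parameter grid for every method listed):

    import numpy as np
    from SHARC import optimization_routines

    # Toy stand-in data: 1000 samples, 10 features, 3 numeric classes.
    rng = np.random.default_rng(42)
    X = rng.normal(size=(1000, 10))
    labels = rng.integers(0, 3, size=1000)

    # Search the grid in ./settings_DR.json for the best LMDS parameters.
    best_params, best_scores = optimization_routines.optimize_DR(
        X,
        labels=labels,
        num_samples=500,      # subsample to keep the search fast
        methods=["LMDS"],
        param_grid="./settings_DR.json",
        seed=42,
    )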

SHARC.optimization_routines.optimize_LGC(X, labels=None, num_samples=None, methods=['LMDS'], metric=None, storage_path='./', param_grid='./settings_LGC.json', DR_params='./best_DR_params.json', verbose=True, seed=None)

Function that finds the optimal parameter set for each LGC method given a parameter grid.

Parameters

X : array-like, shape (n_samples, n_features)

An array containing the data that needs to be projected.

labels : array-like, shape (n_samples,), default=None

An array containing the labels (as numeric values) corresponding to each sample in X. Be sure to provide it when it is used by the optimization metric.

num_samples : int, default=None (optional)

Size of the random subset of samples that will be used to find the optimal LGC parameters. If None, all samples will be used. Beware that for large datasets this may significantly slow down the optimization procedure! As a general recommendation, one should not use significantly more than 10000 samples.

methods : list, default=["LMDS"] (optional)

A list with the names of the DR methods to use in combination with LGC, as strings.

metric : metrics.Metrics instance, default=None (optional)

A metrics.Metrics instance with a metric_total method, which will be called to evaluate the LGC performance for a given parameter set. If not provided, metrics.LGC_Metrics will be initialized and used with its default parameters.

storage_path : str, default="./" (optional)

Path to the folder in which temporary files and results will be stored.

param_grid : str, default="./settings_LGC.json" (optional)

The path to a JSON file containing a compact parameter grid for each method provided in methods.

DR_params : str, default="./best_DR_params.json" (optional)

The path to a JSON file containing the parameters to use for each DR method provided in methods.

verbose : bool, default=True (optional)

Controls the verbosity.

seed : int, default=None (optional)

Random seed used both by the projection technique and for selecting the random subset of num_samples samples.

Returns

best_params : dict

Dictionary containing the best LGC parameter set for each DR method specified in methods.

best_scores : list

List containing the best total score for each DR method specified in methods. The scores are computed by calling metric.metric_total.

SHARC.optimization_routines.save_results(results, outfile)

Function to save optimization results to a JSON file.

Parameters

results : dict

A dictionary containing the results to be saved to outfile.

outfile : str

The name of the file the results should be saved to.
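The routines are typically chained: parameters found by optimize_DR are written out with save_results and then consumed by optimize_LGC through its DR_params argument. A hedged sketch, reusing X and labels from the example above and the default file names:

    from SHARC import optimization_routines

    # Stage 1: optimize the DR methods and persist the winning parameters.
    dr_params, dr_scores = optimization_routines.optimize_DR(X, labels=labels, seed=0)
    optimization_routines.save_results(dr_params, "./best_DR_params.json")

    # Stage 2: optimize LGC on top of the stored DR parameters.
    lgc_params, lgc_scores = optimization_routines.optimize_LGC(
        X,
        labels=labels,
        param_grid="./settings_LGC.json",
        DR_params="./best_DR_params.json",
        seed=0,
    )
    optimization_routines.save_results(lgc_params, "./best_LGC_params.json")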

SDR Optimization Metrics

class SHARC.metrics.DR_MetricsV1(metric=['euclidean', 'euclidean'], k=7)

Metric class for DR optimization using a metric composed of the trustworthiness, continuity, neighborhood hit and Shepard goodness metrics. Metric functions are inherited from the metrics.Metrics class.

Parameters

metric : str or list, default=["euclidean", "euclidean"]

Metrics to use when computing distances in the feature space and the projection space. When a string is provided, that same metric will be used for both the feature space and the projection space. Values are passed to scipy.spatial.distance.pdist.

k : int, default=7

Number of nearest neighbors to consider when computing the various metrics. Used by metric_trustworthiness, metric_continuity, metric_jaccard_similarity_coefficient, metric_neighborhood_hit and metric_distribution_consistency.

metric_total()

Function to compute the optimization metric.

Returns

total : float

The value between \([0, 1]\) of the composite metric, i.e.:

\[\frac{1}{4}\left(\text{trustworthiness} + \text{continuity} + \text{neighborhood hit} + \text{Shepard goodness}\right)\]
class SHARC.metrics.DR_MetricsV2(metric=['euclidean', 'euclidean'], k=7)

Metric class for DR optimization using a metric composed of only the distribution consistency metric. The metric function for distribution consistency is inherited from the metrics.Metrics class.

Parameters

metric : str or list, default=["euclidean", "euclidean"]

Metrics to use when computing distances in the feature space and the projection space. When a string is provided, that same metric will be used for both the feature space and the projection space. Values are passed to scipy.spatial.distance.pdist.

k : int, default=7

Number of nearest neighbors to consider when computing the various metrics. Used by metric_trustworthiness, metric_continuity, metric_jaccard_similarity_coefficient, metric_neighborhood_hit and metric_distribution_consistency.

metric_total()

Function to compute the optimization metric.

Returns

total : float

The value between \([0, 1]\) of the optimization metric, i.e. distribution consistency.

class SHARC.metrics.LGC_Metrics(metric='euclidean', k=7)

Metric class for LGC optimization using a metric composed of only the distribution consistency metric. The metric function for distribution consistency is inherited from the metrics.Metrics class.

Parameters

metric : str or list, default="euclidean"

Metrics to use when computing distances in the feature space and the projection space. When a string is provided, that same metric will be used for both the feature space and the projection space. Values are passed to scipy.spatial.distance.pdist.

k : int, default=7

Number of nearest neighbors to consider when computing the various metrics. Used by metric_trustworthiness, metric_continuity, metric_jaccard_similarity_coefficient, metric_neighborhood_hit and metric_distribution_consistency.

metric_total(k=7)

Function to compute the optimization metric.

Returns

total : float

The value between \([0, 1]\) of the optimization metric, i.e. distribution consistency.

class SHARC.metrics.Metrics(metric=['euclidean', 'euclidean'], k=7)

A base class for the computation of some basic metrics that quantify the performance of DR algorithms.

Parameters

metric : str or list, default=["euclidean", "euclidean"]

Metrics to use when computing distances in the feature space and the projection space. When a string is provided, that same metric will be used for both the feature space and the projection space. Values are passed to scipy.spatial.distance.pdist.

k : int, default=7

Number of nearest neighbors to consider when computing the various metrics. Used by metric_trustworthiness, metric_continuity, metric_jaccard_similarity_coefficient, metric_neighborhood_hit and metric_distribution_consistency.

fit(X, Y, labels=None)

Fit the provided data to the metric instance. That is, for both X and Y compact distance matrices and nearest neighbor sets are computed.

Parameters

X : array-like, shape (n_samples, n_features)

Feature space dataset.

Y : array-like, shape (n_samples, n_embedding_dimensions)

Projection space dataset.

labels : array-like, shape (n_samples,), default=None

An array of label values for each sample. Only required for purity/VSC metrics such as metric_neighborhood_hit, metric_distance_consistency and metric_distribution_consistency.

Returns

self : object

Returns self.

get_summary()

Function to get a summary of the computed metrics.

Returns

summary : dict

A dictionary containing all computed metrics and their values.
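A short sketch of the fit/summary workflow (X, Y, and labels are synthetic placeholders; Y would normally come from a DR method, and the summary reflects whichever metrics have been computed):

    import numpy as np
    from SHARC.metrics import Metrics

    rng = np.random.default_rng(0)
    X = rng.normal(size=(500, 16))     # feature space data
    Y = rng.normal(size=(500, 2))      # stand-in for a 2-D projection
    labels = rng.integers(0, 4, size=500)

    m = Metrics(metric=["euclidean", "euclidean"], k=7)
    m.fit(X, Y, labels=labels)         # precompute distance matrices and kNN sets

    trust = m.metric_trustworthiness()
    nh = m.metric_neighborhood_hit()
    m.print_summary()                  # report the metrics computed so far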

metric_continuity()

Function to compute the continuity metric which quantifies the proportion of missing neighbors in the projection. The functional definition reads as follows:

(1)\[M_c(k) = 1 - \frac{2}{Nk(2N-3k-1)}\sum^{N}_{i=1}\sum_{j\in \mathcal{V}^k_i}(\hat{r}(i,j)-k)\]

In this definition, \(N\) is the number of samples in the dataset and \(k\) is the number of nearest neighbors to consider; \(k\) should always be smaller than \(N / 2\) for the metric to be properly normalized. The set \(\mathcal{V}^{k}_i\) consists of the \(k\) nearest neighbors of sample \(i\) in the original data space that are not among the \(k\) nearest neighbors of \(i\) after the projection. The quantity \(\hat{r}(i,j)\) specifies the rank of the point \(j\) when feature vectors are ordered based on their distance to point \(i\) after the projection.

Returns

continuity : float

The value between \([0,1]\) yielded by the continuity metric.

metric_distance_consistency()

Function to compute the distance consistency metric which measures how well separated data clusters with different labels are in the projection. The functional definition reads as follows:

(2)\[M_{\text{DSC}} = 1 - \frac{\left|\left\{\vec{x}\in D : \text{CD}(\vec{x}, \text{centr}(\text{clabel}(\vec{x}))) \neq 1\right\}\right|}{N}\]

In this definition, \(N\) is the number of samples in the dataset \(D\) and \(\text{CD}(\vec{x}, \text{centr}(\text{clabel}(\vec{x})))\) is the so-called centroid distance which is defined as follows:

\[\begin{split}\text{CD}(\vec{x}, \text{centr}(\text{clabel}(\vec{x}))) = \begin{cases} 1 & \text{if } d(\vec{x},\text{centr}(\text{clabel}(\vec{x}))) < d(\vec{x},\text{centr}(c_i))\ \forall i \in [0, m] \wedge c_i \neq \text{clabel}(\vec{x})\\ 0 & \text{otherwise} \end{cases}\end{split}\]

where \(\text{centr}(c_i)\) is the position of the centroid corresponding to all datapoints with class label \(c_i\), \(\text{clabel}(\vec{x})\) gets the class label of datapoint \(\vec{x}\) and \(d(\vec{x},\vec{y})\) is the distance between points \(\vec{x}\) and \(\vec{y}\).

Returns

distance_consistency : float

The value between \([0, 1]\) yielded by the distance consistency metric.

metric_distribution_consistency()

Function to compute the distribution consistency metric which measures how well separated data with different class labels are in the projection. The functional definition reads as follows:

(3)\[M_{\text{DC}} = 1 + \frac{1}{N\log_2(m)}\sum_{\vec{x}\in D}\sum_{i=0}^{m}\frac{p_{c_i}}{\sum_{i=0}^m p_{c_i}}\log_2\left(\frac{p_{c_i}}{\sum_{i=0}^m p_{c_i}}\right)\]

In this definition, \(N\) is the number of samples in the dataset \(D\), \(m\) is the number of unique class labels and \(p_{c_i}\) is the number of datapoints of class \(c_i\) in the nearest neighbor set of a point \(\vec{x}\). The way this metric is defined, it measures the average purity with respect to the class labels in the neighborhood of all points in the dataset. To probe the purity it uses the Shannon entropy.

Returns

distribution_consistency : float

The value between \([0, 1]\) yielded by the distribution consistency metric.
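An illustrative standalone computation of equation (3) with scikit-learn nearest neighbors; this sketches the definition rather than the library's code path, and it assumes integer labels in 0..m-1:

    import numpy as np
    from sklearn.neighbors import NearestNeighbors

    def distribution_consistency(Y, labels, k=7):
        """Average neighborhood purity via Shannon entropy, per equation (3)."""
        N = len(Y)
        m = len(np.unique(labels))
        # k nearest neighbors of each projected point (first hit is the point itself).
        _, idx = NearestNeighbors(n_neighbors=k + 1).fit(Y).kneighbors(Y)
        total = 0.0
        for i in range(N):
            neigh = labels[idx[i, 1:]]                    # drop the self-neighbor
            counts = np.bincount(neigh, minlength=m).astype(float)
            p = counts[counts > 0] / counts.sum()         # class fractions p_c / sum p_c
            total += np.sum(p * np.log2(p))               # negative Shannon entropy
        return 1.0 + total / (N * np.log2(m))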

metric_jaccard_similarity_coefficient()

Function to compute the Jaccard similarity coefficient metric which quantifies the proportion of overlap between the \(k\)-nearest neighbor sets in the feature space and the projection space. The functional definition reads as follows:

(4)\[M_J(k) = \frac{1}{N}\sum^{N}_{i=1}\frac{\left|\mathcal{N}^k_i \cap \mathcal{M}^k_i\right|}{\left|\mathcal{N}^k_i \cup \mathcal{M}^k_i\right|}\]

In this definition, \(N\) is the number of samples in the dataset and \(k\) is the number of nearest neighbors to consider. The set \(\mathcal{N}^{k}_i\) consists of the \(k\) nearest neighbors of sample \(i\) in original data space. The set \(\mathcal{M}^{k}_i\) consists of the \(k\) nearest neighbors of sample \(i\) in the projection.

Returns

jaccard_similarity_coefficient : float

The value between \([0,1]\) yielded by the Jaccard similarity coefficient metric.
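Similarly, a hedged sketch of equation (4) (again an illustration of the definition, not SHARC's implementation):

    import numpy as np
    from sklearn.neighbors import NearestNeighbors

    def jaccard_similarity(X, Y, k=7):
        """Mean Jaccard overlap of kNN sets in feature and projection space."""
        _, nx = NearestNeighbors(n_neighbors=k + 1).fit(X).kneighbors(X)
        _, ny = NearestNeighbors(n_neighbors=k + 1).fit(Y).kneighbors(Y)
        scores = []
        for i in range(len(X)):
            a, b = set(nx[i, 1:]), set(ny[i, 1:])   # [1:] drops the self-neighbor
            scores.append(len(a & b) / len(a | b))
        return float(np.mean(scores))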

metric_neighborhood_hit()

Function to compute the neighborhood hit metric which measures how well separated datapoints with different labels are in the projection. The functional definition reads as follows:

(5)\[M_{NH}(k) = \frac{1}{kN}\sum^{N}_{i=1}\left|\left\{j\in\mathcal{N}^{k}_{i} | l_j = l_i\right\}\right|\]

In this definition, \(N\) is the number of samples in the dataset and \(k\) is the number of nearest neighbors to consider. The set \(\mathcal{N}^k_i\) is the set of nearest neighbors of point \(i\) in the projection space and \(l_i\) denotes the label of a point \(i\).

Returns

neighborhood_hit : float

The value between \([0, 1]\) yielded by the neighborhood hit metric.

metric_normalized_stress()

Function to compute the normalized stress metric, which quantifies the mismatch between pairwise distances in the feature space and the projection space. The functional definition reads as follows:

(6)\[M_{\sigma} = \frac{\sum^{N}_{i=1}\sum^{N}_{j=1}\left(\Delta^n(\mathbf{x}_i,\mathbf{x}_j)-\Delta^m\left(P(\mathbf{x}_i),P(\mathbf{x}_j)\right)\right)^2}{\sum^{N}_{i=1}\sum^{N}_{j=1}\Delta^n(\mathbf{x}_i,\mathbf{x}_j)^2}\]

In this definition, \(N\) is the number of samples in the dataset, \(\Delta^n(\mathbf{x}_i, \mathbf{x}_j)\) returns the distance between points \(i\) and \(j\) in the \(n\)-dimensional feature space, and \(P(\mathbf{x})\) denotes the projection of \(\mathbf{x}\) into \(m\) dimensions.

Returns

normalized_stress : float

The value between \([0, \infty]\) yielded by the normalized stress metric.

metric_shepard_goodness(return_shepard=False)

Function that computes the Shepard goodness metric, i.e. the Spearman rank correlation of the Shepard diagram.

Parameters

return_shepard : bool, default=False

Controls whether to return the Shepard diagram as well.

Returns

shepard_goodness : float

The value between \([0,1]\) of the Shepard goodness metric.

metric_trustworthiness()

Function to compute the trustworthiness metric which quantifies the proportion of false neighbors in the projection. The functional definition reads as follows:

(7)\[M_t(k) = 1 - \frac{2}{Nk(2N-3k-1)}\sum^{N}_{i=1}\sum_{j\in \mathcal{U}_i^k}(r(i,j) - k)\]

In this definition, \(N\) is the number of samples in the dataset and \(k\) is the number of nearest neighbors to consider; \(k\) should always be smaller than \(N / 2\) for the metric to be properly normalized. The set \(\mathcal{U}_i^k\) consists of the \(k\) nearest neighbors of sample \(i\) in the projection that are not amongst the \(k\) nearest neighbors of \(i\) in the original space. The quantity \(r(i,j)\) specifies the rank of the point \(j\) when feature vectors are ordered based on their distance to point \(i\) in the original space.

Returns

trustworthiness : float

The value between \([0,1]\) yielded by the trustworthiness metric.

print_summary(file=sys.stdout, end='\n')

Function to print a summary of the computed metrics.

Parameters

file : file-like object (stream), default=sys.stdout

The stream the summary is printed to.

end : str, default='\n'

String appended after the last value.

shepard_diagram()

Function that returns the Shepard diagram.

Returns

shepard_diagram : array-like, shape (n_pairs, 2)

An array of pairwise distances between points in the original data space and the projection.

NNP Models

class SHARC.nn_models.DenseBlock(*args, **kwargs)

Class constructor of a dense block.

Parameters

units : int (required)

Number of units in the Dense layer.

momentum : float between [0, 1], default=0.6 (optional)

Momentum parameter of the batch normalization layer. Should be close to 1 for slow learning of the batch normalization layer. Typically somewhere between 0.6 and 0.85 works fine for big batches.

alpha : float, default=0.3 (optional)

Negative slope coefficient of the leaky ReLU layer.

rate : float between [0, 1], default=0 (optional)

Dropout rate.

call(x, training=True)

Calls the model on new inputs and returns the outputs as tensors. In this case call() just reapplies all ops in the graph to the new inputs (e.g. builds a new computational graph from the provided inputs).

Note: this method should not be called directly. It is only meant to be overridden when subclassing tf.keras.Model. To call a model on an input, always use the __call__() method, i.e. model(inputs), which relies on the underlying call() method.

Args:

inputs: Input tensor, or dict/list/tuple of input tensors.

training: Boolean or boolean scalar tensor, indicating whether to run the Network in training mode or inference mode.

mask: A mask or list of masks. A mask can be either a boolean tensor or None (no mask). For more details, check the guide at https://www.tensorflow.org/guide/keras/masking_and_padding.

Returns:

A tensor if there is a single output, or a list of tensors if there is more than one output.
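A hedged construction sketch (keyword names follow the Parameters listed above; the exact constructor signature may differ):

    import tensorflow as tf
    from SHARC.nn_models import DenseBlock

    # Dense -> batch normalization -> leaky ReLU -> dropout, per the description above.
    block = DenseBlock(units=64, momentum=0.7, alpha=0.3, rate=0.1)

    x = tf.random.normal((32, 16))   # batch of 32 samples with 16 features
    y = block(x, training=True)      # __call__ dispatches to call()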

class SHARC.nn_models.NNPModelBackboneV1(*args, **kwargs)

NNP model backbone class version 1.

Parameters

D1_units : int (required)

Number of units in the first dense layer of the network. Should not be less than 4!

**kwargs (optional)

Additional keyword arguments to be passed to each block in this backbone.

call(inputs, training=True)

Calls the model on new inputs and returns the outputs as tensors. In this case call() just reapplies all ops in the graph to the new inputs (e.g. builds a new computational graph from the provided inputs).

Note: this method should not be called directly. It is only meant to be overridden when subclassing tf.keras.Model. To call a model on an input, always use the __call__() method, i.e. model(inputs), which relies on the underlying call() method.

Args:

inputs: Input tensor, or dict/list/tuple of input tensors.

training: Boolean or boolean scalar tensor, indicating whether to run the Network in training mode or inference mode.

mask: A mask or list of masks. A mask can be either a boolean tensor or None (no mask). For more details, check the guide at https://www.tensorflow.org/guide/keras/masking_and_padding.

Returns:

A tensor if there is a single output, or a list of tensors if there is more than one output.

class SHARC.nn_models.NNPModelBackboneV2(*args, **kwargs)

NNP model backbone class version 2.

Parameters

D1_units : int (required)

Number of units in the first dense layer of the network. Should not be less than 4!

**kwargs (optional)

Additional keyword arguments to be passed to each block in this backbone.

call(inputs, training=True)

Calls the model on new inputs and returns the outputs as tensors. In this case call() just reapplies all ops in the graph to the new inputs (e.g. builds a new computational graph from the provided inputs).

Note: this method should not be called directly. It is only meant to be overridden when subclassing tf.keras.Model. To call a model on an input, always use the __call__() method, i.e. model(inputs), which relies on the underlying call() method.

Args:

inputs: Input tensor, or dict/list/tuple of input tensors.

training: Boolean or boolean scalar tensor, indicating whether to run the Network in training mode or inference mode.

mask: A mask or list of masks. A mask can be either a boolean tensor or None (no mask). For more details, check the guide at https://www.tensorflow.org/guide/keras/masking_and_padding.

Returns:

A tensor if there is a single output, or a list of tensors if there is more than one output.

SHARC.nn_models.construct_NNPModel(num_input_features, output_dimensions=2, output_activation='sigmoid', version=2, **kwargs)

Function to construct an NNP (neural network projection) model.

Parameters

num_input_features : int

The number of input features.

output_dimensions : int, default=2 (optional)

The number of output dimensions of the projection.

output_activation : str or function, default="sigmoid" (optional)

Activation function to use.

version : int, default=2 (optional)

Version of the NNP model backbone to use.

**kwargs (optional)

Additional keyword arguments will be passed to the NNP model backbone.

Returns

model : tensorflow Model

A tf.keras.Model instance.
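A minimal sketch constructing and compiling such a model (the optimizer, learning rate, and loss are illustrative choices; D1_units is assumed to be forwarded to the backbone):

    import tensorflow as tf
    from SHARC.nn_models import construct_NNPModel

    model = construct_NNPModel(
        num_input_features=16,
        output_dimensions=2,         # project to 2-D
        output_activation="sigmoid",
        version=2,
        D1_units=256,                # forwarded to the NNP model backbone
    )
    model.compile(optimizer=tf.keras.optimizers.Adam(1e-3), loss="mse")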

NNP Training Utilities

SHARC.nn_training_utils.train_nnp(X, Y_true, model, loss_function, optimizer, labels=None, epochs=10, validation_ratio=0.25, save_path='./NNP', verbose=False)

Function that handles the training of the NNP model.

Parameters

X : array-like, shape (n_samples, n_features)

Feature space training dataset.

Y_true : array-like, shape (n_samples, n_embedding_dimensions)

Projection space training dataset.

model : tensorflow Model

The tf.keras.Model instance to train.

loss_function : loss function

A TensorFlow-compatible loss function (i.e. one that supports automatic differentiation) to use for optimization.

optimizer : tensorflow optimizer

The tf.keras optimizer to use for optimization.

labels : array-like, shape (n_samples,), default=None

An array containing the labels (as numeric values) corresponding to each sample in X and Y_true. When provided, it is used to stratify the cross-validation set.

epochs : int, default=10 (optional)

Maximum number of epochs.

validation_ratio : float, default=0.25 (optional)

Fraction of the dataset to use for cross validation at each training epoch.

save_path : str, default="./NNP" (optional)

Path to save the checkpoints, training history and trained model to.

verbose : bool, default=False (optional)

Controls the verbosity.

Returns

train_loss : numpy.ndarray, shape (epochs,)

Training loss at each epoch.

valid_loss : numpy.ndarray, shape (epochs,)

Validation loss at each epoch.

pred_train_loss : numpy.ndarray, shape (epochs,)

Inferential training loss at each epoch.
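A hedged training sketch, reusing the model from the construct_NNPModel example (X, Y_true and labels are placeholder arrays; the loss and optimizer are illustrative):

    import tensorflow as tf
    from SHARC.nn_training_utils import train_nnp
    from SHARC.loss_functions import MedianSquaredError

    # X: (n_samples, n_features) inputs; Y_true: (n_samples, 2) target projection.
    train_loss, valid_loss, pred_train_loss = train_nnp(
        X, Y_true, model,
        loss_function=MedianSquaredError(),
        optimizer=tf.keras.optimizers.Adam(1e-3),
        labels=labels,               # stratifies the validation split
        epochs=50,
        validation_ratio=0.25,
        save_path="./NNP",
        verbose=True,
    )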

Loss Function Definitions

class SHARC.loss_functions.AlternativeMeanAbsoluteError

Class for computing the Alternative Mean Absolute Error (AMAE) for predictions.

call(y_true, y_pred)

Parameters

y_true:

Ground truth values. shape = [batch_size, d0, .. dN], except sparse loss functions such as sparse categorical crossentropy where shape = [batch_size, d0, .. dN-1]

y_pred:

The predicted values. shape = [batch_size, d0, .. dN]

class SHARC.loss_functions.AlternativeMeanSquaredError

Class for computing the Alternative Mean Squared Error (AMSE) for predictions.

call(y_true, y_pred)

Parameters

y_true:

Ground truth values. shape = [batch_size, d0, .. dN], except sparse loss functions such as sparse categorical crossentropy where shape = [batch_size, d0, .. dN-1]

y_pred:

The predicted values. shape = [batch_size, d0, .. dN]

class SHARC.loss_functions.AlternativeMedianAbsoluteError

Class for computing the Alternative Median Absolute Error (AMedAE) for predictions.

call(y_true, y_pred)

Parameters

y_true:

Ground truth values. shape = [batch_size, d0, .. dN], except sparse loss functions such as sparse categorical crossentropy where shape = [batch_size, d0, .. dN-1]

y_pred:

The predicted values. shape = [batch_size, d0, .. dN]

class SHARC.loss_functions.AlternativeMedianSquaredError

Class for computing the Alternative Median Squared Error (AMedSE) for predictions.

call(y_true, y_pred)

Parameters

y_true:

Ground truth values. shape = [batch_size, d0, .. dN], except sparse loss functions such as sparse categorical crossentropy where shape = [batch_size, d0, .. dN-1]

y_pred:

The predicted values. shape = [batch_size, d0, .. dN]

class SHARC.loss_functions.MedianAbsoluteError

Class for computing the Median Absolute Error (MedAE) for predictions.

call(y_true, y_pred)

Parameters

y_true:

Ground truth values. shape = [batch_size, d0, .. dN], except sparse loss functions such as sparse categorical crossentropy where shape = [batch_size, d0, .. dN-1]

y_pred:

The predicted values. shape = [batch_size, d0, .. dN]

class SHARC.loss_functions.MedianSquaredError

Class for computing the Median Squared Error (MedSE) for predictions.

call(y_true, y_pred)

Parameters

y_true:

Ground truth values. shape = [batch_size, d0, .. dN], except sparse loss functions such as sparse categorical crossentropy where shape = [batch_size, d0, .. dN-1]

y_pred:

The predicted values. shape = [batch_size, d0, .. dN]
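For intuition, a median-based loss can be written as a small tf.keras.losses.Loss subclass. The sketch below illustrates the call(y_true, y_pred) contract; it is an independent example, not SHARC's actual implementation:

    import tensorflow as tf

    class SketchMedianSquaredError(tf.keras.losses.Loss):
        """Median of the per-element squared errors (illustrative only)."""

        def call(self, y_true, y_pred):
            err = tf.reshape(tf.square(y_true - y_pred), [-1])  # flatten errors
            n = tf.shape(err)[0]
            s = tf.sort(err)
            return s[n // 2]   # middle element (upper median for even n)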

SDR-NNP Classifier

class SHARC.classifiers.SDRNNPClassifier(nnp_model_path=None, classifier=None)

A classifier which implements SDR-NNP based classification.

Parameters

nnp_model_path : str, default=None

Path to the stored SDR-NNP model (required).

classifier : object, default=None

Classifier used for the final classification (required).

Attributes

X_ : ndarray, shape (n_samples, n_features)

The input passed during fit().

y_ : ndarray, shape (n_samples,)

The labels passed during fit().

classes_ : ndarray, shape (n_classes,)

The classes seen at fit().

fit(X, y)

Fit the SDR-NNP based classifier from the training dataset.

Parameters

X : array-like, shape (n_samples, n_features)

The training input samples.

y : array-like, shape (n_samples,)

The target values. An array of int.

Returns

self : object

Returns self.

plot_classifier_decision_boundaries(ax=None, grid_resolution=200, eps=0.2, plot_method='contourf', cmap=<matplotlib colormap>, alpha=0.3, **kwargs)

Plot decision boundaries for the trained classifier.

Parameters

ax : matplotlib Axes, default=None

Axes object to plot on. If None, a new figure and axes is created.

**kwargs

Additional arguments are passed to sklearn.inspection.DecisionBoundaryDisplay.from_estimator().

Returns

display : DecisionBoundaryDisplay

Object storing the result.

plot_projection(X, y=None, ax=None)

Plot the SDR-NNP projection of the input data X.

Parameters

X : array-like, shape (n_samples, n_features)

The input samples.

y : array-like, shape (n_samples,), default=None

The target values. An array of int.

ax : matplotlib Axes, default=None

Axes object to plot on. If None, a new figure and axes is created.

Returns

ax : matplotlib Axes

Axes object that was plotted on.

predict(X)

Predict the class labels for the provided data.

Parameters

X : array-like, shape (n_samples, n_features)

The input samples.

Returns

y : ndarray, shape (n_samples,)

Class labels for each data sample.

predict_proba(X)

Return probability estimates for the test data X.

Parameters

X : array-like, shape (n_samples, n_features)

The input samples.

Returns

p : ndarray, shape (n_samples, n_classes)

The probability estimates for each class, for each data sample.
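A hedged end-to-end sketch (the model path, base classifier and train/test arrays are placeholders):

    from sklearn.neighbors import KNeighborsClassifier
    from SHARC.classifiers import SDRNNPClassifier

    clf = SDRNNPClassifier(
        nnp_model_path="./NNP",                      # trained SDR-NNP model
        classifier=KNeighborsClassifier(n_neighbors=7),
    )
    clf.fit(X_train, y_train)            # project the data, then fit the classifier

    y_pred = clf.predict(X_test)
    proba = clf.predict_proba(X_test)
    ax = clf.plot_projection(X_test, y=y_test)       # inspect the 2-D embedding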

Consolidation Methods

SHARC.consolidation.alternative_consolidation(predictions)

When the predictions by the different classifiers are in disagreement, the sample is assigned to the post-consolidation outlier class.

Parameters

predictions : array-like, shape (n_classifiers, n_samples)

The predictions given by each classifier.

Returns

labels : array-like, shape (n_samples,)

The consolidated labels.

SHARC.consolidation.average_probability_consolidation(probabilities, threshold=None, label_lut=None)

Consolidation is done by averaging the probabilities for each class returned by each classifier. Samples are labelled by the class with the highest average probability.

Parameters

probabilities : array-like, shape (n_classifiers, n_samples, n_classes)

The probabilities predicted for each class by each classifier.

threshold : float, default=None (optional)

Optional probability threshold. Whenever the highest average probability falls below the given threshold value, the sample is classified as an outlier.

label_lut : array-like, shape (n_classes,)

Lookup table for the labels.

Returns

labels : array-like, shape (n_samples,)

The consolidated labels.

SHARC.consolidation.lowest_entropy_consolidation(probabilities, threshold=None, label_lut=None, return_entropies=False, return_selected_classifiers=False)

For each sample, use the classification of the classifier with the lowest entropy in its predicted class-label distribution.

Parameters

probabilities : array-like, shape (n_classifiers, n_samples, n_classes)

The probabilities predicted for each class by each classifier.

threshold : float, default=None

The entropy threshold. Samples with a post-consolidation entropy above this threshold will be classified as outliers. If None, no thresholding will be applied.

label_lut : array-like, shape (n_classes,)

Lookup table for the labels.

Returns

labels : array-like, shape (n_samples,)

The consolidated labels.

entropies : array-like, shape (n_classifiers, n_samples)

The computed entropy in the probabilities predicted for each class by each classifier. Only returned if return_entropies=True.

selected_classifiers : array-like, shape (n_samples,)

An array of indices corresponding to the classifier that was used in the final classification of each sample. Only returned if return_selected_classifiers=True.

SHARC.consolidation.majority_vote_consolidation(predictions)

Consolidation is done through a majority vote. When the vote is indecisive, the sample is classified as an outlier.

Parameters

predictions : array-like, shape (n_classifiers, n_samples)

The predictions given by each classifier.

Returns

labels : array-like, shape (n_samples,)

The consolidated labels.

SHARC.consolidation.multiplied_probability_consolidation(probabilities, threshold=None, label_lut=None)

Consolidation is done by multiplying the probabilities for each class returned by each classifier. Samples are labelled by the class with the highest multiplied probability.

Parameters

probabilities : array-like, shape (n_classifiers, n_samples, n_classes)

The probabilities predicted for each class by each classifier.

threshold : float, default=None (optional)

Optional probability threshold. Whenever the highest multiplied and normalized probability falls below the given threshold value, the sample is classified as an outlier.

label_lut : array-like, shape (n_classes,)

Lookup table for the labels.

Returns

labels : array-like, shape (n_samples,)

The consolidated labels.
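A sketch combining several trained classifiers (the classifiers list and X_test are placeholders; array shapes follow the conventions above):

    import numpy as np
    from SHARC import consolidation

    # Hard labels from each classifier: shape (n_classifiers, n_samples).
    predictions = np.stack([clf.predict(X_test) for clf in classifiers])
    labels_mv = consolidation.majority_vote_consolidation(predictions)

    # Class probabilities: shape (n_classifiers, n_samples, n_classes).
    probabilities = np.stack([clf.predict_proba(X_test) for clf in classifiers])
    labels_avg = consolidation.average_probability_consolidation(
        probabilities, threshold=0.5
    )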

Additional Utilities

SHARC.utils.insertColors(table, colors)

Combines magnitudes in an astropy Table into colours and adds them to the table. The astropy Table is modified in place.

Parameters

table : astropy Table

Table containing the magnitudes to be combined into colours.

colors : iterable

An array or list containing colours as strings, with magnitudes corresponding to column names in the astropy Table.

Returns

table : astropy Table

Modified astropy Table with colours added as columns to the end of the original Table.

SHARC.utils.writeDataset(table, filename, verbose=True, overwrite=False)

Writes an astropy Table to a FITS file.

Parameters

table : astropy Table

Table to be written to file.

filename : str

Filename of the file the astropy Table needs to be written to.

verbose : bool, default=True

Variable controlling the verbosity of this function.

overwrite : bool, default=False

Variable controlling whether to overwrite any existing file. When the file already exists and overwrite=False the dataset won’t be written and the function will exit.
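A short sketch using both utilities together; the colour-string format "BP-RP" is an assumption about how insertColors matches magnitude column names:

    from astropy.table import Table
    from SHARC.utils import insertColors, writeDataset

    # Hypothetical photometric catalogue with three magnitude columns.
    table = Table({"G": [14.2, 15.1], "BP": [14.8, 15.9], "RP": [13.6, 14.4]})

    # Add colour columns such as BP - RP (strings must reference column names).
    table = insertColors(table, ["BP-RP", "G-RP"])

    # Persist to a FITS file, refusing to overwrite an existing one.
    writeDataset(table, "catalogue.fits", verbose=True, overwrite=False)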

Plot Functions

class SHARC.plot_funcs.CustomConfusionMatrixDisplay(confusion_matrix, *, display_labels=None)