Statistics

Basic statistic calculations

pyPCG.stats.max(data: ndarray[tuple[int, ...], dtype[float64]], k: int = 1) → ndarray[tuple[int, ...], dtype[float64]] | float64

Get maximum values from input

Parameters:

data (np.ndarray) – input data
k (int, optional) – number of largest values to return. Defaults to 1.

Returns:

maximum value(s)

Return type:

np.ndarray | float

pyPCG.stats.min(data: ndarray[tuple[int, ...], dtype[float64]], k: int = 1) → ndarray[tuple[int, ...], dtype[float64]] | float64

Get minimum values from input

Parameters:

data (np.ndarray) – input data
k (int, optional) – number of smallest values to return. Defaults to 1.

Returns:

minimum value(s)

Return type:

np.ndarray | float
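
The effect of the k parameter in max and min can be pictured with plain NumPy. The sketch below is not pyPCG's implementation. The helper names k_largest and k_smallest are hypothetical, and the order in which the values are returned is an assumption.

```python
import numpy as np

def k_largest(data: np.ndarray, k: int = 1) -> np.ndarray:
    """Hypothetical helper: the k largest values, largest first (ordering assumed)."""
    # Sort ascending, keep the last k entries, then reverse them
    return np.sort(data)[-k:][::-1]

def k_smallest(data: np.ndarray, k: int = 1) -> np.ndarray:
    """Hypothetical helper: the k smallest values, smallest first (ordering assumed)."""
    # Sort ascending and keep the first k entries
    return np.sort(data)[:k]

values = np.array([3.0, 9.0, 1.0, 7.0, 5.0])
top_two = k_largest(values, k=2)      # array([9., 7.])
bottom_two = k_smallest(values, k=2)  # array([1., 3.])
```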

pyPCG.stats.mean(data: ndarray[tuple[int, ...], dtype[float64]]) → ndarray[tuple[int, ...], dtype[float64]] | float64

Calculate mean of inputs

Parameters: data (np.ndarray) – input data (can be a 2-dimensional array)
Returns: mean value of data; if the input is 2D, the value is calculated along the 1st axis
Return type: np.ndarray | float

pyPCG.stats.std(data: ndarray[tuple[int, ...], dtype[float64]]) → ndarray[tuple[int, ...], dtype[float64]] | float64

Calculate standard deviation of inputs

Parameters: data (np.ndarray) – input data (can be a 2-dimensional array)
Returns: standard deviation of data; if the input is 2D, the value is calculated along the 1st axis
Return type: np.ndarray | float

pyPCG.stats.med(data: ndarray[tuple[int, ...], dtype[float64]]) → ndarray[tuple[int, ...], dtype[float64]] | float64

Calculate median of inputs

Parameters: data (np.ndarray) – input data (can be a 2-dimensional array)
Returns: median of data; if the input is 2D, the value is calculated along the 1st axis
Return type: np.ndarray | float

pyPCG.stats.percentile(data: ndarray[tuple[int, ...], dtype[float64]], perc: float = 25) → ndarray[tuple[int, ...], dtype[float64]] | float64

Calculate given percentile of inputs

Parameters:

data (np.ndarray) – input data (can be a 2-dimensional array)
perc (float) – selected percentile to calculate. Defaults to 25.

Returns:

given percentile of data; if the input is 2D, the value is calculated along the 1st axis

Return type:

np.ndarray | float

pyPCG.stats.rms(data: ndarray[tuple[int, ...], dtype[float64]]) → ndarray[tuple[int, ...], dtype[float64]] | float64

Calculate root mean square of inputs

Parameters: data (np.ndarray) – input data (can be a 2-dimensional array)
Returns: root mean square of data; if the input is 2D, the value is calculated along the 1st axis
Return type: np.ndarray | float
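
For reference, the root mean square of a 1D signal is the square root of the mean of the squared samples. The following is a minimal NumPy sketch of that definition, not pyPCG's own code; rms_sketch is a hypothetical name.

```python
import numpy as np

def rms_sketch(data: np.ndarray) -> float:
    """Hypothetical helper: square root of the mean of the squared samples."""
    return float(np.sqrt(np.mean(np.square(data))))

signal = np.array([3.0, -4.0, 3.0, -4.0])
# Mean of squares is (9 + 16 + 9 + 16) / 4 = 12.5
value = rms_sketch(signal)
```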

pyPCG.stats.skew(data: ndarray[tuple[int, ...], dtype[float64]]) → ndarray[tuple[int, ...], dtype[float64]] | float64

Calculate skewness of inputs

Parameters: data (np.ndarray) – input data (can be a 2-dimensional array)
Returns: skewness of data; if the input is 2D, the value is calculated along the 1st axis
Return type: np.ndarray | float

pyPCG.stats.kurt(data: ndarray[tuple[int, ...], dtype[float64]]) → ndarray[tuple[int, ...], dtype[float64]] | float64

Calculate kurtosis of inputs

Parameters: data (np.ndarray) – input data (can be a 2-dimensional array)
Returns: kurtosis of data; if the input is 2D, the value is calculated along the 1st axis
Return type: np.ndarray | float

pyPCG.stats.iqr(data: ndarray[tuple[int, ...], dtype[float64]]) → ndarray[tuple[int, ...], dtype[float64]] | float64

Calculate interquartile range

Parameters: data (np.ndarray) – input data
Returns: interquartile range of data
Return type: np.ndarray | float
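
The interquartile range is the distance between the 75th and 25th percentiles. A minimal NumPy sketch of that relationship follows. It assumes NumPy's default linear interpolation, which may differ from what pyPCG uses, and iqr_sketch is a hypothetical name.

```python
import numpy as np

def iqr_sketch(data: np.ndarray) -> float:
    """Hypothetical helper: 75th percentile minus 25th percentile."""
    q1, q3 = np.percentile(data, [25, 75])
    return float(q3 - q1)

samples = np.arange(1.0, 10.0)  # the values 1.0 through 9.0
spread = iqr_sketch(samples)    # 7.0 - 3.0 = 4.0
```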

Operators

These functions take a given statistic calculation function and extend how it is applied, for example over a sliding window

pyPCG.stats.window_operator(data: ndarray[tuple[int, ...], dtype[float64]], win_size: int, fun: Callable, overlap_percent: float = 0.5) → tuple[ndarray[tuple[int, ...], dtype[int64]], ndarray[tuple[int, ...], dtype[float64]]]

Apply given statistical function over a sliding window on the input

Parameters:

data (np.ndarray) – input data
win_size (int) – window size
fun (Callable) – statistical function to apply
overlap_percent (float, optional) – window overlap as a ratio of the window size (for example, 0.5 means consecutive windows share half their samples). Defaults to 0.5.

Returns:

window sample locations (usually used as time dimension), and calculated values in the windows

Return type:

tuple[np.ndarray,np.ndarray]
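
The sliding-window idea can be sketched in plain NumPy as below. This is an illustration under stated assumptions, not pyPCG's implementation: the step between windows is taken as win_size * (1 - overlap), incomplete trailing windows are dropped, and the returned locations are the window start indices. The name sliding_window_sketch is hypothetical.

```python
import numpy as np
from typing import Callable

def sliding_window_sketch(
    data: np.ndarray, win_size: int, fun: Callable, overlap: float = 0.5
) -> tuple[np.ndarray, np.ndarray]:
    """Hypothetical helper: apply fun to each window and return (starts, values)."""
    # Assumed step rule: the non-overlapping part of the window, at least one sample
    step = max(1, int(win_size * (1 - overlap)))
    # Only full windows are kept (assumption)
    starts = np.arange(0, len(data) - win_size + 1, step)
    values = np.array([fun(data[s:s + win_size]) for s in starts], dtype=float)
    return starts, values

signal = np.arange(10, dtype=float)
locations, means = sliding_window_sketch(signal, win_size=4, fun=np.mean, overlap=0.5)
# locations -> [0, 2, 4, 6], means -> [1.5, 3.5, 5.5, 7.5]
```

Any function that maps a 1D array to a scalar can be passed as fun, which is what lets the basic statistics above be reused unchanged.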

Transformations

These functions take the input data and return a new dataset

pyPCG.stats.trim_transform(data: ndarray[tuple[int, ...], dtype[float64]], trim_precent: float) → ndarray[tuple[int, ...], dtype[float64]]

Trim the upper and lower percentage of values

Parameters:

data (np.ndarray) – input data to trim
trim_precent (float) – percentage to trim away

Returns:

trimmed values

Return type:

np.ndarray
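
One way to picture trimming is to drop everything outside a pair of symmetric percentiles. The sketch below makes two assumptions that the description above does not confirm: the percentage is given on a 0 to 100 scale, and the same percentage is cut from each tail. The helper trim_sketch and its parameter name are hypothetical, and this is not pyPCG's implementation.

```python
import numpy as np

def trim_sketch(data: np.ndarray, trim_percent: float) -> np.ndarray:
    """Hypothetical helper: keep values between the lower and upper percentile cut-offs."""
    # Assumed: trim_percent is on a 0-100 scale and applied to both tails
    lower, upper = np.percentile(data, [trim_percent, 100 - trim_percent])
    return data[(data >= lower) & (data <= upper)]

values = np.arange(1.0, 11.0)        # the values 1.0 through 10.0
trimmed = trim_sketch(values, 10.0)  # drops 1.0 and 10.0
```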

pyPCG.stats.outlier_remove_transform(data: ndarray[tuple[int, ...], dtype[float64]], dist: float = 3.0) → ndarray[tuple[int, ...], dtype[float64]]

Remove outliers based on the MAD (median of absolute differences from the median)

Parameters:

data (np.ndarray) – input data
dist (float, optional) – MAD score threshold. Defaults to 3.0.

Returns:

data without outliers

Return type:

np.ndarray
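
A common form of MAD-based filtering scores each sample by its absolute distance from the median, divided by the median of those distances, and keeps samples whose score does not exceed the threshold. The sketch below follows that form. Whether pyPCG applies a scaling constant or handles a zero MAD this way is not stated above, so treat both as assumptions. The name mad_outlier_sketch is hypothetical.

```python
import numpy as np

def mad_outlier_sketch(data: np.ndarray, dist: float = 3.0) -> np.ndarray:
    """Hypothetical helper: keep samples whose MAD score is at most dist."""
    median = np.median(data)
    abs_dev = np.abs(data - median)
    mad = np.median(abs_dev)
    if mad == 0:
        # Assumed fallback: with no spread, nothing is treated as an outlier
        return data.copy()
    return data[abs_dev / mad <= dist]

values = np.array([10.0, 11.0, 9.0, 10.5, 9.5, 50.0])
cleaned = mad_outlier_sketch(values)  # 50.0 is removed, the rest are kept
```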

Statistic calculation grouping

typeddict pyPCG.stats.stats_config

Type to hold statistic calculation configs

Required Keys:

calc_fun (Callable) – function for calculation
name (str) – name of the calculated statistic
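
A typed dictionary with these two required keys can be declared as in the sketch below. StatsConfigSketch is a hypothetical stand-in written for illustration, not the class shipped with pyPCG, and statistics.mean is used only so that the example runs without extra dependencies.

```python
import statistics
from typing import Callable, TypedDict

class StatsConfigSketch(TypedDict):
    """Hypothetical stand-in mirroring the two required keys listed above."""
    calc_fun: Callable  # function that calculates the statistic
    name: str           # label under which the result is stored

mean_cfg: StatsConfigSketch = {"calc_fun": statistics.mean, "name": "Mean"}
# A config is an ordinary dict at runtime, so the function is called through its key
result = mean_cfg["calc_fun"]([1.0, 2.0, 3.0])
```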

class pyPCG.stats.stats_group(*stats: stats_config)

Group statistic calculations together for reuse

configs

List of statistic calculations with names

Type: list[stats_config]

signal_stats

Signal statistics by segment. This is planned to become its own type in the future.

Type: dict[str, dict[str, list[float]]]

dataframe

Pandas dataframe container of statistics for utility

Type: pd.DataFrame

Example

Create a statistic group calculation:

For an easier experience use the stats_config type

>>> import pyPCG.stats as sts
>>> stat_1 = {"calc_fun":sts.mean,"name":"Mean"}
>>> stat_2 = {"calc_fun":sts.std,"name":"Std"}
>>> mean_std = sts.stats_group(stat_1,stat_2)

Run the created group with some dummy features:

>>> import numpy as np
>>> dummy = {"length":np.arange(10),"max freq":np.arange(10)}
>>> basic_stats = mean_std.run(dummy)
>>> print(basic_stats)
{'Feature': ['length', 'max freq'], 'Mean': [4.5, 4.5], 'Std': [2.8722813232690143, 2.8722813232690143]}

Add statistics for a given segment, and export it as xlsx:

>>> mean_std.add_stat("Test",basic_stats)
>>> mean_std.export("test.xlsx")

add_stat(segment: str, stats: dict[str, list[float]])

Add calculated statistics to signal_stats with the given segment name

Parameters:

segment (str) – segment name to save to
stats (dict[str,list[float]]) – calculated statistics

calc_group_stats(total_ftr_dict: dict[str, dict[str, ndarray[tuple[int, ...], dtype[float64]]]])

Calculate all stats on all given features and segments

Parameters: total_ftr_dict (dict[str,dict[str,np.ndarray]]) – Feature dictionaries named by segment

export(filename: str)

Export statistics to an Excel file

Parameters: filename (str) – filename to save to

run(ftr_dict: dict[str, ndarray[tuple[int, ...], dtype[float64]]]) → dict[str, list[float]]

Run the statistic calculation based on configuration

Parameters: ftr_dict (dict[str,np.ndarray]) – feature dictionary, same format as the output of feature_group.run
Returns: Calculated statistics, named
Return type: dict[str,list[float]]
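
The output format shown in the class example suggests that run applies every configured function to every feature and collects the results under each statistic's name. The sketch below reproduces that output shape with plain NumPy. It is an illustration, not the library's code: run_sketch is a hypothetical name, and np.mean and np.std stand in for the pyPCG statistics.

```python
import numpy as np

def run_sketch(configs: list[dict], ftr_dict: dict[str, np.ndarray]) -> dict[str, list]:
    """Hypothetical helper: one 'Feature' column plus one column per statistic."""
    result: dict[str, list] = {"Feature": list(ftr_dict.keys())}
    for cfg in configs:
        # Apply this statistic to every feature, in the same order as 'Feature'
        result[cfg["name"]] = [float(cfg["calc_fun"](v)) for v in ftr_dict.values()]
    return result

configs = [
    {"calc_fun": np.mean, "name": "Mean"},
    {"calc_fun": np.std, "name": "Std"},
]
dummy = {"length": np.arange(10), "max freq": np.arange(10)}
stats = run_sketch(configs, dummy)
# stats["Feature"] -> ['length', 'max freq'], stats["Mean"] -> [4.5, 4.5]
```

Keeping the feature names in their own column means the dictionary can be handed to a table constructor such as pandas.DataFrame without reshaping.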