Statistics
Basic statistic calculations
- pyPCG.stats.max(data: ndarray[tuple[int, ...], dtype[float64]], k: int = 1) -> ndarray[tuple[int, ...], dtype[float64]] | float64
Get the k largest values from the input
- Parameters:
data (np.ndarray) – input data
k (int, optional) – number of largest values to return. Defaults to 1.
- Returns:
maximum value(s)
- Return type:
np.ndarray | float
- pyPCG.stats.min(data: ndarray[tuple[int, ...], dtype[float64]], k: int = 1) -> ndarray[tuple[int, ...], dtype[float64]] | float64
Get the k smallest values from the input
- Parameters:
data (np.ndarray) – input data
k (int, optional) – number of smallest values to return. Defaults to 1.
- Returns:
minimum value(s)
- Return type:
np.ndarray | float
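The k-largest and k-smallest behaviour of max and min can be illustrated with plain NumPy. This is a minimal sketch, not pyPCG's code: the helper names top_k_max and top_k_min are hypothetical, and returning the k values sorted from most to least extreme is an assumption, since the docs above do not state the output order.

```python
import numpy as np


def top_k_max(data: np.ndarray, k: int = 1):
    """Hypothetical stand-in for pyPCG.stats.max: return the k largest values."""
    if k == 1:
        return np.max(data)  # single scalar, like the documented default
    # ASSUMPTION: output is sorted from largest to smallest
    return np.sort(data)[::-1][:k]


def top_k_min(data: np.ndarray, k: int = 1):
    """Hypothetical stand-in for pyPCG.stats.min: return the k smallest values."""
    if k == 1:
        return np.min(data)
    # ASSUMPTION: output is sorted from smallest to largest
    return np.sort(data)[:k]


x = np.array([3.0, 9.0, 1.0, 7.0, 5.0])
print(top_k_max(x))       # 9.0
print(top_k_max(x, k=3))  # [9. 7. 5.]
print(top_k_min(x, k=2))  # [1. 3.]
```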
- pyPCG.stats.mean(data: ndarray[tuple[int, ...], dtype[float64]]) -> ndarray[tuple[int, ...], dtype[float64]] | float64
Calculate the mean of the input
- Parameters:
data (np.ndarray) – input data (can be a 2D array)
- Returns:
mean of the data; if the input is 2D, the values are calculated along the 1st axis
- Return type:
np.ndarray | float
- pyPCG.stats.std(data: ndarray[tuple[int, ...], dtype[float64]]) -> ndarray[tuple[int, ...], dtype[float64]] | float64
Calculate the standard deviation of the input
- Parameters:
data (np.ndarray) – input data (can be a 2D array)
- Returns:
standard deviation of the data; if the input is 2D, the values are calculated along the 1st axis
- Return type:
np.ndarray | float
- pyPCG.stats.med(data: ndarray[tuple[int, ...], dtype[float64]]) -> ndarray[tuple[int, ...], dtype[float64]] | float64
Calculate the median of the input
- Parameters:
data (np.ndarray) – input data (can be a 2D array)
- Returns:
median of the data; if the input is 2D, the values are calculated along the 1st axis
- Return type:
np.ndarray | float
- pyPCG.stats.percentile(data: ndarray[tuple[int, ...], dtype[float64]], perc: float = 25) -> ndarray[tuple[int, ...], dtype[float64]] | float64
Calculate the given percentile of the input
- Parameters:
data (np.ndarray) – input data (can be a 2D array)
perc (float, optional) – percentile to calculate. Defaults to 25.
- Returns:
the given percentile of the data; if the input is 2D, the values are calculated along the 1st axis
- Return type:
np.ndarray | float
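For reference, the plain NumPy counterparts of mean, std, med and percentile are shown below; whether pyPCG calls exactly these functions is an assumption. The 1D values match the Mean and Std results in the stats_group example further down (that Std value equals NumPy's default population standard deviation, ddof=0). For 2D input, reading "1st axis" as axis=0 (one result per column) is also an assumption.

```python
import numpy as np

# 1D case: reproduces the Mean and Std values from the stats_group example
x = np.arange(10, dtype=float)
print(np.mean(x))            # 4.5
print(np.std(x))             # 2.8722813232690143 (population std, ddof=0)
print(np.median(x))          # 4.5
print(np.percentile(x, 25))  # 2.25 (NumPy's default linear interpolation)

# 2D case. ASSUMPTION: "1st axis" read as axis=0, giving one value per column
m = np.array([[1.0, 2.0],
              [3.0, 6.0]])
print(np.mean(m, axis=0))    # [2. 4.]
print(np.median(m, axis=0))  # [2. 4.]
```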
- pyPCG.stats.rms(data: ndarray[tuple[int, ...], dtype[float64]]) -> ndarray[tuple[int, ...], dtype[float64]] | float64
Calculate the root mean square of the input
- Parameters:
data (np.ndarray) – input data (can be a 2D array)
- Returns:
root mean square of the data; if the input is 2D, the values are calculated along the 1st axis
- Return type:
np.ndarray | float
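Root mean square is the square root of the mean of the squared samples, sqrt(mean(x^2)). Unlike the plain mean, it does not let positive and negative samples cancel. A minimal NumPy sketch follows; rms_sketch is a hypothetical name, and the axis argument is only an assumption about how the 2D case could be handled.

```python
import numpy as np


def rms_sketch(data: np.ndarray, axis=None):
    """Hypothetical RMS helper: sqrt(mean(x**2)), optionally along one axis."""
    return np.sqrt(np.mean(np.square(data), axis=axis))


x = np.array([1.0, -7.0])
print(np.mean(x))     # -3.0, signs partly cancel in the plain mean
print(rms_sketch(x))  # 5.0, i.e. sqrt((1 + 49) / 2)
```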
- pyPCG.stats.skew(data: ndarray[tuple[int, ...], dtype[float64]]) -> ndarray[tuple[int, ...], dtype[float64]] | float64
Calculate the skewness of the input
- Parameters:
data (np.ndarray) – input data (can be a 2D array)
- Returns:
skewness of the data; if the input is 2D, the values are calculated along the 1st axis
- Return type:
np.ndarray | float
- pyPCG.stats.kurt(data: ndarray[tuple[int, ...], dtype[float64]]) -> ndarray[tuple[int, ...], dtype[float64]] | float64
Calculate the kurtosis of the input
- Parameters:
data (np.ndarray) – input data (can be a 2D array)
- Returns:
kurtosis of the data; if the input is 2D, the values are calculated along the 1st axis
- Return type:
np.ndarray | float
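The docs do not say which skewness and kurtosis definitions are used. The sketch below uses the common moment-based (biased) estimators, skewness = m3 / m2^1.5 and excess kurtosis = m4 / m2^2 - 3, where mk is the k-th central moment; these match the defaults of scipy.stats.skew and scipy.stats.kurtosis. Treat this as an assumption about pyPCG, which might instead report plain (non-excess) kurtosis. The helper names are hypothetical.

```python
import numpy as np


def skew_sketch(x: np.ndarray) -> float:
    """Moment-based (biased) skewness: m3 / m2**1.5."""
    d = x - x.mean()
    m2 = np.mean(d**2)  # second central moment
    m3 = np.mean(d**3)  # third central moment
    return float(m3 / m2**1.5)


def kurt_sketch(x: np.ndarray, excess: bool = True) -> float:
    """Moment-based kurtosis m4 / m2**2; subtract 3 for excess kurtosis."""
    d = x - x.mean()
    m2 = np.mean(d**2)
    m4 = np.mean(d**4)  # fourth central moment
    k = m4 / m2**2
    return float(k - 3.0) if excess else float(k)


x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
print(skew_sketch(x))                # 0.0 for symmetric data
print(kurt_sketch(x, excess=False))  # 1.7
print(kurt_sketch(x))                # about -1.3, flatter than a normal distribution
```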
Operators
These functions take a given statistic calculation function and extend how it is applied to the data
- pyPCG.stats.window_operator(data: ndarray[tuple[int, ...], dtype[float64]], win_size: int, fun: Callable, overlap_percent: float = 0.5) -> tuple[ndarray[tuple[int, ...], dtype[int64]], ndarray[tuple[int, ...], dtype[float64]]]
Apply the given statistical function over a sliding window on the input
- Parameters:
data (np.ndarray) – input data
win_size (int) – window size
fun (Callable) – statistical function to apply
overlap_percent (float, optional) – window overlap as a fraction of the window size (0.5 means consecutive windows share half of their samples). Defaults to 0.5.
- Returns:
the window sample locations (usually used as the time axis) and the values calculated in each window
- Return type:
tuple[np.ndarray,np.ndarray]
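A minimal sketch of a sliding-window operator, assuming that overlap_percent is the fraction of each window shared with the next one (so the hop between window starts is win_size * (1 - overlap_percent)), that only complete windows are evaluated, and that the returned location is each window's start index. window_operator_sketch is a hypothetical name; pyPCG may, for example, report window centres instead.

```python
from typing import Callable

import numpy as np


def window_operator_sketch(
    data: np.ndarray,
    win_size: int,
    fun: Callable[[np.ndarray], float],
    overlap_percent: float = 0.5,
) -> tuple[np.ndarray, np.ndarray]:
    """Hypothetical sliding-window helper, not pyPCG's implementation."""
    # Hop between window starts: an overlap of 0.5 gives a hop of half a window
    step = max(1, int(win_size * (1 - overlap_percent)))
    locs, vals = [], []
    # Only complete windows are evaluated
    for start in range(0, len(data) - win_size + 1, step):
        locs.append(start)  # ASSUMPTION: location = window start index
        vals.append(fun(data[start:start + win_size]))
    return np.array(locs, dtype=np.int64), np.array(vals, dtype=np.float64)


x = np.arange(10, dtype=float)
locs, vals = window_operator_sketch(x, win_size=4, fun=np.mean)
print(locs)  # [0 2 4 6]
print(vals)  # [1.5 3.5 5.5 7.5]
```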
Transformations
These functions take the input data and return a new, transformed dataset
- pyPCG.stats.trim_transform(data: ndarray[tuple[int, ...], dtype[float64]], trim_precent: float) -> ndarray[tuple[int, ...], dtype[float64]]
Trim away the given percentage of the highest and lowest values
- Parameters:
data (np.ndarray) – input data to trim
trim_precent (float) – percentage to trim away
- Returns:
trimmed values
- Return type:
np.ndarray
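The docs leave open whether trim_precent is given in percent or as a fraction, and whether it applies to each tail or in total. The sketch below assumes percent units removed from each tail, using percentile cut-offs; trim_sketch is a hypothetical name and not pyPCG's implementation.

```python
import numpy as np


def trim_sketch(data: np.ndarray, trim_precent: float) -> np.ndarray:
    """Hypothetical trimming helper, not pyPCG's implementation."""
    # ASSUMPTION: trim_precent is in percent (0-100) and is removed from EACH tail
    lo, hi = np.percentile(data, [trim_precent, 100 - trim_precent])
    # Keep values inside the cut-offs; the original order is preserved
    return data[(data >= lo) & (data <= hi)]


x = np.arange(1.0, 11.0)  # 1.0 ... 10.0
print(trim_sketch(x, 10))  # [2. 3. 4. 5. 6. 7. 8. 9.]
```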
- pyPCG.stats.outlier_remove_transform(data: ndarray[tuple[int, ...], dtype[float64]], dist: float = 3.0) -> ndarray[tuple[int, ...], dtype[float64]]
Remove outliers based on the MAD (median absolute deviation)
- Parameters:
data (np.ndarray) – input data
dist (float, optional) – MAD score threshold. Defaults to 3.0.
- Returns:
data without outliers
- Return type:
np.ndarray
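One common way to implement MAD-based outlier removal is to score each sample as |x - median| / MAD and keep the samples whose score does not exceed dist. This sketch assumes the unscaled MAD (no 1.4826 normal-consistency factor) and a non-zero MAD; both are assumptions about pyPCG, and mad_remove_sketch is a hypothetical name.

```python
import numpy as np


def mad_remove_sketch(data: np.ndarray, dist: float = 3.0) -> np.ndarray:
    """Hypothetical MAD-based outlier filter, not pyPCG's implementation."""
    med = np.median(data)
    abs_dev = np.abs(data - med)
    mad = np.median(abs_dev)  # median absolute deviation; assumed non-zero here
    # ASSUMPTION: unscaled score |x - median| / MAD, no 1.4826 factor
    score = abs_dev / mad
    return data[score <= dist]


x = np.array([10.0, 11.0, 9.0, 10.0, 12.0, 10.0, 50.0])
print(mad_remove_sketch(x))  # the 50.0 sample is dropped
```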
Statistic calculation grouping
- typeddict pyPCG.stats.stats_config
Type to hold statistic calculation configs
- Required Keys:
calc_fun (Callable) – function for the calculation
name (str) – name of the calculated statistic
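Because stats_config is a TypedDict, any plain dictionary with these two keys satisfies it at runtime, which is why the example below can pass dict literals to stats_group. A hypothetical mirror of the type, for illustration only (StatsConfigSketch is not a pyPCG name):

```python
from typing import Callable, TypedDict

import numpy as np


class StatsConfigSketch(TypedDict):
    """Hypothetical mirror of the two required keys of stats_config."""

    calc_fun: Callable
    name: str


mean_cfg: StatsConfigSketch = {"calc_fun": np.mean, "name": "Mean"}
print(isinstance(mean_cfg, dict))  # True: a TypedDict is a plain dict at runtime
print(mean_cfg["name"], mean_cfg["calc_fun"](np.arange(10)))  # Mean 4.5
```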
- class pyPCG.stats.stats_group(*stats: stats_config)
Group statistic calculations together for reuse
- configs
List of statistic calculations with names
- Type:
list[stats_config]
- signal_stats
Signal statistics by segment (planned to become its own type in a future version)
- Type:
dict[str, dict[str, list[float]]]
- dataframe
pandas DataFrame containing the statistics, provided for convenience
- Type:
pd.DataFrame
Example
Create a statistic group calculation. For convenience, define each statistic as a dictionary following the stats_config type:
>>> import pyPCG.stats as sts
>>> stat_1 = {"calc_fun":sts.mean,"name":"Mean"}
>>> stat_2 = {"calc_fun":sts.std,"name":"Std"}
>>> mean_std = sts.stats_group(stat_1,stat_2)
Run the created group with some dummy features:
>>> import numpy as np
>>> dummy = {"length":np.arange(10),"max freq":np.arange(10)}
>>> basic_stats = mean_std.run(dummy)
>>> print(basic_stats)
{'Feature': ['length', 'max freq'], 'Mean': [4.5, 4.5], 'Std': [2.8722813232690143, 2.8722813232690143]}
Add the statistics under a given segment name and export them as an xlsx file:
>>> mean_std.add_stat("Test",basic_stats)
>>> mean_std.export("test.xlsx")
- add_stat(segment: str, stats: dict[str, list[float]])
Add calculated statistics to signal_stats with the given segment name
- Parameters:
segment (str) – segment name to save to
stats (dict[str,list[float]]) – calculated statistics
- calc_group_stats(total_ftr_dict: dict[str, dict[str, ndarray[tuple[int, ...], dtype[float64]]]])
Calculate all stats on all given features and segments
- Parameters:
total_ftr_dict (dict[str,dict[str,np.ndarray]]) – Feature dictionaries named by segment
- export(filename: str)
Export the statistics to an Excel file
- Parameters:
filename (str) – filename to save to
- run(ftr_dict: dict[str, ndarray[tuple[int, ...], dtype[float64]]]) -> dict[str, list[float]]
Run the statistic calculation based on configuration
- Parameters:
ftr_dict (dict[str,np.ndarray]) – feature dictionary, same format as the output of feature_group.run
- Returns:
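calculated statistics as a dictionary: the 'Feature' key lists the feature names, and each statistic name maps to one value per feature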
- Return type:
dict[str,list[float]]
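To make the data flow of run, add_stat and calc_group_stats concrete, here is a pandas-free sketch. StatsGroupSketch is hypothetical and not pyPCG's implementation: run reproduces the output layout shown in the example above, while calc_group_stats storing one run result per segment is an assumption based on the method descriptions. export is omitted.

```python
import numpy as np


class StatsGroupSketch:
    """Hypothetical, pandas-free mirror of pyPCG.stats.stats_group (illustration only)."""

    def __init__(self, *stats: dict):
        self.configs = list(stats)  # each item: {"calc_fun": ..., "name": ...}
        self.signal_stats: dict[str, dict[str, list]] = {}

    def run(self, ftr_dict: dict[str, np.ndarray]) -> dict[str, list]:
        # Layout taken from the documented example: feature names, then one list per statistic
        out: dict[str, list] = {"Feature": list(ftr_dict.keys())}
        for cfg in self.configs:
            out[cfg["name"]] = [float(cfg["calc_fun"](v)) for v in ftr_dict.values()]
        return out

    def add_stat(self, segment: str, stats: dict[str, list]) -> None:
        self.signal_stats[segment] = stats

    def calc_group_stats(self, total_ftr_dict: dict[str, dict[str, np.ndarray]]) -> None:
        # ASSUMPTION: run once per segment and store the result under the segment name
        for segment, ftr_dict in total_ftr_dict.items():
            self.add_stat(segment, self.run(ftr_dict))


group = StatsGroupSketch({"calc_fun": np.mean, "name": "Mean"}, {"calc_fun": np.std, "name": "Std"})
dummy = {"length": np.arange(10), "max freq": np.arange(10)}
print(group.run(dummy))
# {'Feature': ['length', 'max freq'], 'Mean': [4.5, 4.5], 'Std': [2.8722813232690143, 2.8722813232690143]}

doubled = {name: values * 2 for name, values in dummy.items()}
group.calc_group_stats({"S1": dummy, "S2": doubled})
print(group.signal_stats["S2"]["Mean"])  # [9.0, 9.0]
```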