Stats group tutorial

Setup steps

import pyPCG as pcg
import pyPCG.io as pcg_io
import pyPCG.preprocessing as preproc
import pyPCG.segment as sgm
import pyPCG.features as ftr
import pyPCG.stats as sts
from importlib.resources import files
data, fs = pcg_io.read_signal_file(str(files('pyPCG').joinpath("data").joinpath("example.wav")),"wav")
signal = pcg.pcg_signal(data,fs)

signal = pcg.normalize(signal)
bp_signal = preproc.filter(preproc.filter(signal,6,100,"LP"),6,20,"HP")
denoise_signal = preproc.wt_denoise(bp_signal)
env_signal = preproc.homomorphic(denoise_signal)
hsmm = sgm.load_hsmm(str(files('pyPCG').joinpath("data").joinpath("pre_trained_fpcg.json")))
states = sgm.segment_hsmm(hsmm,signal)
s1_start, s1_end = sgm.convert_hsmm_states(states,sgm.heart_state.S1)

Calculate some example features

s1_len = ftr.time_delta(s1_start,s1_end,env_signal)
s1_maxfreq, s1_maxfreq_val = ftr.max_freq(s1_start,s1_end,signal,nfft=1024)

Basic statistics

To create statistics from the calculated features, call the appropriate statistic function.

Let’s calculate the mean and standard deviation of the example features above

mean_len = sts.mean(s1_len)
std_len = sts.std(s1_len)
print(f"{mean_len=:.3f} {std_len=:.3f}")

mean_maxfreq = sts.mean(s1_maxfreq)
std_maxfreq = sts.std(s1_maxfreq)
print(f"{mean_maxfreq=:.3f} {std_maxfreq=:.3f}")
mean_len=0.095 std_len=0.009
mean_maxfreq=29.278 std_maxfreq=3.078

Statistics group object

If a large amount of different statistics is required for multiple features and multiple segment types, then a statistics group can be created to reduce the repeated code and lessen the possibility human error.

The stats_group object takes an arbitrary amount of stats configs, which is a dictionary of the statistic measure and its name.

For example, let’s create a common statistic measure group with the mean and standard deviation, as seen above.

mean_std = sts.stats_group({"calc_fun":sts.mean,"name":"Mean"},
                           {"calc_fun":sts.std,"name":"Std"})

To run the calculations call the run method on the statistics group.

The input is a dictionary containing the features with their names.

The output will be a dictionary with a Feature field containing a list of the names of the features, and the calculated statistics with the names described in the configs. The values are in the same order as in the Feature list.

basic_stats = mean_std.run({"length":s1_len,"max freq":s1_maxfreq})
print(basic_stats)
{'Feature': ['length', 'max freq'], 'Mean': [0.09477387835596791, 29.278003329730993], 'Std': [0.009471572524317224, 3.0778448376432928]}

The required input format for running a statistics group is the same as the output of a feature group object.

Let’s create a feature group for demonstration. (For additional details, see the feature group tutorial)

example_group = ftr.feature_group({"calc_fun":ftr.time_delta, "name":"length", "input":"raw"},
                                  {"calc_fun":ftr.ramp_time, "name":"onset", "input":"env"},
                                  {"calc_fun":ftr.max_freq, "name":"max frequency", "input":"raw","params":{"nfft":1024}})

example_features = example_group.run(signal,env_signal,s1_start,s1_end)

Now the statistic calculation will look like the following

example_stats = mean_std.run(example_features)
print(example_stats)
{'Feature': ['length', 'onset', 'max frequency'], 'Mean': [0.09477387835596791, 0.061852897673793185, 29.278003329730993], 'Std': [0.009471572524317224, 0.011814973307823076, 3.0778448376432928]}

Exporting statistics

Each statistics group can store statistics from different segments. To do this, call the add_stat method with the name of the segment and the calculated statistics.

As an example, let’s store the previous statistics as S1

mean_std.add_stat("S1",example_stats)

For further analysis, the statistics group contains a pandas dataframe, which contains the added statistics

mean_std.dataframe
Segment Feature Mean Std
0 S1 length 0.094774 0.009472
1 S1 onset 0.061853 0.011815
2 S1 max frequency 29.278003 3.077845

The stored statistics can also be exported to an Excel spreadsheet

mean_std.export("example.xlsx")