Feature group tutorial
Setup steps
import pyPCG as pcg
import pyPCG.io as pcg_io
import pyPCG.preprocessing as preproc
import pyPCG.segment as sgm
import pyPCG.features as ftr
Read in signal and calculate its envelope
from importlib.resources import files
data, fs = pcg_io.read_signal_file(str(files('pyPCG').joinpath("data").joinpath("example.wav")),"wav")
signal = pcg.pcg_signal(data,fs)
signal = pcg.normalize(signal)
bp_signal = preproc.filter(preproc.filter(signal,6,100,"LP"),6,20,"HP")
denoise_signal = preproc.wt_denoise(bp_signal)
env_signal = preproc.homomorphic(denoise_signal)
Segment S1 sounds from the signal
hsmm = sgm.load_hsmm("pre_trained.json")
states = sgm.segment_hsmm(hsmm,signal)
s1_start, s1_end = sgm.convert_hsmm_states(states,1)
Extracting features from S1 segments
Every feature calculation takes two numpy arrays holding the start and end boundaries of the desired segments, expressed in samples. Each calculation returns a numpy array containing the feature value for every segment. Some features return two arrays, usually a location and the value at that location.
Note: some features expect the input signal to be the envelope of the PCG for proper functionality.
s1_len = ftr.time_delta(s1_start,s1_end,env_signal)
s1_maxfreq, s1_maxfreq_val = ftr.max_freq(s1_start,s1_end,signal,nfft=1024)
print(len(s1_len),f"{s1_len[0]:.3f}")
print(len(s1_maxfreq),f"{s1_maxfreq[0]:.3f}")
134 0.087
134 28.912
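To make the boundary convention concrete, the sketch below computes segment durations from start and end positions given in samples, using plain Python lists. The helper `segment_lengths_seconds`, the boundary values, and the 1000 Hz sampling rate are illustrative assumptions and not part of pyPCG; the library's own `time_delta` may differ in detail.

```python
def segment_lengths_seconds(starts, ends, fs):
    """Return each segment's duration in seconds from sample-index boundaries."""
    if len(starts) != len(ends):
        raise ValueError("starts and ends must contain the same number of boundaries")
    return [(end - start) / fs for start, end in zip(starts, ends)]


# Made-up boundaries (in samples) for three segments, at an assumed 1000 Hz sampling rate
example_starts = [100, 900, 1700]
example_ends = [187, 990, 1785]
example_lengths = segment_lengths_seconds(example_starts, example_ends, fs=1000)

# One value per segment, formatted the same way as in the tutorial
print(len(example_lengths), f"{example_lengths[0]:.3f}")
```

The point of the sketch is only that every feature yields exactly one value per (start, end) pair, which is why the printed length matches the number of segments.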
However, if the same feature calculations need to be run on multiple types of segments (e.g. length and frequency for each of S1, S2, systole, and diastole), calling each function one by one quickly becomes tedious, especially when the set of calculated features changes. Repeating the calls by hand also makes mistakes more likely.
To avoid these problems, we can create a feature group object.
Feature group object
The feature group object takes an arbitrary number of so-called feature configs. Each feature config must contain a feature calculation function, the name of the calculated feature, and the expected input (raw signal or envelope). Optionally, additional parameters can be provided as key-value pairs in a dictionary.
timing_group = ftr.feature_group({"calc_fun":ftr.time_delta,"name":"length","input":"raw"},
                                 {"calc_fun":ftr.ramp_time,"name":"onset","input":"env"},
                                 {"calc_fun":ftr.ramp_time,"name":"exit","input":"env","params":{"type":"exit"}})
This feature group will calculate the time length of the segments, the onset times (time from start of segment to the maximum location), and exit times (time from maximum location to end of segment).
To run a feature group, we use its run method, which takes both
types of expected input (the raw signal and the envelope) and the segment boundaries.
The output will be a dictionary containing the calculated feature arrays for each segment with the names of the features provided in the feature configs.
timings = timing_group.run(signal,env_signal,s1_start,s1_end)
for key,vals in timings.items():
    print(key,len(vals),f"{vals[0]:.3f}")
length 134 0.087
onset 134 0.042
exit 134 0.045
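The dispatch described above can be sketched in a few lines. The class `SimpleFeatureGroup` and the two toy feature functions are hypothetical stand-ins written for this illustration; they only mimic the behaviour described in this tutorial (required config keys, choice between raw signal and envelope, optional `params`) and are not pyPCG's implementation.

```python
class SimpleFeatureGroup:
    """Hypothetical stand-in that mimics the described behaviour of a feature group."""

    REQUIRED_KEYS = {"calc_fun", "name", "input"}

    def __init__(self, *configs):
        for cfg in configs:
            missing = self.REQUIRED_KEYS - set(cfg)
            if missing:
                raise ValueError(f"feature config is missing keys: {sorted(missing)}")
            if cfg["input"] not in ("raw", "env"):
                raise ValueError(f"unknown input type: {cfg['input']!r}")
        self.configs = configs

    def run(self, raw, env, starts, ends):
        results = {}
        for cfg in self.configs:
            # Pick the signal that this feature expects
            source = raw if cfg["input"] == "raw" else env
            # Optional extra keyword arguments for the feature function
            extra = cfg.get("params", {})
            results[cfg["name"]] = cfg["calc_fun"](starts, ends, source, **extra)
        return results


def length_in_samples(starts, ends, sig):
    """Toy feature: segment length in samples (the signal itself is unused)."""
    return [end - start for start, end in zip(starts, ends)]


def scaled_peak(starts, ends, sig, scale=1.0):
    """Toy feature: maximum of each segment multiplied by a scale factor."""
    return [scale * max(sig[start:end]) for start, end in zip(starts, ends)]


toy_raw = [0, 1, -3, 2, 0, 5, -1, 0]
toy_env = [abs(x) for x in toy_raw]  # crude stand-in for an envelope
toy_group = SimpleFeatureGroup(
    {"calc_fun": length_in_samples, "name": "length", "input": "raw"},
    {"calc_fun": scaled_peak, "name": "peak", "input": "env", "params": {"scale": 2.0}},
)
toy_features = toy_group.run(toy_raw, toy_env, [0, 4], [4, 8])
print(toy_features)
```

Keeping the configuration as data means that adding or removing a feature is a one-line change to the config list, rather than an edit at every call site.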
If a feature returns multiple values (as max_freq does), the feature
group keeps only the first one in its output.
Note: this is likely to change in a future version.
freq_group = ftr.feature_group({"calc_fun":ftr.max_freq, "name":"max frequency","input":"raw","params":{"nfft":1024}},
                               {"calc_fun":ftr.spectral_centroid, "name":"center frequency", "input":"raw"})
frequencies = freq_group.run(signal,env_signal,s1_start,s1_end)
for key,vals in frequencies.items():
    print(key,len(vals),f"{vals[0]:.3f}")
max frequency 134 28.912
center frequency 134 41.663
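The "first output only" rule can be illustrated with a small sketch. Both `keep_first_output` and the two-output feature `peak_location_and_value` are hypothetical helpers invented for this example; how pyPCG selects the first output internally is not shown in this tutorial, so the tuple check below is an assumption.

```python
def keep_first_output(result):
    """Return result[0] when a feature returned a tuple of arrays, else result unchanged."""
    return result[0] if isinstance(result, tuple) else result


def peak_location_and_value(starts, ends, sig):
    """Hypothetical two-output feature: index of each segment's maximum, and the maximum."""
    locations, values = [], []
    for start, end in zip(starts, ends):
        segment = sig[start:end]
        peak = max(segment)
        locations.append(start + segment.index(peak))
        values.append(peak)
    return locations, values


toy_signal = [0, 2, 7, 1, 3, 9, 4, 0]
both_outputs = peak_location_and_value([0, 4], [4, 8], toy_signal)
grouped_output = keep_first_output(both_outputs)

print(both_outputs)    # locations and values
print(grouped_output)  # locations only
```

If the second output is needed, the feature function can still be called directly, as was done with max_freq earlier in this tutorial.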
If we want to combine the results, we can do so with the Python dictionary union operator (available in Python 3.9 or later):
total_features = timings | frequencies
print(total_features.keys())
dict_keys(['length', 'onset', 'exit', 'max frequency', 'center frequency'])
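Two practical points about merging are worth a short sketch: on interpreters older than Python 3.9 the same result can be obtained by dictionary unpacking, and when two groups use the same feature name, the entry from the right-hand dictionary replaces the one from the left. The values and the helper `with_prefix` below are made up for illustration.

```python
def with_prefix(features, prefix):
    """Return a copy of the feature dictionary with every name prefixed."""
    return {f"{prefix} {name}": values for name, values in features.items()}


# Made-up results; "length" appears in both dictionaries on purpose
toy_timings = {"length": [0.087], "onset": [0.042]}
toy_frequencies = {"max frequency": [28.912], "length": [0.090]}

# Unpacking also works before Python 3.9; on a name clash the right-hand entry wins
merged = {**toy_timings, **toy_frequencies}

# Prefixing the names first keeps both "length" entries
merged_safe = {**with_prefix(toy_timings, "time"), **with_prefix(toy_frequencies, "freq")}

print(merged["length"])
print(list(merged_safe))
```

Giving every feature a unique name in its config avoids the clash in the first place; prefixing is only a fallback when the groups are defined independently.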
Additional notes
The previous total_features could also be calculated with a single unified
feature group. However, it may be advantageous to split certain
features into different groups to avoid unnecessary calculations. For
example, calculating the onset time of the systole does not make much
sense, since no peak is expected in that segment.
Using feature groups may not be necessary if only one type of segment is considered, or if the sets of desired features differ greatly between segment types.
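When several segment types do share a feature set, one group can be reused in a loop. The following is a minimal sketch under stated assumptions: `run_on_segment_types` is a hypothetical helper, `toy_run` stands in for a feature group's run method (called as `run(raw, env, starts, ends)`, as in this tutorial), and the boundaries are made up rather than produced by the segmentation step.

```python
def run_on_segment_types(run, raw, env, boundaries):
    """Call run once per segment type and prefix each feature name with the type."""
    features = {}
    for segment_name, (starts, ends) in boundaries.items():
        for feature_name, values in run(raw, env, starts, ends).items():
            features[f"{segment_name} {feature_name}"] = values
    return features


def toy_run(raw, env, starts, ends):
    """Stand-in for a feature group's run method; returns segment lengths in samples."""
    return {"length": [end - start for start, end in zip(starts, ends)]}


# Made-up (starts, ends) pairs for two segment types
toy_boundaries = {
    "S1": ([0, 10], [3, 13]),
    "S2": ([5, 15], [7, 17]),
}
all_features = run_on_segment_types(toy_run, raw=[0] * 20, env=[0] * 20, boundaries=toy_boundaries)
print(all_features)
```

With real data, the run method of a group such as timing_group would presumably take the place of `toy_run`, with one boundary pair per segment type obtained from the segmentation.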