Reading and Analyzing some experimental hdf5 data
+Table of Contents
+ ++I will read and get the statistics of some data I usually acquire in the lab. I +will use this log as a future reference to when I need to read hdf5 files +contained my data. +
+ +1 Data reading using h5py package
++To read the data we make use of the h5py package which contains most of what +might be necessary to read hdf5 files. +
+ +import h5py as h5 +f = h5.File('testdata.h5','r') +print(f.keys()) ++
+<KeysViewHDF5 ['FileType', 'Frame', 'Waveforms']> ++ + +
+Now that we know what is inside, let’s try to see which waveforms we have. +
+ +print(f['Waveforms'].keys()) ++
+<KeysViewHDF5 ['Channel 1', 'Channel 2']> ++ + +
+We have two channels, thus another subgroup. Gotta investigate deeper: +
+print(f['Waveforms/Channel 1'].keys()) +print('---') +print(f['Waveforms/Channel 2'].keys()) ++
+<KeysViewHDF5 ['Channel 1Data']> +--- +<KeysViewHDF5 ['Channel 2Data']> ++ + +
+Now we can get the attributes of the recorded channels. They are going to be
+necessary to properly map the data to a numpy array.
+
print(f['Waveforms/Channel 1'].attrs.keys()) ++
+<KeysViewHDF5 ['Count', 'DispInterpFactor', 'InterpSetting', 'MaxBandwidth', 'MinBandwidth', 'NumPoints', 'NumSegments', 'SavedInterpFactor', 'Start', 'WavAttr', 'WaveformType', 'XDispOrigin', 'XDispRange', 'XInc', 'XOrg', 'XUnits', 'YDispOrigin', 'YDispRange', 'YInc', 'YOrg', 'YUnits']> ++ + +
+Besides the channels we’ve read, a time series should be created. Each of +its elements should correspond to a data sample. To do that we will use the +information stored in one of the channels (supposing they were acquired using +the same oscilloscope config). +
+import numpy as np +import matplotlib.pyplot as plt + +idata1 = f['Waveforms/Channel 1'].attrs.get('XDispRange') +# XDispRange and YDispRange are the total range covered in the oscilloscope display. +idata2 = f['Waveforms/Channel 1'].attrs.get('XOrg') +# XOrg is the x-origin of the data and will be the same as XDispOrigin if the saved +# data is constrained to the displayed area. +idata3 = f['Waveforms/Channel 1'].attrs.get('XInc') +# Xinc is the incremental value of X, or the sampling period (i.e. XDispRange/NumPoints). +idata4 = f['Waveforms/Channel 1'].attrs.get('NumPoints') +temp1 = idata3*(np.arange(1,idata4+1))+idata2 + +plt.figure() +plt.plot(temp1,'b') +plt.grid(linestyle='dotted') +plt.axis('tight') +plt.title('Times series') + +plt.savefig(filename) +print(filename) ++
+
+Once the time array has been created we can move to reading the data. +
+yinc = f['Waveforms/Channel 1'].attrs.get('YInc') # Vertical sampling period +yorg = f['Waveforms/Channel 1'].attrs.get('YOrg') # Closest value to origin (y=0) +data1 = yorg + yinc*f['Waveforms/Channel 1/Channel 1Data'] +yinc = f['Waveforms/Channel 2'].attrs.get('YInc') +yorg = f['Waveforms/Channel 2'].attrs.get('YOrg') +data2 = yorg + yinc*f['Waveforms/Channel 2/Channel 2Data'] +print('Data type data:', data1.dtype) ++
+Data type data: float64 ++ + +
+Now we can plot the data to see it. +
+ +plt.close('all') +plt.figure() +plt.plot(temp1*1e6,data1,'b') +plt.plot(temp1*1e6,data2,'r') +plt.grid(linestyle='dotted') +plt.axis('tight') +plt.xlabel('Time ($\mu$s)') +plt.ylabel('Amplitude (Volts)') +plt.title('Trigger (blue) and Data (red)') + +plt.savefig(filename) +print(filename) ++
+
+Data looks nice and amplitudes agree with what was observed in the oscilloscope +display. Therefore out reading method works. +
+ ++Some statistics from the data can now be computed, once we can trust the data +was correctly read. +
+avg_data = np.mean(data2) +med_data = np.median(data2) +std_data = np.std(data2,ddof=1) +min_data = np.amin(data2) +max_data = np.amax(data2) +#------------------------ +print('Data average (Volts): ',avg_data) +print('Data median (Volts): ',med_data) +print('Data standard deviation (Volts): ',std_data) +print('Data minimum (Volts): ',min_data) +print('Data maximum (Volts): ',max_data) ++
+Data average (Volts): 0.0019089614972975556 +Data median (Volts): 0.004743201433087485 +Data standard deviation (Volts): 0.039513536115368106 +Data minimum (Volts): -0.1612688487249741 +Data maximum (Volts): 0.15652564729188667 ++ + +
+The data was recorded when the input channel was turned off, thus from that we +can estimated the signature “noise” of the system when the input will be on. +For that we will look at the histogram and see if looks like a Normal/Gaussian +continuous probability distribution. +
+from scipy.stats import norm + +plt.figure() +plt.hist(data2,color='b',histtype='bar',ec='black',density=True) + +x = np.linspace(min_data,max_data,idata4) +p = norm.pdf(x,avg_data,std_data) + +plt.plot(x,p,'k',linewidth=2) + +plt.grid(linestyle='dotted') +plt.axis('tight') + +title = "Fitting Gaussian to histogram: $\mu$ = %.5e, $\sigma$ = %.5e" % (avg_data,std_data) +plt.title(title) + +plt.savefig(filename) +print(filename) ++
+
+At last we can use the D’Agostino’s K\(^2\) test to infer goodness-of-fit or if +the assumption of a Normal distribution really holds. +
+from scipy.stats import normaltest +alpha = 1e-3 +k2, p = normaltest(data2) +print("p = {:.10e}".format(p)) ++
+Since the p-value is (way) smaller than the alpha value we can conclude the null +hypothesis can be rejected, that is, a Gaussian distribution does not represent +the data. +
+