#+TITLE: Reading and Analyzing some experimental hdf5 data #+AUTHOR: Victor Martins Gomes #+DATE: 2020-04-28 Tuesday #+LANGUAGE: en #+PROPERTY: header-args :session :exports both :eval never-export I will read and get the statistics of some data I usually acquire in the lab. I will use this log as a future reference to when I need to read hdf5 files contained my data. * Data reading using h5py package To read the data we make use of the h5py package which contains most of what might be necessary to read hdf5 files. #+begin_src python :results output :session *python* :exports both import h5py as h5 f = h5.File('testdata.h5','r') print(f.keys()) #+end_src #+RESULTS: : Now that we know what is inside, let's try to see which waveforms we have. #+begin_src python :results output :session *python* :exports both print(f['Waveforms'].keys()) #+end_src #+RESULTS: : We have two channels, thus another subgroup. Gotta investigate deeper: #+begin_src python :results output :session *python* :exports both print(f['Waveforms/Channel 1'].keys()) print('---') print(f['Waveforms/Channel 2'].keys()) #+end_src #+RESULTS: : : --- : Now we can get the attributes of the recorded channels. They are going to be necessary to properly map the data to a =numpy= array. #+begin_src python :results output :session *python* :exports both print(f['Waveforms/Channel 1'].attrs.keys()) #+end_src #+RESULTS: : Besides the channels we've read, a time series should be created. Each of its elements should correspond to a data sample. To do that we will use the information stored in one of the channels (supposing they were acquired using the same oscilloscope config). #+begin_src python :results output file :var filename="time_plot.png" :exports both :session *python* import numpy as np import matplotlib.pyplot as plt idata1 = f['Waveforms/Channel 1'].attrs.get('XDispRange') # XDispRange and YDispRange are the total range covered in the oscilloscope display. idata2 = f['Waveforms/Channel 1'].attrs.get('XOrg') # XOrg is the x-origin of the data and will be the same as XDispOrigin if the saved # data is constrained to the displayed area. idata3 = f['Waveforms/Channel 1'].attrs.get('XInc') # Xinc is the incremental value of X, or the sampling period (i.e. XDispRange/NumPoints). idata4 = f['Waveforms/Channel 1'].attrs.get('NumPoints') temp1 = idata3*(np.arange(1,idata4+1))+idata2 plt.figure() plt.plot(temp1,'b') plt.grid(linestyle='dotted') plt.axis('tight') plt.title('Times series') plt.savefig(filename) print(filename) #+end_src #+RESULTS: [[file:time_plot.png]] Once the time array has been created we can move to reading the data. #+begin_src python :results output :session *python* :exports both yinc = f['Waveforms/Channel 1'].attrs.get('YInc') # Vertical sampling period yorg = f['Waveforms/Channel 1'].attrs.get('YOrg') # Closest value to origin (y=0) data1 = yorg + yinc*f['Waveforms/Channel 1/Channel 1Data'] yinc = f['Waveforms/Channel 2'].attrs.get('YInc') yorg = f['Waveforms/Channel 2'].attrs.get('YOrg') data2 = yorg + yinc*f['Waveforms/Channel 2/Channel 2Data'] print('Data type data:', data1.dtype) #+end_src #+RESULTS: : Data type data: float64 Now we can plot the data to see it. #+begin_src python :results output file :var filename="data_plot.png" :exports both :session *python* plt.close('all') plt.figure() plt.plot(temp1*1e6,data1,'b') plt.plot(temp1*1e6,data2,'r') plt.grid(linestyle='dotted') plt.axis('tight') plt.xlabel('Time ($\mu$s)') plt.ylabel('Amplitude (Volts)') plt.title('Trigger (blue) and Data (red)') plt.savefig(filename) print(filename) #+end_src #+RESULTS: [[file:data_plot.png]] Data looks nice and amplitudes agree with what was observed in the oscilloscope display. Therefore out reading method works. Some statistics from the data can now be computed, once we can trust the data was correctly read. #+begin_src python :results output :session *python* :exports both avg_data = np.mean(data2) med_data = np.median(data2) std_data = np.std(data2,ddof=1) min_data = np.amin(data2) max_data = np.amax(data2) #------------------------ print('Data average (Volts): ',avg_data) print('Data median (Volts): ',med_data) print('Data standard deviation (Volts): ',std_data) print('Data minimum (Volts): ',min_data) print('Data maximum (Volts): ',max_data) #+end_src #+RESULTS: : Data average (Volts): 0.0019089614972975556 : Data median (Volts): 0.004743201433087485 : Data standard deviation (Volts): 0.039513536115368106 : Data minimum (Volts): -0.1612688487249741 : Data maximum (Volts): 0.15652564729188667 The data was recorded when the input channel was turned off, thus from that we can estimated the signature "noise" of the system when the input will be on. For that we will look at the histogram and see if looks like a Normal/Gaussian continuous probability distribution. #+begin_src python :results output file :var filename="data_hist.png" :exports both :session *python* from scipy.stats import norm plt.figure() plt.hist(data2,color='b',histtype='bar',ec='black',density=True) x = np.linspace(min_data,max_data,idata4) p = norm.pdf(x,avg_data,std_data) plt.plot(x,p,'k',linewidth=2) plt.grid(linestyle='dotted') plt.axis('tight') title = "Fitting Gaussian to histogram: $\mu$ = %.5e, $\sigma$ = %.5e" % (avg_data,std_data) plt.title(title) plt.savefig(filename) print(filename) #+end_src #+RESULTS: [[file:data_hist.png]] At last we can use the D'Agostino's K$^2$ test to infer goodness-of-fit or if the assumption of a Normal distribution really holds. #+begin_src python :results output file :var filename="data_plot.png" :exports both :session *python* from scipy.stats import normaltest alpha = 1e-3 k2, p = normaltest(data2) print("p = {:.10e}".format(p)) #+end_src #+RESULTS: [[file:p = 0.0000000000e+00]] Since the p-value is (way) smaller than the alpha value we can conclude the null hypothesis can be rejected, that is, a Gaussian distribution does not represent the data.