Ex 4-1: better way to handle the weights in the regression

Commit ee3c40ac authored Apr 16, 2020 by François Févotte

Showing with 274 additions and 287 deletions

challenger.html module4/challenger.html +258 -265

challenger.jmd module4/challenger.jmd +16 -22

challenger.pdf module4/challenger.pdf +0 -0

--- a/module4/challenger.html
+++ b/module4/challenger.html
--- a/module4/challenger.jmd
+++ b/module4/challenger.jmd
@@ -99,32 +99,23 @@ This corresponds to the values from the article of Dalal et al. The standard
 errors are
 $s_{\hat{\alpha}} = `j @printf "%.3f" σα`$ and
 $s_{\hat{\beta}} = `j @printf "%.3f" σβ`$,
-which is different from the $3.052$ and $0.047$ reported by Dallal et al. The
-deviance is
-$G^2 = `j @printf "%.3f" G²`$ with `j nDOF` degrees of freedom.
+which is different from the $3.052$ and $0.047$ reported by Dalal et al.

-I cannot find any value similar to the Goodness of fit ($G^2=18.086$) reported
-by Dalal et al. However, the number of degrees of freedom is similar to theirs
-(21).
+The deviance is $G^2 = `j @printf "%.3f" G²`$ with `j nDOF` degrees of freedom.
+I cannot find any value similar to the Goodness of fit reported by Dalal *et al.*
+($G^2=18.086$). However, the number of degrees of freedom, while different, is
+at least similar to theirs (21).

 There seems to be something wrong. Oh I know, I haven't indicated that my
 observations are actually the result of 6 observations for each rocket
-launch. The correct way to do this would be to weight the data using the `Count`
-column. Since I don't know how to do that with the
-[GLM](https://github.com/JuliaStats/GLM.jl) package I'm using, I will simply
-duplicate the data:
+launch. Let's indicate these weights (since the weights are always the same
+throughout all experiments, it does not change the estimates of the fit but it
+does influence the variance estimate).

 ```julia; wrap=false; hold=true
-weighted_data = DataFrame(Temperature=Int[], Frequency=Float64[])
-for row in eachrow(data)
-    for _ in 1:row.Count
-        push!(weighted_data, (Temperature=row.Temperature,
-                              Frequency=row.Frequency))
-    end
-end
-
-model = glm(@formula(Frequency ~ Temperature), weighted_data,
-            Binomial(), LogitLink())
+model = glm(@formula(Frequency ~ Temperature), data,
+            Binomial(), LogitLink();
+            wts=data.Count)

 α, β   = coef(model)
 σα, σβ = stderror(model)
@@ -142,8 +133,11 @@ $s_{\hat{\beta}} = `j @printf "%.3f" σβ`$,
 The Goodness of fit (Deviance) indicated for this model is
 $G^2 = `j @printf "%.3f" G²`$ with `j nDOF` degrees of freedom. Now $G^2$ is in
 good accordance to the results of the Dalal *et al.* article, but the number of
-degrees of freedom is 6 times larger than i should, due to my tampering of the
-data to duplicate them instead of weighting them.
+degrees of freedom is approximately 6 times larger than that of Dalal *et
+al.* Note that, even after removing this factor (which is probably due to the way
+the number of residual degrees of freedom is defined in both libraries in the
+presence of weights), the values are similar but still differ by
+`j @printf "%2.0f" 100 * (nDOF/6/21 - 1)`%.

 # Predicting failure probability


--- a/module4/challenger.pdf
+++ b/module4/challenger.pdf
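
Review note: the new text in `challenger.jmd` says that a constant weight leaves the fitted estimates unchanged but influences the variance estimate. The sketch below is one way to check that claim without GLM.jl. It is not the code from this commit: `fit_logit` is a hypothetical hand-rolled Newton/IRLS fit, and the temperatures and frequencies are made-up values chosen only to resemble the exercise's data. Under those assumptions, multiplying every weight by 6 should give the same coefficients and standard errors smaller by a factor of $\sqrt{6}$.

```julia
using LinearAlgebra

# Hypothetical helper (not part of GLM.jl): weighted logistic regression
# fitted with Newton / IRLS steps.
# X: n×p design matrix, y: observed frequencies in [0, 1], w: prior weights.
function fit_logit(X, y, w; iters=50)
    β = zeros(size(X, 2))
    for _ in 1:iters
        μ = 1 ./ (1 .+ exp.(-X * β))               # fitted probabilities
        grad = X' * (w .* (y .- μ))                # score vector
        info = X' * (X .* (w .* μ .* (1 .- μ)))    # Fisher information
        β += info \ grad                           # Newton step
    end
    μ = 1 ./ (1 .+ exp.(-X * β))
    info = X' * (X .* (w .* μ .* (1 .- μ)))
    return β, sqrt.(diag(inv(info)))               # estimates, standard errors
end

# Made-up data for illustration only (not the Challenger data set).
temperature = Float64[53, 57, 63, 66, 67, 68, 70, 72, 75, 76, 79, 81]
frequency   = [2, 1, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0] ./ 6

X = hcat(ones(length(temperature)), temperature)

β1, σ1 = fit_logit(X, frequency, ones(length(frequency)))       # unit weights
β6, σ6 = fit_logit(X, frequency, fill(6.0, length(frequency)))  # 6 trials per row

println("same coefficients:    ", β1 ≈ β6)
println("standard error ratio: ", σ1 ./ σ6)
```

The reason is visible in the Newton step: both the score vector and the Fisher information scale linearly with a constant weight, so the factor cancels in `info \ grad`, while the inverse information used for the standard errors is divided by it.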