diff --git a/module3/exo3/exercice_en.Rmd b/module3/exo3/exercice_en.Rmd index 13b258ddd0da29bc3bf08c64b6a1db742f6d5409..4987c82777cdf812a028d5cf577961dea735b09a 100644 --- a/module3/exo3/exercice_en.Rmd +++ b/module3/exo3/exercice_en.Rmd @@ -1,33 +1,534 @@ ---- -title: "Your title" -author: "Your name" -date: "Today's date" -output: html_document ---- + + -```{r setup, include=FALSE} -knitr::opts_chunk$set(echo = TRUE) -``` + -## Some explanations + + + -This is an R Markdown document that you can easily export to HTML, PDF, and MS Word formats. For more information on R Markdown, see . -When you click on the button **Knit**, the document will be compiled in order to re-execute the R code and to include the results into the final document. As we have shown in the video, R code is inserted as follows: + -```{r cars} -summary(cars) -``` -It is also straightforward to include figures. For example: +Around Simpson’s Paradox -```{r pressure, echo=FALSE} -plot(pressure) -``` + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ + + + + + + +
+

Loading and inspecting data

+
module3_Practical_session_Subject6_smoking <- read.csv("~/R/module3_Practical_session_Subject6_smoking.csv")
+
+
+

convert the data to factor character

+
module3_Practical_session_Subject6_smoking$Smoker <- as.factor(module3_Practical_session_Subject6_smoking$Smoker)
+
module3_Practical_session_Subject6_smoking$Status <- as.factor(module3_Practical_session_Subject6_smoking$Status)
+
module3_Practical_session_Subject6_smoking$AgeG <- as.factor(module3_Practical_session_Subject6_smoking$AgeG)
+
+
+

the data checked for its value; either character, numeric or factor

+
sapply(module3_Practical_session_Subject6_smoking, class)
+
##    Smoker    Status       Age      AgeG 
+##  "factor"  "factor" "numeric"  "factor"
+
+
+

I tabulated the total number of women alive and dead over the period according to their smoking habits

+
tabulate(module3_Practical_session_Subject6_smoking$Smoker)
+
## [1] 732 582
+
module3_Practical_session_Subject6_smoking.tab <- table(module3_Practical_session_Subject6_smoking$Status, module3_Practical_session_Subject6_smoking$Smoker)
+
TAB=table(module3_Practical_session_Subject6_smoking$Status, module3_Practical_session_Subject6_smoking$Smoker)
+
+
+

Graphic representation of these data and calculate confidence intervals, P-value, calculate confidence intervals and the Odds ratio. The surprising result is mortality was significantly high (p=0.003) among non exposed group (non-smoker). The graph below shows that hign mortality rate among non-smoker.

+
barplot(TAB, beside=T, legend=T)
+

+
chisq.test(TAB, correct=T)
+
## 
+##  Pearson's Chi-squared test with Yates' continuity correction
+## 
+## data:  TAB
+## X-squared = 8.7515, df = 1, p-value = 0.003093
+
fisher.test(TAB)
+
## 
+##  Fisher's Exact Test for Count Data
+## 
+## data:  TAB
+## p-value = 0.002989
+## alternative hypothesis: true odds ratio is not equal to 1
+## 95 percent confidence interval:
+##  0.5307485 0.8822128
+## sample estimates:
+## odds ratio 
+##  0.6850392
+
+
+

Mortality rates based on a new age category. The following classes were considered: 18-34 years, 34-54 years, 55-64 years, over 65 years

+
TAB1=table(module3_Practical_session_Subject6_smoking$Status, module3_Practical_session_Subject6_smoking$Smoker, module3_Practical_session_Subject6_smoking$AgeG)
+
prop.table(TAB1)
+
## , ,  = 18-34
+## 
+##        
+##                  No         Yes
+##   Alive 0.168188737 0.138508371
+##   Dead  0.004566210 0.005327245
+## 
+## , ,  = 35-54
+## 
+##        
+##                  No         Yes
+##   Alive 0.130898021 0.143835616
+##   Dead  0.014459665 0.029680365
+## 
+## , ,  = 55-64
+## 
+##        
+##                  No         Yes
+##   Alive 0.061643836 0.049467275
+##   Dead  0.030441400 0.038812785
+## 
+## , ,  = Over 64
+## 
+##        
+##                  No         Yes
+##   Alive 0.021308980 0.005327245
+##   Dead  0.125570776 0.031963470
+
+
+

New Death variable created and 1 or 0 to indicate whether the individual died during the 20-year period, I studied the Death ~ Age model to study the probability of death as a function of age according to whether one considers the group of smokers or non-smokers.These regressions doesn’t allow me to conclude on the harmfulness of smoking. Since the age increase the fitted value decrease.

+
Death <- module3_Practical_session_Subject6_smoking$Status
+Deathcode<-ifelse(Death == "Alive", 1, 0) 
+
logistic <- glm(Deathcode ~ module3_Practical_session_Subject6_smoking$Age, binomial)
+summary(logistic)
+
## 
+## Call:
+## glm(formula = Deathcode ~ module3_Practical_session_Subject6_smoking$Age, 
+##     family = binomial)
+## 
+## Deviance Residuals: 
+##     Min       1Q   Median       3Q      Max  
+## -2.8803  -0.4551   0.2848   0.5897   2.3335  
+## 
+## Coefficients:
+##                                                 Estimate Std. Error z value
+## (Intercept)                                     6.104537   0.321414   18.99
+## module3_Practical_session_Subject6_smoking$Age -0.097651   0.005555  -17.58
+##                                                Pr(>|z|)    
+## (Intercept)                                      <2e-16 ***
+## module3_Practical_session_Subject6_smoking$Age   <2e-16 ***
+## ---
+## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
+## 
+## (Dispersion parameter for binomial family taken to be 1)
+## 
+##     Null deviance: 1560.3  on 1313  degrees of freedom
+## Residual deviance: 1004.8  on 1312  degrees of freedom
+## AIC: 1008.8
+## 
+## Number of Fisher Scoring iterations: 5
+
plot(module3_Practical_session_Subject6_smoking$Age, jitter(Deathcode, 0.5), xlab = "module3_Practical_session_Subject6_smoking$Age", ylab = "DeathCode (0 - Dead, 1 - Alive")
+

+
plot(module3_Practical_session_Subject6_smoking$Age,fitted.values(logistic))
+

+
+ + + + +
+ + + + + + + + + + + + + + +