Loading and inspecting data
module3_Practical_session_Subject6_smoking <- read.csv("~/R/module3_Practical_session_Subject6_smoking.csv")
convert the data to factor character
module3_Practical_session_Subject6_smoking$Smoker <- as.factor(module3_Practical_session_Subject6_smoking$Smoker)
module3_Practical_session_Subject6_smoking$Status <- as.factor(module3_Practical_session_Subject6_smoking$Status)
module3_Practical_session_Subject6_smoking$AgeG <- as.factor(module3_Practical_session_Subject6_smoking$AgeG)
the data checked for its value; either character, numeric or factor
sapply(module3_Practical_session_Subject6_smoking, class)
## Smoker Status Age AgeG
## "factor" "factor" "numeric" "factor"
I tabulated the total number of women alive and dead over the period according to their smoking habits
tabulate(module3_Practical_session_Subject6_smoking$Smoker)
## [1] 732 582
module3_Practical_session_Subject6_smoking.tab <- table(module3_Practical_session_Subject6_smoking$Status, module3_Practical_session_Subject6_smoking$Smoker)
TAB=table(module3_Practical_session_Subject6_smoking$Status, module3_Practical_session_Subject6_smoking$Smoker)
Graphic representation of these data and calculate confidence intervals, P-value, calculate confidence intervals and the Odds ratio. The surprising result is mortality was significantly high (p=0.003) among non exposed group (non-smoker). The graph below shows that hign mortality rate among non-smoker.
barplot(TAB, beside=T, legend=T)

chisq.test(TAB, correct=T)
##
## Pearson's Chi-squared test with Yates' continuity correction
##
## data: TAB
## X-squared = 8.7515, df = 1, p-value = 0.003093
fisher.test(TAB)
##
## Fisher's Exact Test for Count Data
##
## data: TAB
## p-value = 0.002989
## alternative hypothesis: true odds ratio is not equal to 1
## 95 percent confidence interval:
## 0.5307485 0.8822128
## sample estimates:
## odds ratio
## 0.6850392
Mortality rates based on a new age category. The following classes were considered: 18-34 years, 34-54 years, 55-64 years, over 65 years
TAB1=table(module3_Practical_session_Subject6_smoking$Status, module3_Practical_session_Subject6_smoking$Smoker, module3_Practical_session_Subject6_smoking$AgeG)
prop.table(TAB1)
## , , = 18-34
##
##
## No Yes
## Alive 0.168188737 0.138508371
## Dead 0.004566210 0.005327245
##
## , , = 35-54
##
##
## No Yes
## Alive 0.130898021 0.143835616
## Dead 0.014459665 0.029680365
##
## , , = 55-64
##
##
## No Yes
## Alive 0.061643836 0.049467275
## Dead 0.030441400 0.038812785
##
## , , = Over 64
##
##
## No Yes
## Alive 0.021308980 0.005327245
## Dead 0.125570776 0.031963470
New Death variable created and 1 or 0 to indicate whether the individual died during the 20-year period, I studied the Death ~ Age model to study the probability of death as a function of age according to whether one considers the group of smokers or non-smokers.These regressions doesn’t allow me to conclude on the harmfulness of smoking. Since the age increase the fitted value decrease.
Death <- module3_Practical_session_Subject6_smoking$Status
Deathcode<-ifelse(Death == "Alive", 1, 0)
logistic <- glm(Deathcode ~ module3_Practical_session_Subject6_smoking$Age, binomial)
summary(logistic)
##
## Call:
## glm(formula = Deathcode ~ module3_Practical_session_Subject6_smoking$Age,
## family = binomial)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -2.8803 -0.4551 0.2848 0.5897 2.3335
##
## Coefficients:
## Estimate Std. Error z value
## (Intercept) 6.104537 0.321414 18.99
## module3_Practical_session_Subject6_smoking$Age -0.097651 0.005555 -17.58
## Pr(>|z|)
## (Intercept) <2e-16 ***
## module3_Practical_session_Subject6_smoking$Age <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 1560.3 on 1313 degrees of freedom
## Residual deviance: 1004.8 on 1312 degrees of freedom
## AIC: 1008.8
##
## Number of Fisher Scoring iterations: 5
plot(module3_Practical_session_Subject6_smoking$Age, jitter(Deathcode, 0.5), xlab = "module3_Practical_session_Subject6_smoking$Age", ylab = "DeathCode (0 - Dead, 1 - Alive")

plot(module3_Practical_session_Subject6_smoking$Age,fitted.values(logistic))
