Commit fec6df6d authored Dec 20, 2022 by NourElh

Showing with 123 additions and 0 deletions

anova_bats_brain.Rmd Linear Model/anova_bats_brain.Rmd +123 -0

--- a/Linear Model/anova_bats_brain.Rmd
+++ b/Linear Model/anova_bats_brain.Rmd
+---
+title: "Bats Brain Size Analysis"
+author: "EL HASSANE Nour"
+output: html_document
+---
+```{r setup, include=FALSE}
+knitr::opts_chunk$set(echo = TRUE)
+```
+### 1.  Presentation
+```{r}
+# Read the semicolon-separated file, skipping the 3 lines that precede the header
+myData <- read.table(file = "bats.csv", sep = ";", skip = 3, header = TRUE)
+myData
+```
+### 2.  Study of the relationship between brain weight and body mass
+```{r}
+phyto <- myData[(myData$Diet==1),]
+phyto
+```
+```{r}
+reg1 = lm(BRW ~ BOW, data=phyto)
+plot(reg1)
+summary(reg1)
+```
+#### Mathematical form of the model
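+$$BRW_i = \alpha \, BOW_i + \beta + \epsilon_i, \quad i = 1, \dots, 63$$
+where $\alpha$ is the slope, $\beta$ the intercept and $\epsilon_i$ the error term of observation $i$.
+#### Estimate of the intercept
+623.4469
+#### Value of the test statistic for the model test
+513.4
+#### H0 hypothesis of this test
+$H_0: \alpha = 0$ (body mass has no linear effect on brain weight)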
+#### Relationship between brain weight and body mass
+The test rejects H0: body mass has a significant linear effect on brain weight, so the two variables are strongly correlated.
+#### Coefficient of determination
+0.95: the model explains about 95% of the variance in brain weight, so it is useful.
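+
+The quantities quoted above (intercept estimate, test statistic, coefficient of determination) can also be read programmatically from a fitted model instead of being copied from the printed summary. The chunk below is a minimal sketch on simulated data: `sim`, `fit_sim` and the generating values are hypothetical and unrelated to `bats.csv`.
+```{r}
+# Simulated data (hypothetical values, not the bats dataset)
+set.seed(1)
+sim <- data.frame(x = 1:30)
+sim$y <- 5 + 2 * sim$x + rnorm(30, sd = 3)
+
+fit_sim <- lm(y ~ x, data = sim)
+s <- summary(fit_sim)
+
+intercept_hat <- coef(fit_sim)[["(Intercept)"]]  # estimated intercept
+f_stat <- s$fstatistic[["value"]]                # F statistic of the overall model test
+r_squared <- s$r.squared                         # coefficient of determination
+
+# p-value of the F test (H0: slope = 0), computed from the F distribution
+p_value <- pf(f_stat, s$fstatistic[["numdf"]], s$fstatistic[["dendf"]],
+              lower.tail = FALSE)
+
+c(intercept = intercept_hat, F = f_stat, R2 = r_squared, p = p_value)
+```
+In a simple regression the coefficient of determination equals the squared correlation between the two variables, which is why a high value goes together with a strong correlation.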
+```{r}
+anova(reg1)
+```
+#### Residual sum of squares
+4253838
+#### Additional information presented in this table
+The residual sum of squares is 4253838: this is the part of the variability in brain weight that the model leaves unexplained.
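+
+A residual sum of squares is easier to judge relative to the total sum of squares than on its own, since its magnitude depends on the units of the response. The following sketch, on simulated data with hypothetical names (`sim2`, `fit2`), shows where the value sits in the `anova()` table and how it relates to the coefficient of determination.
+```{r}
+# Simulated data (hypothetical values, not the bats dataset)
+set.seed(2)
+sim2 <- data.frame(x = runif(40, 0, 10))
+sim2$y <- 1 + 3 * sim2$x + rnorm(40)
+
+fit2 <- lm(y ~ x, data = sim2)
+tab <- anova(fit2)
+
+rss_table <- tab["Residuals", "Sum Sq"]    # residual sum of squares from the table
+rss_manual <- sum(residuals(fit2)^2)       # same quantity computed by hand
+ess <- tab["x", "Sum Sq"]                  # sum of squares explained by x
+tss <- sum((sim2$y - mean(sim2$y))^2)      # total sum of squares
+
+all.equal(rss_table, rss_manual)           # the two computations agree
+all.equal(ess + rss_table, tss)            # explained + residual = total
+1 - rss_table / tss                        # equals the R-squared of the model
+```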
+```{r}
+plot(reg1$fitted.values, reg1$residuals, xlab="Predicted", ylab="Residuals")
+```
+The graph above shows that the residuals are large for large predicted brain weights, meaning that the largest brain weight values lie far from the regression line. The graph also shows an outlier.
+```{r}
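+# Cook's distance plot, used to spot influential observations
+plot(reg1, 4)
+# Row position of the observation with a very large brain weight
+which(phyto$BRW > 8000)
+# Refit the model without that observation
+phytobis <- phyto[which(phyto$BRW < 8000), ]
+reg2 <- lm(BRW ~ BOW, data = phytobis)
+summary(reg2)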
+```
+#### Diagnostic graphs for reg1 and reg2
+```{r}
+par(mfcol=c(2,2))
+plot(reg1)
+plot(reg2)
+```
+- Reg1: Graph 2 ("Scale-Location") shows 3 outliers, since the square root of their absolute standardized residuals is greater than 1. According to Graph 3 ("Normal Q-Q"), these outliers lie near the theoretical quantiles -2 and 2.
+- Reg2: Graph 2 ("Scale-Location") shows that some outliers (3) remain; they lie near the theoretical quantiles -2 and 2 (Graph 3, "Normal Q-Q").
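+
+Outliers can also be flagged numerically rather than read off the diagnostic graphs. The sketch below uses Cook's distance on simulated data; the names (`sim3`, `fit3`) are hypothetical, and the cutoff `4 / n` is only a common rule of thumb assumed here, not a criterion used in this analysis.
+```{r}
+# Simulated data with one artificially distorted observation (hypothetical values)
+set.seed(3)
+sim3 <- data.frame(x = c(1:20, 40))
+sim3$y <- 2 * sim3$x + rnorm(21)
+sim3$y[21] <- 10                       # far below the trend, at an extreme x
+
+fit3 <- lm(y ~ x, data = sim3)
+cd <- cooks.distance(fit3)             # one influence value per observation
+
+threshold <- 4 / nrow(sim3)            # assumed rule-of-thumb cutoff
+flagged <- which(cd > threshold)       # row positions of influential points
+flagged
+
+# Refit without the flagged rows (setdiff is safe even if nothing is flagged)
+keep <- setdiff(seq_len(nrow(sim3)), flagged)
+fit3_clean <- lm(y ~ x, data = sim3[keep, ])
+coef(fit3)
+coef(fit3_clean)
+```
+Comparing the two sets of coefficients shows how much a single influential point can move the fitted line.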
+### 3.  Study of the contribution to the total weight of each part of the brain
+```{r}
+library(corrplot)
+phytoNum=phyto[, c(4:8)]
+mat.cor=cor(phytoNum)
+corrplot(mat.cor, type="upper")
+```
+```{r}
+cor.test(phyto$BRW,phyto$HIP)
+cor.test(phyto$BRW,phyto$MOB)
+cor.test(phyto$BRW,phyto$AUD)
+```
+The p-values of the correlation tests between brain weight and both HIP ("hippocampus volume") and MOB ("main olfactory bulb volume") are very small, so BRW is significantly correlated with HIP and MOB. For AUD ("auditory nuclei volume") the p-value is small but less extreme, so the evidence for a correlation is weaker and we cannot conclude as firmly.
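+
+When reading `cor.test()` output it helps to separate the p-value (evidence against a zero correlation) from the estimate (strength of the correlation). A minimal sketch on simulated data, where `sim4` and its columns are hypothetical:
+```{r}
+# Simulated data (hypothetical values, not the bats dataset)
+set.seed(4)
+sim4 <- data.frame(a = rnorm(50))
+sim4$b <- sim4$a + rnorm(50, sd = 0.5)   # strongly related to a
+sim4$noise <- rnorm(50)                  # generated independently of a
+
+ct_strong <- cor.test(sim4$a, sim4$b)
+ct_noise <- cor.test(sim4$a, sim4$noise)
+
+ct_strong$estimate    # sample correlation: strength of the link
+ct_strong$p.value     # evidence against H0: correlation = 0
+ct_strong$conf.int    # 95% confidence interval for the correlation
+
+ct_noise$estimate
+ct_noise$p.value
+```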
+```{r}
+regm=lm(BRW~MOB+AUD+HIP,data=phytobis)
+summary(regm)
+anova(regm)
+```
+- Multiple regression model: $BRW_i = \mu + \alpha \, MOB_i + \beta \, HIP_i + \gamma \, AUD_i + \epsilon_i$
+- The model is not fully satisfactory: the p-value for MOB is large, so MOB adds no significant information once the other variables are in the model (it is probably correlated with another predictor, HIP) and it can be dropped.
+- Total brain mass is positively and strongly associated with HIP and AUD in this model, while MOB has no significant effect once HIP is included.
+- Estimated coefficients: AUD: 47.989, MOB: -2.444, HIP: 15.981
+- The coefficient associated with MOB is negative: with AUD and HIP held fixed, a larger MOB (main olfactory bulb volume) goes with a smaller BRW (brain weight). Since this coefficient is not significant, its sign should not be over-interpreted.
+```{r}
+reg0 = lm(BRW ~ 1, data = phytobis)
+step(reg0, scope=BRW~AUD + MOB + HIP, direction="forward")
+```
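+- Purpose: forward stepwise selection. Starting from the intercept-only model `reg0`, `step()` adds at each step the variable among AUD, MOB and HIP that most reduces the AIC, and stops when no addition improves it.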
+- Conclusion: 
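+
+The behaviour of the forward `step()` call above can be illustrated on simulated data where the answer is known in advance. In this sketch the names (`sim5`, `null_fit`, `sel`) are hypothetical and only `x1` truly influences the response.
+```{r}
+# Simulated data (hypothetical values): only x1 has a real effect on y
+set.seed(5)
+sim5 <- data.frame(x1 = rnorm(60), x2 = rnorm(60), x3 = rnorm(60))
+sim5$y <- 1 + 2 * sim5$x1 + rnorm(60)
+
+# Start from the intercept-only model and let step() add variables by AIC
+null_fit <- lm(y ~ 1, data = sim5)
+sel <- step(null_fit, scope = y ~ x1 + x2 + x3,
+            direction = "forward", trace = 0)   # trace = 0 hides the log
+
+formula(sel)                                    # variables kept by the procedure
+c(null = AIC(null_fit), selected = AIC(sel))    # AIC before and after selection
+```
+Setting `trace = 0` only silences the step-by-step log; leaving the default prints the AIC of every candidate model, which is what the conclusion should be based on.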
+### 4. Link between volume of the auditory part and diet
+```{r}
+myData$Diet_F = as.factor(myData$Diet)
+with(myData, plot(AUD~Diet))
+with(myData, plot(AUD~Diet_F))
+```
+The box plot (AUD ~ Diet_F) is the one to prefer because it treats diet as a categorical variable: for each diet type it shows the median, the quartiles and the spread of AUD.
+```{r}
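+# One-way ANOVA of AUD by diet type (avoid naming the object `lm`, which masks the function name)
+reg_diet <- lm(AUD ~ Diet_F, data = myData)
+anova(reg_diet)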
+```
+In the ANOVA result the p-value is large, so we cannot reject the hypothesis that the mean AUD is the same for all diet types. There is no significant effect of diet on AUD, which is surprising and contradicts the hypothesis mentioned in the statement.
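+
+Converting the diet code to a factor before fitting matters because a numeric code is treated as a quantity with a linear trend, while a factor gets one mean per level. The sketch below shows the difference on simulated data; `sim6`, the group means and the model names are hypothetical.
+```{r}
+# Simulated data (hypothetical values): the middle group differs from the others
+set.seed(6)
+sim6 <- data.frame(group = rep(1:3, each = 20))
+group_means <- c(10, 14, 10)
+sim6$y <- group_means[sim6$group] + rnorm(60)
+sim6$group_f <- as.factor(sim6$group)
+
+fit_num <- lm(y ~ group, data = sim6)     # numeric code: a single slope (1 df)
+fit_fac <- lm(y ~ group_f, data = sim6)   # factor: one mean per group (2 df)
+
+anova(fit_num)   # tests a linear trend across the codes 1, 2, 3
+anova(fit_fac)   # tests equality of the three group means
+```
+Here the group means have no linear trend across the codes, so only the factor version is designed to detect the difference between groups.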