The graph above shows that the residuals are large for large brain weights: the largest brain weight values lie far from the regression line. The graph also shows one outlier.
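```{r}
# Cook's distance plot (diagnostic plot number 4 of plot.lm)
plot(reg1, which = 4)
# Identify the observation with a very large brain weight
which(phyto$BRW > 8000)
# Refit the model without this outlier
phytobis = phyto[which(phyto$BRW < 8000), ]
reg2 = lm(BRW ~ BOW, data = phytobis)
summary(reg2)
```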
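The outlier above was removed with a fixed threshold on BRW. A common alternative is to flag observations whose Cook's distance exceeds a cut-off. The chunk below is a minimal sketch of that idea; it assumes the built-in `cars` data as a stand-in for `phyto`, and the 4/n cut-off is a widely used rule of thumb, not the criterion applied in this study.

```{r}
# Hedged sketch: flag influential points with Cook's distance.
# Assumptions: built-in 'cars' data stands in for phyto, and the
# 4/n cut-off is a rule of thumb, not the criterion used above.
fit <- lm(dist ~ speed, data = cars)

cd <- cooks.distance(fit)        # one distance per observation
threshold <- 4 / nrow(cars)      # rule-of-thumb cut-off
influential <- which(cd > threshold)
influential

# Refit on the remaining points, mirroring the phytobis step.
# A logical subset is used because cars[-which(...), ] would drop
# every row if no point were flagged (which() returning integer(0)).
kept <- cars[cd <= threshold, ]
fit2 <- lm(dist ~ speed, data = kept)
summary(fit2)
```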
#### Diagnostic graphs for reg1 and reg2
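```{r}
# The four default diagnostic plots, filled column by column (mfcol):
# left column: Residuals vs Fitted (top), Normal Q-Q (bottom);
# right column: Scale-Location (top), Residuals vs Leverage (bottom).
par(mfcol = c(2, 2))
plot(reg1)
plot(reg2)
```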
- Reg1: Graph 2 "Scale-Location" (top right) shows 3 outlying points, whose square roots of absolute standardized residuals are greater than 1 (plot.lm labels the three most extreme points by default). In Graph 3 "Normal Q-Q" (bottom left), these points lie around the theoretical quantiles -2 and 2, in the tails of the distribution.
- Reg2: Graph 2 "Scale-Location" still shows 3 outlying points, which again lie around the theoretical quantiles -2 and 2 in Graph 3 "Normal Q-Q".
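The Scale-Location panel plots the square root of the absolute standardized residuals against the fitted values. The sketch below recomputes those values by hand and lists the three points plot.lm would label; it uses the built-in `cars` data as a stand-in for `phyto`, and the |standardized residual| > 2 flag is a common convention added for comparison, not a criterion from this study.

```{r}
# Hedged sketch: recompute the y-axis of the Scale-Location panel.
# Assumption: built-in 'cars' data stands in for phyto.
fit <- lm(dist ~ speed, data = cars)

r_std <- rstandard(fit)          # internally studentized residuals
scale_loc <- sqrt(abs(r_std))    # what the Scale-Location panel shows

# The id.n = 3 points plot.lm labels: largest |standardized residual|
top3 <- order(abs(r_std), decreasing = TRUE)[1:3]
names(r_std)[top3]

# A stricter, commonly used flag: |standardized residual| > 2
which(abs(r_std) > 2)

plot(fitted(fit), scale_loc,
     xlab = "Fitted values", ylab = "sqrt(|standardized residuals|)")
```

Because plot.lm always labels three points, seeing three labels on both reg1 and reg2 does not by itself mean there are exactly three outliers.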
### 3. Study of the contribution to the total weight of each part of the brain
```{r}
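library(corrplot)
# Keep the numeric variables (columns 4 to 8) and compute their correlation matrix
phytoNum = phyto[, 4:8]
mat.cor = cor(phytoNum)
# Upper triangle only, since the correlation matrix is symmetric
corrplot(mat.cor, type = "upper")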
```
```{r}
# Pearson correlation tests between brain weight and each brain part
cor.test(phyto$BRW, phyto$HIP)
cor.test(phyto$BRW, phyto$MOB)
cor.test(phyto$BRW, phyto$AUD)
```
The p-values of the correlation tests of brain weight (BRW) with HIP (hippocampus volume) and with MOB (main olfactory bulb volume) are very small, so both correlations are significantly different from 0; their strength is given by the estimated correlation coefficients, not by the p-values. The p-value for AUD (auditory nuclei volume) is only moderately small, so the evidence of a correlation between BRW and AUD is less conclusive.
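To make the distinction between significance and strength concrete, the chunk below recomputes what `cor.test()` reports for a Pearson correlation: the t statistic $t = r\sqrt{n-2}/\sqrt{1-r^2}$ with $n-2$ degrees of freedom and its two-sided p-value. It is a sketch using the built-in `cars` data in place of `phyto`.

```{r}
# Hedged sketch: the t test behind cor.test(), computed by hand.
# Assumption: built-in 'cars' data stands in for phyto.
x <- cars$speed
y <- cars$dist
n <- length(x)

r <- cor(x, y)                               # strength and sign
t_stat <- r * sqrt(n - 2) / sqrt(1 - r^2)    # t with n - 2 df under H0: rho = 0
p_val <- 2 * pt(-abs(t_stat), df = n - 2)    # two-sided p-value

ct <- cor.test(x, y)
c(r = r, t = t_stat, p = p_val)
c(r = unname(ct$estimate), t = unname(ct$statistic), p = ct$p.value)
```

Since $t$ grows with $\sqrt{n-2}$, a weak correlation can still give a tiny p-value on a large sample, which is why $r$ itself should be reported alongside the test.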
- The full model is not satisfactory: MOB appears useless since its p-value is large (probably because it is correlated with another explanatory variable, HIP), so it can be dropped.
- Total brain weight is positively and strongly associated with HIP and AUD, whereas MOB brings no significant additional information once HIP is in the model.
- The coefficient of MOB is negative: holding the other variables fixed, a larger MOB (main olfactory bulb volume) goes with a smaller BRW (brain weight). Since this coefficient is not significant, this interpretation should be treated with caution.
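The effect described above, where a predictor that is correlated with the response becomes non-significant once a correlated predictor is in the model, can be reproduced on simulated data. The sketch below uses hypothetical toy variables (`hip`, `mob`, `aud`, `brw`, not the real columns) in which `mob` is built as a noisy copy of `hip` and has a true coefficient of 0; it computes the variance inflation factor $1/(1-R^2)$ of `mob` by hand and tests whether dropping it loses anything.

```{r}
# Hedged sketch on simulated, hypothetical data (not the phyto data).
set.seed(1)
n   <- 60
hip <- rnorm(n)
mob <- hip + rnorm(n, sd = 0.2)    # built to be highly correlated with hip
aud <- rnorm(n)
brw <- 2 * hip + 1 * aud + rnorm(n) # mob has a true coefficient of 0

toy  <- data.frame(brw, hip, mob, aud)
full <- lm(brw ~ hip + mob + aud, data = toy)
summary(full)$coefficients

# Variance inflation factor of mob: 1 / (1 - R^2) of mob on the other predictors
r2_mob  <- summary(lm(mob ~ hip + aud, data = toy))$r.squared
vif_mob <- 1 / (1 - r2_mob)
vif_mob

# Drop mob and compare the nested models with an F test
reduced <- update(full, . ~ . - mob)
anova(reduced, full)
```

A large VIF means the standard error of that coefficient is inflated, so its individual p-value says little about whether the variable matters on its own.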
### 4. Link between volume of the auditory part and diet
```{r}
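# Convert the diet code into a factor (one level per diet type)
myData$Diet_F = as.factor(myData$Diet)
# Diet as a numeric code: scatter plot of AUD against the code
with(myData, plot(AUD ~ Diet))
# Diet_F as a factor: one box plot of AUD per diet type
with(myData, plot(AUD ~ Diet_F))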
```
The box plot (AUD ~ Diet_F) is the one to use: with Diet treated as a factor, it shows one box per diet type, with the median (thick line), the interquartile range (box) and the whiskers, whereas AUD ~ Diet places the diet codes on a continuous axis.
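A box plot summarizes each group by its median and hinges, not by its mean. The sketch below checks this on the built-in `InsectSprays` data (insect counts by spray type), used here as a stand-in for AUD by Diet_F; the statistics returned by `boxplot()` are compared with the group medians and means.

```{r}
# Hedged sketch: what a box plot displays, on built-in InsectSprays data.
bp <- boxplot(count ~ spray, data = InsectSprays)

# Rows of bp$stats: lower whisker, lower hinge (about Q1), median,
# upper hinge (about Q3), upper whisker; one column per group.
bp$stats

# The thick line is the median, not the mean
medians <- tapply(InsectSprays$count, InsectSprays$spray, median)
means   <- tapply(InsectSprays$count, InsectSprays$spray, mean)
rbind(median = medians, mean = means)
```

If an interval is wanted, `boxplot(..., notch = TRUE)` adds notches that give an approximate interval for comparing medians between groups.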
```{r}
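# One-way ANOVA of AUD on the diet factor
# (named regDiet rather than lm to avoid confusion with the lm() function)
regDiet = lm(AUD ~ Diet_F, data = myData)
anova(regDiet)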
```
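In the ANOVA table, the p-value is large, so there is no evidence that the mean auditory nuclei volume (AUD) differs between diet types. This absence of a significant link is surprising and contradicts the hypothesis given in the statement; note, however, that a large p-value does not prove that there is no link.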
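The F test in a one-way ANOVA compares the variability of the group means with the variability within groups: $F = \frac{SS_{between}/(k-1)}{SS_{within}/(n-k)}$. The chunk below is a sketch that recomputes it by hand and checks it against `anova()`, using the built-in `InsectSprays` data as a stand-in for AUD and Diet_F.

```{r}
# Hedged sketch: one-way ANOVA F test by hand, on built-in InsectSprays data.
y <- InsectSprays$count
g <- InsectSprays$spray
k <- nlevels(g)
n <- length(y)

group_means <- tapply(y, g, mean)
ss_between  <- sum(table(g) * (group_means - mean(y))^2)  # spread of group means
ss_within   <- sum((y - group_means[g])^2)                # spread inside groups

F_stat <- (ss_between / (k - 1)) / (ss_within / (n - k))
p_val  <- pf(F_stat, df1 = k - 1, df2 = n - k, lower.tail = FALSE)

tab <- anova(lm(count ~ spray, data = InsectSprays))
c(F = F_stat, p = p_val)
tab[1, c("F value", "Pr(>F)")]
```

A large p-value corresponds to an F statistic close to 1, meaning the group means vary about as much as would be expected from the within-group noise alone.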