---
title: "Design of Experiments"
author: "El-hassane Nour"
date: "January 8, 2023"
output:
  pdf_document: default
  html_document: default
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```

# Playing with the DoE Shiny Application

The model studied in this experiment is a black box, where x1, x2, ..., x11 are controllable factors, z1, ..., z11 are uncontrollable factors, and y is the output. To approximate this unknown model, we first need to determine which variables have the most significant effect on the response y, using screening designs. Then we define and fit an analytical model of the response y as a function of the primary factors x, using regression together with LHS and optimal designs.

## 1. First intuition

My first intuition was to run an LHS (Latin hypercube sampling) design over the 11 factors to get a general overview of the response behavior.

```{r}
library(DoE.wrapper)
set.seed(45)
# Maximin Latin hypercube over the 11 factors, each scaled to [0, 1]
design <- lhs.design(type = "maximin", nruns = 500, nfactors = 11, digits = NULL,
                     seed = 20523,
                     factor.names = list(X1 = c(0, 1), X2 = c(0, 1), X3 = c(0, 1),
                                         X4 = c(0, 1), X5 = c(0, 1), X6 = c(0, 1),
                                         X7 = c(0, 1), X8 = c(0, 1), X9 = c(0, 1),
                                         X10 = c(0, 1), X11 = c(0, 1)))
# D-optimal subset of 30 runs drawn from the LHS candidate points
design.Dopt <- Dopt.design(30, data = design, nRepeat = 5, randomize = TRUE, seed = 19573)
design.Dopt
```

Once the data was generated, I ran it through the DoE Shiny app and obtained a CSV file containing the design points together with the corresponding response.

```{r}
# Drop the first (index) column, keep the 11 factors and the response y
df <- read.csv("exp.csv", header = TRUE,
               colClasses = c("NULL", NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA))
df
```

## 2. Designing experiments to run

### 2.1 Screening using Plackett-Burman designs

Now we want to examine the factor effects in more detail, in order to identify the factors that influence the response most strongly. Since running a large number of such experiments is tedious, we will use a Plackett-Burman design, which screens the main effects of many factors in only a few runs.

```{r}
library(FrF2)
# 12-run Plackett-Burman design for the 11 two-level factors
d <- pb(nruns = 12, n12.taguchi = FALSE, nfactors = 12 - 1, ncenter = 0,
        replications = 1, repeat.only = FALSE, randomize = TRUE, seed = 26654,
        factor.names = list(X1 = c(0, 1), X2 = c(0, 1), X3 = c(0, 1), X4 = c(0, 1),
                            X5 = c(0, 1), X6 = c(0, 1), X7 = c(0, 1), X8 = c(0, 1),
                            X9 = c(0, 1), X10 = c(0, 1), X11 = c(0, 1)))
d
```

Here are the results:

```
   X1 X2 X3 X4 X5 X6 X7 X8 X9 X10 X11
1   1  0  1  1  0  1  1  1  0   0   0
2   1  0  0  0  1  0  1  1  0   1   1
3   0  0  1  0  1  1  0  1  1   1   0
4   0  1  1  0  1  1  1  0  0   0   1
5   0  1  0  1  1  0  1  1  1   0   0
6   1  1  0  1  1  1  0  0  0   1   0
7   1  1  1  0  0  0  1  0  1   1   0
8   1  0  1  1  1  0  0  0  1   0   1
9   0  0  0  0  0  0  0  0  0   0   0
10  1  1  0  0  0  1  0  1  1   0   1
11  0  0  0  1  0  1  1  0  1   1   1
12  0  1  1  1  0  0  0  1  0   1   1
```

### 2.2 Regression and Analysis of Variance

To visualize the correlation between the factors and the response, we fit a linear regression on the data generated by the application and analyze the variance to assess each factor's effect.

#### Experiment 1: X1, X3, X4, X6, X7, X8 taken into account

```{r}
summary(lm(y ~ x1 + x3 + x4 + x6 + x7 + x8, data = df))
```

Nothing interesting here; even R^2 is very small, which is disappointing.

#### Experiment 2: X1, X5, X7, X8, X10, X11 taken into account

```{r}
summary(lm(y ~ x1 + x5 + x7 + x8 + x10 + x11, data = df))
```

A very small improvement in R^2, but still poor.

#### Experiment 3: X3, X5, X6, X8, X9, X10 taken into account

```{r}
summary(lm(y ~ x3 + x5 + x6 + x8 + x9 + x10, data = df))
```

It seems that X9 is a significant factor influencing the model, and the coefficient of determination R^2 is now 0.73, which is fairly good.
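As a cross-check of this manual screening, here is a minimal sketch (not part of the original runs; it assumes `df` holds the columns `x1`-`x11` and `y` read above) that lets R's built-in `step()` perform an AIC-based backward selection starting from the full main-effects model:

```{r, eval=FALSE}
# Sketch: AIC-based backward selection over all 11 main effects.
# Assumes df contains the columns x1..x11 and y read from exp.csv above.
full <- lm(y ~ x1 + x2 + x3 + x4 + x5 + x6 + x7 + x8 + x9 + x10 + x11, data = df)
reduced <- step(full, direction = "backward", trace = 0)
summary(reduced)
```

If the automatic selection also retains X9, that corroborates the screening result above.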
#### Try the combination X3*X6:

```{r}
summary(lm(y ~ x3 + x9 + x6 + x3:x6, data = df))
```

Not very interesting. Let's try another experiment in which X9 and X3 are set to 1.

#### Experiment 8: X1, X3, X4, X5, X6, X9, X11 taken into account

```{r}
summary(lm(y ~ x1 + x3 + x4 + x5 + x6 + x9, data = df))
```

Removing X11 has no noticeable effect on the fit. We decide to keep X9, X6 and X4, to combine X5 with X7, and to combine X1 with X3.

```{r}
summary(lm(y ~ x1:x3 + x6 + x5:x7 + x4 + x9, data = df))
```

We can therefore state our model as:

$$ y = -2.72\,X_9 + 0.72\,X_4 + 0.42\,X_6 + 0.99\,X_1 X_3 - 0.77\,X_5 X_7 + 1.55 $$
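To close the loop, here is a minimal validation sketch (again assuming the `df` read above; not part of the original analysis) that refits this final model and compares its fitted values with the observed responses:

```{r, eval=FALSE}
# Sketch: refit the final model and compare fitted values with observations.
fit <- lm(y ~ x9 + x4 + x6 + x1:x3 + x5:x7, data = df)
coef(fit)  # should come out close to the coefficients quoted above
plot(df$y, fitted(fit), xlab = "observed y", ylab = "fitted y")
abline(0, 1, lty = 2)  # points near this line indicate a good fit
```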