## Step 1: Explore your dataset to generate hypotheses

For example, we may have a dataset of 100 observations on 10 variables that are actually uncorrelated. To illustrate, let’s generate such a dataset:

```
set.seed(1)
data <- matrix(rnorm(1000), nrow = 100, ncol = 10)
colnames(data) <- paste("x", 1:10, sep = "")
```

Let’s explore the data by inspecting the correlations:

```
library(corrplot)
corrplot(cor(data), method = "number", type = "upper")
```

We got at least one meaningful correlation there: x4 and x9 show a correlation \(>.2\). Now let’s perform some statistical testing to see if some of the correlations are significant:

```
library(Hmisc)
pvalues <- rcorr(data)$P
diag(pvalues) <- 1
corrplot(1-pvalues, type = "upper", method = "number", main = "1 minus p-value")
```

Two out of 45 correlation coefficients are significant. Surely there must be a publication in these data! Also, we could decide to test one- instead of two-sided, then we have 3 significant associations. But what would be much better than 3 significant correlations? Yes, indeed: a mediation effect!
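For perspective (a quick aside, not part of the original workflow): with \(\binom{10}{2} = 45\) tests at \(\alpha = .05\), chance alone predicts roughly two false positives under the global null, which is exactly what we found:

```
# Expected number of false positives among the 45 correlation tests
# under the global null hypothesis (all true correlations are zero)
choose(10, 2) * 0.05  # = 2.25
```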

## Step 2: Perform confirmatory analyses on the same dataset

### Example 1: Mediation Analysis

Obviously, we hypothesized before gathering our data that x10 is a causal variable, with a partially indirect effect on x9, via x4.

```
lm1 <- lm(x9 ~ x10, data = data.frame(data))
summary(lm1)
```

```
##
## Call:
## lm(formula = x9 ~ x10, data = data.frame(data))
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.46497 -0.74698 0.03043 0.64968 2.86824
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.009241 0.107194 0.086 0.9315
## x10 0.205740 0.102135 2.014 0.0467 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.072 on 98 degrees of freedom
## Multiple R-squared: 0.03976, Adjusted R-squared: 0.02996
## F-statistic: 4.058 on 1 and 98 DF, p-value: 0.04671
```

Great! The causal and dependent variables are significantly associated!

```
lm2 <- lm(x9 ~ x4 + x10, data = data.frame(data))
summary(lm2)
```

```
##
## Call:
## lm(formula = x9 ~ x4 + x10, data = data.frame(data))
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.36641 -0.62453 0.04468 0.62737 2.84705
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.001822 0.105719 -0.017 0.9863
## x4 0.216306 0.107765 2.007 0.0475 *
## x10 0.181501 0.101315 1.791 0.0763 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.056 on 97 degrees of freedom
## Multiple R-squared: 0.07805, Adjusted R-squared: 0.05904
## F-statistic: 4.106 on 2 and 97 DF, p-value: 0.01942
```
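The indirect effect itself can also be tested directly. Here is a sketch of a Sobel test (an assumption on my part; the original analysis only compares the two regressions), using a mediator model `x4 ~ x10` alongside the `lm2` fitted above:

```
# Sobel test for the indirect effect x10 -> x4 -> x9
# (sketch; assumes 'data' and 'lm2' from the chunks above)
med <- lm(x4 ~ x10, data = data.frame(data))
a  <- coef(med)["x10"]                                 # path x10 -> x4
sa <- summary(med)$coefficients["x10", "Std. Error"]
b  <- coef(lm2)["x4"]                                  # path x4 -> x9, controlling x10
sb <- summary(lm2)$coefficients["x4", "Std. Error"]
z  <- (a * b) / sqrt(b^2 * sa^2 + a^2 * sb^2)          # Sobel z statistic
2 * pnorm(-abs(z))                                     # two-sided p-value
```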

Awesome! We have partial mediation: the effect of x10 is smaller when x4 is included in the equation, and even becomes non-significant.

### Example 2: Factor Analysis

Confirmatory factor analysis (CFA) offers a great way to generate a lot of outcomes at once: There will always be a parameter estimate, significance test or fit index that you can use to back up any interpretation you prefer. As we have no clue what we are looking for in our data, let’s explore the data first using PCA, which will provide us with the hypotheses we need to subsequently fit a CFA on the same data:

```
library(psych)
VSS.scree(data)
```
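Alongside the scree plot, the PCA itself can be run; here is a base-R sketch using `prcomp` (an assumption on my part; the original text does not specify which PCA function is used):

```
# PCA on the same data with base R (assumes 'data' from Step 1)
pca <- prcomp(data, center = TRUE, scale. = TRUE)
summary(pca)                    # proportion of variance per component
round(pca$rotation[, 1:2], 2)   # loadings on the first two components
```

With purely random data, the components will each explain roughly a tenth of the variance, but picking the largest loadings post hoc still yields a tidy-looking "factor structure" to feed into the CFA.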