ICC is the proportion of random variation between subjects.
Let’s simulate multilevel data with high ICC.
ID <- 1:13
obs <- 1:4
d <- expand.grid(obs = obs, ID = ID)
head(d)
## obs ID
## 1 1 1
## 2 2 1
## 3 3 1
## 4 4 1
## 5 1 2
## 6 2 2
Now simulate intercepts. Notice the higher standard deviation for
z, the between ID variance, versus e, the
within ID variance.
set.seed(1)
z <- rnorm(13, sd = 2)
e <- rnorm(nrow(d), sd = 0.5)
d$y <- (3 + z[d$ID]) + e
Fit multilevel model and calculate ICC
library(lme4)
library(performance)
m <- lmer(y ~ 1 + (1 | ID), data = d)
icc(m)
## # Intraclass Correlation Coefficient
##
## Adjusted ICC: 0.930
## Unadjusted ICC: 0.930
Notice in the summary output (under random effects) that the Variance/Std Dev is much bigger for ID (between) than Residual (within).
summary(m)
## Linear mixed model fit by REML ['lmerMod']
## Formula: y ~ 1 + (1 | ID)
## Data: d
##
## REML criterion at convergence: 117.3
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -2.1725 -0.6008 0.1031 0.5790 2.1489
##
## Random effects:
## Groups Name Variance Std.Dev.
## ID (Intercept) 2.8171 1.6784
## Residual 0.2109 0.4592
## Number of obs: 52, groups: ID, 13
##
## Fixed effects:
## Estimate Std. Error t value
## (Intercept) 3.4599 0.4698 7.364
The ICC is simply the proportion of total variance due to ID.
2.8171 /(2.8171 + 0.2109)
## [1] 0.9303501
Plot of data. Notice much of the variability is due to differences between IDs.
library(ggplot2)
ggplot(d) +
aes(y = y, x = ID, color = factor(ID)) +
geom_jitter(width = 0.05, height = 0)
Let’s simulate multilevel data with low ICC. Notice the lower
standard deviation for z, the between ID variance, versus
e, the within ID variance.
set.seed(3)
z <- rnorm(13, sd = 0.05)
e <- rnorm(nrow(d), sd = 2)
d$y <- (3 + z[d$ID]) + e
Fit multilevel model and calculate ICC
m <- lmer(y ~ 1 + (1 | ID), data = d)
icc(m)
## # Intraclass Correlation Coefficient
##
## Adjusted ICC: 0.144
## Unadjusted ICC: 0.144
Notice in the summary output (under random effects) that the Variance/Std Dev is much smaller for ID (between) than Residual (within).
summary(m)
## Linear mixed model fit by REML ['lmerMod']
## Formula: y ~ 1 + (1 | ID)
## Data: d
##
## REML criterion at convergence: 209.9
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -2.45681 -0.72734 0.04071 0.81441 1.88816
##
## Random effects:
## Groups Name Variance Std.Dev.
## ID (Intercept) 0.4948 0.7034
## Residual 2.9454 1.7162
## Number of obs: 52, groups: ID, 13
##
## Fixed effects:
## Estimate Std. Error t value
## (Intercept) 3.0022 0.3077 9.756
Plot the data. Notice much of the variability is within IDs, not between.
ggplot(d) +
aes(y = y, x = ID, color = factor(ID)) +
geom_jitter(width = 0.05, height = 0)
By playing with the seed we can generate data where the estimated ICC is NA.
set.seed(998)
z <- rnorm(13, sd = 0.05)
e <- rnorm(nrow(d), sd = 2)
d$y <- (3 + z[d$ID]) + e
This results in a singular fit.
m <- lmer(y ~ 1 + (1 | ID), data = d)
## boundary (singular) fit: see help('isSingular')
icc(m)
## Warning: Can't compute random effect variances. Some variance components equal
## zero. Your model may suffer from singularity (see `?lme4::isSingular`
## and `?performance::check_singularity`).
## Decrease the `tolerance` level to force the calculation of random effect
## variances, or impose priors on your random effects parameters (using
## packages like `brms` or `glmmTMB`).
## [1] NA
The summary output tells us the estimated variance between groups is about 0.
summary(m)
## Linear mixed model fit by REML ['lmerMod']
## Formula: y ~ 1 + (1 | ID)
## Data: d
##
## REML criterion at convergence: 232.4
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -2.6138 -0.5419 0.1072 0.7088 2.0647
##
## Random effects:
## Groups Name Variance Std.Dev.
## ID (Intercept) 0.000 0.000
## Residual 5.168 2.273
## Number of obs: 52, groups: ID, 13
##
## Fixed effects:
## Estimate Std. Error t value
## (Intercept) 2.8230 0.3153 8.955
## optimizer (nloptwrap) convergence code: 0 (OK)
## boundary (singular) fit: see help('isSingular')
We can see this when we look at the group-specific estimated intercepts. They’re all the same (no variance). This is what is causing the warning.
coef(m)
## $ID
## (Intercept)
## 1 2.822981
## 2 2.822981
## 3 2.822981
## 4 2.822981
## 5 2.822981
## 6 2.822981
## 7 2.822981
## 8 2.822981
## 9 2.822981
## 10 2.822981
## 11 2.822981
## 12 2.822981
## 13 2.822981
##
## attr(,"class")
## [1] "coef.mer"
The warning message suggests one way to alleviate the issue is to “impose priors on your random effects parameters”. This involves Bayesian modeling. The rstanarm package makes this easy to implement. It fits a Bayesian model with default priors on the random effect parameters.
library(rstanarm)
m2 <- stan_lmer(y ~ 1 + (1 | ID), data = d, refresh = 0)
icc(m2)
## # Intraclass Correlation Coefficient
##
## Adjusted ICC: 0.051
## Unadjusted ICC: 0.051
Now we’re getting an estimate for ICC but at the cost of a more complex analysis involving prior distributions. (Though I suppose you could quietly use this method just to get an estimate for the ICC.)
It’s worth asking yourself if you even need to report ICC. Accounting for multilevel data in your analysis doesn’t necessarily mean we need to quantify what proportion of variance is due to random effect groups. We can do it, but it’s usually tangential to the main analysis.