ICC is the proportion of random variation between subjects.

Simulate data with high ICC

Let’s simulate multilevel data with high ICC.

ID <- 1:13
obs <- 1:4
d <- expand.grid(obs = obs, ID = ID)
head(d)
##   obs ID
## 1   1  1
## 2   2  1
## 3   3  1
## 4   4  1
## 5   1  2
## 6   2  2

Now simulate intercepts. Notice the higher standard deviation for z, the between ID variance, versus e, the within ID variance.

set.seed(1)
z <- rnorm(13, sd = 2)
e <- rnorm(nrow(d), sd = 0.5)
d$y <- (3 + z[d$ID]) + e

Fit multilevel model and calculate ICC

library(lme4)
library(performance)
m <- lmer(y ~ 1 + (1 | ID), data = d)
icc(m)
## # Intraclass Correlation Coefficient
## 
##     Adjusted ICC: 0.930
##   Unadjusted ICC: 0.930

Notice in the summary output (under random effects) that the Variance/Std Dev is much bigger for ID (between) than Residual (within).

summary(m)
## Linear mixed model fit by REML ['lmerMod']
## Formula: y ~ 1 + (1 | ID)
##    Data: d
## 
## REML criterion at convergence: 117.3
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -2.1725 -0.6008  0.1031  0.5790  2.1489 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  ID       (Intercept) 2.8171   1.6784  
##  Residual             0.2109   0.4592  
## Number of obs: 52, groups:  ID, 13
## 
## Fixed effects:
##             Estimate Std. Error t value
## (Intercept)   3.4599     0.4698   7.364

The ICC is simply the proportion of total variance due to ID.

2.8171 /(2.8171  + 0.2109)
## [1] 0.9303501

Plot of data. Notice much of the variability is due to differences between IDs.

library(ggplot2)
ggplot(d) +
  aes(y = y, x = ID, color = factor(ID)) +
  geom_jitter(width = 0.05, height = 0)

Simulate data with low ICC

Let’s simulate multilevel data with low ICC. Notice the lower standard deviation for z, the between ID variance, versus e, the within ID variance.

set.seed(3)
z <- rnorm(13, sd = 0.05)
e <- rnorm(nrow(d), sd = 2)
d$y <- (3 + z[d$ID]) + e

Fit multilevel model and calculate ICC

m <- lmer(y ~ 1 + (1 | ID), data = d)
icc(m)
## # Intraclass Correlation Coefficient
## 
##     Adjusted ICC: 0.144
##   Unadjusted ICC: 0.144

Notice in the summary output (under random effects) that the Variance/Std Dev is much smaller for ID (between) than Residual (within).

summary(m)
## Linear mixed model fit by REML ['lmerMod']
## Formula: y ~ 1 + (1 | ID)
##    Data: d
## 
## REML criterion at convergence: 209.9
## 
## Scaled residuals: 
##      Min       1Q   Median       3Q      Max 
## -2.45681 -0.72734  0.04071  0.81441  1.88816 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  ID       (Intercept) 0.4948   0.7034  
##  Residual             2.9454   1.7162  
## Number of obs: 52, groups:  ID, 13
## 
## Fixed effects:
##             Estimate Std. Error t value
## (Intercept)   3.0022     0.3077   9.756

Plot the data. Notice much of the variability is within IDs, not between.

ggplot(d) +
  aes(y = y, x = ID, color = factor(ID)) +
  geom_jitter(width = 0.05, height = 0)

Simulate data with NA ICC

By playing with the seed we can generate data where the estimated ICC is NA.

set.seed(998)
z <- rnorm(13, sd = 0.05)
e <- rnorm(nrow(d), sd = 2)
d$y <- (3 + z[d$ID]) + e

This results in a singular fit.

m <- lmer(y ~ 1 + (1 | ID), data = d)
## boundary (singular) fit: see help('isSingular')
icc(m)
## Warning: Can't compute random effect variances. Some variance components equal
##   zero. Your model may suffer from singularity (see `?lme4::isSingular`
##   and `?performance::check_singularity`).
##   Decrease the `tolerance` level to force the calculation of random effect
##   variances, or impose priors on your random effects parameters (using
##   packages like `brms` or `glmmTMB`).
## [1] NA

The summary output tells us the estimated variance between groups is about 0.

summary(m)
## Linear mixed model fit by REML ['lmerMod']
## Formula: y ~ 1 + (1 | ID)
##    Data: d
## 
## REML criterion at convergence: 232.4
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -2.6138 -0.5419  0.1072  0.7088  2.0647 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  ID       (Intercept) 0.000    0.000   
##  Residual             5.168    2.273   
## Number of obs: 52, groups:  ID, 13
## 
## Fixed effects:
##             Estimate Std. Error t value
## (Intercept)   2.8230     0.3153   8.955
## optimizer (nloptwrap) convergence code: 0 (OK)
## boundary (singular) fit: see help('isSingular')

We can see this when we look at the group-specific estimated intercepts. They’re all the same (no variance). This is what is causing the warning.

coef(m)
## $ID
##    (Intercept)
## 1     2.822981
## 2     2.822981
## 3     2.822981
## 4     2.822981
## 5     2.822981
## 6     2.822981
## 7     2.822981
## 8     2.822981
## 9     2.822981
## 10    2.822981
## 11    2.822981
## 12    2.822981
## 13    2.822981
## 
## attr(,"class")
## [1] "coef.mer"

The warning message suggests one way to alleviate the issue is to “impose priors on your random effects parameters”. This involves Bayesian modeling. The rstanarm package makes this easy to implement. It fits a Bayesian model with default priors on the random effect parameters.

library(rstanarm)
m2 <- stan_lmer(y ~ 1 + (1 | ID), data = d, refresh = 0)
icc(m2)
## # Intraclass Correlation Coefficient
## 
##     Adjusted ICC: 0.051
##   Unadjusted ICC: 0.051

Now we’re getting an estimate for ICC but at the cost of a more complex analysis involving prior distributions. (Though I suppose you could quietly use this method just to get an estimate for the ICC.)

It’s worth asking yourself if you even need to report ICC. Accounting for multilevel data in your analysis doesn’t necessarily mean we need to quantify what proportion of variance is due to random effect groups. We can do it, but it’s usually tangential to the main analysis.