Questions about model interactions

question 1

Client says: “With cell culture assays, we are trying to experimentally break down these interactions in controlled environments. In this example (see plot below), what we really want to test is: Did the stim change the pg/mL compared to control, and did the drug change the response to the stim. We give the drug alone to experimentally tease apart that interaction. In this example, I would say that the drug by itself did not affect control cells, but it inhibited the response to the stim.” (emphasis mine)

For example:

StatLab associate says: My initial thoughts are, for question 1 (Did the stim change the pg/mL compared to control, and did the drug change the response to the stim) to run a two-way ANOVA with an interaction to see the main effects of the drug and stim and if the stim changes the effect of the drug. In clients’s example, he would likely see that “Drug” is not significant, “Stim” is significant, and “Drug + Stim” is not significant. (emphasis mine)

simulate data

We can simulate data based on the figure to help us think about this.

drug <- factor(c(0, 0, 0, 
          1, 1, 1, 
          0, 0, 0, 
          1, 1, 1))
stim <- factor(c(0, 0, 0, 
          0, 0, 0,
          1, 1, 1,
          1, 1, 1))
set.seed(10)
pgmL <- rnorm(n = 12, mean = 10 + 0.2*(drug == "1") + 
                50*(stim == "1") - 
                40*(drug == "1")*(stim == "1"), sd = 2)
d <- data.frame(pgmL, drug, stim)

# calculate means
d2 <- aggregate(pgmL ~ drug + stim, data = d, FUN = mean)

visualize and analyze

Recreate plot without error bars, which appear to be SD bars not SE.

library(ggplot2)
ggplot() +
  aes(x = interaction(drug, stim), y = pgmL) +
  geom_col(data = d2, alpha = 1/5, color = "black") +
  geom_point(data = d, position = position_jitter(width = 0.3, height = 0)) +
  scale_x_discrete(labels = c("ctrl", "drug", "stim", "drug+stim"))

In this case I think an interaction plot would be more suitable. When drug = 1, the effect of stim is greatly reduced.

interaction.plot(x.factor = d$stim, trace.factor = d$drug, response = d$pgmL)

We can run a 2-way ANOVA as follows.

m <- lm(pgmL ~ drug * stim, data = d)

If we look at the model coefficients, it does seem the effect of drug is not significant relative to the control. However, this assumes stim = 0. The interaction is large and significant, so it may be the drug has an effect when stim = 1. That’s what we saw in the interaction plot. Likewise, the significant stim coefficient suggests that stim does change pg/mL relative to control (when drug = 0).

summary(m)

## 
## Call:
## lm(formula = pgmL ~ drug * stim, data = d)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1.7181 -1.1546  0.4882  0.8076  1.4049 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   8.9754     0.7689  11.673 2.64e-06 ***
## drug1         1.2813     1.0874   1.178    0.272    
## stim1        48.8923     1.0874  44.964 6.61e-11 ***
## drug1:stim1 -37.8817     1.5378 -24.634 7.88e-09 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.332 on 8 degrees of freedom
## Multiple R-squared:  0.997,  Adjusted R-squared:  0.9959 
## F-statistic: 883.9 on 3 and 8 DF,  p-value: 2.011e-10

If we look at the ANOVA table for this analysis, we see both main effects and the interaction are significant. This may seem contradictory to the model summary output. However, it simply shows that both variables and their interaction explain the variability in the outcome.

anova(m)

## Analysis of Variance Table
## 
## Response: pgmL
##           Df  Sum Sq Mean Sq F value    Pr(>F)    
## drug       1  935.57  935.57  527.52 1.370e-08 ***
## stim       1 2691.27 2691.27 1517.47 2.073e-10 ***
## drug:stim  1 1076.27 1076.27  606.85 7.878e-09 ***
## Residuals  8   14.19    1.77                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

We can do posthoc comparisons to see if the drug changes the response to the stim. This means looking at the mean response of stim conditional on drug. We see below that the effect of stim changes dramatically depending on drug. The effect of stim drops from 48.9 to 11.

library(emmeans)
emmeans(m, specs = ~ stim | drug) |> 
  pairs(reverse = TRUE)

## drug = 0:
##  contrast      estimate   SE df t.ratio p.value
##  stim1 - stim0     48.9 1.09  8  44.964 <0.0001
## 
## drug = 1:
##  contrast      estimate   SE df t.ratio p.value
##  stim1 - stim0     11.0 1.09  8  10.126 <0.0001

We can also look at the difference of those differences as follows. This shows the difference is about -37.9 with a 95% CI of [-41.4,-34.4].

emmeans(m, specs = ~ stim | drug) |> 
  pairs(reverse = TRUE) |> 
  pairs(simple = "drug", reverse = TRUE) |> 
  confint()

## contrast = stim1 - stim0:
##  contrast1     estimate   SE df lower.CL upper.CL
##  drug1 - drug0    -37.9 1.54  8    -41.4    -34.3
## 
## Confidence level used: 0.95

We can visualize this model with an effect plot. This shows standard error bars. We can clearly see the effect of stim greatly reduced when drug = 1.

library(ggeffects)
ggpredict(m, terms = c("stim", "drug")) |> plot()

question 2

Client says: “This example is the same as the previous one, but we’ve just added a second drug. Again, the drug B alone is more of an experimental control, but here we do see that it reduces the signal compared to the control. Meanwhile, Drug B doesn’t impact whatever the stim is doing. Also an important note, we would NOT be looking at what does Drug A and Drug B do together, or what Drug A and Drug B do to Stim.” (emphasis mine)

StatLab associate says: For question 2, Client doesn’t have values for all interactions (2x2x2) and is not interested in what Drug A and Drug B do together, or what Drug A and Drug B do to Stim. It seems to me that these could be treated as different treatments, and a one-way ANOVA could be used? He would need to do post-hoc testing to look at differences between treatments. Could he also fit a model where:

Y ~ Drug A + Drug B + Stim + Drug A:Stim + Drug B:stim

Would this better model the interactions instead of thinking of them as “treatment groups”?

simulate data

We can simulate data based on the figure to help us think about this.

drugA <- factor(
  c(0, 0, 0, 
    1, 1, 1, 
    0, 0, 0, 
    0, 0, 0,
    1, 1, 1,
    0, 0, 0))
drugB <- factor(
  c(0, 0, 0,
    0, 0, 0,
    1, 1, 1, 
    0, 0, 0, 
    0, 0, 0,
    1, 1, 1))
stim <- factor(
  c(0, 0, 0,
    0, 0, 0,
    0, 0, 0,
    1, 1, 1,
    1, 1, 1, 
    1, 1, 1))
set.seed(2)
pgmL <- rnorm(n = 18, mean = 10 + 
                0.2*(drugA == "1") + 
                -4*(drugB == "1") +
                50*(stim == "1") - 
                40*(drugA == "1")*(stim == "1") -
                0.8*(drugB == "1")*(stim == "1"), sd = 2)
d <- data.frame(pgmL, drugA, drugB, stim)

# calculate means
d2 <- aggregate(pgmL ~ drugA + drugB + stim, data = d, FUN = mean)

visualize and analyze

Recreate plot without error bars, which appear to be SD bars not SE.

library(ggplot2)
ggplot() +
  aes(x = interaction(drugA, drugB, stim), y = pgmL) +
  geom_col(data = d2, alpha = 1/5, color = "black") +
  geom_point(data = d, position = position_jitter(width = 0.3, height = 0)) +
  scale_x_discrete(labels = c("ctrl", "drugA", "drugB", "stim", 
                              "drugA+stim", "drugB+stim"))

Since he doesn’t want all two way interactions, we can also visualize this using two separate interaction plots. In the first plot, it’s easy to see how the effect of drug A reduces the effect of stim.

interaction.plot(x.factor = d$stim, trace.factor = d$drugA, response = d$pgmL, 
                 main = "Drug A and Stim interaction")

In this plot, there appears to be a mild interaction, but this may be due to sampling variability.

interaction.plot(x.factor = d$stim, trace.factor = d$drugB, response = d$pgmL, 
                 main = "Drug B and Stim interaction")

I think the StatLab associate’s proposed model is the cleanest way to analyze this question:

Y ~ Drug A + Drug B + Stim + Drug A:Stim + Drug B:stim

m2 <- lm(pgmL ~ drugA + drugB + stim + drugA:stim + drugB:stim, data = d)

The model summary tells us the drug B and stim interaction is small and uncertain.

summary(m2)

## 
## Call:
## lm(formula = pgmL ~ drugA + drugB + stim + drugA:stim + drugB:stim, 
##     data = d)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.6910 -1.4360 -0.1096  1.0930  3.3312 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)    10.584      1.373   7.707 5.50e-06 ***
## drugA1         -1.103      1.942  -0.568    0.581    
## drugB1         -2.949      1.942  -1.518    0.155    
## stim1          50.257      1.942  25.878 6.77e-12 ***
## drugA1:stim1  -39.305      2.747 -14.311 6.65e-09 ***
## drugB1:stim1   -3.623      2.747  -1.319    0.212    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.379 on 12 degrees of freedom
## Multiple R-squared:  0.9922, Adjusted R-squared:  0.989 
## F-statistic: 305.8 on 5 and 12 DF,  p-value: 3.24e-12

The same basic result can be seen in the ANOVA table.

anova(m2)

## Analysis of Variance Table
## 
## Response: pgmL
##            Df Sum Sq Mean Sq   F value    Pr(>F)    
## drugA       1 1350.5  1350.5  238.7178 2.769e-09 ***
## drugB       1   68.0    68.0   12.0152  0.004663 ** 
## stim        1 5815.0  5815.0 1027.8397 5.355e-13 ***
## drugA:stim  1 1405.7  1405.7  248.4732 2.201e-09 ***
## drugB:stim  1    9.8     9.8    1.7399  0.211765    
## Residuals  12   67.9     5.7                        
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Since he’s interested in what Drug A and Drug B do to Stim, we can use this model for some post-hoc comparisons using emmeans. We should do it separately for drug A and drug B. First let’s look at the effect of stim conditional on drug A. We see the effect of stim is reduced quite a bit when drug A is administered.

# Hold drug B at 0 since drug B is not used with drug A
emmeans(m2, specs = ~ stim | drugA, at = list(drugB = "0")) |> 
  pairs(reverse = TRUE)

## drugA = 0:
##  contrast      estimate   SE df t.ratio p.value
##  stim1 - stim0     50.3 1.94 12  25.878 <0.0001
## 
## drugA = 1:
##  contrast      estimate   SE df t.ratio p.value
##  stim1 - stim0     11.0 1.94 12   5.639  0.0001

Take the difference in those differences. This reduction is estimated to be in the range of [-45.3, -33.3] with 95% confidence.

emmeans(m2, specs = ~ stim | drugA, at = list(drugB = "0")) |> 
  pairs(reverse = TRUE) |> 
  pairs(simple = "drugA", reverse = TRUE) |> 
  confint()

## contrast = stim1 - stim0:
##  contrast1       estimate   SE df lower.CL upper.CL
##  drugA1 - drugA0    -39.3 2.75 12    -45.3    -33.3
## 
## Confidence level used: 0.95

We can carry out the same analysis for drug B. The effect of stim is about the same regardless of whether drug B is administered. There appears to be a slight reduction, but is it reliably in one direction?

# Hold drug A at 0 since drug A is not used with drug B
emmeans(m2, specs = ~ stim | drugB, at = list(drugA = "0")) |> 
  pairs(reverse = TRUE)

## drugB = 0:
##  contrast      estimate   SE df t.ratio p.value
##  stim1 - stim0     50.3 1.94 12  25.878 <0.0001
## 
## drugB = 1:
##  contrast      estimate   SE df t.ratio p.value
##  stim1 - stim0     46.6 1.94 12  24.012 <0.0001

The slight reduction of -3.62 is uncertain with a 95% CI of [-9.61, 2.36]. We’re not sure what kind of effect drug B has on stim, if any.

emmeans(m2, specs = ~ stim | drugB, at = list(drugA = "0")) |> 
  pairs(reverse = TRUE) |> 
  pairs(simple = "drugB", reverse = TRUE) |> 
  confint()

## contrast = stim1 - stim0:
##  contrast1       estimate   SE df lower.CL upper.CL
##  drugB1 - drugB0    -3.62 2.75 12    -9.61     2.36
## 
## Confidence level used: 0.95

As before, we can visualize these results with effect plots. We see the effect of stim is greatly reduced when drug A is administered.

ggpredict(m2, terms = c("stim", "drugA"), condition = c(drugB = "0")) |> 
  plot()

On the other hand, the effect of stim hardly changes in the presence of drug B.

ggpredict(m2, terms = c("stim", "drugB"), condition = c(drugA = "0")) |> 
  plot()

one-way ANOVA

Let’s try this as a one-way ANOVA. If we fit a model without the intercept we simply get group means.

d$grp <- interaction(d$drugA, d$drugB, d$stim, drop = TRUE)
levels(d$grp) <- c("ctrl", "drugA", "drugB", "stim", "drugA+stim", "drugB+stim")
m3 <- lm(pgmL ~ -1 + grp, data = d)
summary(m3)

## 
## Call:
## lm(formula = pgmL ~ -1 + grp, data = d)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.6910 -1.4360 -0.1096  1.0930  3.3312 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## grpctrl         10.584      1.373   7.707 5.50e-06 ***
## grpdrugA         9.481      1.373   6.904 1.64e-05 ***
## grpdrugB         7.635      1.373   5.560 0.000124 ***
## grpstim         60.840      1.373  44.304 1.14e-14 ***
## grpdrugA+stim   20.433      1.373  14.879 4.26e-09 ***
## grpdrugB+stim   54.269      1.373  39.518 4.45e-14 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.379 on 12 degrees of freedom
## Multiple R-squared:  0.9969, Adjusted R-squared:  0.9954 
## F-statistic: 647.3 on 6 and 12 DF,  p-value: 2.378e-14

The ANOVA table is much less informative.

anova(m3)

## Analysis of Variance Table
## 
## Response: pgmL
##           Df  Sum Sq Mean Sq F value    Pr(>F)    
## grp        6 21973.2  3662.2  647.32 2.378e-14 ***
## Residuals 12    67.9     5.7                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

We can get comparisons using emmeans but it’s a bit tricky.

The first comparison (c1) is grpstim - grpctrl (ie, effect of stim when drug A = 0) and the second comparison (c2) is grpdrugA+stim - grpdrugA (ie, effect of stim when drug A = 1)

emmeans(m3, specs = ~ grp) |> contrast(list(c1 = c(-1, 0, 0, 1, 0, 0),
                                            c2 = c(0, -1, 0, 0, 1, 0)))

##  contrast estimate   SE df t.ratio p.value
##  c1           50.3 1.94 12  25.878 <0.0001
##  c2           11.0 1.94 12   5.639  0.0001

Taking the difference of those differences gives us the same result as before: 95% CI of [-45.3, -33.3]

emmeans(m3, specs = ~ grp) |> contrast(list(c1 = c(-1, 0, 0, 1, 0, 0),
                                            c2 = c(0, -1, 0, 0, 1, 0))) |> 
  pairs(reverse = TRUE) |> 
  confint()

##  contrast estimate   SE df lower.CL upper.CL
##  c2 - c1     -39.3 2.75 12    -45.3    -33.3
## 
## Confidence level used: 0.95

I think the model with interactions is easier to work with.

Questions about model interactions

Clay Ford

2026-02-03

question 1

simulate data

visualize and analyze

question 2

simulate data

visualize and analyze

one-way ANOVA