The rvif package

Intro

The rvif (Redefined Variance Inflation Factor) package focuses on determining whether or not the degree of approximate multicollinearity in a multiple linear regression model is of concern, meaning that it affects the statistical analysis (i.e. individual significance tests) of the model. It does so by implementing a statistical test via the multicollinearity() function. From the package vignette: “the test determines whether the non-rejection of the null hypothesis in the individual significance tests of the coefficient associated with each independent variable is due to the degree of multicollinearity.”

The intro to the R Journal article quotes the O’Brien paper: “However, the measures traditionally applied to detect multicollinearity may conclude that multicollinearity exists even if it does not lead to the negative effects mentioned above, when, in fact, the best solution in this case may be not to treat the multicollinearity (see O’Brien (2007).”

Preliminaries

High VIF leads to a high variance.
High VIF will lead to a low coefficient test statistic (t), provoking the tendency not to reject the null hypothesis.
From O’Brien (2007), other factors such as the estimation of the random disturbance and the size of the sample can counterbalance the high value of the VIF to yield a low value for the experimental statistic.
it is possible to obtain VIF values greater than 10 that do not necessarily imply high estimated variance because of a large sample size or a low value for the estimated variance of the random disturbance.
VIF is defined as the ratio between the variance of the estimators in the model and the variance of the estimators of a hypothetical reference model where the independent variables are orthogonal.
RVIF (Redefined Variance Inflation Factor) is defined as the ratio between the variance of the estimators in the model and the variance of the estimators of a hypothetical reference model where the independent variables are orthonormal.
See this Stack Exchange post for an explanation of the difference between orthogonal and orthonormal.
the RVIF can be calculated for the intercept of the linear regression model.
the RVIF is always greater than or equal to 1 and its minimum value is indicative of the absence of multicollinearity.
Salmerón et al. (2026) distinguish between essential and non-essential multicollinearity:
- essential: multicollinearity between independent variables excluding the intercept
- non-essential: strong relationship between the intercept and at least one of the independent variables.

The four scenarios

“To determine whether the tendency not to reject the null hypothesis in the individual significance test is caused by a troubling approximate multicollinearity that inflates the variance of the estimator, or whether it is caused by variables not being statistically significantly related, the following situations are distinguished with a significance level \(\alpha\):”

a If the null hypothesis is initially rejected in the original model, the following results can be obtained for the orthonormal model:

a.1. the null hypothesis is rejected, the results are consistent.
a.2. the null hypothesis is not rejected, this could be an inconsistency.

b If the null hypothesis is not initially rejected in the original model, the following results may occur for the orthonormal model:

b.1 the null hypothesis is rejected, it is possible to conclude that the degree of multicollinearity affects the statistical analysis of the model, provoking not rejecting the null hypothesis in the original model.
b.2 the null hypothesis is also not rejected, the results are consistent.

The decision rule

Screenshot from the rvif vignette:

In words, if RVIF exceeds the maximum of \(c_0(i)\) and \(c_3(i)\), then multicollinearity is affecting the statistical analysis of the model. See the R Journal article for derivations of these expressions.

Examples

library(rvif)
library(faraway)

Prostate data

Using the prostate data from the faraway package, model lpsa (log prostate specific antigen) as a function of all other variables and investigate multicollinearity using vif() from the car package.

data("prostate")
m <- lm(lpsa ~ ., data = prostate)
vif(m)

##   lcavol  lweight      age     lbph      svi      lcp  gleason    pgg45 
## 2.054115 1.363704 1.323599 1.375534 1.956881 3.097954 2.473411 2.974361

Now investigate multicollinearity using the rvifs() function from the rvif package. Notice we need to use model.matrix() to extract the right-hand side variables from the model.

RVIF = Redefined Variance Inflation Factor of each independent variable.
% = Percentage of near multicollinearity due to each independent variable.

Notice the intercept is included in the output. These RVIF values are much higher than VIF, but there is no threshold to consider.

rvifs(x = model.matrix(m))

##                  RVIF       %
## Intercept  324.836827 99.6922
## Variable 2   4.777115 79.0669
## Variable 3  75.902487 98.6825
## Variable 4  99.736712 98.9974
## Variable 5   1.382185 27.6508
## Variable 6   2.497599 59.9615
## Variable 7   3.149462 68.2485
## Variable 8 220.997936 99.5475
## Variable 9   5.220262 80.8439

Use the Decision Rule to Detect Troubling Multicollinearity. The function returns the value of the RVIF and the established thresholds (c0 and c3), as well as indicating whether or not the individual significance analysis is affected by multicollinearity at the chosen significance level.

multicollinearity(y = prostate[,"lpsa"], x = model.matrix(m))

##          RVIFs           c0           c3 Scenario Affects
## 1 3.348833e+00 2.260428e-01 0.0111399208      b.1     Yes
## 2 1.540289e-02 1.738641e-01 0.0004424191      a.1      No
## 3 5.759506e-02 1.042094e-01 0.0191900584      a.1      No
## 4 2.487380e-04 1.945627e-04 0.0011738455      b.2      No
## 5 6.807399e-03 5.782391e-03 0.0126207847      b.2      No
## 6 1.189333e-01 2.961675e-01 0.0396061480      a.1      No
## 7 1.650575e-02 5.612992e-03 0.2530673200      b.2      No
## 8 4.940709e-02 1.028147e-03 0.1384051616      b.2      No
## 9 3.894933e-05 1.033197e-05 0.0001468308      b.2      No

The individual significance test of the intercept is affected by the degree of existing multicollinearity. The Scenario of b.1 says that the intercept was not rejected in the original model but is rejected in the orthonormal reference model.

coef(summary(m))["(Intercept)",]

##   Estimate Std. Error    t value   Pr(>|t|) 
##  0.6693367  1.2963875  0.5163091  0.6069335

Savings rates data

Model savings rate (sr) as a function of pop15 (percent population under age of 15), pop75, (percent population over age of 75), dpi (per-capita disposable income in dollars), and ddpi (percent growth rate of dpi). View VIF.

data("savings")
m2 <- lm(sr ~ pop15 + pop75 + dpi + ddpi, data = savings)
vif(m2)

##    pop15    pop75      dpi     ddpi 
## 5.937661 6.629105 2.884369 1.074309

Now look at RVIFs.

rvifs(x = model.matrix(m2))

##                  RVIF       %
## Intercept  187.025680 99.4653
## Variable 2  95.009426 98.9475
## Variable 3  27.976176 96.4255
## Variable 4   6.556330 84.7476
## Variable 5   2.953623 66.1433

Use the Decision Rule to Detect Troubling Multicollinearity.

multicollinearity(y = savings[,"sr"], x = model.matrix(m2))

##          RVIFs           c0           c3 Scenario Affects
## 1 3.740514e+00 1.391109e+01 4.692010e-02      a.1      No
## 2 1.446816e-03 3.625978e-03 4.157893e-04      a.1      No
## 3 8.120077e-02 4.877557e-02 8.929468e-02      b.2      No
## 4 5.995458e-08 1.934935e-09 2.836012e-07      b.2      No
## 5 2.662002e-03 2.861414e-03 2.476486e-03      a.1      No

According to the results, the level of multicollinearity detected does not affect the statistical analysis.

Simulated data

Simulate highly correlated data, generate an outcome (Y), and then fit the correct model.

library(MASS)
set.seed(1)
n <- 300
r <- 0.99
d <- as.data.frame(mvrnorm(n = n, mu = c(0,0,0), 
                           Sigma = matrix(c(1,r,r, 
                                            r,1,r,
                                            r,r,1), 
                                          byrow = T, 
                                          nrow = 3)))
d$Y <- 0.2 + 0.5*d$V1 + -0.5*d$V2 + 0.9*d$V3 +
  rnorm(n = n, sd = 1.1)
m3 <- lm(Y ~ V1 + V2 + V3, data = d)
vif(m3)

##       V1       V2       V3 
## 54.52195 52.42181 57.53355

Now look at RVIFs:

rvifs(x = model.matrix(m3))

##                 RVIF       %
## Intercept   1.004846  0.4823
## Variable 2 54.607026 98.1687
## Variable 3 52.468519 98.0941
## Variable 4 57.603948 98.2640

Use the Decision Rule to Detect Troubling Multicollinearity.

multicollinearity(y = d[,"Y"], x = model.matrix(m3))

##         RVIFs           c0          c3 Scenario       Affects
## 1 0.003349487 0.0039348024 0.004787978      a.2 Contradiction
## 2 0.197220475 0.0009671839 0.004267780      b.1           Yes
## 3 0.186752687 0.0185034292 1.320402808      b.2            No
## 4 0.207587988 0.3379381655 0.127516738      a.1            No

The degree of multicollinearity in the model affects the individual significance of variable 1 (V1). The “Contradiction” result for the intercept is due to scenario “a.2” where the intercept test statistic was rejected in the original model but not in the reference orthonormal model.

View the confidence intervals on the coefficients. Notice the interval of V1 [-1.05508392, 0.9169819] contains 0 and is not significant. The “true” value of the coefficient is 0.5. The test on V1 is not significant in the model, but is significant in the reference orthonormal model.

confint(m3)

##                   2.5 %    97.5 %
## (Intercept)  0.01077577 0.2677766
## V1          -1.05508392 0.9169819
## V2          -1.26153260 0.6574845
## V3           0.27910799 2.3023439

Set a new seed, double the sample size and check results.

set.seed(11)
n <- 600
r <- 0.99
d <- as.data.frame(mvrnorm(n = n, mu = c(0,0,0), 
                           Sigma = matrix(c(1,r,r, 
                                            r,1,r,
                                            r,r,1), 
                                          byrow = T, 
                                          nrow = 3)))
d$Y <- 0.2 + 0.5*d$V1 + -0.5*d$V2 + 0.9*d$V3 +
  rnorm(n = n, sd = 1.1)
m3 <- lm(Y ~ V1 + V2 + V3, data = d)
multicollinearity(y = d[,"Y"], x = model.matrix(m3))

##         RVIFs          c0           c3 Scenario Affects
## 1 0.001667416 0.007239347 0.0004419016      a.1      No
## 2 0.113089227 0.072965163 0.0009234877      b.1     Yes
## 3 0.107643540 0.012047303 9.5106673247      b.2      No
## 4 0.113329710 0.086660972 0.1482053903      b.2      No

Collinearity again is affecting V1. Check the confidence intervals. The interval for V1 is wide and includes 0. The “true” value of the coefficient is 0.5. The test on V1 is not significant in the model, but is significant in the reference orthonormal model.

confint(m3)

##                   2.5 %    97.5 %
## (Intercept)  0.09410333 0.2677796
## V1          -0.14071109 1.2895950
## V2          -0.93113933 0.4643045
## V3          -0.08987580 1.3419502

Limitations/Questions

Not sure how multicollinearity() handles interactions and non-linear terms. Neither the R journal article or rvif vignettes addresses these topics.
The rvif package is “not designed to distinguish between quantitative and binary independent variables, so users should be especially careful when interpreting results involving binary variables”
It’s not clear to me how or if non-essential collinearity (i.e., collinearity between the intercept and at least one of the independent variables) affects the analysis.
It’s not clear to me how or if one should address a result of “Contradiction” when running multicollinearity().
Unclear if this is valid for mixed-effect/multilevel models. The following example runs and produces output, but not sure if it’s meaningful:

library(lme4)
fm1 <- lmer(Reaction ~ Days + (Days | Subject), sleepstudy)
multicollinearity(y = sleepstudy[,"Reaction"], x = model.matrix(fm1))

References

O’Brien, R.M. A Caution Regarding Rules of Thumb for Variance Inflation Factors. Qual Quant 41, 673–690 (2007). https://doi.org/10.1007/s11135-006-9018-6
R Core Team (2026). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. https://www.R-project.org/.
Salmerón-Gómez & García-García, “rvif: a Decision Rule to Detect Troubling Statistical Multicollinearity Based on Redefined VIF”, The R Journal, 2026. https://journal.r-project.org/articles/RJ-2025-040/
Salmerón R, García C (2025). rvif: Collinearity Detection using Redefined Variance Inflation Factor and Graphical Methods. doi:10.32614/CRAN.package.rvif https://doi.org/10.32614/CRAN.package.rvif, R package version 3.2, https://CRAN.R-project.org/package=rvif.