Polynomial Regression in RStudio

How to statistically compare polynomial models at the 5% significance level

Step-by-Step Guide

Using R's built-in functions for polynomial regression comparison

1. Prepare Your Data

# Sample dataset: simulated from a cubic relationship plus Gaussian noise
set.seed(123)
x <- 1:20
y <- 2 + 0.5*x + 0.1*x^2 - 0.01*x^3 + rnorm(20, sd=0.5)
data <- data.frame(x, y)

2. Fit Polynomial Models

# Fit quadratic (degree 2) and cubic (degree 3) models
model_quad <- lm(y ~ x + I(x^2), data=data)
model_cubic <- lm(y ~ x + I(x^2) + I(x^3), data=data)
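
The raw-power terms above can also be written with R's poly() function, which builds orthogonal polynomial terms by default. The sketch below is only an illustration, not part of the original workflow; the names model_cubic_poly and fit_equal are chosen here for the example. It checks that both forms describe the same fitted curve even though their coefficients differ.

```r
# Recreate the sample data so this sketch runs on its own
set.seed(123)
x <- 1:20
y <- 2 + 0.5*x + 0.1*x^2 - 0.01*x^3 + rnorm(20, sd = 0.5)
data <- data.frame(x, y)

# Raw-power form, as used in the guide
model_cubic <- lm(y ~ x + I(x^2) + I(x^3), data = data)

# Orthogonal-polynomial form of the same cubic (illustrative name)
model_cubic_poly <- lm(y ~ poly(x, 3), data = data)

# Coefficients differ between the two forms, but the fitted
# values should agree up to numerical rounding
fit_equal <- all.equal(unname(fitted(model_cubic)),
                       unname(fitted(model_cubic_poly)))
print(fit_equal)
```

Because the fitted values match, the ANOVA comparison in the next step should not depend on which form is used; the orthogonal form mainly helps when high powers of x make the raw terms strongly correlated.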

3. Compare Models with ANOVA

# Perform F-test at 5% significance
anova_result <- anova(model_quad, model_cubic)
print(anova_result)

# Extract the p-value for the added cubic term (second row of the ANOVA table)
p_value <- anova_result$`Pr(>F)`[2]
p_text <- format.pval(p_value, digits = 4)  # avoids printing very small p-values as 0
cat("\nComparison result at 5% significance level:\n")
if (p_value < 0.05) {
    cat(sprintf("Significant improvement (p = %s) - prefer the cubic model\n", p_text))
} else {
    cat(sprintf("No significant improvement (p = %s) - prefer the quadratic model\n", p_text))
}
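
The same anova() call accepts more than two nested models, which can help when the right degree is not known in advance. The following is a minimal sketch under that assumption; model_lin, model_quart and seq_result are names introduced here for illustration. Each row tests the term added relative to the previous model, and R uses the residual variance of the largest model in the list as the denominator of every F statistic.

```r
# Recreate the sample data so this sketch runs on its own
set.seed(123)
x <- 1:20
y <- 2 + 0.5*x + 0.1*x^2 - 0.01*x^3 + rnorm(20, sd = 0.5)
data <- data.frame(x, y)

# Nested models of increasing degree (names are illustrative)
model_lin   <- lm(y ~ x, data = data)
model_quad  <- lm(y ~ x + I(x^2), data = data)
model_cubic <- lm(y ~ x + I(x^2) + I(x^3), data = data)
model_quart <- lm(y ~ x + I(x^2) + I(x^3) + I(x^4), data = data)

# Sequential F-tests: row k compares model k with model k - 1
seq_result <- anova(model_lin, model_quad, model_cubic, model_quart)
print(seq_result)
```

One way to read the table is to keep the highest degree whose row has a p-value below 0.05. Testing several degrees in sequence means several tests are performed, so the overall error rate is higher than 5%.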

Key Interpretation

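The F-test compares the residual sum of squares (RSS) of two nested models, here the quadratic model and the cubic model that adds one term to it. The null hypothesis is that the coefficient of the extra term is zero. A p-value below 0.05 rejects that hypothesis at the 5% level, indicating that the cubic term gives a significantly better fit. A p-value of 0.05 or above means the data give no evidence that the extra term is needed, so the simpler model is preferred.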
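
To make the F-test concrete, the statistic can be recomputed by hand from the two RSS values and compared with what anova() reports. This is a sketch for illustration; variable names such as rss_quad, df_extra and p_manual are introduced here and are not part of the original code.

```r
# Recreate the data and both models so this sketch runs on its own
set.seed(123)
x <- 1:20
y <- 2 + 0.5*x + 0.1*x^2 - 0.01*x^3 + rnorm(20, sd = 0.5)
data <- data.frame(x, y)
model_quad  <- lm(y ~ x + I(x^2), data = data)
model_cubic <- lm(y ~ x + I(x^2) + I(x^3), data = data)

# Residual sums of squares of the two models
rss_quad  <- sum(residuals(model_quad)^2)
rss_cubic <- sum(residuals(model_cubic)^2)

# Degrees of freedom: number of added terms, and residual df of the larger model
df_extra <- df.residual(model_quad) - df.residual(model_cubic)
df_cubic <- df.residual(model_cubic)

# F = (drop in RSS per added term) / (residual variance of the larger model)
f_stat   <- ((rss_quad - rss_cubic) / df_extra) / (rss_cubic / df_cubic)
p_manual <- pf(f_stat, df_extra, df_cubic, lower.tail = FALSE)

# Compare with the values reported by anova()
anova_result <- anova(model_quad, model_cubic)
print(c(F_manual = f_stat, F_anova = anova_result$F[2]))
print(c(p_manual = p_manual, p_anova = anova_result$`Pr(>F)`[2]))
```

The hand-computed and reported values should agree, which shows that the test depends only on the two RSS values and their degrees of freedom.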

4. Visualize the Models

# Plot both models
plot(data$x, data$y, pch=19, col="blue", 
     main="Polynomial Regression Comparison",
     xlab="X", ylab="Y")
lines(data$x, predict(model_quad), col="red", lwd=2)
lines(data$x, predict(model_cubic), col="green", lwd=2)
legend("topleft", legend=c("Quadratic", "Cubic"), 
       col=c("red", "green"), lwd=2)
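
With only 20 x values, the fitted curves are drawn as straight segments between the observed points. A possible refinement, sketched below with the assumed names grid, pred_quad and pred_cubic, is to predict on a finer grid of x values so that the curves look smooth.

```r
# Recreate the data and both models so this sketch runs on its own
set.seed(123)
x <- 1:20
y <- 2 + 0.5*x + 0.1*x^2 - 0.01*x^3 + rnorm(20, sd = 0.5)
data <- data.frame(x, y)
model_quad  <- lm(y ~ x + I(x^2), data = data)
model_cubic <- lm(y ~ x + I(x^2) + I(x^3), data = data)

# Fine grid over the observed range of x; the column must be named x
# so that predict() can match it to the model formula
grid <- data.frame(x = seq(min(data$x), max(data$x), length.out = 200))
pred_quad  <- predict(model_quad, newdata = grid)
pred_cubic <- predict(model_cubic, newdata = grid)

# Same plot as before, with curves drawn from the grid predictions
plot(data$x, data$y, pch = 19, col = "blue",
     main = "Polynomial Regression Comparison",
     xlab = "X", ylab = "Y")
lines(grid$x, pred_quad, col = "red", lwd = 2)
lines(grid$x, pred_cubic, col = "green", lwd = 2)
legend("topleft", legend = c("Quadratic", "Cubic"),
       col = c("red", "green"), lwd = 2)
```

The grid stays within the observed range of x, since polynomial fits can behave erratically when extrapolated beyond the data.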

Additional Resources