R-squared (R²) is a statistical goodness-of-fit measure used to assess the quality of linear regressions. In R, it can be calculated with a single function call.

Why is R-squared in R important?

R-squared is a statistical measure of how well a linear regression line approximates the data. It describes the share of the variation in the dependent variable that the model explains, takes values between 0 and 1, and is a key indicator of regression model quality.

The R-squared value shows how close the data points lie to the calculated regression line. The higher the value, the better the model explains the data; a low value indicates a poor fit.

R-squared in R with linear regression

R-squared is most often used in the context of linear regression. Since R is a programming language widely used in statistics, it’s not surprising that it provides built-in functions for this. The following example fits a simple linear regression:

x <- c(1, 2, 3, 4, 5)
y <- c(2, 4, 5, 4, 5)
# Linear regression
model <- lm(y ~ x)

In the code example above, two R vectors named x and y are created. These vectors contain the data on which the linear regression is performed. The dependent variable is y and the independent variable is x. The regression model is then calculated using the R function lm() and stored in the variable model.
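To see what the fitted model contains, it can help to look at its coefficients and fitted values. The following is a minimal sketch that repeats the example data so it runs on its own; the variable names coefs and fitted_values are chosen here for illustration and are not part of the original example.

```r
# Same data as in the example above
x <- c(1, 2, 3, 4, 5)
y <- c(2, 4, 5, 4, 5)

# Fit the linear regression y = intercept + slope * x
model <- lm(y ~ x)

# Estimated coefficients: "(Intercept)" and "x" (the slope)
coefs <- coef(model)
print(coefs)

# Fitted values: the points on the regression line for each x
fitted_values <- fitted(model)
print(fitted_values)
```

For this data, the regression line has an intercept of 2.2 and a slope of 0.6.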

How to calculate R-squared in R

The R² value can be calculated in R using a built-in function. You don’t need in-depth mathematical knowledge to do this; you just need to know which function to call. It’s simple to use, even if you’re just starting out with coding.

The function is called summary(). As the name suggests, it provides a summary of the regression analysis, including the R-squared value. The code example below, which builds on the linear regression calculated earlier, shows the summary() function in action:

# R-squared-value
summary(model)$r.squared

You can use this code to extract the R-squared value from the linear regression model stored in model. The R-squared value indicates how much of the variation in the dependent variable y the model explains, based on the independent variable x.

In the code example above, the summary() function is applied to the regression model that has already been calculated. The R operator $ is then used to extract the r.squared element from the object returned by the function call. In our example, the value is 0.6.
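If you want to check where the value 0.6 comes from, R-squared can also be computed by hand as 1 minus the ratio of the residual sum of squares to the total sum of squares. The sketch below is one way to do this under the same example data; the names ss_res, ss_tot, r2_manual, r2_summary and r2_cor are illustrative only.

```r
# Same data and model as in the example above
x <- c(1, 2, 3, 4, 5)
y <- c(2, 4, 5, 4, 5)
model <- lm(y ~ x)

# Residual sum of squares: variation the model leaves unexplained
ss_res <- sum(residuals(model)^2)

# Total sum of squares: variation of y around its mean
ss_tot <- sum((y - mean(y))^2)

# R-squared = 1 - (unexplained variation / total variation)
r2_manual <- 1 - ss_res / ss_tot

# Value reported by summary() for comparison
r2_summary <- summary(model)$r.squared

# With a single predictor, R-squared equals the squared correlation of x and y
r2_cor <- cor(x, y)^2

print(c(manual = r2_manual, summary = r2_summary, cor = r2_cor))
```

All three values agree at 0.6 for this data: the residual sum of squares is 2.4 and the total sum of squares is 6.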

How to interpret R-squared

Once the R-squared value has been determined, you have to interpret the result. As a rough guide, it’s a good idea to look at the intervals in which the value can fall. As mentioned earlier, R² values range between 0 and 1.

  • 0 (no fit): an R-squared value of 0 means that the model explains none of the variation in the data. In this case, there is no linear relationship between the variables.
  • 1 (perfect fit): an R-squared value of 1 indicates that all observations lie exactly on the regression line. This is extremely rare and may indicate overfitting.
  • 0.7 to 0.9 (good fit): an R-squared value in this interval indicates that the model describes the data sufficiently well.
  • 0.5 to 0.7 (acceptable fit): an R-squared value in this interval is acceptable but indicates that there’s still room for improvement.
  • Less than 0.5 (poor fit): an R-squared value below 0.5 indicates that the calculated model doesn’t describe the data with sufficient accuracy. In this case, the model should be adapted to obtain meaningful results.
Note

A high R-squared value alone isn’t enough to judge the quality of your model. That’s why you should also consider factors like model validation, analysis of residuals, and the specific requirements of your use case when determining the goodness of fit of a regression model. The summary() function shown earlier provides additional key figures that you can use for the assessment.
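Some of these additional key figures can be read from the summary object in the same way as r.squared. The following sketch, again using the example data, pulls out the adjusted R-squared, the residual standard error and the coefficient table, and also retrieves the residuals for a quick check. The variable names are illustrative, and which figures matter most depends on your analysis.

```r
# Same data and model as in the example above
x <- c(1, 2, 3, 4, 5)
y <- c(2, 4, 5, 4, 5)
model <- lm(y ~ x)

# Store the summary once and read several figures from it
model_summary <- summary(model)

# Adjusted R-squared: R-squared corrected for the number of predictors
adj_r2 <- model_summary$adj.r.squared
print(adj_r2)

# Residual standard error: typical size of the deviations from the line
sigma_hat <- model_summary$sigma
print(sigma_hat)

# Coefficient table: estimates, standard errors, t values and p-values
coef_table <- model_summary$coefficients
print(coef_table)

# Residuals: observed y minus fitted y, useful for residual analysis
res <- residuals(model)
print(res)
```

With only five observations, the adjusted R-squared (about 0.47) is noticeably lower than the plain R-squared of 0.6, which illustrates why a single figure shouldn’t be read in isolation.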
