How to do R programming

The R programming language is a popular statistical programming language used primarily in science and mathematics for statistical computing. R is an interesting language with some distinctive features, and it is quite fun once you get used to it.

What sets R apart from other programming languages?

R is not a general-purpose programming language like Java or Python. The language is intended for statistical computing. R has remained in the top 20 most popular programming languages for years despite some strong competition.

R is special because it comes as a complete package. R programming usually takes place in an interactive environment, complete with a read-eval-print loop (REPL) and integrated help. The open-source language is supported by a mature, well-developed ecosystem. The community maintains the package repository CRAN (the Comprehensive R Archive Network). Data sets and scientific white papers on new approaches and packages are also continually being submitted.

These features make R the perfect programming environment for statistics and data science. The interactive environment promotes research and fosters playful learning of both the language and the underlying mathematics.

R is a statistical programming language used for data analysis

R is a statistical programming language, so concepts such as the normal distribution, statistical tests, models and regression come up constantly. In addition to R, there are a number of comparable scientific languages, such as the commercial product MATLAB and the more recent language Julia. Python has become another strong competitor in recent years.

Unlike Python, R has native support for statistical programming. The key difference is how the language operates on values. In R, you usually compute with multiple values at once. This is a special feature in R, as almost all other languages use a single number as the simplest value.

Let’s have a look at R’s approach to data processing with a simple example. Mathematical operations can be performed in every programming language. This is also the case in R. Let’s add two numbers:

10 + 5

Nothing unusual so far. However, the same addition operation can be applied to several numbers at once in R. We can combine two numbers into a vector with c() and add a constant value:

# returns 15, 25
c(10, 20) + 5

This may be a surprising result for seasoned programmers. Even a modern, dynamic language like Python does not support this with its built-in lists:

# Python: raises a TypeError
[10, 20] + 5

Two vectors can also be added in R. In this case, the elements are not concatenated into one longer vector; rather, the mathematical operation is performed element by element:

# returns 42, 69
c(40, 60) + c(2, 9)

In older languages like Java or C++, a loop is required to process multiple elements of a collection. This is because these languages separate single values, or scalars, from composite data structures such as arrays or vectors. In R, the vector is the basic unit, and a scalar is simply a one-element vector. This is unusual among mainstream programming languages.
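
As a minimal sketch of this point, the following snippet checks that a bare number is a vector of length one and compares a vectorized addition with an explicit loop. The variable names are purely illustrative:

```r
# a bare number is a numeric vector of length one
x <- 5
is.vector(x)
length(x)

# vectorized addition: no loop needed
prices <- c(10, 20, 30)
vectorized <- prices + 5

# the same computation written as an explicit loop
looped <- numeric(length(prices))
for (i in seq_along(prices)) {
    looped[i] <- prices[i] + 5
}

# both approaches produce the same vector
identical(vectorized, looped)
```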

What makes statistics special is that it has to combine mathematical precision with flexibility. In statistics, you calculate with uncertainties and imperfect data derived from reality, so something can always go wrong. Fortunately, R is equipped to deal with errors to a certain extent. The language can handle missing values without crashing a running script.

Let's look at an example of the language's robustness. In many programming languages, dividing a number by zero raises an error that can crash the program. R is not affected by this. Dividing a positive number by zero results in the value Inf, which can easily be filtered out of the data during a cleanup later on:

# vector of divisors, containing zero
divisors = c(2, 4, 0, 10)
# yields 'c(50, 25, Inf, 10)'
quotients = 100 / divisors
# filter out Inf; yields 'c(50, 25, 10)'
cleaned_quotients = quotients[quotients != Inf]
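
Missing values are handled in a similar spirit. The following is a small sketch using made-up measurements. It relies only on the base R functions mean() and is.finite(); the latter also catches -Inf and NaN, which the comparison with Inf above would let through:

```r
# hypothetical measurements with a missing value (NA)
measurements <- c(4.2, NA, 5.1, 3.9)

# arithmetic on NA yields NA instead of stopping the script
mean(measurements)

# skip missing values explicitly
mean(measurements, na.rm = TRUE)

# division by zero can also produce -Inf and NaN
quotients <- c(100, -100, 0) / 0

# is.finite() is FALSE for Inf, -Inf, NaN and NA alike
values <- c(50, 25, quotients, NA, 10)
cleaned <- values[is.finite(values)]
```

In this sketch, cleaned contains only the three ordinary numbers 50, 25 and 10.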

R supports OOP and functional programming

R makes programming extremely flexible. The language doesn't fit neatly into a single programming paradigm. It has an object system, but in everyday code you won't find the class definitions familiar from other languages. In daily use, R code is primarily functional and imperative. The functional features are strongly pronounced, and they are ideal for data processing.

Similar to JavaScript, the object system's flexibility is its main advantage. R's generic functions are comparable to Python's built-in functions in the sense that they can be applied to different types of objects. For example, R has the length() function, which is similar to len() in Python.
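
To make this more concrete, here is a minimal sketch of a generic function. The generic describe and its two methods are hypothetical names invented for this illustration; UseMethod() is the base R mechanism that selects a method based on the argument's class:

```r
# length() works on different kinds of objects
length(c(1, 2, 3))
length(list('a', 'b'))

# note: a string is a one-element character vector,
# so length() is 1 here; nchar() counts the characters
length('hello')
nchar('hello')

# hypothetical generic function 'describe' with two methods
describe <- function(x, ...) UseMethod('describe')
describe.numeric <- function(x, ...) paste('numbers with mean', mean(x))
describe.character <- function(x, ...) paste('text:', paste(x, collapse = ', '))

# the method is chosen according to the type of the argument
describe(c(1, 2, 3))
describe(c('Jack', 'Jim'))
```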

How does R programming work?

R programming specializes in data and statistics. In R, you need a data set to develop a solution to a problem. Unfortunately, this may not always exist at the time of development. This means that an R programming project usually begins with simulated data. Users write the code, test the functionality, and replace the test data with real data at a later point.
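
A workflow like this could look roughly as follows. This is only a sketch: the analyze function, the simulated body heights, and the commented-out CSV path and column name are all assumptions made for illustration:

```r
# make the simulation reproducible
set.seed(42)

# simulate 100 hypothetical body heights in cm (normal distribution)
heights <- rnorm(100, mean = 170, sd = 10)

# develop and test the analysis against the simulated data
analyze <- function(values) {
    c(mean = mean(values), sd = sd(values))
}
analyze(heights)

# later, swap in real data (hypothetical file and column name)
# heights <- read.csv('path/to/heights.csv')$height_cm
# analyze(heights)
```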

How is R code executed?

R is a dynamic, interpreted scripting language, similar to Ruby and Python. Unlike in the programming language C, there is no separation of source code and executable code in R. Development usually takes place interactively: the interpreter is fed source code line by line and executes it immediately. Variables are created automatically when needed, and names are bound at runtime.

This kind of interactive and dynamic programming is like being inside the running program. Objects can be examined and modified, and new ideas can be tested immediately. The help command gives access to the documentation for syntax and functions:

# view help for 'for' syntax
help('for')
# view help for 'c()' function
help(c)

Script files can be loaded dynamically from the interpreter. The source command works like the shell command of the same name. The R source code file is read and fed into the running session:

source('path/to/file.r')
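
To try this out without an existing script, the following sketch writes a one-line script to a temporary file and then sources it. The temporary file and the variable name greeting merely stand in for a real script:

```r
# write a tiny script to a temporary file (stand-in for a real script file)
script_path <- tempfile(fileext = '.R')
writeLines('greeting <- "Hello from the script"', script_path)

# read the file and evaluate it in the running session
source(script_path)

# the variable defined in the script is now available
print(greeting)
```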

What is the syntax of the R programming language?

The scripting language uses curly braces to delimit the bodies of functions and control statements, like in C and Java. In contrast to Python, indentation does not affect how the code behaves. Comments start with a hash, like in Ruby and Python, and no semicolon is needed at the end of a statement.

The language has some peculiarities, making it easy to recognize R code once you become more familiar with it. The equal sign and two arrow-like operators are used in R programming for assignments. This allows the assignment’s direction to be reversed:

# three equivalent ways to assign a value
age <- 42
'Jack' -> name
# c() converts age to the string '42', since a vector holds a single type
person = c(age, name)

Another typical feature of R code is the pseudo-object notation following the pattern object.method():

# test if argument is a number
is.numeric(42)

The is.numeric function looks like a numeric() method that belongs to an object named is. However, this is not the case. In R programming, the dot is a regular character in names. The function could just as well be called is_numeric instead of is.numeric.
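
The following short sketch illustrates this. The names my.value and is_numeric are made up for the example; is_numeric is simply a second name bound to the existing is.numeric function:

```r
# the dot is part of the name, so it can appear in variable names too
my.value <- 42

# hypothetical alias with an underscore; behaves exactly like is.numeric
is_numeric <- is.numeric

# TRUE for a number, FALSE for a string
is_numeric(my.value)
is_numeric('forty-two')
```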

The c() function combines its arguments into a vector, the data structure that is ubiquitous in R programming:

people.ages <- c(42, 51, 69)

Applying the function to vectors merges them into a single flat vector:

# yields 'c(1, 2, 3, 4)'
c(c(1, 2), c(3, 4))

Unlike in most programming languages, indexing a vector's elements starts at 1 in R. This takes some time to get used to, but it can help to avoid the dreaded off-by-one errors. The highest vector index corresponds to the vector's length:

# create a vector of names
people <- c('Jack', 'Jim', 'John')
# access the first name
people[1] == 'Jack'
# access the last name
people[length(people)] == 'John'

Similar to Python, R programming also supports slicing. A slice selects a subrange of a vector. It is based on sequences, which are natively supported in R. Unlike in Python, both ends of the range are included. Let's create a sequence of numbers and select a slice:

# create a vector of the numbers from 42 to 69
nums = seq(42, 69)
# equivalent assignment using sequence notation
nums = 42:69
# using a sequence, slice elements 3 through 7; yields 'c(44, 45, 46, 47, 48)'
sliced = nums[3:7]

How do control structures work in R programming?

Basic operations are defined for vectors in R programming. This means that loops are often not required. Instead, an operation is applied to the entire vector and computed for each individual element. Let's square the first ten positive integers without a loop:

nums <- seq(10)
# '^' is the exponentiation operator; '**' is accepted as a synonym
squares <- nums ^ 2
# TRUE
squares[3] == 9

The for loop in R does not work the same way as the classic for loops in C, Java or JavaScript. There is no detour via a counter variable. Iteration is performed directly over the elements, like in Python:

people = c('Jim', 'Jack', 'John')
for (person in people) {
    print(paste('Here comes', person, sep = ' '))
}
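
If the position of each element is needed as well, one common approach is to iterate over the indices with the base R function seq_along(). The following sketch also shows a vectorized alternative; the variable name labels is an assumption for this example:

```r
people <- c('Jim', 'Jack', 'John')

# iterate over positions when the index itself is needed
for (i in seq_along(people)) {
    print(paste(i, people[i]))
}

# vectorized alternative: paste() handles the whole vector at once
labels <- paste(seq_along(people), people)
```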

if-else branching exists in R as a basic control structure. However, it can often be replaced by filter functions or the logical indexing of vectors. Let's create a vector of ages and split the data into two groups: adults aged 18 and over, and children under 18. This can be done without a loop or branching:

# create 20 random ages between 1 and 98
ages = as.integer(runif(20, 1, 99))
# filter adults (18 and over)
adults = ages[ages >= 18]
# filter children (under 18)
children = ages[ages < 18]
# make sure everyone is accounted for
length(adults) + length(children) == length(ages)

The same result can be achieved with control structures, although the code is longer:

# create 20 random ages between 1 and 98
ages = as.integer(runif(20, 1, 99))
# start with empty vectors
adults = c()
children = c()
# populate vectors
for (age in ages) {
    if (age >= 18) {
        adults = c(adults, age)
    } else {
        children = c(children, age)
    }
}
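
A filter function offers a third option. The sketch below uses the base R functions Filter() and split() on a small, fixed vector of ages so that the result is easy to check. The variable names and the group labels 'adult' and 'child' are chosen for illustration only:

```r
# fixed ages so the result is easy to check
ages <- c(5, 18, 42, 17, 69)

# Filter() keeps the elements for which the predicate returns TRUE
adults <- Filter(function(age) age >= 18, ages)
children <- Filter(function(age) age < 18, ages)

# split() partitions the vector into both groups in a single step
groups <- split(ages, ifelse(ages >= 18, 'adult', 'child'))
```

Here adults holds 18, 42 and 69, children holds 5 and 17, and groups is a list containing the same two vectors under the names adult and child.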

How to get started with R programming

To get started with R programming, you just need a local R installation. Installers are available for all major operating systems. A standard R installation includes a GUI interpreter with REPL, integrated help and an editor. For efficient coding, we recommend using an established code editor or IDE. RStudio is a great alternative to the built-in R environment.

Which projects is R suitable for?

R programming is used mainly in science and research, for example, in bioinformatics and machine learning. However, the language is suitable for all projects that use mathematical models or statistical modeling. R does not have an advantage when it comes to processing text. This is Python’s area of expertise.

Common calculations and visualizations in spreadsheets can be replaced with R code. Data and code are not mixed in the same cells, allowing for code to be written once and applied to multiple data sets. Furthermore, there is no danger of overwriting a cell’s formula when making manual changes.
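
As a rough sketch of this idea, the following code defines a calculation once and applies it to two data sets. The data frames, their column names and the function total_revenue are invented for this example:

```r
# two hypothetical data sets with the same structure
sales_q1 <- data.frame(region = c('North', 'South'), revenue = c(120, 80))
sales_q2 <- data.frame(region = c('North', 'South'), revenue = c(150, 95))

# the calculation is written once ...
total_revenue <- function(sales) sum(sales$revenue)

# ... and applied to every data set
totals <- sapply(list(q1 = sales_q1, q2 = sales_q2), total_revenue)
print(totals)
```

Because the data never touches the code, adding a third data set only means adding one more entry to the list.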

R is widely regarded as a gold standard for scientific publications. The separation of code and data is what makes scientific reproducibility possible. The mature ecosystem of tools and packages allows efficient publication pipelines to be created. Evaluations and visualizations are automatically generated from code and data and then integrated into high-quality LaTeX or R Markdown documents.
