How to create and use strings in R
Strings are a fundamental data structure in R. They are used to represent sequences of characters, including individual letters. Unlike many other programming languages, R does not have a data type called ‘string’. Instead, this data type is referred to as ‘character’ in R.
What are R strings?
Strings are a standard in programming languages and a data structure that all seasoned programmers are familiar with. If you are just getting started with learning how to code, it’s important for you to understand what a string is.
Strings are essentially nothing more than a sequence of characters. Strings are commonly used to store and process non-numeric data in programs. Similar to other programming languages, strings are also enclosed in single or double quotation marks when writing code in R.
How to create a string in R
You can create a string in R with just one line of code. Both single quotation marks and double quotation marks can be used to create strings, so the choice is up to you:
# String with double quotation marks
string1 <- "Hello world!"
# String with single quotation marks
string2 <- 'Hello world!'
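As a quick check of the point that R calls this data type ‘character’, the following minimal sketch inspects two strings with base R functions. The variable names are chosen only for illustration.

```r
# Both quoting styles create the same value of type "character"
greeting_double <- "Hello world!"
greeting_single <- 'Hello world!'

print(class(greeting_double))                        # "character"
print(is.character(greeting_single))                 # TRUE
print(identical(greeting_double, greeting_single))   # TRUE
```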
How to use R string functions and operations
R provides programmers with a set of basic functions to make working with strings efficient. These can be used to perform various operations both on strings and together with strings. We’ve compiled a list of the most important R string operations here:
- substr(): Extracts a portion of a string
- paste(): Concatenates (joins) strings
- tolower() / toupper(): Converts all of the letters in a string to lowercase letters or uppercase letters
- strsplit(): Splits a string at each occurrence of a delimiter
- trimws(): Removes blank spaces at the beginning and end of a string
- gsub(): Replaces patterns in a string with other characters
- nchar(): Calculates the length of a string
If you have already worked with other programming languages, you’ve probably encountered functions like the ones above. Strings in Python, for example, can be manipulated with equivalent operations.
substr()
You can use the substr() function to extract substrings from your R strings. To do this, pass your string to the function as the first parameter. For the second and third parameters, specify the start and end indices of the substring you want to extract. Remember that, unlike in many other programming languages, R strings are indexed starting from 1 and not from 0.
string <- "Hello World"
print(substr(string, start=7, stop=11))
The example above outputs World.
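To make the 1-based indexing more concrete, here is a small sketch that extracts from the start of the string. The variable names first_char and first_word are illustrative only.

```r
string <- "Hello World"

# Index 1 refers to the first character, not the second
first_char <- substr(string, start = 1, stop = 1)
first_word <- substr(string, start = 1, stop = 5)

print(first_char)  # "H"
print(first_word)  # "Hello"
```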
paste()
The paste() function is used in R to join two or more strings together. This is known as concatenation. Keep in mind that the + symbol cannot be used to concatenate strings. The R operator + is only defined for numerical data types.
string <- "Hello"
string2 <- "World"
print(paste(string, string2))
When paste() is called, string and string2 are concatenated. Because paste() separates its arguments with a space by default, the output is: Hello World.
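If a space is not the separator you want, paste() accepts a sep argument, and base R also offers paste0(), which joins strings without any separator. A short sketch, with variable names chosen for illustration:

```r
string <- "Hello"
string2 <- "World"

with_space <- paste(string, string2)              # default separator is a space
with_dash  <- paste(string, string2, sep = "-")   # custom separator
no_sep     <- paste0(string, string2)             # no separator at all

print(with_space)  # "Hello World"
print(with_dash)   # "Hello-World"
print(no_sep)      # "HelloWorld"
```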
tolower() / toupper()
With tolower() and toupper(), you can change all of the letters in your string to either lowercase or uppercase. For both R string functions, pass the string that you want to change as the parameter. The function then returns a new string in which all letters are written in lowercase or uppercase, respectively.
string <- "Hello World"
print(tolower(string))
print(toupper(string))
The code above will display hello world and HELLO WORLD on your screen. These two R string functions are especially useful for data that needs to be processed in a case-insensitive manner, because converting everything to one case removes differences in capitalization.
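One common use is comparing two strings while ignoring case. The sketch below assumes two hypothetical inputs, input_a and input_b, that differ only in capitalization.

```r
input_a <- "Hello World"
input_b <- "hello WORLD"

# A direct comparison is case-sensitive
print(input_a == input_b)                    # FALSE

# Normalizing both sides first ignores capitalization
print(tolower(input_a) == tolower(input_b))  # TRUE
```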
strsplit()
The strsplit() function in R may seem somewhat familiar to experienced programmers. For example, Python also has a function named split(). For the R string function strsplit(), your parameters will be the string that you want to separate into substrings and a delimiter, which determines where the string should be split. When the function is called, it returns a list that contains a character vector of the resulting substrings, even if there is only one substring.
string <- "Hello World"
print(strsplit(string, " "))
The code produces the following output:
[[1]]
[1] "Hello" "World"
The result is a list whose single element is a character vector containing the two strings "Hello" and "World". In this example, the blank space between the two words was used as the delimiter.
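Because strsplit() returns a list, you usually need to take the first list element before working with the individual substrings. A minimal sketch, using the illustrative names parts and words:

```r
string <- "Hello World"

parts <- strsplit(string, " ")  # a list with one element
words <- parts[[1]]             # the character vector inside the list

print(words[2])       # "World"
print(length(words))  # 2
```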
trimws()
Using the trimws() function, you can remove unwanted whitespace from the beginning and end of your R string. This can be especially helpful when processing input from users who may have unintentionally entered blank spaces when filling out a form.
string <- " Hello World "
print(trimws(string))
The code above will display Hello World without any blank spaces at the beginning or end of the string.
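If you only want to trim one side, trimws() has a which argument that accepts "both" (the default), "left" or "right". A short sketch with illustrative variable names:

```r
string <- "  Hello World  "

left_trimmed  <- trimws(string, which = "left")   # keeps trailing spaces
right_trimmed <- trimws(string, which = "right")  # keeps leading spaces

print(nchar(string))         # 15
print(nchar(left_trimmed))   # 13
print(nchar(right_trimmed))  # 13
```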
gsub()
Another string operation in R is the gsub() function. In this function, the first parameter is the pattern that you want to replace, which R interprets as a regular expression by default. For the second parameter, use the string that should replace every match of the pattern. The third parameter specifies which string the replacement should be applied to.
string <- "Hello World"
print(gsub("World", "User", string))
Instead of saying hello to the entire world, the code outputs a text that only addresses a single user: Hello User.
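Two details of gsub() are easy to overlook: it replaces every match (the related base R function sub() replaces only the first), and its pattern is treated as a regular expression unless you pass fixed = TRUE. The following sketch illustrates both points with made-up example strings.

```r
string <- "Hello World, wonderful World"

print(gsub("World", "User", string))  # "Hello User, wonderful User"
print(sub("World", "User", string))   # "Hello User, wonderful World"

# "." is a regex wildcard that matches any character;
# fixed = TRUE makes gsub() treat it as a literal dot
version <- "1.2.3"
print(gsub(".", "-", version))                # "-----"
print(gsub(".", "-", version, fixed = TRUE))  # "1-2-3"
```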
nchar()
One of the most important built-in functions for strings is nchar(), which tells you what the length of an R string is.
string <- "Hello World"
print(nchar(string))
The code above outputs 11. The R command length() may be a source of confusion at first. The length() function in R, however, is used to determine the number of elements in an object and not the number of characters in an R string. To determine R string length, make sure to use nchar().
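The difference between the two functions is easiest to see side by side. In this sketch, words is an illustrative vector that holds two strings.

```r
string <- "Hello World"

print(nchar(string))   # 11, the number of characters
print(length(string))  # 1, because it is a vector with one element

words <- c("Hello", "World")
print(length(words))   # 2, the number of elements
print(nchar(words))    # 5 5, the character count of each element
```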
What are control characters and escape sequences?
You can use control characters to control the text layout of your R strings. Control characters are predefined escape sequences that can be used to format text outputs. For example, with control characters, you can implement line breaks or tabs.
Special characters such as quotation marks, which would normally be interpreted as the beginning or end of a string in R syntax, can also be displayed in strings using an escape sequence. Escape sequences begin with a backslash in R. Here are the most important ones:
- \n: Newline/line break
- \t: Tab
- \\: Backslash
- \": Double quotation marks
- \': Single quotation marks
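The sketch below shows these escape sequences in use. It relies on writeLines(), which writes the string as it is meant to appear, whereas print() shows the escaped form. The strings and the Windows-style path are hypothetical examples.

```r
text   <- "Line one\nLine two\t(tabbed)"
quoted <- "She said \"Hello\""
path   <- "C:\\Users\\demo"   # hypothetical path; each \\ is one backslash

writeLines(text)    # prints two lines, the second containing a tab
writeLines(quoted)  # She said "Hello"
writeLines(path)    # C:\Users\demo

print(quoted)       # print() displays the escaped form instead

# Each escape sequence counts as a single character
print(nchar("\n"))  # 1
print(nchar(path))  # 13
```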