What is the substring() function in R?
R’s substring() function is helpful for preparing data for analysis. It can be used, for example, to convert text data into a better-structured format.
What is R’s substring() function used for?
R’s substring() is a built-in function that selects part of an existing string. You specify the starting and ending indices precisely, so you can isolate exactly the part of the string you need. The function has a variety of uses, from data cleaning to extracting specific information from unstructured text data. You can use it, for example, to extract postcodes from addresses or dates from timestamps.
substring() is useful in situations that require fine-grained control over the position and length of the selected substring. The function is frequently used in data analysis and for preparing text data for further processing.
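As a minimal sketch of the timestamp use case mentioned above, the following assumes timestamps in a fixed "YYYY-MM-DD HH:MM:SS" layout, so the date always occupies characters 1 to 10. The variable names and sample values are illustrative only.

```r
# Hypothetical timestamps in a fixed "YYYY-MM-DD HH:MM:SS" layout
timestamps <- c("2024-03-15 14:30:00", "2023-11-02 08:05:59")

# The date part occupies characters 1 to 10
dates <- substring(timestamps, 1, 10)

# The time part starts at character 12 and runs to the end of the string
times <- substring(timestamps, 12)

print(dates)
print(times)
```

Because the positions are hard-coded, this approach only works when every value follows the same layout. Variable-length data usually calls for pattern matching instead.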
What is the syntax of the substring() function in R?
substring() returns the extracted part of the string and takes the following parameters:
substring(text, first, last = 1000000L)
- text: The string (or character vector) that the substring will be extracted from
- first: The starting index of the substring you want to extract. Indexing is 1-based, so the first character has index 1
- last: The ending index (inclusive) of the substring you want to extract. It is optional and defaults to 1000000L, which in practice means the end of the string
Let’s look at an example.
original_string <- "data analysis"
result <- substring(original_string, 1, 4)
print(result)
In this example, we select the substring from index 1 to index 4 of the string "data analysis" and save it in the variable result. The output is "data".
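Two further properties of the call are worth a short sketch: the ending index can be left out, and the index arguments can be vectors. The sample strings here are illustrative only.

```r
original_string <- "data analysis"

# Omitting the ending index extracts everything from position 6 to the end
rest <- substring(original_string, 6)
print(rest)
# [1] "analysis"

# The index arguments can be vectors, which yields one substring per pair:
# (1, 3), (2, 4) and (3, 5)
pieces <- substring("abcdef", 1:3, 3:5)
print(pieces)
# [1] "abc" "bcd" "cde"
```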
What are some practical uses of R’s substring()?
When processing data sets, you often have to select, manipulate or extract certain parts of strings. You can use the substring() function in R to do this in different ways.
Extracting characters with substring()
You can save indices in variables and then pass them as arguments to substring().
# Original string
original_string <- "Data Science"
# Indices for extraction
start_index <- 6
end_index <- 12
# Using substring() for extraction
extracted_substring <- substring(original_string, start_index, end_index)
print(extracted_substring)
# Output: Science
This example selects the substring from position 6 to position 12 of the original string "Data Science". The variables start_index and end_index define the starting and ending points. The output shows the extracted substring, in this case "Science". The ending index is inclusive, meaning that the character at position 12 is included in the substring.
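The boundary behaviour of the index arguments can be sketched as follows. The variable names are illustrative; nchar() is the base R function that returns the length of a string.

```r
original_string <- "Data Science"

# nchar() gives the string length, so it can serve as the ending index
to_end <- substring(original_string, 6, nchar(original_string))

# An ending index beyond the string length is silently truncated
past_end <- substring(original_string, 6, 100)

# If the starting index is greater than the ending index, the result is ""
empty <- substring(original_string, 6, 5)

print(to_end)
print(past_end)
print(empty)
```

Using nchar() rather than a hard-coded 12 keeps the call correct if the input string changes length.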
Manipulating strings with the substring() function in R
First we’ll create a data frame df that contains IDs, ages and occupations. Then we’ll use the substring() function to insert a space after the first character of each string in the ID column.
# Creating a sample data frame
df <- data.frame(
  ID = c("01235", "02345", "04531"),
  Age = c(25, 30, 22),
  Occupation = c("Engineer", "Doctor", "Teacher")
)
# Inserting a space after the first character in the "ID" column
# sep = "" stops paste() from adding its default space between every argument
df$ID <- paste(substring(df$ID, 1, 1), " ", substring(df$ID, 2), sep = "")
# Displaying the modified data frame
print("Modified Data Frame:")
print(df)
In this example, substring() extracts the first digit of every ID (substring(df$ID, 1, 1)) and the rest of the digit sequence starting from the second position (substring(df$ID, 2)). The space is then inserted between these two substrings using R’s paste() function. The result is written back to the ID column of the data frame.
The output looks as follows:
[1] "Modified Data Frame:"
      ID Age Occupation
1 0 1235  25   Engineer
2 0 2345  30     Doctor
3 0 4531  22    Teacher
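The same split-and-rejoin idea can be wrapped in a small helper. The function name insert_at and its parameters are hypothetical and not part of base R. This sketch uses paste0(), which concatenates its arguments without a separator.

```r
# Hypothetical helper: insert `insert` before position `pos` of each string
insert_at <- function(x, pos, insert = " ") {
  paste0(substring(x, 1, pos - 1), insert, substring(x, pos))
}

ids <- c("01235", "02345", "04531")

# Insert a space before the second character of each ID
spaced_ids <- insert_at(ids, 2)
print(spaced_ids)
# [1] "0 1235" "0 2345" "0 4531"
```

Because substring() is vectorized, the helper works on a whole column at once, for example df$ID <- insert_at(df$ID, 2).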
If you want to learn more about working with strings in R, check out our R gsub() and sub() tutorial in our Digital Guide.