What are R’s gsub() and sub() functions?
R’s gsub() and sub() functions help with text manipulation and are easy to use and combine with other functions. They can be seamlessly integrated into data analyses and statistical calculations.
What do gsub() and sub() do in R?
R’s gsub() and sub() functions replace patterns in strings. sub(), short for ‘substitute’, finds the first instance of a pattern in a string and replaces it with another expression. This function only makes a single replacement per string. gsub() stands for ‘global substitute’: it finds all the instances of a pattern in a string and replaces each of them with another expression.
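To make the difference concrete, here is a minimal sketch that runs both functions on the same string. The sample text and variable names are made up for illustration.

```r
# Sample string invented for this comparison
text <- "red apple, red pear, red plum"

# sub() stops after the first match
first_only <- sub("red", "green", text)

# gsub() replaces every match
all_matches <- gsub("red", "green", text)

print(first_only)
# [1] "green apple, red pear, red plum"
print(all_matches)
# [1] "green apple, green pear, green plum"
```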
Both functions have broad applications in data cleaning and transformation. Their main purpose is to delete unwanted patterns and adapt text. They are especially important for text manipulation in statistical analyses and machine learning applications in R. For example, the functions can be used to extract certain patterns or transform data into the form necessary for an analysis.
What is the syntax of R’s gsub() and sub()?
R’s gsub() and sub() functions share the same syntax. Both functions take the following main parameters:
- pattern: The pattern you’re looking for, given as a string that R treats as a regular expression by default
- replacement: The expression the pattern should be replaced with
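- x: The character vector to search and replace in (other objects are coerced to character where possible, so for a data frame you pass an individual column)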
The structure of R’s gsub()
gsub(pattern, replacement, x)
The structure of R’s sub()
sub(pattern, replacement, x)
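Beyond the three main parameters, both functions accept optional arguments such as ignore.case and fixed. The sketch below shows how they change matching. The file names are invented for illustration.

```r
# Hypothetical file names; x can hold several strings at once
files <- c("report.v1.txt", "Report.v2.txt")

# fixed = TRUE treats the pattern as a literal string, so "." matches only a dot
literal_dots <- gsub(".", "_", files, fixed = TRUE)

# Without fixed = TRUE, "." is a regex wildcard and matches every character
wildcard_dots <- gsub(".", "_", files)

# ignore.case = TRUE matches "report" and "Report" alike
renamed <- sub("report", "summary", files, ignore.case = TRUE)

print(literal_dots)
# [1] "report_v1_txt" "Report_v2_txt"
print(renamed)
# [1] "summary.v1.txt" "summary.v2.txt"
```

Setting fixed = TRUE is a reasonable default whenever the pattern is plain text rather than a regular expression, because it avoids having to escape special characters.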
Examples for gsub() in R
The distinguishing feature of R’s gsub() is that it finds and replaces all instances of a pattern.
Deleting spaces
You can use gsub() to remove extra spaces from strings.
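sentence <- "  Data   science  is   powerful.  "
# Collapse each run of whitespace into one space, then trim both ends
clean_sentence <- trimws(gsub("\\s+", " ", sentence))
cat(clean_sentence)
This produces the output:
Data science is powerful.
The regular expression \\s+ matches one or more consecutive whitespace characters. In the above example, gsub() collapses each run of whitespace into a single space, and trimws() then removes the space left at the start and end of the sentence.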
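gsub() works on character vectors, so for tabular data it is usually applied to one column at a time. The following sketch assumes a hypothetical data frame called customers with a name column. Both names are invented for illustration.

```r
# Hypothetical data frame, made up for illustration
customers <- data.frame(
  name = c("  Ada   Lovelace ", "Alan    Turing"),
  stringsAsFactors = FALSE
)

# Pass the character column rather than the whole data frame;
# trimws() removes any space left at the start or end
customers$name <- trimws(gsub("\\s+", " ", customers$name))

print(customers$name)
# [1] "Ada Lovelace" "Alan Turing"
```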
Replacing phone numbers
R’s gsub() is also useful for anonymising or deleting private data such as phone numbers.
text <- "Contact us at 123-456-7890 for more information."
modified_text <- gsub("\\d{3}-\\d{3}-\\d{4}", "redacted phone number", text)
cat(modified_text)
Output:
Contact us at redacted phone number for more information.
In the above example, we match phone numbers with the regular expression \\d{3}-\\d{3}-\\d{4} and replace them with the string "redacted phone number".
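If part of the number should stay visible, a capture group and a backreference can keep it. This is a minimal sketch of one possible masking scheme; the "XXX-XXX-" prefix is an arbitrary choice for illustration.

```r
text <- "Contact us at 123-456-7890 for more information."

# The parentheses capture the last four digits;
# \\1 in the replacement inserts the captured digits again
masked <- gsub("\\d{3}-\\d{3}-(\\d{4})", "XXX-XXX-\\1", text)

cat(masked)
# Contact us at XXX-XXX-7890 for more information.
```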
Examples for sub() in R
If you just want to replace the first instance of a pattern, use R’s sub() function.
Replacing the first instance of a word
Let’s say we have a string with a repeated word and want to replace the first instance of that word.
text <- "Data science is powerful. Data analysis is fun."
result_sub <- sub("Data", "Information", text)
cat(result_sub)
The output looks as follows:
Information science is powerful. Data analysis is fun.
R’s sub() searches the text for the string "Data" and replaces the first instance it finds with "Information".
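When x contains several strings, sub() replaces the first match in each element, not just the first match in the whole vector. The sketch below illustrates this with an invented vector.

```r
# Sample vector invented for illustration
texts <- c("Data in, Data out", "Big Data", "No match here")

# The first "Data" in each element is replaced; elements without a match are unchanged
result <- sub("Data", "Information", texts)

print(result)
# [1] "Information in, Data out" "Big Information"          "No match here"
```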
Replacing numbers
We can also replace numbers with sub().
numeric_text <- "The cost is £1000. Please pay by 01/02/2024."
result <- sub("\\d+", "2000", numeric_text)
cat(result)
Output:
The cost is £2000. Please pay by 01/02/2024.
The regular expression \\d+ matches one or more digits. sub() replaces only the first group of digits in the text, so the date is left unchanged.
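Because sub() acts on the first match of the pattern, a more specific pattern lets you target a later number instead. As a sketch, the following replaces the date rather than the price. The new date is an arbitrary value chosen for illustration.

```r
numeric_text <- "The cost is £1000. Please pay by 01/02/2024."

# The pattern matches only the dd/mm/yyyy form, so the price is skipped
new_date <- sub("\\d{2}/\\d{2}/\\d{4}", "15/03/2024", numeric_text)

cat(new_date)
# The cost is £1000. Please pay by 15/03/2024.
```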