What are R’s gsub() and sub() functions?

R’s gsub() and sub() functions replace text that matches a pattern. Both are part of base R, so they work without additional packages and combine easily with other functions in data analyses and statistical calculations.

What do gsub() and sub() do in R?

R’s gsub() and sub() functions replace patterns in strings. sub(), short for ‘substitute’, finds the first instance of a pattern in a string and replaces it with another expression, so it makes at most one replacement per string. gsub() stands for ‘global substitute’ and finds all the instances of a pattern in a string and replaces each of them with another expression.
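
The difference is easiest to see when both functions run on the same input. The following is a minimal sketch; the sample string and variable names are made up for illustration.

```r
# Hypothetical sample string, chosen only for illustration
fruit <- "apple, apple, apple"

first_only  <- sub("apple", "pear", fruit)   # replaces the first match only
all_matches <- gsub("apple", "pear", fruit)  # replaces every match

print(first_only)   # "pear, apple, apple"
print(all_matches)  # "pear, pear, pear"
```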

Both functions have broad applications in data cleaning and transformation. Their main purpose is to delete unwanted patterns and adapt text. They are especially important for text manipulation in statistical analyses and machine learning applications in R. For example, the functions can be used to extract certain patterns or transform data into the form necessary for an analysis.
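
One common way to extract part of a string with sub() is to match the whole string and keep only a captured group. The sketch below assumes hypothetical records containing an "id=" field; the data and names are not from a real dataset.

```r
# Hypothetical records; the "id=" format is an assumption for this sketch
records <- c("user id=1042 active", "user id=7 inactive")

# The parentheses capture the digits, and "\\1" in the replacement refers
# to that capture. Matching ".*" on both sides discards everything else.
ids <- sub(".*id=(\\d+).*", "\\1", records)
print(ids)  # "1042" "7"
```

If the pattern does not match a string, sub() returns that string unchanged, so it is worth checking the result before converting it to numbers.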

What is the syntax of R’s gsub() and sub()?

The syntax of R’s gsub() and sub() functions is the same. Both functions take the following parameters:

  • pattern: The pattern you’re looking for, given as a character string that is interpreted as a regular expression by default
  • replacement: The string each match should be replaced with
  • x: The character vector to search in; other objects are coerced to character where possible

The structure of R’s gsub()

gsub(pattern, replacement, x)

The structure of R’s sub()

sub(pattern, replacement, x)
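
Beyond these three parameters, both functions accept a few optional arguments, including ignore.case and fixed. The sketch below uses made-up inputs to show how they change matching and that x can be a vector with several elements.

```r
# Hypothetical inputs for illustration
langs <- c("R is fun", "r is fast")

# ignore.case = TRUE lets "^r" match an upper- or lower-case first letter;
# the replacement is applied to every element of the vector
capitalised <- sub("^r", "R", langs, ignore.case = TRUE)
print(capitalised)  # "R is fun" "R is fast"

# fixed = TRUE treats the pattern as literal text, not a regular expression
literal_dot <- gsub(".", ",", "3.50", fixed = TRUE)
print(literal_dot)  # "3,50"

# Without fixed = TRUE, "." matches any character
regex_dot <- gsub(".", ",", "3.50")
print(regex_dot)    # ",,,,"
```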

Examples for gsub() in R

The distinguishing feature of R’s gsub() is that it finds and replaces all instances of a pattern.

Deleting spaces

You can use gsub() to remove extra spaces from strings.

sentence <- "  Data science  is  powerful.  "
# Collapse each run of whitespace into one space, then trim both ends
clean_sentence <- trimws(gsub("\\s+", " ", sentence))
cat(clean_sentence)

This produces the output:

Data science is powerful.

The regular expression \\s+ matches one or more consecutive whitespace characters, and gsub() replaces each such run with a single space. Because that still leaves one space at the start and end of the string, trimws() removes them.

Replacing phone numbers

R’s gsub() is also useful for anonymising or deleting private data such as phone numbers.

text <- "Contact us at 123-456-7890 for more information."
modified_text <- gsub("\\d{3}-\\d{3}-\\d{4}", "redacted phone number", text)
cat(modified_text)

Output:

Contact us at redacted phone number for more information.

In the above example, we match phone numbers of the form 123-456-7890 with the regular expression \\d{3}-\\d{3}-\\d{4} and replace each match with the string "redacted phone number".
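
If part of the number should stay visible, a capture group can keep it. The following is a sketch under the assumption that only the last four digits may be shown; the second phone number is invented for the example.

```r
# Hypothetical text with two numbers in the same format as above
text <- "Contact us at 123-456-7890 or 987-654-3210."

# (\\d{4}) captures the last four digits; "\\1" reinserts them
masked <- gsub("\\d{3}-\\d{3}-(\\d{4})", "***-***-\\1", text)
print(masked)  # "Contact us at ***-***-7890 or ***-***-3210."
```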

Examples for sub() in R

If you just want to replace the first instance of a pattern, use R’s sub() function.

Replacing the first instance of a word

Let’s say we have a string with a repeated word and want to replace the first instance of that word.

text <- "Data science is powerful. Data analysis is fun."
result_sub <- sub("Data", "Information", text)
cat(result_sub)

The output looks as follows:

Information science is powerful. Data analysis is fun.

R’s sub() searches the text for the string "Data" and replaces the first instance it finds with "Information".

Replacing numbers

We can also replace numbers with sub().

numeric_text <- "The cost is £1000. Please pay by 01/02/2024."
result <- sub("\\d+", "2000", numeric_text)
cat(result)

Output:

The cost is £2000. Please pay by 01/02/2024.

The regular expression \\d+ matches one or more digits. sub() replaces only the first group of digits in the text, so the date remains unchanged.
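
For comparison, here is what happens when gsub() is used with the same pattern on the same text. This sketch reuses the string from the example above and shows why sub() is the safer choice here.

```r
numeric_text <- "The cost is £1000. Please pay by 01/02/2024."

# gsub() replaces every group of digits, including the parts of the date
all_replaced <- gsub("\\d+", "2000", numeric_text)
cat(all_replaced)
# The cost is £2000. Please pay by 2000/2000/2000.
```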

