gsub: Replace Pattern Occurrences¶
Description¶
sub2 replaces the first pattern occurrence in each string with a given replacement string. gsub2 replaces all (i.e., ‘globally’) pattern matches.
Usage¶
sub2(x, pattern, replacement, ..., ignore_case = FALSE, fixed = FALSE)
gsub2(x, pattern, replacement, ..., ignore_case = FALSE, fixed = FALSE)
sub(
pattern,
replacement,
x,
...,
ignore.case = FALSE,
fixed = FALSE,
perl = FALSE,
useBytes = FALSE
)
gsub(
pattern,
replacement,
x,
...,
ignore.case = FALSE,
fixed = FALSE,
perl = FALSE,
useBytes = FALSE
)
Arguments¶
| x | character vector with strings whose chunks are to be modified |
| pattern | character vector of nonempty search patterns |
| replacement | character vector with the corresponding replacement strings; in |
| ... | further arguments to |
| ignore_case, ignore.case | single logical value; indicates whether matching should be case-insensitive |
| fixed | single logical value; |
| perl, useBytes | not used (with a warning if attempting to do so) [DEPRECATED] |
Details¶
Not to be confused with substr.
These functions are fully vectorised with respect to x, pattern, and replacement.
gsub2 uses vectorise_all=TRUE because of the attribute preservation rules; stri_replace_all should be called directly if different behaviour is needed.
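The element-wise pairing of x, pattern, and replacement can be emulated with base R for illustration. This is only a sketch: replace_all_elementwise is a hypothetical helper and not how stringx implements vectorisation, and the guarded gsub2 call assumes that the stringx package is installed.

```r
# Hypothetical helper: emulates element-wise replacement using base R only.
replace_all_elementwise <- function(x, pattern, replacement) {
    # mapply pairs x[i] with pattern[i] and replacement[i],
    # recycling the shorter arguments where necessary
    mapply(
        function(s, p, r) base::gsub(p, r, s),
        x, pattern, replacement,
        USE.NAMES = FALSE
    )
}

x <- c("aa", "bb")
res <- replace_all_elementwise(x, c("a", "b"), c("X", "Y"))
print(res)

# With stringx installed, gsub2 is expected to pair the arguments the same way.
if (requireNamespace("stringx", quietly = TRUE)) {
    print(stringx::gsub2(x, c("a", "b"), c("X", "Y")))
}
```

The emulation relies on base regular expressions, so it matches the ICU-based functions only for simple patterns such as the literals used here.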
The [DEPRECATED] sub and [DEPRECATED] gsub simply call sub2 and gsub2, which have a cleaned-up argument list. Additionally, if fixed=FALSE, the back-references in replacement strings are converted to those accepted by the ICU regex engine.
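To make the idea of converting back-references concrete, here is a minimal sketch in base R. The helper to_icu_backrefs is hypothetical and is not the conversion routine used by the package; it assumes that the replacement string contains neither a literal dollar sign nor an escaped backslash.

```r
# Hypothetical helper (not part of stringx): rewrites \1..\9 as $1..$9.
# Assumption: no literal "$" and no escaped backslashes in the input.
to_icu_backrefs <- function(replacement) {
    # the regex \\([1-9]) matches a backslash followed by a single digit
    base::gsub("\\\\([1-9])", "$\\1", replacement)
}

converted <- to_icu_backrefs("\\2-\\1")
print(converted)

# base R accepts the original, backslash-based form
base_result <- base::sub("(a)(b)", "\\2-\\1", "abc")
print(base_result)

# With stringx installed, the converted form can be passed to sub2.
if (requireNamespace("stringx", quietly = TRUE)) {
    print(stringx::sub2("abc", "(a)(b)", converted))
}
```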
Value¶
Both functions return a character vector. They preserve the attributes of the longest inputs (unless they are dropped due to coercion).
Differences from Base R¶
Replacements for base sub and gsub implemented with stri_replace_first and stri_replace_all, respectively.
- there are inconsistencies between the argument order and naming in grepl, strsplit, and startsWith (amongst others); e.g., where the needle can precede the haystack, the use of the forward pipe operator, |>, is less convenient [fixed here]
- base R implementation is not portable as it is based on the system PCRE or TRE library (e.g., some Unicode classes may not be available or matching thereof can depend on the current LC_CTYPE category) [fixed here]
- not suitable for natural language processing [fixed here; use fixed=NA]
- two different regular expression libraries are used (and historically, ERE was used in place of TRE) [here, only the ICU Java-like regular expression engine is available, hence the perl argument has no meaning]
- not vectorised w.r.t. pattern and replacement [fixed here]
- only 9 (unnamed) back-references can be referred to in the replacement strings [fixed in sub2 and gsub2]
- perl=TRUE supports \U, \L, and \E in the replacement strings [not available here]
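Two of the base R behaviours listed here can be reproduced with base R alone. The sketch below assumes R 4.1 or later (for the native pipe); the input strings are made up for illustration.

```r
x <- c("spam", "bacon spam")

# The haystack is the third positional argument of base gsub, so piping
# into it requires naming pattern and replacement explicitly.
piped <- x |> base::gsub(pattern = "spam", replacement = "eggs")
print(piped)

# Case conversion in replacement strings works in base R with perl=TRUE.
upper <- base::gsub("(spam)", "\\U\\1", x, perl = TRUE)
print(upper)
```

With sub2 and gsub2, the haystack comes first, so the pipe can be used without naming the remaining arguments, as in the first example in the Examples section.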
See Also¶
The official online manual of stringx at https://stringx.gagolewski.com/
Related function(s): paste, nchar, grepl2, gregexpr2, gregextr2, strsplit, gsubstr
trimws for removing whitespaces (amongst others) from the start or end of strings
Examples¶
"change \U0001f602 me \U0001f603" |> gsub2("\\p{L}+", "O_O")
## [1] "O_O 😂 O_O 😃"
x <- c("mario", "Mario", "M\u00E1rio", "M\u00C1RIO", "Mar\u00EDa", "Rosario", NA)
sub2(x, "mario", "M\u00E1rio", fixed=NA, strength=1L)
## [1] "Mário" "Mário" "Mário" "Mário" "María" "Rosario" NA
sub2(x, "mario", "Mario", fixed=NA, strength=2L)
## [1] "Mario" "Mario" "Mário" "MÁRIO" "María" "Rosario" NA
x <- "abcdefghijklmnopqrstuvwxyz"
p <- "(.)(.)(.)(.)(.)(.)(.)(.)(.)(.)(.)(.)(.)"
base::sub(p, "\\1\\9", x)
## [1] "ainopqrstuvwxyz"
base::gsub(p, "\\1\\9", x)
## [1] "ainv"
base::gsub(p, "\\1\\9", x, perl=TRUE)
## [1] "ainv"
base::gsub(p, "\\1\\13", x)
## [1] "aa3nn3"
sub2(x, p, "$1$13")
## [1] "amnopqrstuvwxyz"
gsub2(x, p, "$1$13")
## [1] "amnz"