gregextr: Extract Pattern Occurrences
Description
regextr2 and gregextr2 extract, respectively, the first and all (i.e., globally) occurrences of a pattern. Their replacement versions substitute the matching substrings with new content.
Usage
regextr2(
x,
pattern,
...,
ignore_case = FALSE,
fixed = FALSE,
capture_groups = FALSE
)
gregextr2(
x,
pattern,
...,
ignore_case = FALSE,
fixed = FALSE,
capture_groups = FALSE
)
regextr2(x, pattern, ..., ignore_case = FALSE, fixed = FALSE) <- value
gregextr2(x, pattern, ..., ignore_case = FALSE, fixed = FALSE) <- value
Arguments
| x | character vector whose elements are to be examined |
| pattern | character vector of nonempty search patterns |
| ... | further arguments to |
| ignore_case | single logical value; indicates whether matching should be case-insensitive |
| fixed | single logical value; |
| capture_groups | single logical value; whether matches to individual capture groups should be extracted separately |
| value | character vector (for |
Details
Convenience functions based on gregexpr2 and gsubstrl (amongst others). Provided as pipe operator-friendly alternatives to [DEPRECATED] regmatches and [DEPRECATED] strcapture.
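As a rough illustration of the "pipe operator-friendly" remark, the sketch below contrasts the base R idiom with a piped call. It assumes that the stringx package is installed and that R is at least version 4.1 (for the native `|>` operator); the sample strings are borrowed from the Examples section and the simplified pattern `acac?` is an illustrative choice.

```r
x <- c("acacaca", "gaca", "actgggca")

# Base R: the match data are computed first and then passed alongside x,
# which makes the call awkward to place inside a pipeline.
base_first <- base::regmatches(x, base::regexpr("acac?", x))
print(base_first)   # elements without a match are dropped by regmatches

# stringx: x is the first argument, so the call slots into a pipeline.
piped_first <- x |> stringx::regextr2("acac?")
print(piped_first)  # one element per input; a no-match is reported as NA
```

The namespace-qualified calls are used here only to keep the base and stringx functions apart; attaching the package with library(stringx) is the more usual approach.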
They are fully vectorised with respect to x, pattern, and value.
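A minimal sketch of what vectorisation over x and pattern could look like in practice, assuming stringx is installed. The input strings and patterns are made up for illustration, and the recycling of a length-one x is assumed to follow the usual R rules.

```r
x <- c("apple 12", "banana 345")

# Elementwise pairing: x[1] is searched with the first pattern,
# x[2] with the second one.
paired <- stringx::regextr2(x, c("[a-z]+", "[0-9]+"))
print(paired)

# A single string is recycled against each pattern in turn.
recycled <- stringx::regextr2("apple 12", c("[a-z]+", "[0-9]+"))
print(recycled)
```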
Note that, unlike in gsub2, each substituted chunk can be replaced with different content. However, references to matches to capture groups cannot be made.
Value
If capture_groups is FALSE, regextr2 returns a character vector and gregextr2 gives a list of character vectors.
Otherwise, regextr2 returns a list of character vectors, giving the whole match as well as matches to the individual capture groups. In gregextr2, each list element will be a matrix with as many columns as there are matches.
Missing values in the inputs are propagated consistently. In regextr2, a no-match is always denoted with NA (or a series thereof). In gregextr2, the corresponding result is empty (unless we mean a no-match to an optional capture group within a matching substring). Note that this function distinguishes between a missing input and a no-match.
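The distinction between a missing input and a no-match can be checked programmatically. The following is a small sketch, assuming stringx is installed; the element names and the simplified pattern `acac?` are illustrative choices, not part of the package.

```r
x <- c(hit = "gaca", miss = "actgggca", missing = NA)
res <- stringx::gregextr2(x, "acac?")

# Number of extracted items per input string:
# a no-match yields an empty vector, a missing input yields a single NA.
n_items <- lengths(res)
print(n_items)

# TRUE only for the element that came from a missing input.
is_missing <- vapply(res, function(m) length(m) == 1L && is.na(m), logical(1))
print(is_missing)
```

Testing the length first and the NA second avoids confusing an empty result with a missing one, since is.na applied to character(0) returns an empty logical vector.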
Their replacement versions return a character vector.
These functions preserve the attributes of the longest inputs (unless they are dropped due to coercion).
See Also
The official online manual of stringx at https://stringx.gagolewski.com/
Related function(s): paste, nchar, strsplit, gsub2, grepl2, gregexpr2, gsubstrl
Examples
x <- c(aca1="acacaca", aca2="gaca", noaca="actgggca", na=NA)
regextr2(x, "(?<x>a)(?<y>cac?)")
## aca1 aca2 noaca na
## "acac" "aca" NA NA
gregextr2(x, "(?<x>a)(?<y>cac?)")
## $aca1
## [1] "acac" "aca"
##
## $aca2
## [1] "aca"
##
## $noaca
## character(0)
##
## $na
## [1] NA
regextr2(x, "(?<x>a)(?<y>cac?)", capture_groups=TRUE)
## $aca1
## x y
## "acac" "a" "cac"
##
## $aca2
## x y
## "aca" "a" "ca"
##
## $noaca
## x y
## NA NA NA
##
## $na
## x y
## NA NA NA
gregextr2(x, "(?<x>a)(?<y>cac?)", capture_groups=TRUE)
## $aca1
## [,1] [,2]
## "acac" "aca"
## x "a" "a"
## y "cac" "ca"
##
## $aca2
## [,1]
## "aca"
## x "a"
## y "ca"
##
## $noaca
##
##
## x
## y
##
## $na
## [,1]
## NA
## x NA
## y NA
# substitution - note the different replacement strings:
`gregextr2<-`(x, "(?<x>a)(?<y>cac?)", value=list(c("!", "?"), "#"))
## aca1 aca2 noaca na
## "!?" "g#" "actgggca" NA
# references to capture groups can only be used in gsub and sub:
gsub2(x, "(?<x>a)(?<y>cac?)", "{$1}{$2}")
## aca1 aca2 noaca na
## "{a}{cac}{a}{ca}" "g{a}{ca}" "actgggca" NA
regextr2(x, "(?<x>a)(?<y>cac?)") <- "\U0001D554\U0001F4A9"
print(x) # x was modified 'in-place'
## aca1 aca2 noaca na
## "𝕔💩aca" "g𝕔💩" "actgggca" NA