gregextr: Extract Pattern Occurrences¶
Description¶
regextr2 and gregextr2 extract, respectively, the first and all (i.e., globally) occurrences of a pattern. Their replacement versions substitute the matching substrings with new content.
Usage¶
regextr2(
x,
pattern,
...,
ignore_case = FALSE,
fixed = FALSE,
capture_groups = FALSE
)
gregextr2(
x,
pattern,
...,
ignore_case = FALSE,
fixed = FALSE,
capture_groups = FALSE
)
regextr2(x, pattern, ..., ignore_case = FALSE, fixed = FALSE) <- value
gregextr2(x, pattern, ..., ignore_case = FALSE, fixed = FALSE) <- value
Arguments¶
| x | character vector whose elements are to be examined |
| pattern | character vector of nonempty search patterns |
| ... | further arguments to |
| ignore_case | single logical value; indicates whether matching should be case-insensitive |
| fixed | single logical value; |
| capture_groups | single logical value; whether matches to individual capture groups should be extracted separately |
| value | character vector (for |
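As a rough illustration of the ignore_case and fixed switches, consider the sketch below. The input vector is made up for this example, and fixed=TRUE is taken here to mean literal (non-regex) matching.

```r
library(stringx)

# hypothetical input, for illustration only
x <- c("Spam", "spam.", "s.p.a.m")

# case-insensitive regex search: "Spam" and "spam" both qualify
first_spam <- regextr2(x, "spam", ignore_case=TRUE)

# fixed=TRUE: the dot is matched literally, not as "any character"
dots <- gregextr2(x, ".", fixed=TRUE)
lengths(dots)  # number of literal dots found in each string
```

With the default fixed=FALSE, the pattern "." would instead match every single character of each string.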
Details¶
Convenience functions based on gregexpr2 and gsubstrl (amongst others). Provided as pipe operator-friendly alternatives to [DEPRECATED] regmatches and [DEPRECATED] strcapture.
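What "pipe operator-friendly" means in practice can be sketched as follows. This assumes R 4.1 or later for the native |> operator; the input vector is hypothetical, and the base functions are called with an explicit base:: prefix so that the comparison does not depend on which packages are attached.

```r
library(stringx)

# hypothetical input, for illustration only
x <- c("order 66", "no digits", "route 101 and 202")

# base R: the subject has to be passed twice, inside nested calls
codes_base <- base::regmatches(x, base::gregexpr("[0-9]+", x))

# stringx: the subject comes first, so the call chains with the native pipe
codes <- x |> gregextr2("[0-9]+")
```

Because x is the first argument, the extraction step can sit in the middle of a longer pipeline without any temporary variables.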
They are fully vectorised with respect to x, pattern, and value.
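A minimal sketch of this vectorisation, using made-up inputs (the variable names are illustrative only):

```r
library(stringx)

x <- c("a1b22", "c333d")  # hypothetical input

# one pattern per element of x
per_elem <- regextr2(x, c("[0-9]+", "[a-z]+$"))

# a single string searched with several patterns (x is recycled)
per_pat <- regextr2("a1b22", c("[0-9]+", "[a-z]"))

# one replacement string per element of x
y <- x
regextr2(y, "[0-9]+") <- c("<one>", "<two>")
```

In each call the shorter arguments are recycled to the length of the longest one, in the usual R manner.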
Note that, unlike in gsub2, each substituted chunk can be replaced with different content. However, references to matches to capture groups cannot be made.
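Since capture group references are unavailable here, one possible workaround is to extract the matches first, transform them with ordinary R code, and write the results back. The sketch below assumes that every element of the (hypothetical) input has at least one match, so that each replacement vector has exactly as many items as there are matches.

```r
library(stringx)

x <- c("a-b-c", "d-e")  # hypothetical input; every element has a match

# extract all matches, transform each one, and write the results back
hits <- gregextr2(x, "[a-z]")
gregextr2(x, "[a-z]") <- lapply(hits, toupper)
print(x)
```

Any function mapping a character vector to one of the same length could take the place of toupper.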
Value¶
If capture_groups is FALSE, regextr2 returns a character vector and gregextr2 gives a list of character vectors.
Otherwise, regextr2 returns a list of character vectors, each giving the whole match as well as the matches to the individual capture groups. gregextr2 returns a list of character matrices, each with as many columns as there are matches; the first row gives the whole matches and the remaining rows correspond to the capture groups.
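When named capture groups are used, the individual components can be picked out by name. A small sketch with a hypothetical date-like pattern (the inputs and the group names year and month are made up for this example):

```r
library(stringx)

# hypothetical input and pattern, for illustration only
x <- c("2021-03-14", "n/a", "1999-12-31")
p <- "(?<year>[0-9]{4})-(?<month>[0-9]{2})"

# regextr2: one named character vector per element of x
m <- regextr2(x, p, capture_groups=TRUE)
years <- sapply(m, function(v) v[["year"]])

# gregextr2: one matrix per element; rows are named after the groups
g <- gregextr2("2021-03 and 2022-04", p, capture_groups=TRUE)
all_years <- g[[1]]["year", ]
```

The whole match carries an empty name, so it is best accessed by position, e.g., v[[1]] or g[[1]][1, ].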
Missing values in the inputs are propagated consistently. In regextr2, a no-match is always denoted with NA (or a series thereof). In gregextr2, the corresponding result is empty (unless it is a no-match to an optional capture group within a matching substring). Note that gregextr2 therefore distinguishes between a missing input and a no-match, whereas regextr2 yields NA in both cases.
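The difference between a missing input and a no-match can be checked programmatically. A minimal sketch, with a made-up input vector and an illustrative helper variable is_missing:

```r
library(stringx)

x <- c("abc123", "abc", NA)  # hypothetical input: match, no-match, missing

# regextr2: both the no-match and the missing input yield NA
first <- regextr2(x, "[0-9]+")

# gregextr2: a no-match gives an empty vector, a missing input gives NA
res <- gregextr2(x, "[0-9]+")
n_items <- lengths(res)
is_missing <- vapply(res, function(m) length(m) == 1L && is.na(m), logical(1))
```

Here n_items is 0 only for the no-match case, and is_missing is TRUE only for the NA input.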
Their replacement versions return a character vector.
These functions preserve the attributes of the longest inputs (unless they are dropped due to coercion).
See Also¶
The official online manual of stringx at https://stringx.gagolewski.com/
Related function(s): paste, nchar, strsplit, gsub2, grepl2, gregexpr2, gsubstrl
Examples¶
x <- c(aca1="acacaca", aca2="gaca", noaca="actgggca", na=NA)
regextr2(x, "(?<x>a)(?<y>cac?)")
## aca1 aca2 noaca na
## "acac" "aca" NA NA
gregextr2(x, "(?<x>a)(?<y>cac?)")
## $aca1
## [1] "acac" "aca"
##
## $aca2
## [1] "aca"
##
## $noaca
## character(0)
##
## $na
## [1] NA
regextr2(x, "(?<x>a)(?<y>cac?)", capture_groups=TRUE)
## $aca1
##             x      y
## "acac"    "a"  "cac"
##
## $aca2
##           x     y
## "aca"   "a"  "ca"
##
## $noaca
##     x  y
## NA NA NA
##
## $na
##     x  y
## NA NA NA
gregextr2(x, "(?<x>a)(?<y>cac?)", capture_groups=TRUE)
## $aca1
##   [,1]   [,2]
##   "acac" "aca"
## x "a"    "a"
## y "cac"  "ca"
##
## $aca2
##   [,1]
##   "aca"
## x "a"
## y "ca"
##
## $noaca
##
##
## x
## y
##
## $na
##   [,1]
##   NA
## x NA
## y NA
# substitution - note the different replacement strings:
`gregextr2<-`(x, "(?<x>a)(?<y>cac?)", value=list(c("!", "?"), "#"))
## aca1 aca2 noaca na
## "!?" "g#" "actgggca" NA
# references to capture groups can only be used in gsub and sub:
gsub2(x, "(?<x>a)(?<y>cac?)", "{$1}{$2}")
## aca1 aca2 noaca na
## "{a}{cac}{a}{ca}" "g{a}{ca}" "actgggca" NA
regextr2(x, "(?<x>a)(?<y>cac?)") <- "\U0001D554\U0001F4A9"
print(x) # x was modified 'in-place'
## aca1 aca2 noaca na
## "𝕔💩aca" "g𝕔💩" "actgggca" NA