substr: Extract or Replace Substrings¶
Description¶
substr and substrl extract contiguous parts of given character strings. The former operates on start and end positions, whereas the latter takes substring lengths.
Their replacement versions allow for substituting parts of strings with new content.
gsubstr and gsubstrl allow for extracting or replacing multiple chunks from each string.
Usage¶
substr(x, start = 1L, stop = -1L)
substrl(
    x,
    start = 1L,
    length = attr(start, "match.length"),
    ignore_negative_length = FALSE
)
substr(x, start = 1L, stop = -1L) <- value
substrl(x, start = 1L, length = attr(start, "match.length")) <- value
gsubstr(x, start = list(1L), stop = list(-1L))
gsubstrl(
    x,
    start = list(1L),
    length = lapply(start, attr, "match.length"),
    ignore_negative_length = TRUE
)
gsubstr(x, start = list(1L), stop = list(-1L)) <- value
gsubstrl(x, start = list(1L), length = lapply(start, attr, "match.length")) <- value
substring(text, first = 1L, last = -1L)
substring(text, first = 1L, last = -1L) <- value
Arguments¶
| x, text | character vector whose parts are to be extracted/replaced |
| start, first | numeric vector (for …) |
| stop, last | numeric vector (for …) |
| length | numeric vector (for …) |
| ignore_negative_length | single logical value; whether negative lengths should be ignored or yield missing values |
| value | character vector (for …) |
Details¶
Not to be confused with sub.
substring is a [DEPRECATED] synonym for substr.
Note that these functions can break some meaningful Unicode code point sequences, e.g., when inputs are not normalised. For extracting initial parts of strings based on character width, see strtrim.
Note that gsubstr (and related functions) expect start, stop, length, and value to be lists. Non-list arguments will be converted by calling as.list. This is different from the default policy applied by stri_sub_all, which calls list.
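The practical difference between the two conversion policies can be seen with base R alone. The following is a minimal sketch (the variable name start is only illustrative): as.list splits a vector into one list element per value, whereas list wraps the whole vector as a single element.

```r
start <- c(1, 13)

# as.list: one list element per value
str(as.list(start))

# list: the whole vector becomes a single list element
str(list(start))

n_as_list <- length(as.list(start))  # 2 elements
n_list <- length(list(start))        # 1 element
```

Under the as.list policy, a non-list numeric vector therefore presumably supplies one index per list element rather than several indexes for a single string; wrapping the vector in list() explicitly is the unambiguous way to request several chunks of the same string.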
Note that substrl and gsubstrl are interoperable with regexpr2 and gregexpr2, respectively, and hence can be considered substitutes for the [DEPRECATED] regmatches (which is more specialised).
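For context, here is a sketch in base R only of the bookkeeping that a length-based interface removes. base::regexpr reports the start of the first match and stores its length in the match.length attribute, so the end position has to be computed by hand before base::substr can be called. The variable names are illustrative.

```r
x <- "spam, spam, bacon, and spam"

# base R: start position of the first match, with a "match.length" attribute
m <- base::regexpr("bacon", x, fixed = TRUE)
len <- attr(m, "match.length")

# base substr needs an end position, so convert start + length to stop:
first_match <- base::substr(x, m, m + len - 1)
print(first_match)
## [1] "bacon"
```

Because the default length argument of substrl is attr(start, "match.length"), a match object carrying that attribute can be passed as start directly, without this conversion.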
Value¶
substr and substrl return a character vector (in UTF-8). gsubstr and gsubstrl return a list of character vectors.
Their replacement versions modify x ‘in-place’ (see Examples).
The attributes are copied from the longest arguments (similar to binary operators).
Differences from Base R¶
Replacements for and enhancements of base substr and substring implemented with stri_sub and stri_sub_all:
- substring is “for compatibility with S”, but this should no longer matter [here, substring is equivalent to substr; in a future version, using the former may result in a warning]
- substr is not vectorised with respect to all the arguments (and substring is not fully vectorised wrt value) [fixed here]
- not all attributes are taken from the longest of the inputs [fixed here]
- partial recycling with no warning [fixed here]
- the replacement must be of the same length as the chunk being substituted [fixed here]
- negative indexes are silently treated as 1 [changed here: negative indexes count from the end of the string]
- replacement of different length than the extracted substring never changes the length of the string [changed here: output length is input length minus length of extracted plus length of replacement]
- regexpr (amongst others) return start positions and lengths of matches, but base substr only uses start and end [fixed by introducing substrl]
- there is no function to extract or replace multiple chunks in each string (other than regmatches that works on outputs generated by gregexpr et al.) [fixed by introducing gsubstrl]
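The length rule stated for replacements can be checked by hand. The following is a sketch in base R only; replace_chunk is a hypothetical helper written for this illustration (it is not part of stringx or base R) that splices a replacement between the text before and after the chunk. The last call shows the base R treatment of a negative start for comparison.

```r
x <- "spam, spam, bacon, and spam"

# hypothetical helper: replace characters start..stop (1-based, inclusive,
# positive indexes only) with value, letting the string grow or shrink
replace_chunk <- function(x, start, stop, value) {
    paste0(
        base::substr(x, 1, start - 1),             # text before the chunk
        value,                                     # the replacement
        base::substr(x, stop + 1, base::nchar(x))  # text after the chunk
    )
}

y <- replace_chunk(x, 1, 4, "porridge")
print(y)
## [1] "porridge, spam, bacon, and spam"

# output length = input length - chunk length + replacement length
base::nchar(y) == base::nchar(x) - 4 + base::nchar("porridge")
## [1] TRUE

# base R, by contrast, silently treats a negative start as 1:
base::substr(x, -3, 4)
## [1] "spam"
```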
See Also¶
The official online manual of stringx at https://stringx.gagolewski.com/
Related function(s): strtrim, nchar, startsWith, endsWith, gregexpr
Examples¶
x <- "spam, spam, bacon, and spam"
base::substr(x, c(1, 13), c(4, 17))
## [1] "spam"
base::substring(x, c(1, 13), c(4, 17))
## [1] "spam" "bacon"
substr(x, c(1, 13), c(4, 17))
## [1] "spam" "bacon"
substrl(x, c(1, 13), c(4, 5))
## [1] "spam" "bacon"
# replacement function used as an ordinary one - return a copy of x:
base::`substr<-`(x, 1, 4, value="jam")
## [1] "jamm, spam, bacon, and spam"
`substr<-`(x, 1, 4, value="jam")
## [1] "jam, spam, bacon, and spam"
base::`substr<-`(x, 1, 4, value="porridge")
## [1] "porr, spam, bacon, and spam"
`substr<-`(x, 1, 4, value="porridge")
## [1] "porridge, spam, bacon, and spam"
# interoperability with gregexpr2:
p <- "[\\w&&[^a]][\\w&&[^n]][\\w&&[^d]]\\w+" # regex: all words but 'and'
gsubstrl(x, gregexpr2(x, p))
## [[1]]
## [1] "spam" "spam" "bacon" "spam"
`gsubstrl<-`(x, gregexpr2(x, p), value=list(c("a", "b", "c", "d")))
## [1] "a, b, c, and d"
# replacement function modifying x in-place:
substr(x, 1, 4) <- "eggs"
substr(x, 1, 0) <- "porridge, " # prepend (stop<start, i.e., an empty chunk)
substr(x, nchar(x)+1) <- " every day" # append (start is past the end of x)
print(x)
## [1] "porridge, eggs, spam, bacon, and spam every day"