sort: Sort Strings¶
Description¶
The sort method for objects of class character (sort.character) uses the locale-sensitive Unicode collation algorithm to arrange strings in a vector with respect to a chosen lexicographic order.
xtfrm2 and [DEPRECATED] xtfrm generate an integer vector that sorts in the same way as its input, and hence can be used in conjunction with order or rank.
Usage¶
xtfrm2(x, ...)
## Default S3 method:
xtfrm2(x, ...)
## S3 method for class 'character'
xtfrm2(
x,
...,
locale = NULL,
strength = 3L,
alternate_shifted = FALSE,
french = FALSE,
uppercase_first = NA,
case_level = FALSE,
normalisation = FALSE,
numeric = FALSE
)
xtfrm(x)
## Default S3 method:
xtfrm(x)
## S3 method for class 'character'
xtfrm(x)
## S3 method for class 'character'
sort(
x,
...,
decreasing = FALSE,
na.last = NA,
locale = NULL,
strength = 3L,
alternate_shifted = FALSE,
french = FALSE,
uppercase_first = NA,
case_level = FALSE,
normalisation = FALSE,
numeric = FALSE
)
Arguments¶
| x | character vector whose elements are to be sorted |
| ... | further arguments passed to other methods |
| | single logical value; if |
| | single logical value; if |
Details¶
What ‘xtfrm’ stands for the current author does not know, but would appreciate someone’s enlightening him.
Value¶
sort.character returns a character vector, with only the names attribute preserved. Note that the output vector may be shorter than the input one.
xtfrm2.character and xtfrm.character return an integer vector; most attributes are preserved.
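
The return-value rules for sort can be illustrated with a minimal sketch on a made-up named vector. It calls sort generically: with stringx attached, the call should dispatch to sort.character; without it, base R's default method runs and, for this input, handles names and missing values in the same way. The attribute name note is hypothetical and serves only to show that attributes other than names are not carried over.

```r
# A named character vector with one missing value and an extra attribute.
x <- c(first = "b", second = NA, third = "a")
attr(x, "note") <- "hypothetical attribute, for illustration only"

y <- sort(x)               # default na.last = NA removes missing values
length(x)                  # 3
length(y)                  # 2: the output is shorter than the input
names(y)                   # "third" "first": names travel with their elements
is.null(attr(y, "note"))   # TRUE: attributes other than names are not kept

z <- sort(x, na.last = TRUE)   # keep the missing value and place it last
z                              # "a" "b" NA, named third, first, second
```

Passing na.last = TRUE (or FALSE) is the way to obtain an output of the same length as the input.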
Differences from Base R¶
Replacements for the default S3 methods sort and xtfrm for character vectors, implemented with stri_sort and stri_rank.
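
The two underlying stringi functions can also be tried directly. The following is only a sketch: it assumes that a recent version of the stringi package is installed, the input vector is made up, and it does not claim to show how stringx calls these functions internally.

```r
# Assumption: stringi is installed; the block does nothing otherwise.
if (requireNamespace("stringi", quietly = TRUE)) {
    ids <- c("a1", "a100", "a10", "a2")   # hypothetical input

    # numeric = TRUE is forwarded to the ICU collator options
    # (see stri_opts_collator), so runs of digits compare by numeric value.
    sorted <- stringi::stri_sort(ids, numeric = TRUE)
    ranks  <- stringi::stri_rank(ids, numeric = TRUE)

    print(sorted)   # "a1" "a2" "a10" "a100"
    print(ranks)    # 1 4 3 2
}
```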
- Collation in different locales is difficult and non-portable across platforms [fixed here, using services provided by ICU]
- Overloading xtfrm.character has no effect in R, because S3 method dispatch is done internally with hard-coded support for character arguments. Thus, we needed to replace the generic xtfrm with one that calls UseMethod [fixed here]
- xtfrm does not support customisation of the linear ordering relation it is based upon [fixed by introducing the ... argument to the new generic, xtfrm2]
- Neither order, rank, nor sort.list is a generic, therefore they would have to be rewritten from scratch to allow the inclusion of our patches; interestingly, order even calls xtfrm, but only for classed objects [not fixed here; see Examples for a workaround]
- xtfrm for objects of type character does not preserve the names attribute (but does so for numeric) [fixed here]
- sort seems to preserve only the names attribute, which makes sense if na.last is NA, because the resulting vector might be shorter [not fixed here as it would break compatibility with other sorting methods]
- Note that sort by default removes missing values altogether, whereas order has na.last=TRUE [not fixed here as it would break compatibility with other sorting methods]
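
The difference in missing-value handling between sort and order is easy to check. The sketch below uses a made-up vector and base functions only, so it runs without stringx; according to the notes in this section, the stringx method keeps the same defaults.

```r
x <- c("b", NA, "a")   # hypothetical input with one missing value

sort(x)                    # "a" "b": the missing value is dropped (na.last = NA)
order(x)                   # 3 1 2: the missing value goes last (na.last = TRUE)
x[order(x)]                # "a" "b" NA

# Aligning the two defaults explicitly:
sort(x, na.last = TRUE)    # "a" "b" NA, same as x[order(x)]
x[order(x, na.last = NA)]  # "a" "b", same as sort(x)
```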
See Also¶
The official online manual of stringx at https://stringx.gagolewski.com/
Related function(s): strcoll
Examples¶
x <- c("a1", "a100", "a101", "a1000", "a10", "a10", "a11", "a99", "a10", "a1")
base::sort.default(x) # lexicographic sort
## [1] "a1" "a1" "a10" "a10" "a10" "a100" "a1000" "a101" "a11"
## [10] "a99"
sort(x, numeric=TRUE) # calls stringx:::sort.character
## [1] "a1" "a1" "a10" "a10" "a10" "a11" "a99" "a100" "a101"
## [10] "a1000"
xtfrm2(x, numeric=TRUE) # calls stringx:::xtfrm2.character
## [1] 1 8 9 10 3 3 6 7 3 1
rank(xtfrm2(x, numeric=TRUE), ties.method="average") # ranks with averaged ties
## [1] 1.5 8.0 9.0 10.0 4.0 4.0 6.0 7.0 4.0 1.5
order(xtfrm2(x, numeric=TRUE)) # ordering permutation
## [1] 1 10 5 6 9 7 8 2 3 4
x[order(xtfrm2(x, numeric=TRUE))] # equivalent to sort()
## [1] "a1" "a1" "a10" "a10" "a10" "a11" "a99" "a100" "a101"
## [10] "a1000"
# order a data frame w.r.t. decreasing ids and increasing vals
d <- data.frame(vals=round(runif(length(x)), 1), ids=x)
d[order(-xtfrm2(d[["ids"]], numeric=TRUE), d[["vals"]]), ]
##    vals   ids
## 4   0.9 a1000
## 3   0.4  a101
## 2   0.8  a100
## 8   0.9   a99
## 7   0.5   a11
## 6   0.0   a10
## 9   0.6   a10
## 5   0.9   a10
## 1   0.3    a1
## 10  0.5    a1
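
Since order calls xtfrm for classed objects, another conceivable workaround is to wrap the identifiers in a small S3 class that has its own xtfrm method. The sketch below is not part of stringx: the class name natid and the helper as_natid are hypothetical, and the method merely strips one leading letter and compares the remaining digits, which is far cruder than ICU collation. It uses base R only and assumes it is run at top level, so that the method is visible to S3 dispatch.

```r
# Hypothetical S3 class: identifiers made of one letter followed by digits.
as_natid <- function(x) structure(x, class = "natid")

# xtfrm method: a numeric vector that sorts like the numeric suffix.
# unclass() avoids dispatching on "natid" again inside sub().
xtfrm.natid <- function(x) as.numeric(sub("^[A-Za-z]", "", unclass(x)))

ids <- c("a10", "a9", "a100")

order(ids)                  # plain character vector: xtfrm.natid is not used
o <- order(as_natid(ids))   # classed object: order() consults xtfrm.natid
o                           # 2 1 3
ids[o]                      # "a9" "a10" "a100"
```

Calling xtfrm2 explicitly, as in the examples above, remains the more general route because it exposes the collator options.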