strcoll: Compare Strings#

Description#

These functions provide means to compare strings in any locale using the Unicode collation algorithm.

Usage#

strcoll(
  e1,
  e2,
  locale = NULL,
  strength = 3L,
  alternate_shifted = FALSE,
  french = FALSE,
  uppercase_first = NA,
  case_level = FALSE,
  normalisation = FALSE,
  numeric = FALSE
)

e1 %x<% e2

e1 %x<=% e2

e1 %x==% e2

e1 %x!=% e2

e1 %x>% e2

e1 %x>=% e2

Arguments#

e1, e2

character vectors whose corresponding elements are to be compared

locale

NULL or "" for the default locale (see stri_locale_get), or a single string with a locale identifier (see stri_locale_list)

strength

see stri_opts_collator

alternate_shifted

see stri_opts_collator

french

see stri_opts_collator

uppercase_first

see stri_opts_collator

case_level

see stri_opts_collator

normalisation

see stri_opts_collator

numeric

see stri_opts_collator
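The collator options themselves are documented in stri_opts_collator. As a brief, non-authoritative sketch of how one of them changes the result, the snippet below compares two strings that differ only in case. The locale "en_US" is an assumption chosen for reproducibility, not something this page prescribes.

```r
library(stringx)

# Default strength (3L, tertiary): case differences are significant.
s3 <- strcoll("a", "A", locale = "en_US")

# Secondary strength: case differences are ignored, so the strings tie.
s2 <- strcoll("a", "A", locale = "en_US", strength = 2L)

print(s3 != 0L)  # TRUE
print(s2)        # 0
```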

Details#

These functions are fully vectorised with respect to both arguments.
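A minimal sketch of what vectorisation means in practice, assuming the "en_US" locale (picked only to keep the results reproducible): a length-1 argument is recycled against the other, and equal-length vectors are compared element by element.

```r
library(stringx)

# A single string in e2 is recycled against every element of e1.
recycled <- strcoll(c("a", "b", "c"), "b", locale = "en_US")
print(recycled)  # -1 0 1

# Vectors of equal length are compared pairwise.
pairwise <- strcoll(c("a", "b"), c("b", "a"), locale = "en_US")
print(pairwise)  # -1 1
```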

For locale-insensitive behaviour similar to that of strcmp from the standard C library, call strcoll(e1, e2, locale="C", strength=4L, normalisation=FALSE). However, some normalisation will still be performed.

Value#

strcoll returns an integer vector representing the comparison results: -1 if a string in e1 is smaller than the corresponding string in e2, 0 if the two are canonically equivalent, and 1 if the former is greater than the latter.

The binary operators call strcoll with default arguments and return logical vectors.
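The following sketch illustrates the two kinds of return value. The sample strings are arbitrary, and the commented results assume a default locale in which plain Latin letters sort alphabetically.

```r
library(stringx)

x <- c("apple", "banana", "cherry")

# strcoll yields -1, 0 or 1 for each pair of elements.
cmp <- strcoll(x, "banana")
print(cmp)  # -1 0 1

# The operators yield logical vectors based on the same comparison.
lt <- x %x<% "banana"
eq <- x %x==% "banana"
ge <- x %x>=% "banana"
print(lt)  # TRUE FALSE FALSE
print(eq)  # FALSE TRUE FALSE
print(ge)  # FALSE TRUE TRUE
```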

Differences from Base R#

Replacements for base Comparison operators, implemented with stri_cmp:

  • collation in different locales is difficult and non-portable across platforms [fixed here – using services provided by ICU]

  • overloading `<.character` has no effect in R, because S3 method dispatch is done internally with hard-coded support for character arguments. We could have replaced the generic `<` with one that calls UseMethod, but that feels like too intrusive a solution [fixed by introducing the `%x<%` operator]
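The second point can be checked in base R alone. The snippet below is only an illustrative sketch: the `<.character` method is hypothetical, defined solely for this demonstration, and is removed again afterwards.

```r
# Hypothetical method that would raise an error if it were ever dispatched.
`<.character` <- function(e1, e2) stop("S3 method was dispatched")

# Plain character vectors carry no class attribute, so the internal
# dispatch skips the method above and the built-in comparison runs.
res <- tryCatch("a" < "b", error = function(e) NA)
print(res)  # TRUE, not NA: the method was never called

# Clean up the demonstration method.
rm(list = "<.character")
```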

Author(s)#

Marek Gagolewski

See Also#

The official online manual of stringx at https://stringx.gagolewski.com/

Related function(s): xtfrm

Examples#

# lexicographic vs. numeric sort
strcoll("100", c("1", "10", "11", "99", "100", "101", "1000"))
## [1]  1  1 -1 -1  0 -1 -1
strcoll("100", c("1", "10", "11", "99", "100", "101", "1000"), numeric=TRUE)
## [1]  1  1  1  1  0 -1 -1
# in Slovak, the digraph "ch" sorts after "h"
strcoll("hladn\u00FD", "chladn\u00FD", locale="sk_SK")
## [1] -1