chartr: Transliteration and Other Text Transforms

Description

These functions translate characters: they cover case mapping and folding, script-to-script conversion, and Unicode normalisation.

Usage

strtrans(x, transform)

chartr2(x, pattern, replacement)

chartr(old, new, x)

tolower(x, locale = NULL)

toupper(x, locale = NULL)

casefold(x, upper = NA)

Arguments

x

character vector (or an object coercible to one)

transform

single string with an ICU general transform specifier; see stri_trans_list

pattern, old

single string

replacement, new

single string, preferably of the same length as old

locale

NULL or "" for the default locale (see stri_locale_get) or a single string with a locale identifier, see stri_locale_list

upper

single logical value; NA (the default) selects case folding, FALSE conversion to lower case, and TRUE conversion to upper case

Details

tolower and toupper perform case mapping. chartr2 (and the [DEPRECATED] chartr) translate individual code points. casefold performs case folding. The new function strtrans applies general ICU transforms; see stri_trans_general.
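
As a rough illustration of the general transforms mentioned above, the sketch below strips diacritics in two ways. It assumes that stringx is installed and that the ICU build behind it provides the "Latin-ASCII" transform and the "NFD; [:Nonspacing Mark:] Remove; NFC" chain; the input string is made up for the example.

```r
# Assumption: the stringx package (with its stringi/ICU backend) is installed.

# A ready-made ICU transform that maps Latin letters to plain ASCII.
ascii <- stringx::strtrans("caf\u00E9 na\u00EFve", "Latin-ASCII")

# Several transforms chained with semicolons: decompose, drop the
# combining marks, then recompose.
stripped <- stringx::strtrans(
    "caf\u00E9 na\u00EFve",
    "NFD; [:Nonspacing Mark:] Remove; NFC"
)

print(ascii)
print(stripped)
```

Both calls should yield the same unaccented text here; for other scripts the two specifiers can behave differently, so the choice depends on the data at hand.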

Value

These functions return a character vector (in UTF-8). They preserve most attributes of x. Note that their base R counterparts drop all attributes when given inputs that are not character vectors.
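
A small sketch of the attribute handling described above, assuming stringx is installed; the matrix m and its dimension names are invented for the illustration.

```r
# Assumption: the stringx package is installed.

# A character matrix carrying dim and dimnames attributes.
m <- matrix(
    c("ab", "cd", "ef", "gh"),
    nrow = 2,
    dimnames = list(c("r1", "r2"), c("c1", "c2"))
)

# The result is expected to keep the shape and the labels of the input.
up <- stringx::toupper(m)

print(up)
print(attributes(up))
```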

Differences from Base R

Unlike their base R counterparts, the new tolower and toupper are locale-sensitive; see stri_trans_tolower.

The base casefold simply dispatches to tolower or toupper ‘for compatibility with S-PLUS’ (which mattered only a long time ago). The version implemented here performs, by default, true case folding, whose purpose is to make two pieces of text that differ only in case identical; see stri_trans_casefold.
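
To make the purpose of case folding concrete, here is a minimal sketch of a caseless comparison, assuming stringx is installed. The two strings are illustrative; they differ only in case, but lower-casing alone does not make them equal because the sharp s has no single-character upper-case partner in the second string.

```r
# Assumption: the stringx package is installed.

a <- "Stra\u00DFe"   # contains the sharp s (U+00DF)
b <- "STRASSE"

# Lower-casing leaves the sharp s in place, so the results still differ.
lower_equal <- stringx::tolower(a) == stringx::tolower(b)

# Case folding expands the sharp s to "ss", so the results coincide.
fold_equal <- stringx::casefold(a) == stringx::casefold(b)

print(c(lower_equal = lower_equal, fold_equal = fold_equal))
```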

chartr2 and [DEPRECATED] chartr are wrappers for stri_trans_char. Contrary to the base chartr, they always generate a warning when old and new are of different lengths. chartr2 has argument order and naming consistent with gsub.
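
The following sketch shows the x-first argument order of chartr2 and the warning on a length mismatch, assuming stringx is installed. The input strings and the "warning raised" marker are made up for the example.

```r
# Assumption: the stringx package is installed.

# The text comes first, then the code points to replace and their
# replacements: "l" becomes "0" and "o" becomes "1".
out <- stringx::chartr2("hello world", "lo", "01")
print(out)

# pattern has two code points but replacement only one; per the text
# above this should signal a warning, which is caught here.
res <- tryCatch(
    stringx::chartr2("hello", "el", "3"),
    warning = function(w) "warning raised"
)
print(res)
```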

Author(s)

Marek Gagolewski

See Also

The official online manual of stringx at https://stringx.gagolewski.com/

Examples

strtrans(strcat(letters_bf), "Any-NFKD; Any-Upper")
## [1] "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
strtrans(strcat(letters_bb[1:6]), "Any-Hex/C")
## [1] "\\U0001D552\\U0001D553\\U0001D554\\U0001D555\\U0001D556\\U0001D557"
strtrans(strcat(letters_greek), "Greek-Latin")
## [1] "abgdezēthiklmn'xoprstyphchpsō"
toupper(letters_greek)
##  [1] "Α" "Β" "Γ" "Δ" "Ε" "Ζ" "Η" "Θ" "Ι" "Κ" "Λ" "Μ" "Ν" "Ξ" "Ο" "Π" "Ρ" "Σ" "Τ"
## [20] "Υ" "Φ" "Χ" "Ψ" "Ω"
tolower(LETTERS_GREEK)
##  [1] "α" "β" "γ" "δ" "ε" "ζ" "η" "θ" "ι" "κ" "λ" "μ" "ν" "ξ" "ο" "π" "ρ" "σ" "τ"
## [20] "υ" "φ" "χ" "ψ" "ω"
base::toupper("gro\u00DF")
## [1] "GROß"
stringx::toupper("gro\u00DF")
## [1] "GROSS"
casefold("gro\u00DF")
## [1] "gross"
x <- as.matrix(c(a="\u00DFpam ba\U0001D554on spam", b=NA))
chartr("\u00DF\U0001D554aba", "SCXBA", x)
##   [,1]             
## a "SpAm BACon spAm"
## b NA
toupper('i', locale='en_US')
## [1] "I"
toupper('i', locale='tr_TR')
## [1] "İ"