java.lang.Object com.google.javascript.jscomp.regex.CaseCanonicalize
public final class CaseCanonicalize
Implements the EcmaScript 5 Canonicalize operation used to specify how case-insensitive regular expressions match.
From section 15.10.2.8,
The abstract operation Canonicalize takes a character parameter ch and performs the following steps:
- If IgnoreCase is false, return ch.
- Let u be ch converted to upper case as if by calling the standard built-in method String.prototype.toUpperCase on the one-character String ch.
- If u does not consist of a single character, return ch.
- Let cu be u's character.
- If ch's code unit value is greater than or equal to decimal 128 and cu's code unit value is less than decimal 128, then return ch.
- Return cu.
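The quoted steps can be sketched in Java as below. This is only an illustration under stated assumptions, not the library's implementation: the class name CanonicalizeSketch and the ignoreCase parameter are hypothetical, and Java's String.toUpperCase(Locale.ROOT) stands in for String.prototype.toUpperCase, so results may differ from a JavaScript engine for characters whose case mappings changed between Unicode versions.

```java
import java.util.Locale;

final class CanonicalizeSketch {
    /** Hypothetical helper that follows the six quoted steps in order. */
    static char canonicalize(char ch, boolean ignoreCase) {
        if (!ignoreCase) {
            return ch;                                   // IgnoreCase is false
        }
        // Assumption: Java's upper-casing approximates String.prototype.toUpperCase.
        String u = String.valueOf(ch).toUpperCase(Locale.ROOT);
        if (u.length() != 1) {
            return ch;                                   // e.g. U+00DF upper-cases to "SS"
        }
        char cu = u.charAt(0);
        if (ch >= 128 && cu < 128) {
            return ch;                                   // non-ASCII never folds into ASCII
        }
        return cu;
    }

    public static void main(String[] args) {
        System.out.println(canonicalize('a', true));                     // prints A
        System.out.println(canonicalize('a', false));                    // prints a
        System.out.println(canonicalize('\u017F', true) == '\u017F');    // prints true
    }
}
```

The last call shows the effect of the 128 threshold: U+017F (long s) upper-cases to the ASCII letter S, so the operation returns the original code unit instead.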
Field Summary
static com.google.javascript.jscomp.regex.CharRanges CASE_SENSITIVE
- Set of code units that are case-insensitively equivalent to some other code unit according to the EcmaScript Canonicalize operation described in section 15.10.2.8.
Method Summary
static char caseCanonicalize(char ch)
- Returns the case canonical version of the given code-unit.
static String caseCanonicalize(String s)
- Returns the case canonical version of the given string.
static com.google.javascript.jscomp.regex.CharRanges expandToAllMatched(com.google.javascript.jscomp.regex.CharRanges ranges)
- Given a character range that may include case sensitive code-units, such as [0-9B-M], returns the character range that includes all the code-units in the input and those that are case-insensitively equivalent to a code-unit in the input.
static com.google.javascript.jscomp.regex.CharRanges reduceToMinimum(com.google.javascript.jscomp.regex.CharRanges ranges)
- Given a character range that may include case sensitive code-units, such as [0-9B-M], returns the character range that includes the minimal set of code units such that for every code unit in the input there is a case-insensitively equivalent canonical code unit in the output.
Methods inherited from class java.lang.Object clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Field Detail
CASE_SENSITIVE
public static final com.google.javascript.jscomp.regex.CharRanges CASE_SENSITIVE
- Set of code units that are case-insensitively equivalent to some other code unit according to the EcmaScript Canonicalize operation described in section 15.10.2.8. The case sensitive characters are the ones that canonicalize to a character other than themselves or have a character that canonicalizes to them. Canonicalize is based on the definition of String.prototype.toUpperCase, which is itself based on Unicode 3.0.0 as specified at UnicodeData-3.0.0 and SpecialCasings-2.txt.
This table was generated by running the below on Chrome:
    for (var cc = 0; cc < 0x10000; ++cc) {
      var ch = String.fromCharCode(cc);
      var u = ch.toUpperCase();
      if (ch != u && u.length === 1) {
        var cu = u.charCodeAt(0);
        if (cc <= 128 || u.charCodeAt(0) > 128) {
          print('0x' + cc.toString(16) + ', 0x' + cu.toString(16) + ',');
        }
      }
    }
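The definition of the case sensitive set given above (code units that canonicalize to something else, plus the code units they canonicalize to) can be sketched in Java as follows. This is a rough illustration only: CaseSensitiveSketch and its methods are hypothetical names, a java.util.BitSet stands in for CharRanges, and Java's upper-casing is assumed as a substitute for the Unicode 3.0.0 tables, so the resulting set is not guaranteed to match the generated table.

```java
import java.util.BitSet;
import java.util.Locale;

final class CaseSensitiveSketch {
    /** Hypothetical stand-in for Canonicalize with IgnoreCase set to true. */
    static char canonicalize(char ch) {
        String u = String.valueOf(ch).toUpperCase(Locale.ROOT);
        if (u.length() != 1) {
            return ch;
        }
        char cu = u.charAt(0);
        return (ch >= 128 && cu < 128) ? ch : cu;
    }

    /** Marks every code unit that changes under canonicalization, and its target. */
    static BitSet caseSensitiveUnits() {
        BitSet result = new BitSet(0x10000);
        for (int cc = 0; cc < 0x10000; ++cc) {
            char canonical = canonicalize((char) cc);
            if (canonical != cc) {
                result.set(cc);          // canonicalizes to another code unit
                result.set(canonical);   // has a code unit that canonicalizes to it
            }
        }
        return result;
    }

    public static void main(String[] args) {
        BitSet units = caseSensitiveUnits();
        System.out.println(units.get('a'));   // prints true
        System.out.println(units.get('1'));   // prints false
    }
}
```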
Method Detail
caseCanonicalize
public static String caseCanonicalize(String s)
- Returns the case canonical version of the given string.
caseCanonicalize
public static char caseCanonicalize(char ch)
- Returns the case canonical version of the given code-unit. EcmaScript 5 explicitly says that code-units are to be treated as their code-point equivalent, even surrogates.
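To illustrate what treating each code unit independently means for a whole string, here is a minimal Java sketch. It does not call the library; StringCanonicalizeSketch is a hypothetical class, and Java's String.toUpperCase(Locale.ROOT) is assumed as an approximation of String.prototype.toUpperCase.

```java
import java.util.Locale;

final class StringCanonicalizeSketch {
    /** Hypothetical per-code-unit Canonicalize with IgnoreCase set to true. */
    static char canonicalize(char ch) {
        String u = String.valueOf(ch).toUpperCase(Locale.ROOT);
        if (u.length() != 1) {
            return ch;
        }
        char cu = u.charAt(0);
        return (ch >= 128 && cu < 128) ? ch : cu;
    }

    /** Canonicalizes each UTF-16 code unit on its own, including surrogate halves. */
    static String canonicalize(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); ++i) {
            sb.append(canonicalize(s.charAt(i)));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(canonicalize("abc1"));   // prints ABC1
    }
}
```

Because surrogate halves are processed one at a time, a supplementary character such as U+10428 (encoded as the pair D801 DC28) is left unchanged in this sketch, whereas upper-casing the whole string by code point would map it to U+10400.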
expandToAllMatched
public static com.google.javascript.jscomp.regex.CharRanges expandToAllMatched(com.google.javascript.jscomp.regex.CharRanges ranges)
- Given a character range that may include case sensitive code-units, such as [0-9B-M], returns the character range that includes all the code-units in the input and those that are case-insensitively equivalent to a code-unit in the input.
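The expansion described above can be approximated with the sketch below. It is an assumption-laden illustration, not the library's algorithm: ExpandSketch is a hypothetical class, a java.util.BitSet of code units replaces CharRanges, and the brute-force scan over all 65536 code units is chosen for clarity over speed.

```java
import java.util.BitSet;
import java.util.Locale;

final class ExpandSketch {
    /** Hypothetical per-code-unit Canonicalize with IgnoreCase set to true. */
    static char canonicalize(char ch) {
        String u = String.valueOf(ch).toUpperCase(Locale.ROOT);
        if (u.length() != 1) {
            return ch;
        }
        char cu = u.charAt(0);
        return (ch >= 128 && cu < 128) ? ch : cu;
    }

    /** Returns the input plus every code unit sharing a canonical form with it. */
    static BitSet expandToAllMatched(BitSet units) {
        // Collect the canonical forms of all input code units.
        BitSet canonical = new BitSet(0x10000);
        for (int cc = units.nextSetBit(0); cc >= 0; cc = units.nextSetBit(cc + 1)) {
            canonical.set(canonicalize((char) cc));
        }
        // Keep every code unit whose canonical form is in that set.
        BitSet expanded = new BitSet(0x10000);
        for (int cc = 0; cc < 0x10000; ++cc) {
            if (canonical.get(canonicalize((char) cc))) {
                expanded.set(cc);
            }
        }
        return expanded;
    }

    public static void main(String[] args) {
        BitSet input = new BitSet(0x10000);
        input.set('0', '9' + 1);   // [0-9]
        input.set('B', 'M' + 1);   // [B-M]
        BitSet out = expandToAllMatched(input);
        System.out.println(out.get('b'));   // prints true
        System.out.println(out.get('a'));   // prints false
    }
}
```

For the input [0-9B-M], this sketch yields [0-9B-Mb-m].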
reduceToMinimum
public static com.google.javascript.jscomp.regex.CharRanges reduceToMinimum(com.google.javascript.jscomp.regex.CharRanges ranges)
- Given a character range that may include case sensitive code-units, such as [0-9B-M], returns the character range that includes the minimal set of code units such that for every code unit in the input there is a case-insensitively equivalent canonical code unit in the output.
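One way to picture the reduction is to keep only the canonical form of each input code unit, as in the sketch below. This is an illustration under assumptions rather than the library's method: ReduceSketch is a hypothetical class, a java.util.BitSet replaces CharRanges, and Java's upper-casing approximates String.prototype.toUpperCase.

```java
import java.util.BitSet;
import java.util.Locale;

final class ReduceSketch {
    /** Hypothetical per-code-unit Canonicalize with IgnoreCase set to true. */
    static char canonicalize(char ch) {
        String u = String.valueOf(ch).toUpperCase(Locale.ROOT);
        if (u.length() != 1) {
            return ch;
        }
        char cu = u.charAt(0);
        return (ch >= 128 && cu < 128) ? ch : cu;
    }

    /** Maps each input code unit to its canonical form and collects the results. */
    static BitSet reduceToMinimum(BitSet units) {
        BitSet reduced = new BitSet(0x10000);
        for (int cc = units.nextSetBit(0); cc >= 0; cc = units.nextSetBit(cc + 1)) {
            reduced.set(canonicalize((char) cc));
        }
        return reduced;
    }

    public static void main(String[] args) {
        BitSet input = new BitSet(0x10000);
        input.set('0', '9' + 1);   // [0-9]
        input.set('B', 'M' + 1);   // [B-M]
        input.set('b', 'm' + 1);   // [b-m]
        BitSet out = reduceToMinimum(input);
        System.out.println(out.get('B'));   // prints true
        System.out.println(out.get('b'));   // prints false
    }
}
```

In this sketch the input [0-9B-Mb-m] reduces to [0-9B-M], since each lower case letter shares its canonical form with the matching upper case letter.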