Yes, it is [[:digit:]]
~ [0-9]
~ \d
(where ~ means approximately equivalent).
In most programming languages (where it is supported) \d
≡ [[:digit:]]
(identical).
The \d
is less common than [[:digit:]]
(not in POSIX but it is in GNU grep -P
).
There are many digits in Unicode, for example:
0123456789 # Hindu-Arabic (Arabic numerals)
٠١٢٣٤٥٦٧٨٩ # ARABIC-INDIC
۰۱۲۳۴۵۶۷۸۹ # EXTENDED ARABIC-INDIC/PERSIAN
߀߁߂߃߄߅߆߇߈߉ # NKO DIGIT
०१२३४५६७८९ # DEVANAGARI
All of which may be included in [[:digit:]]
or \d
.
By contrast, [0-9]
is generally only the ASCII digits 0123456789
.
In many languages (Perl, Java, Python, C), [[:digit:]] (and \d) may take on this extended meaning. For example, this Perl code matches all the digits from above:
$ a='0123456789 ٠١٢٣٤٥٦٧٨٩ ۰۱۲۳۴۵۶۷۸۹ ߀߁߂߃߄߅߆߇߈߉ ०१२३४५६७८९'
$ echo "$a" | perl -C -pe 's/[^\d]//g;' ; echo
0123456789٠١٢٣٤٥٦٧٨٩۰۱۲۳۴۵۶۷۸۹߀߁߂߃߄߅߆߇߈߉०१२३४५६७८९
This is equivalent to selecting all characters in the Unicode general category Nd (Number, decimal digit):
$ echo "$a" | perl -C -pe 's/[^\p{Nd}]//g;' ; echo
0123456789٠١٢٣٤٥٦٧٨٩۰۱۲۳۴۵۶۷۸۹߀߁߂߃߄߅߆߇߈߉०१२३४५६७८९
grep can reproduce this (although the specific version of PCRE it uses may have a different internal list of numeric code points than Perl):
$ echo "$a" | grep -oP '\p{Nd}+'
0123456789
٠١٢٣٤٥٦٧٨٩
۰۱۲۳۴۵۶۷۸۹
߀߁߂߃߄߅߆߇߈߉
०१२३४५६७८९
Change the pattern to [0-9] and only the ASCII digits are matched:
$ echo "$a" | grep -o '[0-9]\+'
0123456789
POSIX
For the specific POSIX BRE or ERE:
\d is not part of POSIX BRE or ERE (it is available in GNU grep -P). [[:digit:]] is required by POSIX to correspond to the digit character class, which in turn is required by ISO C to be the characters 0 through 9 and nothing else. So only in the C locale are [0-9], [0123456789], \d and [[:digit:]] all guaranteed to mean exactly the same thing. [0123456789] leaves no room for misinterpretation; [[:digit:]] is available in more utilities and commonly means only [0123456789]; \d is supported by few utilities.
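As a rough illustration of the C-locale case, the following sketch strips every non-digit from a line using each of the three POSIX spellings. The sample string and the helper name keep_digits are assumptions made for this example, and \d is left out because POSIX sed does not support it.

```shell
#!/bin/sh
# Sample text (an assumption for this sketch): ASCII digits followed by
# ARABIC-INDIC and DEVANAGARI digits, UTF-8 encoded.
sample='0123456789 ٠١٢٣٤٥٦٧٨٩ ०१२३४५६७८९'

# keep_digits SET: delete every character that is not in the bracket
# expression [SET], with the C locale forced for this one sed call.
keep_digits() {
    printf '%s\n' "$sample" | LC_ALL=C sed "s/[^$1]//g"
}

range=$(keep_digits '0-9')          # range expression
list=$(keep_digits '0123456789')    # explicit list
class=$(keep_digits '[:digit:]')    # POSIX character class

printf 'range: %s\nlist:  %s\nclass: %s\n' "$range" "$list" "$class"
```

With LC_ALL=C forced, all three variables should hold only the ASCII digits. Removing LC_ALL=C and running under another locale is a quick way to check how a particular sed treats each spelling.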
As for [0-9]
, the meaning of range expressions is only defined by POSIX in the C locale; in other locales it might be different (might be codepoint order or collation order or something else).
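Where the locale cannot be controlled, one option is to avoid the range entirely. A minimal sketch, assuming a hypothetical helper named is_ascii_number and input that contains no newline, that accepts only strings made of the ten ASCII digits by listing them explicitly:

```shell
#!/bin/sh
# is_ascii_number STRING (hypothetical helper): succeed only if STRING is a
# non-empty run of the ASCII digits 0 through 9.
# Assumption: STRING contains no newline, so grep sees exactly one line.
is_ascii_number() {
    # -E selects ERE, -x requires the whole line to match, -q prints nothing.
    printf '%s\n' "$1" | grep -E -x -q '[0123456789]+'
}

# Try a plain number, a mixed string, and two ARABIC-INDIC digits.
for candidate in 2024 12a '٣٤'; do
    if is_ascii_number "$candidate"; then
        echo "$candidate: ASCII digits only"
    else
        echo "$candidate: rejected"
    fi
done
```

Because the bracket expression lists each digit, it does not depend on how the current locale orders a range. The cost is a longer pattern.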