Emoji Unicode Category #277
I'm a little rusty on how this works, but it looks like https://www.unicode.org/reports/tr51/#Emoji_Characters does indeed link to the data files needed to add this category to XRegExp as part of Unicode 12. I think we'll need to adapt the approach in #248 to upgrade to unicode-12.1.0. Might take a shot at it myself!
Actually, it looks like @mathiasbynens (the publisher of the aforementioned unicode packages that we use in XRegExp) has published an emoji-regex package.
Thanks for pointing to the packages. I saw that the list of emoji code points is huge 😅. If Unicode 12.1.0 is added, does that mean there would be an Emoji category? Sorry, I don't know much about Unicode regexes.
Unicode defines a character property named "Emoji", but it only applies to single code points. It already works in JS RegExp with the u flag, e.g. /\p{Emoji}/u. You probably want to match emoji sequences as well, though. https://github.com/tc39/proposal-regexp-unicode-sequence-properties aims to address this at the standards level. Until that happens, the emoji-regex package @josephfrazier pointed to above can be used.
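A minimal sketch of the single-code-point point above, using only native Unicode property escapes (the u flag is required). Because \p{Emoji} is tested one code point at a time, it also accepts the ASCII keycap bases (the digits, # and *), and it cannot treat a multi-code-point sequence such as 👩‍💻 as one unit. The sample strings are arbitrary.

```javascript
// \p{Emoji} is a binary property of single code points (needs the u flag).
const emojiCodePoint = /^\p{Emoji}$/u;

// A pictograph with default emoji presentation is one code point.
const thumbsUp = "\u{1F44D}"; // 👍
// ASCII digits, '#' and '*' have Emoji=Yes because they are keycap bases.
const digit = "7";
// 👩‍💻 is three code points: U+1F469, U+200D (zero-width joiner), U+1F4BB.
const technologist = "\u{1F469}\u200D\u{1F4BB}";

const results = {
  thumbsUp: emojiCodePoint.test(thumbsUp),         // true
  digit: emojiCodePoint.test(digit),               // true (often surprising)
  technologist: emojiCodePoint.test(technologist), // false: not a single code point
  // Matching globally splits the sequence; U+200D itself is not Emoji.
  technologistParts: technologist.match(/\p{Emoji}/gu), // ["👩", "💻"]
};

console.log(results);
```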
I'm trying to use XRegExp to replace all non-Latin characters while preserving the basic emoji set. Right now I have this regex: const regex = xregexp("[^\\s\\p{Latin}\\p{Common}]+", "g"); It works fine for allowing only Latin characters (including accented ones), but it also removes emoji. Is there a way to combine with
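A hedged sketch of one possible cause and workaround, written in native JS rather than XRegExp (the \p{Script=...}, \p{Emoji} and \p{Emoji_Component} escapes are standard JS syntax; the sample text is made up). Many emoji are sequences held together by U+200D ZERO WIDTH JOINER or U+FE0F VARIATION SELECTOR-16, and those characters have Script=Inherited rather than Common, so a "keep Latin and Common" class can strip them and break the emoji apart. Adding \p{Emoji_Component} (which covers ZWJ, VS16, skin-tone modifiers, regional indicators and tag characters) to the kept set preserves them.

```javascript
// Native-JS analogue of the XRegExp pattern above.
// Removes every run of characters that is not whitespace, Latin, or Common script.
const latinCommonOnly = /[^\s\p{Script=Latin}\p{Script=Common}]+/gu;

// Same idea, but also keeps Emoji and Emoji_Component characters such as
// U+200D (ZWJ) and U+FE0F (VS16), which have Script=Inherited.
const latinCommonAndEmoji =
  /[^\s\p{Script=Latin}\p{Script=Common}\p{Emoji}\p{Emoji_Component}]+/gu;

// Hypothetical sample: Latin (é written as U+00E9), Cyrillic,
// a ZWJ sequence (👩‍💻), a skin-tone sequence (👍🏽), and Han.
const sample = "h\u00E9llo мир \u{1F469}\u200D\u{1F4BB} \u{1F44D}\u{1F3FD} 世界";

const naive = sample.replace(latinCommonOnly, "");
const preserved = sample.replace(latinCommonAndEmoji, "");

console.log(naive);     // the ZWJ is gone, so 👩‍💻 falls apart into 👩💻
console.log(preserved); // "héllo  👩‍💻 👍🏽 "
```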
I see the library is now using Unicode 14.0.0. Has emoji support been added? When I try
XRegExp indeed uses Unicode 14.0.0 character data. It supports all Unicode scripts (via e.g. XRegExp does not include the It turns out that matching all (and only) what most people recognize as emoji is complicated. You probably want something like this, which works in native JS:
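As a hedged illustration (an approximation, not necessarily the pattern meant above), a sequence-aware matcher can be assembled from native Unicode property escapes. The names keycap, flag, element and zwjSequence are hypothetical; the pattern covers keycaps, regional-indicator flags, skin-tone modifiers, VS16 and ZWJ joins.

```javascript
// A rough, hypothetical emoji-sequence matcher built from property escapes.
// It is an approximation, not the full RGI emoji list from UTS #51.

// Keycap sequences: base character, VS16, then U+20E3 COMBINING ENCLOSING KEYCAP.
const keycap = String.raw`[0-9#*]\uFE0F\u20E3`;
// Flags: a pair of regional indicator symbols (U+1F1E6..U+1F1FF).
const flag = String.raw`[\u{1F1E6}-\u{1F1FF}]{2}`;
// One pictograph, optionally followed by VS16 or a skin-tone modifier.
const element = String.raw`\p{Extended_Pictographic}(?:\uFE0F|\p{Emoji_Modifier})?`;
// Elements glued together with U+200D ZERO WIDTH JOINER.
const zwjSequence = String.raw`${element}(?:\u200D${element})*`;

const emojiSequence = new RegExp(`${flag}|${keycap}|${zwjSequence}`, "gu");

const text = "Hi \u{1F44B}\u{1F3FD}! \u{1F1FA}\u{1F1F8} 1\uFE0F\u20E3 \u{1F469}\u200D\u{1F4BB}";
const found = text.match(emojiSequence);
console.log(found); // ["👋🏽", "🇺🇸", "1️⃣", "👩‍💻"]
```

Known gaps in this sketch: tag sequences such as the England flag are only partly matched (the black flag without its tag characters), and text-style characters like © match even without VS16, because Extended_Pictographic includes them. A generated list such as the emoji-regex package mentioned earlier in the thread is likely more complete.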
Note that this is significantly more robust than the example emoji regex @mathiasbynens gave at https://github.com/tc39/proposal-regexp-unicode-property-escapes#matching-emoji, since it includes standard flag sequences as well as emoji character sequences that include zero-width joiners. Aside: I'm curious how this compares to the
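For comparison, a hedged sketch of a per-code-point pattern in the same spirit as simpler property-escape examples (this exact pattern is assumed here, not quoted from the proposal README). Because each alternative consumes at most two code points, a flag or a ZWJ sequence comes back as several separate matches, which is the robustness gap described above.

```javascript
// A simple, hypothetical pattern: a modifier base with an optional skin tone,
// any default-emoji-presentation code point, or an Emoji code point plus VS16.
const simpleEmoji =
  /\p{Emoji_Modifier_Base}\p{Emoji_Modifier}?|\p{Emoji_Presentation}|\p{Emoji}\uFE0F/gu;

const usFlag = "\u{1F1FA}\u{1F1F8}";             // 🇺🇸: two regional indicators
const technologist = "\u{1F469}\u200D\u{1F4BB}"; // 👩‍💻: woman + ZWJ + laptop

// Each regional indicator has Emoji_Presentation, so the flag splits in two.
const flagParts = usFlag.match(simpleEmoji);
// U+200D is not Emoji, so the ZWJ sequence splits around the joiner.
const zwjParts = technologist.match(simpleEmoji);

console.log(flagParts.length, zwjParts.length); // 2 2
```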
Thanks for that. Just curious, why not add your own token to this library to do just that? If they ever come out with
I want a regex to match emoji. I saw the example of
But it's not specified that '(?A)^\\pS$' would match only emoji. Maybe there could be a \p{Emoji} category.
For some spec background: Unicode has other properties not listed under RL1.2 of UTS #18 (Unicode Regular Expressions), and it defines emoji in UTS #51 (Unicode Emoji).
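A quick native-JS check of why a Symbol class is not an emoji class (property escapes with the u flag; the sample characters are arbitrary). \p{S} is General_Category=Symbol, which covers currency signs and math operators as well as most emoji, while Extended_Pictographic, the property UAX #29 uses when clustering emoji, rejects the former.

```javascript
// General_Category=Symbol (\p{S}) versus an emoji-oriented binary property.
const isSymbol = (ch) => /^\p{S}$/u.test(ch);
const isPictographic = (ch) => /^\p{Extended_Pictographic}$/u.test(ch);

// "$" is Sc (currency), "+" is Sm (math), U+1F600 (grinning face) is So.
const samples = ["$", "+", "\u{1F600}"];

const table = samples.map((ch) => ({
  ch,
  symbol: isSymbol(ch),             // true for all three
  pictographic: isPictographic(ch), // true only for the emoji
}));

console.table(table);
```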
At the end of the day, I just want to be able to match the characters on the iPhone and/or Android emoji keyboard, but I can't find a list of those characters anywhere 😿
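One hedged way to approximate "the emoji a phone keyboard produces" without a hard-coded list, assuming a runtime with Intl.Segmenter (Node 16+ and current browsers): split the text into user-perceived characters (grapheme clusters) and keep the ones containing an Extended_Pictographic code point, a regional indicator, or U+20E3 (keycap). The helper name emojiGraphemes is hypothetical, and this is a heuristic rather than the official RGI emoji set from UTS #51; for instance, it would also keep a text-style © or ™.

```javascript
// Heuristic: does this grapheme cluster look like an emoji?
// (No g flag, so .test() carries no lastIndex state between calls.)
const emojiLike = /\p{Extended_Pictographic}|[\u{1F1E6}-\u{1F1FF}]|\u20E3/u;

// Hypothetical helper: returns the emoji-looking grapheme clusters in `text`.
function emojiGraphemes(text) {
  const segmenter = new Intl.Segmenter("en", { granularity: "grapheme" });
  const result = [];
  for (const { segment } of segmenter.segment(text)) {
    if (emojiLike.test(segment)) result.push(segment);
  }
  return result;
}

const message = "ok \u{1F44B}\u{1F3FD} \u{1F1EF}\u{1F1F5} 1\uFE0F\u20E3 \u{1F469}\u200D\u{1F4BB}!";
console.log(emojiGraphemes(message)); // ["👋🏽", "🇯🇵", "1️⃣", "👩‍💻"]
```

Segmenting first means skin tones, flags, keycaps and ZWJ sequences arrive as whole units, so the per-cluster test never has to reassemble them.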