[RFC]: Hamming distance between two strings #836
Comments
ASCII strings only. Address issue stdlib-js#836
PR-URL: #1166
Co-authored-by: Athan Reines <[email protected]>
Reviewed-by: Athan Reines <[email protected]>
Ref: #836
Ref: #151
Hi, I would like to work on this.
@marsian83 Sure. Would you be open to working on
Note that the UTF-16 code unit implementation has already been added. The code points implementation should iterate over Unicode code points (e.g., most Unicode characters and some emoji). You can find other code point implementations in the
Sure, I'll see how these work.
@kgryte Hello! Can I work on the grapheme clusters part?
@mayankkamboj47 Sure. You'll probably want to rely on
@kgryte @Planeshifter If no one is working on @stdlib/string/base/distances/hamming-code-points, can I work on it?
Hello there, can someone please review PR #1948 on
Hi @kgryte, I am interested in contributing to the Hamming distance feature for comparing Unicode code points in the Looking forward to your guidance and feedback! Thank you!
Hello, everyone! I'm interested in contributing to the Hamming distance feature. I noticed that parts of the implementation for different comparison modes (UTF-16 code units, Unicode code points, and grapheme clusters) have been discussed and some PRs have been submitted. Could someone please update me on the current status of this issue and let me know if there are any open tasks I could help with? Thank you!
@MynameisSanskar @Anant1004 As tracked in the OP, calculating the distance between two strings when comparing code points has not been completed. An initial attempt to add support can be found in #1948, but that PR stalled.
Description
This RFC proposes adding a function to calculate the Hamming distance between two strings.
The function should have the following signature: `(a: string, b: string): number`.
The function should take two strings as arguments and return the Hamming distance between them. The Hamming distance is defined as the number of characters that have to be changed to convert one string to the other. Since it only allows substitutions, it can only be used to compare strings of the same length.
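As a rough illustration of the definition above, here is a minimal sketch that compares UTF-16 code units. The function name `hammingCodeUnits` is hypothetical, and returning `-1` for strings of unequal length is an assumption made for this sketch; it is not taken from this RFC.

```javascript
// Hypothetical helper for illustration; not the stdlib implementation.
// Counts the positions at which two strings differ, comparing UTF-16 code units.
function hammingCodeUnits( a, b ) {
	// The Hamming distance is undefined for unequal lengths. This sketch
	// assumes a sentinel value of -1 for that case.
	if ( a.length !== b.length ) {
		return -1;
	}
	let count = 0;
	for ( let i = 0; i < a.length; i++ ) {
		// `charCodeAt` returns the UTF-16 code unit at index `i`:
		if ( a.charCodeAt( i ) !== b.charCodeAt( i ) ) {
			count += 1;
		}
	}
	return count;
}

console.log( hammingCodeUnits( 'karolin', 'kathrin' ) ); // => 3
console.log( hammingCodeUnits( '1011101', '1001001' ) ); // => 2
console.log( hammingCodeUnits( 'abc', 'ab' ) );          // => -1
```

Because the loop indexes code units, a character outside the Basic Multilingual Plane (a surrogate pair) counts as two positions, which is why separate handling of code points and grapheme clusters is worth considering.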
Additionally, in order to account for code points and grapheme clusters, we should add separate packages for dealing with each, as the underlying algorithms are likely to differ. We then can provide a more general API which unifies the underlying algorithms. Accordingly, we should create the following packages:
- `@stdlib/string/base/distances/hamming`: compares UTF-16 code units. (string/base/distances/hamming #1166)
- `@stdlib/string/base/distances/hamming-code-points`: compares Unicode code points.
- `@stdlib/string/base/distances/hamming-grapheme-clusters`: compares grapheme clusters (i.e., visual characters)

Once the above are completed, we can add

- `@stdlib/string/distances/hamming`: unifies the above "base" packages and provides an option for specifying the computation "mode" (i.e., `code_units`, `code_points`, or `grapheme_clusters`, with `grapheme_clusters` being the default).

Related Issues
Related issues #151.
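To make the three comparison modes named in the Description (`code_units`, `code_points`, `grapheme_clusters`) concrete, the following sketch shows how the same pair of strings can yield three different distances. Everything here is an assumption for illustration: the names `hamming`, `toUnits`, and `countDifferences` are hypothetical, the `-1` sentinel for unequal lengths is a choice made for this sketch, and grapheme segmentation uses the standard `Intl.Segmenter` API rather than whatever the proposed packages would rely on.

```javascript
// Hypothetical helpers for illustration; not the stdlib API.

// Counts differing positions between two arrays of string units, returning
// -1 (an assumed sentinel) when the arrays differ in length.
function countDifferences( x, y ) {
	if ( x.length !== y.length ) {
		return -1;
	}
	let count = 0;
	for ( let i = 0; i < x.length; i++ ) {
		if ( x[ i ] !== y[ i ] ) {
			count += 1;
		}
	}
	return count;
}

// Splits a string into units according to the requested mode.
function toUnits( str, mode ) {
	if ( mode === 'code_units' ) {
		// Splitting on the empty string yields individual UTF-16 code units:
		return str.split( '' );
	}
	if ( mode === 'code_points' ) {
		// The string iterator yields one entry per Unicode code point:
		return Array.from( str );
	}
	// Otherwise, split into grapheme clusters (user-perceived characters):
	const segmenter = new Intl.Segmenter( undefined, { 'granularity': 'grapheme' } );
	return Array.from( segmenter.segment( str ), ( s ) => s.segment );
}

// Unified entry point, defaulting to grapheme clusters as proposed above.
function hamming( a, b, mode = 'grapheme_clusters' ) {
	return countDifferences( toUnits( a, mode ), toUnits( b, mode ) );
}

// An emoji followed by a base letter plus a combining mark:
const a = '\u{1F600}n\u0303';
const b = '\u{1F916}o\u0308';

console.log( hamming( a, b, 'code_units' ) );        // => 4
console.log( hamming( a, b, 'code_points' ) );       // => 3
console.log( hamming( a, b, 'grapheme_clusters' ) ); // => 2
```

Each emoji occupies two code units but one code point, and each letter plus combining mark occupies two code points but one grapheme cluster, which accounts for the three different results.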
Questions
No.
Other
No.
Checklist
RFC: