Commit 1d7e321

Perfect SIFT4
1 parent 02174e4 commit 1d7e321

3 files changed (+31, -1 lines)

Diff for: README.md

+9
@@ -393,7 +393,16 @@ Distance is computed as 1 - similarity.
 ### SIFT4
 SIFT4 is a general purpose string distance algorithm inspired by JaroWinkler and Longest Common Subsequence. It was developed to produce a distance measure that matches as close as possible to the human perception of string distance. Hence it takes into account elements like character substitution, character distance, longest common subsequence etc. It was developed using experimental testing, and without theoretical background.
 
+```python
+from strsimpy import SIFT4
+
+s = SIFT4()
 
+# result: 11.0
+s.distance('This is the first string', 'And this is another string') # 11.0
+# result: 12.0
+s.distance('Lorem ipsum dolor sit amet, consectetur adipiscing elit.', 'Amet Lorm ispum dolor sit amet, consetetur adixxxpiscing elit.', maxoffset=10)
+```
 
 ## Users
 * [StringSimilarity.NET](https://github.com/feature23/StringSimilarity.NET) a .NET port of java-string-similarity
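
The README describes SIFT4 only in general terms. To illustrate the core idea (walk both strings with two cursors, count runs of matching characters, and after a mismatch look up to `maxoffset` positions ahead in either string for the next match), here is a minimal pure-Python sketch of a simplified SIFT4 variant. It is an illustration under stated assumptions, not strsimpy's `SIFT4` class: the function name `sift4_simple` is hypothetical, it compares single characters with unit costs, and it does not count transpositions, so its results may differ from the 11.0 and 12.0 values in the README example.

```python
def sift4_simple(s1: str, s2: str, max_offset: int = 5) -> int:
    """Hypothetical simplified SIFT4 distance (no transposition counting)."""
    # Empty-string shortcuts: the distance is the other string's length.
    if not s1:
        return len(s2)
    if not s2:
        return len(s1)

    l1, l2 = len(s1), len(s2)
    c1 = c2 = 0      # cursors into s1 and s2
    lcss = 0         # total length of the common runs found so far
    local_cs = 0     # length of the current run of matching characters

    while c1 < l1 and c2 < l2:
        if s1[c1] == s2[c2]:
            local_cs += 1
        else:
            # Close the current run before searching for the next match.
            lcss += local_cs
            local_cs = 0
            if c1 != c2:
                # Re-synchronise the cursors instead of tracking transpositions.
                c1 = c2 = max(c1, c2)
            # Search up to max_offset positions ahead in either string.
            for i in range(max_offset):
                if not (c1 + i < l1 or c2 + i < l2):
                    break
                # Bounds checks guard against a cursor pushed past the end
                # of its string by the re-synchronisation step above.
                if c1 + i < l1 and c2 < l2 and s1[c1 + i] == s2[c2]:
                    c1 += i
                    local_cs += 1
                    break
                if c2 + i < l2 and c1 < l1 and s1[c1] == s2[c2 + i]:
                    c2 += i
                    local_cs += 1
                    break
        c1 += 1
        c2 += 1

    lcss += local_cs
    # Length of the longer string minus the characters judged to be shared.
    return max(l1, l2) - lcss
```

For example, with the sketch's default `max_offset=5`, `sift4_simple('kitten', 'sitting')` returns 3. A larger `max_offset` widens the look-ahead window, which lets the algorithm re-align after longer insertions at the cost of more comparisons per mismatch.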

Diff for: strsimpy/__init__.py

+1 -1
@@ -34,7 +34,7 @@
 from .string_distance import StringDistance
 from .string_similarity import StringSimilarity
 from .weighted_levenshtein import WeightedLevenshtein
-from .sift4 import SIFT4
+from .sift4 import SIFT4Options, SIFT4
 
 __name__ = 'strsimpy'
 __version__ = '0.1.9'

Diff for: strsimpy/sift4_test.py

+21
@@ -0,0 +1,21 @@
+import unittest
+
+from .sift4 import SIFT4
+
+
+class SIFT4Test(unittest.TestCase):
+
+    def testSIFT4(self):
+        s = SIFT4()
+
+        results = [
+            ('This is the first string', 'And this is another string', 5, 11.0),
+            ('Lorem ipsum dolor sit amet, consectetur adipiscing elit.', 'Amet Lorm ispum dolor sit amet, consetetur adixxxpiscing elit.', 10, 12.0)
+        ]
+
+        for a, b, offset, res in results:
+            self.assertEqual(res, s.distance(a, b, maxoffset=offset))
+
+
+if __name__ == "__main__":
+    unittest.main()