Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Change LevenshteinDistance to Damerau–Levenshtein distance? #41

Open
tang-hi opened this issue Jun 30, 2022 · 4 comments
Open

Change LevenshteinDistance to Damerau–Levenshtein distance? #41

tang-hi opened this issue Jun 30, 2022 · 4 comments

Comments

@tang-hi
Copy link
Contributor

tang-hi commented Jun 30, 2022

Hi,
Damerau–LevenshteinDistance is the minimum number of edit operations ( insertions, deletions ,substitutions and transposition) required to change one word to another.It has one more action(transposition) compared to LevenshteinDistance.
I think Damerau–LevenshteinDistance is more practical than LevenshteinDistance. And it's time complexity is also O(N^2), but unfortunately it's space complexity is O(N^2) while LevenshteinDistance's space complexity is O(N)

Do you think it's an acceptable change? If so, I would like to raise a PR to change LevenshteinDistance to Damerau–Levenshtein distance.

@lithammer
Copy link
Owner

It would be nice to see some benchmarks comparing them.

@tang-hi
Copy link
Contributor Author

tang-hi commented Jul 1, 2022

I have implemented the Damerau–LevenshteinDistance(Optimal string alignment distance), and do some benchmark on them.
1656688418264
It seems like Damerau–LevenshteinDistance is 2x slower than LevenshteinDistance.Do you think it's an acceptable change?
If the performance is ok, I will clean the code and raise a pr

benchmark
code

@lithammer
Copy link
Owner

Thinking about this a bit more, I'm reluctant to replace the current implementation with something twice as slow with vague real-world advantages. I think a better approach would be to make the edit distance function pluggable (with the current LevenshteinDistance function as a default).

I will investigate how this can be done without breaking changes (hopefully). And afterwards maybe you can contribute a DamerauLevenshteinDistance function? What do you think?

@tang-hi
Copy link
Contributor Author

tang-hi commented Jul 4, 2022

Thinking about this a bit more, I'm reluctant to replace the current implementation with something twice as slow with vague real-world advantages. I think a better approach would be to make the edit distance function pluggable (with the current LevenshteinDistance function as a default).

I will investigate how this can be done without breaking changes (hopefully). And afterwards maybe you can contribute a DamerauLevenshteinDistance function? What do you think?

I think it's good idea.I'm looking forward that

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants