Java implementation for MinHash and LSH for finding near duplicate documents as measured by Jaccard similarity.

ALShum/MinHashLSH

MinHashLSH

Implementation of MinHash for approximating the Jaccard similarity between text documents.
Also includes an implementation of LSH (locality-sensitive hashing), a fast way to find approximate nearest neighbors.
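
To illustrate how the two pieces fit together, here is a minimal, self-contained sketch of MinHash signatures and LSH banding. It is not this repository's code: the class `MinHashSketch`, its methods, the character-shingle representation, and the linear hash family `(a*x + b) mod p` are all assumptions chosen for illustration.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Random;
import java.util.Set;

// Hypothetical illustration class; names and parameters are not taken from the repository.
class MinHashSketch {
    // Mersenne prime 2^31 - 1, used as the modulus of the assumed hash family.
    static final long PRIME = 2147483647L;

    private final long[] a;
    private final long[] b;

    MinHashSketch(int numHashes, long seed) {
        Random rnd = new Random(seed);
        a = new long[numHashes];
        b = new long[numHashes];
        for (int i = 0; i < numHashes; i++) {
            a[i] = 1 + rnd.nextInt(Integer.MAX_VALUE - 1); // 1 .. PRIME-1, never zero
            b[i] = rnd.nextInt(Integer.MAX_VALUE);         // 0 .. PRIME-1
        }
    }

    // Splits text into overlapping character k-grams (one common document representation).
    static Set<String> shingles(String text, int k) {
        Set<String> out = new HashSet<>();
        for (int i = 0; i + k <= text.length(); i++) {
            out.add(text.substring(i, i + k));
        }
        return out;
    }

    // Exact Jaccard similarity: |x ∩ y| / |x ∪ y|.
    static double jaccard(Set<String> x, Set<String> y) {
        if (x.isEmpty() && y.isEmpty()) return 1.0;
        Set<String> inter = new HashSet<>(x);
        inter.retainAll(y);
        int union = x.size() + y.size() - inter.size();
        return (double) inter.size() / union;
    }

    // MinHash signature: for each hash function, the minimum hash value over all items.
    long[] signature(Set<String> items) {
        long[] sig = new long[a.length];
        Arrays.fill(sig, Long.MAX_VALUE);
        for (String s : items) {
            long x = s.hashCode() & 0x7fffffffL; // non-negative 31-bit value
            for (int i = 0; i < a.length; i++) {
                long h = (a[i] * x + b[i]) % PRIME; // fits in a long: a, x < 2^31
                if (h < sig[i]) sig[i] = h;
            }
        }
        return sig;
    }

    // Estimated Jaccard similarity: fraction of signature positions that agree.
    static double estimate(long[] s, long[] t) {
        int same = 0;
        for (int i = 0; i < s.length; i++) {
            if (s[i] == t[i]) same++;
        }
        return (double) same / s.length;
    }

    // LSH banding: two signatures are a candidate pair if any band matches in every row.
    static boolean isCandidatePair(long[] s, long[] t, int bands) {
        int rows = s.length / bands;
        if (rows == 0) throw new IllegalArgumentException("more bands than signature rows");
        for (int band = 0; band < bands; band++) {
            int from = band * rows;
            int to = from + rows;
            if (Arrays.equals(Arrays.copyOfRange(s, from, to),
                              Arrays.copyOfRange(t, from, to))) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        MinHashSketch mh = new MinHashSketch(128, 42L);
        Set<String> d1 = shingles("the quick brown fox jumps over the lazy dog", 4);
        Set<String> d2 = shingles("the quick brown fox jumped over the lazy dog", 4);
        long[] s1 = mh.signature(d1);
        long[] s2 = mh.signature(d2);
        System.out.println("exact Jaccard:    " + jaccard(d1, d2));
        System.out.println("MinHash estimate: " + estimate(s1, s2));
        System.out.println("LSH candidate:    " + isCandidatePair(s1, s2, 32));
    }
}
```

In this sketch, more hash functions reduce the variance of the estimate, while the number of bands trades recall against false candidates: more bands with fewer rows each make lower-similarity pairs more likely to collide. A full LSH index would hash each band into buckets rather than compare signatures pairwise as done here.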
