Package smile.hash
Interface SimHash<T>
- Type Parameters:
T- the data type of set objects.
public interface SimHash<T>
SimHash is a technique for quickly estimating how similar two sets are.
The algorithm is used by the Google Crawler to find near duplicate pages.
-
Method Summary
-
Method Details
-
hash
Return the hash code.- Parameters:
x- the object.- Returns:
- the hash code.
-
of
Returns theSimHashfor a set of generic features (represented as byte[]).- Parameters:
features- the generic features.- Returns:
- the
SimHash.
-
text
Returns theSimHashfor string tokens.- Returns:
- the
SimHashfor string tokens.
-