Java hashCode

What Is Hash Code :

If you want to file something away for later retrieval, it can be faster if you file it numerically rather than by a long alphabetic key. A hashCode is a way of computing a small (32-bit) digest numeric key from a long String or even an arbitrary clump of bytes. The numeric key itself is meaningless and the hashCode functions for computing them can look a bit insane. However, when you go to look for something, you can do the same digest calculation on the long alphabetic key you are looking for, and no matter how bizarre an algorithm you used, you will calculate the same hashCode, and will be able to look up numerically with it. Of course there is always the possibility two different Strings will have the same digest hashCode. However, even then, all is not lost; it greatly narrows down the search, hence speeding it up. A Hashtable goes a step further, scrunching down the hashCode even further to an even smaller number that it can use to directly index an array, usually by dividing it by some (ideally prime) number and taking the remainder.

hashCode Misconceptions :

  • Uniqueness:

As we know, hashCode is a numeric value. It is used in retrieval of the respective object and it need not be unique which we see later. Now since its a number its always fast to retrieve an object using a number rather than an alphabetic key. How it will do? Assume we created a new object by passing some value which is already available in someother object. Now the new object will return the same hash value as of another object because the value passed is same. Once the same hash value is returned, JVM will go to the same memory address every time and if in case there are more than one objects present for the same hash value it will use equals() method to identify the correct object.

hashCode values have longer lifetimes than the objects that produced them, making reuse visible even when an implementation provides unique values for all objects that are reachable at any moment in time, either via handle addresses or naked object pointers. The relative ease with which users can detect reuse would vary not just with a relationship to
memory addresses (i.e. use of handles) but with a range of policies involving memory management.

Also, more and more often hashCode implementations are faced with mapping very large sets of values into a 32 bit Java integer, so uniqueness is more obviously impossible to mandate.

String.hashCode Implementation :

In JDK 1.0+ and 1.1+ the hashCode function for long Strings worked by sampling every nth character. This pretty well guaranteed you would have many Strings hashing to the same value, thus slowing down Hashtable lookup. In Java version 1.2 the function has been improved to multiply the result so far by 31 then add the next character in sequence. This is a little slower, but is much better at avoiding collisions.

For the Java String‘s hashcode() implementation:
s[0]*31^(n-1) + s[1]*31^(n-2) + … + s[n-1]

(jdk source code):
int h = 0;
int off = offset;
char val[] = value;
int len = count;
for (int i = 0; i < len; i++) {
h = 31*h + val[off++];
}
Essentially the prime value is used to reduce collisions and to give the best hashcode distribution on random string values. The Java code uses 31, therefore you should use a prime other than 31 for your own calculations, such as 37.

Object.hashCode Implementation :

The default hashCode() method uses the 32-bit internal JVM address of the Object as its hashCode.

public int hashCode() {
return VMMemoryManager.getIdentityHashCode(this);
}

Note that getIdentityHashCode is a native method.

static native int getIdentityHashCode(Object object);
This method satisfies the requirements of the specification for the System.identityHashCode(java.lang.Object)System.identityHashCode(Object x) method.

The identity hashcode is always the default java.lang.Object implementation, even if the class for a particular object overrides this and computes a different hash code.
The identity hashcode takes no account of the content of the object, just where it is located. The ordinary hashcode may (should) take account of content. Thus, the identity hashcodes for two strings that each contain “hello world” would be different, but the ordinary hashcodes would be the same.
The ordinary hashcode should be coded to be consistent with equals() and compareTo(). The identity hashcode generally isn’t consistent with them. In the example above, equals() would return true and compareTo() would return zero, which is consistent with the two strings having the same hash code.

public class HashCodeTest {

public int hashCode() { return 0xDEADBEEF; }

public static void main(String[] argv) {
HashCodeTest o1 = new HashCodeTest();
HashCodeTest o2 = new HashCodeTest();

System.out.println(“Using default toString():”);
System.out.println(“First: ” + o1);
System.out.println(“Second: ” + o2);

System.out.println(“Using System.identityHashCode():”);
System.out.println(“First: ” + System.identityHashCode(o1));
System.out.println(“Second: ” + System.identityHashCode(o2));
}
}

/* output */

Using default toString():
First: HashCodeTest@deadbeef
Second: HashCodeTest@deadbeef
Using System.identityHashCode():
First: 27553328
Second: 4072869

Implement hashCode :

  • Keep in mind that two equal objects must return the same integer. This is not a problem if the same class constructs the two equal objects. Both objects will have the same hashCode() method and hence, return the same integer.
  • Design your hashCode method so that the hash codes are evenly distributed; otherwise, you won’t get the performance benefits of hashing.Remember that hash codes do not have to be unique.

In Effective Java (Addison-Wesley 2001), Joshua Bloch gives a basic recipe for generating a decent hashCode( ):

1. Store some constant nonzero value, say 17, in an int variable called result.
2. For each significant field f in your object (each field taken into account by the
equals method, that is), do the following:
a. Compute an int hash code c for the field:
i.  If the field is a boolean, compute (f ? 0 : 1).
ii. If the field is a byte, char, short, or int, compute (int)f.
iii. If the field is a long, compute (int)(f ^ (f >>> 32)).
iv. If the field is a float compute Float.floatToIntBits(f).
v. If the field is a double, compute Double.doubleToLongBits(f), and
then hash the resulting long as in step 2.a.iii.
vi. If the field is an object reference and this class’s equals method
compares the field by recursively invoking equals, recursively
invoke hashCode on the field. If a more complex comparison is
required, compute a “canonical representation” for this field and
invoke hashCode on the canonical representation. If the value of the
field is null, return 0 (or some other constant, but 0 is traditional).

vii. If the field is an array, treat it as if each element were a separate field.
That is, compute a hash code for each significant element by applying
these rules recursively, and combine these values as described in
step 2.b.
b. Combine the hash code c computed in step a into result as follows:
result = 37*result + c;
3. Return result.
4. When you are done writing the hashCode method, ask yourself whether equal instances have equal hash codes. If not, figure out why and fix the problem.
It is acceptable to exclude redundant fields from the hash code computation.
In other words, it is acceptable to exclude any field whose value can be computed from fields that are included in the computation. It is required that you exclude any fields that are not used in equality comparisons. Failure to exclude these fields may result in a violation of the second provision of the hashCode contract.
A nonzero initial value is used in step 1, so the hash value will be affected by initial fields whose hash value, as computed in step 2.a, is zero. If zero was used as the initial value in step 1, the overall hash value would be unaffected by any such initial fields, which could increase collisions. The value 17 is arbitrary.
The multiplication in step 2.b makes the hash value depend on the order of the fields, which results in a much better hash function if the class contains multiple similar fields. For example, if the multiplication were omitted from a String hash function built according to this recipe, all anagrams would have identical hash codes. The multiplier 37 was chosen because it is an odd prime. If it was even and the multiplication overflowed, information would be lost because multiplication by two is equivalent to shifting. The advantages of using a prime number are less clear, but it is traditional to use primes for this purpose.
Let’s apply this recipe to the PhoneNumber class. There are three significant
fields, all of type short. A straightforward application of the recipe yields this hash function:
public int hashCode() {
int result = 17; int seed=37;
result = seed*result + areaCode.hashCode;
result = seed*result + exchange.hashCode;
result = seed*result + extension.hashCode;
return result;
}

*************************************************************

hashCode = multiplier * hashCode + attribute’s hashCode

*************************************************************

static class CacheKey
{
private String elementType;
private String id;
private int version;

public CacheKey(String type, String id, int version)
{
this.elementType = type;
this.id = id;
this.version = version;
}

@Override
public int hashCode()
{
final int prime = 31;
int result = 1;
result = prime * result
+ ((elementType == null) ? 0 : elementType.hashCode());
result = prime * result + ((id == null) ? 0 : id.hashCode());
result = prime * result + version;
return result;
}
}

HashCodeBuilder :

The Jakarta-Commons org.apache.commons.lang.builder package is providing a class named HashCodeBuilder which is designed to help implementing hashCode() method. Usually developers struggle hard with implementing hashCode() method and this class aims to simplify the process.

public class HashTest {

private String field1;

private short  field2;

—-

@Override

public int hashCode() {

return new HashCodeBuilder(83, 7)

.append(field1)

.append(field2)

.toHashCode();

}

}

Note that the two numbers for the constructor are simply two different, non-zero,

odd numbers – these numbers help to avoid collisions in the hashCode value across objects.

If required, the superclass hashCode() can be added using appendSuper(int).

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s