Tuesday, June 28, 2011

Hashmap : How it works?


On very high level, Hash Map is used to store data in form of key value pair.
Hash Map is a Hash table based implementation of the Map interface. This implementation provides all of the optional map operations, and permits null values and the null key. (The HashMap class is roughly equivalent to Hashtable, except that it is unsynchronized and permits nulls.) This class makes no guarantees as to the order of the map; in particular, it does not guarantee that the order will remain constant over time.” … from Java API
After reading this definition, some question comes in my mind:
  1. How hash map store data internally?
  2. What happen when I try to store some new information in map?
  3. How hash map find my data?
And when I tried to explore it, I find it more and more interesting.
HashMap has a static class named Entry which implements Map.Entry interface. The Entry class looks like:
static class Entry implements Map.Entry {
final Object key;
Object value;
final int hash;
Entry next;
Entry(int i, Object obj, Object obj1, Entry entry) {
value = obj1;
next = entry;
key = obj;
hash = i;
}
// Other methods
}
Every time we insert a into hashmap using .put() method, a new Entry object is created (not true is some cases. if key already exists, then it just replace the value).  Map internally used two data structures to manage/store data:
  1. Array
  2. Link List
This image shows how hashmap manage data. Here
  • Each index of array is a bucket
  • To identify the bucket for any <key,value>, Hash map use key.hashCode() and perform some operation:
    Bucket (index) =HashMap.indexFor (HashMap.hash(key.hashCode()), entryArray.length)
    It means, two keys with different hashCode can fall under same bucket.
  • If a bucket is empty (table[i] is null) then the Entry object is simply inserted at ith position
    table[i] = new Entry(hash, key, value, null)
  • If a bucket has some values, then the following algo is used:
    Entry entry = table[i] Table[i] = new Entry(hash,key,value,entry)
    It means, the latest entry resides on the top of the bucket.
  • If a key already exists in a map, then it just replace the value and return the old value
    If(entry.key == key || key.equals(entry.key)
    Then
    Object oldValue = entry.value;
    Entry.value = newValue;
    Return oldValue;
    End if
  • In case of new key,  map add the key, value (a new Entry object with key value) and return null
  • Entry.hash is not the hashCode of key, but HashMap use its own hash algo to create hash based on key.hashCode(). So it’s like:
    entry.hash = HashMap.hash(key.hashCode());
  • HashMap allow single null as key, in case of null key, it create a dummy object and use it as key.
    static final Object NULL_KEY = new Object();
  • hashMap.get(Obj) and hashMap.containsKey(obj), both method iterate through the map. Avoid to use extra call of containsKey().
    If(null != map.containsKey(obj)) {
    myObj = map.get(obj);
    //Perform other operations
    }
    This can be easily replaced by:
    myMap  = map.get(obj);
    if(null != myMap) {
    // Perform other operations
    }

    (http://mkbansal.wordpress.com/2010/06/24/hashmap-how-it-works/)


    How HashMap works in Java
    How HashMap works in Java or sometime how get method work in HashMap is common interview questions now days. Almost everybody who worked in Java knows what hashMap is, where to use hashMap or difference between hashtable and HashMap then why this interview question becomes so special? Because of the breadth and depth this question offers. It has become very popular java interview question in almost any senior or mid-senior level java interviews.

    Questions start with simple statement 

    "Have you used HashMap before" or "What is HashMap? Why do we use it “
    Almost everybody answers this with yes and then interviewee keep talking about common facts about hashMap like hashMap accpt null while hashtable doesn't, HashMap is not synchronized, hashMap is fast and so on along with basics like its stores key and value pairs etc.
    This shows that person has used hashMap and quite familier with the funtionalities HashMap offers but interview takes a sharp turn from here and next set of follow up questions gets more detailed about fundamentals involved in hashmap. Interview here you and come back with questions like

    "Do you Know how hashMap works in Java” or
    "How does get () method of HashMap works in Java"
    And then you get answers like I don't bother its standard Java API, you better look code on java; I can find it out in Google at any time etc.
    But some interviewee definitely answer this and will say "HashMap works on principle of hashing, we have put () and get () method for storing and retrieving data from hashMap. When we pass an object to put () method to store it on hashMap, hashMap implementation calls
    hashcode() method hashMap key object and by applying that hashcode on its own hashing funtion it identifies a bucket location for storing value object , important part here is HashMap stores both key+value in bucket which is essential to understand the retrieving logic. if people fails to recognize this and say it only stores Value in the bucket they will fail to explain the retrieving logic of any object stored in HashMap . This answer is very much acceptable and does make sense that interviewee has fair bit of knowledge how hashing works and how HashMap works in Java.
    But this is just start of story and going forward when depth increases a little bit and when you put interviewee on scenarios every java developers faced day by day basis. So next question would be more likely about collision detection and collision resolution in Java HashMap e.g 

    "What will happen if two different objects have same hashcode?”
    Now from here confusion starts some time interviewer will say that since Hashcode is equal objects are equal and HashMap will throw exception or not store it again etc. then you might want to remind them aobut equals and hashCode() contract that two unequal object in Java very much can have equal hashcode. Some will give up at this point and some will move ahead and say "Since hashcode () is same, bucket location would be same and collision occurs in hashMap, Since HashMap use a linked list to store in bucket, value object will be stored in next node of linked list." great this answer make sense to me though there could be some other collision resolution methods available this is simplest and HashMap does follow this.
    But story does not end here and final questions interviewer ask like 

    "How will you retreive if two different objects have same hashcode?”
     Hmmmmmmmmmmmmm
    Interviewee will say we will call get() method and then HashMap uses keys hashcode to find out bucket location and retreives object but then you need to remind him that there are two objects are stored in same bucket , so they will say about traversal in linked list until we find the value object , then you ask how do you identify vlaue object because you don't value object to compare ,So until they know that HashMap stores both Key and Value in linked list node they won't be able to resolve this issue and will try and fail.

    But those bunch of people who remember this key information will say that after finding bucket location , we will call keys.equals() method to identify correct node in linked list and return associated value object for that key in Java HashMap. Perfect this is the correct answer.

    In many cases interviewee fails at this stage because they get confused between hashcode () and equals () and keys and values object in hashMap which is pretty obvious because they are dealing with the hashcode () in all previous questions and equals () come in picture only in case of retrieving value object from HashMap.
    Some good developer point out here that using immutable, final object with proper equals () and hashcode () implementation would act as perfectJava HashMap keys and improve performance of Java hashMap by reducing collision. Immutablity also allows caching there hashcode of different keys which makes overall retreival process very fast and suggest that String and various wrapper classes e.g Integer provided by Java Collection API are very good HashMap keys.

    Now if you clear all this java hashmap interview question you will be surprised by this very interesting question "What happens On HashMap in Java if the size of the Hashmap exceeds a given threshold defined by load factor ?". Until you know how hashmap works exactly you won't be able to answer this question.
    if the size of the map exceeds a given threshold defined by load-factor e.g. if load factor is .75 it will act to re-size the map once it filled 75%.Java Hashmap does that by creating another new bucket array of size twice of previous size of hashmap, and then start putting every old element into that new bucket array and this process is called rehashing because it also applies hash function to find new bucket location. 


    If you manage to answer this question on hashmap in java you will be greeted by "do you see any problem with resizing of hashmap in Java" , you might not be able to pick the context and then he will try to give you hint about multiple thread accessing the java hashmap and potentially looking for race condition on HashMap in Java

    So the answer is Yes there is potential race condition exists while resizing hashmap in Java, if two thread at the same time found that nowJava Hashmap needs resizing and they both try to resizing. on the process of resizing of hashmap in Java , the element in bucket which is stored in linked list get reversed in order during there migration to new bucket because java hashmap doesn't append the new element at tail instead it append new element at head to avoid tail traversing. if race condition happens then you will end up with an infinite loop. though this point you can potentially argue that what the hell makes you think to use HashMap in multi-threaded environment to interviewer :)

    I like this question because of its depth and number of concept it touches indirectly, if you look at questions asked during interview this HashMap questions has verified
    Concept of hashing
    Collision resolution in HashMap
    Use of equals () and hashCode () method and there importance?
    Benefit of immutable object?
    race condition on hashmap in Java
    Resizing of Java HashMap

    Just to summararize here are the answers which does makes sense for above questions

    How HashMAp works in Java
    HashMap works on principle of hashing, we have put () and get () method for storing and retrieving object form hashMap.When we pass an both keyand value to put() method to store on HashMap, it uses key object hashcode() method to calculate hashcode and they by applying hashing on that hashcode it identifiesbucket location for storing value object.
    While retrieving it uses key object equals method to find out correct key value pair and return value object associated with that key. HashMap uses linked list in case ofcollision and object will be stored in next node of linked list.
    Also hashMap stores both key+value tuple in every node of linked list.

    What will happen if two different HashMap key objects have same hashcode?
    They will be stored in same bucket but no next node of linked list. And keys equals () method will be used to identify correct key value pair in HashMap.

    In terms of usage HashMap is very versatile and I have mostly used hashMap as cache in electronic trading application I have worked . Since finance domain used Java heavily and due to performance reason we need caching a lot HashMap comes as very handy there.


    (http://javarevisited.blogspot.com/2011/02/how-hashmap-works-in-java.html)