Tuesday, 23 May 2017

Print binary tree in Spiral order

Background

Sometime back we had seen how to traverse a binary tree and print it. We saw -
  • Pre-order 
  • post-order
  • In-order
  • level order traversals
Binary Tree Traversal

In this post we will see how to print them in a spiral order.  Consider following tree -




We need to print the tree in following order - 1, 2, 3, 4, 5, 6, 7.




Code

Following recursive approach will help achieve this. Idea is to keep a boolean toggle param to print nodes either from left to right or right to left.


    public static int getHeight(BTreeNode root) {
        if (root == null) {
            return 0;
        } else {
            int leftHeight = getHeight(root.getLeft());
            int rightHeight = getHeight(root.getRight());
            return leftHeight > rightHeight ? leftHeight + 1 : rightHeight + 1;
        }
    }


    /**
     * 
     * @param root
     *            if the btrr Worst case time complexity - O(N^2) for skewed
     *            trees No extra space
     */
    public static void printSpiralRecurssive(BTreeNode root) {
        boolean leftToRight = false;
        int height = getHeight(root);
        for (int i = 1; i <= height; i++) {
            printLevelRecurssive(root, i, leftToRight);
            leftToRight = !leftToRight;
        }
    }

    public static void printLevelRecurssive(BTreeNode root, int level, boolean leftToRight) {
        if (level == 1) {
            System.out.println(root.getData());
        } else {
            if (leftToRight) {
                printLevelRecurssive(root.getLeft(), level - 1, leftToRight);
                printLevelRecurssive(root.getRight(), level - 1, leftToRight);
            } else {
                printLevelRecurssive(root.getRight(), level - 1, leftToRight);
                printLevelRecurssive(root.getLeft(), level - 1, leftToRight);
            }
        }
    }

Logic : We first calculate the height of the tree which are basically levels. We then iterate from 1 to height (basically all levels) and print them from left to right or right to left based on the boolean toggle. We toggle this value after each level. For each recursive call we start from root and we go down till we reach the level we want it to be (one next to previously iterated on) based on the height and print nodes.

Complete solution with example is provide under my github repo of Data Structures -
In the same link there is a recursive solution as well that takes O(N) extra space to give same result. Iterative solution  -

private static Stack<BTreeNode> leftToRight = new Stack<>();
private static Stack<BTreeNode> rightToLeft = new Stack<>();
public static void printSpiralIterative(BTreeNode root) {

        rightToLeft.push(root);

        while (!rightToLeft.isEmpty() && !leftToRight.isEmpty()) {
            while (!rightToLeft.isEmpty()) {
                BTreeNode node = rightToLeft.pop();
                System.out.println(node.getData());
                if (node.getLeft() != null) {
                    leftToRight.push(node.getLeft());
                }
                if (node.getRight() != null) {
                    leftToRight.push(node.getRight());
                }
            }

            while (!leftToRight.isEmpty()) {
                BTreeNode node = rightToLeft.pop();
                System.out.println(node.getData());
                if (node.getLeft() != null) {
                    rightToLeft.push(node.getLeft());
                }
                if (node.getRight() != null) {
                    rightToLeft.push(node.getRight());
                }
            }
        }

}


Let me know if you have any questions.

Related Links

Saturday, 20 May 2017

How ConcurrentHashMap Works Internally in Java

Background

In one of the previous posts we saw how HashMap works -
and how it's time complexity of insertion and deletion is O(1) is normal case. Though this is a great data structure to work with in terms of time complexity it is not thread safe which means you cannot use it directly in multi threaded environments without taking additional precautions like synchronizing put/get on your own. Instead Java has provided a thread safe implementation of concurrent hashmap. We can directly use it in case of multi threaded environments for thread safety. Eg. in case of parallel stream introduced in java 8.

How ConcurrentHashMap Works Internally in Java

Before we see how it is implemented in Java lets give it some though. What are possible problems with a HashMap. Race condition, invalid state. Lets say two writes happen at the same time. Since write is not an atomic operation one value may overwrite other and Map may go in inconsistent state. We can obviously add synchronization over read/writes of a HashMap but it would be very inefficient and have performance impact. I would be like single threaded application certainly the behavior we don't expect. To solve this issue Java provides ConcurrentHashMap that has built in thread safety. Let see how -

We know how HashMap works. Internally it stores an array of Entry object which essentially has key, value and pointer to next Entry object (linked list used in case of collision). You can think of each array element as bucket and each Entry object as a data point containing key (in case 2 keys have same hash - collision), value  and pointer to next data element. 

Working :
ConcurrentHashMap as the name suggests allows concurrent read/writes to the Map. But there are limitations. ConcurrentHashMap maintains another data structure internally called segments. Each bucket of HashMap is part of one of the segments. Number of segments is called Concurrency-Level which determines number of thread that can write simultaneous. This Segments gets locked when writing/updating/removing data. Think of Segments as locks used to prevent concurrent write to same bucket of hashmap leading to inconsistency. So as long as write to concurrent hashmap is on different segments it can happen in parallel. Reads are completely lock free i.e No need to acquire lock for reading. Last updated value is returned.


 Now lets go step by step -

 Concurrency-Level , Segment array and initialization :
  • First when you create a ConcurrentHashMap you can provide concurrency level. This determines size of Segment array. Size of segment array will always be equal or more than the concurrency level. If this is not provided default is used - 
    • static final int MAX_SEGMENTS = 1 << 16; // slightly conservative
  • Note that the size of segment table will always be power of 2. So if you give  concurrency level as 10 then next best power of 2 match will be picked up i.e 16 and Segment array of size 16 will be created which implies 16 threads can simultaneously operate on the map.
static final class Segment<K,V> extends ReentrantLock implements Serializable {

    //The number of elements in this segment's region.
    transient volatile int count;
    //The per-segment table. 
    transient volatile HashEntry<K,V>[] table;
}

Putting element in ConcurrentHashMap :

  • For putting element in Map we first need to determine which segment the element should be processed for. For this we first get hascode of the key. Next we do a rehash of the existing hash to ensure
     /**
     * Applies a supplemental hash function to a given hashCode, which
     * defends against poor quality hash functions.  This is critical
     * because ConcurrentHashMap uses power-of-two length hash tables,
     * that otherwise encounter collisions for hashCodes that do not
     * differ in lower or upper bits.
     */
    private static int hash(int h) {
        // Spread bits to regularize both segment and index locations,
        // using variant of single-word Wang/Jenkins hash.
        h += (h <<  15) ^ 0xffffcd7d;
        h ^= (h >>> 10);
        h += (h <<   3);
        h ^= (h >>>  6);
        h += (h <<   2) + (h << 14);
        return h ^ (h >>> 16);
    }
  •  Once hash is calculated you can get the segment which it belongs to and delegate put method to segments put method as follows -
    public V put(K key, V value) {
        if (value == null)
            throw new NullPointerException();
        int hash = hash(key.hashCode());
        return segmentFor(hash).put(key, hash, value, false);
    }

    final Segment<K,V> segmentFor(int hash) {
        return segments[(hash >>> segmentShift) & segmentMask];
    }


We will see how segment is computed in some time with a proper example. Once put is delegated to segment , segment will add it to the appropriate bucket in the segment.

        V put(K key, int hash, V value, boolean onlyIfAbsent) {
            lock();
            try {
                int c = count;
                if (c++ > threshold) // ensure capacity
                    rehash();
                HashEntry<K,V>[] tab = table;
                int index = hash & (tab.length - 1);
                HashEntry<K,V> first = tab[index];
                HashEntry<K,V> e = first;
                while (e != null && (e.hash != hash || !key.equals(e.key)))
                    e = e.next;

                V oldValue;
                if (e != null) {
                    oldValue = e.value;
                    if (!onlyIfAbsent)
                        e.value = value;
                }
                else {
                    oldValue = null;
                    ++modCount;
                    tab[index] = new HashEntry<K,V>(key, hash, first, value);
                    count = c; // write-volatile
                }
                return oldValue;
            } finally {
                unlock();
            }
        }


Now this is very interesting method. Lets understand whats happening here.

  • First call is to lock(). Since it is a write/update operation on a bucket of same segment we need a lock. If you recollect Segment class it extends ReentrantLock so each segment is a lock. So you can call lock() and unlock() directly in Segment class.
  • Next it's like a normal HashMap. You find the index of the Entry table where your elements hash falls and add it there as linked list.
  • You can see similar code has HashMap that updates value if key is same, inserts in array if there is no element in the table and adds it in the linked list of the table if element already exists.
  • Finally once operation is complete it calls unlock() so that other threads can continue update.
  • Note the lock is a blocking call. 
  • You can also see call for rehash if threshold is reached. Like Entry array Segment also has a threshold and when it is reached Segment array is resized for performance. That's what rehash. 
NOTE : For getting index of Segment table first n bits are used where as for getting index of Entry table last N bits are used from enhanced hash integer (See details in example below).

Getting element from  ConcurrentHashMap : 

Get on ConcurrentHashMap is very simple no locks involved. You simply read the data and return -

        public V get(Object key) {
                int hash = hash(key.hashCode());
                return segmentFor(hash).get(key, hash);
        }

        V get(Object key, int hash) {
            if (count != 0) { // read-volatile
                HashEntry<K,V> e = getFirst(hash);
                while (e != null) {
                    if (e.hash == hash && key.equals(e.key)) {
                        V v = e.value;
                        if (v != null)
                            return v;
                        return readValueUnderLock(e); // recheck
                    }
                    e = e.next;
                }
            }
            return null;
        }


NOTE  : readValueUnderLock method is used as a backup in case a null (pre-initialized) value is ever seen in an unsynchronized access method.

Example

Above was just all code and some understanding. Now lets take an actual example.

Let's say we have created a ConcurrentHashMap with concurrency level lets say 10. Based on this Segment array will be created based on following code -

    private static void printSegmentDetails(int concurrencyLevel) {
        int sshift = 0;
        int segmentMask = 0;
        int segmentShift = 0;

        int ssize = 1;
        while (ssize < concurrencyLevel) {
            ++sshift;
            ssize <<= 1;
        }
        segmentShift = 32 - sshift;
        segmentMask = ssize - 1;
        System.out.println("Segment array size :" + ssize);
        System.out.println("segmentShift : " + segmentShift);
        System.out.println("segmentMask : " + segmentMask);
    }


Output for 10 concurrency level:
Segment array size : 16
segmentShift : 28
segmentMask : 15

NOTE  :As mentioned before segment array is of size 2^n such that 2^n >= concurrency level. In this case 2^4

Now that we have segment table in place lets simulate put. We need to put a String called "Aniket" as key. We don't care about value. Just make sure it's not null.

  1. First we will calculate hascode of the key.
  2. Then hash it so for better hash (as mentioned above)
  3. Then based on the result hash we will find which segment will it belong
Remember of Segment table was >= 2^N we now want first N bits to determine which segment this hash falls into. Since N bits will vary from 1 - 2^N which is our segment array size. Also remember code to get this index from above? -
  • int segmentIndex = (hash >>> segmentShift) & segmentMask
This essentially means logically right shift hash with segmentShift bits. Since int is 32 bit and segmentShift = 32 - sshift, hash >>> segmentShift will essentially give you first sshift bits (sshift is nothing but N in 2^N we saw above). segmentMask is to get the N bits post shift.

So in this case,
N  = sshift =  4
2^N = 16 -> Size of segment array
segmentShift = 32 - 4 = 28 (as we saw in output above)
segmentMask = 16 -1 - 15

    public static void main(String args[]) {    
        String key = "Aniket";
        //hascode of key
        System.out.println(key.hashCode());
        //better hash
        System.out.println(hash(key.hashCode()));
        //better hash in binary
        System.out.println(Integer.toBinaryString(hash(key.hashCode())));
        //logical right shift by segmentShift
        System.out.println("Right shifter hash : " + Integer.toBinaryString(hash(key.hashCode()) >>> 28));
        // segment index as binary and of right shift and segmentMask
        System.out.println("Segment Index : " + Integer.toBinaryString((hash(key.hashCode()) >>> 28 ) & 15));
        // segment index as decimal
        System.out.println("Segment Index : " + ((hash(key.hashCode()) >>> 28 ) & 15));
    }


Output :
1965716254
1839402854
1101101101000110000111101100110
Right shifter hash : 110
Segment Index : 110
Segment Index : 6


NOTE : 1101101101000110000111101100110 is 31 bits as rightmost bit is 0 and ignored.  Same goes for all subsequent binmary bit formats.

So your element with key "Aniket" will go in Segment array of index 6. Inside segments it's pretty simple to calculate index of Entry array.

  •  int entryArrayindex = hash & (tab.length - 1);
         int entryArrayindex = (hash(key.hashCode()) & (16 - 1));
         System.out.println("Entry array index : " + entryArrayindex);
         System.out.println("Entry array index in binary : " + Integer.toBinaryString(entryArrayindex));


Output :
Entry array index : 6
Entry array index in binary : 110


So finally Entry is inserted at index 6 of Entry table.

So to summarize for getting index of Segment table first n bits are used where as for getting index of Entry table last N bits are used from enhanced hash integer.


Related Links 

Left shift and right shift operators in Java

Left shift and right shift operators in Java

A lot of time we use >> , >>> or << and <<< operators in Java for bit operations. These operations essentially shift bits left or right depending on the operator. In this post we will see what exactly is the difference.
  • >> is arithmetic or signed shift right
  • >>> is logical or unsigned shift right
  • << is arithmetic or signed shift left
The signed left shift operator "<<" shifts a bit pattern to the left, and the signed right shift operator ">>" shifts a bit pattern to the right. The bit pattern is given by the left-hand operand, and the number of positions to shift by the right-hand operand.

The unsigned right shift operator ">>>" shifts a zero into the leftmost position, while the leftmost position after ">>" depends on sign extension.

NOTE : There is no logic left shift as it is same as arithmetic left shift.

Example :

        System.out.println(Integer.toBinaryString(-121));
        // prints "11111111111111111111111110000111"
        System.out.println(Integer.toBinaryString(-121 >> 1));
        // prints "11111111111111111111111111000011"
        System.out.println(Integer.toBinaryString(-121 >>> 1));
        // prints "1111111111111111111111111000011"
        
        System.out.println(Integer.toBinaryString(121));
        // prints "1111001"
        System.out.println(Integer.toBinaryString(121 >> 1));
        // prints "111100"
        System.out.println(Integer.toBinaryString(121 >>> 1));
        // prints "111100"


As you can see in case of -121 since it is a negative number arithmetic or signed shift right adds a 1 to the rightmost bit where as in case if 121 it adds 0.

logical or unsigned shift does not care about sign. It just adds 0 to the shifted bits.


Related Links

Monday, 15 May 2017

How does database indexing work?

Background

Database indexing is a wide topic. Database indexing plays a important role in your query result performance. But like everything this too has a trade off.  In this post we will see what database indexing is and how does it work.


Clustered and Non clustered index

Before we go to how indexing actually works lets see the two types on indexes -
  1. Clustered index
  2. Non clustered index
Data in tables of a database need to be stored on a physical disk at the end of the day. It is important the way data is stored since data lookup is based on it. The way data is stored on physical disk is decided by an index which is known as Clustered index. Since data is physically stored only once , only one clustered index is possible for a DB table. Generally that's the primary key of the table. That's right. Primary key of your table is a Clustered index by default and data is stored physically based on it.

NOTE : You can change this though. You can create a primary key that is not clustered index but a non clustered one. However you need to define one clustered index for your table since physical storage order depends on it.

Non clustered indexes are normal indexes. They order the data based on the column we have created non clustered index on. Note since data is stored only once on the disk and there is just one column(or group of columns ) which can be used to order stored data on disk we cannot store the same data with ordering specified by non clustered index (that's the job of clustered index). So new memory is allocated for non clustered indexes. These have the column on which it is created as the key of the data row and a pointer to the actual data row (which is ordered by clustered index - usually the primary key). So if a search is performed on this table based on the columns that have non clustered indexes then this new data set is searched (which is faster since records are ordered with respect to this column) and then the pointer in it is used to access the actual data record with all columns.


Now that we have knowledge of clustered and non clustered indexes lets see how it actually works.

Why is indexing needed?

When data is stored on disk based storage devices, it is stored as blocks of data. These blocks are accessed in their entirety, making them the atomic disk access operation. Disk blocks are structured in much the same way as linked lists; both contain a section for data, a pointer to the location of the next node (or block), and both need not be stored contiguously.

Due to the fact that a number of records can only be sorted on one field, we can state that searching on a field that isn’t sorted requires a Linear Search which requires N/2 block accesses (on average), where N is the number of blocks that the table spans. If that field is a non-key field (i.e. doesn’t contain unique entries) then the entire table space must be searched at N block accesses.

Whereas with a sorted field, a Binary Search may be used, this has log2 N block accesses. Also since the data is sorted given a non-key field, the rest of the table doesn’t need to be searched for duplicate values, once a higher value is found. Thus the performance increase is substantial.

What is indexing?

This is rather a silly question given we already saw clustered and non clustered indexes but lets give it a try.

Indexing is a way of sorting a number of records on multiple fields. Creating an index on a field in a table creates another data structure which holds the field value, and pointer to the record it relates to. This index structure is then sorted, allowing Binary Searches to be performed on it. This index is obviously the non clustered one. There is no need to create separate data structure for Clustered indexes since the original data is stored physically sorted based on it.

The downside to (non clustered) indexing is that these indexes require additional space on the disk, since the indexes are stored together in a table using the MyISAM engine, this file can quickly reach the size limits of the underlying file system if many fields within the same table are indexed.

Performance

Indexes don't come free.  They have their own overhead. Each index creates an new data set ordered by the columns on which it is created. This takes extra space (though not as much as the original data set since this just has single data and pointer to actual row data). Also inserts are slower now since each insert will have to update this new index data set as well. Same goes for deletion.

Data structures used in indexes

Hash index :
 Think of this using a HashMap. Key here would be derived from columns that are used to create a index (non clustered index to be precise). Value would be pointer to the actual table row entry. They are good for equality lookups like get rows of data of all customer whose age is 24. But what happens when we need a query like set of data of customer with age greater than 24. Hash index does not work so go in this case.  Hash indexes are just good for equality lookups.
Eg.



B-tree Indexes:
These are most common types of indexes. It's kind of self balancing tree. It stores data in an ordered manner so that retrievals are fast. These are useful since they provide insertion, search and deletion in logarithmic time.
Eg.


Consider above picture. If we need rows with data less that 100 all we need are notes to the left of 100.

These are just common ones. Others are R-tree indexes, bitmap indexes etc.

Related Links

Sunday, 14 May 2017

Difference between ClassNotFoundException vs NoClassDefFoundError in Java

Background

In last post we how classloading works in Java -
In this post we will try to understand the difference between ClassNotFoundException vs NoClassDefFoundError thet generally bugs all Java developers.

If you have not gone through above link I strongly suggest you do it right away. It will give you very good understanding on class loading mechanism that we will be using shortly to understand these two situations.

Difference between ClassNotFoundException vs NoClassDefFoundError in Java

  • First point to note is their types. ClassNotFoundException is a checked exception. So you will need to handle it. Either catch it or throw it in method signature. NoClassDefFoundError is an error. Error are generally something you cannot recover from. You can still catch and handle it though.
  • Both things are related to class not available during runtime. Difference is the cause of non availability which we will see shortly. 
  • ClassNotFoundException is throw by running application where as NoClassDefFoundError is thrown by Java runtime.
ClassNotFoundException
We have already seen an example of ClassNotFoundException in last post when we tried to load our custom class with Extension classloader which was parent of Application classloader that actually loaded the class. We said parent classloader does not have visibility of classes loaded by child class loaders. So it threw ClassNotFoundException. Generally ClassNotFoundException is thrown when you try to load a custom class using methods like -
  • Class.forName()
  • ClassLoader.loadClass()
  • ClassLoader.findSystemClass()
and the required class is not found in the classpath. This could be because your classpath is incorrectly configured. Famous example is when you connect to a database using Jave you load the driver class using - Class.forName() [We do this explicit loading pre Java 6. From java 6 this class loading happens automatically. All you have to do is make sure driver jar is present in classpath] -
So for above usecase if you don have a driver class in your classpath then it will led to  ClassNotFoundException. Also you must have noticed by not you need to explicitly handle this exception since this is checked exception.

To resolve this issue you need to check that the class is available in your classpath.

NoClassDefFoundError:
Now this unlike ClassNotFoundException is an Error which is hard to recover from. This generally happens when class is available at compile time but not available at runtime.

One example can be static method/block of a class throws error due to which class does not load (though it was available at compile time and went through in compilation phase). Now if such a class is reference at runtime then it will throw NoClassDefFoundError.

To resolve this error you need check your classpath. It is possible that in your configuration (say in gradle files) you have added jar is lets say test configuration only and not in runtime or compile time configuration. You also need to lookout for any Initialization errors in the logs.


Related Links

t> UA-39527780-1 back to top