Beginner’s Guide To Hazelcast Part 1


Introduction

I am going to be doing a series on Hazelcast. I learned about this product from Twitter: Hazelcast followed me, and after some research into what they do, I decided to follow them back. I tweeted that Hazelcast would be a great backbone for a distributed password cracker. This got some interest, and I decided to go make one. A vice president of Hazelcast started corresponding with me, and we decided that while a cracker was a good project, the community (and I) would benefit more from a series of posts for beginners. I have been getting a lot of good information from the preview of The Book of Hazelcast, available at www.hazelcast.com.

What is Hazelcast?

Hazelcast is a distributed, in-memory database. There are projects all over the world using Hazelcast. The code is open source under the Apache License 2.0.

Features

There are a lot of features already built into Hazelcast. Here are some of them:

  • Automatic discovery of nodes on a network
  • High availability
  • In-memory backups
  • The ability to cache data
  • Distributed thread pools
    • Distributed Executor Service
  • The ability to have data in different partitions.
  • The ability to persist data asynchronously or synchronously.
  • Transactions
  • SSL support
  • Structures to store data:
    • IList
    • IMap
    • MultiMap
    • ISet
  • Structures for communication among different processes
    • IQueue
    • ITopic
  • Atomic Operations
    • IAtomicLong
  • Id Generation
    • IdGenerator
  • Locking
    • ISemaphore
    • ICondition
    • ILock
    • ICountDownLatch

Working with Hazelcast

Playing around with Hazelcast and reading about it have taught me to make the following assumptions:

  1. The data will be stored as an array of bytes. (This is not really an assumption; I got it directly from the book.)
  2. The data will go over the network.
  3. The data is remote.
  4. If the data is not in memory, it doesn’t exist.

Let me explain these assumptions:

The data will be stored as an array of bytes

I got this information from The Book of Hazelcast, so it is really not an assumption. This is important because not only is the data stored that way, so is the key. This makes life very interesting if one uses something other than a primitive or a String as a key: an IMap compares keys by their serialized bytes, not by calling the key's hashCode() and equals() methods. The developer of a key class must therefore think of the key as an array of bytes instead of as a class.
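To see why the byte view matters, here is a small sketch of a key that is equal as a Java object but not as bytes. It uses standard Java serialization as a stand-in for Hazelcast's own serializer, and UserKey is a hypothetical class: its equals() looks only at userId, yet lastSeen is still written out when the key is serialized.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.Arrays;
import java.util.Objects;

public class KeyBytesDemo {

    // Hypothetical key class: equals()/hashCode() look only at userId,
    // but lastSeen is still part of the serialized form.
    static final class UserKey implements Serializable {
        private static final long serialVersionUID = 1L;
        final String userId;
        final long lastSeen;

        UserKey(String userId, long lastSeen) {
            this.userId = userId;
            this.lastSeen = lastSeen;
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof UserKey && ((UserKey) o).userId.equals(userId);
        }

        @Override
        public int hashCode() {
            return Objects.hash(userId);
        }
    }

    // Serialize with plain Java serialization (a stand-in for Hazelcast's serializer).
    static byte[] toBytes(Serializable value) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(value);
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) {
        UserKey a = new UserKey("daryl", 1L);
        UserKey b = new UserKey("daryl", 2L);
        System.out.println("equals(): " + a.equals(b));
        System.out.println("same bytes: " + Arrays.equals(toBytes(a), toBytes(b)));
    }
}
```

Two keys like these are equal as Java objects, but a store that compares serialized bytes would treat them as two different keys. Keeping key classes small and free of fields that equals() ignores avoids the surprise.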

The data will go over the network

Because this is a distributed database, parts of the data will be stored on other nodes. Backups and caching move data between nodes as well. There are techniques and settings that reduce the amount of data sent over the network, but if one wants high availability, backups must be made.
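A toy model makes it clearer why a single write crosses the network once backups are on. The only Hazelcast fact used below is its default partition count of 271; the round-robin owner assignment and the "next member holds the backup" rule are simplifications I made up for illustration, not how Hazelcast actually builds its partition table.

```java
import java.util.List;

public class PartitionSketch {

    // Hazelcast's default partition count; everything else here is a simplified model.
    static final int PARTITION_COUNT = 271;

    // Map a key to a partition from its hash (Hazelcast works from the serialized key bytes instead).
    static int partitionOf(Object key) {
        return Math.floorMod(key.hashCode(), PARTITION_COUNT);
    }

    // Toy assignment: partitions are dealt round-robin to members.
    static int ownerOf(int partition, int memberCount) {
        return partition % memberCount;
    }

    // Toy backup rule: the next member holds the single backup.
    // With two or more members, this is always a different member than the owner.
    static int backupOf(int partition, int memberCount) {
        return (ownerOf(partition, memberCount) + 1) % memberCount;
    }

    public static void main(String[] args) {
        List<String> members = List.of("node-A", "node-B", "node-C");
        for (String key : List.of("alpha", "beta", "gamma")) {
            int p = partitionOf(key);
            System.out.printf("%s -> partition %d, owner %s, backup %s%n",
                    key, p,
                    members.get(ownerOf(p, members.size())),
                    members.get(backupOf(p, members.size())));
        }
    }
}
```

Whatever node a client talks to, a put has to reach the owner and its backup, so with more than one member at least one copy of every entry lives on another machine.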

The data is remote

Because parts of the database are stored on other nodes, some of the data will always be remote. I included this assumption not to resign myself to that fact, but to motivate designs that perform operations where most of the data is located. A skilled developer can keep remote data access to a minimum.
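Hazelcast's own tool for running logic where the data lives is the EntryProcessor, executed with IMap.executeOnKey. The plain-Java sketch below does not use Hazelcast at all; it only counts simulated round trips to show why sending the operation to the data beats pulling the data to the operation. RemoteMember and its call counter are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.UnaryOperator;

public class LocalitySketch {

    // Hypothetical stand-in for a remote member that owns some entries.
    static final class RemoteMember {
        private final Map<String, Integer> data = new HashMap<>();
        int calls = 0; // counts simulated network round trips

        Integer get(String key) {
            calls++;
            return data.get(key);
        }

        void put(String key, Integer value) {
            calls++;
            data.put(key, value);
        }

        // Run the update on the member itself: one round trip, the value never travels.
        Integer apply(String key, UnaryOperator<Integer> update) {
            calls++;
            Integer updated = update.apply(data.get(key));
            data.put(key, updated);
            return updated;
        }
    }

    // Pull the value, modify it locally, push it back: two round trips.
    static int incrementByFetching(RemoteMember member, String key) {
        Integer current = member.get(key);
        int next = (current == null ? 0 : current) + 1;
        member.put(key, next);
        return next;
    }

    // Send the operation to the member that owns the key: one round trip.
    static int incrementOnOwner(RemoteMember member, String key) {
        return member.apply(key, v -> (v == null ? 0 : v) + 1);
    }

    public static void main(String[] args) {
        RemoteMember fetched = new RemoteMember();
        incrementByFetching(fetched, "hits");
        RemoteMember shipped = new RemoteMember();
        incrementOnOwner(shipped, "hits");
        System.out.println("fetch-and-put round trips: " + fetched.calls);
        System.out.println("send-to-owner round trips: " + shipped.calls);
    }
}
```

The fetch-and-put version also opens a window where another node can change the value between the get and the put, which is a second reason to move the work to the data.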

If the data is not in memory, it doesn’t exist

Do not forget that this is an in-memory database. If data is not loaded into memory, the database does not know that it is stored somewhere else. Hazelcast does not persist data so that it can simply be brought up later; it persists data because the data is important. Unlike a conventional database such as MySQL, it will not bring data back from disk once it is out of memory unless that loading is specifically coded.
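The difference between "persisted" and "reloaded" can be sketched in plain Java. MemoryStore below is hypothetical and only loosely modeled on the idea behind Hazelcast's MapLoader: a lookup that misses memory returns nothing unless a loader has been supplied to read through to the backing store.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

public class ReadThroughSketch {

    // Hypothetical in-memory store with an optional read-through loader.
    static final class MemoryStore<K, V> {
        private final Map<K, V> memory = new HashMap<>();
        private final Function<K, V> loader; // null means "no loader configured"

        MemoryStore(Function<K, V> loader) {
            this.loader = loader;
        }

        V get(K key) {
            V value = memory.get(key);
            if (value == null && loader != null) {
                value = loader.apply(key);          // read through to the backing store
                if (value != null) {
                    memory.put(key, value);         // keep it in memory from now on
                }
            }
            return value;                           // without a loader, a miss is just null
        }
    }

    public static void main(String[] args) {
        Map<String, String> disk = Map.of("42", "stuff from disk"); // pretend persistent storage
        MemoryStore<String, String> plain = new MemoryStore<>(null);
        MemoryStore<String, String> withLoader = new MemoryStore<>(disk::get);
        System.out.println("no loader:   " + plain.get("42"));
        System.out.println("with loader: " + withLoader.get("42"));
    }
}
```

The data is on "disk" in both cases, but only the store that was given a loader ever sees it.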

Data Storage

Java developers will be happy to know that all but one of Hazelcast’s data storage containers extend interfaces from the Java Collections Framework; MultiMap is the exception. For example, an IList follows the same method contracts as java.util.List. Here is a list of the different data storage types:

  • IList – This keeps a number of objects in the order they were put in.
  • IQueue – This follows BlockingQueue and can be used as an alternative to a message queue in JMS. It can be persisted via a QueueStore.
  • IMap – This extends ConcurrentMap. It can also be persisted by a MapStore. It has a number of other features that I will talk about in another post.
  • ISet – This keeps a set of unique elements where order is not guaranteed.
  • MultiMap – This does not follow a typical map, as there can be multiple values per key.
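MultiMap is the odd one out because putting a second value under an existing key adds to that key's values instead of replacing the old one, which breaks the java.util.Map contract. A single-JVM sketch of that behavior follows; SimpleMultiMap is hypothetical, and it always keeps values in a list, whereas Hazelcast's MultiMap can be configured to hold each key's values as a set or a list.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class MultiMapSketch {

    // Hypothetical stand-in: each key maps to a collection of values.
    static final class SimpleMultiMap<K, V> {
        private final Map<K, Collection<V>> entries = new HashMap<>();

        // Unlike Map.put, this adds a value instead of replacing the previous one.
        boolean put(K key, V value) {
            return entries.computeIfAbsent(key, k -> new ArrayList<>()).add(value);
        }

        // Returns every value stored under the key (empty if none).
        Collection<V> get(K key) {
            return entries.getOrDefault(key, List.of());
        }

        int valueCount(K key) {
            return get(key).size();
        }
    }

    public static void main(String[] args) {
        SimpleMultiMap<String, String> tags = new SimpleMultiMap<>();
        tags.put("hazelcast", "distributed");
        tags.put("hazelcast", "in-memory");
        System.out.println(tags.get("hazelcast")); // both values are kept
    }
}
```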

Example

Setup

For all the features that Hazelcast contains, the initial setup steps are really easy.

  1. Download the Hazelcast zip file from www.hazelcast.org and extract its contents.
  2. Add the jar files found in the lib directory to one’s classpath.
  3. Create a file named hazelcast.xml and put the following into it:
 

    
        
    
    

Hazelcast looks in a few places for a configuration file:

  • The file named by the system property hazelcast.config (the value can start with classpath: to point at a resource on the classpath)
  • hazelcast.xml in the working directory
  • hazelcast.xml on the classpath
  • If all else fails, hazelcast-default.xml, which is inside hazelcast.jar, is loaded.
  • If one does not want to deal with a configuration file at all, the configuration can be done programmatically.

The configuration example here enables multicast so that nodes can discover and join each other. It also defines the IMap “a.”

A Warning About Configuration

Hazelcast does not copy configurations to each node. So if one wants to share a data structure, it must be defined identically on every node.
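One way to keep every node's definitions identical is to build the configuration programmatically in a single shared method. The sketch below assumes Hazelcast 3.x on the classpath and sets up multicast joining and the IMap “a”; the class name SharedConfig and the backup count of 1 are my own illustrative choices, not settings taken from the example project.

```java
import com.hazelcast.config.Config;
import com.hazelcast.config.JoinConfig;
import com.hazelcast.core.Hazelcast;
import com.hazelcast.core.HazelcastInstance;

public class SharedConfig {

    // One place that defines the cluster settings, so every node starts with the same definitions.
    static Config buildConfig() {
        Config config = new Config();

        // Discover other members via multicast and keep TCP/IP joining off.
        JoinConfig join = config.getNetworkConfig().getJoin();
        join.getMulticastConfig().setEnabled(true);
        join.getTcpIpConfig().setEnabled(false);

        // Define the IMap "a"; a backup count of 1 is an illustrative assumption.
        config.getMapConfig("a").setBackupCount(1);
        return config;
    }

    public static void main(String[] args) {
        // Both members use the same Config, so map "a" is defined the same way on each.
        HazelcastInstance first = Hazelcast.newHazelcastInstance(buildConfig());
        HazelcastInstance second = Hazelcast.newHazelcastInstance(buildConfig());
        System.out.println("members seen by first:  " + first.getCluster().getMembers().size());
        System.out.println("members seen by second: " + second.getCluster().getMembers().size());
        Hazelcast.shutdownAll();
    }
}
```

The member counts printed depend on multicast working on the local machine's network. In a real deployment, the same idea applies across machines: ship one hazelcast.xml or one configuration class to every node rather than editing each node by hand.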

Code

This code brings up two nodes, puts values into the IMap “a” through instance (using an IdGenerator to generate the keys), and then reads the data back through instance2.

package hazelcastsimpleapp;

import com.hazelcast.core.Hazelcast;
import com.hazelcast.core.HazelcastInstance;
import com.hazelcast.core.IdGenerator;
import java.util.Map;

/**
 *
 * @author Daryl
 */
public class HazelcastSimpleApp {

    /**
     * @param args the command line arguments
     */
    public static void main(String[] args) {
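        // Start two cluster members in the same JVM; they find each other using the configuration.
        HazelcastInstance instance = Hazelcast.newHazelcastInstance();
        HazelcastInstance instance2 = Hazelcast.newHazelcastInstance();

        // Write ten entries through the first member, using a cluster-wide IdGenerator for the keys.
        Map<Long, String> map = instance.getMap("a");
        IdGenerator gen = instance.getIdGenerator("gen");
        for (int i = 0; i < 10; i++) {
            map.put(gen.newId(), "stuff " + i);
        }

        // Read the same distributed map back through the second member.
        Map<Long, String> map2 = instance2.getMap("a");
        for (Map.Entry<Long, String> entry : map2.entrySet()) {
            System.out.printf("entry: %d; %s%n", entry.getKey(), entry.getValue());
        }

        // Shut down both members so the JVM can exit.
        Hazelcast.shutdownAll();
    }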
    
}

Amazingly simple, isn’t it? Notice that I didn’t even use the IMap interface when I retrieved the map; I just used the java.util.Map interface. That gives up the distributed features that IMap adds, but for this example it works fine.

One can observe the assumptions at work here. The first assumption is that the information is stored as an array of bytes: notice that both the keys (Long) and the values (String) are serializable, which is required for them to be stored. The second and third assumptions hold true because the data is accessed through the instance2 node. The fourth assumption holds true because every value that was put into the “a” map was displayed when read. The full example can be found at https://github.com/darylmathison/hazelcast-simple-app-example. The project was made using NetBeans 8.0.
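The IdGenerator in the example hands out long values that are unique across the whole cluster. A common way to do that without a network call per ID is to reserve blocks of IDs from a shared counter and then hand them out locally. The plain-Java model below shows that idea; the AtomicLong stands in for a cluster-wide counter, the block size of 10,000 is illustrative, and none of it is Hazelcast's actual code.

```java
import java.util.concurrent.atomic.AtomicLong;

public class BlockIdSketch {

    // Stand-in for a cluster-wide counter; in a real cluster this value would be shared between nodes.
    static final AtomicLong SHARED_BLOCK_COUNTER = new AtomicLong();
    static final int BLOCK_SIZE = 10_000; // illustrative block size

    // Each node reserves a whole block at once, then hands out IDs from it locally.
    static final class NodeIdGenerator {
        private long next = 0;
        private long end = 0; // exclusive upper bound of the current block

        long newId() {
            if (next == end) {
                long block = SHARED_BLOCK_COUNTER.getAndIncrement(); // one "remote" call per block
                next = block * BLOCK_SIZE;
                end = next + BLOCK_SIZE;
            }
            return next++;
        }
    }

    public static void main(String[] args) {
        NodeIdGenerator node1 = new NodeIdGenerator();
        NodeIdGenerator node2 = new NodeIdGenerator();
        System.out.println(node1.newId() + ", " + node1.newId()); // from node1's block
        System.out.println(node2.newId() + ", " + node2.newId()); // from node2's block
    }
}
```

Because two nodes never receive the same block, their IDs cannot collide, and only one call in every 10,000 IDs has to touch the shared counter. A side effect is that IDs are unique but not globally ordered by creation time.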

Conclusion

A quick overview of Hazelcast’s numerous features was given, along with a simple example showing IMap and IdGenerator. A list of assumptions that apply when developing in a distributed, in-memory database environment was also discussed.

References

The Book of Hazelcast. Download from http://www.hazelcast.com


5 thoughts on “Beginner’s Guide To Hazelcast Part 1”

  1. alium

    The contents of hazelcast.xml no longer appear – I find this to be so using different browsers from different devices.

  2. Aliaksandr Kavalevich

    Could you please comment about possible severe contradiction of the two statements from your post:
    “If the data is not in memory, it doesn’t exist.” vs “IMap – This extends ConcurrentMap. It can also be persisted by a MapStore”
    Volatility is very important architectural factor.
    Thanks.

    • Yes, I can. One thing to understand is that my rules are teaching tools to get the reader to start thinking in a way that will help them. When I explain what the rules mean later in the post, I do say that data can be persisted, but that one must make special considerations for the data to be persisted and restored. If one reads through the Javadocs of MapStore, one will find that just because you can persist data doesn’t mean it will reload on restart. That has to be coded in too. One even has to be careful about what type of queries are done on the IMap to make sure data will be read into memory. Special, purposeful coding needs to be done so that data is persisted and gets reloaded into memory. So for a normal, casual program, the data really does not exist if it is not in memory.
