The term Polyglot Persistence means storing data across multiple data stores, each chosen for the kind of data or access pattern it handles best. Polyglot Persistence has in fact been around for a few years now, but what is interesting is how it maps onto the big data space, and in particular onto the various NoSQL stores. Let's take a few moments to understand this.
NoSQL stores are normally classified into four main categories:
- Key-Value Stores (Riak, Redis)
- Column-Oriented Stores (HBase, Cassandra)
- Document-Oriented Stores (MongoDB, CouchDB)
- Graph-Oriented Stores (Neo4j)
Each of these data stores has specific use cases where one would use it over another. For example, someone who needs really fast calculations, where performance is the priority, would normally go for a key-value store, because its simple data structure inherently supports high performance. Someone else would choose a graph DB when there are lots of recursive decisions, such as repeatedly following relationships from one entity to the next.
In a typical large enterprise, there may be multiple use cases that require more than one type of NoSQL data store. For example, a retail enterprise may require a graph DB to store the relations of its customers with other customers, a KV store to do near-real-time calculations, a column store to do clickstream analysis, and so on. What is critical is that there must be some way to store the data, even a single piece of data, across all of these data stores as it passes through its various modifications and transformations. The ability to do this is what Polyglot Persistence means in the big data context.
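To make the retail example concrete, here is a minimal sketch in Python. It is only an illustration under loud assumptions: the three "stores" are plain in-memory dictionaries standing in for a graph DB, a KV store and a column store, and the event fields (`customer`, `referred_by`, `amount`, `clicks`) and the function `persist_purchase` are hypothetical names made up for this post, not part of any real client library.

```python
from collections import defaultdict

# Hypothetical in-memory stand-ins for the three kinds of store.
graph_store = defaultdict(set)     # customer -> set of customers they referred
kv_store = defaultdict(float)      # key -> running total
column_store = defaultdict(dict)   # row key -> {column name: value}


def persist_purchase(event):
    """Write one purchase event to every store, in the shape each store suits."""
    customer = event["customer"]

    # Graph store: record the customer-to-customer relation, if there is one.
    referrer = event.get("referred_by")
    if referrer is not None:
        graph_store[referrer].add(customer)

    # Key-value store: keep a running spend total for fast lookups.
    kv_store["spend:" + customer] += event["amount"]

    # Column store: one wide row per customer, one column per timestamped click.
    for timestamp, page in event.get("clicks", []):
        column_store[customer]["click:" + str(timestamp)] = page


persist_purchase({
    "customer": "alice",
    "referred_by": "bob",
    "amount": 40.0,
    "clicks": [(1, "/home"), (2, "/cart")],
})
persist_purchase({"customer": "alice", "amount": 2.5})
```

The point of the sketch is that the same event ends up in three different shapes: an edge, a counter and a set of columns. A real implementation would also have to decide what happens when one of the writes fails, which this sketch ignores.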
Polyglot Persistence is quite complex and difficult to implement in the NoSQL and big data context, since we are dealing with poly-structured data that has both volume and velocity. So how do we manage the persistence? A few suggested approaches are:
- We follow the Abstract Factory pattern and create a Dynamic DAO.
- The Dynamic DAO creates a factory for each individual data store, and each factory handles the reads and writes for its store.
- Each of the factories is itself dynamic, because based on the data it receives it creates a varying number of rows and columns at run time. Think of Hector, the Java client for Cassandra, here.
- A platform similar to the one provided by DataNucleus. This is a very interesting and effective platform, but it remains to be seen to what extent it can provide run-time polyglot persistence. I have not yet tested this fully, so I will reserve my judgement.
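The first approach above, the factory-based Dynamic DAO, could be sketched roughly as follows. This is a simplified variant in which each factory produces a single DAO, and every class name here (`StoreDAO`, `KeyValueDAO`, `ColumnDAO`, `DAOFactory`, `DynamicDAO`) is a hypothetical name for illustration. The stores are in-memory dictionaries, not real Riak, Cassandra or Hector calls.

```python
from abc import ABC, abstractmethod


class StoreDAO(ABC):
    """Common read/write interface that every store-specific DAO implements."""

    @abstractmethod
    def write(self, key, record): ...

    @abstractmethod
    def read(self, key): ...


class KeyValueDAO(StoreDAO):
    """Stand-in for a key-value store: each write replaces the whole value."""

    def __init__(self):
        self._data = {}

    def write(self, key, record):
        self._data[key] = dict(record)

    def read(self, key):
        return self._data.get(key)


class ColumnDAO(StoreDAO):
    """Stand-in for a column store: columns are created at run time
    from whatever fields the incoming record carries."""

    def __init__(self):
        self._rows = {}

    def write(self, key, record):
        self._rows.setdefault(key, {}).update(record)

    def read(self, key):
        return self._rows.get(key)


class DAOFactory(ABC):
    """Abstract factory: one concrete factory per data store."""

    @abstractmethod
    def create_dao(self): ...


class KeyValueFactory(DAOFactory):
    def create_dao(self):
        return KeyValueDAO()


class ColumnFactory(DAOFactory):
    def create_dao(self):
        return ColumnDAO()


class DynamicDAO:
    """Builds one DAO per registered factory and fans each write out to all."""

    def __init__(self, factories):
        self._daos = {name: f.create_dao() for name, f in factories.items()}

    def write(self, key, record):
        for dao in self._daos.values():
            dao.write(key, record)

    def read(self, store, key):
        return self._daos[store].read(key)


dao = DynamicDAO({"kv": KeyValueFactory(), "column": ColumnFactory()})
dao.write("cust:1", {"name": "Alice"})
dao.write("cust:1", {"city": "Pune"})
```

After the two writes, the key-value stand-in holds only the latest record, while the column stand-in has grown a second column on the same row. Adding another store type means adding one DAO and one factory, without touching `DynamicDAO`.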
Finally, Polyglot Persistence is very important from an enterprise's point of view. Given the way big data is growing and its impact on the way we conduct business, enterprises will soon require multiple types of NoSQL data stores to manage and work with their data, and to track the relationships between data from multiple sources.