In the last article, we had an introduction to NoSQL databases. Now let us go into the details of the main types of NoSQL databases.
Some of the most popular types of NoSQL datastores are as follows:
- Key-value stores
- Column family stores
- Document databases
- Graph databases
In the next few articles, we will do a detailed analysis of some of these, starting with the key-value store.
What is a key-value store?
As the name suggests, in a key-value store, data is represented as a collection of key-value pairs organized into rows, a structure also known as an associative array. These databases store the data as a hash table in which each unique key points to a particular item of data. As with traditional hash tables, data is stored and retrieved through keys.
Key-value stores are used whenever the data will be queried by precise parameters and needs to be retrieved very quickly.
How do key-value stores work?
Key-value stores do not impose a specific schema. Traditional relational databases (RDBs) pre-define the data structure in the database as a series of tables containing fields with well-defined data types. Exposing the data types to the database program allows it to apply a number of optimizations. In contrast, key-value systems treat the data as a single opaque collection which may have different fields for every record. In each key-value pair, the key is represented by an arbitrary string such as a filename, URI or hash. The value can be any kind of data, like an image, user preference file or document. The value is stored as a blob, requiring no upfront data modeling or schema definition.
This offers considerable flexibility and more closely follows modern concepts like object-oriented programming. Because optional values are not represented by placeholders as in most RDBs, key-value stores often use far less memory to store the same database, which can lead to large performance gains in certain workloads.
The storage of the value as a blob removes the need to index the data to improve performance. However, you cannot filter or control what’s returned from a request based on the value because the value is opaque.
In general, key-value stores have no query language. They provide a way to store, retrieve and update data using simple get, put and delete commands; the path to retrieve data is a direct request to the object in memory or on disk. The simplicity of this model makes a key-value store fast, easy to use, scalable, portable and flexible.
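The get, put and delete interface described above can be sketched in a few lines of Python. This is only an illustrative toy, not how any particular product is implemented: the class name `KeyValueStore` and the `session:42` key are made up for the example, and a plain dictionary stands in for the storage engine.

```python
from typing import Dict, Optional


class KeyValueStore:
    """A toy in-memory key-value store; values are opaque bytes."""

    def __init__(self) -> None:
        self._data: Dict[str, bytes] = {}

    def put(self, key: str, value: bytes) -> None:
        # Insert or overwrite; the store never inspects the value.
        self._data[key] = value

    def get(self, key: str) -> Optional[bytes]:
        # Direct lookup by key; returns None when the key is absent.
        return self._data.get(key)

    def delete(self, key: str) -> bool:
        # Returns True only if a key was actually removed.
        return self._data.pop(key, None) is not None


store = KeyValueStore()
# The value is a blob; only the application knows it happens to be JSON.
store.put("session:42", b'{"user": "alice", "cart": ["book"]}')
print(store.get("session:42"))
store.delete("session:42")
print(store.get("session:42"))  # None
```

Note that there is no way to ask this store for "all sessions belonging to alice": the value is opaque, so every access goes through the key.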
Now let us evaluate key-value stores in terms of different DBMS parameters.
- Concurrency : In a key-value store, concurrency applies only to a single key, and it is usually offered either as optimistic writes or as eventual consistency. In highly scalable systems, optimistic writes are often not possible because of the cost of verifying that the value hasn’t changed (the value may have been replicated to other machines). Therefore, we usually see either a key master (one machine owns a key) or the eventual consistency model.
- Queries : As mentioned above, there really isn’t any way to query a key-value store except by key. Even range queries on the key are usually not possible. However, in many web application use cases, key-based access is all that is required, and the need for the DBMS to actually “understand” the data is minimal. In use cases like user profiles, user sessions and shopping carts, the DBMS can be oblivious to the data attributes: it stores the information as a blob, passes it directly to the application layer, and relies on the application to process it. In such cases a key-value store is cheap to use (one request to read, one request to write), and a concurrency conflict is cheap to handle because only a single key needs to be resolved.
- Transactions : While it is possible to offer transaction guarantees in a key-value store, those are usually only offered in the context of a single-key put. It is possible to offer them across multiple keys, but that becomes difficult in a distributed key-value store, where different keys may reside on different machines. Some data stores offer no transaction guarantees at all.
- Schema : Key-value stores do not have a pre-defined schema – they have just two fields – a key and the value. They rely on the application using the data for parsing it.
- Scaling up : Key-value stores scale out by implementing partitioning (storing data on more than one node), replication and auto recovery. They can scale up by maintaining the database in RAM and by minimizing the overhead of ACID guarantees (such as the guarantee that committed transactions persist somewhere) through avoiding locks and latches and using low-overhead server calls.
The simplest way for key-value stores to scale out is to shard the entire key space. That means that keys starting with A go to one server, while keys starting with B go to another server. In this system, a key is only stored on a single server. That drastically simplifies things like transaction guarantees, but it exposes the system to data loss if a single server goes down. At this point, we introduce replication.
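As a rough sketch of sharding, the example below routes every key to exactly one shard. It is an assumption-laden illustration: it uses a hash of the key instead of the first-letter scheme described above (a common variation that spreads keys more evenly), each dictionary stands in for a separate server, and the names `ShardedStore` and `shard_for` are hypothetical.

```python
import hashlib
from typing import Dict, List, Optional


class ShardedStore:
    """Routes each key to exactly one of N in-memory shards."""

    def __init__(self, num_shards: int) -> None:
        # Each dict stands in for a separate server (assumption of this sketch).
        self._shards: List[Dict[str, bytes]] = [{} for _ in range(num_shards)]

    def shard_for(self, key: str) -> int:
        # A stable hash keeps the key-to-shard mapping identical across runs.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % len(self._shards)

    def put(self, key: str, value: bytes) -> None:
        self._shards[self.shard_for(key)][key] = value

    def get(self, key: str) -> Optional[bytes]:
        return self._shards[self.shard_for(key)].get(key)


store = ShardedStore(num_shards=3)
store.put("user:alice", b"prefs-a")
store.put("user:bob", b"prefs-b")
print(store.shard_for("user:alice"), store.get("user:alice"))
```

Because each key lives on a single shard, losing that shard loses the key, which is the data-loss risk the paragraph above points out.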
- Replication : In key-value stores, replication can be done by the store itself or by the client (writing to multiple servers). Replication also introduces the problem of divergent versions: two servers in the same cluster may hold two different values for the key ‘ABC’. Resolving that is a complex issue. The common approaches are either to decide that it can’t happen and reject updates where non-conflict cannot be ensured (Scalaris), or to accept all updates and ask the client to resolve them at a later date (Amazon Dynamo, Rhino DHT).
- Portability and lower operational costs : Key-value stores are portable because they do not require a complex query language. You can move an application from one system to another without rewriting code or constructing new architecture. Companies can expand their product offerings on new operating systems, without affecting their core technology.
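The optimistic writes mentioned under Concurrency can be illustrated with a version check on a single key. This is a minimal single-process sketch, not the mechanism of any specific store: the class `VersionedStore` and the method `put_if_version` are hypothetical names, and a real distributed system would also have to agree on versions across replicas.

```python
from typing import Dict, Optional, Tuple


class VersionedStore:
    """Toy store in which every key carries a version number."""

    def __init__(self) -> None:
        # key -> (version, value); a missing key counts as version 0.
        self._data: Dict[str, Tuple[int, bytes]] = {}

    def get(self, key: str) -> Optional[Tuple[int, bytes]]:
        return self._data.get(key)

    def put_if_version(self, key: str, value: bytes, expected_version: int) -> bool:
        # Optimistic write: succeed only if nobody has written since we read.
        current_version = self._data.get(key, (0, b""))[0]
        if current_version != expected_version:
            return False  # conflict: the caller must re-read and retry
        self._data[key] = (current_version + 1, value)
        return True


store = VersionedStore()
store.put_if_version("cart:7", b"[]", expected_version=0)

# Two clients read the same version, then both try to write.
version_a, _ = store.get("cart:7")
version_b, _ = store.get("cart:7")
print(store.put_if_version("cart:7", b'["book"]', version_a))  # True
print(store.put_if_version("cart:7", b'["pen"]', version_b))   # False
```

The conflict involves only one key, so the losing client simply re-reads that key and retries.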
When to use key-value stores?
Key-value stores handle size well and are good at processing a constant stream of read/write operations with low latency, making them well suited for:
- Session management at high scale
- User preference and profile stores
- Product recommendations; the latest items viewed on a retailer website drive future product recommendations for the customer
- Ad serving; customer shopping habits result in customized ads, coupons, etc. for each customer in real time
- Caching heavily accessed but rarely updated data
Key-value stores differ in their implementation. Some, like Berkeley DB, FoundationDB and MemcacheDB, support ordering of keys. Some, like Redis, maintain data in memory (RAM). Some, like Aerospike, are built natively to support both RAM and solid state drives (SSDs). Others, like Couchbase Server, store data in RAM but also support rotating disks. Some popular key-value stores are:
- Apache Cassandra
- Berkeley DB
- Couchbase Server
Common use cases for key-value stores:
- Storing data for customer preferences
- Using cache to accelerate application responses
Key-Value Store Vs Cache
A cache is sometimes likened to a key-value store because of its ability to return a value given a specific key. A cache transparently stores a pool of read data so that future requests for that data can be served quickly, improving performance.
Data stored in a cache can be precomputed values or a copy of data stored on disk. When an application receives a request for data that resides in the cache (called a hit), the request can be served by reading the cache, which is fast. If, on the other hand, the requested information does not reside in the cache (called a miss), the requested data must be recomputed or retrieved from its original source, which results in a delay.
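The hit and miss paths can be sketched as a small read cache sitting in front of a slower source. The names here (`ReadCache`, `slow_lookup`) are invented for the illustration, and the "slow" source is just a function call standing in for a database query or an expensive computation.

```python
from typing import Callable, Dict


class ReadCache:
    """Serves repeated reads from memory; falls back to the source on a miss."""

    def __init__(self, load_from_source: Callable[[str], str]) -> None:
        self._load = load_from_source
        self._cache: Dict[str, str] = {}
        self.hits = 0
        self.misses = 0

    def get(self, key: str) -> str:
        if key in self._cache:
            # Hit: answer directly from the cache.
            self.hits += 1
            return self._cache[key]
        # Miss: fetch from the original source, then remember the result.
        self.misses += 1
        value = self._load(key)
        self._cache[key] = value
        return value


def slow_lookup(key: str) -> str:
    # Stand-in for a database query or an expensive computation.
    return key.upper()


cache = ReadCache(slow_lookup)
cache.get("profile:1")  # miss: goes to the source
cache.get("profile:1")  # hit: served from the cache
print(cache.hits, cache.misses)  # 1 1
```

This sketch never invalidates entries, which is only reasonable for data that is read often and updated rarely.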
Caches and key-value stores do have differences. Where key-value stores can be used as a database to persist data, caches are used in conjunction with a database when there is a need to increase read performance. Caches are not used to enhance write or update performance, whereas key-value stores are very effective at both. Where key-value stores can be resilient to server failure, caches are stored in RAM and cannot provide transactional guarantees if the server crashes.
However, some key-value stores, like Aerospike, can also function as a cache and as a result improve performance and reduce latency, making them well suited for consumer-facing applications.