NoSQL is a disruptive trend that changed many years of rigid perception of data stores and how they should be serving applications. NoSQL evolved from distributed theories and the common notion that distributed foundations are an absolute necessity for making the next leap in data management.
The data management needs of large scale services such as web search engines and global social networking cannot be properly served by ”old school” data management systems. Comparing today’s hardware to the past, your average $200 netbook has the computing power of a $10M mainframe of the 90’s. Yet the data growth we’re experiencing these days far exceeds hardware improvement and scaling, and many databases cannot be contained in a single server.
Distribution of data across multiple machines enables you to deal with large amounts of data and improve throughput. However, once the data is distributed, it becomes more difficult to maintain data consistency. For example, transactions may change data that resides on multiple servers. To be able to synchronize all the changes made by the transaction into a single known state (before and after committing), a synchronization point is needed, in the form of a transaction state.
Some NoSQL solutions loosen the requirements when it comes to consistency by not supporting transactions, thus reducing the need for synchronization points. This relaxed consistency approach may suit some use cases where consistency is less important, but would be unsuitable for others.
Let’s consider the example of social networking applications. In these applications, operations over the data objects are well spread, and there is a low likelihood of collision to start with. On top of that, data is mostly added and rarely updated. By contrast, NoSQL’s compromise of consistency can create major problems for most applications, some of which include CRMs, ERPs and trading applications — those cannot accept data integrity inconsistencies, even if they are temporary.
To date, I have yet to see proper research that recommends a viable approach to dealing with the problem of data inconsistency. Most applications are very intolerant to the “third state of a bit” –computer systems by nature are very dichotomic — it’s either Yes or No. There is no easy way to obtain a programming paradigm that deals well with an unknown state of data.
Unless a worthy solution at the application level is found that can deal with the data inconsistency issues in NoSQL, most applications can only find limited use of NoSQL in the form of caches, accelerators for specific tables, and more.
The question is: can distributed data services (NoSQL) help to scale data, even if it takes into account the existence of synchronization points and their limitations? What would be sacrificed if synchronization and transaction mechanisms were added on top of NoSQL foundations? The answer is performance. Distributed and consistent data services will always be slower than pure raw distributed data service.
Is sacrificing performance worth it?
Yes. Despite the performance penalty introduced by adding the consistency governing mechanisms, performance would still be reasonable for the majority of applications. It would not be as fast as a single server, all in-memory data service – but for most needs, the performance provided by a distributed yet consistent service would be good enough and comparable with standard database performance.
What do I get in exchange for higher performance?
The gains would be:
- Larger storage space
- X times more throughput than a single node
- More stable and predictable performance
- Better average performance under load
What would be the performance of a SQL implementation on top of Consistent NoSQL foundations?
Because the level of parallelism in a distributed system is high (yes, even with locks and transactions), the average performance of such a solution could exceed that of a standard database if the right optimizations are done to realize the parallelism and distribution potential.
So what sacrifices would I need to make for a distributed SQL database as opposed to a single server database?
When the database load is low, a single server will probably be faster. In this case, parallelism has a smaller effect and single server has no network overheads. Single server solutions have their drawbacks in the form of scalability, availability, throughput and stability, and therefore they are usually not a good solution.
Xeround’s cloud database is a SQL database that combines NoSQL principles to facilitate elasticity, alongside the mechanisms to support locks, transactions and query capabilities as in relational databases. Xeround’s SQL cloud database couples both SQL and NoSQL to create a highly parallel, highly scalable, and available cloud database.