NoSQL: The Sequel

SQL vs. NoSQL in the Cloud: Database Considerations

NoSQL versus SQL

Let’s start with a simple question: What is the real difference between NoSQL and SQL? In my view, the different access patterns provided by NoSQL and SQL result in very different scalability and performance.

NoSQL data services allow data access only through a narrow, predefined access pattern. For example, a DHT (Distributed Hash Table) is accessible via a hashtable API: given the exact key, the value is returned. The access pattern for other NoSQL data services is similarly narrow and well-defined, and as a result their scalability and performance characteristics are predictable and reliable.
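
To make the narrow access pattern concrete, here is a minimal sketch of a hashtable-style API. It is a toy, in-process stand-in and not any real DHT product: the `ToyDHT` class, its `put`/`get` methods and the `node_count` parameter are names assumed for illustration only.

```python
import hashlib
from typing import Optional


class ToyDHT:
    """A toy, in-process stand-in for a distributed hash table (hypothetical)."""

    def __init__(self, node_count: int = 4) -> None:
        # Each "node" is a local dict here; a real DHT would place them on separate machines.
        self.nodes = [dict() for _ in range(node_count)]

    def _node_for(self, key: str) -> dict:
        # Hash the key so that the same key always maps to the same node.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        index = int.from_bytes(digest[:8], "big") % len(self.nodes)
        return self.nodes[index]

    def put(self, key: str, value: bytes) -> None:
        self._node_for(key)[key] = value

    def get(self, key: str) -> Optional[bytes]:
        # Only an exact key works; there is no range scan, filter, or join.
        return self._node_for(key).get(key)


dht = ToyDHT()
dht.put("user:42", b'{"name": "Ada"}')
print(dht.get("user:42"))  # the stored value
print(dht.get("user:4"))   # None: a partial key matches nothing
```

Because every operation touches exactly one node chosen by the key's hash, the cost of a call does not depend on how much other data the service holds, which is where the predictability comes from.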

In SQL, the access pattern is not known in advance. The tables are modeled, assumptions are made regarding the access patterns, and these assumptions are translated into predefined optimizations such as index definitions. SQL is by definition a generic language that allows access to data in various ways. The programmer also has limited control over the execution of the SQL statements; mostly, the database engine is responsible for optimizing the execution of the statements. In other words, in SQL, the data model does not enforce a specific way to work with the data. It is built with an emphasis on data integrity, simplicity, data normalization and abstraction, which are all extremely important for large, complex applications.
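
By contrast, the following sketch uses Python's built-in `sqlite3` module to show the same table serving both an anticipated query and an ad hoc one. The `orders` table, its columns and the index name are assumptions made up for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, total REAL)")
conn.executemany(
    "INSERT INTO orders (customer, total) VALUES (?, ?)",
    [("ada", 10.0), ("bob", 25.5), ("ada", 4.5)],
)

# The modeler assumes lookups by customer will be common and declares an index up front.
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer)")

# Anticipated access pattern: fetch one customer's orders.
by_customer = conn.execute(
    "SELECT id, total FROM orders WHERE customer = ? ORDER BY id", ("ada",)
).fetchall()

# Unanticipated, ad hoc access pattern: the same model answers it with no code changes.
totals = conn.execute(
    "SELECT customer, SUM(total) FROM orders GROUP BY customer ORDER BY customer"
).fetchall()

# The engine, not the programmer, decides how each statement is executed.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT id FROM orders WHERE customer = ?", ("ada",)
).fetchall()

print(by_customer)
print(totals)
print(plan)
```

The last statement only asks the engine to describe its chosen plan; whether it uses the index is the optimizer's decision, which is the "limited control" described above.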

Why NoSQL

The NoSQL approach presents huge advantages over SQL databases because it allows one to scale an application to new levels. The new data services are based on truly scalable structures and architectures, built for the cloud and for distribution, and they are very attractive to the application developer. There’s no need for a DBA, no need for complicated SQL queries, and it is fast. Hooray, freedom for the people!

This is no small matter: a good programmer gains the freedom to choose a data model, write an application with familiar tools, reduce dependencies on other people, and test and optimize the code without guesswork or reliance on a black box (the DB). No more “Yes, it’s slow on the test system, but someone will take care of it later by tuning the DB.” These are all major advantages of the NoSQL movement.

And Why Not…

There are some disadvantages to the NoSQL approach. Those are less visible at the developer level, but are highly visible at the system, architecture and operational levels.

  1. At the system level, data models are key. Not having a skilled authority to design a single, well-defined data model, regardless of the technology used, has its drawbacks. The data model may suffer from duplication of data objects (a non-normalized model). This can happen because different developers use different object models and map them to the persistence model in different ways. At the system level one must also understand the limitations of the chosen data service, whether in size, operations per second, concurrency model, etc.
  2. At the architecture level, two major issues are interfaces and interoperability. Interfaces for the NoSQL data services are yet to be standardized. Even DHT, which is one of the simpler interfaces, still has no standard semantics for things such as transactions and non-blocking APIs. Each DHT service comes with its own set of interfaces. Another big issue is how different data structures, such as a DHT and a binary tree, share data objects. There are no intrinsic semantics for pointers across these services. In fact, there’s usually not even strong typing in these services; it’s the developer’s responsibility to deal with that.

     Interoperability is an important point, especially when data needs to be accessed by multiple services. A simple example: the back office works in Java and web serving works in PHP. Can the data be accessed easily from both domains? Clearly one can put web services in front of the data as a data access layer, but that complicates things even more, and reduces business agility, flexibility and performance while increasing development overhead.
  3. Moving to the operational realm: here, in my experience, lies the toughest resistance, and rightfully so. The operational environment requires a set of tools that is not only scalable but also manageable and stable, be it on the cloud or on a fixed set of servers. When something goes wrong, it should not require going through the whole chain and up to the developer level to diagnose the problem. In fact, that is exactly what operations managers regard as an operational nightmare. Have you ever tried getting a developer to diagnose why a payment system is not functioning while he’s at a bar and a few beers in? I’m sure the developer’s date would be impressed by his dedication to his work, but that’s a pretty expensive way to impress someone :)

     Operations need to be systematic and self-contained. With the current NoSQL services available in the market, this is not easy to achieve, even in managed environments such as Amazon.
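
The typing and interoperability concern from item 2 can be sketched as follows. A plain dict stands in for an untyped key-value service, and the `Payment` record, the JSON layout and the `type`/`v` fields are all conventions invented for this example. The point is that every application reading the data, in whatever language, has to repeat checks that a typed schema would otherwise enforce.

```python
import json
from dataclasses import dataclass

store = {}  # hypothetical stand-in for an untyped key-value service: str key -> bytes


@dataclass
class Payment:
    payment_id: str
    amount_cents: int
    currency: str


def save_payment(p: Payment) -> None:
    # The store sees only bytes; the JSON layout is a convention between applications.
    record = {"type": "payment", "v": 1, "amount_cents": p.amount_cents, "currency": p.currency}
    store[f"payment:{p.payment_id}"] = json.dumps(record).encode("utf-8")


def load_payment(payment_id: str) -> Payment:
    record = json.loads(store[f"payment:{payment_id}"].decode("utf-8"))
    # Each reader must re-check what a typed schema would have guaranteed.
    if record.get("type") != "payment" or record.get("v") != 1:
        raise ValueError("unexpected record layout")
    if not isinstance(record.get("amount_cents"), int):
        raise ValueError("amount_cents must be an integer")
    return Payment(payment_id, record["amount_cents"], record["currency"])


save_payment(Payment("p1", 1250, "USD"))
print(load_payment("p1"))

# Another application writes the amount as a string; the store accepts it silently.
store["payment:p2"] = json.dumps(
    {"type": "payment", "v": 1, "amount_cents": "12.50", "currency": "USD"}
).encode("utf-8")
try:
    load_payment("p2")
except ValueError as err:
    print("rejected:", err)
```

The bad record is only caught at read time, by whichever reader happens to validate it, which is the kind of failure that ends up travelling back to the developer.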

So, how can we gain the major advantages of the NoSQL approach while keeping the advantages of the SQL approach?

SQL and NoSQL Joined:

A SQL database implementation that uses NoSQL infrastructure is a good solution: a database that is scalable, manageable, cloud-ready, highly available and built entirely on NoSQL infrastructure, but that still provides all the advantages of a SQL database, such as interoperability, well-defined semantics and more. Xeround’s cloud database is just such a solution.

This hybrid would not be as fast as a NoSQL service, but it may be good enough for the 80% of the market that needs stronger scalability and organic cloud behavior.

Such a solution would also allow migrating existing applications easily into cloud environments, thus protecting huge investments made by organizations in those applications.

It is my opinion that a SQL database built on NoSQL foundations can provide the highest value to customers who wish to be both agile and efficient while they grow.


10 Responses to NoSQL: The Sequel

  1. Eric says:

    I wonder if this is exactly what Google is going to do with their AppEngine relational database offering?

    I want to like NoSQL … I really liked the AE datastore when I used it, but that was mostly because it has a nice familiar ORM-like API and what appears to be a relational model of sorts. Which all might mean I wasn’t using it to its capacity. One thing that really bothers me, and I think you’ve alluded to this, is that NoSQL seems to push all of the data model management up to the application. That’s fine, if you the programmer like doing all that extra work that a RDBMS would do. But, as you say, it’s not so fine when heterogeneous software systems all need to access that data.

    With the data management all in the app, it makes the model very, very easy and tempting to change frequently. And these changes, if not orchestrated by someone across all of an organization’s (potentially) diverse systems/platforms, can cause a tidal wave of hurt. I agree that ops has to be able to live with NoSQL without losing whatever benefits it brings.

    Again I want to like NoSQL … but I suspect that rather than it being an end in itself, it will be more, eventually, an influence on more traditional database designs.

    • Avi Kapuya says:

      Thanks for your comment, Eric. I absolutely agree with you. I am a fan of NoSQL technologies myself, as they are the foundations for scalability and distribution. Like you, I am afraid they are too raw and difficult to digest for enterprises, and this is even more true for businesses whose main line of business is not technology.

  2. Ron Wolf says:

    Avi,

    Thanks(!) for giving voice to my noSQL misgivings. It was hard for me to do that because, well, it just made me feel old – I have a first edition Codd & Date from when I was in grad school. But with the noSQL movement, we are once again venturing into the mess that the relational model pulled us out of. I, unfortunately, remember working on ISAM-based apps and it was not easy because of all of the data-program coupling issues that the relational model, at least partly, led us away from. The overhead costs (computing and development) were always there in the intervening 30 years. It’s only now, with the extreme scale of some apps, the liberation of SOA (or at least CSP), and youth, that we (the profession) are wandering back into the quagmire.

    Justified or not, I really can’t tell. But I have my intuition based on the history, and it is not pretty. Thx again for giving voice to this.

    BTW, I’ve built some very large, very responsive apps based on SQL. One trick was to minimize indexes (maybe none) and do joins in the application itself. But that way the relational DB was at least there, well-designed and ready for ad hoc query, data cleansing, or whatever other non-primary (and often unanticipated) uses came up. This is the converse (reverse?) of your suggestion. Maybe this architecture has a name? I call it the great compromise.

    Best,

    Ron

    • Avi Kapuya says:

      Hey Ron, I’m really glad to hear this. I think many people share the same view and understand that NoSQL is really a foundation-level service that is not suitable for real-world applications. We do think it is extremely important, though, as a building block for services better suited to the cloud.
      Again, Thanks!

    • Avi Kapuya says:

      Marius, There are indeed other choices.
      None of the names you mentioned offers a database on a public cloud. In addition, none has the same level of elasticity as Xeround Cloud Database.
      Some are good only for OLTP, like VoltDB; Clustrix is a hardware solution.
      Xeround Cloud Database, running today on Amazon in the US and Europe, offers a simple DBaaS solution, like none of the above. Feel free to try it and see how simple it is to create a database that is highly available and scalable.

  3. Salvia says:

    So it seems that Xeround is a bit like jackhare?
    Jackhare is built on Hadoop-HBase (a NOSQL), and it is also compatible with basic ANSI-SQL.
    http://sourceforge.net/projects/jackhare/

    I am wondering what NOSQL platform is used by Xeround. Could anyone answer this question?

