The Problem
While designing models for an MVP, I faced a common problem in modern web development:
Should I use an SQL or a NoSQL database?
Unsurprisingly the answer to the question above is… It depends! I spent a couple of days reading about this interesting topic. Here, I present a high level overview of what I’ve learned.
Data driven decisions
When it comes to engineering, the best decisions are backed with data and you’re going to need some real-world production data to be able to answer properly. Some of the questions you’ll need to answer are:
- Do you need atomicity?
- What kind of data are you going to store?
- Do you need consistency (ordered events)?
- How do you plan to access your data?
- How complex queries are you going to run?
- How many reads and writes per second will you need?
- How many GB of data do you expect to store in your db?
- Do you need distributed writes?
- Are you going to use your own hardware or a cloud provider?
- Do you want to fine tune the database or standard configuration is okay?
- Do you think I/O will be a constraint? (e.g. EBS powered AWS volumes might be a bad idea)
- etc.
I could go on, but you get the point: too many questions with no clear answer! It is hard to answer these questions at an MVP stage, usually you’ll have to deploy an application, run it in production and see what your real needs are going to be.
NoSQL databases
There are many NoSQL databases out there and they’re build for different use cases! Some fairly common NoSQL database are:
Redis
A fast in-memory key-value store database. Often used by sysadmins for caching, as an alternative to memcached that supports persistence which is nice for session caching. Other use cases are caching frequently accessed data and simple pub/sub implementations. Redis - or any NoSQL for the matter - can be an excellent second addition to speed up database response time.
Neo4j
This one is a graph database. It’s your best bet if you’re building a social network application and you need to solve path problems (e.g. decision trees, linked lists). Graph databases are ACID.
RethinkDB
A database made specifically for realtime applications. This database will push results to your application as soon as they become available! Ideal for applications which need to display data really fast.
MongoDB and ElasticSearch
These are document-oriented NoSQL databases . These kind of DBs are ideal for storing unstructured data (e.g. program logs, metadata, etc.) and issuing fast simple queries.
Cassandra and DynamoDB
A column-oriented NoSQL database. Cassandra is ideal for big data with high volume writes. DynamoDB apparently is used for CRUD applications as well which blurs a bit the standard use case.
Which way to go?
Cassandra, RethinkDB, Redis and Neo4j have specific use cases in my view. So the real question should be whether to use an SQL or a NoSQL document or column oriented database. Here are the pros and cons I’ve found interesting regarding a NoSQL database.
NoSQL database pros:
- Easy object model which developers understand. Data is stored as a hash table.
- NoSQL documents are schemaless, you don’t have to know the exact schema a priory.
- High availability, sharding and scaling is usually baked in, which is a important for devops/sysadmins.
- Thousand fast simultaneous writes per second are possible (e.g. more than 20k inserts per second).
NoSQL database cons:
- Not ACID. This is a big deal for most common use cases
- Expensive deployment compared to SQL (denormlisation, sharding, etc.)
- Complex queries are slow to the nearly impossible, a lot depends on how you design your columns/documents.
- Rather young projects compared to SQL databases.
It seems that the fact that NoSQL databases are schemaless, leads to poor model design. Models need to be designed carefully for NoSQL too or you risk to end up with slow and possibly erroneous responses.
SQL Databases
Most common SQL databases are MySQL, PostgreSQL and MSSQL. In their latest versions, all of the above support some sort of schemaless table in the form of JSON, but the implementations are work in progress.
SQL database pros:
- SQL databases are ACID and as already stated this is a big deal
- Complex queries are fast
- Battle tested, rock solid performance. Most of these database has been around for 20+ years
- Deployment is cheap compared to NoSQL in terms of deployment (normalisation)
SQL database cons:
- HA is complex to setup, you’ll need a DBA at some point
- Changing table schema is complex and time consuming
- Horizontal scaling is complex to setup properly and can be problematic
- Sharding is complex to setup and might lead to custom in-house solutions
- Writes per second might become your bottleneck, especially if they need to concurrent
Still undecided? Go with SQL!
Most DBA experts seem to converge to the following advise for the undecided:
If you’re not 100% sure that you need a NoSQL database go with an SQL!
It’s unlikely that your application will ever reach the limits of an SQL database. If it does you can always switch to a NoSQL database later or combine the two!