So. When learning about Cosmos DB, the biggest thing I have struggled with is the actual concept of ‘What is a document?’ Azure is fairly intuitive (once you get used to the structure and blade layout…) and the SDK is ultimately an SDK: CRUD functionality for a cloud database. What I struggled with was the idea behind it. What is a document, when should you use one, and why is it better than a relational database?
So what is a document?
I’m going to skip the slightly obvious question (what is a document database) and assume you know that a document database is a database for documents.
Initially I thought ‘document’ was a misleading name for the records that get stored in the database. In my head, there would be an actual Word document or PDF stored in there somewhere. I thought it was going to be a super fast version of SharePoint or something, where documents were extended with meaningful metadata to make something really usable. Let me tell you straight off, I was wrong.
Let’s boil it down and make it super simple.
A document is simply a representation of data.
To take that further, a document is a complete representation of data.
Imagine a book. You wouldn’t expect to have Chapter 1 but nothing else, just like you wouldn’t expect to find only the middle and final chapters. In the same way, a document contains everything you need to make sense of it.
Let’s take a look at a JSON document example.
{
    "firstName": "David",
    "lastName": "Masters",
    "age": 20,
    "address": {
        "streetAddress": "123 the street",
        "city": "New City",
        "postCode": "NR1 1RN"
    },
    "phoneNumber": [
        {
            "type": "home",
            "number": "01525 123456"
        },
        {
            "type": "mobile",
            "number": "07123 456787"
        }
    ],
    "tweets": [
        {
            "title": "tweet 1",
            "body": "detail of tweet 1"
        }
    ]
}
From this, we can see a good example of a document and why it’s different from a relational equivalent. This basic document gives you all the information you need to make sense of it: who the person is, where they are, how to contact them and what they have tweeted.
Now a crucial point here is that this document is only good in the context of showing a user’s tweets. If your business wants to know how many pets people have, this won’t do. Defining your document structure should therefore go through a strict review against your business objectives (similar to when you abstract classes) to ensure it is fit for purpose.
How is a document DB different from a relational DB?
This question can be phrased in lots of ways. Big Data vs Data. Relational vs Document. SQL vs NoSQL. Normalised vs Denormalised.
Take the document example above. With a single query (something like select document from database where document id = 1), you get the whole result back. To do the same thing in a relational database would require a select statement with several joins across the address, phone number and tweet tables. This can be a good thing or a bad thing, depending on what you are trying to achieve.
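To make that contrast concrete, here is a minimal sketch in Python. It is not Cosmos DB code: the document store is just a plain dictionary, the relational side uses SQLite from the standard library, and the table and column names (person, address, phone, tweet) are my own assumptions based on the example document.

```python
import sqlite3

# Document model: the whole person lives in one self-contained record.
# (A plain dict stands in for the document database here.)
documents = {
    "1": {
        "firstName": "David",
        "lastName": "Masters",
        "address": {"city": "New City", "postCode": "NR1 1RN"},
        "phoneNumber": [
            {"type": "home", "number": "01525 123456"},
            {"type": "mobile", "number": "07123 456787"},
        ],
        "tweets": [{"title": "tweet 1", "body": "detail of tweet 1"}],
    }
}


def read_document(doc_id):
    # One lookup by id returns everything needed to make sense of the person.
    return documents[doc_id]


# Relational model: the same data normalised into four tables.
# Table and column names are assumptions made for this illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person  (id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT);
    CREATE TABLE address (person_id INTEGER, city TEXT, post_code TEXT);
    CREATE TABLE phone   (person_id INTEGER, type TEXT, number TEXT);
    CREATE TABLE tweet   (person_id INTEGER, title TEXT, body TEXT);
    INSERT INTO person  VALUES (1, 'David', 'Masters');
    INSERT INTO address VALUES (1, 'New City', 'NR1 1RN');
    INSERT INTO phone   VALUES (1, 'home', '01525 123456');
    INSERT INTO phone   VALUES (1, 'mobile', '07123 456787');
    INSERT INTO tweet   VALUES (1, 'tweet 1', 'detail of tweet 1');
""")


def read_relational(person_id):
    # Three joins are needed to pull the same information back together.
    return conn.execute("""
        SELECT p.first_name, p.last_name, a.city, a.post_code,
               ph.type, ph.number, t.title, t.body
        FROM person p
        JOIN address a ON a.person_id = p.id
        JOIN phone ph  ON ph.person_id = p.id
        JOIN tweet t   ON t.person_id = p.id
        WHERE p.id = ?
    """, (person_id,)).fetchall()
```

Note that the joined query returns one row per phone number and tweet combination (two rows for this data), so the application still has to stitch those rows back into a single person. The document read hands back the finished shape in one go.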
Ultimately, the information in a document both describes the data and represents it. A document can make sense on its own, independently of any other records. Relational data requires context to make sense. You need to know what you’re asking to get a meaningful result.
When do you use a document model instead of a relational one?
This depends on your data, how you plan to consume your data and your distribution needs. You need to understand these things in order to decide if you should use a document or a relational record. If you can represent your data in a single object with no extra context, a document is great.
Documents have no fixed schema, which is great for systems that are constantly evolving: you can add a document with a shape you have never used before and not break the system. In a strictly modelled SQL database, that would mean changing the table definitions first.
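As a rough illustration of that flexibility, the sketch below uses a plain Python list as a stand-in for a document collection and SQLite as the strictly modelled database. The second person, the pets field and the person table are all made up for the example.

```python
import sqlite3

# Hypothetical in-memory "collection": just a list of dicts.
collection = []
collection.append({"id": "1", "firstName": "David", "age": 20})
# A later document introduces a field nobody planned for. Nothing needs migrating.
collection.append({"id": "2", "firstName": "Sarah", "pets": ["cat", "dog"]})

# The flip side: readers have to tolerate documents that lack the field.
pet_counts = {doc["id"]: len(doc.get("pets", [])) for doc in collection}

# The same change against a fixed relational schema is rejected
# until the table itself is altered.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (id TEXT PRIMARY KEY, first_name TEXT, age INTEGER)")
try:
    conn.execute(
        "INSERT INTO person (id, first_name, pets) VALUES ('2', 'Sarah', 'cat,dog')"
    )
    insert_failed = False
except sqlite3.OperationalError:
    # SQLite refuses the insert because the 'pets' column does not exist.
    insert_failed = True
```

The freedom is not free, though. The schema has not disappeared, it has moved into the code that reads the documents, which is why the pet count has to fall back to an empty list.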
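Document databases are horizontally scalable, which basically means the data can be split across many servers, so you add capacity by adding more machines. A traditional SQL database typically scales by moving to a bigger server, and that has a ceiling.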
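Here is a minimal sketch of the general idea behind spreading documents across servers by hashing a key. This is a generic illustration, not Cosmos DB’s actual partitioning algorithm, and the function names and the choice of SHA-256 are my own assumptions.

```python
import hashlib


def partition_for(partition_key: str, partition_count: int) -> int:
    # Hash the key so documents spread evenly, and so the same key
    # always lands on the same partition.
    digest = hashlib.sha256(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % partition_count


def distribute(docs, key_field, partition_count):
    # Each inner list stands in for one server's share of the data.
    partitions = [[] for _ in range(partition_count)]
    for doc in docs:
        index = partition_for(doc[key_field], partition_count)
        partitions[index].append(doc)
    return partitions


docs = [{"id": str(n), "lastName": f"Person{n}"} for n in range(100)]
partitions = distribute(docs, "id", 4)
```

Because each document is self-contained, a read for one id only has to visit the one partition that holds it. A query that needs documents from every partition has to ask all of them, which is where this model starts to cost you.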
If you can define your entire document up front (like a person, with an address, associated tweets etc.) then you can use a document database. If your data is going to be regularly updated and constantly queried across many records, relational is your friend, because queries that span multiple rows or partitions aren’t what document DBs were built for.
Document databases are a great choice for large systems requiring super scalable global distribution, but to be honest, they seem like an overpowered option for small-scale solutions. Maybe that will change in time but for now, stick with classic relational databases for standard work and only move to documents when you need scalability beyond that of a traditional server.