Why do we need NoSQL over SQL and how do we choose the best database for the project?
In this article we will learn about two kinds of databases: RDBMS (SQL) and NoSQL.
What is a Database?
A database stores data in a structured way and provides random access, which lets us search data quickly using an index. It offers low latency and provides ACID properties.
ACID properties:
- Atomicity : A transaction is applied either in full or not at all. Example: When you transfer money online using netbanking, either the full amount is transferred or nothing is transferred.
- Consistency : Every transaction takes the database from one valid state to another, so rules such as "a balance cannot be negative" always hold.
- Isolation : If multiple people are using the same bank and making transfers at the same time, there should be a sequencing/locking mechanism so that concurrent transactions do not interfere with each other.
- Durability : Once a transaction is committed, its data stays saved even if the database or the system fails afterwards.
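The money transfer example can be sketched with Python's built-in sqlite3 module. This is only a minimal illustration, assuming a hypothetical accounts table with two made-up accounts (alice and bob); the helper names open_demo_bank, transfer and balances are invented for this sketch. The CHECK constraint stands in for the consistency rule, and the transaction gives atomicity: if the debit fails, the credit that was already applied is rolled back.

```python
import sqlite3


def open_demo_bank():
    """Create an in-memory database with two hypothetical accounts."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE accounts ("
        " name TEXT PRIMARY KEY,"
        " balance INTEGER NOT NULL CHECK (balance >= 0))"  # consistency rule
    )
    conn.executemany(
        "INSERT INTO accounts (name, balance) VALUES (?, ?)",
        [("alice", 100), ("bob", 50)],
    )
    conn.commit()
    return conn


def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst; return True if it went through."""
    try:
        # `with conn` commits if the block succeeds and rolls back on error.
        with conn:
            # Credit first on purpose, so a failed debit leaves a partial
            # change that the rollback has to undo.
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint rejected a negative balance.
        return False


def balances(conn):
    """Return the current balances as a {name: balance} dict."""
    return dict(conn.execute("SELECT name, balance FROM accounts"))


conn = open_demo_bank()
print(transfer(conn, "alice", "bob", 30), balances(conn))
print(transfer(conn, "alice", "bob", 500), balances(conn))
```

The first transfer succeeds and both balances change. The second one asks for more than alice has, so it is refused and neither balance changes, even though bob's credit was executed before the failure.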
Why do companies want NoSQL over SQL?
NoSQL characteristics:
NoSQL databases are usually built to run as distributed systems. A distributed system is a network that stores data on more than one node (physical or virtual machines) at the same time. Because all cloud applications are distributed systems, it’s essential to understand the CAP theorem when designing a cloud app so that you can choose a data management system that delivers the characteristics your application needs most.
CAP theorem explained:
- Consistency -- Every node returns the latest value. A read never returns an old value.
- For example: In banking, we can withdraw / transfer money only based on the latest balance. If the system cannot determine the latest balance, it denies the transaction.
- Availability -- The system always gives a response, but there is no guarantee that the value is the latest.
- Partition tolerance -- The system continues to operate even when there is a network partition / failure between nodes.
The CAP theorem says that out of these 3 we can get only 2. We can't have a system that gives all 3 at the same time. In distributed computing we must have partition tolerance, so the real choice is between consistency and availability. Hence, distributed systems fall into 2 categories, AP and CP; CA is described first for completeness.
- CA Database (Consistency & Availability):
- A CA database delivers consistency and availability across all nodes. It can’t do this if there is a partition between any two nodes in the system, however, and therefore can’t deliver fault tolerance. In a distributed system, partitions can’t be avoided. So, while we can discuss a CA distributed database in theory, for all practical purposes, a CA distributed database can’t exist.
- However, this doesn’t mean you can’t have a CA database for your distributed application if you need one. Many relational databases, such as PostgreSQL, deliver consistency and availability and can be deployed to multiple nodes using replication.
- AP Database (Availability & Partition tolerance):
- An AP database delivers availability and partition tolerance at the expense of consistency. When a partition occurs, all nodes remain available but those at the wrong end of a partition might return an older version of data than others. (When the partition is resolved, the AP databases typically resync the nodes to repair all inconsistencies in the system.)
- Examples are typically NoSQL databases like Cassandra and DynamoDB.
- CP Database (Consistency & Partition tolerance):
- A CP database delivers consistency and partition tolerance at the expense of availability. When a partition occurs between any two nodes, the system has to shut down the non-consistent node (i.e., make it unavailable) until the partition is resolved.
- Examples are typically NoSQL databases like HBase and MongoDB.
- When do we choose AP over CP? When the system must always respond immediately and we accept results that may not be the latest.
- Ex: Booking a hotel. Because of a network partition, the system cannot verify the latest price, but a recent price is still available. In the travel industry, if the site shows an error, the customer will go to another portal to book the hotel. So availability matters more than consistency here. Travel portals like Goibibo and Booking.com prefer availability over consistency.
- When do we choose CP over AP? When we need the latest results.
- For example, in chat apps like WhatsApp, the system can wait until the network is connected and then deliver the latest message.
- Also in banking applications, the system may return an error, but it must never work with a wrong balance.
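The difference between AP and CP during a partition can be shown with a toy model. This is not how Cassandra, MongoDB or any real database is implemented; it is a rough sketch that assumes only two nodes (a primary and a replica), and the names TwoNodeStore, Unavailable, read_replica and heal are hypothetical. In "AP" mode the cut-off replica keeps answering with its last known value, while in "CP" mode it refuses to answer until the partition is healed.

```python
class Unavailable(Exception):
    """Raised when a CP store refuses to answer during a partition."""


class TwoNodeStore:
    """Toy two-node store (hypothetical, for illustration only)."""

    def __init__(self, mode, initial):
        if mode not in ("AP", "CP"):
            raise ValueError("mode must be 'AP' or 'CP'")
        self.mode = mode
        self.primary = initial   # value held by the primary node
        self.replica = initial   # value held by the replica node
        self.partitioned = False  # True while the nodes cannot talk

    def write(self, value):
        """Write to the primary; replicate only if the network is up."""
        self.primary = value
        if not self.partitioned:
            self.replica = value

    def read_replica(self):
        """Read from the replica according to the AP or CP policy."""
        if self.partitioned and self.mode == "CP":
            # CP: better no answer than a possibly old answer.
            raise Unavailable("replica cannot confirm the latest value")
        # AP: always answer, even if the value may be stale.
        return self.replica

    def heal(self):
        """End the partition and resync the replica from the primary."""
        self.partitioned = False
        self.replica = self.primary


# Hotel price example: the price changes while the replica is cut off.
hotel = TwoNodeStore("AP", initial=100)
hotel.partitioned = True
hotel.write(120)
print("AP read during partition:", hotel.read_replica())
hotel.heal()
print("AP read after resync:", hotel.read_replica())

# Bank balance example: the same situation under a CP policy.
bank = TwoNodeStore("CP", initial=100)
bank.partitioned = True
bank.write(120)
try:
    bank.read_replica()
except Unavailable as err:
    print("CP read during partition failed:", err)
```

The AP store shows the old hotel price during the partition and the new one after the resync. The CP store gives an error instead of a possibly wrong balance, which matches the banking case above.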