In this series of hot DBA discussion topics, we will compare distributed SQL databases with NewSQL databases to understand the significant differences. However, before we jump into the categorization of NewSQL databases, it is vital to understand why NoSQL databases such as MongoDB and Cassandra started gaining popularity in the last decade. They entered the DBMS playground as innovative alternatives to relational SQL databases but fell short of that objective.
From experience, we now know that conventional SQL databases were monolithic, and the distributed nature of NoSQL databases was attractive to applications that needed more scalability. Since most NoSQL systems focused on key-value (single-row) data models and did not handle the multi-row or relational structures of conventional SQL, they could not be tagged as “SQL” databases. That is how they came to be called NoSQL.
In fact, NoSQL originally meant “no support for SQL” but was later re-termed “Not Only SQL” on the realization that NoSQL databases may have to coexist with SQL databases and cannot replace them completely. The need for conventional SQL databases persisted, since relational models support both single-row and ACID multi-row transactions. As time passed, NoSQL databases proved architecturally unfit to serve consistency-first application needs.
The advent of NewSQL
Given these limitations of NoSQL databases, large-scale OLTP workloads in which both scalability and data correctness were critical continued to suffer. Then began the era of NewSQL databases, which started showing up in the early 2010s to address this issue. Matthew Aslett of 451 Research coined the term ‘NewSQL’ in 2011 to categorize this new set of “scalable” databases. NewSQL databases come in two flavors.
- One flavor of NewSQL databases offers an automated sharding layer over multiple independent instances of a monolithic SQL database. For example, Vitess does this for MySQL, whereas Citus does the same for PostgreSQL. Each instance, taken independently, is still the same old monolithic database. Challenges such as native failover and distributed ACID transactions remain difficult, if not impossible, to handle. Above all, developers have to give up the agility they get from interacting with a single logical SQL database.
- The second flavor covers databases such as VoltDB, NuoDB, and Clustrix, which are built as distributed storage systems with the objective of keeping the concept of a single logical SQL database in place.
Next, let us evaluate some of the NewSQL databases against distributed SQL
Vitess provides automated sharding for MySQL. Each MySQL instance acts as a shard. Vitess uses a consistent key-value store called etcd to hold metadata about shard locations, such as which shard lives on which instance. Vitess also uses VTGate as a set of coordinator nodes, which accept client queries from applications and route them to the corresponding shard based on the mapping stored in etcd. Each instance uses master-slave replication as per MySQL.
However, as per RemoteDBA.com experts, SQL features that access multiple rows of data spread across multiple shards are strongly discouraged in Vitess. Examples of such discouraged features are global secondary indexes and cross-shard JOINs. All of this reiterates the point that a Vitess cluster lacks the notion of a single logical SQL database in practice. Developers have to be aware of the sharding and account for it while designing their schema and executing their queries.
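To make the routing idea concrete, here is a minimal sketch of a coordinator that looks up a shard map and groups keys by target instance. It is not how Vitess itself is implemented: the names `SHARD_MAP`, `instance_for_hash_byte`, `instance_for_key`, and `plan_query`, the instance names, and the hash-range scheme are all assumptions made for illustration.

```python
import hashlib
from collections import defaultdict

# Hypothetical shard map standing in for the metadata kept in the key-value store.
# Each entry maps a half-open range [low, high) of hash-byte values to an instance.
SHARD_MAP = [
    (0x00, 0x80, "mysql-instance-a"),
    (0x80, 0x100, "mysql-instance-b"),
]


def instance_for_hash_byte(hash_byte):
    """Look up which instance owns the shard covering this hash byte."""
    for low, high, instance in SHARD_MAP:
        if low <= hash_byte < high:
            return instance
    raise LookupError(f"no shard covers hash byte {hash_byte:#x}")


def instance_for_key(sharding_key):
    """Hash the sharding key and route on the first byte of the digest."""
    digest = hashlib.sha256(sharding_key.encode("utf-8")).digest()
    return instance_for_hash_byte(digest[0])


def plan_query(sharding_keys):
    """Group keys by target instance; more than one group means a cross-shard query."""
    plan = defaultdict(list)
    for key in sharding_keys:
        plan[instance_for_key(key)].append(key)
    return dict(plan)
```

A query whose plan contains a single instance can be forwarded as-is. A plan with several instances is the cross-shard case, where the coordinator would have to fan out and merge results, which is why such queries are discouraged.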
Basically, Citus is the PostgreSQL version of Vitess. Running as an extension of PostgreSQL, Citus adds write scalability to PostgreSQL deployments through sharding. An installation begins with a number of PostgreSQL nodes, each of which has the Citus extension installed. One of these nodes then becomes the Citus coordinator node, and the remaining nodes act as worker nodes.
Applications interact only with the coordinator node and are not aware that the worker nodes exist. The replication architecture, which ensures availability during failures, is still master-slave based on Postgres standards. This single-coordinator design can become an availability and performance bottleneck. Any slowdown of the coordinator node may slow down the whole cluster even when the worker nodes are functioning normally. Similarly, a coordinator node outage may bring the whole cluster down. Because worker nodes cannot interact with client applications directly, there is no way to make client drivers smarter by caching the shard metadata.
VoltDB is built on an auto-sharded distributed database architecture. It offers a proprietary SQL layer that has no foreign key support. Intra-cluster replication is based on the K-safety algorithm, in which K denotes the number of extra copies of the data stored for each shard. For example, a configuration of K=2 corresponds to a Replication Factor of 3, the default in distributed SQL databases such as YugabyteDB and Google Spanner.
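The arithmetic behind that mapping is simply one primary copy plus K extra copies. A minimal sketch, where the function name `replication_factor` is a hypothetical helper and not part of any VoltDB API:

```python
def replication_factor(k_safety):
    """Total copies of each shard: the original copy plus K extra copies."""
    if k_safety < 0:
        raise ValueError("K-safety value must be zero or greater")
    return k_safety + 1
```

With this helper, `replication_factor(2)` evaluates to 3, matching the K=2 example, and `replication_factor(0)` evaluates to 1, meaning no extra copies are kept.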
In the case of VoltDB, all replicas of a given shard are updated synchronously by the client application. By contrast, distributed consensus protocols such as Paxos and Raft send each write to every replica but commit it as soon as a majority of the replicas acknowledge the request. In practice, waiting for responses from all the replicas is not necessary, since consensus can be established with a majority. Also, VoltDB may not detect network partitions on its own and requires an add-on network fault protection mode to be configured. When a single node in the cluster is partitioned, fault protection mode is activated, which may adversely affect cluster performance by increasing the recovery time before the cluster can accept writes again.
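The difference between the two commit rules can be sketched in a few lines. This is only an illustration of the counting rule, not an implementation of Paxos, Raft, or VoltDB replication, and the function names are assumptions. Each entry in `acks` records whether one replica acknowledged the write.

```python
def majority(replica_count):
    """Smallest number of replicas that forms a majority."""
    if replica_count < 1:
        raise ValueError("need at least one replica")
    return replica_count // 2 + 1


def commits_with_all_replicas(acks):
    """Synchronous rule: every replica must acknowledge the write."""
    if not acks:
        raise ValueError("need at least one replica")
    return all(acks)


def commits_with_majority(acks):
    """Consensus-style rule: a majority of acknowledgements is enough."""
    return sum(acks) >= majority(len(acks))
```

With three replicas and one of them slow or unreachable, `acks = [True, True, False]`, the all-replica rule cannot commit while the majority rule can, which is the point the paragraph above makes.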
Other examples are NuoDB (a proprietary NewSQL database) and ClustrixDB (a scale-out SQL database). The NewSQL cloud is still in its infancy, and distributed SQL databases like Google Spanner are steadily building up to take advantage of cloud elasticity and to work even on inherently unreliable infrastructure.