fbpx
Connect with us

Community

A look Into Latest DBA Discussions – Distributed SQL vs. NewSQL Databases

Published

on

In this series of hot DBA discussion topics, we will try to compare distributed SQL DBs against the NewSQL DBs to understand the significant differences. However, before we jump into the latest categorization of NewSQL databases, it is vital to understand why the NoSQL DBs like MariaDB, MongoDB, and Cassandra, etc. started gaining popularity in the last decade. They came up into the DBMS playground as innovative alternatives to the relational SQL databases but fell short of their objective.

From our experience, we now know that these databases were monolithic, and the distributed nature of these NoSQL databases was attractive to those applications which needed more scalability. Since most of the NoSQL database systems focused on the key-value (single row) data models and failed to handle the multi-row or relational structures of the conventional SQL language, these weren’t able to be tagged as “SQL” DBs. That is how they were called NoSQL.

In fact, NoSQL originally meant “No support to SQL” but then later re-termed as “Not Only SQL” by realizing that NoSQL databases may have to coexist with SQL databases, but cannot replace them completely. The need for conventional SQL databases persisted with the relational database models supporting single-row and ACID multi-row transactions. As time passed by, NoSQL databases proved out to be architecturally unfit to the server for the consistency-first application needs.

The invent of NewSQL

As these were proved out to be the limitations of NoSQL databases, the large scale OLTP workloads in which scalability and data correctness were critical continued to suffer even with NoSQL. Then started the era of NewSQL DBs, which were started showing up in the early 2010s to address this issue. Matthew Aslett of 451 Research coined the term ‘NewSQL’ in 2011 to categorize this new set of “scalable” DBs. Now, the NewSQL DBs come in two flavors.

  1. One flavor of NewSQL DBs offers an automated sharding layer over multiple independent instances of the SQL monolithic databases. Say, for example, Vitess DB can handle it in the way how MySQL does it, whereas Citus handles it the same as of PostgreSQL. So, each instance, when taken independently, is similar to the same old monolithic approach. The challenges like native failover, ACID transactions in a distributed manner, etc. remain impossible to handle. Above all, the developers also have to compromise on agility, which they get by only interacting with a single logical SQL DB.
  2. The second flavor covers DBs like VoltDB, NuoDB, and Clustrix, etc. which are built as distributed storage systems with the objective of keeping the concept of a single logical SQL database in place.

Next, let us evaluate some of the NewSQL Databases with distributed SQL

  1. Vitess

Vitess provides MySQL automated sharding features. Each of the MySQL instances acts as a shard here. A very consistent key-value store is used in the case of Vitess, which is called ECTD. This helps to store the metadata related to the shard location like which shred is located against which given instance. Vitess also uses VTGate as a set of coordinator nodes. This helps to accept the client queries of the applications and route all those to the corresponding shard based on the pre-stored ECTD mapping. Each such instance uses the master-slave replications as per MySQL.

However, as per RemoteDBA.comexperts, the SQL features like accessing various rows of data spread across multiple rows and across various shards are strongly discouraged in this database application. Some such discouraged features are the global secondary indexes and the cross-chard JOINs. All these reiterate the point that the Vitess cluster lacks the single logical SQL DB notion in a real applicational environment. The developers should be aware of the sharding to account for this shortfall while designing their schema and executing their queries.

  1. Citus

Basically, Citus is the PostgreSQL version of Vitess. Plying as the extension of PostgreSQL, Citus can ensure both vertical and horizontal scalability for the write commands to PostgreSQL DB deployments using open sharding. This installation begins with the number of nodes of the PostgreSQL, and each node also has a Citus extension. Afterward, one single node of the ‘number of nodes of PostgreSQL becomes the coordinator node for the situs, and the remaining nodes act as worker’s nodes.

The applications only interact with one coordinator node and will not be aware of the worker nodes existing. The replication-based architecture, which ensures availability even during failures, still acts as master-slave based on the Postgres standards. There may probably be availability and performance bottlenecks with this single-coordinator node constitution. Any slowdown for the coordinator node may ultimately slow down the whole cluster even when the worker nodes may function normally. Similarly, any coordinator node outrage may make the whole cluster down. When worker nodes are unable to interact with client applications directly, there would not be any ways to make the client drivers smarter by caching the shard metadata.

  1. VoltDB

VoltDB acts based on the auto-sharded distributed database architecture. This is a proprietary SQL which has not foreign key support. Intra-cluster replications act on the basis of the K-safety algorithm in which K denotes the number of extra copies of the same data stored at each of the shards. For example, the configuration of K=2 maps to the Replication Factor 3 of the distributed SQL databases by default, i.e., YugabyteDB and Google Spanner, etc.

In the case of VoltDB, the replicas for any given shard get simultaneously updated in a synchronized manner by the client application. However, when the distributed consensus protocols as Paxos and Raft etc. require some writes to be sent to every replica, but only commits so when the majority of the replicas acknowledge the request. In real, waiting for responses from all the replicas is not necessary as the consensus can also be established with the majority. Also, VoltDB may not be able to detect any network partitions but requires an add-on network-fault-protection to be set. When a single node in the cluster is partitioned, fault protection mode gets activated, which may adversely impact the cluster performance, too, by increasing the cluster recovery time for accepting rights.

Other examples are NuoDB (a proprietary NewSQL DB), ClustrixDB (a scale-out SQL DB), and so on. In fact, the NewSQL cloud is still in its infancy, and the distributed SQL DBs like Google Spanner is slowly building up to take advantage of the cloud elasticity to work even on the inherently unreliable database infrastructures.

Ariya Stark is a blogger and content writer who write many articles on Business, Web Design, Social Media, and Technology. She enjoys reading a new thing on the internet. She spends a lot of time on social media.

Become A Crypto Expert

Categories

Recent Stories

Trending


Copyright © 2022 Disrupt ™ Magazine is a Minority Owned Privately Held Company - Disrupt ™ was founder by Puerto Rican serial entrepreneur and philanthropist Tony Delgado who is on a mission to transform Latin America using the power of education and entrepreneurship.

Disrupt ™ Magazine
151 Calle San Francisco
Suite 200
San Juan, Puerto Rico, 00901

Opinions expressed by Disrupt Contributors are their own. Disrupt Magazine invites voices from many diverse walks of life to share their perspectives on our contributor platform. We are big believers in freedom of speech and while we do enforce our community guidelines, we do not actively censor stories on our platform because we want to give our contributors the freedom to express their opinions. Articles are not commissioned by our editorial team, and opinions expressed by our community contributors do not reflect the opinions of Disrupt or its employees.
We are committed to fighting the spread of misinformation online so if you feel an article on our platform goes against our community guidelines or contains false information, we do encourage you to report it. We need your help to fight the spread of misinformation. For more information please visit our Contributor Guidelines available here.


Disrupt ™ is the voice of latino entrepreneurs around the world. We are part of a movement to increase diversity in the technology industry and we are focused on using entrepreneurship to grow new economies in underserved communities both here in Puerto Rico and throughout Latin America. We enable millennials to become what they want to become in life by learning new skills and leveraging the power of the digital economy. We are living proof that all you need to succeed in this new economy is a landing page and a dream. Disrupt tells the stories of the world top entrepreneurs, developers, creators, and digital marketers and help empower them to teach others the skills they used to grow their careers, chase their passions and create financial freedom for themselves, their families, and their lives, all while living out their true purpose. We recognize the fact that most young people are opting to skip college in exchange for entrepreneurship and real-life experience. Disrupt Magazine was designed to give the world a taste of that.