Google Cloud vs AWS: Comparing the DBaaS Solutions
The IT landscape is rapidly changing. The public cloud is now seeing widespread enterprise adoption as organizations migrate their workloads and explore the latest technologies for storing and analyzing their data. But at the same time, they face the logistical challenges of migrating their databases and maintaining cloud-based infrastructures.
This makes a compelling case for using Database as a Service (DBaaS), as these solutions streamline many of the tasks involved in database management, such as provisioning, administration, data replication, security, and server updates.
But while the DBaaS offerings of the leading cloud vendors share many similarities, they also come with their own individual characteristics to suit different use cases. So, it’s important to understand these differences to find the right fit for your cloud-based application.
In this post, we will compare the core DBaaS options offered by two of the leading cloud vendors, AWS and Google Cloud Platform, and consider some of the key differences, such as the types of databases offered, the underlying infrastructure, and the querying capabilities.
Transactional SQL DBaaS
While NoSQL has seen a huge surge in interest over the last five to ten years, traditional relational databases remain the workhorses for most websites, applications, and legacy systems.
After all, SQL is an almost universally supported language, the data is highly structured, and schemas ensure data integrity without the need for substantial coding. But at the same time, traditional SQL deployments are built on a single-node architecture. This presents scaling issues and restricts query performance on larger datasets, because a single node is limited by its disk size, CPU, and available memory.
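To illustrate how a schema can enforce data integrity without application code, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for any relational engine. The orders table and its columns are hypothetical and exist only for this example.

```python
import sqlite3

# In-memory database, used here only as a stand-in for any relational engine.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE orders (
        id       INTEGER PRIMARY KEY,
        customer TEXT    NOT NULL,
        quantity INTEGER NOT NULL CHECK (quantity > 0)
    )
    """
)


def try_insert(customer, quantity):
    """Attempt an insert and report whether the schema accepted the row."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO orders (customer, quantity) VALUES (?, ?)",
                (customer, quantity),
            )
        return True
    except sqlite3.IntegrityError:
        return False


accepted = try_insert("alice", 3)      # valid row
rejected_qty = try_insert("bob", 0)    # violates the CHECK constraint
rejected_null = try_insert(None, 1)    # violates NOT NULL
row_count = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
```

The application never validates the values itself. The two bad rows are rejected by the constraints declared in the schema, so only the valid row is stored.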
Nevertheless, a cloud-based SQL DBaaS is the ideal solution for moving existing SQL databases to the cloud when your scaling needs are not too great.
Amazon’s Relational Database Service (RDS) is the market leader’s managed relational database offering, while Cloud SQL is Google’s counterpart. As you’d expect from two mature cloud vendors, both solutions offer automatic replication and are highly durable and available. What’s more, both services provide automated backups.
Database Engines
RDS supports six database engines: Amazon Aurora, PostgreSQL, MySQL, MariaDB, Oracle, and Microsoft SQL Server. Cloud SQL, by contrast, supports only MySQL.
On RDS, PostgreSQL, MySQL, MariaDB, Oracle, and Microsoft SQL Server are hosted on Elastic Block Store (EBS) volumes. As Amazon’s own proprietary database engine, Aurora uses a different storage infrastructure from the other five engines. Aurora’s cluster architecture is designed to address some of the scaling and replication issues associated with traditional databases.
Scaling
You can vertically scale your RDS deployment to handle higher loads by increasing the size of your virtual machine. You can do this either through the AWS console or a simple API call. Storage is decoupled from database instances. However, you’ll still need to modify your instance or change storage type to increase your allocated capacity.
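As a rough sketch of what such an API call might look like, the helper below builds the keyword arguments for the `modify_db_instance` operation of boto3's RDS client. The helper name `build_resize_request` and the instance identifier `orders-db` are hypothetical. The block only assembles the request and does not contact AWS.

```python
def build_resize_request(instance_id, instance_class=None,
                         allocated_storage_gb=None, apply_immediately=False):
    """Build keyword arguments for an RDS modify_db_instance call (sketch)."""
    if instance_class is None and allocated_storage_gb is None:
        raise ValueError("specify a new instance class, a new storage size, or both")
    params = {
        "DBInstanceIdentifier": instance_id,
        # False defers the change to the next maintenance window.
        "ApplyImmediately": apply_immediately,
    }
    if instance_class is not None:
        params["DBInstanceClass"] = instance_class        # vertical scaling
    if allocated_storage_gb is not None:
        params["AllocatedStorage"] = allocated_storage_gb  # storage resize
    return params


# "orders-db" is a made-up identifier for illustration.
request = build_resize_request(
    "orders-db", instance_class="db.m4.xlarge", apply_immediately=True
)

# With boto3 installed and AWS credentials configured, the request could be sent as:
#   import boto3
#   boto3.client("rds").modify_db_instance(**request)
```

Keeping request construction separate from the network call makes the change easy to review or log before it is applied.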
Standard RDS provides up to 6TB of storage. However, it has no automatic resizing capability. Aurora is more flexible and scales automatically in 10GB increments up to a maximum of 64TB of storage.
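The arithmetic behind incremental scaling can be sketched as follows. This is an illustration only, not Aurora's actual allocation logic. It assumes storage starts at one 10GB increment and treats 1TB as 1024GB.

```python
import math

INCREMENT_GB = 10
MAX_STORAGE_GB = 64 * 1024  # 64TB, assuming 1TB = 1024GB for this sketch


def allocated_storage_gb(data_size_gb):
    """Round a data size up to the next 10GB increment, capped at 64TB."""
    if data_size_gb < 0:
        raise ValueError("data size cannot be negative")
    if data_size_gb > MAX_STORAGE_GB:
        raise ValueError("data size exceeds the 64TB maximum")
    # Assumption: at least one increment is always allocated.
    steps = max(1, math.ceil(data_size_gb / INCREMENT_GB))
    return min(steps * INCREMENT_GB, MAX_STORAGE_GB)


small = allocated_storage_gb(10.5)   # just over one increment
medium = allocated_storage_gb(95)    # rounds up to the next multiple of 10
```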
Cloud SQL is somewhat more straightforward. You can increase storage space manually, up to a maximum of 10TB, or configure your instance settings to increase it automatically. You can also modify your machine type by editing your instance settings.
Both RDS and Cloud SQL support horizontal scaling for reads: you can add read-only replicas to improve query performance.
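One common way an application can take advantage of replicas is to send reads to them and writes to the primary. The sketch below shows that idea with simple round-robin selection. The `ReplicaRouter` class and the endpoint names are hypothetical, and neither RDS nor Cloud SQL requires this particular approach.

```python
import itertools


class ReplicaRouter:
    """Send SELECT statements to replicas in turn, everything else to the primary."""

    def __init__(self, primary, replicas):
        self.primary = primary
        # cycle() yields the replicas round-robin; None means no replicas exist.
        self._replicas = itertools.cycle(replicas) if replicas else None

    def endpoint_for(self, sql):
        is_read = sql.lstrip().upper().startswith("SELECT")
        if is_read and self._replicas is not None:
            return next(self._replicas)
        return self.primary


router = ReplicaRouter("primary", ["replica-1", "replica-2"])
targets = [
    router.endpoint_for("SELECT * FROM orders"),
    router.endpoint_for("UPDATE orders SET quantity = 2 WHERE id = 1"),
    router.endpoint_for("select count(*) from orders"),
    router.endpoint_for("SELECT 1"),
]
```

Because replicas are read-only, writes must always go to the primary. A production router would also need to account for replication lag, which this sketch ignores.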
Other Features
RDS supports storage volume snapshots, which you can use for point-in-time recovery or share with other AWS accounts. You can also take advantage of its Provisioned IOPS feature to improve I/O between your database instance and storage. RDS can also be launched in Amazon VPC, whereas Cloud SQL doesn’t yet support a virtual private network (VPN). On the other hand, RDS lacks feature parity across its supported database engines, and Cloud SQL is easier and more flexible when it comes to setting up your database deployments.
Google Cloud Spanner
In addition to Cloud SQL, Google is aiming to transform the SQL database landscape with the forthcoming launch of its new horizontally scalable relational database service, Cloud Spanner. It promises all the benefits of a traditional relational database, including ACID transactions, relational schemas, SQL queries, and high availability, but with the scale and performance of a distributed scale-out architecture.
The service is currently in beta.
NoSQL DBaaS
A new crop of NoSQL databases has emerged in recent years in a bid to address the limitations of the traditional RDBMS. They are specifically designed with clustered architectures in mind. Through their ability to scale horizontally, they’re able to store huge amounts of data in a single deployment.
Some systems can also spread the computational load across nodes, greatly improving performance. And, owing to their distributed nature, they’re also able to take advantage of less expensive commodity servers, reducing your hardware running costs.
NoSQL engines exploit new approaches to structuring and storing data, such as columnar storage, enabling rapid analysis of data at huge scale. However, when used as transactional databases, they present greater challenges, including slower write speeds, consistency, and logical complexity.
It’s also important to remember that NoSQL databases are much more geared towards APIs and SDKs for accessing data and do not yet support full-blown query languages.
DynamoDB is currently Amazon’s only NoSQL DBaaS offering, whereas Google offers two distinct products: Cloud Datastore and Cloud Bigtable.
Database Models
DynamoDB and Cloud Datastore are based on the document store database model and are therefore similar in nature to the open-source solutions MongoDB and CouchDB. In other words, each database is fundamentally a key-value store. What makes a document store slightly different is that the value stored against each key must be in a structured form the database can understand.
By contrast, Cloud Bigtable is a wide-column store, so it works on the same principle as Apache Cassandra and HBase.
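The difference between the two models can be sketched with plain Python dictionaries. This is a conceptual illustration only: the keys, column families, and values are made up, and no real service stores data in exactly this shape.

```python
# Document store: each key maps to one self-describing document.
documents = {
    "user:1001": {
        "name": "Ada",
        "email": "ada@example.com",
        "tags": ["admin", "beta"],
    },
}

# Wide-column store: row key -> column family -> column qualifier -> value.
# Rows are sparse, so each row only holds the columns it actually uses.
wide_rows = {
    "user#1001": {
        "profile": {"name": "Ada", "email": "ada@example.com"},
        "activity": {"2017-03-01": "login", "2017-03-02": "purchase"},
    },
    "user#1002": {
        "profile": {"name": "Lin"},
    },
}


def get_cell(rows, row_key, family, qualifier, default=None):
    """Look up a single cell, returning a default for missing columns."""
    return rows.get(row_key, {}).get(family, {}).get(qualifier, default)


doc_email = documents["user:1001"]["email"]
cell_name = get_cell(wide_rows, "user#1002", "profile", "name")
missing = get_cell(wide_rows, "user#1002", "activity", "2017-03-01")
```

In the document model the whole record is read and written as a unit, whereas the wide-column model addresses individual cells and tolerates rows with very different sets of columns.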
All three solutions fall into the same category as HBase and MongoDB under the CAP theorem, favoring consistency and partition tolerance: they provide strongly consistent operations, ensuring that the latest version of your data is always returned.
Scaling
Cloud Datastore and Cloud Bigtable automatically scale in response to your data size and access patterns. Although you can easily scale DynamoDB in the AWS console or via the API, Amazon doesn’t provide native auto-scaling support. Nevertheless, auto-scaling is still possible by means of third-party solutions such as Dynamic DynamoDB.
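The core of a threshold-based auto-scaling rule can be sketched in a few lines. This is a generic illustration, not the algorithm used by Dynamic DynamoDB or any other tool. The function name, the 80% and 30% thresholds, and the 50% step size are all assumptions chosen for the example.

```python
import math


def next_provisioned_capacity(provisioned, consumed,
                              upper=0.8, lower=0.3, step=0.5, minimum=1):
    """Return the new provisioned throughput for the observed consumption."""
    if provisioned <= 0:
        raise ValueError("provisioned capacity must be positive")
    utilisation = consumed / provisioned
    if utilisation > upper:
        # Running hot: add headroom before requests are throttled.
        return math.ceil(provisioned * (1 + step))
    if utilisation < lower:
        # Mostly idle: shrink, but never below the configured floor.
        return max(minimum, math.floor(provisioned * (1 - step)))
    return provisioned


scale_up = next_provisioned_capacity(100, 90)    # 90% utilisation
scale_down = next_provisioned_capacity(100, 10)  # 10% utilisation
unchanged = next_provisioned_capacity(100, 50)   # within the target band
```

A scheduler would call a rule like this periodically with metrics from the provider's monitoring service, then apply the result through the table update API.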
With Cloud Bigtable you must specify a cluster size of at least three nodes. This is far in excess of what any small or modest-sized application needs, making the service unsuitable for low-activity databases hosting small amounts of data.
Data Warehouses
In today’s data-driven business environment, the case for an enterprise data warehouse is stronger than ever.
Data warehouses are large-scale analytical databases designed for analyzing data ingested from a range of different sources. They can run on clustered hardware and process superfast SQL-like queries on huge amounts of data.
But they come with a trade-off.
You cannot use a data warehouse as an operational database. Instead, you must first load data from your operational systems into the warehouse before you can start to analyze it.
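The load-then-analyze workflow can be sketched with Python's built-in sqlite3 module standing in for the warehouse. The sales table and its rows are invented for illustration, and a real warehouse would normally ingest data through its own bulk load mechanism rather than row inserts.

```python
import sqlite3

# Rows exported from a hypothetical operational system.
source_rows = [
    ("2017-03-01", "uk", 120.0),
    ("2017-03-01", "us", 80.0),
    ("2017-03-02", "uk", 40.0),
    ("2017-03-02", "us", 60.0),
]

# Step 1: load the data in bulk into the analytical store.
warehouse = sqlite3.connect(":memory:")
warehouse.execute("CREATE TABLE sales (day TEXT, region TEXT, amount REAL)")
with warehouse:  # one transaction for the whole batch
    warehouse.executemany("INSERT INTO sales VALUES (?, ?, ?)", source_rows)

# Step 2: analyze it with an aggregate query.
totals = warehouse.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
).fetchall()
```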
DBaaS Approaches
Amazon’s data warehousing solution Redshift and Google’s equivalent service BigQuery offer many similar features. However, they take two very different approaches to DBaaS.
Redshift works along similar lines to many of Amazon’s compute services: you specify your cluster resource requirements from a choice of different database instance types, or nodes. By contrast, BigQuery is a serverless service, so you don’t need to worry about issues such as capacity provisioning or systems tuning. You simply load in your data and BigQuery takes care of the rest.
Redshift gives you more control over your infrastructure. You can choose between instances with high-throughput HDD and high-I/O attached storage. And you can also fine-tune your infrastructure by choosing a suitable balance between instance size and the number of nodes. On the other hand, BigQuery has virtually no management overhead and scales automatically.
Built-In Features
Both Redshift and BigQuery automatically replicate your data, providing built-in fault tolerance and high availability. They also take advantage of columnar storage, data compression, multi-node sharding, and a fast internal network for high-performance querying.
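To show why columnar storage and compression work well together, here is a small conceptual sketch. It does not reflect the internal format of either Redshift or BigQuery. The sample rows are made up, and run-length encoding is just one simple compression scheme chosen for illustration.

```python
from itertools import groupby

# Row-oriented records: (region, product, quantity).
rows = [
    ("uk", "widget", 3),
    ("uk", "widget", 5),
    ("uk", "gadget", 2),
    ("us", "gadget", 7),
    ("us", "gadget", 1),
]

# Column-oriented layout: the values of each column are stored together.
columns = {
    "region": [r[0] for r in rows],
    "product": [r[1] for r in rows],
    "quantity": [r[2] for r in rows],
}


def run_length_encode(values):
    """Collapse consecutive repeats into (value, count) pairs."""
    return [(value, len(list(group))) for value, group in groupby(values)]


# Repetitive columns compress well once their values sit next to each other.
encoded_region = run_length_encode(columns["region"])

# An aggregate over one column never touches the other columns.
total_quantity = sum(columns["quantity"])
```

Analytical queries typically read a few columns across many rows, so storing each column contiguously reduces both the data scanned and the space it occupies.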
What’s more, both services support full-blown SQL SELECT statements. However, neither service is designed for INSERT, UPDATE or DELETE commands. Finally, it’s important to remember that these are proprietary analytics engines and so query features may vary.
Monitoring Managed Deployments
Using a cloud-based DBaaS can help your organization overcome many of the challenges of provisioning, managing, and troubleshooting problems with your database deployments. Nevertheless, you should still monitor your cloud systems for issues such as availability, performance, and resource usage, as these could indicate underlying problems such as poor database design or slow SQL queries.
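As a minimal sketch of application-side monitoring for slow queries, the wrapper below times each query and records any that exceed a threshold. The function name `timed_query` and the half-second default are assumptions, and sqlite3 stands in for a real managed database connection.

```python
import sqlite3
import time


def timed_query(conn, sql, params=(), slow_threshold_s=0.5, slow_log=None):
    """Run a query and record it in slow_log if it exceeds the threshold."""
    start = time.perf_counter()
    rows = conn.execute(sql, params).fetchall()
    elapsed = time.perf_counter() - start
    if slow_log is not None and elapsed >= slow_threshold_s:
        slow_log.append((sql, elapsed))
    return rows


conn = sqlite3.connect(":memory:")
slow_log = []

# A threshold of zero flags every query, which is handy for a demonstration.
result = timed_query(conn, "SELECT 1", slow_threshold_s=0.0, slow_log=slow_log)

# A very high threshold means this query is not recorded.
timed_query(conn, "SELECT 2", slow_threshold_s=3600.0, slow_log=slow_log)
```

In practice the recorded entries would be shipped to a monitoring system, where recurring slow statements can point to missing indexes or poor schema design.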
New serverless offerings such as BigQuery are redefining the concept of fully-managed services and giving enterprises a way to host their databases with practically no management overhead.
This could represent the start of a wider trend towards serverless database environments, with significant implications for the way you monitor your cloud infrastructure. System and performance monitoring will become largely the domain of the cloud provider, leaving you to focus on business insights such as website visitor behavior and online sales conversions.
In the long term, monitoring your cloud will be far more about the things that matter directly to your business.