Open Source Comparison: Elasticsearch vs. MongoDB

By: Gedalyah Reback

May 19, 2020

Elasticsearch and MongoDB are the two most popular distributed datastores used to manage NoSQL data. Both of these technologies are highly scalable and have document-oriented design at the core. There are differences between the two technologies, however, and it’s important to understand these differences in order to choose the right one for your use case. This blog post will pit Elasticsearch vs. MongoDB and examine differences between these two databases in a number of areas.

About Elasticsearch

Elasticsearch is an open-source, Java-written, distributed RESTful search engine. It has been built on top of Apache Lucene and extends Lucene’s functionality with HTTP web interface and data distribution using the index and shards concept. An index in Elasticsearch is similar to a database. It organizes data under a namespace, has a defined schema, and can be divided into multiple shards for horizontal scaling. Each record in Elasticsearch is stored as a JSON object and is called a “document.”

Some of the core features of Elasticsearch include:

Distributed search
High availability
REST interface
Powerful query DSL
Multitenancy
Geo search
Horizontal scaling

Despite having a rich list of feature sets, Elasticsearch is not the perfect data store for all scenarios. It has a few limitations that need to be taken into account when choosing the right data store for your application.

About MongoDB

MongoDB is a document-oriented database written in C++ with design in mind to handle terabytes of data spread across multiple geolocations. In MongoDB, you can create multiple databases, and each database can have multiple collections (tables). As with Elasticsearch, each record in MongoDB enters storage as a JSON object we call a “document.” MongoDB is also schemaless database that supports built-in security features like authentication, access control, and encryption.

Some of the core features of MongoDB are:

Distributed document storage
High availability
Schemaless
Powerful queries and aggregations
Horizontal scaling
Built-in security
Great indexing capabilities
Geo search
GridFS to store any size document

The biggest limitations of MongoDB are its inability to provide full-text search at speed and its lack of some search functions, like tokenizing text.

Elasticsearch vs. MongoDB: A Detailed Comparison

As illustrated above, these technologies have a lot of similarities in their designs and features. That said, they differ greatly in nature. Elasticsearch is primarily a search server, while MongoDB is primarily a database. Let’s look at the differences between them in other areas.

Use Cases

Your use case will be critical in deciding which technology is the perfect fit. Elasticsearch will always be the better choice when full-text search is a requirement. Elasticsearch also wins the race when it comes to log analytics, since not only does it offer a wide range of aggregation queries, it also supports products like Kibana, Logstash, and beats—all of which make log analysis much easier.

On the other hand, when the data is in NoSQL format and you need a highly scalable database which requires CRUD operations without full-text search support, MongoDB is a reliable choice. MongoDB also supports full-text queries with the help of text-based indexes, but its search speed is slow and it lacks the tokenizers and analyzers that come with a search server

Configurations Files

The installation package of both Elasticsearch and MongoDB are available under all flavour of Linux, windows and Mac operating systems. Once you install the package, the default configurations are good to start with, but here are some important configuration parameters that you should modify before taking them into production. All the following configuration options are shown as per Linux Operating System.

You will find configuration files for Elasticsearch under /etc/elasticsearch/config directory as shown below:

config
|-- elasticsearch.keystore
|-- elasticsearch.yml
|-- jvm.options
|-- log4j2.properties
|-- role_mapping.yml
|-- roles.yml
|-- users
`-- users_roles

Where, all the Mongodb configurations can be done inside only a single file which is found under /etc/mongod.conf

Backup Recovery

Both Elasticsearch and MongoDB offer backup and recovery functionality by default.

Elasticsearch performs incremental backups using _snapshot REST endpoint with the help of plugins, and its backup destinations can vary from file systems to cloud storage. The benefit of snapshots is that they are incremental in nature. You can delete old snapshots easily, and the recovery of snapshots is super easy to configure. The snapshot API does not offer queryable backup, however.

For example,

To take the Elasticsearch backup in S3 buckets, you have to install the S3 repository plugin on every Elasticsearch node, using following command:

sudo bin/elasticsearch-plugin install repository-s3

And then register a repository inside your AWS s3 bucket,

curl -X PUT "localhost:9200/_snapshot/test_s3_repository?pretty" -H 'Content-Type: application/json' -d'

{
  "type": "s3",
  "settings": {
    "bucket": "s3_bucket_name"
  }
}

Once repository you register the repository, you can start taking snapshots using following command:

curl -X PUT "localhost:9200/_snapshot/test_s3_repository/snapshot_1?pretty"

MongoDB offers multiple ways to perform backups. The first is the “mongodump” tool, which ships with the MongoDB installation and is the most common solution DevOps teams use. While mongodump has some limitations—it does not take incremental backups and is not effective for large databases—it offers features like 1) queryable backup, 2) whole database backup, and 3) individual collection.

To implement incremental backup in MongoDB, you need to use the MongoDB oplog, which is a capped collection. It is also possible to create a backup of a MongoDB deployment by taking a snapshot of the file system. This makes a copy of MongoDB’s underlying data files. MongoDB’s Enterprise edition allows you to access other options, like MongoDB Atlas, MongoDB Cloud Manager, and MongoDB Ops Manager.

To take the backup using mongodump, you just need to run following command:

mongodump --db <database_name> --host <mongohost_ip_address>

But, unlike Elasticsearch snapshots, mongo dumps will save on local disk and not into the S3 buckets or any other cloud storage.

Popularity

According to DB-Engines, Elasticsearch has the number one rank in search engines and seventh overall. MongoDB takes the number one spot in document store databases and fifth overall.

Figure 1: DB-Engines Ranking—Elasticsearch vs. MongoDB Popularity (Source: DB-Engines)

Support for Handling Relational Data

NoSQL data stores are good for scaling, high throughput for writing, and reading queries. However, they can’t handle relational data and do not possess the ACID properties offered by relational databases. Relational databases store data in rows and columns. While you can normalize easily, Elasticsearch and MongoDB support the document model. Therefore, they focus on keeping data in a denormalized format.

While there is no hard and fast rule for data modeling in these data stores, it is customary to either rely on keeping duplicate data in documents or perform application-side joins.

Despite its limitations, Elasticsearch has two built-in functionalities for handling relational data: the 1) nested and 2) parent-child document models.

MongoDB also has two ways to handle relational data. One is the embedded document model, in which related objects enter storage as subdocuments. The other is the reference model, which includes links or references from one document to another.

Data Storage Architecture: Lucene vs. C++

Elasticsearch was built on top of Lucene and uses Lucene segments to write data inside inverted indexes. The metadata information—such as index mapping, settings, and other cluster states—is written in Elasticsearch files on top of Lucene.

The problem with Lucene segments is they are immutable in nature and each commit creates a new segment. These segments merge behind the scenes based on merge settings. This makes data updates into heavy operations because, when each document updates in place, a new document is generates and overrides the previous.

To avoid generating too many segments and significant I/O, Elasticsearch maintains a transactional log for each index, avoiding a low-level Lucene commit on each indexing operation. Transaction logs are also useful for recovering data in case of a crash or data corruption event.

MongoDB’s underlying storage model is completely different from Elasticsearch’s. MongoDB is written in C++ and uses a memory map file to map an on-disk data file to an in-memory byte array. It uses a doubly linked list data structure to organize the data. Each document contains a linked list to every other document as well as to the actual BSON-encoded data under the hood. MongoDB uses journal logs to help with database recovery in case of a hard shutdown. Eventually, the MongoDB process will kill itself off if there is low system memory or very high utilization of other system resources.

These differences demonstrate that MongoDB is built for 1) high write and 2) update throughput without causing high CPU and disk I/O issues.

Document Size

The default maximum document size that Elasticsearch supports only goes up to 100 MB, although you can increase this maximum to 2GB—Lucene’s limit. However, it’s important to keep in mind that very large documents often create additional issues.

By default, MongoDB supports the storage of documents up to 16 MB. You can store larger documents using GridFS functionality.

Licensing Model, Monitoring, and Security

Elasticsearch is an open source tool that comes with an Apache 2.0 license. The open-source distribution has all of the functionalities you need for building a search application along with basic security features. For alerting and monitoring, you can opt for various open source plugins. If you need additional functionalities like advanced security, alerting, and machine learning, you will have to purchase a subscription to the Gold, Platinum, or Enterprise versions of Elasticsearch.

MongoDB is also free to use, and its community edition comes with a Server Side Public License (SSPL) v1.0. The community edition contains all of the core MongoDB features, like basic monitoring tools and security. If you plan to explore and use advanced features like MongoDB Management, Advanced Monitoring, In-Memory database engine, and BI-Connector, you can opt for the MongoDB Enterprise edition.

It’s also possible to monitor MongoDB using the ELK Stack, which you can read more about in this tutorial.

Programming Language: Java vs. Lucene

Elasticsearch is written in Java, and MongoDB is written in C++; however, both of these technologies offer a wide range of client support in multiple languages. Elasticsearch has clients available for Java, Javascript, Ruby, GO, .NET, PHP, Perl, Python, and Rust. In addition, there are several community-contributed clients available for languages like C++, Scala, and R.

MongoDB offers an even wider range of drivers for languages like C, C++, Scala, and Swift. There are also multiple community-contributed clients available for MongoDB.

Summary

Both Elasticsearch and MongoDB have design in mind for specific use cases, but there may be common scenarios where the choice of one tool over the other may be more complex. In this blog, we have reviewed and compared various features of both technologies in order to help you make these more difficult decisions.

To summarize, MongoDB is a very popular and scalable NoSQL database that is a leader in document-oriented databases. It is usually the best solution when the use case necessitates a highly scalable database with high throughput transactions. When it comes to handling full-text search, log analytics, finding anomalies, and root cause detection, Elasticsearch is the clear winner.