Integrating MongoDB Atlas and Elastic Cloud

Vishal Karande
4 min read · Apr 15, 2021

Introduction:

I had a requirement for a centralized database holding hundreds of millions of documents, with low response times for searches. So I decided to use MongoDB as the database and Elasticsearch to optimize the search results.

Why MongoDB:

MongoDB is an object-oriented, simple, dynamic, and scalable NoSQL database based on the document store model. Data objects are stored as separate documents inside a collection, instead of in the columns and rows of a traditional relational database.

MongoDB Atlas is a fully managed cloud database developed by the same people who build MongoDB. Atlas handles all the complexity of deploying, managing, and healing your deployments on the cloud service provider of your choice (AWS, Azure, or GCP).

Why Elasticsearch:

Elasticsearch not only indexes millions of documents but can also run queries across all of those millions of documents and return accurate results in a fraction of a second.

Elastic Cloud makes it easy to deploy, operate, and scale Elastic products and solutions in the cloud, from an easy-to-use hosted and managed Elasticsearch experience to powerful, out-of-the-box search solutions.

Integrating MongoDB Atlas and Elastic Cloud:

The major challenge in this implementation was the integration of MongoDB Atlas with the Elastic Cloud stack. Both are cloud-based managed solutions with SSL/TLS-enabled authentication mechanisms, so the existing plugins had a couple of issues.

After evaluating multiple options, I selected Transporter.

  • River Plugin: the River Plugin is deprecated.
  • Mongo-connector: currently has issues with the SSL handshake, and MongoDB has stopped officially supporting this plugin.
  • Logstash: Logstash is not currently part of Elastic Cloud, so you would need to run it on a separate virtual machine. There are also issues with the new JDBC driver when communicating through the Logstash jdbc-input plugin; hopefully these will be fixed.
  • Mongolastic: an open-source tool with no official MongoDB support. It also ships as a JAR, so a separate instance is required to run it.
  • Atlas Stitch app: a database trigger function can sync the data using webhooks, but there is a separate cost involved and it requires JavaScript expertise.
  • Transporter: an open-source tool that is part of the Compose stack. It provides statement filters, authentication, sync, and multiple other options. I found it a neat way to keep MongoDB in sync with Elasticsearch.

Installation and Configuration:

Transporter:

Transporter allows the user to configure a number of data adaptors as sources or sinks. These can be databases, files, or other resources. Data is read from the sources, converted into a message format, and then sent down to the sink, where the message is converted into a writable format for its destination. The user can also create data transformations in JavaScript that sit between the source and sink to manipulate or filter the message flow.

  • Download the Transporter binary (the example below fetches the Linux build; pick darwin-amd64 instead on macOS)

wget https://github.com/compose/transporter/releases/download/v0.5.2/transporter-0.5.2-linux-amd64

  • Move it to /usr/local/bin on your system

mv transporter-*-linux-amd64 /usr/local/bin/transporter

  • Change the permissions to make it executable

chmod +x /usr/local/bin/transporter

  • Check that Transporter is set up properly (see the command below)
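If the installation succeeded, running Transporter's built-in about command should list the available adaptors (mongodb, elasticsearch, file, and so on):

transporter about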

Creating a Pipeline:

A pipeline in Transporter is defined by a JavaScript file named pipeline.js by default. The built-in init command creates a basic configuration file in the current directory, given a source and a sink.

  • Initialize a starter pipeline.js with MongoDB as the source and Elasticsearch as the sink.

transporter init mongodb elasticsearch

  • The generated pipeline.js looks roughly like the sketch below
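A sketch of the default template that init generates (the commented lines are optional adaptor settings; since Atlas enforces TLS, you will likely need to uncomment and enable the mongodb adaptor's ssl option):

var source = mongodb({
  "uri": "${MONGODB_URI}"
  // "timeout": "30s",
  // "tail": false,
  // "ssl": false,      // set to true for MongoDB Atlas, which enforces TLS
  // "cacerts": ["/path/to/cert.pem"],
  // "wc": 1,
  // "fsync": false,
  // "bulk": false,
  // "collection_filters": "{}"
})

var sink = elasticsearch({
  "uri": "${ELASTICSEARCH_URI}"
  // "timeout": "10s",
  // "aws_access_key": "ABCDEF",  // only for signing requests to the AWS Elasticsearch service
  // "aws_access_secret": "ABCDEF"
})

// read every collection from the source and write it to the sink
t.Source("source", source, "/.*/").Save("sink", sink, "/.*/")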
  • Export the environment variables referenced in pipeline.js

export ELASTICSEARCH_URI='https://<username>:<password>@<Elastic cloud URL>:9243/<dbname>'

export MONGODB_URI='mongodb://<username>:<password>@cluster0-shard-00-00-….mongodb.net/<index>'

Note: the authSource should be taken from the user management console (the default is admin); it can be passed as a query parameter on the URI, e.g. by appending ?authSource=admin.

  • Now it's time to run the pipeline
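With both variables exported in the same shell, the run command executes the pipeline file:

transporter run pipeline.js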

Access the Elasticsearch API and check that everything looks good; the data should now be synced.
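For example, a quick sanity check with curl (placeholders as above; <index> is whatever index the sink wrote to):

curl -u <username>:<password> 'https://<Elastic cloud URL>:9243/<index>/_search?pretty&size=2'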

You can use transforms for data modeling via a transform.js file (the pipeline needs to be updated to reference transform.js, as sketched below).
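As a sketch, assuming a hypothetical document with firstName and lastName fields, transform.js could reshape each message like this:

function transform(msg) {
  // msg["data"] holds the document being synced; add, rename, or drop fields here
  msg["data"]["fullName"] = msg["data"]["firstName"] + " " + msg["data"]["lastName"];
  return msg
}

and the last line of pipeline.js would be updated to run the transformer between the source and the sink, using Transporter's goja JavaScript adaptor:

t.Source("source", source, "/.*/")
  .Transform(goja({"filename": "transform.js"}))
  .Save("sink", sink, "/.*/")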
