Integrating MongoDB Atlas and Elastic Cloud

Vishal Karande
4 min read · Apr 15, 2021

Introduction:

I had a requirement for a centralized database holding hundreds of millions of documents, with low response times for searches. So I decided to use MongoDB as the database and Elasticsearch to optimize the search results.

Why MongoDB:

MongoDB is an object-oriented, simple, dynamic, and scalable NoSQL database based on the document store model. Data objects are stored as separate documents inside a collection, instead of in the columns and rows of a traditional relational database.

MongoDB Atlas is a fully managed cloud database developed by the same people who build MongoDB. Atlas handles all the complexity of deploying, managing, and healing your deployments on the cloud service provider of your choice (AWS, Azure, or GCP).

Why Elasticsearch:

Elasticsearch not only indexes millions of documents but can also run queries across all of those millions of documents and return accurate results in a fraction of a second.

Elastic Cloud makes it easy to deploy, operate, and scale Elastic products and solutions in the cloud, from an easy-to-use hosted and managed Elasticsearch experience to powerful, out-of-the-box search solutions.

Integrating MongoDB Atlas and Elastic Cloud:

The major challenge in this implementation was the integration of MongoDB Atlas with the Elastic Cloud stack. Both are cloud-based managed solutions with SSL/TLS-enabled authentication mechanisms, so the existing plugins had a couple of issues.

After evaluating multiple options, I selected Transporter.

  • River Plugin: the River Plugin is deprecated.
  • Mongo-connector: currently has issues with the SSL handshake, and MongoDB has stopped officially supporting this plugin.
  • Logstash: Logstash is not currently part of Elastic Cloud, so you would need to run it on a separate virtual machine. There are also issues with the new JDBC driver when communicating through the Logstash jdbc-input plugin; hopefully these will be fixed.
  • Mongolastic: an open-source tool with no official MongoDB support. It also ships as a JAR, so a separate instance is required to run it.
  • Atlas Stitch app: a database trigger function can sync the data using webhooks, but there is a separate cost involved and it requires JavaScript expertise.
  • Transporter: an open-source tool that is part of the Compose stack. It provides statement filters, authentication, sync, and multiple other options. I found it a neat way to keep MongoDB in sync with Elasticsearch.

Installation and Configuration:

Transporter:

Transporter allows the user to configure a number of data adaptors as sources or sinks. These can be databases, files, or other resources. Data is read from the sources, converted into a message format, and then sent down to the sink, where the message is converted into a writable format for its destination. The user can also create data transformations in JavaScript that sit between the source and sink to manipulate or filter the message flow.

  • Download the Transporter binary (the example below fetches the Linux build; pick darwin-amd64 instead on macOS)

wget https://github.com/compose/transporter/releases/download/v0.5.2/transporter-0.5.2-linux-amd64

  • Move it to /usr/local/bin on your system

mv transporter-*-linux-amd64 /usr/local/bin/transporter

  • Change the permissions to make it executable

chmod +x /usr/local/bin/transporter

  • Check that Transporter is set up properly (see the command below)
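If the installation succeeded, running Transporter's built-in about command should list the available adaptors (mongodb, elasticsearch, file, and so on):

transporter about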

Creating a Pipeline:

A pipeline in Transporter is defined by a JavaScript file named pipeline.js by default. The built-in init command creates a basic configuration file in the current directory, given a source and a sink.

  • Initialize a starter pipeline.js with MongoDB as the source and Elasticsearch as the sink.

transporter init mongodb elasticsearch

  • The generated pipeline.js looks roughly like the sketch below
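A sketch of the default template that init generates (the commented lines are optional adaptor settings; since Atlas enforces TLS, you will likely need to uncomment and enable the mongodb adaptor's ssl option):

var source = mongodb({
  "uri": "${MONGODB_URI}"
  // "timeout": "30s",
  // "tail": false,
  // "ssl": false,      // set to true for MongoDB Atlas, which enforces TLS
  // "cacerts": ["/path/to/cert.pem"],
  // "wc": 1,
  // "fsync": false,
  // "bulk": false,
  // "collection_filters": "{}"
})

var sink = elasticsearch({
  "uri": "${ELASTICSEARCH_URI}"
  // "timeout": "10s",
  // "aws_access_key": "ABCDEF",  // only for signing requests to the AWS Elasticsearch service
  // "aws_access_secret": "ABCDEF"
})

// read every collection from the source and write it to the sink
t.Source("source", source, "/.*/").Save("sink", sink, "/.*/")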
  • Export the environment variables referenced in pipeline.js

export ELASTICSEARCH_URI='https://<username>:<password>@<Elastic cloud URL>:9243/<dbname>'

export MONGODB_URI='mongodb://<username>:<password>@cluster0-shard-00-00-….mongodb.net/<index>'

Note: the authSource should be taken from the user management console (the default is admin); it can be passed as a query parameter on the URI, e.g. by appending ?authSource=admin.

  • Now it's time to run the pipeline
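With both variables exported in the same shell, the run command executes the pipeline file:

transporter run pipeline.js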

Access the Elasticsearch API and check that everything looks good; the data should now be synced.
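For example, a quick sanity check with curl (placeholders as above; <index> is whatever index the sink wrote to):

curl -u <username>:<password> 'https://<Elastic cloud URL>:9243/<index>/_search?pretty&size=2'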

You can use transforms for data modeling via a transform.js file (the pipeline needs to be updated to reference transform.js, as sketched below).
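As a sketch, assuming a hypothetical document with firstName and lastName fields, transform.js could reshape each message like this:

function transform(msg) {
  // msg["data"] holds the document being synced; add, rename, or drop fields here
  msg["data"]["fullName"] = msg["data"]["firstName"] + " " + msg["data"]["lastName"];
  return msg
}

and the last line of pipeline.js would be updated to run the transformer between the source and the sink, using Transporter's goja JavaScript adaptor:

t.Source("source", source, "/.*/")
  .Transform(goja({"filename": "transform.js"}))
  .Save("sink", sink, "/.*/")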
