Pulling data from MongoDB into Panoply
First, you’ll need to connect your MongoDB instance to Panoply (note: if you’ve already done this, feel free to skip ahead to a later section). From your Panoply dashboard, click on Data Sources in the left pane, then hit Add Data Source in the upper right:
This will take you to the data source selection screen, where MongoDB is listed under Databases (in the leftmost column):
Selecting MongoDB will open a new pane where you can enter your connection details, which will look something like this:
Note that you’ll need a few pieces of information to connect Panoply to your MongoDB database:
- Username: the username you use to connect to your MongoDB database remotely
- Password: the password associated with that username
- Host: the address where your MongoDB database is hosted
- Port: the port your MongoDB database uses for connections
- Database name: the name of your MongoDB database
As you can see in the screengrab above, you’ll need to plug all that information into a MongoDB connection URI, which has the general form mongodb://username:password@host:port/database.
Or, more concretely, it should look something like this:
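As a minimal sketch, assuming the standard MongoDB connection string format, the five pieces listed above can be assembled in Python as shown here. The helper name `build_mongo_uri` and every value in the example (user, password, host, database) are placeholders invented for illustration, not real credentials or anything Panoply provides. Credentials are percent-encoded so that characters such as "@" or ":" in a password don't break the URI.

```python
from urllib.parse import quote_plus


def build_mongo_uri(username, password, host, port, database):
    """Assemble a MongoDB connection URI from its individual parts."""
    # Percent-encode the credentials so reserved characters stay unambiguous.
    user = quote_plus(username)
    pwd = quote_plus(password)
    return f"mongodb://{user}:{pwd}@{host}:{port}/{database}"


# Placeholder values only; substitute your own connection details.
uri = build_mongo_uri("analyst", "p@ss:word", "db.example.com", 27017, "sales")
print(uri)
# mongodb://analyst:p%40ss%3Aword@db.example.com:27017/sales
```

The resulting string is what you would paste into the connection field.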
Note that Panoply encrypts all login information provided by our users. Once you’ve entered your URI, you can expand the Advanced Options section, where you’ll be able to set the following:
- Destination: allows you to give a name to the table that will be created in Panoply after importing, or a collection-specific prefix if you’re planning on importing multiple collections. Learn more here.
- Primary Key: allows you to set a primary key that will act as a unique identifier for each record in your table(s), and will also allow you to link related data across tables. If you already have a field in your tables called “id” or something similar, Panoply will use it as a primary key if no other parameter is set. If you don’t set a primary key at all, you won’t see any deduplication in your data imports, so every single piece of data from the collection run will be added to your tables, which you might not want. With MongoDB, the default will be id. See the documentation on primary keys to learn more.
- Incremental Key: make sure to set this up if you’re planning on making multiple, recurring pulls from this data source. Doing so will make future imports more efficient, as Panoply will only collect data that has been updated since the last time you pulled from that source. Note that incremental keys will only work properly if you are pulling from a single collection with this connector. See our incremental key documentation to learn more.
- Exclude: you can use this field to exclude specific data elements from your import. If there are specific types of data (e.g. irrelevant or sensitive data) that you would like to keep out of your Panoply data warehouse, you can use this setting to manage that. You can exclude nested fields with dot notation. For example, if your “users” top-level object had a field in it called “email”, you could exclude that by putting “users.email” in the exclude section in Panoply. Learn more here.
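To make the primary key, incremental key, and exclude settings more concrete, here is a small Python sketch of the behavior each one describes. It is only an illustration of the concepts, not Panoply's actual implementation. The function names (`upsert`, `changed_since`, `exclude_field`), the `updated_at` field, and the sample records are all hypothetical.

```python
from copy import deepcopy


def upsert(table, records, primary_key="id"):
    """Merge records into `table` (a dict keyed by primary key).

    A record whose key already exists replaces the older copy,
    which is what deduplication on a primary key amounts to.
    """
    for record in records:
        table[record[primary_key]] = record
    return table


def changed_since(records, incremental_key, last_value):
    """Keep only records whose incremental key is newer than the last pull."""
    return [r for r in records if r[incremental_key] > last_value]


def exclude_field(document, dotted_path):
    """Return a copy of `document` without the field named in dot notation."""
    result = deepcopy(document)
    *parents, leaf = dotted_path.split(".")
    node = result
    for key in parents:
        node = node.get(key)
        if not isinstance(node, dict):
            return result  # the path does not exist, so nothing is removed
    node.pop(leaf, None)
    return result


# Hypothetical sample data: record 1 appears twice, the second copy is newer.
batch = [
    {"id": 1, "updated_at": 10, "plan": "free"},
    {"id": 2, "updated_at": 12, "plan": "pro"},
    {"id": 1, "updated_at": 15, "plan": "pro"},
]

table = upsert({}, batch)
print(len(table))  # 2

newer = changed_since(batch, "updated_at", 11)
print([r["id"] for r in newer])  # [2, 1]

doc = {"users": {"name": "Ada", "email": "ada@example.com"}, "plan": "pro"}
print(exclude_field(doc, "users.email"))
# {'users': {'name': 'Ada'}, 'plan': 'pro'}
```

Without a primary key, the `upsert` step would have nothing to match on, and all three sample records would land in the table as separate rows.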
Once you’ve set all that up, click Collect. The Data sources - MongoDB pane will go gray while the process is underway, and a green status bar will appear at the top of the pane. You don’t need to stay and watch it, though. Panoply is designed to allow for multiple, parallel data collection processes, so you can move on to your next data source–and the one after that–while your MongoDB data is being ingested. Panoply will send you an alert when your data collection run is finished, but you can also monitor progress directly or cancel jobs in the Jobs pane.
Once all your data collection is finished, you can head to the Tables pane and get a bird’s eye view of all your fresh, processed data.