Meltano: the universal glue for your Data Pipelines
Why is this Data tool so 🔥 right now? Let's explore!
Ever since Meltano sprang into existence a few years ago as an open-source project at GitLab, I have been impressed by its effectiveness as a tool for ingesting data into the Data Warehouse. I was very happy to see it keep growing and become its own company, dedicated to bringing a new data tool to the modern data stack.
If you are unfamiliar with Meltano, it's an ELT-style (Extract-Load-Transform) tool comparable to Fivetran, Stitch, Airbyte, PipelineWise, etc. Its main functionality is to hook together taps/extractors ("the data readers") with targets/loaders ("the data writers"), optionally with intermediate steps (called maps).
Stitch created the "Singer" specification, which allows you to build taps and targets that can read/write data from any arbitrary source or destination, and then compose them together. For example, if you have:
tap-stripe, tap-salesforce, tap-postgres, target-snowflake, target-postgres
You could compose them in many ways. For example, if you wanted to copy all the data to a Postgres database first and to Snowflake afterward:
tap-stripe to target-postgres, tap-salesforce to target-postgres, and tap-postgres to target-snowflake
Taps and targets can be written in any language and are supported by Stitch, PipelineWise, and Meltano. In addition to running the tap and target, each of these products saves state data into a database, which allows your data extracts to be incremental (e.g. if you are downloading data from Stripe, instead of downloading all payments every time, you only download payments created after the last one fetched in the previous run).
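The bookmark logic behind incremental extracts can be sketched roughly like this (the stream name, field names, and `sync_payments` helper are hypothetical, for illustration only):

```python
import json

def sync_payments(all_payments, state):
    """Yield only payments created after the saved bookmark, advancing the bookmark."""
    bookmark = state.get("bookmarks", {}).get("payments", {}).get("created_at", "")
    # ISO 8601 timestamps compare correctly as strings.
    new_rows = [p for p in all_payments if p["created_at"] > bookmark]
    for row in sorted(new_rows, key=lambda p: p["created_at"]):
        yield row
        # Advance the bookmark as rows are emitted, so a crash mid-run
        # only re-fetches what was not yet delivered.
        state.setdefault("bookmarks", {}).setdefault("payments", {})["created_at"] = row["created_at"]

# First run: empty state means everything is downloaded.
payments = [
    {"id": "pay_1", "created_at": "2023-01-01T00:00:00Z"},
    {"id": "pay_2", "created_at": "2023-02-01T00:00:00Z"},
]
state = {}
first_run = list(sync_payments(payments, state))   # both rows

# Second run: the persisted state skips everything already loaded.
payments.append({"id": "pay_3", "created_at": "2023-03-01T00:00:00Z"})
second_run = list(sync_payments(payments, state))  # only pay_3

print(json.dumps(state))
```

The real runners persist that `state` dictionary between invocations; the sketch only shows why a second run downloads so much less than the first.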
When the system you want to import data from, or export data into, is not supported by any existing tap or target, you will have to write the code to read or write the data yourself (the API calls, data transformations, etc.). However, this is made much easier by the Meltano SDK, which lets you follow certain patterns to create taps and targets, sometimes solving a new integration in minutes.
As an example, I've created these taps over the past few years to import data into the Data Warehouse:
https://github.com/sicarul/tap-geoip
https://github.com/sicarul/tap-coingecko
https://github.com/decentraland/tap-decentraland-thegraph
https://github.com/decentraland/tap-opensea
https://github.com/decentraland/tap-decentraland-api
https://github.com/pulumi/tap-launchdarkly
There are several interesting advantages you get from creating data integrations like this:
Your integration works with any Singer-enabled tool, so you are not locked into any of them.
If an API has no existing integration, you do not depend on any upstream provider to build one, nor do you have to rely on a clunky one-off script just for that API. You can use the Meltano SDK to create a proper integration, share it with the community, and, if other users find it useful, maybe even receive contributions that improve it.
If a tap already exists but doesn't expose a certain dataset or column you need, you can always fork it, add the needed features, and hopefully contribute the changes upstream.
As an organization exposing data with an API or similar method, you can share a tap with the "best" way to export the data, making it easy for any user to integrate your product's data with their own Data Warehouse.
The Meltano Singer SDK is a Python library whose methods can be heavily overridden: if an API behaves in a way the SDK does not support out of the box, the relevant methods can be swapped out with the specific behavior your scenario needs.
An important note: the data Meltano exports into the Data Warehouse is just the raw data as it comes out of the APIs. I heavily recommend following up the pipeline with data modeling steps in dbt, or whatever data transformation tool you have adopted, to integrate the data from the different systems into a cohesive model that users can understand without needing to know how each source system delivers its data.
An interesting side effect of Meltano's extensibility is that you can also use it as a "Reverse ETL" tool: if you need to upload customer data from Snowflake into Salesforce, for example, you could set up a tap-snowflake to target-salesforce pipeline.
Happy Data Wrangling!
I use Meltano (Singer) every day now for writing data to Active Directory, AzureAD, Notifications (target-apprise, which is public on the hub!) and more :D It's actually much easier than writing generalized targets like postgres, duckdb, etc. Give it a shot!