Many data pipelines are miserable to monitor and troubleshoot. This used to be true of applications as well, but application development has since matured its processes and tooling:
Observability is a fast-growing concept in the Ops community, one that caught fire in recent years led by major monitoring/logging companies and thought leaders like Datadog, Splunk, New Relic, and Sumo Logic. Observability lets engineers determine whether a system is working as intended, based on deep visibility into its internal state and the context in which it operates.
So...let's just apply these tools to data engineering, right? Wrong.
Using these general-purpose tools, Data Engineering teams can gain insight into high-level job (or DAG) statuses and summary database performance, but they lack visibility at the level of detail they actually need to manage their pipelines. The reason standard tools don't cut it is that data pipelines behave very differently from software applications and infrastructure.
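To make the gap concrete, here is a minimal, hypothetical sketch (all function names and thresholds are invented for illustration): a pipeline task reports "success" at the job level, which is all a general-purpose monitoring tool would see, while a data-level check on row volume reveals that something upstream silently dropped most of the records.

```python
# Hypothetical example: job-level status looks healthy while the data is wrong --
# the kind of gap general-purpose APM/monitoring tools don't surface.

def run_load_job(rows):
    """Simulate a pipeline task: it 'succeeds' as long as it doesn't raise."""
    loaded = [r for r in rows if r is not None]
    return {"status": "success", "rows_loaded": len(loaded)}

def data_quality_check(result, expected_min_rows):
    """A data-level assertion: loaded volume must meet expectations."""
    return result["rows_loaded"] >= expected_min_rows

# An upstream bug silently nulls out most records:
result = run_load_job([1, None, None, None])

print(result["status"])                                    # job-level view: "success"
print(data_quality_check(result, expected_min_rows=100))   # data-level view: False
```

The DAG scheduler and any dashboard built on job statuses would show green here; only a check against the data itself catches the problem.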
This is just the setup: the post itself contains tons of detailed suggestions for data eng observability.