Building Trust in Your Data Pipelines with Observability

Stefan Dienst

Data Handling & Data Engineering
Python Skill None
Domain Expertise Intermediate
Tuesday 17:10

This talk explores how observability can be applied to data pipelines to improve reliability, data quality, and confidence in complex data systems.

The talk begins with an introduction to observability in the context of data engineering. It explains the three core pillars: metrics, alarms, and logs, and discusses why observability is particularly important for data pipelines, where failures are often silent and correctness issues may only surface through stakeholder complaints.

The first section focuses on metrics. It demonstrates how straightforward it can be to instrument data pipelines with basic metrics using Python. The talk then discusses which metrics are worth monitoring, adapting established concepts such as the four golden signals to data engineering use cases. A concrete example based on a near–real-time event processing pipeline illustrates how fine-grained metrics can reveal systematic failures for specific event types.
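As a taste of how little code basic instrumentation requires, here is a minimal sketch using only the standard library. The in-memory counters, the `process_event` function, and the event shapes are illustrative assumptions; a real pipeline would export such metrics to a monitoring backend rather than keep them in process memory.

```python
import time
from collections import defaultdict

# Illustrative in-memory metric store; a real pipeline would push these
# to a monitoring backend instead of holding them in process memory.
COUNTERS = defaultdict(int)
DURATIONS = defaultdict(list)

def process_event(event: dict) -> None:
    """Process one event, recording fine-grained metrics per event type."""
    event_type = event.get("type", "unknown")
    start = time.perf_counter()
    try:
        # ... actual transformation logic would go here ...
        COUNTERS[f"processed.{event_type}"] += 1
    except Exception:
        # Counting failures per event type is what lets systematic
        # failures for a single type stand out from overall volume.
        COUNTERS[f"failed.{event_type}"] += 1
        raise
    finally:
        DURATIONS[event_type].append(time.perf_counter() - start)

for e in [{"type": "click"}, {"type": "click"}, {"type": "purchase"}]:
    process_event(e)

print(COUNTERS["processed.click"])  # → 2
```

Breaking counters down by event type, as above, is exactly the kind of fine-grained labelling that lets a failure affecting only one event type surface instead of drowning in aggregate totals.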

The second section focuses on alerting. It addresses the challenge that engineers rarely have time to continuously inspect dashboards and therefore rely on alarms to surface important issues. The talk outlines what makes a good alarm, emphasizing that alarms should be actionable, reliable, and provide sufficient context for investigation. A scenario with excessive and noisy alarms is used to illustrate alarm fatigue, and a strategy for recovering from such a situation is described.
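The criteria above can be sketched in code: an alarm fires only when a threshold is meaningfully exceeded (reliable), names the problem (actionable), and carries enough data and a runbook link to start an investigation (context). The function, thresholds, and runbook URL here are illustrative assumptions, not the talk's actual implementation.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Alarm:
    name: str
    message: str       # enough context to begin investigating
    runbook_url: str   # hypothetical link to documented next steps

def check_failure_rate(processed: int, failed: int,
                       threshold: float = 0.05) -> Optional[Alarm]:
    """Fire an alarm only when the failure rate meaningfully exceeds
    the threshold, to avoid contributing to alarm fatigue."""
    total = processed + failed
    if total == 0:
        # No traffic at all warrants a separate "pipeline stalled" alarm,
        # not a failure-rate alarm.
        return None
    rate = failed / total
    if rate <= threshold:
        return None
    return Alarm(
        name="pipeline_failure_rate_high",
        message=f"{failed}/{total} events failed "
                f"({rate:.1%}, threshold {threshold:.0%})",
        runbook_url="https://wiki.example.com/runbooks/pipeline-failures",
    )

alarm = check_failure_rate(processed=900, failed=100)
```

Keeping the threshold explicit and the no-traffic case out of this rule are small design choices that directly target the noisy-alarm scenario the talk describes.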

The final section covers log messages and their importance for reasoning about how a pipeline ended up in a specific state. It discusses why logs are often difficult to work with in data pipelines, as they may contain a mixture of critical errors, informational messages, and low-level framework output. The talk introduces structured logging as a way to add context and make logs easier to search, filter, and aggregate. Examples include monitoring the distribution of log levels to uncover hidden issues and using centralized logging to identify dependencies between pipelines that are otherwise hard to detect.
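One way to get structured logs from the standard library is a JSON formatter combined with the `extra` parameter; the formatter class and the context field names (`pipeline`, `run_id`) below are illustrative assumptions, not a prescribed setup.

```python
import json
import logging

class JsonFormatter(logging.Formatter):
    """Render each record as one JSON object so logs can be searched,
    filtered, and aggregated by field instead of by regex."""
    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "level": record.levelname,
            "message": record.getMessage(),
            # Context fields attached via `extra=` become attributes
            # on the record; illustrative field names:
            "pipeline": getattr(record, "pipeline", None),
            "run_id": getattr(record, "run_id", None),
        }
        return json.dumps(payload)

handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
logger = logging.getLogger("orders_pipeline")
logger.addHandler(handler)
logger.setLevel(logging.INFO)

logger.warning(
    "late-arriving events detected",
    extra={"pipeline": "orders", "run_id": "2024-06-01T12:00"},
)
```

Because every line is machine-parseable, a log aggregator can then count records per level per pipeline, which is the basis for the log-level-distribution monitoring mentioned above.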

The talk concludes by emphasizing how the three pillars of observability build trust in a data pipeline.

Stefan Dienst

Stefan is a data engineer at Covestro, working in a newly established data office. He has four years of experience with a variety of data platforms, ranging from classic ETL pipelines and data warehousing to near–real-time stream processing. Before moving into data engineering, he completed a PhD in physics, where he fell in love with Python and working with data. Since then, he has always been curious to learn new things and to share what he has learned with others.