Coursera

Open source Data Engineering with Spark, dbt & Airflow Professional Certificate

Build Production Data Pipelines at Scale.

Explore Spark, dbt, and Airflow to design, automate, and deploy enterprise-grade data pipelines.

Included with Coursera Plus

Earn a career credential that demonstrates your expertise
Intermediate level

Recommended experience

4 weeks to complete
at 10 hours a week
Flexible schedule
Learn at your own pace

What you'll learn

  • Build modular, production-grade data pipelines using Apache Spark, dbt, and Airflow to ingest, transform, and load data at scale.

  • Design and implement dimensional data models including star schemas, SCD Type 2, and incremental load strategies for data warehouses.

  • Optimize distributed data processing by resolving Spark shuffle, skew, and partitioning issues to improve pipeline performance.

  • Automate deployments and enforce data quality using CI/CD pipelines, Docker containers, and automated testing frameworks like Great Expectations.
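The SCD Type 2 pattern mentioned above preserves history by expiring the old dimension row and appending a new current version rather than overwriting in place. A minimal pure-Python sketch of that merge logic, with made-up table fields (`valid_from`, `valid_to`, `is_current`) standing in for a warehouse dimension table:

```python
from datetime import date

def scd2_merge(dim_rows, incoming, today):
    """Apply an SCD Type 2 merge: expire changed rows, append new versions.

    dim_rows: dimension rows (dicts with id, name, valid_from, valid_to, is_current)
    incoming: latest source snapshot (dicts with id, name)
    """
    out = list(dim_rows)
    current = {r["id"]: r for r in out if r["is_current"]}
    for rec in incoming:
        cur = current.get(rec["id"])
        if cur is None or cur["name"] != rec["name"]:
            if cur is not None:
                # Close the old version instead of overwriting it,
                # so the full change history stays queryable.
                cur["valid_to"] = today
                cur["is_current"] = False
            out.append({"id": rec["id"], "name": rec["name"],
                        "valid_from": today, "valid_to": None,
                        "is_current": True})
    return out

dim = [{"id": 1, "name": "Acme", "valid_from": date(2024, 1, 1),
        "valid_to": None, "is_current": True}]
result = scd2_merge(dim, [{"id": 1, "name": "Acme Corp"}], date(2025, 6, 1))
# The renamed customer now has two rows: one expired, one current.
```

In Spark or dbt the same idea is expressed as a MERGE/snapshot over the dimension table; this sketch only shows the row-versioning rule itself.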

Details to know

Shareable certificate

Add to your LinkedIn profile

Taught in English
Recently updated!

March 2026

See how employees at top companies are mastering in-demand skills

Logos of Petrobras, TATA, Danone, Capgemini, P&G, and L'Oréal

Advance your career with in-demand skills

  • Receive professional-level training from Coursera
  • Demonstrate your technical proficiency
  • Earn an employer-recognized certificate from Coursera

Professional Certificate - 6-course series

What you'll learn

  • Build end-to-end data pipelines that automatically ingest from databases, APIs, and streams using Spark, dbt, and Airflow tools.

  • Design data models with historical tracking using SCD Type 2 patterns to preserve complete change history for analytics.

  • Create automated workflows with intelligent retry logic, SLA monitoring, and parameterization for production reliability.

  • Optimize Spark job performance using partitioning and caching strategies to achieve 30%+ runtime improvements.
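Airflow configures retry behavior declaratively (for example `retries` and `retry_delay` on a task); the underlying pattern is an exponential-backoff wrapper, sketched here in plain Python with illustrative names rather than the Airflow API:

```python
import time

def with_retries(fn, retries=3, base_delay=0.01, sleep=time.sleep):
    """Call fn(); on failure, wait base_delay * 2**attempt and try again."""
    for attempt in range(retries + 1):
        try:
            return fn()
        except Exception:
            if attempt == retries:
                raise  # out of retries: surface the failure
            sleep(base_delay * (2 ** attempt))

calls = {"n": 0}

def flaky_extract():
    """A stand-in task that fails twice before succeeding."""
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("source unavailable")
    return "rows loaded"

result = with_retries(flaky_extract, retries=3)
# result == "rows loaded" after two failed attempts
```

Passing `sleep` as a parameter makes the backoff testable without real waits, the same reason schedulers separate retry policy from task code.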

Skills you'll gain

Category: Data Transformation
Category: Enterprise Security
Category: Data Validation
Category: Data Pipelines
Category: Data Modeling
Category: Data Flow Diagrams (DFDs)
Category: Data Warehousing
Category: Apache Airflow
Category: Apache Spark
Category: Data Quality
Category: Extract, Transform, Load
Category: Data Architecture
Category: Data Integration
Category: Data Processing

What you'll learn

  • Optimize Spark job performance through strategic partitioning and caching, achieving 30%+ runtime improvements using data access analysis.

  • Implement transactional data lakes with Delta format, enabling versioning, ACID operations, and schema evolution for reliable datasets.

  • Provision secure cloud data infrastructure using IAM policies, private networks, and encrypted storage following security best practices.

  • Evaluate and benchmark storage formats (Parquet, ORC, Avro) to select optimal solutions for analytical workloads and cost efficiency.
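Partitioning problems like the skew mentioned above show up when one partition receives far more rows than the rest. A toy stand-in for what Spark's UI shows per task: hash-partition the join keys and compare partition sizes (the key values here are invented):

```python
from collections import Counter

def partition_sizes(keys, num_partitions):
    """Count how many rows land in each hash partition."""
    sizes = Counter(hash(k) % num_partitions for k in keys)
    return [sizes.get(p, 0) for p in range(num_partitions)]

def skew_ratio(sizes):
    """Max partition size over the mean: ~1.0 is balanced, >>1 is skewed."""
    mean = sum(sizes) / len(sizes)
    return max(sizes) / mean if mean else 0.0

# A hot key ("US") dominates the dataset, so one partition balloons
# while the remaining keys spread evenly across the others.
keys = ["US"] * 900 + [f"cust-{i}" for i in range(100)]
sizes = partition_sizes(keys, 8)
ratio = skew_ratio(sizes)
```

A ratio far above 1 is the signal that techniques like key salting or adaptive query execution are worth reaching for; Spark itself does this accounting per shuffle partition.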

Skills you'll gain

Category: Infrastructure as Code (IaC)
Category: Data Warehousing
Category: Apache Spark
Category: Data Integrity
Category: Cloud Security
Category: Data Security
Category: Data Storage Technologies
Category: Cloud Computing Architecture
Category: Data Infrastructure
Category: Amazon S3
Category: Data Lakes
Category: Transaction Processing
Category: PySpark
Category: Cloud Storage
Category: Cloud Deployment
Category: Data Management
Category: Cloud Computing
Category: Infrastructure Architecture
Category: Performance Tuning
Category: Data Storage

What you'll learn

  • Design star schema data models with fact and dimension tables that enable intuitive self-service business intelligence reporting.

  • Apply third normal form normalization to optimize database structure while maintaining query performance through indexing strategies.

  • Use advanced SQL window functions to calculate rolling metrics, rankings, and time-series analytics for complex data analysis.

  • Implement database replication and incremental loading techniques to ensure high availability and efficient data warehouse updates.
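Rolling metrics of the kind described above are a direct fit for SQL window functions. A self-contained example using Python's built-in sqlite3 (SQLite 3.25+ supports window functions; the `sales` table is made up for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (day INTEGER, amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?)",
                 [(1, 10.0), (2, 20.0), (3, 30.0), (4, 40.0)])

# 3-day rolling sum: the current row plus the two preceding rows.
rows = conn.execute("""
    SELECT day,
           SUM(amount) OVER (
               ORDER BY day
               ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
           ) AS rolling_sum
    FROM sales
    ORDER BY day
""").fetchall()
# rows -> [(1, 10.0), (2, 30.0), (3, 60.0), (4, 90.0)]
```

The same `OVER (...)` syntax, with `PARTITION BY` added, covers rankings and per-group time-series analytics in warehouse SQL dialects as well.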

Skills you'll gain

Category: Performance Tuning
Category: Data Architecture
Category: Database Development
Category: Database Design
Category: Star Schema
Category: Data Modeling
Category: Extract, Transform, Load
Category: Data Infrastructure
Category: Data Integration
Category: Data Quality
Category: Database Architecture and Administration
Category: Data Warehousing
Category: Relational Databases
Category: SQL
Category: Database Software
Category: Business Intelligence
Category: Database Management

What you'll learn

  • Resolve merge conflicts and trace bugs using Git history tools, keeping collaborative codebases stable and production-ready.

  • Design branching strategies and automate deployments with CI/CD pipelines to safely promote data pipeline artifacts across environments.

  • Build and publish versioned Docker images and automate server configuration with Ansible for consistent, reproducible environments.

  • Analyze query execution metrics and optimize resource allocation to maintain performance targets in production data systems.
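Analyzing execution metrics against a performance target, as in the last bullet, often starts with log parsing: compute a tail-latency percentile and flag runs over budget. A small sketch in plain Python (the log format and the 1-second SLA budget are invented for illustration):

```python
import re

LOG = """\
2026-03-01T10:00:00 query=daily_orders duration_ms=420
2026-03-01T10:05:00 query=daily_orders duration_ms=480
2026-03-01T10:10:00 query=daily_orders duration_ms=2100
2026-03-01T10:15:00 query=daily_orders duration_ms=450
"""

def p95(values):
    """Nearest-rank 95th percentile."""
    ordered = sorted(values)
    rank = max(0, int(round(0.95 * len(ordered))) - 1)
    return ordered[rank]

durations = [int(m) for m in re.findall(r"duration_ms=(\d+)", LOG)]
p95_ms = p95(durations)
breaches = [d for d in durations if d > 1000]  # runs over the 1s budget
```

In production the same numbers would come from the warehouse's query-history tables or Spark event logs rather than flat text, but the target-vs-observed comparison is the same.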

Skills you'll gain

Category: Continuous Deployment
Category: Data Pipelines
Category: Continuous Integration
Category: CI/CD
Category: Data Infrastructure
Category: Containerization
Category: Application Deployment
Category: Root Cause Analysis
Category: Docker (Software)
Category: Git (Version Control System)
Category: Performance Tuning
Category: DevOps
Category: Development Environment
Category: Ansible
Category: Infrastructure as Code (IaC)
Category: Automation
Category: Version Control

What you'll learn

  • Define and automate data quality tests using YAML to validate row counts, null thresholds, and uniqueness across pipeline datasets.

  • Trace data anomalies through pipeline stages by analyzing logs and dashboards to identify and fix the exact source of failure.

  • Apply advanced Python debugging tools — including conditional breakpoints, watchpoints, and pdb — to diagnose and resolve pipeline issues.

  • Resolve complex concurrency bugs by reading stack traces and correlating thread logs to identify deadlocks and race conditions in code.
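Tools like dbt and Great Expectations express the checks above as YAML; what the runner then does can be sketched in plain Python, with a dict standing in for the parsed YAML (column names and thresholds are invented):

```python
def run_checks(rows, config):
    """Evaluate row-count, null-rate, and uniqueness checks; return failures."""
    failures = []
    if len(rows) < config["min_rows"]:
        failures.append(f"row_count: {len(rows)} < {config['min_rows']}")
    for col, max_null_rate in config["null_thresholds"].items():
        nulls = sum(1 for r in rows if r.get(col) is None)
        if rows and nulls / len(rows) > max_null_rate:
            failures.append(f"nulls in {col}: {nulls}/{len(rows)}")
    for col in config["unique"]:
        vals = [r[col] for r in rows]
        if len(vals) != len(set(vals)):
            failures.append(f"duplicates in {col}")
    return failures

config = {  # stand-in for a parsed YAML test file
    "min_rows": 3,
    "null_thresholds": {"email": 0.25},
    "unique": ["order_id"],
}
rows = [
    {"order_id": 1, "email": "a@example.com"},
    {"order_id": 2, "email": None},
    {"order_id": 2, "email": None},
]
failures = run_checks(rows, config)
# Two failures: email null rate 2/3 > 0.25, and a duplicate order_id.
```

Keeping the thresholds in config rather than code is the point of the YAML approach: analysts can tighten a null threshold without touching the pipeline.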

Skills you'll gain

Category: Test Automation
Category: Data Integrity
Category: Development Testing
Category: Anomaly Detection
Category: Data Quality
Category: Root Cause Analysis
Category: Performance Tuning
Category: YAML
Category: Generative AI
Category: DevOps
Category: Data Pipelines
Category: Dashboard
Category: Debugging
Category: Data Validation
Category: Reliability
Category: Python Programming

What you'll learn

  • Build a data engineering portfolio with end-to-end pipeline projects that prove your ability to design, build, and deploy production-style systems.

  • Create a resume, LinkedIn profile, and GitHub presence that position you as a hands-on data engineer ready to contribute from day one.

  • Practice real data engineering interview scenarios and develop structured responses to technical, design, and behavioral questions.

  • Execute a 30-day career launch plan covering portfolio completion, job applications, and networking in the data engineering community.

Skills you'll gain

Category: Professional Networking
Category: Apache
Category: Professional Development
Category: Portfolio Management
Category: Data Quality
Category: GitHub
Category: Apache Spark
Category: Apache Airflow
Category: Software Development
Category: Communication
Category: Python Programming
Category: Data Infrastructure
Category: Interviewing Skills
Category: Collaboration
Category: SQL
Category: Data Pipelines

Earn a career certificate

Add this credential to your LinkedIn profile, resume, or CV. Share it on social media and in your performance review.

Instructor

Professionals from the Industry
321 Courses 45,807 learners

Offered by

Coursera

Why people choose Coursera for their career

Felipe M.

Learner since 2018
"To be able to take courses at my own pace and rhythm has been an amazing experience. I can learn whenever it fits my schedule and mood."

Jennifer J.

Learner since 2020
"I directly applied the concepts and skills I learned from my courses to an exciting new project at work."

Larry W.

Learner since 2021
"When I need courses on topics that my university doesn't offer, Coursera is one of the best places to go."

Chaitanya A.

"Learning isn't just about being better at your job: it's so much more than that. Coursera allows me to learn without limits."

Open new doors with Coursera Plus

Unlimited access to 10,000+ world-class courses, hands-on projects, and job-ready certificate programs, all included in your subscription.

Advance your career with an online degree

Earn a degree from world-class universities - 100% online

Join over 3,400 global companies that choose Coursera for Business

Upskill your employees to excel in the digital economy

