- Overview
- References
- Articles and posts
- Books
- Curricula
- Specific topics
- Web sites
This project aims to collect, in a single place, training resources for upskilling data engineers.
Even though the members of the GitHub organization may be employed by some companies, they speak on their personal behalf and do not represent these companies.
- Data Engineering Helpers - Knowledge Sharing - Cheat sheets
- Material for the Data platform - Architecture principles
- Material for the Data platform - Data contracts
- Material for the Data platform - Data products
- Material for the Data platform - Data quality
- Material for the Data platform - Semantic layer
- Material for the Data platform - Data lakehouse
- Material for the Data platform - Data management
- Material for the Data platform - Modern Data Stack (MDS) in a box
- Material for the Data platform - Data life cycle
- Material for the Data platform - Metadata
- Publisher: https://substack.com/@luminousmen
- Date: Aug. 2025
- Link to the article on Substack: https://luminousmen.substack.com/p/anatomy-of-apache-spark-application
- Post on LinkedIn about the article: https://www.linkedin.com/posts/luminousmen_anatomy-of-apache-spark-application-activity-7361101801623859200-FIMY/
- Author: Kirill Bobrov (Kirill Bobrov on LinkedIn, Kirill Bobrov on Substack)
- (TBC) LinkedIn media - PDF
 
- Title: Data Engineering Resources
- Date: Aug. 2025
- Author: Ahmed Alsaket (Ahmed Alsaket on LinkedIn)
- Post on LinkedIn: https://www.linkedin.com/posts/ahmed-alsaket-974413197_data-engineering-resources-1-master-python-activity-7355862068169723908-rHNr/
- Master Python: https://lnkd.in/e5rCbvP8
- Learn SQL: https://lnkd.in/efMKFkfX
- Learn MySQL: https://lnkd.in/efk-Mi3c
- Learn MongoDB: https://lnkd.in/eMKPWtqX
- Dominate PySpark: https://lnkd.in/exwA2hKz
- Learn Bash, Airflow & Kafka: https://lnkd.in/eyN6u2yd
- Learn Git & GitHub: https://lnkd.in/eX_Q8s99
- Learn CICD basics: https://lnkd.in/epKGivFY
- Decode Data Warehousing: https://lnkd.in/eKnVbFAB
- Learn DBT: https://lnkd.in/eG9eaEuE
- Learn Data Lakes: https://lnkd.in/eQ9xxAJT
- Learn Databricks: https://lnkd.in/ePZpCv86
- Learn Azure Databricks: https://lnkd.in/eBij4akJ
- Learn Snowflake: https://lnkd.in/erETmtFU
- Learn Apache NiFi: http://bit.ly/43btwYy
- Learn Debezium: http://bit.ly/3K6W5gL
- Reddit ETL Pipeline - https://lnkd.in/ekmgzGc8
- Surfline Dashboard - https://lnkd.in/e6AdaDzz
- Finnhub Streaming Data Pipeline - https://lnkd.in/eCF5kZvE
- Audiophile End-To-End ELT Pipeline - https://lnkd.in/ercYzXtX
- Streamify - https://lnkd.in/ePiEwH5k
- Blog page: https://mayursurani.medium.com/
- Author: Mayurkumar Surani (Mayurkumar Surani on LinkedIn, Mayurkumar Surani on Medium)
- Publisher: Medium
- A few posts:
- Medium - Mayurkumar Surani - May 2025 - End-to-End ETL Pipeline with AWS, PySpark, and Databricks
- Medium - Mayurkumar Surani - May 2025 - Top 20 PySpark Functions Every Data Engineer Should Master
- Medium - Mayurkumar Surani - Aug. 2025 - Mastering Databricks and DBT: An End-to-End Production-Grade Data Engineering Project
- Medium - Mayurkumar Surani - Aug. 2025 - Bronze Layer - Dynamic Incremental Data Ingestion with Databricks Autoloader Part-02
 
- Author: Sachin Chandrashekhar (Sachin Chandrashekhar on LinkedIn)
- Date: Aug. 2025
- LinkedIn post: https://www.linkedin.com/posts/mentorsachin_100-pages-of-data-engineering-qa-activity-7359055696279007232-KDz-/
- Author:
- Date: Aug. 2025
- GitHub repository: https://github.com/jrlasak/databricks_apparel_streaming
- Author: Riya Khandelwal
- Date: Aug. 2025
- GitHub - Data Engineering helpers - Skilling - Dedicated readme for material published by Riya Khandelwal
- Authors: Joe Reis and Matt Housley
- Date: July 2022
- LinkedIn media - Copy of the book
- Post on LinkedIn about that book: https://www.linkedin.com/posts/riyakhandelwal_data-engineering-fundamentals-activity-7361018626092519429-cj9g/
 
- Book on Amazon: https://www.amazon.com/dp/1098108302
- Book on O'Reilly: https://learning.oreilly.com/library/view/fundamentals-of-data/9781098108298/
- Print length: 447 pages
- ISBN-10: 1098108302
- ISBN-13: 978-1098108304
- Author: Martin Kleppmann
- Book on Amazon: https://www.amazon.com/Designing-Data-Intensive-Applications-Reliable-Maintainable/dp/1449373321
- Book on O'Reilly: https://learning.oreilly.com/library/view/designing-data-intensive-applications/9781098119058/
- Date: Jan. 2026 (2nd edition)
- Home page: https://dataengineer.wiki/
- Author: Andreas Kretz (Andreas Kretz on LinkedIn)
- LinkedIn service page - Data Engineering training program
- LinkedIn post about the training program: https://www.linkedin.com/posts/andreas-kretz_this-is-a-message-to-managers-stop-looking-activity-7382436735101865985-W-wu
 
- LinkedIn learning - Transition from data science to data engineering
- Level: beginners
- Duration: 47 minutes
- Author: Pooja Jain