Hands-On Data Engineering
A comprehensive data engineering program for career switchers and data specialists
Program Overview
Today, every business is data-driven. The demand for data specialists continues to grow. That is why we created this program — a micro-master’s track that provides fundamental knowledge of data storage, processing, and retrieval.
Over the course of three months, you will master all key areas of data work — from SQL querying to orchestration and monitoring. For each topic, we selected the most universal and in-demand tools. You will work with open-source tools and services such as Cassandra, Spark, Kafka, and more. These are core technologies used in data engineering and are optimal for hands-on learning. By mastering them, you will easily be able to apply your skills when working with equivalent managed services in Azure, GCP, and AWS. We will cover those in more detail in the final module of the program.
This knowledge and skill set will help you enter the field, strengthen your position in it, or systematize your existing expertise in a trending technology domain.
WHAT YOU WILL LEARN
PARTICIPANT REQUIREMENTS
Basic understanding of Python
Basic understanding of SQL
Basic understanding of Docker
English proficiency at B1 level or higher
EDUCATIONAL MODULES
Module 0: Prerequisites and Introduction to Data Engineering
This module reviews the foundational skills essential for success in Data Engineering. You will refresh Python for data handling, relational databases (RDBMS) for query writing and optimization, and Docker for containerization and environment setup. You will also revisit code management for effective collaboration using version control tools. The module concludes with an overview of Data Engineering and its importance in modern data systems.
Module Content
- Python Refresher
- RDBMS Refresher
- Docker Refresher
- Code Management Refresher
- What is Data Engineering
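To gauge the expected entry level, here is a minimal sketch of the kind of Python data handling assumed at the start of the program; the file name and column names ("orders.csv", "region", "amount") are illustrative, not course material:

```python
# Aggregate a CSV file using only the Python standard library.
import csv
from collections import defaultdict

totals = defaultdict(float)
with open("orders.csv", newline="") as f:          # illustrative file
    for row in csv.DictReader(f):
        totals[row["region"]] += float(row["amount"])

for region, amount in sorted(totals.items()):
    print(f"{region}: {amount:.2f}")
```

If this snippet reads comfortably, the Python prerequisite is covered.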
Module 1: Data Storage
This module introduces the basics of data storage. You'll start with relational databases and learn SQL and data modeling techniques for structured data. The module also covers non-relational databases, such as document-based, column-family, key-value, and analytical types. You'll explore data formats and storage strategies in object storage systems. Additionally, you'll master data modeling principles, including normalization for organization and denormalization for improved performance.
Module Content
- Introduction to Database Types
- Relational Databases and SQL
- Data Modeling: Normalization and Denormalization
- Non-Relational Database Types: Documents, Column-Family, Key-Value, and Analytical Insights
- Data Formats and Storage Strategies in Object Storage Systems
- Workshop 1. What is Data Engineering and Why Do We Need It? Storing Data: Overview of Database Types
- Workshop 2. Relational Databases (RDBMS) and SQL: Data Modeling and Querying
- Workshop 3. NoSQL Databases: Cassandra and MongoDB. Data Modeling and Querying
- Workshop 4. NoSQL Databases and Data Warehouses (DWH). RDBMS Data Modeling
- Workshop 5. Massive and Distributed Storage (Hadoop, ADX). Object Storage and Data Organization: Formats (Text, Binary, Columnar), Partitioning, Queueing (Redis, Kafka)
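As a taste of the relational side of this module, here is a minimal normalization sketch using Python's built-in sqlite3; the customers/orders schema is illustrative, not taken from the course:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Normalized design: each fact lives in exactly one place.
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers(id),
        amount REAL NOT NULL
    );
""")
conn.execute("INSERT INTO customers VALUES (1, 'Acme')")
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                 [(1, 1, 9.99), (2, 1, 24.50)])

# A join reassembles the data at query time; denormalization would
# trade this join away for faster reads and more complex writes.
for row in conn.execute("""
        SELECT c.name, COUNT(*) AS order_count, SUM(o.amount) AS total
        FROM customers c JOIN orders o ON o.customer_id = c.id
        GROUP BY c.name"""):
    print(row)  # ('Acme', 2, 34.49)
```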
Module 2: Data Processing
This module explores data processing principles, focusing on batch and stream processing methods. You will understand how to handle data at scale, working with tools like PySpark and Flink to process datasets efficiently. By the end of this module, you will be equipped to implement robust and scalable data pipelines for real-world applications.
Module Content
- Batch and Stream Processing
- Using the Tools: PySpark and Flink
- Workshop 6. Batch Processing with PySpark
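For a flavor of what batch processing with PySpark looks like in practice (in the spirit of Workshop 6, though not its actual materials), here is a minimal sketch; it assumes a local PySpark installation, and the input path and column names are hypothetical:

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("batch-demo").getOrCreate()

# Read a batch of raw events, aggregate per day, write columnar output.
events = spark.read.json("data/events.json")           # hypothetical path
daily = (events
         .withColumn("day", F.to_date("timestamp"))    # hypothetical column
         .groupBy("day", "event_type")
         .agg(F.count("*").alias("event_count")))
daily.write.mode("overwrite").parquet("out/daily_events")

spark.stop()
```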
Module 3: Data Retrieval
This module explores methods for obtaining data from diverse sources. You will begin with files in file systems and object storage, learning to extract and manage data effectively. Next, you will dive into REST APIs to understand how to interact with external services to retrieve and integrate data. Lastly, you will study event streams and message queues.
Module Content
- Files on File Systems and in Object Storage
- REST API
- Event Streams and Message Queues
- Workshop 7. Event Streams and Apache Kafka
- Workshop 8. Stream Processing with PySpark Streaming
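These retrieval patterns compose naturally: a pipeline often pulls records from a REST API and hands them to a message queue. Here is a hedged sketch of that pattern; it assumes the requests and kafka-python packages, a Kafka broker on localhost:9092, and an illustrative API URL and topic name:

```python
import json

import requests
from kafka import KafkaProducer

# Serialize each record as JSON before handing it to the broker.
producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

# Illustrative endpoint; any paginated or streaming source works the same way.
resp = requests.get("https://api.example.com/v1/events", timeout=10)
resp.raise_for_status()

for event in resp.json():               # one message per retrieved record
    producer.send("events", value=event)

producer.flush()  # block until all queued messages are delivered
```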
Module 4: Coordination and Monitoring
This module develops your ability to coordinate and monitor data workflows and systems effectively. You will explore Airflow, a powerful tool for orchestrating data pipelines, and gain insights into designing efficient workflows. Additionally, you will delve into Prometheus and Grafana, mastering the art of monitoring system performance, visualizing metrics, and ensuring reliability in data operations.
Module Content
- Airflow
- Prometheus and Grafana
- Workshop 9. Pipeline Orchestration using Airflow
- Workshop 10. Monitoring in Data Engineering: Prometheus and Grafana
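To illustrate what pipeline orchestration looks like, here is a minimal DAG sketch, assuming Apache Airflow 2.4 or newer; the task logic and schedule are illustrative, not workshop materials:

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def extract():
    print("pulling data from the source")       # placeholder task logic


def load():
    print("writing data to the warehouse")      # placeholder task logic


with DAG(
    dag_id="demo_pipeline",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
):
    # The >> operator declares ordering: extract must finish before load.
    PythonOperator(task_id="extract", python_callable=extract) \
        >> PythonOperator(task_id="load", python_callable=load)
```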
Module 5: Data Engineering on Cloud Platforms
This module introduces the principles of data engineering in the cloud, comparing cloud-based and on-premise solutions and their respective advantages and limitations. You will gain a high-level understanding of major cloud platforms — AWS, Azure, and Google Cloud — and explore their functional equivalents of popular data engineering tools.
Module Content
- Pros and Cons of Using Cloud Platforms Compared to On-Premise Solutions
- High-level Overview of Cloud Platforms: AWS, Azure, GCP
- Functional Analogs of Common Data Engineering Tools
- PostgreSQL: AWS RDS, Azure Database for PostgreSQL, Google Cloud SQL
- Cassandra: AWS DynamoDB, Azure Cosmos DB, Google Cloud Bigtable
- MongoDB: AWS DocumentDB, Azure Cosmos DB, Google Cloud Firestore, MongoDB Atlas
- Spark: AWS EMR, Azure Databricks, Google Cloud Dataproc
- Spark Streaming: AWS Kinesis, Azure Event Hubs, Google Cloud Dataflow
- Kafka: AWS Kinesis, Azure Event Hubs, Google Cloud Pub/Sub
- Cloud Analytics: AWS Redshift, Azure Synapse Analytics, Google Cloud BigQuery
- Workshop 11. Overview and Comparison of the Cloud-Based Data Engineering Tools
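To make the on-premise-to-cloud mapping concrete: the Parquet output a pipeline writes locally would land in object storage on a cloud platform, with only the storage client changing. Below is a hedged sketch for AWS, assuming the boto3 package, configured credentials, and an illustrative bucket name; Azure and GCP offer direct equivalents via azure-storage-blob and google-cloud-storage:

```python
import boto3

# Upload a locally produced artifact to S3; the local path, bucket
# name, and object key are all hypothetical.
s3 = boto3.client("s3")
s3.upload_file(
    Filename="out/daily_events.parquet",
    Bucket="my-data-lake",
    Key="daily/daily_events.parquet",
)
```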
The module “Data Engineering on Cloud Platforms” is designed within the project “Knowledge Rise: Advancing Sustainable Blue-Green Economies via Deep Tech — Innovation Capacity Building in Higher Education” (grant agreement No. 24473). The project is part of the broader CloudEARTHi initiative and is funded by the European Union through the EIT HEI Initiative, coordinated by the European Institute of Innovation and Technology (EIT), Cohort 4.
Curator and Instructor
Dmytro Pryimak
Engineer with over 10 years of professional experience in designing and building systems for distributed data processing. Throughout his career, Dmytro has worked on numerous projects covering insurance, healthcare, medical data processing, online media, and entertainment. In recent years, he has shifted his focus from purely engineering to team leadership and mentoring/coaching.
Dmytro is also a guest lecturer at SET University, where he teaches Big Data in the Master's degree programs.
Maksym Ivashura
Experienced database/data warehouse and business intelligence engineer with over 30 years of professional background in the manufacturing and outsourcing sectors. Currently works at Trinetix, where he also serves as a mentor and technical interviewer. Maksym is the author of the StoreOff accounting system and has extensive experience with various database management systems, including MS SQL Server (as well as SSAS and SSIS), Azure DB, PostgreSQL, Redshift, Snowflake, MySQL, Oracle, SQLite, MongoDB, Redis, Cassandra, and less conventional systems such as Firebird/InterBase, MS Access, dBase, DataEase, and DuckDB. Based in Kharkiv, Ukraine, and Málaga, Spain.
Khrystyna Kokolius
A seasoned Data Engineer at SoftServe with over 4 years of hands-on experience in designing and developing data pipelines. Her core expertise lies in optimizing SQL queries, streamlining data processing workflows, and elevating overall system performance. By leveraging best practices in data analytics, Khrystyna helps teams scale efficiently and adopt modern technologies for enhanced data-driven solutions.
Sirojiddin Dushaev
Data Engineering & BI expert with extensive experience in building scalable data solutions. Specializes in database architecture, data warehousing, and business intelligence development. Proficient in cloud platforms such as AWS, GCP, and Azure, as well as big data technologies like Apache Spark and Kafka. Skilled in SQL, Python, and deploying machine learning models. Passionate about data-driven decision-making and optimizing analytical infrastructures. Loves collaborating with teams to enhance data efficiency and business insights.
ADVANTAGES
The program provides the necessary skills and knowledge to start a career in one of the most in-demand IT fields
Flexible learning format, allowing you to combine it with full-time work
Training by expert practitioners who provide relevant feedback and quality support during the course
WHO IT’S FOR
Developers looking to grow in the field of data engineering
Data Scientists and Data Analysts who want to transition into a Data Engineer role
Junior Data Engineers looking to organize their knowledge and use data tools effectively
Experienced (Senior+) technical specialists who need data engineering expertise for project management, architecture design, and broadening their overall skills in this technology area
Reviews
Yevhenii Pylypchuk
Wrapped up the Data Engineering certificate with SET University – and honestly, this was the toughest course I’ve taken there so far. What a journey!
For me, this path wasn’t about becoming a “data engineer,” but about expanding my mindset into the data world. Along the way, I got my hands dirty with:
- Building real-time pipelines (Kafka + Spark + Cassandra);
- Debugging containers until they screamed (well, honestly, that was me who screamed most of the time);
- Seeing how fragile data flows can be – and how dangerous that is from a security perspective.
Key reflection: data rules the world. Realizing that every broken data pipeline isn’t just a tech headache – in a security context, it can turn into an open door for attackers or a blind spot that hides malicious activity.
I may never call myself a data engineer, but this course helped me realize how important and interconnected this path is with cybersecurity. It showed me how much more there is to explore – and I’m more driven than ever to go deeper down the rabbit hole.
Thank you, Dmytro Pryimak, Maksym Ivashura, and the whole SET University community. Your guidance, challenges, and energy turned this course from "just hard" into one of the toughest – and most rewarding – journeys I've ever taken.
Iryna Yershova
It was a fascinating yet difficult experience. I've learned so many new things, and now I'm able to apply this knowledge in my professional career.
Even though I’m currently working as a frontend engineer, I believe it’s crucial to understand how to work with data. Data is the new oil, as they say.
Thanks to Dmytro Pryimak and Maksym Ivashura for the challenges you created for us during this course!
FAQ
I already work as a data engineer. Is it worth taking your course?
If you have been working in this position for a year or less, then yes, this course will help you structure your knowledge and fill in gaps in your mastery of specific tools.
Learn more about the SET University program