Data Engineering Projects for Beginners

Hi everyone,

I am a little bit obsessed with data engineering, and lately I have been working on several open source projects on the topic. Here is a list of the repositories and the technologies used in each one. If you decide to go deeper into this fun world, these repositories could serve as a guide.

❤ means "I like this one"

Tracking your Uber Rides and Uber Eats expenses through a data engineering process

Technologies and skills:

Python, Docker, Apache Airflow, AWS Redshift, Power BI, Data modelling, Task scheduling, ETL and ELT processes, Data warehousing, Cloud

Scheduling Big Data Workloads and Data Pipelines in the Cloud with pyDag

Technologies and skills:

Python, Docker, Big Data, Cloud, BigQuery, Workflow Engines, GCP, Task scheduler, Google Cloud Platform, Dataproc cluster, GCS, Google Cloud Storage, Redis, DAG, Parallel Processing, Apache Spark

Building Big Data Pipelines in the Cloud with AWS EMR

Technologies and skills:

Python, PySpark, AWS EMR, Task scheduling, IaC, EC2 instances, Apache Spark, Cloud

Building a Lossless Data Compression and Data Decompression Pipeline

Technologies and skills:

Python, Data compression, BZIP2, Parallel programming
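
To give a feel for what this project touches on, here is a minimal sketch of a lossless BZIP2 round trip with chunks compressed in parallel. It is not taken from the repository: the function names `compress_chunks` and `decompress_chunks` and the chunk size are assumptions made for illustration, and it only uses Python's standard `bz2` and `concurrent.futures` modules.

```python
import bz2
from concurrent.futures import ThreadPoolExecutor


def compress_chunks(data: bytes, chunk_size: int = 1 << 16) -> list[bytes]:
    """Split data into fixed-size chunks and compress each one independently.

    Hypothetical helper: the chunk size is an arbitrary choice for this sketch.
    """
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    with ThreadPoolExecutor() as pool:
        # pool.map preserves the input order, so chunks stay in sequence.
        return list(pool.map(bz2.compress, chunks))


def decompress_chunks(blobs: list[bytes]) -> bytes:
    """Decompress every chunk and join them back into the original bytes."""
    with ThreadPoolExecutor() as pool:
        return b"".join(pool.map(bz2.decompress, blobs))


original = b"data engineering " * 10_000
compressed = compress_chunks(original)
restored = decompress_chunks(compressed)

# Lossless means the round trip returns exactly the input bytes.
print(restored == original)
```

Compressing chunks independently usually costs a little compression ratio compared with a single stream, but it lets the work be spread across workers.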

Learn how to dockerize an Apache Spark Standalone Cluster

Technologies and skills:

Python, Jupyter Notebook, Apache Spark, Docker, docker-compose, Hive

Dockerizing and Consuming an Apache Livy environment

Technologies and skills:

Python, Big Data, Docker, docker-compose, Apache Livy, Apache Spark, PostgreSQL, PySpark, Jupyter Notebook

Design, Development and Deployment of a simple Data Pipeline

Technologies and skills:

Python, Data modelling, Docker, docker-compose, PostgreSQL, Data pipeline, FastAPI

Dockerizing a Python Script for Faster Web Scraping

Technologies and skills:

Python, Docker, SQLite, Dockerfile, Web scraping, Data pipeline, FastAPI

Understanding Similarity Measures for Text Analysis

Technologies and skills:

Python, Machine Learning, Similarity measures, Distance metrics, Text Analysis
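
As a rough illustration of the kind of measures this project covers, here is a small sketch of two common ones, cosine similarity and Jaccard similarity, computed on whitespace-separated word counts. It is not the repository's code: the function names and the naive tokenization (lowercasing and splitting on spaces) are assumptions made for this example.

```python
import math
from collections import Counter


def cosine_similarity(a: str, b: str) -> float:
    """Cosine of the angle between the word-count vectors of two texts."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    # Only words present in both texts contribute to the dot product.
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention chosen here for empty texts
    return dot / (norm_a * norm_b)


def jaccard_similarity(a: str, b: str) -> float:
    """Size of the shared vocabulary divided by the size of the combined one."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


print(cosine_similarity("big data pipelines", "data pipelines in the cloud"))
print(jaccard_similarity("big data pipelines", "data pipelines in the cloud"))
```

Cosine similarity takes word frequencies into account, while Jaccard similarity only looks at which words appear, so the two can rank the same pair of texts differently.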

Learn how to build a content-based Movie Recommender System

Technologies and skills:

Python, Machine Learning, TF-IDF, Cosine similarity, BM25, BERT, NLP, word2vec, Text Analysis, recsys

A Text Analysis of Speeches

Technologies and skills:

Python, Machine Learning, NLP, word2vec, Text Analysis, Sentiment Analysis, PCA, t-SNE, Word Embeddings, Text Preprocessing, Web scraping, Data Visualization, Mexico

Dropout Students Prediction

Technologies and skills:

R, Genetic algorithm, Neural Networks, K-Means, Clustering, Machine Learning

I will be working on more complex projects in the coming months using modern data stacks.

Did you find this article valuable?

Support Ramses Alexander Coraspe Valdez by becoming a sponsor. Any amount is appreciated!