Gabriel Alves — Data Engineer

At a glance

5

Certifications


6+

Projects built


2

Live deployments


PT

Based in Portugal


Data stack — end to end

pipeline.py — gabriel_alves

01

Ingest

REST APIs
CSV / Batch
Paginated I/O

02

Transform

PySpark
Pandas · NumPy
SQL

03

Orchestrate

Apache Airflow
GitHub Actions
Logging

04

ML-Ready

Feature Eng.
Model Eval.
Prediction Svc

05

Deploy

Render
AWS Cognito
WAF

Featured projects

Data Engineering · ELT

World Bank GDP Pipeline

Complete

Paginated REST API → S3 (Hive partitioning) → PySpark → PostgreSQL star schema (JDBC) → Airflow → Docker.

PySpark · Airflow · AWS S3 · PostgreSQL · Python · Docker

ML Pipeline · Churn

Customer Churn Pipeline

▶ Demo

ML-integrated ETL, feature engineering, profit-based threshold optimization; deployed on AWS with Cognito auth.

Python · AWS · ML · Render

NLP · Sentiment

NLP Sentiment Pipeline

Live

Large-scale text processing pipeline with a Streamlit dashboard.

NLP · Streamlit · Neural nets

Analytics · SQL

Workforce SQL Analysis

Complete

CTE-first analytical SQL surfacing workforce metrics.

SQL · DataCamp · Analytics

Work Experience

Software Engineering Intern — Valuedate.io

Current

Jun 2026 – Present · Viana do Castelo, Portugal

Fixed a production bug in an inherited, undocumented Django codebase without database access, and shipped the fix with Pytest coverage.
Introduced Playwright to the company: built an E2E test suite from scratch with reusable helpers, and fixed additional bugs that the test failures surfaced.
Merged 4 production PRs in the first 3 weeks, each reviewed and approved by the engineering lead.

Main Projects

Data Engineering · ELT

World Bank GDP Pipeline

Complete

Paginated REST API → S3 (Hive partitioning) → PySpark → PostgreSQL star schema (JDBC) → Airflow → Docker.

PySpark · Airflow · AWS S3 · PostgreSQL · Python · Docker
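
The first two stages of the World Bank GDP Pipeline (paginated ingestion and Hive-style partitioning) can be sketched as follows. This is a minimal illustration, not the project's actual code: the `fetch_page` callable, the `(records, total_pages)` page shape, the `raw/gdp` prefix, and the `year` partition column are all assumptions made for the example, and a fake in-memory source stands in for the real API.

```python
from collections import defaultdict
from typing import Callable, Iterable, Iterator

# Hypothetical page shape: (records on this page, total number of pages).
Page = tuple[list[dict], int]


def iter_records(fetch_page: Callable[[int], Page]) -> Iterator[dict]:
    """Yield every record from a 1-indexed paginated source."""
    page = 1
    while True:
        records, total_pages = fetch_page(page)
        yield from records
        if page >= total_pages:
            break
        page += 1


def hive_prefix(base: str, record: dict) -> str:
    """Build a Hive-style partition prefix such as raw/gdp/year=2020/."""
    return f"{base}/year={record['year']}/"


def group_by_partition(records: Iterable[dict], base: str) -> dict[str, list[dict]]:
    """Bucket records by the partition prefix they would be written under."""
    groups: dict[str, list[dict]] = defaultdict(list)
    for record in records:
        groups[hive_prefix(base, record)].append(record)
    return dict(groups)


# Fake two-page source standing in for the real API (illustrative data only).
FAKE_PAGES = {
    1: [{"country": "PT", "year": 2020, "gdp": 1.0},
        {"country": "PT", "year": 2021, "gdp": 2.0}],
    2: [{"country": "ES", "year": 2020, "gdp": 3.0}],
}


def fake_fetch(page: int) -> Page:
    return FAKE_PAGES[page], len(FAKE_PAGES)


groups = group_by_partition(iter_records(fake_fetch), "raw/gdp")
```

Partitioning on a column such as `year` lets downstream engines like PySpark read only the directories a query needs, which is the usual reason for choosing a Hive-style layout.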

ML Pipeline · Churn

Customer Churn Pipeline

▶ Demo

ML-integrated ETL, feature engineering, profit-based threshold optimization; deployed on AWS with Cognito auth.

Python · ETL · AWS · ML · Render
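
Profit-based threshold optimization, as mentioned in the Customer Churn Pipeline, can be illustrated with a small sketch. The economics here are assumptions for the example, not figures from the project: `gain_retained` is a hypothetical value of keeping a customer who would otherwise churn, and `cost_offer` is a hypothetical cost of sending a retention offer. The idea is to pick the probability cut-off that maximizes profit rather than accuracy.

```python
def profit_at(threshold, probs, labels, gain_retained=100.0, cost_offer=20.0):
    """Profit if every customer scoring >= threshold receives a retention offer.

    gain_retained and cost_offer are hypothetical values for illustration.
    """
    profit = 0.0
    for prob, churned in zip(probs, labels):
        if prob >= threshold:
            profit -= cost_offer          # every contacted customer costs money
            if churned == 1:
                profit += gain_retained   # only true churners generate a gain
    return profit


def best_threshold(probs, labels, **economics):
    """Return the candidate threshold with the highest profit."""
    candidates = sorted(set(probs))
    return max(candidates, key=lambda t: profit_at(t, probs, labels, **economics))


# Illustrative model scores and true churn labels (made-up data).
probs = [0.9, 0.8, 0.6, 0.4, 0.2]
labels = [1, 0, 1, 0, 0]

threshold = best_threshold(probs, labels)
profit = profit_at(threshold, probs, labels)
```

With these made-up numbers the sweep selects a threshold of 0.6 for a profit of 140.0. A higher cut-off of 0.8 contacts fewer customers but misses a churner, so it earns less.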

NLP · Sentiment

NLP Sentiment Analysis Pipeline

Live

Large-scale text processing pipeline with a Streamlit dashboard.

NLP · Streamlit · Neural nets · Python

Other Projects

Competition · DataCamp

Cleaning Data & The Skies

Competition

Cleaned and preprocessed messy real-world flight data to extract business insights and answer key analytical questions.

Python · Data Cleaning · EDA

Analytics · SQL

SQL Workforce Data Analysis

Complete

CTE-first analytical SQL surfacing workforce metrics.

SQL · Analytics
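
A "CTE-first" query style names each intermediate result before the final SELECT. The sketch below shows the pattern against an in-memory SQLite database. The `employees` table, its columns, and the rows are hypothetical and are not taken from the project's dataset.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employees (name TEXT, department TEXT, salary REAL);
INSERT INTO employees VALUES
    ('Ana', 'Data', 3000),
    ('Rui', 'Data', 2000),
    ('Eva', 'QA',   1800),
    ('Ivo', 'QA',   2200);
""")

# Each CTE is one readable step: department averages first,
# then the employees who earn more than their department's average.
QUERY = """
WITH dept_avg AS (
    SELECT department, AVG(salary) AS avg_salary
    FROM employees
    GROUP BY department
),
above_avg AS (
    SELECT e.name, e.department, e.salary - d.avg_salary AS delta
    FROM employees AS e
    JOIN dept_avg AS d ON d.department = e.department
    WHERE e.salary > d.avg_salary
)
SELECT name, department, delta
FROM above_avg
ORDER BY name;
"""

rows = conn.execute(QUERY).fetchall()
conn.close()
```

Naming each step as a CTE keeps the final SELECT short and makes each intermediate result easy to inspect on its own.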

EDA · Finance

S&P 500 Financial EDA

Complete

Analysis of S&P 500 company distribution across US states, with a focus on sector concentration patterns within regions.

Python · Pandas · Matplotlib · Finance

Technical Skills

Languages

3 skills

Python · SQL · Java

Data Engineering

6 skills

PySpark · Airflow · dbt · ETL/ELT Design · Star-Schema Modeling · Data Quality Validation

Backend / SWE

5 skills

Django · REST APIs · PostgreSQL · JDBC · Git/PR-Based Review

Testing / QA

3 skills

Pytest · Playwright (Suite Architecture + Reusable Helpers) · E2E Automation

AI / ML Tooling

8 skills

Copilot · Claude · Local LLMs (Dev) · GPT · Gemini · Perplexity (Research/Chat) · Feature Engineering · Model Evaluation

Cloud & DevOps

7 skills

AWS S3 · AWS IAM · AWS Cognito · Docker · CI/CD · Kubernetes (Learning) · Terraform (Learning)

Soft Skills

Critical Thinking & Problem Solving

Analytical Mindset & Data-Driven Thinking

Attention to Detail

Curiosity & Learning Agility

Ability to Present Results & Insights

Persistence & Self-Discipline

Professional Certifications

01

Data Engineer Professional

DataCamp


02

AI Engineer for Data Scientists Associate

DataCamp


03

Data Scientist Professional

DataCamp


04

Python Data Associate

DataCamp


05

SQL Associate

DataCamp


About Me

Building reliable, scalable data systems that bridge Engineering and Machine Learning.

I'm an Informatics Engineering graduate focused on building scalable data pipelines and production-ready backend systems — with hands-on experience across the full stack from ingestion to deployment.

I design ETL workflows, transform large datasets, and build the backend systems that support them, including authentication, testing, and CI. My work spans data ingestion, transformation, and orchestration, as well as Django applications, automated testing (Pytest, Playwright), and production pull requests.

I recently completed the Data Engineer Professional certification, where I worked with Airflow and logging tools and strengthened my understanding of workflow orchestration and pipeline monitoring.

I currently work as a Software Engineering Intern at Valuedate.io and am open to Data Engineer, Software Engineer, Test Automation, and AI Engineering roles, either remote worldwide or hybrid in North Portugal.

Quick Info

Education

Bachelor in Informatics Engineering

ESTG, Instituto Politécnico de Viana do Castelo

Location

Viana do Castelo, Portugal

Open to remote (worldwide) · Hybrid (North Portugal)

Contact

social@gabrieldaes.com

Response within 24h
