Data engineer to RAG engineer
You already own the hardest half of retrieval-augmented generation: ingestion, pipelines, and data quality. RAG is your pipeline with two new stages — embeddings and retrieval. Here's the on-ramp.
Start with the guide this path is built around: The data engineer's path to RAG.
You're closer than most. Retrieval quality is a data problem, and that's your home turf.
What you know
Batch and streaming ingestion
What you'll build with it
Document ingestion for RAG — loaders, dedupe, and incremental refresh of a knowledge base.
What you know
ETL and transformation
What you'll build with it
Chunking strategies that respect document structure instead of fixed character counts.
What you know
Schemas, partitioning, and metadata
What you'll build with it
Vector stores with rich metadata for filtered, hybrid retrieval that stays fast at scale.
What you know
Data quality and lineage
What you'll build with it
Retrieval evals — measuring whether the right chunk was fetched, and tracing why when it wasn't.
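As a rough illustration of the retrieval evals described above, here is a minimal sketch in plain Python. It assumes you have a small labeled set that maps each query to the chunk IDs that should be fetched. The function names, query IDs, and chunk IDs are all hypothetical and not tied to any particular library or vector store.

```python
from typing import Dict, List, Set


def hit_rate_at_k(retrieved: Dict[str, List[str]],
                  relevant: Dict[str, Set[str]],
                  k: int) -> float:
    """Fraction of queries with at least one relevant chunk in the top k results."""
    if not retrieved:
        return 0.0
    hits = sum(
        1
        for query, ranked in retrieved.items()
        if any(chunk in relevant.get(query, set()) for chunk in ranked[:k])
    )
    return hits / len(retrieved)


def mean_reciprocal_rank(retrieved: Dict[str, List[str]],
                         relevant: Dict[str, Set[str]]) -> float:
    """Average of 1/rank of the first relevant chunk (0 when none is retrieved)."""
    if not retrieved:
        return 0.0
    total = 0.0
    for query, ranked in retrieved.items():
        for rank, chunk in enumerate(ranked, start=1):
            if chunk in relevant.get(query, set()):
                total += 1.0 / rank
                break
    return total / len(retrieved)


def missed_queries(retrieved: Dict[str, List[str]],
                   relevant: Dict[str, Set[str]],
                   k: int) -> List[str]:
    """Queries whose top k results contain no relevant chunk: the ones to trace."""
    return [
        query
        for query, ranked in retrieved.items()
        if not any(chunk in relevant.get(query, set()) for chunk in ranked[:k])
    ]


# Hypothetical labeled data: ranked chunk IDs per query, and the expected chunks.
retrieved = {
    "q1": ["c3", "c1", "c7"],
    "q2": ["c5", "c2", "c9"],
    "q3": ["c4", "c8", "c6"],
}
relevant = {"q1": {"c1"}, "q2": {"c5"}, "q3": {"c0"}}

if __name__ == "__main__":
    print("hit rate@3:", hit_rate_at_k(retrieved, relevant, 3))
    print("MRR:", mean_reciprocal_rank(retrieved, relevant))
    print("to trace:", missed_queries(retrieved, relevant, 3))
```

The list of missed queries is where the lineage habit pays off: each one is a starting point for checking whether the expected chunk was never ingested, was split badly, or was simply outranked.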
Work these in order. Every link is free to read.
- 01 RAG in production
The full production pipeline: chunk, embed, store, retrieve, rerank, ground, cite.
- 02 Build a production RAG app
Build one end to end and see where your ingestion and ETL skills map directly.
- 03 The AI Engineer Roadmap
Zoom out to the six-stage path from concept to offer.
- 04 Interview prep
Prepare for the retrieval and system-design questions that come up in RAG-heavy interviews.
You already run the pipeline. Add embeddings and retrieval.