Case Study 02 · Demoed at AWS Startup Loft, 2025
PubMed AI Agent
Scalable data pipelines and structured data systems for AI-powered research workflows on top of the PubMed corpus.
- Role: Data engineering & agent
- Venue: AWS Startup Loft, SF
- Stack: Python, RAG, vector store
- Surface: Clinical-grade research

Problem
Clinical researchers rely on PubMed, but its search interface rewards exact-term queries over question-shaped ones. The gap between “what a researcher asks” and “what the index returns” costs hours per week.
What I built
Scalable ingestion and annotation pipelines for the PubMed corpus, plus structured data schemas that a RAG agent can actually reason over. The agent answers natural-language clinical questions with inline citations back to the source papers.
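To make the "structured schema plus inline citations" idea concrete, here is a minimal sketch. It is not the project's actual schema: the `PaperChunk` class, its field names, and the `cite` helper are hypothetical, chosen only to show how a retrieved chunk might carry enough metadata (a PubMed ID, a section label) for the agent to cite its sources.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PaperChunk:
    """One retrievable unit of a paper (hypothetical schema, for illustration)."""
    pmid: str                      # PubMed identifier of the source paper
    title: str
    section: str                   # e.g. "abstract", "methods", "results"
    text: str
    mesh_terms: tuple[str, ...] = ()  # annotation output, e.g. MeSH headings


def cite(sentence: str, sources: list[PaperChunk]) -> str:
    """Append de-duplicated PMID citations to one answer sentence."""
    # dict.fromkeys keeps first-seen order while dropping repeats
    pmids = list(dict.fromkeys(chunk.pmid for chunk in sources))
    if not pmids:
        return sentence
    refs = ", ".join(f"PMID:{p}" for p in pmids)
    return f"{sentence} [{refs}]"
```

Keeping the identifier on every chunk, rather than only on the parent paper, means a citation can be attached at the moment a sentence is generated instead of being reconstructed afterwards.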
Demo
Shown live at the AWS Startup Loft in SF: I typed an open-ended clinical question, and the agent walked through retrieval, ranking, and synthesis, with sources visible at every step.
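The retrieval, ranking, and synthesis steps shown in the demo can be sketched as a toy pipeline that returns a trace of each stage. This is an illustration under loud assumptions: a bag-of-words cosine similarity stands in for the real embedding model and vector store, the synthesis step is a placeholder that only stitches passages together rather than calling a language model, and the names `embed`, `cosine`, and `answer` are hypothetical.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: lowercase word counts."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def answer(question: str, corpus: dict[str, str], k: int = 2) -> dict:
    """Run retrieval, ranking, and synthesis, keeping sources visible per step."""
    q = embed(question)
    scores = {pmid: cosine(q, embed(text)) for pmid, text in corpus.items()}
    # Retrieval: keep every passage with some overlap with the question
    retrieved = [pmid for pmid, score in scores.items() if score > 0]
    # Ranking: order candidates by similarity and keep the top k
    ranked = sorted(retrieved, key=lambda pmid: scores[pmid], reverse=True)[:k]
    # Synthesis (placeholder): join passages, each with its inline citation
    text = " ".join(f"{corpus[pmid]} [PMID:{pmid}]" for pmid in ranked)
    return {"retrieved": retrieved, "ranked": ranked, "answer": text}
```

Returning the intermediate lists alongside the answer is what lets a UI show the sources at each stage rather than only in the final text.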