BioBarrier Literature Extraction Pipeline
Improving a regex + RAG pipeline that converts bio-based barrier-materials papers into validated MongoDB records, capturing five additional experimental fields per paper.
CS Student · Applied AI/ML · Data Products
I build reliable AI systems, data products, and full-stack tools that turn complex information into clear, useful decisions.
ABOUT ME
I’m a Computer Science student at Georgia Tech focused on building reliable AI systems, data products, and full-stack tools that turn complex, unstructured information into decisions people can trust.
My work spans scientific literature extraction, genomics machine learning, knowledge-graph design, and enterprise AI workflows. In every project, I am drawn to the same challenge: making powerful technical systems more useful, traceable, and practical for the people who depend on them.
I’m especially interested in applied AI engineering, ML systems, research tools, and products that make difficult work easier to understand and act on. I enjoy working through ambiguous problems, learning quickly, and turning early ideas into systems with clear value in the real world.
“It's better to say 'I did it' than 'I wish I had.'”
PORTFOLIO
Built an XGBoost + PCA model that reached 98.5% accuracy across 33 cancer types using 16,000 gene-expression profiles, then analyzed cross-species transfer gaps and model-error drivers.
Redesigned the learning-path knowledge graph for a platform serving 20,000+ Georgia Tech students, reducing graph redundancy by 40% through node deduplication and center-based navigation.
Built a WatsonX-powered research agent that uses OCR, retrieval, and prompt workflows to extract, rank, and summarize relevant papers, cutting literature-search time from hours to minutes.
CAREER
Built a Python/FastAPI policy-validation service and LangGraph workflow that reduced manual review time by 15% and improved miscapitalization detection for enterprise financial asset-management software.
Improving a literature-extraction system for bio-based barrier materials by mapping ingestion, retrieval, parsing, validation, and MongoDB schema flows to increase structured-data reliability.
Developed a 98.5%-accurate XGBoost/PCA cancer classifier across 33 tumor types and analyzed cross-species transfer gaps using per-class metrics and error-association analysis.
Built a WatsonX-powered research agent combining OCR, retrieval, and prompt workflows to extract, rank, and summarize papers by topic.
Built Python and SQL analytics pipelines across a 40M+ candidate-profile database, reducing candidate-placement decision time by 30%.
Researched web-robot detection approaches and supported data-driven analysis of automated versus human web activity.
EXPERTISE
ACADEMIC
B.S. Computer Science
CONNECT
Send a note from here and it will go straight to my inbox.