Infosys — AI Quality Evaluation Tool
Engineered an internal agentic AI evaluation system for assessing large language model (LLM) behaviour, with emphasis on output quality, reliability, and safe iteration.
• Implemented an end-to-end evaluation pipeline for analysing LLM responses under consistent testing conditions
• Designed repeatable evaluation workflows to support regression analysis across model updates (a minimal example follows this list)
• Worked with curated datasets and controlled evaluation criteria to ensure reliable comparisons
• Embedded quality-engineering principles within an enterprise development environment
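Below is a minimal sketch of the kind of repeatable check such a pipeline runs, using DeepEval's public API. The test-case content, metric choice, and 0.7 threshold are illustrative assumptions rather than details of the internal tool, and the metric relies on an LLM judge, so an OpenAI API key is assumed to be available in the environment.

```python
# Illustrative evaluation pass with DeepEval; all values are placeholders.
from deepeval import evaluate
from deepeval.metrics import AnswerRelevancyMetric
from deepeval.test_case import LLMTestCase

# A curated (input, actual_output) pair captured from the model under test.
test_case = LLMTestCase(
    input="What does the retry policy do on a 429 response?",
    actual_output="It backs off exponentially and retries up to three times.",
)

# A fixed threshold keeps the pass/fail criterion constant across model
# updates, which makes run-over-run regression comparison meaningful.
relevancy = AnswerRelevancyMetric(threshold=0.7)

# Scores each test case against each metric and reports pass/fail results.
evaluate(test_cases=[test_case], metrics=[relevancy])
```

Re-running the same fixed test cases and thresholds before and after a model update surfaces regressions as metric-score deltas rather than anecdotal spot checks.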
Technologies
Python • LangChain • DeepEval • RAGAS
Engineering focus
AI evaluation • Quality engineering • Testing pipelines • Reliability & validation