Experience

U.S. Food & Drug Administration (FDA)

Jan 2024 – Present

  • Built a Retrieval-Augmented Generation (RAG) pipeline using LangChain to enable accurate, context-aware question answering from internal document repositories.
  • Leveraged Nomic Embed models to generate high-quality vector embeddings for semantic search and similarity comparisons.
  • Used MongoDB to store raw documents and metadata, and ChromaDB to persist and query vector embeddings for real-time retrieval.
  • Designed robust chunking strategies to split documents contextually before embedding, improving retrieval precision and LLM performance.
  • Automated the end-to-end RAG pipeline in Python, from document ingestion and chunking to embedding generation and similarity-based retrieval (minimal sketch below).
  • Benchmarked LLMs (LLaMA 3.x, Mistral) across three datasets, improving QA response accuracy by 15% and reducing latency by 25% through custom prompt strategies.
  • Surveyed and selected biomedical datasets for LLM-based question-answering tasks, aligning data sources with model capabilities.
  • Led data cleaning and preprocessing efforts for biomedical datasets, ensuring high-quality, structured input to improve model training and inference.
  • Automated LLM data pipelines using Python and AWS CLI, reducing preprocessing time by 40% and accelerating model evaluation cycles.
  • Deployed pre-trained foundation models using Amazon Bedrock, reducing deployment time by 20% and enabling scalable, low-latency AI applications.
  • Packaged and containerized applications using Docker, optimized image sizes, and managed dependencies with virtual environments and requirements.txt.
  • Deployed containerized apps on AWS EC2 and validated performance in high-compute environments with GPU acceleration.
  • Engineered production-ready systemd services to ensure 99.9% uptime and automatic recovery after system reboots.
  • Automated key operational tasks including system updates, log capture, and monitoring to streamline system management and reduce manual effort.
  • Established a new pre-production environment to bridge development and production workflows, improving deployment stability.
  • Designed and implemented a benchmarking system using SQLite3 to track and analyze LLM outputs, supporting data-driven prompt tuning (schema sketch below).
  • Led QA benchmarking initiatives, improving reference accuracy by 20% and delivering monthly evaluation reports to guide LLM enhancements.
  • Utilized Amazon S3 for storing and accessing data, enabling efficient uploads, downloads, and integration with cloud-based pipelines (boto3 sketch below).
  • Configured and managed AWS CLI to automate resource provisioning, deployments, and system monitoring.
  • Used GitLab for source control and CI/CD pipelines, ensuring efficient code collaboration, change tracking, and release management.
  • Leveraged aider-chat, an AI-powered coding assistant, to accelerate development, improve code quality, and reduce debugging time.
  • Collaborated with cross-functional teams to secure GPU access and optimize AI model performance across distributed infrastructure.
  • Provided ongoing documentation, support, and system enhancements to maintain reliability and ensure operational efficiency.
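
A minimal sketch of the chunk → embed → retrieve flow described above. The file path, collection name, and chunking parameters are illustrative, and ChromaDB's built-in default embedding function stands in here for the Nomic Embed models:

```python
import chromadb
from langchain_text_splitters import RecursiveCharacterTextSplitter

# Split a document into overlapping, context-preserving chunks before embedding.
splitter = RecursiveCharacterTextSplitter(chunk_size=800, chunk_overlap=100)
chunks = splitter.split_text(open("guidance_doc.txt").read())  # illustrative path

# Persist vectors in ChromaDB; the default embedding function is used here,
# standing in for a Nomic Embed embedding function.
client = chromadb.PersistentClient(path="./chroma_store")
collection = client.get_or_create_collection("internal_docs")
collection.add(documents=chunks, ids=[f"chunk-{i}" for i in range(len(chunks))])

# Retrieve the most similar chunks for a question; these become the context
# passed to the LLM prompt.
hits = collection.query(query_texts=["What does the guidance require?"], n_results=4)
print(hits["documents"][0])
```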
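
The SQLite3 benchmarking store might be structured as in this sketch; the schema, column names, and sample values are assumptions for illustration:

```python
import sqlite3

conn = sqlite3.connect("llm_benchmarks.db")
# Hypothetical schema: one row per model response per prompt variant.
conn.execute("""
    CREATE TABLE IF NOT EXISTS runs (
        id INTEGER PRIMARY KEY,
        model TEXT NOT NULL,
        prompt_variant TEXT NOT NULL,
        latency_ms REAL,
        accuracy REAL,
        ts TEXT DEFAULT CURRENT_TIMESTAMP
    )
""")
conn.execute(
    "INSERT INTO runs (model, prompt_variant, latency_ms, accuracy) VALUES (?, ?, ?, ?)",
    ("llama-3.1-8b", "baseline", 412.0, 0.81),
)
conn.commit()

# Aggregate by model and prompt variant to guide data-driven prompt tuning.
for row in conn.execute(
    "SELECT model, prompt_variant, AVG(latency_ms), AVG(accuracy) "
    "FROM runs GROUP BY model, prompt_variant"
):
    print(row)
```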
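
A sketch of the S3 and Amazon Bedrock interactions via boto3; the bucket name, object key, model ID, and request payload are assumptions:

```python
import json
import boto3

# Upload a prepared dataset to S3 (bucket and key are illustrative).
s3 = boto3.client("s3")
s3.upload_file("cleaned_dataset.jsonl", "example-bucket", "datasets/cleaned_dataset.jsonl")

# Invoke a pre-trained foundation model through Amazon Bedrock.
bedrock = boto3.client("bedrock-runtime")
response = bedrock.invoke_model(
    modelId="meta.llama3-8b-instruct-v1:0",  # illustrative model ID
    body=json.dumps({"prompt": "Summarize the retrieved context.", "max_gen_len": 256}),
)
print(json.loads(response["body"].read()))
```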

Environment: AWS, Amazon Bedrock, LLMs, Python, Bash, LangChain, ChromaDB, MongoDB, Docker, GitLab

Protech Solutions

Dec 2021 – Jan 2024

  • Collaborated with data engineers and the operations team to implement ETL processes and ensure data availability for analysis.
  • Wrote and optimized SQL queries to extract, transform, and load data tailored to analytical requirements.
  • Performed preliminary data analysis using descriptive statistics and handled anomalies such as removing duplicates and imputing missing values.
  • Cleaned and reshaped data, generating segmented subsets with the NumPy and pandas libraries in Python.
  • Created scripts integrating Python, SQL, and R to analyze large volumes of current and historical data.
  • Developed analytical approaches to answer high-level business questions and provide actionable recommendations.
  • Developed and implemented predictive models for regression and classification tasks, using algorithms such as linear regression and random forests.
  • Evaluated model performance using F1 score, accuracy, and precision (evaluation sketch below).
  • Leveraged AWS services like S3, EC2, and SageMaker to prepare data, build, deploy, and optimize predictive models.
  • Utilized AWS EC2 instances for model training and deployment, optimizing compute resources for compute-intensive development and testing workloads.
  • Integrated AWS Lambda to automate real-time data processing tasks, improving system responsiveness and reducing manual intervention.
  • Worked with the NLTK library for natural language processing and pattern recognition in textual data.
  • Automated recurring data analysis tasks and reporting pipelines using Python scripts, reducing manual workload and improving efficiency.
  • Ensured data integrity and compliance with data governance policies, supporting initiatives related to data privacy and security.
  • Managed version control and collaborated on code with team members using GitLab repositories, ensuring smooth collaboration and code integrity.
  • Implemented continuous integration (CI) pipelines in GitLab CI/CD to automate testing, model validation, and deployment, increasing code deployment efficiency and reducing errors.
  • Documented project requirements and work plans in Confluence and managed progress through Jira Sprints.
  • Routinely presented metrics and outcomes of analysis to team members and management to support data-driven decision-making.
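
A minimal sketch of the cleaning and evaluation workflow above; the column names and binary label are hypothetical:

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, f1_score, precision_score
from sklearn.model_selection import train_test_split

# Clean: drop duplicates and impute missing numeric values (columns are hypothetical).
df = pd.read_csv("cases.csv").drop_duplicates()
df["amount"] = df["amount"].fillna(df["amount"].median())

X, y = df[["amount", "age"]], df["label"]  # hypothetical features and binary label
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Fit a random forest and report the metrics used for evaluation.
model = RandomForestClassifier(n_estimators=200, random_state=42).fit(X_train, y_train)
pred = model.predict(X_test)
print("accuracy:", accuracy_score(y_test, pred))
print("precision:", precision_score(y_test, pred))
print("f1:", f1_score(y_test, pred))
```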

Environment: AWS, Python, SQL, NumPy, pandas, NLTK, Bash, GitLab

Accenture

Jan 2014 – Jul 2018

  • Created and analyzed business requirements to design and implement technical data solutions that align with organizational goals.
  • Designed and maintained MySQL databases, creating user-defined functions and stored procedures to automate daily reporting tasks.
  • Built ETL pipelines to retrieve data from NoSQL databases and load aggregated data into the analytical platform for analysis.
  • Wrote and executed SQL scripts to implement database changes, including table updates, view creation, and the addition of stored procedures.
  • Conducted reviews of database objects (tables, views) to assess the current design, identify discrepancies, and provide recommendations for optimization.
  • Performed performance tuning and optimization of SQL scripts and stored procedures to improve data processing efficiency and overall database performance.
  • Analyzed database discrepancies and synchronized development, pre-production, and production environments with accurate data models.
  • Developed and maintained SQL scripts and job scripts to handle complex transactional and reporting tasks, formatting data as required.
  • Monitored and maintained multiple automated data extraction and daily job processes, ensuring seamless data workflows.
  • Created ad-hoc SQL queries to generate custom reports, trend analysis, and customer-specific reports.
  • Manipulated data and calculated key metrics for reporting using MySQL queries and MS Excel, facilitating data-driven decision-making.
  • Debugged and resolved execution errors using data logs, trace statistics, and thorough examination of source and target data.
  • Created and scheduled Unix cron jobs to periodically load flat files into Oracle databases, ensuring timely data availability (loader sketch below).
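
The nightly flat-file load might have looked like the sketch below, invoked from a crontab entry such as `0 2 * * * /usr/bin/python load_daily_feed.py`; the file layout, target table, and connection details are assumptions, with cx_Oracle standing in for whichever Oracle driver was used:

```python
import csv
import cx_Oracle  # any DB-API 2.0 driver would work; details here are illustrative

# Read the nightly flat file (path and column layout are hypothetical).
with open("/data/incoming/daily_feed.csv", newline="") as f:
    rows = [(r["case_id"], float(r["amount"]), r["status"]) for r in csv.DictReader(f)]

conn = cx_Oracle.connect("report_user", "secret", "dbhost/ORCLPDB1")
cur = conn.cursor()
# Oracle-style positional binds; the target table is hypothetical.
cur.executemany(
    "INSERT INTO daily_feed (case_id, amount, status) VALUES (:1, :2, :3)",
    rows,
)
conn.commit()
conn.close()
```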

Environment: SQL (MySQL, Oracle), Unix, stored procedures, ETL

Soniks Consulting

Jun 2013 – Dec 2013

  • Assisted in gathering and analyzing business requirements to create data-driven insights for decision-making.
  • Cleaned, transformed, and organized data from various sources to prepare it for analysis using Excel and SQL.
  • Performed basic data analysis using descriptive statistics, identifying trends and patterns to support business operations.
  • Wrote simple SQL queries to extract data from databases and performed data manipulation tasks for reporting purposes.
  • Developed and maintained basic reports and dashboards to track key performance indicators (KPIs) and business metrics.
  • Participated in team meetings to discuss ongoing projects, provide progress updates, and receive feedback from senior data analysts.
  • Prepared various statistical and financial reports using MS Excel.
  • Communicated effectively, verbally and in writing, with project teams, stakeholders, and colleagues across departments.

Environment: Linux, Bash scripting, MS Office, SQL