Experience
U.S. Food & Drug Administration (FDA)
Jan 2024 – Present
- Explored and identified datasets for large language models (LLMs), focusing on question-answering tasks in biomedical and scientific research.
- Tested various LLMs, including Llama and Mistral models, evaluating response quality, accuracy, and latency.
- Applied advanced statistical techniques to validate model performance and ensure robustness.
- Developed and fine-tuned custom evaluation pipelines using metrics such as precision, recall, F1-score, BLEU, and ROUGE (see the evaluation sketch below).
- Conducted Q&A benchmarking to verify accuracy and citation validity, producing detailed evaluation reports.
- Utilized Nomic Embed for generating embedding vectors to improve similarity comparisons and LLM response accuracy.
- Performed comprehensive data cleaning and preprocessing to prepare datasets for LLM tasks.
- Created Python scripts for end-to-end pipeline automation: dataset formatting, prompt generation, output handling, and metric-based comparisons using Euclidean distance and cosine similarity (see the similarity sketch below).
- Conducted exploratory data analysis (EDA) using Pandas, NumPy, Matplotlib, and Seaborn to identify trends and anomalies in biomedical datasets.
- Designed and implemented a benchmarking system using SQLite3 for storing and analyzing LLM-generated results (see the results-store sketch below).
- Leveraged AWS Bedrock for deploying and scaling pre-trained foundation models in custom AI/ML applications (see the Bedrock sketch below).
- Automated provisioning, deployment, and monitoring using AWS CLI for efficient infrastructure management.
- Integrated Amazon S3 for data storage and cloud-based file sharing in ML workflows.
- Monitored AWS EC2 instances to optimize usage, reduce cost, and enhance performance.
- Containerized applications with Docker and deployed them on AWS servers for high-compute environments.
- Built minimal-size Docker images with all dependencies included, using venv and requirements.txt for smooth cross-platform deployment.
- Created systemd service files for automatic application startup and high availability.
- Automated system updates, log capture, and key processes to enhance maintainability and reduce manual effort.
- Integrated ML workflow observability using MLflow, Weights & Biases, and AWS CloudWatch for tracking experiments and performance.
- Set up a new pre-production environment to bridge dev and prod, improving stability and QA.
- Used GitLab for source control, CI/CD pipelines, and collaborative development.
- Integrated aider-chat (AI coding assistant) to speed up dev cycles and improve code quality.
- Collaborated with cross-functional teams to secure GPU resources and optimize LLM performance.
- Provided technical documentation, system support, and ongoing enhancements to ensure reliability and improve workflows.
Environment: AWS, AI/ML, LLM, Bash, Python, GitLab, Docker, AWS Bedrock
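The evaluation-pipeline bullet above refers to precision/recall/F1 scoring of model answers; below is a minimal, illustrative token-overlap implementation in the SQuAD style. Function names and the sample strings are hypothetical, not the actual FDA pipeline code.

```python
from collections import Counter

def token_f1(prediction: str, reference: str) -> dict:
    """Token-overlap precision/recall/F1 between a model answer and a reference answer."""
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return {"precision": 0.0, "recall": 0.0, "f1": 0.0}
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    f1 = 2 * precision * recall / (precision + recall)
    return {"precision": precision, "recall": recall, "f1": f1}

print(token_f1("aspirin reduces fever", "aspirin reduces fever and pain"))
```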
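For the similarity-comparison bullets, a minimal sketch of the two metrics named (cosine similarity and Euclidean distance) over embedding vectors. The placeholder vectors stand in for embeddings that would, in practice, come from a model such as Nomic Embed.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity: 1.0 means identical direction, 0.0 means orthogonal."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def euclidean_distance(a: np.ndarray, b: np.ndarray) -> float:
    """Straight-line distance between two embedding vectors."""
    return float(np.linalg.norm(a - b))

# Placeholder vectors standing in for real embedding output.
answer_vec = np.array([0.12, 0.98, 0.33])
reference_vec = np.array([0.10, 0.95, 0.40])
print(cosine_similarity(answer_vec, reference_vec))
print(euclidean_distance(answer_vec, reference_vec))
```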
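For the results-store bullet, a minimal sketch of a SQLite3 table holding benchmark records plus a per-model aggregate query. The schema, file name, and sample values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect("llm_benchmarks.db")  # hypothetical database file
conn.execute(
    """CREATE TABLE IF NOT EXISTS results (
           id INTEGER PRIMARY KEY AUTOINCREMENT,
           model TEXT, question TEXT, answer TEXT,
           f1 REAL, rouge_l REAL, latency_ms REAL
       )"""
)
conn.execute(
    "INSERT INTO results (model, question, answer, f1, rouge_l, latency_ms) "
    "VALUES (?, ?, ?, ?, ?, ?)",
    ("llama-3-8b", "What is aspirin used for?", "Pain and fever relief.", 0.82, 0.79, 412.0),
)
conn.commit()

# Aggregate per-model scores for a report.
for row in conn.execute("SELECT model, AVG(f1), AVG(latency_ms) FROM results GROUP BY model"):
    print(row)
conn.close()
```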
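For the AWS Bedrock bullet, a minimal sketch of invoking a hosted foundation model through boto3's bedrock-runtime client. The region, model ID, and request body are assumptions (the body shown follows the Llama-style schema; other providers use different schemas).

```python
import json
import boto3

client = boto3.client("bedrock-runtime", region_name="us-east-1")  # region is a placeholder

response = client.invoke_model(
    modelId="meta.llama3-8b-instruct-v1:0",  # illustrative model ID
    body=json.dumps({
        "prompt": "Summarize the intended use of aspirin.",
        "max_gen_len": 256,
        "temperature": 0.2,
    }),
)
print(json.loads(response["body"].read()))
```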
Protech Solutions
Dec 2021 – Jan 2024
- Collaborated with data engineers and the operations team to implement ETL processes and ensure data availability for analysis.
- Wrote and optimized SQL queries to extract, transform, and load data tailored to analytical requirements.
- Performed preliminary data analysis using descriptive statistics and handled anomalies by removing duplicates and imputing missing values.
- Worked on data cleaning and reshaping, generating segmented subsets using NumPy and pandas libraries in Python.
- Created scripts that integrated Python, SQL, and R to analyze large volumes of current and historical data.
- Developed analytical approaches to answer high-level business questions and provide actionable recommendations.
- Developed and implemented predictive models using machine learning algorithms such as linear regression and Random Forest classification (see the model-evaluation sketch below).
- Evaluated model performance using F1-score, accuracy, and precision metrics.
- Leveraged AWS services like S3, EC2, and SageMaker to prepare data, build, deploy, and optimize predictive models.
- Utilized AWS EC2 instances for model training and deployment, optimizing compute resources to meet high computational demands during model development and testing.
- Integrated AWS Lambda to automate real-time data processing tasks, improving system responsiveness and reducing manual intervention.
- Worked with the NLTK library for natural language processing and pattern recognition in textual data (see the NLTK sketch below).
- Automated recurring data analysis tasks and reporting pipelines using Python scripts, reducing manual workload and improving efficiency.
- Ensured data integrity and compliance with data governance policies, supporting initiatives related to data privacy and security.
- Managed version control and collaborated on code with team members using GitLab repositories, ensuring smooth collaboration and code integrity.
- Implemented continuous integration (CI) pipelines in GitLab CI/CD to automate testing, model validation, and deployment, increasing code deployment efficiency and reducing errors.
- Documented project requirements and work plans in Confluence and managed progress through Jira Sprints.
- Routinely presented metrics and outcomes of analysis to team members and management to support data-driven decision-making.
Environment: AWS, Python, SQL, NumPy, Pandas, NLTK, Bash, GitLab
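For the predictive-modeling bullets, a minimal sketch of training a Random Forest classifier and scoring it with accuracy, precision, and F1 in scikit-learn. The synthetic data is a stand-in for the features produced by the ETL and EDA steps described above.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, precision_score, f1_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data; real features came from the upstream ETL pipeline.
rng = np.random.default_rng(42)
X = rng.normal(size=(500, 8))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
model = RandomForestClassifier(n_estimators=200, random_state=42).fit(X_train, y_train)
pred = model.predict(X_test)

print("accuracy :", accuracy_score(y_test, pred))
print("precision:", precision_score(y_test, pred))
print("f1       :", f1_score(y_test, pred))
```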
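For the NLTK bullet, a minimal sketch of tokenizing and part-of-speech tagging text, then extracting nouns as a simple pattern. The sample sentence is invented; the resource names are the standard NLTK downloads (newer NLTK releases may also require punkt_tab).

```python
import nltk

# One-time downloads of the tokenizer and tagger models.
nltk.download("punkt", quiet=True)
nltk.download("averaged_perceptron_tagger", quiet=True)

text = "Customer reported a delayed payment and requested a status update."
tokens = nltk.word_tokenize(text)
tags = nltk.pos_tag(tokens)

# Simple pattern: keep nouns as candidate topics.
nouns = [word for word, tag in tags if tag.startswith("NN")]
print(nouns)
```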
Accenture
Jan 2014 – Jul 2018
- Created and analyzed business requirements to design and implement technical data solutions that align with organizational goals.
- Designed and maintained MySQL databases, creating user-defined functions and stored procedures to automate daily reporting tasks.
- Built ETL pipelines to retrieve data from NoSQL databases and load aggregated data into the analytical platform for analysis.
- Wrote and executed SQL scripts to implement database changes, including table updates, view creation, and the addition of stored procedures.
- Conducted reviews of database objects (tables, views) to assess the current design, identify discrepancies, and provide recommendations for optimization.
- Performed performance tuning and optimization of SQL scripts and stored procedures to improve data processing efficiency and overall database performance.
- Analyzed database discrepancies and synchronized development, pre-production, and production environments with accurate data models.
- Developed and maintained SQL scripts and job scripts to handle complex transactional and reporting tasks, ensuring data was formatted as required.
- Monitored and maintained multiple automated data extraction and daily job processes, ensuring seamless data workflows.
- Created ad-hoc SQL queries to generate custom reports, trend analyses, and customer-specific reports.
- Manipulated data and calculated key metrics for reporting using MySQL queries and MS Excel, facilitating data-driven decision-making.
- Debugged and resolved execution errors using data logs, trace statistics, and thorough examination of source and target data.
- Created and scheduled Unix cron jobs to periodically load flat files into Oracle databases, ensuring timely data availability (see the loader sketch below).
Environment: SQL, Unix, Stored procedures, ETL
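For the cron-loader bullet, a minimal sketch of a script that stages a flat file into an Oracle table, written in Python for illustration (the original jobs may have used shell scripts or SQL*Loader instead). The cx_Oracle driver, connection details, file layout, and table name are all placeholders. A crontab entry would then invoke the script on a schedule, e.g. nightly.

```python
import csv
import cx_Oracle  # assumed Oracle driver; credentials and DSN below are placeholders

def load_flat_file(path: str) -> None:
    # Stage the flat-file rows in memory before a bulk insert.
    with open(path, newline="") as f:
        rows = [(r["customer_id"], r["amount"], r["txn_date"]) for r in csv.DictReader(f)]

    conn = cx_Oracle.connect(user="etl_user", password="***", dsn="dbhost/ORCLPDB1")
    cur = conn.cursor()
    # Bulk insert staged rows; :1, :2, :3 are positional bind variables.
    cur.executemany(
        "INSERT INTO txn_staging (customer_id, amount, txn_date) VALUES (:1, :2, :3)",
        rows,
    )
    conn.commit()
    conn.close()

if __name__ == "__main__":
    load_flat_file("/data/incoming/transactions.csv")
```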
Soniks Consulting
Jun 2013 – Dec 2013
- Assisted in gathering and analyzing business requirements to create data-driven insights for decision-making.
- Cleaned, transformed, and organized data from various sources to prepare it for analysis using Excel and SQL.
- Performed basic data analysis using descriptive statistics, identifying trends and patterns to support business operations.
- Wrote simple SQL queries to extract data from databases and performed data manipulation tasks for reporting purposes.
- Developed and maintained basic reports and dashboards to track key performance indicators (KPIs) and business metrics.
- Participated in team meetings to discuss ongoing projects, provide progress updates, and receive feedback from senior data analysts.
- Prepared various statistical and financial reports using MS Excel.
- Applied strong verbal and written communication skills while working with project teams, stakeholders, and other departments.
Environment: Linux, Bash script, MS Office, SQL