Experience

U.S. Food & Drug Administration (FDA)

Jan 2024 – Present

  • Explored and identified datasets for large language models (LLMs), focusing on question-answering tasks in biomedical and scientific research.
  • Tested various LLMs, including Llama and Mistral models, evaluating response quality, accuracy, and latency.
  • Applied advanced statistical techniques to validate model performance and ensure robustness.
  • Developed and fine-tuned custom evaluation pipelines using metrics such as precision, recall, F1-score, BLEU, and ROUGE (see the scoring sketch below).
  • Conducted Q&A benchmarking to verify accuracy and citation validity, producing detailed evaluation reports.
  • Utilized Nomic Embed for generating embedding vectors to improve similarity comparisons and LLM response accuracy.
  • Performed comprehensive data cleaning and preprocessing to prepare datasets for LLM tasks.
  • Created Python scripts for end-to-end pipeline automation: dataset formatting, prompt generation, output handling, and metric-based comparisons using Euclidean distance and cosine similarity (see the similarity sketch below).
  • Conducted exploratory data analysis (EDA) using Pandas, NumPy, Matplotlib, and Seaborn to identify trends and anomalies in biomedical datasets.
  • Designed and implemented a benchmarking system using SQLite3 for storing and analyzing LLM-generated results (see the storage sketch below).
  • Leveraged Amazon Bedrock for deploying and scaling pre-trained foundation models in custom AI/ML applications.
  • Automated provisioning, deployment, and monitoring using AWS CLI for efficient infrastructure management.
  • Integrated Amazon S3 for data storage and cloud-based file sharing in ML workflows.
  • Monitored AWS EC2 instances to optimize usage, reduce cost, and enhance performance.
  • Containerized applications with Docker and deployed them on AWS servers for high-compute environments.
  • Built minimal Docker images that bundled all dependencies via venv and requirements.txt for smooth cross-platform deployment.
  • Created systemd service files for automatic application startup and high availability.
  • Automated system updates, log capture, and other key processes to improve maintainability and reduce manual effort.
  • Integrated ML workflow observability using MLflow, Weights & Biases, and AWS CloudWatch for tracking experiments and performance.
  • Set up a new pre-production environment to bridge dev and prod, improving stability and QA.
  • Used GitLab for source control, CI/CD pipelines, and collaborative development.
  • Integrated aider-chat, an AI coding assistant, to speed up development cycles and improve code quality.
  • Collaborated with cross-functional teams to secure GPU resources and optimize LLM performance.
  • Provided technical documentation, system support, and ongoing enhancements to ensure reliability and improve workflows.

Environment: AWS, AI/ML, LLM, Bash, Python, GitLab, Docker, Amazon Bedrock
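
A minimal sketch of the kind of scoring step used in the evaluation pipelines above, assuming gold answers and model answers have already been reduced to binary correct/incorrect judgments and to whitespace-tokenized strings; the function names and example data are illustrative, not the production pipeline's.

```python
# Hedged sketch of a per-question scoring step; names and data are illustrative.
from sklearn.metrics import precision_score, recall_score, f1_score
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

def classification_scores(y_true, y_pred):
    """Precision/recall/F1 over binary correct/incorrect judgments."""
    return {
        "precision": precision_score(y_true, y_pred),
        "recall": recall_score(y_true, y_pred),
        "f1": f1_score(y_true, y_pred),
    }

def bleu(reference: str, candidate: str) -> float:
    """Sentence-level BLEU between a reference answer and an LLM answer."""
    smooth = SmoothingFunction().method1
    return sentence_bleu([reference.split()], candidate.split(), smoothing_function=smooth)

# Hypothetical run: 1 = answer judged correct, 0 = incorrect.
print(classification_scores([1, 0, 1, 1, 0, 1], [1, 0, 0, 1, 1, 1]))
print(bleu("aspirin inhibits cox enzymes", "aspirin inhibits the cox enzymes"))
```

ROUGE scores would be computed in the same fashion with a library such as rouge-score.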
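
The metric-based comparisons from the pipeline-automation scripts can be sketched as below, assuming the Nomic Embed vectors are already available as NumPy arrays; the embedding values are made up for illustration.

```python
# Hedged sketch of the cosine-similarity / Euclidean-distance comparison.
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def euclidean_distance(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.linalg.norm(a - b))

reference_vec = np.array([0.12, -0.40, 0.88])  # embedding of the reference answer
response_vec = np.array([0.10, -0.35, 0.90])   # embedding of the LLM response

print(cosine_similarity(reference_vec, response_vec))
print(euclidean_distance(reference_vec, response_vec))
```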
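
A minimal sketch of the SQLite3-backed benchmarking store; the table layout, column names, and the inserted row are assumptions for illustration, not the actual schema.

```python
# Hedged sketch of an SQLite3 results store for LLM benchmark runs.
import sqlite3

conn = sqlite3.connect("benchmark_results.db")
conn.execute("""
    CREATE TABLE IF NOT EXISTS results (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        model TEXT,
        question_id TEXT,
        answer TEXT,
        f1 REAL,
        cosine_similarity REAL,
        latency_ms REAL
    )
""")

# Record one hypothetical benchmark result.
conn.execute(
    "INSERT INTO results (model, question_id, answer, f1, cosine_similarity, latency_ms) "
    "VALUES (?, ?, ?, ?, ?, ?)",
    ("llama-3-8b", "Q001", "example answer text", 0.82, 0.91, 640.0),
)
conn.commit()

# Per-model summary used when comparing runs.
for row in conn.execute("SELECT model, AVG(f1), AVG(latency_ms) FROM results GROUP BY model"):
    print(row)
conn.close()
```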

Protech Solutions

Dec 2021 – Jan 2024

  • Collaborated with data engineers and the operations team to implement ETL processes and ensure data availability for analysis.
  • Wrote and optimized SQL queries to extract, transform, and load data tailored to analytical requirements.
  • Performed preliminary data analysis using descriptive statistics and handled anomalies such as removing duplicates and imputing missing values.
  • Worked on data cleaning and reshaping, generating segmented subsets using NumPy and pandas libraries in Python.
  • Created scripts integrating Python, SQL, and R to analyze large volumes of current and historical data.
  • Developed analytical approaches to answer high-level business questions and provide actionable recommendations.
  • Developed and implemented predictive models using machine learning algorithms such as linear regression and Random Forest classification (see the modeling sketch below).
  • Evaluated model performance using F-score, accuracy, and precision metrics.
  • Leveraged AWS services such as S3, EC2, and SageMaker to prepare data and to build, deploy, and optimize predictive models.
  • Utilized AWS EC2 instances for model training and deployment, optimizing compute resources to handle high computational requirements during model development and testing.
  • Integrated AWS Lambda to automate real-time data processing tasks, improving system responsiveness and reducing manual intervention (see the handler sketch below).
  • Worked with the NLTK library for natural language processing and pattern recognition in textual data.
  • Automated recurring data analysis tasks and reporting pipelines using Python scripts, reducing manual workload and improving efficiency.
  • Ensured data integrity and compliance with data governance policies, supporting initiatives related to data privacy and security.
  • Managed version control and collaborated on code with team members using GitLab repositories, ensuring code integrity.
  • Implemented continuous integration (CI) pipelines in GitLab CI/CD to automate testing, model validation, and deployment, increasing code deployment efficiency and reducing errors.
  • Documented project requirements and work plans in Confluence and managed progress through Jira Sprints.
  • Routinely presented metrics and outcomes of analysis to team members and management to support data-driven decision-making.

Environment: AWS, Python, SQL, NumPy, Pandas, NLTK, Bash, GitLab
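
A minimal sketch of the modeling and evaluation step from the role above, assuming a cleaned pandas DataFrame with numeric features and a binary target column named label; the file name and hyperparameters are illustrative.

```python
# Hedged sketch of Random Forest training plus accuracy/precision/F1 evaluation.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, precision_score, f1_score

df = pd.read_csv("training_data.csv")  # hypothetical extract from the ETL output
X, y = df.drop(columns=["label"]), df["label"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

model = RandomForestClassifier(n_estimators=200, random_state=42)
model.fit(X_train, y_train)
pred = model.predict(X_test)

print("accuracy:", accuracy_score(y_test, pred))
print("precision:", precision_score(y_test, pred))
print("f1:", f1_score(y_test, pred))
```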
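
The Lambda-based automation can be sketched roughly as below, assuming an S3 ObjectCreated trigger; the bucket/key handling and the downstream processing step are illustrative.

```python
# Hedged sketch of an S3-triggered AWS Lambda handler for real-time processing.
import json
import boto3

s3 = boto3.client("s3")

def lambda_handler(event, context):
    # Standard S3 event shape for an ObjectCreated trigger.
    record = event["Records"][0]
    bucket = record["s3"]["bucket"]["name"]
    key = record["s3"]["object"]["key"]

    # Fetch the newly arrived object and hand it to downstream processing.
    body = s3.get_object(Bucket=bucket, Key=key)["Body"].read()
    # ... transform / validate / load ...

    return {"statusCode": 200, "body": json.dumps({"processed": key, "bytes": len(body)})}
```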

Accenture

Jan 2014 – Jul 2018

  • Created and analyzed business requirements to design and implement technical data solutions that align with organizational goals.
  • Designed and maintained MySQL databases, creating user-defined functions and stored procedures to automate daily reporting tasks.
  • Built ETL pipelines to retrieve data from NoSQL databases and load aggregated data into the analytical platform for analysis.
  • Wrote and executed SQL scripts to implement database changes, including table updates, view creation, and the addition of stored procedures.
  • Conducted reviews of database objects (tables, views) to assess the current design, identify discrepancies, and provide recommendations for optimization.
  • Performed performance tuning and optimization of SQL scripts and stored procedures to improve data processing efficiency and overall database performance.
  • Analyzed database discrepancies and synchronized development, pre-production, and production environments with accurate data models.
  • Developed and maintained various SQL scripts and job scripts to handle complex transactional and reporting tasks, ensuring data was formatted as required.
  • Monitored and maintained multiple automated data extraction and daily job processes, ensuring seamless data workflows.
  • Created ad-hoc SQL queries to generate custom reports, trend analysis, and customer-specific reports.
  • Manipulated data and calculated key metrics for reporting using MySQL queries and MS Excel, facilitating data-driven decision-making.
  • Debugged and resolved execution errors using data logs, trace statistics, and thorough examination of source and target data.
  • Created and scheduled Unix cron jobs to periodically load flat files into Oracle databases, ensuring timely data availability.

Environment: SQL, Unix, Stored procedures, ETL

Soniks Consulting

Jun 2013 – Dec 2013

  • Assisted in gathering and analyzing business requirements to create data-driven insights for decision-making.
  • Cleaned, transformed, and organized data from various sources to prepare it for analysis using Excel and SQL.
  • Performed basic data analysis using descriptive statistics, identifying trends and patterns to support business operations.
  • Wrote simple SQL queries to extract data from databases and performed data manipulation tasks for reporting purposes.
  • Developed and maintained basic reports and dashboards to track key performance indicators (KPIs) and business metrics.
  • Participated in team meetings to discuss ongoing projects, provide progress updates, and receive feedback from senior data analysts.
  • Prepared various statistical and financial reports using MS Excel.
  • Demonstrated strong verbal and written communication skills while working on project teams, with stakeholders, and across departments.

Environment: Linux, Bash, MS Office, SQL