Experience
U.S. Food & Drug Administration (FDA)
Jan 2024 – Present
- Explored and identified datasets for large language models (LLMs), focusing on question-answering tasks in biomedical and scientific research.
- Tested various LLMs, including Llama and Mistral models, evaluating response quality, accuracy, and latency.
- Applied advanced statistical techniques to validate model performance and ensure robustness.
- Developed and fine-tuned custom evaluation pipelines using metrics such as precision, recall, F1-score, BLEU, and ROUGE (see the evaluation sketch below).
- Conducted Q&A benchmarking to verify accuracy and citation validity, producing detailed evaluation reports.
- Utilized Nomic Embed for generating embedding vectors to improve similarity comparisons and LLM response accuracy.
- Performed comprehensive data cleaning and preprocessing to prepare datasets for LLM tasks.
- Created Python scripts for end-to-end pipeline automation: dataset formatting, prompt generation, output handling, and metric-based comparisons using Euclidean distance and cosine similarity (see the similarity sketch below).
- Conducted exploratory data analysis (EDA) using Pandas, NumPy, Matplotlib, and Seaborn to identify trends and anomalies in biomedical datasets.
- Designed and implemented a benchmarking system using SQLite3 for storing and analyzing LLM-generated results (see the results-store sketch below).
- Leveraged AWS Bedrock for deploying and scaling pre-trained foundation models in custom AI/ML applications (see the Bedrock sketch below).
- Automated provisioning, deployment, and monitoring using AWS CLI for efficient infrastructure management.
- Integrated Amazon S3 for data storage and cloud-based file sharing in ML workflows.
- Monitored AWS EC2 instances to optimize usage, reduce cost, and enhance performance.
- Containerized applications with Docker and deployed them on AWS servers for high-compute environments.
- Built minimal-size Docker images with all dependencies included, using venv and requirements.txt for smooth cross-platform deployment.
- Created systemd service files for automatic application startup and high availability.
- Automated system updates, log capture, and key processes to enhance maintainability and reduce manual effort.
- Integrated ML workflow observability using MLflow, Weights & Biases, and AWS CloudWatch for tracking experiments and performance.
- Set up a new pre-production environment to bridge dev and prod, improving stability and QA.
- Used GitLab for source control, CI/CD pipelines, and collaborative development.
- Integrated aider-chat (AI coding assistant) to speed up dev cycles and improve code quality.
- Collaborated with cross-functional teams to secure GPU resources and optimize LLM performance.
- Provided technical documentation, system support, and ongoing enhancements to ensure reliability and improve workflows.
Environment: AWS, AI/ML, LLM, Bash, Python, GitLab, Docker, AWS Bedrock
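The evaluation-pipeline bullet above refers to precision/recall/F1 scoring of model answers; below is a minimal, illustrative token-overlap implementation in the SQuAD style. Function names and the sample strings are hypothetical, not the actual FDA pipeline code.

```python
from collections import Counter

def token_f1(prediction: str, reference: str) -> dict:
    """Token-overlap precision/recall/F1 between a model answer and a reference answer."""
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return {"precision": 0.0, "recall": 0.0, "f1": 0.0}
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    f1 = 2 * precision * recall / (precision + recall)
    return {"precision": precision, "recall": recall, "f1": f1}

print(token_f1("aspirin reduces fever", "aspirin reduces fever and pain"))
```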
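For the similarity-comparison bullets, a minimal sketch of the two metrics named (cosine similarity and Euclidean distance) over embedding vectors. The placeholder vectors stand in for embeddings that would, in practice, come from a model such as Nomic Embed.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity: 1.0 means identical direction, 0.0 means orthogonal."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def euclidean_distance(a: np.ndarray, b: np.ndarray) -> float:
    """Straight-line distance between two embedding vectors."""
    return float(np.linalg.norm(a - b))

# Placeholder vectors standing in for real embedding output.
answer_vec = np.array([0.12, 0.98, 0.33])
reference_vec = np.array([0.10, 0.95, 0.40])
print(cosine_similarity(answer_vec, reference_vec))
print(euclidean_distance(answer_vec, reference_vec))
```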
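For the results-store bullet, a minimal sketch of a SQLite3 table holding benchmark records plus a per-model aggregate query. The schema, file name, and sample values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect("llm_benchmarks.db")  # hypothetical database file
conn.execute(
    """CREATE TABLE IF NOT EXISTS results (
           id INTEGER PRIMARY KEY AUTOINCREMENT,
           model TEXT, question TEXT, answer TEXT,
           f1 REAL, rouge_l REAL, latency_ms REAL
       )"""
)
conn.execute(
    "INSERT INTO results (model, question, answer, f1, rouge_l, latency_ms) "
    "VALUES (?, ?, ?, ?, ?, ?)",
    ("llama-3-8b", "What is aspirin used for?", "Pain and fever relief.", 0.82, 0.79, 412.0),
)
conn.commit()

# Aggregate per-model scores for a report.
for row in conn.execute("SELECT model, AVG(f1), AVG(latency_ms) FROM results GROUP BY model"):
    print(row)
conn.close()
```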
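For the AWS Bedrock bullet, a minimal sketch of invoking a hosted foundation model through boto3's bedrock-runtime client. The region, model ID, and request body are assumptions (the body shown follows the Llama-style schema; other providers use different schemas).

```python
import json
import boto3

client = boto3.client("bedrock-runtime", region_name="us-east-1")  # region is a placeholder

response = client.invoke_model(
    modelId="meta.llama3-8b-instruct-v1:0",  # illustrative model ID
    body=json.dumps({
        "prompt": "Summarize the intended use of aspirin.",
        "max_gen_len": 256,
        "temperature": 0.2,
    }),
)
print(json.loads(response["body"].read()))
```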
Protech Solutions
Dec 2021 – Jan 2024
- Collaborated with data engineers and the operations team to implement ETL processes and ensure data availability for analysis.
- Wrote and optimized SQL queries to extract, transform, and load data tailored to analytical requirements.
- Performed preliminary data analysis using descriptive statistics and handled anomalies by removing duplicates and imputing missing values.
- Worked on data cleaning and reshaping, generating segmented subsets using NumPy and pandas libraries in Python.
- Created scripts that integrated Python, SQL, and R to analyze large volumes of current and historical data.
- Developed analytical approaches to answer high-level business questions and provide actionable recommendations.
- Developed and implemented predictive models using machine learning algorithms such as linear regression and Random Forest classification (see the model-evaluation sketch below).
- Evaluated model performance using F1-score, accuracy, and precision metrics.
- Leveraged AWS services like S3, EC2, and SageMaker to prepare data, build, deploy, and optimize predictive models.
- Utilized AWS EC2 instances for model training and deployment, optimizing compute resources to meet high computational demands during model development and testing.
- Integrated AWS Lambda to automate real-time data processing tasks, improving system responsiveness and reducing manual intervention.
- Worked with the NLTK library for natural language processing and pattern recognition in textual data (see the NLTK sketch below).
- Automated recurring data analysis tasks and reporting pipelines using Python scripts, reducing manual workload and improving efficiency.
- Ensured data integrity and compliance with data governance policies, supporting initiatives related to data privacy and security.
- Managed version control and collaborated on code with team members using GitLab repositories, ensuring smooth collaboration and code integrity.
- Implemented continuous integration (CI) pipelines in GitLab CI/CD to automate testing, model validation, and deployment, increasing code deployment efficiency and reducing errors.
- Documented project requirements and work plans in Confluence and managed progress through Jira Sprints.
- Routinely presented metrics and outcomes of analysis to team members and management to support data-driven decision-making.
Environment: AWS, Python, SQL, NumPy, Pandas, NLTK, Bash, GitLab
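For the predictive-modeling bullets, a minimal sketch of training a Random Forest classifier and scoring it with accuracy, precision, and F1 in scikit-learn. The synthetic data is a stand-in for the features produced by the ETL and EDA steps described above.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, precision_score, f1_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data; real features came from the upstream ETL pipeline.
rng = np.random.default_rng(42)
X = rng.normal(size=(500, 8))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
model = RandomForestClassifier(n_estimators=200, random_state=42).fit(X_train, y_train)
pred = model.predict(X_test)

print("accuracy :", accuracy_score(y_test, pred))
print("precision:", precision_score(y_test, pred))
print("f1       :", f1_score(y_test, pred))
```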
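For the NLTK bullet, a minimal sketch of tokenizing and part-of-speech tagging text, then extracting nouns as a simple pattern. The sample sentence is invented; the resource names are the standard NLTK downloads (newer NLTK releases may also require punkt_tab).

```python
import nltk

# One-time downloads of the tokenizer and tagger models.
nltk.download("punkt", quiet=True)
nltk.download("averaged_perceptron_tagger", quiet=True)

text = "Customer reported a delayed payment and requested a status update."
tokens = nltk.word_tokenize(text)
tags = nltk.pos_tag(tokens)

# Simple pattern: keep nouns as candidate topics.
nouns = [word for word, tag in tags if tag.startswith("NN")]
print(nouns)
```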
Accenture
Jan 2014 – Jul 2018
- Created and analyzed business requirements to design and implement technical data solutions that align with organizational goals.
- Designed and maintained MySQL databases, creating user-defined functions and stored procedures to automate daily reporting tasks.
- Built ETL pipelines to retrieve data from NoSQL databases and load aggregated data into the analytical platform for analysis.
- Wrote and executed SQL scripts to implement database changes, including table updates, view creation, and the addition of stored procedures.
- Conducted reviews of database objects (tables, views) to assess the current design, identify discrepancies, and provide recommendations for optimization.
- Performed performance tuning and optimization of SQL scripts and stored procedures to improve data processing efficiency and overall database performance.
- Analyzed database discrepancies and synchronized development, pre-production, and production environments with accurate data models.
- Developed and maintained SQL scripts and job scripts to handle complex transactional and reporting tasks, ensuring data was formatted as required.
- Monitored and maintained multiple automated data extraction and daily job processes, ensuring seamless data workflows.
- Created ad-hoc SQL queries to generate custom reports, trend analyses, and customer-specific reports.
- Manipulated data and calculated key metrics for reporting using MySQL queries and MS Excel, facilitating data-driven decision-making.
- Debugged and resolved execution errors using data logs, trace statistics, and thorough examination of source and target data.
- Created and scheduled Unix cron jobs to periodically load flat files into Oracle databases, ensuring timely data availability (see the loader sketch below).
Environment: SQL, Unix, Stored procedures, ETL
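For the cron-loader bullet, a minimal sketch of a script that stages a flat file into an Oracle table, written in Python for illustration (the original jobs may have used shell scripts or SQL*Loader instead). The cx_Oracle driver, connection details, file layout, and table name are all placeholders. A crontab entry would then invoke the script on a schedule, e.g. nightly.

```python
import csv
import cx_Oracle  # assumed Oracle driver; credentials and DSN below are placeholders

def load_flat_file(path: str) -> None:
    # Stage the flat-file rows in memory before a bulk insert.
    with open(path, newline="") as f:
        rows = [(r["customer_id"], r["amount"], r["txn_date"]) for r in csv.DictReader(f)]

    conn = cx_Oracle.connect(user="etl_user", password="***", dsn="dbhost/ORCLPDB1")
    cur = conn.cursor()
    # Bulk insert staged rows; :1, :2, :3 are positional bind variables.
    cur.executemany(
        "INSERT INTO txn_staging (customer_id, amount, txn_date) VALUES (:1, :2, :3)",
        rows,
    )
    conn.commit()
    conn.close()

if __name__ == "__main__":
    load_flat_file("/data/incoming/transactions.csv")
```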
Soniks Consulting
Jun 2013 – Dec 2013
- Assisted in gathering and analyzing business requirements to create data-driven insights for decision-making.
- Cleaned, transformed, and organized data from various sources to prepare it for analysis using Excel and SQL.
- Performed basic data analysis using descriptive statistics, identifying trends and patterns to support business operations.
- Wrote simple SQL queries to extract data from databases and performed data manipulation tasks for reporting purposes.
- Developed and maintained basic reports and dashboards to track key performance indicators (KPIs) and business metrics.
- Participated in team meetings to discuss ongoing projects, provide progress updates, and receive feedback from senior data analysts.
- Prepared various statistical and financial reports using MS Excel.
- Applied strong verbal and written communication skills while working with project teams, stakeholders, and other departments.
Environment: Linux, Bash script, MS Office, SQL