Sai Hemanth Kilaru.

Data Scientist & AI Engineer

I build intelligent systems.

Bridging data and decisions.

Beyond the Code.

I am a Data Scientist and AI Engineer driven by the challenge of transforming complex, unstructured data into actionable intelligence. My background spans predictive analytics, generative AI, and distributed systems.

Whether it's fine-tuning leading LLMs for medical precision, building autonomous agents, or architecting robust data pipelines, I thrive at the intersection of rigorous research and practical application.

Currently pursuing my studies at the University of Arizona, I am constantly exploring the bleeding edge of machine learning to build faster, smarter, and more reliable AI solutions.

AI & ML

Building intelligent systems with LLMs and robust algorithms.

Data Engineering

Constructing scalable pipelines and structured architectures.

Full-Stack Dev

Crafting seamless user experiences from backend to frontend.

Optimization

Making models faster, cheaper, and more accurate.

Work Experience

Data Science Intern

ORBO (in association with Teachnook) · Remote · Mar 2023 – Apr 2023 · 2 mos

Improved data accuracy by 40% and cut analysis time by 50% by creating an image analysis tool with advanced OpenCV and Python techniques. Debugging times dropped by 30% after I restructured workflows and optimized team collaboration. This internship gave me real-world expertise in machine learning models, scikit-learn, and application development, with a focus on delivering fast, accurate results.

Responsibilities

  • Created an image analysis tool using OpenCV and Python.
  • Restructured workflows and optimized team collaboration.
  • Gained real-world expertise in scikit-learn and application development.

Impact

  • Improved data accuracy by 40%.
  • Cut analysis time by 50%.
  • Reduced debugging time by 30%.
Python · OpenCV · scikit-learn · Analytical Skills · Project Management

Machine Learning Intern

AICTE · India · Sep 2023 – Nov 2023

Worked on end-to-end machine learning pipeline development for supervised classification problems, focusing on model optimization, reproducibility, and workflow efficiency.

Responsibilities

  • Engineered complete ML pipelines including preprocessing, feature engineering, model training, validation, and evaluation.
  • Tuned hyperparameters of scikit-learn classification models, improving validation performance by 10–15%.
  • Standardized experimentation workflows and documentation, reducing model iteration cycles by 25%.
  • Implemented structured model evaluation using cross-validation and performance metrics benchmarking.

Impact

  • Improved reproducibility of ML experiments.
  • Reduced development time through structured workflow design.
  • Delivered optimized classification models ready for deployment.
Python · scikit-learn · Pandas · NumPy · Cross-Validation · Data Preprocessing

Data Analytics Intern

AICTE · India · May 2023 – Jul 2023

Designed and implemented cloud-based data analytics solutions using AWS infrastructure and analytics services, with a focus on secure, scalable architectures.

Responsibilities

  • Built and managed cloud-based analytics pipelines using Amazon EC2, S3, RDS, IAM, and CloudFront.
  • Architected secure virtual environments using Amazon VPC and security groups to ensure data isolation and access control.
  • Implemented data ingestion, transformation, and querying workflows using AWS-native services.
  • Evaluated AWS pricing models and optimized cost-performance trade-offs for analytics workloads.
  • Applied data lake concepts including collection, storage, and processing patterns for large-scale structured datasets.

Impact

  • Reduced manual data handling effort by 30% through automation.
  • Improved access reliability and infrastructure security.
  • Gained hands-on experience with production-like cloud analytics architecture.
AWS EC2 · AWS S3 · RDS · IAM · VPC · CloudFront · SQL · Cloud Architecture

Coding Domain Member

SRM ASV · Chennai, Tamil Nadu, India · On-site · Apr 2022 – Sep 2022 · 6 mos

During my time at ASV SRM, I was recruited as a member of the coding domain, where I gained hands-on experience and learned the fundamentals of Machine Learning and Ubuntu. I successfully completed the training period, developing key technical skills, including problem-solving and coding. I was also assigned to contribute to a club project, which, unfortunately, couldn't be completed due to the club becoming inactive. Despite this, the experience helped me build a solid foundation in tech and teamwork.

Responsibilities

  • Gained hands-on experience in Machine Learning and Ubuntu.
  • Successfully completed the technical training period.
  • Developed key technical skills including problem-solving and coding.
  • Contributed to a club-level team project.

Impact

  • Built a solid foundation in technology.
  • Enhanced teamwork and collaborative problem-solving skills.
Machine Learning · Ubuntu · Problem Solving · Coding

Selected Work

LLM Systems

Autonomous LLM-Powered Data Insights Engine

Jun 2025 – Sep 2025

Built a fully autonomous AI system that ingests arbitrary datasets and generates deterministic, validated analytical insights and publication-ready visualizations. Reduced analysis time from hours to minutes, decreased hallucinated outputs by 40%, and increased dataset robustness by 60%.

Python · Pandas · Plotnine · LangChain · LiteLLM · Qwen2.5-Coder-32B
Agent-Based Modeling

Bio-Inspired Routing in Social Wasps

Oct 2025 – Present

Simulated decentralized feeding behavior in Ropalidia marginata and benchmarked heuristic efficiency against TSP solutions. Demonstrated that greedy local heuristics achieve near-optimal efficiency, with implications for swarm robotics.

Python · Mesa · NumPy · Pandas · Matplotlib
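A minimal sketch of the kind of comparison this project describes: a greedy nearest-neighbor route scored against the optimal closed tour. The function names, the `sites` coordinates, and the brute-force baseline are assumptions for illustration only; this is not the project's Mesa model or its actual benchmark.

```python
import itertools
import math


def tour_length(points, order):
    """Total length of the closed tour that visits points in the given order."""
    return sum(
        math.dist(points[order[i]], points[order[(i + 1) % len(order)]])
        for i in range(len(order))
    )


def greedy_tour(points, start=0):
    """Nearest-neighbor heuristic: always move to the closest unvisited point."""
    unvisited = set(range(len(points))) - {start}
    order = [start]
    while unvisited:
        here = points[order[-1]]
        nxt = min(unvisited, key=lambda j: math.dist(here, points[j]))
        order.append(nxt)
        unvisited.remove(nxt)
    return order


def optimal_tour(points, start=0):
    """Brute-force optimum; only feasible for a handful of points."""
    rest = [i for i in range(len(points)) if i != start]
    best = min(
        itertools.permutations(rest),
        key=lambda p: tour_length(points, [start, *p]),
    )
    return [start, *best]


# Hypothetical site coordinates, chosen only for illustration.
sites = [(0, 0), (0, 3), (4, 3), (4, 0), (2, 1)]
greedy = tour_length(sites, greedy_tour(sites))
optimal = tour_length(sites, optimal_tour(sites))

# Efficiency of 1.0 means the greedy route matches the optimal tour.
efficiency = optimal / greedy
print(f"greedy={greedy:.2f} optimal={optimal:.2f} efficiency={efficiency:.2f}")
```

The brute-force baseline grows factorially with the number of sites, so a larger benchmark would need an approximate or exact TSP solver instead.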
Data Science

Consumer Electronics Recommendation Analytics

Feb 2025 – May 2025

Analyzed 1,000+ Best Buy products to identify key drivers of recommendation percentage. Used OLS, Ridge, and Lasso regression alongside HuggingFace BART zero-shot sentiment analysis. Detected seasonal peaks in April and November.

Python · Dash · Plotly · ARIMA · HuggingFace · Pandas
NLP

Crowdfunding Campaign Success Prediction

Feb 2025 – May 2025

Trained on 15K+ real GoFundMe campaigns using TF-IDF vectorization and ensemble ML (Random Forest, Gradient Boosting). Achieved 99% accuracy and 0.999 ROC-AUC. Identified gratitude and personal tone as top success drivers.

TF-IDF · Random Forest · Gradient Boosting · Logistic Regression · Python
LLM Fine-Tuning

Mistral-7B LoRA Medical Fine-Tuning

Aug 2024 – Nov 2024

Fine-tuned Mistral-7B on 256K medical samples using LoRA/PEFT on a Colab T4 GPU. Achieved a 20%+ improvement in response relevance over the base zero-shot model, evaluated via ROUGE-L and BERTScore.

Mistral-7B · LoRA · PEFT · HuggingFace · ROUGE-L · BERTScore
Healthcare AI

Early Sepsis Prediction (IEEE Published)

Aug 2023 – Dec 2023

Trained on 40K+ patient records using Random Forest and Gradient Boosting with imputation and normalization. Achieved 99% accuracy, reduced false negatives, and deployed as a Streamlit & PowerBI dashboard. Published in IEEE Xplore.

Random Forest · Gradient Boosting · Streamlit · PowerBI · Python
Computer Vision

AI-Powered Self-Proctoring Platform

Jan 2024 – May 2024

Built a real-time proctoring system using YOLOv5 for phone detection and OpenCV for gaze/face orientation tracking. Generates a live productivity score enforcing academic integrity without human oversight.

YOLOv5 · OpenCV · Flask · Python · Computer Vision
Predictive Analytics

Healthcare Demand Forecasting

Oct 2024 – Dec 2024

Processed 50K simulated patient records using Random Forest and Multinomial regression for revenue segmentation. Key insight: the 65+ age group contributes 29% of total billing. Deployed with interactive Quarto dashboards.

Random Forest · Multinomial Regression · Quarto · Python · Revenue Analytics
Business Intelligence

Netflix Global Content Strategy Dashboard

Built a Tableau dashboard revealing Netflix's global content strategy: 68% Movies vs 32% TV Shows, exponential growth from 2014–2019, and TV-MA as the dominant rating. Provides actionable insights for content investment decisions.

Tableau · Data Visualization · Business Intelligence
Deep Learning

Early Detection of Neurological Disorders (CNN)

May 2023 – Jul 2023

Classified Alzheimer's, Parkinson's, and Brain Tumors from MRI scans using CNN and transfer learning. Applied histogram equalization and image preprocessing pipelines to enhance feature extraction from medical imagery.

CNN · Transfer Learning · OpenCV · TensorFlow · Medical Imaging
Data Engineering

YouTube Performance Analytics Dashboard

Apr 2022 – Jun 2022

Built an automated ETL pipeline for YouTube channel analytics featuring difference-from-median benchmarking and 30-day cumulative growth tracking. Enables content creators to make data-driven production decisions.

Python · ETL · Pandas · Data Visualization · Analytics
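The two metrics named in this project can be sketched as follows. This is an illustrative reading that uses plain Python rather than the project's Pandas pipeline; the function names and the sample view counts are hypothetical.

```python
from statistics import median


def diff_from_median(values):
    """Benchmark each video against the channel's median performance."""
    m = median(values)
    return [v - m for v in values]


def cumulative_growth(daily_counts, window=30):
    """Running total over the first `window` days after publication."""
    total = 0
    running = []
    for count in daily_counts[:window]:
        total += count
        running.append(total)
    return running


# Hypothetical view counts for three videos.
views = [100, 300, 200]
print(diff_from_median(views))  # [-100, 100, 0]

# Hypothetical daily views for one video, truncated to a 3-day window.
print(cumulative_growth([5, 10, 20, 40], window=3))  # [5, 15, 35]
```

Comparing against the median rather than the mean keeps a single viral video from skewing the baseline for every other upload.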
Business Intelligence

Airbnb Market Analysis Dashboard

Interactive Tableau dashboard analyzing Airbnb listing data to uncover pricing trends, neighborhood demand, and seasonal occupancy patterns. Helps hosts and investors optimize pricing strategies for maximum revenue.

Tableau · Data Analysis · Business Intelligence
Data Extraction

Web Scraping Pipeline (Amazon & Stanford)

Engineered web scrapers for Amazon product listings and Stanford faculty pages using Python and BeautifulSoup. Built clean, parsed data pipelines with structured output for downstream analysis and research.

Python · BeautifulSoup · Web Scraping · Data Parsing
Full-Stack

Taxi Management System

Full-stack ride management platform with driver/passenger authentication, ride booking & scheduling, in-app ratings, messaging, and a loyalty reward program. Includes advanced routing for efficient dispatch.

SQL · Backend Logic · Web Stack · System Design

Technical Toolkit.

A comprehensive overview of the languages, frameworks, and tools I use to build scalable, intelligent architectures.

Python
C
C++
SQL (Postgres, PL/SQL)
R
HTML / CSS
Supervised & Ensemble Learning
Random Forest
Gradient Boosting
Logistic Regression
Feature Engineering
Model Evaluation
Hyperparameter Tuning
Fine-Tuning LLMs (LoRA, PEFT)
Zero-Shot Learning
Sentiment Analysis
Transfer Learning
Computer Vision (YOLOv5, CNN)
Agent-Based Modeling
Heuristic Optimization
Exploratory Data Analysis (EDA)
Regression Modeling
Time-Series Forecasting (ARIMA)
Statistical Testing
Data Normalization
Data Cleaning
ETL Pipelines
Business Intelligence Dashboarding
LangChain
LiteLLM
HuggingFace Transformers
Prompt Engineering
Deterministic Inference Control
LLM Evaluation
Safe Code Generation Pipelines
PySpark (Basic)
Kafka (Basic)
Data Pipelines
Schema Normalization
Streaming Fundamentals
Distributed Processing
AWS (EC2, S3, RDS, Glue, Lambda, Bedrock)
GCP (Basic)
Docker
CI/CD Fundamentals
REST APIs
FastAPI
Flask

Programming Languages

Python · C · C++ · SQL (Postgres, PL/SQL) · R · +1 more

Machine Learning & AI

Supervised & Ensemble Learning · Random Forest · Gradient Boosting · Logistic Regression · Feature Engineering · +9 more

Data Science & Analytics

Exploratory Data Analysis (EDA) · Regression Modeling · Time-Series Forecasting (ARIMA) · Statistical Testing · Data Normalization · +3 more

LLM & Generative AI

LangChain · LiteLLM · HuggingFace Transformers · Prompt Engineering · Deterministic Inference Control · +2 more

Data Engineering

PySpark (Basic) · Kafka (Basic) · Data Pipelines · Schema Normalization · Streaming Fundamentals · +1 more

Cloud & DevOps

AWS (EC2, S3, RDS, Glue, Lambda, Bedrock) · GCP (Basic) · Docker · CI/CD Fundamentals · REST APIs · +4 more

Data Visualization & BI

PowerBI · Tableau · Plotly · Plotnine · Matplotlib · +3 more

Developer Tools

Git · Jupyter Notebook · Google Colab · VS Code · PyCharm · +3 more

Research & Methods

Agent-Based Simulation · Graph Theory Concepts · TSP Approximation · Cohen's Kappa · Hypothesis Testing · +2 more

Let's Build the Future.

Open for Data Science, AI/ML roles, and cutting-edge research collaborations. Reach out to discuss how we can turn complex problems into intelligent solutions.

© 2026 Sai Hemanth Kilaru. Built with Next.js & Framer Motion.