Data-driven Product Thinker building scalable growth & experimentation systems

I sit at the intersection of data science, ML, and user behaviour — turning raw signals into product decisions that drive measurable growth. I don't just analyse; I hypothesise, experiment, and close the loop.

Experience

Where I've worked, what I built, and the technical systems behind the results.

Data Analyst
WorkIndia
May 2025 - Present

Working on the Candidate Growth team, combining data engineering, ML feature work, and campaign automation to improve how job notifications reach and convert the right candidates.

📊 Data & Analytics
  • Built and maintained MySQL + Metabase dashboards tracking L0 notification funnel metrics (sent → delivered → read → clicked → applied) at daily, weekly, and segment granularity.
  • Authored complex Athena SQL queries joining event logs, user metadata, and campaign config tables to produce CPD (Cost per Download) and CPR (Cost per Registration) views across WhatsApp, push, and email channels.
  • Designed a star-schema data model (facts + dimensions) for the notification system, enabling ad-hoc metric queries without schema changes.
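The funnel tracking above boils down to adjacent-stage conversion rates. A minimal Python sketch with hypothetical counts (the real metrics live in MySQL/Athena; only the stage names come from the bullet above):

```python
# Sketch of L0 funnel conversion metrics. Counts are hypothetical;
# in practice they come from MySQL/Athena queries, not literals.
FUNNEL_STAGES = ["sent", "delivered", "read", "clicked", "applied"]

def funnel_rates(counts: dict) -> dict:
    """Stage-to-stage conversion rate for each adjacent pair of stages."""
    rates = {}
    for prev, curr in zip(FUNNEL_STAGES, FUNNEL_STAGES[1:]):
        rates[f"{prev}->{curr}"] = counts[curr] / counts[prev] if counts[prev] else 0.0
    return rates

counts = {"sent": 10_000, "delivered": 9_500, "read": 4_000,
          "clicked": 800, "applied": 120}
rates = funnel_rates(counts)
```

Segment- and period-level views are the same computation grouped by an extra dimension.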
⚙️ Systems & Automation
  • Automated WhatsApp campaigns via cron-based segmentation pipelines that score each user by engagement window (morning / afternoon / evening) using a 30-day rolling event history.
  • Built an auto-refresh pipeline that re-evaluates user segment assignments every 14 days, detecting drift via a 20% engagement-window shift threshold.
  • Designed an Agentic Copy Writer pipeline using LLMs to generate personalised, on-brand notification copy per segment, with structured output and character-limit guards, ready for direct injection into the campaign scheduler.
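The engagement-window scoring could look roughly like this sketch; the hour boundaries and the default window are illustrative assumptions, not the production cutoffs:

```python
from collections import Counter
from datetime import datetime, timedelta

def engagement_window(event_times, now=None, lookback_days=30):
    """Assign a user to morning/afternoon/evening based on where most of
    their last 30 days of engagement events fall.
    Hour boundaries and the cold default are illustrative assumptions."""
    now = now or datetime.now()
    cutoff = now - timedelta(days=lookback_days)
    buckets = Counter()
    for t in event_times:
        if t < cutoff:
            continue  # outside the rolling window
        if 5 <= t.hour < 12:
            buckets["morning"] += 1
        elif 12 <= t.hour < 17:
            buckets["afternoon"] += 1
        else:
            buckets["evening"] += 1
    # Assumed default for users with no recent events
    return buckets.most_common(1)[0][0] if buckets else "evening"
```

A cron job runs this per user and writes the winning window back to the segmentation table.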
🧠 ML & Modelling
  • Built the Notification Time Affinity Model: a rule-based send-time scoring system that computes per-user engagement windows from historical event streams, falling back to a single best-time-of-day approximation when history is sparse.
  • Analysed 2024 - 2025 tech job market trend data to derive demand-weighted job category signals, used to improve default targeting logic for cold-start users.
  • Developed weighted composite notification health metrics balancing CTR, application rate, and unsubscribe rate to prevent vanity metric optimisation in campaign scoring.
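The composite health metric amounts to a weighted sum with unsubscribe rate entering as a penalty; the weights below are illustrative placeholders, not the production values:

```python
def notification_health(ctr, apply_rate, unsub_rate,
                        weights=(0.3, 0.5, 0.2)):
    """Composite score balancing engagement against user fatigue.
    Weights are illustrative; unsubscribe rate is a negative term so a
    campaign cannot win on clicks alone while burning the audience."""
    w_ctr, w_apply, w_unsub = weights
    return w_ctr * ctr + w_apply * apply_rate - w_unsub * unsub_rate
```

Ranking campaigns by this score instead of raw CTR is what prevents vanity-metric optimisation.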
MySQL · Athena SQL · Metabase · Python · Cron · WhatsApp API · LLMs · Data Modelling
Research Analyst Intern
Hevo Data
June 2024 - Dec 2024

Worked across data infrastructure, AI-powered product tooling, and content-driven conversion, building systems that turned technical evaluators into qualified leads.

🔗 Data Infrastructure
  • Administered 80+ ETL/ELT pipelines in Hevo's platform covering real-time and batch workflows, with connector development via REST APIs and secure TCP/IP data channels.
  • Ensured data reliability for downstream marketing attribution by maintaining pipeline health metrics and debugging connector-level failures across Snowflake, Redshift, and BigQuery destinations.
  • Worked in depth with Snowflake's credit-based pricing model, including virtual warehouse sizing, query optimisation, and cost attribution per pipeline.
🤖 AI Product Tooling
  • Designed and built a context-sensitive support chatbot using LangChain + Gemini 1.5 Flash + FAISS vector retrieval, deployed across 50+ technical blogs with sub-second retrieval latency.
  • Iteratively refined prompts with structured output constraints to keep responses within Hevo's brand voice and token limits across varied user queries.
  • Instrumented chatbot usage to identify top unanswered query patterns, feeding insights back into content and documentation teams.
📈 Growth & Conversion
  • Led design and implementation of the Snowflake Pricing Calculator, an interactive self-serve tool embedded on key mid-funnel content pages to reduce friction in the technical buyer journey.
  • Mapped the full technical evaluator → MQL conversion funnel, identifying pricing ambiguity as the primary drop-off driver, which directly motivated the calculator feature.
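The cost math a pricing calculator like this exposes is simple: Snowflake credits per hour double with each warehouse size, and dollar cost is credits times a per-credit rate, which varies by edition and region (the $3.00 below is an assumption):

```python
# Minimal sketch of the Snowflake cost math behind a pricing calculator.
# Credits/hour double per warehouse size step, per Snowflake's model.
CREDITS_PER_HOUR = {"XS": 1, "S": 2, "M": 4, "L": 8, "XL": 16}

def monthly_cost(size: str, hours_per_day: float,
                 price_per_credit: float = 3.00, days: int = 30) -> float:
    """Estimated monthly spend for one warehouse.
    price_per_credit is an assumed figure; actual rates depend on
    Snowflake edition, cloud provider, and region."""
    credits = CREDITS_PER_HOUR[size] * hours_per_day * days
    return credits * price_per_credit
```

A self-serve version of this removes the back-and-forth with sales that the funnel analysis flagged as the drop-off point.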
LangChain · Gemini 1.5 Flash · FAISS · Snowflake · REST APIs · Python · ETL/ELT · Redshift
Research & Development Intern
Samsung Prism
Nov 2023 – May 2024

Built ML systems for large-scale app policy violation detection combining transformer-based classification, anomaly detection, and RAG-based explainability to reduce false positive rates at scale.

🧠 ML Modelling
  • Fine-tuned RoBERTa-Large for multi-class policy violation classification on Samsung's proprietary app review dataset, achieving 86.3% accuracy on a held-out test set.
  • Designed and ran the full ML training pipeline: data cleaning, label alignment, tokenisation, hyperparameter tuning (learning rate schedules, weight decay), and evaluation via confusion matrix + F1 breakdown by class.
  • Built monitoring systems to detect anomalies across 10K+ user interactions using statistical baselines and sliding-window z-score alerts to flag unusual app behaviour patterns.
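The sliding-window z-score alerting can be sketched in a few lines; the window size and threshold here are illustrative, not the production values:

```python
from collections import deque
from statistics import mean, stdev

def zscore_alerts(series, window=30, threshold=3.0):
    """Flag indices whose value deviates from the trailing window's mean
    by more than `threshold` standard deviations.
    Window size and threshold are illustrative defaults."""
    buf = deque(maxlen=window)  # trailing baseline
    alerts = []
    for i, x in enumerate(series):
        if len(buf) >= 2:  # need at least two points for a stdev
            mu, sigma = mean(buf), stdev(buf)
            if sigma > 0 and abs(x - mu) / sigma > threshold:
                alerts.append(i)
        buf.append(x)
    return alerts
```

Running this per behavioural counter (e.g. events per hour per app) surfaces spikes without hand-set static limits.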
🏗️ RAG Architecture
  • Implemented a Retrieval-Augmented Generation (RAG) pipeline using Llama 3.2 1B as the base LLM, enabling the system to generate natural-language explanations for violation flags and making moderation decisions auditable.
  • Built a policy document chunk index using dense retrieval so the LLM's generated explanations were grounded in specific policy clauses, not hallucinated.
  • Reduced false positives by 40% by layering the RAG-based explanation pass on top of the classifier output; reviewer rejections of low-confidence flags dropped significantly.
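The grounding step reduces to ranking policy-clause chunks by embedding similarity. A toy sketch of that ranking (the real pipeline used a dense retriever with a vector index; the short vectors below stand in for actual embeddings):

```python
import math

def cosine(a, b):
    """Cosine similarity between two dense vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def top_k(query_vec, chunk_vecs, k=3):
    """Return indices of the k policy-clause chunks most similar to the
    query embedding; these chunks are fed to the LLM as grounding context."""
    ranked = sorted(range(len(chunk_vecs)),
                    key=lambda i: cosine(query_vec, chunk_vecs[i]),
                    reverse=True)
    return ranked[:k]
```

Only the retrieved clauses go into the prompt, so the explanation can cite a specific policy rather than hallucinate one.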
RoBERTa-Large · Llama 3.2 · RAG · PyTorch · Hugging Face · Python · Anomaly Detection

Measurable Impact

Numbers that reflect decisions made, not just tasks completed.

Product Case Studies

How I've approached real problems — from hypothesis to impact.

WorkIndia - Performance Marketing Automation

Experimentation Segmentation Growth ML

Hevo Data - Funnel Conversion Engine

Product Design Conversion Self-serve

Samsung Prism - LLM Safety Guardian

ML Systems NLP Anomaly Detection

Independent Technical Projects

Explorations in modelling and simulation outside of work.

🚗
10K+ Scenarios Optimised

EvoDrive - Genetic Algorithm Simulation

Can evolutionary algorithms solve pathfinding better than rule-based systems?

🧠
Fine-tuned LLM for SQL

Text-to-SQL (StarCoder2 Fine-tune)

How cheap can you make an LLM that's actually good at SQL generation?

🔍
RAG Search Engine

Perplexa

A conversational engine capable of grounded, real-time web retrieval.

📹
CV + Analytics Pipeline

Vehicle Movement Analysis & Insight Generation

Extracting structured, queryable data from unstructured video streams of vehicle traffic.

Experimentation & Growth Systems

Building infrastructure that makes decisions repeatable, not one-off.

🧪
A/B Testing via Cron Segmentation

Segmented user cohorts by engagement history to run controlled timing experiments, moving from intuition-based send times to data-validated windows.

📐
Metric Design (Weighted Metrics)

Designed composite notification health metrics that balance engagement with user fatigue, avoiding optimisation for a single vanity metric.

🔄
Auto-Refresh System (14-day cycle)

Built a pipeline that automatically re-evaluates user segment assignments every 14 days based on fresh engagement data, preventing segment staleness.
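The drift check behind the refresh can be sketched as comparing each engagement window's share of events between cycles; treating the 20% threshold as an absolute share shift is an assumption about how it is measured:

```python
def needs_refresh(old_shares: dict, new_shares: dict,
                  threshold: float = 0.20) -> bool:
    """True if any engagement window's share of events shifted by more
    than the threshold between refresh cycles. The 20% figure comes from
    the pipeline; interpreting it as an absolute-share delta is an
    assumption for illustration."""
    windows = set(old_shares) | set(new_shares)
    return any(
        abs(new_shares.get(w, 0.0) - old_shares.get(w, 0.0)) > threshold
        for w in windows
    )
```

Users whose shares drift past the threshold get re-scored; everyone else keeps their segment, which keeps the 14-day job cheap.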

How I Think About Products

Mental models I use to make better product decisions.

01
Optimise for downstream metrics, not vanity metrics

Open rate is a proxy. Application rate is a business outcome. I always trace the metric chain to find what actually matters and set guardrails on the rest.

02
Balance engagement with user fatigue

More notifications ≠ more engagement. Every product decision has an attention budget. I always model the unsubscribe cost alongside the click gain before shipping a frequency change.

03
Prefer systems over one-off solutions

A script that runs once is technical debt. A pipeline that runs every 14 days, logs itself, and handles edge cases is a product. I build for the second run, not the first.

04
Use data to guide, not dictate decisions

Data shows what happened; it rarely explains why. I combine quantitative signals with user behaviour intuition and business context, never outsourcing the decision to a number alone.

05
Ship experiments, not assumptions

Every strong product opinion is a hypothesis in disguise. I default to small, fast experiments before committing to large builds, minimising the cost of being wrong.

06
Make the invisible legible

The best product insight is the one nobody sees yet, buried in event logs, latency spikes, or drop-off points. I treat data exploration as a product skill, not just an analytical one.

Education

KIIT University

B.Tech in Computer Science (2021-2025)
CGPA: 8.9

Sri Chaitanya Techno School

All India Senior School Certificate Examination (2020-2021)
Percentage: 93%

St. Patricks HS School

Indian Certificate of Secondary Education Examination (2018-2019)
Percentage: 92%

Publications

Precision Agriculture: Digital Twins and Advanced Crop Recommendation

IEEE ICOCT 2025 (Feb 2025)

Authors: Sayan Banerjee, Aniruddha Mukherjee, Suket Kamboj
DOI: 10.48550/arXiv.2502.04054

Efficient Waste Collection and Filtration using IoT

IJSREM (Jan 2023)

Authors: Sayan Banerjee, Rahul Naugariya, Shubham Patel, Shubham Kumar
DOI: 10.55041/IJSREM17403

Skills

Achievements

1st Position in Eureka Innovation

IIT Kharagpur (Feb 2024)

Machine Learning Specialization

DeepLearning.ai (Mar 2023)

Applied Python

Udemy (Dec 2022)

Extra-Curricular

Volunteering