Python · SQL · Apache Airflow · dbt · AWS S3 · PySpark · Power BI · FastAPI · Streamlit · LangChain · Docker · Scikit-learn · Flask
Welcome to my portfolio

I'm Kirtan Soni,
a Data Engineer

Building scalable data systems & intelligent AI solutions

Designing end-to-end ELT pipelines, cloud data warehouses, and AI/ML applications. Focused on building reliable systems that turn raw data into actionable business insights.

Introduction to Data Science (Cisco) · SAP Certified Associate (SAP Analytics Cloud) · Data Analytics Essentials (Cisco) · ChatGPT + Excel AI Analytics (Coursera) · Data Visualization Foundations (IBM / Coursera)

Modern Data Pipeline Architecture

The medallion architecture (Bronze → Silver → Gold) represents my approach to building scalable data systems. Each layer serves a specific purpose in transforming raw data into reliable, analysis-ready insights.
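To show how the three layers divide the work, here is a minimal in-memory sketch in plain Python. The field names and sample rows are invented for illustration. A real pipeline would run these steps in a warehouse rather than in Python lists.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical raw readings as they might land in the Bronze layer (untyped strings).
bronze = [
    {"city": "Delhi ", "aqi": "182"},
    {"city": "delhi", "aqi": "190"},
    {"city": "Mumbai", "aqi": "n/a"},  # malformed value, dropped in Silver
    {"city": "Mumbai", "aqi": "95"},
]

def to_silver(rows):
    """Clean and type the raw rows; skip rows that fail validation."""
    silver = []
    for row in rows:
        try:
            aqi = int(row["aqi"])
        except ValueError:
            continue
        silver.append({"city": row["city"].strip().title(), "aqi": aqi})
    return silver

def to_gold(rows):
    """Aggregate cleaned rows into an analysis-ready average per city."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["city"]].append(row["aqi"])
    return {city: mean(values) for city, values in grouped.items()}

silver = to_silver(bronze)
gold = to_gold(silver)
```

Bronze keeps the data exactly as received, Silver applies typing and cleaning, and Gold holds only the aggregates a dashboard would read.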

  • Data Source: APIs · Databases · Files
  • Apache Airflow: Orchestration & DAGs
  • AWS S3: Bronze Layer
  • Snowflake: Cloud DW
  • dbt: Transformation · SCD
  • Power BI: Analytics & BI

Building scalable data systems
for meaningful decisions.

I am Kirtan Soni, a final-year Computer Engineering student focused on Data Engineering, Data Analytics, and Machine Learning. I build end-to-end ELT pipelines using Apache Airflow, dbt, Snowflake, and AWS — implementing medallion architecture (Bronze, Silver, Gold) for reliable, analytics-ready data.

On the AI/ML side, I develop intelligent applications using LangChain, LLMs, Scikit-learn, FastAPI, and Streamlit — from real-time fraud detection systems to Generative AI tools that automate workflows. I'm driven by a simple goal: turn raw data into decisions that matter.

  • Final Year, Computer Engineering
  • 7+ Production Projects
  • 3 Core Focus Areas
  • 4+ Certifications Earned
  • 2026: Available for Full-time
  • Learning Mindset

Education

  • B.Tech — Computer Engineering
    G H Patel College, Gujarat · 2022–2026
    Final Year
  • Class XII — Science Stream
    Bits Education High School, Khambhat
    Science

Interests & Passions

Generative AI
Data Engineering
Machine Learning
Cloud Platforms
BI Dashboards
Cricket
Open Source

My Skills

Technical Stack

  • Python (Language)
  • SQL (Language)
  • PySpark (Processing)
  • Airflow (Orchestration)
  • dbt (Transformation)
  • Snowflake (Warehouse)
  • AWS S3 (Cloud)
  • MySQL (Database)
  • Power BI (Visualization)
  • FastAPI (API)
  • Streamlit (App)
  • Flask (Framework)
  • LangChain (GenAI)
  • LLMs (GenAI)
  • Scikit-learn (ML)
  • Docker (Infrastructure)
  • Jupyter (Tool)

Soft Skills

  • Problem Solving (Core)
  • Analytical Thinking (Core)
  • Decision-Making (Strategy)
  • Communication (Interpersonal)
  • Team Collaboration (Teamwork)
  • Adaptability (Execution)
  • Critical Thinking (Strategy)
  • Attention to Detail (Quality)
  • Integrity (Values)
  • Time Management (Execution)

Education & Background

G H Patel College of Engineering & Technology
Bachelor of Technology — Computer Engineering
Anand, Gujarat, India · Aug 2022 – May 2026
  • Focused on building production-grade data pipelines, analytics dashboards, and ML applications.
  • Specialized in Data Engineering, Data Analytics, AI/ML, and Cloud Computing through self-led projects.
  • Applied academic and hands-on learning to real-world systems used for analysis and automation.
  • Actively building a portfolio aimed at full-time industry roles.
Computer Engineering · Data Engineering · AI/ML
Bits Education High School
Higher Secondary — Science Stream
Khambhat, Gujarat · Completed 2022
  • Completed Class XII in the Science stream with focus on Physics, Chemistry, Mathematics, and Computer Science.
  • Built a strong foundation in logical reasoning, mathematics, and computer science fundamentals.
Science Stream
Data Engineering Projects
ELT Pipelines · Cloud Warehousing · Orchestration
Mar 2026 – Present
  • Built end-to-end ELT pipelines using Apache Airflow DAGs for automated data ingestion from APIs.
  • Implemented medallion architecture (Bronze → Silver → Gold) on AWS S3 and Snowflake.
  • Developed dbt transformation models with data quality tests for analytics-ready datasets.
  • Created Power BI dashboards connected to Snowflake Gold layer for business insights.
Apache Airflow · dbt · Snowflake · AWS S3 · Docker
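The data quality tests mentioned above can be sketched in plain Python. This mirrors the intent of dbt's built-in `not_null` and `unique` tests, but it is not dbt's implementation. The `station_id` column and the sample rows are hypothetical.

```python
from collections import Counter

def not_null(rows, column):
    """Return rows where the column is missing or None (empty list means the check passes)."""
    return [row for row in rows if row.get(column) is None]

def unique(rows, column):
    """Return the values that appear more than once in the column."""
    counts = Counter(row[column] for row in rows if row.get(column) is not None)
    return sorted(value for value, n in counts.items() if n > 1)

# Hypothetical Silver-layer rows; station_id is an invented column name.
rows = [
    {"station_id": "S1", "aqi": 120},
    {"station_id": "S2", "aqi": None},
    {"station_id": "S1", "aqi": 98},
]

null_failures = not_null(rows, "aqi")
duplicate_ids = unique(rows, "station_id")
```

Each check returns the offending rows or values instead of a boolean, so a failing run shows what to fix.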
AI/ML & GenAI Projects
Machine Learning · LLMs · Generative AI
Jan 2025 – Present
  • Deployed ML models (Random Forest, SVM, KNN) via FastAPI achieving 92% accuracy on validation sets.
  • Built a Generative AI cold email tool using LangChain + LLaMA3 + ChromaDB with RAG pipelines.
  • Developed real-time fraud detection system with Flask, MySQL, and Power BI dashboards.
  • Reduced manual workflows by 70%+ through AI automation across multiple projects.
LangChain · LLaMA3 · FastAPI · Scikit-learn · Flask

What I've Built

RECENT · Mar 2026 – Present
AQI Real-Time Data Pipeline
End-to-end ELT pipeline ingesting real-time Air Quality Index data from the Government of India Open Data API. Orchestrated with Airflow DAGs, stored in AWS S3, and transformed via dbt into Snowflake (Bronze → Silver → Gold), with Power BI dashboards on top.
100K+ Records · Real-Time
Python · Apache Airflow · dbt · Snowflake · AWS S3 · Docker · Power BI
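A DAG orchestrator runs each task only after everything it depends on has finished. The sketch below illustrates that ordering with Python's standard-library `graphlib` as a stand-in. It does not use the Airflow API, and the task names are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical task names; each key maps to the set of tasks it depends on.
dependencies = {
    "extract_api": set(),
    "load_to_s3": {"extract_api"},
    "copy_into_snowflake": {"load_to_s3"},
    "run_dbt_models": {"copy_into_snowflake"},
    "refresh_dashboard": {"run_dbt_models"},
}

# static_order() yields tasks so that every dependency comes before its dependents.
run_order = list(TopologicalSorter(dependencies).static_order())
```

Because this graph is a single chain, `run_order` follows the pipeline from extraction through to the dashboard refresh.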
RECENT · Mar 2026 – Present
Airbnb Data Engineering Pipeline
End-to-end data warehouse pipeline for Airbnb listings, bookings, and host data. Ingested raw data into AWS S3, implemented dbt models with medallion architecture, and built SCD Type 2 snapshots for historical data tracking in Snowflake.
Medallion Architecture · SCD Type 2
Snowflake · dbt · AWS S3 · Python · SCD Type 2
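SCD Type 2 keeps history by closing the current version of a changed record and appending a new one. Below is a minimal in-memory sketch of that logic. The `valid_from` / `valid_to` column names and the host rows are assumptions for illustration and are not taken from the project's dbt snapshots.

```python
from datetime import date

def apply_scd2(history, incoming, key, tracked, as_of):
    """Close the current version of changed records and append a new version."""
    current = {row[key]: row for row in history if row["valid_to"] is None}
    for record in incoming:
        existing = current.get(record[key])
        if existing and all(existing[c] == record[c] for c in tracked):
            continue  # nothing changed, keep the open version
        if existing:
            existing["valid_to"] = as_of  # expire the old version
        history.append({**record, "valid_from": as_of, "valid_to": None})
    return history

# Hypothetical host dimension; column names are invented for illustration.
history = [{"host_id": 1, "is_superhost": False,
            "valid_from": date(2025, 1, 1), "valid_to": None}]
incoming = [{"host_id": 1, "is_superhost": True},
            {"host_id": 2, "is_superhost": False}]

history = apply_scd2(history, incoming, "host_id", ["is_superhost"], date(2025, 6, 1))
```

After the call, host 1 has one closed and one open version, and host 2 has a single open version. The open rows are the current state, and the closed rows are the history.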
FEATURED · Jun 2025
Cold Email Generator
Generative AI tool that extracts job details from career pages using NLP and RAG pipelines. Uses LangChain + LLaMA3 to generate personalized cold emails. ChromaDB for semantic retrieval improves content relevance by 35%. Deployed on Streamlit, reducing manual effort by 70%.
35% Relevance ↑ · 70% Automation
LangChain · ChromaDB · LLaMA3 · Streamlit · RAG · NLP
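The retrieval step of a RAG pipeline ranks stored snippets by similarity to the query, then passes the best matches to the LLM. The sketch below uses bag-of-words cosine similarity as a simple stand-in for the embedding search a vector store such as ChromaDB performs. It does not call LangChain or ChromaDB, and the snippets are invented.

```python
import math
import re
from collections import Counter

def vectorize(text):
    """Turn text into a bag-of-words term-count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(query, documents, k=1):
    """Return the k documents most similar to the query."""
    q = vectorize(query)
    ranked = sorted(documents, key=lambda d: cosine(q, vectorize(d)), reverse=True)
    return ranked[:k]

# Hypothetical portfolio snippets that could be attached to a generated email.
documents = [
    "Built ELT pipelines with Airflow, dbt and Snowflake",
    "Deployed a fraud detection model with Flask and MySQL",
    "Created sales dashboards in Excel and SQL",
]

top = retrieve("data engineer role requiring Airflow and Snowflake", documents)
```

Here the query shares the most terms with the first snippet, so that snippet is the one handed to the generation step.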
Jun 2025
Credit Card Fraud Detection System
ML-powered fraud detection engine integrated into a Flask application with automated transaction monitoring. Reduced false positives by 15%. Real-time Power BI dashboards track high-risk activity and model performance, enabling faster intervention.
92% Accuracy · 15% FP Reduction
Flask · Python · MySQL · ML · Power BI
Apr 2025
Wine Quality Prediction
ML pipeline predicting wine quality from physicochemical properties. Trained Random Forest, SVM, KNN — achieving 92% validation accuracy. Deployed via FastAPI processing 200+ queries/min. Data preprocessing time cut by 30%, with a Power BI Live dashboard improving reliability by 12% over baseline.
92% Accuracy · 30% Faster
Python · Scikit-learn · FastAPI · Power BI
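To illustrate what validation accuracy measures for one of the models named above, here is a tiny k-nearest-neighbours classifier in plain Python. It is a teaching sketch, not the Scikit-learn code used in the project. The two features and every data point are invented.

```python
import math
from collections import Counter

def knn_predict(train, point, k=3):
    """Predict the majority label among the k nearest training points."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], point))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def accuracy(train, validation, k=3):
    """Fraction of validation points whose predicted label matches the true label."""
    hits = sum(knn_predict(train, x, k) == y for x, y in validation)
    return hits / len(validation)

# Invented (alcohol, acidity) feature pairs with a quality label, for illustration only.
train = [((9.0, 0.70), "low"), ((9.5, 0.65), "low"), ((9.2, 0.80), "low"),
         ((12.5, 0.30), "high"), ((13.0, 0.35), "high"), ((12.8, 0.25), "high")]
validation = [((9.3, 0.72), "low"), ((12.9, 0.28), "high")]

score = accuracy(train, validation)
```

The features here are left unscaled to keep the sketch short. With real physicochemical data, features on larger scales would dominate the distance, so scaling would normally come first.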
Jan 2025
Social Media Fatigue Dashboard AI
End-to-end solution analyzing and predicting digital fatigue from social media usage. RESTful API using Flask integrated with an ML model for real-time predictions. Automated data pipeline to MySQL, with an interactive Power BI dashboard for non-technical stakeholders.
Predictive Analytics · Real-Time
Power BI · MySQL · Python · Flask · ML
2023
Pizza Sales Dashboard
Interactive dashboard using Excel and SQL to analyze pizza sales data. SQL used for ETL processes and Excel for dynamic, drill-down dashboards. Visualizes top-selling products, peak hours, and revenue trends with stakeholder-friendly insights.
Sales Insights · Drill-Down
Excel · SQL · Data Analysis
Let's connect to build
something meaningful.

Open to Data Engineering, AI/ML, and Data Analyst opportunities. Based in Gujarat, India — open to remote and relocation.