Piyush Kulkarni

Hamburg

Summary

I work hands-on with AI, machine learning, and data analytics, with a strong base in Python, SQL, and cloud platforms. I build practical, end-to-end solutions using real-world data, often involving NLP and generative AI, to solve business problems, automate workflows, and improve decision-making. I enjoy turning messy, complex data into clear insights and reliable systems that create measurable impact.

Overview

years of professional experience

Work History

Data Engineer (Working Student)

STATISTA GmbH

Hamburg, Germany

07.2025 - Current

Developed and maintained ETL pipelines for 20+ international data sources.
Built automated data sourcing scripts using Selenium and REST APIs to extract economic and statistical datasets from government agencies worldwide.
Implemented data lake architecture on AWS S3 with Bronze/Silver layer pattern for raw and processed data storage.
Processed large-scale datasets (files with millions of records) using pandas with chunked processing to optimize memory usage.
Created data quality validation and unit mapping transformations to standardize data formats across multiple sources.

Company Project

Galagos AI

Hamburg, Germany

03.2025 - 08.2025

Led a team of five in developing the 'Bioinformatics Semantic Search Engine,' focusing on intelligent tool discovery and workflow automation for bioinformatics applications.
Architected and implemented the Tool Discovery Agent using LangChain, integrating multiple data sources (MCP servers, EXA Search, ChromaDB, Smithery) to enable natural language-based bioinformatics tool recommendations.
Designed and developed the Self-RAG (self-refining retrieval-augmented generation) agent to iteratively improve search results by retrieving additional context and refining responses based on biomedical terminology.
Provided technical leadership across all system components including data ingestion pipelines, ChromaDB vector stores with biomedical embeddings, and external platform integrations, unblocking team members on complex architectural decisions.
Successfully delivered a prototype-ready semantic search platform that transforms complex bioinformatics queries into actionable tool and workflow recommendations using advanced LLM-based reasoning.
Skills: Python, LangChain, ChromaDB, Vector Databases, RAG, Semantic Search, NLP, LLM Integration, MCP Servers, API Integration, Team Leadership, Bioinformatics Tools, Agile Methodology.

Data Scientist

Quantum Innotek Solutions

12.2023 - 08.2024

Developed customized dictionary databases in MySQL, managed the data with SQLAlchemy, and automated backend tasks with Python and Flask.
Created and maintained a customized CI/CD pipeline, integrating Git for continuous deployment to Linode servers.
Built a dynamic web application utilizing Flask, JavaScript, Jinja, and SQL, incorporating a payment gateway for seamless transactions.
Created a matching algorithm using the KNN and FLAN algorithms.
Skills: SQL, Python, Machine Learning algorithms, Flask, CI/CD, Git, Payment Gateway Integration, Linode, Agile methodology.

Software Engineer

CIS IT Solutions Pvt Ltd

02.2023 - 12.2023

Contributed to the development of machine learning models for various clients, applying Python and NumPy to enhance prediction accuracy.
Collaborated with the research team on Big Data Analysis in Epidemiology, utilizing Power BI for data visualization and MS Excel for exploratory analysis, improving decision-making processes.
Developed applications using Microsoft Power Platform, including Power Apps, to automate workflows and improve client business operations, applying cloud computing skills with AWS services.
Skills: NumPy, Power BI, MS Excel, EDA, Microsoft Power Apps, AWS.

Education

Master of Science - Data Science & AI

SRH University of Heidelberg Campus Hamburg

Hamburg, Germany

09.2026

Bachelor of Technology - Computer Science & Engineering

MIT ADT University

Pune, India

03.2022

Profiles

LinkedIn

Additional Information

IoT Smoke Detection Data Pipeline (March 2025 to June 2025) - Link

Built a production-ready real-time data pipeline for IoT smoke detection using Apache Kafka, Python, Scikit-learn, Flask, PostgreSQL, Prometheus, Grafana, Docker and Apache Airflow.
Implemented streaming and batch ingestion to process live sensor data and historical datasets, with real-time anomaly detection and ML-based smoke prediction exposed via REST APIs.
Designed automated ML pipelines for feature engineering, model training (RandomForest/Logistic Regression), evaluation, deployment and continuous performance monitoring with drift checks.
Added comprehensive observability (custom Prometheus metrics, Grafana dashboards) and a robust test suite (unit, integration, API, streaming, performance) achieving ~95 percent overall coverage.

Restaurant Data ELT Pipeline (June 2025 to August 2025) - Link

Designed a production-style ELT pipeline for a multi-source restaurant dataset using Python, Dagster, DuckDB, Pandas and Azure Blob Storage.
Implemented a layered Medallion architecture (Bronze/Silver/Gold) to ingest raw CSV/JSONL data, clean and join entities, and build a central fact table in Parquet.
Developed SQL transformation logic in DuckDB to generate analytical marts for Average Order Value and tickets-per-order, ready for BI and reporting use.
Configured daily cron-based scheduling in Dagster and built a QA script to verify row counts and metric calculations from the final Parquet outputs.

Topic Modelling Using LDA (July 2021 to May 2022) - Link

Developed a Question-Answering (QA) model specifically designed for biomedical text summarization in the context of COVID-19.
Leveraged the PICO (Population, Intervention, Comparison & Outcome) framework to formulate questions and extract relevant information from biomedical texts related to COVID-19.
Utilized natural language processing (NLP) techniques to process and analyze biomedical text data.
Developed a framework for exploratory literature review using Latent Dirichlet Allocation (LDA) topic modeling.

Accomplishments

AWS Certified Cloud Practitioner
Copyright: Perspective Approach for Mango Ripening Classifiers
