Piyush Kulkarni

Hamburg

Summary

I work hands-on with AI, machine learning, and data analytics, with a strong base in Python, SQL, and cloud platforms. I build practical, end-to-end solutions using real-world data, often involving NLP and generative AI, to solve business problems, automate workflows, and improve decision-making. I enjoy turning messy, complex data into clear insights and reliable systems that create measurable impact.

Overview

3 years of professional experience

Work History

Data Engineer (Working Student)

STATISTA GmbH
Hamburg, Germany
07.2025 - Current
  • Developed and maintained ETL pipelines for 20+ international data sources.
  • Built automated data sourcing scripts using Selenium and REST APIs to extract economic and statistical datasets from government agencies worldwide.
  • Implemented data lake architecture on AWS S3 with Bronze/Silver layer pattern for raw and processed data storage.
  • Processed large-scale datasets (files with millions of records) using pandas with chunked processing to optimize memory usage.
  • Created data quality validation and unit mapping transformations to standardize data formats across multiple sources.

Company Project

Galagos AI
Hamburg, Germany
03.2025 - 08.2025
  • Led a team of five in developing the 'Bioinformatics Semantic Search Engine,' focusing on intelligent tool discovery and workflow automation for bioinformatics applications.
  • Architected and implemented the Tool Discovery Agent using LangChain, integrating multiple data sources (MCP servers, EXA Search, ChromaDB, Smithery) to enable natural language-based bioinformatics tool recommendations.
  • Designed and developed the Self-RAG (self-reflective retrieval-augmented generation) agent to iteratively improve search results by retrieving additional context and refining responses based on biomedical terminology.
  • Provided technical leadership across all system components including data ingestion pipelines, ChromaDB vector stores with biomedical embeddings, and external platform integrations, unblocking team members on complex architectural decisions.
  • Delivered a prototype-ready semantic search platform that turns complex bioinformatics queries into actionable tool and workflow recommendations using LLM-based reasoning.
  • Skills: Python, LangChain, ChromaDB, Vector Databases, RAG, Semantic Search, NLP, LLM Integration, MCP Servers, API Integration, Team Leadership, Bioinformatics Tools, Agile Methodology.

Data Scientist

Quantum Innotek Solutions
12.2023 - 08.2024
  • Developed customized dictionary databases in MySQL, managed the data through SQLAlchemy, and automated backend tasks with Python and Flask.
  • Created and maintained a customized CI/CD pipeline, integrating Git for continuous deployment to Linode servers.
  • Built a dynamic web application utilizing Flask, JavaScript, Jinja, and SQL, incorporating a payment gateway for seamless transactions.
  • Created a matching algorithm based on KNN and FLAN.
  • Skills: SQL, Python, Machine Learning algorithms, Flask, CI/CD, Git, Payment Gateway Integration, Linode, Agile methodology.

Software Engineer

CIS IT Solutions Pvt Ltd
02.2023 - 12.2023
  • Contributed to the development of machine learning models for various clients, applying Python and NumPy to enhance prediction accuracy.
  • Collaborated with the research team on Big Data Analysis in Epidemiology, using Power BI for data visualization and MS Excel for exploratory analysis to support decision-making.
  • Developed applications using Microsoft Power Platform, including Power Apps, to automate workflows and improve client business operations, and worked with AWS cloud services.
  • Skills: NumPy, Power BI, MS Excel, EDA, Microsoft Power Apps, AWS.

Education

Master of Science - Data Science & AI

SRH University of Heidelberg Campus Hamburg
Hamburg, Germany
09.2026

Bachelor of Technology - Computer Science & Engineering

MIT ADT University
Pune, India
03.2022

Profiles

LinkedIn

Additional Information

IoT Smoke Detection Data Pipeline (March 2025 to June 2025) - Link

  • Built a production-ready real-time data pipeline for IoT smoke detection using Apache Kafka, Python, Scikit-learn, Flask, PostgreSQL, Prometheus, Grafana, Docker and Apache Airflow.
  • Implemented streaming and batch ingestion to process live sensor data and historical datasets, with real-time anomaly detection and ML-based smoke prediction exposed via REST APIs.
  • Designed automated ML pipelines for feature engineering, model training (RandomForest/Logistic Regression), evaluation, deployment and continuous performance monitoring with drift checks.
  • Added observability (custom Prometheus metrics, Grafana dashboards) and a test suite (unit, integration, API, streaming, performance) achieving ~95% overall test coverage.

Restaurant Data ELT Pipeline (June 2025 to August 2025) - Link

  • Designed a production-style ELT pipeline for a multi-source restaurant dataset using Python, Dagster, DuckDB, Pandas and Azure Blob Storage.
  • Implemented a layered Medallion architecture (Bronze/Silver/Gold) to ingest raw CSV/JSONL data, clean and join entities, and build a central fact table in Parquet.
  • Developed SQL transformation logic in DuckDB to generate analytical marts for Average Order Value and tickets-per-order, ready for BI and reporting use.
  • Configured daily cron-based scheduling in Dagster and built a QA script to verify row counts and metric calculations from the final Parquet outputs.

Topic Modeling Using LDA (July 2021 to May 2022) - Link

  • Developed a Question-Answering (QA) model specifically designed for biomedical text summarization in the context of COVID-19.
  • Leveraged the PICO (Population, Intervention, Comparison & Outcome) framework to formulate questions and extract relevant information from biomedical texts related to COVID-19.
  • Utilized natural language processing (NLP) techniques to process and analyze biomedical text data.
  • Developed a framework for exploratory literature review using Latent Dirichlet Allocation (LDA) topic modeling.

Accomplishments

  • AWS Certified Cloud Practitioner
  • Copyright: Perspective Approach for Mango Ripening Classifiers
