Anatolie Gaina

Berlin

Summary

Experienced Data Scientist with 17 years of expertise in developing and implementing data-driven solutions across various industries, including automotive, aerospace, e-commerce, and finance. Highly skilled in programming and big data technologies, with 9 years of hands-on experience using Hadoop and Spark, and 3 years with Microsoft Azure and Databricks. Proficient in leveraging AI, machine learning, and generative AI techniques to drive business insights and innovation. Known for mentoring junior colleagues and fostering collaborative team environments. Proven track record in developing scalable applications for data analytics, processing complex datasets, and delivering actionable insights. Participated in Spark Summit Europe in 2015, 2017, and 2019.

Overview

10 years of professional experience

Work History

Senior Data Scientist

MSX International Inc.
11.2021 - 10.2024
  • Developed and optimized big-data pipelines and data models, resulting in a 12.4% improvement in data processing efficiency for car repair workshops.
  • Created a parts recommendation system that improved repair cost estimations by 6.98%, significantly enhancing operational performance and cost savings.
  • Designed and implemented generative AI techniques to automate data augmentation processes, leading to a [specific %] increase in model accuracy and robustness, while reducing manual intervention.
  • Led the development of a novel clustering tool, empowering business analysts to identify and rank clusters based on objective business criteria, streamlining data-driven decision-making.
  • Delivered end-to-end data science solutions, from problem identification and solution design to full implementation, improving both business and technical outcomes.
  • Collaborated with cross-functional teams to deliver high-impact projects under tight deadlines, recognized for problem-solving, initiative, and resilience in high-pressure environments.
  • Mentored junior data scientists, fostering a culture of collaboration and continuous learning, while leading efforts to standardize data science best practices within the team.
  • Praised for strong programming skills and ownership of complex data science tasks, delivering consistent value through exploratory data analysis, modeling, and AI-driven automation.

Senior Data Engineer

Thinkport GmbH
05.2021 - 07.2021
  • Migrated structured data from relational databases (e.g., SQL Server) to a Hadoop-based Data Lake within an AWS environment, improving scalability and accessibility for large-scale data processing.
  • Developed ETL processing applications, automating data ingestion and transformation, which enhanced data workflow efficiency and ensured seamless integration of large datasets.
  • Earned the AWS Certified Cloud Practitioner and AWS Certified Machine Learning - Specialty certifications, deepening expertise in cloud-based data solutions and machine learning on AWS.

Freelancer / Big Data Developer

Home Office
01.2020 - 04.2021
  • Designed and built a modern big data analytics platform, enabling the processing and analysis of large datasets of numerical, categorical, textual, and mixed data types.
  • Developed custom algorithms for the automatic labeling of diverse data types, streamlining data preprocessing and improving the efficiency of machine learning workflows.
  • Applied generative AI techniques to create synthetic data for training and testing purposes, significantly enhancing the robustness and generalizability of data models, leading to improved performance in real-world scenarios.
  • Collaborated with clients remotely, delivering scalable data analytics solutions tailored to specific business needs, supporting decision-making and innovation in various industries.

Senior Data Scientist / Big Data Developer

KPMG AG Wirtschaftsprüfungsgesellschaft
08.2017 - 12.2019
  • Automated key audit processes, reducing manual effort by 7.9%, which significantly improved the efficiency and accuracy of audits.
  • Performed advanced text analytics on large audit document corpora, extracting key insights and automating the identification of relevant information, resulting in a considerable reduction in audit processing time.
  • Developed custom clustering techniques to group audit cases based on similarity, enabling more efficient decision-making and enhanced audit quality.
  • Applied generative AI principles to synthesize audit data, generating diverse scenarios for better model validation and more accurate auditing outcomes.
  • Led the implementation of distributed algorithms using Apache Spark, solving complex data problems involving large and high-dimensional datasets, which improved audit automation and data analysis processes.
  • Mentored junior data scientists, sharing knowledge on advanced mathematical and algorithmic concepts, contributing to a collaborative and productive team environment.
  • Praised for technical expertise in solving domain-specific engineering challenges, with a skill set described as "robust to the current LLM hype" by senior management.

Senior Data Scientist / Big Data Engineer

Altran Deutschland S.A.S. & Co. KG
09.2016 - 06.2017
  • Analyzed flight data for Airbus Civil, developing high-accuracy probabilistic models that substituted complex engineering calculations, accelerating aircraft development timelines.
  • Migrated large-scale datasets from traditional data sources (SQL Server, SAP, SAP HANA) to a Hadoop-based Data Lake, improving data accessibility and enabling high-volume analytics for engineering teams.
  • Developed clustering models to analyze load-stress data (moments and forces) for different wing elements at various flight stages, contributing to more efficient design and performance testing.
  • Performed structural capability envelope analysis using advanced techniques such as pipeline clustering, decision trees, and multiple linear regression, providing engineers with actionable insights into structural performance.
  • Predicted Reserve Factors using Random Forest classification, allowing the engineering team to make data-driven decisions that enhanced safety and minimized risks during the development of new aircraft.
  • Built and maintained a scalable big data lake, developing multiple ETL pipelines to ensure seamless data migration and consistency between relational databases and the data lake.
  • Collaborated with cross-functional engineering teams to deliver data-driven insights that improved aircraft design, safety, and performance, supporting critical decisions throughout the development cycle.

Data Scientist / Big Data Developer

Exactag GmbH
01.2016 - 07.2016
  • Developed predictive models to estimate the probability of customer purchases based on their behavior in online journeys, contributing to improved targeting strategies and increased conversion rates in e-commerce.
  • Analyzed large datasets from e-commerce and web advertisement platforms, identifying key patterns and actionable insights that optimized customer acquisition and marketing efforts.
  • Performed advanced data analysis on categorical and mixed data, uncovering trends and customer behaviors that informed decision-making across marketing campaigns.
  • Conducted correlation analysis on heterogeneous data, revealing relationships between different customer segments, product interactions, and web journeys, enhancing overall business intelligence.
  • Addressed missing data issues by applying imputation techniques, ensuring the accuracy and reliability of predictive models and data-driven decisions.

Data Scientist / Big Data Developer

AG Computer SRL
08.2014 - 01.2016
  • Led biomedical data analysis projects, including the detection of stress levels in truck drivers through ECG data analysis, and the development of optimal treatment strategies for various diseases using supervised learning techniques.
  • Developed and implemented the MaltClusterer algorithm in Spark, optimizing clustering processes for large-scale datasets and improving the efficiency of pattern recognition across biomedical and other industries.
  • Designed the architecture for a Big Data Mining Portal, facilitating the processing, analysis, and visualization of vast amounts of data for research and business applications.
  • Created specifications for a General Scheme of Clustering, standardizing clustering processes to streamline data analysis workflows and improve consistency in data-driven insights.

Education

Professional Certificate - AWS Certified Cloud Practitioner

AWS
Online
07.2021

Professional Certificate - AWS Certified Machine Learning - Specialty

AWS
Online
07.2021

Certificate - Data Mining

University of Illinois at Urbana-Champaign
Champaign and Urbana
2019

Certificate - Executive Data Science

Johns Hopkins University
Baltimore
2016

PhD Candidate - Pattern Recognition and Machine Learning

Moldavian Academy of Sciences
Moldova
1989

Master of Science - Mathematics - Numerical Analysis

Moldova State University
Moldova
09.1982

Skills

  • Generative AI: In-depth understanding of generative AI concepts applied in data synthesis, augmentation, and predictive modeling. Proficient with generative model architectures such as generative adversarial networks (GANs) and variational autoencoders (VAEs)
  • Data Science and Big Data: Expert in clustering, classification, regression, outlier detection, and fraud detection. Strong experience with dimensionality reduction and parallel computing, particularly using Apache Spark
  • Machine Learning & Data Mining Tools: Extensive experience with MLlib, PySpark, R, pandas, scikit-learn, and SPSS for building and optimizing machine learning models
  • Deep Learning & NLP Frameworks: Skilled in PyTorch, TensorFlow, Spark NLP, and Gensim for developing deep learning models and natural language processing applications
  • Programming Languages: Highly proficient in Scala, Python, SQL, Java, and Delphi, with experience in developing scalable applications
  • Big Data Sources & Databases: Experienced with HDFS, Hive, Impala, Apache Parquet, ORC, as well as SQL databases such as Oracle, SQL Server, PostgreSQL, and MySQL
  • Cloud Platforms & Big Data Frameworks: Expertise in AWS, Microsoft Azure, Databricks, and Cloudera for deploying and managing large-scale data solutions
  • Business Intelligence Tools: Proficient in Power BI, Tableau, and QlikView for data visualization and reporting

Objective

My main objective is to improve real businesses by creating useful tools and solutions. My end products are scalable applications that perform big data analysis. I can translate business requirements into technical specifications, distribute tasks across the team, and implement the most complex parts of a project. I propose solutions that combine fundamental mathematics, modern algorithms, and my own heuristic methods. My programming skills are on par with my data science skills. I have extensive practical knowledge that I want to share with my colleagues.

Languages

  • English (professional proficiency)
  • German (B2)
  • French (B2)
  • Russian (C2)
  • Romanian (mother tongue)

Personal Information

  • Married, 2 children.
  • EU citizenship (Romania).

Target Job

Senior Data Scientist / Team Leader specializing in Big Data, Machine Learning, and AI-driven solutions for scalable business applications.

Objective

To drive business transformation by creating scalable, AI-powered solutions that leverage both traditional and generative AI techniques. I specialize in translating business requirements into technical implementations, leading teams to deliver high-impact projects. My expertise in combining advanced mathematics, modern algorithms, and practical programming allows me to develop applications that perform complex big data analysis and provide actionable insights. I am eager to share my experience and collaborate on innovative projects.

Endorsements

From Dr. Hendrik Thiel, KPMG
"Anatol was part of the data science team that I managed from fall 2017 to March 2020. His work focused on analyzing domain-specific text corpora and solving various engineering problems that arose before transformer models like BERT and its descendants became industry-ready. Anatol possesses a deep understanding of advanced mathematical and algorithmic concepts, which he skillfully applies to solve complex data problems. In summary, Anatol is a highly experienced data scientist with a skill set that is, casually speaking, largely 'robust to the current LLM hype.'"

From MSX International
"Anatol is responsible for advanced analytics, model building, and leveraging data as a corporate asset. He is praised for his specialist knowledge, problem-solving skills, initiative, and resilience under stress. His contributions include creating a novel clustering tool and supporting exploratory data analysis and modeling. Anatol is a great team player, always ready to help, and engaged with the group."
