Yves Gaetan Nana Teukam

Summary

Innovative and detail-oriented Data Scientist with 4 years of experience in developing AI/ML-based tools for complex biological problems. Soon to complete a PhD in Language Modeling for Protein Design at IBM Research Zürich and Eindhoven University of Technology. Expert in machine learning, generative modeling, NLP, and bioinformatics, with a proven track record of enhancing model performance and optimizing biomolecules for diverse applications. Skilled in compiling, transforming, and analyzing complex datasets using advanced statistical and computational techniques. Demonstrated success in mentoring students, contributing to open-source projects, and advancing scientific research. Fluent in English, French, Italian, and Spanish, with a strong commitment to driving innovative projects and achieving team goals.

Overview

5 years of professional experience

Work History

Pre Doctoral Researcher

IBM Research - Zürich, Switzerland
01.2022 - Current
  • Developed Enzeptional, an AI/ML-based computational tool for biological process modeling
  • Leveraged large language models and evolutionary algorithms to optimize biomolecules for different processes from drug design to green chemistry
  • Contributed to GT4SD, an open source library for training and fine-tuning generative models to accelerate scientific discovery
  • Developed RXNAAMapper for protein active sites predictions, leveraging language models (BERT family)
  • Improved model performance from 40% to 52% while reducing the false positive rate by 30%.

Research Intern

IBM Research - Zürich, Switzerland
02.2021 - 07.2021
  • Contributed to an approach for synthesis planning that integrates biocatalysis with data-driven learning for more efficient and sustainable chemical synthesis, achieving a top-1 accuracy of 49.6% in forward predictions
  • Used Python tools for data analysis and modeling, including Pandas, NumPy, SciPy, TensorFlow, Keras, and Biopython.

Data Science and Bioinformatics Project Lead

StemAway - California, USA
05.2020 - 09.2020
  • Led a group of 30 students from different countries and academic backgrounds through the stages of gene expression analysis, from data collection to coding in Python and R
  • By the end of the summer, more than 75% of the students were able to independently query public databases and perform gene expression analysis using R and Python.

Research Intern

Sequentia Biotech - Barcelona, Spain
04.2019 - 07.2019
  • Achieved 90% accuracy in classifying individual microbiomes by applying machine learning techniques to large-scale biological datasets
  • Conducted human gut microbiome analysis using bioinformatics tools.

Education

PhD in Language Modelling for Protein Design

IBM Research Zürich & Eindhoven University of Technology
03.2025

Master's Degree in Data Science

University of Rome La Sapienza - Rome, Italy
10.2021

Bachelor's Degree in Bioinformatics

University of Rome La Sapienza - Rome, Italy
06.2019

Erasmus Exchange Program

ESCI-UPF - Barcelona, Spain
02.2019

Skills

  • Python, R, Bash, Linux
  • Git, GitHub, GitLab
  • Statistical Analysis, Data Mining, Visualization
  • Machine Learning, Deep Learning
  • Generative Modeling, NLP
  • Protein Optimization, Bioinformatics
  • Molecular Dynamics (Gromacs)
  • Computational Biology, Evolutionary Algorithms
  • Data Processing, Analysis
  • Object-Oriented Programming
  • Experiment Design, Scientific Writing
  • Experiment Tracking (MLflow, Weights & Biases)

Personal Information

  • Date of Birth: 06/25/1996
  • Nationality: Cameroonian
  • Work Permit: B

Awards

  • 1st IEEE Open Software Service Awards, as part of the GT4SD team, 2023
  • Sandmeyer Award of the Swiss Chemical Society, as part of the RXN for Chemistry project team, 2022

Publications

  • Generative Toolkit for Scientific Discovery, Manica M., Cadow J., Christofidellis D., Dave A., Bon J., Clarke., Nana Teukam Y.G., npj Comput Mater, 9, 69, 2023, https://doi.org/10.48550/arXiv.2207.03928
  • Language models can identify enzymatic active sites in protein sequences, Nana Teukam Y.G., Kwate Dassi L., Manica M., Probst D., Schwaller P., Laino T., ChemRxiv, 2023, https://doi.org/10.26434/chemrxiv-2021-m20gg-v2 (Preprint)
  • Biocatalysed synthesis planning using data-driven learning, Probst D., Manica M., Nana Teukam Y.G., Nat Commun, 13, 964, 2022, https://doi.org/10.1038/s41467-022-28536-w
