Mohd Saif Khan

Bonn

Summary

I'm a data scientist with 5 years of experience and a strong background in machine learning and software development. I analyze data, build visualizations, and develop predictive models, and I have led projects in Natural Language Processing (NLP), Large Language Models (LLMs), and deep learning.

Overview

9 years of professional experience

Work History

Machine Learning and NLP Engineer

SER group
05.2022 - Current
  • Optimized data annotation processes, improving the efficiency of annotation and the quality of the training datasets used for model development.
  • Fine-tuned and deployed NLP models using libraries such as Hugging Face, Flair, spaCy, and NLTK.
  • Deployed open-source LLMs such as Falcon, Llama 2, and Mistral.
  • Engineered and deployed a no-code system for training and deploying deep learning models on customer data.
  • Implemented Retrieval-Augmented Generation (RAG) pipelines using LangChain.

Master's Thesis and Working Student

German Aerospace Center, DLR
06.2021 - 03.2022
  • Wrote a master's thesis on classifying and tagging sounds with audio analysis and machine learning techniques to study their impact on sleep.
  • Developed a machine learning pipeline that automates the classification and tagging of audio data, eliminating the need for manual intervention.
  • Achieved significant time savings by automating the audio analysis and tagging process.
  • Optimized data collection methods to reduce analysis errors and improve data accuracy.

Student – Data Scientist

Berlin University of Economics and Law
01.2021 - 03.2022
  • Cleaned datasets and built transformer-based NLP models to classify and summarize text sequences.
  • SumExp: a summary-based Explainable AI framework for explaining the output of NLP models in classification tasks (published at ICIS).

Working Student – Software Developer

Friedrich-Schiller-Universität Jena
11.2020 - 06.2021
  • Developed a Docker cluster to simplify the initialization and startup of the ODK applications.
  • Added support for semantic ontologies to the ODK and KOBO platforms to make collected data reusable.

Software Developer

Salesforce
08.2019 - 02.2020
  • Developed a new ‘Cleanup Jobs Framework’ for the Commerce Cloud application to improve database performance.
  • Built the framework in Spring using an in-house ORM layer.

Data Scientist and Software Developer

TATA Consultancy Services
07.2015 - 02.2019
  • Built and deployed an NLP-powered sales dashboard that uses Named-Entity Recognition (NER) to support natural-language search queries.
  • Created REST microservices in Spring Boot to integrate the sales dashboard with Cassandra DB and Apache Lucene.
  • Led a team that used Oracle SOA Suite to connect distributed systems via SOAP, serving millions of users.
  • Developed pan-India modules with Oracle DB 11g, PL/SQL, and Oracle CRM that reached thousands of users.

Education

Master of Science - Human-Computer Interaction

Bauhaus University
Weimar, Germany
03.2022

Bachelor of Science - Electronics and Communication

Jamia Millia Islamia
New Delhi, India
06.2015

Skills

  • Machine Learning: Exploratory Data Analysis, Natural Language Processing, Large Language Models (OpenAI and open-source), Retrieval-Augmented Generation (RAG)
  • ML Tools: MLflow, DVC, Airflow, Apache Spark
  • Programming Languages: Python, Java, C, PL/SQL
  • Cloud Services: AWS, Azure
  • ML Frameworks: PyTorch, scikit-learn, spaCy, Flair, Hugging Face, OpenAI, LangChain
  • Databases: Oracle 12c, PostgreSQL, MongoDB
  • Vector Databases: Chroma, Elasticsearch, pgvector
  • Others: Kubernetes, Git, Docker, HPC, Kanban, JIRA

Languages

  • English (Business Fluent)
  • German (A2 level)

Additional Information

  • Hiking
  • Basketball
  • Reading Books
  • Calligraphy
