Gwang Jin (Josephus) Kim - Data Scientist / Applied AI, Automation & Data Systems Researcher
Experience
Data Scientist / Applied AI, Automation & Data Systems Researcher
Independent
- Built and explored applied GenAI, RAG, GraphRAG, local LLM, agentic AI and document-intelligence prototypes for structured analysis, evidence extraction, semantic search, technical reasoning and decision-useful reporting
- Developed private local-LLM workflows and AI system patterns focused on privacy, reproducibility, reviewability, low-cost inference and practical user control
- Built reproducible Python/R workflows for data analysis, automation, API-driven tooling, validation logic, technical documentation and AI-assisted software development
- Designed workflows around explicit assumptions, traceable inputs, reviewable outputs and failure-mode awareness rather than black-box “looks good” demonstrations
- Supported RAHN AG in a chemical/regulatory environment with data extraction and processing around WERCS, a regulatory application for chemical product and compliance data
- Explored complex application/database schemas and wrote nested SQL queries to extract information for mixture calculations, component relationships, regulatory rules and reporting logic
- Continued hands-on development in Git/GitHub/GitLab/Bitbucket, Docker/Linux deployment patterns, REST/API workflows, error handling, technical writing and fast AI-assisted prototyping
- Built technical writing and documentation workflows that turn complex systems into clear runbooks, checklists, decision notes and user-facing explanations
Associate Expert Digital Solutions
Novartis AG (via Actalent / Allegis Group)
- Built machine-learning and deep-learning models for bioprocess and spectral data, including CNN-based prediction of analytical readouts from Raman/spectral inputs under realistic small-data constraints
- Designed validation logic, compared model behavior, analyzed prediction errors and treated model failures as useful signals for data, preprocessing and experimental limitations
- Developed Python, Streamlit, Flask and REST-API-based tools for process monitoring, data retrieval, visualization, reporting and operational decision support
- Created a Python wrapper for an internal REST API, making live process and bioreactor data easier to access, inspect, analyze, reuse and explain
- Automated data extraction, cleaning, reporting, backup and monitoring workflows using Python, PowerShell, Bash, SQL and structured documentation
- Worked in a regulated biopharma R&D / antibody-production environment where model outputs, dashboards and data workflows had to be understandable, traceable and useful to technical users
- Supported database-backed application/data workflows around Lucullus, a bioreactor software system used to collect sensor data and support on-site bioreactor operations
- Worked closely with wet-lab, dry-lab, IT, automation and support teams to clarify workflow problems, validate data behavior and convert analytical needs into usable tools
- Helped colleagues and senior data scientists with R/Python scripting, deep learning, model interpretation, practical validation and translating modeling ideas into workflows others could use
Postdoctoral Bioinformatician / Data Scientist
University of Freiburg, SFB 992 “Medical Epigenetics”
- Led end-to-end analysis of large biomedical datasets, including single-cell RNA-seq, bulk RNA-seq, ChIP-seq and ATAC-seq: QC, preprocessing, differential analysis, annotation, visualization, interpretation and publication support
- Built reproducible Linux/HPC pipelines for large datasets using Bash, R/Bioconductor, Python, Git, conda environments, Make-style automation and job schedulers
- Processed structured and semi-structured data from sample sheets, genome annotations, result tables, metadata, large text/annotation files and tool outputs
- Developed and evaluated ML-oriented approaches for genomics questions, including feature engineering, dimensionality reduction, statistical validation, failure-mode analysis and careful interpretation
- Designed analysis strategies where conclusions had to survive QC, biological plausibility checks, alternative explanations and collaborator review
- Advised professors, postdocs, PhD students, MSc students and medical students on analysis design, validity limits, reproducibility, troubleshooting and defensible interpretation
- Wrote practical HPC, pipeline and workflow documentation that improved usability beyond my own immediate research group
- Contributed to peer-reviewed publications, including work featured as a Nature Cell Biology cover
Molecular Biology Researcher – PhD, Molecular Medicine
University of Freiburg
- Conducted publication-grade research in human genetics, developmental biology, disease mechanisms, regulatory genomics, molecular biology and experimental model systems
- Worked with structured experimental documentation, sequence/annotation resources, experimental datasets, control logic, failure sources and reproducible interpretation
- Built the scientific foundation I still use in ML work: experimental design, controls, causality, confounding, data quality, failure modes and cautious interpretation
- Worked on gene regulation, Dicer/SOX9/SOX8-related biology and genetic disease contexts connected to high-impact publications
Master's Student and Student Assistant, Molecular Medicine
University of Freiburg
- Built early quantitative and experimental discipline through biomathematics / biostatistics, qRT-PCR, genotyping, recombinant virology, immunohistology, primary cell culture and structured data interpretation
- Worked in genetics, internal medicine and virology research environments with a strong focus on evidence, controls, data quality and reproducibility
Industry Experience
Experienced in Education, Biotechnology, Chemical, Information Technology, and Pharmaceutical.
Business Area Experience
Experienced in Research and Development, Information Technology, and Quality Assurance.
Summary
Applied Machine Learning Engineer and PhD-trained natural scientist with 10+ years of data-intensive modeling, scientific computing, automation and machine-learning experience, including direct Novartis R&D work in a regulated biopharma environment. Strong hands-on background in Python, deep learning, CNNs, model evaluation, data pipelines, experimental design, failure analysis, scientific validation and production-adjacent digital tooling.
Strong fit for Vision-Language Model / multimodal AI projects where models must be evaluated carefully, improved iteratively and connected to real-world use cases rather than treated as impressive demos. My core strength is the combination of ML implementation, data-centric experimentation, benchmark design, robustness thinking, and the scientific habit of asking why a model fails before claiming that it works.
For FRATCH’s conversational driving / parking use case, I would position myself as an applied ML builder who can support VLM-based solutions, datasets, evaluation frameworks, prompting / fine-tuning strategies, failure analysis, automated testing, and collaboration with integration or domain engineers. I have not worked in automotive ADAS directly, but I have repeatedly worked where sensor-like data, complex systems, model reliability and domain constraints must meet.
Skills
Applied ML / Deep Learning: Supervised Learning, CNNs, Classification, Regression, Model Adaptation, Model Evaluation, Failure Analysis, Error Taxonomy, Small-Data Modeling, Robustness Checks and Cautious Interpretation
Vision / Multimodal AI Readiness: Computer-Vision Fundamentals, Image/Signal-Style Data Thinking, Multimodal AI, Vision-Language Models, Foundation-Model Evaluation, Prompt-Based Adaptation, Fine-Tuning Concepts and Model-Output Validation
Evaluation and Experimentation: Benchmark Design, Experiment Tracking Discipline, Ablation-Style Thinking, Performance Metrics, Edge-Case Discovery, Failure-Mode Analysis, Model Comparison and Data-Centric Improvement Loops
Data Engineering for ML: Dataset Construction, Preprocessing, Annotation-Aware Thinking, Data Validation, Feature Engineering, Versioned Inputs, Reproducible Pipelines, Automated Reporting and Quality-Control Gates
Python ML Implementation: Python, PyTorch, TensorFlow/Keras, scikit-learn, XGBoost, pandas/NumPy, Jupyter, Streamlit/Flask, REST APIs, Git, Linux and Reproducible Environments
LLM / GenAI Systems: LLMs, RAG, GraphRAG, Embeddings, Document Intelligence, Prompt Engineering, Context Engineering, Structured Outputs, Hallucination Reduction and Reviewable AI Workflows
Real-World Model Reliability: Translating Ambiguous Domain Requirements Into Measurable Tests, Identifying Data Gaps, Documenting Assumptions, Explaining Uncertainty and Building Tools Others Can Inspect and Reuse
Cross-Functional Delivery: Working With Scientists, Engineers, IT/Support Teams and Domain Experts; Translating Domain Needs Into Technical Concepts, Experiments, Reports and Usable Tools
Machine Learning / Deep Learning: Deep Learning, CNNs, Classification, Regression, Supervised Learning, Model Validation, Error Analysis, Robustness Checks, Explainability, Uncertainty Communication, Small-Data Modeling, scikit-learn, XGBoost, PyTorch, TensorFlow/Keras
Vision / Multimodal AI: Vision-Language Models, Multimodal AI, Computer-Vision Fundamentals, Image/Signal-Style Data, Foundation Models, Prompt-Based Adaptation, Fine-Tuning Concepts, Model-Output Evaluation, Edge-Case Analysis
Evaluation / Experimentation: Benchmark Design, Datasets for Model Assessment, Experiment Design, Metrics, Failure Taxonomies, Data-Centric Improvement, Validation Gates, Reproducible Reports, Automated Evaluation Concepts
Data Engineering: Dataset Construction, Data Cleaning, Preprocessing, Annotation-Aware Workflows, Metadata Handling, Feature Engineering, Data Quality, Versioned Inputs, Pipelines, Structured Exports, Monitoring-Ready Workflows
Programming: Python, R, SQL, Bash, PowerShell, Git, Linux, macOS, Windows, Common Lisp, Julia Exposure, JavaScript Exposure
Python ML/Data Stack: pandas, NumPy, SciPy, Matplotlib, Plotly, Jupyter, scikit-learn, XGBoost, PyTorch, TensorFlow/Keras, Streamlit, Flask, REST APIs
LLM / GenAI Systems: LLMs, GenAI, RAG, GraphRAG, Embeddings, Semantic Search, Prompt Engineering, Context Engineering, Local LLMs, Document Intelligence, Structured Outputs, Hallucination Reduction, AI Evaluation
Databases / Structured Data: SQL, Relational Databases, PostgreSQL/MySQL/MariaDB, SQL Server Exposure, Schema Exploration, Joins, Nested SQL, Data Extraction, Data-Quality Checks and Reporting Workflows
Deployment-Adjacent Tooling: Docker Exposure, Linux Deployment Patterns, Reproducible Environments, REST/API Integration, Logging-Oriented Thinking, Monitoring Logic, Technical Documentation, CI/CD Exposure
Languages
Education
University of Freiburg
PhD, Molecular Medicine · Freiburg im Breisgau, Germany
University of Freiburg
MSc / Diploma, Molecular Medicine · Freiburg im Breisgau, Germany
Certifications & licenses
Agile Software Development: Scrum for Developers
LinkedIn Learning
Data Analysis with Python
freeCodeCamp
JavaScript Algorithms and Data Structures
freeCodeCamp
Neo4j Graph Data Science Certification
Neo4j
Scientific Computing with Python
freeCodeCamp
