A Little About Me

Welcome to my personal website!

I am a fourth-year CS PhD student in Kim’s lab at the University of Toronto, specializing in AI for science. My research focuses on using generative AI to design pharmaceutical peptide and protein drugs. My projects include D-peptide drugs and epitope-specific antibodies, with applications such as wet-lab-validated GLP-1 agonists and SARS-CoV-2 treatments.

My work involves integrating computational and experimental approaches to tackle critical biomedical problems. Over the past three years, I have developed a range of deep learning models for computational biology using PyTorch and TensorFlow, including flow-matching models, score-based diffusion models, generative adversarial networks, and transformers, as well as reinforcement learning methods. I apply these generative AI methods to design and validate novel therapeutics, with the aim of improving patient care and therapy development.

Check out my publication page for more details.

Talks & Research Updates

Check out my new paper on using a diffusion model for peptide drug design, as well as our artwork, which was selected as the cover figure for ACS Central Science (IF 18.2). On the right is my recorded virtual talk at the MLSB workshop at NeurIPS on peptide design using a generative model (HelixGAN).

             


Check out my recent work on a flow-matching model that uses E3NN to directly monitor atom movements. Flow Matching (FM) is a recent generative modeling approach that is gaining popularity in the ML community. It combines elements of flow models and diffusion models (DMs) while addressing their drawbacks. The GIF shows the flow-matching process inside the HelixFlow model (14, 16, 18, and 20 AA).

            

          


A practice talk from May 2024 covering three of my recent works (HelixDiff, HelixFlow, and ABGM), and a visualization for a new antibody project using flow matching.

Watch the video          


Resume

Xuezhi Xie


PhD student, Department of Computer Science, University of Toronto

Education


  • Ph.D. in Computer Science, University of Toronto, 2020 - 2024 (expected)
  • M.S. in Computer Science, specialization in Artificial Intelligence, Western University, 2018 - 2020
  • B.S. in Biology, Minor in Computer Science, University of Waterloo, 2012 - 2016

Skills


  • Languages: Java, Python, C, C#, C++, JavaScript
  • Software: Eclipse, Jupyter, Visual Studio/Code
  • Database: SQL, MongoDB
  • Frameworks: Keras, PyTorch, TensorFlow, PySpark, Hadoop MapReduce
  • System: Linux, Windows
  • Version Control: Git/GitHub
  • Machine learning: Proficient in deep generative models (diffusion models, GAN, VAE), CNN, LSTM, logistic regression, support vector machines, decision trees, random forests, GBDT, naive Bayes, k-means, PCA, and collaborative filtering.
  • Computer vision: Skilled in image processing (OpenCV), object detection, object segmentation, mapping and localization, and SLAM.
  • Natural language processing: Skilled in text classification, sentiment analysis, and speech recognition.

Work experience


  • Research assistant (machine learning) - University of Toronto, Toronto, ON, Canada, 2020 - present
    • Developed several deep generative models for target-specific drug design, including score-based diffusion models, GANs, and VAEs.
    • Developed and implemented various search and inpainting methods to optimize synthetic data for pharmaceutical properties.
    • Supervisor: Professor Philip M. Kim
  • AI developer & Research assistant - Kaizhong’s lab, Western University, London, ON, Canada, Sep. 2018 - Apr. 2020
    • Collected peptide data from online databases (IEDB, IPD-MHC). Preprocessed and analyzed the data using PySpark.
    • Implemented and compared decision tree- and neural network-based models for predicting peptide binding.
    • Designed and developed a novel CNN-LSTM model for an MHC-ligand binding classification task. Achieved state-of-the-art performance (AUC: 92.3%).
    • Published the work as first author in IEEE-BIBM (2019) and IJDMB (2020).
    • Supervisor: Professor Kaizhong Zhang

Projects


  • Flow-matching model for antibody design (Jan. 2024 - Present)
    • Designed an equivariant flow-matching model for full-atom antibody design and developed an active inpainting model for design conditioned on the antigen.
  • Diffusion model for antigen-specific antibody design (Jan. 2022 - Dec. 2023)
    • Developed a score-based diffusion model named AntibodySGM for antigen-specific antibody design and achieved state-of-the-art performance compared with current deep learning models.
    • Developed and implemented a novel CDR-inpainting module for antigen-specific antibody optimization. Published the work as first author at the Computational Biology Workshop at ICML 2023. Link for my paper
    • Created an online visualization tool for peptides and proteins, which also includes the generated antibody data. Link for my visualization tool
  • Deep generative model with constraints for D-peptide drug design (Jan. 2019 - Jan. 2023)
    • Developed a GAN-based deep generative model that generates novel dextrorotary helical conformations to facilitate peptide drug design.
    • Developed and implemented various latent-space search methods to obtain synthetic data with optimized pharmaceutical properties. Published the work as first author in both Bioinformatics (2023) Link for my paper and the NeurIPS Workshop (MLSB) 2021. Link for my talk
  • Cell detection application for pathological image analysis (Jan. 2019 - Apr. 2020)
    • Designed and developed customized object detection algorithms, based on deep convolutional neural networks, to detect brain cells and mitotic cells in segmented pathology images and whole-slide images.
    • Designed and developed a Ki-67 cell segmentation and detection algorithm based on Cellpose and image classification.
  • Recommendation system for prediction of user purchase behavior (Jan. 2019 - Mar. 2019)
    • Split the data into training, validation, and test sets. Constructed features using Pandas. Handled the imbalance between positive and negative samples using k-means and subsampling.
    • Used a Gradient Boosting Decision Tree to predict user purchase behavior, covering model training, parameter tuning, and performance evaluation with the F1 score. Ranked 4th among 135 submitting teams (Kaggle leaderboard link).
  • Data Mining for Twitter Unstructured Data (Sep. 2018 - Dec. 2018)
    • Mined information from Twitter tweets (JSON). Managed complex, unstructured data using MongoDB. Used MapReduce to process and summarize the information. Ran searches with Elasticsearch and visualized the results using Kibana.

Publications (First Author only)

Journals

  • Xie et al. “HelixDiff, a Score-Based Diffusion Model for Generating All-Atom α-Helical Structures” (link), ACS Central Science (IF 18.2), 2024. Link for my paper

  • Xie et al. “HelixGAN a deep-learning methodology for conditional de novo design of α-helix structures” (link), Bioinformatics, 2023. Link for my paper

  • Xie et al. “MHCherryPan, a novel pan-specific model for binding affinity prediction of class I HLA-peptide” (link), Int. J. Data Mining and Bioinformatics, Vol. 24, No. 3, 2020. Link for my paper

Conferences & workshops

  • Xie et al. “HelixDiff: Conditional Full-atom Design of Peptides With Diffusion Models” (link), Machine Learning for Structural Biology (MLSB) Workshop at NeurIPS 2023. Link for my paper

  • Xie et al. “Antibody-SGM: Antigen-Specific Joint Design of Antibody Sequence and Structure using Diffusion Models”, Computational Biology Workshop at ICML 2023. Link for my paper

  • Xie et al. “HelixGAN: A bidirectional Generative Adversarial Network with search in latent space for generation under constraints” (link), Machine Learning for Structural Biology (MLSB) Workshop at NeurIPS 2021. Link for my paper Link for my talk

  • Xie et al. “MHCherryPan, a novel model to predict the binding affinity of pan-specific class I HLA-peptide” (link), IEEE International Conference on Bioinformatics and Biomedicine (IEEE BIBM) 2019. Link for my paper