DiffModeler Tutorial

Overview

DiffModeler is a computational tool that uses a diffusion model to automatically build full protein complex structures from cryo-EM maps at 0-20 Å resolution.

Overall Pipeline

  1. Backbone tracing from cryo-EM maps at intermediate resolution via a diffusion model.
  2. Single-chain structure prediction by AlphaFold.
  3. Single-chain structure fitting using VESPER.
  4. Protein complex modeling by assembly algorithms.

Input Files

DiffModeler:
  1. 3D cryo-EM map in .mrc format
  2. zipped single-chain pdb files in .zip/.tar.gz/.tar format
  3. pdb config files in .txt format
DiffModeler(seq):
  1. 3D cryo-EM map in .mrc format
  2. zipped single-chain pdb files in .zip/.tar.gz/.tar format
  3. pdb config files in .txt format
  4. Sequence files in .fasta format

Output Files

Full protein complex structure in .cif file

Job Submission

  1. Prepare Input Map
  2. Collect the 3D cryo-EM map from the microscope in .mrc or .map format.
    Example map
    You can also find many maps in EMDataResource as testing examples.

  3. Prepare Input zip file
  4. Please compress all single-chain pdb files into a single .zip/.tar.gz/.tar file for upload. DiffModeler can also model part of the protein complex if you only know some of the single-chain structures.
    Please check the AlphaFold Database for single-chain structures by UniProt ID.
    You can also use the EBI Search Tool to search structure databases for the most similar structures, which can then serve as templates for modeling the protein complex.
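
    The packing step above can be sketched in Python with the standard zipfile module. This is only an illustration: the function name zip_pdb_files and the directory layout are hypothetical, and storing the files flat (without sub-folders) is an assumption made so that the names in the archive match the plain file names used in the config file.

```python
import zipfile
from pathlib import Path


def zip_pdb_files(pdb_dir, archive_path):
    """Pack every .pdb file found directly in pdb_dir into a flat .zip archive.

    Returns the list of file names stored in the archive.
    """
    pdb_paths = sorted(Path(pdb_dir).glob("*.pdb"))
    if not pdb_paths:
        raise ValueError(f"no .pdb files found in {pdb_dir}")
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for path in pdb_paths:
            # Assumption: a flat archive is wanted, so arcname drops the
            # directory part and stores only names such as "p142.pdb".
            archive.write(path, arcname=path.name)
    return [path.name for path in pdb_paths]
```

    For example, zip_pdb_files("chains", "chains.zip") would collect chains/p142.pdb and chains/p143.pdb into chains.zip; both paths are placeholders.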

  5. Prepare Input pdb config file
  6. pdb_config_file is a text file (only .txt files are accepted) in which each line gives the name of a pdb file in the zipped file and its corresponding chains. One pdb file can correspond to multiple identical chains.
    Example config file:
    Suppose you have p142.pdb (corresponding to chains A and B) and p143.pdb (corresponding to chain C) in the zip file; then the config file should be
    p142.pdb A B
    p143.pdb C
    Each line of the config file should be "[file_name] [chain_id1]". If one template corresponds to multiple chains, simply add the other chain IDs to the same line, separated by blank spaces.
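
    As a minimal sketch of the format described above, the following Python helpers read and write such a config file. The function names are hypothetical and are not part of DiffModeler. Rejecting a chain ID that appears under two different pdb files is an assumption of this sketch, not a rule stated by the server.

```python
def parse_pdb_config(text):
    """Map each pdb file name to its list of chain IDs."""
    mapping = {}
    seen_chains = set()
    for line_no, line in enumerate(text.splitlines(), start=1):
        fields = line.split()
        if not fields:
            continue  # ignore blank lines
        file_name, chain_ids = fields[0], fields[1:]
        if not chain_ids:
            raise ValueError(f"line {line_no}: no chain ID after {file_name}")
        if file_name in mapping:
            raise ValueError(f"line {line_no}: {file_name} listed twice")
        for chain_id in chain_ids:
            # Assumption: every chain ID belongs to exactly one template.
            if chain_id in seen_chains:
                raise ValueError(f"line {line_no}: chain {chain_id} repeated")
            seen_chains.add(chain_id)
        mapping[file_name] = chain_ids
    return mapping


def format_pdb_config(mapping):
    """Write one "[file_name] [chain_id1] ..." line per pdb file."""
    return "".join(
        f"{name} {' '.join(chains)}\n" for name, chains in mapping.items()
    )
```

    Calling parse_pdb_config on the two example lines above yields a mapping from p142.pdb to chains A and B and from p143.pdb to chain C, and format_pdb_config turns that mapping back into the same two lines.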

  7. Prepare Input sequence file (only if you are using DiffModeler(seq))
  8. Please use a sequence file in fasta format. Each chain must have an ID line (beginning with a greater-than sign (">")) and a SEQUENCE line.
    For the ID line, please include only the chain ID without any other information. If multiple chains share an identical sequence, please separate the chain IDs with commas (",").
    Example sequence file:
    >A,B,C,D
    MATPAGRRASETERLLTPNPGYGTQVGTSPAPTTPTEEEDLRR
    >E,F
    VVTFREENTIAFRHLFLLGYSDGSDDTFAAYTQEQLYQ
    This indicates 6 chains: A, B, C, and D share one identical sequence, and E and F share another.
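
    To illustrate how such ID lines expand into per-chain sequences, here is a small Python sketch. The function name parse_chain_fasta is hypothetical. Accepting a sequence wrapped over several lines is an assumption of this sketch; the instructions above ask for a single SEQUENCE line per entry.

```python
def parse_chain_fasta(text):
    """Return {chain_id: sequence} from fasta text whose ID lines list chain IDs."""
    sequences = {}
    current_chains = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            # An ID line such as ">A,B,C,D" names every chain with this sequence.
            current_chains = [c.strip() for c in line[1:].split(",") if c.strip()]
            if not current_chains:
                raise ValueError("ID line without chain IDs")
            for chain in current_chains:
                if chain in sequences:
                    raise ValueError(f"chain {chain} listed twice")
                sequences[chain] = ""
        elif current_chains is None:
            raise ValueError("sequence line found before any ID line")
        else:
            # Assumption: wrapped sequence lines are concatenated.
            for chain in current_chains:
                sequences[chain] += line
    return sequences
```

    Applied to the example file above, the result has six entries, with chains A to D mapped to the first sequence and chains E and F mapped to the second.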

  9. Decide the contour level
  10. Please make sure the contour level is lower than the density values of the region you want to model.
    This is an absolute density threshold, not a multiple of the standard deviation.
    Please do not input 0; you must provide a contour level so that the very noisy outer regions are removed.
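
    One way to sanity-check a candidate contour level before submission is to see what fraction of the map it keeps. The sketch below is a hypothetical helper, not part of DiffModeler. It works on any sequence of absolute density values and treats a voxel as kept when its density is strictly above the contour, which is an assumption about how the threshold is applied.

```python
def fraction_above_contour(densities, contour):
    """Return the fraction of voxels whose density exceeds the contour level."""
    if contour == 0:
        # The instructions above ask for a non-zero absolute threshold.
        raise ValueError("contour level must not be 0")
    values = list(densities)
    if not values:
        raise ValueError("no density values given")
    # Assumption: a voxel is kept only if it is strictly above the contour.
    kept = sum(1 for value in values if value > contour)
    return kept / len(values)
```

    The density values could be read, for instance, with the third-party mrcfile package (mrcfile.open(...) and the flattened data array). A fraction close to 1 suggests that the contour is too low to remove the noisy outer regions, while a fraction close to 0 suggests that it may cut into the region of interest.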

  11. Decide the resolution
  12. For maps at 0-2 Å resolution, the diffusion process will be skipped.
    Therefore, if you want the diffusion model to be used, please enter an approximate resolution for your map.

  13. Submit your job
  14. Once you have collected the input files, please submit your job here (for DiffModeler(seq), submit here). For each input field, please provide the files or information collected in the previous steps.

    Once you have finished entering the inputs, simply click the upload button to submit the job. After submission, you will be redirected to the “View Job” page. If you are not registered, please bookmark the link; once the job is done, you can view the results from this link. If you are registered, you will receive an email notification once the job is done, and you can also check the job status in the my jobs list under job manager.

  15. View your job results
  16. Once the job is done, you can check the modeled structure from the link you bookmarked. There you can also download the modeled structure in .pdb format by clicking the “Download Outputs” button. You can also visualize the 3D cryo-EM map online to check its consistency with the modeled structure. For more detailed instructions, please see the “Instructions” on the same page.

  17. Submit for backend review (optional)
  18. If you notice any strange outputs or a job failure, please submit a backend review using the field at the bottom of the “View Job” page. We will get back to you as soon as possible.

Availability (Other)

  1. GitHub

    Full code is available here.

Reference

Wang, X., Zhu, H., Terashi, G., Taluja, M., & Kihara, D. (2024). DiffModeler: Large macromolecular structure modeling for cryo-EM maps using a diffusion model. Nature Methods. https://doi.org/10.1038/s41592-024-02479-0