Tutorials

Tutorial 1: Reference based particle picking

In this tutorial we describe how to use TomoTwin for picking in tomograms using references.

Note

Example Dataset

To check that everything is working, you can use our demo for EMPIAR 10499. Because the pixel size of this demo is already reasonable, you can skip step 1 of the tutorial. The folder reference_output contains the results of our local run. The file run.sh contains all the commands we ran. The total runtime was ~30 minutes on 2 x A100 GPUs.

Download: https

1. Downscale your Tomogram to 10 Å

TomoTwin was trained on tomograms with a pixel size of 10Å. While in practice we’ve used it with pixel sizes ranging from 9.2Å to 25.0Å, it is probably ideal to run it at a pixel size close to 10Å. For that you may need to downscale your tomogram. You can do that by Fourier shrinking your tomogram with EMAN2. Let’s say you have a tomogram with a pixel size of 5.9359Å. The Fourier shrink factor is then 10Å/5.9359Å ≈ 1.684:

e2proc3d.py --apix=5.9359 --fouriershrink=1.684 your_tomo.mrc your_tomo_a10.mrc
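
The shrink factor is simply the ratio of the target pixel size to the current one. The following minimal Python sketch reproduces that arithmetic; the function name fourier_shrink_factor is ours for illustration and is not part of TomoTwin or EMAN2.

```python
def fourier_shrink_factor(current_apix: float, target_apix: float = 10.0) -> float:
    """Return the factor by which a tomogram must be shrunk to reach target_apix."""
    if current_apix <= 0 or target_apix <= 0:
        raise ValueError("pixel sizes must be positive")
    return target_apix / current_apix


factor = fourier_shrink_factor(5.9359)
# Pixel size that results when the factor is rounded to three decimals.
resulting_apix = 5.9359 * round(factor, 3)
print(f"shrink factor: {factor:.4f}, resulting pixel size: {resulting_apix:.3f}")
```

A factor below 1 would mean that the tomogram already has a pixel size larger than the target.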

TomoTwin should be used to pick on tomograms without denoising or low-pass filtering. You may, however, use a denoised or low-pass filtered tomogram to find the coordinates of your particle of interest for use as a reference. In this case, make sure the denoised/low-pass filtered tomogram has the same pixel size as the one you will pick on.

What if my protein is too big for a box size of 37x37x37 pixels?

Because TomoTwin was trained on many proteins at once, we needed to find a box size that worked for all proteins. Therefore, all proteins were used with a pixel size of 10Å and a box size of 37 pixels, and you must extract your reference with a box size of 37 pixels. If your protein is too large for this box at 10Å/pix (much larger than a ribosome), scale the pixel size of your tomogram until it fits rather than changing the box size. Likewise, if your protein is so small that at 10Å/pix it fills only one to two pixels of the box, scale your tomogram pixel size until the particle is bigger. However, we’ve found that for proteins down to 100 kDa, 10Å/pix is sufficient for the 37-pixel box.
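
To judge whether a particle fits, it can help to convert the box size to Ångström: 37 pixels at 10Å/pix span 370Å. The sketch below does this conversion and estimates the pixel size at which a particle of a given diameter would span a chosen fraction of the box. The function names and the default fill_fraction of 0.8 are illustrative assumptions, not TomoTwin recommendations.

```python
BOX_SIZE = 37  # box size in pixels, as stated in the text above


def box_extent(apix: float, box_size: int = BOX_SIZE) -> float:
    """Physical edge length of the box in Angstrom."""
    return apix * box_size


def apix_to_fit(diameter: float, fill_fraction: float = 0.8, box_size: int = BOX_SIZE) -> float:
    """Pixel size at which a particle of `diameter` (Angstrom) spans
    `fill_fraction` of the box edge. fill_fraction is an assumed value."""
    if not 0 < fill_fraction <= 1:
        raise ValueError("fill_fraction must be in (0, 1]")
    return diameter / (fill_fraction * box_size)


print(box_extent(10.0))    # box edge in Angstrom at 10 A/pix
print(apix_to_fit(500.0))  # pixel size for a hypothetical 500 A particle
```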

2. Pick and extract your reference

For the reference based approach you need, of course, references. To pick them follow the next steps:

  1. Open your tomogram in napari

Note

For easy identification of your reference particle, we recommend low-pass filtering the tomogram to 60Å and/or denoising it (make sure it has the same pixel size as the tomogram you will pick on).

napari_boxmanager your_tomo_a10.mrc

  2. Select the organize_layer tab of the boxmanager toolkit (lower right corner) and press the button Create particle layer.

  3. Switch to the boxmanager tab and set the boxsize to 37, as this is the box size we will use for extraction later on.

  4. Identify a potential reference, choose the slice so that it is centered, and pick it by clicking in the center of the particle. Continue until you think you have enough references.

Note

Use multiple references per particle class

We recommend picking multiple (3-4) references per protein of interest, as not all subvolumes work equally well.

Each reference can later be evaluated separately using the boxmanager, allowing you to decide which gives the best result for each protein of interest.

  5. Optional: If you want to pick another protein class, we recommend creating a separate particle layer for it (step 2).

  6. To save the references of the selected particle layer (see layer list in napari), click on File -> Save Selected Layer(s). Create a new folder by right-clicking in the dialog and name it, for example, ‘coords’. Now select the entry Box Manager as Files of type. Use the filename reference.coords and press Save.

  7. Finally, use the tomotwin_tools.py extractref script to extract a subvolume from the tomogram (the original, not the denoised / low-pass filtered one) at the coordinates of each reference. If there are multiple references you would like to pick in the tomogram, repeat this process for each of them, giving a new output folder each time.

tomotwin_tools.py extractref --tomo tomo/your_tomo_a10.mrc --coords path/to/references.coords --out reference/ --filename protein_a

You will find your extracted references in reference/protein_a_X.mrc where X is a running number.
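
Conceptually, extracting a reference means cutting a 37-pixel cube centered on each picked coordinate. The sketch below computes the index range of such a cube and checks that it lies inside the tomogram. It is not the implementation of extractref; the function name and the convention that the picked voxel sits in the middle of the box are assumptions.

```python
def box_bounds(center, shape, box_size=37):
    """Return [(start, stop), ...] per axis for a cube centered on `center`.

    Raises ValueError if the cube would extend past the tomogram border.
    """
    half = box_size // 2
    bounds = []
    for c, n in zip(center, shape):
        start = int(round(c)) - half
        stop = start + box_size
        if start < 0 or stop > n:
            raise ValueError(f"box around {center} leaves the tomogram")
        bounds.append((start, stop))
    return bounds


# Hypothetical pick at (x, y, z) = (100, 250, 60) in a 512 x 512 x 200 tomogram.
print(box_bounds((100, 250, 60), (512, 512, 200)))
```

A pick closer than 18 pixels to a border raises an error in this sketch, which is one reason to choose references away from the tomogram edges.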

3. Embed your Tomogram

This step assumes that you have already downloaded the general model.

To embed your tomogram using two GPUs and a batch size of 256, use:

CUDA_VISIBLE_DEVICES=0,1 tomotwin_embed.py tomogram -m LATEST_TOMOTWIN_MODEL.pth -v your_tomo_a10.mrc -b 256 -o out/embed/tomo/ -s 2
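
If you need to embed several tomograms, it may be convenient to assemble the call in a script. The following sketch only builds the argument list shown above, using the flags from this tutorial; the helper embed_command, its parameter names, and the example file names are hypothetical.

```python
import shlex


def embed_command(tomogram, model, out_dir, batch_size=256, s_value=2):
    """Assemble the tomotwin_embed.py call from this tutorial as an argument list.

    s_value is passed to the -s flag (the tutorial uses 2).
    """
    return [
        "tomotwin_embed.py", "tomogram",
        "-m", model,
        "-v", tomogram,
        "-b", str(batch_size),
        "-o", out_dir,
        "-s", str(s_value),
    ]


# Hypothetical list of tomograms that were all downscaled to 10 A/pix.
for tomo in ["tomo_01_a10.mrc", "tomo_02_a10.mrc"]:
    cmd = embed_command(tomo, "LATEST_TOMOTWIN_MODEL.pth", "out/embed/tomo/")
    print(shlex.join(cmd))
```

The resulting list could be handed to subprocess.run, with CUDA_VISIBLE_DEVICES set in the environment passed to it.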

Hint

The batchsize parameter

To embed your tomograms as quickly as possible, choose a batch size that uses as much of your GPU memory as possible. If the batch size is too large, however, you may run out of memory. In that case, try different batch sizes and check the memory usage with nvidia-smi.

Hint

Speed up embedding using a mask

With TomoTwin 0.5, the embedding command supports the use of masks. A mask defines which regions of your tomogram actually get embedded and therefore speeds up the embedding. We also provide a new tool that calculates a mask excluding areas that probably do not contain any protein. You can run it with:

tomotwin_tools.py embedding_mask -i your_tomo_a10.mrc -o out/mask/

The mask you find there can be passed to tomotwin_embed.py with the argument --mask. As this is still experimental, please check that the mask does not exclude any important areas. You can do that easily with napari by opening the tomogram together with your mask and then changing the opacity of the mask:

napari your_tomo_a10.mrc out/mask/your_tomo_a10_mask.mrc

4. Embed your reference

Now you can embed your reference:

CUDA_VISIBLE_DEVICES=0,1 tomotwin_embed.py subvolumes -m LATEST_TOMOTWIN_MODEL.pth -v reference/*.mrc -b 12 -o out/embed/reference/

5. Map your tomogram

The map command will calculate the pairwise distances/similarity between the references and the subvolumes and generate a localization map:

tomotwin_map.py distance -r out/embed/reference/embeddings.temb -v out/embed/tomo/your_tomo_a10_embeddings.temb -o out/map/
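
As an illustration of what pairwise similarity means here, the following sketch compares one reference embedding with a few subvolume embeddings. It assumes cosine similarity as the metric and uses tiny made-up vectors; the real embeddings are much longer, and tomotwin_map.py may compute the map differently.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


reference = [1.0, 0.0, 1.0]
# Made-up embeddings keyed by the subvolume position (x, y, z).
subvolumes = {
    (10, 10, 10): [2.0, 0.0, 2.0],  # same direction as the reference
    (20, 10, 10): [0.0, 1.0, 0.0],  # orthogonal to the reference
    (30, 10, 10): [1.0, 1.0, 0.0],
}
similarity_map = {pos: cosine_similarity(reference, emb) for pos, emb in subvolumes.items()}
best = max(similarity_map, key=similarity_map.get)
print(best, similarity_map[best])
```

Each position ends up with one similarity value per reference, which is the kind of information a localization map holds.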

6. Localize potential particles

To locate potential particle positions for each target, run:

tomotwin_locate.py findmax -m out/map/map.tmap -o out/locate/
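
The idea of locating maxima in a similarity map can be pictured with a simple greedy peak search. The sketch below is a generic illustration and not the algorithm used by tomotwin_locate.py; the function name, the thresholds, and the demo values are assumptions.

```python
def find_peaks(similarity, min_similarity=0.5, min_distance=18):
    """Greedy peak picking on a sparse map {(x, y, z): similarity}.

    Candidates are visited from highest to lowest similarity; a candidate is
    kept only if no already-kept peak lies within `min_distance` pixels.
    """
    peaks = []
    ranked = sorted(similarity.items(), key=lambda item: item[1], reverse=True)
    for pos, value in ranked:
        if value < min_similarity:
            break  # all remaining candidates are below the threshold
        too_close = any(
            sum((p - q) ** 2 for p, q in zip(pos, kept)) < min_distance ** 2
            for kept, _ in peaks
        )
        if not too_close:
            peaks.append((pos, value))
    return peaks


demo = {
    (50, 50, 20): 0.91,
    (52, 50, 20): 0.88,   # neighbour of the first entry, suppressed
    (120, 80, 22): 0.75,
    (200, 10, 30): 0.30,  # below the threshold
}
print(find_peaks(demo))
```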

Hint

Similarity maps

You can add the option --write_heatmaps to the locate command. If you do, you will find a similarity map for each reference in your_tomo_a10/locate/. Each map is akin to a location confidence heatmap for the corresponding protein.

7. Inspect your particles with the boxmanager

Open your particles with the following command or drag the files into an open napari window:

napari_boxmanager tomo/your_tomo_a10.mrc out/locate/located.tloc

../_images/start.png

The example shown here is from the SHREC competition. The table on the right lists 12 references. Here the reference model_8_5MRC_86.mrc, which is a ribosome, is selected. Below the table, adjust the metric min and size min thresholds until you are satisfied with the picks. After this optimization, the result might look similar to this:

../_images/after_optim.png
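
The effect of the two thresholds can be pictured as a simple filter over the located candidates. In the sketch below, the field names metric and size and all values are made up for illustration; the boxmanager applies its thresholds interactively and may define these quantities differently.

```python
def apply_thresholds(candidates, metric_min, size_min):
    """Keep candidates whose metric and size both reach the thresholds."""
    return [c for c in candidates if c["metric"] >= metric_min and c["size"] >= size_min]


candidates = [
    {"pos": (50, 50, 20), "metric": 0.91, "size": 140},
    {"pos": (120, 80, 22), "metric": 0.75, "size": 35},
    {"pos": (200, 10, 30), "metric": 0.62, "size": 90},
]
# Raising metric_min keeps fewer, more confident picks.
for metric_min in (0.6, 0.7, 0.9):
    kept = apply_thresholds(candidates, metric_min=metric_min, size_min=50)
    print(metric_min, [c["pos"] for c in kept])
```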

In the left panel, select the references you would like to pick (Ctrl-click on Windows, Cmd-click on macOS to select multiple). Then press File -> Save Selected Layer(s). In the dialog, change Files of type to Box Manager. Choose a filename such as selected_coords.tloc and make sure that the file ending is .tloc.

To convert the .tloc file into .coords, run:

tomotwin_pick.py -l selected_coords.tloc -o coords/

You will find a coordinate file for each reference in .coords format in the coords/ folder.

8. Scale your coordinates

After step 7 you have the coordinates for each protein of interest in your tomogram. If you downscaled your tomogram in step 1, you now need to scale your coordinates to the pixel size you would like to use for extraction. For example, to extract from tomograms with a pixel size of 5.9359 Å/pix, the command would be:

tomotwin_tools.py scale_coordinates --coords coords/your_coords_file.coords --tomotwin_pixel_size 10 --extraction_pixel_size 5.9359 --out multi_refs_0_a5936.coords
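
The scaling itself amounts to multiplying each coordinate by the ratio of the two pixel sizes. The sketch below shows that arithmetic with the pixel size from step 1; it assumes both tomograms share the same origin and is not the implementation of scale_coordinates. The example coordinate is made up.

```python
def scale_coordinates(coords, tomotwin_apix=10.0, extraction_apix=5.9359):
    """Rescale (x, y, z) positions from the picking to the extraction pixel size."""
    factor = tomotwin_apix / extraction_apix
    return [tuple(round(v * factor, 2) for v in xyz) for xyz in coords]


# A pick at (100, 250, 60) in the 10 A/pix tomogram.
print(scale_coordinates([(100, 250, 60)]))
```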

Tutorial 2: Clustering based particle picking

1. Downscale your Tomogram to 10 Å

TomoTwin was trained on tomograms with a pixel size of 10Å. While in practice we’ve used it with pixel sizes ranging from 9.2Å to 25.0Å, it is probably ideal to run it at a pixel size close to 10Å. For that you may need to downscale your tomogram. You can do that by Fourier shrinking your tomogram with EMAN2. Let’s say you have a tomogram with a pixel size of 5.9359Å. The Fourier shrink factor is then 10Å/5.9359Å ≈ 1.684:

e2proc3d.py --apix=5.9359 --fouriershrink=1.684 your_tomo.mrc your_tomo_a10.mrc

TomoTwin should be used to pick on tomograms without denoising or low-pass filtering. You may, however, use a denoised or low-pass filtered tomogram for visualizing the picks in napari. In this case, make sure the denoised/low-pass filtered tomogram has the same pixel size as the one you will pick on (downscaling it if necessary).

What if my protein is too big for a box size of 37x37x37 pixels?

Because TomoTwin was trained on many proteins at once, we needed to find a box size that worked for all proteins. Therefore, all proteins were used with a pixel size of 10Å and a box size of 37 pixels, and you must extract your reference with a box size of 37 pixels. If your protein is too large for this box at 10Å/pix (much larger than a ribosome), scale the pixel size of your tomogram until it fits rather than changing the box size. Likewise, if your protein is so small that at 10Å/pix it fills only one to two pixels of the box, scale your tomogram pixel size until the particle is bigger. However, we’ve found that for proteins down to 100 kDa, 10Å/pix is sufficient for the 37-pixel box.

2. Embed your Tomogram

This step assumes that you have already downloaded the general model.

To embed your tomogram using two GPUs and a batch size of 256, use:

CUDA_VISIBLE_DEVICES=0,1 tomotwin_embed.py tomogram -m LATEST_TOMOTWIN_MODEL.pth -v your_tomo_a10.mrc -b 256 -o out/embed/tomo/ -s 2

Hint

The batchsize parameter

To embed your tomograms as quickly as possible, choose a batch size that uses as much of your GPU memory as possible. If the batch size is too large, however, you may run out of memory. In that case, try different batch sizes and check the memory usage with nvidia-smi.

Hint

Speed up embedding using a mask

With TomoTwin 0.5, the embedding command supports the use of masks. A mask defines which regions of your tomogram actually get embedded and therefore speeds up the embedding. We also provide a new tool that calculates a mask excluding areas that probably do not contain any protein. You can run it with:

tomotwin_tools.py embedding_mask -i your_tomo_a10.mrc -o out/mask/

The mask you find there can be passed to tomotwin_embed.py with the argument --mask. As this is still experimental, please check that the mask does not exclude any important areas. You can do that easily with napari by opening the tomogram together with your mask and then changing the opacity of the mask:

napari your_tomo_a10.mrc out/mask/your_tomo_a10_mask.mrc

3. Estimate UMAP manifold and Generate Embedding Mask

Now we will reduce the tomogram embeddings to 2D to allow for efficient visualization. To calculate a UMAP:

tomotwin_tools.py umap -i out/embed/tomo/your_tomo_a10_embeddings.temb -o out/clustering/

Note

If you encounter an out of memory error here, you may need to reduce the fit_sample_size and/or chunk_size values (default 400,000).

Additionally, the command generates a mask (your_tomo_a10_embeddings_label_mask.mrci) of the embeddings, which allows us to track which UMAP values correspond to which points in the tomogram.
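
The two parameters named in the note plausibly control how many embeddings are used to fit the UMAP and how many are processed at a time. The following generic sketch shows that fit-on-a-sample, process-in-chunks pattern. It is an assumption about the approach, not the code of tomotwin_tools.py, and all names and numbers in it are illustrative.

```python
import random


def sample_for_fit(n_rows, fit_sample_size, seed=0):
    """Pick the row indices used to fit the model (all rows if few enough)."""
    if n_rows <= fit_sample_size:
        return list(range(n_rows))
    return sorted(random.Random(seed).sample(range(n_rows), fit_sample_size))


def chunk_ranges(n_rows, chunk_size):
    """Yield (start, stop) index pairs that cover all rows in order."""
    for start in range(0, n_rows, chunk_size):
        yield start, min(start + chunk_size, n_rows)


n_rows = 1_000_000  # hypothetical number of embedded positions
fit_rows = sample_for_fit(n_rows, fit_sample_size=400_000)
ranges = list(chunk_ranges(n_rows, chunk_size=400_000))
print(len(fit_rows), ranges)
```

Smaller values for either parameter mean that fewer rows have to be held in memory at once, at the cost of a coarser fit or more passes.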

4. Load data for clustering in Napari

Now that we have all the input files for the clustering workflow we can get started in Napari. First open your tomogram and the embedding mask by:

napari your_tomo_a10.mrc out/clustering/your_tomo_a10_embeddings_label_mask.mrci

Next, open the napari-tomotwin clustering tool via Plugins -> napari-tomotwin -> Cluster UMAP embeddings. Then choose the Path to UMAP by clicking on Select file and provide the path to your_tomo_a10_embeddings.tumap. Click Load and, after a second, a 2D plot of the UMAP embeddings should appear in the plugin window.

5. Find target clusters

The next step is to generate potential targets from the 2D umap using the interactive lasso (freehand) tool from the napari-clusters-plotter.

Check out the video demo of selecting clusters

Outline a set of points in the 2D plot and these points will become highlighted in your tomogram. To select multiple targets at once hold Shift when outlining points.

../_images/img1.png
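
What the lasso selection does can be pictured as a point-in-polygon test on the 2D UMAP coordinates. The sketch below uses the standard ray-casting test on made-up points; napari-clusters-plotter handles this interactively, and its implementation may differ.

```python
def inside_polygon(point, polygon):
    """Ray-casting test: True if `point` lies inside the closed `polygon`."""
    x, y = point
    inside = False
    for (x1, y1), (x2, y2) in zip(polygon, polygon[1:] + polygon[:1]):
        crosses = (y1 > y) != (y2 > y)  # edge straddles the horizontal ray
        if crosses and x < x1 + (y - y1) * (x2 - x1) / (y2 - y1):
            inside = not inside
    return inside


def lasso_select(umap_points, polygon):
    """Return the indices of the 2D points enclosed by the outline."""
    return [i for i, p in enumerate(umap_points) if inside_polygon(p, polygon)]


outline = [(0.0, 0.0), (4.0, 0.0), (4.0, 3.0), (0.0, 3.0)]  # drawn outline
points = [(1.0, 1.0), (5.0, 1.0), (3.5, 2.5), (-1.0, 2.0)]   # UMAP coordinates
print(lasso_select(points, outline))
```

The selected indices identify embeddings, which can then be traced back to positions in the tomogram.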

Use log scale to see weak clusters

When the abundance of the protein is low, the clusters are often difficult to detect. Using a log scale for the plot may reveal clusters that are otherwise difficult to spot. To activate the log scale, click on Advanced settings -> Log scale.

Alternatively, you can click in the tomogram; a small red circle then appears around the embedding for that position in the tomogram.

../_images/img3.png

You can use the magnifier icon to change the displayed area/zoom and the Home icon to reset it.

../_images/img2.png

Improved centering

When generating targets to pick large proteins, it is best to outline only points that lie in the center of your protein rather than covering the entire protein. This helps ensure that your resulting picks are centered. Note that, due to the way embeddings are generated from the tomogram, these points likely won’t be in the center of the cluster.

../_images/img4.png

6. Save target clusters

Once you have outlined a target cluster for each protein of interest, it is time to save these targets to be used as picking references in this and additional tomograms.

This can be done with Plugins -> napari-tomotwin -> Save cluster targets. Provide an output directory, and the file cluster_targets.temb will be written there.
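
One simple way to turn an outlined cluster into a single picking target is to average the embeddings of the selected points. The sketch below illustrates that idea on made-up vectors. Whether the plugin summarizes a cluster as a mean or in some other way is an assumption here, and the function name is hypothetical.

```python
def mean_embedding(embeddings, selected_rows):
    """Average the embeddings of the selected rows, component by component."""
    if not selected_rows:
        raise ValueError("no rows selected")
    picked = [embeddings[i] for i in selected_rows]
    return [sum(column) / len(picked) for column in zip(*picked)]


embeddings = [
    [1.0, 0.0, 2.0],
    [3.0, 2.0, 2.0],
    [9.0, 9.0, 9.0],  # not part of the outlined cluster
]
target = mean_embedding(embeddings, selected_rows=[0, 1])
print(target)
```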

7. Map your tomogram

The map command will calculate the pairwise distances/similarity between the targets and the tomogram subvolumes and generate a localization map:

tomotwin_map.py distance -r out/clustering/cluster_targets.temb -v out/embed/tomo/your_tomo_a10_embeddings.temb -o out/map/

8. Localize potential particles

To locate potential particle positions for each target, run:

tomotwin_locate.py findmax -m out/map/map.tmap -o out/locate/

Hint

Similarity maps

You can add the option --write_heatmaps to the locate command. If you do, you will find a similarity map for each reference in your_tomo_a10/locate/. Each map is akin to a location confidence heatmap for the corresponding protein.

Open your particles with the following command or drag the files into an open napari window:

napari_boxmanager tomo/your_tomo_a10.mrc out/locate/located.tloc

../_images/start.png

The example shown here is from the SHREC competition. The table on the right lists 12 references. Here the reference model_8_5MRC_86.mrc, which is a ribosome, is selected. Below the table, adjust the metric min and size min thresholds until you are satisfied with the picks. After this optimization, the result might look similar to this:

../_images/after_optim.png

In the left panel, select the references you would like to pick (Ctrl-click on Windows, Cmd-click on macOS to select multiple). Then press File -> Save Selected Layer(s). In the dialog, change Files of type to Box Manager. Choose a filename such as selected_coords.tloc and make sure that the file ending is .tloc.

To convert the .tloc file into .coords, run:

tomotwin_pick.py -l selected_coords.tloc -o coords/

You will find a coordinate file for each reference in .coords format in the coords/ folder.

9. Scale your coordinates

After step 8 you have the coordinates for each protein of interest in your tomogram. If you downscaled your tomogram in step 1, you now need to scale your coordinates to the pixel size you would like to use for extraction. For example, to extract from tomograms with a pixel size of 5.9359 Å/pix, the command would be:

tomotwin_tools.py scale_coordinates --coords coords/your_coords_file.coords --tomotwin_pixel_size 10 --extraction_pixel_size 5.9359 --out multi_refs_0_a5936.coords