WEBVTT
00:00.880 --> 00:00:02.446
Hi, I'm Blaine Mooers.
00:00:02.446 --> 00:00:04.160
I'm going to be talking about
00:00:04.160 --> 00:00:07.919
the use of molecular graphics in Org
00:07.919 --> 00:00:08.880
for the purpose of doing
00:00:08.880 --> 00:00:11.840
reproducible research in structural biology.
00:00:11.840 --> 00:00:13.722
I'm an associate professor of biochemistry
00:00:13.722 --> 00:00:15.768
and microbiology at the University of Oklahoma
00:00:15.768 --> 00:00:17.760
Health Sciences Center in Oklahoma City.
00:00:17.760 --> 00:00:19.600
My laboratory uses X-ray crystallography
00:00:19.600 --> 00:00:21.920
to determine the atomic structures
00:00:21.920 --> 00:00:23.439
of proteins like this one
00:00:23.439 --> 00:00:26.080
in the lower left, and of nucleic acids
00:26.080 --> 00:27.840
important in human health.
00:27.840 --> 00:00:29.591
This is a crystal of an RNA,
00:00:29.591 --> 00:00:31.359
which we have placed in this
00:00:31.359 --> 00:00:33.200
X-ray diffraction instrument.
00:00:33.200 --> 00:00:35.600
And after rotating the crystal
00:00:35.600 --> 00:00:38.000
in the X-ray beam for two degrees,
00:00:38.000 --> 00:00:40.480
we obtain the following diffraction pattern,
00:00:40.480 --> 00:00:43.280
which has thousands of spots on it.
00:43.280 --> 00:00:47.840
We rotate the crystal through 180 degrees,
00:47.840 --> 00:00:51.760
collecting 90 images to obtain all the data.
00:00:51.760 --> 00:00:56.000
We then process those images
00:56.000 --> 00:00:57.752
and do an inverse Fourier transform
00:00:57.752 --> 00:00:59.920
to obtain the electron density.
00:00:59.920 --> 00:01:01.888
This electron density map has been
00:01:01.888 --> 00:01:04.344
contoured at the one-sigma level.
00:01:04.344 --> 00:01:06.116
That level's being shown by
00:01:06.116 --> 00:01:08.640
this blue chicken wire mesh.
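NOTE A minimal sketch of the step described above, assuming a toy one-dimensional crystal rather than real diffraction data. The two "atoms", their weights, the resolution cutoff, and the grid size are invented for illustration, and the one-sigma level is taken here to mean one standard deviation above the mean density.

```python
import cmath
import math
import statistics

# Toy 1-D "crystal": (fractional coordinate, scattering weight). Invented values.
ATOMS = [(0.25, 8.0), (0.70, 6.0)]
H_MAX = 10     # highest index included in the synthesis (resolution limit)
N_GRID = 100   # number of map grid points along the unit cell


def structure_factor(h):
    """F(h) = sum_j f_j * exp(2*pi*i*h*x_j) for the toy atoms."""
    return sum(f * cmath.exp(2j * math.pi * h * x) for x, f in ATOMS)


def electron_density(x):
    """Inverse Fourier synthesis: rho(x) = sum_h F(h) * exp(-2*pi*i*h*x)."""
    total = sum(
        structure_factor(h) * cmath.exp(-2j * math.pi * h * x)
        for h in range(-H_MAX, H_MAX + 1)
    )
    return total.real  # imaginary parts cancel because F(-h) = conj(F(h))


density = [electron_density(i / N_GRID) for i in range(N_GRID)]

# Contour level: one standard deviation above the mean of the map.
level = statistics.mean(density) + statistics.pstdev(density)
peak_index = max(range(N_GRID), key=density.__getitem__)
print(f"contour level: {level:.1f}; strongest peak at x = {peak_index / N_GRID}")
```

Grid points whose density exceeds `level` are the ones a mesh contoured at that level would enclose; in this toy map both atom positions lie above it.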
00:01:08.640 --> 00:01:10.152
Atomic models have been fitted
00:01:10.152 --> 00:01:11.119
to this chicken wire.
00:01:11.119 --> 00:01:14.240
These lines represent bonds between atoms,
00:01:14.240 --> 00:01:16.240
and the atoms are represented by points.
00:01:16.240 --> 00:01:18.640
And atoms are colored by atom type,
00:01:18.640 --> 00:01:21.280
red for oxygen, blue for nitrogen,
00:01:21.280 --> 00:01:23.040
and then in this case,
01:23.040 --> 00:01:24.720
carbon is colored cyan.
00:01:24.720 --> 00:01:27.203
We have fitted a drug molecule
00:01:27.203 --> 00:01:29.360
to the central blob of electron density
00:01:29.360 --> 00:01:32.400
which corresponds to the active site
01:32.400 --> 00:01:35.759
of this protein, which is RET kinase.
00:01:35.759 --> 00:01:37.439
It's important in lung cancer.
00:01:37.439 --> 00:01:40.079
When we're finished with model building,
00:01:40.079 --> 00:01:41.339
we will then examine
00:01:41.339 --> 00:01:43.006
the result of the final structure
00:01:43.006 --> 00:01:45.200
to prepare images for publication
00:01:45.200 --> 00:01:47.439
using a molecular graphics program.
01:47.439 --> 00:01:48.108
In this case,
00:01:48.108 --> 00:01:50.000
we've overlaid a number of structures,
00:01:50.000 --> 00:01:53.600
and we're examining the distance between
01:53.600 --> 00:01:55.680
the side chain of an alanine
00:01:55.680 --> 00:01:58.880
and one or two drug molecules.
00:01:58.880 --> 00:02:00.719
This alanine side chain actually blocks
00:02:00.719 --> 00:02:02.159
the binding of one of these drugs.
00:02:02.159 --> 00:02:03.439
The most popular program
02:03.439 --> 02:06.320
for doing this kind of analysis
02:06.320 --> 00:02:07.280
and for preparing images
00:02:07.280 --> 00:02:09.520
for publication is PyMOL.
02:09.520 --> 02:11.440
PyMOL was used to prepare these images
02:11.440 --> 02:14.720
on the covers of these featured journals.
02:14.720 --> 00:02:17.520
PyMOL is favored because
00:02:17.520 --> 00:02:19.520
it has 500 commands
00:02:19.520 --> 00:02:22.128
and 600 parameter settings
00:02:22.128 --> 00:02:23.360
that provide exquisite control
00:02:23.360 --> 00:02:24.959
over the appearance of the output.
00:02:24.959 --> 00:02:28.480
PyMOL has over 100,000 users,
02:28.480 --> 00:02:30.000
reflecting its popularity.
00:02:30.000 --> 00:02:31.599
This is the GUI for PyMOL.
00:02:31.599 --> 00:02:35.120
It shows in white the viewport area
00:02:35.120 --> 00:02:36.080
where one interacts
00:02:36.080 --> 00:02:37.840
with the loaded molecular object.
00:02:37.840 --> 00:02:41.920
We have rendered the same RET kinase
02:41.920 --> 00:02:49.788
with a set of preset parameters
00:02:49.788 --> 00:02:51.200
that have been named "publication".
00:02:51.200 --> 00:02:52.720
The other way of applying
02:52.720 --> 00:02:54.319
parameter settings and commands
00:02:54.319 --> 00:02:56.720
is to enter them at the PyMOL prompt.
00:02:56.720 --> 00:03:00.159
Then the third way is to load and run scripts.
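NOTE A minimal sketch of the script route, assuming hypothetical file names (model.pdb, model.png, figure.pml). It only writes a small PyMOL script with the standard library; the commands inside it (load, hide, show, bg_color, orient, png) are ordinary PyMOL commands, but the particular choice and order are illustrative, not the speaker's settings.

```python
import tempfile
from pathlib import Path


def build_pml(structure_file: str, image_file: str) -> str:
    """Return the text of a small PyMOL script (.pml) for one figure."""
    commands = [
        f"load {structure_file}, model",      # read coordinates into object "model"
        "hide everything",                    # start from a blank representation
        "show cartoon",                       # draw the backbone as a cartoon
        "bg_color white",                     # white background for print
        "orient",                             # align the view with the molecule
        f"png {image_file}, dpi=300, ray=1",  # ray-trace and save the image
    ]
    return "\n".join(commands) + "\n"


script_text = build_pml("model.pdb", "model.png")  # hypothetical file names
script_path = Path(tempfile.mkdtemp()) / "figure.pml"
script_path.write_text(script_text)
print(script_text)
```

The same lines could be typed one at a time at the PyMOL prompt, or the file could be run without the GUI with `pymol -cq figure.pml`.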
00:03:00.159 --> 00:03:03.120
PyMOL is actually written in C for speed,
00:03:03.120 --> 00:03:06.159
but it is wrapped in Python for extensibility.
03:06.159 --> 03:09.680
In fact, there are over 100 articles
03:09.680 --> 00:03:11.599
about various plugins and scripts
00:03:11.599 --> 00:03:12.400
that people have developed
00:03:12.400 --> 00:03:15.120
to extend PyMOL over the years.
03:15.120 --> 00:03:16.480
Here are some examples
00:03:16.480 --> 00:03:18.959
from the snippet library that I developed.
03:18.959 --> 03:21.280
On the left is a default
03:21.280 --> 03:24.640
cartoon representation of an RNA hairpin.
03:24.640 --> 03:27.040
I find this reduced representation
03:27.040 --> 00:03:30.799
of the RNA hairpin to be too stark.
03:30.799 --> 00:03:32.319
I prefer these alternate ones
00:03:32.319 --> 00:03:33.840
that I developed.
03:33.840 --> 03:37.519
So, these three to the right of this one
03:37.519 --> 00:03:39.519
are not available through
00:03:39.519 --> 00:03:40.720
pull-down menus in PyMOL.
00:03:40.720 --> 00:03:42.748
So why develop a PyMOL
00:03:42.748 --> 00:03:44.879
snippet library for Org?
03:44.879 --> 00:03:47.040
Well, Org provides great support
00:03:47.040 --> 00:03:48.560
for literate programming,
00:03:48.560 --> 00:03:49.840
where you have code blocks
00:03:49.840 --> 00:03:52.000
that contain code that's executable,
00:03:52.000 --> 00:03:53.040
and the output is shown
00:03:53.040 --> 00:03:54.959
below that code block.
03:54.959 --> 00:03:56.720
And then you can fill
00:03:56.720 --> 00:03:58.959
the surrounding area in the document
03:58.959 --> 00:04:00.799
with the explanatory prose.
00:04:00.799 --> 00:04:02.000
Org has great support
00:04:02.000 --> 00:04:04.480
for editing that explanatory prose.
00:04:04.480 --> 00:04:08.080
Org can run PyMOL through PyMOL's Python API.
04:08.080 --> 00:04:11.280
One of the uses of such an Org document
00:04:11.280 --> 00:04:14.487
is to assemble a gallery of draft images.
00:04:14.487 --> 00:04:16.563
We often have to look at
00:04:16.563 --> 00:04:19.840
dozens of candidate images
00:04:19.840 --> 00:04:22.000
with the molecule in different orientations,
00:04:22.000 --> 00:04:23.520
different zoom settings,
04:23.520 --> 00:04:25.032
different representations,
00:04:25.032 --> 00:04:27.280
different colors, and so on.
00:04:27.280 --> 00:04:30.639
And to have those images along with…,
00:04:30.639 --> 00:04:31.840
adjacent to the code
04:31.840 --> 00:04:33.680
that was used to generate them,
00:04:33.680 --> 00:04:37.199
can be very effective for
04:37.199 --> 00:04:39.680
further editing the code
00:04:39.680 --> 00:04:40.880
and improving the images.
00:04:40.880 --> 00:04:44.080
Once the final images have been selected,
04:44.080 --> 00:04:46.320
one can submit the code
00:04:46.320 --> 00:04:48.479
as part of the supplemental material.
00:04:48.479 --> 00:04:52.400
Finally, one can use the journal package
04:52.400 --> 00:04:54.608
to use the Org files as
00:04:54.608 --> 00:04:57.120
an electronic laboratory notebook,
00:04:57.120 --> 00:04:59.600
which is illustrated with molecular images.
00:04:59.600 --> 00:05:01.039
This can be very useful
00:05:01.039 --> 00:05:04.080
when assembling manuscripts
05:04.080 --> 00:05:05.440
months or years later.
00:05:05.440 --> 00:05:08.320
This shows the YASnippet pull-down menu
05:08.320 --> 00:05:12.720
after my library has been installed.
00:05:12.720 --> 00:05:15.360
I have an Org file open,
00:05:15.360 --> 00:05:17.120
so I'm in Org mode.
05:17.120 --> 00:05:20.880
We have the Org mode submenu,
05:20.880 --> 00:05:23.919
and under it, all my snippets
00:05:23.919 --> 00:05:26.880
are located in these sub-sub-menus
05:26.880 --> 00:05:30.880
that are prefixed with pymolpy.
00:05:30.880 --> 00:05:33.840
Under the molecular representations menu,
00:05:33.840 --> 00:05:36.479
there is a listing of snippets.
00:05:36.479 --> 00:05:38.563
The top one is for the ambient occlusion effect,
00:05:38.563 --> 00:05:39.840
which we're going to apply
00:05:39.840 --> 00:05:41.039
in this Org file.
00:05:41.039 --> 00:05:44.240
So these lines of code,
00:05:44.240 --> 00:05:48.479
as well as these flanking lines
05:48.479 --> 00:05:50.240
that define the source block,
00:05:50.240 --> 00:05:53.280
were inserted by clicking on that line.
05:53.280 --> 00:05:55.120
Then I've added some additional code.
00:05:55.120 --> 00:05:56.880
So, the first line defines
00:05:56.880 --> 00:05:59.039
the language that we're using.
00:05:59.039 --> 00:05:59.768
We're going to use
00:05:59.768 --> 00:06:02.639
the jupyter-python language.
06:02.639 --> 00:06:04.560
Then you can define the session,
00:06:04.560 --> 00:06:06.400
and the name of this is arbitrary.
00:06:06.400 --> 00:06:09.680
Then the kernel is our means
00:06:09.680 --> 00:06:11.360
by which we gain access
00:06:11.360 --> 00:06:14.880
to the Python API of PyMOL.
06:14.880 --> 00:06:17.039
The remaining settings apply to the output.
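NOTE A minimal sketch of what such a source block might look like, assuming a session named "py" and a Jupyter kernel named "pymol.python"; both names, the output settings, and the file names are hypothetical stand-ins for what is on the slide. The helper below just assembles the Org text; the PyMOL calls inside the block (cmd.load, cmd.show, cmd.png) would run only when the block is executed in Emacs.

```python
def org_source_block(body: str, session: str = "py",
                     kernel: str = "pymol.python") -> str:
    """Wrap Python code in an Org source block for the jupyter-python language."""
    header = (
        f"#+BEGIN_SRC jupyter-python :session {session} :kernel {kernel} "
        ":exports both :results raw drawer"
    )
    return "\n".join([header, body.rstrip("\n"), "#+END_SRC"]) + "\n"


# Code that would run inside the block; file names are placeholders.
pymol_code = """\
from pymol import cmd
cmd.load("model.pdb")
cmd.show("cartoon")
cmd.png("model.png", dpi=300, ray=1)
"""

block = org_source_block(pymol_code)
print(block)
```

The first line of the printed block carries the language, the session, and the kernel in that order, followed by the settings that govern the output.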
00:06:17.039 --> 00:06:18.319
To execute this code
00:06:18.319 --> 00:06:21.199
and to get the resulting image,
00:06:21.199 --> 00:06:25.120
you put the cursor inside this code block,
00:06:25.120 --> 00:06:26.560
or on the top line,
00:06:26.560 --> 00:06:29.840
and enter Control c Control c (C-c C-c).
06:29.840 --> 00:06:32.240
This shows that the resulting image
00:06:32.240 --> 00:06:33.600
has been loaded up.
00:06:33.600 --> 00:06:37.280
It takes about 10 seconds for this to appear.
06:37.280 --> 00:06:38.479
So the downside of this is
00:06:38.479 --> 00:06:40.729
if you have a large number of these,
00:06:40.729 --> 00:06:43.919
the Org file can lag quite a bit
00:06:43.919 --> 00:06:45.120
when you try to scroll through it,
00:06:45.120 --> 00:06:48.319
so you need to close up these result drawers,
00:06:48.319 --> 00:06:50.960
and only open up the ones
00:06:50.960 --> 00:06:53.199
that you're currently examining.
00:06:53.199 --> 00:06:54.319
These are features I think
06:54.319 --> 06:56.240
are important in practical work.
06:56.240 --> 00:06:59.840
So, a plus is a feature that's present,
00:06:59.840 --> 00:07:01.120
and a minus is one that's absent.
00:07:01.120 --> 00:07:03.199
I think tab stops and tab triggers
00:07:03.199 --> 00:07:04.800
are really important.
07:04.800 --> 00:07:05.680
Triggers are important for
00:07:05.680 --> 00:07:06.720
the fast insertion of code,
00:07:06.720 --> 00:07:08.639
and tab stops are important for
07:08.639 --> 00:07:10.560
complete, accurate editing of code.
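NOTE A minimal sketch of a tab trigger and tab stops in YASnippet's snippet file format. The snippet name, the trigger "ao", and the body are hypothetical and are not taken from the speaker's library; the helper only assembles the text of a snippet file.

```python
def yasnippet(name: str, key: str, body: str) -> str:
    """Return the text of a YASnippet snippet file."""
    lines = [
        "# -*- mode: snippet -*-",
        f"# name: {name}",
        f"# key: {key}",   # the tab trigger: type it, then press TAB to expand
        "# --",            # separates the header from the snippet body
        body,
    ]
    return "\n".join(lines) + "\n"


# ${1:...}, ${2:...}, ${3:...} are tab stops visited in order; $0 is the exit point.
snippet_body = (
    'cmd.set("ambient_occlusion_mode", ${1:1})\n'
    'cmd.png("${2:image}.png", dpi=${3:300}, ray=1)\n'
    "$0"
)

snippet_text = yasnippet("ambient occlusion (sketch)", "ao", snippet_body)
print(snippet_text)
```

After expansion, each press of TAB moves to the next numbered field, so every value that needs editing is visited once.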
00:07:10.560 --> 00:07:12.735
I already addressed the rendering speed
00:07:12.735 --> 00:07:14.560
and scrolling issue.
00:07:14.560 --> 00:07:15.759
I think the way around this
00:07:15.759 --> 00:07:19.199
is just to export the Org document to a PDF file
00:07:19.199 --> 00:07:23.360
and do your evaluation of different images
00:07:23.360 --> 00:07:25.199
by examining them in the PDF
00:07:25.199 --> 00:07:26.560
rather than the Org file.
00:07:26.560 --> 00:07:30.400
The path to PDF is lightning fast in Emacs
00:07:30.400 --> 00:07:32.240
compared to Jupyter,
00:07:32.240 --> 00:07:35.280
where it's cumbersome.
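NOTE A minimal sketch of one way to script the PDF export from outside Emacs, assuming a hypothetical file named gallery.org and a working LaTeX installation. It uses Emacs batch mode with Org's org-latex-export-to-pdf command; the talk does not say which export route the speaker uses.

```python
import shutil
import subprocess
from pathlib import Path


def export_command(org_file: str) -> list:
    """Build an Emacs batch command that exports an Org file to PDF via LaTeX."""
    return ["emacs", "--batch", org_file, "-f", "org-latex-export-to-pdf"]


command = export_command("gallery.org")  # hypothetical file name
print(" ".join(command))

# Run the export only when Emacs and the Org file are actually present.
if shutil.which("emacs") and Path("gallery.org").exists():
    subprocess.run(command, check=True)
```

Inside Emacs, the same export is available interactively from the Org export dispatcher.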
00:07:35.280 --> 00:07:38.400
This is a snapshot of my initialization file.
00:07:38.400 --> 00:07:41.840
These parts are relevant to doing this work.
00:07:41.840 --> 00:07:43.039
A full description of them
00:07:43.039 --> 00:07:46.319
can be found in the README file
07:46.319 --> 00:07:48.639
of this repository on GitHub.
00:07:48.639 --> 00:07:49.456
I'd like to thank the
00:07:49.456 --> 00:07:51.840
Nathan Shock Data Science Workshop
00:07:51.840 --> 00:07:54.319
for feedback during presentations
00:07:54.319 --> 00:07:56.160
I've made about this work.
00:07:56.160 --> 00:07:57.628
And I would also like to thank
00:07:57.628 --> 00:08:00.240
the following funding sources for support.
00:08:00.240 --> 00:08:03.879
I will now take questions. Thank you.
00:08:03.879 --> 00:08:03.986
[captions by Blaine Mooers and Bhavin Gandhi]