Diffstat (limited to '')
-rw-r--r-- | 2021/captions/emacsconf-2021-molecular--reproducible-molecular-graphics-with-org-mode--blaine-mooers--main.vtt | 628 |
1 files changed, 628 insertions, 0 deletions
diff --git a/2021/captions/emacsconf-2021-molecular--reproducible-molecular-graphics-with-org-mode--blaine-mooers--main.vtt b/2021/captions/emacsconf-2021-molecular--reproducible-molecular-graphics-with-org-mode--blaine-mooers--main.vtt new file mode 100644 index 00000000..06d92f3a --- /dev/null +++ b/2021/captions/emacsconf-2021-molecular--reproducible-molecular-graphics-with-org-mode--blaine-mooers--main.vtt @@ -0,0 +1,628 @@ +WEBVTT + +00:00.880 --> 00:00:02.446 +Hi, I'm Blaine Mooers. + +00:00:02.446 --> 00:00:04.160 +I'm going to be talking about + +00:00:04.160 --> 00:00:07.919 +the use of molecular graphics in Org + +00:07.919 --> 00:00:08.880 +for the purpose of doing + +00:00:08.880 --> 00:00:11.840 +reproducible research in structural biology. + +00:00:11.840 --> 00:00:13.722 +I'm an associate professor of biochemistry + +00:00:13.722 --> 00:00:15.768 +and microbiology at the University of Oklahoma + +00:00:15.768 --> 00:00:17.760 +Health Sciences Center in Oklahoma City. + +00:00:17.760 --> 00:00:19.600 +My laboratory uses X-ray crystallography + +00:00:19.600 --> 00:00:21.920 +to determine the atomic structures + +00:00:21.920 --> 00:00:23.439 +of proteins like this one + +00:00:23.439 --> 00:00:26.080 +in the lower left, and of nucleic acids + +00:26.080 --> 00:27.840 +important in human health. + +00:27.840 --> 00:00:29.591 +This is a crystal of an RNA, + +00:00:29.591 --> 00:00:31.359 +which we have placed in this + +00:00:31.359 --> 00:00:33.200 +X-ray diffraction instrument. + +00:00:33.200 --> 00:00:35.600 +And after rotating the crystal + +00:00:35.600 --> 00:00:38.000 +in the X-ray beam for two degrees, + +00:00:38.000 --> 00:00:40.480 +we obtain this following diffraction pattern, + +00:00:40.480 --> 00:00:43.280 +which has thousands of spots on it. + +00:43.280 --> 00:00:47.840 +We rotate the crystal for over 180 degrees, + +00:47.840 --> 00:00:51.760 +collecting 90 images to obtain all the data. 
+ +00:00:51.760 --> 00:00:56.000 +We then process those images + +00:56.000 --> 00:00:57.752 +and do an inverse Fourier transform + +00:00:57.752 --> 00:00:59.920 +to obtain the electron density. + +00:00:59.920 --> 00:01:01.888 +This electron density map has been + +00:01:01.888 --> 00:01:04.344 +contoured at the one-sigma level. + +00:01:04.344 --> 00:01:06.116 +That level's being shown by + +00:01:06.116 --> 00:01:08.640 +this blue chicken wire mesh. + +00:01:08.640 --> 00:01:10.152 +Atomic models have been fitted + +00:01:10.152 --> 00:01:11.119 +to this chicken wire. + +00:01:11.119 --> 00:01:14.240 +These lines represent bonds between atoms, + +00:01:14.240 --> 00:01:16.240 +atoms are being represented by points. + +00:01:16.240 --> 00:01:18.640 +And atoms are colored by atom type, + +00:01:18.640 --> 00:01:21.280 +red for oxygen, blue for nitrogen, + +00:01:21.280 --> 00:01:23.040 +and then in this case, + +01:23.040 --> 00:01:24.720 +carbon is colored cyan. + +00:01:24.720 --> 00:01:27.203 +We have fitted a drug molecule + +00:01:27.203 --> 00:01:29.360 +to the central blob of electron density + +00:01:29.360 --> 00:01:32.400 +which corresponds to that active site + +01:32.400 --> 00:01:35.759 +of this protein, which is RET kinase. + +00:01:35.759 --> 00:01:37.439 +It's important in lung cancer. + +00:01:37.439 --> 00:01:40.079 +When we're finished with model building, + +00:01:40.079 --> 00:01:41.339 +we will then examine + +00:01:41.339 --> 00:01:43.006 +the result of the final structure + +00:01:43.006 --> 00:01:45.200 +to prepare images for publication + +00:01:45.200 --> 00:01:47.439 +using a molecular graphics program. + +01:47.439 --> 00:01:48.108 +In this case, + +00:01:48.108 --> 00:01:50.000 +we've overlaid a number of structures, + +00:01:50.000 --> 00:01:53.600 +and we're examining the distance between + +01:53.600 --> 00:01:55.680 +the side chain of an alanine + +00:01:55.680 --> 00:01:58.880 +and one or two drug molecules. 
+ +00:01:58.880 --> 00:02:00.719 +This alanine sidechain actually blocks + +00:02:00.719 --> 00:02:02.159 +the binding of one of these drugs. + +00:02:02.159 --> 00:02:03.439 +The most popular program + +02:03.439 --> 02:06.320 +for doing this kind of analysis + +02:06.320 --> 00:02:07.280 +and for preparing images + +00:02:07.280 --> 00:02:09.520 +for publication is PyMOL. + +02:09.520 --> 02:11.440 +PyMOL was used to prepare these images + +02:11.440 --> 02:14.720 +on the covers of these featured journals. + +02:14.720 --> 00:02:17.520 +PyMOL is favored because + +00:02:17.520 --> 00:02:19.520 +it has 500 commands + +00:02:19.520 --> 00:02:22.128 +and 600 parameter settings + +00:02:22.128 --> 00:02:23.360 +that provide exquisite control + +00:02:23.360 --> 00:02:24.959 +over the appearance of the output. + +00:02:24.959 --> 00:02:28.480 +PyMOL has over 100,000 users, + +02:28.480 --> 00:02:30.000 +reflecting its popularity. + +00:02:30.000 --> 00:02:31.599 +This is the GUI for PyMOL. + +00:02:31.599 --> 00:02:35.120 +It shows in white the viewport area + +00:02:35.120 --> 00:02:36.080 +where one interacts + +00:02:36.080 --> 00:02:37.840 +with the loaded molecular object. + +00:02:37.840 --> 00:02:41.920 +We have rendered the same RET kinase + +02:41.920 --> 00:02:49.788 +with a set of preset parameters + +00:02:49.788 --> 00:02:51.200 +that have been named "publication". + +00:02:51.200 --> 00:02:52.720 +The other way of applying + +02:52.720 --> 00:02:54.319 +parameter settings and commands + +00:02:54.319 --> 00:02:56.720 +is to enter them at the PyMOL prompt. + +00:02:56.720 --> 00:03:00.159 +Then the third way is to load and run scripts. + +00:03:00.159 --> 00:03:03.120 +PyMOL is actually written in C for speed, + +00:03:03.120 --> 00:03:06.159 +but it is wrapped in Python for extensibility. 
+ +03:06.159 --> 03:09.680 +In fact, there are over 100 articles + +03:09.680 --> 00:03:11.599 +about various plugins and scripts + +00:03:11.599 --> 00:03:12.400 +that people have developed + +00:03:12.400 --> 00:03:15.120 +to extend PyMOL for years. + +03:15.120 --> 00:03:16.480 +Here are some examples + +00:03:16.480 --> 00:03:18.959 +from the snippet library that I developed. + +03:18.959 --> 03:21.280 +On the left is a default + +03:21.280 --> 03:24.640 +cartoon representation of an RNA hairpin. + +03:24.640 --> 03:27.040 +I find this reduced representation + +03:27.040 --> 00:03:30.799 +of the RNA hairpin to be too stark. + +03:30.799 --> 00:03:32.319 +I prefer these alternate ones + +00:03:32.319 --> 00:03:33.840 +that I developed. + +03:33.840 --> 03:37.519 +So, these three to the right of this one + +03:37.519 --> 00:03:39.519 +are not available through + +00:03:39.519 --> 00:03:40.720 +pull downs in PyMOL. + +00:03:40.720 --> 00:03:42.748 +So why develop a PyMOL + +00:03:42.748 --> 00:03:44.879 +snippet library for Org? + +03:44.879 --> 00:03:47.040 +Well, Org provides great support + +00:03:47.040 --> 00:03:48.560 +for literate programming, + +00:03:48.560 --> 00:03:49.840 +where you have code blocks + +00:03:49.840 --> 00:03:52.000 +that contain code that's executable, + +00:03:52.000 --> 00:03:53.040 +and the output is shown + +00:03:53.040 --> 00:03:54.959 +below that code block. + +03:54.959 --> 00:03:56.720 +And then you can fill + +00:03:56.720 --> 00:03:58.959 +the surrounding area in the document + +03:58.959 --> 00:04:00.799 +with the explanatory prose. + +00:04:00.799 --> 00:04:02.000 +Org has great support + +00:04:02.000 --> 00:04:04.480 +for editing that explanatory prose. + +00:04:04.480 --> 00:04:08.080 +Org can run PyMOL through PyMOL's Python API. + +04:08.080 --> 00:04:11.280 +One of the uses of such an Org document + +00:04:11.280 --> 00:04:14.487 +is to assemble a gallery of draft images. 
+ +00:04:14.487 --> 00:04:16.563 +We often have to look at + +00:04:16.563 --> 00:04:19.840 +dozens of candidate images + +00:04:19.840 --> 00:04:22.000 +with the molecule in different orientations, + +00:04:22.000 --> 00:04:23.520 +different zoom settings, + +04:23.520 --> 00:04:25.032 +different representations, + +00:04:25.032 --> 00:04:27.280 +different colors, and so on. + +00:04:27.280 --> 00:04:30.639 +And to have those images along with…, + +00:04:30.639 --> 00:04:31.840 +adjacent to the code + +04:31.840 --> 00:04:33.680 +that was used to generate them, + +00:04:33.680 --> 00:04:37.199 +can be very effective for + +04:37.199 --> 00:04:39.680 +further editing the code + +00:04:39.680 --> 00:04:40.880 +and improving the images. + +00:04:40.880 --> 00:04:44.080 +Once the final images have been selected, + +04:44.080 --> 00:04:46.320 +one can submit the code + +00:04:46.320 --> 00:04:48.479 +as part of the supplemental material. + +00:04:48.479 --> 00:04:52.400 +Finally, one can use the journal package + +04:52.400 --> 00:04:54.608 +to use the Org files as + +00:04:54.608 --> 00:04:57.120 +an electronic laboratory notebook, + +00:04:57.120 --> 00:04:59.600 +which is illustrated with molecular images. + +00:04:59.600 --> 00:05:01.039 +This can be very useful + +00:05:01.039 --> 00:05:04.080 +when assembling manuscripts + +05:04.080 --> 00:05:05.440 +months or years later. + +00:05:05.440 --> 00:05:08.320 +This shows the YASnippet pull down + +05:08.320 --> 00:05:12.720 +after my library has been installed. + +00:05:12.720 --> 00:05:15.360 +I have an Org file open, + +00:05:15.360 --> 00:05:17.120 +so I'm in Org mode. + +05:17.120 --> 00:05:20.880 +We have the Org mode submenu, + +05:20.880 --> 00:05:23.919 +and under it, all my snippets + +00:05:23.919 --> 00:05:26.880 +are located in these sub-sub-menus + +05:26.880 --> 00:05:30.880 +that are prepended with pymolpy. 
+ +00:05:30.880 --> 00:05:33.840 +Under the molecular representations menu, + +00:05:33.840 --> 00:05:36.479 +there is a listing of snippets. + +00:05:36.479 --> 00:05:38.563 +The top one is for the ambient occlusion effect, + +00:05:38.563 --> 00:05:39.840 +which we're going to apply + +00:05:39.840 --> 00:05:41.039 +in this Org file. + +00:05:41.039 --> 00:05:44.240 +So these lines of code were inserted after, + +00:05:44.240 --> 00:05:48.479 +as well as these flanking lines + +05:48.479 --> 00:05:50.240 +that define the source block, + +00:05:50.240 --> 00:05:53.280 +were inserted by clicking on that line. + +05:53.280 --> 00:05:55.120 +Then I've added some additional code. + +00:05:55.120 --> 00:05:56.880 +So, the first line defines + +00:05:56.880 --> 00:05:59.039 +the language that we're using. + +00:05:59.039 --> 00:05:59.768 +We're going to use + +00:05:59.768 --> 00:06:02.639 +the jupyter-python language. + +06:02.639 --> 00:06:04.560 +Then you can define the session, + +00:06:04.560 --> 00:06:06.400 +and the name of this is arbitrary. + +00:06:06.400 --> 00:06:09.680 +Then the kernel is our means + +00:06:09.680 --> 00:06:11.360 +by which we gain access + +00:06:11.360 --> 00:06:14.880 +to the Python API of PyMOL. + +06:14.880 --> 00:06:17.039 +The remaining settings apply to the output. + +00:06:17.039 --> 00:06:18.319 +To execute this code + +00:06:18.319 --> 00:06:21.199 +and to get the resulting image, + +00:06:21.199 --> 00:06:25.120 +you put the cursor inside this code block, + +00:06:25.120 --> 00:06:26.560 +or on the top line, + +00:06:26.560 --> 00:06:29.840 +and enter Control c Control c (C-c C-c). + +06:29.840 --> 00:06:32.240 +This shows the resulting image + +00:06:32.240 --> 00:06:33.600 +has been loaded up. + +00:06:33.600 --> 00:06:37.280 +It takes about 10 seconds for this to appear. 
+ +06:37.280 --> 00:06:38.479 +So the downside of this is + +00:06:38.479 --> 00:06:40.729 +if you have a large number of these, + +00:06:40.729 --> 00:06:43.919 +the Org file can lag quite a bit + +00:06:43.919 --> 00:06:45.120 +when you try to scroll through it, + +00:06:45.120 --> 00:06:48.319 +so you need to close up these result drawers, + +00:06:48.319 --> 00:06:50.960 +and only open up the ones + +00:06:50.960 --> 00:06:53.199 +that you're currently examining. + +00:06:53.199 --> 00:06:54.319 +These are features I think + +06:54.319 --> 06:56.240 +are important in practical work. + +06:56.240 --> 00:06:59.840 +So, the plus is, a feature that's present, + +00:06:59.840 --> 00:07:01.120 +minus is absent. + +00:07:01.120 --> 00:07:03.199 +I think tab stops and tab triggers + +00:07:03.199 --> 00:07:04.800 +are really important. + +07:04.800 --> 00:07:05.680 +Triggers are important for + +00:07:05.680 --> 00:07:06.720 +the fast insertion of code, + +00:07:06.720 --> 00:07:08.639 +tab stops are important for + +07:08.639 --> 00:07:10.560 +complete, accurate editing of code. + +00:07:10.560 --> 00:07:12.735 +I already addressed the rendering speed + +00:07:12.735 --> 00:07:14.560 +and scrolling issue. + +00:07:14.560 --> 00:07:15.759 +I think the way around this + +00:07:15.759 --> 00:07:19.199 +is just to export the Org document to a PDF file + +00:07:19.199 --> 00:07:23.360 +and do your evaluation of different images + +00:07:23.360 --> 00:07:25.199 +by examining them in the PDF + +00:07:25.199 --> 00:07:26.560 +rather than the Org file. + +00:07:26.560 --> 00:07:30.400 +The path to PDF is lightning fast in Emacs + +00:07:30.400 --> 00:07:32.240 +compared to Jupyter, + +00:07:32.240 --> 00:07:35.280 +where it's cumbersome in comparison. + +00:07:35.280 --> 00:07:38.400 +This is a snapshot of my initialization file. + +00:07:38.400 --> 00:07:41.840 +These parts are relevant to doing this work. 
+ +00:07:41.840 --> 00:07:43.039 +A full description of them + +00:07:43.039 --> 00:07:46.319 +can be found in the README file + +07:46.319 --> 00:07:48.639 +of this repository on GitHub. + +00:07:48.639 --> 00:07:49.456 +I'd like to thank the + +00:07:49.456 --> 00:07:51.840 +Nathan Shock Data Science Workshop + +00:07:51.840 --> 00:07:54.319 +for feedback during presentations + +00:07:54.319 --> 00:07:56.160 +I've made about this work. + +00:07:56.160 --> 00:07:57.628 +And I would also like to thank + +00:07:57.628 --> 00:08:00.240 +the following funding sources for support. + +00:08:00.240 --> 00:08:03.879 +I will now take questions. Thank you. + +00:08:03.879 --> 00:08:03.986 +[captions by Blaine Mooers and Bhavin Gandhi] |