WEBVTT

00:00:00.880 --> 00:00:02.446
Hi, I'm Blaine Mooers.

00:00:02.446 --> 00:00:04.160
I'm going to be talking about

00:00:04.160 --> 00:00:07.919
the use of molecular graphics in Org

00:00:07.919 --> 00:00:08.880
for the purpose of doing

00:00:08.880 --> 00:00:11.840
reproducible research in structural biology.

00:00:11.840 --> 00:00:13.722
I'm an associate professor of biochemistry

00:00:13.722 --> 00:00:15.768
and microbiology at the University of Oklahoma

00:00:15.768 --> 00:00:17.760
Health Sciences Center in Oklahoma City.

00:00:17.760 --> 00:00:19.600
My laboratory uses X-ray crystallography

00:00:19.600 --> 00:00:21.920
to determine the atomic structures

00:00:21.920 --> 00:00:23.439
of proteins like this one

00:00:23.439 --> 00:00:26.080
in the lower left, and of nucleic acids

00:00:26.080 --> 00:00:27.840
important in human health.

00:00:27.840 --> 00:00:29.591
This is a crystal of an RNA,

00:00:29.591 --> 00:00:31.359
which we have placed in this

00:00:31.359 --> 00:00:33.200
X-ray diffraction instrument.

00:00:33.200 --> 00:00:35.600
And after rotating the crystal

00:00:35.600 --> 00:00:38.000
in the X-ray beam for two degrees,

00:00:38.000 --> 00:00:40.480
we obtain the following diffraction pattern,

00:00:40.480 --> 00:00:43.280
which has thousands of spots on it.

00:00:43.280 --> 00:00:47.840
We rotate the crystal for over 180 degrees,

00:00:47.840 --> 00:00:51.760
collecting 90 images to obtain all the data.

00:00:51.760 --> 00:00:56.000
We then process those images

00:00:56.000 --> 00:00:57.752
and do an inverse Fourier transform

00:00:57.752 --> 00:00:59.920
to obtain the electron density.

00:00:59.920 --> 00:01:01.888
This electron density map has been

00:01:01.888 --> 00:01:04.344
contoured at the one-sigma level.

00:01:04.344 --> 00:01:06.116
That level's being shown by

00:01:06.116 --> 00:01:08.640
this blue chicken wire mesh.

00:01:08.640 --> 00:01:10.152
Atomic models have been fitted

00:01:10.152 --> 00:01:11.119
to this chicken wire.

00:01:11.119 --> 00:01:14.240
These lines represent bonds between atoms,

00:01:14.240 --> 00:01:16.240
atoms are being represented by points.

00:01:16.240 --> 00:01:18.640
And atoms are colored by atom type,

00:01:18.640 --> 00:01:21.280
red for oxygen, blue for nitrogen,

00:01:21.280 --> 00:01:23.040
and then in this case,

00:01:23.040 --> 00:01:24.720
carbon is colored cyan.

00:01:24.720 --> 00:01:27.203
We have fitted a drug molecule

00:01:27.203 --> 00:01:29.360
to the central blob of electron density

00:01:29.360 --> 00:01:32.400
which corresponds to the active site

00:01:32.400 --> 00:01:35.759
of this protein, which is RET kinase.

00:01:35.759 --> 00:01:37.439
It's important in lung cancer.

00:01:37.439 --> 00:01:40.079
When we're finished with model building,

00:01:40.079 --> 00:01:41.339
we will then examine

00:01:41.339 --> 00:01:43.006
the result of the final structure

00:01:43.006 --> 00:01:45.200
to prepare images for publication

00:01:45.200 --> 00:01:47.439
using a molecular graphics program.

00:01:47.439 --> 00:01:48.108
In this case,

00:01:48.108 --> 00:01:50.000
we've overlaid a number of structures,

00:01:50.000 --> 00:01:53.600
and we're examining the distance between

00:01:53.600 --> 00:01:55.680
the side chain of an alanine

00:01:55.680 --> 00:01:58.880
and one or two drug molecules.

00:01:58.880 --> 00:02:00.719
This alanine side chain actually blocks

00:02:00.719 --> 00:02:02.159
the binding of one of these drugs.

00:02:02.159 --> 00:02:03.439
The most popular program

00:02:03.439 --> 00:02:06.320
for doing this kind of analysis

00:02:06.320 --> 00:02:07.280
and for preparing images

00:02:07.280 --> 00:02:09.520
for publication is PyMOL.

00:02:09.520 --> 00:02:11.440
PyMOL was used to prepare these images

00:02:11.440 --> 00:02:14.720
on the covers of these featured journals.

00:02:14.720 --> 00:02:17.520
PyMOL is favored because

00:02:17.520 --> 00:02:19.520
it has 500 commands

00:02:19.520 --> 00:02:22.128
and 600 parameter settings

00:02:22.128 --> 00:02:23.360
that provide exquisite control

00:02:23.360 --> 00:02:24.959
over the appearance of the output.

00:02:24.959 --> 00:02:28.480
PyMOL has over 100,000 users,

00:02:28.480 --> 00:02:30.000
reflecting its popularity.

00:02:30.000 --> 00:02:31.599
This is the GUI for PyMOL.

00:02:31.599 --> 00:02:35.120
It shows in white the viewport area

00:02:35.120 --> 00:02:36.080
where one interacts

00:02:36.080 --> 00:02:37.840
with the loaded molecular object.

00:02:37.840 --> 00:02:41.920
We have rendered the same RET kinase

00:02:41.920 --> 00:02:49.788
with a set of preset parameters

00:02:49.788 --> 00:02:51.200
that have been named "publication".

00:02:51.200 --> 00:02:52.720
A second way of applying

00:02:52.720 --> 00:02:54.319
parameter settings and commands

00:02:54.319 --> 00:02:56.720
is to enter them at the PyMOL prompt.

00:02:56.720 --> 00:03:00.159
Then the third way is to load and run scripts.

00:03:00.159 --> 00:03:03.120
PyMOL is actually written in C for speed,

00:03:03.120 --> 00:03:06.159
but it is wrapped in Python for extensibility.

00:03:06.159 --> 00:03:09.680
In fact, there are over 100 articles

00:03:09.680 --> 00:03:11.599
about various plugins and scripts

00:03:11.599 --> 00:03:12.400
that people have developed

00:03:12.400 --> 00:03:15.120
to extend PyMOL over the years.

00:03:15.120 --> 00:03:16.480
Here are some examples

00:03:16.480 --> 00:03:18.959
from the snippet library that I developed.

00:03:18.959 --> 00:03:21.280
On the left is a default

00:03:21.280 --> 00:03:24.640
cartoon representation of an RNA hairpin.

00:03:24.640 --> 00:03:27.040
I find this reduced representation

00:03:27.040 --> 00:03:30.799
of the RNA hairpin to be too stark.

00:03:30.799 --> 00:03:32.319
I prefer these alternate ones

00:03:32.319 --> 00:03:33.840
that I developed.

00:03:33.840 --> 00:03:37.519
So, these three to the right of this one

00:03:37.519 --> 00:03:39.519
are not available through

00:03:39.519 --> 00:03:40.720
pull downs in PyMOL.

00:03:40.720 --> 00:03:42.748
So why develop a PyMOL

00:03:42.748 --> 00:03:44.879
snippet library for Org?

00:03:44.879 --> 00:03:47.040
Well, Org provides great support

00:03:47.040 --> 00:03:48.560
for literate programming,

00:03:48.560 --> 00:03:49.840
where you have code blocks

00:03:49.840 --> 00:03:52.000
that contain code that's executable,

00:03:52.000 --> 00:03:53.040
and the output is shown

00:03:53.040 --> 00:03:54.959
below that code block.

00:03:54.959 --> 00:03:56.720
And then you can fill

00:03:56.720 --> 00:03:58.959
the surrounding area in the document

00:03:58.959 --> 00:04:00.799
with the explanatory prose.

00:04:00.799 --> 00:04:02.000
Org has great support

00:04:02.000 --> 00:04:04.480
for editing that explanatory prose.

00:04:04.480 --> 00:04:08.080
Org can run PyMOL through PyMOL's Python API.

00:04:08.080 --> 00:04:11.280
One of the uses of such an Org document

00:04:11.280 --> 00:04:14.487
is to assemble a gallery of draft images.

00:04:14.487 --> 00:04:16.563
We often have to look at

00:04:16.563 --> 00:04:19.840
dozens of candidate images

00:04:19.840 --> 00:04:22.000
with the molecule in different orientations,

00:04:22.000 --> 00:04:23.520
different zoom settings,

00:04:23.520 --> 00:04:25.032
different representations,

00:04:25.032 --> 00:04:27.280
different colors, and so on.

00:04:27.280 --> 00:04:30.639
Having those images

00:04:30.639 --> 00:04:31.840
adjacent to the code

00:04:31.840 --> 00:04:33.680
that was used to generate them

00:04:33.680 --> 00:04:37.199
can be very effective for

00:04:37.199 --> 00:04:39.680
further editing the code

00:04:39.680 --> 00:04:40.880
and improving the images.

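NOTE
The gallery of draft images described above can be sketched with a short loop. This is a minimal sketch, not the speaker's code: the function names gallery_variants and render, the file-name pattern, and the chosen representations, colors, and rotation angles are all hypothetical. The command strings are ordinary PyMOL commands, and render assumes PyMOL is importable as the pymol module.

```python
from itertools import product


def gallery_variants(representations, colors, turns_deg):
    """Return (png_name, pymol_commands) for every combination of settings."""
    variants = []
    for rep, color, turn in product(representations, colors, turns_deg):
        # Hypothetical naming scheme so each draft image is traceable to its settings.
        name = f"draft_{rep}_{color}_y{turn:03d}.png"
        commands = [
            "hide everything",
            f"show {rep}",
            f"color {color}",
            "orient",                # reset the view before rotating
            f"turn y, {turn}",       # rotate about the y axis, in degrees
            f"png {name}, dpi=300, ray=1",
        ]
        variants.append((name, commands))
    return variants


def render(variants, model_path):
    """Run the commands through PyMOL's Python API (requires PyMOL)."""
    from pymol import cmd  # imported lazily so the planning step works without PyMOL

    cmd.load(model_path)  # model_path is a placeholder for your own structure file
    for _name, commands in variants:
        for line in commands:
            cmd.do(line)


variants = gallery_variants(["cartoon", "sticks"], ["cyan", "wheat"], [0, 90])
for name, _commands in variants:
    print(name)
```

Keeping the planning step separate from the rendering step means the list of draft images can be reviewed, or pasted into an Org document next to the code, before any slow ray tracing is run.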
00:04:40.880 --> 00:04:44.080
Once the final images have been selected,

00:04:44.080 --> 00:04:46.320
one can submit the code

00:04:46.320 --> 00:04:48.479
as part of the supplemental material.

00:04:48.479 --> 00:04:52.400
Finally, one can use the journal package

00:04:52.400 --> 00:04:54.608
to treat the Org files as

00:04:54.608 --> 00:04:57.120
an electronic laboratory notebook,

00:04:57.120 --> 00:04:59.600
which is illustrated with molecular images.

00:04:59.600 --> 00:05:01.039
This can be very useful

00:05:01.039 --> 00:05:04.080
when assembling manuscripts

00:05:04.080 --> 00:05:05.440
months or years later.

00:05:05.440 --> 00:05:08.320
This shows the YASnippet pull down

00:05:08.320 --> 00:05:12.720
after my library has been installed.

00:05:12.720 --> 00:05:15.360
I have an Org file open,

00:05:15.360 --> 00:05:17.120
so I'm in Org mode.

00:05:17.120 --> 00:05:20.880
We have the Org mode submenu,

00:05:20.880 --> 00:05:23.919
and under it, all my snippets

00:05:23.919 --> 00:05:26.880
are located in these sub-sub-menus

00:05:26.880 --> 00:05:30.880
that are prefixed with pymolpy.

00:05:30.880 --> 00:05:33.840
Under the molecular representations menu,

00:05:33.840 --> 00:05:36.479
there is a listing of snippets.

00:05:36.479 --> 00:05:38.563
The top one is for the ambient occlusion effect,

00:05:38.563 --> 00:05:39.840
which we're going to apply

00:05:39.840 --> 00:05:41.039
in this Org file.

00:05:41.039 --> 00:05:44.240
So these lines of code,

00:05:44.240 --> 00:05:48.479
as well as these flanking lines

00:05:48.479 --> 00:05:50.240
that define the source block,

00:05:50.240 --> 00:05:53.280
were inserted by clicking on that line.

00:05:53.280 --> 00:05:55.120
Then I've added some additional code.

00:05:55.120 --> 00:05:56.880
So, the first line defines

00:05:56.880 --> 00:05:59.039
the language that we're using.

00:05:59.039 --> 00:05:59.768
We're going to use

00:05:59.768 --> 00:06:02.639
the jupyter-python language.

00:06:02.639 --> 00:06:04.560
Then you can define the session,

00:06:04.560 --> 00:06:06.400
and the name of this is arbitrary.

00:06:06.400 --> 00:06:09.680
Then the kernel is the means

00:06:09.680 --> 00:06:11.360
by which we gain access

00:06:11.360 --> 00:06:14.880
to the Python API of PyMOL.

00:06:14.880 --> 00:06:17.039
The remaining settings apply to the output.

00:06:17.039 --> 00:06:18.319
To execute this code

00:06:18.319 --> 00:06:21.199
and to get the resulting image,

00:06:21.199 --> 00:06:25.120
you put the cursor inside this code block,

00:06:25.120 --> 00:06:26.560
or on the top line,

00:06:26.560 --> 00:06:29.840
and enter Control c Control c (C-c C-c).

00:06:29.840 --> 00:06:32.240
This shows that the resulting image

00:06:32.240 --> 00:06:33.600
has been loaded up.

00:06:33.600 --> 00:06:37.280
It takes about 10 seconds for this to appear.

00:06:37.280 --> 00:06:38.479
So the downside of this is

00:06:38.479 --> 00:06:40.729
if you have a large number of these,

00:06:40.729 --> 00:06:43.919
the Org file can lag quite a bit

00:06:43.919 --> 00:06:45.120
when you try to scroll through it,

00:06:45.120 --> 00:06:48.319
so you need to close up these result drawers,

00:06:48.319 --> 00:06:50.960
and only open up the ones

00:06:50.960 --> 00:06:53.199
that you're currently examining.

00:06:53.199 --> 00:06:54.319
These are features I think

00:06:54.319 --> 00:06:56.240
are important in practical work.

00:06:56.240 --> 00:06:59.840
So, a plus is a feature that's present,

00:06:59.840 --> 00:07:01.120
a minus is absent.

00:07:01.120 --> 00:07:03.199
I think tab stops and tab triggers

00:07:03.199 --> 00:07:04.800
are really important.

00:07:04.800 --> 00:07:05.680
Triggers are important for

00:07:05.680 --> 00:07:06.720
the fast insertion of code,

00:07:06.720 --> 00:07:08.639
tab stops are important for

00:07:08.639 --> 00:07:10.560
complete, accurate editing of code.

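NOTE
The source block described above has flanking lines that name the language, the session, the kernel, and the output settings. Below is a minimal sketch that assembles such a block as text, assuming the emacs-jupyter header arguments :session, :kernel, :exports, and :results. The helper org_src_block, the kernel name pymol_kernel, the file names, and the specific setting values are placeholders, not the ones in the speaker's snippet library.

```python
def org_src_block(body, session="py", kernel="python3",
                  exports="both", results="raw drawer"):
    """Wrap Python code in the flanking lines of an Org jupyter-python block."""
    header = (
        f"#+BEGIN_SRC jupyter-python :session {session} :kernel {kernel} "
        f":exports {exports} :results {results}"
    )
    return "\n".join([header, body.strip("\n"), "#+END_SRC"])


# Body of the block: calls to PyMOL's Python API (run by the kernel, not here).
# "model.pdb" and the output file name are hypothetical.
pymol_body = """
from pymol import cmd
from IPython.display import Image

cmd.load("model.pdb")
cmd.show_as("surface")
cmd.set("ambient_occlusion_mode", 1)
cmd.png("ambient_occlusion.png", dpi=300, ray=1)
Image(filename="ambient_occlusion.png")
"""

block = org_src_block(pymol_body, session="py", kernel="pymol_kernel")
print(block)
```

Pasting the printed text into an Org file and pressing C-c C-c with the cursor inside the block would run it, provided the named kernel exists and has PyMOL installed. The session name is arbitrary, as the talk notes; blocks that share a session share one running kernel.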
00:07:10.560 --> 00:07:12.735
I already addressed the rendering speed

00:07:12.735 --> 00:07:14.560
and scrolling issue.

00:07:14.560 --> 00:07:15.759
I think the way around this

00:07:15.759 --> 00:07:19.199
is just to export the Org document to a PDF file

00:07:19.199 --> 00:07:23.360
and do your evaluation of different images

00:07:23.360 --> 00:07:25.199
by examining them in the PDF

00:07:25.199 --> 00:07:26.560
rather than the Org file.

00:07:26.560 --> 00:07:30.400
The path to PDF is lightning fast in Emacs

00:07:30.400 --> 00:07:32.240
compared to Jupyter,

00:07:32.240 --> 00:07:35.280
where it's cumbersome in comparison.

00:07:35.280 --> 00:07:38.400
This is a snapshot of my initialization file.

00:07:38.400 --> 00:07:41.840
These parts are relevant to doing this work.

00:07:41.840 --> 00:07:43.039
A full description of them

00:07:43.039 --> 00:07:46.319
can be found in the README file

00:07:46.319 --> 00:07:48.639
of this repository on GitHub.

00:07:48.639 --> 00:07:49.456
I'd like to thank the

00:07:49.456 --> 00:07:51.840
Nathan Shock Data Science Workshop

00:07:51.840 --> 00:07:54.319
for feedback during presentations

00:07:54.319 --> 00:07:56.160
I've made about this work.

00:07:56.160 --> 00:07:57.628
And I would also like to thank

00:07:57.628 --> 00:08:00.240
the following funding sources for support.

00:08:00.240 --> 00:08:03.879
I will now take questions. Thank you.

00:08:03.879 --> 00:08:03.986
[captions by Blaine Mooers and Bhavin Gandhi]