author | EmacsConf <emacsconf-org@gnu.org> | 2023-12-02 13:50:17 -0500 |
---|---|---|
committer | EmacsConf <emacsconf-org@gnu.org> | 2023-12-02 13:50:17 -0500 |
commit | 5e5b46d8db74f12f8639c684082d4691eaffc030 (patch) | |
tree | 38d7e01d67ddc3390a43d0dd99d67ee8fa998802 /2023/captions | |
parent | 8687f40e1d5af25092655bf0afacb5bdc1f247b9 (diff) | |
Automated commit
Diffstat (limited to '')
2 files changed, 1199 insertions, 0 deletions
diff --git a/2023/captions/emacsconf-2023-collab--collaborative-data-processing-and-documenting-using-orgbabel--jonathan-hartman-lukas-c-bossert--main--chapters.vtt b/2023/captions/emacsconf-2023-collab--collaborative-data-processing-and-documenting-using-orgbabel--jonathan-hartman-lukas-c-bossert--main--chapters.vtt
new file mode 100644
index 00000000..dca4982e
--- /dev/null
+++ b/2023/captions/emacsconf-2023-collab--collaborative-data-processing-and-documenting-using-orgbabel--jonathan-hartman-lukas-c-bossert--main--chapters.vtt
@@ -0,0 +1,23 @@
+WEBVTT
+
+
+00:00:00.000 --> 00:01:16.079
+Introduction
+
+00:01:16.080 --> 00:02:18.959
+Org Mode
+
+00:02:18.960 --> 00:06:27.839
+Working together
+
+00:06:27.840 --> 00:08:04.039
+Data cleaning
+
+00:08:04.040 --> 00:12:36.039
+Processing
+
+00:12:36.040 --> 00:14:01.759
+Visualization
+
+00:14:01.760 --> 00:19:07.280
+Preserve
diff --git a/2023/captions/emacsconf-2023-collab--collaborative-data-processing-and-documenting-using-orgbabel--jonathan-hartman-lukas-c-bossert--main.vtt b/2023/captions/emacsconf-2023-collab--collaborative-data-processing-and-documenting-using-orgbabel--jonathan-hartman-lukas-c-bossert--main.vtt
new file mode 100644
index 00000000..1dcc0b22
--- /dev/null
+++ b/2023/captions/emacsconf-2023-collab--collaborative-data-processing-and-documenting-using-orgbabel--jonathan-hartman-lukas-c-bossert--main.vtt
@@ -0,0 +1,1176 @@
+WEBVTT captioned by amine, checked by sachac
+
+NOTE Introduction
+
+00:00.000 --> 00:00:01.874
+[Lukas]: Welcome to our presentation,
+
+00:00:01.875 --> 00:00:03.599
+Collaborative Data Processing
+
+00:03.600 --> 00:06.039
+and Documenting using org-babel.
+
+00:06.040 --> 00:07.759
+My name is Lukas Bossert, and I'm
+
+00:07.760 --> 00:00:09.740
+from the RWTH Aachen University
+
+00:00:09.741 --> 00:00:12.519
+in the city of Aachen, Germany.
+
+00:12.520 --> 00:14.839
+[Jonathan]: And my name is Jonathan Hartmann.
+
+00:14.840 --> 00:18.719
+I'm also from the IT Center here at RWTH Aachen.
+
+00:18.720 --> 00:19.239
+[Lukas]: Great.
+
+00:19.240 --> 00:21.679
+And we will show you today how you
+
+00:21.680 --> 00:25.399
+can use Org Mode for data processing.
+
+00:25.400 --> 00:27.999
+So you see a little workflow what we are going to do.
+
+00:28.000 --> 00:31.199
+First, we will give you a slight introduction to Org Mode.
+
+00:31.200 --> 00:34.639
+Then we will dive into the part of data preparing.
+
+00:34.640 --> 00:38.679
+First, you're going to query the data using the language SPARQL.
+
+00:38.680 --> 00:41.759
+Then we're going to clean it using a different language.
+
+00:41.760 --> 00:44.279
+And in the main part of our presentation,
+
+00:44.280 --> 00:48.119
+we're going to do the data processing, first aggregating
+
+00:48.120 --> 00:52.519
+using Python, later on counting items using Org,
+
+00:52.520 --> 00:56.360
+and even visualizing it using R. At the end,
+
+00:56.400 --> 00:58.959
+we're going to show you how to preserve
+
+00:58.960 --> 01:01.759
+the data and the document and its documentation,
+
+01:01.760 --> 01:06.599
+first doing in plain exporting, then adding some metadata,
+
+01:06.600 --> 01:09.759
+and showing you two different ways, first a manual export,
+
+01:09.760 --> 01:13.359
+and also then a batch-processed export.
+
+01:13.360 --> 01:14.239
+All right.
+
+01:14.240 --> 01:16.079
+Let's dive in to that.
+
+NOTE Org Mode
+
+01:16.080 --> 01:19.919
+Jonathan, can you give us an introduction about Org Mode?
+
+01:19.920 --> 01:20.439
+[Jonathan]: Of course.
+
+01:20.440 --> 01:23.079
+So in case anyone isn't familiar with it,
+
+01:23.080 --> 01:25.879
+Org Mode, in the words of Carsten Dominik,
+
+01:25.880 --> 01:28.559
+is back to the future for plain text.
+
+01:28.560 --> 01:31.439
+So this is just a module available for Emacs,
+
+01:31.440 --> 01:32.519
+plain-text base.
+
+01:32.520 --> 01:34.919
+It's been around since 2003, which
+
+01:34.920 --> 01:36.799
+makes it about 20 years old.
+
+01:36.800 --> 01:40.159
+And it's extensible and fully customizable.
+
+01:40.160 --> 01:43.999
+And especially, it's very convenient, very good
+
+01:44.000 --> 01:46.719
+for scientific text production and organization.
+
+01:46.720 --> 01:49.439
+So for example, you can do project management, agenda,
+
+01:49.440 --> 01:52.559
+diary, journaling, personal knowledge management,
+
+01:52.560 --> 01:53.359
+presentation.
+
+01:53.360 --> 01:55.520
+Even this is written in Org Mode.
+
+01:55.560 --> 01:57.439
+It's an Org Mode presentation.
+
+01:57.440 --> 01:59.199
+You can do single source publishing,
+
+01:59.200 --> 02:01.679
+which we will do later on, and also
+
+02:01.680 --> 02:06.479
+literate programming, which is the core of our talk.
+
+02:06.480 --> 02:06.999
+OK.
+
+02:07.000 --> 02:10.799
+[Lukas]: So let me stop this presentation here.
+
+02:10.800 --> 02:14.719
+So what you see here is the plain text underneath it.
+
+02:14.720 --> 02:18.959
+So this is Org Mode.
+
+NOTE Working together
+
+02:18.960 --> 02:21.919
+And Jonathan, since we kind of already
+
+02:21.920 --> 02:25.320
+did the introduction together, should we
+
+02:26.120 --> 00:02:28.760
+also do the working part together?
+
+00:02:28.761 --> 00:02:29.700
+[Jonathan]: Of course.
+
+00:02:29.701 --> 00:02:33.119
+So you see on the screen there on the right,
+
+00:02:33.120 --> 00:02:35.060
+that's my screen in Emacs.
+
+00:02:35.061 --> 00:02:39.520
+And Lukas, why don't you host a session using CRDT,
+
+00:02:39.521 --> 00:02:41.200
+and I'll connect to your buffer.
+
+00:02:41.201 --> 00:02:42.560
+[Lukas]: OK. Great.
+
+00:02:42.561 --> 00:02:43.280
+I do that.
+
+00:02:43.281 --> 00:02:46.180
+So what I do, I'm using Doom Emacs.
+
+00:02:46.181 --> 00:02:49.307
+And I can use the `SPC` and then the `l`
+
+00:02:49.308 --> 00:02:52.140
+for the live share/collab part.
+
+00:02:52.141 --> 02:57.999
+I can use the `s` for share current buffer.
+
+02:58.000 --> 00:03:01.559
+So when I do this, I'm getting asked for some settings.
+
+00:03:01.560 --> 00:03:04.439
+I'm going with the default settings here.
+
+00:03:04.440 --> 00:03:08.340
+So default port, no password, and my display name.
+
+00:03:08.341 --> 00:03:11.940
+And now Emacs is connecting.
+
+00:03:11.941 --> 00:03:15.179
+And once it's connected, which just takes a couple of seconds,
+
+00:03:15.180 --> 00:03:17.239
+I can get the URL.
+
+00:03:17.240 --> 03:20.800
+So I'm going back to this menu and using `y`
+
+03:21.160 --> 03:23.999
+for copying the URL of the current session.
+
+03:24.000 --> 03:27.799
+And this is the URL I'm going to send over to you, Jonathan,
+
+03:27.800 --> 03:29.079
+to pick that up.
+
+03:29.080 --> 03:29.599
+[Jonathan]: Right.
+
+03:29.600 --> 03:30.079
+OK.
+
+03:30.080 --> 00:03:36.999
+And now on my screen, I'm going to do a `SPC l c` for connect.
+
+00:03:37.000 --> 00:03:38.740
+And I'm going to paste the URL
+
+00:03:38.741 --> 00:03:40.040
+that Lukas just sent me in here.
+
+00:03:40.980 --> 03:43.719
+Default port, no password.
+
+03:43.720 --> 00:03:45.440
+And we're connecting now.
+
+00:03:45.700 --> 03:48.600
+So this takes a second just to get us synced up.
+
+03:51.600 --> 00:03:54.160
+So we can work on the same document at the same time.
+
+00:03:54.161 --> 03:56.639
+We can follow each other's cursors around.
+
+03:56.640 --> 03:58.839
+We can have multiple buffers open and work on them
+
+03:58.840 --> 04:00.999
+at the same time.
+
+04:01.000 --> 04:04.719
+And so here you see that we are both in the same document.
+
+04:04.720 --> 04:06.280
+You can see my cursor popping around.
+
+04:09.040 --> 04:13.279
+And you can see we're both editing the same item.
+
+04:13.280 --> 04:14.039
+Great.
+
+04:14.040 --> 04:18.039
+[Lukas]: So we also see who else is currently in our buffer
+
+04:18.040 --> 04:20.199
+with the user overview.
+
+04:20.200 --> 04:23.559
+So let me just delete that window.
+
+04:23.560 --> 04:26.079
+And that's going to work in our main one.
+
+04:26.080 --> 04:29.599
+So we said first part is about data retrieval.
+
+04:29.600 --> 04:32.720
+So we should give it a headline.
+
+04:37.080 --> 04:39.239
+We said prepare stage.
+
+04:39.240 --> 04:42.319
+So what are we going to do first, Jonathan?
+
+04:42.320 --> 00:04:43.940
+[Jonathan]: So what we're going to do,
+
+00:04:43.941 --> 00:04:45.399
+what this whole document is based upon,
+
+04:45.400 --> 04:50.119
+is we're going to pull data from Wikidata using a SPARQL query.
+
+04:50.120 --> 04:53.519
+The data we're going to pull is related to the NFDIs,
+
+04:53.520 --> 04:55.639
+which here in Germany is the National Forschungsdaten
+
+04:55.640 --> 05:00.679
+Infrastructure, which is a sort of collection of universities
+
+05:00.680 --> 05:03.399
+that work together on various research projects.
+
+05:03.400 --> 05:05.599
+And this is emblematic of the kind of data
+
+05:05.600 --> 05:09.239
+that we would be interested in working with here.
+
+05:09.240 --> 05:13.359
+So I'm going to paste a--forgive the pre-written code--
+
+05:13.360 --> 05:19.840
+I'm going to paste some text in here.
+
+05:20.040 --> 00:05:21.407
+[Lukas]: And while you are talking, I just
+
+00:05:21.408 --> 00:05:23.359
+keep on documenting what we do
+
+00:05:23.360 --> 00:05:25.880
+so we can split the work.
+
+05:27.360 --> 05:29.679
+[Jonathan]: In here, after a minor technical upset,
+
+05:29.680 --> 05:32.559
+is the raw dataset cell.
+
+05:32.560 --> 00:05:34.740
+And it's going to use SPARQL,
+
+00:05:34.741 --> 00:05:37.174
+which is how we have the syntax highlighting
+
+00:05:37.175 --> 00:05:37.940
+in our code here.
+
+00:05:37.941 --> 05:40.639
+It's going to go to the URL endpoint
+
+05:40.640 --> 05:43.639
+query.wikidata.org/sparql ,
+
+05:43.640 --> 05:46.799
+and it's going to return the data as a text CSV,
+
+05:46.800 --> 05:49.279
+and it's going to cache that data
+
+05:49.280 --> 05:51.439
+so that we don't constantly hammer the API every time
+
+05:51.440 --> 05:54.239
+we run this notebook.
+
+05:54.240 --> 00:05:57.360
+So I'm going to run that there.
+
+00:05:57.361 --> 05:58.799
+You can see down at the bottom of my screen,
+
+05:58.800 --> 06:00.840
+we're contacting the host query.wikidata.org .
+
+06:05.720 --> 06:07.319
+[Lukas]: And there's the result.
+
+06:07.320 --> 06:11.799
+[Jonathan]: Yeah, except I think that for our purposes here,
+
+06:11.800 --> 06:15.279
+we're just going to limit this to 50 results.
+
+06:15.280 --> 06:16.279
+[Lukas]: Oh, yeah.
+
+06:16.280 --> 06:18.679
+[Jonathan]: Just so it's a little easier for us to manage.
+
+06:18.680 --> 06:20.719
+I'm going to run that again.
+
+06:20.720 --> 06:21.519
+There we go.
+
+06:21.520 --> 00:06:22.319
+That looks a little better.
+
+00:06:22.320 --> 00:06:23.159
+[Lukas]: I think that's fine.
+
+00:06:23.160 --> 00:06:25.359
+50 items is fine.
+
+00:06:25.360 --> 06:27.839
+So what do we see here, Jonathan?
+
+NOTE Data cleaning
+
+06:27.840 --> 06:28.319
+[Jonathan]: Right.
+
+06:28.320 --> 06:31.239
+So the first thing we see when we look at this
+
+06:31.240 --> 00:06:33.307
+is a couple of Q codes at the top,
+
+00:06:33.308 --> 00:06:36.079
+which are an artifact of Wikidata.
+
+06:36.080 --> 06:39.519
+So these are pages which don't have
+
+06:39.520 --> 06:42.519
+the label for whichever institution they happen to be.
+
+06:42.520 --> 06:45.919
+For our purposes here, we're just going to exclude them.
+
+06:45.920 --> 06:48.199
+We could just go on Wikidata and edit them ourselves.
+
+06:48.200 --> 06:50.399
+But for now, it's a little more interesting
+
+06:50.400 --> 06:52.519
+if we go and remove them.
+
+06:52.520 --> 06:55.159
+So I'm going to create a new cell.
+
+06:55.160 --> 06:58.279
+Lukas, if you don't mind starting one for data cleaning.
+
+06:58.280 --> 06:58.879
+[Lukas]: Oh, yeah.
+
+06:58.880 --> 06:59.479
+Good point.
+
+06:59.480 --> 07:02.039
+Yeah, data cleaning.
+
+07:02.040 --> 07:03.439
+OK.
+
+07:03.440 --> 00:07:05.499
+How do you want to do that, Jonathan?
+
+00:07:05.500 --> 07:09.759
+[Jonathan]: I'm going to use a shell command.
+
+07:09.760 --> 07:11.119
+So let's see.
+
+07:11.120 --> 07:12.999
+There we go.
+
+07:13.000 --> 07:15.159
+And so you can see, here is another cell,
+
+07:15.160 --> 07:20.039
+that the cell is now using a shell,
+
+07:20.040 --> 00:07:23.799
+and that we have this thing `:var input=raw-dataset`,
+
+00:07:23.800 --> 00:07:25.840
+which is the name of the cell above
+
+00:07:25.841 --> 00:07:28.439
+where we got our data from Wikidata.
+
+07:28.440 --> 07:31.679
+This is going to run just a simple shell command.
+
+07:31.680 --> 07:33.959
+It's going to take the input and then run `sed` on it
+
+07:33.960 --> 00:07:37.039
+and exclude any records which have a Q
+
+00:07:37.040 --> 00:07:41.279
+followed by one or more digits afterwards.
+
+07:41.280 --> 07:43.960
+That should remove those from our data set.
+
+07:44.000 --> 07:45.400
+So I'm going to run that.
+
+07:48.640 --> 07:51.039
+That seems to have done the trick.
+
+07:51.040 --> 07:51.879
+[Lukas]: Great, yeah.
+
+07:51.880 --> 07:52.919
+That's really good.
+
+07:52.920 --> 07:55.399
+We got rid of all the Q items.
+
+07:55.400 --> 07:55.919
+Very good.
+
+07:55.920 --> 07:59.959
+So we just have two-column table: institutions
+
+07:59.960 --> 08:02.759
+and consortia.
+
+08:02.760 --> 08:04.039
+Very nice.
+
+NOTE Processing
+
+08:04.040 --> 08:08.719
+So let's come to our main part, doing some processing.
+
+08:08.720 --> 08:13.560
+Let me give you a headline here, process the data.
+
+08:13.640 --> 08:15.519
+What do you want to do first?
+
+08:15.520 --> 08:17.599
+[Jonathan]: This is not a very complicated data set,
+
+08:17.600 --> 08:19.439
+but let's just do some simple counts first.
+
+08:19.440 --> 08:22.199
+I'm going to start with Python,
+
+08:22.200 --> 08:25.239
+and we're just going to do some aggregation with Python.
+
+08:25.240 --> 08:30.039
+Again, I've got some pre-written code here.
+
+08:30.040 --> 08:34.999
+You can see that we've started a cell using Python.
+
+08:35.000 --> 08:37.879
+The variable `clean_df` now is equal to `clean-dataset`.
+
+08:37.880 --> 00:08:39.707
+So we're going to take that data
+
+00:08:39.708 --> 00:08:41.039
+that we retrieved from the SPARQL query,
+
+08:41.040 --> 08:42.680
+we're going to run it through the cleaning cell,
+
+08:42.720 --> 08:45.239
+and then we're going to import it into this cell.
+
+08:45.240 --> 08:47.839
+This is just going to do some simple Python aggregation.
+
+08:47.840 --> 00:08:49.007
+We're going to import `pandas`,
+
+00:08:49.008 --> 00:08:51.307
+which is the Python data science library,
+
+00:08:51.308 --> 00:08:54.839
+create a data frame out of our input,
+
+08:54.840 --> 08:57.479
+and then aggregate it, grouping on `wLabel`,
+
+08:57.480 --> 08:59.959
+and getting a count from that and returning it.
+
+08:59.960 --> 09:01.640
+So if we execute that cell...
+
+09:05.040 --> 09:08.879
+[Lukas]: Nice, we get institutions and a count.
+
+09:08.880 --> 09:14.119
+But what about not ordering it by the alphabet,
+
+09:14.120 --> 09:17.079
+but more like ordering by counts?
+
+09:17.080 --> 09:18.439
+[Jonathan]: Sure.
+
+09:18.440 --> 09:22.839
+So let's do this... `sort_values()`, I think, as the Python.
+
+09:22.840 --> 09:24.919
+How does that look?
+
+09:24.920 --> 00:09:27.640
+[Lukas]: Better, but I would like to
+
+00:09:27.641 --> 00:09:29.239
+have the highest number first
+
+09:29.240 --> 09:32.239
+and then ascending.
+
+09:32.240 --> 09:34.719
+Well, not ascending, descending.
+
+09:34.720 --> 09:37.600
+[Jonathan]: Right, so we can do `ascending=False`.
+
+09:39.880 --> 09:42.559
+[Lukas]: This is perfect, I'd say.
+
+09:42.560 --> 09:43.079
+[Jonathan]: Great.
+
+09:43.080 --> 09:44.079
+[Lukas]: Very good.
+
+09:44.080 --> 00:09:46.799
+OK, that's nice.
+
+00:09:46.800 --> 09:47.999
+We get a good overview here.
+
+09:48.000 --> 09:50.079
+But can we also do something else,
+
+09:50.080 --> 09:56.079
+like counting how many institutions are
+
+09:56.080 --> 09:57.799
+involved in one consortium?
+
+09:57.800 --> 10:00.879
+And also using this later on in the text?
+
+10:00.880 --> 00:10:00.880
+[Jonathan]: Sure, so I'm going to put a new...
+
+00:10:00.881 --> 00:10:05.040
+If you give me another heading down here
+
+00:10:05.041 --> 00:10:08.320
+for institutions per consortium...
+
+10:12.080 --> 10:16.799
+And here we're going to use awk code just to spice things up
+
+10:16.800 --> 10:18.959
+and add yet another language in here.
+
+10:18.960 --> 10:22.439
+So you can see this is awk.
+
+10:22.440 --> 10:26.279
+We're using standard in instead of defining a variable.
+
+10:26.280 --> 10:28.359
+But the really interesting thing about this cell
+
+10:28.360 --> 00:10:33.399
+is that we have this `:var consortium="NFDI4Memory"`.
+
+10:33.400 --> 00:10:35.640
+And what this code is doing is
+
+00:10:35.641 --> 00:10:38.040
+it's counting any time it sees
+
+00:10:38.041 --> 00:10:40.279
+that particular consortium name
+
+10:40.280 --> 10:41.759
+and keeping track of that.
+
+10:41.760 --> 00:10:43.907
+So if we execute this,
+
+00:10:43.908 --> 00:10:45.919
+Lukas, why don't you execute this one?
+
+10:45.920 --> 10:49.399
+[Lukas]: OK, I'm going to enter it.
+
+10:49.400 --> 10:52.439
+And I get a result, NFDI4Memory,
+
+10:52.440 --> 10:58.239
+because this is our default value for this variable.
+
+10:58.240 --> 10:59.439
+And we get the count.
+
+10:59.440 --> 00:11:01.640
+So it's five institutions are involved
+
+00:11:01.641 --> 00:11:04.639
+in the NFDI4memory consortium.
+
+11:04.640 --> 11:07.839
+Great, but the very nice thing, what I think,
+
+11:07.840 --> 11:12.519
+is here that we can use this code snippet within our text.
+
+11:12.520 --> 11:14.279
+So, blended in seamlessly.
+
+11:14.280 --> 11:16.199
+Let me give you an example.
+
+11:16.200 --> 11:18.919
+I'm writing out the text.
+
+11:18.920 --> 11:27.599
+Now we know how many institutions are in...
+
+11:27.600 --> 11:29.239
+Give me an example.
+
+11:29.240 --> 11:31.480
+I would like to know how many institutions are
+
+11:31.560 --> 11:35.079
+involved in NFDI4Objects, which is a consortium.
+
+11:35.080 --> 11:39.239
+So I'm writing `call_` and using
+
+11:39.240 --> 00:11:42.607
+the name of this snippet here, of this cell,
+
+00:11:42.608 --> 00:11:46.607
+which is `inst-count(`,
+
+00:11:46.608 --> 00:11:51.719
+and writing my value, `NFDI4Objects`.
+
+11:51.720 --> 11:57.999
+As soon as I evaluate this using `C-c C-c`,
+
+11:58.000 --> 12:00.279
+I get the result back here.
+
+12:00.280 --> 12:05.159
+I can do this even for more.
+
+12:05.160 --> 12:14.039
+Or in writing, `call_inst-count`, go with `NFDI4Earth`,
+
+12:14.040 --> 12:16.799
+which is another consortium.
+
+12:16.800 --> 12:20.559
+`C-c C-c`, it's three institutions.
+
+12:20.560 --> 12:23.439
+This can be used throughout your text,
+
+12:23.440 --> 12:26.639
+and as soon as the data set changes from in the beginning,
+
+12:26.640 --> 12:30.399
+maybe different results requiring Wikidata,
+
+12:30.400 --> 12:35.079
+this also will be updated once it's exported.
+
+12:35.080 --> 12:36.039
+Very nice, Jonathan.
+
+NOTE Visualization
+
+12:36.040 --> 00:12:38.974
+But I think we did a lot of analysis
+
+00:12:38.975 --> 00:12:41.079
+on text and counting things.
+
+12:41.080 --> 12:43.679
+Can we also do something more visual?
+
+12:43.680 --> 12:45.199
+Show me something.
+
+12:45.200 --> 12:45.759
+[Jonathan]: Sure.
+
+12:45.760 --> 12:48.639
+So what we can do with this, because we just
+
+12:48.640 --> 12:51.399
+have two columns here that are sort of related,
+
+12:51.400 --> 12:53.759
+we can build a little network plot out of it.
+
+12:53.760 --> 12:56.999
+So let's make a network visualization.
+
+12:57.000 --> 12:59.599
+We're going to use the `igraph` library from R
+
+12:59.600 --> 13:02.559
+and just plot the edges that we see here.
+
+13:02.560 --> 13:04.239
+There we go.
+
+13:04.240 --> 13:11.879
+There's my little heading and space.
+
+13:11.880 --> 13:13.479
+Here is our code.
+
+13:13.480 --> 13:16.039
+Again, just to be fancy and keep using
+
+13:16.040 --> 13:19.719
+different languages in here, we set a variable called
+
+13:19.720 --> 13:21.560
+`NFDI_edges` equal to `clean-dataset`.
+
+13:21.600 --> 13:23.399
+So this, again, is sort of cascading
+
+13:23.400 --> 00:13:25.740
+through the original data
+
+00:13:25.741 --> 00:13:28.807
+that we pulled from the Wikidata endpoint,
+
+00:13:28.808 --> 00:13:30.959
+cleaning that data, and now it's being inserted
+
+13:30.960 --> 13:32.959
+into this cell as well.
+
+13:32.960 --> 13:34.239
+But you see the difference here.
+
+13:34.240 --> 13:36.839
+Instead of exporting a table, what we're saying
+
+13:36.840 --> 13:39.239
+is that there will be a graphics file,
+
+13:39.240 --> 13:44.639
+and it will be called network-plot.png.
+
+13:44.640 --> 13:45.119
+All right.
+
+13:45.120 --> 13:47.959
+And so Lukas, why don't you execute this one?
+
+13:47.960 --> 13:48.759
+[Lukas]: There you go.
+
+13:48.760 --> 13:52.919
+I can click `C-c C-c`
+
+13:52.920 --> 13:59.159
+and I get a nice plot of the network below our cell.
+
+13:59.160 --> 14:01.759
+So this is very nice indeed.
+
+NOTE Preserve
+
+14:01.760 --> 14:05.199
+So I think it's about time to wrap it up and to export
+
+14:05.200 --> 14:07.959
+and to preserve the data and the documentation
+
+14:07.960 --> 14:13.079
+that we have in our very last step, calling preserve.
+
+14:13.080 --> 14:16.239
+So I would like to do it in two steps.
+
+14:16.240 --> 14:18.600
+First, maybe manually exporting it,
+
+14:18.800 --> 14:22.239
+but then also doing it in a batch process.
+
+14:22.240 --> 14:27.119
+Giving you some insights how to do that manual export.
+
+14:27.120 --> 14:30.559
+For example, you can do a LaTeX export.
+
+14:30.560 --> 14:34.279
+Let me write down the key combination to do that here.
+
+14:34.280 --> 14:44.560
+So you press `SPC m e l o`.
+
+14:44.600 --> 14:49.159
+Let me show you how this is done.
+
+14:49.160 --> 14:51.439
+So I'm pressing `SPC`.
+
+14:51.440 --> 14:55.679
+I'm pressing `m`, which is my local leader.
+
+14:55.680 --> 15:01.279
+I'm pressing `e`, which is now the `org-export-dispatch`.
+
+15:01.280 --> 15:03.519
+And now I have different options I can choose from.
+
+15:03.520 --> 15:07.119
+I want to do a LaTeX export because I want to get in PDF.
+
+15:07.120 --> 00:15:08.674
+So I'm pressing `l`.
+
+00:15:08.675 --> 00:15:11.479
+Now I've got different options available.
+
+15:11.480 --> 15:17.399
+So I'm pressing `o` for a PDF file and open that.
+
+15:17.400 --> 15:21.119
+Let's see now the code.
+
+15:21.120 --> 15:25.639
+Now this is exporting document.
+
+15:25.640 --> 00:15:29.674
+And what we have here is PDF,
+
+00:15:29.675 --> 00:15:31.974
+which contains our workflow in the beginning,
+
+00:15:31.975 --> 00:15:35.707
+our bullet points we have here,
+
+00:15:35.708 --> 00:15:37.919
+and also the code snippet
+
+15:37.920 --> 15:41.120
+that we use for querying the data.
+
+15:41.280 --> 15:43.599
+And we have the result below that.
+
+15:43.600 --> 15:46.999
+So this is our table with all the data sets.
+
+15:47.000 --> 15:51.879
+But as you can see, this is running out of the page.
+
+15:51.880 --> 15:55.679
+So this is not very nice using the default settings.
+
+15:55.680 --> 16:00.239
+But everything is in this PDF.
+
+16:00.240 --> 16:02.759
+I guess we can now show you a way
+
+16:02.760 --> 16:06.519
+how to improve this result.
+
+16:06.520 --> 16:07.039
+[Jonathan]: Right.
+
+16:07.040 --> 16:09.399
+So we have, of course, a version of this
+
+16:09.400 --> 00:16:10.774
+that we prepared ahead of time,
+
+00:16:10.775 --> 00:16:14.279
+which is more or less identical to the one we just made,
+
+16:14.280 --> 16:17.839
+but it has a little more text, a little more explanation,
+
+16:17.840 --> 16:20.559
+a little more documentation along with the code.
+
+16:20.560 --> 16:23.879
+You can see we have some metadata up at the top,
+
+16:23.880 --> 16:26.879
+the title, the authors, a bibliography,
+
+16:26.880 --> 16:31.679
+and most importantly, the `custom-export.setup` file,
+
+16:31.680 --> 16:36.879
+which lists specifically the sort of LaTeX commands
+
+16:36.880 --> 16:43.599
+that we're using and the HTML styles that we're going to use.
+
+16:43.600 --> 16:45.919
+And then down at the bottom of this file,
+
+16:45.920 --> 16:49.119
+we have our automatic batch process.
+
+16:49.120 --> 16:51.719
+Here is one more language we're including in here.
+
+16:51.720 --> 16:53.439
+So this is Lisp.
+
+16:53.440 --> 16:57.359
+And you can see here we are exporting to HTML, ASCII,
+
+16:57.360 --> 16:58.079
+and PDF.
+
+16:58.080 --> 17:01.359
+The nice thing about this is that this is a document.
+
+17:01.360 --> 00:17:03.307
+It's a sort of document that we have a couple of
+
+00:17:03.308 --> 00:17:08.639
+that we can have running automatically and building.
+
+17:08.640 --> 17:12.919
+It will export a HTML, an ASCII file, and a PDF file
+
+17:12.920 --> 00:17:14.674
+every time it's run based off of
+
+00:17:14.675 --> 00:17:17.319
+the most recent data available on Wikidata.
+
+17:17.320 --> 17:19.719
+So it's self-documenting.
+
+17:19.720 --> 00:17:22.440
+We have, of course, our data retrieval steps,
+
+00:17:22.441 --> 00:17:25.159
+our data cleaning steps, our data preparation steps,
+
+17:25.160 --> 17:28.359
+and our preservation steps all listed at the same time.
+
+17:28.360 --> 17:30.239
+And then you can see over on the right,
+
+17:30.240 --> 17:34.320
+there's an example of the HTML file that we get out of this.
+
+17:34.360 --> 17:37.639
+We also get a very nicely formatted PDF file,
+
+17:37.640 --> 17:39.239
+which doesn't have that little issue
+
+17:39.240 --> 17:41.719
+with the overflow of the table.
+
+17:41.720 --> 17:43.559
+It's very nicely put together.
+
+17:43.560 --> 17:46.199
+And we even have an ASCII file.
+
+17:46.200 --> 17:47.879
+And I should also point out very quickly,
+
+17:47.880 --> 17:51.799
+while you have this one up, Lukas, after the awk code,
+
+17:51.800 --> 17:56.079
+you can see the text for the number of consortia,
+
+17:56.080 --> 17:57.839
+or the number of institutions per consortia
+
+17:57.840 --> 18:00.519
+is actually printed inline.
+
+18:00.520 --> 18:01.799
+[Lukas]: Yeah, you're very right.
+
+18:01.800 --> 18:06.119
+So this is what we had as code,
+
+18:06.120 --> 18:10.719
+and now this is nicely integrated into our text.
+
+18:10.720 --> 18:15.279
+So we got the consortium and number of institutions.
+
+18:15.280 --> 18:19.199
+You can't tell a difference between code and text.
+
+18:19.200 --> 18:20.719
+[Jonathan]: And those are automatically updated.
+
+18:20.720 --> 18:23.879
+So if another institution joins NFDI4Earth,
+
+18:23.880 --> 18:26.319
+then the next time this runs, we update the text right here.
+
+18:26.320 --> 18:28.519
+It's nothing we have to worry about.
+
+18:28.520 --> 18:30.400
+We just pull it directly out of Wikidata.
+
+18:31.840 --> 18:34.679
+[Lukas]: And for the sake of completeness,
+
+18:34.680 --> 18:37.879
+this is the ASCII file.
+
+18:37.880 --> 18:39.320
+That's in the export format.
+
+18:42.760 --> 18:46.440
+It contains also everything, code and data.
+
+18:48.360 --> 18:51.680
+Yeah, so this is what we wanted to show you,
+
+18:53.240 --> 18:56.639
+how to do some data processing,
+
+18:56.640 --> 18:58.679
+some collaborative work,
+
+18:58.680 --> 19:01.119
+documenting using org-babel.
+
+19:01.120 --> 19:03.960
+Thanks for listening.
+
+19:05.720 --> 19:07.280
+[Jonathan]: Thank you all, have a good day.
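
Editor's note on the captions above: the talk chains three processing steps (dropping rows that show a bare Q code instead of a label, counting per institution with the highest count first, and counting the institutions in one consortium). The sketch below is not the speakers' code; it is a minimal illustration of that chain, assuming a two-column institution/consortium table. It uses only the Python standard library in place of `sed`, `pandas`, and awk, and the sample rows and function names are hypothetical.

```python
import re
from collections import Counter

# Illustrative rows only (not real Wikidata results): (institution, consortium).
rows = [
    ("Q1234567", "NFDI4Earth"),               # unlabeled entry, shown as a Q code
    ("Example University A", "NFDI4Memory"),
    ("Example University A", "NFDI4Earth"),
    ("Example Institute B", "NFDI4Memory"),
    ("Example University A", "NFDI4Objects"),
]


def drop_unlabeled(rows):
    """Cleaning step: drop rows whose institution is a bare Q code such as Q42."""
    return [r for r in rows if not re.fullmatch(r"Q\d+", r[0])]


def count_by_institution(rows):
    """Aggregation step: occurrences per institution, highest count first."""
    return Counter(inst for inst, _ in rows).most_common()


def institutions_in(rows, consortium):
    """Counting step: number of distinct institutions in one consortium."""
    return len({inst for inst, cons in rows if cons == consortium})


clean = drop_unlabeled(rows)
by_institution = count_by_institution(clean)
memory_count = institutions_in(clean, "NFDI4Memory")
```

Each function takes the previous step's output, which loosely mirrors how the Org cells in the talk pass results along through `:var` references. The real query, column names, and counts may differ from this sketch.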