Transcript

[[!template new="1" text="""Introduction""" start="00:00:00.000" video="mainVideo-collab" id="subtitle"]]
[[!template text="""[Lukas]: Welcome to our presentation,""" start="00:00:00.000" video="mainVideo-collab" id="subtitle"]] [[!template text="""Collaborative Data Processing""" start="00:00:01.875" video="mainVideo-collab" id="subtitle"]] [[!template text="""and Documenting using org-babel.""" start="00:00:03.600" video="mainVideo-collab" id="subtitle"]] [[!template text="""My name is Lukas Bossert, and I'm""" start="00:00:06.040" video="mainVideo-collab" id="subtitle"]] [[!template text="""from RWTH Aachen University""" start="00:00:07.760" video="mainVideo-collab" id="subtitle"]] [[!template text="""in the city of Aachen, Germany.""" start="00:00:09.741" video="mainVideo-collab" id="subtitle"]] [[!template text="""[Jonathan]: And my name is Jonathan Hartmann.""" start="00:00:12.520" video="mainVideo-collab" id="subtitle"]] [[!template text="""I'm also from the IT Center here at RWTH Aachen.""" start="00:00:14.840" video="mainVideo-collab" id="subtitle"]] [[!template text="""[Lukas]: Great.""" start="00:00:18.720" video="mainVideo-collab" id="subtitle"]] [[!template text="""And we will show you today how you""" start="00:00:19.240" video="mainVideo-collab" id="subtitle"]] [[!template text="""can use Org Mode for data processing.""" start="00:00:21.680" video="mainVideo-collab" id="subtitle"]] [[!template text="""So you see a little workflow of what we are going to do.""" start="00:00:25.400" video="mainVideo-collab" id="subtitle"]] [[!template text="""First, we will give you a brief introduction to Org Mode.""" start="00:00:28.000" video="mainVideo-collab" id="subtitle"]] [[!template text="""Then we will dive into data preparation.""" start="00:00:31.200" video="mainVideo-collab" id="subtitle"]] [[!template text="""First, we're going to query the data using the language SPARQL.""" start="00:00:34.640" video="mainVideo-collab" id="subtitle"]] [[!template text="""Then we're going to clean it using a different language.""" start="00:00:38.680" 
video="mainVideo-collab" id="subtitle"]] [[!template text="""And in the main part of our presentation,""" start="00:00:41.760" video="mainVideo-collab" id="subtitle"]] [[!template text="""we're going to do the data processing, first aggregating""" start="00:00:44.280" video="mainVideo-collab" id="subtitle"]] [[!template text="""using Python, later on counting items using Org,""" start="00:00:48.120" video="mainVideo-collab" id="subtitle"]] [[!template text="""and even visualizing it using R. At the end,""" start="00:00:52.520" video="mainVideo-collab" id="subtitle"]] [[!template text="""we're going to show you how to preserve""" start="00:00:56.400" video="mainVideo-collab" id="subtitle"]] [[!template text="""the data and the document and its documentation,""" start="00:00:58.960" video="mainVideo-collab" id="subtitle"]] [[!template text="""first with a plain export, then adding some metadata,""" start="00:01:01.760" video="mainVideo-collab" id="subtitle"]] [[!template text="""and showing you two different ways, first a manual export,""" start="00:01:06.600" video="mainVideo-collab" id="subtitle"]] [[!template text="""and then also a batch-processed export.""" start="00:01:09.760" video="mainVideo-collab" id="subtitle"]] [[!template text="""All right.""" start="00:01:13.360" video="mainVideo-collab" id="subtitle"]] [[!template text="""Let's dive into that.""" start="00:01:14.240" video="mainVideo-collab" id="subtitle"]]
[[!template new="1" text="""Org Mode""" start="00:01:16.080" video="mainVideo-collab" id="subtitle"]]
[[!template text="""Jonathan, can you give us an introduction to Org Mode?""" start="00:01:16.080" video="mainVideo-collab" id="subtitle"]] [[!template text="""[Jonathan]: Of course.""" start="00:01:19.920" video="mainVideo-collab" id="subtitle"]] [[!template text="""So in case anyone isn't familiar with it,""" start="00:01:20.440" video="mainVideo-collab" id="subtitle"]] [[!template text="""Org Mode, in the words of Carsten Dominik,""" start="00:01:23.080" video="mainVideo-collab" id="subtitle"]] [[!template text="""is back to the future for plain text.""" start="00:01:25.880" video="mainVideo-collab" id="subtitle"]] [[!template text="""So this is just a module available for Emacs,""" start="00:01:28.560" video="mainVideo-collab" id="subtitle"]] [[!template text="""plain-text based.""" start="00:01:31.440" video="mainVideo-collab" id="subtitle"]] [[!template text="""It's been around since 2003, which""" start="00:01:32.520" video="mainVideo-collab" id="subtitle"]] [[!template text="""makes it about 20 years old.""" start="00:01:34.920" video="mainVideo-collab" id="subtitle"]] [[!template text="""And it's extensible and fully customizable.""" start="00:01:36.800" video="mainVideo-collab" id="subtitle"]] [[!template text="""And especially, it's very convenient, very good""" start="00:01:40.160" video="mainVideo-collab" id="subtitle"]] [[!template text="""for scientific text production and organization.""" start="00:01:44.000" video="mainVideo-collab" id="subtitle"]] [[!template text="""So for example, you can do project management, agendas,""" start="00:01:46.720" video="mainVideo-collab" id="subtitle"]] [[!template text="""diary, journaling, personal knowledge management,""" start="00:01:49.440" video="mainVideo-collab" id="subtitle"]] [[!template text="""presentations.""" start="00:01:52.560" video="mainVideo-collab" id="subtitle"]] [[!template text="""Even this is written in Org Mode.""" start="00:01:53.360" video="mainVideo-collab" id="subtitle"]] [[!template 
text="""It's an Org Mode presentation.""" start="00:01:55.560" video="mainVideo-collab" id="subtitle"]] [[!template text="""You can do single source publishing,""" start="00:01:57.440" video="mainVideo-collab" id="subtitle"]] [[!template text="""which we will do later on, and also""" start="00:01:59.200" video="mainVideo-collab" id="subtitle"]] [[!template text="""literate programming, which is the core of our talk.""" start="00:02:01.680" video="mainVideo-collab" id="subtitle"]] [[!template text="""OK.""" start="00:02:06.480" video="mainVideo-collab" id="subtitle"]] [[!template text="""[Lukas]: So let me stop this presentation here.""" start="00:02:07.000" video="mainVideo-collab" id="subtitle"]] [[!template text="""So what you see here is the plain text underneath it.""" start="00:02:10.800" video="mainVideo-collab" id="subtitle"]] [[!template text="""So this is Org Mode.""" start="00:02:14.720" video="mainVideo-collab" id="subtitle"]]
[[!template new="1" text="""Working together""" start="00:02:18.960" video="mainVideo-collab" id="subtitle"]]
[[!template text="""And Jonathan, since we kind of already""" start="00:02:18.960" video="mainVideo-collab" id="subtitle"]] [[!template text="""did the introduction together, should we""" start="00:02:21.920" video="mainVideo-collab" id="subtitle"]] [[!template text="""also do the working part together?""" start="00:02:26.120" video="mainVideo-collab" id="subtitle"]] [[!template text="""[Jonathan]: Of course.""" start="00:02:28.761" video="mainVideo-collab" id="subtitle"]] [[!template text="""So you see on the screen there on the right,""" start="00:02:29.701" video="mainVideo-collab" id="subtitle"]] [[!template text="""that's my screen in Emacs.""" start="00:02:33.120" video="mainVideo-collab" id="subtitle"]] [[!template text="""And Lukas, why don't you host a session using CRDT,""" start="00:02:35.061" video="mainVideo-collab" id="subtitle"]] [[!template text="""and I'll connect to your buffer.""" start="00:02:39.521" video="mainVideo-collab" id="subtitle"]] [[!template text="""[Lukas]: OK. 
Great.""" start="00:02:41.201" video="mainVideo-collab" id="subtitle"]] [[!template text="""I'll do that.""" start="00:02:42.561" video="mainVideo-collab" id="subtitle"]] [[!template text="""So here's what I do: I'm using Doom Emacs.""" start="00:02:43.281" video="mainVideo-collab" id="subtitle"]] [[!template text="""And I can use the `SPC` and then the `l`""" start="00:02:46.181" video="mainVideo-collab" id="subtitle"]] [[!template text="""for the live share/collab part.""" start="00:02:49.308" video="mainVideo-collab" id="subtitle"]] [[!template text="""I can use `s` to share the current buffer.""" start="00:02:52.141" video="mainVideo-collab" id="subtitle"]] [[!template text="""So when I do this, I'm getting asked for some settings.""" start="00:02:58.000" video="mainVideo-collab" id="subtitle"]] [[!template text="""I'm going with the default settings here.""" start="00:03:01.560" video="mainVideo-collab" id="subtitle"]] [[!template text="""So default port, no password, and my display name.""" start="00:03:04.440" video="mainVideo-collab" id="subtitle"]] [[!template text="""And now Emacs is connecting.""" start="00:03:08.341" video="mainVideo-collab" id="subtitle"]] [[!template text="""And once it's connected, which just takes a couple of seconds,""" start="00:03:11.941" video="mainVideo-collab" id="subtitle"]] [[!template text="""I can get the URL.""" start="00:03:15.180" video="mainVideo-collab" id="subtitle"]] [[!template text="""So I'm going back to this menu and using `y`""" start="00:03:17.240" video="mainVideo-collab" id="subtitle"]] [[!template text="""for copying the URL of the current session.""" start="00:03:21.160" video="mainVideo-collab" id="subtitle"]] [[!template text="""And this is the URL I'm going to send over to you, Jonathan,""" start="00:03:24.000" video="mainVideo-collab" id="subtitle"]] [[!template text="""so you can pick it up.""" start="00:03:27.800" video="mainVideo-collab" id="subtitle"]] [[!template text="""[Jonathan]: Right.""" 
start="00:03:29.080" video="mainVideo-collab" id="subtitle"]] [[!template text="""OK.""" start="00:03:29.600" video="mainVideo-collab" id="subtitle"]] [[!template text="""And now on my screen, I'm going to do a `SPC l c` for connect.""" start="00:03:30.080" video="mainVideo-collab" id="subtitle"]] [[!template text="""And I'm going to paste the URL""" start="00:03:37.000" video="mainVideo-collab" id="subtitle"]] [[!template text="""that Lukas just sent me in here.""" start="00:03:38.741" video="mainVideo-collab" id="subtitle"]] [[!template text="""Default port, no password.""" start="00:03:40.980" video="mainVideo-collab" id="subtitle"]] [[!template text="""And we're connecting now.""" start="00:03:43.720" video="mainVideo-collab" id="subtitle"]] [[!template text="""So this takes a second just to get us synced up.""" start="00:03:45.700" video="mainVideo-collab" id="subtitle"]] [[!template text="""So we can work on the same document at the same time.""" start="00:03:51.600" video="mainVideo-collab" id="subtitle"]] [[!template text="""We can follow each other's cursors around.""" start="00:03:54.161" video="mainVideo-collab" id="subtitle"]] [[!template text="""We can have multiple buffers open and work on them""" start="00:03:56.640" video="mainVideo-collab" id="subtitle"]] [[!template text="""at the same time.""" start="00:03:58.840" video="mainVideo-collab" id="subtitle"]] [[!template text="""And so here you see that we are both in the same document.""" start="00:04:01.000" video="mainVideo-collab" id="subtitle"]] [[!template text="""You can see my cursor popping around.""" start="00:04:04.720" video="mainVideo-collab" id="subtitle"]] [[!template text="""And you can see we're both editing the same item.""" start="00:04:09.040" video="mainVideo-collab" id="subtitle"]] [[!template text="""Great.""" start="00:04:13.280" video="mainVideo-collab" id="subtitle"]] [[!template text="""[Lukas]: So we also see who else is currently in our buffer""" start="00:04:14.040" 
video="mainVideo-collab" id="subtitle"]] [[!template text="""with the user overview.""" start="00:04:18.040" video="mainVideo-collab" id="subtitle"]] [[!template text="""So let me just delete that window.""" start="00:04:20.200" video="mainVideo-collab" id="subtitle"]] [[!template text="""And let's work in our main one.""" start="00:04:23.560" video="mainVideo-collab" id="subtitle"]] [[!template text="""So we said the first part is about data retrieval.""" start="00:04:26.080" video="mainVideo-collab" id="subtitle"]] [[!template text="""So we should give it a headline.""" start="00:04:29.600" video="mainVideo-collab" id="subtitle"]] [[!template text="""We said prepare stage.""" start="00:04:37.080" video="mainVideo-collab" id="subtitle"]] [[!template text="""So what are we going to do first, Jonathan?""" start="00:04:39.240" video="mainVideo-collab" id="subtitle"]] [[!template text="""[Jonathan]: So what we're going to do,""" start="00:04:42.320" video="mainVideo-collab" id="subtitle"]] [[!template text="""what this whole document is based upon,""" start="00:04:43.941" video="mainVideo-collab" id="subtitle"]] [[!template text="""is we're going to pull data from Wikidata using a SPARQL query.""" start="00:04:45.400" video="mainVideo-collab" id="subtitle"]] [[!template text="""The data we're going to pull is related to the NFDI,""" start="00:04:50.120" video="mainVideo-collab" id="subtitle"]] [[!template text="""which here in Germany is the Nationale Forschungsdateninfrastruktur""" start="00:04:53.520" video="mainVideo-collab" id="subtitle"]] [[!template text="""(National Research Data Infrastructure), a sort of collection of universities""" start="00:04:55.640" video="mainVideo-collab" id="subtitle"]] [[!template text="""that work together on various research projects.""" start="00:05:00.680" video="mainVideo-collab" id="subtitle"]] [[!template text="""And this is emblematic of the kind of data""" start="00:05:03.400" video="mainVideo-collab" id="subtitle"]] [[!template text="""that we 
would be interested in working with here.""" start="00:05:05.600" video="mainVideo-collab" id="subtitle"]] [[!template text="""So I'm going to paste a--forgive the pre-written code--""" start="00:05:09.240" video="mainVideo-collab" id="subtitle"]] [[!template text="""I'm going to paste some text in here.""" start="00:05:13.360" video="mainVideo-collab" id="subtitle"]] [[!template text="""[Lukas]: And while you are talking, I just""" start="00:05:20.040" video="mainVideo-collab" id="subtitle"]] [[!template text="""keep on documenting what we do""" start="00:05:21.408" video="mainVideo-collab" id="subtitle"]] [[!template text="""so we can split the work.""" start="00:05:23.360" video="mainVideo-collab" id="subtitle"]] [[!template text="""[Jonathan]: In here, after a minor technical upset,""" start="00:05:27.360" video="mainVideo-collab" id="subtitle"]] [[!template text="""is the raw dataset cell.""" start="00:05:29.680" video="mainVideo-collab" id="subtitle"]] [[!template text="""And it's going to use SPARQL,""" start="00:05:32.560" video="mainVideo-collab" id="subtitle"]] [[!template text="""which is how we have the syntax highlighting""" start="00:05:34.741" video="mainVideo-collab" id="subtitle"]] [[!template text="""in our code here.""" start="00:05:37.175" video="mainVideo-collab" id="subtitle"]] [[!template text="""It's going to go to the URL endpoint""" start="00:05:37.941" video="mainVideo-collab" id="subtitle"]] [[!template text="""query.wikidata.org/sparql ,""" start="00:05:40.640" video="mainVideo-collab" id="subtitle"]] [[!template text="""and it's going to return the data as CSV (`text/csv`),""" start="00:05:43.640" video="mainVideo-collab" id="subtitle"]] [[!template text="""and it's going to cache that data""" start="00:05:46.800" video="mainVideo-collab" id="subtitle"]] [[!template text="""so that we don't constantly hammer the API every time""" start="00:05:49.280" video="mainVideo-collab" id="subtitle"]] [[!template text="""we run this notebook.""" 
start="00:05:51.440" video="mainVideo-collab" id="subtitle"]] [[!template text="""So I'm going to run that there.""" start="00:05:54.240" video="mainVideo-collab" id="subtitle"]] [[!template text="""You can see down at the bottom of my screen,""" start="00:05:57.361" video="mainVideo-collab" id="subtitle"]] [[!template text="""we're contacting the host query.wikidata.org .""" start="00:05:58.800" video="mainVideo-collab" id="subtitle"]] [[!template text="""[Lukas]: And there's the result.""" start="00:06:05.720" video="mainVideo-collab" id="subtitle"]] [[!template text="""[Jonathan]: Yeah, except I think that for our purposes here,""" start="00:06:07.320" video="mainVideo-collab" id="subtitle"]] [[!template text="""we're just going to limit this to 50 results.""" start="00:06:11.800" video="mainVideo-collab" id="subtitle"]] [[!template text="""[Lukas]: Oh, yeah.""" start="00:06:15.280" video="mainVideo-collab" id="subtitle"]] [[!template text="""[Jonathan]: Just so it's a little easier for us to manage.""" start="00:06:16.280" video="mainVideo-collab" id="subtitle"]] [[!template text="""I'm going to run that again.""" start="00:06:18.680" video="mainVideo-collab" id="subtitle"]] [[!template text="""There we go.""" start="00:06:20.720" video="mainVideo-collab" id="subtitle"]] [[!template text="""That looks a little better.""" start="00:06:21.520" video="mainVideo-collab" id="subtitle"]] [[!template text="""[Lukas]: I think that's fine.""" start="00:06:22.320" video="mainVideo-collab" id="subtitle"]] [[!template text="""50 items is fine.""" start="00:06:23.160" video="mainVideo-collab" id="subtitle"]] [[!template text="""So what do we see here, Jonathan?""" start="00:06:25.360" video="mainVideo-collab" id="subtitle"]]
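
In the notebook, the SPARQL cell itself handles the endpoint, the CSV result format and the caching. The Python below is only a sketch of the same idea outside Emacs. It builds a GET request for the endpoint `https://query.wikidata.org/sparql` that asks for CSV through an `Accept: text/csv` header. Each result is stored in a file named after a hash of the query text, so repeated runs do not hit the API again. The cache layout, the `User-Agent` string and all function names are assumptions for illustration and are not taken from the talk.

```python
import hashlib
import urllib.parse
import urllib.request
from pathlib import Path
from typing import Callable

ENDPOINT = "https://query.wikidata.org/sparql"
# Hypothetical client name; Wikidata asks clients to send a descriptive User-Agent.
USER_AGENT = "org-babel-demo/0.1 (example)"


def build_request(query: str) -> urllib.request.Request:
    """Build (but do not send) a GET request that asks for CSV results."""
    url = ENDPOINT + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(
        url, headers={"Accept": "text/csv", "User-Agent": USER_AGENT}
    )


def fetch_csv(query: str) -> str:
    """Send the query to the live endpoint and return the CSV body as text."""
    with urllib.request.urlopen(build_request(query)) as response:
        return response.read().decode("utf-8")


def cached_query(
    query: str, cache_dir: Path, fetch: Callable[[str], str] = fetch_csv
) -> str:
    """Return the CSV for `query`, calling `fetch` only on a cache miss."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    # Same query text -> same hash -> same cache file.
    key = hashlib.sha256(query.encode("utf-8")).hexdigest()
    cache_file = cache_dir / f"{key}.csv"
    if cache_file.exists():
        return cache_file.read_text(encoding="utf-8")
    result = fetch(query)
    cache_file.write_text(result, encoding="utf-8")
    return result
```

Because the cache key is derived from the query text, editing the query (for example, adding `LIMIT 50` as in the talk) changes the hash and triggers one fresh request.
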
[[!template new="1" text="""Data cleaning""" start="00:06:27.840" video="mainVideo-collab" id="subtitle"]]
[[!template text="""[Jonathan]: Right.""" start="00:06:27.840" video="mainVideo-collab" id="subtitle"]] [[!template text="""So the first thing we see when we look at this""" start="00:06:28.320" video="mainVideo-collab" id="subtitle"]] [[!template text="""is a couple of Q codes at the top,""" start="00:06:31.240" video="mainVideo-collab" id="subtitle"]] [[!template text="""which are an artifact of Wikidata.""" start="00:06:33.308" video="mainVideo-collab" id="subtitle"]] [[!template text="""So these are pages which don't have""" start="00:06:36.080" video="mainVideo-collab" id="subtitle"]] [[!template text="""a label for whichever institution they happen to be.""" start="00:06:39.520" video="mainVideo-collab" id="subtitle"]] [[!template text="""For our purposes here, we're just going to exclude them.""" start="00:06:42.520" video="mainVideo-collab" id="subtitle"]] [[!template text="""We could just go on Wikidata and edit them ourselves.""" start="00:06:45.920" video="mainVideo-collab" id="subtitle"]] [[!template text="""But for now, it's a little more interesting""" start="00:06:48.200" video="mainVideo-collab" id="subtitle"]] [[!template text="""if we go and remove them.""" start="00:06:50.400" video="mainVideo-collab" id="subtitle"]] [[!template text="""So I'm going to create a new cell.""" start="00:06:52.520" video="mainVideo-collab" id="subtitle"]] [[!template text="""Lukas, if you don't mind starting one for data cleaning.""" start="00:06:55.160" video="mainVideo-collab" id="subtitle"]] [[!template text="""[Lukas]: Oh, yeah.""" start="00:06:58.280" video="mainVideo-collab" id="subtitle"]] [[!template text="""Good point.""" start="00:06:58.880" video="mainVideo-collab" id="subtitle"]] [[!template text="""Yeah, data cleaning.""" start="00:06:59.480" video="mainVideo-collab" id="subtitle"]] [[!template text="""OK.""" start="00:07:02.040" video="mainVideo-collab" id="subtitle"]] [[!template text="""How do you want to do that, Jonathan?""" start="00:07:03.440" 
video="mainVideo-collab" id="subtitle"]] [[!template text="""[Jonathan]: I'm going to use a shell command.""" start="00:07:05.500" video="mainVideo-collab" id="subtitle"]] [[!template text="""So let's see.""" start="00:07:09.760" video="mainVideo-collab" id="subtitle"]] [[!template text="""There we go.""" start="00:07:11.120" video="mainVideo-collab" id="subtitle"]] [[!template text="""And so you can see, here is another cell,""" start="00:07:13.000" video="mainVideo-collab" id="subtitle"]] [[!template text="""which is now using a shell,""" start="00:07:15.160" video="mainVideo-collab" id="subtitle"]] [[!template text="""and which has this `:var input=raw-dataset`,""" start="00:07:20.040" video="mainVideo-collab" id="subtitle"]] [[!template text="""where `raw-dataset` is the name of the cell above""" start="00:07:23.800" video="mainVideo-collab" id="subtitle"]] [[!template text="""where we got our data from Wikidata.""" start="00:07:25.841" video="mainVideo-collab" id="subtitle"]] [[!template text="""This is going to run just a simple shell command.""" start="00:07:28.440" video="mainVideo-collab" id="subtitle"]] [[!template text="""It's going to take the input and then run `sed` on it""" start="00:07:31.680" video="mainVideo-collab" id="subtitle"]] [[!template text="""and exclude any records which have a Q""" start="00:07:33.960" video="mainVideo-collab" id="subtitle"]] [[!template text="""followed by one or more digits.""" start="00:07:37.040" video="mainVideo-collab" id="subtitle"]] [[!template text="""That should remove those from our data set.""" start="00:07:41.280" video="mainVideo-collab" id="subtitle"]] [[!template text="""So I'm going to run that.""" start="00:07:44.000" video="mainVideo-collab" id="subtitle"]] [[!template text="""That seems to have done the trick.""" start="00:07:48.640" video="mainVideo-collab" id="subtitle"]] [[!template text="""[Lukas]: Great, yeah.""" start="00:07:51.040" video="mainVideo-collab" id="subtitle"]] 
[[!template text="""That's really good.""" start="00:07:51.880" video="mainVideo-collab" id="subtitle"]] [[!template text="""We got rid of all the Q items.""" start="00:07:52.920" video="mainVideo-collab" id="subtitle"]] [[!template text="""Very good.""" start="00:07:55.400" video="mainVideo-collab" id="subtitle"]] [[!template text="""So now we just have a two-column table: institutions""" start="00:07:55.920" video="mainVideo-collab" id="subtitle"]] [[!template text="""and consortia.""" start="00:07:59.960" video="mainVideo-collab" id="subtitle"]] [[!template text="""Very nice.""" start="00:08:02.760" video="mainVideo-collab" id="subtitle"]]
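
The `sed` step drops every line that contains a capital Q followed by one or more digits. Those rows appear because Wikidata's label service falls back to the bare item ID (such as `Q123`) when an item has no label in the requested language. The transcript does not show the exact `sed` expression, so the pattern below is an assumption based on the spoken description. The function name is hypothetical. This is a rough Python equivalent of the filter:

```python
import re

# Assumed pattern, mirroring the described sed filter:
# a capital "Q" followed by one or more digits, anywhere in the line.
UNLABELLED_ITEM = re.compile(r"Q[0-9]+")


def drop_unlabelled_rows(csv_text: str) -> str:
    """Return csv_text without the lines that contain a bare Wikidata Q-ID."""
    kept = [
        line
        for line in csv_text.splitlines()
        if not UNLABELLED_ITEM.search(line)
    ]
    return "\n".join(kept)
```

Like a line-based `sed` delete, this tests the whole line. A genuine label that happened to contain "Q" followed by digits would also be dropped. Testing each CSV cell with `fullmatch` would be a stricter alternative.
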
[[!template new="1" text="""Processing""" start="00:08:04.040" video="mainVideo-collab" id="subtitle"]]
[[!template text="""So let's come to our main part, doing some processing.""" start="00:08:04.040" video="mainVideo-collab" id="subtitle"]] [[!template text="""Let me give you a headline here, process the data.""" start="00:08:08.720" video="mainVideo-collab" id="subtitle"]] [[!template text="""What do you want to do first?""" start="00:08:13.640" video="mainVideo-collab" id="subtitle"]] [[!template text="""[Jonathan]: This is not a very complicated data set,""" start="00:08:15.520" video="mainVideo-collab" id="subtitle"]] [[!template text="""but let's just do some simple counts first.""" start="00:08:17.600" video="mainVideo-collab" id="subtitle"]] [[!template text="""I'm going to start with Python,""" start="00:08:19.440" video="mainVideo-collab" id="subtitle"]] [[!template text="""and we're just going to do some aggregation with Python.""" start="00:08:22.200" video="mainVideo-collab" id="subtitle"]] [[!template text="""Again, I've got some pre-written code here.""" start="00:08:25.240" video="mainVideo-collab" id="subtitle"]] [[!template text="""You can see that we've started a cell using Python.""" start="00:08:30.040" video="mainVideo-collab" id="subtitle"]] [[!template text="""The variable `clean_df` now is equal to `clean-dataset`.""" start="00:08:35.000" video="mainVideo-collab" id="subtitle"]] [[!template text="""So we're going to take that data""" start="00:08:37.880" video="mainVideo-collab" id="subtitle"]] [[!template text="""that we retrieved from the SPARQL query,""" start="00:08:39.708" video="mainVideo-collab" id="subtitle"]] [[!template text="""we're going to run it through the cleaning cell,""" start="00:08:41.040" video="mainVideo-collab" id="subtitle"]] [[!template text="""and then we're going to import it into this cell.""" start="00:08:42.720" video="mainVideo-collab" id="subtitle"]] [[!template text="""This is just going to do some simple Python aggregation.""" start="00:08:45.240" video="mainVideo-collab" id="subtitle"]] [[!template 
text="""We're going to import `pandas`,""" start="00:08:47.840" video="mainVideo-collab" id="subtitle"]] [[!template text="""which is the Python data analysis library,""" start="00:08:49.008" video="mainVideo-collab" id="subtitle"]] [[!template text="""create a data frame out of our input,""" start="00:08:51.308" video="mainVideo-collab" id="subtitle"]] [[!template text="""and then aggregate it, grouping on `wLabel`,""" start="00:08:54.840" video="mainVideo-collab" id="subtitle"]] [[!template text="""and getting a count from that and returning it.""" start="00:08:57.480" video="mainVideo-collab" id="subtitle"]] [[!template text="""So if we execute that cell...""" start="00:08:59.960" video="mainVideo-collab" id="subtitle"]] [[!template text="""[Lukas]: Nice, we get institutions and a count.""" start="00:09:05.040" video="mainVideo-collab" id="subtitle"]] [[!template text="""But what about ordering it not alphabetically,""" start="00:09:08.880" video="mainVideo-collab" id="subtitle"]] [[!template text="""but by count instead?""" start="00:09:14.120" video="mainVideo-collab" id="subtitle"]] [[!template text="""[Jonathan]: Sure.""" start="00:09:17.080" video="mainVideo-collab" id="subtitle"]] [[!template text="""So let's do this... 
`sort_values()`, I think, in Python.""" start="00:09:18.440" video="mainVideo-collab" id="subtitle"]] [[!template text="""How does that look?""" start="00:09:22.840" video="mainVideo-collab" id="subtitle"]] [[!template text="""[Lukas]: Better, but I would like to""" start="00:09:24.920" video="mainVideo-collab" id="subtitle"]] [[!template text="""have the highest number first""" start="00:09:27.641" video="mainVideo-collab" id="subtitle"]] [[!template text="""and then ascending.""" start="00:09:29.240" video="mainVideo-collab" id="subtitle"]] [[!template text="""Well, not ascending, descending.""" start="00:09:32.240" video="mainVideo-collab" id="subtitle"]] [[!template text="""[Jonathan]: Right, so we can do `ascending=False`.""" start="00:09:34.720" video="mainVideo-collab" id="subtitle"]] [[!template text="""[Lukas]: This is perfect, I'd say.""" start="00:09:39.880" video="mainVideo-collab" id="subtitle"]] [[!template text="""[Jonathan]: Great.""" start="00:09:42.560" video="mainVideo-collab" id="subtitle"]] [[!template text="""[Lukas]: Very good.""" start="00:09:43.080" video="mainVideo-collab" id="subtitle"]] [[!template text="""OK, that's nice.""" start="00:09:44.080" video="mainVideo-collab" id="subtitle"]] [[!template text="""We get a good overview here.""" start="00:09:46.800" video="mainVideo-collab" id="subtitle"]] [[!template text="""But can we also do something else,""" start="00:09:48.000" video="mainVideo-collab" id="subtitle"]] [[!template text="""like counting how many institutions are""" start="00:09:50.080" video="mainVideo-collab" id="subtitle"]] [[!template text="""involved in one consortium?""" start="00:09:56.080" video="mainVideo-collab" id="subtitle"]] [[!template text="""And also use this later on in the text?""" start="00:09:57.800" video="mainVideo-collab" id="subtitle"]] [[!template text="""[Jonathan]: Sure, so I'm going to put a new...""" start="00:10:00.880" video="mainVideo-collab" id="subtitle"]] [[!template text="""If you 
give me another heading down here""" start="00:10:00.881" video="mainVideo-collab" id="subtitle"]] [[!template text="""for institutions per consortium...""" start="00:10:05.041" video="mainVideo-collab" id="subtitle"]] [[!template text="""And here we're going to use awk code just to spice things up""" start="00:10:12.080" video="mainVideo-collab" id="subtitle"]] [[!template text="""and add yet another language in here.""" start="00:10:16.800" video="mainVideo-collab" id="subtitle"]] [[!template text="""So you can see this is awk.""" start="00:10:18.960" video="mainVideo-collab" id="subtitle"]] [[!template text="""We're using standard input instead of defining a variable.""" start="00:10:22.440" video="mainVideo-collab" id="subtitle"]] [[!template text="""But the really interesting thing about this cell""" start="00:10:26.280" video="mainVideo-collab" id="subtitle"]] [[!template text="""is that we have this `:var consortium="NFDI4Memory"`.""" start="00:10:28.360" video="mainVideo-collab" id="subtitle"]] [[!template text="""And what this code is doing is""" start="00:10:33.400" video="mainVideo-collab" id="subtitle"]] [[!template text="""it's counting every time it sees""" start="00:10:35.641" video="mainVideo-collab" id="subtitle"]] [[!template text="""that particular consortium name""" start="00:10:38.041" video="mainVideo-collab" id="subtitle"]] [[!template text="""and keeping track of that.""" start="00:10:40.280" video="mainVideo-collab" id="subtitle"]] [[!template text="""So if we execute this,""" start="00:10:41.760" video="mainVideo-collab" id="subtitle"]] [[!template text="""Lukas, why don't you execute this one?""" start="00:10:43.908" video="mainVideo-collab" id="subtitle"]] [[!template text="""[Lukas]: OK, I'm going to enter it.""" start="00:10:45.920" video="mainVideo-collab" id="subtitle"]] [[!template text="""And I get a result, NFDI4Memory,""" start="00:10:49.400" video="mainVideo-collab" id="subtitle"]] [[!template text="""because this is our default 
value for this variable.""" start="00:10:52.440" video="mainVideo-collab" id="subtitle"]] [[!template text="""And we get the count.""" start="00:10:58.240" video="mainVideo-collab" id="subtitle"]] [[!template text="""So five institutions are involved""" start="00:10:59.440" video="mainVideo-collab" id="subtitle"]] [[!template text="""in the NFDI4Memory consortium.""" start="00:11:01.641" video="mainVideo-collab" id="subtitle"]] [[!template text="""Great, but the very nice thing, I think,""" start="00:11:04.640" video="mainVideo-collab" id="subtitle"]] [[!template text="""is that we can use this code snippet within our text.""" start="00:11:07.840" video="mainVideo-collab" id="subtitle"]] [[!template text="""So, blended in seamlessly.""" start="00:11:12.520" video="mainVideo-collab" id="subtitle"]] [[!template text="""Let me give you an example.""" start="00:11:14.280" video="mainVideo-collab" id="subtitle"]] [[!template text="""I'm writing out the text.""" start="00:11:16.200" video="mainVideo-collab" id="subtitle"]] [[!template text="""Now we know how many institutions are in...""" start="00:11:18.920" video="mainVideo-collab" id="subtitle"]] [[!template text="""Give me an example.""" start="00:11:27.600" video="mainVideo-collab" id="subtitle"]] [[!template text="""I would like to know how many institutions are""" start="00:11:29.240" video="mainVideo-collab" id="subtitle"]] [[!template text="""involved in NFDI4Objects, which is a consortium.""" start="00:11:31.560" video="mainVideo-collab" id="subtitle"]] [[!template text="""So I'm writing `call_` and using""" start="00:11:35.080" video="mainVideo-collab" id="subtitle"]] [[!template text="""the name of this snippet here, of this cell,""" start="00:11:39.240" video="mainVideo-collab" id="subtitle"]] [[!template text="""which is `inst-count(`,""" start="00:11:42.608" video="mainVideo-collab" id="subtitle"]] [[!template text="""and writing my value, `NFDI4Objects`.""" start="00:11:46.608" 
video="mainVideo-collab" id="subtitle"]] [[!template text="""As soon as I evaluate this using `C-c C-c`,""" start="00:11:51.720" video="mainVideo-collab" id="subtitle"]] [[!template text="""I get the result back here.""" start="00:11:58.000" video="mainVideo-collab" id="subtitle"]] [[!template text="""I can do this for other consortia too.""" start="00:12:00.280" video="mainVideo-collab" id="subtitle"]] [[!template text="""So I'm writing `call_inst-count` again, this time with `NFDI4Earth`,""" start="00:12:05.160" video="mainVideo-collab" id="subtitle"]] [[!template text="""which is another consortium.""" start="00:12:14.040" video="mainVideo-collab" id="subtitle"]] [[!template text="""`C-c C-c`: it's three institutions.""" start="00:12:16.800" video="mainVideo-collab" id="subtitle"]] [[!template text="""This can be used throughout your text,""" start="00:12:20.560" video="mainVideo-collab" id="subtitle"]] [[!template text="""and as soon as the data set from the beginning changes,""" start="00:12:23.440" video="mainVideo-collab" id="subtitle"]] [[!template text="""maybe because querying Wikidata gives different results,""" start="00:12:26.640" video="mainVideo-collab" id="subtitle"]] [[!template text="""this will also be updated once it's exported.""" start="00:12:30.400" video="mainVideo-collab" id="subtitle"]] [[!template text="""Very nice, Jonathan.""" start="00:12:35.080" video="mainVideo-collab" id="subtitle"]]
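
To check the numbers outside Org, the two aggregation steps can be approximated in plain Python. The talk uses `pandas` (grouping on `wLabel` and sorting with `sort_values(ascending=False)`) and an awk cell named `inst-count`. The sketch below replaces both with the standard library's `collections.Counter`, so it needs no third-party packages. It assumes the cleaned CSV has a header row, with the institution in the first column and the consortium in the second. The function names are hypothetical.

```python
import csv
import io
from collections import Counter
from typing import List, Tuple


def parse_pairs(csv_text: str) -> List[Tuple[str, str]]:
    """Read (institution, consortium) pairs, skipping the header row."""
    reader = csv.reader(io.StringIO(csv_text))
    next(reader, None)  # skip the header line, e.g. "wLabel,..."
    return [(row[0], row[1]) for row in reader if len(row) >= 2]


def consortia_per_institution(pairs: List[Tuple[str, str]]) -> List[Tuple[str, int]]:
    """Count rows per institution, highest count first (like ascending=False)."""
    return Counter(institution for institution, _ in pairs).most_common()


def inst_count(pairs: List[Tuple[str, str]], consortium: str = "NFDI4Memory") -> int:
    """Count the institutions listed for one consortium, like the awk cell."""
    return sum(1 for _, name in pairs if name == consortium)
```

The default argument of `inst_count` mirrors the `:var consortium="NFDI4Memory"` default of the awk cell. Passing another name plays the role of the inline `call_inst-count` in the text.
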
[[!template new="1" text="""Visualization""" start="00:12:36.040" video="mainVideo-collab" id="subtitle"]]
[[!template text="""But I think we did a lot of analysis""" start="00:12:36.040" video="mainVideo-collab" id="subtitle"]] [[!template text="""on text and counting things.""" start="00:12:38.975" video="mainVideo-collab" id="subtitle"]] [[!template text="""Can we also do something more visual?""" start="00:12:41.080" video="mainVideo-collab" id="subtitle"]] [[!template text="""Show me something.""" start="00:12:43.680" video="mainVideo-collab" id="subtitle"]] [[!template text="""[Jonathan]: Sure.""" start="00:12:45.200" video="mainVideo-collab" id="subtitle"]] [[!template text="""So what we can do with this, because we just""" start="00:12:45.760" video="mainVideo-collab" id="subtitle"]] [[!template text="""have two columns here that are sort of related,""" start="00:12:48.640" video="mainVideo-collab" id="subtitle"]] [[!template text="""we can build a little network plot out of it.""" start="00:12:51.400" video="mainVideo-collab" id="subtitle"]] [[!template text="""So let's make a network visualization.""" start="00:12:53.760" video="mainVideo-collab" id="subtitle"]] [[!template text="""We're going to use the `igraph` library from R""" start="00:12:57.000" video="mainVideo-collab" id="subtitle"]] [[!template text="""and just plot the edges that we see here.""" start="00:12:59.600" video="mainVideo-collab" id="subtitle"]] [[!template text="""There we go.""" start="00:13:02.560" video="mainVideo-collab" id="subtitle"]] [[!template text="""There's my little heading and space.""" start="00:13:04.240" video="mainVideo-collab" id="subtitle"]] [[!template text="""Here is our code.""" start="00:13:11.880" video="mainVideo-collab" id="subtitle"]] [[!template text="""Again, just to be fancy and keep using""" start="00:13:13.480" video="mainVideo-collab" id="subtitle"]] [[!template text="""different languages in here, we set a variable called""" start="00:13:16.040" video="mainVideo-collab" id="subtitle"]] [[!template text="""`NFDI_edges` equal to `clean-dataset`.""" 
start="00:13:19.720" video="mainVideo-collab" id="subtitle"]] [[!template text="""So this, again, is sort of cascading""" start="00:13:21.600" video="mainVideo-collab" id="subtitle"]] [[!template text="""through the original data""" start="00:13:23.400" video="mainVideo-collab" id="subtitle"]] [[!template text="""that we pulled from the Wikidata endpoint,""" start="00:13:25.741" video="mainVideo-collab" id="subtitle"]] [[!template text="""cleaning that data, and now it's being inserted""" start="00:13:28.808" video="mainVideo-collab" id="subtitle"]] [[!template text="""into this cell as well.""" start="00:13:30.960" video="mainVideo-collab" id="subtitle"]] [[!template text="""But you see the difference here.""" start="00:13:32.960" video="mainVideo-collab" id="subtitle"]] [[!template text="""Instead of exporting a table, what we're saying""" start="00:13:34.240" video="mainVideo-collab" id="subtitle"]] [[!template text="""is that there will be a graphics file,""" start="00:13:36.840" video="mainVideo-collab" id="subtitle"]] [[!template text="""and it will be called `network-plot.png`.""" start="00:13:39.240" video="mainVideo-collab" id="subtitle"]] [[!template text="""All right.""" start="00:13:44.640" video="mainVideo-collab" id="subtitle"]] [[!template text="""And so Lukas, why don't you execute this one?""" start="00:13:45.120" video="mainVideo-collab" id="subtitle"]] [[!template text="""[Lukas]: There you go.""" start="00:13:47.960" video="mainVideo-collab" id="subtitle"]] [[!template text="""I press `C-c C-c`""" start="00:13:48.760" video="mainVideo-collab" id="subtitle"]] [[!template text="""and I get a nice plot of the network below our cell.""" start="00:13:52.920" video="mainVideo-collab" id="subtitle"]] [[!template text="""So this is very nice indeed.""" start="00:13:59.160" video="mainVideo-collab" id="subtitle"]]
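
A cell like the one described here might look roughly like the following sketch. The variable `NFDI_edges`, the source table `clean-dataset`, and the output file `network-plot.png` come from the talk. The block name `network-plot`, the plotting options, and the assumption that the first two columns of the table form the edge list (consortium, institution) are ours. The sketch also assumes that the R package `igraph` is installed and that the Org version is recent enough to accept `:results graphics file`.

```org
#+name: network-plot
#+begin_src R :var NFDI_edges=clean-dataset :results graphics file :file network-plot.png
library(igraph)
# Interpret the first two columns of the table as edges
# (consortium, institution); any further columns become edge attributes.
g <- graph_from_data_frame(NFDI_edges, directed = FALSE)
# Small vertices and labels keep a dense network readable.
plot(g, vertex.size = 5, vertex.label.cex = 0.7)
#+end_src
```

When you press `C-c C-c` on the block, Org runs R with the plotting device directed to the PNG file. It then inserts a `[[file:network-plot.png]]` link below the block as the result.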
[[!template new="1" text="""Preserve""" start="00:14:01.760" video="mainVideo-collab" id="subtitle"]]
[[!template text="""So I think it's about time to wrap it up and to export""" start="00:14:01.760" video="mainVideo-collab" id="subtitle"]] [[!template text="""and to preserve the data and the documentation""" start="00:14:05.200" video="mainVideo-collab" id="subtitle"]] [[!template text="""that we have, in our very last step, called Preserve.""" start="00:14:07.960" video="mainVideo-collab" id="subtitle"]] [[!template text="""So I would like to do it in two steps.""" start="00:14:13.080" video="mainVideo-collab" id="subtitle"]] [[!template text="""First, exporting it manually,""" start="00:14:16.240" video="mainVideo-collab" id="subtitle"]] [[!template text="""and then also doing it in a batch process.""" start="00:14:18.800" video="mainVideo-collab" id="subtitle"]] [[!template text="""Let me give you some insight into how the manual export works.""" start="00:14:22.240" video="mainVideo-collab" id="subtitle"]] [[!template text="""For example, you can do a LaTeX export.""" start="00:14:27.120" video="mainVideo-collab" id="subtitle"]] [[!template text="""Let me write down the key combination to do that here.""" start="00:14:30.560" video="mainVideo-collab" id="subtitle"]] [[!template text="""So you press `SPC m e l o`.""" start="00:14:34.280" video="mainVideo-collab" id="subtitle"]] [[!template text="""Let me show you how this is done.""" start="00:14:44.600" video="mainVideo-collab" id="subtitle"]] [[!template text="""So I'm pressing `SPC`.""" start="00:14:49.160" video="mainVideo-collab" id="subtitle"]] [[!template text="""I'm pressing `m`, which is my local leader.""" start="00:14:51.440" video="mainVideo-collab" id="subtitle"]] [[!template text="""I'm pressing `e`, which now opens `org-export-dispatch`.""" start="00:14:55.680" video="mainVideo-collab" id="subtitle"]] [[!template text="""And now I have different options I can choose from.""" start="00:15:01.280" video="mainVideo-collab" id="subtitle"]]
[[!template text="""I want to do a LaTeX export because I want to get a PDF.""" start="00:15:03.520" video="mainVideo-collab" id="subtitle"]] [[!template text="""So I'm pressing `l`.""" start="00:15:07.120" video="mainVideo-collab" id="subtitle"]] [[!template text="""Now I've got different options available.""" start="00:15:08.675" video="mainVideo-collab" id="subtitle"]] [[!template text="""So I'm pressing `o` to export to a PDF file and open it.""" start="00:15:11.480" video="mainVideo-collab" id="subtitle"]] [[!template text="""Let's see the code now.""" start="00:15:17.400" video="mainVideo-collab" id="subtitle"]] [[!template text="""Now it's exporting the document.""" start="00:15:21.120" video="mainVideo-collab" id="subtitle"]] [[!template text="""And what we have here is a PDF,""" start="00:15:25.640" video="mainVideo-collab" id="subtitle"]] [[!template text="""which contains our workflow at the beginning,""" start="00:15:29.675" video="mainVideo-collab" id="subtitle"]] [[!template text="""the bullet points we have here,""" start="00:15:31.975" video="mainVideo-collab" id="subtitle"]] [[!template text="""and also the code snippet""" start="00:15:35.708" video="mainVideo-collab" id="subtitle"]] [[!template text="""that we use for querying the data.""" start="00:15:37.920" video="mainVideo-collab" id="subtitle"]] [[!template text="""And we have the result below that.""" start="00:15:41.280" video="mainVideo-collab" id="subtitle"]] [[!template text="""So this is our table with all the data sets.""" start="00:15:43.600" video="mainVideo-collab" id="subtitle"]] [[!template text="""But as you can see, this is running off the page.""" start="00:15:47.000" video="mainVideo-collab" id="subtitle"]] [[!template text="""So this is not very nice with the default settings.""" start="00:15:51.880" video="mainVideo-collab" id="subtitle"]] [[!template text="""But everything is in this PDF.""" start="00:15:55.680" video="mainVideo-collab" id="subtitle"]] [[!template text="""I guess we can now show you a way""" start="00:16:00.240"
video="mainVideo-collab" id="subtitle"]] [[!template text="""how to improve this result.""" start="00:16:02.760" video="mainVideo-collab" id="subtitle"]] [[!template text="""[Jonathan]: Right.""" start="00:16:06.520" video="mainVideo-collab" id="subtitle"]] [[!template text="""So we have, of course, a version of this""" start="00:16:07.040" video="mainVideo-collab" id="subtitle"]] [[!template text="""that we prepared ahead of time,""" start="00:16:09.400" video="mainVideo-collab" id="subtitle"]] [[!template text="""which is more or less identical to the one we just made,""" start="00:16:10.775" video="mainVideo-collab" id="subtitle"]] [[!template text="""but it has a little more text, a little more explanation,""" start="00:16:14.280" video="mainVideo-collab" id="subtitle"]] [[!template text="""a little more documentation along with the code.""" start="00:16:17.840" video="mainVideo-collab" id="subtitle"]] [[!template text="""You can see we have some metadata up at the top,""" start="00:16:20.560" video="mainVideo-collab" id="subtitle"]] [[!template text="""the title, the authors, a bibliography,""" start="00:16:23.880" video="mainVideo-collab" id="subtitle"]] [[!template text="""and most importantly, the `custom-export.setup` file,""" start="00:16:26.880" video="mainVideo-collab" id="subtitle"]] [[!template text="""which lists specifically the sort of LaTeX commands""" start="00:16:31.680" video="mainVideo-collab" id="subtitle"]] [[!template text="""that we're using and the HTML styles that we're going to use.""" start="00:16:36.880" video="mainVideo-collab" id="subtitle"]] [[!template text="""And then down at the bottom of this file,""" start="00:16:43.600" video="mainVideo-collab" id="subtitle"]] [[!template text="""we have our automatic batch process.""" start="00:16:45.920" video="mainVideo-collab" id="subtitle"]] [[!template text="""Here is one more language we're including in here.""" start="00:16:49.120" video="mainVideo-collab" id="subtitle"]] [[!template 
text="""So this is Emacs Lisp.""" start="00:16:51.720" video="mainVideo-collab" id="subtitle"]] [[!template text="""And you can see here we are exporting to HTML, ASCII,""" start="00:16:53.440" video="mainVideo-collab" id="subtitle"]] [[!template text="""and PDF.""" start="00:16:57.360" video="mainVideo-collab" id="subtitle"]] [[!template text="""The nice thing about this is that this is a document.""" start="00:16:58.080" video="mainVideo-collab" id="subtitle"]] [[!template text="""It's the sort of document that we can have""" start="00:17:01.360" video="mainVideo-collab" id="subtitle"]] [[!template text="""running and building automatically.""" start="00:17:03.308" video="mainVideo-collab" id="subtitle"]] [[!template text="""It will export an HTML file, an ASCII file, and a PDF file""" start="00:17:08.640" video="mainVideo-collab" id="subtitle"]] [[!template text="""every time it's run, based on""" start="00:17:12.920" video="mainVideo-collab" id="subtitle"]] [[!template text="""the most recent data available on Wikidata.""" start="00:17:14.675" video="mainVideo-collab" id="subtitle"]] [[!template text="""So it's self-documenting.""" start="00:17:17.320" video="mainVideo-collab" id="subtitle"]] [[!template text="""We have, of course, our data retrieval steps,""" start="00:17:19.720" video="mainVideo-collab" id="subtitle"]] [[!template text="""our data cleaning steps, our data preparation steps,""" start="00:17:22.441" video="mainVideo-collab" id="subtitle"]] [[!template text="""and our preservation steps all listed at the same time.""" start="00:17:25.160" video="mainVideo-collab" id="subtitle"]] [[!template text="""And then you can see over on the right,""" start="00:17:28.360" video="mainVideo-collab" id="subtitle"]] [[!template text="""there's an example of the HTML file that we get out of this.""" start="00:17:30.240" video="mainVideo-collab" id="subtitle"]] [[!template text="""We also get a very nicely formatted PDF file,"""
start="00:17:34.360" video="mainVideo-collab" id="subtitle"]] [[!template text="""which doesn't have that little issue""" start="00:17:37.640" video="mainVideo-collab" id="subtitle"]] [[!template text="""with the overflow of the table.""" start="00:17:39.240" video="mainVideo-collab" id="subtitle"]] [[!template text="""It's very nicely put together.""" start="00:17:41.720" video="mainVideo-collab" id="subtitle"]] [[!template text="""And we even have an ASCII file.""" start="00:17:43.560" video="mainVideo-collab" id="subtitle"]] [[!template text="""And I should also point out very quickly,""" start="00:17:46.200" video="mainVideo-collab" id="subtitle"]] [[!template text="""while you have this one up, Lukas, after the awk code,""" start="00:17:47.880" video="mainVideo-collab" id="subtitle"]] [[!template text="""you can see that the text for the number of consortia,""" start="00:17:51.800" video="mainVideo-collab" id="subtitle"]] [[!template text="""or rather the number of institutions per consortium,""" start="00:17:56.080" video="mainVideo-collab" id="subtitle"]] [[!template text="""is actually printed inline.""" start="00:17:57.840" video="mainVideo-collab" id="subtitle"]] [[!template text="""[Lukas]: Yeah, you're very right.""" start="00:18:00.520" video="mainVideo-collab" id="subtitle"]] [[!template text="""So this is what we had as code,""" start="00:18:01.800" video="mainVideo-collab" id="subtitle"]] [[!template text="""and now this is nicely integrated into our text.""" start="00:18:06.120" video="mainVideo-collab" id="subtitle"]] [[!template text="""So we get the consortium and the number of institutions.""" start="00:18:10.720" video="mainVideo-collab" id="subtitle"]] [[!template text="""You can't tell the difference between code and text.""" start="00:18:15.280" video="mainVideo-collab" id="subtitle"]] [[!template text="""[Jonathan]: And those are automatically updated.""" start="00:18:19.200" video="mainVideo-collab" id="subtitle"]]
[[!template text="""So if another institution joins NFDI4Earth,""" start="00:18:20.720" video="mainVideo-collab" id="subtitle"]] [[!template text="""then the next time this runs, we update the text right here.""" start="00:18:23.880" video="mainVideo-collab" id="subtitle"]] [[!template text="""It's nothing we have to worry about.""" start="00:18:26.320" video="mainVideo-collab" id="subtitle"]] [[!template text="""We just pull it directly out of Wikidata.""" start="00:18:28.520" video="mainVideo-collab" id="subtitle"]] [[!template text="""[Lukas]: And for the sake of completeness,""" start="00:18:31.840" video="mainVideo-collab" id="subtitle"]] [[!template text="""this is the ASCII file.""" start="00:18:34.680" video="mainVideo-collab" id="subtitle"]] [[!template text="""That's another export format.""" start="00:18:37.880" video="mainVideo-collab" id="subtitle"]] [[!template text="""It also contains everything, code and data.""" start="00:18:42.760" video="mainVideo-collab" id="subtitle"]] [[!template text="""Yeah, so this is what we wanted to show you,""" start="00:18:48.360" video="mainVideo-collab" id="subtitle"]] [[!template text="""how to do some data processing,""" start="00:18:53.240" video="mainVideo-collab" id="subtitle"]] [[!template text="""some collaborative work,""" start="00:18:56.640" video="mainVideo-collab" id="subtitle"]] [[!template text="""and documenting using org-babel.""" start="00:18:58.680" video="mainVideo-collab" id="subtitle"]] [[!template text="""Thanks for listening.""" start="00:19:01.120" video="mainVideo-collab" id="subtitle"]] [[!template text="""[Jonathan]: Thank you all, have a good day.""" start="00:19:05.720" video="mainVideo-collab" id="subtitle"]]
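
The manual export shown above uses the speaker's leader-key bindings, where `SPC m e` opens `org-export-dispatch`. In a plain Org setup the dispatcher is bound to `C-c C-e`, so the equivalent sequence is `C-c C-e l o` (LaTeX, as PDF file, and open). The captions do not reproduce the batch-export block. The following is a minimal Emacs Lisp sketch of the same idea. It is an assumption that `custom-export.setup` is pulled in with `#+SETUPFILE:`.

```org
# At the top of the file (assumed way of including the setup file):
#+SETUPFILE: custom-export.setup

# At the end of the file:
#+begin_src emacs-lisp :results none :eval no-export
;; Sketch: regenerate all three outputs from the current Org buffer.
;; Each function exports the whole buffer next to the .org file.
(require 'ox-html)
(require 'ox-ascii)
(require 'ox-latex)
(org-html-export-to-html)    ; writes the .html file
(org-ascii-export-to-ascii)  ; writes a plain-text .txt file
(org-latex-export-to-pdf)    ; writes the .tex file and compiles it to .pdf
#+end_src
```

`:eval no-export` keeps the block from being run again while the buffer is being exported. By default, Org evaluates the other code blocks during export (`org-export-use-babel`). As long as those blocks are set to export their results, each run therefore reflects the data fetched from Wikidata at that moment.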
Captioner: amine

Questions or comments? Please e-mail [hartman@itc.rwth-aachen.de, bossert@itc.rwth-aachen.de](mailto:hartman@itc.rwth-aachen.de, bossert@itc.rwth-aachen.de?subject=Comment%20for%20EmacsConf%202023%20collab%3A%20Collaborative%20data%20processing%20and%20documenting%20using%20org-babel)