Diffstat (limited to '2023/talks/collab.md')
-rw-r--r-- | 2023/talks/collab.md | 195 |
1 files changed, 195 insertions, 0 deletions
diff --git a/2023/talks/collab.md b/2023/talks/collab.md
new file mode 100644
index 00000000..1a88809b
--- /dev/null
+++ b/2023/talks/collab.md
@@ -0,0 +1,195 @@
+[[!meta title="Collaborative data processing and documenting using org-babel"]]
+[[!meta copyright="Copyright © 2023 Jonathan Hartman, Lukas C. Bossert"]]
+[[!inline pages="internal(2023/info/collab-nav)" raw="yes"]]
+
+<!-- Initially generated with emacsconf-publish-talk-page and then left alone for manual editing -->
+<!-- You can manually edit this file to update the abstract, add links, etc. --->
+
+
+# Collaborative data processing and documenting using org-babel
+Jonathan Hartman (he/him), Lukas C. Bossert (he/him) - <https://mastodon.social/@lukascbossert>, <mailto:hartman@itc.rwth-aachen.de>, <mailto:bossert@itc.rwth-aachen.de>
+
+[[!inline pages="internal(2023/info/collab-before)" raw="yes"]]
+
+In our presentation we will show an efficient way of combining
+information and enriching it by retrieving data, processing it, and
+finally exporting it, all with org-mode. In this presentation, we will
+demonstrate not only org-mode, but also a few companion libraries that
+add functionality such as knowledge graph visualizations, literate
+programming, and collaborative editing to quickly create a deeply
+informative reference page.
+
+The starting point of our best practice is the National Research Data
+Infrastructure Germany (NFDI), about which we intend to retrieve and
+process certain data gathered from Wikidata. For this, we
+are additionally leveraging the "org-roam" Emacs package, which
+provides functionality for quickly and simply linking together notes
+and ideas into a custom knowledge graph. Initially, we will write a
+short abstract about the NFDI and embed it into our existing knowledge
+graph by linking it to other existing nodes. In the visualized graph
+(using the “org-roam-ui” package), links and secondary connections to
+other existing nodes can now be revealed.
+
+Next, we would like to enrich the text about the NFDI with data
+retrieved from the Wikidata API. A convenient way of creating
+self-documenting code is the approach called “literate programming”,
+which presents program logic embedded within human language text. In
+Emacs we achieve this by using the “org-babel” package. Perhaps now we
+find it is helpful to collaborate with a colleague in the document:
+while one is writing the code, the other can explain its use and
+interpret the results. We will do this simultaneously in the same
+document using a method called “crdt” (conflict-free replicated data
+type) and – of course – there is also an implementation of this in
+Emacs. The results of the code blocks can be used for further analysis
+and shared throughout the same document.
+
+Finally, for the sake of proper and barrier-free documentation, we
+show how to export the document to various formats like pdf, html, txt
+etc. using either the built-in export feature of org-mode or
+pandoc.
+
+About the speakers:
+
+**Jonathan Hartman** is a trained data scientist and works at the IT
+Center of the RWTH Aachen University, Germany.
+
+**Lukas C. Bossert** is a trained classical archaeologist and is deputy
+head of the department "research process and data management" at the
+IT Center of the RWTH.
+
+Lukas, an intermediate Emacs user, is currently exploring how to
+optimize his daily workflow by leveraging various Emacs packages. On
+the other hand, Jonathan is a relative newcomer to this environment,
+encountering common pitfalls faced by beginners. Together, they
+explore the capabilities and functionalities of org-mode, discovering
+how it can enhance data management and presentation in their research
+processes.
+
+[[!img /i/emacsconf-2023-collab-sponsorship.png alt="Lukas and Jonathan are financed by the DKZ.2R Datenkompetenzkolleg Rhein-Ruhr (16DKZ2030E), www.dks2r.de"]]
+
+# Discussion
+
+## Questions and answers
+
+- Q: How reliably does it resolve conflicts? I mean, for my personal
+  use case, for example, Syncthing, sometimes it's not working
+  perfectly and I had to manually edit it. How robust is it compared
+  to Syncthing?
+    - A (Lukas): We also sometimes faced issues where letters got
+      mixed up. We couldn't figure out what caused it and it was not
+      reproducible. I cannot compare it to Syncthing; I have never used
+      that with Emacs/org-mode.
+- Q: How's the security for this kind of thing? I mean, if we adopt
+  this in our PAD, is there any way it can execute arbitrary
+  (elisp) code on other people's computers? (Think like an
+  adversary!)
+    - A: (Lukas) As far as we saw, the code is executed on the local
+      computer; see the part with the R code in our video.
+    - (zaeph) We had plans with qhong (maintainer of crdt.el) to
+      tunnel the connection via SSL, but we were blocked by the SSL
+      library that shipped with Emacs, sadly. However, we did create
+      a security policy that allowed restrictions on the execution of
+      Elisp code. (great!)
+- Q: Really nice talk and demo! You guys clearly rehearsed :). I
+  always wonder with serial data processing sequencing like this, to
+  what degree do the intermediate outputs need to appear inline in the
+  text? Suppose you had 50,000 or one million rows from your initial
+  wikidata (or similar) call. How would you handle that size of data
+  using a collaborative, literate approach like this?
+    - A: (Lukas) Good question. In your local buffer there is no
+      difference and for the collaborative partner I cannot tell. We
+      tested it with 50 items because that was enough for
+      demonstrating our purpose.
+    - noweb allows getting results of evaluation without having to put
+      the actual data into the Org buffer - just arrange the original
+      block generating the data to have `:results silent`. Basically,
+      `:var foo=block-name` does not require "block-name" to be
+      evaluated in advance - it will be evaluated as necessary. AFAIU,
+      in the talk, it is re-evaluated every time (to avoid that, one
+      would need `:cache yes`).
+        - This has tremendous utility
+        - So it would be stored on disk and referenced by name in a
+          subsequent block? Sounds useful.
+            - Not on disk - just cached within a single session. To store
+              it on disk, you need to save it to an actual file.
+- Q: How do you handle the viewing of larger or really any tabular
+  data in Emacs/Org when you want to inspect it, like the nice way
+  tabular data is displayed inline in Rmarkdown/RStudio?
+    - A: (Lukas) I have no particular way of doing this.
+    - What about pandas data summary functionality? Can be a simple
+      python block.
+        - Lukas: Jonathan is our python expert, he might answer this
+          question.
+    - A: (Jonathan) If I follow, you can certainly just use
+      DataFrame.describe() or Series.describe() to get summary
+      statistics for a dataset - the return value would be a Series or
+      a DataFrame, which would be displayed similarly to how we show
+      things here. Alternatively, DataFrame.head(n) or
+      DataFrame.sample(n) would return a dataframe of the first n / n
+      random lines of a dataset, and might be a way of providing the
+      gist of a very large dataset without printing the entire table
+      in the document.
+    - Would be nice to have a "summarized table" functionality in
+      Org, that includes an abridged copy of a long table inline, but
+      you can open it in another buffer to browse/edit the full table
+      (ala block edit).
+        - Feel free to post a feature request - see
+          <https://orgmode.org/manual/Feedback.html#Feedback>
+- Q: I'm thinking about an application for a single user, but on
+  different platforms. As a simple case, for example, you have a
+  buffer on your local computer, and you also want to have some files
+  on your pad or on your phone, and you can use this CRDT concept to
+  make sure that there's not too much conflict between different
+  editing sections. Do you think this is a good idea? I mean, compared
+  to purely relying on Syncthing, which sometimes I feel is unreliable
+  for resolving those conflicts.
+    - A: (Lukas) This sounds very interesting and could be beneficial for
+      continuously working on things.
+
+## Notes
+
+- I like the way you highlight the point you are talking about in real
+  time.
+- Conflict-free Replicated Data Types (CRDT) ::
+  <https://github.com/emacs-straight/crdt>
+- !This is the future of PAD for our conference.
+- Just came here to say watching two users editing the same buffer
+  simultaneously is BLOWING MY MIND
+    - BLOWING MY MIND +2
+    - blowing my mind, too ...
+    - WOW
+- GitLab custom-export.setup
+    - What about it?
+        - I am looking for that setup file and want to try it :)
+          -->
+          <https://git.rwth-aachen.de/dl/workshops/collaborative-coding-with-emacs/-/blob/main/emacs/custom-export.setup>
+    - Thank you!
+- Truly one of the most impressive talks of the day. Congrats! Very
+  inspiring
+    - Yes, indeed.
+    - (Lukas) Wow! Thank you. We weren't sure if this was worth showing
+      at EmacsConf because there have already been plenty of talks
+      about literate programming and org-babel....
+        - Great collaborative conversation and step-wise example
+          creates a different (and impactful) framing. Thank you!
+- crdt is fantastic; pity that most (all but one) of my collaborators use Word & VS Code. 🙁
+- that's really cool. One of the parts that's a bit hidden from the user is seeing the format that the data is in inside the shell script
+- it is whatever constitutes the closest equivalent of a table in sh (array)
+    - yeah, you have to keep the representation in mind when filtering it as text through sed
+- this demo is so cool :D
+- Really, really impressive I have to admit
+- HA. you cannot evaluate in place so seamlessly in that way with Rmarkdown :). And you cannot combine named blocks in this way either. Wish more folks used emacs.
+- wow, so `#+CALL` can be embedded in text via `call_()`? TIL
+- such a slick presentation, I like the CRDT collaboration angle, looks like an end-game UX
+- Impressive workflow!
+- great presentation!
+- For those of you who remember the bad old days before "reproducible research," that talk is even more impressive. Great job!
+    - i was prolly not there in the bad old days, but imho reproducible research is a pressing, current problem.
+- I feel like that talk video should be shared on Hacker News
+
+
+[[!inline pages="internal(2023/info/collab-after)" raw="yes"]]
+
+[[!inline pages="internal(2023/info/collab-nav)" raw="yes"]]
+
+
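
The discussion above keeps returning to how a CRDT lets two people edit one buffer without conflicts. As a rough illustration of the general idea only, here is a toy insert-only sequence CRDT in Python. It is loosely modelled on the "insert after a uniquely identified character" family of designs and makes no claim about how crdt.el works internally. The `Replica` class, its methods, and the site names `"A"` and `"B"` are all hypothetical names chosen for this sketch.

```python
class Replica:
    """Toy insert-only sequence CRDT (illustrative sketch, not crdt.el's algorithm)."""

    def __init__(self, site):
        self.site = site    # hypothetical replica name, used as a tie-breaker
        self.clock = 0      # Lamport-style counter for generating unique IDs
        self.nodes = {}     # maps id -> (parent_id, char), where id = (clock, site)

    def _order(self):
        """Return all character IDs in document order."""
        children = {}
        for node_id, (parent, _char) in self.nodes.items():
            children.setdefault(parent, []).append(node_id)
        ordered = []

        def visit(parent):
            # Newer siblings come first, so a fresh insert lands right after its parent.
            for node_id in sorted(children.get(parent, []), reverse=True):
                ordered.append(node_id)
                visit(node_id)

        visit(None)
        return ordered

    def insert(self, index, char):
        """Insert char so that it appears at position `index` in the local text."""
        order = self._order()
        parent = order[index - 1] if index > 0 else None
        self.clock += 1
        self.nodes[(self.clock, self.site)] = (parent, char)

    def merge(self, other):
        """Take the union of both states; merging in any order gives the same result."""
        self.nodes.update(other.nodes)
        seen = max((clock for clock, _site in self.nodes), default=0)
        self.clock = max(self.clock, seen)

    def text(self):
        return "".join(self.nodes[node_id][1] for node_id in self._order())


if __name__ == "__main__":
    a, b = Replica("A"), Replica("B")
    for i, ch in enumerate("NFDI"):
        a.insert(i, ch)
    b.merge(a)            # both replicas now hold the same text
    a.insert(4, "x")      # concurrent edits at the same position
    b.insert(4, "y")
    a.merge(b)
    b.merge(a)
    print(a.text() == b.text())  # the replicas converge on one ordering
```

Because every character carries a globally unique ID and merging is a plain union, neither replica has to ask the other which edit "wins". This sketch leaves out deletion, networking, and any protection against a misbehaving peer, all of which a real implementation has to handle.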