From 16f814fc76d2382e65ff7f66147d6e0aea842a1f Mon Sep 17 00:00:00 2001
From: Sacha Chua
Date: Wed, 5 Oct 2022 11:18:08 -0400
Subject: Rethread nav, add grail

---
 2022/talks/grail.md | 74 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 74 insertions(+)
 create mode 100644 2022/talks/grail.md

(limited to '2022/talks')

diff --git a/2022/talks/grail.md b/2022/talks/grail.md
new file mode 100644
index 00000000..38dae0b0
--- /dev/null
+++ b/2022/talks/grail.md
@@ -0,0 +1,74 @@
+[[!meta title="GRAIL---A Generalized Representation and Aggregation of Information Layers"]]
+[[!meta copyright="Copyright © 2022 Sameer Pradhan"]]
+[[!inline pages="internal(2022/info/grail-nav)" raw="yes"]]
+
+
+
+
+
+# GRAIL---A Generalized Representation and Aggregation of Information Layers
+Sameer Pradhan (he/him)
+
+[[!inline pages="internal(2022/info/grail-before)" raw="yes"]]
+
+The human brain receives various signals that it assimilates (filters,
+splices, corrects, etc.) to build a syntactic structure and its semantic
+interpretation. This is a complex process that enables human communication.
+The field of artificial intelligence (AI) is devoted to studying how we
+generate symbols and derive meaning from such signals and to building
+predictive models that allow effective human-computer interaction.
+
+For the purpose of this talk we will limit the scope of signals to the
+domain of language: text and speech. Computational Linguistics (CL),
+a.k.a. Natural Language Processing (NLP), is a sub-area of AI that tries to
+interpret them. It involves modeling and predicting complex linguistic
+structures from these signals. These models tend to rely heavily on a large
+amount of \`\`raw'' (naturally occurring) data and a varying amount of
+(manually) enriched data, commonly known as \`\`annotations''. The models are
+only as good as the quality of the annotations. Owing to the complexity and
+sheer number of linguistic phenomena, a divide-and-conquer approach is
+common. The upside is that it allows one to focus on one, or a few, related
+linguistic phenomena. The downside is that the universe of these phenomena
+keeps expanding as language is context-sensitive and evolves over time. For
+example, depending on the context, the word \`\`bank'' can refer to a financial
+institution, or the rising ground surrounding a lake, or something else. The
+verb \`\`google'' did not exist before the company came into being.
+
+Manually annotating data can be a very task-specific, labor-intensive
+endeavor. Owing to this, advances in multiple modalities have happened in
+silos until recently. Recent advances in computer hardware and machine
+learning algorithms have opened doors to interpretation of multimodal data.
+However, the need to piece together such related but disjoint predictions
+poses a huge challenge.
+
+This brings us to the two questions that we will try to address in this
+talk:
+
+1. How can we come up with a unified representation of data and annotations that encompasses arbitrary levels of linguistic information? and,
+
+2. What role might Emacs play in this process?
+
+Emacs provides a rich environment for editing and manipulating recursive
+embedded structures found in programming languages. Its view of text,
+however, is more or less linear: strings broken into words, strings ended by
+periods, strings identified using delimiters, etc. It does not assume
+embedded or recursive structure in text. However, the process of interpreting
+natural language involves operating on such structures. What if we could
+adapt Emacs to manipulate rich structures derived from text? Unlike
+programming languages, which are designed to be parsed and interpreted
+deterministically, natural languages call for interpretation that must
+frequently deal with phenomena such as ambiguity, inconsistency, and
+incompleteness, and that can get quite complex.
+
+We present an architecture (GRAIL) which utilizes the capabilities of Emacs
+to allow the representation and aggregation of such rich structures in
+a systematic fashion. Our approach is not tied to Emacs, but uses its many
+built-in capabilities for creating and evaluating solution prototypes.
+
+
+
+[[!inline pages="internal(2022/info/grail-after)" raw="yes"]]
+
+[[!inline pages="internal(2022/info/grail-nav)" raw="yes"]]
+
+[[!taglink CategoryLinguistics]]
--
cgit v1.2.3