Diffstat
-rw-r--r-- 2022/talks/grail.md | 74
1 files changed, 74 insertions, 0 deletions
diff --git a/2022/talks/grail.md b/2022/talks/grail.md
new file mode 100644
index 00000000..38dae0b0
--- /dev/null
+++ b/2022/talks/grail.md
@@ -0,0 +1,74 @@
+[[!meta title="GRAIL---A Generalized Representation and Aggregation of Information Layers"]]
+[[!meta copyright="Copyright © 2022 Sameer Pradhan"]]
+[[!inline pages="internal(2022/info/grail-nav)" raw="yes"]]
+
+<!-- Initially generated with emacsconf-generate-talk-page and then left alone for manual editing -->
+<!-- You can manually edit this file to update the abstract, add links, etc. --->
+
+
+# GRAIL---A Generalized Representation and Aggregation of Information Layers
+Sameer Pradhan (he/him)
+
+[[!inline pages="internal(2022/info/grail-before)" raw="yes"]]
+
+The human brain receives various signals that it assimilates (filters,
+splices, corrects, etc.) to build a syntactic structure and its semantic
+interpretation. This is a complex process that enables human communication.
+The field of artificial intelligence (AI) is devoted to studying how we
+generate symbols and derive meaning from such signals and to building
+predictive models that allow effective human-computer interaction.
+
+For the purpose of this talk we will limit the scope of signals to the
+domain of language: text and speech. Computational Linguistics (CL),
+a.k.a. Natural Language Processing (NLP), is a sub-area of AI that tries to
+interpret them. It involves modeling and predicting complex linguistic
+structures from these signals. These models tend to rely heavily on a large
+amount of "raw" (naturally occurring) data and a varying amount of
+(manually) enriched data, commonly known as "annotations". The models are
+only as good as the quality of the annotations. Because linguistic phenomena
+are numerous and complex, a divide-and-conquer approach is
+common. The upside is that it allows one to focus on one, or a few, related
+linguistic phenomena. The downside is that the universe of these phenomena
+keeps expanding as language is context-sensitive and evolves over time. For
+example, depending on the context, the word "bank" can refer to a financial
+institution, or the rising ground surrounding a lake, or something else. The
+verb "google" did not exist before the company came into being.
+
+Manually annotating data can be a very task-specific, labor-intensive
+endeavor. Owing to this, advances in multiple modalities have happened in
+silos until recently. Recent advances in computer hardware and machine
+learning algorithms have opened the door to interpreting multimodal data.
+However, the need to piece together such related but disjoint predictions
+poses a huge challenge.
+
+This brings us to the two questions that we will try to address in this
+talk:
+
+1. How can we come up with a unified representation of data and annotations that encompasses arbitrary levels of linguistic information?
+
+2. What role might Emacs play in this process?
+
+Emacs provides a rich environment for editing and manipulating the recursive,
+embedded structures found in programming languages. Its view of text,
+however, is more or less linear: strings broken into words, strings ended by
+periods, strings identified using delimiters, etc. It does not assume
+embedded or recursive structure in text. However, the process of interpreting
+natural language involves operating on such structures. What if we could
+adapt Emacs to manipulate rich structures derived from text? Unlike
+programming languages, which are designed to be parsed and interpreted
+deterministically, interpretation of statements in natural languages
+frequently has to deal with phenomena such as ambiguity, inconsistency,
+and incompleteness, and can get quite complex.
+
+We present an architecture (GRAIL) that utilizes the capabilities of Emacs
+to allow the representation and aggregation of such rich structures in
+a systematic fashion. Our approach is not tied to Emacs, but uses its many
+built-in capabilities for creating and evaluating solution prototypes.
+
+
+
+[[!inline pages="internal(2022/info/grail-after)" raw="yes"]]
+
+[[!inline pages="internal(2022/info/grail-nav)" raw="yes"]]
+
+[[!taglink CategoryLinguistics]]