Diffstat (limited to '2022/talks')
| -rw-r--r-- | 2022/talks/grail.md | 74 | 
1 files changed, 74 insertions, 0 deletions
diff --git a/2022/talks/grail.md b/2022/talks/grail.md
new file mode 100644
index 00000000..38dae0b0
--- /dev/null
+++ b/2022/talks/grail.md
@@ -0,0 +1,74 @@
+[[!meta title="GRAIL---A Generalized Representation and Aggregation of Information Layers"]]
+[[!meta copyright="Copyright © 2022 Sameer Pradhan"]]
+[[!inline pages="internal(2022/info/grail-nav)" raw="yes"]]
+
+<!-- Initially generated with emacsconf-generate-talk-page and then left alone for manual editing -->
+<!-- You can manually edit this file to update the abstract, add links, etc. --->
+
+
+# GRAIL---A Generalized Representation and Aggregation of Information Layers
+Sameer Pradhan (he/him)
+
+[[!inline pages="internal(2022/info/grail-before)" raw="yes"]]
+
+The human brain receives various signals that it assimilates (filters,
+splices, corrects, etc.) to build a syntactic structure and its semantic
+interpretation.  This is a complex process that enables human communication.
+The field of artificial intelligence (AI) is devoted to studying how we
+generate symbols and derive meaning from such signals and to building
+predictive models that allow effective human-computer interaction.
+
+For the purpose of this talk we will limit the scope of signals to the
+domain of language: text and speech.  Computational Linguistics (CL),
+a.k.a. Natural Language Processing (NLP), is a sub-area of AI that tries to
+interpret them.  It involves modeling and predicting complex linguistic
+structures from these signals.  These models tend to rely heavily on a large
+amount of "raw" (naturally occurring) data and a varying amount of
+(manually) enriched data, commonly known as "annotations".  The models are
+only as good as the quality of the annotations.  Because linguistic phenomena
+are numerous and complex, a divide-and-conquer approach is
+common.  The upside is that it allows one to focus on one, or a few, related
+linguistic phenomena.  The downside is that the universe of these phenomena
+keeps expanding as language is context-sensitive and evolves over time.  For
+example, depending on the context, the word "bank" can refer to a financial
+institution, or the rising ground surrounding a lake, or something else.  The
+verb "google" did not exist before the company came into being.
+
+Manually annotating data can be a very task-specific, labor-intensive
+endeavor.  Owing to this, advances in multiple modalities have happened in
+silos until recently.  Recent advances in computer hardware and machine
+learning algorithms have opened doors to interpretation of multimodal data.
+However, the need to piece together such related but disjoint predictions
+poses a huge challenge.
+
+This brings us to the two questions that we will try to address in this
+talk:
+
+1.  How can we come up with a unified representation of data and annotations that encompasses arbitrary levels of linguistic information?
+
+2.  What role might Emacs play in this process?
+
+Emacs provides a rich environment for editing and manipulating recursive
+embedded structures found in programming languages.  Its view of text,
+however, is more or less linear: strings broken into words, strings ended by
+periods, strings identified using delimiters, etc.  It does not assume
+embedded or recursive structure in text.  However, the process of interpreting
+natural language involves operating on such structures.  What if we could
+adapt Emacs to manipulate rich structures derived from text?  Unlike
+programming languages, which are designed to be parsed and interpreted
+deterministically, interpretation of statements in natural languages frequently
+has to deal with phenomena such as ambiguity, inconsistency,
+incompleteness, etc., and can get quite complex.
+
+We present an architecture (GRAIL) which utilizes the capabilities of Emacs
+to allow the representation and aggregation of such rich structures in
+a systematic fashion.  Our approach is not tied to Emacs, but uses its many
+built-in capabilities for creating and evaluating solution prototypes.
+
+
+
+[[!inline pages="internal(2022/info/grail-after)" raw="yes"]]
+
+[[!inline pages="internal(2022/info/grail-nav)" raw="yes"]]
+
+[[!taglink CategoryLinguistics]]
