author    Sacha Chua <sacha@sachachua.com>  2022-12-10 16:48:09 -0500
committer Sacha Chua <sacha@sachachua.com>  2022-12-10 16:48:09 -0500
commit    3a8a34d7a50f679f0d3715c339cd5652e2deb7ce (patch)
tree      66ed7db24c17af1a72f286246c936bec1b15bd8c /2022/talks/grail.md
parent    9f0801ef2f6ace5ca7a74f465f4479624de72a9d (diff)
remove backslashes
Diffstat (limited to '2022/talks/grail.md')
-rw-r--r--  2022/talks/grail.md  |  68
1 file changed, 34 insertions(+), 34 deletions(-)
diff --git a/2022/talks/grail.md b/2022/talks/grail.md
index b53efc44..694d37a8 100644
--- a/2022/talks/grail.md
+++ b/2022/talks/grail.md
@@ -24,16 +24,16 @@ domain to language---text and speech. Computational Linguistics (CL),
a.k.a. Natural Language Processing (NLP), is a sub-area of AI that tries to
interpret them. It involves modeling and predicting complex linguistic
structures from these signals. These models tend to rely heavily on a large
-amount of \`\`raw'' (naturally occurring) data and a varying amount of
-(manually) enriched data, commonly known as \`\`annotations''. The models are
+amount of ``raw'' (naturally occurring) data and a varying amount of
+(manually) enriched data, commonly known as ``annotations''. The models are
only as good as the quality of the annotations. Owing to the complexity
and sheer number of linguistic phenomena, a divide-and-conquer approach is
common. The upside is that it allows one to focus on one, or a few, related
linguistic phenomena. The downside is that the universe of these phenomena
keeps expanding, as language is context-sensitive and evolves over time. For
-example, depending on the context, the word \`\`bank'' can refer to a financial
+example, depending on the context, the word ``bank'' can refer to a financial
institution, or the rising ground surrounding a lake, or something else. The
-verb \`\`google'' did not exist before the company came into being.
+verb ``google'' did not exist before the company came into being.
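
To make the ambiguity concrete, here is a minimal sketch in Python of what
a word-sense annotation over raw text might look like (the two-sense
inventory and its labels are invented for illustration, not drawn from any
real corpus):

```
# A toy sense inventory; real inventories (e.g., WordNet) are far larger.
SENSES = {
    "bank.01": "financial institution",
    "bank.02": "rising ground surrounding a body of water",
}

# Each annotation pairs one token in a raw sentence with a sense label;
# this manually enriched layer is what the abstract calls "annotations".
annotations = [
    {"sentence": "She deposited the check at the bank.",
     "token": "bank", "sense": "bank.01"},
    {"sentence": "They picnicked on the bank of the river.",
     "token": "bank", "sense": "bank.02"},
]

for a in annotations:
    print(f'{a["token"]!r} in {a["sentence"]!r} -> {SENSES[a["sense"]]}')
```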
Manually annotating data can be a very task-specific, labor-intensive
endeavor. Owing to this, advances in multiple modalities have happened in
@@ -79,17 +79,17 @@ built-in capabilities for creating and evaluating solution prototypes.
quickly and may not have provided useful context or might have made
errors.
-- Please feel free to email me at pradhan\@cemantix.org for any further
+- Please feel free to email me at pradhan@cemantix.org for any further
questions or discussions you may want to have with me or be part of
- the grail community (doesn\'t exist yet :-), or is a community of 1)
+ the grail community (doesn't exist yet :-), or is a community of 1)
## Questions and answers
-- Q: Has the \'92 UPenn corpus-of-articles feat been reproduced over
+- Q: Has the '92 UPenn corpus-of-articles feat been reproduced over
and over again using these tools?
- A: 
- - Yes. The \'92 corpus only annotated syntactic structure. It was
+ - Yes. The '92 corpus only annotated syntactic structure. It was
probably the first time that the details captured in syntax were
selected not purely based on linguistic accuracy, but on the
consistency of such annotations across multiple annotators. This
@@ -110,7 +110,7 @@ built-in capabilities for creating and evaluating solution prototypes.
related to the brittleness of the representations. For example,
I remember when we were building the OntoNotes corpus, there was
a point where the guidelines were changed to split all words at
- a \'hyphen\'. That simple change caused a lot of heartache
+ a 'hyphen'. That simple change caused a lot of heartache
because the interdependencies were not captured at a level that
could be programmatically manipulated. That was around 2007 when
I decided to use a relational database architecture to represent
@@ -123,11 +123,11 @@ built-in capabilities for creating and evaluating solution prototypes.
you have no idea if the whole is consistent. And when I came
across org-mode sometime around 2011/12 (if I remember
correctly) I thought it would be a great tool. And indeed, about
- a decade later I am trying to stand on its and Emacs\'
+ a decade later I am trying to stand on its and Emacs'
shoulders.
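
As an illustration of that relational move, here is a hypothetical sketch
using Python's standard sqlite3 module (the two-table schema and the sense
labels are invented for this example and are not the actual OntoNotes
schema): when tokens and annotation layers reference each other by id, a
guideline change such as splitting words at a hyphen becomes a
programmatic update rather than a hand re-edit of every file.

```
import sqlite3

# Hypothetical two-table schema: tokens in one table, a sense layer in
# another, linked by token id so layers stay attached when tokens change.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE token (id INTEGER PRIMARY KEY, sent INT, idx INT, form TEXT);
CREATE TABLE sense (token_id INT REFERENCES token(id), label TEXT);
""")
con.execute("INSERT INTO token VALUES (1, 0, 3, 'data-driven')")
con.execute("INSERT INTO sense VALUES (1, 'driven.02')")

# The guideline change ("split all words at a hyphen") becomes one query
# plus updates, instead of a manual re-edit of every affected file.
rows = list(con.execute("SELECT id, sent, idx, form FROM token "
                        "WHERE form LIKE '%-%'"))
for tid, sent, idx, form in rows:
    left, right = form.split("-", 1)
    con.execute("UPDATE token SET form = ? WHERE id = ?", (left, tid))
    con.execute("INSERT INTO token (sent, idx, form) VALUES (?, ?, ?)",
                (sent, idx + 1, right))
    # The sense row still points at token 1, so nothing dangles; a real
    # schema would also record which half inherits each annotation.

print(list(con.execute("SELECT * FROM token")))   # two tokens now
print(list(con.execute("SELECT * FROM sense")))   # layer intact
```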
- This corpus was one of the first large-scale manually annotated
corpora that bootstrapped the statistical natural language
- processing era.  That can be considered the first wave\... 
+ processing era.  That can be considered the first wave... 
Since then, there have been more corpora built on the same
philosophy. In fact, I spent about 8 years, about a decade ago,
building a much larger corpus with more layers of information
@@ -146,7 +146,7 @@ built-in capabilities for creating and evaluating solution prototypes.
built, but the idea is to identify patterns and build upon them
to create a larger collection of transformations that could be
generally useful.  That could help capture the abstract
- representation of \"meaning\" and help the models learn better. 
+ representation of "meaning" and help the models learn better. 
- These days most models are trained on a boatload of data, and no
matter how much data you use to train your largest model, it is
still going to be a small speck in the universe of ever-growing
@@ -159,7 +159,7 @@ built-in capabilities for creating and evaluating solution prototypes.
deriving the function itself. You can get close, but then
you cannot really do a lot better with that model :-)
- I did a brief stint at the Harvard Medical School/Boston
- Children\'s Hospital to see if we could use the same underlying
+ Children's Hospital to see if we could use the same underlying
philosophy to build better models for understanding clinical
notes. It would be an extremely useful and socially beneficial
use case, but then after a few years and realizing that the
@@ -172,28 +172,28 @@ built-in capabilities for creating and evaluating solution prototypes.
older people, using which neurologists can predict a
potential early onset of some neurological disorder. The idea is
to see if we can use speech and language signals to predict such
- cases early on. Given that we don\'t have cures for those
+ cases early on. Given that we don't have cures for those
conditions yet, the best we can do is identify them earlier, with
the hope that the progression can be slowed down.
- This is sort of what is happening with the deep learning hype.
It is not to say that there hasn't been a significant
advancement in the technologies, but to say that the models can
- \"learn\" is an extreme overstatement. 
+ "learn" is an extreme overstatement. 
+
-\
- Q: Reminds me of the advantages of pre-computer copy and paste. Cut
up paper and rearrange, but having more stuff with your pieces.
- A: Right! 
- - Kind of like that, but more \"intelligent\" than copy/paste,
+ - Kind of like that, but more "intelligent" than copy/paste,
because you could have various local constraints that would
ensure that the information is consistent with the whole. I
am also envisioning this as a use case for hooks. And if you can
have rich local dependencies, then you can be sure (as much as
you can) that the information signal is not too corrupted.
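
To make the "local constraints" idea concrete, here is a hypothetical
sketch in plain Python (not GRAIL code; the hook list stands in for
whatever mechanism the editor provides): a check that runs after every
local edit and verifies that annotation spans still tile the sentence
exactly, so a local change cannot silently corrupt the whole.

```
# Hypothetical sketch: a consistency check run as a hook after each
# local edit. Spans must tile the sentence exactly (no gaps, no
# overlaps), so local changes cannot silently corrupt the whole.
def spans_tile(sentence, spans):
    pos = 0
    for start, end in spans:
        if start != pos or end <= start:
            return False
        pos = end
    return pos == len(sentence)

after_edit_hooks = []          # stand-in for an editor's hook mechanism

def check_tiling(sentence, spans):
    if not spans_tile(sentence, spans):
        raise ValueError(f"edit broke span tiling: {spans}")

after_edit_hooks.append(check_tiling)

def on_edit(sentence, spans):
    for hook in after_edit_hooks:
        hook(sentence, spans)

on_edit("the bank", [(0, 3), (3, 8)])    # ok: "the" + " bank"
# on_edit("the bank", [(0, 3), (4, 8)])  # would raise: gap at offset 3
```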
- - I did not read the \"cut up paper\" you mentioned. That is an
+ - I did not read the "cut up paper" you mentioned. That is an
interesting thought. In fact, the kind of thing I was/am
envisioning is that you can cut the paper a million ways but
then you can still join them back to form the original piece of
@@ -203,7 +203,7 @@ built-in capabilities for creating and evaluating solution prototypes.
<!-- -->
```
-\
+
- Q: Have you used it in some real-life situation?
- A: NO. 
@@ -223,8 +223,8 @@ built-in capabilities for creating and evaluating solution prototypes.
- Q: Do you see this as a format for this type of annotation
specifically, or something more general that can be used for
- interlinear glosses, lexicons, etc? \-- Does word sense include a
- valence on positive or negative words \-- (mood)?
+ interlinear glosses, lexicons, etc? -- Does word sense include a
+ valence on positive or negative words -- (mood)?
- Interesting question. There are sub-corpora that have some of this
data.
@@ -234,11 +234,11 @@ built-in capabilities for creating and evaluating solution prototypes.
propositional structure, which uses a large lexicon that covers
about 15K verbs and nouns and all their argument structures that
we have seen so far in the corpora. There are about a
- million \"propositions\" that have been released recently (we
+ million "propositions" that have been released recently (we
just recently celebrated the 20th birthday of the corpus). It is
called the PropBank.
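
For readers who have not met PropBank, a proposition ties a predicate
sense (a "roleset") to labeled arguments. Here is a minimal Python sketch
that follows PropBank's general ARG0/ARG1/ARG2 labeling style (this
particular entry is written from memory and is only illustrative, not
quoted from the corpus):

```
# A PropBank-style proposition: a predicate roleset plus numbered
# arguments. The labels follow PropBank's general style; this exact
# entry is illustrative rather than quoted from the corpus.
proposition = {
    "sentence": "John gave Mary a book.",
    "predicate": "gave",
    "roleset": "give.01",
    "args": {
        "ARG0": "John",    # the giver
        "ARG2": "Mary",    # the one given to
        "ARG1": "a book",  # the thing given
    },
}

args = ", ".join(f"{k}={v!r}" for k, v in proposition["args"].items())
print(f'{proposition["roleset"]}({args})')
```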
-- There is an interesting history of the \"Banks\". It started with
+- There is an interesting history of the "Banks". It started with
Treebank, and then there was PropBank (with a capital B), but then
when we were developing OntoNotes, which contains:
- Syntax
@@ -247,14 +247,14 @@ built-in capabilities for creating and evaluating solution prototypes.
- Propositions
- Word Sense 
-- All in the same whole and across various genres\... (can add more
- information here later\...)
+- All in the same whole and across various genres... (can add more
+ information here later...)
- Q: Are there parallel efforts to analyze literary texts or news
articles? Pulling the ambiguity of meaning and not just the syntax
- out of works? (Granted this may be out of your area\-- ignore as
+ out of works? (Granted this may be out of your area-- ignore as
desired)
- - A: :-) Nothing that relates to \"meaning\" falls too far away
+ - A: :-) Nothing that relates to "meaning" falls too far away
from where I would like to be. It is a very large landscape and
growing very fast, so it is hard to be everywhere at
the same time :-)
@@ -262,7 +262,7 @@ built-in capabilities for creating and evaluating solution prototypes.
- Many people are working on trying to analyze literature.
Analyzing news stories has been happening since the beginning of
the statistical NLP revolution---sort of linked to the fact that
- the first million \"trees\" were curated using WSJ articles :-)
+ the first million "trees" were curated using WSJ articles :-)
- Q: Have you considered support for conlangs, such as Toki Pona?  The
simplicity of Toki Pona seems like it would lend itself well to
@@ -270,10 +270,10 @@ built-in capabilities for creating and evaluating solution prototypes.
- A:  This is the first time I am hearing of conlangs and Toki Pona.
I would love to know more about them to say more, but I cannot
imagine any language not being able to use this framework.
- - conlangs are \"constructed languages\" such as Esperanto ---
+ - conlangs are "constructed languages" such as Esperanto ---
languages designed with intent, rather than evolved over
centuries.  Toki Pona is a minimal conlang created in 2001, with
- a uniform syntax and small (\<200 word) vocabulary.
+ a uniform syntax and small (<200 word) vocabulary.
- Thanks for the information! I would love to look into it.
- Q: Is there a roadmap of sorts for GRAIL?
@@ -288,10 +288,10 @@ built-in capabilities for creating and evaluating solution prototypes.
idea and pitch in, I guess.
- Q: How can GRAIL be used by common people?
- - A: I don\'t think it can be used by common people at the very
- moment---partly because the \"common man\" has never heard of
+ - A: I don't think it can be used by common people at the very
+ moment---partly because the "common man" has never heard of
Emacs or org-mode. But if we can validate the concept and if it
- does \"grow legs\" and walk out of the Emacs room into the
+ does "grow legs" and walk out of the Emacs room into the
larger universe, then absolutely, anyone who can have any say
about language could use it. And the contributions would be as
useful as the consistency with which one can capture a certain
@@ -305,7 +305,7 @@ built-in capabilities for creating and evaluating solution prototypes.
-\
+
[[!inline pages="internal(2022/info/grail-after)" raw="yes"]]