WEBVTT 00:00:00.000 --> 00:00:04.859 Hey, everybody. Welcome from frigid Omaha, Nebraska. 00:00:04.860 --> 00:00:06.619 I'm just going to kick off my talk here, 00:00:06.620 --> 00:00:23.899 and we'll see how it all goes. Thanks for attending. 00:00:23.900 --> 00:00:26.939 So the slides will be available on my site, grothe.us, 00:00:26.940 --> 00:00:29.899 in the presentation section tonight or tomorrow. 00:00:29.900 --> 00:00:33.099 This is a quick intro to one way to do private AI in Emacs. 00:00:33.100 --> 00:00:35.299 There are a lot of other ways to do it. 00:00:35.300 --> 00:00:38.899 This one is really just more or less the easiest way to do it. 00:00:38.900 --> 00:00:40.379 It's a minimal viable product 00:00:40.380 --> 00:00:42.379 to give you an idea of how to get started with it 00:00:42.380 --> 00:00:43.859 and how to give it a spin. 00:00:43.860 --> 00:00:45.819 I really hope some of you give it a shot 00:00:45.820 --> 00:00:48.179 and learn something along the way. 00:00:48.180 --> 00:00:50.379 So, the overview of the talk: 00:00:50.380 --> 00:00:54.939 I broke it down into these basic bullet points of why private AI, 00:00:54.940 --> 00:00:58.939 what do I need to do private AI, Emacs and private AI, 00:00:58.940 --> 00:01:02.739 pieces for an AI Emacs solution, 00:01:02.740 --> 00:01:08.059 a demo of a minimal viable product, and the summary. 00:01:08.060 --> 00:01:10.779 Why private AI? This is pretty simple. 00:01:10.780 --> 00:01:12.099 Just read the terms and conditions 00:01:12.100 --> 00:01:14.819 for any AI system you're currently using. 00:01:14.820 --> 00:01:17.019 If you're using the free tiers, your queries, 00:01:17.020 --> 00:01:18.619 code, and uploaded information 00:01:18.620 --> 00:01:20.699 are being used to train the models. 00:01:20.700 --> 00:01:22.939 In some cases, you are giving the company 00:01:22.940 --> 00:01:25.419 a perpetual license to your data. 00:01:25.420 --> 00:01:27.059 You have no control over this, 00:01:27.060 --> 00:01:29.219 except for not using the engine. 00:01:29.220 --> 00:01:30.699 And keep in mind, the terms 00:01:30.700 --> 00:01:32.179 are changing all the time on that, 00:01:32.180 --> 00:01:34.139 and they're not normally changing for our benefit. 00:01:34.140 --> 00:01:38.259 So that's not necessarily a good thing. 00:01:38.260 --> 00:01:40.339 If you're using the paid tiers, 00:01:40.340 --> 00:01:43.459 you may be able to opt out of the data collection. 00:01:43.460 --> 00:01:45.539 But keep in mind, this can change, 00:01:45.540 --> 00:01:48.619 or they may start charging for that option. 00:01:48.620 --> 00:01:51.419 Every AI company wants more and more data. 00:01:51.420 --> 00:01:53.779 They need more and more data to train their models. 00:01:53.780 --> 00:01:56.019 It is just the way it is. 00:01:56.020 --> 00:01:57.899 They need more and more information 00:01:57.900 --> 00:02:00.459 to get it more and more accurate and to keep it up to date. 00:02:00.460 --> 00:02:03.219 There's been a story about Stack Overflow. 00:02:03.220 --> 00:02:05.819 It has like half the number of queries it had a year ago 00:02:05.820 --> 00:02:07.379 because people are using AI. 00:02:07.380 --> 00:02:08.579 The problem with that is now 00:02:08.580 --> 00:02:10.379 there's less data going to Stack Overflow 00:02:10.380 --> 00:02:12.979 for the AI to get. It's a vicious cycle, 00:02:12.980 --> 00:02:14.619 especially when you start looking at 00:02:14.620 --> 00:02:16.579 newer languages like Ruby and stuff like that.
00:02:16.580 --> 00:02:21.419 So it comes down to being an interesting time. 00:02:21.420 --> 00:02:24.739 Another reason to go private AI is your costs are going to vary. 00:02:24.740 --> 00:02:27.019 Right now, these services are being heavily subsidized. 00:02:27.020 --> 00:02:29.419 If you're paying Claude $20 a month, 00:02:29.420 --> 00:02:32.579 it is not costing them $20 a month 00:02:32.580 --> 00:02:34.099 to host all the infrastructure 00:02:34.100 --> 00:02:35.619 and to build all these data centers. 00:02:35.620 --> 00:02:38.779 They are severely subsidizing that, 00:02:38.780 --> 00:02:41.259 at very much a loss right now. 00:02:41.260 --> 00:02:43.659 When they start charging the real costs plus a profit, 00:02:43.660 --> 00:02:45.499 it's going to change. 00:02:45.500 --> 00:02:48.019 Right now, I use a bunch of different services. 00:02:48.020 --> 00:02:50.019 I've played with Grok and a bunch of other ones. 00:02:50.020 --> 00:02:52.459 But Grok right now is like $30 a month 00:02:52.460 --> 00:02:54.139 for regular SuperGrok. 00:02:54.140 --> 00:02:56.419 When they start charging the real cost of that, 00:02:56.420 --> 00:02:59.819 it's going to go from $30 to something a great deal more, 00:02:59.820 --> 00:03:02.379 perhaps, I think, $100 or $200 00:03:02.380 --> 00:03:04.459 or whatever really turns out to be the cost 00:03:04.460 --> 00:03:06.059 when you figure everything into it. 00:03:06.060 --> 00:03:07.539 When you start adding that cost in, 00:03:07.540 --> 00:03:10.179 a lot of people who are using public AI right now 00:03:10.180 --> 00:03:11.899 are going to have no option but to move to private AI 00:03:11.900 --> 00:03:16.019 or give up on AI overall. 00:03:16.020 --> 00:03:18.659 What do you need to be able to do private AI? 00:03:18.660 --> 00:03:21.179 If you're going to run your own AI, 00:03:21.180 --> 00:03:23.579 you're going to need a system with either some CPU cores, 00:03:23.580 --> 00:03:25.699 a graphics processing unit, 00:03:25.700 --> 00:03:28.339 or a neural processing unit: a GPU or an NPU. 00:03:28.340 --> 00:03:29.819 I currently have four systems 00:03:29.820 --> 00:03:32.979 I'm experimenting with and playing around with on a daily basis. 00:03:32.980 --> 00:03:37.979 I have a System76 Pangolin with an AMD Ryzen 7 7840U 00:03:37.980 --> 00:03:41.099 with Radeon 780M integrated graphics. 00:03:41.100 --> 00:03:42.539 It's got 32 gigs of RAM. 00:03:42.540 --> 00:03:45.259 It's a beautiful piece of hardware. I really do like it. 00:03:45.260 --> 00:03:46.499 I have my main workstation, 00:03:46.500 --> 00:03:50.579 it's an HP Z620 with dual Intel Xeons 00:03:50.580 --> 00:03:53.179 with four NVIDIA K2200 graphics cards in it. 00:03:53.180 --> 00:03:56.699 Why the four NVIDIA K2200 graphics cards? 00:03:56.700 --> 00:03:59.739 Because I could buy four of them on eBay for $100, 00:03:59.740 --> 00:04:02.379 and they were still supported by the NVIDIA drivers for Debian. 00:04:02.380 --> 00:04:08.179 So that's why that is. A MacBook Air with an M1 processor, 00:04:08.180 --> 00:04:10.939 a very nice piece of kit I picked up a couple years ago, 00:04:10.940 --> 00:04:14.139 very cheap, but it runs AI surprisingly well, 00:04:14.140 --> 00:04:18.099 and an Acer Aspire with an AMD Ryzen 5700U in it. 00:04:18.100 --> 00:04:22.099 This was my old laptop. It was a sturdy beast. 00:04:22.100 --> 00:04:24.379 It was able to do enough AI to do demos and stuff, 00:04:24.380 --> 00:04:25.859 and I liked it quite a bit for that.
00:04:25.860 --> 00:04:28.339 I'm using the Pangolin for this demonstration 00:04:28.340 --> 00:04:30.979 because it's just better. 00:04:30.980 --> 00:04:37.219 Apple's M4 chip has 38 TOPS of NPU performance. 00:04:37.220 --> 00:04:40.099 Microsoft is now requiring 00:04:40.100 --> 00:04:41.459 45 TOPS of NPU performance 00:04:41.460 --> 00:04:43.939 to be able to have the Copilot+ badge on it. 00:04:43.940 --> 00:04:48.299 And Raspberry Pi's new AI HAT is about 18 TOPS 00:04:48.300 --> 00:04:51.219 and is $70 on top of the cost of a Raspberry Pi 5. 00:04:51.220 --> 00:04:56.059 Keep in mind, Raspberry Pi recently 00:04:56.060 --> 00:04:59.499 raised the cost of their Pi 5s because of RAM pricing, 00:04:59.500 --> 00:05:00.379 which is going to be affecting 00:05:00.380 --> 00:05:02.459 a lot of these types of solutions in the near future. 00:05:02.460 --> 00:05:05.299 But there's going to be a lot of 00:05:05.300 --> 00:05:06.699 local power available in the future. 00:05:06.700 --> 00:05:08.219 That's what it really comes down to. 00:05:08.220 --> 00:05:11.179 A lot of people are going to have PCs on their desks 00:05:11.180 --> 00:05:13.459 that are going to run a decent private AI 00:05:13.460 --> 00:05:18.059 without much issue. So for Emacs and private AI, 00:05:18.060 --> 00:05:20.139 there are a couple popular solutions. 00:05:20.140 --> 00:05:22.099 Gptel, which is the one we're going to talk about. 00:05:22.100 --> 00:05:24.739 It's a simple interface. It's a minimal interface. 00:05:24.740 --> 00:05:26.579 It integrates easily into your workflow. 00:05:26.580 --> 00:05:29.019 It's just, quite honestly, chef's kiss, 00:05:29.020 --> 00:05:31.059 just a beautifully well-done piece of software. 00:05:31.060 --> 00:05:33.859 Ollama Buddy has more features: 00:05:33.860 --> 00:05:36.259 a menu interface, quick access 00:05:36.260 --> 00:05:37.499 for things like code refactoring, 00:05:37.500 --> 00:05:38.979 text reformatting, et cetera. 00:05:38.980 --> 00:05:41.979 This is the one that you spend a little more time with, 00:05:41.980 --> 00:05:43.939 but you also get a little bit more back from it. 00:05:43.940 --> 00:05:49.419 Ellama is another one. It has some really good features 00:05:49.420 --> 00:05:51.059 and some different capabilities, 00:05:51.060 --> 00:05:54.979 but it's a different set of rules and capabilities. 00:05:54.980 --> 00:05:59.179 Aidermacs, which is programming with your AI in Emacs. 00:05:59.180 --> 00:06:01.219 The closest thing I can come up with 00:06:01.220 --> 00:06:04.139 to compare this to is Cursor, except it's in Emacs. 00:06:04.140 --> 00:06:05.659 It's really quite well done. 00:06:05.660 --> 00:06:07.299 These are all really quite well done. 00:06:07.300 --> 00:06:08.499 There are a bunch of other projects out there. 00:06:08.500 --> 00:06:10.819 If you go out to GitHub and type Emacs AI, 00:06:10.820 --> 00:06:13.219 you'll find a lot of different options. 00:06:13.220 --> 00:06:18.459 So what is a minimal viable product that can be done? 00:06:18.460 --> 00:06:23.379 A minimal viable product showing an AI Emacs solution 00:06:23.380 --> 00:06:27.179 can be done with only two pieces of software. 00:06:27.180 --> 00:06:31.179 Llamafile. This is an amazing piece of software. 00:06:31.180 --> 00:06:32.899 This is a whole LLM contained in one file. 00:06:32.900 --> 00:06:36.059 And the same file runs on macOS, 00:06:36.060 --> 00:06:39.379 Linux, Windows, and the BSDs.
00:06:39.380 --> 00:06:42.179 It's a wonderful piece of kit 00:06:42.180 --> 00:06:44.179 based on this thing these people created 00:06:44.180 --> 00:06:45.899 called Cosmopolitan 00:06:45.900 --> 00:06:46.779 that lets you create an executable 00:06:46.780 --> 00:06:48.699 that runs on a bunch of different systems. 00:06:48.700 --> 00:06:51.299 And gptel, which is an easy plug-in for Emacs, 00:06:51.300 --> 00:06:54.979 which we talked about in the last slide a bit. 00:06:54.980 --> 00:07:00.179 So setting up the LLM, you just have to go out 00:07:00.180 --> 00:07:01.699 and hit the page for it 00:07:01.700 --> 00:07:05.099 and do a wget of it. 00:07:05.100 --> 00:07:07.099 That's all it takes there. 00:07:07.100 --> 00:07:10.259 Then chmod it so you can actually execute the executable, 00:07:10.260 --> 00:07:12.939 and then just go ahead and actually run it. 00:07:12.940 --> 00:07:16.939 And let's go ahead and do that. 00:07:16.940 --> 00:07:18.899 I've already downloaded it because I don't want to wait. 00:07:18.900 --> 00:07:21.259 And let's just take a look at it. 00:07:21.260 --> 00:07:22.899 I've actually downloaded several of them, 00:07:22.900 --> 00:07:25.699 but let's go ahead and just run Llama 3.2 00:07:25.700 --> 00:07:31.179 with the 3 billion parameters. And that's it firing up. 00:07:31.180 --> 00:07:33.899 And it is nice enough to actually be listening on port 8080, 00:07:33.900 --> 00:07:35.339 which we'll need in a minute. 00:07:35.340 --> 00:07:43.139 So once you do that, you have to install gptel in Emacs. 00:07:43.140 --> 00:07:45.659 That's as simple as firing up Emacs, 00:07:45.660 --> 00:07:48.339 doing M-x package-install, 00:07:48.340 --> 00:07:49.779 and then just typing gptel, 00:07:49.780 --> 00:07:51.499 if you have your repository set up right, 00:07:51.500 --> 00:07:52.299 which hopefully you do. 00:07:52.300 --> 00:07:54.499 And then there you have it. 00:07:54.500 --> 00:07:58.139 You also have to set up a config file. 00:07:58.140 --> 00:08:01.739 Here's my example config file as it's currently set up, 00:08:01.740 --> 00:08:04.019 requiring gptel to ensure it's loaded, 00:08:04.020 --> 00:08:05.899 and defining the Llamafile backend. 00:08:05.900 --> 00:08:07.779 You can put multiple backends into it, 00:08:07.780 --> 00:08:09.859 but I just have the one defined in this example. 00:08:09.860 --> 00:08:12.059 But it's pretty straightforward. 00:08:12.060 --> 00:08:16.739 "Llama local file" as the name for it, stream, protocol HTTP. 00:08:16.740 --> 00:08:20.859 If you have HTTPS set up, that's obviously preferable, 00:08:20.860 --> 00:08:22.779 but a lot of people don't for their home labs. 00:08:22.780 --> 00:08:26.379 Host is just 127.0.0.1, port 8080. 00:08:26.380 --> 00:08:30.099 Keep in mind, some of the AIs run on a different port, 00:08:30.100 --> 00:08:31.499 so you may need 8081 00:08:31.500 --> 00:08:34.619 if you're running OpenWebUI at the same time. The key: 00:08:34.620 --> 00:08:37.019 we don't need an API key because it's a local server. 00:08:37.020 --> 00:08:40.259 And for the models, we can put multiple models 00:08:40.260 --> 00:08:41.339 in there if we want to. 00:08:41.340 --> 00:08:43.699 So if we create one with additional stuff, 00:08:43.700 --> 00:08:45.379 like RAG and stuff like that, 00:08:45.380 --> 00:08:47.459 we can actually name those models by their domain, 00:08:47.460 --> 00:08:48.699 which is really kind of cool. 00:08:48.700 --> 00:08:52.099 But that's all it takes.
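Here is a sketch of those two setup steps in code. The llamafile name and URL are placeholders for whichever model you actually download, and the Emacs Lisp mirrors the config just described, using gptel's OpenAI-compatible backend constructor (Llamafile serves an OpenAI-style API); adjust host, port, and model name to your setup.

```sh
# Download a llamafile (URL/filename are example placeholders),
# make it executable, and run it; it serves an API on port 8080.
wget https://example.com/Llama-3.2-3B-Instruct.llamafile
chmod +x Llama-3.2-3B-Instruct.llamafile
./Llama-3.2-3B-Instruct.llamafile
```

```elisp
;; Ensure gptel is loaded.
(require 'gptel)

;; Define the Llamafile backend. No API key is needed for a local server.
(setq gptel-backend
      (gptel-make-openai "Llama local file" ; name shown in gptel's menus
        :stream t
        :protocol "http"         ; plain HTTP; fine for a local host
        :host "127.0.0.1:8080"   ; use 8081 if something else holds 8080
        :models '(llama-3.2-3b)) ; illustrative model name
      gptel-model 'llama-3.2-3b)
```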
00:08:52.100 --> 00:09:03.779 So let's go ahead and do a quick test of it. 00:09:03.780 --> 00:09:11.019 Oops. M-x gptel. And we're going to just choose 00:09:11.020 --> 00:09:12.499 the default buffer to make things easier. 00:09:12.500 --> 00:09:15.339 Going to resize it up a bit. 00:09:15.340 --> 00:09:19.859 And usually my go-to question is, who was David Bowie? 00:09:19.860 --> 00:09:24.499 This one is actually a question 00:09:24.500 --> 00:09:26.219 that's turned out to be really good 00:09:26.220 --> 00:09:28.019 for figuring out how complete an AI is. 00:09:28.020 --> 00:09:31.139 This is one that some engines do well on, other ones don't. 00:09:31.140 --> 00:09:33.739 And we can either do 00:09:33.740 --> 00:09:36.059 M-x gptel-send, 00:09:36.060 --> 00:09:37.979 or we can just do C-c and hit Enter. 00:09:37.980 --> 00:09:39.139 We'll just do C-c RET. 00:09:39.140 --> 00:09:43.659 And now it's going ahead and hitting our local AI system 00:09:43.660 --> 00:09:46.659 running on port 8080. And that looks pretty good, 00:09:46.660 --> 00:09:50.739 but let's go ahead and say, hey, it's set to terse mode right now, 00:09:50.740 --> 00:10:03.859 please expand upon this. And there we go. 00:10:03.860 --> 00:10:05.379 We're getting a full description 00:10:05.380 --> 00:10:08.739 of the majority of David Bowie's life 00:10:08.740 --> 00:10:10.139 and other information about him. 00:10:10.140 --> 00:10:21.699 So very, very happy with that. 00:10:21.700 --> 00:10:23.539 One thing to keep in mind, 00:10:23.540 --> 00:10:24.699 when you're looking for hallucinations 00:10:24.700 --> 00:10:26.899 and how accurate the AI is, given how it's compressed, 00:10:26.900 --> 00:10:29.259 is it will tend to screw up on things like 00:10:29.260 --> 00:10:30.859 how many children he had and stuff like that. 00:10:30.860 --> 00:10:32.459 Let me see if it gets to that real quick. 00:10:32.460 --> 00:10:39.739 It didn't, actually, on this one. 00:10:39.740 --> 00:10:42.179 Alright, so that's the first question I always ask. 00:10:42.180 --> 00:10:44.659 The next one is, what are sea monkeys? 00:10:44.660 --> 00:10:48.979 It gives you an idea of the breadth of the system. 00:10:48.980 --> 00:11:10.619 It's querying right now. Pulls it back correctly. Yes. 00:11:10.620 --> 00:11:12.339 And it's smart enough to actually detect that David Bowie 00:11:12.340 --> 00:11:15.019 even referenced sea monkeys in the song Sea of Love, 00:11:15.020 --> 00:11:16.179 which became a hit single. 00:11:16.180 --> 00:11:18.859 So it's actually keeping the context alive, 00:11:18.860 --> 00:11:20.419 which is a very cool feature. 00:11:20.420 --> 00:11:21.459 I did not see that coming. 00:11:21.460 --> 00:11:24.139 Here's one that some people say is a really good one 00:11:24.140 --> 00:11:25.739 to ask: how many R's are in strawberry. 00:11:25.740 --> 00:11:46.179 All right, now it's going off the rails. 00:11:46.180 --> 00:11:48.139 It's going in a different direction. 00:11:48.140 --> 00:11:49.979 Let me go ahead and reopen that again, 00:11:49.980 --> 00:11:52.979 because it went down a bad hole there for a second. 00:11:52.980 --> 00:11:58.419 Let me ask it to write hello world in Emacs Lisp. 00:11:58.420 --> 00:12:10.419 Yep, that works. So the point being here, 00:12:10.420 --> 00:12:14.939 that was like two minutes of setup. 00:12:14.940 --> 00:12:18.019 And now we have a small AI embedded inside the system.
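As a small usage sketch of what the demo just did: in a gptel chat buffer, C-c RET runs gptel-send, and the same command works from any ordinary buffer too. The global keybinding below is an illustrative choice, not a gptel default.

```elisp
;; M-x gptel opens (or switches to) a dedicated chat buffer;
;; there, C-c RET (gptel-send) sends everything up to point.
;; Optionally make sending available from any buffer:
(global-set-key (kbd "C-c g") #'gptel-send)
```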
00:12:18.020 --> 00:12:20.539 So that gives you an idea of just how easy it can be. 00:12:20.540 --> 00:12:22.299 And it's just running locally on the system. 00:12:22.300 --> 00:12:25.259 We also have the default system here as well. 00:12:25.260 --> 00:12:32.579 So not that bad. 00:12:32.580 --> 00:12:35.379 That's a basic solution, that's a basic setup 00:12:35.380 --> 00:12:37.059 that will get you to the point where you can show it off. 00:12:37.060 --> 00:12:39.859 It's a party trick, but it's a very cool party trick. 00:12:39.860 --> 00:12:42.859 The way that gptel works is it puts things into buffers, 00:12:42.860 --> 00:12:45.099 it doesn't interfere with your flow that much, 00:12:45.100 --> 00:12:47.179 it's just an additional window you can pop open 00:12:47.180 --> 00:12:49.019 to ask questions and get information from, 00:12:49.020 --> 00:12:51.459 dump code into it and have it refactored. 00:12:51.460 --> 00:12:53.339 Gptel has a lot of additional options 00:12:53.340 --> 00:12:55.699 for things that are really cool for that. 00:12:55.700 --> 00:12:57.099 But if you want a better solution, 00:12:57.100 --> 00:12:59.939 I recommend Ollama or LM Studio. 00:12:59.940 --> 00:13:01.899 They're both more capable than Llamafile. 00:13:01.900 --> 00:13:03.859 They can accept a lot of different models. 00:13:03.860 --> 00:13:05.739 You can do things like RAG. 00:13:05.740 --> 00:13:09.219 You can do loading of things onto the GPU more explicitly. 00:13:09.220 --> 00:13:10.379 It can speed stuff up. 00:13:10.380 --> 00:13:13.059 One of the things about retrieval augmentation is 00:13:13.060 --> 00:13:15.539 it will let you put your data into the system, 00:13:15.540 --> 00:13:17.779 so you can start uploading your code, your information, 00:13:17.780 --> 00:13:20.139 and actually be able to do analysis of it. 00:13:20.140 --> 00:13:23.539 OpenWebUI provides more capabilities. 00:13:23.540 --> 00:13:24.859 It provides an interface that's similar 00:13:24.860 --> 00:13:25.899 to what you're used to seeing 00:13:25.900 --> 00:13:28.179 for ChatGPT and the other systems. 00:13:28.180 --> 00:13:29.419 It's really quite well done. 00:13:29.420 --> 00:13:32.539 And once again, gptel, I have to mention that, 00:13:32.540 --> 00:13:34.779 because that's the one I really kind of like. 00:13:34.780 --> 00:13:36.899 And Ollama Buddy is also another really nice one. 00:13:36.900 --> 00:13:41.019 So what about the licensing of these models, 00:13:41.020 --> 00:13:42.299 since I'm going out pulling down 00:13:42.300 --> 00:13:43.579 a model and doing this stuff? 00:13:43.580 --> 00:13:46.579 Let's take a look at a couple of highlights 00:13:46.580 --> 00:13:49.379 from the Meta Llama 3 community license. 00:13:49.380 --> 00:13:52.579 If your service exceeds 700 million monthly users, 00:13:52.580 --> 00:13:54.099 you need additional licensing. 00:13:54.100 --> 00:13:56.099 Probably not going to be a problem for most of us. 00:13:56.100 --> 00:13:58.379 There's a competition restriction. 00:13:58.380 --> 00:14:00.899 You can't use this model to enhance competing models. 00:14:00.900 --> 00:14:04.219 And there are some limitations on using the Meta trademarks. 00:14:04.220 --> 00:14:05.939 Not that big a deal. 00:14:05.940 --> 00:14:09.139 Other than that, it's a permissive license 00:14:09.140 --> 00:14:10.939 designed to encourage innovation 00:14:10.940 --> 00:14:13.779 and open development. Commercial use is allowed, 00:14:13.780 --> 00:14:15.219 but there are some restrictions on it.
00:14:15.220 --> 00:14:17.259 Yeah, you can modify the model, 00:14:17.260 --> 00:14:20.419 but you have to abide by the license terms. 00:14:20.420 --> 00:14:22.339 And you can distribute the model with derivatives. 00:14:22.340 --> 00:14:24.059 And there are some very cool ones out there. 00:14:24.060 --> 00:14:25.259 There are people who've done things 00:14:25.260 --> 00:14:29.579 to try and make the Llama be less, what's the phrase, 00:14:29.580 --> 00:14:31.939 ethical, if you're doing penetration testing research 00:14:31.940 --> 00:14:32.619 and stuff like that. 00:14:32.620 --> 00:14:34.459 It has some very nice value there. 00:14:34.460 --> 00:14:37.739 Keep in mind, licenses also vary 00:14:37.740 --> 00:14:39.619 depending on the model you're using. 00:14:39.620 --> 00:14:42.419 Mistral AI has the non-production license. 00:14:42.420 --> 00:14:45.219 It's designed to keep it to research and development. 00:14:45.220 --> 00:14:46.739 You can't use it commercially. 00:14:46.740 --> 00:14:50.419 So it's designed to clearly delineate 00:14:50.420 --> 00:14:52.939 between research and development 00:14:52.940 --> 00:14:54.259 and somebody trying to actually build 00:14:54.260 --> 00:14:55.379 something on top of it. 00:14:55.380 --> 00:14:57.979 And another question I get asked is, 00:14:57.980 --> 00:14:59.899 are there open-source model options? 00:14:59.900 --> 00:15:02.819 Yeah, but most of them are small or specialized currently. 00:15:02.820 --> 00:15:05.499 MoMo is a whole family of them, 00:15:05.500 --> 00:15:07.339 but they tend to be more specialized, 00:15:07.340 --> 00:15:09.019 but it's very cool to see where it's going. 00:15:09.020 --> 00:15:11.339 And it's another thing that's just going forward. 00:15:11.340 --> 00:15:13.379 It's under the MIT license. 00:15:13.380 --> 00:15:15.819 Some things to know to help you 00:15:15.820 --> 00:15:17.499 have a better experience with this. 00:15:17.500 --> 00:15:21.059 Get Ollama and OpenWebUI working by themselves, 00:15:21.060 --> 00:15:22.659 then set up your config file. 00:15:22.660 --> 00:15:24.819 I was fighting both at the same time, 00:15:24.820 --> 00:15:26.699 and it turned out I had a problem with my Ollama. 00:15:26.700 --> 00:15:28.899 I had a conflict, so that's what my problem was. 00:15:28.900 --> 00:15:32.819 Llamafile plus gptel is a great way to start experimenting, 00:15:32.820 --> 00:15:34.299 just to give you an idea of how it works 00:15:34.300 --> 00:15:36.939 and figure out how the interfaces work. Tremendous. 00:15:36.940 --> 00:15:40.739 RAG, loading documents into it, is really easy with OpenWebUI. 00:15:40.740 --> 00:15:43.019 You can create models; you can put in things like 00:15:43.020 --> 00:15:46.419 help desk, developers, and stuff like that, breaking it out. 00:15:46.420 --> 00:15:51.019 Hacker News has a 'how to build a $300 AI computer' article. 00:15:51.020 --> 00:15:52.859 This is from March 2024, 00:15:52.860 --> 00:15:55.099 but it still has a lot of great information 00:15:55.100 --> 00:15:56.819 on how to benchmark the environments, 00:15:56.820 --> 00:16:01.339 what some values are, like for the Ryzen 5700U 00:16:01.340 --> 00:16:02.579 inside my Acer Aspire; 00:16:02.580 --> 00:16:04.419 that's where I got the idea of doing that. 00:16:04.420 --> 00:16:06.739 Make sure you do the ROCm stuff correctly 00:16:06.740 --> 00:16:09.899 to get the GPU extensions. But it's just really good stuff. 00:16:09.900 --> 00:16:13.059 You don't need a great GPU or CPU to get started.
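Since several of those tips mention Ollama: if you do step up from Llamafile to it, gptel ships a dedicated backend constructor for that as well. A minimal sketch, assuming a model such as llama3.2 has already been pulled with `ollama pull` (the model name is illustrative):

```elisp
;; Ollama serves its API on port 11434 by default.
(gptel-make-ollama "Ollama"
  :host "localhost:11434"
  :stream t
  :models '(llama3.2))
```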
00:16:13.060 --> 00:16:14.819 Smaller models like TinyLlama 00:16:14.820 --> 00:16:16.179 can run on very small systems. 00:16:16.180 --> 00:16:18.499 It gets you the ability to start playing with it 00:16:18.500 --> 00:16:21.619 and start experimenting and figure out if it's for you 00:16:21.620 --> 00:16:23.379 and how to move forward with it. 00:16:23.380 --> 00:16:29.219 The AMD Ryzen AI Max+ 395 in a mini PC 00:16:29.220 --> 00:16:31.179 makes a really nice dedicated host. 00:16:31.180 --> 00:16:34.619 You used to be able to buy these for about $1200; now, 00:16:34.620 --> 00:16:35.579 with the RAM price increase, 00:16:35.580 --> 00:16:38.779 if you want to get 128 gig, you're pushing two grand. 00:16:38.780 --> 00:16:40.739 It gets a little tighter. 00:16:40.740 --> 00:16:44.099 Macs work remarkably well with AI. 00:16:44.100 --> 00:16:47.659 My MacBook Air was one of my go-tos for a while, 00:16:47.660 --> 00:16:49.779 but once I started doing anything AI, 00:16:49.780 --> 00:16:50.779 I had a five-minute window 00:16:50.780 --> 00:16:52.619 before the thermal throttling became an issue. 00:16:52.620 --> 00:16:54.619 Keep in mind that's a MacBook Air, 00:16:54.620 --> 00:16:56.659 so it doesn't have the greatest ventilation. 00:16:56.660 --> 00:16:58.339 If you get the MacBook Pros and stuff, 00:16:58.340 --> 00:17:00.139 they tend to have more ventilation, 00:17:00.140 --> 00:17:02.499 but still you're going to be pushing against that. 00:17:02.500 --> 00:17:04.939 So Mac Minis and the Mac Ultras and stuff like that 00:17:04.940 --> 00:17:06.099 tend to work really well for that. 00:17:06.100 --> 00:17:09.779 Alex Ziskind on YouTube has a channel. 00:17:09.780 --> 00:17:11.899 He does a lot of AI performance benchmarking, 00:17:11.900 --> 00:17:14.819 like "I load a 70-billion-parameter model 00:17:14.820 --> 00:17:16.699 on this mini PC" and stuff like that. 00:17:16.700 --> 00:17:19.019 It's a lot of fun and interesting stuff there. 00:17:19.020 --> 00:17:21.219 And it's influencing my decision 00:17:21.220 --> 00:17:22.979 to buy my next AI-style PC. 00:17:22.980 --> 00:17:27.619 Small domain-specific LLMs are happening. 00:17:27.620 --> 00:17:29.939 An LLM that has all your code and information 00:17:29.940 --> 00:17:31.659 sounds like a really cool idea. 00:17:31.660 --> 00:17:34.299 It gives you capabilities to start training stuff 00:17:34.300 --> 00:17:35.899 that you couldn't do with the big ones. 00:17:35.900 --> 00:17:38.059 Even in terms of fine-tuning and stuff, 00:17:38.060 --> 00:17:40.539 it's remarkable to see where that space is coming along 00:17:40.540 --> 00:17:41.739 in the next year or so. 00:17:41.740 --> 00:17:46.219 huggingface.co has pointers to tons of AI models. 00:17:46.220 --> 00:17:49.259 You'll find the one that works for you there, hopefully. 00:17:49.260 --> 00:17:50.539 If you're doing cybersecurity, 00:17:50.540 --> 00:17:52.059 there's a whole bunch out there for that 00:17:52.060 --> 00:17:54.619 that have specific training and information in them. 00:17:54.620 --> 00:17:56.139 It's really good. 00:17:56.140 --> 00:18:00.099 One last thing to keep in mind is hallucinations are real. 00:18:00.100 --> 00:18:02.779 You will get BS back from the AI occasionally, 00:18:02.780 --> 00:18:05.179 so do validate everything you get from it. 00:18:05.180 --> 00:18:08.459 Don't be using it for court cases like some people have 00:18:08.460 --> 00:18:14.539 and run into those problems. So, that is my talk.
00:18:14.540 --> 00:18:17.219 What I would like you to get out of that is, 00:18:17.220 --> 00:18:21.859 if you haven't tried it, give gptel and Llamafile a shot. 00:18:21.860 --> 00:18:23.979 Fire up a little small AI instance, 00:18:23.980 --> 00:18:27.339 play around with it a little bit inside your Emacs, 00:18:27.340 --> 00:18:30.139 and see if it makes your life better. Hopefully it will. 00:18:30.140 --> 00:18:32.139 And I really hope you guys 00:18:32.140 --> 00:18:34.659 learned something from this talk. And thanks for listening. 00:18:34.660 --> 00:18:38.979 And the links are at the end of the talk, if you have any questions. 00:18:38.980 --> 00:18:42.739 Let me see if we've got anything you want, Pat. You do. 00:18:42.740 --> 00:18:43.899 You've got a few questions. 00:18:43.900 --> 00:18:48.059 Hey, this is Corwin. Thank you so much. Thank you, Aaron. 00:18:48.060 --> 00:18:50.339 What an awesome talk this was, actually. 00:18:50.340 --> 00:18:52.179 If you don't have a camera, 00:18:52.180 --> 00:18:54.339 I can get away with not having one too. 00:18:54.340 --> 00:18:56.299 I've got... I'll turn the camera on. 00:18:56.300 --> 00:19:01.499 Okay. All right. I'll turn mine back on. Here I come. 00:19:01.500 --> 00:19:03.139 Yeah, so there are a few questions, 00:19:03.140 --> 00:19:04.579 but first let me say thank you 00:19:04.580 --> 00:19:06.339 for a really captivating talk. 00:19:06.340 --> 00:19:10.939 I think a lot of people will be empowered by this 00:19:10.940 --> 00:19:15.259 to try to do more with less, especially locally, 00:19:15.260 --> 00:19:20.179 concerned about the data center footprint, 00:19:20.180 --> 00:19:23.659 environmentally concerned 00:19:23.660 --> 00:19:26.979 about the footprint of LLMs inside data centers. 00:19:26.980 --> 00:19:28.219 So just thinking about how we can 00:19:28.220 --> 00:19:32.419 put infrastructure we have at home to use 00:19:32.420 --> 00:19:34.019 and get more done with less. 00:19:34.020 --> 00:19:37.499 Yeah, the data center impact's interesting, 00:19:37.500 --> 00:19:39.979 because there was a study a while ago. 00:19:39.980 --> 00:19:42.099 Someone said every time you do a Gemini query, 00:19:42.100 --> 00:19:45.019 it's like boiling a cup of water. 00:19:45.020 --> 00:19:48.619 Yeah, I've heard that one too. So do you want to, you know... 00:19:48.620 --> 00:19:51.699 I don't know how much direction you want. 00:19:51.700 --> 00:19:53.859 I'd be very happy to read out the questions for you. 00:19:53.860 --> 00:19:55.219 Yeah, that would be great. 00:19:55.220 --> 00:19:57.619 I'm having trouble getting to that tab. 00:19:57.620 --> 00:20:02.779 Okay, I'm there, so I'll put it into our chat too, 00:20:02.780 --> 00:20:07.419 so you can follow along if you'd like. 00:20:07.420 --> 00:20:11.219 The first question was, why is the David Bowie question 00:20:11.220 --> 00:20:12.219 a good one to start with? 00:20:12.220 --> 00:20:14.419 Does it have interesting failure conditions, 00:20:14.420 --> 00:20:17.299 or what made you choose that? 00:20:17.300 --> 00:20:21.979 First off, huge fan of David Bowie.
00:20:21.980 --> 00:20:24.499 But it came down to, it really taught me a few things 00:20:24.500 --> 00:20:26.299 about how the models work 00:20:26.300 --> 00:20:28.819 in terms of things like how many kids he had, 00:20:28.820 --> 00:20:31.779 because DeepSeek, which is a very popular Chinese model 00:20:31.780 --> 00:20:33.179 that a lot of people are using now, 00:20:33.180 --> 00:20:35.619 misidentifies him as having three daughters, 00:20:35.620 --> 00:20:38.459 and he has like one son and, I think, 00:20:38.460 --> 00:20:40.899 two sons and a daughter or something like that. 00:20:40.900 --> 00:20:43.659 So there are differences on that, and it just goes on. 00:20:43.660 --> 00:20:45.299 There's a whole lot of stuff, 00:20:45.300 --> 00:20:47.779 because his story spans like 60 years, 00:20:47.780 --> 00:20:49.659 so it gives good feedback. 00:20:49.660 --> 00:20:51.539 That's the real main reason I ask that question, 00:20:51.540 --> 00:20:53.699 because I just needed one. Sea monkeys I just picked 00:20:53.700 --> 00:20:56.579 because it was obscure, and I just always have, right? 00:20:56.580 --> 00:20:58.939 I used to have it write hello world in Forth, 00:20:58.940 --> 00:21:01.019 because I thought that was an interesting one as well. 00:21:01.020 --> 00:21:03.899 It's just picking random ones like that. 00:21:03.900 --> 00:21:06.499 One question I ask, sorry, a lot of models is, 00:21:06.500 --> 00:21:09.419 what is the closest star to the Earth? 00:21:09.420 --> 00:21:12.019 Because most of them will say Alpha Centauri 00:21:12.020 --> 00:21:13.739 or Proxima Centauri and not the sun. 00:21:13.740 --> 00:21:15.899 And I have a whole other talk 00:21:15.900 --> 00:21:17.899 where I just argue with the LLM, 00:21:17.900 --> 00:21:20.019 trying to say, hey, the sun is a star. 00:21:20.020 --> 00:21:26.579 And it just wouldn't accept it, so. What? 00:21:26.580 --> 00:21:28.419 Oh, I can hear that. 00:21:28.420 --> 00:21:34.379 So what specific tasks do you like to use your local AI for? 00:21:34.380 --> 00:21:37.459 I like to load a lot of my code into it 00:21:37.460 --> 00:21:39.739 and actually have it do analysis of it. 00:21:39.740 --> 00:21:42.339 I was actually going through some code 00:21:42.340 --> 00:21:45.619 I have for some pen testing, and I was having it modify it 00:21:45.620 --> 00:21:47.259 to update it for the newer version, 00:21:47.260 --> 00:21:48.459 because I hate to say this, 00:21:48.460 --> 00:21:49.859 but it was written for Python 2, 00:21:49.860 --> 00:21:51.459 and I needed to update it for Python 3. 00:21:51.460 --> 00:21:53.859 And the 2to3 tool did not do all of it, 00:21:53.860 --> 00:21:56.659 but this actually was able to do the refactoring. 00:21:56.660 --> 00:21:58.499 It's part of my laziness. 00:21:58.500 --> 00:22:01.459 But I use that for anything I don't want to hit the web. 00:22:01.460 --> 00:22:03.259 And that's a lot of stuff when you start thinking about 00:22:03.260 --> 00:22:04.979 if you're doing cybersecurity research 00:22:04.980 --> 00:22:06.819 and you have your white papers 00:22:06.820 --> 00:22:10.779 and stuff like that in there. 00:22:10.780 --> 00:22:13.979 I've got a lot of that loaded into RAG 00:22:13.980 --> 00:22:15.659 in one model on my OpenWebUI system. 00:22:15.660 --> 00:22:21.059 Neat. Have you used 00:22:21.060 --> 00:22:25.739 any small domain-specific LLMs? 00:22:25.740 --> 00:22:30.419 If so, what kind of tasks do they specialize in?
00:22:30.420 --> 00:22:32.139 And, you know, how? 00:22:32.140 --> 00:22:34.979 Not yet, to be honest, but there are some out there, once again, 00:22:34.980 --> 00:22:36.779 for cybersecurity and stuff like that, 00:22:36.780 --> 00:22:39.739 that I really need to dig into. That's on my to-do list. 00:22:39.740 --> 00:22:41.699 I've got a couple weeks off at the end of the year, 00:22:41.700 --> 00:22:43.779 and that's a big part of my plan for that. 00:22:43.780 --> 00:22:49.379 Are the various models updated pretty regularly? 00:22:49.380 --> 00:22:52.059 Can you add your own data to the pre-built models? 00:22:52.060 --> 00:22:56.699 Yes. The models are updated pretty reasonably. 00:22:56.700 --> 00:22:59.699 You can add data to a model in a couple of different ways. 00:22:59.700 --> 00:23:01.099 You can do something called fine-tuning, 00:23:01.100 --> 00:23:03.819 which requires a really nice GPU and a lot of CPU time. 00:23:03.820 --> 00:23:05.499 You're probably not going to do that. 00:23:05.500 --> 00:23:07.419 You can do retrieval-augmented generation, 00:23:07.420 --> 00:23:09.499 where you load your data on top of the system, 00:23:09.500 --> 00:23:11.299 and it puts it inside a database, 00:23:11.300 --> 00:23:12.859 and you can actually scan that and stuff. 00:23:12.860 --> 00:23:14.619 I have another talk where I go through 00:23:14.620 --> 00:23:16.219 and I start asking questions about it: 00:23:16.220 --> 00:23:18.579 I load the talk into the engine, 00:23:18.580 --> 00:23:20.099 and I ask questions against that. 00:23:20.100 --> 00:23:22.179 If I had more time, I would have done that here, 00:23:22.180 --> 00:23:26.499 but it comes down to how much time we have. That's RAG. RAG 00:23:26.500 --> 00:23:29.419 is pretty easy to do through OpenWebUI or LM Studio. 00:23:29.420 --> 00:23:31.419 It's a great way: you just 00:23:31.420 --> 00:23:34.099 point it to a folder, and it just sucks all that data in, 00:23:34.100 --> 00:23:35.499 and it'll hit that data first, 00:23:35.500 --> 00:23:36.859 if you have like helpdesk stuff and so on. 00:23:36.860 --> 00:23:39.619 For other options, there are vector databases, 00:23:39.620 --> 00:23:41.819 like if you use PostgreSQL, 00:23:41.820 --> 00:23:43.699 it has pgvector, which can do a lot of that stuff. 00:23:43.700 --> 00:23:44.739 I've not dug into that yet, 00:23:44.740 --> 00:23:46.099 but that is also on that to-do list 00:23:46.100 --> 00:23:48.459 I've got a lot of stuff planned for. Cool. 00:23:48.460 --> 00:23:51.819 So what is your experience with RAGs? 00:23:51.820 --> 00:23:54.339 I don't even know what that means. 00:23:54.340 --> 00:23:57.419 Do you know what that means? 00:23:57.420 --> 00:23:59.619 Can you read the question again? 00:23:59.620 --> 00:24:03.979 What is your experience with RAGs? RAG is great. 00:24:03.980 --> 00:24:07.459 That's Retrieval-Augmented Generation. 00:24:07.460 --> 00:24:09.739 That loads your data first, and it hits yours, 00:24:09.740 --> 00:24:11.499 and it'll actually cite it and stuff. 00:24:11.500 --> 00:24:14.659 There's a guy who wrote a RAG in 100 lines of Python, 00:24:14.660 --> 00:24:16.899 and it's an impressive piece of software. 00:24:16.900 --> 00:24:18.779 I think if you hit my site, 00:24:18.780 --> 00:24:22.099 I've got a private AI talk where I actually refer to that.
00:24:22.100 --> 00:24:25.219 But retrieval augmentation: it's easy, it's fast, 00:24:25.220 --> 00:24:26.699 it puts your data into the system. 00:24:26.700 --> 00:24:31.339 Yeah, start with that and then iterate on top of that. 00:24:31.340 --> 00:24:32.659 That's one of the great things about AI, 00:24:32.660 --> 00:24:33.619 especially private AI, 00:24:33.620 --> 00:24:37.739 is you can do whatever you want to with it 00:24:37.740 --> 00:24:43.179 and build up with it as you get more experience. 00:24:43.180 --> 00:24:44.219 Any thoughts on running things 00:24:44.220 --> 00:24:49.179 on AWS, DigitalOcean, and so on? 00:24:49.180 --> 00:24:50.619 AWS is not bad. 00:24:50.620 --> 00:24:52.659 DigitalOcean, they have some of their GPUs. 00:24:52.660 --> 00:24:54.379 I still don't like having the data 00:24:54.380 --> 00:24:57.419 leave my house, to be honest, or work, 00:24:57.420 --> 00:24:59.019 because I tend to do some stuff 00:24:59.020 --> 00:25:01.259 where I don't want it even hitting that situation. 00:25:01.260 --> 00:25:03.699 But they have pretty good stuff. 00:25:03.700 --> 00:25:05.579 Another one to consider is Oracle Cloud. 00:25:05.580 --> 00:25:09.059 Oracle has their AI infrastructure that's really well done. 00:25:09.060 --> 00:25:12.379 But, I mean, once again, then you start looking at the potential: 00:25:12.380 --> 00:25:13.779 they say your data is private, 00:25:13.780 --> 00:25:14.819 but I don't necessarily trust it. 00:25:14.820 --> 00:25:17.859 But they do have good stuff, both DigitalOcean and AWS. 00:25:17.860 --> 00:25:20.339 Oracle Cloud has the free tier, which isn't too bad, 00:25:20.340 --> 00:25:21.339 usually with a certain amount of stuff. 00:25:21.340 --> 00:25:23.179 And Google also has it, 00:25:23.180 --> 00:25:26.739 but I still tend to keep more stuff on local PCs, 00:25:26.740 --> 00:25:33.299 because I'm just paranoid that way. Gotcha. 00:25:33.300 --> 00:25:35.579 What has your experience been using AI... 00:25:35.580 --> 00:25:40.139 do you want to get into that, using AI for cybersecurity? 00:25:40.140 --> 00:25:42.019 You might have already touched on this. 00:25:42.020 --> 00:25:44.379 Yeah, really, for cybersecurity, 00:25:44.380 --> 00:25:46.259 what I've had to do is I've dumped logs into it 00:25:46.260 --> 00:25:47.299 to have it do correlation. 00:25:47.300 --> 00:25:49.859 Keep in mind, the size of that Llamafile we were using 00:25:49.860 --> 00:25:52.059 for figuring out David Bowie, writing the hello world, 00:25:52.060 --> 00:25:54.179 all that stuff, is like six gig. 00:25:54.180 --> 00:25:56.859 How does it get the entire world in six gig? 00:25:56.860 --> 00:25:59.739 I still haven't figured that out in terms of quantization. 00:25:59.740 --> 00:26:02.499 So I'm really interested in seeing the ability 00:26:02.500 --> 00:26:05.139 to take all this stuff out of all my logs, 00:26:05.140 --> 00:26:06.339 dump it all in there, 00:26:06.340 --> 00:26:08.459 and actually be able to do intelligent queries against that. 00:26:08.460 --> 00:26:10.899 Microsoft has a project called Security Copilot, 00:26:10.900 --> 00:26:12.819 which is trying to do that in the cloud. 00:26:12.820 --> 00:26:15.299 But I want to work on something to do that more locally 00:26:15.300 --> 00:26:19.019 and be able to actually drive this stuff over that. 00:26:19.020 --> 00:26:21.979 That's also one of the long-term goals. 00:26:21.980 --> 00:26:26.059 So, have we got any other questions, or? 00:26:26.060 --> 00:26:29.099 Those are the questions that I see.
00:26:29.100 --> 00:26:31.179 I want to just read out a couple of comments 00:26:31.180 --> 00:26:33.419 that I saw in IRC, though. 00:26:33.420 --> 00:26:36.699 Jay Rutabaga says, it went very well 00:26:36.700 --> 00:26:39.259 from an audience perspective. 00:26:39.260 --> 00:26:43.619 And G Gundam says, respect your commitment to privacy. 00:26:43.620 --> 00:26:45.619 And then somebody is telling us 00:26:45.620 --> 00:26:46.779 we might have skipped a question. 00:26:46.780 --> 00:26:50.019 So I'm just going to run back to my list. 00:26:50.020 --> 00:26:52.819 "Updated regularly"... "experience"... 00:26:52.820 --> 00:26:57.659 I just didn't type in the answers here, 00:26:57.660 --> 00:26:59.659 and there's a couple more questions coming in. So: 00:26:59.660 --> 00:27:04.699 is there a disparity where you'd go to paid models 00:27:04.700 --> 00:27:08.619 because they are better, and what problems, 00:27:08.620 --> 00:27:14.019 you know, would drive you to them? That's a good question. 00:27:14.020 --> 00:27:17.819 Paid models, I don't mind them. I think they're good, 00:27:17.820 --> 00:27:21.299 but I don't think they're actually economically sustainable 00:27:21.300 --> 00:27:22.659 under their current system. 00:27:22.660 --> 00:27:24.299 Because right now, if you're paying 00:27:24.300 --> 00:27:26.899 20 bucks a month for Copilot and that goes up to 200 bucks, 00:27:26.900 --> 00:27:28.499 I'm not going to be as likely to use it. 00:27:28.500 --> 00:27:29.579 You know what I mean? 00:27:29.580 --> 00:27:33.059 But it does do some things in a way that I did not expect. 00:27:33.060 --> 00:27:35.459 For example, Grok was refactoring 00:27:35.460 --> 00:27:38.019 some of my code and, in the comments, dropped an F-bomb, 00:27:38.020 --> 00:27:39.979 which I did not see coming, 00:27:39.980 --> 00:27:41.619 but the other code before, 00:27:41.620 --> 00:27:43.219 that I had gotten off GitHub, 00:27:43.220 --> 00:27:44.059 had F-bombs in it. 00:27:44.060 --> 00:27:45.899 So it was just emulating the style. 00:27:45.900 --> 00:27:47.779 But would that be something 00:27:47.780 --> 00:27:49.979 I'd want to turn in as a pull request? I don't know. 00:27:49.980 --> 00:27:52.139 But, uh, there's a lot of money 00:27:52.140 --> 00:27:53.899 going into these AIs and stuff. 00:27:53.900 --> 00:27:56.219 But in terms of the ability to get a decent one, 00:27:56.220 --> 00:27:57.979 like the Llama 3.2, 00:27:57.980 --> 00:28:01.699 and load your data into it, you can be pretty competitive. 00:28:01.700 --> 00:28:04.779 You're not going to get all the benefits, 00:28:04.780 --> 00:28:07.299 but you have more control over it. 00:28:07.300 --> 00:28:11.819 So it's this and that. 00:28:11.820 --> 00:28:13.139 It's a balancing act. 00:28:13.140 --> 00:28:15.539 Okay, and I think I see a couple more questions coming in. 00:28:15.540 --> 00:28:19.619 What is the largest parameter size for local models 00:28:19.620 --> 00:28:22.459 that you've been able to successfully run locally, 00:28:22.460 --> 00:28:26.059 and do you run into issues with limited context window size? 00:28:26.060 --> 00:28:29.659 The top AI models will tend to have a larger ceiling. 00:28:29.660 --> 00:28:32.859 Yes, yes, yes, yes, yes. 00:28:32.860 --> 00:28:37.019 By default, the context size is, I think, 1024.
00:28:37.020 --> 00:28:44.619 But I've upped it to 8192 on this box, the Pangolin, 00:28:44.620 --> 00:28:46.939 because for some reason 00:28:46.940 --> 00:28:49.459 it's just working quite well. 00:28:49.460 --> 00:28:52.219 But the largest ones I've loaded 00:28:52.220 --> 00:28:54.059 have not been that huge. 00:28:54.060 --> 00:28:55.699 That's about the biggest one I've done. 00:28:55.700 --> 00:28:57.459 That's the reason why I'm planning 00:28:57.460 --> 00:29:01.339 on breaking down and buying a Ryzen. 00:29:01.340 --> 00:29:03.619 Actually, I'm going to buy 00:29:03.620 --> 00:29:06.979 an Intel Core Ultra 285H with 96 gig of RAM. 00:29:06.980 --> 00:29:08.379 Then I should be able to load 00:29:08.380 --> 00:29:12.059 a 70-billion-parameter model on that. How fast will it run? 00:29:12.060 --> 00:29:13.819 It's going to run slow as a dog, 00:29:13.820 --> 00:29:15.819 but it's going to be cool to be able to do it. 00:29:15.820 --> 00:29:17.379 It's an AI bragging rights thing, 00:29:17.380 --> 00:29:20.019 but I mostly stick with the smaller-size models 00:29:20.020 --> 00:29:22.819 and the ones that are more quantized, 00:29:22.820 --> 00:29:26.619 because it just tends to work better for me. 00:29:26.620 --> 00:29:29.179 We've still got over 10 minutes before we're cutting away, 00:29:29.180 --> 00:29:30.179 but I'm just anticipating 00:29:30.180 --> 00:29:32.859 that we're going to be going strong at the 10-minute mark. 00:29:32.860 --> 00:29:34.899 So I'm just letting you know, 00:29:34.900 --> 00:29:37.379 we can go as long as we like here. At a certain point, 00:29:37.380 --> 00:29:41.059 I may have to jump away and check in with the next speaker, 00:29:41.060 --> 00:29:44.419 but we'll post the entirety of this, 00:29:44.420 --> 00:29:47.979 even if we aren't able to stay with it all. 00:29:47.980 --> 00:29:49.739 Okay. And we've got 10 minutes 00:29:49.740 --> 00:29:52.379 where we're still going to stay live. 00:29:52.380 --> 00:30:00.139 So, next question coming in, I see: are there free-as-in-freedom, 00:30:00.140 --> 00:30:05.739 free-as-in-FSF issues with the data? 00:30:05.740 --> 00:30:11.699 Yes, where the data is coming from is a huge question with AI. 00:30:11.700 --> 00:30:13.739 It's astonishing that you can ask questions 00:30:13.740 --> 00:30:16.899 of models when you don't know where the data is coming from. 00:30:16.900 --> 00:30:19.979 That is gonna be one of the big issues long-term. 00:30:19.980 --> 00:30:21.499 There are people who are working 00:30:21.500 --> 00:30:22.979 on trying to figure out that stuff, 00:30:22.980 --> 00:30:25.259 but, I mean, if you look at, God, 00:30:25.260 --> 00:30:27.059 I can't remember who it was. 00:30:27.060 --> 00:30:28.659 Somebody was actually out torrenting books 00:30:28.660 --> 00:30:30.939 just to be able to build them into their AI system. 00:30:30.940 --> 00:30:32.339 I think it might've been Meta. 00:30:32.340 --> 00:30:34.819 So there's a lot of that going on. 00:30:34.820 --> 00:30:38.139 The open-sourcing of this stuff is going to be tough. 00:30:38.140 --> 00:30:39.459 There are some models, 00:30:39.460 --> 00:30:41.419 like the MoMo guys have got their own license, 00:30:41.420 --> 00:30:42.739 but where they're getting their data from, 00:30:42.740 --> 00:30:45.499 I'm not sure. So that's a huge question. 00:30:45.500 --> 00:30:47.979 That's a talk in itself.
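Circling back to the context-window exchange at the start of this answer: llamafile passes through llama.cpp's server options, so the context size can be raised at launch. A sketch, with the filename again a placeholder:

```sh
# Raise the context window to 8192 tokens
# (-c / --ctx-size is the llama.cpp option llamafile accepts).
./Llama-3.2-3B-Instruct.llamafile -c 8192
```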
00:30:47.980 --> 00:30:51.979 But yeah, if you train on your RAG and your data, 00:30:51.980 --> 00:30:53.499 you know where it's coming from, 00:30:53.500 --> 00:30:54.379 and you know you have a license to it, 00:30:54.380 --> 00:30:55.139 but the other stuff is just 00:30:55.140 --> 00:30:56.739 more of a supplement 00:30:56.740 --> 00:31:01.379 if you're using a smaller model. 00:31:01.380 --> 00:31:05.419 But the comments online, I see a couple of them. 00:31:05.420 --> 00:31:08.339 I'll read them out in order here. Really interesting stuff. 00:31:08.340 --> 00:31:11.659 Thank you for your talk. Given that large AI companies 00:31:11.660 --> 00:31:14.899 are openly stealing intellectual property and copyright, 00:31:14.900 --> 00:31:18.939 and therefore eroding the authority of such laws, 00:31:18.940 --> 00:31:21.579 and maybe obscuring the truth itself, 00:31:21.580 --> 00:31:26.579 can you see a future where IP and copyright law become untenable? 00:31:26.580 --> 00:31:29.619 I think that's a great question. 00:31:29.620 --> 00:31:34.979 I'm not a lawyer, but it is really getting complicated. 00:31:34.980 --> 00:31:37.859 It is getting to the point... 00:31:37.860 --> 00:31:41.179 I played with Sora a little bit, and it generated someone 00:31:41.180 --> 00:31:42.819 where you can go like, oh, that's Jon Hamm, 00:31:42.820 --> 00:31:44.099 that's Christopher Walken. 00:31:44.100 --> 00:31:45.379 You start figuring out who the people are 00:31:45.380 --> 00:31:47.019 that they're modeling stuff after. 00:31:47.020 --> 00:31:48.979 There is an apocalypse or something 00:31:48.980 --> 00:31:52.459 going to happen right now. 00:31:52.460 --> 00:31:53.579 There is. But this is, once again, 00:31:53.580 --> 00:31:56.059 my personal opinion, and I'm not a lawyer, 00:31:56.060 --> 00:31:57.459 and I do not have money, 00:31:57.460 --> 00:31:58.859 so don't sue me. The thing is, 00:31:58.860 --> 00:32:02.899 the current administration is very pro-AI. 00:32:02.900 --> 00:32:05.499 And there's a great deal of lobbying by those groups. 00:32:05.500 --> 00:32:07.139 And it's on both sides. 00:32:07.140 --> 00:32:09.699 And it's gonna be interesting to see 00:32:09.700 --> 00:32:11.699 what happens to copyright in the next 5-10 years. 00:32:11.700 --> 00:32:13.339 I just don't know how it keeps up 00:32:13.340 --> 00:32:16.059 without there being some adjustments and stuff. 00:32:16.060 --> 00:32:20.419 Okay, and then another comment I saw: 00:32:20.420 --> 00:32:23.219 file size is not going to be a bottleneck, 00:32:23.220 --> 00:32:25.819 RAM is. You'll need 16 gigabytes of RAM 00:32:25.820 --> 00:32:28.259 to run the smallest local models 00:32:28.260 --> 00:32:31.979 and 512 gigabytes of RAM to run the larger ones. 00:32:31.980 --> 00:32:35.059 You'll need a GPU with that much memory 00:32:35.060 --> 00:32:39.099 if you want it to run quickly. Yeah. Oh no. 00:32:39.100 --> 00:32:41.259 It also depends upon how your memory is laid out. 00:32:41.260 --> 00:32:45.699 Like, example being the Core Ultra 285H 00:32:45.700 --> 00:32:47.899 I plan to buy, that has 96 gig of memory. 00:32:47.900 --> 00:32:50.499 It's unified: the GPU and the CPU share it, 00:32:50.500 --> 00:32:52.739 but they go over the same bus. 00:32:52.740 --> 00:32:55.779 So the overall bandwidth of it tends to be a bit less, 00:32:55.780 --> 00:32:57.579 but you're able to load more of it into memory.
00:32:57.580 --> 00:32:59.419 So it's able to do some additional stuff with it, 00:32:59.420 --> 00:33:00.819 as opposed to it coming off disk. 00:33:00.820 --> 00:33:03.699 It's all a balancing act. If you hit Ziskind's website, 00:33:03.700 --> 00:33:05.819 that guy's done some great work on it, 00:33:05.820 --> 00:33:07.499 trying to figure out how big a model you can do, 00:33:07.500 --> 00:33:08.619 what you can do with it. 00:33:08.620 --> 00:33:12.699 And some of the stuff seems to be not obvious, 00:33:12.700 --> 00:33:15.299 because, example being that MacBook Air, 00:33:15.300 --> 00:33:17.619 for the five minutes I can run the model, 00:33:17.620 --> 00:33:19.379 it runs it faster than a lot of other things 00:33:19.380 --> 00:33:21.339 that should be able to run it faster, 00:33:21.340 --> 00:33:24.619 just because of the way the ARM cores and the unified memory work on it. 00:33:24.620 --> 00:33:26.019 So it's a learning process. 00:33:26.020 --> 00:33:29.579 But if you want to, NetworkChuck had a great video 00:33:29.580 --> 00:33:30.939 talking about building his own system 00:33:30.940 --> 00:33:34.379 with a couple really powerful NVIDIA cards 00:33:34.380 --> 00:33:35.379 and stuff like that in it, 00:33:35.380 --> 00:33:38.859 and actually setting it up on his system as a node 00:33:38.860 --> 00:33:41.459 and using a web UI on it. So there's a lot of stuff there, 00:33:41.460 --> 00:33:43.899 but it is a process of learning how big your data is, 00:33:43.900 --> 00:33:44.899 which models you want to use, 00:33:44.900 --> 00:33:46.219 how much information you need. 00:33:46.220 --> 00:33:48.019 But it's part of the learning. 00:33:48.020 --> 00:33:52.899 And you can run models even on Raspberry Pi 5s 00:33:52.900 --> 00:33:54.499 if you want to. They'll run slow, 00:33:54.500 --> 00:33:56.459 don't get me wrong, but it's possible. 00:33:56.460 --> 00:34:02.179 Okay, and I think there are other questions coming in too, 00:34:02.180 --> 00:34:04.019 so I'll just vamp for another second. 00:34:04.020 --> 00:34:06.299 We've got about five minutes before we'll, 00:34:06.300 --> 00:34:09.739 before we'll be cutting over, 00:34:09.740 --> 00:34:13.179 but I just want to say, in case we get close for time here, 00:34:13.180 --> 00:34:14.859 how much I appreciate your talk. 00:34:14.860 --> 00:34:15.979 This is another one that I'm going to 00:34:15.980 --> 00:34:18.339 have to study after the conference. 00:34:18.340 --> 00:34:21.099 We greatly appreciate, all of us appreciate 00:34:21.100 --> 00:34:22.459 you guys putting on the conference. 00:34:22.460 --> 00:34:26.299 It's a great conference. It's well done. 00:34:26.300 --> 00:34:28.019 It's an honor to be on the stage 00:34:28.020 --> 00:34:30.899 with the brains of the project, which is you. 00:34:30.900 --> 00:34:34.699 So what else have we got, question-wise? 00:34:34.700 --> 00:34:39.499 Okay, so just scanning here. 00:34:39.500 --> 00:34:50.699 Have you used local models capable of tool calling? 00:34:50.700 --> 00:34:54.779 I'm scared of agentic. 00:34:54.780 --> 00:34:58.739 I am going to be a slow adopter of that. 00:34:58.740 --> 00:35:02.459 I want to do it, but I just don't have the, uh, 00:35:02.460 --> 00:35:04.339 intestinal fortitude right now to do it. 00:35:04.340 --> 00:35:07.179 I've had it give me the commands, 00:35:07.180 --> 00:35:08.739 but I still run the commands by hand.
00:35:08.740 --> 00:35:10.539 I'm looking into it, and, once again, 00:35:10.540 --> 00:35:14.139 it's on that list, but that's a big step for me. 00:35:14.140 --> 00:35:23.139 So. Awesome. All right. 00:35:23.140 --> 00:35:27.179 Well, maybe it's... let me just scroll through, 00:35:27.180 --> 00:35:31.539 because we might have missed one question. Oh, I see. 00:35:31.540 --> 00:35:36.899 Here was the piggyback question. 00:35:36.900 --> 00:35:38.419 Now I see the question that I missed. 00:35:38.420 --> 00:35:41.139 So this was piggybacking on the question 00:35:41.140 --> 00:35:44.859 about model updates and adding data. 00:35:44.860 --> 00:35:46.579 And will models reach out to the web 00:35:46.580 --> 00:35:47.819 if they need more info? 00:35:47.820 --> 00:35:51.779 Or have you worked with any models that work that way? 00:35:51.780 --> 00:35:55.259 No, I've not seen any models do that. 00:35:55.260 --> 00:35:57.739 There was like a group 00:35:57.740 --> 00:35:59.899 working on something like a package updater 00:35:59.900 --> 00:36:02.499 that would do different diffs on it, 00:36:02.500 --> 00:36:03.939 but models change so much, 00:36:03.940 --> 00:36:05.739 even when you make minor changes and fine-tuning, 00:36:05.740 --> 00:36:07.659 it's hard just to update them in place. 00:36:07.660 --> 00:36:10.099 So I haven't seen one, but that doesn't mean 00:36:10.100 --> 00:36:16.259 they're not out there. It's a curious topic, though. Awesome. 00:36:16.260 --> 00:36:19.539 Well, it's probably pretty good timing. 00:36:19.540 --> 00:36:21.299 Let me just scroll and make sure. 00:36:21.300 --> 00:36:23.499 And of course, before I can say that, 00:36:23.500 --> 00:36:25.899 there's one more question. So let's go ahead and have that. 00:36:25.900 --> 00:36:28.299 I want to make sure, while we're still live, though, 00:36:28.300 --> 00:36:31.299 I give you a chance to offer any closing thoughts. 00:36:31.300 --> 00:36:35.779 So what scares you most about the agentic tools? 00:36:35.780 --> 00:36:38.419 How would you think about putting a sandbox around that 00:36:38.420 --> 00:36:42.139 if you did adopt an agentic workflow? 00:36:42.140 --> 00:36:42.899 That is a great question. 00:36:42.900 --> 00:36:45.939 In terms of that, I would just control 00:36:45.940 --> 00:36:48.099 what it's able to talk to, what machines; 00:36:48.100 --> 00:36:50.059 I would actually have it be air-gapped. 00:36:50.060 --> 00:36:52.099 I work for a defense contractor, 00:36:52.100 --> 00:36:53.819 and we spend a lot of time dealing with air-gapped systems, 00:36:53.820 --> 00:36:55.979 because that's just kind of the way it works out for us. 00:36:55.980 --> 00:36:58.499 So agentic, it's just going to take a while to get trust. 00:36:58.500 --> 00:37:01.059 I want to see more stuff happening. 00:37:01.060 --> 00:37:02.819 Humans screw up stuff enough. 00:37:02.820 --> 00:37:04.819 The last thing we need is to multiply that by 1000. 00:37:04.820 --> 00:37:09.419 So in terms of that, I would be restricting what it can do. 00:37:09.420 --> 00:37:10.859 If you look at the capabilities, 00:37:10.860 --> 00:37:13.579 if I created a user and gave it permissions, 00:37:13.580 --> 00:37:15.299 I would have it locked down through sudo: 00:37:15.300 --> 00:37:17.379 what it's able to do, what the account's able to do. 00:37:17.380 --> 00:37:18.899 I would do those kinds of things, 00:37:18.900 --> 00:37:20.859 but it's going to be... it's happening.
00:37:20.860 --> 00:37:25.819 It's just, I'm going to be one of the laggards on that one. 00:37:25.820 --> 00:37:29.259 So air gap, jail, extremely locked-down environments, 00:37:29.260 --> 00:37:34.899 like we're talking about separate physical machines, not Docker. 00:37:34.900 --> 00:37:37.499 Yeah, hopefully. Right, fair. 00:37:37.500 --> 00:37:39.899 So tool calling can be read-only, 00:37:39.900 --> 00:37:42.539 such as giving models the ability to search the web 00:37:42.540 --> 00:37:43.979 before answering your question, 00:37:43.980 --> 00:37:46.219 you know, write access, execute access. 00:37:46.220 --> 00:37:49.219 I'm interested to know if local models 00:37:49.220 --> 00:37:51.419 are any good at that. 00:37:51.420 --> 00:37:55.579 Yes, local models can do a lot of that stuff. 00:37:55.580 --> 00:37:56.819 It's in their capabilities. 00:37:56.820 --> 00:37:59.019 If you load LM Studio, you can do a lot of wonderful stuff 00:37:59.020 --> 00:38:02.419 with that, or with OpenWebUI with Ollama. 00:38:02.420 --> 00:38:05.739 There's a lot of capabilities. It's amazing. 00:38:05.740 --> 00:38:08.139 OpenWebUI is actually what a lot of companies are using now 00:38:08.140 --> 00:38:10.259 to put their data behind, 00:38:10.260 --> 00:38:12.139 their curated data and stuff like that. So it works well. 00:38:12.140 --> 00:38:15.819 I can confirm that from my own professional experience. 00:38:15.820 --> 00:38:19.659 Excellent. Okay, well, our timing should be just perfect 00:38:19.660 --> 00:38:22.659 if you want to give us like a 30-second, 45-second wrap-up. 00:38:22.660 --> 00:38:24.419 Aaron, let me squeeze in mine: 00:38:24.420 --> 00:38:26.779 thank you again so much for preparing this talk 00:38:26.780 --> 00:38:30.499 and for entertaining all of our questions. 00:38:30.500 --> 00:38:33.299 Yeah, let me just thank you guys for the conference again. 00:38:33.300 --> 00:38:35.179 This is a great one. I've enjoyed a lot of it. 00:38:35.180 --> 00:38:37.339 I've only caught a couple of talks so far, 00:38:37.340 --> 00:38:41.659 but I'm looking forward to hitting the ones after this and tomorrow. 00:38:41.660 --> 00:38:44.739 But the AI stuff is coming. Get on board. 00:38:44.740 --> 00:38:46.939 Definitely recommend it. If you want to just try it out 00:38:46.940 --> 00:38:48.419 and get a little taste of it, 00:38:48.420 --> 00:38:49.779 my minimal viable product 00:38:49.780 --> 00:38:51.619 with just Llamafile and gptel 00:38:51.620 --> 00:38:53.139 will get you to the point where you start figuring it out. 00:38:53.140 --> 00:38:55.579 Gptel is an amazing thing. It just gets out of your way, 00:38:55.580 --> 00:39:00.459 and it works so well with Emacs, designed so it 00:39:00.460 --> 00:39:01.699 doesn't take your hands off the keyboard. 00:39:01.700 --> 00:39:02.499 It's just another buffer, 00:39:02.500 --> 00:39:04.059 and you just put information in there. 00:39:04.060 --> 00:39:06.979 It's quite a wonderful time, 00:39:06.980 --> 00:39:10.819 let's put it that way. That's all I got. Thank you 00:39:10.820 --> 00:39:14.339 so much once again, and we're just about to cut away.
00:39:14.340 --> 00:39:15.779 So I'll stop the recording, 00:39:15.780 --> 00:39:18.259 and you're on your own recognizance. 00:39:18.260 --> 00:39:19.699 Well, I'm gonna punch out. 00:39:19.700 --> 00:39:21.059 If anybody has any questions or anything, 00:39:21.060 --> 00:39:24.699 my email address is ajgrothe@yahoo.com, or at gmail. And 00:39:24.700 --> 00:39:26.779 thank you all for attending, 00:39:26.780 --> 00:39:29.939 and thanks again for the conference. 00:39:29.940 --> 00:39:32.579 Okay, I'm gonna go ahead and end the room there, thank you. 00:39:32.580 --> 00:39:34.100 Excellent, thanks, bye.