WEBVTT

00:00.000 --> 00:05.120
most are outdoors behave

00:12.160 --> 00:15.220
on the institutional

00:23.860 --> 00:26.200
there will be no drift

00:26.220 --> 00:28.100
there will be no drift

00:28.600 --> 00:29.000
not one

00:30.000 --> 00:43.000
Okay, we're going to start now because I don't want to spend a minute speaking about anything other than this topic.

00:44.000 --> 00:56.000
Hello everyone. I am from, well, I was born in France, but now I have the luxury of living in Valencia,

00:56.000 --> 01:01.000
Spain, and your city is really freezing.

01:01.000 --> 01:11.000
Just saying. I work as a freelancer, and I have a flying phobia, so it cost me a lot to come here.

01:11.000 --> 01:24.000
I've been using NetBSD since 1998, yes, I am that old, and I've been a NetBSD committer since 2009.

01:25.000 --> 01:41.000
And for those who use it, I am the initial author of pkgin, the binary package manager for NetBSD, but not only, also for macOS, illumos and so on.

01:41.000 --> 01:52.000
And right now, Jonathan Perkin is the current maintainer and he's doing a magnificent job with it.

01:52.000 --> 02:07.000
Today I'm going to talk about work that started back in 2006, but the real hard work started three years ago.

02:07.000 --> 02:13.000
I was always passionate about making NetBSD small.

02:13.000 --> 02:22.000
I don't know why, it was kind of an obsession: I wanted to reduce the size of NetBSD.

02:22.000 --> 02:34.000
So first I did this with a USB key, a 64-megabyte BSD USB key.

02:34.000 --> 02:49.000
Then in 2016, as everyone was really happy about Docker and containers and so on, I created Sailor, which is not a container system.

02:49.000 --> 03:01.000
It is based on chroot, but it allows you to create a self-contained NetBSD system running only what is needed.

03:01.000 --> 03:18.000
But at that time, there was noise about this Firecracker thing, something that AWS did back then, which is basically their serverless

03:18.000 --> 03:22.000
engine, you know about it.

03:22.000 --> 03:28.000
And basically what this is, those are virtual machines.

03:28.000 --> 03:39.000
And those virtual machines boot in something around 100 milliseconds, which is enormous in my opinion.

03:39.000 --> 03:57.000
And I wanted to do something resembling it, and I knew that QEMU, and for those who don't know QEMU, I can't do anything for you.

03:57.000 --> 04:04.000
QEMU has had, for ages, this -kernel flag.

04:04.000 --> 04:23.000
I had no idea what that meant, but what I knew is that Linux, always Linux, knows how to be booted straight from QEMU with the -kernel flag.

04:23.000 --> 04:31.000
Four years, five years ago now, I worked on something resembling Firecracker.

04:31.000 --> 04:41.000
Basically I trimmed down a NetBSD kernel, but a 32-bit NetBSD. Why?

04:41.000 --> 05:06.000
Because at that time, the only version of NetBSD that was capable of booting with the -kernel flag was the 32-bit one, because the 32-bit kernel had what is called the multiboot feature, which allows the kernel to boot directly from QEMU like this.

05:06.000 --> 05:10.000
But, I mean, 32 bits.

05:10.000 --> 05:33.000
Recently, two years ago, I started this little, really badly named project called mksmolnb, which basically consisted of stripping every bit of driver that's not needed from a 32-bit kernel.

05:33.000 --> 05:40.000
So it can be, like, live-hacked without having to recompile it.

05:40.000 --> 05:46.000
Then there was this guy, Colin Percival,

05:46.000 --> 05:58.000
who was booting a FreeBSD system using QEMU; actually, his work was based on Firecracker.

05:58.000 --> 06:06.000
And he was able to boot the kernel in more or less 25 to 30 milliseconds.

06:06.000 --> 06:10.000
Not bad.

06:10.000 --> 06:22.000
So, two years ago, one year and a half, I said, okay, let's try it.

06:22.000 --> 06:26.000
Let's see what all this is about.

06:26.000 --> 06:35.000
How does the kernel, you know what, I'm going to put like, so I can see it that I'm not drifting.

06:35.000 --> 06:38.000
Five minutes, five minutes.

06:38.000 --> 06:42.000
So yeah, let's see what all this is about.

06:42.000 --> 06:51.000
What does it take to boot a kernel from the QEMU -kernel parameter?

06:51.000 --> 06:58.000
And I mean, I did very little kernel hacking in the past.

06:58.000 --> 07:01.000
It's pretty frightening.

07:01.000 --> 07:04.000
And I mean, it's not easy.

07:04.000 --> 07:13.000
And I thought, you know what, I'm going to look at what Colin did on FreeBSD.

07:14.000 --> 07:20.000
And patch, and that's it, and probably there will be nothing more, and it will work just like that.

07:20.000 --> 07:23.000
Nope. Nope.

07:23.000 --> 07:39.000
I ended up patching probably the worst part of the kernel, which is locore.S, the assembly part that boots the kernel, you know.

07:40.000 --> 07:50.000
And with this, I was able to boot, if we have time, we'll also see the details of the theory after.

07:50.000 --> 07:56.000
With that, I was able to boot the kernel directly using a protocol called PVH.

07:56.000 --> 07:59.000
We'll talk about that.

07:59.000 --> 08:03.000
But it booted in 200 milliseconds.

08:03.000 --> 08:05.000
Please.

08:05.000 --> 08:07.000
Please.

08:07.000 --> 08:28.000
So I ended up tweaking, hacking, modifying parts of the kernel, some of them 30 years old, like when I broke the build tweaking the serial code.

08:29.000 --> 08:38.000
And I ended up booting a kernel on a normal machine, a recent machine, in about 20 milliseconds.

08:38.000 --> 08:48.000
This is a laptop, and it's not plugged into power, so it has less power than a real desktop machine or server.

08:48.000 --> 08:54.000
So don't boo me when it takes more than 20 milliseconds.

08:55.000 --> 09:08.000
And I really wanted to show what it is about, because I did kind of this presentation before, but not in the same format.

09:09.000 --> 09:13.000
And I did it very terribly.

09:13.000 --> 09:23.000
I explained the code, what I did and so on, but I had very little time to show what it was about.

09:23.000 --> 09:28.000
So today, I prepared demos.

09:28.000 --> 09:34.000
Most of the presentation should be the work, obviously.

09:35.000 --> 09:40.000
So what is the point of all this?

09:40.000 --> 09:45.000
Why do we want to boot a kernel so fast?

09:45.000 --> 09:47.000
Well, let's see that.

09:47.000 --> 10:00.000
So first, like with no other fanciness than just seeing the kernel and seeing a shell.

10:01.000 --> 10:08.000
I created a small project.

10:08.000 --> 10:10.000
I can, yeah, I'll do it.

10:10.000 --> 10:14.000
Because there's something that I want to show.

10:14.000 --> 10:17.000
Yeah, okay, I'm going to zoom a bit.

10:17.000 --> 10:22.000
Where's that?

10:22.000 --> 10:27.000
There.

10:27.000 --> 10:29.000
Yep.

10:29.000 --> 10:30.000
Okay.

10:30.000 --> 10:35.000
So the project I've been working on now for years.

10:35.000 --> 10:40.000
I renamed it from mksmolnb to smolBSD.

10:40.000 --> 10:44.000
smolBSD is not an OS.

10:44.000 --> 10:47.000
It's an OS builder.

10:47.000 --> 10:49.000
We're going to see that.

10:49.000 --> 10:58.000
So it's mainly composed of a script, well, of two scripts, which are very easy to use.

10:59.000 --> 11:01.000
You create images.

11:01.000 --> 11:06.000
I already created it not to have any surprise while doing the demo.

11:06.000 --> 11:10.000
And the main star is this guy.

11:10.000 --> 11:20.000
That's netbsd-SMOL, which is a kernel that has been built with all the stuff I wrote.

11:20.000 --> 11:26.000
And the rescue is basically just an rc file.

11:26.000 --> 11:29.000
We're going to see what it is about.

11:29.000 --> 11:30.000
And a shell.

11:30.000 --> 11:34.000
Shell being from the rescue directory.

11:34.000 --> 11:39.000
If you know the rescue directory, I guess every BSD has it.

11:39.000 --> 11:40.000
I don't know.

11:40.000 --> 11:42.000
FreeBSD, I don't know.

11:42.000 --> 11:43.000
Yep.

11:43.000 --> 11:50.000
So basically it's a directory where you have one binary, it's called a crunched binary.

11:50.000 --> 11:57.000
Pretty much like BusyBox.

11:57.000 --> 11:58.000
Ta-da.

11:58.000 --> 12:00.000
Well, 53.

12:00.000 --> 12:01.000
Let me try something.

12:01.000 --> 12:02.000
Let me try something.

12:02.000 --> 12:08.000
Because 53 milliseconds.

12:08.000 --> 12:11.000
Poor.

12:11.000 --> 12:16.000
Let's see if I do that.

12:16.000 --> 12:25.000
But that's a very, very simple OS.

12:25.000 --> 12:32.000
Basically, like I said, the kernel and this.

12:32.000 --> 12:37.000
But now, from there, what can we do?

12:37.000 --> 12:40.000
We actually, from there, we can do anything.

12:40.000 --> 12:45.000
We can create our own operating system.

12:45.000 --> 12:52.000
Let's say everything revolves around a Makefile.

12:52.000 --> 12:53.000
Yeah.

12:53.000 --> 12:58.000
There's no YAML file, you know, like with CI/CD.

12:58.000 --> 12:59.000
No.

12:59.000 --> 13:03.000
It's a plain Makefile.

13:03.000 --> 13:09.000
And in this Makefile, well, I have targets, which are images.

13:09.000 --> 13:13.000
One of the main images is base.

13:13.000 --> 13:17.000
And in order to.

13:17.000 --> 13:21.000
I'm a friend, no.

13:21.000 --> 13:26.000
In order to make it easy to kill

13:26.000 --> 13:27.000
The machine.

13:27.000 --> 13:32.000
You can pass the Makefile MOUNTRO equals

13:32.000 --> 13:36.000
yes, which will make the fstab mount slash

13:36.000 --> 13:37.000
read-only,

13:37.000 --> 13:44.000
allowing me to just kill the machine and not care about shutdown, halt, and so on.

13:44.000 --> 13:45.000
Okay?

13:45.000 --> 13:52.000
So for example, I can do base, I'm looking at you.

13:52.000 --> 13:55.000
I know.

13:55.000 --> 14:05.000
So that's the base file system with every tool you should find in base.

14:05.000 --> 14:11.000
Okay, that's cool, not that impressive, but nevertheless cool.

14:11.000 --> 14:18.000
But like I said, you could create your own operating system.

14:18.000 --> 14:26.000
You could, like, modify init, or change init.

14:26.000 --> 14:29.000
Change the way the system boots.

14:29.000 --> 14:35.000
And for example, why not?

14:35.000 --> 14:46.000
Why not create a system image that's called systembsd?

14:46.000 --> 14:53.000
And this is exactly what you think it is.

14:53.000 --> 14:57.000
This is exactly what you think it is.

14:57.000 --> 15:04.000
Yeah, I also created a couple of configuration files.

15:04.000 --> 15:08.000
And yeah, this is systembsd.conf.

15:08.000 --> 15:19.000
And systembsd.conf uses dinit, which is basically an init system, which is pretty fast.

15:19.000 --> 15:37.000
And it handles the services with the dinitctl command, like dinitctl start, stop, and, well, you can do fancy things like this.

15:37.000 --> 15:47.000
Okay, and so, welcome to systembsd, which is, like, a NetBSD version that is not exactly NetBSD,

15:47.000 --> 15:50.000
it has another init system.

15:50.000 --> 15:53.000
But after that, it's only NetBSD.

15:53.000 --> 15:55.000
You have the same commands, the same kernel.

15:55.000 --> 15:58.000
I'm going to accelerate a bit.

15:58.000 --> 16:04.000
Okay, we're not going to talk about the internals.

16:04.000 --> 16:07.000
I know we won't have time for that.

16:07.000 --> 16:11.000
So, okay, that's cool.

16:11.000 --> 16:22.000
This Firecracker thing and the whole container topic was really appealing to me.

16:22.000 --> 16:36.000
And so, what about starting not just a virtual machine, but a container,

16:36.000 --> 16:41.000
with NetBSD inside.

16:41.000 --> 16:57.000
So, for example, if I do, I don't know, that's not the one I want to start.

16:57.000 --> 17:06.000
I want to show you something before, like, instead of just having a shell,

17:06.000 --> 17:13.000
we can start a service, okay, like, for example, an HTTP server.

17:13.000 --> 17:19.000
And that's the only service that will start when I boot the virtual machine,

17:19.000 --> 17:29.000
and this works just like a container.

17:29.000 --> 17:36.000
Okay, but obviously, with a bit more security, because a container,

17:36.000 --> 17:43.000
let me remind you that it is only a system call, and a shared kernel, I mean.

17:43.000 --> 17:51.000
Here, we have an entire kernel, an operating system, with an isolated process.

17:51.000 --> 17:55.000
Okay, so yeah, what can we do from there?

17:55.000 --> 18:08.000
Well, as it is very light and fast, well, we can just run it as a Docker container.

18:08.000 --> 18:17.000
And here we go, and it behaves the same, okay.

18:17.000 --> 18:22.000
And, okay, this was a bit slow, 66 milliseconds.

18:22.000 --> 18:35.000
But, and from there, if we can start it as a container, what atrocity can we do?

18:35.000 --> 18:38.000
Exactly.

18:38.000 --> 18:47.000
As we can start it as a container, well, we can have this awful thing.

18:47.000 --> 18:55.000
We can have NetBSD as a Kubernetes pod, okay.

18:55.000 --> 19:05.000
So I have a small cluster on this laptop, where I will create a namespace.

19:05.000 --> 19:13.000
Yeah.

19:13.000 --> 19:22.000
And I have a pod manifest that uses the container that I've just created, okay.

19:22.000 --> 19:32.000
So there we go.

19:32.000 --> 19:35.000
Hey, it works.

19:35.000 --> 19:45.000
And yeah, the cluster that's running inside is called kind, which is Kubernetes in Docker.

19:45.000 --> 19:56.000
And it exposes it: you can query the container by querying the host, which is actually the container itself.

19:56.000 --> 20:04.000
So if I do that, oh, much, I understand, okay, I'm getting on with it.

20:04.000 --> 20:11.000
Okay, so the IP of this guy is this.

20:11.000 --> 20:21.000
So I can absolutely curl this, look.

20:21.000 --> 20:33.000
And it's actually the container running the virtual machine that's answering my curl, okay.

20:33.000 --> 20:36.000
I did all the demos.

20:36.000 --> 20:46.000
So from there, I assume you have a pretty good idea of the vast possibilities that we have with that.

20:46.000 --> 20:50.000
What I showed here was with QEMU.

20:50.000 --> 20:51.000
You saw this.

20:51.000 --> 21:01.000
This also works with Firecracker, because basically the technique used to boot the kernel is exactly the same.

21:01.000 --> 21:10.000
It's using PVH, which is a system included in Xen, like forever.

21:10.000 --> 21:23.000
And, well, first, Colin, because let's be clear, the main work was done by Colin Percival, like, three years ago, in 2022.

21:23.000 --> 21:27.000
And his work inspired what I did after that.

21:27.000 --> 21:31.000
Okay, the implementations are obviously not the same.

21:31.000 --> 21:33.000
He had some problems, I had others.

21:33.000 --> 21:38.000
But the boot method is the same, it's PVH.

21:38.000 --> 21:42.000
And actually, it's not that complicated.

21:42.000 --> 21:48.000
Instead of booting in the start entry point of the kernel,

21:48.000 --> 21:56.000
there is a special entry point called start Xen, what was it called, start Xen.

21:56.000 --> 22:05.000
And this entry point uses the information that is passed by the virtual machine manager,

22:05.000 --> 22:08.000
QEMU, or Firecracker, or whatever.

22:08.000 --> 22:11.000
You don't need, when you think about it.

22:11.000 --> 22:18.000
You don't need any BIOS, any bootloader, anything when you are starting a virtual machine from a VMM.

22:18.000 --> 22:25.000
You already know how much RAM you have, what the disks are, and so on and so forth.

22:25.000 --> 22:35.000
So, you will gain a lot of time by just grabbing that information, which is pushed by the VMM.

22:35.000 --> 22:38.000
And this is the main point of PVH.

22:38.000 --> 22:44.000
It's using a new entry point to just avoid all the bootloader stuff.
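
NOTE
A minimal sketch, not the kernel code from the talk: it illustrates the kind of data a VMM hands to the PVH entry point.
The field layout is assumed from Xen's public hvm_start_info definition (version 1); parse_start_info and the sample values are made up for illustration.
```python
import struct
# Assumed layout of Xen's hvm_start_info (version 1), little-endian:
# four u32 (magic, version, flags, nr_modules), four u64 physical
# addresses (modlist, cmdline, rsdp, memmap), then two u32
# (memmap_entries, reserved).
START_INFO = struct.Struct("<IIIIQQQQII")
PVH_MAGIC = 0x336EC578  # XEN_HVM_START_MAGIC_VALUE
FIELDS = ("magic", "version", "flags", "nr_modules", "modlist_paddr",
          "cmdline_paddr", "rsdp_paddr", "memmap_paddr", "memmap_entries",
          "reserved")
def parse_start_info(blob):
    """Decode the structure and reject anything without the PVH magic."""
    info = dict(zip(FIELDS, START_INFO.unpack_from(blob)))
    if info["magic"] != PVH_MAGIC:
        raise ValueError("not a PVH start-info structure")
    return info
# Made-up example: command line at 0x8000, 3 memory-map entries at 0x9000.
sample = START_INFO.pack(PVH_MAGIC, 1, 0, 0, 0, 0x8000, 0, 0x9000, 3, 0)
info = parse_start_info(sample)
print(hex(info["cmdline_paddr"]), info["memmap_entries"])
```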

22:44.000 --> 22:49.000
Okay.

22:49.000 --> 22:53.000
Okay.

22:53.000 --> 23:03.000
I will not go through the implementation details because, well, you can see that part of the presentation

23:03.000 --> 23:12.000
in the BSDCan presentation, where I go deep into how it is implemented.

23:12.000 --> 23:18.000
I just want to mention that, apart from PVH, which is the boot system only,

23:18.000 --> 23:25.000
After that, there was, yeah, there's a lot.

23:25.000 --> 23:34.000
Yeah, I like this tweet because this is the first time the kernel with my modifications.

23:34.000 --> 23:39.000
hit init, that part of the kernel.

23:39.000 --> 23:45.000
So, at that point, it worked, and I was very proud, as it showed.

23:45.000 --> 23:54.000
So, yeah, another technique to speed up the boot process, and to speed up the kernel in general.

23:54.000 --> 24:09.000
It is to not use the PCI bus because, obviously, in a virtual machine, you don't really need PCI.

24:09.000 --> 24:10.000
Okay.

24:10.000 --> 24:23.000
So, you can use what's called MMIO, which is basically memory mapping, instead of a PCI bus, meaning that instead of using all the complicated PCI infrastructure,

24:23.000 --> 24:30.000
you just map an address, which will be used as the bus between the host and the guest.

24:30.000 --> 24:31.000
Okay.

24:31.000 --> 24:34.000
So, basically, it's copying data through a structure.

24:34.000 --> 24:36.000
There's nothing faster than that.

24:36.000 --> 24:37.000
Okay.

24:37.000 --> 24:42.000
And virtio has an implementation of MMIO.
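
NOTE
A minimal sketch of "map an address and read a structure", assuming the register offsets from the virtio specification (magic at 0x000, version at 0x004, device ID at 0x008, vendor ID at 0x00c).
The bytearray stands in for a mapped window; probe and the sample vendor value are hypothetical.
```python
import struct
VIRTIO_MMIO_MAGIC = 0x74726976  # the ASCII bytes "virt", little-endian
def probe(region):
    """Read the first four 32-bit registers of a virtio-mmio window."""
    magic, version, device_id, vendor_id = struct.unpack_from("<4I", region, 0)
    if magic != VIRTIO_MMIO_MAGIC:
        return None  # nothing virtio-like lives at this address
    return {"version": version, "device_id": device_id, "vendor_id": vendor_id}
# Stand-in for the mapped window: a plain bytearray, not real device memory.
# Device ID 2 is a block device; the vendor value is just a sample.
window = bytearray(0x200)
struct.pack_into("<4I", window, 0, VIRTIO_MMIO_MAGIC, 2, 2, 0x1234)
print(probe(window))
```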

24:42.000 --> 24:52.000
And so, one of the big pieces of work was to create a dummy bus, which is called pv,

24:52.000 --> 25:00.000
sorry, in NetBSD, which I stole from OpenBSD, basically.

25:00.000 --> 25:14.000
And this bus lets you plug virtio on MMIO and not use any bus, which would be overkill.

25:14.000 --> 25:16.000
Okay.

25:16.000 --> 25:20.000
So, that's the second big chunk.

25:20.000 --> 25:23.000
After that, and I won't go into the details.

25:23.000 --> 25:29.000
So, that's a little joke that Krayo from the NetBSD crowd.

25:29.000 --> 25:35.000
Each time I was booting faster, he was saying, faster.

25:35.000 --> 25:38.000
So, that's what I did.

25:38.000 --> 25:49.000
And using various techniques, I killed some delays that were either useless or avoidable in the context of a virtual machine.

25:49.000 --> 26:04.000
And this is where I achieved, depending on the machine, from 10 milliseconds to 20, 25.

26:04.000 --> 26:14.000
Someone with an Intel i9, I don't know which, achieved 8 milliseconds, which is pretty good.

26:14.000 --> 26:22.000
Okay, I'm going to stop there to get some questions, or...

26:22.000 --> 26:26.000
Okay, so...

26:26.000 --> 26:29.000
Okay, cool.

26:29.000 --> 26:33.000
So, that's about the implementation.

26:33.000 --> 26:45.000
Now, Colin did tremendous work in the area of optimization, speed calculation, and so on and so forth.

26:45.000 --> 26:51.000
And for example, he did something called TSLog.

26:51.000 --> 26:58.000
TSLog is very simple, but genius.

26:58.000 --> 27:13.000
TSLog, what it does: you know that on the x86 architecture, we have a clock, or more like a counter, which is read with RDTSC.

27:13.000 --> 27:17.000
Basically, it counts every CPU cycle.

27:17.000 --> 27:24.000
But with that, you can know what time it is in your boot process.

27:24.000 --> 27:34.000
And using this, along with function names, you could say, okay, this guy is taking me 15 milliseconds.

27:34.000 --> 27:37.000
This guy, 20, this guy, 100.
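
NOTE
A minimal sketch of the idea, not TSLog itself: records of (counter value, function name, enter or exit) are turned into per-function milliseconds.
The trace, the function names, and the 3 GHz counter frequency are made up; durations_ms is a hypothetical helper.
```python
def durations_ms(events, tsc_hz):
    """Turn (ticks, function, 'enter'|'exit') records into milliseconds."""
    started, total = {}, {}
    for ticks, name, kind in events:
        if kind == "enter":
            started[name] = ticks
        else:
            elapsed = ticks - started.pop(name)
            # ticks * 1000 / frequency gives milliseconds
            total[name] = total.get(name, 0.0) + elapsed * 1000 / tsc_hz
    return total
# Hypothetical trace from a 3 GHz counter; function names are made up.
trace = [(0, "probe_bus", "enter"), (45_000_000, "probe_bus", "exit"),
         (45_000_000, "init_console", "enter"),
         (105_000_000, "init_console", "exit")]
report = durations_ms(trace, tsc_hz=3_000_000_000)
print(report)
```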

27:37.000 --> 27:41.000
And you, well, I implemented it, stole it?

27:41.000 --> 27:44.000
Yeah, I basically stole code.

27:44.000 --> 27:47.000
That's more or less what I do.

27:48.000 --> 27:59.000
And using TSLog, I realized that there were functions that were probably subject to optimization.

27:59.000 --> 28:06.000
And that was the boot without optimization, with only PVH and MMIO.

28:06.000 --> 28:11.000
And that's where I came down to, okay.

28:11.000 --> 28:24.000
By, again, killing some useless functions, optimizing other things. You know, sometimes 20 milliseconds,

28:24.000 --> 28:33.000
You can gain them just by putting a return somewhere instead of if blah, blah, blah, blah.

28:34.000 --> 28:41.000
Sometimes you break the build, also, doing that.

28:41.000 --> 28:46.000
Okay. Questions?

28:46.000 --> 29:00.000
Have you looked at adding support to NetBSD's hypervisor for the microvm architecture?

29:00.000 --> 29:02.000
Yeah, that's a very good question.

29:02.000 --> 29:14.000
So, NetBSD has a hypervisor called NVMM. It works, but slower than KVM.

29:14.000 --> 29:19.000
Yeah, I didn't say it, but oh, yeah.

29:19.000 --> 29:27.000
I was asked if I tried it and implemented it in NetBSD's hypervisor.

29:27.000 --> 29:32.000
NetBSD's hypervisor is called NVMM.

29:32.000 --> 29:41.000
It works on this machine and on my development machines, I use KVM, Linux KVM.

29:41.000 --> 29:44.000
There's a very good reason for that.

29:44.000 --> 29:47.000
I mean, let's face it.

29:47.000 --> 29:49.000
The world is a big Linux.

29:49.000 --> 29:55.000
And people are using Linux and Mac.

29:55.000 --> 30:05.000
So, I wanted this project to work primarily for Linux and Mac users.

30:05.000 --> 30:09.000
But, I mean, I couldn't leave NetBSD behind.

30:09.000 --> 30:16.000
So, I tried all these things.

