WEBVTT

00:00.000 --> 00:16.320
Okay, and here we have one last talk for the day. I hope we will be back next

00:16.320 --> 00:22.360
year, or in other venues of course. Okay, up next, and that's not what you would consider

00:22.360 --> 00:28.260
a standard GNU tools package, but it has GNU in it and it's compiler-tools related:

00:29.260 --> 00:36.060
GNU Lightning. So there we go, you will tell us some details about that. Paul Cercueil, about

00:36.060 --> 00:49.380
GNU Lightning, cross-platform just-in-time compilation, and the bucket. Yeah, hello, I hope everybody

00:49.380 --> 00:54.380
can hear me all right. My name is Paul Cercueil, I'm going to talk about a small library I've

00:54.380 --> 01:04.100
been using for the last 10 years already. It's called GNU Lightning. If you go to the official

01:04.100 --> 01:10.780
website of GNU Lightning, or if you go to the Wikipedia page, you will read this sentence:

01:10.780 --> 01:17.980
"GNU Lightning is a library that generates assembly language at run time." I kind of disagree

01:17.980 --> 01:23.420
with that, I would say it doesn't generate assembly language, it does generate machine code

01:23.420 --> 01:32.300
at run time. And what is it exactly? Well, if you imagine a JIT engine,

01:32.300 --> 01:40.060
Lightning would be only the code generation part. It does provide you with what you could imagine

01:40.140 --> 01:50.220
as a virtual CPU with virtual registers, virtual opcodes. I think the thing that is most

01:50.220 --> 01:55.420
interesting about GNU Lightning is it's cross-platform, and by cross-platform, I mean, if there's

01:55.420 --> 02:00.780
somebody that has an [unclear] in the room, that's the only person that can say, it's not running

02:00.780 --> 02:07.020
on my CPU, but everybody else should be covered, right? And it does work, you know, on all

02:07.100 --> 02:12.940
the common ones, 32-bit, 64-bit, little-endian, big-endian. It's LGPL version 3,

02:12.940 --> 02:20.060
and it's written in C, it has no dependencies. I think the only thing you could consider

02:20.060 --> 02:24.940
a dependency, and I mean, you can even work around that, is all the memory mapping

02:24.940 --> 02:34.380
functions, mmap, mprotect, stuff like that. It was created in 2000, so it's 25 years old,

02:34.380 --> 02:40.540
by Paolo Bonzini, and it's been developed and used for GNU Smalltalk,

02:42.380 --> 02:50.220
CLISP. In 2008, it moved to Git, and the maintainer is Paulo César Pereira de Andrade,

02:50.220 --> 02:57.580
and it's at 2.2.3 right now. Personally, I'm not the author, I just contributed a few

02:57.660 --> 03:07.580
optimizations, and I wrote the SH4 back end. A little bit about my involvement in that project:

03:07.580 --> 03:14.940
well, I'm working on my own JIT engine that uses Lightning. It's a JIT engine for a PlayStation

03:14.940 --> 03:20.860
1 emulator, and if you're interested in that, you should go see my talk about it tomorrow.

03:21.740 --> 03:29.420
So, why would you use Lightning? Well, people could tell me, we're in the GCC devroom. People

03:29.420 --> 03:35.420
would tell me, yeah, why don't you just use libgccjit? They're actually different tools for different purposes.

03:36.860 --> 03:45.260
LLVM, libgccjit, they were designed mostly for languages, so they manipulate concepts

03:45.340 --> 03:51.180
that, when you want to do binary translation or things like that, you just don't have. You don't have

03:51.180 --> 04:00.220
the concept of variable, for instance, or the concept of functional blocks. So, they're very

04:00.220 --> 04:06.940
language-focused, and as such, they also have very language-focused optimizations.

04:07.820 --> 04:14.220
They were also designed mostly for ahead-of-time compiling, which means they have very,

04:14.300 --> 04:20.140
very powerful optimizations, and they generally generate very, very fast code,

04:20.140 --> 04:29.740
but they do it slowly. And in my case, in my emulator, I need to recompile 10,000 pieces of code

04:30.300 --> 04:37.340
per second, and if I use LLVM or libgccjit, it just takes minutes when I only have 16

04:37.420 --> 04:44.140
milliseconds. And Lightning helps here, because Lightning is very good at generating bad code,

04:44.140 --> 04:55.740
but it does it very, very fast. A little bit about the API: it gives you a minimum of six

04:55.740 --> 05:02.380
general-purpose registers and six floating-point registers. The way you use them is basically,

05:02.460 --> 05:07.820
it's just macros, so you have from JIT_R0 to JIT_R2, plus more if you have more

05:08.540 --> 05:18.060
than three, for the caller-saved registers; JIT_V0 to JIT_V2 plus, for the callee-saved registers;

05:18.060 --> 05:25.580
and six floating-point ones, JIT_F0 to JIT_F5 plus. You have a simple register

05:25.660 --> 05:33.980
allocator using the functions jit_get_reg and jit_unget_reg. But the thing is, this register allocator is

05:33.980 --> 05:41.580
just going to return one of the registers listed above. If you are not going,

05:41.580 --> 05:48.460
if you don't need to dynamically allocate a register, you don't need that, and you can also

05:49.820 --> 05:54.060
implement your own register allocator, just manipulating the registers that you already have.

05:56.540 --> 06:04.860
In the API, you have the opcodes. They are basically macros that allow you to generate

06:04.860 --> 06:13.180
a specific instruction, and you have these opcodes for all kinds of things: binary operations,

06:13.260 --> 06:22.940
arithmetic, etc. Each instruction is basically the name of the opcode, like or, or sub;

06:24.700 --> 06:31.260
most of the time there is a register or immediate flag, which allows you to specify the type of the last

06:31.260 --> 06:37.580
parameter, if it's a register or if it's an immediate value, and if applicable, the type

06:37.580 --> 06:47.980
suffix. So, for instance, you have the format on the top, so the format of the macros: you would call

06:47.980 --> 06:58.540
jit_, the name of the opcode, r or i, then the type and the parameters. And according to,

06:59.020 --> 07:09.660
that's just one small sample of all the opcodes that you have. But a little bit about branching:

07:11.180 --> 07:19.500
to create a forward branch, you would add a branching opcode in your code. This is the

07:19.500 --> 07:26.620
beqi, or branch if equal, so if the register R0 is equal to zero,

07:26.620 --> 07:32.460
it is going to branch. And then a bit further in your program, you can just patch the node,

07:32.460 --> 07:39.500
so you just say, yeah, I want this previous branch to jump to this address. For backward

07:39.500 --> 07:48.540
branches it's a bit similar, but different: you just declare a label first, and then in your branch

07:48.620 --> 07:54.700
opcode, you're going to get the address of the branch opcode, and using jit_patch_at,

07:54.700 --> 08:06.380
you can patch the branch back to the label. To generate the function prologue and function

08:06.380 --> 08:13.500
epilogue, you have jit_prolog and jit_epilog, it's rather obvious. And for

08:13.500 --> 08:19.500
code generation, you just call the jit_emit function; it returns a pointer to the function that

08:19.500 --> 08:26.380
you just created. Finally, for seeing the disassembly of the code that you created,

08:26.380 --> 08:35.660
you can just call jit_disassemble. So on top, you have an example, for instance, of

08:36.460 --> 08:45.980
a macro that I call with three parameters, it's just a dumb A plus B equals C, or sorry,

08:45.980 --> 08:54.540
C, well, you understand, I hope, and this is what, you know, it generates on four different

08:54.540 --> 09:02.700
architectures: on x86, 64 bits, it's going to use the LEA instruction, on x64 it's going to use the

09:02.780 --> 09:11.180
LEA instruction, on PowerPC it's going to use the add instruction. On SH4, well, the SH4

09:11.180 --> 09:19.260
has 16-bit instructions, and so it doesn't support three parameters to the instruction, so

09:19.260 --> 09:25.900
it's first going to move one register to the result, in this case, I think it's moving R2 to R0,

09:25.980 --> 09:36.380
and then just add R1 to R0. And basically, every opcode that is supported by GNU Lightning

09:37.740 --> 09:45.180
can result in one instruction, if the architecture supports it in one instruction;

09:45.180 --> 09:49.820
otherwise it's going to, you know, find a way to generate the code that

09:49.900 --> 10:00.380
reproduces the behavior. A wise man once said: unfortunately, no one can be told

10:00.380 --> 10:04.620
what GNU Lightning is, you have to see it for yourself. So I'm going to show you a small demo.

10:05.660 --> 10:10.060
To understand the demo, you have to imagine

10:11.020 --> 10:18.460
a scripting language which I am going to call my blog. My blog works with a memory array of 32k

10:18.540 --> 10:27.260
cells. Each cell is a signed 32-bit value, and you have eight, sorry, seven different

10:27.260 --> 10:33.820
opcodes. The chevrons allow you to switch the current cell, the plus and minus allow you to

10:33.820 --> 10:40.940
increment or decrement the value of the cell, and the brackets allow you to repeat the content of the cell

10:40.940 --> 10:46.140
until the value of the current cell, sorry, repeat the code between the brackets, until the value

10:46.140 --> 10:51.420
of the current cell is zero. Finally, the dot allows you to print a character in the current cell.
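
NOTE
Editorial sketch: the language described above can be pinned down with a small reference interpreter.
This is plain C and deliberately does not use GNU Lightning; it is not the speaker's mb.c.
The name run_program, the output buffer, and the wrap-around of the cell index are assumptions made for illustration.
```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#define NUM_CELLS 32768 /* "32k cells", as stated in the talk */
/* Hypothetical helper: interprets prog and stores printed characters in out.
 * Returns the number of characters stored (at most out_size - 1). */
size_t run_program(const char *prog, char *out, size_t out_size)
{
    static int32_t cells[NUM_CELLS]; /* signed 32-bit cells */
    size_t len = strlen(prog), cell = 0, n = 0;
    assert(out != NULL && out_size > 0);
    memset(cells, 0, sizeof(cells));
    for (size_t pc = 0; pc < len; pc++) {
        switch (prog[pc]) {
        /* Assumption: the cell index wraps around at both ends. */
        case '>': cell = (cell + 1) % NUM_CELLS; break;
        case '<': cell = (cell + NUM_CELLS - 1) % NUM_CELLS; break;
        /* Unsigned arithmetic avoids signed-overflow undefined behavior. */
        case '+': cells[cell] = (int32_t)((uint32_t)cells[cell] + 1u); break;
        case '-': cells[cell] = (int32_t)((uint32_t)cells[cell] - 1u); break;
        case '.': if (n + 1 < out_size) out[n++] = (char)cells[cell]; break;
        case '[':
            if (cells[cell] == 0) { /* skip forward to the matching ']' */
                int depth = 1;
                while (depth > 0 && ++pc < len) {
                    if (prog[pc] == '[') depth++;
                    else if (prog[pc] == ']') depth--;
                }
            }
            break;
        case ']':
            if (cells[cell] != 0) { /* go back to the matching '[' */
                int depth = 1;
                while (depth > 0 && pc > 0) {
                    pc--;
                    if (prog[pc] == ']') depth++;
                    else if (prog[pc] == '[') depth--;
                }
            }
            break;
        default: break; /* every other character is ignored */
        }
    }
    out[n] = '\0';
    return n;
}
```
Usage: run_program("++++++++[>++++++++<-]>+.", buf, sizeof buf) stores "A" in buf, since 8 times 8 plus 1 is 65.
A JIT for the same language emits one short instruction sequence per character instead of running this switch at execution time.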

10:53.580 --> 11:04.300
So let's see if I can switch. Yeah, so I'm just going to show you how we can write a

11:04.380 --> 11:14.060
JIT engine for that right now, in five minutes. That's okay. So it's called mb.c.

11:16.540 --> 11:25.420
Yeah, so that's my function. All the code of my JIT engine is 92 lines.

11:26.300 --> 11:40.860
So here we have my cells array. My cells array. We first initialize Lightning

11:40.860 --> 11:47.260
calling init_jit. We pass the path to the binary just because Lightning has a

11:47.260 --> 11:54.700
built-in disassembler, so it helps it disassemble. Then we're going to create a new

11:54.700 --> 12:00.860
compilation state with jit_new_state. We want to create a function, so we're going to call the jit

12:00.860 --> 12:07.340
prolog for creating the prologue of the function. Then we're going to use two registers. The first

12:07.340 --> 12:14.300
one will be JIT_V1, which is our cell pointer. We'll just pass it the cell pointer. The second one is

12:14.300 --> 12:20.540
the value of the current cell. And what we're going to do is, in the for loop, for each character

12:20.540 --> 12:29.820
that we read on the input, standard input, we're going to see: if it's a plus, then we just

12:30.940 --> 12:36.540
generate an incrementation of the current value. If it's a minus, then we just generate a

12:36.540 --> 12:43.500
decrementation of the current value. If it's a chevron, then we store the current value to the current

12:43.580 --> 12:52.300
pointer. We increment the cell pointer, and we just load the new value of the current cell.

12:54.700 --> 13:01.180
And similarly, if it's a chevron facing left, then we store the old value. We

13:01.180 --> 13:09.420
decrement the pointer, then load the new value. If we have an opening bracket, then we're going to

13:10.380 --> 13:17.180
check the value of the current cell. If it's zero, that means we don't have to enter the loop,

13:17.180 --> 13:24.380
then we can just exit. And I'm going to talk about it later. Otherwise, we have a label, which will

13:24.380 --> 13:31.740
allow the closing bracket to jump back to that point. And in the closing bracket, we just check

13:31.820 --> 13:38.860
if the current value of the current cell is not zero, then we want to branch back to the

13:38.860 --> 13:46.060
beginning. So we have the jit_patch_at, with which we branch back to the label. Otherwise, we want this

13:46.060 --> 13:53.500
branch instruction to jump here, after the loop, so we just patch it there.

13:54.140 --> 14:03.100
Finally, if we find a dot, we're just going to call jit_prepare, which tells Lightning that we're

14:03.100 --> 14:11.180
going to call a function. We push an argument as a char, signed char, which is the current value of

14:11.180 --> 14:19.420
the current cell. And we call the putchar function. And that's it. We just emit the function.

14:20.300 --> 14:26.860
We're going to print the disassembly, call the function, so we get the output, and clean up.

14:30.460 --> 14:35.820
So I have my program here that everybody can obviously tell what it does.

14:36.460 --> 14:49.660
And I have mbc. I have my program compiled from the exact same code that I showed. I didn't

14:49.660 --> 14:58.060
modify it. Compiled for x86, MIPS and SH4. So I can run it for x86. That's the program output.

14:59.020 --> 15:11.100
All the details on the code. Then mbc for MIPS, you have the executable compiled for MIPS;

15:11.100 --> 15:22.140
I pass the same program, then you see all the MIPS code. And the same one for SH4.

15:22.540 --> 15:26.540
And there you go.

15:26.540 --> 15:52.540
Yes, the question is, why use a JIT compiler in an emulator? It's much faster. Like 10 times faster or

15:52.540 --> 16:00.540
more. If you just consider the CPU emulation part, there's no comparison. It's a lot faster.

16:01.500 --> 16:05.900
We're going to use a function. How do you handle the exact exception for example?

16:05.900 --> 16:10.140
You can help with the exact exception in your emulator because you can compile everything.

16:10.140 --> 16:14.940
Each of you, you don't know, like you get control, but only when you use the basic code.

16:14.940 --> 16:17.740
So you have an exception in the middle of the basic block.

16:17.740 --> 16:19.580
Yeah. How do you handle that?

16:19.580 --> 16:24.220
So the question is, how do I handle exceptions in the middle of generating a block?

16:24.220 --> 16:25.020
Is that right?

16:25.020 --> 16:29.580
Well, when you write that code is important, then you'll execute it, and you'll skip the whole

16:29.580 --> 16:37.180
piece of code that's on. So you have to see the talk tomorrow.

16:37.180 --> 16:42.780
Okay. Because I'm talking about a code generator, the question, I'm going to answer it tomorrow

16:42.780 --> 16:47.340
in the talk that is more about a JIT engine. That's more like a JIT engine thing.

16:47.420 --> 16:53.900
TL;DR, all the blocks are connected between each other, they jump to each other.

16:53.900 --> 16:57.100
Actually, no, they don't. In my dynarec they don't.

16:57.100 --> 17:02.460
They jump back to a specific function that I call dispatcher, and the dispatcher is going to

17:02.460 --> 17:09.260
check the status of exceptions and the status of the cycle counter,

17:09.260 --> 17:12.780
and exit if the time is up or if there is an exception.

17:12.860 --> 17:18.060
So if I have like a block of compiled code and I get an exception in the middle, I'm going to

17:18.060 --> 17:22.860
handle it or report the exception only at the end. That works good enough.
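
NOTE
Editorial sketch of the dispatcher idea described in the answer above, in plain C.
Compiled blocks are replaced by ordinary C functions, and every name here (cpu_state, block_fn, dispatch) is hypothetical.
It only shows the control flow: each block returns to the dispatcher, which checks the cycle budget and the exception flag between blocks.
```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
/* Hypothetical emulator state shared by the dispatcher and the blocks. */
struct cpu_state {
    uint32_t pc;            /* index of the next block to run */
    int32_t cycles_left;    /* cycle budget for this time slice */
    bool exception_pending; /* raised inside a block, handled after it returns */
    uint32_t blocks_run;    /* bookkeeping for the example only */
};
typedef void (*block_fn)(struct cpu_state *);
/* Stand-ins for compiled blocks: each consumes cycles and sets the next pc. */
void block_a(struct cpu_state *s) { s->cycles_left -= 4; s->pc = 1; }
void block_b(struct cpu_state *s) { s->cycles_left -= 6; s->pc = 0; }
void block_fault(struct cpu_state *s)
{
    s->cycles_left -= 2;
    s->exception_pending = true; /* reported only once the block has finished */
    s->pc = 0;
}
/* Dispatcher: blocks never jump to each other, they always come back here. */
void dispatch(struct cpu_state *s, const block_fn *blocks, size_t nblocks)
{
    while (s->cycles_left > 0 && !s->exception_pending && s->pc < nblocks) {
        blocks[s->pc](s);
        s->blocks_run++;
    }
}
```
With a budget of 20 cycles and the blocks {block_a, block_b}, the loop runs four blocks and exits when the budget reaches zero.
An exception raised by block_fault is only seen after that block has returned, which matches the "only at the end" behavior described in the answer.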

17:24.860 --> 17:25.820
Yes.

17:25.820 --> 17:28.460
So I don't know if the question is talk today or tomorrow.

17:30.460 --> 17:37.980
So basically, JIT compilers, they usually have some tracing part.

17:37.980 --> 17:43.660
They try to trace the execution and try to compile more blocks at the same time.

17:45.660 --> 17:50.060
How do you compare this approach, for instance, to what QEMU is doing?

17:52.060 --> 17:54.860
And why would you opt for this enough work?

17:54.860 --> 17:55.580
This is the time.

17:57.580 --> 18:05.820
So the question is, how do I compare what Lightning is doing to what QEMU is doing?

18:06.780 --> 18:08.540
Well, QEMU user mode, for instance.

18:08.540 --> 18:08.860
Yeah.

18:08.860 --> 18:11.980
Why don't you just use QEMU user mode instead of writing all of this?

18:13.580 --> 18:14.540
That's a good question.

18:15.980 --> 18:17.980
It was fun to do it from scratch, I guess.

18:18.940 --> 18:19.180
Yeah.

18:21.180 --> 18:23.740
And so, QEMU user mode.

18:26.300 --> 18:30.540
I don't know if you can just plug it into a PlayStation emulator that easily.

18:31.500 --> 18:34.940
It's especially user mode.

18:34.940 --> 18:38.780
It's mostly, yeah, okay.

18:42.620 --> 18:44.380
I mean, you're missing the point.

18:44.380 --> 18:46.620
The point there, what it was turned right, you know.

18:54.620 --> 18:55.900
Oh, depth?

19:00.540 --> 19:05.180
Libertary, that's compilation.

19:05.180 --> 19:07.500
I'm talking about Libertjit.

19:07.500 --> 19:08.380
PyPy also?

19:08.380 --> 19:09.020
Yep.

19:09.020 --> 19:10.140
The guy from PyPy,

19:11.340 --> 19:14.780
one day, told me that he was doing exactly what you said.

19:14.780 --> 19:15.100
Yep.

19:15.100 --> 19:18.140
But you generate the very clear code.

19:18.140 --> 19:19.500
Because it was passed.

19:19.500 --> 19:21.500
And then the compiler is small enough.

19:21.500 --> 19:21.740
Yep.

19:21.740 --> 19:24.940
And the compiler just they deliver first.

19:24.940 --> 19:27.500
Right in the pool, but anyway, so I have to...

19:27.500 --> 19:28.220
Yeah.

19:29.180 --> 19:34.300
So I was thinking, do you know if this is something different approach?

19:34.300 --> 19:38.620
What is the relationship between PyPy and this library?

19:38.620 --> 19:42.060
So we have to, we have to go for it.

19:42.060 --> 19:42.620
We have to go.

19:42.620 --> 19:44.460
We can talk after if you want.

19:44.460 --> 19:44.620
Yeah.

