WEBVTT

00:00.000 --> 00:14.000
So, today's presentation is going to be focused on the JMH benchmark that you see on the left-hand side here.

00:14.000 --> 00:18.000
We're trying to figure out how fast it is to run String.charAt.

00:18.000 --> 00:22.000
So, today we're dealing with two types of string: a string that is Latin-1,

00:22.000 --> 00:27.000
so a standard string with just 8-bit characters,

00:27.000 --> 00:31.000
and the other one is a UTF-16 string.

00:31.000 --> 00:37.000
And for today, we also want to remember a little bit how String.charAt works.

00:37.000 --> 00:41.000
The way it works is it does a basic check to see whether the string is Latin-1.

00:41.000 --> 00:47.000
If it is, it delegates to StringLatin1.charAt, otherwise to StringUTF16.charAt,

00:47.000 --> 00:53.000
where each character is two bytes instead of one.

00:54.000 --> 00:59.000
Something that might be worth remembering is how StringLatin1.charAt works.

00:59.000 --> 01:05.000
All it really does is check the index: is the index within the boundaries of the string that I'm trying to look into,

01:05.000 --> 01:08.000
or the character I'm trying to locate.

01:08.000 --> 01:13.000
And then all it does is extract the character, and it's that simple.

01:13.000 --> 01:17.000
StringUTF16.charAt is very similar.

01:17.000 --> 01:19.000
It's a little more complicated.

01:19.000 --> 01:24.000
We've got two bytes per character, but for today, it's not so important.
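
NOTE
Editor's sketch, not part of the talk: a minimal Java illustration of the dispatch just described,
assuming a byte array plus a coder flag. SketchString, latin1CharAt and utf16CharAt are hypothetical
names chosen for this example; this is not the JDK source, only the shape of the logic.
```java
// Simplified, hypothetical model of a string with a coder field; not the JDK implementation.
final class SketchString {
    static final byte LATIN1 = 0;
    static final byte UTF16 = 1;
    private final byte[] value;
    private final byte coder;
    SketchString(String s) {
        // Use one byte per character when every character fits in 8 bits.
        boolean latin1 = s.chars().allMatch(c -> c <= 0xFF);
        this.coder = latin1 ? LATIN1 : UTF16;
        this.value = new byte[latin1 ? s.length() : s.length() * 2];
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (latin1) {
                value[i] = (byte) c;
            } else {
                value[2 * i] = (byte) (c >> 8);
                value[2 * i + 1] = (byte) c;
            }
        }
    }
    // Basic coder check, then delegate to the matching implementation.
    char charAt(int index) {
        return coder == LATIN1 ? latin1CharAt(value, index) : utf16CharAt(value, index);
    }
    // Bounds check, then read one byte and widen it to a char.
    static char latin1CharAt(byte[] value, int index) {
        if (index < 0 || index >= value.length) {
            throw new StringIndexOutOfBoundsException(index);
        }
        return (char) (value[index] & 0xFF);
    }
    // Bounds check against length / 2, then combine two bytes into a char.
    static char utf16CharAt(byte[] value, int index) {
        if (index < 0 || index >= value.length / 2) {
            throw new StringIndexOutOfBoundsException(index);
        }
        return (char) (((value[2 * index] & 0xFF) << 8) | (value[2 * index + 1] & 0xFF));
    }
}
```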

01:24.000 --> 01:26.000
But it's important to remember how this works.

01:26.000 --> 01:28.000
StringLatin1.charAt.

01:28.000 --> 01:36.000
And what you see on the right-hand side is the code that JMH generates for an average time benchmark.

01:36.000 --> 01:42.000
So, basically, it wraps your own benchmark in some code,

01:42.000 --> 01:46.000
where it starts by taking the start time,

01:46.000 --> 01:50.000
then it goes into a loop, and invokes your method all the time,

01:50.000 --> 01:53.000
bumping the number of operations, until it's done.

01:53.000 --> 01:54.000
isDone.

01:54.000 --> 02:00.000
It's basically a Boolean variable here.

02:00.000 --> 02:02.000
A Boolean, a volatile Boolean.

02:02.000 --> 02:04.000
And then it calculates a stop time.

02:04.000 --> 02:06.000
It's got the time, it's got the operations count.

02:06.000 --> 02:08.000
It's quite simple.
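
NOTE
Editor's sketch, not part of the talk: a rough Java approximation of the average-time loop just
described (start time, loop until a volatile flag is set, stop time, operation count). AvgTimeSketch,
Result and the maxOps cap are assumptions for illustration; real JMH-generated code differs, and in
JMH the flag is flipped by the harness rather than by an operation cap.
```java
// Hypothetical stand-in for a generated average-time stub; names are illustrative only.
final class AvgTimeSketch {
    // The loop polls this volatile flag on every iteration.
    static volatile boolean isDone;
    static final class Result {
        final long operations;
        final long elapsedNanos;
        Result(long operations, long elapsedNanos) {
            this.operations = operations;
            this.elapsedNanos = elapsedNanos;
        }
        double nanosPerOp() {
            return (double) elapsedNanos / operations;
        }
    }
    static Result measure(Runnable benchmark, long maxOps) {
        isDone = false;
        long operations = 0;
        long start = System.nanoTime();
        do {
            benchmark.run();
            operations++;
            // Assumption: cap the operation count so the sketch terminates deterministically.
            if (operations == maxOps) {
                isDone = true;
            }
        } while (!isDone);
        long stop = System.nanoTime();
        return new Result(operations, stop - start);
    }
}
```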

02:08.000 --> 02:13.000
So, what we're going to do next is just run it.

02:13.000 --> 02:16.000
So, let's go here.

02:16.000 --> 02:17.000
This is big enough.

02:17.000 --> 02:20.000
So, let's run this benchmark.

02:20.000 --> 02:24.000
We're going to run it in a slightly novel way,

02:24.000 --> 02:27.000
novel, because this is the first time

02:27.000 --> 02:30.000
that I'm showing this to the outside world.

02:30.000 --> 02:35.000
What you see is, we're benchmarking

02:35.000 --> 02:37.000
a GraalVM native image.

02:37.000 --> 02:39.000
Basically, before this talk,

02:39.000 --> 02:43.000
I wrapped the benchmark into a GraalVM native image.

02:43.000 --> 02:45.000
And what I'm doing here is I'm basically

02:45.000 --> 02:49.000
invoking the native binary, which is target/benchmarks.

02:49.000 --> 02:53.000
And you can see from the VM version that there are some clues here.

02:53.000 --> 02:55.000
We're running on Substrate VM.

02:55.000 --> 02:56.000
So, this is not HotSpot.

02:56.000 --> 02:59.000
This is the VM that runs GraalVM native images.

02:59.000 --> 03:02.000
And we see we're running GraalVM Community Edition.

03:02.000 --> 03:06.000
This is the first time this is being shown live outside of my team.

03:06.000 --> 03:09.000
And we get some numbers, okay?

03:09.000 --> 03:13.000
So, here at the bottom you can see some numbers.

03:13.000 --> 03:16.000
Let me put them a bit higher up.

03:16.000 --> 03:20.000
So, there are a few questions we can ask about these numbers.

03:20.000 --> 03:24.000
First one: are these numbers fast or slow?

03:24.000 --> 03:26.000
That's a little bit hard for us to know,

03:26.000 --> 03:31.000
because first, you don't know what the specs of this machine are.

03:31.000 --> 03:36.000
And second, you don't have any point of reference to compare with.

03:36.000 --> 03:39.000
And the other question you might want to ask is: who am I?

03:39.000 --> 03:41.000
So, let's go to that first.

03:41.000 --> 03:46.000
Who am I? I'm an engineer on the OpenJDK team at Red Hat.

03:46.000 --> 03:48.000
I work on GraalVM native image,

03:48.000 --> 03:51.000
and also HotSpot JIT compilers.

03:51.000 --> 03:55.000
And for today, the most important thing is I am the creator of the JMH

03:55.000 --> 03:58.000
extension that you're seeing in action there.

03:58.000 --> 04:01.000
What this extension allows you to do

04:01.000 --> 04:05.000
is essentially benchmark Java code when it's running inside

04:05.000 --> 04:07.000
a GraalVM native image.

04:07.000 --> 04:12.000
So, let's go back to our numbers.

04:12.000 --> 04:17.000
What we wanted to answer first is whether this was fast or slow.

04:17.000 --> 04:22.000
But before we get there, there's something a little bit odd going on here.

04:23.000 --> 04:30.000
We've got Latin-1 being slower than UTF-16 by quite a bit.

04:30.000 --> 04:32.000
And this is slightly surprising.

04:32.000 --> 04:35.000
If anything, this should be roughly about the same in performance.

04:35.000 --> 04:36.000
One would expect.

04:36.000 --> 04:40.000
If anything, one would expect maybe UTF-16 to be slightly slower,

04:40.000 --> 04:43.000
because it's a more complex implementation.

04:43.000 --> 04:45.000
But this is surprising.

04:45.000 --> 04:46.000
So, what do we do?

04:46.000 --> 04:48.000
Let's profile it.

04:48.000 --> 04:49.000
How can we profile it?

04:50.000 --> 04:53.000
We can profile it by hooking in a profiler.

04:53.000 --> 04:57.000
So, this profiler here, do you see?

04:57.000 --> 05:01.000
This is one that we've created specifically for this work, for the

05:01.000 --> 05:02.000
GraalVM native image.

05:02.000 --> 05:05.000
What it does: it wraps the JMH

05:05.000 --> 05:08.000
invocation, the native invoker here,

05:08.000 --> 05:10.000
around a perf record invocation.

05:10.000 --> 05:13.000
And we add in the call graph parameter,

05:13.000 --> 05:18.000
in order to use the DWARF debugging symbols available for

05:18.000 --> 05:22.000
native images, in order to then extract,

05:22.000 --> 05:26.000
to be able to match what it is,

05:26.000 --> 05:30.000
the assembly, with what code we're running.

05:30.000 --> 05:32.000
The numbers here are not so important.

05:32.000 --> 05:35.000
The most important thing is that out of each benchmark,

05:35.000 --> 05:38.000
we get a perf binary output here.

05:38.000 --> 05:42.000
And then what we can do is we can inspect it.

05:42.000 --> 05:43.000
How can we inspect it?

05:43.000 --> 05:46.000
We can basically go perf annotate,

05:46.000 --> 05:47.000
and we're going to open it.

05:47.000 --> 05:49.000
Can you see it at the bottom here?

05:49.000 --> 05:50.000
And I'm basically going to go...

05:50.000 --> 05:52.000
I'm going to start with Latin-1.

05:52.000 --> 05:54.000
Now, we're going to see some assembly.

05:54.000 --> 05:56.000
I'm not going to go into a lot of depth.

05:56.000 --> 05:58.000
I've got backup slides explaining things in

05:58.000 --> 05:59.000
quite a lot of detail.

05:59.000 --> 06:03.000
I'm going to try to understand a little bit the flow of what's going on

06:03.000 --> 06:05.000
when we go here.

06:05.000 --> 06:08.000
So, what we see is that for Latin-1,

06:08.000 --> 06:11.000
it tells us that the first method it jumps to is

06:11.000 --> 06:12.000
StringLatin1.charAt.

06:12.000 --> 06:14.000
Okay, we saw earlier, this is the method that gets

06:14.000 --> 06:17.000
called from String.charAt.

06:17.000 --> 06:21.000
Okay, this is the implementation.

06:21.000 --> 06:23.000
We see that there's a checkIndex call.

06:23.000 --> 06:25.000
This is the call that we saw earlier.

06:25.000 --> 06:27.000
It's actually not exactly the same call,

06:27.000 --> 06:29.000
but it's one that is underneath.

06:29.000 --> 06:32.000
And that's pretty much what we need to know at this stage.

06:32.000 --> 06:34.000
We can move around a little bit.

06:34.000 --> 06:37.000
Here, what we see now, this is the actual JMH code

06:37.000 --> 06:42.000
that is calling into our charAt benchmark.

06:42.000 --> 06:43.000
Good.

06:43.000 --> 06:49.000
And then here, what we see here is the string.

06:49.000 --> 06:52.000
No, this is...

06:52.000 --> 06:55.000
Oh, something interesting

06:55.000 --> 06:56.000
has happened here.

06:56.000 --> 06:57.000
Oh, yeah.

06:57.000 --> 06:58.000
No.

06:58.000 --> 07:01.000
This is our String.charAt.

07:01.000 --> 07:05.000
charAtLatin1, which is calling into String.charAt.

07:05.000 --> 07:08.000
And then here's what I wanted to get to.

07:08.000 --> 07:11.000
This is basically String.charAt.

07:11.000 --> 07:15.000
Basically, what it's doing, the most important thing here,

07:15.000 --> 07:19.000
is calling into StringLatin1.charAt.

07:19.000 --> 07:22.000
Okay, so we see the chain of all the calls,

07:22.000 --> 07:24.000
albeit in a slightly different way,

07:24.000 --> 07:26.000
but we see our benchmark calling String.charAt,

07:26.000 --> 07:30.000
String.charAt calling into StringLatin1.charAt, et cetera.

07:30.000 --> 07:31.000
Okay.

07:31.000 --> 07:33.000
That's how it works for Latin-1.

07:33.000 --> 07:38.000
What about UTF-16?

07:38.000 --> 07:39.000
Okay.

07:39.000 --> 07:40.000
So let's go here.

07:40.000 --> 07:41.000
Where are we?

07:41.000 --> 07:43.000
We're in String.charAt.

07:43.000 --> 07:45.000
Okay.

07:45.000 --> 07:46.000
So what do we have down here?

07:46.000 --> 07:49.000
We see a lot of nops, a lot of things.

07:49.000 --> 07:51.000
We've got the checkIndex.

07:51.000 --> 07:54.000
But there's nothing else.

07:54.000 --> 07:59.000
So what you see on the screen is that the StringUTF16

07:59.000 --> 08:05.000
charAt implementation has been inlined into String.charAt.

08:05.000 --> 08:08.000
So we can make a theory here.

08:08.000 --> 08:11.000
I can make a theory saying the reason why UTF-16

08:11.000 --> 08:14.000
was performing better than Latin-1 was because

08:14.000 --> 08:19.000
StringUTF16.charAt was inlined into String.charAt.

08:19.000 --> 08:20.000
Okay.

08:20.000 --> 08:22.000
That's a theory I have.

08:22.000 --> 08:24.000
Now, how can I prove it?

08:24.000 --> 08:26.000
I can prove it.

08:26.000 --> 08:28.000
Let's prove it.

08:29.000 --> 08:30.000
What can we do?

08:30.000 --> 08:32.000
We're going to rebuild the native binary.

08:32.000 --> 08:35.000
And we're going to rebuild it passing in a parameter called

08:35.000 --> 08:38.000
MaxNodesInTrivialMethod, set to 40.

08:38.000 --> 08:39.000
Okay.

08:39.000 --> 08:41.000
So let me make a theory.

08:41.000 --> 08:42.000
What are we doing?

08:42.000 --> 08:43.000
This parameter here.

08:43.000 --> 08:44.000
While it builds,

08:44.000 --> 08:47.000
I'm going to explain what's going on.

08:47.000 --> 08:52.000
The Graal compiler will inline... one of the conditions for inlining

08:52.000 --> 08:56.000
a method is when the method is considered trivial.

08:56.000 --> 08:59.000
What does it mean for a method to be trivial?

08:59.000 --> 09:04.000
It means that inside the method the compiler graph has 20 nodes

09:04.000 --> 09:05.000
or less.

09:05.000 --> 09:08.000
So what I'm doing here is saying, in this case,

09:08.000 --> 09:09.000
go and inline

09:09.000 --> 09:13.000
those methods that have got 40 nodes instead of 20.

09:13.000 --> 09:14.000
Okay.

09:14.000 --> 09:17.000
So I'm basically giving it more budget to inline bigger methods.

09:17.000 --> 09:20.000
And we're going to see what happens then when we do that.

09:20.000 --> 09:21.000
So this is native image.

09:21.000 --> 09:23.000
So it does the build.

09:23.000 --> 09:26.000
And then basically eventually it comes down to the bottom.

09:26.000 --> 09:27.000
Okay.

09:27.000 --> 09:29.000
We've got it running.

09:29.000 --> 09:30.000
Yes.

09:30.000 --> 09:31.000
Run it.

09:31.000 --> 09:34.000
See what we see now.

09:34.000 --> 09:35.000
Let's start again.

09:35.000 --> 09:37.000
We see our benchmark is running.

09:37.000 --> 09:38.000
It's of trivial.

09:38.000 --> 09:40.000
Well, we can see at the top.

09:40.000 --> 09:41.000
It's like a benchmarks.

09:41.000 --> 09:43.000
We start to see numbers.

09:43.000 --> 09:47.000
We see the numbers are considerably faster than we saw before.

09:47.000 --> 09:52.000
But the interesting thing we're going to see now is we're going to see

09:52.000 --> 09:57.000
the numbers between UTF-16 and Latin-1 are pretty much the same now.

09:57.000 --> 09:59.000
So our theory seems to have legs.

09:59.000 --> 10:03.000
The reason why things improved was because of inlining.

10:03.000 --> 10:05.000
What we can do is go further.

10:05.000 --> 10:07.000
We can look at the profiling data.

10:07.000 --> 10:11.000
Obviously, I'm not going to go through all the steps because I've only got 20 minutes.

10:11.000 --> 10:14.000
But we're going to look at it directly here.

10:14.000 --> 10:19.000
So we're going to do perf annotate.

10:19.000 --> 10:23.000
Well, sorry, if I can type.

10:23.000 --> 10:27.000
Well, now we see String.charAt for Latin-1.

10:27.000 --> 10:30.000
We basically go into the profiling data for Latin-1.

10:30.000 --> 10:32.000
And we see String.charAt.

10:32.000 --> 10:33.000
We see the jump.

10:33.000 --> 10:36.000
This is probably the jump for the coder.

10:36.000 --> 10:38.000
And then we jump down to here.

10:38.000 --> 10:41.000
And then we see there's no more checkIndex.

10:41.000 --> 10:43.000
checkIndex is gone.

10:43.000 --> 10:44.000
The call.

10:44.000 --> 10:47.000
We've also got no StringLatin1 calls anymore.

10:48.000 --> 10:51.000
So we can see inlining has successfully happened.

10:51.000 --> 10:54.000
And here is basically the final instruction.

10:54.000 --> 10:55.000
This is the instruction.

10:55.000 --> 10:56.000
The movzbl.

10:56.000 --> 10:59.000
It's a move with zero extension.

10:59.000 --> 11:00.000
Zero.

11:00.000 --> 11:02.000
Basically, it converts the byte into a char,

11:02.000 --> 11:06.000
in a way that can then be returned.

11:06.000 --> 11:08.000
So we answered one question:

11:08.000 --> 11:11.000
why UTF-16 was faster than Latin-1.

11:11.000 --> 11:17.000
I posed another question earlier,

11:17.000 --> 11:22.000
which is: those numbers we saw earlier,

11:22.000 --> 11:25.000
even the ones here, are they fast or slow?

11:25.000 --> 11:28.000
And obviously, a very simple thing we can do is ask:

11:28.000 --> 11:32.000
how do things look with HotSpot, with a standard JDK?

11:32.000 --> 11:33.000
So let's do that.

11:33.000 --> 11:35.000
So we're going to package it.

11:35.000 --> 11:41.000
We're going to package it in JVM mode.

11:41.000 --> 11:42.000
Great.

11:42.000 --> 11:45.000
Now we're going to run it.

11:45.000 --> 11:49.000
And we're going to focus now.

11:49.000 --> 11:51.000
We're going to focus on Latin-1.

11:51.000 --> 11:52.000
Okay?

11:52.000 --> 11:54.000
We're going to leave UTF-16 aside.

11:54.000 --> 11:59.000
And we're also going to add a profiler,

11:59.000 --> 12:00.000
perfasm,

12:01.000 --> 12:07.000
that allows us to see what the assembly looks like for this particular case.

12:07.000 --> 12:08.000
So we start running.

12:08.000 --> 12:10.000
Obviously, the VM version has changed.

12:10.000 --> 12:13.000
So that you can see it here.

12:13.000 --> 12:17.000
We now have an OpenJDK version, and the invoker is Java.

12:17.000 --> 12:20.000
So this is a clear difference with what we were doing before.

12:20.000 --> 12:22.000
And we start to see numbers.

12:22.000 --> 12:26.000
We see we've got 1.7 nanoseconds per operation,

12:26.000 --> 12:28.000
which is faster than AOT.

12:28.000 --> 12:29.000
Okay?

12:29.000 --> 12:30.000
That's not really news.

12:30.000 --> 12:35.000
I mean, it's something that we would all expect to happen.

12:35.000 --> 12:40.000
AOT can't make the same optimizations as HotSpot can do.

12:40.000 --> 12:41.000
Or can it?

12:41.000 --> 12:43.000
Well, let's have a look.

12:43.000 --> 12:47.000
And I'm going to make this slightly smaller,

12:47.000 --> 12:49.000
just so that it's more clear.

12:49.000 --> 12:52.000
Otherwise it's going to be a little bit confusing.

12:53.000 --> 12:57.000
Maybe hopefully not.

12:57.000 --> 12:58.000
Okay.

12:58.000 --> 13:00.000
That's big enough.

13:00.000 --> 13:04.000
Once again, I'm not going to try to go into a lot of detail.

13:04.000 --> 13:08.000
What we can see, the first thing, is that the hottest method is the JMH-generated method.

13:08.000 --> 13:09.000
It's called.

13:09.000 --> 13:12.000
In the last one, we saw String.charAt was the hottest method.

13:12.000 --> 13:15.000
So obviously, the inlining that we achieved with the previous option,

13:15.000 --> 13:18.000
where we increased the budget to 40 nodes,

13:18.000 --> 13:22.000
is not as good as the inlining that the HotSpot JIT compiler can do.

13:22.000 --> 13:27.000
Obviously, HotSpot can see what is hot and can basically optimize the inlining.

13:27.000 --> 13:32.000
So we've got a lot more inlining happening here.

13:32.000 --> 13:36.000
And one of the things that is interesting to see as well is that,

13:36.000 --> 13:40.000
essentially what we see is this is a lot of inlined assembly.

13:40.000 --> 13:43.000
And then we get back to the bottom.

13:43.000 --> 13:47.000
And then basically after this there is a loop back, which we don't see here.

13:47.000 --> 13:51.000
Sometimes you see it, sometimes you don't, but there is a loop back up.

13:51.000 --> 13:54.000
So basically we run an iteration, we loop back up.

13:54.000 --> 13:58.000
Now, I have one more thing to show you today.

13:58.000 --> 14:04.000
AOT, we all expected that it was going to be slower than JIT.

14:04.000 --> 14:07.000
What about GraalVM PGO?

14:07.000 --> 14:12.000
So GraalVM PGO is a proprietary technology from Oracle.

14:12.000 --> 14:17.000
But it allows you to... basically it's called profile-guided optimization.

14:17.000 --> 14:21.000
The idea is you build your native binary

14:21.000 --> 14:23.000
with some instrumentation.

14:23.000 --> 14:26.000
Then you run it through your training or your benchmark or whatever.

14:26.000 --> 14:29.000
And then out of that you get some profiling data.

14:29.000 --> 14:30.000
You take that profiling data,

14:30.000 --> 14:35.000
and you use it to rebuild your native image with this data.

14:35.000 --> 14:40.000
And you basically have got something akin to a JIT,

14:40.000 --> 14:43.000
but basically you've done it offline with just some training.

14:43.000 --> 14:46.000
The question is, would that be faster than HotSpot or not?

14:46.000 --> 14:51.000
Who thinks HotSpot is going to be faster than PGO?

14:51.000 --> 14:54.000
Can you raise your hands?

14:54.000 --> 15:00.000
Two people. Who thinks PGO is going to be faster than HotSpot?

15:00.000 --> 15:03.000
Six or seven. Okay.

15:03.000 --> 15:05.000
Not much participation. Let's have a look.

15:05.000 --> 15:13.000
So here I have already prebuilt things ahead of this talk today, ahead of this presentation.

15:13.000 --> 15:16.000
So let's run it.

15:16.000 --> 15:19.000
This is running slightly differently now.

15:19.000 --> 15:22.000
Well, the first noticeable difference is the VM version has changed.

15:22.000 --> 15:26.000
We're running with Oracle GraalVM, that's the proprietary version.

15:26.000 --> 15:29.000
The VM invoker has slightly changed.

15:29.000 --> 15:34.000
But the change that you see on the screen in the VM invoker name is not so relevant.

15:34.000 --> 15:39.000
What is relevant is that this VM invoker initially is a PGO-instrumented invoker.

15:39.000 --> 15:46.000
What we do is, behind the scenes, we inject a warmup fork so that it runs on the instrumented binary.

15:46.000 --> 15:55.000
Then when that completes, we take the profiling data that comes out of the instrumentation and we rebuild the native image with that,

15:55.000 --> 15:58.000
which is what's happening right now here.

15:58.000 --> 16:07.000
When that completes, we basically execute the benchmark with the optimized native binary.

16:07.000 --> 16:11.000
And we start to see the numbers.

16:11.000 --> 16:16.000
And we see it takes about 1.4 nanoseconds per operation.

16:16.000 --> 16:21.000
Well, we focus on Latin-1.

16:21.000 --> 16:27.000
So I leave aside UTF-16 because I have only told it to run Latin-1.

16:27.000 --> 16:35.000
Now the question is why? Why is PGO faster than HotSpot? Let's have a look.

16:35.000 --> 16:43.000
So once again, I've done this a little bit ahead of time, so I don't have to repeat all the things here.

16:43.000 --> 16:48.000
But we can view the profiling data from PGO.

16:48.000 --> 16:53.000
This is going to take a little bit more exercise.

16:53.000 --> 17:00.000
But what we can... if we go to the very top, if I can, we can see it here.

17:00.000 --> 17:04.000
The hottest method is the JMH-generated code.

17:04.000 --> 17:07.000
But then something really different starts to happen.

17:07.000 --> 17:10.000
It gets to here. That's where things start to run.

17:10.000 --> 17:15.000
And it runs one time, two times, three times, four times.

17:15.000 --> 17:21.000
So what you see here is something that PGO does that HotSpot can't do today.

17:21.000 --> 17:26.000
PGO can unroll a loop that is not counted,

17:26.000 --> 17:32.000
a loop that checks a boolean value of whether it's done. It can unroll it.
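
NOTE
Editor's sketch, not part of the talk: a hand-written Java illustration of what unrolling a
non-counted loop by four could look like at source level, keeping the exit check after every copy
of the body. UnrollSketch and Control are hypothetical names; Control stands in for the volatile
flag and reports done after a fixed number of polls so the example is deterministic. This is an
assumption about the shape of the transformation, not the compiler's actual output.
```java
final class UnrollSketch {
    // Stand-in for the done flag: returns true once the given number of polls has been used up.
    static final class Control {
        private int pollsLeft;
        Control(int polls) {
            this.pollsLeft = polls;
        }
        boolean isDone() {
            return --pollsLeft <= 0;
        }
    }
    // Original shape: one operation, one exit check, one back-edge per iteration.
    static long rolled(Control c, Runnable op) {
        long ops = 0;
        do {
            op.run();
            ops++;
        } while (!c.isDone());
        return ops;
    }
    // Unrolled shape: four copies of the body per back-edge, each followed by its own exit check.
    static long unrolledByFour(Control c, Runnable op) {
        long ops = 0;
        while (true) {
            op.run();
            ops++;
            if (c.isDone()) break;
            op.run();
            ops++;
            if (c.isDone()) break;
            op.run();
            ops++;
            if (c.isDone()) break;
            op.run();
            ops++;
            if (c.isDone()) break;
        }
        return ops;
    }
}
```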

17:32.000 --> 17:39.000
Something that, from what I've heard talking to my fellow team members,

17:39.000 --> 17:44.000
in OpenJDK, the JIT can't do that in HotSpot yet.

17:44.000 --> 17:51.000
But still, PGO still keeps most of the original features, for example.

17:51.000 --> 17:58.000
This line here that you see there, that's essentially extracting the coder field out of a string

17:58.000 --> 18:00.000
and checking if it's Latin-1.

18:00.000 --> 18:04.000
How do I know the C field of a string is the coder?

18:04.000 --> 18:09.000
Well, I can look at... I can just use pahole to look at the String structure.

18:10.000 --> 18:14.000
And we see the coder is in field 12, so that's field C in hex.

18:14.000 --> 18:21.000
So this thing here is basically reading the coder, okay?

18:21.000 --> 18:28.000
This field number four, that's the byte array.

18:28.000 --> 18:36.000
Then this field four, we put it into a register, and obviously, before I go to the byte array,

18:37.000 --> 18:44.000
the coder goes into EBP; test BPL is basically checking whether the coder is Latin-1 or not.

18:44.000 --> 18:59.000
Then array R28 is basically then here, where are you going?

19:00.000 --> 19:03.000
That's the byte array, right?

19:03.000 --> 19:13.000
And then here is where we then eventually extract the char out of that.

19:13.000 --> 19:18.000
R13 is this one, so we extracted the array;

19:18.000 --> 19:22.000
basically we're taking the byte array value out of the string,

19:22.000 --> 19:28.000
putting it into R13, and here from R13 we basically extract at the index, the RBP index.

19:28.000 --> 19:32.000
So we see that the structure of the code is still pretty much the same.

19:32.000 --> 19:37.000
Still, the performance is slightly better.

19:37.000 --> 19:42.000
This is where we are at this stage of this investigation.

19:42.000 --> 19:46.000
Obviously, we're going to do more investigation on this to understand how the

19:47.000 --> 19:51.000
unrolling works, and whether the unrolling is the reason why the performance increased.

19:51.000 --> 19:52.000
Yep.

19:52.000 --> 19:57.000
Still, it means that Graal has optimized the benchmark,

19:57.000 --> 20:02.000
but not the code, not the charAt code.

20:02.000 --> 20:05.000
Well, this is the implementation of charAt.

20:05.000 --> 20:09.000
Yeah, but the method it optimized is the JMH-generated method.

20:09.000 --> 20:10.000
Yeah.

20:10.000 --> 20:13.000
And it has optimized a loop in that method.

20:13.000 --> 20:18.000
So that the coder's password, the character, not the coder.

20:18.000 --> 20:20.000
Yeah.

20:20.000 --> 20:24.000
And that's where you can add the CompilerControl annotations to tell it to never inline.

20:24.000 --> 20:29.000
And that's what I mean, the process of adding, I was starting in as we are no way.

20:29.000 --> 20:37.000
See, the reason why the JMH does the predicted is because we know that the compiler's regime is starting

20:38.000 --> 20:39.000
Yeah.

20:39.000 --> 20:41.000
But it is something that is interesting to know.

20:41.000 --> 20:45.000
It's something that we need to understand: whether the unrolling

20:45.000 --> 20:48.000
would be something that makes things faster or not.

20:48.000 --> 20:50.000
Still, it's something valuable.

20:50.000 --> 20:51.000
Lessons to learn.

20:51.000 --> 20:53.000
I feel out of that.

20:53.000 --> 20:58.000
Yeah.

20:58.000 --> 21:03.000
Obviously, things are still in progress.

21:03.000 --> 21:08.000
This is basically, as you can see, this is the first time we're speaking about this.

21:08.000 --> 21:12.000
So, this is still work in progress, but we're learning things.

21:12.000 --> 21:15.000
So, that's all I really had today.

21:15.000 --> 21:19.000
In my slides, I've got the things that I went through today,

21:19.000 --> 21:23.000
at more length, and some details on the assembly as well, more details.

21:23.000 --> 21:28.000
And I want to leave you with this slide, where you can see this way.

21:28.000 --> 21:31.000
I'm finishing with a couple of links.

21:31.000 --> 21:36.000
The first one is the repo where you can find the JMH extension

21:36.000 --> 21:39.000
to do what I've been doing today.

21:39.000 --> 21:42.000
We don't yet have a release of it.

21:42.000 --> 21:46.000
The license has already been agreed.

21:46.000 --> 21:50.000
We're using the same license as JMH, because obviously it heavily relies on JMH,

21:50.000 --> 21:53.000
which is GPL2 with the Classpath exception.

21:53.000 --> 21:56.000
But we haven't done a Maven release, for example.

21:56.000 --> 21:58.000
But you can still check it out.

21:58.000 --> 22:01.000
You can build it and there are instructions on how to use it.

22:01.000 --> 22:04.000
So, that's on this link.

22:04.000 --> 22:06.000
The other link is basically what I've done today.

22:06.000 --> 22:09.000
So, you can go through it on your own time.

22:09.000 --> 22:12.000
And with that, that's all I wanted to say.

22:12.000 --> 22:13.000
Questions, yeah?

22:13.000 --> 22:16.000
It's like how you launch the Maven release.

22:16.000 --> 22:18.000
But still, it's a Java Maven task.

22:18.000 --> 22:21.000
So, this is the magic of the JMH extension, or...

22:21.000 --> 22:23.000
Well, the thing is...

22:23.000 --> 22:29.000
The thing is, you need to understand,

22:29.000 --> 22:32.000
to understand what's going on, how JMH works.

22:32.000 --> 22:35.000
JMH is not a single Java process.

22:35.000 --> 22:38.000
Normally, JMH has got a Java process,

22:38.000 --> 22:41.000
which then launches the benchmarks in forks.

22:41.000 --> 22:45.000
There's no reason for that first process to be native.

22:45.000 --> 22:49.000
What we've done is what we are launching is native.

22:49.000 --> 22:52.000
What that means is that you get the same experience

22:52.000 --> 22:55.000
as you get normally with the Java version of JMH,

22:55.000 --> 22:57.000
which is like launching a jar.

22:57.000 --> 23:00.000
But underneath, instead of launching a Java process,

23:00.000 --> 23:02.000
what I'm launching is a native process.

23:02.000 --> 23:04.000
That's the method in your JMH.

23:04.000 --> 23:05.000
Yeah, yeah.

23:05.000 --> 23:07.000
It's a part of the pom.xml.

23:07.000 --> 23:11.000
Basically, I made sure that basically the same thing works.

23:11.000 --> 23:13.000
Yeah.

23:13.000 --> 23:15.000
More questions?

23:15.000 --> 23:18.000
Thank you.

23:18.000 --> 23:19.000
Thanks for attending.