WEBVTT

00:00.000 --> 00:18.000
Okay, so a few words about the Small Device C Compiler, starting with a quick introduction,

00:18.000 --> 00:24.000
what it is, the recent progress, and our plans for the near future.

00:25.000 --> 00:32.000
So, this is a standard C compiler that supports all the relevant C standards, from ancient ANSI C,


00:32.000 --> 00:37.000
that is ANSI C89, also called ISO C90, to the current ISO C23.

00:37.000 --> 00:44.000
There are a few gaps, of course, here and there, but in general, nearly everything is there.

00:44.000 --> 00:48.000
And, as I said, everything relevant for such small devices is there.

00:49.000 --> 00:52.000
It can be a freestanding implementation, that's what you'll typically use,

00:52.000 --> 00:55.000
or it can be part of a hosted implementation.

00:55.000 --> 01:02.000
Supporting tools are included: an assembler, a linker, and simulators for the architectures we target.

01:02.000 --> 01:07.000
It works on many host systems, all your typical ones; let's say,

01:07.000 --> 01:13.000
whether you're using the Hurd or GNU/Linux or whatever, it will work.

01:13.000 --> 01:18.000
And we target a lot of 8-bit architectures, and have unusual optimizations that make sense for these architectures,

01:18.000 --> 01:22.000
that you won't find in LLVM or GCC.

01:22.000 --> 01:27.000
We just had a release a few days ago; we usually do yearly releases,

01:27.000 --> 01:29.000
and they're often at the beginning of the year.

01:29.000 --> 01:33.000
The user base consists basically of two groups.

01:33.000 --> 01:36.000
Embedded developers, who use it for small 8-bit embedded systems,

01:36.000 --> 01:42.000
and retrocomputing and retrogaming people, who use it for older 8-bit systems.

01:42.000 --> 01:46.000
And it's also often used via downstream projects,

01:46.000 --> 01:51.000
such as z88dk, GBDK, the Game Boy Development Kit, or devkitSMS.

01:51.000 --> 01:55.000
My personal background, how I got into it, is also Z80: a long time ago,

01:55.000 --> 01:58.000
I wrote games for the ColecoVision video game system,

01:58.000 --> 02:04.000
then got into the compiler, and later also did a little bit of games for Sega 8-bit systems.

02:04.000 --> 02:08.000
And since it was mentioned in the previous talk,

02:09.000 --> 02:13.000
LLVM also has a C backend; it's a bit outside the main project,

02:13.000 --> 02:16.000
but it's maintained by other people, and there are a few people who, for example,

02:16.000 --> 02:20.000
actually used Rust on the Game Boy, by using the LLVM Rust front end,

02:20.000 --> 02:24.000
compiling to the C backend, and feeding the resulting C code into SDCC,

02:24.000 --> 02:28.000
and then running that on a Game Boy.

02:28.000 --> 02:33.000
Okay, these are the architectures; the ones in the first part

02:33.000 --> 02:37.000
are mostly architectures relevant to today's embedded systems,

02:37.000 --> 02:42.000
though, depending on what you're doing, some of these are also older.

02:42.000 --> 02:48.000
The 6502 and its derivatives are also supported,

02:48.000 --> 02:53.000
that's a more recent port, one that probably many of you are interested in,

02:53.000 --> 02:56.000
but it's not something I personally worked on.

02:56.000 --> 03:02.000
However, we also have this huge family of Z80-related architectures.

03:02.000 --> 03:07.000
There's the original Z80, and the Z80N used in the ZX Spectrum Next.

03:07.000 --> 03:12.000
The Z180, the eZ80, and here's the TLCS-90,

03:12.000 --> 03:14.000
it's a relatively exotic one.

03:14.000 --> 03:16.000
It's also relatively old.

03:16.000 --> 03:20.000
There's the SM83, the CPU used in the Game Boy,

03:20.000 --> 03:26.000
then there are the Rabbits, which are still in relevant use for embedded systems,

03:26.000 --> 03:30.000
like the Z80, though some of them are also quite old,

03:30.000 --> 03:32.000
and there's the ASCII Corporation

03:32.000 --> 03:40.000
R800, used in some older Japanese systems.

03:40.000 --> 03:44.000
Okay, so let's introduce the project:

03:44.000 --> 03:48.000
it has been hosted on SourceForge since a long time ago,

03:48.000 --> 03:51.000
and it's still there; we have our ticket systems there,

03:51.000 --> 03:55.000
we have mailing lists, we have a repository,

03:55.000 --> 03:58.000
we have a Wiki for documentation,

03:59.000 --> 04:02.000
we have our own compile farm,

04:02.000 --> 04:04.000
that's basically distributed,

04:04.000 --> 04:06.000
among the SDCC developers,

04:06.000 --> 04:11.000
so every night, the current version is downloaded from our repository,

04:11.000 --> 04:13.000
gets compiled on a host system,

04:13.000 --> 04:17.000
and then compiles a large number of small test programs,

04:17.000 --> 04:19.000
and executes them on a simulator,

04:19.000 --> 04:21.000
and we check that when

04:21.000 --> 04:23.000
we run them, we get the correct result,

04:23.000 --> 04:25.000
and in some cases we check that when

04:25.000 --> 04:29.000
we compile, we get the correct error message at the line where we wanted it.

04:29.000 --> 04:32.000
It's mostly a volunteer project,

04:32.000 --> 04:35.000
but it has actually received some external support,

04:35.000 --> 04:39.000
that comes mostly from the embedded applications side,

04:39.000 --> 04:42.000
so the Prototype Fund and NLnet

04:42.000 --> 04:46.000
have actually given monetary support for certain features,

04:46.000 --> 04:48.000
such as improving standard compliance.

04:48.000 --> 04:53.000
Hardware vendors, if they support us, usually give us hardware,

04:53.000 --> 04:56.000
or give us extra documentation,

04:56.000 --> 04:58.000
and the DFG,

04:58.000 --> 05:01.000
that is special because it's a research organization,

05:01.000 --> 05:02.000
the German one,

05:02.000 --> 05:04.000
so if in compiler research,

05:04.000 --> 05:07.000
you have an idea for relevant algorithms,

05:07.000 --> 05:12.000
you probably want to do a reference implementation for publishing,

05:12.000 --> 05:15.000
well, they don't care if it's in,

05:15.000 --> 05:18.000
more, I mean,

05:18.000 --> 05:20.000
I've worked at a university,

05:20.000 --> 05:22.000
and I've done compiler research,

05:22.000 --> 05:25.000
and the reference implementation is,

05:25.000 --> 05:26.000
in the Z80

05:26.000 --> 05:27.000
backend of SDCC,

05:27.000 --> 05:28.000
it's still good enough,

05:28.000 --> 05:30.000
so technically you can, even for retrocomputing-

05:30.000 --> 05:32.000
relevant things,

05:32.000 --> 05:32.900
get funding

05:32.900 --> 05:35.000
for software.

05:35.000 --> 05:38.000
Okay, so what's happened recently?

05:38.000 --> 05:41.000
Well, struct and union parameters and return types,

05:41.000 --> 05:43.000
the same thing; that was, for a long time,

05:43.000 --> 05:46.000
the main missing standard feature,

05:46.000 --> 05:49.000
we did that a few years ago.

05:49.000 --> 05:58.000
Then, of course, another big improvement was the recent ISO C23 standard; we now support most of it.

05:58.000 --> 06:06.000
And a lot of it is relevant, particularly to such small devices, such as the bit-precise integer types or bit-fields of bit-precise integer types.

06:06.000 --> 06:12.000
They really help a lot with saving a few bits in your precious data memory.

06:13.000 --> 06:17.000
For example, a ColecoVision has one kilobyte of data memory.

06:17.000 --> 06:23.000
If you want your game's data structures in there, you often want to use bit-fields.

06:23.000 --> 06:29.000
And with bit-precise integer types and bit-fields of bit-precise integer types,

06:29.000 --> 06:35.000
a lot of that can be done even better than with the old plain bit-fields that we had in previous standards.

06:35.000 --> 06:50.000
And we actually already have a few bits of the future C2y standard, which will probably be ratified towards the end of this decade.

06:50.000 --> 06:54.000
Yeah, we have, oh, the 7th shouldn't be there.

06:54.000 --> 07:05.000
This is, of course, the 6502 and the R800 port; the R800 is a Z80 derivative.

07:05.000 --> 07:14.000
But the instruction set of the R800 is also a strict subset of the Z280 instruction set.

07:14.000 --> 07:23.000
So if you happen to want to target the Z280, you can use the R800 port and get somewhat more efficient code, in particular for multiplications.

07:23.000 --> 07:29.000
Yeah, we also improved our optimizations.

07:29.000 --> 07:34.000
For example, bit rotations are a reasonably common pattern in C code.

07:34.000 --> 07:37.000
You have a left shift and a right shift,

07:37.000 --> 07:39.000
such that the sum of the two shift amounts

07:39.000 --> 07:41.000
is the size of the thing that you're shifting.

07:41.000 --> 07:48.000
And then people either use bitwise OR on, or add, the two results when they want to rotate stuff.

07:48.000 --> 07:52.000
So we have some optimizations to recognize that, because many architectures have instructions for that.

07:52.000 --> 07:58.000
For example, the Z80N that is used in the ZX Spectrum Next.
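
NOTE
A minimal sketch of the rotation idiom described in the cues above, in plain C, for illustration only. The function names rotl8 and rotl16_add are hypothetical and not part of SDCC, and whether a given compiler version turns either form into a rotate instruction is an assumption.
```c
#include <stdint.h>
/* Rotate left: two shifts whose amounts sum to the operand width, combined with bitwise OR. */
static uint8_t rotl8(uint8_t x, unsigned n)
{
    n &= 7u;                      /* keep the shift amount within the width */
    if (n == 0u)
        return x;                 /* avoid a full-width shift */
    return (uint8_t)((x << n) | (x >> (8u - n)));
}
/* Same idiom with addition: the two shifted parts share no set bits, so + and | agree. */
static uint16_t rotl16_add(uint16_t x, unsigned n)
{
    n &= 15u;
    if (n == 0u)
        return x;
    return (uint16_t)(((uint32_t)x << n) + (uint32_t)(x >> (16u - n)));
}
```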

07:58.000 --> 08:08.000
There's also generalized constant propagation, where we can now propagate information about the possible values of a variable throughout the program much better.

08:08.000 --> 08:17.000
So if, for some reason, it's clear from the code that, let's say, the lower 8 bits of some variable are all 0, we can use that information better

08:17.000 --> 08:23.000
than we were able to before. And we've also changed the calling conventions, of course

08:23.000 --> 08:29.000
upsetting everyone in the process, because they had to adapt their hand-written assembler code.

08:29.000 --> 08:35.000
But still, I mean, historically we've passed a lot of things just on the stack.

08:35.000 --> 08:41.000
We're making better use of register parameters now to improve efficiency.

08:42.000 --> 08:44.000
So how does this reflect in code size?

08:44.000 --> 08:50.000
I've just put up a few small graphs of how code size developed over the releases.

08:50.000 --> 08:55.000
So I've used the Z80, since that's probably what many of you are interested in.

08:55.000 --> 09:01.000
And we see a general trend of code size going down over time.

09:01.000 --> 09:05.000
I mean, this is about 10 years of SDCC development.

09:06.000 --> 09:19.000
Dhrystone is a classic integer benchmark with a strong focus on C standard library string operations, which is also the main criticism people raise against it.

09:19.000 --> 09:28.000
Whetstone is technically a floating-point benchmark, and one might think that makes it uninteresting, but these architectures typically don't have hardware floating point.

09:28.000 --> 09:32.000
And floating point is implemented by doing a lot of bit shifting, bitwise OR, and so on.

09:32.000 --> 09:36.000
So exactly the stuff that everyone's doing on such small architectures.

09:36.000 --> 09:45.000
So a floating-point benchmark is actually a reasonably good representation of what many people are doing, even if they're not doing floating point.

09:45.000 --> 09:49.000
And here we see a similar situation for CoreMark.

09:49.000 --> 09:57.000
It's a benchmark introduced by people criticising Dhrystone for being too biased; then they came up with something even more biased.

09:58.000 --> 10:01.000
For code size, like I'm using it here, it's okay.

10:01.000 --> 10:06.000
But in the end, the benchmark does matrix multiplications in its inner loop.

10:06.000 --> 10:16.000
And if you are doing matrix multiplications somewhere on these small architectures, then the matrix multiplication in the end is all that matters for the speed that you get.

10:16.000 --> 10:22.000
But for code size, which is what I'm using it for here, it's kind of okay.

10:22.000 --> 10:25.000
Okay, so that's what happened so far.

10:25.000 --> 10:29.000
Let's see what we want to do with SDCC,

10:29.000 --> 10:30.000
and what I hope we can do.

10:30.000 --> 10:31.000
We'll see.

10:31.000 --> 10:40.000
Okay, so in standard compliance, we want to complete the standard support, mostly from C99 to C23.

10:40.000 --> 10:43.000
We are still missing compound literals.

10:43.000 --> 10:47.000
Our attribute support is still incomplete.

10:47.000 --> 10:52.000
Our support for atomics is incomplete, and we don't really support double and long double.

10:53.000 --> 11:00.000
Basically, if you use them, you get a warning saying that what you're getting is essentially a float in the background.

11:00.000 --> 11:13.000
And of course, more C2y support, even though at this point it's of course still very unclear what C2y will look like.

11:13.000 --> 11:19.000
So we will introduce additional ports to better support the Rabbits.

11:19.000 --> 11:26.000
As I mentioned before, it's a Z80-inspired architecture for microcontrollers.

11:26.000 --> 11:35.000
For the Rabbits and the eZ80 we will introduce far pointers, because those architectures are all still, like the Z80, essentially

11:35.000 --> 11:45.000
16-bit architectures, but they have some mechanisms to access data beyond 16 bits without going through explicit bank switching.

11:45.000 --> 11:49.000
Rather, you have some special instructions that allow you to say:

11:49.000 --> 11:52.000
Okay, here's a 24-bit address, there's a 16-bit address.

11:52.000 --> 11:56.000
That's still a lot less efficient than operating within your 16-bit address space,

11:56.000 --> 12:03.000
but a bit more efficient than bank switching, so we are introducing a named address space to do that.

12:03.000 --> 12:11.000
And for all the Z80-related targets, we want to introduce pointers into the I/O address space,

12:11.000 --> 12:15.000
because on the Z80 it's actually possible.

12:15.000 --> 12:20.000
I mean, for the Game Boy it's of course not relevant, because the I/O is mapped into the normal address space,

12:20.000 --> 12:28.000
and you can just use normal pointers; you want a volatile-qualified target there

12:28.000 --> 12:36.000
if you're doing I/O. But on a normal Z80, I/O is done through the IN and OUT instructions, special I/O instructions.

12:36.000 --> 12:42.000
But the address can be taken from a register, so it's possible to implement this as pointers.

12:42.000 --> 12:57.000
And we also want to introduce improved optimizations, because of course, on those architectures you're constrained in terms of memory, both program memory and data memory.

12:57.000 --> 13:00.000
Something users have been asking for a long time.

13:00.000 --> 13:03.000
is link-time elimination of unused objects and functions.

13:03.000 --> 13:08.000
For some reason, some users love to put functions into their C source code

13:08.000 --> 13:13.000
that they don't actually use, and then expect us to optimize them out.

13:13.000 --> 13:17.000
Currently, we can only do that if it's in a library.

13:17.000 --> 13:20.000
So unused library functions, okay, that works.

13:20.000 --> 13:26.000
But normal object files are treated differently: everything in normal object files ends up in the final binary.

13:26.000 --> 13:30.000
So we hope we'll get to that too.

13:30.000 --> 13:32.000
We have to optimize that too.

13:32.000 --> 13:37.000
That is, when a spilt local variable is in non-stack memory. That's not relevant to the Z80,

13:37.000 --> 13:40.000
because the Z80 has reasonably efficient stack access.

13:40.000 --> 13:49.000
But for the 6502, for example, SDCC by default puts local variables at fixed addresses, which means functions are non-reentrant

13:49.000 --> 13:52.000
by default, unless you tell the compiler you want them to be reentrant.

13:52.000 --> 13:58.000
That's not standard compliant, but stack access is less efficient on such architectures.

13:58.000 --> 14:02.000
So most people prefer efficient access rather than reentrancy.
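
NOTE
A small illustration, in portable C, of why locals at fixed addresses make a function non-reentrant. It only emulates the effect with a static variable; it is not SDCC output, and the names fact_fixed and fact_stack are hypothetical.
```c
/* 'saved' stands in for a local placed at a fixed address: one shared copy for all calls. */
static unsigned fact_fixed(unsigned n)
{
    static unsigned saved;
    unsigned sub;
    saved = n;
    if (n <= 1u)
        return 1u;
    sub = fact_fixed(n - 1u);     /* the nested call overwrites 'saved' */
    return saved * sub;           /* 'saved' no longer holds this call's n */
}
/* With a stack-allocated local, every call has its own copy, so recursion works. */
static unsigned fact_stack(unsigned n)
{
    unsigned saved = n;
    if (n <= 1u)
        return 1u;
    return saved * fact_stack(n - 1u);
}
```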

14:02.000 --> 14:11.000
And currently these local variables are not allocated as memory-efficiently as we are doing it on the stack.

14:11.000 --> 14:18.000
And for the Z80-related targets, we also want to handle global variables better,

14:18.000 --> 14:25.000
because a lot of the work went into dealing with local variables, register allocation, and stack stuff,

14:25.000 --> 14:29.000
while for global variables we only have a very rough, heuristic-based approach:

14:29.000 --> 14:31.000
Okay, can we point IY to it?

14:31.000 --> 14:33.000
Well, let's do that.

14:33.000 --> 14:36.000
Well, if not, then we point HL to it.

14:36.000 --> 14:43.000
And so on, while in reality it would often be more efficient to choose, depending on the operand type,

14:43.000 --> 14:50.000
depending on the value of the registers, and so on, whether to use IY or HL, or even the direct addressing modes of the Z80.

14:50.000 --> 14:57.000
Typically, direct addressing modes will be faster if it's actually just an 8-bit operand.

14:57.000 --> 15:02.000
And actually that's the end of my talk today, and maybe we'll even have time for questions.

15:02.000 --> 15:03.000
No.

15:03.000 --> 15:12.000
Thank you.

15:12.000 --> 15:20.000
So your set of architectures has all types of Z80 variants, but not the ancestors.

15:20.000 --> 15:23.000
The 8080 and 8085 is what I mean.

15:23.000 --> 15:25.000
Is that for technical reasons?

15:25.000 --> 15:27.000
Well, it's just that no one did it.

15:27.000 --> 15:34.000
At one point, there was an attempt to make an 8080 port, but I think it never got finished.

15:34.000 --> 15:40.000
And then, of course, when you have the Z80, it's usually easy to add support for an extension of the Z80,

15:40.000 --> 15:45.000
because you can just add special cases that use the additional instructions,

15:45.000 --> 15:50.000
and you get correct code anyway, even when the new ones are not used yet, while leaving instructions out is harder.

15:50.000 --> 15:56.000
In that sense, the Game Boy, which is a restricted Z80 variant, because it doesn't have the index registers

15:56.000 --> 16:04.000
and other parts of the Z80 instruction set, was definitely harder to add support for than those extensions of the Z80.

16:04.000 --> 16:22.000
How easy is it to support a completely new architecture?

16:22.000 --> 16:29.000
Well, that actually depends on the architecture.

16:29.000 --> 16:36.000
Now, basically, to support a completely new one, not related to an existing one,

16:36.000 --> 16:43.000
you need to write your code generation stage, and you probably also want peephole optimizer rules;

16:43.000 --> 16:50.000
that's optional, but it's an optimisation people will expect, especially people who are experienced

16:50.000 --> 16:52.000
with assembly programming.

16:52.000 --> 17:02.000
And, of course, there are certain rules of thumb, depending on the architecture, for whether it makes more sense to make an SDCC backend versus an LLVM or GCC backend.