WEBVTT

00:00.000 --> 00:27.000
Okay, all right, so thanks everyone. I'm Stefano, Stefano Maffulli, and I'm just going to give you a very quick rundown of the work that we've done at the Open Source Initiative for the past couple of years that allowed us to reach a conclusion, at least one conclusion, of what open source means in the AI space.

00:27.000 --> 00:36.000
And we started when we noticed that we were having issues with automatic decision-making systems that were making decisions and they were wrong.

00:36.000 --> 00:55.000
Like, this was one of the cases a couple of years ago: a father was taking a picture of his naked child to send it to his doctor to say, hey, what's going on with my kid, and the system caught that image, recognized it, and flagged it, correctly to some extent, as child nudity.

00:55.000 --> 01:00.000
But it also immediately triggered the child abuse process.

01:00.000 --> 01:06.000
And that was obviously, it was the case of one automatic tool making the wrong decision.

01:06.000 --> 01:10.000
Frankly, very visible for us in the technology space.

01:10.000 --> 01:14.000
We just stand in that corner because we can't run.

01:14.000 --> 01:18.000
Yes, let me move the laptop here.

01:18.000 --> 01:20.000
Okay, so I can see the slides actually.

01:20.000 --> 01:23.000
And so this is clearly a bug, right?

01:23.000 --> 01:31.000
We recognize it as a bug, but the next question immediately was, how do you fix a bug in a system like this one?

01:31.000 --> 01:36.000
See, the machine learning systems that we are talking about,

01:36.000 --> 01:39.000
that have become so popular in the past few years,

01:39.000 --> 01:44.000
They're not programmed. They're not written by humans. They're not deterministic. They don't.

01:44.000 --> 01:49.000
They don't have source code in the same sense that we are used to.

01:49.000 --> 01:55.000
And in fact, they learn automatically. They look like black boxes to some extent.

01:55.000 --> 02:00.000
So we didn't have an answer to the question: how do you fix a bug?

02:00.000 --> 02:02.000
How do you go back and what do you ask for?

02:02.000 --> 02:09.000
Because in the software space, especially in the early years of advocacy of free software inside the European Union,

02:09.000 --> 02:13.000
when we talked to public administrators, we were always saying, give us the source code.

02:13.000 --> 02:17.000
Ask for the source code. What is it that we need to do?

02:17.000 --> 02:22.000
We needed to take a step back, and we needed to do it very quickly.

02:22.000 --> 02:35.000
Because there were three major forces pushing for a sense of urgency on us to coordinate actions to come up with a definition of how to fix this issue.

02:35.000 --> 02:39.000
First, the technology was different.

02:39.000 --> 02:42.000
It's different from what we are used to.

02:42.000 --> 02:49.000
Then there is a lot of pressure from market actors.

02:49.000 --> 02:56.000
They are trying to use the term open source because we have been so good at making it successful and valuable,

02:56.000 --> 02:59.000
so that everyone wants to use it.

02:59.000 --> 03:05.000
even if it's applied to a new domain and it is violating some of the fundamental principles,

03:05.000 --> 03:08.000
at least at face value, as we recognized a few years ago.

03:08.000 --> 03:21.000
And then regulators were coming very quickly, asking for clarifications and writing laws without having the answers, or at least authoritative ones.

03:21.000 --> 03:26.000
So a year ago, we had a very clear idea of what an open source AI looks like.

03:26.000 --> 03:32.000
It's not really that difficult to expect that one needs to be able to use, study,

03:32.000 --> 03:35.000
modify and share whatever they have to see.

03:35.000 --> 03:38.000
Whatever system that they have in their hands.

03:38.000 --> 03:40.000
But there is a piece missing.

03:40.000 --> 03:47.000
Those familiar with the concept of the free software definition,

03:47.000 --> 03:53.000
they will have recognized that next to the freedoms to use and to study and modify,

03:53.000 --> 03:59.000
there is a short sentence saying you need to have access to source code.

03:59.000 --> 04:01.000
This is a precondition for that.

04:01.000 --> 04:06.000
And remember, we don't know what source code really means in the context of machine learning.

04:06.000 --> 04:17.000
The Open Source Definition, point number two, also has a reference to source code as a mandatory component: the program must include source code.

04:17.000 --> 04:23.000
And that point number two of the OSD says it must be the,

04:23.000 --> 04:28.000
it must be the preferred form used by a programmer to modify the program.

04:28.000 --> 04:30.000
So it cannot be obfuscated.

04:30.000 --> 04:37.000
So we needed to understand what the preferred form is that builders of AI use to modify the program.

04:37.000 --> 04:39.000
Or rather the system, the AI.

04:39.000 --> 04:42.000
So we put together a large collaboration.

04:42.000 --> 04:47.000
Over a hundred people have collaborated from 27 nations.

04:47.000 --> 04:51.000
33% from the global south.

04:51.000 --> 04:57.000
Over 50 volunteers have followed the co-design process.

04:58.000 --> 05:03.000
That started by defining what it is that we were talking about, which is an AI system.

05:03.000 --> 05:08.000
Using the OECD definition, an AI system is, to make it short,

05:08.000 --> 05:15.000
anything that infers an output based on input, with various degrees of independence.

05:15.000 --> 05:18.000
So we analyzed systems.

05:18.000 --> 05:20.000
First we started with five.

05:20.000 --> 05:25.000
We asked these volunteers, all experts, builders of AI,

05:25.000 --> 05:30.000
ethicists, philosophers, data scientists, etc.

05:30.000 --> 05:36.000
to answer the question: what are the components of the system that you need in order to use, study, modify, and share it?

05:36.000 --> 05:39.000
They evaluated, they gave us an answer.

05:39.000 --> 05:45.000
We drafted a longer, complete definition of what that preferred form of making a modification is.

05:45.000 --> 05:48.000
We expanded in a validation phase.

05:48.000 --> 05:51.000
We analyzed 13 systems at this point.

05:51.000 --> 05:56.000
Then we tried to see which ones were matching the definition and which ones were not.

05:56.000 --> 06:01.000
So we were trying to find whether we would catch some irregularities.

06:01.000 --> 06:04.000
Clearly, something like Llama, that is not respecting freedom,

06:04.000 --> 06:08.000
that is not releasing information about the training, etc.

06:08.000 --> 06:10.000
What was going on?

06:11.000 --> 06:14.000
In fact, that one did not pass the validation phase.

06:14.000 --> 06:21.000
Then we fine-tuned the text, and finally, at the end of October last year, we released version one.

06:21.000 --> 06:24.000
The version one of the definition is not surprising.

06:24.000 --> 06:26.000
It should not be surprising to any of you.

06:26.000 --> 06:29.000
The four freedoms have to be conveyed.

06:29.000 --> 06:34.000
That means that you must have access to the weights and parameters.

06:34.000 --> 06:35.000
So there is also the training.

06:35.000 --> 06:37.000
Everything that went into that training.

06:37.000 --> 06:42.000
So the code that was used for the training, the algorithms, etc.

06:42.000 --> 06:46.000
The machines and all the code used to build the training data set.

06:46.000 --> 06:49.000
Complete so that you can rebuild the data set.

06:49.000 --> 06:57.000
And finally, all the data used to train the system, unless, and there is a footnote, when legally impossible.

06:57.000 --> 07:00.000
I want you to focus on this.

07:00.000 --> 07:11.000
It is requiring the model weights, the complete code that built the machine, and the data, unless it is legally impossible.

07:11.000 --> 07:14.000
This is what is written in the definition.

07:14.000 --> 07:23.000
Do you have everything that you require to rebuild something that works that way without any restrictions?

07:23.000 --> 07:33.000
Now, there are some actors in the market that are really pushing for saying, well, AI is different.

07:33.000 --> 07:38.000
So we need to accept the fact that you will have to live with conditions.

07:38.000 --> 07:44.000
Every time these machines are so powerful, they are dangerous, they can kill us all and there is this rhetoric going around.

07:44.000 --> 07:47.000
They want us to accept that there are restrictions.

07:47.000 --> 07:50.000
Some restrictions are even inevitable.

07:50.000 --> 07:52.000
And that's frankly, that's a trap.

07:52.000 --> 07:55.000
This is not something we want to think about.

07:55.000 --> 08:06.000
And we're not against the fact that a system that is powerful and capable of breaking things should be regulated on deployment.

08:06.000 --> 08:16.000
But we cannot really expect that we turn developers into police or other actors going around and checking or evaluating a little where.

08:16.000 --> 08:21.000
And the deployment is what needs to be regulated, not the development.

08:22.000 --> 08:28.000
And they're also insisting on this idea that there are degrees of freedom.

08:28.000 --> 08:37.000
And that's like telling someone who's imprisoned, that they have freedom because they're not shackled to a wall or they can take an hour playing in the yard.

08:37.000 --> 08:40.000
Like, there's no freedom until you're out of the gate.

08:40.000 --> 08:43.000
And so there are degrees of restrictions for sure.

08:43.000 --> 08:45.000
There is an open source gate.

08:45.000 --> 08:48.000
Once you pass that gate, you have an explosion of freedom.

08:48.000 --> 08:51.000
None, not before that.

08:51.000 --> 08:58.000
So we have validated, that's another thing that we have done that I want you to focus on.

08:58.000 --> 09:11.000
The Open Source AI Definition version one that we have released comes with a list of systems that were evaluated during the process by the volunteers, the 100-plus co-designers.

09:11.000 --> 09:21.000
And the ones that pass are only the ones that are developed by research institutions, nonprofit groups and organizations.

09:21.000 --> 09:28.000
And the ones that don't usually are developed by corporations with commercial interests.

09:28.000 --> 09:38.000
And they sometimes release some pieces, some of the code, some of the materials, but not all that's required.

09:38.000 --> 09:46.000
Now, you may have heard some of the criticisms, like, why is the sensational source, why are you calling it open source?

09:46.000 --> 09:53.000
Why is the Open Source Initiative stretching into machine learning, since clearly it's not software?

09:53.000 --> 09:57.000
So why are you even going there? Why did you do this process this way?

09:57.000 --> 10:00.000
I couldn't speak, or I didn't know about it.

10:00.000 --> 10:02.000
These are all fair criticisms.

10:02.000 --> 10:17.000
I have addressed some of them at the very beginning, but the main thing here is: if not us, if not now, if not in this process, then who, how, and when should that have happened?

10:17.000 --> 10:25.000
And the other comment that you may have read is: why is there that footnote?

10:25.000 --> 10:54.000
And I get it. Why is there a footnote that says, unless legally impossible? Well, it's because why would we want to force something that is not legal, or put ourselves into a corner? Open data, the way we think about it, is only a small fraction of the possible available data.

10:54.000 --> 11:06.000
And other organizations, like the Free Software Foundation or the Software Freedom Conservancy, that have been criticizing us and commenting on this haven't come up with a solution either.

11:06.000 --> 11:23.000
So we have noticed that the possibilities here are either going with Meta, which is running TV ads telling us that Llama is open source and open source AI, or going with a definition that has had support.

11:23.000 --> 11:37.000
A wide collaboration happening over a longer period of time and that is open also for evolution as the technology and the legislation go.

11:37.000 --> 11:49.000
Now, there is another criticism that we have received, and that is that it's too restrictive. As you may have noticed, it puts a very high bar on the requirements.

11:49.000 --> 12:17.000
Some of the corporations have been arguing against the fact that, during the evaluation process and the validation process, we engaged mostly with academics. The corporations, although they were following the process, did not feel like they had the time to intervene

12:17.000 --> 12:34.000
and give authoritative comments from the corporations' perspective. So this is another area where we want to collect the feedback and think about the evolution of the definition towards version two in the future.

12:34.000 --> 12:55.000
Today what you can do is go to opensource.org/ai. We have published everything that is related to the process and how we got to the conclusion. There is a very detailed report that was sent to the board about who contributed and how; it's a very long document. There is also a very long research paper on data

12:55.000 --> 13:09.000
specifically. It's over 30 pages of research material that you can refer to in order to understand what the issues are when we talk about the petabytes of data that go into training.

13:09.000 --> 13:30.000
And the other thing that we ask you to do today is to help us correct Meta's authorities, like Mark Zuckerberg and LeCun, when they say that Llama is open source, because clearly it's not. It doesn't even go into the debate of whether it's an open source AI; it's just not an open source anything that they give us.

13:30.000 --> 13:37.000
And that's creating real problems for the whole ecosystem. And that's it.

13:40.000 --> 13:47.000
Do we have questions?

13:47.000 --> 13:57.000
Oh, sorry.

13:57.000 --> 14:15.000
Yeah.

14:15.000 --> 14:26.000
Check the rest of it and check the code requirements, the whole thing, right? You need to read the whole thing in one go and internalize the fact that that's what it is.

14:26.000 --> 14:29.000
That's what it is requiring.

14:29.000 --> 14:46.000
In practice, once you get into the code requirements, it says, first of all, that you need to be able to rebuild the data set. That's what the data information says, and the code requirement says it needs to be complete so that you can rebuild the data set.

14:46.000 --> 14:54.000
Yes, yes, we ran out of time. In October we needed to release version one.

