WEBVTT

00:00.000 --> 00:29.000
Thank you so much. Good afternoon everyone. Thank you for being here on a Saturday at 5 p.m.

00:29.000 --> 00:36.000
Today I will present some ongoing work from my PhD. I tried to give it a bit of a title that draws attention.

00:36.000 --> 00:43.000
Let me know if you think the title is completely bogus and I will put you in the acknowledgments of my next paper.

00:43.000 --> 00:52.000
So what I study is data donation and as part of my ongoing research, I get to talk to lots of people about how they see data.

00:52.000 --> 00:56.000
Most of the time you hear things that you're probably already well aware of.

00:56.000 --> 01:02.000
Like, data is the new oil, data is like the sun, but one that I particularly like is data is the new soil.

01:02.000 --> 01:08.000
I like soil because it has something generative about it, something natural, something that we share, something that's just there.

01:08.000 --> 01:17.000
And when I put "data is the new soil" into DALL-E 3, because I also like to play with generative AI, it came out with this beautiful picture.

01:18.000 --> 01:25.000
I find it very ironic, because what I study is voluntary data sharing: data that people share for a public interest purpose.

01:25.000 --> 01:34.000
And as we all know, this machine wasn't fed with the same type of data that I am trying to learn more about.

01:34.000 --> 01:36.000
So hopefully that sets the scene.

01:36.000 --> 01:43.000
But also, before we dive into the content, I would like to introduce myself a bit more. It's my first time at FOSDEM.

01:43.000 --> 01:44.000
First time?

01:44.000 --> 01:46.000
Oh yeah, okay fun.

01:46.000 --> 01:55.000
I'm a PhD candidate in innovation studies at the Copernicus Institute of Sustainable Development at Utrecht University in the Netherlands.

01:55.000 --> 02:00.000
And what we try to do is find out how data can help make the world a better place.

02:00.000 --> 02:10.000
And in that sense, I'm looking at data donation as a method for us all to get more engaged in science and also to find new solutions based on what we already know and already have.

02:11.000 --> 02:15.000
My academic background is in economics and financial law.

02:15.000 --> 02:18.000
Actually, financial law, specialized in insolvency.

02:18.000 --> 02:24.000
I've always been very interested in this idea of failure and what we do after.

02:24.000 --> 02:31.000
After my master's I started to work a bit. I worked in the construction industry, on public-private financing.

02:31.000 --> 02:36.000
So we have to think of our bridges: who's going to do the insurance for that? Very rewarding work,

02:36.000 --> 02:41.000
if anyone is ever interested in doing something other than FOSDEM-related things.

02:41.000 --> 02:51.000
And afterwards I started teaching tax law at the University of Applied Sciences in Amsterdam, where I was involved in a minor called "The Internet Is Broken and We're Going to Repair It".

02:51.000 --> 02:54.000
There's a bunch of different students from different disciplines.

02:54.000 --> 02:59.000
And we're just seeing, okay, what is wrong about the internet and what can we do to fix it?

02:59.000 --> 03:06.000
So, returning to this idea of repair, repair, repair: I wanted to do something with that in my PhD.

03:06.000 --> 03:09.000
I am a bit of a sentimental person I have to admit.

03:09.000 --> 03:14.000
Also today is officially the second anniversary of my PhD.

03:14.000 --> 03:16.000
I started two years ago, I'm halfway.

03:16.000 --> 03:19.000
So for me, it's also like looking back a bit this evening.

03:19.000 --> 03:22.000
So my laptop is also in need of repair.

03:22.000 --> 03:24.000
And I wasn't done with the slides.

03:24.000 --> 03:26.000
So I had to ask for a lot of help.

03:26.000 --> 03:27.000
Thank you, Victor.

03:27.000 --> 03:33.000
Because this is his laptop, and I turned my PowerPoint, which is not free and open source,

03:33.000 --> 03:38.000
so really, it is my first time at FOSDEM, into a PDF.

03:38.000 --> 03:48.000
And I'd also like to thank Zaz Akunitler, a good friend of mine, who took the time to turn my PowerPoint into a PDF, because I do not have a laptop.

03:48.000 --> 03:50.000
My laptop is in need of repair.

03:50.000 --> 03:53.000
My presentation was in need of repair.

03:53.000 --> 03:59.000
The way I'm seeing repair is really all the steps in between breakdown and the solution.

03:59.000 --> 04:03.000
But when we talk about digital repair, I would like to focus on some more common practices.

04:03.000 --> 04:06.000
Because when I was making the slides, I did not know that my laptop wouldn't work.

04:06.000 --> 04:08.000
So it's just something I'm putting in now.

04:08.000 --> 04:14.000
But you can also think of Wikipedia moderation or community-led software maintenance as sites of digital repair.

04:14.000 --> 04:16.000
Something goes wrong.

04:16.000 --> 04:20.000
People come together and try to fix it in one way or another.

04:21.000 --> 04:25.000
So how is data donation a site of repair?

04:25.000 --> 04:29.000
This is more of a working question, something I'm asking myself at this point.

04:29.000 --> 04:34.000
Because when you look at breakdown, what could breakdown mean in the case of data donation?

04:34.000 --> 04:36.000
In the case of the laptop, it is clear.

04:36.000 --> 04:39.000
It is not turning on when I put in the power charger.

04:39.000 --> 04:41.000
It doesn't come on with the right light.

04:41.000 --> 04:44.000
So it's very clear when it works and when it doesn't.

04:44.000 --> 04:48.000
When we look at data donations or the voluntary data sharing that I'm talking about,

04:49.000 --> 04:51.000
the breakdown I have identified so far,

04:51.000 --> 04:54.000
with a question mark, is twofold.

04:54.000 --> 04:57.000
First of all, it's the lack of access when it comes to data sharing.

04:57.000 --> 05:01.000
So there are a few large parties, which I will not name,

05:01.000 --> 05:08.000
which have more control over and access to data than other parties may have.

05:08.000 --> 05:11.000
And also there's a case of mistrust.

05:11.000 --> 05:15.000
Who do we share our data with, personal data more specifically?

05:15.000 --> 05:16.000
To repair.

05:16.000 --> 05:19.000
There are different ways of dealing with this problem.

05:19.000 --> 05:24.000
And one of them is building alternative infrastructures.

05:24.000 --> 05:28.000
And another one is to formalize trust through policy.

05:28.000 --> 05:34.000
Think of new logos, new standards, new protocols, to make sure that now all of a sudden there...

05:34.000 --> 05:36.000
This was called the API age.

05:36.000 --> 05:40.000
However, due to either exorbitant costs or, quite simply,

05:40.000 --> 05:41.000
it not being allowed anymore,

05:41.000 --> 05:44.000
these APIs are not being shared with researchers anymore.

05:44.000 --> 05:51.000
We have now entered the post-API age, and we need to find different ways of accessing the same data.

05:51.000 --> 05:58.000
In the EU, we have the GDPR which really allows us all to ask for our personal data.

05:58.000 --> 06:02.000
And these researchers have gone on with it to see, okay, what can it do?

06:02.000 --> 06:06.000
What do you actually need from the law in order for our research tools to work?

06:06.000 --> 06:10.000
But also what are the alternatives to APIs if it's not data donation?

06:10.000 --> 06:13.000
So for example, screen tracking.

06:13.000 --> 06:17.000
When we look at Port, they have this on their website, very lovely.

06:17.000 --> 06:19.000
It gives you an idea of how it works.

06:19.000 --> 06:20.000
I'll just walk you through it.

06:20.000 --> 06:23.000
So first, you need informed consent of the participant.

06:23.000 --> 06:27.000
So the so-called data donor, that could be any of us who has an account somewhere.

06:27.000 --> 06:32.000
You can just go to the website and request your personal data, and it comes in a nice little package.

06:32.000 --> 06:35.000
Then this data is what they call digital trace data,

06:35.000 --> 06:39.000
because everything you do online on these platforms, they just keep track of it.

06:39.000 --> 06:43.000
It is first stored locally on your own device.

06:43.000 --> 06:44.000
That could be your phone.

06:44.000 --> 06:45.000
It could be your laptop.

06:45.000 --> 06:51.000
And then, using Port, you can select which data points you would like to share with the researchers.

06:51.000 --> 06:57.000
And once you've shared it, the researcher has it and they can work with it themselves.
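
NOTE
A minimal sketch of the selective-sharing step described above, assuming the
data package is a plain dictionary stored on the donor's device. The function
name, keys and values are hypothetical and are not taken from Port itself.
```python
# Hypothetical sketch: none of these names come from Port.
from typing import Any
def select_for_donation(package: dict[str, Any], consented_keys: set[str]) -> dict[str, Any]:
    """Return only the entries of a local data package that the donor agreed to share."""
    return {key: value for key, value in package.items() if key in consented_keys}
# A made-up data package as it might sit on the donor's own device.
local_package = {
    "watch_history": ["Title A", "Title B"],
    "search_history": ["query one"],
    "profile_email": "donor@example.org",
}
# The donor selects only the watch history; everything else stays on the device.
donation = select_for_donation(local_package, {"watch_history"})
print(donation)
```
The point of the design is that filtering happens locally, so the researcher
only ever receives the selected entries.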

06:57.000 --> 07:01.000
There are some nice examples of what Port can do.

07:01.000 --> 07:03.000
These are just four of them.

07:03.000 --> 07:07.000
If you go to the website datadonation.eu, you can find many more examples.

07:07.000 --> 07:11.000
For example, you can see what is going on when people watch Netflix.

07:11.000 --> 07:14.000
Netflix is famous for not exactly sharing their viewership numbers.

07:14.000 --> 07:16.000
It's anything but open.

07:16.000 --> 07:18.000
But if you get lots of people to share their data,

07:18.000 --> 07:22.000
we can actually get a sense of what's going on on these platforms.

07:22.000 --> 07:26.000
One that I also really like is mapping the digital food environment using YouTube.

07:26.000 --> 07:28.000
I thought that was very original and creative.

07:28.000 --> 07:32.000
Something that I didn't expect to find in data donation research.

07:32.000 --> 07:34.000
But there's a lot going on in this field.

07:34.000 --> 07:38.000
But it's also something that we can all participate in if we'd like to.

07:38.000 --> 07:43.000
Because I assume most of us have an account at one platform or another.

07:43.000 --> 07:50.000
So you can always check out which projects are going on and where you would like to donate your data.

07:50.000 --> 07:55.000
However, the research also identified that this isn't going perfectly.

07:55.000 --> 08:01.000
And I guess it might be particularly relevant for them to find out where the issues are when it comes to

08:02.000 --> 08:05.000
the type of data packages that we get.

08:05.000 --> 08:10.000
So we see, for example, that most of the issues come from incomplete data packages.

08:10.000 --> 08:12.000
So some information is missing.

08:12.000 --> 08:17.000
And this comes directly from the platform provider, because they are not sharing it.
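
NOTE
A small sketch of how a researcher might flag an incomplete data package,
assuming a package is a dictionary of named sections. The section names and
the helper function are invented for illustration only.
```python
# Hypothetical sketch: the expected section names are invented for illustration.
def missing_sections(package: dict, expected: set[str]) -> set[str]:
    """Return the expected sections that are absent or empty in a donated data package."""
    return {name for name in expected if not package.get(name)}
expected_sections = {"watch_history", "search_history", "ratings"}
received_package = {"watch_history": ["Title A"], "search_history": []}
gaps = missing_sections(received_package, expected_sections)
print(sorted(gaps))
```
An empty section is treated the same as a missing one here, which is an
assumption; a real study would need to decide whether empty means "no
activity" or "not provided".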

08:17.000 --> 08:20.000
This might not have been an issue if we were not aware of it.

08:20.000 --> 08:30.000
So as data donation researchers are doing this work to replace a way of doing research that has been blocked by platforms,

08:30.000 --> 08:38.000
we're still finding that the platforms are not doing everything they're supposed to do. [...] players and actors to ensure there's some trust and transparency going on.

08:38.000 --> 08:41.000
This is something I mapped out in the previous paper.

08:41.000 --> 08:47.000
You can recognize the data altruism organization because they have one of those logos up there.

08:47.000 --> 08:48.000
It's very recognizable.

08:48.000 --> 08:53.000
So far, there's only one organization that has registered.

08:53.000 --> 09:01.000
But anyone can register as long as it's a non-profit that has a sole aim of being an intermediary for data for some public interest purpose.

09:01.000 --> 09:06.000
The public interest purposes have to be aligned with some national interest.

09:06.000 --> 09:13.000
But as you can see, the first one, which I've drawn a little circle around, is scientific research.

09:13.000 --> 09:18.000
So this is also a great opportunity for those who are interested in opening up research a bit more,

09:18.000 --> 09:25.000
To make sure that it's attractive for other people and that people know that it's trustworthy and it's transparent.

09:25.000 --> 09:31.000
It also has the compliance and trust from the EU.

09:31.000 --> 09:38.000
The first data altruism organization we have is Datalog, and it is based in Spain.

09:38.000 --> 09:46.000
What Datalog does is try to make sure that users get to know more about how they use their data.

09:46.000 --> 09:55.000
So people in Barcelona can give their household data, so energy, water consumption, all that type of data,

09:55.000 --> 10:04.000
to Datalog, and in return they get a personalized report saying, okay, this is how your use differs or could be better managed.

10:04.000 --> 10:10.000
And then they use that data for all other types of scientific purposes.

10:10.000 --> 10:14.000
And the last one we have is the social economy code of conduct.

10:14.000 --> 10:20.000
So what we have here is a project that I was involved in. This is me; sometimes I wear a suit.

10:20.000 --> 10:27.000
And we were working on a code of conduct together with all these different social economy actors to find out what's going on.

10:27.000 --> 10:30.000
There we did a lot of what we call articulation work within repair studies.

10:30.000 --> 10:34.000
So making very explicit what is going on, what are we trying to do?

10:34.000 --> 10:40.000
We made a list of values, we made a checklist for all these organizations and potential data donors.

10:40.000 --> 10:46.000
But also we made a list of 30 examples of best practices or organizations that are doing well.

10:46.000 --> 10:51.000
Some of them are even here today at FOSDEM, such as Open Food Facts.

10:51.000 --> 10:54.000
So this isn't anything new.

10:54.000 --> 11:00.000
But when we were doing all of this work, we weren't talking to any developers. As in the previous presentation,

11:00.000 --> 11:04.000
we also have different circles and they don't really overlap.

11:04.000 --> 11:13.000
What I'm trying to say here is that there's lots going on in the EU, and there are lots of participatory activities and co-creation sessions.

11:13.000 --> 11:19.000
So if there's anyone here who has any idea on how to make sure that openness is also guaranteed in the systems,

11:19.000 --> 11:23.000
feel free to join. Feel free to talk to me after the session.

11:23.000 --> 11:26.000
And we can see what's possible.

11:27.000 --> 11:30.000
So that brings me to the end of my presentation.

11:30.000 --> 11:36.000
I have some discussion questions, but these are things that I find interesting because, once again, I'm working on my PhD.

11:36.000 --> 11:38.000
If you'd like to think along, that would be great.

11:38.000 --> 11:43.000
But if you have any other questions right now, I will be very happy to answer them.

11:43.000 --> 11:44.000
Thank you.

11:44.000 --> 11:51.000
Yes.

11:51.000 --> 11:55.000
I used the early parts, like I've noticed, because I apologize.

11:55.000 --> 11:59.000
It only was a bit of a donation available to me.

11:59.000 --> 12:02.000
What about the common European data spaces?

12:02.000 --> 12:05.000
And that's making available a lot of them.

12:05.000 --> 12:08.000
They're particularly hard and they use.

12:08.000 --> 12:17.000
And the AI also impregnages the people out of development and data storage centers to be able to use that.

12:17.000 --> 12:23.000
Is this done in collaboration with them? How do you work with that?

12:23.000 --> 12:25.000
Repeat the question.

12:25.000 --> 12:26.000
Thank you so much.

12:26.000 --> 12:31.000
The gentleman was asking: what is the relationship between this whole data donation work

12:31.000 --> 12:35.000
and the ongoing effort for the European data spaces?

12:35.000 --> 12:38.000
When we look at, for example, this code of conduct.

12:38.000 --> 12:40.000
There's a lot of ongoing work right now.

12:40.000 --> 12:43.000
So it isn't very clear how everything relates to each other.

12:43.000 --> 12:46.000
Actually, we're in the process of laying all of that down.

12:46.000 --> 12:49.000
So here in this document, we do try to make a link between that,

12:49.000 --> 12:55.000
especially promoting this work at the European Data Space Support Center.

12:55.000 --> 12:59.000
But because it's at this stage where lots of things are being formed,

12:59.000 --> 13:09.000
it's actually quite attractive if you have any ideas to also just send them an email.

13:09.000 --> 13:18.000
Are there any open or ongoing projects which aim to make it easier to donate data,

13:18.000 --> 13:24.000
basically to make it easier to donate the data by just having an online questionnaire,

13:24.000 --> 13:28.000
like what's your name, where you live,

13:28.000 --> 13:37.000
do you rent or own your apartment, or whatever, just for them to identify your accounts for you

13:37.000 --> 13:44.000
and then ask for that data on your behalf, thus allowing them to download it automatically

13:44.000 --> 13:48.000
based on a series of simple questions, rather than going through

13:48.000 --> 13:56.000
the process of asking each individual potential data holder what data they hold

13:56.000 --> 13:59.000
and then forwarding it?

13:59.000 --> 14:04.000
So, if I understand the question correctly,

14:04.000 --> 14:06.000
the question, shortened, is:

14:06.000 --> 14:11.000
Is there a central point where a potential data donor can volunteer

14:11.000 --> 14:16.000
and then they're given a set of questions so that it can be identified where their data may go?

14:16.000 --> 14:22.000
No, at this point I am not aware of any such point, but if you think that's something you could build,

14:22.000 --> 14:24.000
that would be amazing and we can talk about it.

14:24.000 --> 14:27.000
Yes?

14:27.000 --> 14:30.000
Yes?

14:30.000 --> 14:32.000
Sorry.

14:32.000 --> 14:42.000
So when it comes to data sharing, a problem that I see is that if you go for an opt-in approach,

14:42.000 --> 14:46.000
that would give you a massive selection bias.

14:46.000 --> 14:55.000
I don't know how it's currently done, but I was also thinking.

14:55.000 --> 15:01.000
I think a lot of tracking uses cookies which are stored locally.

15:01.000 --> 15:09.000
So I think it would also be interesting to look at maybe having an approach

15:09.000 --> 15:18.000
where trackers that are in the form of cookies that are locally installed by the big data

15:18.000 --> 15:30.000
miners to give users the option to also send that same information centrally.

15:30.000 --> 15:41.000
So when it comes to a selection bias, is there already a discussion on whether it's opt-in or opt-out?

15:41.000 --> 15:46.000
Because I think opt-in would have a huge selection bias.

15:46.000 --> 15:52.000
So the question is about selection: if you consider an opt-in system instead of an opt-out system,

15:52.000 --> 15:56.000
how do we deal with the possible selection bias that arises?

15:56.000 --> 16:01.000
Well, when it comes to data and the ways it's being shared and used, especially in algorithmic systems,

16:01.000 --> 16:04.000
there is already a huge bias.

16:04.000 --> 16:08.000
People from different groups of the population share their data at different rates,

16:08.000 --> 16:13.000
which unfortunately does cause adverse effects that we didn't plan beforehand.
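
NOTE
A sketch of post-stratification weighting, one common way to partly correct
for groups donating data at different rates. This is not a method the speaker
describes; the group names and all numbers below are made up for illustration.
```python
# Hypothetical sketch: group names and shares are made up, not results from any study.
def poststratification_weights(sample_counts: dict[str, int], population_shares: dict[str, float]) -> dict[str, float]:
    """Weight each group by its population share divided by its share among donors."""
    total = sum(sample_counts.values())
    return {group: population_shares[group] / (count / total) for group, count in sample_counts.items()}
donors_per_group = {"under_30": 60, "30_to_60": 30, "over_60": 10}
population = {"under_30": 0.25, "30_to_60": 0.45, "over_60": 0.30}
weights = poststratification_weights(donors_per_group, population)
print(weights)
```
Under-represented groups receive weights above 1 and over-represented groups
weights below 1. Weighting of this kind only corrects for the groups you can
observe, so it reduces rather than removes selection bias.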

16:13.000 --> 16:19.000
When we look at an opt-in or opt-out system, when you compare for example to organ donation,

16:19.000 --> 16:24.000
then when you go for an automatic opt-out system, participation already goes much higher.

16:24.000 --> 16:29.000
However, when it comes to data, the purposes can be much wider than with organ donation.

16:29.000 --> 16:35.000
An organ can only be used at a specific place in one person's body, whereas data could be used for a public interest purpose,

16:35.000 --> 16:38.000
but also for a private, profit-driven purpose.

16:38.000 --> 16:43.000
So because of the nature of data, we cannot go for an opt-out system in the same way,

16:43.000 --> 16:47.000
but we can ask centralized servers, think of, for example,

16:47.000 --> 16:51.000
internet service providers or anything else, what they do with, specifically,

16:51.000 --> 16:58.000
non-personal or anonymized data, to see if we can make more or better use of data.

16:58.000 --> 17:01.000
Well, what I'm proposing is...

17:01.000 --> 17:02.000
Yeah.

17:08.000 --> 17:09.000
Okay.

17:09.000 --> 17:10.000
Let's have a good one again.

17:17.000 --> 17:19.000
Thank you.

