WEBVTT

00:00.000 --> 00:17.000
So I'm halfway between academia and industry, so half of the time I'm in academia,

00:17.000 --> 00:21.000
so the talk will also be halfway.

00:21.000 --> 00:27.000
So the general idea is that there is, like, a lost chapter of computer science: data

00:27.000 --> 00:29.000
synchronization in our apps.

00:29.000 --> 00:35.000
We use like typical commodity data structures like maps, vectors, whatever,

00:35.000 --> 00:41.000
and there is no generic technology to sync those data structures.

00:41.000 --> 00:47.000
Like once you have two devices, we obviously immediately get the need to sync data.

00:47.000 --> 00:55.000
For some strange reason, in a computer science textbook there is no accepted way to do it right.

00:55.000 --> 01:03.000
For some reason. Like, it was discussed ages ago, but the money went the other way.

01:03.000 --> 01:07.000
So this story was unfolding at a very slow rate.

01:07.000 --> 01:12.000
I started participating in 2008, so it is much better now.

01:12.000 --> 01:18.000
But the general idea is like, we currently can have like,

01:18.000 --> 01:21.000
we have, like, a wall outlet with IP packets, right?

01:21.000 --> 01:25.000
And we want to have a wall outlet with data structures.

01:25.000 --> 01:29.000
Like, I change some stuff here in my app, and it goes over the wires,

01:29.000 --> 01:32.000
and I don't think about it, it does it in the background,

01:32.000 --> 01:36.000
and it gets distributed to all the replicas, if somebody clicks something somewhere.

01:36.000 --> 01:38.000
It gets back to me.

01:38.000 --> 01:41.000
So I just connect to the network.

01:41.000 --> 01:43.000
This is the idea.

01:44.000 --> 01:47.000
Very well defined problem, very generic, everybody needs it.

01:47.000 --> 01:48.000
There is nothing.

01:48.000 --> 01:51.000
So the general approach is, like, we use a database.

01:51.000 --> 01:55.000
My app is saving to the database, your app is loading from the database.

01:55.000 --> 01:58.000
What if we deal with a long-running app?

01:58.000 --> 02:01.000
You have to poll the database, which is not good.

02:01.000 --> 02:05.000
And generally speaking, if we lose connection to the database,

02:05.000 --> 02:08.000
that's it, we are screwed, because we have no local copy.

02:08.000 --> 02:14.000
And another way to look at this problem, like, it is git for data.

02:14.000 --> 02:18.000
We have the data we need, we have it, like, locally, we are autonomous,

02:18.000 --> 02:23.000
but we also sync in real time; that is, like, the vision for it.

02:23.000 --> 02:29.000
In replicated database theory, like,

02:29.000 --> 02:33.000
some ideas were stated, like, in 1984,

02:33.000 --> 02:37.000
and it was obviously a summary of discussions of the 70s,

02:37.000 --> 02:42.000
so the general idea was like in the air, but the money went the other way.

02:42.000 --> 02:46.000
Because obviously, if you bring all the data to one big central server,

02:46.000 --> 02:49.000
then you can own the server, you can make a lot of money.

02:49.000 --> 02:54.000
So, I mean, the logic is just straightforward here, like this.

02:54.000 --> 02:59.000
It is a bad technology if it makes me lose money, right?

02:59.000 --> 03:04.000
So, but if you look from the other side, like from the client side,

03:04.000 --> 03:06.000
then it is exactly the opposite.

03:07.000 --> 03:11.000
Then, why can't we use rsync, for example, for such a thing?

03:11.000 --> 03:17.000
Why can't rsync be our libsync for such kind of general things?

03:17.000 --> 03:21.000
Obviously, it works well, it synchronizes binary blobs,

03:21.000 --> 03:26.000
without understanding what's inside, so it is generic enough.

03:26.000 --> 03:29.000
But rsync only works one way.

03:29.000 --> 03:32.000
It can update replicas, right? It can update mirrors,

03:32.000 --> 03:38.000
because if you have two files, same name, different date, different size,

03:38.000 --> 03:40.000
what should rsync do?

03:40.000 --> 03:44.000
I mean, is it like, this is probably the new version,

03:44.000 --> 03:47.000
or maybe these are, like, concurrent changes?

03:47.000 --> 03:50.000
It is unclear, from the context.

03:50.000 --> 03:55.000
And also, rsync is unable to sync a file which has some internal structure.

03:55.000 --> 04:01.000
So, obviously, to understand which is first, which is second,

04:01.000 --> 04:04.000
or maybe they are concurrent, we need to track changes.

04:04.000 --> 04:07.000
That is revision control, right?

04:07.000 --> 04:12.000
Obviously, git has history, branches.

04:12.000 --> 04:16.000
It knows which version was derived from which other version,

04:16.000 --> 04:20.000
and using that metadata, it can more or less reasonably merge the changes,

04:20.000 --> 04:23.000
most of the time.

04:24.000 --> 04:29.000
Well, with the exception of the cases when it cannot merge them,

04:29.000 --> 04:33.000
or it will merge them, but that will be a bit incorrect,

04:33.000 --> 04:38.000
because it has, like, very little understanding of the semantics of the data.

04:38.000 --> 04:48.000
Yeah, so, we cannot use git for this purpose, for this simple reason.

04:48.000 --> 04:51.000
Git merge is not deterministic.

04:51.000 --> 04:54.000
Quite often, git has to appeal to the developer,

04:54.000 --> 04:57.000
to manually resolve the merge conflict, right?

04:57.000 --> 04:59.000
It's not deterministic.

04:59.000 --> 05:01.000
We cannot have an application, which asks the user,

05:01.000 --> 05:05.000
actually, Google Docs was doing that in its early days.

05:05.000 --> 05:08.000
When they had concurrent edits in the same paragraph,

05:08.000 --> 05:12.000
They would show a pop-up like, which version would you prefer?

05:12.000 --> 05:18.000
If you speak, for example, of Apple, their office program,

05:18.000 --> 05:21.000
it was even a funny story, because it was saying,

05:21.000 --> 05:23.000
like, which version do you want to keep?

05:23.000 --> 05:29.000
And if you click the wrong one, a week of work disappears forever.

05:29.000 --> 05:31.000
That actually happened to me.

05:31.000 --> 05:36.000
So, can we do much better than this?

05:36.000 --> 05:44.000
So, then, a nice new technology,

05:44.000 --> 05:50.000
that is from about 2011: CRDTs,

05:50.000 --> 05:52.000
conflict-free replicated data types.

05:52.000 --> 05:55.000
They take those commodity data structures,

05:55.000 --> 05:58.000
which, like, everybody is using to build their apps.

05:58.000 --> 06:01.000
And make mergeable versions of them.

06:01.000 --> 06:02.000
We use some metadata.

06:02.000 --> 06:04.000
We annotate each piece of data, like,

06:04.000 --> 06:07.000
this was added by Joe, like, yesterday.

06:07.000 --> 06:10.000
This was added by Jane, two days ago.

06:10.000 --> 06:14.000
So, we, more or less, understand which piece of data came from,

06:14.000 --> 06:17.000
which side, and which is the recent data,

06:17.000 --> 06:20.000
which is the old data, and so on.

06:20.000 --> 06:27.000
Algebraically, it is very, very nice and smooth and clean.

06:27.000 --> 06:33.000
And it always, like, merges very reasonably, without conflicts and so on.

06:33.000 --> 06:39.000
I will explain the quotes.

06:39.000 --> 06:43.000
So, basically, who invented this CRDT?

06:43.000 --> 06:46.000
CRDT wasn't invented, it was coined.

06:46.000 --> 06:48.000
By the time the term was invented,

06:48.000 --> 06:51.000
the technology was already being used.

06:51.000 --> 06:54.000
So, and the very term CRDT here,

06:54.000 --> 06:57.000
I mentioned it as conflict-free replicated data types.

06:57.000 --> 07:01.000
Actually, there are also commutative replicated data types.

07:01.000 --> 07:03.000
Convergent replicated data types.

07:03.000 --> 07:06.000
And probably many other

07:06.000 --> 07:08.000
CRDTs; they mean more or less the same,

07:08.000 --> 07:11.000
but they're, like, equivalent in some sense.

07:11.000 --> 07:16.000
So, basically, it is a vast library of replicated data types

07:16.000 --> 07:17.000
of various kinds.

07:17.000 --> 07:23.000
So, that was, like, a happy day when,

07:23.000 --> 07:26.000
the INRIA team published

07:26.000 --> 07:29.000
the paper on the library of replicated data types,

07:29.000 --> 07:32.000
half of which they borrowed from somebody else.

07:32.000 --> 07:35.000
But finally, we have, like, that library,

07:35.000 --> 07:37.000
which we can use to build apps.

07:37.000 --> 07:40.000
So, finally, we can solve that problem,

07:40.000 --> 07:42.000
it is now just engineering, right?

07:42.000 --> 07:43.000
Sort of yes.

07:43.000 --> 07:46.000
But the problem is,

07:46.000 --> 07:50.000
CRDT resolves all these merge conflicts

07:50.000 --> 07:53.000
and other issues of dealing with concurrent changes to the data.

07:53.000 --> 07:55.000
It is solved by adding metadata.

07:55.000 --> 07:57.000
So, each piece of data is annotated:

07:58.000 --> 08:00.000
who added it, when they added it.

08:00.000 --> 08:04.000
And that, basically, can ruin everything for the simple reason

08:04.000 --> 08:10.000
that we can have many times more metadata than data.

08:10.000 --> 08:15.000
So, we can only compare this technology to Cassandra.

08:15.000 --> 08:18.000
I mean, among, like, massively used technologies,

08:18.000 --> 08:21.000
which are, like, industrial, and massively used.

08:21.000 --> 08:23.000
The only thing we can compare it to

08:23.000 --> 08:25.000
is Cassandra. Cassandra is last-write-wins,

08:25.000 --> 08:27.000
but it is using metadata.

08:27.000 --> 08:29.000
You know Cassandra, right?

08:29.000 --> 08:31.000
It's a cloud database, which Apple is using,

08:31.000 --> 08:33.000
in farms, like, of 10,000 servers,

08:33.000 --> 08:35.000
because, like, all the iPhones are pumping data

08:35.000 --> 08:38.000
into the cloud, and they have to put it somewhere.

08:38.000 --> 08:42.000
So, Cassandra is, like, very decentralized.

08:42.000 --> 08:47.000
And it has more or less the same metadata problems.

08:47.000 --> 08:48.000
They are kind of cheating.

08:48.000 --> 08:50.000
They're putting, I believe, only one

08:50.000 --> 08:52.000
timestamp on the tuple.

08:52.000 --> 08:56.000
And in case of CRDT, we have to put, like,

08:56.000 --> 08:58.000
one logical timestamp on each cell.

08:58.000 --> 09:02.000
And a logical timestamp is the ID of the replica and the time.

09:02.000 --> 09:04.000
So, basically, two integers.

09:04.000 --> 09:06.000
So, if our cell is one single integer,

09:06.000 --> 09:09.000
we put two integers of metadata on top.
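
NOTE
A minimal sketch of the cell-plus-logical-timestamp idea described here, assuming
last-write-wins resolution. The names Cell and merge are illustrative only; they are
not taken from Cassandra, RDX, or any other system mentioned in the talk.
```python
from dataclasses import dataclass
@dataclass(frozen=True)
class Cell:
    value: int      # the payload: one small integer
    time: int       # logical clock of the write (metadata)
    replica: int    # ID of the writing replica (metadata, breaks ties)
def merge(a, b):
    """Last-write-wins: keep the cell with the greater (time, replica) stamp."""
    return a if (a.time, a.replica) >= (b.time, b.replica) else b
jane = Cell(value=7, time=12, replica=1)
joe = Cell(value=9, time=12, replica=2)
# Same result whichever side merges first, so the replicas converge.
assert merge(jane, joe) == merge(joe, jane) == joe
```
One payload integer carries two integers of metadata, which is the overhead
the speaker is pointing at.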

09:09.000 --> 09:12.000
And also, because, like, most typical piece of data

09:12.000 --> 09:14.000
in real-world applications is a small integer,

09:14.000 --> 09:17.000
and our logical timestamp is typically

09:17.000 --> 09:21.000
a larger integer, and the user ID is also a larger integer.

09:21.000 --> 09:24.000
So, in the end, it will be, like, 10 times more data,

09:24.000 --> 09:26.000
metadata than data.

09:26.000 --> 09:28.000
Which sort of ruins the day.

09:28.000 --> 09:30.000
Because, like, it works correctly,

09:30.000 --> 09:32.000
there are no conflicts, but you have to,

09:32.000 --> 09:35.000
you need 10 times more time to pump all the data,

09:35.000 --> 09:38.000
and 10 times more servers to store it.

09:38.000 --> 09:40.000
And, no, but it's all correct.

09:40.000 --> 09:44.000
On the other hand, it synchronizes really well.

09:45.000 --> 09:48.000
So, basically, this was the story of, like,

09:48.000 --> 09:50.000
2011 to

09:50.000 --> 09:54.000
2015, so it was, like,

09:54.000 --> 09:56.000
the top issue on the agenda.

09:56.000 --> 10:00.000
And, another piece of metadata is version vectors.

10:00.000 --> 10:06.000
I mean, we can also refer to one, like,

10:06.000 --> 10:09.000
widely used industrial system, DynamoDB,

10:09.000 --> 10:11.000
which has the same metadata problem,

10:11.000 --> 10:13.000
but for them it is not much of a problem, because,

10:13.000 --> 10:17.000
well, a version vector is this kind of construct.

10:17.000 --> 10:20.000
Like, if you have a linear history,

10:20.000 --> 10:22.000
like in Postgres, RocksDB,

10:22.000 --> 10:23.000
whatever,

10:23.000 --> 10:25.000
any classic database, MySQL,

10:25.000 --> 10:27.000
we can have, like, transaction

10:27.000 --> 10:29.000
one, transaction two, transaction three,

10:29.000 --> 10:31.000
four, and so on, linear.

10:31.000 --> 10:33.000
In a decentralized database, because you don't have

10:33.000 --> 10:35.000
a central server, you don't have a linear history.

10:35.000 --> 10:39.000
So, we have a vector of versions.

10:39.000 --> 10:41.000
So, we say, like, from Jane,

10:41.000 --> 10:43.000
we've heard everything till version 10,

10:43.000 --> 10:45.000
and from Joe, we've heard everything till version 12,

10:45.000 --> 10:49.000
and so on, and in the worst case, for each user,

10:49.000 --> 10:51.000
we have a number.

10:51.000 --> 10:53.000
So, if we have million users,

10:53.000 --> 10:59.000
then our theoretical descriptor of our version,

10:59.000 --> 11:01.000
is, like, one million cells.

11:01.000 --> 11:03.000
Which is not good.
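
NOTE
A small sketch of the version vector just described, assuming it is stored as a
dictionary from replica name to the highest version heard from that replica.
The function names are hypothetical, chosen only for this illustration.
```python
def vv_merge(a, b):
    """Element-wise maximum: everything either side has heard of."""
    return {r: max(a.get(r, 0), b.get(r, 0)) for r in a.keys() | b.keys()}
def vv_covers(a, b):
    """True if a has seen everything that b has seen."""
    return all(a.get(r, 0) >= n for r, n in b.items())
def vv_concurrent(a, b):
    """Neither side has seen all of the other's changes."""
    return not vv_covers(a, b) and not vv_covers(b, a)
mine = {"jane": 10, "joe": 12}
yours = {"jane": 11, "joe": 9}
print(vv_concurrent(mine, yours))   # True
print(vv_merge(mine, yours))        # one entry per replica ever heard from
```
The dictionary grows by one entry per replica, which is why a million users
can mean a million cells in the worst case.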

11:03.000 --> 11:05.000
And that was also a problem that was

11:05.000 --> 11:08.000
bothering many people, for example,

11:08.000 --> 11:10.000
some people from the Portuguese team,

11:10.000 --> 11:14.000
I think, University of Braga, yeah, Braga.

11:14.000 --> 11:18.000
They wrote, like,

11:18.000 --> 11:21.000
a stack of articles on optimizing version vectors,

11:21.000 --> 11:24.000
but, like, yes, it was a problem.

11:24.000 --> 11:26.000
Like, DynamoDB,

11:26.000 --> 11:28.000
it doesn't have a problem because, for them,

11:28.000 --> 11:30.000
one replica is actually one server.

11:30.000 --> 11:32.000
So, in case you have 10 servers,

11:32.000 --> 11:35.000
you have 10 cells in a version vector.

11:35.000 --> 11:37.000
And this will be, like,

11:37.000 --> 11:39.000
not servers in total, but servers

11:39.000 --> 11:41.000
just touching a particular piece of data.

11:41.000 --> 11:45.000
So, it wasn't much of a problem, for them.

11:45.000 --> 11:49.000
So, like, we have nice technology,

11:49.000 --> 11:55.000
it merges well, it synchronizes well.

11:55.000 --> 11:59.000
Well, actually,

11:59.000 --> 12:03.000
me and friends made

12:03.000 --> 12:06.000
a collaborative editor, which was running in production

12:06.000 --> 12:10.000
in 2012, using exactly this versioning scheme.

12:10.000 --> 12:16.000
But, it wasn't an issue because, for each particular document,

12:16.000 --> 12:19.000
users edit collaboratively.

12:19.000 --> 12:21.000
There are typically not many users.

12:21.000 --> 12:24.000
If you have a hundred users, that's not a tragedy.

12:24.000 --> 12:26.000
Even if you have a thousand users per document,

12:26.000 --> 12:28.000
which is really rare, it is not a tragedy.

12:28.000 --> 12:32.000
So, this is only a problem in some cases,

12:32.000 --> 12:36.000
but when it is a problem, it is a big one.

12:36.000 --> 12:40.000
So, all these metadata problems have been

12:40.000 --> 12:46.000
more or less solved by this day.

12:46.000 --> 12:51.000
Basically, the correct approach was to use,

12:51.000 --> 12:54.000
how to say, to version data in bulk.

12:54.000 --> 12:57.000
Like, you can ship in envelopes,

12:57.000 --> 12:59.000
or you can ship in, like, logistical containers.

12:59.000 --> 13:02.000
So, the idea is to ship data in logistical containers.

13:02.000 --> 13:04.000
Once you version logistical containers,

13:04.000 --> 13:06.000
you need much less metadata.

13:06.000 --> 13:09.000
But, what I'm describing is,

13:09.000 --> 13:12.000
like, progress of this field of research for,

13:12.000 --> 13:15.000
maybe, like, 15 years, just like that.

13:15.000 --> 13:17.000
I mean, when I describe it in retrospect,

13:17.000 --> 13:20.000
it is like, what fools have we been,

13:20.000 --> 13:22.000
but it wasn't just me, it was like,

13:22.000 --> 13:26.000
a team in Portugal, a team in France and many other teams.

13:26.000 --> 13:29.000
So, the idea,

13:29.000 --> 13:32.000
we can compare

13:32.000 --> 13:34.000
the entire story

13:34.000 --> 13:38.000
to LSM databases,

13:38.000 --> 13:39.000
which actually, by nature,

13:39.000 --> 13:40.000
really version the data,

13:40.000 --> 13:42.000
because they have,

13:42.000 --> 13:44.000
how to say, SST files of old data,

13:44.000 --> 13:47.000
and, like, less old data, and new data, and so on.

13:47.000 --> 13:51.000
So, it is in their nature to keep old and new data separate.

13:51.000 --> 13:54.000
So, once this trick is used,

13:54.000 --> 13:56.000
then that part of the

13:56.000 --> 13:59.000
metadata is not a problem as well.

13:59.000 --> 14:03.000
So, a question for people who are bored of,

14:03.000 --> 14:07.000
bored of the academic struggles

14:07.000 --> 14:08.000
I have been describing.

14:08.000 --> 14:11.000
What can I actually use?

14:11.000 --> 14:13.000
Somebody,

14:13.000 --> 14:15.000
who has an idea,

14:15.000 --> 14:18.000
what can I actually use for syncing my data in my app?

14:18.000 --> 14:20.000
Please raise your hand.

14:20.000 --> 14:21.000
Like, you have clear idea,

14:21.000 --> 14:23.000
you're using some kind of,

14:23.000 --> 14:27.000
one, two, three, four, five,

14:27.000 --> 14:29.000
six, seven.

14:29.000 --> 14:34.000
So, basically, this is like a relevant audience.

14:34.000 --> 14:36.000
Good.

14:36.000 --> 14:39.000
So, currently, we have like,

14:39.000 --> 14:41.000
the local-first world,

14:41.000 --> 14:44.000
which is mostly in JavaScript, mostly.

14:44.000 --> 14:47.000
Some have Rust implementations,

14:47.000 --> 14:50.000
but they are not working that,

14:50.000 --> 14:52.000
quite easily.

14:52.000 --> 14:56.000
So, mostly, from 2014,

14:56.000 --> 14:58.000
I think that,

14:58.000 --> 15:00.000
the local-first story continues,

15:00.000 --> 15:01.000
like for 10 years,

15:01.000 --> 15:04.000
and to this day, most of it is like JavaScript.

15:04.000 --> 15:07.000
Then, there are solutions for syncing

15:07.000 --> 15:08.000
databases.

15:08.000 --> 15:09.000
Like, if you have Postgres on the server,

15:09.000 --> 15:10.000
you have,

15:10.000 --> 15:12.000
SQLite on the client,

15:12.000 --> 15:14.000
and you want to synchronize it.

15:15.000 --> 15:16.000
And some of these are actually

15:16.000 --> 15:18.000
funded companies.

15:18.000 --> 15:20.000
And,

15:20.000 --> 15:22.000
the thing I like to talk about is,

15:22.000 --> 15:23.000
librdx,

15:23.000 --> 15:25.000
it is, like,

15:25.000 --> 15:26.000
my thing.

15:26.000 --> 15:28.000
So, librdx,

15:28.000 --> 15:30.000
like, Replicated Data eXchange format.

15:30.000 --> 15:32.000
That is, like, JSON-ish format,

15:32.000 --> 15:34.000
which synchronizes really well.

15:34.000 --> 15:37.000
So, why do I want to talk about it?

15:37.000 --> 15:39.000
Because synchronizable JSON,

15:39.000 --> 15:42.000
it has been around since, like, 2012, 2014,

15:42.000 --> 15:44.000
and it is not used really much,

15:44.000 --> 15:45.000
because,

15:45.000 --> 15:47.000
but what is the good thing about,

15:47.000 --> 15:48.000
again,

15:48.000 --> 15:50.000
the problem is about metadata.

15:50.000 --> 15:52.000
What is the good thing about JSON? JSON

15:52.000 --> 15:54.000
is easy to read, easy to edit, right?

15:54.000 --> 15:56.000
If we put our data in JSON,

15:56.000 --> 15:57.000
we can always read it,

15:57.000 --> 15:59.000
we can always debug it,

15:59.000 --> 16:01.000
we open console in the browser,

16:01.000 --> 16:02.000
we see JSON,

16:02.000 --> 16:04.000
we understand what's going on, right?

16:04.000 --> 16:06.000
So, if you add metadata,

16:06.000 --> 16:08.000
if you add it to JSON,

16:08.000 --> 16:09.000
it becomes,

16:09.000 --> 16:11.000
like, really complicated,

16:11.000 --> 16:12.000
what's going on,

16:12.000 --> 16:14.000
impossible to edit by hand,

16:14.000 --> 16:16.000
because if you mess up the metadata,

16:16.000 --> 16:17.000
that's it.

16:17.000 --> 16:19.000
So,

16:19.000 --> 16:21.000
ah,

16:21.000 --> 16:23.000
ah,

16:23.000 --> 16:26.000
ah,

16:26.000 --> 16:28.000
good part.

16:28.000 --> 16:31.000
Ah,

16:31.000 --> 16:33.000
RDX,

16:33.000 --> 16:35.000
Replicated Data eXchange format.

16:35.000 --> 16:37.000
So, it is like JSON-like format,

16:37.000 --> 16:39.000
which has metadata,

16:40.000 --> 16:42.000
but the model is built in a way

16:42.000 --> 16:44.000
that you may export it in a plain form

16:44.000 --> 16:46.000
and then import it back

16:46.000 --> 16:48.000
into the metadata-rich form.

16:48.000 --> 16:49.000
So, basically,

16:49.000 --> 16:51.000
as a user,

16:51.000 --> 16:54.000
you'll never work with metadata,

16:54.000 --> 16:57.000
annotated data.

16:57.000 --> 16:58.000
Also,

16:58.000 --> 16:59.000
it is not exactly JSON,

16:59.000 --> 17:02.000
because it is sort of optimized

17:02.000 --> 17:05.000
to make it nice, clean, algebraic.

17:05.000 --> 17:07.000
So, for example,

17:08.000 --> 17:09.000
there are, like,

17:09.000 --> 17:10.000
primitive types,

17:10.000 --> 17:11.000
FIRST: float, integer, reference,

17:11.000 --> 17:12.000
string, and term.

17:12.000 --> 17:13.000
And

17:13.000 --> 17:14.000
there are PLEX types,

17:14.000 --> 17:15.000
collections:

17:15.000 --> 17:16.000
basically, tuple,

17:16.000 --> 17:17.000
ah,

17:17.000 --> 17:18.000
linear, a vector,

17:18.000 --> 17:19.000
Eulerian, a set,

17:19.000 --> 17:20.000
and multiplexed is,

17:20.000 --> 17:21.000
like,

17:21.000 --> 17:22.000
version vector,

17:22.000 --> 17:23.000
or counter,

17:23.000 --> 17:24.000
collaborative counter.

17:24.000 --> 17:25.000
So,

17:25.000 --> 17:26.000
ah,

17:26.000 --> 17:27.000
the idea is,

17:27.000 --> 17:28.000
that you can nest it all,

17:28.000 --> 17:29.000
arbitrarily,

17:29.000 --> 17:30.000
for example,

17:30.000 --> 17:31.000
this is a set.

17:31.000 --> 17:32.000
In JavaScript,

17:32.000 --> 17:33.000
that would be illegal,

17:33.000 --> 17:34.000
to have,

17:34.000 --> 17:35.000
for example,

17:35.000 --> 17:36.000
one entry,

17:37.000 --> 17:38.000
or,

17:38.000 --> 17:39.000
or,

17:39.000 --> 17:40.000
or an entry without a key,

17:40.000 --> 17:41.000
and here a set is a set.

17:41.000 --> 17:42.000
So, basically,

17:42.000 --> 17:43.000
you can put whatever you want into it,

17:43.000 --> 17:44.000
and,

17:44.000 --> 17:45.000
and,

17:45.000 --> 17:46.000
this key-value pair is actually a tuple,

17:46.000 --> 17:47.000
and this part,

17:47.000 --> 17:49.000
and this pair is also a tuple.

17:49.000 --> 17:50.000
So, basically,

17:50.000 --> 17:51.000
it is, like,

17:51.000 --> 17:52.000
algebraically,

17:52.000 --> 17:53.000
optimized JSON,

17:53.000 --> 17:56.000
where everything commutes,

17:56.000 --> 17:58.000
and everything can be nested into,

17:58.000 --> 18:00.000
into everything.
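
NOTE
To illustrate what "everything commutes" buys you, here is a small sketch using a
grow-only set merged by plain union. This is a generic textbook CRDT, assumed here
for illustration; it is not RDX's actual set type or wire format.
```python
from itertools import permutations
def merge(a, b):
    """Merge two replicas of a grow-only set: plain set union."""
    return a | b
r1 = frozenset({"milk", "eggs"})
r2 = frozenset({"eggs", "tea"})
r3 = frozenset({"salt"})
# Fold the three replicas together in every possible order.
results = set()
for order in permutations([r1, r2, r3]):
    acc = frozenset()
    for replica in order:
        acc = merge(acc, replica)
    results.add(acc)
assert len(results) == 1            # every order gives the same state
assert merge(r1, r1) == r1          # re-applying a merge changes nothing
```
Because the merge is commutative, associative and idempotent, replicas can
exchange updates in any order, any number of times, and still agree.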

18:00.000 --> 18:01.000
So,

18:01.000 --> 18:02.000
basically,

18:02.000 --> 18:03.000
however you merge it,

18:03.000 --> 18:04.000
you don't produce

18:04.000 --> 18:06.000
an illegal construction.

18:48.000 --> 18:50.000
you,

18:50.000 --> 18:51.000
what,

18:51.000 --> 18:53.000
I've already read compiler,

18:53.000 --> 18:57.000
hit get it,

18:57.000 --> 19:04.160
DB. It is prototype quality, but the idea is, once you have all these constructs nice,

19:04.160 --> 19:10.360
clean, algebraic, you can just add these merge iterators to RocksDB, and it becomes

19:10.360 --> 19:18.000
a nicely syncable CRDT database. Like, this is a very primitive hack, and

19:18.080 --> 19:26.080
it works, because, once again, all the merge iterators for all these types are, like, single

19:26.080 --> 19:33.760
pass, very much like merge iterators. And then there is this library which I write in

19:33.760 --> 19:44.280
a very particular dialect of C, which is called ABC, Algebraic Bricklayer C. It is a very

19:44.360 --> 19:57.200
algebraic dialect. People who read it, they always mention Lisp, whatever, and so, the good

19:57.200 --> 20:09.840
part is, it is, like, systems-level stuff. So, librdx, lib, just to

20:09.840 --> 20:17.840
sync data from any app through any connection to any server, that is the idea. I might

20:17.840 --> 20:25.360
make it all nice, but because it is in C, it is a challenge, obviously, more than in any other language.

20:25.360 --> 20:33.800
So, in this world of data sync outlets, we now have a different problem. We have different

20:33.880 --> 20:42.520
standards for those data sync outlets, very much like in the real world, but otherwise life

20:42.520 --> 20:53.760
is beautiful, and, please, questions. Thank you.

20:54.000 --> 21:02.040
You have a first question, right? No, no, no, no. Any question?

21:02.040 --> 21:09.040
Yeah, you said it yourself: Git is something that does sync, and that you can

21:09.040 --> 21:15.200
use, and you didn't. No, theoretically, I might put my stuff in text files, and my syncing

21:15.200 --> 21:18.240
can be done with Git. Some people are actually doing that, for example. Can you repeat the

21:18.240 --> 21:23.360
question, please? Can you repeat the question? Why don't I use Git? Why do I even consider it?

21:23.440 --> 21:28.480
Why am I listing it? Because some people are actually doing it. For example, they put

21:28.480 --> 21:34.240
issues and comments and stuff into the repo of the main project, and they have some scripts;

21:34.240 --> 21:37.840
I forgot what the name of the project was which is doing that, but they are actually using

21:37.840 --> 21:44.800
Git to sync data. It is just not quite correct, but it works to some degree.

21:46.000 --> 21:51.600
And rsync is the same. So, basically, I'm just going, like, from less semantics to more semantics.

21:52.080 --> 21:57.040
rsync is just blobs; Git is basically also blobs, but it tracks the history of the blobs, and it can

21:57.040 --> 22:01.600
merge changes to the blobs without understanding the semantics. And CRDT: everything is

22:01.600 --> 22:07.520
that, and also it has particular semantics for each data structure. So, it can merge

22:07.680 --> 22:15.280
it, as a set is a set, and a vector is a vector. So, from primitive to sophisticated.

22:22.480 --> 22:27.360
You brought up the merge problem earlier, but I don't know if I missed any proposed solution

22:27.360 --> 22:35.520
on semantic merge conflicts. So, even without RDX or whatnot, what is your proposal for:

22:35.600 --> 22:39.200
I change the word to "cat", you change the word to "dog", and they come in together?

22:39.200 --> 22:44.880
Obviously, a little bit. I mean, it was, like, a joke for, like, 20 years in this domain,

22:44.880 --> 22:50.400
like, we need artificial intelligence to solve semantic merge problems, because there are

22:50.400 --> 22:54.800
semantics. And now we have artificial intelligence, so it is no longer a joke.

22:54.800 --> 23:08.160
Basically, they will produce a mess, but they will produce a deterministic mess, so they will

23:08.160 --> 23:13.200
all produce the same mess. So, if one of them resolves that mess, they will also resolve it for the others.

23:13.280 --> 23:17.600
At least at the algorithmic level, that is, like, the maximum possible thing.

23:27.360 --> 23:33.360
Did you have a solution or is it at a level above for compaction or garbage collection of the

23:33.360 --> 23:38.320
version vectors? Yes, yes, yes, yes. Basically, LSM databases, they do garbage collection

23:38.400 --> 23:45.680
naturally. Like, the way they work, they always do GC; like, with every compaction, they basically

23:45.680 --> 23:52.720
compact. So, more or less the same thing. Some amount of metadata still stays. It is inevitable.
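
NOTE
The single-pass merge behind compaction can be sketched as a streaming merge of
sorted runs that keeps only the newest version of each key. The record layout
(key, stamp, value) and the name compact are assumptions made for this example;
real LSM engines such as RocksDB use their own formats and APIs.
```python
import heapq
from itertools import groupby
def compact(*runs):
    """Merge sorted runs of (key, stamp, value); keep the newest stamp per key."""
    merged = heapq.merge(*runs)                 # one streaming pass, inputs sorted
    for key, versions in groupby(merged, key=lambda rec: rec[0]):
        *_, newest = versions                   # last one has the greatest stamp
        yield newest
old_run = [("a", 1, "x"), ("b", 2, "y")]
new_run = [("a", 5, "z"), ("c", 4, "w")]
print(list(compact(old_run, new_run)))
```
Superseded versions are dropped during the merge itself, so garbage collection
comes for free with every compaction, while the surviving stamps remain.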

23:54.240 --> 23:58.880
Unfortunately. But in the general scheme of things, yes, there is compaction, and there is,

23:58.880 --> 24:05.760
like, inevitably, some overhead that stays. Thank you. Oh, finally.

24:10.320 --> 24:19.120
So, how do you actually handle, does it end up being last-write-wins for actual pieces of data,

24:19.120 --> 24:25.840
like, let's say, if a name was set to Alice and to Bob, like, there is a conflict in the actual values?

24:26.320 --> 24:32.560
Basically, in many cases, it is last-write-wins; it depends on the data structure.

24:32.560 --> 24:38.880
In case of an array, for example, the metadata has to uniquely identify the insertion

24:38.880 --> 24:43.680
place. So, if somebody inserted data into the place, we don't care about any concurrent insertions

24:43.680 --> 24:48.960
or deletions, we can always find the place in the new version. That is the semantics of an array.

24:49.920 --> 24:54.160
So, sets, they have more or less last-write-wins semantics, but per a particular key.

24:55.760 --> 25:01.040
Counters, they have different semantics. Like, in one version we had, like, Bob 5, and in the other version

25:01.040 --> 25:06.160
we have Bob 7; we know, like, this one is the later version. So, Bob 7, that means, like, Bob contributed

25:06.160 --> 25:12.320
7 units to this particular counter. And values from Alice, they will go, like, in a different cell.

25:13.200 --> 25:19.760
So, basically, concurrent counters, they are split into contributions by source.
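
NOTE
A minimal sketch of the counter semantics in this answer, assuming a grow-only
counter where each contributor only ever increases their own cell. The dictionary
layout and function names are illustrative, not RDX's actual representation.
```python
def merge(a, b):
    """Per-contributor maximum: the larger number is the later version."""
    return {who: max(a.get(who, 0), b.get(who, 0)) for who in a.keys() | b.keys()}
def total(counter):
    """The visible counter value is the sum of all contributions."""
    return sum(counter.values())
replica_1 = {"bob": 5, "alice": 3}
replica_2 = {"bob": 7}
merged = merge(replica_1, replica_2)
print(total(merged))   # 10: bob's later 7 plus alice's 3
```
Taking the maximum per contributor, rather than adding the two replicas together,
is what keeps Bob's units from being counted twice.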

25:21.600 --> 25:27.680
They do not mess things up, because they track who contributed how much. And so on; semantics differs

25:27.680 --> 25:42.240
by the data type. Yep. Sorry, I have another question. Should I understand RDX as being a

25:42.320 --> 25:50.540
CRDT? So, could I say, when I choose between Automerge, Yjs, that is just, and then this

25:50.540 --> 25:57.440
[inaudible]

25:57.440 --> 26:00.000
Kleppmann? Yeah, he is working in Cambridge. Yeah.

26:00.000 --> 26:06.800
Ah, the Automerge. So, is there also something about RDX that makes it different or

26:06.800 --> 26:10.800
or higher up the stack, or larger than Yjs, for instance,

26:10.800 --> 26:14.240
or something? Yjs is more or less, like, a small business

26:14.240 --> 26:16.520
doing the discrete data synchronization

26:16.520 --> 26:18.600
for collaborative online applications,

26:18.600 --> 26:23.040
which was my area of interest in 2012 to 2014.

26:23.040 --> 26:25.440
And now I'm doing systems-level stuff.

26:25.440 --> 26:29.200
So Yjs, yes, that is a CRDT, another solution.

26:29.200 --> 26:32.120
Speaking of Automerge, then basically

26:32.120 --> 26:34.280
they're trying to use unchanged JSON.

26:34.280 --> 26:37.040
And I'm basically saying like there's no way

26:37.040 --> 26:40.360
to make JSON mergeable, let's make a nicer JSON

26:40.360 --> 26:45.040
like format, which is, like, sufficiently, how to say it,

26:45.040 --> 26:49.480
algebraic to merge it nicely.

26:49.480 --> 26:52.760
So more likely, quiz.

26:52.760 --> 26:55.520
So these are different solutions, but they more or less,

26:55.520 --> 26:59.280
like, overlap heavily.

26:59.280 --> 27:00.640
But very, very powerful, maybe.

27:00.640 --> 27:03.640
Thank you.

27:03.640 --> 27:07.640
One last question?

27:07.640 --> 27:10.240
Is it, like, mathematically proven

27:10.240 --> 27:11.640
that this problem is solved?

27:11.640 --> 27:12.640
Is that the bottom part?

27:12.640 --> 27:13.640
No way.

27:13.640 --> 27:14.640
Why do we focus?

27:14.640 --> 27:15.640
Yes, yes.

27:15.640 --> 27:18.640
Why do we still have things like merge conflicts?

27:18.640 --> 27:20.640
Basically, it was published.

27:20.640 --> 27:23.640
It was a chain of articles starting in 2006,

27:23.640 --> 27:24.640
I believe.

27:24.640 --> 27:28.640
And the term CRDT was coined in 2011.

27:28.640 --> 27:31.640
And it was always, yes, we had a couple of proof.

27:31.640 --> 27:33.640
And it was only one of us, we had a couple of proof.

27:33.640 --> 27:36.640
So for the last 15 years, the problem was practical

27:36.640 --> 27:38.640
applicability because of the metadata.

27:38.640 --> 27:41.640
And in theory, I would have been more just nicely

27:41.640 --> 27:43.640
over a few courses on the rainbow.