WEBVTT

00:00.000 --> 00:13.000
Okay. Hi. Good afternoon. My name is Yoshinori Matsunobu. I'm a production engineer at

00:13.000 --> 00:20.160
Meta. Today I'm going to talk about MyRocks, the RocksDB storage engine for

00:20.160 --> 00:28.880
MySQL. I have been using MySQL since 2004, so it's over 20 years. I started using MySQL

00:28.880 --> 00:37.520
from 4.0. I first worked at Sony, then joined the MySQL company. After it was acquired

00:37.520 --> 00:43.480
by Sun and then Oracle, I moved to a local Japanese gaming company. And after

00:43.480 --> 00:51.200
that I joined Facebook in 2012. After that I was still working on MySQL. And

00:51.200 --> 00:58.040
a couple of years ago, we created RocksDB, the embedded RocksDB library, and we integrated it

00:58.040 --> 01:08.920
into MySQL as a storage engine. So MyRocks is a RocksDB storage engine for MySQL,

01:08.920 --> 01:18.080
but essentially it's a new storage engine based on RocksDB. RocksDB itself is also

01:18.080 --> 01:27.000
open source and very actively developed. It was also created by Meta, and it is on GitHub.

01:27.000 --> 01:33.160
It is a popular open source LSM, log-structured merge-tree, database library, and I'm

01:33.160 --> 01:39.960
going to briefly talk about it from the next slide. MyRocks is an LSM engine and is

01:39.960 --> 01:45.240
built into MySQL. So regular MySQL clients talk to the MySQL server, and MySQL

01:45.240 --> 01:49.720
server areas like the parser, the optimizer, the replication, these are all the same, not

01:49.720 --> 01:56.120
affected. Only the storage engine, like the table format, is different. And a binary distribution is

01:56.120 --> 02:02.800
available from Percona Server. And also at Facebook, at Meta, we use it.

02:02.800 --> 02:09.680
We initially created MyRocks for the biggest social database backend, which we call

02:09.680 --> 02:19.920
UDB. It stores Facebook and Instagram social interactions like likes, comments, live

02:19.920 --> 02:24.240
comments or whatever. So all social activities are stored in the backend database

02:24.240 --> 02:32.240
called UDB. UDB is a very space-bound service, and InnoDB was using quite a

02:32.240 --> 02:38.320
lot of space because of the B-tree architecture. LSM technically

02:38.320 --> 02:44.000
saves more space, so that's why we wanted to migrate. Before MyRocks, there were

02:44.000 --> 02:50.000
several migration attempts, like HBase, and they didn't work well. MySQL is a pretty

02:50.000 --> 02:59.360
good database, so HBase was not so great. After MyRocks, the migration finally succeeded,

02:59.360 --> 03:08.000
so now we run the whole UDB on MyRocks, with the RocksDB data format. And now

03:08.000 --> 03:15.760
we are trying to migrate other use cases, like other databases inside Meta. We have

03:15.760 --> 03:23.440
many kinds of managed database services, similar to Amazon RDS customers. As we try

03:23.440 --> 03:28.880
to migrate them, we are hitting several interesting issues, some of which I can share today.

03:30.400 --> 03:36.240
So this is a brief architecture of RocksDB. I'm talking about the write path and the read path.

03:36.240 --> 03:40.400
This is the write path. When a write request comes to RocksDB,

03:40.400 --> 03:46.480
there is a transaction log called the write-ahead log, which is equivalent to the InnoDB redo log.

03:46.480 --> 03:53.200
Transactions are committed to the log first, then the change is kept in memory, in what is called

03:53.200 --> 03:59.760
the memtable. All the changes are accumulated in the memtable, and it's not

03:59.760 --> 04:06.160
touching the backend files, called SST files. Until the memtable gets full, all the

04:06.240 --> 04:12.640
changes are kept in memory, which means writes are very fast. After the memtable gets full,

04:13.200 --> 04:20.080
a background process called flush happens, writing the memtable out

04:20.080 --> 04:27.520
into SST files, the data files. Then, after there are many SST files, another background

04:27.520 --> 04:33.680
process called compaction happens to merge these SST files into more compact ones,

04:33.680 --> 04:42.160
more space-efficient and more efficient for reads. So the key architectural difference is writes:

04:42.160 --> 04:47.760
all writes go into memory first, which means no random writes and no random reads are needed,

04:47.760 --> 04:55.840
which is very fast. Also, compaction works at the SST file level, which is more efficient compared

04:55.840 --> 05:04.560
to a B-tree in InnoDB. On the other hand, the read path is a bit more complex, because there are

05:04.560 --> 05:12.240
many places where data is managed: memtables and SST files. If the data is not in the memtable

05:12.240 --> 05:19.360
or the block cache, then the blocks in SST files need to be read and cached in the block cache.

05:20.080 --> 05:25.200
But also, there may be changes in the memtable, so the memtable needs to

05:25.200 --> 05:31.120
be read as well; that data is not in the block cache, but it is still valid. So there is a process called

05:31.120 --> 05:38.560
merge: reading from the memtable and reading from the block cache, then merging the results and

05:38.560 --> 05:46.640
returning the latest record to the client. This step is needed, so this is more expensive

05:47.280 --> 05:53.280
compared to a B-tree. So there is a clear trade-off in the generic characteristics of

05:53.280 --> 05:58.960
LSM databases. I am talking about RocksDB, but this applies to any other LSM database

05:58.960 --> 06:05.600
architecture, like HBase. The space is much smaller, because it is a write-optimized and

06:05.600 --> 06:11.440
space-optimized database. So the space is much smaller, write performance is good,

06:12.000 --> 06:18.640
with writes using less CPU and higher write throughput, and that also means replication

06:18.720 --> 06:24.880
lag is less of a problem, which is much easier to manage than fighting with replication lag.

06:25.280 --> 06:31.520
But reads are more expensive, especially if you are doing a lot of range scans,

06:31.520 --> 06:38.320
or deleting lots of records and then doing range scans; that is quite expensive. I am going to briefly

06:38.400 --> 06:49.440
talk about that later. So that was an overview of the RocksDB architecture. From the next slide,

06:49.440 --> 06:56.240
I am going to talk about the recent developments we have done in MyRocks and RocksDB.

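NOTE
A minimal sketch of the write path and read path described in the overview,
written as a toy Python model. This is not RocksDB code: the class ToyLSM,
its method names, and memtable_limit are assumptions made for illustration.
It shows writes going to a log and a memtable first, a flush into immutable
sorted files once the memtable fills up, compaction merging those files, and
reads checking the newest data first.
```python
# Toy LSM model for illustration only; names and sizes are assumptions.
class ToyLSM:
    def __init__(self, memtable_limit=2):
        self.wal = []            # append-only log, written first
        self.memtable = {}       # recent changes, kept in memory
        self.sst_files = []      # immutable sorted runs, newest last
        self.memtable_limit = memtable_limit
    #
    def put(self, key, value):
        self.wal.append((key, value))       # 1. commit to the log
        self.memtable[key] = value          # 2. keep the change in memory
        if len(self.memtable) >= self.memtable_limit:
            self.flush()                    # 3. flush once the memtable is full
    #
    def flush(self):
        if self.memtable:
            self.sst_files.append(dict(sorted(self.memtable.items())))
            self.memtable = {}
    #
    def compact(self):
        merged = {}
        for sst in self.sst_files:          # oldest first, so newer values win
            merged.update(sst)
        self.sst_files = [dict(sorted(merged.items()))] if merged else []
    #
    def get(self, key):
        if key in self.memtable:            # newest data first
            return self.memtable[key]
        for sst in reversed(self.sst_files):  # then newest file to oldest
            if key in sst:
                return sst[key]
        return None
```
With memtable_limit=2 the second put triggers a flush, and a later put of the
same key is served from the memtable ahead of the older copy in the SST file.
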
06:57.840 --> 07:02.640
The first part is the clone plugin. The clone plugin is a built-in

07:02.640 --> 07:07.920
plugin. It is available in InnoDB, as many of you are probably aware, and it is

07:07.920 --> 07:14.880
great. So we implemented it for MyRocks as well; that is the MyRocks clone plugin. The main objective

07:14.880 --> 07:22.720
is deprecating the external backup tool, so the clone plugin does the database copy using the MySQL server.

07:23.600 --> 07:31.200
There are several benefits of the clone plugin. It is managed by the server,

07:31.200 --> 07:37.120
so you don't need an external process. But at least for Meta, one of the biggest benefits is that

07:37.120 --> 07:43.840
you can control the target copy speed with the clone plugin, and the speed can be much faster.

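NOTE
A rough sketch of why a target copy speed makes a copy predictable. These
helper functions are hypothetical and are not part of the clone plugin; the
per-thread throughput is an assumed input. The idea is only that enough
threads are used to reach the target rate, and that a known rate gives a
known finish time.
```python
import math
#
def threads_needed(target_mb_per_s, per_thread_mb_per_s):
    # Use enough copy threads to reach the target rate (at least one).
    return max(1, math.ceil(target_mb_per_s / per_thread_mb_per_s))
#
def copy_seconds(data_gb, target_mb_per_s):
    # With a guaranteed rate, the finish time can be computed up front.
    return data_gb * 1024 / target_mb_per_s
```
For example, if one long-distance stream only reaches 20 MB/s, a 200 MB/s
target needs 10 streams, and 100 GB then takes 512 seconds.
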
07:44.400 --> 07:48.960
If you use an external backup, a streaming copy, which we used to use,

07:48.960 --> 07:57.280
then with a streaming copy everything is single-threaded. So if the network is slow, or if you copy

07:57.280 --> 08:04.720
to a different region, like from the US West Coast to Europe, that is a very long distance,

08:04.720 --> 08:11.040
so if you copy with a single thread, the copy speed tends to go down, like from one gigabyte per

08:11.040 --> 08:17.520
second to like 10 to 20 megabytes per second. So if you want to copy the database quickly,

08:17.520 --> 08:22.720
the single thread often causes problems. But with the clone plugin, you can set the target

08:22.720 --> 08:28.240
copy speed, and the clone plugin spawns more threads as needed, copying the

08:28.240 --> 08:35.680
massively large data and, when needed, using more CPU, but the copy speed is more guaranteed. That means

08:35.680 --> 08:42.480
the copy speed is predictable, which is much easier for us to operate, because you know

08:42.480 --> 08:50.320
by when the copy will finish. So we are actively using that. If you're using

08:50.320 --> 08:55.760
InnoDB and are not using the clone plugin, it's a pretty useful feature, so I

08:55.760 --> 09:02.080
definitely recommend it. The second part is read-free replication.

09:03.680 --> 09:11.760
This was inspired by a feature from TokuDB, which got some attention about

09:11.760 --> 09:17.200
10 years ago. TokuDB is also an LSM-style database, and it supports read-free

09:17.200 --> 09:26.720
replication. The concept is, in an LSM database you can just put the data, so if you have the whole previous

09:26.720 --> 09:33.280
record image, you can update the database without reading the record. With the row-based binary log,

09:33.280 --> 09:39.920
the whole primary key and the changed row image are available in the binlog, so by reading the

09:39.920 --> 09:47.680
binlog, you can compose all the change APIs that need to be sent to RocksDB, so you don't

09:47.680 --> 09:56.960
have to read anything from the MyRocks database. In the default behavior, when a change comes,

09:56.960 --> 10:03.840
the replica reads from RocksDB by primary key to check which record is changed,

10:03.840 --> 10:10.080
then composes the RocksDB API calls, Delete or Put, sends those RocksDB API calls,

10:10.080 --> 10:19.280
and changes the record. That involves a primary key read, which is not zero cost. So if there

10:19.280 --> 10:25.040
are massive updates from the primary instance, the random read overhead becomes a problem,

10:25.040 --> 10:30.960
and sometimes the replicas lag. But with read-free replication, you can see all the changes

10:30.960 --> 10:37.920
in the binlog, so you can just compose the RocksDB API calls, without any random reads.

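NOTE
A small sketch contrasting the two ways a replica can apply an update, using
a toy key-value store that counts point reads. CountingStore and both apply
functions are hypothetical names for illustration, not MyRocks internals.
The row images are modelled as dicts with a "pk" field, standing in for the
before and after images carried by a row-based binlog event.
```python
class CountingStore:
    # Toy key-value store that counts point reads.
    def __init__(self):
        self.data = {}
        self.reads = 0
    def get(self, key):
        self.reads += 1
        return self.data.get(key)
    def put(self, key, value):
        self.data[key] = value
    def delete(self, key):
        self.data.pop(key, None)
#
def apply_update_with_read(store, before, after):
    # Default path: look the row up by primary key before changing it.
    if store.get(before["pk"]) is None:
        raise KeyError("row not found on replica")
    store.delete(before["pk"])
    store.put(after["pk"], after)
#
def apply_update_read_free(store, before, after):
    # Read-free path: the row images already carry the keys, so no lookup.
    store.delete(before["pk"])
    store.put(after["pk"], after)
```
The read-free variant issues zero reads, but it also applies the change even
when the old row is missing, so a missing row goes unnoticed in this sketch.
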
10:38.560 --> 10:45.120
We created this feature a long time ago, but with the recent AI use cases,

10:45.120 --> 10:52.160
some applications write a huge amount of data through MySQL, MyRocks, and we started hitting

10:52.160 --> 11:01.360
replica lag. So we enabled read-free replication for that, and the replica catch-up speed went up to

11:01.360 --> 11:09.120
6x, so the replica lag issue got resolved. Some of the new use cases have a huge write rate,

11:09.120 --> 11:16.240
which is relatively uncommon in the MySQL world, but the new trends changed some of the

11:17.200 --> 11:23.280
characteristics, and this feature paid off there, so we are using that. One challenge is,

11:25.040 --> 11:28.880
if you are using traditional MySQL replication, which does reads on the

11:28.880 --> 11:34.720
replicas, then it has better consistency checks. If the record doesn't exist,

11:34.720 --> 11:40.400
or if the record is a duplicate, it returns errors, like a duplicate key error or a key

11:40.560 --> 11:51.120
not found error. Those errors are returned, so there is a pretty good consistency guarantee.

11:51.120 --> 11:57.040
But with read-free replication, that guarantee is lost, so it is blindly changing

11:58.320 --> 12:04.640
the replica. So there are some trade-offs, but especially for write-intensive use cases,

12:04.720 --> 12:12.960
this is pretty useful. Next, checksums. It's a pretty basic requirement, but as

12:12.960 --> 12:19.200
we are adopting more use cases at a massive scale, sometimes a bit flip

12:19.200 --> 12:26.080
happens inside MyRocks, or the RocksDB side fails to decode a record,

12:26.080 --> 12:32.640
or some RocksDB block corruption happens. In that case, a more strict behavior

12:32.640 --> 12:39.280
may be helpful. If the data is corrupted, but the corrupted instance keeps running,

12:39.280 --> 12:45.440
and if you copy that instance elsewhere, then the corrupted data can get spread to many

12:46.720 --> 12:52.000
replicas, so that is problematic. So we have more controllable features, like, when

12:52.000 --> 12:57.760
a corrupted record is found, returning errors to the client, or just

12:57.760 --> 13:04.400
aborting the operation, or just aborting the instance. Or, for RocksDB checksum failures,

13:06.240 --> 13:11.680
when cloning an instance, it verifies the checksums, so that corrupted data

13:11.680 --> 13:17.280
will not be replicated elsewhere. So there are these types of control features and settings too.

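NOTE
A minimal sketch of verifying checksums before copying data elsewhere, so a
corrupted block is rejected instead of being spread to other replicas. The
block layout (a dict with "payload" and "crc") and the function names are
assumptions for illustration; CRC32 from the Python standard library stands
in for whatever checksum the storage engine actually uses.
```python
import zlib
#
def make_block(payload):
    # Store the checksum next to the data when the block is written.
    return {"payload": payload, "crc": zlib.crc32(payload)}
#
def verify_block(block):
    return zlib.crc32(block["payload"]) == block["crc"]
#
def copy_blocks(blocks):
    # Strict behavior: refuse to copy anything that fails verification.
    copied = []
    for block in blocks:
        if not verify_block(block):
            raise ValueError("checksum mismatch, refusing to copy")
        copied.append(block)
    return copied
```
A lenient variant would log and continue; the strict variant above trades
availability for the guarantee that bad data does not leave the instance.
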
13:17.360 --> 13:28.240
This one is a bit of a higher-level layer. It is inspired by HandlerSocket

13:28.240 --> 13:34.240
or the InnoDB memcached plugin from Oracle. We also implemented

13:34.240 --> 13:43.760
a NoSQL interface. At Facebook, at Meta, the de facto standard RPC is

13:43.760 --> 13:51.360
called Thrift, so we implemented a Thrift server inside the MySQL server, which receives

13:51.360 --> 14:00.720
RPC requests, then talks to MyRocks by directly composing RocksDB API calls, bypassing the SQL

14:00.720 --> 14:07.440
parser and optimizer. Just some Thrift API calls, like gets or range scans, go to the Thrift server, which then

14:07.760 --> 14:14.320
composes RocksDB API calls, plus some basic MySQL checks, like whether a table exists or not, or

14:14.320 --> 14:21.520
whether a schema change is ongoing, and then directly talks to RocksDB. That saves quite a good

14:21.520 --> 14:32.400
amount of CPU. And our main database, called UDB, actually doesn't allow the customers to send

14:32.400 --> 14:38.560
arbitrary SQL, because that's the main database, and people shouldn't be able to take down

14:38.560 --> 14:45.200
the whole of Facebook's services, so that's a reasonable architecture. We have a big cache

14:45.200 --> 14:52.880
service called TAO, and the TAO cache server talks to UDB with very common, simple SQL queries,

14:52.880 --> 14:57.840
like primary key lookups, multi-point primary key lookups, or range scans with just

14:57.840 --> 15:07.920
one index, like getting the list of people who liked this post, this type of

15:07.920 --> 15:15.280
operation. These are simple range scans. About 99% of the main database read workload is

15:15.280 --> 15:20.960
very simple primary key lookups or range scans, so that's why we support it. It's implemented

15:20.960 --> 15:30.640
with some simple query paths, bypassing the parser and optimizer. It is very similar to

15:30.640 --> 15:37.040
HandlerSocket or the memcached plugin. It doesn't support writes or complex queries, so it's a

15:37.040 --> 15:44.400
reasonable trade-off. If the majority of the workload is read-heavy, then that's a good use case.

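NOTE
A small sketch of the kind of request such a bypass interface could serve: a
range scan that does a couple of basic checks and then walks a sorted index
directly, with no SQL parsing or optimization. ToyTable, range_scan, and the
(post_id, user_id) key layout are all hypothetical; the real interface is a
Thrift service and is not shown in the talk.
```python
import bisect
#
class ToyTable:
    # Rows are kept sorted by key, like an index.
    def __init__(self, rows):
        self.keys = sorted(rows)
        self.rows = dict(rows)
        self.schema_change_in_progress = False
#
def range_scan(tables, table_name, start_key, end_key, limit):
    # Basic checks first, then a direct index scan.
    table = tables.get(table_name)
    if table is None:
        raise LookupError("table does not exist")
    if table.schema_change_in_progress:
        raise RuntimeError("schema change is ongoing")
    lo = bisect.bisect_left(table.keys, start_key)
    hi = bisect.bisect_right(table.keys, end_key)
    return [(k, table.rows[k]) for k in table.keys[lo:hi][:limit]]
```
Listing who liked post 42 becomes a scan from (42, 0) to (42, largest id),
which touches only the matching slice of the index.
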
15:44.960 --> 15:52.800
This is not implemented yet, but we are realizing that this is one of the biggest

15:53.760 --> 16:00.960
architectural challenges in LSM databases and RocksDB, and also a challenge for supporting

16:00.960 --> 16:10.480
more general-purpose services when migrating to MyRocks: optimizations for

16:10.480 --> 16:18.400
queue-based workloads. We have quite a few customers who are using MySQL as a basic

16:18.400 --> 16:25.440
queue, so inserting a lot of records, then deleting them, and then checking if the rows are deleted

16:25.440 --> 16:31.920
or not by doing range scans. In RocksDB, an insert is equal to a

16:31.920 --> 16:38.960
Put, so people do a lot of inserts by Put. Then, since it's a queue, people delete a bunch of

16:38.960 --> 16:46.480
old records by Delete. This is very challenging for LSM. The challenge is,

16:46.480 --> 16:52.480
unlike InnoDB, an update-in-place database, an LSM database doesn't immediately

16:52.880 --> 17:00.240
remove the deleted entries. They are called tombstones in RocksDB, and these delete markers don't

17:00.240 --> 17:09.280
immediately disappear, so they are accumulated until flush or compaction. So at least for the first

17:10.240 --> 17:16.240
10 minutes or 20 minutes, these are all inside the memtable, and these deletions remain.

17:16.240 --> 17:22.720
And if someone checks whether the queue entries are deleted or not, they are doing range scans

17:22.720 --> 17:30.400
over the deleted regions, and these are pretty expensive because that ends up scanning lots of

17:30.400 --> 17:37.600
tombstones. This is a common issue and we often have to rewrite the queries, but fundamentally

17:37.600 --> 17:47.680
we are realizing that we should optimize more. We have several architectural workarounds,

17:47.680 --> 17:52.800
like deletion-triggered compactions: if there are lots of deletions

17:52.800 --> 17:59.680
inside SST files, then we trigger additional compactions, aggressively wiping these

17:59.760 --> 18:08.080
tombstones. But recently there's a big issue inside the memtable. Inside the

18:08.080 --> 18:14.560
memtable, we cannot trigger compactions because there's no SST file yet, so there are lots of

18:14.560 --> 18:20.400
deletions inside memtables and these are still causing problems. We have several ideas to

18:20.400 --> 18:27.200
work around that, and this is an area we are prioritizing. Again, this is a RocksDB challenge.

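NOTE
A toy illustration of why range scans over a drained queue are expensive in
an LSM engine, and of the idea behind deletion-triggered compaction. The
TOMBSTONE marker, the function names, and the 0.5 threshold are assumptions
for illustration and are not RocksDB settings.
```python
TOMBSTONE = object()   # stands in for a delete marker
#
def range_scan(memtable, start, end):
    # Returns (live keys, number of tombstones stepped over).
    live, skipped = [], 0
    for key in sorted(memtable):
        if start <= key <= end:
            if memtable[key] is TOMBSTONE:
                skipped += 1
            else:
                live.append(key)
    return live, skipped
#
def needs_compaction(num_deletes, num_entries, threshold=0.5):
    # Deletion-triggered compaction: fire when deletes dominate a file.
    return num_entries > 0 and num_deletes / num_entries >= threshold
```
With 1,000 queue entries of which 990 are deleted, a scan for the 10 live
rows still steps over 990 tombstones until they are compacted away.
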
18:27.200 --> 18:34.400
So by fixing RocksDB, other services using RocksDB, like TiDB, will see the benefits

18:34.400 --> 18:44.320
elsewhere. We have two other topics. One is user-defined timestamps. There are several

18:44.320 --> 18:53.360
people building databases on top of RocksDB, and these types of people add some timestamp

18:53.360 --> 19:03.440
on top of RocksDB and build consistent reads with that. We also want some basic

19:03.440 --> 19:10.960
timestamp-based consistent read features, and we recently added support inside

19:10.960 --> 19:18.160
RocksDB for appending a timestamp, a user-defined timestamp, on top of

19:18.160 --> 19:26.880
RocksDB, so that people can read based on a timestamp by using RocksDB APIs. This is

19:26.880 --> 19:33.360
an optional feature, so you can continue to use traditional RocksDB or use user-defined timestamps.

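NOTE
A minimal sketch of reading by timestamp: every write carries a timestamp,
and a read returns the newest version written at or before the requested
time. TimestampedStore is a hypothetical toy class, not the RocksDB API, and
timestamps are plain integers here.
```python
import bisect
#
class TimestampedStore:
    # Each key keeps a list of (timestamp, value) pairs, oldest first.
    def __init__(self):
        self.versions = {}
    def put(self, key, value, ts):
        bisect.insort(self.versions.setdefault(key, []), (ts, value))
    def get(self, key, read_ts):
        # Return the newest value written at or before read_ts.
        result = None
        for ts, value in self.versions.get(key, []):
            if ts <= read_ts:
                result = value
        return result
```
Two readers using the same read_ts see the same version of every key, which
is the property a consistent read builds on.
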
19:33.360 --> 19:39.040
With user-defined timestamps, this is more like the NewSQL world, like TiDB,

19:39.040 --> 19:46.320
CockroachDB, YugabyteDB. This is a pretty big building block for supporting

19:46.320 --> 19:51.920
cross-shard consistent reads. With user-defined timestamps, every record will have a timestamp,

19:51.920 --> 19:58.800
so if you start transactions based on this time, you can read consistent data based

19:58.800 --> 20:06.320
on the time. And since the time is an HLC, the time is consistent across the different

20:06.320 --> 20:11.840
shards, so by starting the transactions at the same time, you can see consistent

20:11.840 --> 20:20.160
reads across different shards, multiple shards, like a global consistent read. So the user-defined

20:20.160 --> 20:25.600
timestamp is a building block for that, and that is the reason for supporting this feature in

20:25.600 --> 20:32.000
RocksDB. Our intention is using it in the memtable only, so that it doesn't change the RocksDB

20:32.000 --> 20:39.280
data format. A timestamp adds some space, and as I said, UDB is very space-bound,

20:39.280 --> 20:45.040
so we don't want to increase space. So the intention is just using memtable-only timestamps,

20:45.040 --> 20:52.560
so that at least recent data can be consistent. The last big one is the vector database. So

20:53.840 --> 21:02.000
we are building some vector database capabilities by using a library called Faiss.

21:02.000 --> 21:10.480
Faiss is a similarity search library. The name comes from Facebook AI Similarity Search, and it is also

21:10.480 --> 21:20.560
open source. There are several asks, similar to what many other people talked about today.

21:20.560 --> 21:27.120
At Meta, there are some requirements for building a vector database, and MyRocks is a

21:27.120 --> 21:34.880
SQL database, and people want to use vectors with SQL. So the direction is basically integrating

21:34.880 --> 21:42.960
with Faiss, using that feature from MyRocks, and offering that with SQL syntax.

21:43.840 --> 21:52.240
Currently we use a JSON or blob format. We don't have a specific vector type yet, but JSON or blob;

21:52.880 --> 21:58.880
so far we're using those. JSON kind of uses a lot of space, even after compression,

21:59.520 --> 22:05.520
especially when having a lot of floating-point numbers, so we use blob, where the space

22:05.520 --> 22:11.920
is about 2x smaller, so we are using blob. But anyway, we support both. The vector

22:11.920 --> 22:21.760
functions are provided by Faiss, so L2, the Euclidean distance, and IP, the

22:21.920 --> 22:27.840
inner product. These are some vector functions. I'm not very familiar with them, but they come

22:27.840 --> 22:36.160
from Faiss. And a JSON-to-blob conversion function is provided. And yeah,

22:36.160 --> 22:42.720
similarity search is done by using these functions, and by combining them with SQL you can do a similarity

22:42.720 --> 22:51.040
search. It comes from Faiss, combined with SQL, which is quite powerful, because you can

22:51.040 --> 22:58.160
order by these scores, distance scores, or filter by using HAVING. These are pretty

22:58.160 --> 23:04.080
useful. Vector database people do lots of experiments, and with SQL,

23:04.080 --> 23:08.800
experimenting is very easy. You don't have to write code; just by using SQL you can

23:09.200 --> 23:18.960
get some results, so it's pretty useful. And on the index, indexes, yeah, I also got confused.

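NOTE
A pure-Python sketch of the vector pieces mentioned in this part of the talk:
converting a JSON array to a compact blob, the L2 (Euclidean) distance, the
inner product, and a brute-force search that orders by distance and filters
like a HAVING clause. None of this is the Faiss or MyRocks API; the function
names and the 4-byte little-endian float layout are assumptions.
```python
import json
import math
import struct
#
def json_to_blob(text):
    # Pack a JSON array of numbers as 4-byte floats.
    values = json.loads(text)
    return struct.pack(f"<{len(values)}f", *values)
#
def blob_to_vector(blob):
    return list(struct.unpack(f"<{len(blob) // 4}f", blob))
#
def l2_distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
#
def inner_product(a, b):
    return sum(x * y for x, y in zip(a, b))
#
def nearest(rows, query, k, max_distance):
    # Like ORDER BY distance, HAVING distance <= max_distance, LIMIT k.
    scored = [(l2_distance(vec, query), row_id) for row_id, vec in rows.items()]
    return [row_id for dist, row_id in sorted(scored) if dist <= max_distance][:k]
```
Even for a three-element vector the blob is smaller than the JSON text (12
bytes against 15 characters here), which hints at why blob storage saves space.
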
23:18.960 --> 23:26.560
Like many here. In vector database terminology, an index is more about returning results faster,

23:26.560 --> 23:32.000
even though the results are a bit inaccurate, so that's very different from a relational database

23:32.080 --> 23:39.440
index. But anyway, we are using the Faiss-provided indexes. There are

23:39.440 --> 23:49.440
multiple index types supported: Flat, HNSW, the hierarchical navigable small world, and IVF.

23:49.440 --> 23:59.040
IVF means inverted file. Today we provide Flat, IVF Flat, and IVF PQ, the

23:59.200 --> 24:06.160
product quantization. So these are the different indexes, and we are planning to support more, but basically

24:06.160 --> 24:13.200
these all come from Faiss, just integrated with MyRocks. One particular thing

24:13.200 --> 24:22.800
I want to call out is that today's vector talks are all over HNSW, but at

24:23.120 --> 24:33.120
Meta we are using IVF. The major reason is that HNSW is very memory-intensive; everything is

24:33.120 --> 24:39.360
kept in memory. At Meta we have a bunch of hardware, but much of the hardware has very small memory,

24:39.360 --> 24:46.160
and the data set is very large, so we wanted something that works with small memory and a

24:46.160 --> 24:53.200
large database, or data set. So the inverted file type, IVF, is the better fit for us.

24:54.320 --> 25:00.000
But anyway, it's a generic format, and I'm pretty sure pgvector also supports

25:00.000 --> 25:10.960
both HNSW and IVF. With IVF, it's a kind of inverted file, but you need

25:10.960 --> 25:19.760
some training data. IVF is partitioning the data across multiple partitions, and for

25:19.760 --> 25:26.000
better partitioning, we need training, so it needs some training data set. Based on the

25:26.000 --> 25:31.360
training data set, when you're loading data and building indexes, the data is

25:32.720 --> 25:40.000
partitioned properly. There are several features, but most of them come from Faiss.

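NOTE
A toy version of the inverted-file idea: a training step picks centroids,
each vector is stored in the partition of its nearest centroid, and a search
scans only the closest partitions. This is a simplified stand-in written in
plain Python, not the Faiss implementation; the tiny k-means, the function
names, and the nprobe parameter are assumptions for illustration.
```python
def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))
#
def nearest_centroid(centroids, vec):
    return min(range(len(centroids)), key=lambda i: sq_dist(centroids[i], vec))
#
def train(training_vectors, k, rounds=5):
    # Very small k-means: start from the first k vectors, then refine.
    centroids = [list(v) for v in training_vectors[:k]]
    for _ in range(rounds):
        groups = [[] for _ in range(k)]
        for v in training_vectors:
            groups[nearest_centroid(centroids, v)].append(v)
        for i, group in enumerate(groups):
            if group:
                centroids[i] = [sum(col) / len(group) for col in zip(*group)]
    return centroids
#
def build_index(centroids, vectors):
    # Store each vector in the partition of its nearest centroid.
    lists = [[] for _ in centroids]
    for row_id, vec in vectors.items():
        lists[nearest_centroid(centroids, vec)].append((row_id, vec))
    return lists
#
def search(centroids, lists, query, nprobe=1):
    # Scan only the nprobe partitions whose centroids are closest.
    order = sorted(range(len(centroids)), key=lambda i: sq_dist(centroids[i], query))
    candidates = [item for i in order[:nprobe] for item in lists[i]]
    return min(candidates, key=lambda item: sq_dist(item[1], query))[0]
```
Because only some partitions are scanned, the answer can differ from an
exhaustive search when the true neighbor sits in a partition that was skipped.
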
25:40.000 --> 25:47.520
And yeah, complex queries are supported with SQL. With SQL, you can filter with WHERE or HAVING

25:47.520 --> 25:54.880
clauses, and there are also condition pushdowns, so we can offer good support. And yeah,

25:54.880 --> 26:00.800
the layout is like this: Flat is more like a flat format, but IVF is partitioned,

26:00.800 --> 26:06.880
and the partitioning is based on what was determined by training. And yeah, the execution of a

26:06.960 --> 26:12.320
single query goes through the optimizer, then filters or index scans, which are the traditional

26:12.320 --> 26:18.080
flows, and then it talks to Faiss for the similarity search, working with

26:18.080 --> 26:25.200
the vectors. So it's basically mixing the SQL behavior and the vector behavior from Faiss,

26:26.000 --> 26:33.840
and yeah, we have several use cases for that. This is the last slide, on future developments.

26:33.840 --> 26:40.800
Most of the work in MyRocks comes from RocksDB, and RocksDB is a very

26:40.800 --> 26:47.600
actively developed open source product, and all that development is on GitHub. So yeah,

26:47.600 --> 26:52.720
if you are not familiar with RocksDB, or if you're already using a database

26:52.720 --> 26:59.040
based on it, it's recommended to check out RocksDB feature development. Also,

26:59.360 --> 27:05.760
with MyRocks, we are pushing more features to RocksDB, so that more people can benefit.

27:06.560 --> 27:11.840
And the MySQL code base itself is now managed in an internal repository at Meta,

27:11.840 --> 27:19.280
so all development is done inside, including MyRocks, but we also regularly push the changes

27:19.280 --> 27:25.200
to GitHub, the Facebook GitHub, the Facebook 5.6 repository. It's named

27:25.200 --> 27:30.880
5.6, but it's based on 8.0, and we regularly push updates there.

27:32.880 --> 27:34.400
That's it. Thank you very much.

27:41.120 --> 27:43.760
We have about three minutes for questions. Yes?

27:44.560 --> 27:50.000
If you do have to do deletes and do selects after, which is expensive,

27:50.000 --> 27:54.080
is there anything you can do from your application to kind of help RocksDB,

27:54.080 --> 28:01.120
like, actually, or is that all done already? So, yeah, the expensive part is the select,

28:01.120 --> 28:08.160
the range scans. There are some customers that keep repeatedly doing range scans,

28:08.160 --> 28:13.120
like 100 times per second or so, and we advise them not to do that, so just

28:14.080 --> 28:19.360
caching the results, or sending requests less often. Or range deletions,

28:19.360 --> 28:27.440
range deletions without a proper range, like ID larger than 0, LIMIT 5,000 or 50,000, then

28:27.440 --> 28:35.600
deleting those rows, and doing this every time. So instead, just use proper

28:35.600 --> 28:41.280
filtering, for example, remembering the last deleted record and passing that so you only read newer

28:41.360 --> 28:47.360
records. So this type of better filtering, or caching, is generally helpful.

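NOTE
A short sketch of the filtering advice in this answer: remember the last id
that was processed and start the next scan after it, instead of starting at
id greater than 0 every time. The sorted list of ids and the set of deleted
ids are a toy stand-in for an index that still holds tombstones; the function
names are hypothetical.
```python
import bisect
#
def scan_from_zero(keys, deleted, limit):
    # Like "WHERE id > 0 LIMIT n": walks over every deleted entry first.
    visited, live = 0, []
    for key in keys:
        visited += 1
        if key not in deleted:
            live.append(key)
            if len(live) == limit:
                break
    return live, visited
#
def scan_after(keys, deleted, last_id, limit):
    # Like "WHERE id > last_id LIMIT n": starts right after the remembered id.
    start = bisect.bisect_right(keys, last_id)
    return scan_from_zero(keys[start:], deleted, limit)
```
With ids 1 to 1,000 and the first 900 deleted, the scan from zero visits 910
entries to return 10 rows, while the scan after id 900 visits only 10.
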
28:50.080 --> 28:58.960
Yes. Yes, potentially, yeah. Yes. I imagine you work on

28:58.960 --> 29:05.360
optimization, but we also have a problem with write amplification and compaction,

29:05.360 --> 29:13.600
because you have the deletes at level 0, and the actual records at level 0, or 1, or 2,

29:13.600 --> 29:19.040
then it compacts, and the deletes get compacted to the next level, so you get

29:19.040 --> 29:24.160
a lot of write amplification in this case. Yes, the question was about write amplification.

29:24.160 --> 29:30.960
So generally, for steady-state workloads, RocksDB's write amplification is much

29:31.200 --> 29:37.200
better than a typical B-tree, a typical InnoDB. But yes, you're right,

29:37.200 --> 29:43.600
when doing aggressive deletion-triggered compactions, it compacts much more often,

29:43.600 --> 29:51.200
so then the write rate becomes a problem, or CPU for compression. Yes, we had that issue on some of

29:51.200 --> 29:58.000
the very write-intensive services. So yeah, the challenge is that we need to find a balance.

29:58.080 --> 30:05.520
Write rate was mostly fine, but CPU was concerning, from doing such active compression,

30:05.520 --> 30:13.040
so for that use case, we changed the compression algorithm from Zstandard to LZ4, which has

30:13.040 --> 30:21.920
faster compression, checking the write rate and CPU, and then striking some balance,

30:22.560 --> 30:28.560
relaxing some of the deletion-triggered compaction thresholds. So quite some

30:29.440 --> 30:37.280
tuning was needed. We hope this can be more automated, so that's a future step.

30:37.280 --> 30:54.560
Yes. Yes, the question is about Faiss. Faiss is open source, yes, and I believe it is supported

30:54.560 --> 31:05.200
on GPU, so you can choose. Yes. Thank you very much.

