WEBVTT

00:00.000 --> 00:10.560
Okay, great. Thank you. So, welcome, everyone. We are past time, so let's start quickly. So, this is

00:10.560 --> 00:16.400
Declarative Networking in the Declarative World, 2025 edition. Just a quick vibe check: who

00:16.400 --> 00:23.240
already saw this talk a year ago, here at FOSDEM? Yeah. Okay. So, we start from scratch,

00:23.240 --> 00:27.360
which is okay. Better for me because a lot of stuff will be the same. So, you will be bored,

00:27.360 --> 00:33.520
but that's okay. So, I'm at Rikovazka, and I work at Red Hat, like last year, two

00:33.520 --> 00:38.560
years ago, three years ago, four years ago; time goes by. Yeah. So, I'm based in Switzerland.

00:38.560 --> 00:42.960
I did a lot of things before Red Hat. I've been in academia. I've been doing

00:42.960 --> 00:47.040
some banking. That wasn't good stuff; I don't recommend it. I was doing telco. That was

00:47.040 --> 00:51.360
better. It gave me a lot of insight, so I can see what I do now from other perspectives,

00:51.360 --> 00:55.680
put it into a broader framework, and answer why I'm

00:56.000 --> 01:01.440
doing something. Since the beginning of time, I've been doing cloud and bare metal. Then I switched

01:01.440 --> 01:05.920
to network security for a moment. I'm not touching artificial intelligence. I'm trying

01:05.920 --> 01:11.040
not to do this as long as my management doesn't force me to. So, thankfully, it's not happening.

01:11.040 --> 01:17.440
Yeah. Let's get to it, straight to the core of the talk. So, we are in the containers devroom.

01:17.440 --> 01:24.080
So, it may not be so clear why we are even talking about this. But basically, the starting point

01:24.160 --> 01:29.920
into this whole journey here is that we have systems with multiple network interfaces. And, of

01:29.920 --> 01:35.440
course, half of the audience will now ask: why do we even care about this? I have a Docker container.

01:35.440 --> 01:39.920
It has one network interface, and I'm happy; I don't need to know anything else. Well, that's

01:39.920 --> 01:44.080
true as long as you are running, you know, some web app with a database and a front end,

01:44.080 --> 01:49.040
like, you know, classic containers, a Containers 101 class. But then we start

01:49.040 --> 01:53.680
seeing people who are running network equipment as containers. This happens in, you know,

01:53.680 --> 01:59.120
telco, 5G. It's not dedicated appliances anymore. Well, it is physical hardware, but inside this hardware,

01:59.120 --> 02:04.000
you don't run processes directly; it's containerized. Some 5G telcos

02:04.000 --> 02:08.960
are even doing Kubernetes, good for them. That's super cool. Then routers,

02:08.960 --> 02:13.280
which is basically SDN, so all the software-defined networking, the so-called CNFs,

02:13.440 --> 02:19.920
routing devices: it's all containerized nowadays. That's telco. Then we have high-performance

02:19.920 --> 02:24.720
computing. Those are people who are running containers on bare metal, and they need a lot

02:24.720 --> 02:30.560
of performance, and it doesn't mean "let's get a server and buy 10 NVIDIA GPUs". It means

02:30.560 --> 02:35.280
let's have a lot of networking, let's have a super-fast network. And for those people, this is also a

02:35.280 --> 02:39.840
use case. Because, you know, for data transfer, they have something else, for management, they have

02:40.720 --> 02:43.680
something else. And it all boils down to the fact that you have a server somewhere

02:43.680 --> 02:49.680
down in the basement, and this server has dozens of network interfaces. It can go from,

02:49.680 --> 02:54.880
you know, two network interfaces to hundreds. We've seen it all, and, you know, each order

02:54.880 --> 02:59.360
of magnitude brings its own problems, but it still boils down to the fact that you don't have one

02:59.360 --> 03:05.760
interface; you have a lot of them. So, let's go back, let's say, 50 years.

03:05.760 --> 03:11.520
So: NetworkManager, standard Linux sysadmin work. You basically SSH to your machine,

03:11.520 --> 03:15.920
and you do everything yourself. So, forget, you know, Ansible, Puppet, this kind of stuff

03:15.920 --> 03:21.920
doesn't exist yet. All you had was NetworkManager and static config files. And, you know,

03:21.920 --> 03:26.560
let's not be fooled: it's still there. It may be hidden by some layers of abstraction,

03:26.560 --> 03:31.120
but this stuff is still there. It all boils down to, you know, your system being a

03:31.200 --> 03:36.880
bunch of files managed by NetworkManager, and it's not going away. It will be there for, you know,

03:36.880 --> 03:44.080
for the next 50 years. But okay, I'm not here to rant about that, because at the end

03:44.080 --> 03:49.040
of the day, well, in Linux everything is a file, so it's fine. But there is a fundamental

03:49.040 --> 03:56.160
problem with configuring like this: it's super nice as long as your configuration

03:56.160 --> 04:01.440
is stable. But, what happens when you start changing this config? It's a file, right? So, you

04:01.440 --> 04:07.280
would SSH to the server, you would open this file, you would modify some values, and then,

04:08.560 --> 04:13.520
this is where the potential problems begin. You can quit the editor without saving the file:

04:13.520 --> 04:21.760
so far so good. You may save the configuration, and that's it. Now, is it applied or not? Well,

04:21.760 --> 04:27.200
it's not immediately applied, because now, whoever consumes this config file needs to be told,

04:27.200 --> 04:33.280
hey, I updated the config file, can you please read it and do what's necessary?

04:33.920 --> 04:39.040
And then we go even further, because now, okay, we tell this something, which is NetworkManager

04:39.040 --> 04:44.080
in your case: hey, I updated the file, please take the changes into account. So, what happens now

04:44.080 --> 04:49.680
if you, well, broke this configuration? Let's say something like that, right here.

04:49.680 --> 04:54.000
It's recorded, so... So, what if you put an invalid configuration in this file,

04:54.000 --> 04:57.760
you go and tell NetworkManager, hey, please restart and apply this configuration?

04:57.760 --> 05:03.040
Well, this configuration that it reads now is incorrect. So, if you are lucky,

05:03.040 --> 05:09.280
you just lost access to your server, and if you're doing this remotely, well, bye-bye. So,

05:10.400 --> 05:15.680
yeah. I've seen a lot of problems where people update this file

05:15.680 --> 05:21.200
and just forget, and the file sits there modified, and one year

05:21.200 --> 05:26.640
later they reboot the server, and the server suddenly doesn't come up. Well, good luck debugging

05:26.640 --> 05:32.480
what happened and when. So, we can improve on this: on top of NetworkManager,

05:32.480 --> 05:39.760
we have this project, nmstate. It gives us the ability to change NetworkManager configuration

05:39.840 --> 05:46.320
at runtime. It doesn't sound like something huge, but it basically changes the

05:46.320 --> 05:52.000
paradigm. So, you don't modify a file with some configuration that is not even validated,

05:52.000 --> 05:56.640
and if you have a plugin that validates the syntax of NetworkManager config files,

05:56.640 --> 06:01.440
well, good for you, but people don't do this. We have a CLI, which is basically,

06:01.440 --> 06:08.720
you know, nmcli connection modify, and it's better. It's not ideal.

06:08.720 --> 06:15.040
It's not declarative. It's not the Kubernetes way, but at least it won't allow you to break

06:15.040 --> 06:19.440
the configuration immediately. So, I have this screenshot; it's barely readable, but

06:19.440 --> 06:25.600
basically, what I was trying to do is change an IP address, and everything looks good.

06:25.600 --> 06:30.720
I mean, it's an IP address. It's not like I'm putting 99999 as one of the octets,

06:30.720 --> 06:35.600
but instead of a slash I'm using a backslash, or the other way around; basically, I'm using the wrong one.

06:35.600 --> 06:41.120
So, if I did this in the static config file, it would just break, and thank you very much,

06:41.120 --> 06:48.160
good luck. Here I get immediate feedback: sorry, this is not a correct IP address.

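NOTE
The immediate feedback described above comes from validating a value before it is applied. A minimal Python sketch of that idea using only the standard library's ipaddress module; check_address is a hypothetical helper and this is not how nmcli or nmstate implement their checks:
```python
import ipaddress
def check_address(text: str) -> str:
    """Validate an 'address/prefix' string before it reaches any config file."""
    try:
        iface = ipaddress.ip_interface(text)
    except ValueError as err:
        # A bad octet or a backslash instead of a slash is rejected here,
        # instead of breaking the host at the next restart.
        return f"rejected: {err}"
    return f"accepted: {iface.with_prefixlen}"
print(check_address("192.168.1.10/24"))  # accepted: 192.168.1.10/24
print(check_address("192.168.1.10\\24"))
print(check_address("999.1.1.1/24"))
```
The last two calls are rejected with the ValueError message from ipaddress; nothing is written anywhere.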
06:48.160 --> 06:52.960
So, already better. It doesn't give us everything that you would like from the Kubernetes world,

06:52.960 --> 07:01.040
but, you know, step by step. And we don't need to do this from bash,

07:01.040 --> 07:07.120
typing everything out like this; we can take YAML. I don't judge: YAML was a choice,

07:07.120 --> 07:12.080
there may be better ones, there may be worse ones, but at least we have something.

07:12.080 --> 07:17.280
So, now we get to the point where you craft a YAML describing your network configuration:

07:17.280 --> 07:22.640
what I want to have as DNS setup, what I want to have as routing, what I want to have as interfaces,

07:22.640 --> 07:27.840
and, you know, it goes further and further, and you have this file. So, you then basically apply

07:27.920 --> 07:33.600
this YAML as a network configuration for your host. So, now, you don't need to remember and go through

07:33.600 --> 07:38.560
your bash history to see who modified what, but you basically open the YAML and you see what

07:38.560 --> 07:44.000
your configuration is. Well, with a small disclaimer: if I apply this YAML and then you SSH

07:44.000 --> 07:49.840
to the host and start doing, you know, ip address delete, add, you will mess it up,

07:49.840 --> 07:55.280
and this will not hold. But, you know, step by step: we already have YAML, and we already have something

07:55.280 --> 08:00.880
that applies this YAML. So, what are we missing now in this scheme, you know,

08:00.880 --> 08:07.680
in this state of the art? Well, let's do what Kubernetes does. So, let's have continuous reconciliation.

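NOTE
Because the YAML above is declarative, finding what changed becomes a comparison instead of a search through bash history. A minimal Python sketch, assuming a dict shaped loosely after nmstate's YAML (interfaces with an ipv4 address list of ip and prefix-length entries); the helper names are hypothetical and only IPv4 addresses are compared:
```python
def addresses(state: dict) -> set:
    """Flatten a state document into (interface, 'ip/prefix') pairs."""
    pairs = set()
    for iface in state.get("interfaces", []):
        for addr in iface.get("ipv4", {}).get("address", []):
            pairs.add((iface["name"], f"{addr['ip']}/{addr['prefix-length']}"))
    return pairs
def drift(desired: dict, current: dict) -> dict:
    """Report how the live host differs from the declared state."""
    want, have = addresses(desired), addresses(current)
    return {"missing": sorted(want - have), "unexpected": sorted(have - want)}
desired = {"interfaces": [{"name": "eth1", "ipv4": {
    "enabled": True, "address": [{"ip": "192.0.2.10", "prefix-length": 24}]}}]}
current = {"interfaces": [{"name": "eth1", "ipv4": {"enabled": True, "address": []}}]}
print(drift(desired, current))
# {'missing': [('eth1', '192.0.2.10/24')], 'unexpected': []}
```
Unexpected addresses are only reported, not removed; whether they count as drift is a policy decision this sketch leaves open.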
08:07.680 --> 08:12.480
So, let's have an operator with a controller inside, which will basically be applying

08:12.480 --> 08:19.760
this configuration all the time. As you see, the YAML looks almost the same;

08:19.760 --> 08:24.880
it's just wrapped in a Kubernetes CRD. We created the CRD NodeNetworkConfigurationPolicy,

08:25.280 --> 08:32.800
crazy name, but, you know, there are reasons. In this CRD, well, in this CR, we define

08:32.800 --> 08:37.360
the same configuration that you just saw, which basically means that now, from now on, Kubernetes

08:37.360 --> 08:45.120
will be managing the network configuration of this node. Okay, maybe not a big deal, but, you know,

08:45.120 --> 08:53.520
it's nice, and yeah, it's demo time. So, let's do it. Let me show you.

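NOTE
With an operator, that comparison runs in a loop and the controller corrects the node whenever it drifts. A minimal reconcile step in Python, assuming the node is modelled as an in-memory set of (interface, 'ip/prefix') pairs; FakeHost is a hypothetical stand-in for real node access, and this is not the kubernetes-nmstate controller code:
```python
class FakeHost:
    """Hypothetical stand-in for a node's live network state."""
    def __init__(self, pairs=()):
        self.pairs = set(pairs)
    def add(self, iface: str, cidr: str) -> None:
        self.pairs.add((iface, cidr))
def reconcile(desired: set, host: FakeHost) -> list:
    """Re-add every desired address the host is missing; return what was done."""
    actions = []
    for iface, cidr in sorted(desired - host.pairs):
        host.add(iface, cidr)
        actions.append(f"add {cidr} dev {iface}")
    return actions
desired = {("eth1", "192.0.2.13/24")}
host = FakeHost(desired)
host.pairs.discard(("eth1", "192.0.2.13/24"))  # someone deletes the address by hand
print(reconcile(desired, host))  # ['add 192.0.2.13/24 dev eth1']
print(reconcile(desired, host))  # []
```
The second call doing nothing is the point: the step is idempotent, so it is safe to run on every resync. This sketch only adds; it never removes anything.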
08:53.600 --> 08:58.320
I have a bunch of policies setting a DNS server, some additional IP addresses, and so on.

08:58.320 --> 09:05.040
So, basically what I can do right now, so we will work with this one. What I'm going to do

09:05.040 --> 09:14.560
with this CR? Oh, sorry. Yeah, [inaudible].

09:14.560 --> 09:22.400
So, I will basically take one of the network interfaces on my server, and I will add an IP address,

09:22.400 --> 09:30.800
which will be 10, 2, 4, 1, 3, 3, 3, and whatever. And I will apply this YAML now.

09:30.800 --> 09:34.720
Now, we have the node selector, so this configuration will apply only to this particular node.

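NOTE
The node selector limits which nodes a policy applies to. A minimal Python sketch of plain label matching, assuming the selector is a flat map of label keys to values (as in a Pod's nodeSelector) and that an empty selector matches every node; selects is a hypothetical helper:
```python
def selects(selector: dict, node_labels: dict) -> bool:
    """True if every key/value pair in the selector is present on the node."""
    return all(node_labels.get(key) == value for key, value in selector.items())
nodes = {
    "master-0": {"kubernetes.io/hostname": "master-0", "node-role.kubernetes.io/master": ""},
    "worker-0": {"kubernetes.io/hostname": "worker-0"},
}
selector = {"kubernetes.io/hostname": "master-0"}
print([name for name, labels in nodes.items() if selects(selector, labels)])  # ['master-0']
print(len([n for n, labels in nodes.items() if selects({}, labels)]))  # 2
```
Selecting by hostname is what keeps a per-node setting such as an IP address off the rest of the cluster.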
09:35.280 --> 09:39.200
We can go deeper afterwards, but, you know, we need a node selector, because we don't always

09:39.200 --> 09:44.240
apply configuration that is specific to one node; sometimes it's super global. But with IP

09:44.240 --> 09:52.080
addresses, I don't want to apply this to my whole cluster; that would be stupid. So, just to prove

09:52.240 --> 09:57.280
that I don't have this IP address on this server. So, you can see that this interface is

09:57.280 --> 10:02.160
something similar, but it's 0, 3, not 1, 3. So, I'm going to apply now the next one. So,

10:04.560 --> 10:13.760
let's apply this configuration. oc get nncp: okay, in progress. Oh, okay, configured.

10:13.920 --> 10:23.520
Now, let's see again what I have. And I have this, 1, 3, nice, okay. So, let's now be this,

10:25.040 --> 10:29.920
well, rogue admin, the new junior on the team, you know, basically someone goes

10:29.920 --> 10:34.240
and deletes this configuration. So, now I have to remember the syntax.

10:34.800 --> 10:50.160
And delete, slash 24. And now, I think it's like this. Okay, it disappeared. Now, a bit of hacking

10:50.160 --> 10:56.880
from my side. Due to some performance tuning and so on, right now there is a timer, so it

10:56.880 --> 11:02.880
doesn't check continuously; I think it's something like 300 seconds. I don't have

11:02.960 --> 11:08.640
300 seconds now to just, you know, stand and entertain you. So, what I will do, I will just restart

11:08.640 --> 11:20.640
the pod, so that the timer starts from 0, and it will go and apply everything. Get, get pod.

11:20.640 --> 11:41.200
Now, the pod running on master-0 is this one. Can I restart it, or do I need to kill it? I think I need

11:41.280 --> 11:51.760
to delete the pod. That's always the danger with live demos: now there's a 50% chance it won't start again.

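NOTE
The timer mentioned above means a manual deletion is only corrected on the next periodic resync, while a freshly started pod syncs right away. A minimal Python sketch of such a timer, assuming a fixed interval (the roughly 300 seconds mentioned in the demo) and an injected clock value so no real waiting is needed; ResyncTimer is a hypothetical name, not the operator's code:
```python
class ResyncTimer:
    """Tracks when the next full reconciliation is due (hypothetical helper)."""
    def __init__(self, interval: float):
        self.interval = interval
        self.last = None  # a freshly started controller has never synced
    def due(self, now: float) -> bool:
        return self.last is None or now - self.last >= self.interval
    def mark_done(self, now: float) -> None:
        self.last = now
timer = ResyncTimer(interval=300)
timer.mark_done(0)         # initial sync at startup
print(timer.due(120))      # False: a deletion at t=120 is not corrected yet
print(timer.due(300))      # True: the periodic resync re-applies the policy
restarted = ResyncTimer(interval=300)
print(restarted.due(120))  # True: a restarted pod syncs immediately
```
A longer interval trades slower drift correction for fewer API calls, which is the performance trade-off the talk alludes to.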
11:53.120 --> 11:58.400
Ah, no, it actually started, okay. So, let's see: oc get nncp.

11:59.280 --> 12:05.280
Okay, healthy state. And did something happen on the node? Okay, I have this IP address back. Yeah,

12:05.360 --> 12:10.800
so we fast-forwarded, yeah, but, you know, those are the rules

12:12.000 --> 12:16.880
of live demos: you have limited time. We have six minutes, so I will not show you more

12:16.880 --> 12:25.840
demos. I could show you ten different configurations, but I cannot. Well, so basically,

12:25.840 --> 12:30.640
about nmstate itself, some, you know, some PR. It's written in Rust because Rust is the cool

12:30.640 --> 12:36.080
kid on the block nowadays, and it uses NetworkManager as a backend. Well, you don't have a choice,

12:36.080 --> 12:42.240
right? This is what we have. The Kubernetes operator is something that we have live, in action;

12:42.240 --> 12:47.600
people use it. I'm from Red Hat, so of course, you know, we sell stuff, so there are people paying

12:47.600 --> 12:54.720
for this, but it's open source, it works on Kubernetes, not only on, you know, the Red Hat flavor of

12:54.720 --> 13:00.000
Kubernetes, so it's upstream: you can take a vanilla Kubernetes cluster, take the operator from

13:00.000 --> 13:05.680
GitHub, and it works. There are no hidden tricks, no small print; it just works.

13:05.680 --> 13:10.080
nmstate itself you can use from Rust, Go, and Python; there are bindings and a lot of

13:10.080 --> 13:16.560
other stuff, so it's really super friendly and easy to use. What we did compared to last year:

13:16.560 --> 13:22.400
we introduced a lot of usage metrics. I won't go into too much detail, because we have

13:22.400 --> 13:27.120
five minutes and there may be some questions, but basically, when you have a

13:27.120 --> 13:34.080
huge fleet of nodes in your cluster and use the operator, and presumably every one of the nodes

13:34.080 --> 13:40.000
has some network configuration, we want to see some statistics, for example: how many static DNS

13:40.000 --> 13:46.160
servers have you configured? How many nodes with static routes do you have? How many static routes do you have?

13:46.160 --> 13:52.800
How many nodes do you have with static IP addresses versus, you know, DHCP or, you know,

13:52.800 --> 13:58.960
SLAAC if you are doing IPv6? All this kind of stuff, so we can get some numbers about the topology

13:58.960 --> 14:04.800
and see what's what. People like statistics, people like graphs,

14:04.800 --> 14:09.760
so this is what we do. Performance improvements: this is something I cannot easily demo,

14:09.760 --> 14:15.840
but we basically started running the operator on clusters with hundreds of nodes, and then

14:15.840 --> 14:21.760
hundreds of network configurations, with each of the nodes going into

14:22.880 --> 14:29.120
double-digit numbers of interfaces, and we started discovering, okay,

14:29.680 --> 14:35.840
it's slow here, it's slow there, and, you know, you need to tune that. What are we going to do in 2025?

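NOTE
The usage metrics described above boil down to counting features across per-node network states. A minimal Python sketch, assuming each node's state is a dict shaped loosely after nmstate's YAML keys (dns-resolver, routes, interfaces with ipv4 dhcp and address); the metric names are hypothetical and not the operator's actual metrics:
```python
from collections import Counter
def usage_metrics(node_states: dict) -> Counter:
    """Count how many nodes use each feature (hypothetical metric names)."""
    counts = Counter()
    for state in node_states.values():
        if state.get("dns-resolver", {}).get("config", {}).get("server"):
            counts["nodes_with_static_dns"] += 1
        routes = state.get("routes", {}).get("config", [])
        if routes:
            counts["nodes_with_static_routes"] += 1
        counts["static_routes_total"] += len(routes)
        ipv4 = [i.get("ipv4", {}) for i in state.get("interfaces", [])]
        # A node counts as "static IPv4" if any interface has addresses without DHCP.
        if any(v.get("address") and not v.get("dhcp") for v in ipv4):
            counts["nodes_with_static_ipv4"] += 1
    return counts
states = {
    "node-a": {"dns-resolver": {"config": {"server": ["192.0.2.53"]}},
               "routes": {"config": [{"destination": "198.51.100.0/24"}]}},
    "node-b": {"interfaces": [{"name": "eth0", "ipv4": {"dhcp": True}}]},
}
print(dict(usage_metrics(states)))
```
Counting per node rather than per object is what makes questions like "how many nodes use DHCP" answerable across a large fleet.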
14:35.840 --> 14:41.520
First, reverting the configuration when you delete an NNCP. I didn't explain what happens here when

14:41.520 --> 14:48.320
we start deleting stuff, so I will not go into details, but basically people complain

14:48.320 --> 14:52.720
that it's counterintuitive, because they would expect that when you delete an NNCP with a configuration,

14:52.720 --> 14:56.960
the configuration gets deleted. It's not that simple, because sometimes you have chained

14:56.960 --> 15:02.160
configurations, and deleting one will break another, so we need to figure out what to do and how to do

15:02.160 --> 15:08.240
it. Well, we have a plan for that. Of course, there is always something that you want to

15:08.240 --> 15:13.040
fine-tune and configure yourself, so people are asking us, hey, why did you hard-code this

15:13.040 --> 15:18.480
value here? I would like to change it. Okay, we will give you the ability to change it. Some bigger changes:

15:18.480 --> 15:23.520
we have this CRD here, and it's very opinionated because, you know, you need to start from something.

15:23.520 --> 15:27.680
Now we are working a lot with telco people, and they are telling us, oh, but this CRD is not

15:27.680 --> 15:32.560
actually what we would like to have; we would like to have something speaking more telco language,

15:32.640 --> 15:41.440
so we are working with them to get this CRD, you know, more in telco English, not in our English,

15:42.720 --> 15:48.720
much more performance work: we are still hitting the API a lot, you know, a lot of calls, a lot of

15:48.720 --> 15:55.440
calls, a lot of calls, and it's expensive for the API; we will fix that. As of today, we are publishing

15:55.440 --> 16:01.600
the upstream operator via OLM, you know, that's what we trust, and you can always build it

16:01.680 --> 16:06.160
from the code; it's super simple, it's Go, so you basically just compile it, it's easy,

16:06.160 --> 16:10.160
but people are telling us that it would be nice to have a Helm chart, so we are working on this;

16:10.160 --> 16:14.880
we already have an upstream contributor implementing this, so it's to be merged, you know,

16:14.880 --> 16:20.320
in the next week or two. And something much more on the political side, not purely technical: we are

16:20.320 --> 16:25.760
about to join the Kubernetes Network Plumbing Working Group, and this is part of a bigger plan

16:26.400 --> 16:31.200
to have a real upstream with outside contributors and so on, so it's not only, you know,

16:31.200 --> 16:37.040
a Red Hat project for Red Hat customers, and this is a nice kick-start.

16:37.040 --> 16:42.560
It's not like we don't have an upstream today, but being under the umbrella of, you know, Kubernetes

16:42.560 --> 16:48.080
SIGs would be something much, much bigger. So yeah, this is the end; we have less than one

16:48.080 --> 16:52.160
minute, so I guess I can take one question, and that will be it. Thank you.

16:52.400 --> 16:59.760
All right, we'll do just one question here. Thank you for the interesting talk.

16:59.760 --> 17:04.960
I have one question, because, for instance, I'm from the telco world, so I know what you're talking

17:04.960 --> 17:11.600
about. We have many use cases with VLANs, VXLANs, and so on, and nmstate comes in super handy,

17:11.600 --> 17:16.480
and I'm actually using it, but you still need to configure some stuff in Kubernetes.

17:16.560 --> 17:23.280
You need to rely on plugins such as Multus CNI and so on, so are there any plans to join forces

17:23.280 --> 17:30.560
with them somehow? Yeah, so that's exactly the CRD part. The operator will not

17:30.560 --> 17:35.840
be doing Multus CNI and this kind of stuff, so we will not merge it on the operator side,

17:35.840 --> 17:42.000
but I'm talking with people about a unified CRD, so that you wouldn't have, you know, one CRD

17:42.000 --> 17:47.920
for Multus, another CRD for something else, but a unified CRD for your telco side, and then we would have

17:47.920 --> 17:54.080
operators managing, you know, dispatching this. We don't want to have one operator to do

17:54.080 --> 18:00.640
everything, because then we would basically be building the next Kubernetes, but you will have an interface as a

18:00.640 --> 18:09.040
user, which is a CRD, and through this one CRD you will configure SR-IOV, VXLAN, site VPN, and whatever you need,

18:09.120 --> 18:13.840
and then, okay, you will still need to install this operator, that operator, but you would manage

18:13.840 --> 18:19.680
them from one CRD altogether, so this is the plan. All right, thanks a lot.

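NOTE
The plan described in the answer, one user-facing CRD fanned out to several specialised operators, can be pictured as a dispatch table. A minimal Python sketch, assuming a unified spec with one section per domain; the section names, fields, and handler functions are hypothetical and do not correspond to any existing CRD or operator:
```python
from typing import Callable
def sriov_handler(cfg: dict) -> str:
    return f"SR-IOV operator: {cfg['numVfs']} VFs on {cfg['pf']}"
def vxlan_handler(cfg: dict) -> str:
    return f"VXLAN: id {cfg['id']} over {cfg['base']}"
# Each domain stays owned by its own component; the table only routes sections.
HANDLERS: dict[str, Callable[[dict], str]] = {
    "sriov": sriov_handler,
    "vxlan": vxlan_handler,
}
def dispatch(unified_spec: dict) -> list:
    """Hand each section of the unified spec to the component that owns it."""
    results = []
    for section, cfg in unified_spec.items():
        handler = HANDLERS.get(section)
        if handler is None:
            raise ValueError(f"no operator registered for section {section!r}")
        results.append(handler(cfg))
    return results
spec = {"sriov": {"pf": "ens1f0", "numVfs": 8}, "vxlan": {"id": 100, "base": "eth1"}}
for line in dispatch(spec):
    print(line)
```
Failing loudly on an unknown section reflects the "no single operator does everything" idea: a section with no registered owner is an error, not something to absorb.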
