WEBVTT

00:00.000 --> 00:12.200
I typically ask this question in front of, like, a bunch of network engineers, and I ask

00:12.200 --> 00:19.360
like, how many of you know Kubernetes, by show of hands? I imagine that in this room

00:19.360 --> 00:24.480
this would be a lot more. So how many of you know Kubernetes, and how many of you have

00:24.480 --> 00:33.440
played around with it? That's the answer I expected: quite a lot more than in front

00:33.440 --> 00:40.680
of more traditional networking folks. So what's cool about Kubernetes, right? So the

00:40.680 --> 00:46.800
API server is really about making sure that both humans and machines, controllers in

00:46.800 --> 00:53.680
this case, can happily coexist and work together. So the API server exposes a number

00:53.720 --> 00:58.800
of resources, be it the pod, be it the service, be it the deployment, and there's some

00:58.800 --> 01:07.080
storage behind it. So another part of it is the controller or the reconciler or the

01:07.080 --> 01:16.240
operator, or other names that are available for it, that basically execute the control loop.

01:16.240 --> 01:24.120
They get a certain input, they run a certain task, they actuate a certain system and

01:24.120 --> 01:33.600
provide status back into the API. So we thought this controller or the custom resource

01:33.600 --> 01:41.440
definition and that framework inside of Kubernetes is a pretty awesome thing to drive

01:41.440 --> 01:48.560
network automation. So this closed loop, the most practical thing that you can compare it

01:48.560 --> 01:58.840
with is your thermostat: you monitor a certain temperature, and if it goes below or above

01:58.840 --> 02:06.120
a certain threshold, you actuate a certain system to drive something. Another thing that we

02:06.120 --> 02:11.800
liked about the KRM, the Kubernetes Resource Model, is that you get a lot for free. So

02:11.800 --> 02:19.360
you get the enormous ecosystem that's out there for free. There's other controllers

02:19.360 --> 02:25.880
for it, cert-manager to name one, that can help you issue certificates. Certificates

02:25.880 --> 02:31.480
that you might need on a traditional networking box to expose API interfaces like

02:31.480 --> 02:43.840
NETCONF or gNMI or whatnot. And the API model is really cool in the sense that if you know

02:43.840 --> 02:49.920
how to drive one resource, you know how to drive them all basically. So it's a pretty consistent

02:49.920 --> 02:57.440
thing. And as a developer inside of that framework, basically you don't need to reinvent

02:57.440 --> 03:05.800
the wheel on a lot of things. So how do we leverage this framework for network automation?

03:05.800 --> 03:11.960
So one of the things that the guys from Kubernetes did really well is to abstract a lot

03:11.960 --> 03:21.960
of things, right? So they segmented the problem space and the domain. They used reusable components,

03:21.960 --> 03:26.680
they made it declarative. Declarative is really important in the sense that: I want my

03:26.680 --> 03:31.640
network to look like this. And this you fire off to a controller and the controller

03:31.640 --> 03:36.640
figures out what needs to be done, given a certain state in the network, getting from

03:36.640 --> 03:47.640
A to B. And the closed-loop system was something that we really liked. So comparing what Kubernetes

03:47.640 --> 03:53.600
does with containers and orchestrating containers, I mean, if you compare it to networking

03:53.600 --> 04:00.240
or the control plane side of the networking, it's pretty much the same. So we have a network

04:00.240 --> 04:07.920
which is essentially a bunch of devices, a bunch of interfaces which are all interconnected.

04:07.920 --> 04:16.120
And the differences are like really certain use cases, the protocols that we use in networking

04:16.120 --> 04:23.280
to interconnect a bunch of devices, using OSPF or IS-IS or BGP or whatever, the APIs, and

04:23.280 --> 04:30.200
what's required. So Kubenet is really to help network engineers build

04:30.200 --> 04:39.560
Kubernetes controllers themselves, to mainly drive traditional networking gear, or to allow you

04:39.560 --> 04:46.040
to expose your traditional network in a cloud-native-friendly way. So the idea is that

04:46.040 --> 04:52.520
we built a number of abstractions on top of each other where you have a bunch of controllers

04:52.520 --> 05:00.600
that would take traditional networking config for vendor gear. Another abstraction layer

05:00.600 --> 05:07.880
would be a normalized model where you could create your own abstraction for what a network

05:07.880 --> 05:15.000
should look like. Because either way, whichever vendor you use, an interface is still an interface,

05:15.000 --> 05:22.920
a port is still a port, a physical port. The terminologies are quite the same. So

05:22.920 --> 05:29.720
essentially what you need to define a VPC is basically a high level abstraction. Then the

05:29.720 --> 05:35.160
network design that you want to apply. Do I want my network to run BGP and, generally,

05:35.160 --> 05:41.880
does it need to run OSPF? Does it need to run IS-IS? Which encapsulation is required,

05:41.880 --> 05:48.680
VXLAN or MPLS? Which addressing do I need to use? Do I want to use single stack? Do I want

05:48.680 --> 05:55.400
to use dual stack? Do I want to go v6-only? And then the resources on the other side of the

05:55.400 --> 06:01.400
pane is really about which nodes are in the topologies, which links are there, how are they

06:01.400 --> 06:09.080
interconnected, which IP addresses are in my stateful database, like my IPAM system, which I maintain.

06:09.400 --> 06:18.920
And so on. So the mental model is really about creating these layers of abstraction on top

06:18.920 --> 06:27.960
of each other. And I'll go through this a bit faster to get into the first layer. So you have

06:27.960 --> 06:36.280
a network with a number of devices. On top of it you would have a provider, which in the Kubenet

06:36.280 --> 06:44.760
story is the schema-driven configuration component. On top of that you would have a number

06:44.760 --> 06:54.520
of other controllers that would expose abstract networking config as you would want it. So

06:54.520 --> 07:01.480
the Kubenet initiative is really to help network engineers define their own abstraction models

07:01.480 --> 07:09.000
because person A might want to abstract a network in a certain way, but person B might want to do it

07:09.000 --> 07:18.600
in another way. So coming to SDC, our schema-driven configuration and state,

07:19.800 --> 07:27.320
this is really the component that talks to the network. So it bridges the gap between Kubernetes

07:27.400 --> 07:35.480
so the controllers that integrate all of the CRDs and those kinds of things, and the actual

07:35.480 --> 07:43.240
NOSes on the southbound. So what we do in SDC is basically we ingest all of the

07:43.240 --> 07:52.040
YANG models. So YANG models is the networking slang for the data modeling language that we use

07:52.040 --> 08:00.920
inside of networking. So we could have gone with protobuf or Thrift or whatever, but at the time

08:00.920 --> 08:06.440
that YANG was invented none of those were available, and in the networking industry we kind

08:06.440 --> 08:14.360
of want to do things, or we have the tendency to reinvent the wheel sometimes a little bit too much,

08:14.360 --> 08:22.040
but okay. So SDC handles structured config formats, different ones. So we do YAML,

08:22.040 --> 08:29.240
we do JSON, we do XML, we do protobuf. We have several interfaces southbound to the device,

08:29.800 --> 08:39.720
gNMI and NETCONF, and basically what this does, and we had to learn it the hard way, is this aggregates

08:39.720 --> 08:47.640
all of the intents that you define in your Kubernetes resources. It ingests that for a certain physical

08:47.640 --> 08:53.800
network device and validates the config. So we have our schema or data modeling language,

08:53.800 --> 09:01.800
we have actual config, we try to validate it offline, we merge all of those declarative snippets of

09:01.800 --> 09:08.520
config, and then when we have a valid config, this controller actually pushes it down to the

09:08.600 --> 09:15.800
device and monitors that the config is going to be in sync with the actual device.

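NOTE
A minimal sketch of the flow described above: merge the declarative config snippets for one device, validate the merged result offline, and only call a push step when it is valid. Every name here (deep_merge, validate, reconcile) and the toy MTU rule are hypothetical illustrations, not the SDC implementation or its API.
```python
from __future__ import annotations
def deep_merge(base: dict, snippet: dict) -> dict:
    """Recursively merge one config snippet into a copy of base."""
    merged = dict(base)
    for key, value in snippet.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged
def validate(config: dict) -> list[str]:
    """Toy offline check standing in for schema validation: each interface needs an MTU in range."""
    errors = []
    for name, intf in config.get("interfaces", {}).items():
        mtu = intf.get("mtu")
        if mtu is None or not 68 <= mtu <= 9216:
            errors.append(f"{name}: mtu missing or out of range")
    return errors
def reconcile(intents: list[dict], push) -> list[str]:
    """Merge all intents, validate, and call push(config) only if the merged config is valid."""
    merged: dict = {}
    for intent in intents:
        merged = deep_merge(merged, intent)
    errors = validate(merged)
    if not errors:
        push(merged)
    return errors
```
Calling reconcile with a list of intent dicts and a callback such as list.append returns the validation errors; the callback only runs when that list is empty.
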
09:17.960 --> 09:26.200
So we have a number of KRM resources to do that. So we have a config CR. This is basically your

09:29.880 --> 09:35.880
SONiC config or your SR Linux config or your SR OS config or your Junos config or

09:36.840 --> 09:45.400
that you ingest into a CR. We also have a config set CR, that is where for instance you would want to

09:46.280 --> 09:54.600
replicate a control plane ACL that's similar on a lot of devices. So basically the

09:56.120 --> 10:05.320
analogy is with a ReplicaSet, a bit. We have CRs that are not human-created but created by the

10:05.320 --> 10:12.280
controller, which is the running config, so that syncs the actual config that's on the running

10:12.280 --> 10:20.040
device, and the unmanaged config. So unmanaged config is snippets of config that are on the box

10:20.040 --> 10:27.160
that are not defined by an intent, and that really allows us to onboard physical devices,

10:28.200 --> 10:34.600
making sure that users are gaining trust in the system that it's working and then they can

10:34.600 --> 10:42.120
slowly onboard snippets of config that they define on their devices inside of the system.

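NOTE
One way to picture the unmanaged config described above, as a sketch only: flatten both the running config and the intended config into paths, and keep the running leaves that no intent defines. The function names and the slash-separated path format are assumptions for illustration, not how SDC computes it.
```python
def flatten(config: dict, prefix: str = "") -> dict:
    """Flatten a nested config into {"a/b/c": value} leaf paths."""
    flat = {}
    for key, value in config.items():
        path = f"{prefix}/{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, path))
        else:
            flat[path] = value
    return flat
def unmanaged(running: dict, intended: dict) -> dict:
    """Return the running-config leaves whose paths are not defined by any intent."""
    owned = set(flatten(intended))
    return {path: value for path, value in flatten(running).items() if path not in owned}
```
As intents are onboarded for more of a brownfield device, the dict returned by unmanaged shrinks.
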
10:42.120 --> 10:48.760
So they can really, I mean, we can really start onboarding brownfield systems, like existing networks

10:48.760 --> 10:55.720
on there, to onboard them into our system. So I'm running short on time, I guess.

10:55.960 --> 11:07.400
The CRDs. So we ingest schemas: the schema CR is a reference to a GitHub repo where the

11:07.400 --> 11:14.680
YANG models are hosted, and we ingest those into a schema server. We have a bunch of discovery rules

11:14.680 --> 11:22.040
where we can automatically discover network targets and network devices in the network and then a bunch

11:22.120 --> 11:29.480
of other ones. The YANG schema, as I mentioned, it's a data modeling language

11:29.480 --> 11:35.480
that we use inside of networking, or the config side of networking at least, where we define

11:35.480 --> 11:43.320
modules, containers, lists, leaves, types, those kinds of things. So I won't bore you too much with

11:43.320 --> 11:51.160
it, but it looks a bit like this. And what we do from an SDC point of view is we

11:52.440 --> 12:00.920
do all of the validations of the data modeling language. So we validate leafrefs, we validate ranges,

12:00.920 --> 12:07.320
lengths, mandatory statements, patterns, choice cases and those kinds of things inside of the thing.

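NOTE
A rough sketch of the kinds of checks listed above (mandatory, range, length, pattern) applied to a flat set of leaves. The RULES table and the check function are hypothetical stand-ins; a real validator derives such constraints from the YANG modules and also handles leafrefs and choice cases, which are omitted here.
```python
import re
# Hypothetical leaf constraints, mimicking YANG mandatory/range/length/pattern statements.
RULES = {
    "mtu": {"mandatory": True, "range": (68, 9216)},
    "name": {"mandatory": True, "length": (1, 32), "pattern": r"ethernet-\d+/\d+"},
    "description": {"length": (0, 80)},
}
def check(leaves: dict) -> list:
    """Return a list of constraint violations for the given leaf values."""
    errors = []
    for leaf, rule in RULES.items():
        if leaf not in leaves:
            if rule.get("mandatory"):
                errors.append(f"{leaf}: mandatory leaf missing")
            continue
        value = leaves[leaf]
        if "range" in rule and not rule["range"][0] <= value <= rule["range"][1]:
            errors.append(f"{leaf}: out of range")
        if "length" in rule and not rule["length"][0] <= len(value) <= rule["length"][1]:
            errors.append(f"{leaf}: bad length")
        # YANG patterns match the whole value, so fullmatch is the closest analog.
        if "pattern" in rule and not re.fullmatch(rule["pattern"], value):
            errors.append(f"{leaf}: pattern mismatch")
    return errors
```
An empty list from check means the leaves pass these offline checks; it says nothing about runtime conditions on the device.
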
12:08.280 --> 12:16.600
The important takeaway is that we know, to a high degree of certainty, that the config

12:16.600 --> 12:23.960
is going to be valid before we send it down towards the device. Obviously there's always some

12:23.960 --> 12:33.320
runtime things on the device that can go wrong, like for instance the FIB might be at a certain

12:33.320 --> 12:39.880
threshold, not allowing us to provision certain routes, and things can still go wrong on the

12:39.880 --> 12:47.960
device, but we try to cover as much as we can before it hits the device. So that was the device

12:47.960 --> 12:55.800
layer and I'll go quickly through it. So we have another controller on top of that. That's

12:56.760 --> 13:03.560
Choreo, that's also part of the Kubenet initiative. And that's really: once you have that

13:03.560 --> 13:10.920
vendor config, how do you abstract that? How do you make your own model available through OpenAPI

13:11.720 --> 13:20.600
or whatever and expose that. So you can define: I want a VPC, with a minimum set of inputs, and that will

13:20.680 --> 13:29.320
render basically the device config, vendor config, that can be pushed down with SDC.

13:29.960 --> 13:36.760
What I forgot to mention about SDC is that currently we do support everything or

13:38.040 --> 13:42.920
anything that has a YANG model. So we support Arista, we support Cisco, we support Juniper,

13:43.720 --> 13:48.680
obviously our own network operating systems that Nokia provides, like SR Linux and SR OS,

13:49.240 --> 13:53.880
Juniper, so anything: SONiC, anything that has a YANG model, we can ingest.

13:54.840 --> 13:58.120
Sometimes there's still a bug, but okay, bugs are there to be solved, right?

13:59.160 --> 14:05.080
And Choreo, which is also part of the Kubenet initiative, really helps network engineers

14:05.800 --> 14:15.800
define their own Kubernetes controllers. The key takeaway here is that we allow network engineers to,

14:16.760 --> 14:24.920
or make it more user-friendly to define a controller for networking purposes. So the business

14:24.920 --> 14:33.000
logic that they can apply is either Python there, or Jinja templates, or common code to generate

14:33.000 --> 14:39.000
networking snippets basically. With that I think I still have a minute left.

14:39.960 --> 14:48.280
So a bunch of QR codes: we have a Discord where we collaborate with a bunch of network engineers

14:49.560 --> 14:55.400
trying to develop this stuff, our GitHub repo, and some YouTube links where you can check

14:56.920 --> 15:02.040
some more informative sessions about what Kubenet is and what we're trying to do.

15:02.360 --> 15:09.400
With that I'm at the end of my presentation and I'll take any questions if they are there.

15:20.440 --> 15:25.800
Hey Hans, thanks for the presentation. Two quick questions. How are you handling state from the

15:25.800 --> 15:31.640
devices, with respect to monitoring state? And secondly, is there any storage, so do you

15:31.640 --> 15:36.040
remember things in memory, or are you writing to disk as well? Is it etcd?

15:36.040 --> 15:45.800
Yeah. So inside of SDC we also fetch state from the networking device and we

15:45.800 --> 15:52.520
kind of cache that there as in-memory state. The config part, like the actual config which is

15:52.520 --> 16:01.080
synced with the device we also persisted in stored at in memory.

