WEBVTT

00:00.000 --> 00:14.880
Okay, hi, everyone. Welcome. My name is Jeff Mendoza. I work for Kusari, and I'm here

00:14.880 --> 00:20.800
to talk to you about SBOMs and using them with ClearlyDefined. I also want to give

00:20.800 --> 00:25.080
a shout out to Chiang and Lynette, who both wanted to come, but weren't able to make it

00:25.080 --> 00:30.080
and, yeah, they're sad they can't be here, but me, Chiang, and Lynette

00:30.080 --> 00:35.200
make up the technical steering committee for ClearlyDefined. Other stuff that I work

00:35.200 --> 00:41.080
on is OpenSSF projects like GUAC, Scorecard, and Allstar.

00:41.080 --> 00:51.760
Okay, you're at the limit. Thank you. All right, so this shouldn't be a surprise. SBOMs are

00:51.760 --> 00:57.160
great for legal compliance. That was literally one of the primary reasons that SBOMs

00:57.160 --> 01:01.920
were created. They have all the right fields to put in your license information, the

01:01.920 --> 01:07.560
expressions, discovered licenses, attributions, copyright statements, et cetera.

01:07.560 --> 01:12.680
And then for those of you that are really only focused on security, legal compliance is

01:12.680 --> 01:17.000
of course, complying with the licenses of all the open source software that you use, that

01:17.040 --> 01:22.200
you ship, that you distribute. Licenses are what make open source open source, and they

01:22.200 --> 01:28.760
have obligations. Even permissive licenses have obligations.

01:28.760 --> 01:34.240
All right, so getting back to SBOMs. So we have an SBOM, and we want to carry out legal

01:34.240 --> 01:39.160
compliance. They have all the right fields, but what goes into those fields? Most of the

01:39.160 --> 01:43.840
time, when you're generating an SBOM, you're running some kind of SBOM creation tool, a

01:43.840 --> 01:51.520
generator. And these tools actually have to be jacks of all trades. They have to find all

01:51.520 --> 01:57.320
of your dependencies, depending on what language you use, whether you have files, whether you

01:57.320 --> 02:02.960
have a container. They do a lot of great work there, but sometimes they can't be the

02:02.960 --> 02:09.440
best at everything. So your creation tool might, on the legal side, just look at the top

02:09.520 --> 02:14.960
level reported license, like if you have an npm package, it might look at what is the license

02:14.960 --> 02:20.960
name inside your package.json and stick that into the SBOM. Or it might do some other

02:20.960 --> 02:25.920
top level matching on the license file, or they might just skip actually doing license

02:25.920 --> 02:30.480
detection and just put NOASSERTION in all the license fields, because maybe you're using

02:30.480 --> 02:39.320
SBOMs for something else. So what I want to get to is that this is

02:39.400 --> 02:46.200
not suitable for really serious legal compliance. There's a lot more that you can look at

02:46.200 --> 02:51.400
and get to. And the creation tool, we shouldn't really expect it to be perfect at everything.

02:52.600 --> 02:59.080
So there's a project called ScanCode that will scan not just the package top level,

02:59.080 --> 03:06.760
but all the different files inside of the package, inside of the distribution. It gets all the legal

03:06.760 --> 03:11.800
text out of every file and gets detailed scan data. And there are other good scanners as well,

03:11.800 --> 03:16.520
like ScanCode, that will do this for you. And anybody that works in legal compliance works

03:16.520 --> 03:23.400
with ScanCode or another tool like this all the time. So that brings us to ClearlyDefined.

03:23.400 --> 03:30.760
So ClearlyDefined is a code project that scans lots of projects, and then it's also

03:30.760 --> 03:37.960
an open data set project that contains all of the results of all of the scans. So,

03:39.080 --> 03:44.120
while it runs ScanCode and other scanners, it also compiles all of that information into a

03:44.120 --> 03:51.240
top level definition, where it kind of summarizes and groups everything together. It has both

03:51.240 --> 03:57.240
the declared license, what we think that the package author was trying to say, and then the

03:57.240 --> 04:03.000
discovered, which is kind of the composite of all the results of all the different scan tools.

04:03.800 --> 04:08.040
You can get the definition, which is that summary, or also the actual raw data from the scanners

04:08.040 --> 04:17.480
themselves. ClearlyDefined uses the SPDX license list for license identifiers and the expressions,

04:17.480 --> 04:26.200
which both major SBOM formats use as well. And it gives you not just the license and the

04:26.200 --> 04:31.240
expression, but the copyright attribution information, which you would need to generate a notice file,

04:31.240 --> 04:35.640
and the source location, which, if you're distributing copyleft code, you'd want to be able to download

04:35.640 --> 04:42.360
that and distribute that as well. And so the whole idea of ClearlyDefined is that if you were to run

04:42.360 --> 04:49.960
something like ScanCode on jQuery 1.2.3, and somebody else runs ScanCode on that same

04:49.960 --> 04:53.640
package, they're going to get the same results. So why does everybody have to run it separately?

04:53.640 --> 04:57.800
And it does take a little bit of time to run. So let's just put all that information in a central

04:57.800 --> 05:03.880
area. Another thing that ClearlyDefined does is, in addition to all of the

05:05.960 --> 05:15.240
programmatically generated information, the community of legal experts can say, oh, I think that

05:15.240 --> 05:21.320
this scan information is incorrect. We looked at all of the files and the author really intended

05:21.320 --> 05:25.880
this license. So they can actually add a curation on top of the data that's peer reviewed by

05:25.880 --> 05:30.360
the community. And then you can actually get higher fidelity information from ClearlyDefined

05:30.360 --> 05:35.800
than you would if you just ran the scanner yourself. So it's available on a website which you

05:35.800 --> 05:41.480
could browse, or a REST API, which you can use to download all the definitions. So we run the service,

05:41.480 --> 05:46.040
we run scans, and then we hold all the data. But does it work with SBOMs?

05:46.920 --> 05:55.480
Until now, no, but now it does. So this was actually very, very straightforward to write, thanks

05:55.480 --> 06:02.280
to some other tools. But I wrote a very quick tool called cdsbom. You can pass it

06:03.560 --> 06:08.840
an SBOM that has whatever legal information in there, but we can throw that away and then

06:08.840 --> 06:13.560
query the legal information from ClearlyDefined, replace it in the SBOM, and then output a new

06:13.560 --> 06:19.720
augmented SBOM. It can also generate a notice file. It's available as a CLI or a Go library.

06:20.600 --> 06:25.720
But one of the things that really made this very easy to do is that protobom is a great

06:27.400 --> 06:33.240
library that allows you to read, modify, and write SBOMs with very little code in Go.

06:34.360 --> 06:39.400
And then the other thing is ClearlyDefined doesn't use purls exactly. It uses this

06:40.280 --> 06:48.760
thing it calls a coordinate, which is its idea of the software package identifier. And so to translate

06:48.760 --> 06:53.880
between purl and coordinate, the GUAC project released a library where you can pass one and get the

06:53.880 --> 07:04.520
other and have that translation there. I do want to give a quick demo. So I've got here a directory that's

07:04.520 --> 07:13.640
got some SBOMs. I'm just going to run cdsbom on the SPDX SBOM, and it's going to read the SBOM.

07:14.920 --> 07:20.440
It finds all the purls, translates those to coordinates, queries ClearlyDefined, and gets all the data.

07:20.440 --> 07:27.560
I output the difference here, and then if we look here, we have a new SPDX file, and it

07:27.560 --> 07:34.520
has a bunch of... we can go down a little bit. Some of these discovered licenses get really long

07:34.520 --> 07:40.200
when there's a lot of test data or other stuff found. And that's the declared side. It's

07:40.200 --> 07:49.480
still pretty easy. And then I want to run the other tool, which is the SBOM notice tool.

07:49.480 --> 08:06.680
Let's run it on the CycloneDX one. And then if I refresh here, I'll have a notice file.

08:06.680 --> 08:11.240
So it'll just take the SBOM, parse it, and go out to ClearlyDefined to grab a notice file that you

08:11.240 --> 08:17.240
could use to distribute that code. The other thing I wanted to show real quick,

08:18.200 --> 08:23.480
this is where the code is located. So I'm trying to figure out if we want to move it into the

08:23.480 --> 08:30.840
ClearlyDefined project itself or if I'll host it or wherever. Here are just the install instructions.

08:31.800 --> 08:39.800
And then the part that does the enhancement is, again, a library that you could easily use.

08:39.800 --> 08:46.200
If you have Go code and you parse an SBOM with protobom, you can just call this library to enhance

08:46.840 --> 08:51.000
your protobom SBOM document with the legal information from ClearlyDefined.

08:57.640 --> 09:07.480
And oops, I started from the beginning. Yeah, that's it. So, you know, use your SBOMs with ClearlyDefined.

09:07.480 --> 09:13.480
If you have any other ideas on how you want to use SBOMs and get the legal

09:13.480 --> 09:16.840
information from ClearlyDefined that aren't here, I'd be really interested to know.

09:16.840 --> 09:21.320
Because I think putting these things together was very simple and I want to get workflows

09:21.320 --> 09:27.080
that work and get the right legal information to all of you. Other things you can do to join the

09:27.080 --> 09:32.840
community: if you have a dependency and the scan results from ClearlyDefined are incorrect,

09:33.640 --> 09:37.800
open up a curation. The community will look at it and see whether this is the right

09:37.880 --> 09:45.000
change and merge it if so. Running the scans takes a lot of compute power. We allow people

09:45.000 --> 09:50.120
to donate their compute and run the scans on their own hardware and then upload the results.

09:51.000 --> 09:55.160
And of course it's a code project, so you can join us and start hacking.

09:58.360 --> 09:59.320
Yeah, question ringer.

09:59.320 --> 10:10.440
Yeah. So on the tools I wrote... yeah, the question is, do CycloneDX and SPDX all have the right

10:10.440 --> 10:15.000
feature parity? So the tools I wrote use protobom, and protobom supports all the formats.

10:16.040 --> 10:22.440
CycloneDX doesn't have right now the license discovered or concluded field, so that wouldn't be put in there.

10:23.400 --> 10:29.720
Question, Anthony? Yeah. How do we offer it in other languages?

10:31.720 --> 10:37.160
Well, the question is, how do we offer this in other languages? Well, the CLI tool can be used by anybody.

10:37.960 --> 10:43.320
So this is another idea I'm having, again, getting away from the SBOM generator doing everything.

10:44.120 --> 10:51.160
I've seen tools like yours that are a little bit more about modifying SBOMs after they're generated.

10:51.160 --> 10:57.640
So, you know, I can see people generating SBOMs, using a number of tools to enhance them.

10:57.640 --> 11:02.040
And then ending up with a better SBOM. So the CLI tool should be great for everybody.

11:02.040 --> 11:07.880
But for libraries, I'll maintain the Go one, and, you know, I'm sure there's a Python one that will work.

11:09.320 --> 11:15.880
Great. The question is, is there a plan to move to purls in ClearlyDefined? Not right now.

11:21.800 --> 11:29.240
The question is how do we deal with information with a lower reliability score?

11:30.280 --> 11:34.840
So, you know, when we get a lot of curations, we can think about whether there's something that we can do there.

11:34.840 --> 11:37.560
But for the most part, we're really relying on the underlying scan tool.

11:37.560 --> 12:06.280
Okay, so the question is, should we use the reliability score in ClearlyDefined and maybe only selectively pull license information into the SBOM?

12:06.920 --> 12:08.920
That's a great feature. I'd like to have that.

12:13.320 --> 12:14.920
Questions in the back?

12:20.760 --> 12:25.160
Yeah, the question is, how are purls and coordinates different? And

12:26.920 --> 12:31.480
there are just some minor differences, like if there's an empty string in the coordinate, you use a dash.

12:36.280 --> 12:40.200
Sorry. Okay, let's take that offline. Thank you.

