Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Issue with using this library when it finds a license that isn't in the list #42

Closed
DarthHater opened this issue Feb 13, 2020 · 20 comments
Labels

Comments

@DarthHater
Copy link
Member

We are using this library in auditjs to generate an SBOM to send to Nexus IQ Server.

We've run into an issue with a few libraries where the license is presented as something that isn't in your current list.

An example is:

https://github.com/substack/node-optimist/blob/master/package.json#L35

That license is declared as MIT/X11 when in reality it should either be MIT X (not in your list) or X11

When the sbom is created, a license section with a Name is created, just no ID, and this fails validation in Nexus IQ Server.

I'm curious if we should add these kinda odd license types to your list, or if a PR of some sort where if it can't find an ID it adds something that indicates the license is Non-Conforming or something akin?

Thanks!

@DarthHater
Copy link
Member Author

I suppose another alternative is to provide a flag that just ignores licensing data (which is hyper specific to our case), but I don't personally like that.

@stevespringett
Copy link
Member

The node-optimist library contains an invalid license in package.json. It is a requirement that the license defined in package.json be a valid SPDX license ID or a valid SPDX expression. This library is so old, that this requirement likely wasn't enforced when this component was last published.

https://spdx.org/licenses/MIT.html
https://spdx.org/licenses/X11.html

It appears this library is dual licensed which should be declared in the package.json, but it's not.

Unresolved licenses like this will appear as a license 'name' in CycloneDX rather than a license ID. License ID's only resolve to valid SPDX license IDs and are not required by the spec. If Nexus IQ requires a license id, then it is violating the spec. This is an issue you'll need corrected by Sonatype.

@DarthHater
Copy link
Member Author

It's actually not dual licensed when you read about MIT/X11, it's just improperly declared. X11 is MIT with an extra sentence.

@stevespringett
Copy link
Member

Thanks for clarifying. MIT/X11 is still an invalid SPDX license ID, therefore the license name is used instead of the id.

@DarthHater
Copy link
Member Author

Took a look at the spec and you are right (didn't expect you to be wrong, by the way).

I'm trying to think of a middle ground, personally.

            if (spdxLicenses.some(v => { return l === v; })) {
                licenseContent.id = l;
            } else {
                licenseContent.id = 'UNKNOWN';
                licenseContent.name = l;
            }

Something like that seems to accomplish it, and indicate to someone they are using an Unknown license, which is I think accurate in this case.

@DarthHater
Copy link
Member Author

Optimist is FAR from the only package where I saw this coming up, and being a lot of these projects are dead or predate some sort of standard, trying to finagle a middle ground.

@DarthHater
Copy link
Member Author

The side benefit for someone using this to generate a SBOM is that they'd then know that whatever they are using has an old license and should probably be looked at with a bit more scrutiny?

@stevespringett
Copy link
Member

Whatever value is defined in the license id field will be validated against https://cyclonedx.org/schema/spdx.xsd which is derived from https://github.com/spdx/license-list-data.

'UNKNOWN' is not a valid SPDX license ID. If you create a BOM with this value, it will fail to validate.

License strings which do not resolve to valid SPDX license id's, will always use the license name field. That's what the spec defines and that's what all official CycloneDX implementations produce.

@DarthHater
Copy link
Member Author

So given license data is optional via the cyclonedx spec, can we provide a flag to just say "don't provide this data"? That seems like a good middle ground, and on spec.

@stevespringett
Copy link
Member

Roughly half of all ecosystems do not enforce SPDX license ID's. Java (Maven) and .NET (Nuget) are prime examples of this. This is a very common scenario and certainly not unique to this library. The overwhelming majority of libraries in Maven Central and NuGet are not using SPDX license IDs.

SBOMs are a statement of fact. Removing facts goes against the primary objectives of the spec. The only exception that's been made is license text, which can greatly influence the size of the resulting BOM.

You're free to fork the project and create your own version which excludes facts. However, I still recommend solving the problem in Nexus IQ itself.

@DarthHater
Copy link
Member Author

DarthHater commented Feb 13, 2020

I work for Sonatype, if that wasn't clear, and I'm here because Nexus IQ requires an ID and that's a hard requirement. Forking a library is a pretty poor excuse for not working with the author to make it more flexible, and I don't see a reason why it shouldn't be, given the spec itself marks this entire section as not required. If someone wants to throw it away, they'd be making an intentional choice to do so.

@stevespringett
Copy link
Member

Asking an independent open source author to make changes to every CycloneDX implementation to exclude details so that a commercial vendor can more easily do their job, is not a proper ask.

Dependency-Track attempts to resolve license names and does a decent job of it for certain licenses. If the license name cannot be resolved, then the license is discarded (as if it was never defined in the bom).

@DarthHater
Copy link
Member Author

I am not asking you to make changes, I've already got multiple permutations of this fired up on my side. I'm asking you to accept changes I'm willing to make, because they seem to make sense, looking at the spec itself, and the quality of the data I can expect to get from the npm ecosystem. I've presented an array of potential solutions to the problem I'm encountering, and I feel like I'm thinking as much as I can about the spec, and reality that I'm running in to. I'm unwilling to fork this repo longterm because:

  • Forks become a maintenance nightmare long term (you change upstream dependencies, something you are using becomes vulnerable, when do I find out about that? This adds up to me setting up more build infra)
  • It involves setting up infra to publish my fork to npm, which is duplicative of your work, and confusing to other people consuming this type of package

I get what you mean by spirit of the SBOM, and I'm sympathetic, but if something in the spec is marked as optional, giving someone the option to discard it seems fine, as it is done by intent, and doesn't violate the spec.

Also I have no clue what you mean by dropping the license text in your earlier comment, as I'm unaware of that option, and it doesn't appear to be documented. Please share that because it does indeed cause huge bloat (we are seeing 2-3mb SBOMs, and we don't need that data at all).

I'll drop this convo after this, but I feel like adding an option to discard the license data, by intention doesn't violate any part of the spec, and just adds additional flexibility to someone like me, the consumer.

@ButterB0wl
Copy link

So adding a flag to strip out non-required fields is out of the question? We can write the PR if you would accept it we just don't want to strip it after the fact in our module, but we will as this is currently breaking for us.

@stevespringett
Copy link
Member

Excluding license text is a common ask and it's one that doesn't really take away from the facts, only the supporting evidence. It has been implemented in a few official implementations, but I don't think it's made its way to the node version yet. But it will.

Also keep in mind that there are numerous unofficial implementations as well, which include:

  • Golang
  • Hex
  • Rebar3
  • SBT
  • OpenSource Review Toolkit (ORT)
  • Multiple container vendors are currently implementing

For example, a container security vendor that produces a BOM or ORT which produces a BOM are both capable of generating BOMs with npm modules. This is only a one implementation, but there isn't a guarantee that users will use this version. ORT for example, is extremely popular. I personally use it in some cases. That project comes specifically from a licensing background.

So even if this (or all official) implementation did have an option to exclude license, it doesn't solve the underlying issue. CycloneDX is a specification which just happens to have some implementations. There are many other ways to generate them, and many more currently in development, both publicly and internally at various organizations.

@DarthHater
Copy link
Member Author

I'm aware of quite a few unofficial implementations, as I've crafted a few already (and been meaning to get in touch with you over if you want me to make them more generic and inherit them). I've got Golang and Conda specific ones at current time, but a bit bastardized just because I had to make things work in non traditional ways. The conda one is here if you wanted to take a gander:

https://github.com/sonatype-nexus-community/jake/tree/master/jake/cyclonedx/v1_1

Yeah best I can tell excluding license text hasn't made it here (I could add that in a heartbeat for you, if you'd like).

I'd really rather work with you to provide flexibility to this library rather than roll my own, and a few other reasons why:

  • parsing XML after the fact is something that is scary, and for I think obvious reasons, and the safe way of doing it using SAX etc... kinda reduces the utility of a library providing something, as I'm then writing a lot of code to ensure safety
  • once the sbom is created, I really don't want to touch it, I'm treating it as immutable as much as I can when I work with it (get it and send it off)
  • ORT is not an option for me, as I'm writing a nodejs app that uses native dependencies, and one of our explicit goals is to provide a native nodejs experience (hence, using this library)
  • I work at Sonatype, and we don't use forks of repos unless we do the rest of the work to publish it to a repository, scan it for vulnerable third party deps, etc.... and by the time I make it through all of that, that's a lot of build infrastructure to setup in order to just make a very minimal change

@DarthHater
Copy link
Member Author

Popping back in to summarize:

As well, I packaged it all up on my own fork of this project:

https://github.com/DarthHater/cyclonedx-node-module/tree/TypeScript

If you want I can send you a PR, and if you want I can strip out the contentious license data removal piece (although I obviously still think it's beneficial). The PR itself I think would be beneficial as it:

  • Includes some minimal testing framework (mocha/chai) and a test (I tested a few cases, circular dependency, dependency on a dependency, regular dependency)
  • Being it's written in TypeScript, it provides types and makes it easier for TypeScript devs to consume this project
  • It decouples createBom from read-installed, allowing someone to potentially use this project to create a bower sbom with minimal changes (an example object that can be passed is in the test)

While I was at it I wired up an index.ts that will create a cli app like the previous one in the bin, I couldn't figure out how that was created, so I've generated a cli app with similar flags, etc... or shown examples where I thought a flag wasn't needed.

Cheers, and thanks for providing this library, in general!

@stevespringett
Copy link
Member

Thanks for the summary and the work you and the rest of the team has done with implementing it in your own ecosystems.

The project is always looking for contributions, and it sounds like you've made numerous improvements, both architecturally and in overall quality.

And although I cannot accept a PR which excludes license ID or name (but can accept exclusion of license text), I am certainly open to reviewing a PR. It would also likely be much easier to keep in sync if necessary.

There has been requests to support Bower (#35) so the architectural changes you made may make that possible in a future release.

@jkowalleck jkowalleck pinned this issue Dec 10, 2021
@jkowalleck jkowalleck unpinned this issue Dec 11, 2021
@jkowalleck jkowalleck added the bug label Dec 11, 2021
@jkowalleck jkowalleck pinned this issue Dec 12, 2021
@jkowalleck
Copy link
Member

reviewed the issue thwth the original MIT/X11 license of the given example.
cyclonedx-node-module currently detects the license and writes it as a non-SPDX-license correctly according to CycloneDX spec 1.3
behavior is as expected. Therefore this issue will be closed.

@jkowalleck jkowalleck unpinned this issue Dec 20, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

4 participants