Create a closed source Large Language Model CVE detection program or tool #11

Open
bennmann opened this issue Sep 20, 2023 · 3 comments

Comments

@bennmann

With the prominence of machine learning over the past 10 years, and the recent rise of Large Language Models that can reason (per various scholarly works), the CVE org could train a Large Language Model on its database of vulnerabilities and offer the public a closed-source CVE detection tool built on the best Large Language Models and CVE data.

@zmanion

zmanion commented Oct 5, 2024

Hi, and thanks for the suggestion, although I'm not sure I clearly understand it. If you're suggesting that an LLM trained on the CVE corpus could be used to discover new vulnerabilities, I don't think that would work. We are interested in using AI/ML to improve CVE, but would need much more specific ideas.

@ryOF65aErb

I see LLMs as promising for processing the existing CVE JSON files into a data set with canonicalized affected-range information, drawing on the text of the CVE description as well as other fields (some of which contain idiosyncratically formatted version information), with the goal of producing an automation-ready data set.
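The structured part of this is mechanical and does not need an LLM at all. As a minimal sketch, assuming records in the CVE JSON 5 format (where each `containers.cna.affected[]` entry lists `versions[]` objects with `version` and `status`, optionally plus `lessThan`, `lessThanOrEqual`, and `versionType`), the affected entries can be flattened into ranges whose upper-bound inclusivity is explicit. `AffectedRange` and `canonical_ranges` are hypothetical names, and the sample record is made up. The sketch ignores `defaultStatus`, `changes`, and non-"affected" entries, and it does not touch the free-text description, which is where an LLM would come in.

```python
import json
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class AffectedRange:
    """One affected range with an explicit upper-bound inclusivity flag (hypothetical type)."""
    vendor: str
    product: str
    introduced: str              # lower bound, taken from "version" (inclusive)
    upper: str                   # upper bound
    upper_inclusive: bool        # True for lessThanOrEqual or a single version, False for lessThan
    version_type: Optional[str]  # e.g. "semver"; passed through, not interpreted


def canonical_ranges(record: dict) -> List[AffectedRange]:
    """Flatten containers.cna.affected[].versions[] entries whose status is "affected".

    Ignores defaultStatus, "changes", ADP containers, and non-"affected" entries.
    """
    out: List[AffectedRange] = []
    affected = record.get("containers", {}).get("cna", {}).get("affected", [])
    for entry in affected:
        vendor = entry.get("vendor", "")
        product = entry.get("product", "")
        for v in entry.get("versions", []):
            if v.get("status") != "affected":
                continue
            if "lessThan" in v:
                upper, inclusive = v["lessThan"], False
            elif "lessThanOrEqual" in v:
                upper, inclusive = v["lessThanOrEqual"], True
            else:
                # No range keys: a single affected version, i.e. [version, version].
                upper, inclusive = v["version"], True
            out.append(AffectedRange(vendor, product, v["version"], upper,
                                     inclusive, v.get("versionType")))
    return out


# Hypothetical minimal record; real CVE records carry many more fields.
sample = json.loads("""
{"containers": {"cna": {"affected": [{
  "vendor": "example-vendor", "product": "example-product",
  "versions": [
    {"version": "1.0.0", "lessThan": "1.4.2", "status": "affected", "versionType": "semver"},
    {"version": "2.0.0", "status": "affected"},
    {"version": "1.4.2", "status": "unaffected"}
  ]}]}}}
""")

for r in canonical_ranges(sample):
    print(r)
```

Keeping inclusivity as a separate flag, rather than rewriting `lessThanOrEqual X` as `lessThan` "the next version after X", avoids having to define "next version" for every versioning scheme.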

@CVEProject CVEProject deleted a comment from Bdutti Feb 3, 2025
@CVEProject CVEProject deleted a comment from Bdutti Feb 3, 2025
@zmanion

zmanion commented Feb 3, 2025

While I'm a bit of an AI skeptic (or possibly a Luddite), I'd try anything that helps us with vulnerability status, software ID, versions, ranges, etc. There's so much variation in the accuracy and level of detail of CVE affected objects. I'd still be concerned about accuracy and precision for specific versions and ranges, e.g., that AI would produce a range that is off by one (inclusive vs. exclusive). IOW, garbage in, garbage out.
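Part of the off-by-one worry can at least be made checkable. A minimal sketch, assuming purely numeric dotted version strings: `Range`, `parse`, `contains`, and `boundary_mismatches` are hypothetical names, and `proposed` stands for a range produced by any extraction step (LLM or otherwise) that is compared against the bound the record itself states with `lessThan` or `lessThanOrEqual`.

```python
from typing import List, NamedTuple, Tuple


class Range(NamedTuple):
    introduced: str        # inclusive lower bound
    upper: str             # upper bound
    upper_inclusive: bool  # True = lessThanOrEqual, False = lessThan


def parse(version: str) -> Tuple[int, ...]:
    # Assumption: purely numeric dotted versions like "1.4.2". Trailing zeros are
    # dropped so "2.0" and "2.0.0" compare equal. Pre-release tags, git commits,
    # and vendor-specific schemes are out of scope.
    parts = [int(p) for p in version.split(".")]
    while len(parts) > 1 and parts[-1] == 0:
        parts.pop()
    return tuple(parts)


def contains(r: Range, version: str) -> bool:
    v = parse(version)
    if v < parse(r.introduced):
        return False
    upper = parse(r.upper)
    return v <= upper if r.upper_inclusive else v < upper


def boundary_mismatches(source: Range, proposed: Range) -> List[str]:
    """Bound versions (from either range) on which the two ranges disagree.

    Only the bounds themselves are probed, which is where inclusive/exclusive
    slips show up; this is not a full equivalence test.
    """
    probes = {source.introduced, source.upper, proposed.introduced, proposed.upper}
    return sorted((p for p in probes if contains(source, p) != contains(proposed, p)),
                  key=parse)


source = Range("1.0", "1.4.2", upper_inclusive=False)   # record says lessThan 1.4.2
proposed = Range("1.0", "1.4.2", upper_inclusive=True)  # extraction said "up to 1.4.2"
print(boundary_mismatches(source, proposed))  # ['1.4.2']
```

A check like this only compares an extraction against what the record already says; it cannot catch a record that was wrong to begin with.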

Labels
None yet
Projects
Status: Under Discussion
Development

No branches or pull requests

3 participants