BUG: Difficulty Importing Large Quantities of CKLs #55

Closed
sagansapien opened this issue Dec 9, 2020 · 5 comments
Labels
bug Something isn't working

Comments

@sagansapien

Greetings,

I attempted to import approximately 1500 CKLs into a collection last night. It parsed very quickly but the "Importing data" step ran for several hours and did not appear to complete. When I checked again this morning, my session had timed out. There are lots of new Assets but not all of them.

Now attempting to import the same 1500 CKLs again. I got the same warning about the same 3 duplicates from the initial load. Since this is the second import of this batch, I expected it to detect many more (several hundred) duplicates that had been successfully imported from the first attempt and skip them. It appears to be starting over and importing the same CKLs again.

Is there a timeout issue during large imports?

Can the status of previous import operations be found in the logs somewhere? For example, "on this date, 600 files were successfully imported, 35 new assets were created, 500 files failed to import".

What is the expected behavior of the duplicate detection function?

Can the total number of CKLs added to a collection be viewed?

Thank you!

@sagansapien sagansapien added the bug Something isn't working label Dec 9, 2020
@cd-rite
Collaborator

cd-rite commented Dec 9, 2020

Hi Sagan-
Thanks for the issues and these questions; we really need input from people who are actually using the tool. I will update our Docs to address your questions more explicitly, and we will try to address the slow import you are seeing soon.

We have not experienced this exact problem, but we have not yet stress-tested with 1500 checklists going into a single Collection. We do plan on doing this soon, though.

There are a few avenues and considerations to address regarding your issue and questions.

Regarding the session timeout you saw: this should be configurable in your Keycloak instance, and adjusting it may help resolve this problem. If the session is the problem, the max SSO session timeout would probably be the culprit. Are you using our sample Keycloak config? The import process will attempt to refresh its token when making requests, but if the session max is reached it will not be able to do that. That said, we should still be able to perform this import in a timely manner, so the problem may lie elsewhere. We enforce no timeouts on these imports ourselves.
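To illustrate the interaction between token refresh and the session maximum described above, here is a minimal sketch. It is a toy model, not Keycloak's or STIGMan's actual logic: the `Session` class, its field names, and all of the durations are hypothetical values chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class Session:
    started_at: float
    sso_session_max: float        # seconds; hypothetical stand-in for the realm setting
    access_token_lifespan: float  # seconds; hypothetical

    def can_refresh(self, now: float) -> bool:
        # In this model a refresh only succeeds while the session is younger than its max.
        return now - self.started_at < self.sso_session_max


def requests_completed(session: Session, total_requests: int,
                       seconds_per_request: float) -> int:
    """Count the requests that finish before the token can no longer be refreshed."""
    now = session.started_at
    token_expires_at = now + session.access_token_lifespan
    done = 0
    for _ in range(total_requests):
        if now >= token_expires_at:
            if not session.can_refresh(now):
                break  # session max reached: the import stalls here
            token_expires_at = now + session.access_token_lifespan
        now += seconds_per_request
        done += 1
    return done


session = Session(started_at=0, sso_session_max=36000, access_token_lifespan=300)
slow = requests_completed(session, total_requests=1500, seconds_per_request=30)
fast = requests_completed(session, total_requests=1500, seconds_per_request=0.12)
print(slow, fast)
```

With these made-up numbers, the slow run stops partway through once the session max is reached, while the fast run finishes long before the first token even expires. This matches the point above that a timely import should not depend on the session setting at all.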

Did you save any of the output we provide in the client?
What kind of progress did you see in the "Importing Data" status box before you left? Any idea how far it got?
Do the same files in smaller chunks (say, 500 at a time) succeed?
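For the chunking test suggested above, the file list can be split into batches before each import attempt. This is a minimal sketch, assuming the files are simply identified by name; the `chunked` helper and the file names are hypothetical and not part of STIGMan.

```python
from typing import Iterator, List, Sequence


def chunked(files: Sequence[str], size: int = 500) -> Iterator[List[str]]:
    """Yield consecutive batches of at most `size` file names."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(files), size):
        yield list(files[start:start + size])


# Hypothetical file names standing in for the 1500 CKLs from the report.
ckl_files = [f"asset-{i:04d}.ckl" for i in range(1500)]
batches = list(chunked(ckl_files, 500))
batch_sizes = [len(batch) for batch in batches]
print(batch_sizes)
```

Each batch could then be dropped into the importer separately, which would show whether the slowdown depends on batch size.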

We are currently logging a LOT of data in that Importing Data status box, and think this may be the most likely bottleneck that caused your import to run slowly. We intend to reduce the amount of output we make to that "Importing Data" screen to something more like your suggestion of a summary of the operation. Look for a new STIGMan release in the next couple days to address this. We can probably tag you when we do, and would love to hear back from you if you can run your test again.
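A summary of the kind described above can be built by counting outcomes instead of printing a line per item. The following is only a sketch of that idea: the `ImportSummary` class, the outcome labels, and the sample results are assumptions for illustration, not the client's actual output format.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Iterable, Tuple


@dataclass
class ImportSummary:
    counts: Counter = field(default_factory=Counter)

    def record(self, outcome: str) -> None:
        # Tally the outcome rather than emitting a log line for every file.
        self.counts[outcome] += 1

    def render(self) -> str:
        parts = [f"{n} {name}" for name, n in sorted(self.counts.items())]
        return ", ".join(parts) if parts else "nothing processed"


def summarize(results: Iterable[Tuple[str, str]]) -> ImportSummary:
    """Fold (file name, outcome) pairs into a single summary object."""
    summary = ImportSummary()
    for _file_name, outcome in results:
        summary.record(outcome)
    return summary


# Hypothetical per-file outcomes.
results = [("a.ckl", "imported"), ("b.ckl", "imported"), ("c.ckl", "failed")]
summary = summarize(results)
print(summary.render())
```

The display cost then stays constant no matter how many files are imported, because only the final counts are rendered.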

We do not create logs of these checklist imports on the API side. In this instance, most of the heavy lifting happens in the client, which parses the checklists and xccdf files provided into a JSON format that is much more efficient to send to the server API. From the API's perspective, it received a bunch of review updates (or asset POSTs, etc), not checklists.
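As a rough illustration of the client-side step described above, the sketch below turns a simplified CKL-style XML document into a JSON payload. It is not STIGMan's parser: the sample document is heavily trimmed, and the output field names (`asset`, `reviews`, `ruleId`, `status`, `detail`) are hypothetical rather than the API's actual schema.

```python
import json
import xml.etree.ElementTree as ET

# Trimmed, illustrative sample; real CKL files carry many more elements.
SAMPLE_CKL = """<CHECKLIST>
  <ASSET><HOST_NAME>web01</HOST_NAME></ASSET>
  <STIGS><iSTIG>
    <VULN>
      <STIG_DATA>
        <VULN_ATTRIBUTE>Rule_ID</VULN_ATTRIBUTE>
        <ATTRIBUTE_DATA>SV-0001r1_rule</ATTRIBUTE_DATA>
      </STIG_DATA>
      <STATUS>NotAFinding</STATUS>
      <FINDING_DETAILS>Checked manually.</FINDING_DETAILS>
    </VULN>
  </iSTIG></STIGS>
</CHECKLIST>"""


def ckl_to_payload(xml_text: str) -> dict:
    """Reduce a CKL-style document to an asset name plus a list of reviews."""
    root = ET.fromstring(xml_text)
    reviews = []
    for vuln in root.iter("VULN"):
        # Collect the STIG_DATA name/value pairs for this vulnerability.
        attrs = {
            sd.findtext("VULN_ATTRIBUTE"): sd.findtext("ATTRIBUTE_DATA")
            for sd in vuln.findall("STIG_DATA")
        }
        reviews.append({
            "ruleId": attrs.get("Rule_ID"),
            "status": vuln.findtext("STATUS"),
            "detail": vuln.findtext("FINDING_DETAILS", default=""),
        })
    return {"asset": root.findtext("ASSET/HOST_NAME"), "reviews": reviews}


payload = ckl_to_payload(SAMPLE_CKL)
print(json.dumps(payload, indent=2))
```

Once the data is in this shape, the server only ever sees review and asset records, which is why it has no notion of "a checklist was imported" to log.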

The duplicates referred to in the importer are duplicates in the set of files provided to the client. They are not compared to the set of assets already in the Collection. During the actual import, if an asset already exists in the Collection its properties will be updated with the data present in the checklists. Because you provided the same set of checklists, you saw the same set of duplicates.
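The behavior described above can be sketched as two separate steps: a duplicate check that only looks inside the submitted batch, followed by an update-or-create pass against the Collection. This is a minimal sketch under assumptions: the duplicate key (asset name plus STIG), the dictionary layout, and the sample data are all hypothetical, and the real importer may define duplicates differently.

```python
from typing import Dict, List, Tuple


def split_duplicates(batch: List[dict]) -> Tuple[List[dict], List[dict]]:
    """Separate repeated entries within the batch; the Collection is never consulted."""
    seen = set()
    unique, duplicates = [], []
    for item in batch:
        key = (item["asset"], item["stig"])  # assumed duplicate key
        if key in seen:
            duplicates.append(item)
        else:
            seen.add(key)
            unique.append(item)
    return unique, duplicates


def upsert_assets(collection: Dict[str, dict], batch: List[dict]) -> None:
    """Create missing assets and overwrite the results of existing ones."""
    for item in batch:
        asset = collection.setdefault(item["asset"], {})
        asset[item["stig"]] = item["result"]


# Hypothetical batch containing one repeated asset/STIG pair.
batch = [
    {"asset": "web01", "stig": "RHEL_8", "result": "pass"},
    {"asset": "db01", "stig": "RHEL_8", "result": "fail"},
    {"asset": "web01", "stig": "RHEL_8", "result": "fail"},
]
collection: Dict[str, dict] = {}
for attempt in range(2):
    unique, duplicates = split_duplicates(batch)
    upsert_assets(collection, unique)
    print(attempt, len(duplicates), len(collection))
```

Both passes report the same single duplicate and leave the same two assets in place, which mirrors what was observed when the same 1500 files were submitted a second time.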

CKLs are not actually "added" to the Collection in STIGMan as artifacts. Rather, the results and assignments found in those CKLs are applied to the Asset/STIG in question, and the relevant reviews are updated in the DB. We do not hold on to the CKL. Rather, STIGMan represents the current state of its asset assessments, maintained via the consolidation of both imports and manual evaluations in the GUI, and then produces a CKL of that state on demand. (We do, however, maintain a review history recording changes to reviews, though this will be limited so as not to grow indefinitely.)
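The "current state plus limited history" model described above can be sketched with a small in-memory store. This is an illustration only, not STIGMan's database design: the `ReviewStore` class, the history cap of 3, and the result labels are hypothetical.

```python
from collections import deque
from typing import Deque, Dict, Tuple

MAX_HISTORY = 3  # hypothetical cap; the real limit is not stated in this thread

Key = Tuple[str, str]  # (asset name, rule id)


class ReviewStore:
    def __init__(self, max_history: int = MAX_HISTORY) -> None:
        self.current: Dict[Key, str] = {}
        self.history: Dict[Key, Deque[str]] = {}
        self.max_history = max_history

    def set_result(self, asset: str, rule: str, result: str) -> None:
        """Apply a result from an import or a manual evaluation."""
        key = (asset, rule)
        previous = self.current.get(key)
        if previous is not None and previous != result:
            # Keep only the most recent `max_history` superseded values.
            bucket = self.history.setdefault(key, deque(maxlen=self.max_history))
            bucket.append(previous)
        self.current[key] = result

    def export(self, asset: str) -> Dict[str, str]:
        """Produce the current state for one asset on demand."""
        return {rule: res for (a, rule), res in self.current.items() if a == asset}


store = ReviewStore()
for result in ["open", "open", "not_reviewed", "open", "fail", "pass"]:
    store.set_result("web01", "SV-0001", result)
print(store.export("web01"), list(store.history[("web01", "SV-0001")]))
```

No source file is retained in this model: only the latest result per asset and rule survives, and the export is generated from that state whenever it is requested.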

Again, thanks for the feedback and questions!

@csmig
Member

csmig commented Dec 10, 2020

@sagansapien I've pushed and released a change that reduces the amount of output displayed during Client import of CKL/SCAP files. I suspect this will speed up your imports and hopefully they will complete. I would appreciate it if you could try your 1500 files again and report on your experience when able.

@sagansapien
Author

Thank you both for the detailed response and your modification. We'll hopefully test this out again tomorrow and will report back.

@sagansapien
Author

Um, guys?

We tried the new update.

1500 CKLs imported in 3 minutes. A second batch of 3442 imported in 10 minutes.

I think you fixed it.

Bravo! This is awesome.

@csmig
Member

csmig commented Dec 11, 2020

Yay! For reference, I've opened a feature request (#57) to further refine the output displayed during import and allow it to be saved.
Thanks again.


3 participants