BUG: Difficulty Importing Large Quantities of CKLs #55
Comments
Hi Sagan,

We have not experienced this exact problem, but we have not yet stress-tested importing 1500 checklists into a single Collection. We do plan on doing this soon. There are a few avenues and considerations to address regarding your issue and questions.

Regarding the session timeout you saw: this should be configurable in your Keycloak instance, and adjusting it may help resolve the problem. If the timeout is the cause, the maximum SSO session setting is the most likely culprit. Are you using our sample Keycloak config? The import process attempts to refresh its token when making requests, but once the session maximum is reached it cannot do that.

That said, we should still be able to perform this import in a timely manner, so the problem may lie elsewhere. We enforce no timeouts on these imports ourselves. Did you save any of the output we provide in the client? We are currently logging a LOT of data to the "Importing Data" status box, and we think this may be the most likely bottleneck that caused your import to run slowly. We intend to reduce the output on that "Importing Data" screen to something closer to your suggestion of a summary of the operation. Look for a new STIGMan release in the next couple of days to address this. We can tag you when we do, and would love to hear back from you if you can run your test again.

We do not create logs of these checklist imports on the API side. In this instance, most of the heavy lifting happens in the client, which parses the provided checklists and XCCDF files into a JSON format that is much more efficient to send to the server API. From the API's perspective, it received a batch of review updates (or asset POSTs, etc.), not checklists.

The duplicates referred to in the importer are duplicates within the set of files provided to the client. They are not compared against the assets already in the Collection. During the actual import, if an asset already exists in the Collection, its properties are updated with the data present in the checklists. Because you provided the same set of checklists, you saw the same set of duplicates.

CKLs are not actually "added" to the Collection in STIGMan as artifacts. Rather, the results and assignments found in those CKLs are applied to the Asset/STIG in question, and the relevant reviews are updated in the DB. We do not hold on to the CKL. STIGMan represents the current state of its asset assessments, maintained by consolidating both imports and manual evaluations in the GUI, and produces a CKL of that state on demand. (We do, however, maintain a review history recording changes to reviews, though this will be limited so that it does not grow indefinitely.)

Again, thanks for the feedback and questions!
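To make the duplicate behavior described above concrete, here is a minimal Python sketch. It is not STIGMan's client code: the `Checklist` record, the `(asset_name, benchmark_id)` duplicate key, the helper names `find_batch_duplicates` and `apply_batch`, and the "last write wins" rule are all hypothetical, assumed only for illustration. It flags duplicates only within the submitted batch, then upserts each result into an in-memory stand-in for a Collection, so re-importing the same files reports the same duplicates and updates existing assets instead of skipping them.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Checklist:
    """Hypothetical parsed CKL: one asset/STIG pair and its rule results."""
    filename: str
    asset_name: str
    benchmark_id: str
    results: dict[str, str]  # rule_id -> result, e.g. "pass" or "fail"


def find_batch_duplicates(batch: list[Checklist]) -> dict[tuple[str, str], list[str]]:
    """Report files sharing an (asset, benchmark) key *within this batch only*.

    The existing Collection is never consulted, so re-submitting the same
    files reports the same duplicates every time.
    """
    groups: dict[tuple[str, str], list[str]] = defaultdict(list)
    for ckl in batch:
        groups[(ckl.asset_name, ckl.benchmark_id)].append(ckl.filename)
    return {key: names for key, names in groups.items() if len(names) > 1}


def apply_batch(collection: dict, batch: list[Checklist]) -> dict[str, int]:
    """Upsert each checklist's results into an in-memory stand-in for a Collection.

    Assets that already exist are updated, not skipped; the CKL file itself is
    not kept, only the per-rule results. Returns counts suitable for a summary.
    """
    new_assets = 0
    reviews_written = 0
    for ckl in batch:
        if ckl.asset_name not in collection:
            collection[ckl.asset_name] = {}
            new_assets += 1
        reviews = collection[ckl.asset_name]
        for rule_id, result in ckl.results.items():
            # Hypothetical "last write wins": a later file overwrites an earlier result.
            reviews[(ckl.benchmark_id, rule_id)] = result
            reviews_written += 1
    return {"new_assets": new_assets, "reviews_written": reviews_written}


if __name__ == "__main__":
    batch = [
        Checklist("a.ckl", "host1", "Win10", {"V-1": "pass"}),
        Checklist("a_copy.ckl", "host1", "Win10", {"V-1": "fail"}),
        Checklist("b.ckl", "host2", "Win10", {"V-1": "pass"}),
    ]
    collection: dict = {}
    print(find_batch_duplicates(batch))    # {('host1', 'Win10'): ['a.ckl', 'a_copy.ckl']}
    print(apply_batch(collection, batch))  # {'new_assets': 2, 'reviews_written': 3}
    # Second import of the same batch: same duplicate report, no new assets.
    print(find_batch_duplicates(batch))    # {('host1', 'Win10'): ['a.ckl', 'a_copy.ckl']}
    print(apply_batch(collection, batch))  # {'new_assets': 0, 'reviews_written': 3}
```

Keying duplicates on asset/STIG pairs rather than filenames is itself an assumption here. The point of the sketch is only that duplicate detection and the upsert are separate steps, which is consistent with the second import in this thread reporting the same 3 duplicates while updating, rather than skipping, the assets created by the first run.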
@sagansapien I've pushed and released a change that reduces the amount of output displayed during Client import of CKL/SCAP files. I suspect this will speed up your imports, and hopefully they will complete. I would appreciate it if you could try your 1500 files again and report on your experience when able.
Thank you both for the detailed response and your modification. We'll hopefully test this again tomorrow and report back.
Um, guys? We tried the new update. 1500 CKLs imported in 3 minutes. A second batch of 3442 imported in 10 minutes. I think you fixed it. Bravo! This is awesome.
Yay! For reference, I've opened a feature request (#57) to further refine the output displayed during import and allow it to be saved.
Greetings,
I attempted to import approximately 1500 CKLs into a Collection last night. Parsing finished very quickly, but the "Importing Data" step ran for several hours and did not appear to complete. When I checked again this morning, my session had timed out. Many new Assets were created, but not all of them.
I am now attempting to import the same 1500 CKLs again. I got the same warning about the same 3 duplicates as in the initial load. Since this is the second import of this batch, I expected it to detect many more (several hundred) duplicates that had been successfully imported in the first attempt and skip them. Instead, it appears to be starting over and importing the same CKLs again.
Is there a timeout issue during large imports?
Can the status of previous import operations be found in the logs somewhere? For example, "on this date, 600 files were successfully imported, 35 new assets were created, 500 files failed to import".
What is the expected behavior of the duplicate detection function?
Can the total number of CKLs added to a collection be viewed?
Thank you!