
WebAPI v2.11 fails to start with many sources configured #2031 #2032

Merged · 2 commits merged into master from issue-2031_cache_warm_job_name_fix on May 23, 2022

Conversation

anton-abushkevich
Contributor

Quick fix for #2031

@anthonysena
Collaborator

Tagging @anton-abushkevich @ssuvorov-fls here. I added a commit where I'm trying to add back control of the caching to honor the priority of the daimon.

Prior to this change, we'd only cache record counts where the priority is > 0. The code I added tries to re-apply this rule, but making the update raised a few questions:

  • The changes done in Complete caching of Achilles results in Data Sources #1978 #1993 add some new rules for caching. As I understand it, any source configured with both a vocabulary and results daimon will have a job started with 2 steps: one to run the CDMResultsCacheTasklet and another to run the newly added AchillesCacheTasklet. With the changes I made, I only run the CDMResultsCacheTasklet if the source's results daimon has a priority > 0. I stopped short of applying this same rule to the AchillesCacheTasklet since I was unsure if this would cause any problems. If a source with a results daimon does not run the AchillesCacheTasklet, will the Data Sources functionality still work without the cache?
  • It seems that every start of WebAPI requires the AchillesCacheTasklet to run and presumably copy the Achilles data from the CDM to the WebAPI DB (the newly added cdm_cache table). My question is: do we need to do this on every startup? This feels like it should be a one-time operation to make that copy. Should the tasklet instead try to determine whether the caching needs to be re-applied? I think that was the idea behind cdm.cache.cron.warming.enable, but I'm unsure why you'd need to run this on a scheduled basis.
  • The behavior of cdm.cache.cron.warming.enable differs from cdm.result.cache.warming.enable. cdm.cache.cron.warming.enable controls whether the scheduled task will run, based on the cron expression specified in cdm.cache.cron.expression. Originally, cdm.result.cache.warming.enable controlled whether we warmed the results cache at all, which only included the record counts. That setting has now expanded to control both the record count cache and the Achilles results cache. I'd prefer that we have a mechanism to control both of these caches independently, per the point above.
  • There is a mechanism to "bucket" the jobs (see getBucketsSizes) and, as I understand it, it is used to create a single job with multiple steps based on the sources involved. I'm unclear, though, whether Spring Batch is honoring this, since it appears to launch all jobs immediately. In the case where we have a large number of sources configured, that will leave WebAPI & Atlas unresponsive to user requests (i.e. cohort generations) until the queue is cleared. Again, this will require us to think about the first point about controlling which sources are immediately cached.
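The priority rule described in the first point above could be sketched roughly as follows. This is a hypothetical simplification, not WebAPI's actual classes: each source is reduced to a name mapped to its Results-daimon priority, whereas WebAPI's real Source/SourceDaimon model is richer.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical sketch of the rule: a source's record counts are warmed
// only if its Results daimon has priority > 0.
public class PrioritySketch {

    /** Keep only sources whose Results daimon priority is greater than zero. */
    static List<String> sourcesToWarm(Map<String, Integer> resultsPriorityBySource) {
        return resultsPriorityBySource.entrySet().stream()
                .filter(e -> e.getValue() > 0)
                .map(Map.Entry::getKey)
                .sorted()
                .collect(Collectors.toList());
    }
}
```

Under this rule, a source whose Results daimon has priority 0 is simply never selected for warming, which is the pre-change behavior the commit tries to restore.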

@anton-abushkevich
Contributor Author

@anthonysena

  • If a source with a results daimon does not run the AchillesCacheTasklet, will the Data Sources functionality still work without the cache?

Yes, it should still work.

  • It seems that every start of WebAPI requires the AchillesCacheTasklet to run and presumably copy the Achilles data from the CDM to the WebAPI DB (the newly added cdm_cache table). My question is: do we need to do this on every startup? This feels like it should be a one-time operation to make that copy. Should the tasklet instead try to determine whether the caching needs to be re-applied? I think that was the idea behind cdm.cache.cron.warming.enable, but I'm unsure why you'd need to run this on a scheduled basis.
  • The behavior of cdm.cache.cron.warming.enable differs from cdm.result.cache.warming.enable. cdm.cache.cron.warming.enable controls whether the scheduled task will run, based on the cron expression specified in cdm.cache.cron.expression. Originally, cdm.result.cache.warming.enable controlled whether we warmed the results cache at all, which only included the record counts. That setting has now expanded to control both the record count cache and the Achilles results cache. I'd prefer that we have a mechanism to control both of these caches independently, per the point above.
  • There is a mechanism to "bucket" the jobs (see getBucketsSizes) and, as I understand it, it is used to create a single job with multiple steps based on the sources involved. I'm unclear, though, whether Spring Batch is honoring this, since it appears to launch all jobs immediately. In the case where we have a large number of sources configured, that will leave WebAPI & Atlas unresponsive to user requests (i.e. cohort generations) until the queue is cleared. Again, this will require us to think about the first point about controlling which sources are immediately cached.

I think you're right, but it is not in the scope of this PR. Let's create another task for this and do the discussion/research there.

I will approve your changes in this PR and create a new task to discuss these topics. I hope Sergey (@ssuvorov-fls) will help us with it.
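For reference on the bucketing point being deferred to the new task: the idea behind getBucketsSizes, as described above, could be illustrated like this. Names and shapes here are illustrative, not WebAPI's code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of "bucketing": split N sources into fixed-size
// buckets, each bucket intended to become one job whose steps run
// sequentially, rather than launching a separate job per source all at once.
public class BucketSketch {

    /** Partition items into buckets of at most bucketSize elements each. */
    static <T> List<List<T>> bucket(List<T> items, int bucketSize) {
        List<List<T>> buckets = new ArrayList<>();
        for (int i = 0; i < items.size(); i += bucketSize) {
            // subList is a view; copy so each bucket is an independent list
            buckets.add(new ArrayList<>(items.subList(i, Math.min(i + bucketSize, items.size()))));
        }
        return buckets;
    }
}
```

Whether Spring Batch actually serializes the resulting jobs (rather than launching them all immediately) is exactly the open question raised above.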

@anthonysena
Collaborator

Yes, it should still work.

It will be important to preserve backwards compatibility here: in our environment, it is not feasible to cache all Achilles results.

I think you're right, but it is not in the scope of this PR. Let's create another task for this and do the discussion/research there.

I will approve your changes in this PR and create a new task to discuss these topics. I hope Sergey (@ssuvorov-fls) will help us with it.

Could you please open a new issue with these details? We'll keep the focus of this PR on the name-length issue, but my preference is to add a configuration setting that allows us to disable the Achilles cache functionality for now.
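The kind of configuration switch suggested here could be as simple as a single boolean flag consulted before running the Achilles cache step. This is a hypothetical sketch; the property name below is illustrative, not an actual WebAPI setting.

```java
import java.util.Properties;

// Hypothetical kill switch: when the flag is false, the Achilles cache step
// would simply be skipped, leaving record-count caching behavior unchanged.
public class AchillesCacheSwitch {

    static boolean achillesCacheEnabled(Properties props) {
        // Default to true so existing deployments keep the current behavior.
        return Boolean.parseBoolean(props.getProperty("cdm.cache.achilles.warming.enable", "true"));
    }
}
```

Defaulting the flag to enabled preserves backwards compatibility for deployments that already rely on the cache, while giving environments with many sources an escape hatch.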

@anthonysena
Collaborator

This addresses the root of the problem described in #2031 by making sure the job name does not exceed the character limit. I'll approve this PR since this satisfies the bug fix required for that issue. Let's use #2034 to address the questions I've raised here. Thanks!

@anton-abushkevich anton-abushkevich merged commit fad3dab into master May 23, 2022
@delete-merged-branch delete-merged-branch bot deleted the issue-2031_cache_warm_job_name_fix branch May 23, 2022 07:44
anton-abushkevich added a commit that referenced this pull request Jun 9, 2022
* fails to start with many sources configured #2031

* Starting to add back the ability to control caching based on daimon priority

Co-authored-by: Anthony Sena <[email protected]>
(cherry picked from commit fad3dab)