Converted inputs to sets + fixed inputs #87

sesquideus · 2025-01-28T15:20:07Z

reconciled with jb/ifu_rsrf_dev
converted inputs to sets to avoid duplicates and facilitate comparisons
added a few diagnostic messages
added / removed inputs to conform to the SOF files (possibly not to DRL-D though)

hugobuddel · 2025-01-28T15:59:24Z

I don't think it is wise to convert the input to lists. The order of the input (e.g. in the SOF) file is important. As in that it affects things in the ESO stack.

Also, I would like to (at some point) derive the workflow directly from the recipes, or at least experiment with that. And then the order can be very important.

So I (strongly) oppose to this change.

hugobuddel · 2025-01-28T16:00:10Z

Plus, can we fix the CI before adding new things?

sesquideus · 2025-01-28T16:14:06Z

So the "set of files" is not actually a set, but a list? 🤔
I am not touching the contents of the framesets, only the Inputs in InputSet.inputs (the order is currently completely arbitrary there and ESO does not care either, this is our internal structure). What I need to prevent is the repetition, but of course that can be done with lists too.

janusbrink · 2025-01-28T17:50:24Z

Not sure about the ordering issue, but losing this info might get us tangled down the line, compared to what we gain by going to sets? So, we want 'ordered sets'? Something like OrderedSet([1, 2, 3]) ?

sesquideus · 2025-01-28T19:07:54Z

The source of this information is the order in which they are defined in the class hierarchy (by us). So for instance in DarkImageProcessor RAW is always before MASTER_DARK and then other recipe-specific inputs follow. I usually copied the inputs from DRLD, so there might be some logical ordering, but I never cared much -- if we access the inputs by attribute name (self.raw) order should not matter much anyway.

But with sets it is easier for me to prevent developer errors: if I derive from RawImageProcessor.InputSet, it automatically includes inputs |= {self.raw}. If I forget and include it again in the derived class, there will be too many inputs and my tests for all_frames == used_frames will fail.

hugobuddel · 2025-01-28T20:47:21Z

~~I think that the input should be order, and that it should be the exact same order as the with_associated_input entries in the EDPS tasks.~~ I changed my mind while writing this comment, go ahead, the input should be a set.

Technically the order should not matter for the current way things are organized. But making the input a set will make it harder to move your beautiful abstraction layer into the direction I would like. For example, I'd like all the information that is in the EDPS tasks to be also present in the recipe itself. To the extend that the EDPS task can be derived from the recipe. (FWIW, this also means I personally like to have 3 or 4 recipes for the dark instead of 1.)

However, I don't think the order of the with_associated_input actually matters in the current implementation of the EDPS. But a big point of my ADASS paper and talk is that I believe that to be wrong: making use of that order in the associated input allows one to propagate information over the dependency graph.

So I would like a way forward in which this ordering is kept.

Hmm, thinking about it more, I do think that it should be possible to infer the ordering in case it would matter. So maybe go ahead and do this.

To summarize a case where the ordering would matter in the EDPS workflow is when you'd want your standard star reduction to use the same flat as is used for the science data reduction. Then you need to first determine the flat for the science data before you can decide how to do the standard star reduction (which is indirectly also input to the science task).

But if such a selection mechanism is indeed needed, then it must be encoded. E.g. through a query like science.standard.flat == science.flat (where a dot follows the chain of input dependencies). It should be possible to encode such a query in a way that includes which of those is leading, e.g. by using another symbol than == or by convention that the right side should be evaluated first.

Also, you are right that this change does not directly affect the SOF files, but indirectly it might. For example if we create documentation from the recipe directly and that people create their SOF based on the documentation. (And you are also right that the order in the SOF files should not matter, but it can.)

So I've reduced my strong opposition to a mild one, and perhaps I now even think that the ordering should not matter. Furthermore, there are quite a lot of hypotheticals in the story above, that might never come true, so you should not worry about.

(Another hypothetical: say you actually want the same input twice? Not as a list-of-two, but as two distinct inputs. Now we can't do that anymore! But try as I might, all scenario's I can come up with would be better served with either a list, or by having different classes for the input data so they are not duplicates anymore.)

Concluding, I think the benefits outweigh the negatives, so good idea!

hugobuddel · 2025-01-28T20:50:44Z

Maybe a data structure with a deterministic ordering like OrderedSet would indeed be beneficial. Because just this morning I fixed a bug in the ScopeSim tests because find is not deterministic: AstarVienna/ScopeSim#535 . We should prevent such random bugs if possible.

sesquideus · 2025-01-29T10:33:24Z

@hugobuddel Thanks for the stream of consciousness! I had a throbbing headache at night and decided not to touch anything, now with a clear head I think I understand. I see no downsides in using an ordered set, but then we need to make sure it matches DRLD / workflows everywhere. On the other hand, I cannot find any ordered set implementation in the standard library, only sortedcollections.

If an input file is required twice for whatever reason, it is perfectly possible to use two separate Inputs pointing to the same file (though right now I have no idea how that could be useful).

janusbrink · 2025-01-29T10:49:01Z

From Python 3.7 onwards a keys-only dict should mostly work like an ordered set? Depends on the operations we expect on it after it is created, though.

sesquideus · 2025-01-29T12:17:23Z

From Python 3.7 onwards a keys-only dict should mostly work like an ordered set?

Then I am already using that. But I get

>>> {1, 2, 3} == {3, 1, 2}
True

which is exactly what I want, but apparently not ordered.

hugobuddel · 2025-01-29T12:52:21Z

I like to take a step back and think about what we are trying to solve. You claim that this is necessary for the "used frames", that is #85 (if I understand correctly), because there you say

It requires me to convert inputs from a list to a set

However, I'm not sure that #85 solves the problem as set out in #79, which I think it is aimed to do. So transitively, I'm not sure this P.R. is necessary. We might still merge this because it seems useful on its own, but we should do so for the right reasons.

The main reason to distinguish between "all frames" and "used frames" is when a recipe has multiple output products, and the different output products use different input. The two main cases are:

The recipe has extra (optional) outputs. The example in PipelineProduct.save(): distinguish between allframes and usedframes #79 is that many recipes calculate an intermediate data product that represents the background. This background is usually discarded, but it is worthwhile to have an option to save the background to file (in addition to saving the primary output of the recipe). The primary output of the recipe would have used all input files, but the background would only have used a subset of those. So the file with the background should have fewer files listed as dependency than the primary output.
A recipe can process multiple files in parallel. For example the "basic reduce" recipes could for example process all science raws from a single template in parallel. A benefit would be that calibrations like the master flat and master dark would be shared between all of them, and thus would have to be loaded into memory only once. But all frames would require their own individual persistence correction. So each file should be saved with only one persistence correction as used frame.

In both cases I don't see how having the input as a set would help. But I could be misunderstanding what we are trying to achieve. Also, we can still merge this either way.

sesquideus · 2025-01-29T15:08:30Z

I see. No, for now I only wanted to compare frames present in the SOF and those that are also loaded. If it should mean tracking which inputs were actually used for which product, it does not attempt to solve that.

Currently I am only trying to ensure that all inputs are present in the SOF and read (unless not required) and vice versa, that there are no extra / unused files in the SOF (I silently assume that the files are correct and whenever something is missing, it is the fault of whoever wrote the recipe). So it still is a tiny step in the right direction, but for validating the Inputs, not for tracking which are used in the Products.

But I guess we will have to add something similar to the Product class anyway. Then we can populate it manually from within process_images. The hierarchy can simplify it a lot, for instance a RawImageProcessor should be able to process and mark all RAWs as used too.

Again, it seems to me that this will be achieved more easily with sets rather than lists, but I will gladly defer this discussion to the future PR.

sesquideus · 2025-01-30T14:42:29Z

I have found the error, it is the discrepancy between what pyesorex and edps use as their SOF. After removing LM_WCU_OFF_RAW from metis_det_detlin and gain maps and linearity from the other SOFs EDPS completes all its job.

However that makes other pyesorex tests fail. I can xfail the tests now, but to be consistent with the DRLD the files have to be there.

hugobuddel

Thanks @sesquideus , let's merge this and get everything working again!

Some thoughts:

I think we should write our recipes such that they will do the best they can even if input data is missing (or weird). E.g. metis_det_detlin might still be able to do something sensible if there is no LM_WCU_OFF_RAW. Or maybe depending on some flag that can be set. Such a 'just do it' flag might be especially useful for AIT because people might deliberately want to skip some calibration steps. (Persistence should always be optional though.)
The recipes should emit a warning on missing input so it is immediately clear what is wrong.
If a change in the recipes require a change in the workflow, then we can just make that change in the workflow too.

Converted inputs to sets + fixed inputs

a2f8252

sesquideus requested a review from janusbrink January 28, 2025 15:20

hugobuddel mentioned this pull request Jan 29, 2025

PipelineProduct.save(): distinguish between allframes and usedframes #79

Open

Simplified fixtures

5c67b98

Removed files missing from workflows + xfailed tests

8f2f7ee

sesquideus requested a review from hugobuddel January 30, 2025 15:53

hugobuddel approved these changes Jan 31, 2025

View reviewed changes

sesquideus merged commit 49896b0 into main Jan 31, 2025
2 checks passed

sesquideus deleted the mb/usedframes branch January 31, 2025 10:50

sesquideus restored the mb/usedframes branch January 31, 2025 20:35

sesquideus deleted the mb/usedframes branch January 31, 2025 20:36

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Converted inputs to sets + fixed inputs #87

Converted inputs to sets + fixed inputs #87

sesquideus commented Jan 28, 2025 •

edited

Loading

hugobuddel commented Jan 28, 2025

hugobuddel commented Jan 28, 2025

sesquideus commented Jan 28, 2025

janusbrink commented Jan 28, 2025

sesquideus commented Jan 28, 2025

hugobuddel commented Jan 28, 2025

hugobuddel commented Jan 28, 2025

sesquideus commented Jan 29, 2025 •

edited

Loading

janusbrink commented Jan 29, 2025

sesquideus commented Jan 29, 2025

hugobuddel commented Jan 29, 2025

sesquideus commented Jan 29, 2025

sesquideus commented Jan 30, 2025

hugobuddel left a comment

Converted inputs to sets + fixed inputs #87

Converted inputs to sets + fixed inputs #87

Conversation

sesquideus commented Jan 28, 2025 • edited Loading

hugobuddel commented Jan 28, 2025

hugobuddel commented Jan 28, 2025

sesquideus commented Jan 28, 2025

janusbrink commented Jan 28, 2025

sesquideus commented Jan 28, 2025

hugobuddel commented Jan 28, 2025

hugobuddel commented Jan 28, 2025

sesquideus commented Jan 29, 2025 • edited Loading

janusbrink commented Jan 29, 2025

sesquideus commented Jan 29, 2025

hugobuddel commented Jan 29, 2025

sesquideus commented Jan 29, 2025

sesquideus commented Jan 30, 2025

hugobuddel left a comment

Choose a reason for hiding this comment

sesquideus commented Jan 28, 2025 •

edited

Loading

sesquideus commented Jan 29, 2025 •

edited

Loading