Skip to content

Commit d52cb34

Browse files
authored
Refactor dag processor for code clarity (apache#46443)
Dag processor has experienced a lot of organic growth over the years and had gotten into a state where it became a bit hard to understand. E.g. method set_files did not have a name that clearly indicated its purpose. And that's probably in part because it did a bunch of things and was called in a bunch of places. I simplify that method and focus on just one aspect of its prior work, namely handling removed files, and call it handle_removed_files. A lot of the behavior was driven by a data structure on the instance called file paths. This contained all known files. It's one thing that was modified in set_files. From looking at the main loop it was not obvious where this structure was being used or modified. So I made it a local variable instead of instance attr. And now we can easily see all the methods that are using it because it must be passed around. I rename the file_paths to known_files because that is clearer about its meaning. Previously it was file_paths and file_queue -- harder to understand the different purposes. I make file_paths a dictionary because then it's easier to replace all the files in the bundle, something that was previously done by iterating through the files. In prepare file paths, I pull out the mtime mode logic into its own method because it's quite involved and made the prepare file paths method too big and complicated. Along with this I simplify the logic to not exclude recently processed files if they were recently changed. In some of the tests, I had to change the way we simulated mtime numbers because the input is now a set which does not guarantee order. So I encode the mtime in the filename in the test. And the one test dealing with zip file, this was apparently a flakey test. I change it so we don't mock anything but just copy the file to a tmp dir and make a bundle there, then remove it and see what happens. In clear_orphaned_import_errors I no longer pass the entire list of known dag files. Cus there could be a lot of them.
1 parent 4f2743c commit d52cb34

File tree

4 files changed

+310
-243
lines changed

4 files changed

+310
-243
lines changed

0 commit comments

Comments
 (0)