forked from apache/airflow
-
Notifications
You must be signed in to change notification settings - Fork 4
Commit d52cb34
authored
Refactor dag processor for code clarity (apache#46443)
Dag processor has experienced a lot of organic growth over the years and had gotten into a state where it became a bit hard to understand. E.g. method set_files did not have a name that clearly indicated its purpose. And that's probably in part because it did a bunch of things and was called in a bunch of places. I simplify that method and focus on just one aspect of its prior work, namely handling removed files, and call it handle_removed_files.
A lot of the behavior was driven by a data structure on the instance called file paths. This contained all known files. It's one thing that was modified in set_files. From looking at the main loop it was not obvious where this structure was being used or modified. So I made it a local variable instead of instance attr. And now we can easily see all the methods that are using it because it must be passed around.
I rename the file_paths to known_files because that is clearer about its meaning. Previously it was file_paths and file_queue -- harder to understand the different purposes. I make file_paths a dictionary because then it's easier to replace all the files in the bundle, something that was previously done by iterating through the files.
In prepare file paths, I pull out the mtime mode logic into its own method because it's quite involved and made the prepare file paths method too big and complicated. Along with this I simplify the logic to not exclude recently processed files if they were recently changed.
In some of the tests, I had to change the way we simulated mtime numbers because the input is now a set which does not guarantee order. So I encode the mtime in the filename in the test. And the one test dealing with zip file, this was apparently a flakey test. I change it so we don't mock anything but just copy the file to a tmp dir and make a bundle there, then remove it and see what happens.
In clear_orphaned_import_errors I no longer pass the entire list of known dag files. Cus there could be a lot of them.1 parent 4f2743c commit d52cb34Copy full SHA for d52cb34
File tree
4 files changed
+310
-243
lines changed- airflow
- dag_processing
- models
- tests
- dag_processing
- models
4 files changed
+310
-243
lines changed
0 commit comments