[BUG]: Child notebooks imported from %run magic line are not assessed #2155
Comments
Databricks supports importing notebooks only if they do not have an extension and the `%run` magic line is the only line in its cell. If this is not the first cell, we already support this behavior via …
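The rule above can be sketched as a check on a single cell's source: the cell counts as a notebook import only if its sole non-blank line is a `%run` magic whose target has no extension. The function name `run_magic_target` and the regular expression are illustrative assumptions, not UCX's implementation.

```python
import re
from pathlib import PurePosixPath
from typing import Optional

# Illustrative pattern (an assumption): "%run" followed by a single path token.
_RUN_MAGIC = re.compile(r"^%run\s+(\S+)\s*$")


def run_magic_target(cell_source: str) -> Optional[str]:
    """Return the child notebook path of a %run-only cell, or None.

    Hypothetical helper: models the constraints that %run must be the only
    line in its cell and that the child notebook is referenced without an
    extension.
    """
    lines = [line.strip() for line in cell_source.splitlines() if line.strip()]
    if len(lines) != 1:
        # Any other statement in the cell means this is not a notebook import.
        return None
    match = _RUN_MAGIC.match(lines[0])
    if match is None:
        return None
    path = match.group(1)
    if PurePosixPath(path).suffix:
        # A path with an extension (e.g. "./utils.py") is not treated as a notebook here.
        return None
    return path
```

For example, `run_magic_target("%run ./child_notebook")` yields `"./child_notebook"`, while a cell that also contains `x = 1` yields `None`.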
## Changes
Register child notebooks from magic lines in the dependency graph.

### Linked issues
Progresses #2155

### Functionality
None

### Tests
- [x] added unit tests

Co-authored-by: Eric Vergnaud <[email protected]>
Thanks to #2164, the notebooks are now assessed, but not in context, i.e. the child notebook is not aware of global variables set by the parent notebook, which may lead to avoidable advices.
## Changes
The current code lints local files without considering parent/child relationships (where a given 'parent' file imports or runs a 'child' file). This makes it impossible to lint files in context, where a child file uses variables inherited from a parent file. This PR makes progress towards that by linting child dependencies recursively.

### Linked issues
Progresses #2155
Progresses #2156

### Functionality
- [x] modified existing command: `databricks labs ucx lint-local-code`

### Tests
- [x] checked integration tests

---------

Co-authored-by: Eric Vergnaud <[email protected]>
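The recursive approach described in this PR can be pictured as a depth-first walk over the dependency graph that lints each file or notebook once. The sketch below uses a hypothetical `Node` type and a caller-supplied `lint_one` function; UCX's actual `DependencyGraph` and linter APIs differ.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterator, List, Optional, Set, Tuple


@dataclass(eq=False)
class Node:
    """Hypothetical stand-in for a file or notebook in a dependency graph."""

    path: str
    children: List["Node"] = field(default_factory=list)


def lint_recursively(
    node: Node,
    lint_one: Callable[[str], List[str]],
    visited: Optional[Set[str]] = None,
) -> Iterator[Tuple[str, str]]:
    """Yield (path, message) pairs for node and all of its transitive children."""
    if visited is None:
        visited = set()
    if node.path in visited:
        # Skip files already linted: guards against cycles and shared children.
        return
    visited.add(node.path)
    for message in lint_one(node.path):
        yield node.path, message
    for child in node.children:
        yield from lint_recursively(child, lint_one, visited)
```

Sharing one `visited` set across the walk means a child reached through several parents is linted only once; a design that lints a child separately in each parent's context would have to track visits per route instead.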
## Changes
Rename `GraphBuilder` to `PythonCodeAnalyzer` and evolve its API. This is in preparation for #2236.

### Linked issues
Progresses #2155
Progresses #2156

### Functionality
None

### Tests
- [x] ran unit tests

---------

Co-authored-by: Eric Vergnaud <[email protected]>
## Changes
Enhance the Python abstract syntax tree API in preparation for linting with inherited context.
Move `Tree` static methods to a `TreeHelper` class to avoid the 'too many public methods' linting error.

### Linked issues
Progresses #2155
Progresses #2156

### Functionality
None

### Tests
- [x] added unit tests

---------

Co-authored-by: Eric Vergnaud <[email protected]>
## Changes
Have `DependencyGraph` compute the route from the root file/notebook to a child file/notebook. That route will later be used to populate an `InheritedContext`.
Have dependencies carry a property for inheriting context, so that the context is computed only when required.
This is in preparation for #2236.

### Linked issues
Progresses #2155
Progresses #2156

### Functionality
None

### Tests
- [x] added unit tests

---------

Co-authored-by: Eric Vergnaud <[email protected]>
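One way to compute such a route is a breadth-first search that records each node's parent and then walks back from the target. This sketch works over a plain adjacency mapping (an assumption, not the `DependencyGraph` data model); the PR does not say which traversal UCX uses, and BFS is chosen here only because it returns a shortest route.

```python
from collections import deque
from typing import Dict, List, Optional


def route_to(graph: Dict[str, List[str]], root: str, target: str) -> Optional[List[str]]:
    """Return the shortest chain of dependencies from root to target, or None."""
    parents: Dict[str, Optional[str]] = {root: None}
    queue = deque([root])
    while queue:
        current = queue.popleft()
        if current == target:
            # Walk back through recorded parents, then reverse to get root -> target.
            route: List[str] = []
            node: Optional[str] = current
            while node is not None:
                route.append(node)
                node = parents[node]
            return list(reversed(route))
        for child in graph.get(current, []):
            if child not in parents:
                parents[child] = current
                queue.append(child)
    return None
```

With `{"main": ["setup", "jobs/etl"], "setup": ["config"], "jobs/etl": ["config"]}`, the route from `main` to `config` is `["main", "setup", "config"]`; an inherited context would then be assembled from the nodes before the child on that route.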
* Fixed codec error in md ([#2234](#2234)). This release fixes a codec error in an `md` file that caused issues on Windows machines because of curly quotes; the curly quotes have been replaced with straight quotes. The affected text describes the `.setJobGroup` pattern in `SparkContext`, for which `spark.addTag()` is used to attach a tag, and `getTags()` and `interruptTag(tag)` are used to act upon the presence or absence of a tag. These APIs are specific to Spark Connect (Shared Compute Mode) and will not work in `Assigned` access mode. The release also updates the README.md file with solutions for various issues related to UCX installation and configuration, smoothing the installation process and improving compatibility across operating systems. The changes were co-authored by Cor and address issue [#2234](#2234).
* Group manager optimisation: during group enumeration only request the attributes that are needed ([#2240](#2240)). The `_list_workspace_groups` function in `groups.py` now requests only the minimum set of attributes needed during group enumeration; in particular, the `members` attribute is no longer requested at this stage. A new `scan_attributes` variable limits the initial enumeration to "id", "displayName", and "meta". For each group returned by `self._ws.groups.list`, the function checks whether the group is out of scope and, if not, retrieves the group with all its attributes using the `_get_group` function. This reduces the risk of timeouts caused by large attributes and improves the performance of group enumeration, particularly in cases where requesting members during enumeration previously led to API issues.
* Group migration: additional logging ([#2239](#2239)). New informational and debug logs have been added to the group manager to help diagnose potential issues during group migration, affecting the existing `group-migration` workflow. Logging statements have been added to methods such as `rename_groups`, `_rename_group`, `_wait_for_rename`, `_wait_for_renamed_groups`, `reflect_account_groups_on_workspace`, `delete_original_workspace_groups`, and `validate_group_membership`, as well as to the data retrieval methods `_workspace_groups_in_workspace`, `_account_groups_in_workspace`, and `_account_groups_in_account`. The logs give more visibility into the group migration process, including when renaming or reflecting groups starts, when renamed groups are checked, and when group membership is validated.
* Group migration: improve robustness while deleting workspace groups ([#2247](#2247)). This pull request makes deleting workspace groups more reliable, addressing an issue where, because of eventual consistency, deletion was skipped for groups that had recently been renamed. Deletion is now double-checked by ensuring that a group can no longer be retrieved directly from the API and no longer appears in the list of groups during enumeration. Logging has also been improved; the renaming of groups will be updated in a subsequent pull request. The `remove-workspace-local-backup-groups` workflow and related tests have been modified, and new classes indicating incomplete deletion or rename operations have been added. These changes reduce the likelihood of issues arising after deletion.
* Improve error messages in case of connection errors ([#2210](#2210)). This release improves the error messages for connection errors in the `databricks labs ucx (un)install` command, addressing part of issue [#1323](#1323). `RequestsConnectionError` is now imported from the `requests` package, and the error handling in the `run` method has been updated: a new `except` block handles `TimeoutError` exceptions caused by a `RequestsConnectionError` and logs a warning with guidance on troubleshooting network connectivity. The `configure` method now has a docstring noting that connection errors are not handled within it. New manual and integration tests have been added, including a test that simulates a workspace with no internet connection (using a new function to configure such a workspace) and checks the log output for the expected warning message. The changes also include new type annotations and imports. Software engineers adopting the project will benefit from clearer error messages and guidance when troubleshooting connection problems.
* Increase timeout for sequence of slow preliminary jobs ([#2222](#2222)). The timeout for a series of slow preliminary jobs has been increased from 4 minutes to 6 minutes, addressing issue [#2219](#2219). The change is in the `test_running_real_remove_backup_groups_job` function in `tests/integration/install/test_installation.py`, where the timeout of the `retried` decorator on the `get_group` function is raised from 4 to 6 minutes. This gives the API more time to delete a group and reduces errors caused by insufficient deletion time. Product functionality is unchanged.
* Init `RuntimeContext` from debug notebook to simplify interactive debugging flows ([#2253](#2253)). The debug notebook now initializes a `RuntimeContext` object, simplifying interactive debugging of UCX workflows. `RuntimeContext` is a subclass of `GlobalContext` that manages all object dependencies, and all UCX workflows use a `RuntimeContext` instance for object lookup, which made lookups cumbersome during debugging. The debug notebook now pre-initializes the `RuntimeContext` with the necessary configuration and uses it in place of `Installation.load_local` and `WorkspaceClient`, reducing the complexity of object lookup and simplifying the debugging code.
* Lint child dependencies recursively ([#2226](#2226)). Linting now takes parent-child file relationships into account. The `DependencyGraph` class in the `graph.py` module has new methods, including `parent`, `root_dependencies`, `root_paths`, and `root_relative_names`, and an improved `_relative_names` method, allowing child dependencies to be linted more accurately. The `lint` function in the `files.py` module now accepts new parameters and lints child dependencies recursively. The `databricks labs ucx lint-local-code` command has been updated to include a `paths` parameter and to lint child dependencies recursively, so that code is analyzed in the context of its parent-child relationships. Integration tests cover these changes, which make progress on issues [#2155](#2155) and [#2156](#2156).
* Removed deprecated `install.sh` script ([#2217](#2217)). The deprecated `install.sh` script, previously used to install and set up the project environment, has been removed. It checked for available Python binaries, identified the latest version, created a virtual environment, and installed the project dependencies. Developers should now use an alternative method to install and set up the project environment; see the updated documentation for guidance on the new installation process.
* Tentatively fix failure when running assessment without a hive_metastore ([#2252](#2252)). The error handling of the `LocalCheckoutContext` class in `workspace_cli.py` has been improved so that running an assessment without a Hive metastore no longer fails fatally. If the metastore fails to load while a `LinterContext` object is being initialized, a warning is logged and the `MigrationIndex` is initialized with an empty list. This change is linked to the resolution of issue [#2221](#2221). The `MigrationIndex` class is now imported from the `hive_metastore.migration_status` module, and a logger has been added to the module. Note that no functional tests have been run for this specific change.
* Total Storage Credentials count widget for Assessment Dashboard ([#2201](#2201)). A new widget on the Assessment Dashboard displays the current total number of storage credentials created in the workspace, against a limit of 200. The change adds a new SQL query that retrieves the count of storage credentials from the `inventory.external_locations` table and customizes the widget's display settings. A new warning mechanism prevents migration from exceeding the UC storage credentials limit of 200: a new method, `get_roles_to_migrate`, has been added to `access.py` to retrieve the roles that need to be migrated, and if there are more than 200 roles, a `RuntimeWarning` is raised. The user documentation has been updated and the change has been tested manually, but no unit or integration tests have been added yet. This feature is part of the implementation of issue [#1600](#1600) and was co-authored by Serge Smertin.
* Updated dashboard install using latest `lsql` release ([#2246](#2246)). The install function for the UCX dashboard in `databricks/labs/ucx/install.py` now uses the latest `lsql` release. The `databricks labs install ucx` command has been adapted to the updated `lsql` version and now includes methods for upgrading dashboards from Redash to Lakeview, as well as for creating, deleting, and publishing Lakeview dashboards. The query formatting in the dashboard has been improved, and the `--width` parameter is no longer necessary in certain cases. The changes have been manually tested and verified on a staging environment.
* Updated sqlglot requirement from <25.7,>=25.5.0 to >=25.5.0,<25.8 ([#2248](#2248)). The version requirement for the SQL transpiler library sqlglot in `pyproject.toml` has been relaxed from ">=25.5.0, <25.7" to ">=25.5.0, <25.8", allowing the latest features and bug fixes in sqlglot 25.7.0 while keeping the same lower bound. The commit includes sqlglot's changelog for version 25.7.0 and a list of commits made since the previous version. Only the version constraint for sqlglot changes; no other part of the codebase is affected.

Dependency updates:

* Updated sqlglot requirement from <25.7,>=25.5.0 to >=25.5.0,<25.8 ([#2248](#2248)).
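The two-phase enumeration described for [#2240](#2240) can be sketched as a cheap listing that requests only `id`, `displayName` and `meta`, followed by a full fetch for in-scope groups only. The `Group` dataclass and `GroupsClient` protocol are hypothetical stand-ins that loosely mirror the Databricks SDK's `groups.list(attributes=...)` and `groups.get(id)` calls; this is not the UCX code.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator, List, Optional, Protocol

# Attributes requested during the cheap enumeration pass, as described in #2240.
SCAN_ATTRIBUTES = "id,displayName,meta"


@dataclass
class Group:
    """Hypothetical stand-in for an SDK group object."""

    id: str
    display_name: str
    members: Optional[List[str]] = None


class GroupsClient(Protocol):
    """Assumed client shape, loosely mirroring the SDK's groups API."""

    def list(self, *, attributes: Optional[str] = None) -> Iterable[Group]: ...

    def get(self, id: str) -> Group: ...


def list_workspace_groups(
    groups: GroupsClient, is_in_scope: Callable[[Group], bool]
) -> Iterator[Group]:
    """Enumerate cheaply, then fetch full details only for in-scope groups."""
    for summary in groups.list(attributes=SCAN_ATTRIBUTES):
        if not is_in_scope(summary):
            continue
        # Full fetch (including large attributes such as members), one group at a time.
        yield groups.get(summary.id)
```

The design trades one large listing response for several smaller per-group requests, which is what keeps any single call from timing out on large `members` payloads.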
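The double-check from [#2247](#2247) can be sketched as a function that treats a group as deleted only when a direct lookup fails and the group is also absent from enumeration. `GroupDeletionIncomplete`, `confirm_deleted`, and the use of `LookupError` as the "not found" signal are assumptions for illustration; given eventual consistency, a caller would typically retry this check until it passes or a timeout expires.

```python
from typing import Callable, Iterable


class GroupDeletionIncomplete(RuntimeError):
    """Hypothetical error: a deleted group is still visible through the API."""


def confirm_deleted(
    group_id: str,
    get_group: Callable[[str], object],
    list_group_ids: Callable[[], Iterable[str]],
) -> None:
    """Raise GroupDeletionIncomplete unless the group is gone by both checks.

    Assumption: get_group raises LookupError for a missing group (a stand-in
    for whatever "not found" error the real client raises).
    """
    try:
        get_group(group_id)
    except LookupError:
        pass  # Direct lookup no longer finds the group: first check passed.
    else:
        raise GroupDeletionIncomplete(f"group {group_id} can still be retrieved directly")
    if group_id in set(list_group_ids()):
        raise GroupDeletionIncomplete(f"group {group_id} still appears in the group listing")
```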
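The graceful degradation described for [#2252](#2252) amounts to catching the load failure, logging a warning, and continuing with an empty index. In this sketch, `SimpleMigrationIndex` is a hypothetical stand-in for UCX's `MigrationIndex`, and `load_statuses` is an assumed callable that reads migration status from the metastore.

```python
import logging
from typing import Callable, Sequence

logger = logging.getLogger(__name__)


class SimpleMigrationIndex:
    """Hypothetical stand-in for a table-migration index."""

    def __init__(self, statuses: Sequence[object]) -> None:
        self.statuses = list(statuses)


def load_migration_index(
    load_statuses: Callable[[], Sequence[object]],
) -> SimpleMigrationIndex:
    """Build the index, degrading to an empty one if the metastore is unavailable."""
    try:
        statuses = load_statuses()
    except Exception as e:  # broad on purpose in this sketch; narrow it in real code
        logger.warning("Cannot load migration status, using an empty index: %s", e)
        statuses = []
    return SimpleMigrationIndex(statuses)
```

With an empty index, linting still runs; it simply cannot map Hive tables to their Unity Catalog destinations.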
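The limit check from [#2201](#2201) is small, but its boundary is worth spelling out: up to 200 roles pass, and a 201st triggers the `RuntimeWarning`. `check_roles_to_migrate` is a hypothetical name used for illustration; in UCX the check lives in `get_roles_to_migrate` in `access.py`, whose exact signature is not described here.

```python
from typing import List, Sequence

# Unity Catalog storage credentials limit quoted in the release note.
UC_STORAGE_CREDENTIALS_LIMIT = 200


def check_roles_to_migrate(roles: Sequence[str]) -> List[str]:
    """Return the roles if they fit under the limit, else raise RuntimeWarning."""
    if len(roles) > UC_STORAGE_CREDENTIALS_LIMIT:
        raise RuntimeWarning(
            f"{len(roles)} roles to migrate, but at most "
            f"{UC_STORAGE_CREDENTIALS_LIMIT} storage credentials are allowed"
        )
    return list(roles)
```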
## Changes
Introduces an `InheritedContext` class, which gathers code fragments from a file's or notebook's parents, and uses it when linting the child file/notebook.

### Linked issues
Resolves #2155
Resolves #2156
Resolves #2221

### Functionality
None

### Tests
- [x] added unit tests
- [x] added functional tests
- [x] added integration tests

---------

Co-authored-by: Eric Vergnaud <[email protected]>
Co-authored-by: Andrew Snare <[email protected]>
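The effect of inherited context can be illustrated with Python's `ast` module: names that a child reads but never binds are flagged when it is analyzed alone, and disappear once code fragments from its parents are taken into account. `defined_names` and `undefined_names` are hypothetical helpers, not the `InheritedContext` API, and they ignore statement order, which a real implementation must respect (only parent code that runs before the `%run` line is visible to the child).

```python
import ast
import builtins
from typing import List, Set


def defined_names(source: str) -> Set[str]:
    """Names bound anywhere in source (an over-approximation of its globals)."""
    names: Set[str] = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            names.add(node.name)
        elif isinstance(node, (ast.Import, ast.ImportFrom)):
            for alias in node.names:
                # "import os.path" binds "os"; "import x as y" binds "y".
                names.add(alias.asname or alias.name.split(".")[0])
        elif isinstance(node, ast.Name) and isinstance(node.ctx, ast.Store):
            names.add(node.id)
    return names


def undefined_names(child: str, inherited: List[str]) -> Set[str]:
    """Names read by child that neither child, its inherited fragments, nor builtins bind."""
    known = set(dir(builtins)) | defined_names(child)
    for fragment in inherited:
        known |= defined_names(fragment)
    used = {
        node.id
        for node in ast.walk(ast.parse(child))
        if isinstance(node, ast.Name) and isinstance(node.ctx, ast.Load)
    }
    return used - known
```

For a parent that sets `catalog = 'main'` and imports `datetime`, a child calling `print(catalog, datetime.date.today(), schema)` reports three unknown names on its own but only `schema` once the parent fragment is inherited.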
* Added troubleshooting guide for self-signed SSL cert related error ([#2346](#2346)). A troubleshooting section has been added to the README for an error that can occur when connecting from a local machine to a Databricks Account and Workspace through a web proxy with a self-signed SSL certificate. This error, `SSLCertVerificationError`, prevents UCX from connecting to the Account and Workspace. To resolve it, users can set the `REQUESTS_CA_BUNDLE` and `CURL_CA_BUNDLE` environment variables to force the requests library to set `verify=False`, and set the `SSL_CERT_DIR` environment variable to point to the proxy CA certificate for the urllib3 library.
* Code Compatibility Dashboard: Fix broken links ([#2347](#2347)). This release fixes two issues in the Code Compatibility Dashboard of UCX Migration (Main): the Markdown panel linked to the workflow through an incorrect anchor, and the links to the workflow and task definitions in the table widget did not render correctly. In the SQL file, the `fieldName` attribute in the `invisibleColumns` section has been changed to 'name', so `workflow_id` is now displayed as a link. The corresponding workflow is now referred to as "Jobs Static Code Analysis Workflow". The dashboard has been manually tested and verified in a staging environment, and before and after screenshots have been provided.
* Filter out missing import problems for imports within a try-except clause with ImportError ([#2332](#2332)). Import-not-found problems are no longer reported for imports inside a try-except clause that catches `ImportError`, preventing unnecessary build failures. A new method, `_filter_import_problem_in_try_except`, performs this filtering, and the `_register_import` method now returns an `Iterable[DependencyProblem]` instead of yielding problems directly. The `ImportSource.extract_from_tree` method now accepts a `DependencyProblem` object as an argument. Supporting classes and methods have been added, including `Dependency`, `DependencyGraph`, and `DependencyProblem` from the `databricks.labs.ucx.source_code.graph` module, and `FileLoader` and `PythonCodeAnalyzer` from the `databricks.labs.ucx.source_code.notebooks.cells` module. A new unit test covers the scenario where a failing import in a try-except clause goes unreported. This resolves issue [#1705](#1705).
* Fixed `report-account-compatibility` cli command docstring ([#2340](#2340)). The docstring of the `report-account-compatibility` CLI command previously duplicated the description of the `sync-workspace-info` command. It now reads: "Report compatibility of all workspaces available in the account." When run, the command generates a readiness report for the account, covering the workspaces where UCX is installed.
* Fixed broken table migration workflow links in README ([#2286](#2286)). Broken links in the README have been fixed, and a mermaid flowchart of the table migration workflows has been added. The table migration workflow has been renamed to the table migration process, which covers migrating Delta tables, non-Delta tables, external tables, and views, and the related commands have been updated accordingly. Two optional workflows have been added: one for migrating HiveSerDe tables in place and one for migrating external tables using CTAS.
* Fixed dashboard queries fail when default catalog is not `hive_metastore` ([#2278](#2278)). Dashboard queries failed when the default catalog was not set to `hive_metastore`. The `databricks labs ucx install` command now always includes the `hive_metastore` namespace in dashboard queries, and the namespace is also added to the `DashboardMetadata` object used to create a dashboard from a folder of SQL queries, so that queries run against the correct database. The `test_install.py` unit tests have been updated to check that installation correctly handles the configuration of the `ucx` namespace used for data storage and retrieval. The changes have been manually tested and verified on a staging environment.
* Improve group migration error reporting ([#2344](#2344)). This PR improves error reporting on the group migration dashboard. The documentation widgets have been refined, and the failed-migration widget now shows formatted failure information, including logs and a link to the failed job run. Only failures from the latest workflow run are displayed. A link to the job list has been added to the [workflows](/jobs) section of the documentation to help users troubleshoot. The SQL query that retrieves group migration failures has been refactored for readability and now extracts the relevant data using regular expressions. The changes have been tested and verified on the staging environment. The PR builds on [#2333](#2333) and [#1914](#1914) and updates the UCX Migration (Groups) dashboard; no new methods have been added.
* Improve type checking in cli command ([#2335](#2335)). The `lint_local_code` function in `cli.py` now uses a newly developed local code linter object, enabling more rigorous type checking. Functionality is unchanged; the change is intended to prevent bugs similar to issue [#2221](#2221).
* Lint dependencies in context ([#2236](#2236)). A new `InheritedContext` class gathers code fragments from parent files or notebooks and uses them when linting child files or notebooks, addressing issues [#2155](#2155), [#2156](#2156), and [#2221](#2221). The class provides methods for building an instance from a route of dependencies, appending other `InheritedContext` instances, and finalizing the result for use by the linter. The `DependencyGraph` class has been updated to support this, and various classes, methods, and functions for handling the linter context have been added or updated. Unit, functional, and integration tests have been added. With this change, linting considers the broader context of the codebase, such as global variables set by a parent notebook.
* Make ucx pylsp plugin configurable ([#2280](#2280)). The ucx pylsp plugin can now be configured with cluster information, provided either in a file or by a client and managed by the pylsp infrastructure. The Spark Connect linter is now applied only to UC Shared clusters, since Single-User clusters run in Spark Classic mode. A new entry point, `pylsp_ucx`, has been added to the pylsp configuration file. Unit tests and manual testing have been conducted; integration tests and verification on a staging environment are not included in this release.
* New dashboard: group migration, showing groups that failed to migrate ([#2333](#2333)). A new dashboard in UCX Migration (Groups) monitors group migration. It includes a widget that displays messages about groups that failed to migrate during the `migrate-groups-experimental` workflow, helping users identify and address migration issues. The group migration process consists of several steps, including renaming workspace groups, provisioning account-level groups, and replicating permissions. The dashboard also links to documentation and to workflows for assessing, validating, and removing workspace-level groups after migration. The changes add a SQL query file that fetches group migration failures and a Markdown file for the Group Migration Failures section. The new dashboard is not yet connected to the existing system, but it has been manually tested and verified on the staging environment.
* Support spaces in run cmd args ([#2330](#2330)). Running subprocesses failed when command-line arguments contained spaces, because the previous implementation accepted only a full command line and split it on spaces. The new implementation supports argument lists, which are passed as-is to `Popen`. This change affects the `run_command` function in `utils.py` and the `_install_pip` method of the `PythonLibraryResolver` class. The `shlex.join()` call has been replaced with direct string formatting. The change was co-authored by Eric Vergnaud and Andrew Snare, and integration tests have been enabled to verify it.
* Updated error messages for SparkConnect linter ([#2348](#2348)). The term `UC Shared clusters` has been replaced with `Unity Catalog clusters in Shared access mode` throughout the SparkConnect linter's messages. The affected messages warn about setting the Spark log level directly, accessing the Spark Driver JVM or its logger, using `sc`, and using RDD APIs on these clusters. The linter's functionality is unchanged.
* Updated sqlglot requirement from <25.8,>=25.5.0 to >=25.5.0,<25.9 ([#2279](#2279)). The `sqlglot` version range in `pyproject.toml` has been changed from `sqlglot>=25.5.0,<25.8` to `sqlglot>=25.5.0,<25.9`, allowing any version from 25.5.0 up to, but not including, 25.9. The PR includes the upstream sqlglot changelog for this range, which lists bug fixes and new features for several dialects, such as BigQuery, DuckDB, and T-SQL, as well as parser refactors and improvements, together with the individual commits included in the update.

Dependency updates:

* Updated sqlglot requirement from <25.8,>=25.5.0 to >=25.5.0,<25.9 ([#2279](#2279)).
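The try-except filtering from [#2332](#2332) can be approximated with Python's `ast` module. This is a standalone sketch, not UCX's `_filter_import_problem_in_try_except`: representing problems as `(line, module)` tuples and the function names are hypothetical, and only handlers that explicitly name `ImportError` or `ModuleNotFoundError` are treated as guards.

```python
import ast

# Exception names this sketch treats as guarding an optional import
GUARD_EXCEPTIONS = {"ImportError", "ModuleNotFoundError"}


def catches_import_error(handler: ast.ExceptHandler) -> bool:
    """True if an `except` clause names ImportError or ModuleNotFoundError (alone or in a tuple)."""
    if handler.type is None:
        return False  # a bare `except:` is not treated as a guard in this sketch
    names = handler.type.elts if isinstance(handler.type, ast.Tuple) else [handler.type]
    return any(isinstance(name, ast.Name) and name.id in GUARD_EXCEPTIONS for name in names)


def guarded_import_lines(source: str) -> set[int]:
    """Line numbers of imports in the body of a `try` whose handlers catch ImportError."""
    lines: set[int] = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Try) and any(catches_import_error(h) for h in node.handlers):
            for statement in node.body:
                for inner in ast.walk(statement):
                    if isinstance(inner, (ast.Import, ast.ImportFrom)):
                        lines.add(inner.lineno)
    return lines


def filter_missing_imports(problems: list[tuple[int, str]], source: str) -> list[tuple[int, str]]:
    """Drop (line, module) "import not found" problems raised by guarded imports."""
    guarded = guarded_import_lines(source)
    return [problem for problem in problems if problem[0] not in guarded]


SOURCE = """\
import missing_pkg
try:
    import optional_pkg
except ImportError:
    optional_pkg = None
"""
print(guarded_import_lines(SOURCE))  # {3}
print(filter_missing_imports([(1, "missing_pkg"), (3, "optional_pkg")], SOURCE))  # [(1, 'missing_pkg')]
```

Only the unguarded import on line 1 is still reported; the optional import on line 3 is expected to fail on some environments and is filtered out.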
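Why splitting a full command line on spaces breaks arguments, as fixed in [#2330](#2330), can be shown with a small sketch. `run_command_sketch` is a hypothetical helper, not UCX's `run_command` (whose exact signature is not given here); it only assumes that a list is handed to `subprocess.Popen` unchanged, while a plain string is tokenized with `shlex.split` instead of being split on spaces.

```python
from __future__ import annotations

import shlex
import subprocess
import sys


def run_command_sketch(command: str | list[str]) -> tuple[int, str, str]:
    """Run a subprocess. A list is passed to Popen as-is; a string is tokenized with shlex.split."""
    args = command if isinstance(command, list) else shlex.split(command)
    with subprocess.Popen(args, stdout=subprocess.PIPE, stderr=subprocess.PIPE, text=True) as process:
        stdout, stderr = process.communicate()
    return process.returncode, stdout, stderr


wheel = "/tmp/my folder/lib.whl"  # a path containing a space
# Splitting a full command line on spaces breaks the path into two arguments:
print(f"pip install {wheel}".split(" "))  # ['pip', 'install', '/tmp/my', 'folder/lib.whl']
# Passing an argument list keeps it intact; here a child Python process echoes its first argument:
code, out, _ = run_command_sketch([sys.executable, "-c", "import sys; print(sys.argv[1])", wheel])
print(code, out.strip())  # 0 /tmp/my folder/lib.whl
```

Passing a list avoids any re-parsing of the arguments, which is why it is the safer form for paths that may contain spaces.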
Is there an existing issue for this?
Current Behavior
Child notebooks imported via a `%run` magic line are not assessed.
Expected Behavior
Child notebooks imported via a `%run` magic line should be assessed by adding them to the dependency graph.
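A minimal sketch of how `%run` children could be discovered and added to a dependency graph. It assumes notebooks are available as exported Databricks Python source (cells separated by `# COMMAND ----------`, magic lines prefixed with `# MAGIC`) and that a child is only imported when the `%run` line is the only line in its cell. `run_targets` and `build_graph` are hypothetical helpers, not UCX's `DependencyGraph`.

```python
import posixpath

CELL_SEPARATOR = "# COMMAND ----------"
HEADER = "# Databricks notebook source"
RUN_PREFIX = "# MAGIC %run "


def run_targets(notebook_source: str) -> list[str]:
    """Targets of cells whose only non-blank line is a `%run` magic line."""
    targets: list[str] = []
    for cell in notebook_source.split(CELL_SEPARATOR):
        lines = [line.strip() for line in cell.splitlines() if line.strip() and line.strip() != HEADER]
        if len(lines) == 1 and lines[0].startswith(RUN_PREFIX):
            # Keep only the path; any arguments after it are ignored in this sketch
            targets.append(lines[0][len(RUN_PREFIX):].split()[0])
    return targets


def build_graph(notebooks: dict[str, str], root: str) -> dict[str, list[str]]:
    """Map each reachable notebook path to the child paths it `%run`s, resolved relative to it."""
    graph: dict[str, list[str]] = {}
    pending = [root]
    while pending:
        path = pending.pop()
        if path in graph:
            continue  # already visited; this also stops `%run` cycles
        children = [
            posixpath.normpath(posixpath.join(posixpath.dirname(path), target))
            for target in run_targets(notebooks.get(path, ""))
        ]
        graph[path] = children
        pending.extend(children)
    return graph


notebooks = {
    "/Repos/demo/parent": f"{HEADER}\n{RUN_PREFIX}./child\n\n{CELL_SEPARATOR}\n\nprint(table_name)\n",
    "/Repos/demo/child": f"{HEADER}\ntable_name = 'sales'\n",
}
print(build_graph(notebooks, "/Repos/demo/parent"))
# {'/Repos/demo/parent': ['/Repos/demo/child'], '/Repos/demo/child': []}
```

Once the child is in the graph, it can be linted like any other dependency instead of being skipped.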
Steps To Reproduce
No response
Cloud
AWS
Operating System
macOS
Version
latest via Databricks CLI
Relevant log output
No response