GH-6114: Static path matching #6146

Draft: wants to merge 29 commits into main

@dantleech (Contributor) commented Mar 10, 2025

This early-stage WIP PR introduces a static path matcher that aims to emulate the behavior of the PHPUnit FileIterator, so that PHPUnit does not have to traverse the filesystem when a deprecation is triggered.

The PHPUnit FileIterator uses glob to find directories, so we need to support the same glob patterns, whose behavior can vary by platform. This PR uses https://man7.org/linux/man-pages/man7/glob.7.html as a reference, and I have also tested the behavior locally to confirm my assumptions.

The webmozart/glob library provides a similar feature, but its behavior differs: it supports curly braces, and * is restricted to a single directory level, whereas * in PHPUnit returns all descendants. There are probably other differences as well, but I've used it as a starting point.
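To make the difference in `*` semantics concrete, here is a minimal sketch. It assumes, as described above, that a webmozart/glob-style `*` stays within one path segment while a PHPUnit-style match also includes every descendant of the matched directory. `starToRegex()` is a hypothetical helper: only `*` is translated and every other character is matched literally.

```php
<?php

declare(strict_types=1);

// Hypothetical helper contrasting two readings of '*' (only '*' is handled):
// - $includeDescendants = false: '*' stays within one path segment, as the
//   PR describes for webmozart/glob;
// - $includeDescendants = true: a directory matched by '*' also brings in
//   everything below it, as the PR describes for PHPUnit.
function starToRegex(string $glob, bool $includeDescendants): string
{
    $literals = array_map(
        static fn (string $literal): string => preg_quote($literal, '#'),
        explode('*', $glob),
    );

    $regex = implode('[^/]*', $literals);

    // Optionally accept any path below the matched one.
    $descendants = $includeDescendants ? '(?:/.*)?' : '';

    return '#^' . $regex . $descendants . '$#';
}

$path = 'src/Domain/Model/User.php';

var_dump(preg_match(starToRegex('src/*', false), $path)); // int(0)
var_dump(preg_match(starToRegex('src/*', true), $path));  // int(1)
```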

TODO:

  • Escaping unhandled regex characters, or escaping the regex before parsing the pattern.
  • Character classes [:alnum:] etc.
  • Collating symbols
  • Equivalence class expressions
  • Emulating glob behavior of unterminated [ character groups.
  • There seems to be a PHPUnit globstar bug whereby /a** matches /b and all other directories, whereas /ab* matches nothing. We can either copy that behavior or "fix" it.

Possibly also rewriting the implementation from scratch, if regex turns out to be a bad fit.
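For the first two TODO items, one option is to translate the wildcards and pass every other character through `preg_quote()`, and to rely on PCRE's native support for POSIX classes such as `[[:alnum:]]` inside bracket expressions. This is a sketch under those assumptions: `globToRegexEscaped()` is hypothetical and handles only `*` and `?`, not bracket expressions.

```php
<?php

declare(strict_types=1);

// Hypothetical helper: '*' and '?' become segment-bound wildcards, and every
// other character is escaped with preg_quote(), so metacharacters such as
// '.', '+' or '(' are matched literally. Bracket expressions are not handled.
function globToRegexEscaped(string $glob): string
{
    $regex = '';
    $length = strlen($glob);

    for ($i = 0; $i < $length; $i++) {
        $regex .= match ($glob[$i]) {
            '*' => '[^/]*', // any run of characters within one segment
            '?' => '[^/]',  // exactly one character, never a slash
            default => preg_quote($glob[$i], '#'),
        };
    }

    return '#^' . $regex . '$#';
}

$regex = globToRegexEscaped('src/v1.0+(beta)/*.php');

var_dump(preg_match($regex, 'src/v1.0+(beta)/Foo.php')); // int(1)
var_dump(preg_match($regex, 'src/v1x0+(beta)/Foo.php')); // int(0), '.' is literal

// PCRE supports POSIX classes inside a bracket expression, so a glob class
// such as [[:alnum:]] could be passed through unchanged.
var_dump(preg_match('#^[[:alnum:]]+$#', 'abc123')); // int(1)
```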

Usages on GitHub:

@dantleech dantleech marked this pull request as draft March 10, 2025 12:07
@sebastianbergmann sebastianbergmann added feature/test-runner CLI test runner type/performance Issues related to resource consumption (time and memory) labels Mar 10, 2025

class FileMatcherPattern
{
    public function __construct(public string $path)
    {
    }
}

dantleech (Contributor Author):

This could also include $suffix, $prefix and $exclude.

Alternatively, this class could simply be removed and the prefix/suffix handled independently of the "file matcher".
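If the first option were taken, the class might look roughly like this sketch. The `$prefix`/`$suffix` properties and `matchesFileName()` are hypothetical, `$exclude` is left out for brevity, and the sketch assumes prefix and suffix apply to the file name only, which is how PHPUnit's `prefix`/`suffix` attributes on `<directory>` appear to behave.

```php
<?php

declare(strict_types=1);

// Hypothetical extension of FileMatcherPattern: $prefix and $suffix are
// illustrative additions ($exclude is omitted for brevity).
final class FileMatcherPattern
{
    public function __construct(
        public string $path,
        public string $prefix = '',
        public string $suffix = '',
    ) {
    }

    // Assumes prefix/suffix apply to the file name only, not the full path.
    public function matchesFileName(string $file): bool
    {
        $name = basename($file);

        return str_starts_with($name, $this->prefix)
            && str_ends_with($name, $this->suffix);
    }
}

$pattern = new FileMatcherPattern('tests/*', suffix: 'Test.php');

var_dump($pattern->matchesFileName('tests/Unit/FooTest.php')); // bool(true)
var_dump($pattern->matchesFileName('tests/Unit/Helper.php'));  // bool(false)
```

Keeping the prefix/suffix check separate from the glob regex would also fit the second option, since the method never touches `$path`.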

@dantleech dantleech force-pushed the gh-6114-source-map-no-fs branch from b2da2f7 to 797d6b7 Compare March 12, 2025 11:47
dantleech (Contributor Author) commented:

I have refactored this to tokenize the glob string. While less performant, it's easier to reason about, and the regex only needs to be compiled once for each <include> or <exclude>, not for each path.
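The "compile once per `<include>`/`<exclude>`, match many paths" idea can be sketched as follows. `CompiledMatcher` is a hypothetical class, not the PR's API, and its `compile()` step is a deliberately simplified stand-in (only `*` is translated) for the token-based compiler described above.

```php
<?php

declare(strict_types=1);

// Hypothetical class: each glob is compiled to a regex once, in the
// constructor, and the compiled regexes are reused for every path.
final class CompiledMatcher
{
    /** @var list<string> */
    private array $regexes = [];

    /** @param list<string> $globs e.g. the paths of <include>/<exclude> entries */
    public function __construct(array $globs)
    {
        foreach ($globs as $glob) {
            $this->regexes[] = self::compile($glob); // once per pattern
        }
    }

    public function matches(string $path): bool
    {
        foreach ($this->regexes as $regex) {
            if (preg_match($regex, $path) === 1) {
                return true;
            }
        }

        return false;
    }

    // Simplified stand-in for the real compiler: only '*' is translated
    // (to "any run of characters within one segment"); the rest is literal.
    private static function compile(string $glob): string
    {
        $literals = array_map(
            static fn (string $literal): string => preg_quote($literal, '#'),
            explode('*', $glob),
        );

        return '#^' . implode('[^/]*', $literals) . '$#';
    }
}

$matcher = new CompiledMatcher(['src/*.php', 'lib/*/Foo.php']);

var_dump($matcher->matches('src/Bar.php'));       // bool(true)
var_dump($matcher->matches('lib/a/Foo.php'));     // bool(true)
var_dump($matcher->matches('tests/BarTest.php')); // bool(false)
```

The up-front cost is paid per pattern, so the per-path cost is a handful of `preg_match()` calls regardless of how expensive tokenization is.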

if ($bracketOpen === true && $type === self::T_BRACKET_OPEN) {
// if bracket is already open, interpret everything as a
// literal char
$resolved[] = [self::T_CHAR, $char];

dantleech (Contributor Author):

rename T_CHAR => T_LITERAL?
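The bracket-state rule in the excerpt above can be sketched as a separate resolver pass, using the `T_LITERAL` name suggested in the comment. `BracketResolver` and its constant values are hypothetical, and edge cases such as a `]` immediately after `[` are not handled.

```php
<?php

declare(strict_types=1);

// Hypothetical resolver pass (names and values are illustrative): while a
// bracket expression is open, a further '[' is re-tagged as a literal.
final class BracketResolver
{
    public const T_BRACKET_OPEN = 'bracket_open';
    public const T_BRACKET_CLOSE = 'bracket_close';
    public const T_LITERAL = 'literal';

    /**
     * @param list<array{string, string}> $tokens
     * @return list<array{string, string}>
     */
    public static function resolve(array $tokens): array
    {
        $resolved = [];
        $bracketOpen = false;

        foreach ($tokens as [$type, $char]) {
            if ($bracketOpen === true && $type === self::T_BRACKET_OPEN) {
                // Bracket already open: interpret '[' as a literal char.
                $resolved[] = [self::T_LITERAL, $char];
                continue;
            }

            if ($type === self::T_BRACKET_OPEN) {
                $bracketOpen = true;
            } elseif ($type === self::T_BRACKET_CLOSE) {
                $bracketOpen = false;
            }

            $resolved[] = [$type, $char];
        }

        return $resolved;
    }
}

// Tokens for the glob "[[a]": a class matching '[' or 'a'.
$resolved = BracketResolver::resolve([
    [BracketResolver::T_BRACKET_OPEN, '['],
    [BracketResolver::T_BRACKET_OPEN, '['],
    [BracketResolver::T_LITERAL, 'a'],
    [BracketResolver::T_BRACKET_CLOSE, ']'],
]);

var_dump($resolved[1] === [BracketResolver::T_LITERAL, '[']); // bool(true)
```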

) {
$resolved[] = [self::T_GLOBSTAR, '**'];

// we eat the two `*` in addition to the slash

dantleech (Contributor Author) commented Mar 12, 2025:

we eat two * AND the slash
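A minimal sketch of the globstar step, assuming bash-style `**/` semantics (zero or more whole directories). `GlobstarSketch` and its token names are hypothetical, not the PR's code; the point is that `**/` becomes a single token and the loop index is advanced past both `*` and the slash.

```php
<?php

declare(strict_types=1);

// Hypothetical sketch, not the PR's code: "**/" becomes one globstar token
// that matches zero or more whole directories (bash-style globstar).
final class GlobstarSketch
{
    public const T_GLOBSTAR = 'globstar';
    public const T_ASTERISK = 'asterisk';
    public const T_LITERAL = 'literal';

    /** @return list<array{string, string}> */
    public static function tokenize(string $glob): array
    {
        $tokens = [];
        $length = strlen($glob);

        for ($i = 0; $i < $length; $i++) {
            if (substr($glob, $i, 3) === '**/') {
                $tokens[] = [self::T_GLOBSTAR, '**/'];
                // Eat both '*' AND the slash: skip two here, the loop skips one.
                $i += 2;
                continue;
            }

            $tokens[] = $glob[$i] === '*'
                ? [self::T_ASTERISK, '*']
                : [self::T_LITERAL, $glob[$i]];
        }

        return $tokens;
    }

    /** @param list<array{string, string}> $tokens */
    public static function compile(array $tokens): string
    {
        $regex = '';

        foreach ($tokens as [$type, $value]) {
            $regex .= match ($type) {
                self::T_GLOBSTAR => '(?:[^/]+/)*', // zero or more directories
                self::T_ASTERISK => '[^/]*',       // within one segment
                self::T_LITERAL => preg_quote($value, '#'),
            };
        }

        return '#^' . $regex . '$#';
    }
}

$regex = GlobstarSketch::compile(GlobstarSketch::tokenize('src/**/Foo.php'));

var_dump(preg_match($regex, 'src/Foo.php'));     // int(1)
var_dump(preg_match($regex, 'src/a/b/Foo.php')); // int(1)
var_dump(preg_match($regex, 'lib/Foo.php'));     // int(0)
```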

@dantleech dantleech force-pushed the gh-6114-source-map-no-fs branch 3 times, most recently from 792bae8 to 6f99ddc Compare March 12, 2025 23:21
continue;
}

$resolved[] = [$type, $char];

dantleech (Contributor Author):

This handles any leftover tokens, including T_ASTERISK. Maybe all tokens should be handled explicitly?
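One way to handle every token explicitly is to model token types as a PHP 8.1 enum and translate them with a `match` that has no `default` arm, so an unhandled case throws `UnhandledMatchError` instead of falling through silently. `TokenType` and `toRegex()` are hypothetical names, and only three token types are shown.

```php
<?php

declare(strict_types=1);

// Hypothetical token types (PHP 8.1+ enum); only three are shown.
enum TokenType
{
    case Literal;
    case Asterisk;
    case QuestionMark;
}

// No default arm: an unhandled case throws UnhandledMatchError at runtime
// instead of silently falling through.
function toRegex(TokenType $type, string $value): string
{
    return match ($type) {
        TokenType::Literal => preg_quote($value, '#'),
        TokenType::Asterisk => '[^/]*',
        TokenType::QuestionMark => '[^/]',
    };
}

// Tokens for the glob "a?*".
$tokens = [
    [TokenType::Literal, 'a'],
    [TokenType::QuestionMark, '?'],
    [TokenType::Asterisk, '*'],
];

$regex = '#^' . implode('', array_map(
    static fn (array $token): string => toRegex($token[0], $token[1]),
    $tokens,
)) . '$#';

echo $regex, PHP_EOL;               // #^a[^/][^/]*$#
var_dump(preg_match($regex, 'ab')); // int(1)
var_dump(preg_match($regex, 'a'));  // int(0)
```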

for ($i = 0; $i < $length; $i++) {
$c = $glob[$i];

$tokens[] = match ($c) {

dantleech (Contributor Author):

could inline $c

@dantleech dantleech force-pushed the gh-6114-source-map-no-fs branch from ba1d99d to 6fc5179 Compare March 13, 2025 11:56
self::T_BRACKET_CLOSE => ']',
self::T_HYPHEN => '-',
self::T_COLON => ':',
self::T_BACKSLASH => '\\',

dantleech (Contributor Author):

COLON and BACKSLASH are not tested
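A small sketch covering two review points: matching on `$glob[$i]` directly inside `match` (inlining `$c`), and checks for the colon and backslash tokens. `CharTokenizer`, its constant values and the `T_CHAR` fallback are hypothetical.

```php
<?php

declare(strict_types=1);

// Hypothetical tokenizer: constant values and the T_CHAR fallback are
// illustrative; the character is matched directly as $glob[$i] (no $c).
final class CharTokenizer
{
    public const T_BRACKET_OPEN = 'bracket_open';
    public const T_BRACKET_CLOSE = 'bracket_close';
    public const T_HYPHEN = 'hyphen';
    public const T_COLON = 'colon';
    public const T_BACKSLASH = 'backslash';
    public const T_CHAR = 'char';

    /** @return list<array{string, string}> */
    public static function tokenize(string $glob): array
    {
        $tokens = [];
        $length = strlen($glob);

        for ($i = 0; $i < $length; $i++) {
            $tokens[] = match ($glob[$i]) {
                '[' => [self::T_BRACKET_OPEN, '['],
                ']' => [self::T_BRACKET_CLOSE, ']'],
                '-' => [self::T_HYPHEN, '-'],
                ':' => [self::T_COLON, ':'],
                '\\' => [self::T_BACKSLASH, '\\'],
                default => [self::T_CHAR, $glob[$i]],
            };
        }

        return $tokens;
    }
}

// Checks for the two token types that are currently untested.
var_dump(CharTokenizer::tokenize('[:alnum:]')[1] === [CharTokenizer::T_COLON, ':']); // bool(true)
var_dump(CharTokenizer::tokenize('a\\b')[1] === [CharTokenizer::T_BACKSLASH, '\\']); // bool(true)
```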

@dantleech dantleech force-pushed the gh-6114-source-map-no-fs branch from 6fc5179 to e07a96f Compare March 15, 2025 19:55

2 participants