Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Default custom regex enhancement #571

Merged
merged 7 commits into from
Aug 17, 2018

Conversation

Zlopez
Copy link
Contributor

@Zlopez Zlopez commented Aug 6, 2018

This should solve the issue #482

The lint test fails because pytoml can't parse multiline string.

@codecov-io
Copy link

codecov-io commented Aug 6, 2018

Codecov Report

Merging #571 into master will not change coverage.
The diff coverage is n/a.

Impacted file tree graph

@@           Coverage Diff           @@
##           master     #571   +/-   ##
=======================================
  Coverage   89.45%   89.45%           
=======================================
  Files          54       54           
  Lines        2561     2561           
  Branches      327      327           
=======================================
  Hits         2291     2291           
  Misses        203      203           
  Partials       67       67
Impacted Files Coverage Δ
anitya/config.py 100% <ø> (ø) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 74ea662...ceddcdc. Read the comment docs.

@@ -52,7 +52,7 @@

librariesio_platform_whitelist = ['pypi', 'rubygems']

default_regex = '%(name)s(?:[-_]?(?:minsrc|src|source))?[-_]([^-/_\s]+?)(?i)(?:[-_](?:minsrc|src|source|asc|release))?\.(?:tar|t[bglx]z|tbz2|zip)'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could also just split the string over multiple lines, as you prefer :)

Copy link
Contributor Author

@Zlopez Zlopez Aug 6, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I already fixed this by using dummy regex in the test.
Your solution will not work (tested), because pytoml has issue with reading multiline string.

Copy link
Member

@jeremycline jeremycline left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've got one small change request, but otherwise it looks good!


REGEX = '%(name)s(?:[-_]?(?:minsrc|src|source))?[-_]([^-/_\s]+?)(?i)(?:[-_]'\
'(?:minsrc|src|source|asc))?\.(?:tar|t[bglx]z|tbz2|zip)'
REGEX = anitya_config.get('DEFAULT_REGEX')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since 'DEFAULT_REGEX' should always be present in the dictionary, it'd be better here to use anitya_config['DEFAULT_REGEX']. That way if (somehow) it's not present it fails early and in an obvious way at import time rather than being None later.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I will change this, I looked how other value is read in the code and just copied it.

@@ -51,6 +51,9 @@ social_auth_redirect_is_https = true
# via Libraries.io.
librariesio_platform_whitelist = []

# Default regular expression used for backend
default_regex = '%(name)s(?:[-_]?(?:minsrc|src|source))?[-_]([^-/_\s]+?)(?i)(?:[-_](?:minsrc|src|source|asc|release))?\.(?:tar|t[bglx]z|tbz2|zip)'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe TOML requires double quotes (") for strings. In case you'd like to try and break it up into multiple lines, the syntax for multi-line strings without adding extraneous whitespace is at https://github.com/toml-lang/toml#string. There's only so much that can be done to make the regex readable, though, so do what you think looks best here :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I already tried that, but it looks like pytoml has issue with this syntax and it fails with error.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But you are right, this should be at least in double quotes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, it looks like the issue is in backslash character. When trying to parse text without special characters I didn't have any issue with multiline string.
I will check if there is any escape character for toml.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, yep, \\ should escape backslashes

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like the escaped backslashes aren't visible on html page :-(
screenshot from 2018-08-06 17-24-33

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is weird, It looks like the regex is read correctly from the configuration, but I don't know why it is shown without backslashes on the page.
Right now I have this as input:

default_regex = """\
                %(name)s(?:[-_]?(?:minsrc|src|source))?[-_]([^-/_\\]+?)(?i)(?:[-_]\
                (?:minsrc|src|source|asc|release))?\\.(?:tar|t[bglx]z|tbz2|zip).\
                """

And this is what is saved to config dictionary:

DEFAULT_REGEX = %(name)s(?:[-_]?(?:minsrc|src|source))?[-_]([^-/_\]+?)(?i)(?:[-_](?:minsrc|src|source|asc|release))?\.(?:tar|t[bglx]z|tbz2|zip).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The issues with missing backslash in frontend is there even when not using multiline string.
So this looks like issue that was there before, but nobody noticed it.

Zlopez added 4 commits August 6, 2018 16:41
This will cause the application to fail if the DEFAULT_REGEX key
is not defined.
Fix render issue with backslashes in HTML render
@@ -31,6 +31,7 @@ class CustomBackend(BaseBackend):
more_info = 'More information in the '\
'<a href=\'/about#test-your-regex\'>about#test-your-regex</a>'
default_regex = REGEX % {'name': '{project name}'}
default_regex_html = default_regex.replace('\\', '\\\\')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this is only relevant in the HTML template, I'd recommend just putting it there. Jinja lets you write Python in the templates.

@@ -109,7 +109,7 @@ <h1>{{ context }} project</h1>

examples["{{ plugin.name }}"]="{{ plugin.examples | format_examples }}";
more_info["{{ plugin.name }}"]="{{ plugin.more_info}}";
default_regex["{{ plugin.name }}"]="{{ plugin.default_regex }}";
default_regex["{{ plugin.name }}"]="{{ plugin.default_regex_html }}";
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So this could just be plugin.default_regex.replace('\\', '\\\\'), I believe

I think this is fine, but just in case you hit some more complex (or user-provided) input that requires sanitizing, https://github.com/mozilla/bleach is a solid library used elsewhere in Fedora infra apps.

Copy link
Member

@jeremycline jeremycline left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just one minor simplification and I think this will be good to go.

@Zlopez Zlopez merged commit 7e171ce into fedora-infra:master Aug 17, 2018
@Zlopez Zlopez deleted the custom_regex_enhancement branch August 17, 2018 12:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants