Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

tag_attribute_values doesn't filter out the attribute if value doesn't match #61

Open
douglasmiranda opened this issue Oct 27, 2024 · 1 comment

Comments

@douglasmiranda
Copy link

I may be doing something wrong or tag_attribute_values it's not working properly.

From docs:

tag_attribute_values (dict[str, dict[str, set[str]]], optional) – Sets the values of HTML attributes that are allowed on specific tags. The value is structured as a map from tag names to a map from attribute names to a set of attribute values. If a tag is not itself whitelisted, adding entries to this map will do nothing.

So, if allow a specific attr in my tag and then use tag_attribute_values, nh3 will filter that attr out of my tag, right?

The following code:

import nh3
print(
    nh3.clean(
        "<p my-attr='my-WRONG-attr-value'>text</p>",
        tags={"p"},
        attributes={"p": {"my-attr"}},
        tag_attribute_values={"p": {"my-attr": {"my-attr-value"}}},
    )
)

returns: <p my-attr="my-WRONG-attr-value">text</p>

A more real world example:

Allow p tag to have style, but only with text-align.

import nh3
print(
    nh3.clean(
        "<p style='color: #fff;'>text</p>",
        tags={"p"},
        attributes={"p": {"style"}},
        tag_attribute_values={"p": {"style": {"text-align"}}},
    )
)

PS: Since I'm not even sure if this should work for a attr like style that have multiple options as values. But that was just what I was trying to do when I caught this.

Since there's not much to work with the docs, I did my experiments looking at the tests.

The test is only covering the positive case, maybe that's why we have this not working.

assert (
        nh3.clean(
            "<my-tag my-attr=val>",
            tags={"my-tag"},
            tag_attribute_values={"my-tag": {"my-attr": {"val"}}},
        )
        == '<my-tag my-attr="val"></my-tag>'
    )

This is assuming I understood how tag_attribute_values should work.

@xmo-odoo
Copy link

xmo-odoo commented Feb 18, 2025

The issue is with using both attributes and tag_attribute_values: they are alternates. That is, for each attribute on an element ammonia checks:

  1. if it's in the generic_attributes (attributes['*'])
  2. if it's in tag_attributes (~ attributes) for the current tag
  3. if it's in tag_attribute_values for the current tag with a valid value

If any of these is true, then the attribute is allowed with its current value.

Since you have (2) set, every value for my-attr (or style) is allowed, and tag_attribute_values['p'][att] is never even checked.

This is not documented outright, but it is somewhat hinted at in the docstring:

If a tag is not itself whitelisted, adding entries to this map will do nothing.

This says nothing about needing to whitelist the attribute, only the tag. Note as well how the test / example doesn't need to whitelist my-tag/@my-attr separately

Maybe nh3 could improve the UX by deprecating tag_attribute_values, making attributes into something like a

dict[str, set[str] | dict[str, None | set[str]]]

then dispatch all the bits between tag_attribute and tag_attribute_values internally (the inner union is in case you want a mix of whitelisted attributes and attributes with whitelisted values for the same tag).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants