Getting Error when converting flux.dev with the latest version of deepcompressor #50

Open
yibolu96 opened this issue Feb 18, 2025 · 8 comments

@yibolu96

Hi, I'm getting this error when converting flux.dev. How can I fix it? Thanks in advance!

miniconda3/envs/deepcompressor/lib/python3.12/site-packages/diffusers/models/embeddings.py", line 1210, in apply_rotary_emb
    out = (x.float() * cos + x_rotated.float() * sin).to(x.dtype)
           ~~~~~~~~~~^~~~~
RuntimeError: The size of tensor a (4608) must match the size of tensor b (2) at non-singleton dimension 2

Here is my full command:

python -m deepcompressor.app.diffusion.ptq examples/diffusion/configs/model/flux.1-dev.yaml examples/diffusion/configs/svdquant/int4.yaml examples/diffusion/configs/svdquant/fast.yaml

I removed the eval part since I just want to do the conversion.

@yibolu96
Author

yibolu96 commented Feb 18, 2025

Here is my pip list btw:

absl-py                  2.1.0
accelerate               1.2.1
aiohappyeyeballs         2.4.4
aiohttp                  3.11.11
aiosignal                1.3.2
attrs                    24.3.0
bitsandbytes             0.45.0
build                    1.2.2.post1
CacheControl             0.14.1
certifi                  2024.12.14
cffi                     1.17.1
chardet                  5.2.0
charset-normalizer       3.4.1
clean-fid                0.1.35
cleo                     2.1.0
click                    8.1.8
clip                     1.0
colorama                 0.4.6
crashtest                0.4.1
cryptography             44.0.0
DataProperty             1.0.2
datasets                 3.2.0
diffusers                0.32.0
dill                     0.3.8
distlib                  0.3.9
docstring_parser         0.16
dulwich                  0.21.7
evaluate                 0.4.3
fairscale                0.4.13
fastjsonschema           2.21.1
filelock                 3.16.1
frozenlist               1.5.0
fsspec                   2024.9.0
ftfy                     6.3.1
fuzzywuzzy               0.18.0
GPUtil                   1.4.0
huggingface-hub          0.27.0
idna                     3.10
image-reward             1.5
importlib_metadata       8.5.0
installer                0.7.0
jaraco.classes           3.4.0
jeepney                  0.8.0
jieba                    0.42.1
Jinja2                   3.1.5
joblib                   1.4.2
jsonlines                4.0.0
keyring                  24.3.1
lightning-utilities      0.11.9
lm_eval                  0.4.7
lxml                     5.3.0
MarkupSafe               3.0.2
mbstrdecoder             1.1.3
more-itertools           10.5.0
mpmath                   1.3.0
msgpack                  1.1.0
multidict                6.1.0
multiprocess             0.70.16
networkx                 3.4.2
nltk                     3.9.1
numexpr                  2.10.2
numpy                    2.2.1
nvidia-cublas-cu12       12.4.5.8
nvidia-cuda-cupti-cu12   12.4.127
nvidia-cuda-nvrtc-cu12   12.4.127
nvidia-cuda-runtime-cu12 12.4.127
nvidia-cudnn-cu12        9.1.0.70
nvidia-cufft-cu12        11.2.1.3
nvidia-curand-cu12       10.3.5.147
nvidia-cusolver-cu12     11.6.1.9
nvidia-cusparse-cu12     12.3.1.170
nvidia-nccl-cu12         2.21.5
nvidia-nvjitlink-cu12    12.4.127
nvidia-nvtx-cu12         12.4.127
omniconfig               0.1.9
packaging                24.2
pandas                   2.2.3
pathvalidate             3.2.1
peft                     0.14.0
pexpect                  4.9.0
pillow                   11.0.0
pip                      24.2
pkginfo                  1.12.0
platformdirs             4.3.6
poetry                   1.8.5
poetry-core              1.9.1
poetry-plugin-export     1.8.0
portalocker              3.0.0
propcache                0.2.1
protobuf                 5.29.2
psutil                   6.1.1
ptyprocess               0.7.0
pyarrow                  18.1.0
pybind11                 2.13.6
pycparser                2.22
pyproject_hooks          1.2.0
pytablewriter            1.2.0
python-dateutil          2.9.0.post0
pytz                     2024.2
PyYAML                   6.0.2
RapidFuzz                3.11.0
regex                    2024.11.6
requests                 2.32.3
requests-toolbelt        1.0.0
rouge                    1.0.1
rouge_score              0.1.2
sacrebleu                2.4.3
safetensors              0.4.5
scikit-learn             1.6.0
scipy                    1.14.1
SecretStorage            3.3.3
sentencepiece            0.2.0
setuptools               75.1.0
shellingham              1.5.4
six                      1.17.0
sqlitedict               2.1.0
sympy                    1.13.1
tabledata                1.3.3
tabulate                 0.9.0
tcolorpy                 0.1.7
threadpoolctl            3.5.0
timm                     1.0.12
tokenizers               0.21.0
toml                     0.10.2
tomlkit                  0.13.2
torch                    2.5.1
torchmetrics             1.6.1
torchvision              0.20.1
tqdm                     4.67.1
tqdm-multiprocess        0.0.11
transformers             4.47.1
triton                   3.1.0
trove-classifiers        2024.10.21.16
typepy                   1.3.4
typing_extensions        4.12.2
tzdata                   2024.2
urllib3                  2.3.0
virtualenv               20.28.0
wcwidth                  0.2.13
wheel                    0.44.0
word2number              1.1
xxhash                   3.5.0
yarl                     1.18.3
zipp                     3.21.0
zstandard                0.23.0

@synxlin added the bug (Something isn't working) label on Feb 21, 2025
@synxlin self-assigned this on Feb 21, 2025
@synxlin
Contributor

synxlin commented Feb 21, 2025

Hi @yibolu96,

I recommend updating both the diffusers and torch packages. Additionally, could you try loading the cached calibration data and printing the tensor shapes for all value fields?
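
For anyone unsure how to do that, here is a minimal sketch of the kind of shape dump I have in mind, assuming the calibration cache is a (possibly nested) dict of tensors saved with torch.save; the actual cache path and layout may differ:

# Minimal sketch: recursively print tensor shapes in a cached calibration file.
# The path below is a placeholder; the real cache layout may differ.
import torch

def print_shapes(obj, prefix=""):
    if isinstance(obj, torch.Tensor):
        print(f"{prefix}: {tuple(obj.shape)} {obj.dtype}")
    elif isinstance(obj, dict):
        for key, value in obj.items():
            print_shapes(value, f"{prefix}.{key}" if prefix else str(key))
    elif isinstance(obj, (list, tuple)):
        for i, value in enumerate(obj):
            print_shapes(value, f"{prefix}[{i}]")

cache = torch.load("path/to/calib/cache.pt", map_location="cpu")
print_shapes(cache)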

It would also be helpful if you could install deepcompressor from the dev/v0.1.0 branch and rerun your command to see if the issue persists.

Let me know what you find!

@yanglianwei

/python3.11/site-packages/diffusers/models/embeddings.py", line 1208, in apply_rotary_emb
    out = (x.float() * cos + x_rotated.float() * sin).to(x.dtype)
           ~~~~~~~~~~^~~~~
RuntimeError: The size of tensor a (4608) must match the size of tensor b (32) at non-singleton dimension 2

I have the same problem. How can I solve it?

@godxin999

> Hi @yibolu96,
>
> I recommend updating both the diffusers and torch packages. Additionally, could you try loading the cached calibration data and printing the tensor shapes for all value fields?
>
> It would also be helpful if you could install deepcompressor from the dev/v0.1.0 branch and rerun your command to see if the issue persists.
>
> Let me know what you find!

These two methods have no effect. Using dev/v0.1.0 will result in additional compile errors.

@xibei8009

> Hi @yibolu96,
> I recommend updating both the diffusers and torch packages. Additionally, could you try loading the cached calibration data and printing the tensor shapes for all value fields?
> It would also be helpful if you could install deepcompressor from the dev/v0.1.0 branch and rerun your command to see if the issue persists.
> Let me know what you find!
>
> These two methods have no effect. Using dev/v0.1.0 will result in additional compile errors.

File "/root/anaconda3/envs/deepcompressor/lib/python3.11/site-packages/diffusers/models/embeddings.py", line 1208, in apply_rotary_emb
out = (x.float() * cos + x_rotated.float() * sin).to(x.dtype)
~~~~~~~~~~^~~~~
RuntimeError: The size of tensor a (4608) must match the size of tensor b (32) at non-singleton dimension 2

I also encountered this problem. Has it been solved?

@josephrocca

josephrocca commented Mar 6, 2025

I also ran into this issue while following the exact instructions in this readme (i.e. flux schnell quant):

Note that there are some dependency issues since this repo does not pin dependencies to exact versions:

That was the only difference from the examples/diffusion readme (i.e. I ran pip3 install transformers==4.46.0).

I used this docker image:

pytorch/pytorch:2.6.0-cuda12.4-cudnn9-devel

Here are the most recent logs leading up to the error:

I've tested this multiple times on different H100 machines (from Runpod): NVIDIA driver version: 560.35.05, CUDA Version: 12.6

@synxlin would you be able to provide a bit more detail on how to do this:

"could you try loading the cached calibration data and printing the tensor shapes for all value fields?"

@josephrocca

josephrocca commented Mar 6, 2025

I've tried to do some logging after following the steps here exactly:

The logs below refer to running this command, to be clear:

python3 -m deepcompressor.app.diffusion.ptq configs/model/flux.1-schnell.yaml configs/svdquant/int4.yaml --eval-benchmarks MJHQ --eval-num-samples 1024 --save-model /workspace/output

In diffusers/models/embeddings.py I've added logs as shown here:

def apply_rotary_emb(
    x: torch.Tensor,
    freqs_cis: Union[torch.Tensor, Tuple[torch.Tensor]],
    use_real: bool = True,
    use_real_unbind_dim: int = -1,
) -> Tuple[torch.Tensor, torch.Tensor]:
    """
    Apply rotary embeddings to input tensors using the given frequency tensor. This function applies rotary embeddings
    to the given query or key 'x' tensors using the provided frequency tensor 'freqs_cis'. The input tensors are
    reshaped as complex numbers, and the frequency tensor is reshaped for broadcasting compatibility. The resulting
    tensors contain rotary embeddings and are returned as real tensors.

    Args:
        x (`torch.Tensor`):
            Query or key tensor to apply rotary embeddings. [B, H, S, D] xk (torch.Tensor): Key tensor to apply
        freqs_cis (`Tuple[torch.Tensor]`): Precomputed frequency tensor for complex exponentials. ([S, D], [S, D],)

    Returns:
        Tuple[torch.Tensor, torch.Tensor]: Tuple of modified query tensor and key tensor with rotary embeddings.
    """
    if use_real:
        cos, sin = freqs_cis  # [S, D]
        cos = cos[None, None]
        sin = sin[None, None]
        cos, sin = cos.to(x.device), sin.to(x.device)

        if use_real_unbind_dim == -1:
            # Used for flux, cogvideox, hunyuan-dit
            x_real, x_imag = x.reshape(*x.shape[:-1], -1, 2).unbind(-1)  # [B, S, H, D//2]
            x_rotated = torch.stack([-x_imag, x_real], dim=-1).flatten(3)
        elif use_real_unbind_dim == -2:
            # Used for Stable Audio
            x_real, x_imag = x.reshape(*x.shape[:-1], 2, -1).unbind(-2)  # [B, S, H, D//2]
            x_rotated = torch.cat([-x_imag, x_real], dim=-1)
        else:
            raise ValueError(f"`use_real_unbind_dim={use_real_unbind_dim}` but should be -1 or -2.")
        print("x.shape:", x.shape)
        print("cos.shape:", cos.shape)
        print("x_rotated.shape:", x_rotated.shape)
        print("sin.shape:", sin.shape)
        out = (x.float() * cos + x_rotated.float() * sin).to(x.dtype)

        return out
    else:
        # used for lumina
        x_rotated = torch.view_as_complex(x.float().reshape(*x.shape[:-1], -1, 2))
        freqs_cis = freqs_cis.unsqueeze(2)
        x_out = torch.view_as_real(x_rotated * freqs_cis).flatten(3)

        return x_out.type_as(x)

And the result is:

x.shape: torch.Size([16, 24, 4608, 128])
cos.shape: torch.Size([1, 1, 32, 128])
x_rotated.shape: torch.Size([16, 24, 4608, 128])
sin.shape: torch.Size([1, 1, 32, 128])
  File "/opt/conda/lib/python3.11/site-packages/diffusers/models/attention_processor.py", line 2318, in __call__
    query = apply_rotary_emb(query, image_rotary_emb)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/lib/python3.11/site-packages/diffusers/models/embeddings.py", line 1211, in apply_rotary_emb
    out = (x.float() * cos + x_rotated.float() * sin).to(x.dtype)
           ~~~~~~~~~~^~~~~
RuntimeError: The size of tensor a (4608) must match the size of tensor b (32) at non-singleton dimension 2
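
To make the mismatch concrete: the broadcast fails only because the sequence dimension of cos/sin (32) disagrees with the query's sequence dimension (4608). A standalone illustration using the logged shapes (not deepcompressor code, batch and head dims shrunk to keep it light):

# Standalone repro of the broadcast failure; only dimension 2 matters here.
import torch

x = torch.randn(1, 1, 4608, 128)   # query: [B, H, S, D], S = 4608 tokens
cos = torch.ones(1, 1, 32, 128)    # rotary cos covering only 32 positions

try:
    out = x * cos
except RuntimeError as e:
    print(e)  # ... size of tensor a (4608) must match ... tensor b (32) ...

# A cos/sin table covering the full 4608-token sequence broadcasts fine:
cos_ok = torch.ones(1, 1, 4608, 128)
print((x * cos_ok).shape)          # torch.Size([1, 1, 4608, 128])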

I've then added logging in diffusers/models/attention_processor.py like this:

        if image_rotary_emb is not None:
            from .embeddings import apply_rotary_emb
            print("query.shape", query.shape)
            print("image_rotary_emb", image_rotary_emb)
            query = apply_rotary_emb(query, image_rotary_emb)
            key = apply_rotary_emb(key, image_rotary_emb)

Which produces:

query.shape torch.Size([16, 24, 4608, 128])
image_rotary_emb (tensor([[1., 1., 1.,  ..., 1., 1., 1.],
        [1., 1., 1.,  ..., 1., 1., 1.],
        [1., 1., 1.,  ..., 1., 1., 1.],
        ...,
        [1., 1., 1.,  ..., 1., 1., 1.],
        [1., 1., 1.,  ..., 1., 1., 1.],
        [1., 1., 1.,  ..., 1., 1., 1.]], device='cuda:0'), tensor([[0., 0., 0.,  ..., 0., 0., 0.],
        [0., 0., 0.,  ..., 0., 0., 0.],
        [0., 0., 0.,  ..., 0., 0., 0.],
        ...,
        [0., 0., 0.,  ..., 0., 0., 0.],
        [0., 0., 0.,  ..., 0., 0., 0.],
        [0., 0., 0.,  ..., 0., 0., 0.]], device='cuda:0'))

And in diffusers/models/transformers/transformer_flux.py I've added these logs:

                print("hidden_states.shape", hidden_states.shape)
                print("encoder_hidden_states.shape", encoder_hidden_states.shape)
                print("temb.shape", temb.shape)
                print("image_rotary_emb[0].shape", image_rotary_emb[0].shape)
                print("image_rotary_emb[1].shape", image_rotary_emb[1].shape)
                print("image_rotary_emb", image_rotary_emb)
                print("joint_attention_kwargs", joint_attention_kwargs)
                encoder_hidden_states, hidden_states = block(
                    hidden_states=hidden_states,
                    encoder_hidden_states=encoder_hidden_states,
                    temb=temb,
                    image_rotary_emb=image_rotary_emb,
                    joint_attention_kwargs=joint_attention_kwargs,
                )

Which produces:

hidden_states.shape torch.Size([16, 4096, 3072])
encoder_hidden_states.shape torch.Size([16, 512, 3072])
temb.shape torch.Size([16, 3072])
image_rotary_emb[0].shape torch.Size([32, 128])
image_rotary_emb[1].shape torch.Size([32, 128])
image_rotary_emb (tensor([[1., 1., 1.,  ..., 1., 1., 1.],
        [1., 1., 1.,  ..., 1., 1., 1.],
        [1., 1., 1.,  ..., 1., 1., 1.],
        ...,
        [1., 1., 1.,  ..., 1., 1., 1.],
        [1., 1., 1.,  ..., 1., 1., 1.],
        [1., 1., 1.,  ..., 1., 1., 1.]], device='cuda:0'), tensor([[0., 0., 0.,  ..., 0., 0., 0.],
        [0., 0., 0.,  ..., 0., 0., 0.],
        [0., 0., 0.,  ..., 0., 0., 0.],
        ...,
        [0., 0., 0.,  ..., 0., 0., 0.],
        [0., 0., 0.,  ..., 0., 0., 0.],
        [0., 0., 0.,  ..., 0., 0., 0.]], device='cuda:0'))

And also in diffusers/models/transformers/transformer_flux.py I've added these logs in FluxTransformer2DModel.forward:

        print("txt_ids.shape:", txt_ids.shape)
        print("img_ids.shape:", img_ids.shape)
        ids = torch.cat((txt_ids, img_ids), dim=0)
        print("ids.shape:", ids.shape)
        image_rotary_emb = self.pos_embed(ids)
        print("image_rotary_emb[0].shape:", image_rotary_emb[0].shape)
        print("image_rotary_emb[1].shape:", image_rotary_emb[1].shape)

Which outputs:

txt_ids.shape: torch.Size([16, 3])
img_ids.shape: torch.Size([16, 3])
ids.shape: torch.Size([32, 3])
image_rotary_emb[0].shape: torch.Size([32, 128])
image_rotary_emb[1].shape: torch.Size([32, 128])
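
If I'm reading these shapes correctly, txt_ids and img_ids arrive as [16, 3], i.e. batch-sized, so the concatenated ids cover only 32 positions, while the query actually has 4608 tokens (512 text + 4096 image, matching the encoder_hidden_states and hidden_states shapes above). A quick sketch of the shapes I would expect instead; this is my inference from the logs, not a confirmed fix:

# Shape check based on the logs above (illustrative guess, not a confirmed fix).
import torch

txt_tokens, img_tokens = 512, 4096              # from encoder_hidden_states / hidden_states

expected_txt_ids = torch.zeros(txt_tokens, 3)   # expected [512, 3], observed [16, 3]
expected_img_ids = torch.zeros(img_tokens, 3)   # expected [4096, 3], observed [16, 3]

ids = torch.cat((expected_txt_ids, expected_img_ids), dim=0)
print(ids.shape)   # torch.Size([4608, 3]) -> pos_embed(ids) would then give cos/sin
                   # of shape [4608, 128], which broadcasts against the
                   # [16, 24, 4608, 128] query instead of the [32, 128] seen here.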

@josephrocca

josephrocca commented Mar 8, 2025

@synxlin @lmxyy I'm not sure if you received notifications for the above messages, but if so, apologies for the extra ping. As mentioned along with the logs above, deepcompressor is currently completely broken: even the official example doesn't work at all when followed exactly.

Could you please provide any hint on how to fix this? 🙏
