Getting Error when converting flux.dev with the latest version of deepcompressor #50

Open
yibolu96 opened this issue Feb 18, 2025 · 8 comments

@yibolu96

Hi, I'm getting this error when converting flux.dev. How can I fix it? Thanks in advance!

miniconda3/envs/deepcompressor/lib/python3.12/site-packages/diffusers/models/embeddings.py", line 1210, in apply_rotary_emb
    out = (x.float() * cos + x_rotated.float() * sin).to(x.dtype)
           ~~~~~~~~~~^~~~~
RuntimeError: The size of tensor a (4608) must match the size of tensor b (2) at non-singleton dimension 2

Here is my full command:

python -m deepcompressor.app.diffusion.ptq examples/diffusion/configs/model/flux.1-dev.yaml examples/diffusion/configs/svdquant/int4.yaml examples/diffusion/configs/svdquant/fast.yaml

I removed the eval part since I just want to do the conversion.

@yibolu96
Author

yibolu96 commented Feb 18, 2025

Here is my pip list btw:

absl-py                  2.1.0
accelerate               1.2.1
aiohappyeyeballs         2.4.4
aiohttp                  3.11.11
aiosignal                1.3.2
attrs                    24.3.0
bitsandbytes             0.45.0
build                    1.2.2.post1
CacheControl             0.14.1
certifi                  2024.12.14
cffi                     1.17.1
chardet                  5.2.0
charset-normalizer       3.4.1
clean-fid                0.1.35
cleo                     2.1.0
click                    8.1.8
clip                     1.0
colorama                 0.4.6
crashtest                0.4.1
cryptography             44.0.0
DataProperty             1.0.2
datasets                 3.2.0
diffusers                0.32.0
dill                     0.3.8
distlib                  0.3.9
docstring_parser         0.16
dulwich                  0.21.7
evaluate                 0.4.3
fairscale                0.4.13
fastjsonschema           2.21.1
filelock                 3.16.1
frozenlist               1.5.0
fsspec                   2024.9.0
ftfy                     6.3.1
fuzzywuzzy               0.18.0
GPUtil                   1.4.0
huggingface-hub          0.27.0
idna                     3.10
image-reward             1.5
importlib_metadata       8.5.0
installer                0.7.0
jaraco.classes           3.4.0
jeepney                  0.8.0
jieba                    0.42.1
Jinja2                   3.1.5
joblib                   1.4.2
jsonlines                4.0.0
keyring                  24.3.1
lightning-utilities      0.11.9
lm_eval                  0.4.7
lxml                     5.3.0
MarkupSafe               3.0.2
mbstrdecoder             1.1.3
more-itertools           10.5.0
mpmath                   1.3.0
msgpack                  1.1.0
multidict                6.1.0
multiprocess             0.70.16
networkx                 3.4.2
nltk                     3.9.1
numexpr                  2.10.2
numpy                    2.2.1
nvidia-cublas-cu12       12.4.5.8
nvidia-cuda-cupti-cu12   12.4.127
nvidia-cuda-nvrtc-cu12   12.4.127
nvidia-cuda-runtime-cu12 12.4.127
nvidia-cudnn-cu12        9.1.0.70
nvidia-cufft-cu12        11.2.1.3
nvidia-curand-cu12       10.3.5.147
nvidia-cusolver-cu12     11.6.1.9
nvidia-cusparse-cu12     12.3.1.170
nvidia-nccl-cu12         2.21.5
nvidia-nvjitlink-cu12    12.4.127
nvidia-nvtx-cu12         12.4.127
omniconfig               0.1.9
packaging                24.2
pandas                   2.2.3
pathvalidate             3.2.1
peft                     0.14.0
pexpect                  4.9.0
pillow                   11.0.0
pip                      24.2
pkginfo                  1.12.0
platformdirs             4.3.6
poetry                   1.8.5
poetry-core              1.9.1
poetry-plugin-export     1.8.0
portalocker              3.0.0
propcache                0.2.1
protobuf                 5.29.2
psutil                   6.1.1
ptyprocess               0.7.0
pyarrow                  18.1.0
pybind11                 2.13.6
pycparser                2.22
pyproject_hooks          1.2.0
pytablewriter            1.2.0
python-dateutil          2.9.0.post0
pytz                     2024.2
PyYAML                   6.0.2
RapidFuzz                3.11.0
regex                    2024.11.6
requests                 2.32.3
requests-toolbelt        1.0.0
rouge                    1.0.1
rouge_score              0.1.2
sacrebleu                2.4.3
safetensors              0.4.5
scikit-learn             1.6.0
scipy                    1.14.1
SecretStorage            3.3.3
sentencepiece            0.2.0
setuptools               75.1.0
shellingham              1.5.4
six                      1.17.0
sqlitedict               2.1.0
sympy                    1.13.1
tabledata                1.3.3
tabulate                 0.9.0
tcolorpy                 0.1.7
threadpoolctl            3.5.0
timm                     1.0.12
tokenizers               0.21.0
toml                     0.10.2
tomlkit                  0.13.2
torch                    2.5.1
torchmetrics             1.6.1
torchvision              0.20.1
tqdm                     4.67.1
tqdm-multiprocess        0.0.11
transformers             4.47.1
triton                   3.1.0
trove-classifiers        2024.10.21.16
typepy                   1.3.4
typing_extensions        4.12.2
tzdata                   2024.2
urllib3                  2.3.0
virtualenv               20.28.0
wcwidth                  0.2.13
wheel                    0.44.0
word2number              1.1
xxhash                   3.5.0
yarl                     1.18.3
zipp                     3.21.0
zstandard                0.23.0

@synxlin added the bug (Something isn't working) label on Feb 21, 2025
@synxlin self-assigned this on Feb 21, 2025
@synxlin
Contributor

synxlin commented Feb 21, 2025

Hi @yibolu96,

I recommend updating both the diffusers and torch packages. Additionally, could you try loading the cached calibration data and printing the tensor shapes for all value fields?
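
For anyone unsure how to do that, here is a minimal sketch of the kind of shape dump I have in mind, assuming the calibration cache is a (possibly nested) dict of tensors saved with torch.save; the actual cache path and layout may differ:

# Minimal sketch: recursively print tensor shapes in a cached calibration file.
# The path below is a placeholder; the real cache layout may differ.
import torch

def print_shapes(obj, prefix=""):
    if isinstance(obj, torch.Tensor):
        print(f"{prefix}: {tuple(obj.shape)} {obj.dtype}")
    elif isinstance(obj, dict):
        for key, value in obj.items():
            print_shapes(value, f"{prefix}.{key}" if prefix else str(key))
    elif isinstance(obj, (list, tuple)):
        for i, value in enumerate(obj):
            print_shapes(value, f"{prefix}[{i}]")

cache = torch.load("path/to/calib/cache.pt", map_location="cpu")
print_shapes(cache)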

It would also be helpful if you could install deepcompressor from the dev/v0.1.0 branch and rerun your command to see if the issue persists.

Let me know what you find!

@yanglianwei

/python3.11/site-packages/diffusers/models/embeddings.py", line 1208, in apply_rotary_emb
    out = (x.float() * cos + x_rotated.float() * sin).to(x.dtype)
           ~~~~~~~~~~^~~~~
RuntimeError: The size of tensor a (4608) must match the size of tensor b (32) at non-singleton dimension 2

I have the same problem. How can I solve it?

@godxin999

> Hi @yibolu96,
>
> I recommend updating both the diffusers and torch packages. Additionally, could you try loading the cached calibration data and printing the tensor shapes for all value fields?
>
> It would also be helpful if you could install deepcompressor from the dev/v0.1.0 branch and rerun your command to see if the issue persists.
>
> Let me know what you find!

These two methods have no effect. Using dev/v0.1.0 will result in additional compile errors.

@xibei8009

> Hi @yibolu96,
> I recommend updating both the diffusers and torch packages. Additionally, could you try loading the cached calibration data and printing the tensor shapes for all value fields?
> It would also be helpful if you could install deepcompressor from the dev/v0.1.0 branch and rerun your command to see if the issue persists.
> Let me know what you find!
>
> These two methods have no effect. Using dev/v0.1.0 will result in additional compile errors.

File "/root/anaconda3/envs/deepcompressor/lib/python3.11/site-packages/diffusers/models/embeddings.py", line 1208, in apply_rotary_emb
out = (x.float() * cos + x_rotated.float() * sin).to(x.dtype)
~~~~~~~~~~^~~~~
RuntimeError: The size of tensor a (4608) must match the size of tensor b (32) at non-singleton dimension 2

I also encountered this problem. Has it been solved?

@josephrocca

josephrocca commented Mar 6, 2025

I also ran into this issue while following the exact instructions in this readme (i.e. flux schnell quant):

Note that there are some dependency issues since this repo does not pin dependencies to exact versions:

That was the only difference from the examples/diffusion readme (i.e. I ran pip3 install transformers==4.46.0).

I used this docker image:

pytorch/pytorch:2.6.0-cuda12.4-cudnn9-devel

Here are the most recent logs leading up to the error:

I've tested this multiple times on different H100 machines (from Runpod): NVIDIA driver version: 560.35.05, CUDA Version: 12.6

@synxlin would you be able to provide a bit more detail on how to do this:

"could you try loading the cached calibration data and printing the tensor shapes for all value fields?"

@josephrocca

josephrocca commented Mar 6, 2025

I've tried to do some logging after following the steps here exactly:

The logs below refer to running this command, to be clear:

python3 -m deepcompressor.app.diffusion.ptq configs/model/flux.1-schnell.yaml configs/svdquant/int4.yaml --eval-benchmarks MJHQ --eval-num-samples 1024 --save-model /workspace/output

In diffusers/models/embeddings.py I've added logs as shown here:

def apply_rotary_emb(
    x: torch.Tensor,
    freqs_cis: Union[torch.Tensor, Tuple[torch.Tensor]],
    use_real: bool = True,
    use_real_unbind_dim: int = -1,
) -> Tuple[torch.Tensor, torch.Tensor]:
    """
    Apply rotary embeddings to input tensors using the given frequency tensor. This function applies rotary embeddings
    to the given query or key 'x' tensors using the provided frequency tensor 'freqs_cis'. The input tensors are
    reshaped as complex numbers, and the frequency tensor is reshaped for broadcasting compatibility. The resulting
    tensors contain rotary embeddings and are returned as real tensors.

    Args:
        x (`torch.Tensor`):
            Query or key tensor to apply rotary embeddings. [B, H, S, D] xk (torch.Tensor): Key tensor to apply
        freqs_cis (`Tuple[torch.Tensor]`): Precomputed frequency tensor for complex exponentials. ([S, D], [S, D],)

    Returns:
        Tuple[torch.Tensor, torch.Tensor]: Tuple of modified query tensor and key tensor with rotary embeddings.
    """
    if use_real:
        cos, sin = freqs_cis  # [S, D]
        cos = cos[None, None]
        sin = sin[None, None]
        cos, sin = cos.to(x.device), sin.to(x.device)

        if use_real_unbind_dim == -1:
            # Used for flux, cogvideox, hunyuan-dit
            x_real, x_imag = x.reshape(*x.shape[:-1], -1, 2).unbind(-1)  # [B, S, H, D//2]
            x_rotated = torch.stack([-x_imag, x_real], dim=-1).flatten(3)
        elif use_real_unbind_dim == -2:
            # Used for Stable Audio
            x_real, x_imag = x.reshape(*x.shape[:-1], 2, -1).unbind(-2)  # [B, S, H, D//2]
            x_rotated = torch.cat([-x_imag, x_real], dim=-1)
        else:
            raise ValueError(f"`use_real_unbind_dim={use_real_unbind_dim}` but should be -1 or -2.")
        print("x.shape:", x.shape)
        print("cos.shape:", cos.shape)
        print("x_rotated.shape:", x_rotated.shape)
        print("sin.shape:", sin.shape)
        out = (x.float() * cos + x_rotated.float() * sin).to(x.dtype)

        return out
    else:
        # used for lumina
        x_rotated = torch.view_as_complex(x.float().reshape(*x.shape[:-1], -1, 2))
        freqs_cis = freqs_cis.unsqueeze(2)
        x_out = torch.view_as_real(x_rotated * freqs_cis).flatten(3)

        return x_out.type_as(x)

And the result is:

x.shape: torch.Size([16, 24, 4608, 128])
cos.shape: torch.Size([1, 1, 32, 128])
x_rotated.shape: torch.Size([16, 24, 4608, 128])
sin.shape: torch.Size([1, 1, 32, 128])
  File "/opt/conda/lib/python3.11/site-packages/diffusers/models/attention_processor.py", line 2318, in __call__
    query = apply_rotary_emb(query, image_rotary_emb)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/lib/python3.11/site-packages/diffusers/models/embeddings.py", line 1211, in apply_rotary_emb
    out = (x.float() * cos + x_rotated.float() * sin).to(x.dtype)
           ~~~~~~~~~~^~~~~
RuntimeError: The size of tensor a (4608) must match the size of tensor b (32) at non-singleton dimension 2
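
To make the mismatch concrete: the broadcast fails only because the sequence dimension of cos/sin (32) disagrees with the query's sequence dimension (4608). A standalone illustration using the logged shapes (not deepcompressor code, batch and head dims shrunk to keep it light):

# Standalone repro of the broadcast failure; only dimension 2 matters here.
import torch

x = torch.randn(1, 1, 4608, 128)   # query: [B, H, S, D], S = 4608 tokens
cos = torch.ones(1, 1, 32, 128)    # rotary cos covering only 32 positions

try:
    out = x * cos
except RuntimeError as e:
    print(e)  # ... size of tensor a (4608) must match ... tensor b (32) ...

# A cos/sin table covering the full 4608-token sequence broadcasts fine:
cos_ok = torch.ones(1, 1, 4608, 128)
print((x * cos_ok).shape)          # torch.Size([1, 1, 4608, 128])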

I've then added logging in diffusers/models/attention_processor.py like this:

        if image_rotary_emb is not None:
            from .embeddings import apply_rotary_emb
            print("query.shape", query.shape)
            print("image_rotary_emb", image_rotary_emb)
            query = apply_rotary_emb(query, image_rotary_emb)
            key = apply_rotary_emb(key, image_rotary_emb)

Which produces:

query.shape torch.Size([16, 24, 4608, 128])
image_rotary_emb (tensor([[1., 1., 1.,  ..., 1., 1., 1.],
        [1., 1., 1.,  ..., 1., 1., 1.],
        [1., 1., 1.,  ..., 1., 1., 1.],
        ...,
        [1., 1., 1.,  ..., 1., 1., 1.],
        [1., 1., 1.,  ..., 1., 1., 1.],
        [1., 1., 1.,  ..., 1., 1., 1.]], device='cuda:0'), tensor([[0., 0., 0.,  ..., 0., 0., 0.],
        [0., 0., 0.,  ..., 0., 0., 0.],
        [0., 0., 0.,  ..., 0., 0., 0.],
        ...,
        [0., 0., 0.,  ..., 0., 0., 0.],
        [0., 0., 0.,  ..., 0., 0., 0.],
        [0., 0., 0.,  ..., 0., 0., 0.]], device='cuda:0'))

And in diffusers/models/transformers/transformer_flux.py I've added these logs:

                print("hidden_states.shape", hidden_states.shape)
                print("encoder_hidden_states.shape", encoder_hidden_states.shape)
                print("temb.shape", temb.shape)
                print("image_rotary_emb[0].shape", image_rotary_emb[0].shape)
                print("image_rotary_emb[1].shape", image_rotary_emb[1].shape)
                print("image_rotary_emb", image_rotary_emb)
                print("joint_attention_kwargs", joint_attention_kwargs)
                encoder_hidden_states, hidden_states = block(
                    hidden_states=hidden_states,
                    encoder_hidden_states=encoder_hidden_states,
                    temb=temb,
                    image_rotary_emb=image_rotary_emb,
                    joint_attention_kwargs=joint_attention_kwargs,
                )

Which produces:

hidden_states.shape torch.Size([16, 4096, 3072])
encoder_hidden_states.shape torch.Size([16, 512, 3072])
temb.shape torch.Size([16, 3072])
image_rotary_emb[0].shape torch.Size([32, 128])
image_rotary_emb[1].shape torch.Size([32, 128])
image_rotary_emb (tensor([[1., 1., 1.,  ..., 1., 1., 1.],
        [1., 1., 1.,  ..., 1., 1., 1.],
        [1., 1., 1.,  ..., 1., 1., 1.],
        ...,
        [1., 1., 1.,  ..., 1., 1., 1.],
        [1., 1., 1.,  ..., 1., 1., 1.],
        [1., 1., 1.,  ..., 1., 1., 1.]], device='cuda:0'), tensor([[0., 0., 0.,  ..., 0., 0., 0.],
        [0., 0., 0.,  ..., 0., 0., 0.],
        [0., 0., 0.,  ..., 0., 0., 0.],
        ...,
        [0., 0., 0.,  ..., 0., 0., 0.],
        [0., 0., 0.,  ..., 0., 0., 0.],
        [0., 0., 0.,  ..., 0., 0., 0.]], device='cuda:0'))

And also in diffusers/models/transformers/transformer_flux.py I've added these logs in FluxTransformer2DModel.forward:

        print("txt_ids.shape:", txt_ids.shape)
        print("img_ids.shape:", img_ids.shape)
        ids = torch.cat((txt_ids, img_ids), dim=0)
        print("ids.shape:", ids.shape)
        image_rotary_emb = self.pos_embed(ids)
        print("image_rotary_emb[0].shape:", image_rotary_emb[0].shape)
        print("image_rotary_emb[1].shape:", image_rotary_emb[1].shape)

Which outputs:

txt_ids.shape: torch.Size([16, 3])
img_ids.shape: torch.Size([16, 3])
ids.shape: torch.Size([32, 3])
image_rotary_emb[0].shape: torch.Size([32, 128])
image_rotary_emb[1].shape: torch.Size([32, 128])
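
If I'm reading these shapes correctly, txt_ids and img_ids arrive as [16, 3], i.e. batch-sized, so the concatenated ids cover only 32 positions, while the query actually has 4608 tokens (512 text + 4096 image, matching the encoder_hidden_states and hidden_states shapes above). A quick sketch of the shapes I would expect instead; this is my inference from the logs, not a confirmed fix:

# Shape check based on the logs above (illustrative guess, not a confirmed fix).
import torch

txt_tokens, img_tokens = 512, 4096              # from encoder_hidden_states / hidden_states

expected_txt_ids = torch.zeros(txt_tokens, 3)   # expected [512, 3], observed [16, 3]
expected_img_ids = torch.zeros(img_tokens, 3)   # expected [4096, 3], observed [16, 3]

ids = torch.cat((expected_txt_ids, expected_img_ids), dim=0)
print(ids.shape)   # torch.Size([4608, 3]) -> pos_embed(ids) would then give cos/sin
                   # of shape [4608, 128], which broadcasts against the
                   # [16, 24, 4608, 128] query instead of the [32, 128] seen here.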

@josephrocca

josephrocca commented Mar 8, 2025

@synxlin @lmxyy I'm not sure if you received notifications for the above messages, but if so, apologies for the extra ping. As mentioned along with the logs above, deepcompressor is currently completely broken: even the official example doesn't work at all when followed exactly.

Could you please provide any hint on how to fix this? 🙏
