MPS device limitation error when running TTS on Apple Silicon Mac #228

Open
kiwamizamurai opened this issue Jan 13, 2025 · 3 comments

@kiwamizamurai

Environment

  • OS: macOS 14.7
  • Hardware: Apple M3 Pro
  • Python: 3.11.11
  • MeloTTS: 0.1.2

Dependencies

dependencies = [
"transformers==4.27.4",
"torch>=2.0.0",
"sentencepiece>=0.1.99",
"click>=8.1.7",
"rich>=13.7.0",
"pydantic>=2.6.0",
"melotts @ git+https://github.com/myshell-ai/MeloTTS.git",
"unidic>=1.1.0",
"sounddevice>=0.5.1",
"nltk>=3.8.1",
]

Issue Description

When trying to run TTS on an Apple Silicon Mac using the MPS (Metal Performance Shaders) device, the following error occurs:

Error: Output channels > 65536 not supported at the MPS device. As a temporary fix, you can set the environment variable `PYTORCH_ENABLE_MPS_FALLBACK=1` to use the CPU as a fallback for this op. WARNING: this will be slower than running natively on MPS.

The error is raised during speech synthesis, when the tts_to_file() method is called.

Steps to Reproduce

  1. Initialize TTS with device='auto' or device='mps'
  2. Call tts_to_file() with any English text
  3. Error occurs during model inference
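
Since the failure shows up with device='auto' as well as device='mps', one way to sidestep it is to choose the device explicitly and skip MPS. The following is a minimal sketch, not MeloTTS code: `pick_device` and its `allow_mps` parameter are hypothetical names introduced here, and the sketch assumes only the standard `torch.cuda.is_available()` and `torch.backends.mps.is_available()` checks.

```python
def pick_device(allow_mps: bool = False) -> str:
    """Return 'cuda', 'mps', or 'cpu'.

    Hypothetical helper (not part of MeloTTS). MPS is skipped unless
    allow_mps is True, so the result can be passed to TTS(device=...)
    without hitting the MPS output-channel limit.
    """
    try:
        import torch
    except ImportError:
        # torch is not installed, so there is nothing to accelerate
        return "cpu"
    if torch.cuda.is_available():
        return "cuda"
    if allow_mps and torch.backends.mps.is_available():
        return "mps"
    return "cpu"


print(pick_device())
```

The returned string could then be passed as the device argument, e.g. TTS(language='EN', device=pick_device()).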

Current Workarounds

Currently, we have two workarounds:

  1. Set environment variable: PYTORCH_ENABLE_MPS_FALLBACK=1
  2. Force CPU usage by initializing TTS with device='cpu'

Both workarounds are slower than native MPS execution would be: the first runs the unsupported op on the CPU, and the second runs the whole model there.
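
The first workaround can also be applied from Python instead of the shell. This is a minimal sketch under one assumption that is not confirmed in this thread: the variable has to be in the environment before torch is first imported. `synthesize` is a hypothetical helper name; the TTS and tts_to_file() calls mirror the code example in this issue.

```python
import os

# Set the flag first, before anything imports torch (assumption: the
# flag is read when torch is loaded, so setting it later may have no effect).
os.environ["PYTORCH_ENABLE_MPS_FALLBACK"] = "1"


def synthesize(text: str, device: str = "mps"):
    """Hypothetical wrapper around the calls shown in this issue."""
    # Imported here so that torch is loaded only after the flag is set.
    from melo.api import TTS

    engine = TTS(language="EN", device=device)
    return engine.tts_to_file(text, speaker_id=0, output_path=None)
```

Exporting the variable in the shell (PYTORCH_ENABLE_MPS_FALLBACK=1 python script.py) has the same effect and avoids any dependence on import order.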

Additional Context

This seems to be related to a limitation in PyTorch's MPS backend on the maximum number of output channels. It would help if the model architecture could be adjusted to fit within that limit, or if the offending operations could be restructured so that they never exceed 65536 output channels.
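
Until the limit is addressed, a script can try MPS first and drop to the CPU only when this specific error appears. The sketch below is an illustration, not MeloTTS API: `run_with_cpu_fallback` and `MPS_LIMIT_MARKER` are hypothetical names. It assumes the error surfaces as a RuntimeError (or a subclass such as NotImplementedError) and matches on a fragment of the message quoted above.

```python
from typing import Callable, TypeVar

T = TypeVar("T")

# Fragment of the error message quoted in this issue.
MPS_LIMIT_MARKER = "not supported at the MPS device"


def run_with_cpu_fallback(run: Callable[[str], T], preferred: str = "mps") -> T:
    """Call run(preferred); retry with 'cpu' only on the MPS limit error.

    Hypothetical helper. `run` receives a device string and must build
    everything it needs for that device itself.
    """
    try:
        return run(preferred)
    except RuntimeError as err:
        if MPS_LIMIT_MARKER not in str(err):
            raise  # unrelated failure: do not hide it
        return run("cpu")


def synthesize(device: str):
    """Example callable, mirroring the code example in this issue."""
    from melo.api import TTS

    engine = TTS(language="EN", device=device)
    return engine.tts_to_file("Test text", speaker_id=0, output_path=None)
```

Calling run_with_cpu_fallback(synthesize) would then attempt MPS and retry on the CPU. The cost is that the model is loaded a second time on the fallback path, so this suits scripts more than long-running services.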

Code Example

from melo.api import TTS

# This fails on MPS
engine = TTS(language='EN', device='auto')
audio = engine.tts_to_file("Test text", speaker_id=0, output_path=None)

# Current workaround
engine = TTS(language='EN', device='cpu')  # Force CPU usage
audio = engine.tts_to_file("Test text", speaker_id=0, output_path=None)
@yukiarimo

Something is wrong with your model or installation. I installed everything the same way on Colab and macOS, and it works perfectly with the same model on MPS, CUDA, and CPU. Try reinstalling, or try a different version of pip or Python.

@coldfire84

I have the same issue:

  • M1 MacBook Pro, running Sequoia 15.2 (had the same issue w/ OS version 15.1).
  • Python 3.10.11, torch 2.5.1

This doesn't look like an isolated issue (associated PR).

@kiwamizamurai were you able to solve this?

@kiwamizamurai
Author

@coldfire84

from melo.api import TTS

# Current workaround
engine = TTS(language='EN', device='cpu')  # Force CPU usage
audio = engine.tts_to_file("Test text", speaker_id=0, output_path=None)

You can avoid the issue this way, by forcing the CPU device.
