When trying to run TTS on an Apple Silicon Mac using the MPS (Metal Performance Shaders) device, the following error occurs:
Error: Output channels > 65536 not supported at the MPS device. As a temporary fix, you can set the environment variable `PYTORCH_ENABLE_MPS_FALLBACK=1` to use the CPU as a fallback for this op. WARNING: this will be slower than running natively on MPS.
This error occurs during speech synthesis, when calling the tts_to_file() method.
Steps to Reproduce
Initialize TTS with device='auto' or device='mps'
Call tts_to_file() with any English text
Error occurs during model inference
Current Workaround
Currently, we have two workarounds:
Set environment variable: PYTORCH_ENABLE_MPS_FALLBACK=1
Force CPU usage by initializing TTS with device='cpu'
Both workarounds result in slower performance compared to potential MPS acceleration.
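The two workarounds can be combined in a small setup snippet. This is only a sketch: pick_device is a hypothetical helper, not part of MeloTTS, and it assumes the fallback variable should be set before torch is first imported, which is the safe ordering.

```python
import os

# Set the fallback flag before torch is imported anywhere in the process.
# setdefault keeps any value the user has already exported in the shell.
os.environ.setdefault("PYTORCH_ENABLE_MPS_FALLBACK", "1")


def pick_device(prefer_mps: bool = True) -> str:
    """Hypothetical helper: return 'mps' if usable and wanted, else 'cpu'."""
    try:
        import torch
    except ImportError:
        # Without torch there is nothing to accelerate; report CPU.
        return "cpu"
    if prefer_mps and torch.backends.mps.is_available():
        return "mps"
    return "cpu"
```

The result can then be passed to the constructor, for example TTS(language='EN', device=pick_device()), or pick_device(prefer_mps=False) to force the CPU path.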
Additional Context
This seems to be related to a limitation in PyTorch's MPS backend on the maximum number of output channels. It would be beneficial if the model architecture could be adjusted to work within MPS device limitations, or if the affected operations could be restructured to stay under the 65536-channel limit.
Code Example
from melo.api import TTS

# This fails on MPS
engine = TTS(language='EN', device='auto')
audio = engine.tts_to_file("Test text", speaker_id=0, output_path=None)

# Current workaround
engine = TTS(language='EN', device='cpu')  # Force CPU usage
audio = engine.tts_to_file("Test text", speaker_id=0, output_path=None)
Something is wrong with your model or installation. I installed everything the same way on Colab and macOS, and it works perfectly with the same model on MPS, CUDA, and CPU. Try reinstalling, or try a different version of pip or Python.
Environment
Dependencies
dependencies = [
"transformers==4.27.4",
"torch>=2.0.0",
"sentencepiece>=0.1.99",
"click>=8.1.7",
"rich>=13.7.0",
"pydantic>=2.6.0",
"melotts @ git+https://github.com/myshell-ai/MeloTTS.git",
"unidic>=1.1.0",
"sounddevice>=0.5.1",
"nltk>=3.8.1",
]