How can the API provided by ChatGPT-Sovits v2 be integrated into Dify to achieve text-to-speech functionality for response outputs? #15036

sums2001 · 2025-03-05T17:02:24Z


Self Checks

I have searched for existing issues, including closed ones.
I confirm that I am using English to submit this report (I have read and agree to the Language Policy).
[FOR CHINESE USERS] Please be sure to submit issues in English, otherwise they will be closed. Thank you! :)
Please do not modify this template :) and fill in all the required fields.

Content

I have successfully deployed both Dify and the GPT-SoVITS API. Called on its own, the GPT-SoVITS API generates voice output normally.

A request to the GPT-SoVITS API v2 has the following format:
http://192.168.3.119:9880/tts?text=你好！很高兴为你提供帮助。你可以问我任何问题了。&text_lang=zh&ref_audio_path=H:\AI\GPT-SoVITS_V2_250113\ref\fx.wav&prompt_text=当负面情绪涌上心头时，我们总是禁不住想要判断对方&prompt_lang=zh&text_split_method=cut5&batch_size=3&batch_threshold=0.75&speed_factor=1
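For reference, the same request can be assembled programmatically so that the Chinese text and the Windows path are percent-encoded correctly. This is a minimal sketch, not part of the original setup: the helper names `build_tts_url` and `fetch_tts_audio` and the output file name `reply.wav` are hypothetical, and `fetch_tts_audio` assumes the endpoint answers with raw audio bytes rather than JSON.

```python
from urllib.parse import parse_qs, urlencode, urlsplit
from urllib.request import urlopen

# Host and port taken from the request shown above.
BASE_URL = "http://192.168.3.119:9880"


def build_tts_url(text: str, base_url: str = BASE_URL) -> str:
    """Build a /tts request URL using the parameter values from the post."""
    params = {
        "text": text,
        "text_lang": "zh",
        "ref_audio_path": r"H:\AI\GPT-SoVITS_V2_250113\ref\fx.wav",
        "prompt_text": "当负面情绪涌上心头时，我们总是禁不住想要判断对方",
        "prompt_lang": "zh",
        "text_split_method": "cut5",
        "batch_size": 3,
        "batch_threshold": 0.75,
        "speed_factor": 1,
    }
    # urlencode percent-encodes non-ASCII text and backslashes as UTF-8.
    return f"{base_url}/tts?{urlencode(params)}"


def fetch_tts_audio(url: str, out_path: str = "reply.wav") -> int:
    """Download the response body to a file and return its size in bytes.

    Assumption: the endpoint returns raw audio bytes in the response body.
    """
    with urlopen(url, timeout=60) as response:
        audio = response.read()
    with open(out_path, "wb") as f:
        f.write(audio)
    return len(audio)


# Round-trip check that needs no network access: decode the query again.
parsed = parse_qs(urlsplit(build_tts_url("你好")).query)
print(parsed["text"], parsed["ref_audio_path"])
```

Calling `fetch_tts_audio(build_tts_url("你好"))` from a machine that can reach the server would save the audio locally, which is a quick way to confirm what the endpoint actually returns outside of Dify.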

I believe this can be accomplished with a custom tool and a workflow, which I have begun to explore. This is the schema for the custom tool:
{
  "openapi": "3.1.0",
  "info": {
    "title": "GPT-SoVITS_TTS",
    "description": "动态参数增强版语音合成接口",
    "version": "v2.3.2"
  },
  "servers": [
    { "url": "http://192.168.3.119:9880" }
  ],
  "paths": {
    "/tts": {
      "get": {
        "description": "多参数控制语音合成",
        "operationId": "GenerateTTS",
        "parameters": [
          { "name": "text", "in": "query", "required": true, "schema": { "type": "string" } },
          { "name": "prompt_text", "in": "query", "required": true, "schema": { "type": "string" } },
          { "name": "text_lang", "in": "query", "schema": { "type": "string", "default": "zh" } },
          { "name": "prompt_lang", "in": "query", "schema": { "type": "string", "default": "zh" } },
          { "name": "ref_audio_path", "in": "query", "schema": { "type": "string", "default": "H:\\AI\\GPT-SoVITS_V2_250113\\ref\\fx.wav" } },
          { "name": "text_split_method", "in": "query", "schema": { "type": "string", "enum": ["cut0", "cut1", "cut5"], "default": "cut5" } },
          { "name": "batch_size", "in": "query", "schema": { "type": "integer", "minimum": 1, "maximum": 10, "default": 3 } },
          { "name": "speed_factor", "in": "query", "schema": { "type": "number", "minimum": 0.5, "maximum": 2.0, "default": 1.0 } }
        ],
        "responses": {
          "200": {
            "description": "成功生成语音文件",
            "content": {
              "application/json": {
                "schema": {
                  "type": "object",
                  "properties": {
                    "url": { "type": "string", "description": "生成的语音文件URL" },
                    "files": {
                      "type": "array",
                      "items": { "type": "string" },
                      "description": "生成的语音文件URL列表"
                    }
                  }
                }
              }
            }
          }
        }
      }
    }
  }
}
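One quick consistency check is to compare the query parameters of the working request with the parameters the tool schema declares. The sketch below copies both lists from the post; the variable names are hypothetical, and it only compares names, not types or defaults.

```python
from urllib.parse import parse_qsl, urlsplit

# The working request from the post, as a raw string so backslashes survive.
SAMPLE_REQUEST = (
    r"http://192.168.3.119:9880/tts?text=你好！很高兴为你提供帮助。你可以问我任何问题了。"
    r"&text_lang=zh&ref_audio_path=H:\AI\GPT-SoVITS_V2_250113\ref\fx.wav"
    r"&prompt_text=当负面情绪涌上心头时，我们总是禁不住想要判断对方&prompt_lang=zh"
    r"&text_split_method=cut5&batch_size=3&batch_threshold=0.75&speed_factor=1"
)

# Parameter names declared under paths./tts.get.parameters in the schema.
DECLARED = {
    "text", "prompt_text", "text_lang", "prompt_lang",
    "ref_audio_path", "text_split_method", "batch_size", "speed_factor",
}

# Names actually sent in the working request.
sent = {name for name, _ in parse_qsl(urlsplit(SAMPLE_REQUEST).query)}

not_declared = sorted(sent - DECLARED)   # sent by hand, absent from the schema
not_sent = sorted(DECLARED - sent)       # declared, absent from the request

print("sent but not declared:", not_declared)
print("declared but not sent:", not_sent)
```

With the values above, the only difference is `batch_threshold`, which appears in the hand-written request but not in the schema. Whether that matters depends on the server's default for that parameter, which the post does not state.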

The following is the DSL file of the workflow:

app:
  description: AI对话语音助手 - 将AI对话内容转换为语音输出
  icon: 🎙️
  icon_background: '#FFEAD5'
  mode: advanced-chat
  name: AI对话语音助手
  use_icon_as_answer_icon: false
kind: app
version: 0.1.5
workflow:
  conversation_variables: []
  environment_variables: []
  features:
    file_upload:
      allowed_file_extensions:
      - .JPG
      - .JPEG
      - .PNG
      - .GIF
      - .WEBP
      - .SVG
      - .WAV
      - .MP3
      allowed_file_types:
      - image
      - audio
      allowed_file_upload_methods:
      - local_file
      - remote_url
      enabled: true
      fileUploadConfig:
        audio_file_size_limit: 50
        batch_count_limit: 5
        file_size_limit: 15
        image_file_size_limit: 10
        video_file_size_limit: 100
        workflow_file_upload_limit: 10
      image:
        enabled: false
        number_limits: 3
        transfer_methods:
        - local_file
        - remote_url
      number_limits: 3
    opening_statement: 欢迎使用AI对话语音助手！请输入您的问题，我会将回答转换为语音。
    retriever_resource:
      enabled: false
    sensitive_word_avoidance:
      enabled: false
    speech_to_text:
      enabled: false
    suggested_questions: []
    suggested_questions_after_answer:
      enabled: false
    text_to_speech:
      enabled: false
      language: ''
      voice: ''
  graph:
    edges:
    - data:
        isInIteration: false
        sourceType: start
        targetType: llm
      id: start-llm
      source: start
      sourceHandle: source
      target: llm
      targetHandle: target
      type: custom
      zIndex: 0
    - data:
        isInIteration: false
        sourceType: llm
        targetType: tool
      id: llm-tool
      source: llm
      sourceHandle: source
      target: tts_tool
      targetHandle: target
      type: custom
      zIndex: 0
    - data:
        isInIteration: false
        sourceType: tool
        targetType: answer
      id: tool-answer
      source: tts_tool
      sourceHandle: source
      target: answer
      targetHandle: target
      type: custom
      zIndex: 0
    - data:
        isInIteration: false
        sourceType: tool
        targetType: answer
      id: tool-answer2
      source: tts_tool
      sourceHandle: source
      target: answer2
      targetHandle: target
      type: custom
      zIndex: 0
    nodes:
    - data:
        desc: 开始对话
        selected: false
        title: 开始
        type: start
        variables: []
      height: 80
      id: start
      position:
        x: 80
        y: 282
      positionAbsolute:
        x: 80
        y: 282
      selected: false
      sourcePosition: right
      targetPosition: left
      type: custom
      width: 243
    - data:
        context:
          enabled: true
          variable_selector: []
        desc: AI对话模型
        memory:
          role_prefix:
            assistant: ''
            user: ''
          window:
            enabled: true
            size: 10
        model:
          completion_params:
            temperature: 0.7
          mode: chat
          name: qwen2.5:latest
          provider: ollama
        prompt_template:
        - id: system-prompt
          role: system
          text: 你是一个友好的AI助手，请用简洁明了的语言回答用户的问题。
        - id: user-prompt
          role: user
          text: '{{#sys.query#}}'
        selected: false
        title: LLM
        type: llm
        variables: []
        vision:
          enabled: false
      height: 124
      id: llm
      position:
        x: 380
        y: 282
      positionAbsolute:
        x: 380
        y: 282
      selected: false
      sourcePosition: right
      targetPosition: left
      type: custom
      width: 243
    - data:
        desc: 文本转语音
        provider_id: 6d69622c-642c-4513-9127-802b9c0ef327
        provider_name: GPT-SoVITS_TTS_new2
        provider_type: api
        selected: false
        title: GenerateTTS
        tool_configurations: {}
        tool_label: GenerateTTS
        tool_name: GenerateTTS
        tool_parameters:
          batch_size:
            type: constant
            value: 3
          prompt_lang:
            type: mixed
            value: zh
          prompt_text:
            type: mixed
            value: 当负面情绪涌上心头时，我们总是禁不住想要判断对方
          ref_audio_path:
            type: mixed
            value: H:\AI\GPT-SoVITS_V2_250113\ref\fx.wav
          speed_factor:
            type: constant
            value: 1
          text:
            type: mixed
            value: '{{#llm.text#}}'
          text_lang:
            type: mixed
            value: zh
          text_split_method:
            type: mixed
            value: cut5
        type: tool
      height: 80
      id: tts_tool
      position:
        x: 646.3606747855553
        y: 289.1717846545598
      positionAbsolute:
        x: 646.3606747855553
        y: 289.1717846545598
      selected: true
      sourcePosition: right
      targetPosition: left
      type: custom
      width: 243
    - data:
        answer: '{{#llm.text#}}\n\n[点击播放语音]({{#tts_tool.files#}})'
        desc: 显示回复和语音状态
        selected: false
        title: 回复
        type: answer
        variables: []
      height: 148
      id: answer
      position:
        x: 1011.1895211266806
        y: 334.31790640604515
      positionAbsolute:
        x: 1011.1895211266806
        y: 334.31790640604515
      selected: false
      sourcePosition: right
      targetPosition: left
      type: custom
      width: 243
    - data:
        answer: 语音文件已生成：{{#tts_tool.files#}}
        desc: ''
        selected: false
        title: 语音文件
        type: answer
        variables: []
      height: 117
      id: answer2
      position:
        x: 1011.1895211266806
        y: 504.31790640604515
      positionAbsolute:
        x: 1011.1895211266806
        y: 504.31790640604515
      selected: false
      sourcePosition: right
      targetPosition: left
      type: custom
      width: 243
    viewport:
      x: -283.69555528485046
      y: -120.59900952601663
      zoom: 0.8652607146560651
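To rule out a wiring problem, the graph in the DSL can be walked from the start node. The sketch below hard-codes the node ids and edges from the DSL above instead of parsing the file, and the function name `reachable_from` is hypothetical. It only checks connectivity; it says nothing about what the tool node actually returns in `files`.

```python
from collections import deque

# (source, target) pairs copied from the `edges` section of the DSL.
EDGES = [
    ("start", "llm"),
    ("llm", "tts_tool"),
    ("tts_tool", "answer"),
    ("tts_tool", "answer2"),
]

# Node ids copied from the `nodes` section of the DSL.
NODES = ["start", "llm", "tts_tool", "answer", "answer2"]


def reachable_from(start: str, edges: list[tuple[str, str]]) -> list[str]:
    """Return node ids in breadth-first order, starting at `start`."""
    adjacency: dict[str, list[str]] = {}
    for source, target in edges:
        adjacency.setdefault(source, []).append(target)

    order = [start]
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in adjacency.get(node, []):
            if nxt not in order:
                order.append(nxt)
                queue.append(nxt)
    return order


order = reachable_from("start", EDGES)
unreachable = [node for node in NODES if node not in order]

print("execution order:", " -> ".join(order))
print("unreachable nodes:", unreachable)
```

Every node is reachable and both answer nodes sit downstream of `tts_tool`, so the graph itself is connected as intended.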

With this setup, normal conversation works, but there is no voice output. Could someone help me figure out where the problem lies? Thank you!
