Azure免费语音服务创建教程

释放双眼，带上耳机，听听看~！

本教程介绍如何在Azure平台上创建免费语音服务资源，包括订阅选择、资源创建、秘钥管理以及本地接入配置步骤。

本文正在参加learn.microsoft.com/zh-cn/azure…

在平台上创建免费订阅服务：azure.microsoft.com/zh-cn/free/…

免费订阅成功后，进入资源创建环节，这里我们访问网址，创建免费的语音资源：portal.azure.com/#create/Mic…

Azure免费语音服务创建教程

这里注意订阅选择免费试用，使用区域选择东亚，如果在国外可以选择国外的对应区域。

创建语音服务资源成功后，转到资源组列表，点击获取资源秘钥：

Azure免费语音服务创建教程

需要注意的是，任何时候都不要将秘钥进行传播，或者将秘钥写入代码并且提交版本。

这里相对稳妥的方式是将秘钥写入本地系统的环境变量中。

Windows系统使用如下命令：

setx COGNITIVE_SERVICE_KEY 您的秘钥

Linux系统使用如下命令：

export COGNITIVE_SERVICE_KEY=您的秘钥

Mac系统的bash终端：

编辑 ~/.bash_profile，然后添加环境变量

export COGNITIVE_SERVICE_KEY=您的秘钥

添加环境变量后，请从控制台窗口运行 source ~/.bash_profile，使更改生效。

Mac系统的zsh终端：

编辑 ~/.zshrc，然后添加环境变量

export COGNITIVE_SERVICE_KEY=您的秘钥

如此，前期准备工作就完成了。

本地接入

确保本地Python环境版本3.10以上，然后安装Azure平台sdk:

pip3 install azure-cognitiveservices-speech

创建test.py文件：

`import azure.cognitiveservices.speech as speechsdk  
import os  
  
speech_config = speechsdk.SpeechConfig(subscription=os.environ.get('KEY'), region="eastasia")``audio_config = speechsdk.audio.AudioOutputConfig(use_default_speaker=True)`

这里定义语音的配置文件，通过os模块将上文环境变量中的秘钥取出使用，region就是新建语音资源时选择的地区，audio_config是选择当前计算机默认的音箱进行输出操作。

接着，根据官方文档的配置，选择一个语音机器人：learn.microsoft.com/zh-cn/azure…

  
纯文本	wuu-CN-XiaotongNeural1（女）  
wuu-CN-YunzheNeural1（男）	不支持  
yue-CN	中文（粤语，简体）	yue-CN	纯文本	yue-CN-XiaoMinNeural1（女）  
yue-CN-YunSongNeural1（男）	不支持  
zh-CN	中文（普通话，简体）	zh-CN	音频 + 人工标记的脚本  
  
纯文本  
  
结构化文本  
  
短语列表	zh-CN-XiaochenNeural4、5、6（女）  
zh-CN-XiaohanNeural2、4、5、6（女）  
zh-CN-XiaomengNeural1、2、4、5、6（女）  
zh-CN-XiaomoNeural2、3、4、5、6（女）  
zh-CN-XiaoqiuNeural4、5、6（女）  
zh-CN-XiaoruiNeural2、4、5、6（女）  
zh-CN-XiaoshuangNeural2、4、5、6、8（女）  
zh-CN-XiaoxiaoNeural2、4、5、6（女）  
zh-CN-XiaoxuanNeural2、3、4、5、6（女）  
zh-CN-XiaoyanNeural4、5、6（女）  
zh-CN-XiaoyiNeural1、2、4、5、6（女）  
zh-CN-XiaoyouNeural4、5、6、8（女）  
zh-CN-XiaozhenNeural1、2、4、5、6（女）  
zh-CN-YunfengNeural1、2、4、5、6（男）  
zh-CN-YunhaoNeural1、2、4、5、6（男）  
zh-CN-YunjianNeural1、2、4、5、6（男）  
zh-CN-YunxiaNeural1、2、4、5、6（男）  
zh-CN-YunxiNeural2、3、4、5、6（男）  
zh-CN-YunyangNeural2、4、5、6（男）  
zh-CN-YunyeNeural2、3、4、5、6（男）  
zh-CN-YunzeNeural1、2、3、4、5、6（男）	神经网络定制声音专业版  
  
神经网络定制声音精简版（预览版）  
  
跨语言语音（预览版）  
zh-CN-henan	中文（中原河南普通话，中国大陆）	不支持	不支持	zh-CN-henan-YundengNeural1（男）	不支持  
zh-CN-liaoning	中文（东北普通话，中国大陆）	不支持	不支持	zh-CN-liaoning-XiaobeiNeural1（女）	不支持  
zh-CN-shaanxi	中文（中原陕西普通话，中国大陆）	不支持	不支持	zh-CN-shaanxi-XiaoniNeural1（女）	不支持  
zh-CN-shandong	中文（冀鲁普通话，中国大陆）	不支持	不支持	zh-CN-shandong-YunxiangNeural1（男）	不支持  
zh-CN-sichuan	中文（西南普通话，简体）	zh-CN-sichuan	纯文本	zh-CN-sichuan-YunxiNeural1（男）	不支持  
zh-HK	中文（粤语，繁体）	zh-HK	纯文本	zh-HK-HiuGaaiNeural4、5、6（女）  
zh-HK-HiuMaanNeural4、5、6（女）  
zh-HK-WanLungNeural1、4、5、6（男）	神经网络定制声音专业版  
zh-TW	中文(台湾普通话)	zh-TW	纯文本	zh-TW-HsiaoChenNeural4、5、6（女）  
zh-TW-HsiaoYuNeural4、5、6（女）  
zh-TW-YunJheNeural4、5、6（男）	神经网络定制声音专业版

单以中文语音论，可选择的范围还是相当广泛的。

继续编辑代码：

import azure.cognitiveservices.speech as speechsdk  
import os  
  
speech_config = speechsdk.SpeechConfig(subscription=os.environ.get('KEY'), region="eastasia")  
audio_config = speechsdk.audio.AudioOutputConfig(use_default_speaker=True)  
  
speech_config.speech_synthesis_voice_name='zh-CN-XiaomoNeural'  
  
speech_synthesizer = speechsdk.SpeechSynthesizer(speech_config=speech_config, audio_config=audio_config)  
  
text = "hello 大家好，这里是人工智能AI机器人在说话"  
  
speech_synthesis_result = speech_synthesizer.speak_text_async(text).get()

这里我们选择zh-CN-XiaomoNeural作为默认AI语音，并且将text文本变量中的内容通过音箱进行输出。

如果愿意，我们也可以将语音输出为实体文件进行存储：



import azure.cognitiveservices.speech as speechsdk  
import os  
  
speech_config = speechsdk.SpeechConfig(subscription=os.environ.get('KEY'), region="eastasia")  
audio_config = speechsdk.audio.AudioOutputConfig(use_default_speaker=True)

file_config = speechsdk.audio.AudioOutputConfig(filename="./output.wav")  
  
  
speech_config.speech_synthesis_voice_name='zh-CN-XiaomoNeural'  
  
speech_synthesizer = speechsdk.SpeechSynthesizer(speech_config=speech_config, audio_config=file_config)  
  
text = "hello 大家好，这里是人工智能AI机器人在说话"  
  
speech_synthesis_result = speech_synthesizer.speak_text_async(text).get()

这里指定file_config配置为脚本相对路径下的output.wav文件：

ls  
output.wav

如此，音频文件就可以被保存起来，留作以后使用了。

语音调优

默认AI语音听多了，难免会有些索然寡味之感，幸运的是，Azure平台提供了语音合成标记语言 (SSML) ，它可以改善合成语音的听感。

根据Azure官方文档：learn.microsoft.com/zh-cn/azure…

通过调整语音的角色以及样式来获取定制化的声音：

语音	样式	角色  
en-GB-RyanNeural1	cheerful, chat	不支持  
en-GB-SoniaNeural1	cheerful, sad	不支持  
en-US-AriaNeural	chat, customerservice, narration-professional, newscast-casual, newscast-formal, cheerful, empathetic, angry, sad, excited, friendly, terrified, shouting, unfriendly, whispering, hopeful	不支持  
en-US-DavisNeural	chat, angry, cheerful, excited, friendly, hopeful, sad, shouting, terrified, unfriendly, whispering	不支持  
en-US-GuyNeural	newscast, angry, cheerful, sad, excited, friendly, terrified, shouting, unfriendly, whispering, hopeful	不支持  
en-US-JaneNeural	angry, cheerful, excited, friendly, hopeful, sad, shouting, terrified, unfriendly, whispering	不支持  
en-US-JasonNeural	angry, cheerful, excited, friendly, hopeful, sad, shouting, terrified, unfriendly, whispering	不支持  
en-US-JennyNeural	assistant, chat, customerservice, newscast, angry, cheerful, sad, excited, friendly, terrified, shouting, unfriendly, whispering, hopeful	不支持  
en-US-NancyNeural	angry, cheerful, excited, friendly, hopeful, sad, shouting, terrified, unfriendly, whispering	不支持  
en-US-SaraNeural	angry, cheerful, excited, friendly, hopeful, sad, shouting, terrified, unfriendly, whispering	不支持  
en-US-TonyNeural	angry, cheerful, excited, friendly, hopeful, sad, shouting, terrified, unfriendly, whispering	不支持  
es-MX-JorgeNeural1	cheerful, chat	不支持  
fr-FR-DeniseNeural1	cheerful, sad	不支持  
fr-FR-HenriNeural1	cheerful, sad	不支持  
it-IT-IsabellaNeural1	cheerful, chat	不支持  
ja-JP-NanamiNeural	chat, customerservice, cheerful	不支持  
pt-BR-FranciscaNeural	calm	不支持  
zh-CN-XiaohanNeural5	calm, fearful, cheerful, disgruntled, serious, angry, sad, gentle, affectionate, embarrassed	不支持  
zh-CN-XiaomengNeural1、5	chat	不支持  
zh-CN-XiaomoNeural5	embarrassed, calm, fearful, cheerful, disgruntled, serious, angry, sad, depressed, affectionate, gentle, envious	YoungAdultFemale, YoungAdultMale, OlderAdultFemale, OlderAdultMale, SeniorFemale, SeniorMale, Girl, Boy  
zh-CN-XiaoruiNeural5	calm, fearful, angry, sad	不支持  
zh-CN-XiaoshuangNeural5	chat	不支持  
zh-CN-XiaoxiaoNeural5	assistant, chat, customerservice, newscast, affectionate, angry, calm, cheerful, disgruntled, fearful, gentle, lyrical, sad, serious, poetry-reading	不支持  
zh-CN-XiaoxuanNeural5	calm, fearful, cheerful, disgruntled, serious, angry, gentle, depressed	YoungAdultFemale, YoungAdultMale, OlderAdultFemale, OlderAdultMale, SeniorFemale, SeniorMale, Girl, Boy  
zh-CN-XiaoyiNeural1、5	angry, disgruntled, affectionate, cheerful, fearful, sad, embarrassed, serious, gentle	不支持  
zh-CN-XiaozhenNeural1、5	angry, disgruntled, cheerful, fearful, sad, serious	不支持  
zh-CN-YunfengNeural1、5	angry, disgruntled, cheerful, fearful, sad, serious, depressed	不支持  
zh-CN-YunhaoNeural1、2、5	advertisement-upbeat	不支持  
zh-CN-YunjianNeural1、3、4、5	Narration-relaxed, Sports_commentary, Sports_commentary_excited	不支持  
zh-CN-YunxiaNeural1、5	calm, fearful, cheerful, angry, sad	不支持  
zh-CN-YunxiNeural5	narration-relaxed, embarrassed, fearful, cheerful, disgruntled, serious, angry, sad, depressed, chat, assistant, newscast	Narrator, YoungAdultMale, Boy  
zh-CN-YunyangNeural5	customerservice, narration-professional, newscast-casual	不支持  
zh-CN-YunyeNeural5	embarrassed, calm, fearful, cheerful, disgruntled, serious, angry, sad	YoungAdultFemale, YoungAdultMale, OlderAdultFemale, OlderAdultMale, SeniorFemale, SeniorMale, Girl, Boy  
zh-CN-YunzeNeural1、5	calm, fearful, cheerful, disgruntled, serious, angry, sad, depressed, documentary-narration	OlderAdultMale, SeniorMale

这里将语音文本改造为SSML的配置格式：

import os  
import azure.cognitiveservices.speech as speechsdk

  
speech_config = speechsdk.SpeechConfig(subscription=os.environ.get('KEY'), region="eastasia")  
audio_config = speechsdk.audio.AudioOutputConfig(use_default_speaker=True)

file_config = speechsdk.audio.AudioOutputConfig(filename="./output.wav")  
  
  
speech_config.speech_synthesis_voice_name='zh-CN-XiaomoNeural'  
  
speech_synthesizer = speechsdk.SpeechSynthesizer(speech_config=speech_config, audio_config=file_config)  
  
#text = "hello 大家好，这里是人工智能AI机器人在说话"  
  
#speech_synthesis_result = speech_synthesizer.speak_text_async(text).get()  
  
text = """  
    <speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xmlns:mstts="https://www.w3.org/2001/mstts" xml:lang="zh-CN">  
        <voice name="zh-CN-XiaoxiaoNeural">  
            <mstts:express-as style="lyrical"  role="YoungAdultFemale" >  
            <prosody rate="+12.00%">  
                hello 大家好，这里是刘悦的技术博客  
                大江东去，浪淘尽，千古风流人物。  
故垒西边，人道是，三国周郎赤壁。  
乱石穿空，惊涛拍岸，卷起千堆雪。  
江山如画，一时多少豪杰。  
</prosody>  
            </mstts:express-as>  
        </voice>  
    </speak>"""   
  
result = speech_synthesizer.speak_ssml_async(ssml=text).get()

通过使用style和role标记进行定制，同时使用rate属性来提升百分之十二的语速，从而让AI语音更加连贯顺畅。注意这里使用ssml=text来声明ssml格式的文本。

结语

人工智能AI语音系统完成了人工智能在语音合成这个细分市场的落地应用，为互联网领域内许多需要配音的业务节约了成本和时间。

本网站的内容主要来自互联网上的各种资源，仅供参考和信息分享之用，不代表本网站拥有相关版权或知识产权。如您认为内容侵犯您的权益，请联系我们，我们将尽快采取行动，包括删除或更正。

{{userData.name}}已认证

Azure免费语音服务创建教程

本地接入

语音调优

结语

生物识别技术：从指纹到面部识别

基于YOLOv8模型的烟头目标检测系统

GeoSpy.ai

Globe Explorer

即梦Dreamina

Luma Dream Machine

Motionshop

StoryDiffusion

归档

{{userData.name}}已认证

本地接入

语音调优

结语

生物识别技术：从指纹到面部识别

基于YOLOv8模型的烟头目标检测系统

Mojo编程语言：AI开发者的新选择

深度学习入门系列：一文看懂识别手写数字问题（MNIST数据集）

Python编写AI辅助瞄准系统开发与实战（一）

Excel变天！微软把Python「塞」进去了，直接可搞机器学习