tts-researcher
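Provides a systematic workflow for TTS model research, architecture adaptation, and evaluation. It covers literature review, resource gathering, locating community implementations, training preparation, and evaluation with objective and subjective metrics, helping users carry out speech synthesis research and deployment efficiently.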
Trigger Scenarios
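Researching a TTS architecture or paradigm
Adapting a TTS model to a new language
Finding official or community TTS code repositories
Planning a TTS training and evaluation strategy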
Install
npx skills add NeverSight/learn-skills.dev --skill tts-researcher -g -y
SKILL.md
Frontmatter
{
"name": "tts-researcher",
"description": "Methodical research and implementation workflow for Text-to-Speech (TTS) models. Use when gathering academic resources, locating official and community repos, adapting architectures to new languages, training, or evaluating TTS systems."
}
TTS Researcher
A specialized workflow for gathering resources, adapting architectures, and evaluating Text-to-Speech AI models.
Methodical Research Workflow
When tasked with researching a TTS architecture or language adaptation, follow this sequence:
1. Architectural Deep-Dive
- Identify Core Papers: Find the foundational paper(s), any follow-ups, and the official repository.
- Analyze Model Paradigm: Determine if it is Autoregressive (AR), Non-Autoregressive (NAR), Diffusion-based, or Flow-matching.
- Reference: See references/tts-landscape.md for a guide on major paradigms.
2. Resource Gathering (Academic + Implementation)
- Primary Sources: Collect papers, surveys, and benchmark reports that define the architecture and its evaluation.
- Official Repos: Locate the original or organization-owned GitHub repo and note license, last update, and reproducibility status.
- Community Repos: Search for forks and third-party implementations, especially ones adapted to new languages.
- Report Findings: Provide a short list of the most credible and relevant sources before moving on.
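One way to keep the gathered sources comparable is to record each one in a small structure and sort before reporting. The sketch below is illustrative only: the `SourceRecord` schema, the `shortlist` helper, and the ranking rule (papers and official repos first, then reproduced results, then recency) are assumptions, not part of the skill.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class SourceRecord:
    """One paper or repository found during resource gathering (hypothetical schema)."""
    title: str
    kind: str  # "paper", "official_repo", or "community_repo"
    url: str
    license: Optional[str] = None       # None if the repo states no license
    last_update: Optional[date] = None  # None if unknown
    reproduced: bool = False            # True if results were reported as reproduced


def shortlist(records, today, unknown_age_days=730, limit=5):
    """Rank sources: papers/official repos first, then reproduced, then most recent."""
    kind_rank = {"paper": 0, "official_repo": 0, "community_repo": 1}

    def sort_key(record):
        if record.last_update is None:
            age = unknown_age_days  # assumed penalty for an unknown update date
        else:
            age = (today - record.last_update).days
        return (kind_rank.get(record.kind, 2), not record.reproduced, age)

    return sorted(records, key=sort_key)[:limit]
```

The ranking weights are a judgment call; a project that values active maintenance over provenance could swap the first and last key components.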
3. Community & Adaptation Research
- Locate Custom Implementations: Search GitHub for forks or unofficial implementations, specifically those targeting different languages (e.g., "MeloTTS-German", "VITS-Spanish").
- Technical Hurdles: Note any language-specific challenges mentioned in community discussions (e.g., pitch accent, tonal languages, specific phonemizers).
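The fork and language search can be made repeatable by building the query programmatically. This minimal sketch only constructs a URL for GitHub's repository search endpoint and performs no network request; the helper name `build_search_url` and the choice of qualifiers (`in:name,description`, `fork:true`, sorted by last update) are assumptions about what a useful query looks like.

```python
from urllib.parse import urlencode

GITHUB_SEARCH = "https://api.github.com/search/repositories"


def build_search_url(model, language, include_forks=True, per_page=10):
    """Build a repository search URL for a model name plus a target language."""
    terms = [model, language, "in:name,description"]
    if include_forks:
        # Forks are excluded from search results unless this qualifier is present.
        terms.append("fork:true")
    params = {
        "q": " ".join(terms),
        "sort": "updated",  # most recently updated repositories first
        "order": "desc",
        "per_page": per_page,
    }
    return f"{GITHUB_SEARCH}?{urlencode(params)}"


if __name__ == "__main__":
    print(build_search_url("VITS", "Spanish"))
```

Unauthenticated requests to this endpoint are rate limited, so an agent that runs many searches would likely need to authenticate and cache results.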
4. Training & Adaptation Requirements
- Data Preparation: Identify the required dataset format (e.g., LJSpeech, LibriTTS) and transcription needs.
- Phonemization: Verify the availability of phonemizers (e.g., espeak-ng, gruut) for the target language.
- Hardware: Estimate VRAM requirements for training and inference.
- Reference: Use references/adaptation-checklist.md to verify readiness.
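The readiness checks above can be sketched as a few small helpers. Everything here is an assumption layered on the checklist: the function names are hypothetical, the metadata parser expects the LJSpeech layout of `id|raw text|normalized text` per line, the phonemizer check only looks for an `espeak-ng` binary on `PATH` (gruut is a Python package and would need an import check instead), and the VRAM figure is a rough lower bound for fp32 training with Adam that ignores activations.

```python
import shutil
from pathlib import Path


def parse_ljspeech_metadata(text):
    """Parse LJSpeech-style rows; return (valid rows, line numbers of bad rows)."""
    rows, errors = [], []
    for lineno, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # skip blank lines
        fields = line.split("|")
        if len(fields) != 3 or not fields[0] or not fields[2].strip():
            errors.append(lineno)
            continue
        rows.append({"id": fields[0], "text": fields[1], "normalized": fields[2]})
    return rows, errors


def missing_wavs(rows, wav_dir):
    """Return utterance ids that have no matching <id>.wav file in wav_dir."""
    wav_dir = Path(wav_dir)
    return [r["id"] for r in rows if not (wav_dir / f"{r['id']}.wav").is_file()]


def phonemizer_backend_available(binary="espeak-ng"):
    """True if the phonemizer binary is found on PATH."""
    return shutil.which(binary) is not None


def estimate_training_vram_gib(num_params, bytes_per_param=16):
    """Lower bound in GiB: fp32 weights (4) + gradients (4) + Adam moments (8)."""
    return num_params * bytes_per_param / 2**30


if __name__ == "__main__":
    # Made-up sample rows, not taken from any real dataset.
    sample = "utt_0001|Dr. Smith paid $5.|Doctor Smith paid five dollars.\nbroken row\n"
    print(parse_ljspeech_metadata(sample))
    print(phonemizer_backend_available())
```

A binary on `PATH` does not guarantee a voice for the target language, so the language itself still needs to be confirmed against the phonemizer's supported list.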
5. Evaluation Strategy
- Objective Metrics: Plan for MCD (Mel Cepstral Distortion), WER (Word Error Rate), and F0 Correlation.
- Subjective Metrics: Design a MOS (Mean Opinion Score) test or MUSHRA test if applicable.
- Reference: See references/evaluation-guide.md for metric definitions and tools.
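For orientation, the metrics named above can be written in a few lines of standard-library Python. This is a sketch under stated assumptions, not the tooling from the evaluation guide: the function names are hypothetical, MCD expects frames that are already time-aligned (for example by DTW) with the energy coefficient removed, F0 correlation is Pearson correlation over frames voiced in both signals, and the MOS interval uses a normal approximation.

```python
import math


def word_error_rate(reference, hypothesis):
    """WER = word-level edit distance / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(prev[j] + 1,               # deletion
                            curr[j - 1] + 1,           # insertion
                            prev[j - 1] + (r != h)))   # substitution or match
        prev = curr
    return prev[-1] / len(ref)


def mel_cepstral_distortion(ref_frames, syn_frames):
    """Mean MCD in dB over aligned frames of mel-cepstral coefficients."""
    if not ref_frames or len(ref_frames) != len(syn_frames):
        raise ValueError("frames must be non-empty and already aligned")
    scale = 10.0 / math.log(10.0) * math.sqrt(2.0)
    total = 0.0
    for ref, syn in zip(ref_frames, syn_frames):
        total += scale * math.sqrt(sum((a - b) ** 2 for a, b in zip(ref, syn)))
    return total / len(ref_frames)


def f0_correlation(ref_f0, syn_f0):
    """Pearson correlation of F0 over frames where both signals are voiced (F0 > 0)."""
    pairs = [(a, b) for a, b in zip(ref_f0, syn_f0) if a > 0 and b > 0]
    if len(pairs) < 2:
        raise ValueError("need at least two frames voiced in both signals")
    mean_a = sum(a for a, _ in pairs) / len(pairs)
    mean_b = sum(b for _, b in pairs) / len(pairs)
    cov = sum((a - mean_a) * (b - mean_b) for a, b in pairs)
    var_a = sum((a - mean_a) ** 2 for a, _ in pairs)
    var_b = sum((b - mean_b) ** 2 for _, b in pairs)
    if var_a == 0 or var_b == 0:
        raise ValueError("F0 contour is constant; correlation is undefined")
    return cov / math.sqrt(var_a * var_b)


def mos_with_ci(ratings, z=1.96):
    """Mean opinion score and half-width of an approximate 95% confidence interval."""
    n = len(ratings)
    if n < 2:
        raise ValueError("need at least two ratings")
    mean = sum(ratings) / n
    var = sum((x - mean) ** 2 for x in ratings) / (n - 1)
    return mean, z * math.sqrt(var / n)
```

In practice WER is computed on an ASR transcript of the synthesized audio, so the score also reflects the recognizer's own errors; text normalization (case, punctuation) should be applied to both strings before scoring.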
Recommended Tools
- google_web_search: For finding recent papers and repos.
- paper_search: For deep academic research on Hugging Face Hub.
- hub_repo_search: For finding models and datasets on Hugging Face.
- query-docs: For technical implementation details of libraries (e.g., torch, transformers).
Version History
- e0220ca Current 2026-07-05 23:31


