Detecting Speech and Music in Audio Content

Iroro Orife, Chih-Wei Wu and Yun-Ning (Amy) Hung

Introduction

When you enjoy the latest season of Stranger Things or La Casa de Papel (Money Heist), have you ever wondered about the secrets to fantastic storytelling, besides the stunning visual presentation? From the violin melody accompanying a pivotal scene to the soaring orchestral arrangement and thunderous sound effects propelling an edge-of-your-seat action sequence, the various components of the audio soundtrack combine to evoke the very essence of storytelling. To uncover the magic of audio soundtracks and further improve the sonic experience, we need a way to systematically examine the interaction of these components, typically categorized as dialogue, music and effects.

In this blog post, we introduce speech and music detection as an enabling technology for a variety of audio applications in Film & TV, and present our speech and music activity detection (SMAD) system, which we recently published as a journal article in the EURASIP Journal on Audio, Speech, and Music Processing.

Like semantic segmentation for audio, SMAD separately tracks the amount of speech and music in each frame of an audio file and is useful in content understanding tasks during the audio production and delivery lifecycle. The detailed temporal metadata SMAD provides about speech and music regions in a polyphonic audio mixture is a first step toward structural audio segmentation, indexing and pre-processing audio for the downstream tasks that follow. Let’s have a look at a few applications.
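To make the idea of frame-level temporal metadata a bit more concrete, here is a minimal sketch of how per-frame speech and music activity scores could be turned into time regions. It is an illustration only, not the SMAD system from the journal article: the function name `frames_to_segments`, the 0.5 threshold, the 0.5-second frame length and the score values are all assumptions made up for this example.

```python
from typing import List, Sequence, Tuple


def frames_to_segments(
    activity: Sequence[float],
    threshold: float = 0.5,       # assumed decision threshold, not from the source
    frame_seconds: float = 0.5,   # assumed frame duration, not from the source
) -> List[Tuple[float, float]]:
    """Group consecutive active frames into (start, end) regions in seconds."""
    segments: List[Tuple[float, float]] = []
    start = None  # index of the first frame of the region currently open
    for i, score in enumerate(activity):
        active = score >= threshold
        if active and start is None:
            start = i  # a new region begins at this frame
        elif not active and start is not None:
            segments.append((start * frame_seconds, i * frame_seconds))
            start = None  # the region ended on the previous frame
    if start is not None:
        # The file ended while a region was still open.
        segments.append((start * frame_seconds, len(activity) * frame_seconds))
    return segments


# Hypothetical per-frame scores. Speech and music are tracked separately,
# so their regions are allowed to overlap (e.g. dialogue over a score).
speech_scores = [0.9, 0.8, 0.1, 0.2, 0.7, 0.9]
music_scores = [0.1, 0.6, 0.7, 0.8, 0.9, 0.2]

speech_regions = frames_to_segments(speech_scores)
music_regions = frames_to_segments(music_scores)
print("speech:", speech_regions)
print("music:", music_regions)
```

Because the two classes are thresholded independently, a single frame can belong to both a speech region and a music region, which is what distinguishes this kind of activity detection from a one-label-per-frame classifier.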

Practical use cases for speech & musi...
