音频处理 | Pika’s Homepage

主要涉及到的库/包/模块：

wave：二进制音频流的I/O，方便处理音频头，bytes表示音频数据；
pyaudio：音频的播放和记录，bytes 表示音频数据；
pydub：音频的编解码、处理，支持播放接口；

其他音频库简介：

音频I/O:

wavio：读写WAV音频文件到内存，numpy.array数据格式表示音频；
soundfile: 读写多种（libsndfile库）音频文件到内存，numpy.array数据格式；

音频播放/记录：

playsound：纯python，无依赖，纯净的音频播放库，直接播放WAV和MP3；
simpleaudio: 音频播放，支持WAV文件、bytes 和numpy.array 的播放；

import simpleaudio as sa

filename = 'myfile.wav'
wave_obj = sa.WaveObject.from_wave_file(filename)
play_obj = wave_obj.play()
play_obj.wait_done()  # Wait until sound has finished playing

sounddevice: 音频播放和记录（基于PortAudio库），读取音频成numpy.array 然后播放，或记录采样点到numpy.array 形式；

库介绍

wave

该模块仅支持WAVE_FORMAT_PCM文件的I/O。

wave.open(file, mode==None)

file可以是字符串（文件名），也可以是file-like对象，不同于built-in的open函数。

仅支持‘rb’和‘wb’模式，分别返回Wave_read对象或Wave_write对象。

wave.Wave_read对象

用于读取WAV文件的对象。

close(): 关闭文件流；

getnchannels(): 返回音频通道；

getsampwidth(): 返回量化深度；

getnframes(): 返回音频帧数（采样点总数）；

getparams(): 返回6元音频信息的命名数组，即(nchannels, sampwidth, framerate, nframes, comptype, compname)。

readframes(n): 返回最多n帧数据，n个采样点，bytes 类型对象，n为空时返回全部；

rewind(): 倒带到开头；

tell(): 返回当前位置；

setpos(pos): 设置文件指针到指定位置；

wave.Wave_write对象

用于写入WAV文件的对象。支持raw PCM和带头的WAV。

set*(n): 单独设置音频头信息，如setnchannels(n)，setsampwidth(n)等等；

setparams(tuple): 一次设置五个音频头信息，5元数组，注意params数值都是str或整数；

writeframesraw(data): 将bytes对象写入输出流，同时不会更新头信息nframes；

writeframes(data)：写入输出流，同时自动更新头信息nframes；

注意设置音频头信息必须要在写入数据之前，否则引发异常。

pyaudio

PortAudio库的python封装，用于音频的播放和记录。

可以用wave读取/保存音频，然后用pyaudio播放/记录音频。

创建接口

# Create an interface to PortAudio
p = pyaudio.PyAudio()

需要创建一个音频流用于播放或记录

# Open a .Stream object to write the WAV file to
# 'output = True' indicates that the sound will be played rather than recorded
stream = p.open(format = p.get_format_from_width(wf.getsampwidth()),
                channels = wf.getnchannels(),
                rate = wf.getframerate(),
                output = True)

播放音频

# Play the sound by writing the audio data to the stream
while data != b'':
    stream.write(data)
    data = wf.readframes(chunk)

记录音频，同样先创建音频流

# 'output = False' indicates that the sound will be recorded rather than played
chunk = 1024  # Record in chunks of 1024 samples
sample_format = pyaudio.paInt16  # 16 bits per sample
channels = 2
fs = 44100  # Record at 44100 samples per second
stream = p.open(format=sample_format,
                channels=channels,
                rate=fs,
                frames_per_buffer=chunk,
                input=True)

然后，录制指定时长

frames = []  # Initialize array to store frames
# Store data in chunks for 3 seconds
for i in range(0, int(fs / chunk * seconds)):
    data = stream.read(chunk)
    frames.append(data)

关闭接口

# Close and terminate the stream
stream.close()
p.terminate()

pydub

基于ffmpeg实现，支持各种音频格式编解码转换和操作音频。

也支持playback，需要安装其他库，如simpleaudio或pyaudio。

读取/保存各种格式：

# 读取
song = AudioSegment.from_wav("never_gonna_give_you_up.wav")
song = AudioSegment.from_mp3("never_gonna_give_you_up.mp3")
# 保存
awesome.export("mashup.mp3", format="mp3")

ffmpeg支持的各种参数，pydub也支持。

# Use preset mp3 quality 0 (equivalent to lame V0)
awesome.export("mashup.mp3", format="mp3", parameters=["-q:a", "0"])
# Mix down to two channels and set hard output volume
awesome.export("mashup.mp3", format="mp3", parameters=["-ac", "2", "-vol", "150"])

读取各种格式的音频文件：AudioSegment(…).from_file()

保存各种格式的音频文件：AudioSegment(…).export()

支持音频操作，如：

# sound1 6 dB louder, then 3.5 dB quieter
louder = sound1 + 6
quieter = sound1 - 3.5

# sound1, with sound2 appended
combined = sound1 + sound2

# sound1 repeated 3 times
repeated = sound1 * 3

# duration
duration_in_milliseconds = len(sound1)

注意不同于采样点数量，音频长度用ms表示。

pydub.AudioSegment

不可变对象，用于操作音频。

除了读取音频文件生成，也可以从内存二进制数据生成：

# Advanced usage, if you have raw audio data:
sound = AudioSegment(
    # raw audio data (bytes)
    data=b'…',

    # 2 byte (16 bit) samples
    sample_width=2,

    # 44.1 kHz frame rate
    frame_rate=44100,

    # stereo
    channels=2
)

该构建方式可以和wave 集合，完成格式转换。

错误、异常和警告

Dataset, Sampler, DataLoader