Windows: Setting Up Faster Whisper

Overview

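Install Faster Whisper (https://github.com/guillaumekln/faster-whisper) and
confirm that it runs correctly (using v0.5.1, the latest version at the time of writing).
Faster Whisper is a reimplementation of OpenAI's Whisper and is reported to be up to 4 times faster than the original.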

Specifications of the PC used

CPU: Core i7 11800H
Memory: 16GB
GPU: GeForce RTX 3070
OS: Windows 11 Home

Preparing to use the GPU

Install the software required to run Faster Whisper on the GPU.

Downloading CUDA Toolkit

The latest CUDA Toolkit release is the 12.x series, but
Faster Whisper does not support it (as of v0.5.1), so install the 11.x series.

https://developer.nvidia.com/cuda-11-8-0-download-archive

Download CUDA Toolkit 11.8 from the URL above.

Operating System: Windows
Architecture: x86_64
Version: 11
Installer Type: exe (network)

Select the options above and download the installer.

Installing CUDA Toolkit

Run the downloaded installer to install CUDA Toolkit 11.8.
(This takes a fair amount of time.)
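
As a quick check after the installer finishes, the following is a minimal sketch that reads the release number reported by nvcc. It assumes the installer has added nvcc to PATH and that a new terminal has been opened since; the function names are hypothetical helpers, not part of any NVIDIA tool.

```python
import re
import shutil
import subprocess


def parse_nvcc_release(output):
    """Return the CUDA release (e.g. "11.8") found in `nvcc --version` output, or None."""
    match = re.search(r"release\s+(\d+\.\d+)", output)
    return match.group(1) if match else None


def installed_cuda_release():
    """Run nvcc if it is on PATH and return its release string, or None if unavailable."""
    nvcc = shutil.which("nvcc")
    if nvcc is None:
        return None  # nvcc not found: the toolkit is missing or PATH is stale
    result = subprocess.run([nvcc, "--version"], capture_output=True, text=True, check=False)
    return parse_nvcc_release(result.stdout)
```

Calling print(installed_cuda_release()) should show a release in the 11.x series if the installation above succeeded; None suggests nvcc could not be found.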

Downloading Zlib

http://www.winimage.com/zLibDll/zlib123dllx64.zip

Download Zlib from the URL above.

Extracting Zlib

Extract the downloaded zip file and place zlibwapi.dll in a directory of your choice.
(Here, C:\Program Files\Zlib is used.)

Downloading cuDNN

Note: an NVIDIA Developer account is required.

https://developer.nvidia.com/cudnn

Download cuDNN from the URL above.
Be sure to download a build that supports the CUDA 11.x series.

Extracting cuDNN

Extract the downloaded zip file and place the following directories

bin
include
lib

in a directory of your choice. (Here, C:\Program Files\NVIDIA\CUDNN\v8.9 is used.)

Adding environment variables

control sysdm.cpl

Run the command above (for example, from the Run dialog or Command Prompt) to open the System Properties window.

Click the Advanced tab, then click the Environment Variables button.

Under System variables, select Path and click the Edit button.

Add the following entries to Path:
・C:\Program Files\Zlib
・C:\Program Files\NVIDIA\CUDNN\v8.9\bin
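
To confirm that the new Path entries are picked up, the sketch below searches a PATH-style string for the DLLs placed in the steps above. The file name cudnn64_8.dll is an assumption based on cuDNN 8.x builds; adjust it if your download uses a different name. The helper names are hypothetical.

```python
import os
from pathlib import Path

# DLL names are assumptions: zlibwapi.dll comes from the Zlib step,
# cudnn64_8.dll is the name expected for cuDNN 8.x builds.
REQUIRED_DLLS = ["zlibwapi.dll", "cudnn64_8.dll"]


def find_on_path(filename, path_value=None):
    """Return the first directory in a PATH-style string that contains `filename`, or None."""
    if path_value is None:
        path_value = os.environ.get("PATH", "")
    for entry in path_value.split(os.pathsep):
        if entry and (Path(entry) / filename).is_file():
            return entry
    return None


def report_missing(path_value=None):
    """Return the required DLLs that cannot be found on the given PATH."""
    return [name for name in REQUIRED_DLLS if find_on_path(name, path_value) is None]
```

Running print(report_missing()) in a newly opened terminal should print an empty list once both directories are on Path. Terminals opened before the change keep the old value.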

Creating the Faster Whisper runtime environment

Installing Python

Python 3.10.11 was used.

Download page for Python for Windows:
https://www.python.org/downloads/windows/

Creating the venv environment

mkdir C:\work\faster_whisper
cd C:\work\faster_whisper
py -3.10 -m venv venv

Installing the required Python packages

venv\Scripts\activate
pip install ctranslate2
pip install faster-whisper
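
Before loading a model, it can help to confirm that both packages are importable inside the venv and that CTranslate2 can see the GPU. This is a minimal sketch: package_installed and cuda_device_count are hypothetical helper names, while ctranslate2.get_cuda_device_count() is the CTranslate2 call that does the actual check.

```python
import importlib.util


def package_installed(name):
    """Return True if `name` can be imported in the current environment."""
    return importlib.util.find_spec(name) is not None


def cuda_device_count():
    """Return the number of CUDA devices CTranslate2 can see, or None if it is not installed."""
    if not package_installed("ctranslate2"):
        return None
    import ctranslate2  # imported lazily so the check also runs without the package

    return ctranslate2.get_cuda_device_count()


# Import names differ from the pip names: faster-whisper is imported as faster_whisper.
status = {name: package_installed(name) for name in ("ctranslate2", "faster_whisper")}
```

After the pip commands above, print(status) should show True for both names, and print(cuda_device_count()) should be at least 1 on a machine with a usable GPU.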

Verifying that Faster Whisper works

Test code

from faster_whisper import WhisperModel

model_size = "large-v2"

# Run on GPU with FP16
model = WhisperModel(model_size, device="cuda", compute_type="float16")

# or run on GPU with INT8
# model = WhisperModel(model_size, device="cuda", compute_type="int8_float16")
# or run on CPU with INT8
# model = WhisperModel(model_size, device="cpu", compute_type="int8")

segments, info = model.transcribe("stereo_diarization.wav", beam_size=5)

print("Detected language '%s' with probability %f" % (info.language, info.language_probability))

for segment in segments:
    print("[%.2fs -> %.2fs] %s" % (segment.start, segment.end, segment.text))

The audio transcribed here is a test file from the Faster Whisper GitHub repository.

https://github.com/guillaumekln/faster-whisper/raw/master/tests/data/stereo_diarization.wav
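
The file can also be fetched from Python rather than a browser. The sketch below uses only the standard library; fetch is a hypothetical helper name, and it skips the download when the file already exists.

```python
import urllib.request
from pathlib import Path

# URL of the test audio, as given above.
SAMPLE_URL = "https://github.com/guillaumekln/faster-whisper/raw/master/tests/data/stereo_diarization.wav"


def fetch(url, dest):
    """Download `url` to `dest` unless the file already exists; return the destination path."""
    dest = Path(dest)
    if not dest.exists():
        urllib.request.urlretrieve(url, str(dest))
    return dest
```

Calling fetch(SAMPLE_URL, "stereo_diarization.wav") from C:\work\faster_whisper places the file where the test code expects it.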

Output

Detected language 'en' with probability 0.999023
[0.00s -> 4.00s]  He began a confused complaint against the wizard, who had vanished behind the curtain
[4.00s -> 4.72s]  on the left.
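
For longer recordings, timestamps in seconds become hard to read. The following sketch reformats segments as HH:MM:SS.mmm lines. The Segment tuple is only a stand-in for the objects yielded by model.transcribe(), assuming nothing beyond the start, end, and text attributes used in the test code; format_timestamp and to_lines are hypothetical helpers.

```python
from collections import namedtuple

# Stand-in for the objects yielded by model.transcribe(); only the three
# attributes used in the test code are assumed.
Segment = namedtuple("Segment", ["start", "end", "text"])


def format_timestamp(seconds):
    """Format a time in seconds as HH:MM:SS.mmm."""
    millis = round(seconds * 1000)
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{millis:03d}"


def to_lines(segments):
    """Return one formatted line per segment, with surrounding whitespace stripped from the text."""
    return [
        f"[{format_timestamp(s.start)} -> {format_timestamp(s.end)}] {s.text.strip()}"
        for s in segments
    ]
```

The segments returned by model.transcribe() are produced lazily as they are iterated, so to_lines(segments) is where the transcription work would actually happen.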

References

Faster Whisper

https://github.com/guillaumekln/faster-whisper

cuDNN installation guide

https://docs.nvidia.com/deeplearning/cudnn/install-guide/index.html#install-windows
