dpo_reader.tts

TTS backends for DPO Reader.

class dpo_reader.tts.BarkBackend

Bases: TTSBackend

Bark TTS backend.

__init__()

get_voices()

Return list of available voice IDs.

Return type: list[str]

name: str = 'bark'

narrator_voice: str = 'v2/en_speaker_0'

sample_rate: int = 24000

synthesize(text, voice)

Synthesize text to audio.

Parameters:

text (str) – Text to synthesize
voice (str) – Voice ID to use

Returns:

Audio as float32 numpy array

Return type:

ndarray

class dpo_reader.tts.OpenAIBackend

Bases: TTSBackend

OpenAI TTS backend using their API.

Requires OPENAI_API_KEY environment variable.
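
Because the backend depends on an environment variable, a caller may want to fail early with a clear message before constructing it. The following is a minimal sketch of such a check; `require_openai_key` is a hypothetical helper, not part of dpo_reader, and how the backend itself reacts to a missing key is not documented here.

```python
import os


def require_openai_key(env=None):
    """Return the API key or raise a clear error (hypothetical helper)."""
    # Accept a mapping for testing; default to the real process environment.
    env = os.environ if env is None else env
    key = env.get("OPENAI_API_KEY", "").strip()
    if not key:
        raise RuntimeError("OPENAI_API_KEY is not set; OpenAIBackend requires it.")
    return key
```

Calling this once at startup keeps a configuration problem from surfacing only when the first post is synthesized.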

__init__(model='tts-1')

Initialize OpenAI TTS backend.

Parameters: model (str) – Model to use: “tts-1” (faster) or “tts-1-hd” (higher quality)

get_voices()

Return list of available voice IDs.

Return type: list[str]

name: str = 'openai'

narrator_voice: str = 'onyx'

sample_rate: int = 24000

synthesize(text, voice)

Synthesize text to audio using OpenAI TTS API.

Parameters:

text (str) – Text to synthesize
voice (str) – Voice ID to use (alloy, echo, fable, onyx, nova, shimmer)

Returns:

Audio as float32 numpy array

Return type:

ndarray
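
The voice IDs listed for `voice` can be checked on the caller's side before a request is made. This is only a sketch: `pick_openai_voice` is a hypothetical helper, and falling back to the documented narrator voice is an assumption about what a caller might want, not documented behavior of the backend.

```python
# Voice IDs as listed in the parameter description of synthesize().
OPENAI_VOICES = ("alloy", "echo", "fable", "onyx", "nova", "shimmer")
DEFAULT_VOICE = "onyx"  # the documented narrator_voice


def pick_openai_voice(requested):
    """Hypothetical helper: return a listed voice, else the narrator default."""
    voice = (requested or "").strip().lower()
    return voice if voice in OPENAI_VOICES else DEFAULT_VOICE
```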

class dpo_reader.tts.PiperBackend

Bases: TTSBackend

Piper TTS backend using piper-tts package.

__init__(model_dir=None)

Initialize Piper backend.

Parameters: model_dir (Path | None) – Directory to store/load models. Defaults to ~/.local/share/piper
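
The default described above can be reproduced with `pathlib`. This is a sketch of how such a default might be resolved; `resolve_model_dir` is a hypothetical name, and the backend's actual resolution logic is not shown in this reference.

```python
from pathlib import Path


def resolve_model_dir(model_dir=None):
    """Hypothetical helper mirroring the documented default location."""
    if model_dir is None:
        # Documented default: ~/.local/share/piper
        return Path.home() / ".local" / "share" / "piper"
    # Expand a leading "~" so user-supplied paths behave like the default.
    return Path(model_dir).expanduser()
```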

get_voices()

Return list of available voice IDs.

Return type: list[str]

name: str = 'piper'

narrator_voice: str = 'libritts'

sample_rate: int = 22050

synthesize(text, voice)

Synthesize text using Piper.

Parameters:

text (str) – Text to synthesize
voice (str) – Voice ID to use

Return type:

ndarray

class dpo_reader.tts.TTSBackend

Bases: ABC

Abstract base class for TTS backends.

generate_silence(duration_seconds)

Generate silence of specified duration.

Parameters: duration_seconds (float) – Duration of the silence, in seconds
Return type: ndarray
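
Silence of a given duration is presumably an all-zero array whose length is the duration multiplied by the sample rate. The sketch below illustrates that arithmetic; `make_silence` is a hypothetical stand-in, and both the float32 dtype and the rounding rule are assumptions carried over from the `synthesize` return description rather than confirmed details of `generate_silence`.

```python
import numpy as np


def make_silence(duration_seconds, sample_rate=24000):
    """Hypothetical stand-in: zeros covering duration_seconds of audio."""
    # Number of samples = seconds * samples per second, never negative.
    num_samples = max(0, int(round(duration_seconds * sample_rate)))
    return np.zeros(num_samples, dtype=np.float32)
```

With the 24000 Hz default, 1.5 seconds of silence is 36000 samples; at Piper's 22050 Hz, half a second is 11025 samples.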

abstractmethod get_voices()

Return list of available voice IDs.

Return type: list[str]

name: str = 'base'

narrator_voice: str = 'default'

sample_rate: int = 24000

abstractmethod synthesize(text, voice)

Synthesize text to audio.

Parameters:

text (str) – Text to synthesize
voice (str) – Voice ID to use

Returns:

Audio as float32 numpy array

Return type:

ndarray
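
A custom backend only needs to supply the two abstract methods and the class attributes. The sketch below shows the shape of such a subclass. So that it runs without dpo_reader installed, it defines `BackendStandIn`, a local stand-in that mirrors the documented interface; in real use one would subclass `dpo_reader.tts.TTSBackend` instead. `ToneBackend` and its voices are invented purely for illustration.

```python
from abc import ABC, abstractmethod

import numpy as np


class BackendStandIn(ABC):
    """Local stand-in mirroring the documented TTSBackend interface."""

    name = "base"
    narrator_voice = "default"
    sample_rate = 24000

    @abstractmethod
    def get_voices(self): ...

    @abstractmethod
    def synthesize(self, text, voice): ...


class ToneBackend(BackendStandIn):
    """Toy backend: a 50 ms tone per character, pitch chosen by voice."""

    name = "tone"
    narrator_voice = "low"
    _pitches = {"low": 220.0, "high": 440.0}

    def get_voices(self):
        return list(self._pitches)

    def synthesize(self, text, voice):
        freq = self._pitches[voice]
        # sample_rate // 20 samples is 50 ms of audio per character.
        n = (self.sample_rate // 20) * len(text)
        t = np.arange(n, dtype=np.float32) / self.sample_rate
        # Return float32 audio, matching the documented return value.
        return (0.2 * np.sin(2 * np.pi * freq * t)).astype(np.float32)
```

A toy backend like this can be handy for exercising caching and segment bookkeeping without loading a real model.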

class dpo_reader.tts.TTSGenerator

Bases: object

High-level TTS generator with caching and progress tracking.

__init__(backend, voice_assignment, cache_dir=None, include_attribution=True, pause_between_posts=1.5, narrator_voice=None)

Parameters:

backend (TTSBackend)
voice_assignment (VoiceAssignment)
cache_dir (Path | None)
include_attribution (bool)
pause_between_posts (float)
narrator_voice (str | None)

generate_all(posts, progress_callback=None, return_segments=False)

Generate audio for all posts.

Parameters:

posts (list[Post]) – List of posts to convert
progress_callback (Callable[..., Any] | None) – Optional callback(current, total, post)
return_segments (bool) – If True, return (audio, segments) where segments contains start/end sample positions for each post

Returns:

Audio array, or tuple of (audio, segments) if return_segments=True

Return type:

np.ndarray | tuple[np.ndarray, list[dict]]
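
Since segments hold sample positions, turning them into timestamps (for chapter markers, say) is a division by the sample rate. The sketch below assumes each segment dict exposes `"start"` and `"end"` keys; the actual key names are not given in this reference, so treat them as placeholders. `segments_to_seconds` is a hypothetical helper.

```python
def segments_to_seconds(segments, sample_rate=24000):
    """Convert sample offsets to (start_s, end_s) pairs.

    Assumes "start" and "end" keys; the real key names may differ.
    """
    return [
        (seg["start"] / sample_rate, seg["end"] / sample_rate)
        for seg in segments
    ]


# Made-up positions for two posts at 24000 Hz.
example = [{"start": 0, "end": 48000}, {"start": 84000, "end": 120000}]
timestamps = segments_to_seconds(example)
```

The sample rate passed in should match the backend that produced the audio, for example 22050 for Piper.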

generate_post(post)

Generate audio for a post, using cache if available.

Uses narrator voice for attribution (“Author says:”) and the author’s assigned voice for actual content.

Parameters: post (Post)
Returns: Tuple of (audio_array, attribution_samples), where attribution_samples is the number of samples used for the “Author says:” portion.
Return type: tuple[np.ndarray, int]
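
The sample count makes it possible to separate the spoken attribution from the post content. The sketch below assumes the attribution sits at the start of the returned array, which the description suggests but does not state outright. `split_attribution` is a hypothetical helper; it relies only on slicing, so it works on numpy arrays and plain sequences alike.

```python
def split_attribution(audio, attribution_samples):
    """Split generate_post-style output into (attribution, content).

    Assumes the attribution occupies the first attribution_samples samples.
    """
    if not 0 <= attribution_samples <= len(audio):
        raise ValueError("attribution_samples is out of range for this audio")
    return audio[:attribution_samples], audio[attribution_samples:]
```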

generate_streaming(posts, progress_callback=None)

Generate audio segments one at a time (yields as generated).

Yields:

Tuple of (audio_chunk, segment_info, post_index, total_posts)

Parameters:

posts (list[Post])
progress_callback (Callable[..., Any] | None)
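
A consumer of the streaming interface unpacks each yielded tuple as it arrives. The following sketch shows one way to drain such an iterator while reporting progress. `consume_stream` and `fake_stream` are hypothetical, the fake generator yields plain lists where the real one would yield numpy arrays, and treating `post_index` as zero-based is an assumption.

```python
def consume_stream(stream, on_chunk=print):
    """Drain a generate_streaming-style iterator, reporting each post."""
    total_samples = 0
    for audio_chunk, segment_info, post_index, total_posts in stream:
        total_samples += len(audio_chunk)
        # post_index is assumed to be zero-based, hence the + 1 for display.
        on_chunk(f"post {post_index + 1}/{total_posts}: {len(audio_chunk)} samples")
    return total_samples


def fake_stream():
    """Stand-in generator with the documented tuple shape."""
    yield [0.0] * 100, {}, 0, 2
    yield [0.0] * 250, {}, 1, 2
```

In a player, `on_chunk` could be replaced by a function that queues each chunk for playback as soon as it is produced.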

Modules

`bark`	Bark TTS backend - highest quality, GPU recommended for speed.
`base`	Base TTS backend interface.
`openai`	OpenAI TTS backend - high quality cloud voices.
`piper`	Piper TTS backend - fast, good quality, works on CPU.