dpo_reader.tts

TTS backends for DPO Reader.

class dpo_reader.tts.BarkBackend

Bases: TTSBackend

Bark TTS backend.

__init__()

get_voices()

Return list of available voice IDs.

Return type:

list[str]

name: str = 'bark'
narrator_voice: str = 'v2/en_speaker_0'
sample_rate: int = 24000

synthesize(text, voice)

Synthesize text to audio.

Parameters:
  • text (str) – Text to synthesize

  • voice (str) – Voice ID to use

Returns:

Audio as float32 numpy array

Return type:

ndarray

class dpo_reader.tts.OpenAIBackend

Bases: TTSBackend

OpenAI TTS backend using the OpenAI API.

Requires the OPENAI_API_KEY environment variable.

__init__(model='tts-1')

Initialize OpenAI TTS backend.

Parameters:

model (str) – Model to use: “tts-1” (faster) or “tts-1-hd” (higher quality).

get_voices()

Return list of available voice IDs.

Return type:

list[str]

name: str = 'openai'
narrator_voice: str = 'onyx'
sample_rate: int = 24000

synthesize(text, voice)

Synthesize text to audio using OpenAI TTS API.

Parameters:
  • text (str) – Text to synthesize

  • voice (str) – Voice ID to use (alloy, echo, fable, onyx, nova, shimmer)

Returns:

Audio as float32 numpy array

Return type:

ndarray
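OpenAI's speech endpoint can return audio with response_format="pcm": raw 24 kHz, 16-bit signed little-endian mono samples with no header, which matches the 24000 Hz sample_rate above. Whether OpenAIBackend actually requests that format is an assumption; the sketch below only shows how such bytes could be turned into the documented float32 array. The helper name pcm16_to_float32 is hypothetical.

```python
import numpy as np


def pcm16_to_float32(data: bytes) -> np.ndarray:
    """Convert raw 16-bit signed little-endian PCM bytes to float32 in [-1, 1)."""
    if len(data) % 2:
        raise ValueError("PCM16 data must contain an even number of bytes")
    # "<i2" = little-endian signed 16-bit integers, one per sample
    samples = np.frombuffer(data, dtype="<i2")
    # Scale by 2**15 so -32768 maps to -1.0 and 32767 to just under 1.0
    return samples.astype(np.float32) / 32768.0


# Three samples: silence, half scale, full negative scale
raw = np.array([0, 16384, -32768], dtype="<i2").tobytes()
print(pcm16_to_float32(raw).tolist())  # [0.0, 0.5, -1.0]
```

Dividing by 32768 rather than 32767 keeps the mapping symmetric around the integer range, so the most negative sample lands exactly on -1.0.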

class dpo_reader.tts.PiperBackend

Bases: TTSBackend

Piper TTS backend using the piper-tts package.

__init__(model_dir=None)

Initialize Piper backend.

Parameters:

model_dir (Path | None) – Directory to store and load models in. Defaults to ~/.local/share/piper.
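The documented default can be resolved with pathlib. This is a minimal sketch, not the backend's code: resolve_model_dir is a hypothetical helper, and expanding a user-supplied "~" is an added assumption.

```python
from __future__ import annotations

from pathlib import Path


def resolve_model_dir(model_dir: Path | None = None) -> Path:
    """Return the directory Piper voice models are stored in and loaded from."""
    if model_dir is None:
        # Documented default: ~/.local/share/piper
        return Path.home() / ".local" / "share" / "piper"
    # Assumption: a caller-supplied path may start with "~"
    return Path(model_dir).expanduser()


print(resolve_model_dir())
print(resolve_model_dir(Path("/tmp/voices")))
```

The sketch only computes the path; whether PiperBackend creates the directory or downloads models into it automatically is not stated above.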

get_voices()

Return list of available voice IDs.

Return type:

list[str]

name: str = 'piper'
narrator_voice: str = 'libritts'
sample_rate: int = 22050

synthesize(text, voice)

Synthesize text using Piper.

Parameters:
  • text (str)

  • voice (str)

Return type:

ndarray

class dpo_reader.tts.TTSBackend

Bases: ABC

Abstract base class for TTS backends.

generate_silence(duration_seconds)

Generate silence of specified duration.

Parameters:

duration_seconds (float)

Return type:

ndarray
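Silence is a run of zero-valued samples at the backend's sample_rate, so its length is duration_seconds × sample_rate. A minimal sketch, written as a free function that takes the rate explicitly (the method presumably uses self.sample_rate) and assuming rounding to the nearest whole sample, which the reference does not specify:

```python
import numpy as np


def generate_silence(duration_seconds: float, sample_rate: int) -> np.ndarray:
    """Return `duration_seconds` of silence as a float32 array."""
    if duration_seconds < 0:
        raise ValueError("duration_seconds must be non-negative")
    n_samples = round(duration_seconds * sample_rate)
    return np.zeros(n_samples, dtype=np.float32)


# A 1.5 s pause (the TTSGenerator default) at the Bark/OpenAI and Piper rates
for rate in (24000, 22050):
    print(rate, generate_silence(1.5, rate).shape)
# prints:
# 24000 (36000,)
# 22050 (33075,)
```

The same duration produces different sample counts per backend, which is why durations are expressed in seconds rather than samples.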

abstractmethod get_voices()

Return list of available voice IDs.

Return type:

list[str]

name: str = 'base'
narrator_voice: str = 'default'
sample_rate: int = 24000

abstractmethod synthesize(text, voice)

Synthesize text to audio.

Parameters:
  • text (str) – Text to synthesize

  • voice (str) – Voice ID to use

Returns:

Audio as float32 numpy array

Return type:

ndarray
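A new backend must implement get_voices() and synthesize() and should override the three class attributes. The sketch below restates the documented interface as a local stand-in (it is not the dpo_reader source) and adds a hypothetical ToneBackend that renders text as sine beeps, which can be handy for testing without a real TTS engine.

```python
from abc import ABC, abstractmethod

import numpy as np


class TTSBackend(ABC):
    """Stand-in restating the documented interface (not the dpo_reader source)."""

    name: str = "base"
    narrator_voice: str = "default"
    sample_rate: int = 24000

    @abstractmethod
    def get_voices(self) -> list[str]: ...

    @abstractmethod
    def synthesize(self, text: str, voice: str) -> np.ndarray: ...

    def generate_silence(self, duration_seconds: float) -> np.ndarray:
        return np.zeros(round(duration_seconds * self.sample_rate), dtype=np.float32)


class ToneBackend(TTSBackend):
    """Hypothetical test backend: renders each character as a short sine beep."""

    name = "tone"
    narrator_voice = "low"
    sample_rate = 16000
    _pitches = {"low": 220.0, "high": 440.0}  # voice ID -> frequency in Hz

    def get_voices(self) -> list[str]:
        return list(self._pitches)

    def synthesize(self, text: str, voice: str) -> np.ndarray:
        if voice not in self._pitches:
            raise ValueError(f"unknown voice {voice!r}")
        # 50 ms of tone per character of input text
        n = round(0.05 * self.sample_rate) * len(text)
        t = np.arange(n, dtype=np.float32) / self.sample_rate
        return (0.2 * np.sin(2 * np.pi * self._pitches[voice] * t)).astype(np.float32)


backend = ToneBackend()
print(backend.get_voices(), backend.synthesize("hi", "high").shape)
```

Declaring name, narrator_voice and sample_rate as class attributes mirrors how the concrete backends above override them; generate_silence is inherited and automatically uses the subclass's rate.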

class dpo_reader.tts.TTSGenerator

Bases: object

High-level TTS generator with caching and progress tracking.
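The cache layout is not documented here. One common approach, shown purely as an assumption, is to name each cached file after a hash of everything that affects the audio (backend name, voice, text), so re-running over the same posts can reuse earlier results from cache_dir. cache_key is a hypothetical helper.

```python
import hashlib


def cache_key(backend_name: str, voice: str, text: str) -> str:
    """Hypothetical cache key: identical inputs map to the same file name."""
    h = hashlib.sha256()
    for part in (backend_name, voice, text):
        h.update(part.encode("utf-8"))
        h.update(b"\0")  # separator so ("ab", "c") and ("a", "bc") hash differently
    return h.hexdigest() + ".npy"


print(cache_key("openai", "onyx", "Hello"))
```

A file named this way could be written with numpy.save and read back with numpy.load; changing the voice or backend yields a new key, so stale audio is never reused.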

__init__(backend, voice_assignment, cache_dir=None, include_attribution=True, pause_between_posts=1.5, narrator_voice=None)

Parameters:
  • backend

  • voice_assignment

  • cache_dir

  • include_attribution

  • pause_between_posts

  • narrator_voice
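The type of voice_assignment is not documented here. Assuming it maps each post author to a voice ID (a guess), one simple way to build it is round-robin over the backend's get_voices() result, leaving the narrator voice free for the “Author says:” attributions. assign_voices is a hypothetical helper.

```python
from itertools import cycle


def assign_voices(authors: list[str], voices: list[str], narrator_voice: str) -> dict[str, str]:
    """Hypothetical helper: give each distinct author a voice, round-robin.

    The narrator's voice is skipped so attributions stay distinguishable.
    """
    pool = [v for v in voices if v != narrator_voice]
    if not pool:
        raise ValueError("need at least one non-narrator voice")
    assignment: dict[str, str] = {}
    next_voice = cycle(pool)
    for author in authors:
        if author not in assignment:  # repeat authors keep their first voice
            assignment[author] = next(next_voice)
    return assignment


# OpenAI voice IDs listed above, with "onyx" reserved for the narrator
voices = ["alloy", "echo", "fable", "onyx", "nova", "shimmer"]
print(assign_voices(["ann", "bob", "ann", "cy"], voices, "onyx"))
# {'ann': 'alloy', 'bob': 'echo', 'cy': 'fable'}
```

With more authors than voices, cycle() wraps around and voices are reused.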

generate_all(posts, progress_callback=None, return_segments=False)

Generate audio for all posts.

Parameters:
  • posts (list[Post]) – List of posts to convert

  • progress_callback (Callable[..., Any] | None) – Optional callback, invoked as callback(current, total, post)

  • return_segments (bool) – If True, return (audio, segments), where segments holds the start and end sample positions of each post

Returns:

Audio array, or tuple of (audio, segments) if return_segments=True

Return type:

np.ndarray | tuple[np.ndarray, list[dict]]
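With return_segments=True, each segment records where a post starts and ends in the combined array, in samples. The sketch below assumes posts are joined with pause_between_posts seconds of silence and that each segment dict uses "start"/"end" keys; both the helper concat_with_segments and the key names are assumptions. Dividing by sample_rate converts positions to seconds, for example for chapter markers.

```python
import numpy as np


def concat_with_segments(chunks: list[np.ndarray], sample_rate: int, pause_seconds: float):
    """Join per-post audio with silent gaps; return (audio, segments)."""
    gap = np.zeros(round(pause_seconds * sample_rate), dtype=np.float32)
    pieces: list[np.ndarray] = []
    segments: list[dict] = []
    pos = 0
    for i, chunk in enumerate(chunks):
        if i:  # pause only *between* posts, not before the first one
            pieces.append(gap)
            pos += len(gap)
        segments.append({"start": pos, "end": pos + len(chunk)})
        pieces.append(chunk)
        pos += len(chunk)
    audio = np.concatenate(pieces) if pieces else np.zeros(0, dtype=np.float32)
    return audio, segments


chunks = [np.ones(100, dtype=np.float32), np.ones(250, dtype=np.float32)]
audio, segments = concat_with_segments(chunks, sample_rate=100, pause_seconds=1.5)
for seg in segments:
    print(seg["start"] / 100, seg["end"] / 100)  # positions in seconds
# 0.0 1.0
# 2.5 5.0
```

Tracking a running sample offset while appending keeps every segment boundary exact, with no floating-point drift from summing durations in seconds.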

generate_post(post)

Generate audio for a post, using cache if available.

Uses the narrator voice for the attribution (“Author says:”) and the author’s assigned voice for the post content.

Returns:

Tuple of (audio_array, attribution_samples) where attribution_samples is the number of samples used for the “Author says:” portion.

Parameters:

post (Post)

Return type:

tuple[np.ndarray, int]
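Because attribution_samples counts the samples of the “Author says:” portion, the returned array can be split into narrator and author parts. A minimal sketch, assuming the attribution occupies the first attribution_samples samples; split_attribution is a hypothetical helper, and the input array is a stand-in rather than real generate_post() output.

```python
import numpy as np


def split_attribution(audio: np.ndarray, attribution_samples: int, sample_rate: int):
    """Split generate_post()-style audio into (attribution, content, offset in seconds)."""
    if not 0 <= attribution_samples <= len(audio):
        raise ValueError("attribution_samples out of range")
    attribution = audio[:attribution_samples]  # narrator: "Author says:"
    content = audio[attribution_samples:]      # author's assigned voice
    return attribution, content, attribution_samples / sample_rate


# Stand-in result: 2.5 s at 24 kHz, of which the first 0.5 s is attribution
fake_audio = np.zeros(60000, dtype=np.float32)
attr, content, offset = split_attribution(fake_audio, 12000, 24000)
print(len(attr), len(content), offset)  # 12000 48000 0.5
```

The offset in seconds is useful when highlighting text in sync with playback: the post body starts that far into the clip.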

generate_streaming(posts, progress_callback=None)

Generate audio one post at a time, yielding each segment as soon as it is generated.

Yields:

Tuple of (audio_chunk, segment_info, post_index, total_posts)

Parameters:
  • posts (list[Post])

  • progress_callback (Callable[..., Any] | None)
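Because generate_streaming yields each post's audio as it is produced, a caller can write output incrementally instead of holding all of it in memory. The sketch below uses Python's standard wave module and a fake generator that yields tuples in the documented (audio_chunk, segment_info, post_index, total_posts) order; fake_stream, stream_to_wav and the segment_info contents are hypothetical.

```python
import io
import wave
from typing import BinaryIO, Iterator

import numpy as np

Chunk = tuple[np.ndarray, dict, int, int]  # (audio_chunk, segment_info, post_index, total_posts)


def fake_stream(n_posts: int, sample_rate: int) -> Iterator[Chunk]:
    """Stand-in for TTSGenerator.generate_streaming(): one 0.1 s chunk per post."""
    for i in range(n_posts):
        chunk = np.full(sample_rate // 10, 0.25, dtype=np.float32)
        yield chunk, {"post": i}, i, n_posts


def stream_to_wav(stream: Iterator[Chunk], out: BinaryIO, sample_rate: int) -> int:
    """Write each chunk as soon as it arrives; return the number of frames written."""
    frames = 0
    with wave.open(out, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 16-bit PCM
        wav.setframerate(sample_rate)
        for audio_chunk, _info, index, total in stream:
            # Clip to [-1, 1] and scale float32 samples to int16
            pcm = (np.clip(audio_chunk, -1.0, 1.0) * 32767).astype("<i2")
            wav.writeframes(pcm.tobytes())
            frames += len(pcm)
            print(f"post {index + 1}/{total} written")
    return frames


buf = io.BytesIO()
print(stream_to_wav(fake_stream(3, 8000), buf, 8000))  # frames written
```

In real use, out would be a file opened in binary write mode and the sample rate would come from the backend's sample_rate attribute.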

Modules

bark

Bark TTS backend - highest quality, GPU recommended for speed.

base

Base TTS backend interface.

openai

OpenAI TTS backend - high quality cloud voices.

piper

Piper TTS backend - fast, good quality, works on CPU.