dpo_reader.discourse

Discourse thread fetcher and parser.

Functions

`fetch_thread`(url[, max_posts])	Fetch a complete Discourse thread.
`fetch_thread_sync`(url[, max_posts])	Synchronous wrapper for fetch_thread.
`html_to_text`(html)	Convert HTML content to plain text.
`parse_discourse_url`(url)	Extract base URL and topic identifier from a Discourse thread URL.

Classes

`Post`	A single post from a Discourse thread.
`Thread`	A complete Discourse thread.

class dpo_reader.discourse.Post

Bases: object

A single post from a Discourse thread.

id: int

number: int

author: str

username: str

content: str

created_at: str

reply_to: int | None = None

__init__(id, number, author, username, content, created_at, reply_to=None)

Parameters:

id (int)
number (int)
author (str)
username (str)
content (str)
created_at (str)
reply_to (int | None)

Return type:

None

class dpo_reader.discourse.Thread

Bases: object

A complete Discourse thread.

id: int

title: str

url: str

posts: list[Post]

property authors: set[str]

Get unique authors in the thread.

property author_post_counts: dict[str, int]

Get post count per author, sorted by count descending.

__init__(id, title, url, posts)

Parameters:

id (int)
title (str)
url (str)
posts (list[Post])

Return type:

None
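The two Thread properties described above can be illustrated with a minimal sketch. The `Post` and `Thread` classes below are stand-ins that only mirror the documented fields; they are not the library's classes. Keying both properties on `Post.author` (rather than `username`) is an assumption, as are the sample posts and the example URL.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass
class Post:
    """Stand-in mirroring the documented Post fields (not the library class)."""
    id: int
    number: int
    author: str
    username: str
    content: str
    created_at: str
    reply_to: Optional[int] = None


@dataclass
class Thread:
    """Stand-in mirroring the documented Thread fields (not the library class)."""
    id: int
    title: str
    url: str
    posts: list[Post]

    @property
    def authors(self) -> set[str]:
        # Assumption: uniqueness is keyed on Post.author, not Post.username.
        return {post.author for post in self.posts}

    @property
    def author_post_counts(self) -> dict[str, int]:
        # Counter.most_common() yields (author, count) pairs sorted by count
        # descending; a dict built from them preserves that order.
        return dict(Counter(post.author for post in self.posts).most_common())


# Illustrative data only.
posts = [
    Post(1, 1, "Alice", "alice", "First post", "2024-01-01T00:00:00Z"),
    Post(2, 2, "Bob", "bob", "A reply", "2024-01-01T01:00:00Z", reply_to=1),
    Post(3, 3, "Alice", "alice", "Follow-up", "2024-01-01T02:00:00Z", reply_to=2),
]
thread = Thread(id=42, title="Example", url="https://forum.example.com/t/example/42", posts=posts)
print(thread.authors)
print(thread.author_post_counts)
```

Because `author_post_counts` is documented as sorted by count descending, iterating over its items should visit the most active author first.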

dpo_reader.discourse.parse_discourse_url(url)

Extract base URL and topic identifier from a Discourse thread URL.

Parameters:

url (str) – Full Discourse thread URL

Returns:

Tuple of (base_url, topic_identifier), where the identifier can be a topic ID or a slug

Return type:

tuple[str, str]
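To make the (base_url, topic_identifier) split concrete, here is a rough sketch of how such parsing could work with the standard library. The function name `parse_discourse_url_sketch` is hypothetical, and the URL shapes it accepts (`/t/<slug>/<id>` and `/t/<id-or-slug>`) are assumptions; the library's actual rules and error handling may differ.

```python
from urllib.parse import urlparse


def parse_discourse_url_sketch(url: str) -> tuple[str, str]:
    """Hypothetical re-implementation; not the library's actual logic."""
    parsed = urlparse(url)
    if not parsed.scheme or not parsed.netloc:
        raise ValueError(f"Not an absolute URL: {url!r}")
    base_url = f"{parsed.scheme}://{parsed.netloc}"

    # Assumed URL shapes: /t/<slug>/<id>[/<post_number>] or /t/<id-or-slug>.
    segments = [s for s in parsed.path.split("/") if s]
    if len(segments) < 2 or segments[0] != "t":
        raise ValueError(f"Not a Discourse topic URL: {url!r}")

    # Prefer the numeric ID when present; otherwise fall back to the slug.
    if len(segments) >= 3 and segments[2].isdigit():
        return base_url, segments[2]
    return base_url, segments[1]


print(parse_discourse_url_sketch("https://forum.example.com/t/some-topic/123"))
print(parse_discourse_url_sketch("https://forum.example.com/t/some-topic"))
```

Returning the identifier as a string in both cases matches the documented `tuple[str, str]` return type.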

dpo_reader.discourse.html_to_text(html)

Convert HTML content to plain text.

Parameters:

html (str)

Return type:

str
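The documentation does not say how the conversion is done, so the following is only a sketch of one possible approach using the standard library's `html.parser`. The name `html_to_text_sketch`, the set of block-level tags, and the whitespace handling are all assumptions; the library may use a different parser and produce different output.

```python
from html.parser import HTMLParser

# Assumption: closing one of these tags ends a line of text.
BLOCK_TAGS = {"p", "div", "li", "blockquote", "pre", "h1", "h2", "h3", "h4", "h5", "h6"}


class _TextExtractor(HTMLParser):
    """Collects text nodes and inserts newlines at block boundaries."""

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)  # decode entities such as &amp;
        self.parts: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "br":
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if tag in BLOCK_TAGS:
            self.parts.append("\n")

    def handle_data(self, data):
        self.parts.append(data)


def html_to_text_sketch(html: str) -> str:
    """Hypothetical stand-in for html_to_text; not the library's implementation."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()  # flush any buffered trailing text
    text = "".join(parser.parts)
    # Collapse runs of whitespace within each line and drop empty lines.
    lines = [" ".join(line.split()) for line in text.splitlines()]
    return "\n".join(line for line in lines if line)


print(html_to_text_sketch("<p>Hello <b>world</b> &amp; friends</p><p>Second<br>line</p>"))
```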

async dpo_reader.discourse.fetch_thread(url, max_posts=None)

Fetch a complete Discourse thread.

Parameters:

url (str) – The Discourse thread URL
max_posts (int | None) – Maximum number of posts to fetch (None for all)

Returns:

Thread object with all posts

Return type:

Thread

dpo_reader.discourse.fetch_thread_sync(url, max_posts=None)

Synchronous wrapper for fetch_thread.

Parameters:

url (str)
max_posts (int | None)

Return type:

Thread
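A synchronous wrapper around a coroutine is commonly built on `asyncio.run`. Whether `fetch_thread_sync` is implemented this way is not stated here, so the sketch below shows only the general pattern. `fetch_items` and `fetch_items_sync` are hypothetical stand-ins that return canned data instead of making network requests.

```python
import asyncio
from typing import Optional


async def fetch_items(max_items: Optional[int] = None) -> list[str]:
    """Hypothetical stand-in for an async fetcher such as fetch_thread."""
    await asyncio.sleep(0)  # yield to the event loop, as real I/O would
    items = ["post-1", "post-2", "post-3"]
    return items if max_items is None else items[:max_items]


def fetch_items_sync(max_items: Optional[int] = None) -> list[str]:
    """Run the coroutine to completion on a fresh event loop."""
    # asyncio.run() raises RuntimeError when called from a running event loop,
    # so a wrapper like this is only usable from synchronous code.
    return asyncio.run(fetch_items(max_items))


print(fetch_items_sync())
print(fetch_items_sync(max_items=2))
```

Going by the documented signatures, synchronous callers would use `fetch_thread_sync(url, max_posts=...)`, while code already running inside an event loop would `await fetch_thread(url, max_posts=...)` directly.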