dpo_reader.discourse

Discourse thread fetcher and parser.

Functions

fetch_thread(url[, max_posts])

Fetch a complete Discourse thread.

fetch_thread_sync(url[, max_posts])

Synchronous wrapper for fetch_thread.

html_to_text(html)

Convert HTML content to plain text.

parse_discourse_url(url)

Extract base URL and topic identifier from a Discourse thread URL.

Classes

Post

A single post from a Discourse thread.

Thread

A complete Discourse thread.

class dpo_reader.discourse.Post

Bases: object

A single post from a Discourse thread.

id: int
number: int
author: str
username: str
content: str
created_at: str
reply_to: int | None = None
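__init__(id, number, author, username, content, created_at, reply_to=None)
Parameters:
  • id (int)

  • number (int)

  • author (str)

  • username (str)

  • content (str)

  • created_at (str)

  • reply_to (int | None)

Return type:

None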

class dpo_reader.discourse.Thread

Bases: object

A complete Discourse thread.

id: int
title: str
url: str
posts: list[Post]
property authors: set[str]

Get unique authors in the thread.

property author_post_counts: dict[str, int]

Get post count per author, sorted by count descending.

__init__(id, title, url, posts)
Parameters:
  • id (int)

  • title (str)

  • url (str)

  • posts (list[Post])

Return type:

None
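The two properties can be illustrated with a minimal sketch. It assumes that Post and Thread are plain dataclasses and that authors are keyed by the author field rather than username; neither detail is confirmed by this page, and the sample posts are made up.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass
class Post:
    # Fields mirror the attributes documented above.
    id: int
    number: int
    author: str
    username: str
    content: str
    created_at: str
    reply_to: Optional[int] = None


@dataclass
class Thread:
    id: int
    title: str
    url: str
    posts: list[Post]

    @property
    def authors(self) -> set[str]:
        # Assumption: uniqueness is decided by the `author` field.
        return {post.author for post in self.posts}

    @property
    def author_post_counts(self) -> dict[str, int]:
        # most_common() orders entries by count, highest first.
        counts = Counter(post.author for post in self.posts)
        return dict(counts.most_common())


# Made-up sample data for illustration only.
thread = Thread(
    id=1,
    title="Example",
    url="https://forum.example.com/t/example/1",
    posts=[
        Post(1, 1, "alice", "alice01", "First", "2024-01-01T00:00:00Z"),
        Post(2, 2, "bob", "bob02", "Second", "2024-01-01T01:00:00Z", reply_to=1),
        Post(3, 3, "alice", "alice01", "Third", "2024-01-01T02:00:00Z", reply_to=2),
    ],
)
print(thread.authors)
print(thread.author_post_counts)
```

Because Python dictionaries preserve insertion order, returning dict(counts.most_common()) is one way to satisfy "sorted by count descending" while keeping the documented dict[str, int] type.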

dpo_reader.discourse.parse_discourse_url(url)

Extract base URL and topic identifier from a Discourse thread URL.

Parameters:

url (str) – Full Discourse thread URL

Returns:

A tuple (base_url, topic_identifier), where topic_identifier is either the topic ID or the topic slug

Return type:

tuple[str, str]
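As a rough illustration of the return value, the sketch below splits a URL with urllib.parse. It is not the library's implementation: the function name parse_discourse_url_sketch is hypothetical, and the assumed URL shapes (/t/<slug>/<id>, /t/<id>, /t/<slug>) and the preference for the ID over the slug are guesses based on common Discourse URLs.

```python
from urllib.parse import urlparse


def parse_discourse_url_sketch(url: str) -> tuple[str, str]:
    """Hypothetical stand-in for parse_discourse_url."""
    parsed = urlparse(url)
    if not parsed.scheme or not parsed.netloc:
        raise ValueError(f"Not an absolute URL: {url!r}")

    base_url = f"{parsed.scheme}://{parsed.netloc}"

    # Drop empty segments produced by leading or trailing slashes.
    segments = [part for part in parsed.path.split("/") if part]
    if len(segments) < 2 or segments[0] != "t":
        raise ValueError(f"Not a Discourse topic URL: {url!r}")

    rest = segments[1:]
    if rest[0].isdigit():
        # Assumed shape: /t/<id>
        return base_url, rest[0]
    if len(rest) >= 2 and rest[1].isdigit():
        # Assumed shape: /t/<slug>/<id>[/<post_number>]
        return base_url, rest[1]
    # Assumed shape: /t/<slug>
    return base_url, rest[0]


print(parse_discourse_url_sketch("https://forum.example.com/t/some-topic/1234/5"))
print(parse_discourse_url_sketch("https://forum.example.com/t/some-topic"))
```

Both elements are returned as strings, which matches the documented tuple[str, str] return type even when the identifier is a numeric ID.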

dpo_reader.discourse.html_to_text(html)

Convert HTML content to plain text.

Parameters:

html (str) – HTML content to convert

Return type:

str
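One possible way to perform this kind of conversion uses only the standard library's html.parser. This is a sketch under assumptions, not the module's actual code: the names _TextExtractor and html_to_text_sketch are hypothetical, and the choice of which tags produce line breaks is illustrative.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    # Closing one of these tags ends a line (an illustrative choice).
    BLOCK_TAGS = {"p", "div", "li", "blockquote", "pre", "h1", "h2", "h3"}

    def __init__(self) -> None:
        # convert_charrefs=True turns entities such as &amp; into characters.
        super().__init__(convert_charrefs=True)
        self.parts: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "br":
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if tag in self.BLOCK_TAGS:
            self.parts.append("\n")

    def handle_data(self, data):
        self.parts.append(data)


def html_to_text_sketch(html: str) -> str:
    """Hypothetical stand-in for html_to_text."""
    extractor = _TextExtractor()
    extractor.feed(html)
    extractor.close()  # flush any buffered text
    text = "".join(extractor.parts)
    # Collapse runs of whitespace within each line and drop empty lines.
    lines = [" ".join(line.split()) for line in text.splitlines()]
    return "\n".join(line for line in lines if line)


sample = "<p>Hello <b>world</b> &amp; friends</p><p>Second<br>line</p>"
print(html_to_text_sketch(sample))
```

The real function may treat links, quotes, code blocks, or images differently; the sketch only shows the general shape of tag stripping with entity decoding.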

async dpo_reader.discourse.fetch_thread(url, max_posts=None)

Fetch a complete Discourse thread.

Parameters:
  • url (str) – The Discourse thread URL

  • max_posts (int | None) – Maximum number of posts to fetch (None for all)

Returns:

Thread object containing the fetched posts (all posts, or at most max_posts if a limit is given)

Return type:

Thread

dpo_reader.discourse.fetch_thread_sync(url, max_posts=None)

Synchronous wrapper for fetch_thread.

Parameters:
  • url (str) – The Discourse thread URL

  • max_posts (int | None) – Maximum number of posts to fetch (None for all)

Return type:

Thread
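The relationship between the coroutine and its synchronous wrapper can be sketched as follows. The sketch assumes the wrapper simply drives the coroutine with asyncio.run, which this page does not confirm. To stay runnable without network access it uses a hypothetical stub, fetch_thread_stub, that returns made-up post labels instead of a Thread.

```python
import asyncio
from typing import Optional


async def fetch_thread_stub(url: str, max_posts: Optional[int] = None) -> list[str]:
    """Hypothetical stand-in for fetch_thread; performs no network I/O."""
    await asyncio.sleep(0)  # placeholder for the real HTTP requests
    posts = [f"post {n}" for n in range(1, 6)]  # made-up data
    # None means "no limit", mirroring the documented max_posts semantics.
    return posts if max_posts is None else posts[:max_posts]


def fetch_thread_sync_sketch(url: str, max_posts: Optional[int] = None) -> list[str]:
    """Hypothetical stand-in for fetch_thread_sync."""
    # asyncio.run creates an event loop, runs the coroutine to completion,
    # and closes the loop. It raises RuntimeError if a loop is already running.
    return asyncio.run(fetch_thread_stub(url, max_posts))


url = "https://forum.example.com/t/some-topic/1234"
print(fetch_thread_sync_sketch(url, max_posts=2))
print(len(fetch_thread_sync_sketch(url)))
```

If the wrapper does work this way, code that already runs inside an event loop (for example a notebook or an async web handler) would need to await fetch_thread directly rather than call the synchronous variant.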