Substack Article Fetcher

Fetch Substack articles and comments programmatically. Supports authenticated access for paid subscriber content.

Installation

# Requires Python 3.12+
python3.12 -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

Quick Start

Fetch a public post with comments

python main.py --url https://newsletter.substack.com/p/post-slug

Fetch paid/paywalled content

You need to provide your session cookies from a logged-in browser session.

Export your cookies:
- Log into Substack in your browser
- Open Developer Tools (F12) > Application > Cookies
- Find cookies for .substack.com
- Copy substack.sid, substack.lli, and connect.sid values
- Create cookies.json based on cookies.example.json

Fetch with authentication:

python main.py --url https://newsletter.substack.com/p/paid-post --cookies cookies.json

Command Line Arguments

Input Options

Argument	Description
`--url URL`	URL of a specific Substack post to fetch
`--newsletter URL`	URL of a Substack newsletter to fetch multiple posts
`--from-list FILE`	Path to a posts list JSON file (from `--list --output-dir`)

Output Options

Argument	Description
`--output`, `-o FILE`	Output file path for single post (default: stdout)
`--output-dir DIR`	Output directory for saving JSON files

Authentication

Argument	Description
`--cookies FILE`	Path to cookies JSON file for authenticated access

Fetch Options

Argument	Default	Description
`--limit N`	10	Maximum number of posts to fetch
`--all`	-	Fetch all posts (overrides `--limit`)
`--list`	-	List posts without fetching content
`--no-comments`	-	Don't fetch comments
`--resume`	-	Skip posts that already exist in output directory

Date Filtering

Argument	Description
`--since YYYY-MM-DD`	Only include posts from this date onwards
`--until YYYY-MM-DD`	Only include posts up to this date

Rate Limiting

Argument	Default	Description
`--delay N`	2.0	Delay in seconds between requests
`--jitter N`	1.0	Random jitter (0 to N seconds) added to delay

Debug

Argument	Description
`--verbose`, `-v`	Enable verbose output for debugging

Usage Examples

Fetch a single post

# Output to stdout
python main.py --url https://newsletter.substack.com/p/post-slug

# Save to file
python main.py --url https://newsletter.substack.com/p/post-slug --output post.json

# Without comments
python main.py --url https://newsletter.substack.com/p/post-slug --no-comments

# With authentication for paid content
python main.py --url https://newsletter.substack.com/p/paid-post --cookies cookies.json

List posts from a newsletter

# List latest 10 posts (JSON to stdout)
python main.py --newsletter https://newsletter.substack.com --list

# List all posts and save to file
python main.py --newsletter https://newsletter.substack.com --list --all --output-dir ./posts

# List posts from a specific year
python main.py --newsletter https://newsletter.substack.com --list --all --since 2024-01-01 --until 2024-12-31

# List posts from 2025 onwards
python main.py --newsletter https://newsletter.substack.com --list --all --since 2025-01-01

Fetch multiple posts

# Fetch latest 20 posts to directory
python main.py --newsletter https://newsletter.substack.com --limit 20 --output-dir ./posts

# Fetch all posts with custom delay
python main.py --newsletter https://newsletter.substack.com --all --output-dir ./posts --delay 3 --jitter 2

Resumable batch fetching (recommended workflow)

For large newsletters, use a two-step process:

# Step 1: Get the list of all posts (fast, single API call)
python main.py --newsletter https://newsletter.substack.com --list --all --output-dir ./posts

# Step 2: Fetch posts from the list with resume support
python main.py --from-list ./posts/newsletter_posts.json --output-dir ./posts --resume --verbose

# If interrupted, just run the same command again - it skips already downloaded posts
python main.py --from-list ./posts/newsletter_posts.json --output-dir ./posts --resume

# Fetch only posts from a specific date range
python main.py --from-list ./posts/newsletter_posts.json --output-dir ./posts --resume --since 2024-01-01 --until 2024-12-31

Debug network issues

# Enable verbose output to see what's happening
python main.py --newsletter https://newsletter.substack.com --list --all --verbose

Python API

from src.substack_fetcher import SubstackFetcher

# Initialize (with optional authentication)
fetcher = SubstackFetcher(cookies_path="cookies.json")

# Get a single post with comments
post = fetcher.get_post_with_comments("https://newsletter.substack.com/p/post-slug")
print(post["metadata"]["title"])
print(f"Comments: {post['comment_count']}")

# Get just comments
comments = fetcher.get_comments("https://newsletter.substack.com/p/post-slug")

# List posts from a newsletter
posts = fetcher.get_posts("https://newsletter.substack.com", limit=20)

# Fetch and save multiple posts
fetcher.fetch_all_posts(
    "https://newsletter.substack.com",
    output_dir="./posts",
    limit=50,
    include_comments=True,
    delay=2.0,
    jitter=1.0,
)

Cookie Setup (for paid content)

Create a cookies.json file with your session cookies:

[
  {
    "name": "substack.sid",
    "value": "your_session_id",
    "domain": ".substack.com",
    "path": "/",
    "secure": true
  },
  {
    "name": "substack.lli",
    "value": "your_lli_value",
    "domain": ".substack.com",
    "path": "/",
    "secure": true
  },
  {
    "name": "connect.sid",
    "value": "your_connect_sid",
    "domain": ".substack.com",
    "path": "/",
    "secure": true
  }
]

How to get your cookies

Open your browser and log into Substack
Open Developer Tools (F12 or Cmd+Option+I)
Go to Application tab (Chrome) or Storage tab (Firefox)
Expand Cookies and select https://substack.com
Copy the values for the cookies listed above

Important: Only use this to access content you're legitimately subscribed to.

Output Format

Single post JSON

{
  "metadata": {
    "id": 123456,
    "title": "Post Title",
    "subtitle": "Post subtitle",
    "slug": "post-slug",
    "post_date": "2024-01-15T12:00:00.000Z",
    "audience": "everyone",
    "previous_post_slug": "previous-post",
    "next_post_slug": "next-post"
  },
  "content": "<html content>",
  "is_paywalled": false,
  "comments": [
    {
      "id": 789,
      "body": "Comment text",
      "user": {"name": "Commenter Name"},
      "date": "2024-01-16T10:00:00.000Z"
    }
  ],
  "comment_count": 42
}

Posts list JSON (from `--list`)

{
  "newsletter": "https://newsletter.substack.com",
  "count": 150,
  "posts": [
    {
      "title": "Post Title",
      "slug": "post-slug",
      "url": "https://newsletter.substack.com/p/post-slug",
      "date": "2024-01-15",
      "audience": "everyone",
      "subtitle": "Post subtitle"
    }
  ]
}

Disclaimer

This tool uses unofficial APIs and is not affiliated with or endorsed by Substack. Use responsibly and in compliance with Substack's Terms of Service. Only access content you have legitimate access to.

Name		Name	Last commit message	Last commit date
Latest commit History 96 Commits
db		db
docs		docs
embedded_vectordb		embedded_vectordb
external_sources		external_sources
motherduck		motherduck
src		src
telegram		telegram
transcribe		transcribe
twitter		twitter
weaviate		weaviate
.gitignore		.gitignore
CLAUDE.md		CLAUDE.md
DATABASE.md		DATABASE.md
INGEST_NEW_BLOG.md		INGEST_NEW_BLOG.md
INGEST_SINGLE_POST.md		INGEST_SINGLE_POST.md
README.md		README.md
VIDEO_CHAPTERS.md		VIDEO_CHAPTERS.md
WEAVIATE.md		WEAVIATE.md
analyze_media.py		analyze_media.py
cookies.example.json		cookies.example.json
download_media.py		download_media.py
ingest_single_post.py		ingest_single_post.py
ingest_transcripts.py		ingest_transcripts.py
load_html_posts.py		load_html_posts.py
load_posts.py		load_posts.py
main.py		main.py
refresh_blog.py		refresh_blog.py
requirements.txt		requirements.txt
restore_analysis.py		restore_analysis.py
substacks.example.json		substacks.example.json
transcribe_all.sh		transcribe_all.sh

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Substack Article Fetcher

Installation

Quick Start

Fetch a public post with comments

Fetch paid/paywalled content

Command Line Arguments

Input Options

Output Options

Authentication

Fetch Options

Date Filtering

Rate Limiting

Debug

Usage Examples

Fetch a single post

List posts from a newsletter

Fetch multiple posts

Resumable batch fetching (recommended workflow)

Debug network issues

Python API

Cookie Setup (for paid content)

How to get your cookies

Output Format

Single post JSON

Posts list JSON (from `--list`)

Disclaimer

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Substack Article Fetcher

Installation

Quick Start

Fetch a public post with comments

Fetch paid/paywalled content

Command Line Arguments

Input Options

Output Options

Authentication

Fetch Options

Date Filtering

Rate Limiting

Debug

Usage Examples

Fetch a single post

List posts from a newsletter

Fetch multiple posts

Resumable batch fetching (recommended workflow)

Debug network issues

Python API

Cookie Setup (for paid content)

How to get your cookies

Output Format

Single post JSON

Posts list JSON (from --list)

Disclaimer

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Posts list JSON (from `--list`)

Packages