docs: Revamp README with comprehensive documentation and troubleshooting guides

- Add quick start section with 3-step setup instructions
- Include prerequisites and platform compatibility information
- Expand credentials configuration with security best practices
- Add detailed configuration section with media constraints table
- Provide concrete usage examples for RSS, Twitter daemon, and systemd
- Include cron job integration examples for scheduling
- Add project structure diagram showing all key files and directories
- Create extensive troubleshooting section with common issues and solutions
- Add debugging guide with log analysis tips
- Include FAQ section addressing typical user questions
- Document use cases and real-world scenarios
- Add security notes for credential management
- Improve contributing guidelines with step-by-step workflow
- Enhance formatting with emojis, tables, and better organization
- Replace vague descriptions with actionable, specific guidance

This makes the documentation suitable for both beginner and advanced users while providing clear paths for setup, usage, and troubleshooting.

Co-authored-by: Copilot <copilot@github.com>
Guillem Hernandez Sola
2026-04-28 06:56:13 +02:00
parent 0fbfd68585
commit 40b379e261

493
README.md

@@ -1,104 +1,497 @@
# post2bsky
post2bsky is a Python-based tool for automatically posting content from RSS feeds and Twitter accounts to Bluesky (AT Protocol). It supports both RSS-to-Bluesky and Twitter-to-Bluesky synchronization, with configurable workflows for various sources. A Python-based automation tool for reposting content to Bluesky from RSS feeds and Twitter accounts. Includes a daemon mode for continuous operation with comprehensive media support, deduplication, and extensive logging.
## Features **Note**: This tool is designed for content creators and maintainers who need to automatically synchronize feeds/accounts to Bluesky. Ensure you have permission to repost content and comply with all platform terms of service.
- **RSS to Bluesky**: Parse RSS feeds and post new entries to Bluesky with proper formatting and media handling. ## ✨ Features
- **Twitter to Bluesky**: Scrape tweets from specified Twitter accounts and repost them to Bluesky, including media attachments.
- **Daemon Mode**: Run as a background service for continuous monitoring and posting. - **RSS → Bluesky**: Parse RSS feeds and automatically post new entries with proper formatting
- **Configurable Workflows**: Use YAML-based workflows to define sources, schedules, and posting rules. - **Twitter → Bluesky**: Scrape tweets from Twitter accounts and repost to Bluesky (with media)
- **Media Support**: Handle images, videos, and other media from feeds and tweets. - **Daemon Mode**: Run continuously as a background service for unattended operation
- **Deduplication**: Prevent duplicate posts using state tracking. - **Media Support**: Handle images, videos, and other media with automatic optimization
- **Logging**: Comprehensive logging for monitoring and debugging. - **Deduplication**: Track posted content to prevent duplicates across runs
- **Configurable Workflows**: YAML-based pipelines for each source with scheduling
- **Media Constraints**: Auto-handles Bluesky's limits (300 chars, 4 images, 45MB video, etc.)
- **Error Recovery**: Automatic retries with exponential backoff for transient failures
- **Comprehensive Logging**: Detailed logs for monitoring and troubleshooting
## 📋 Prerequisites
- Python 3.9 or higher
- macOS, Linux, or Windows with Chromium support (for Twitter scraping)
- Bluesky account with credentials
- Twitter account (if using Twitter→Bluesky syncing)
## 🚀 Quick Start
### 1. Clone & Setup Environment
```bash
git clone <repository-url>
cd post2bsky
python3 -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
pip install -r requeriments.txt
```
### 2. Configure Credentials
Create a `.env` file in the project root:
```env
# Bluesky Authentication
BSKY_USERNAME=your_bluesky_handle
BSKY_PASSWORD=your_bluesky_password
# Optional: Custom Bluesky instance (default: https://bsky.social)
BSKY_BASE_URL=https://bsky.social
# Twitter Authentication (if using Twitter syncing)
TWITTER_USERNAME=your_twitter_handle
TWITTER_PASSWORD=your_twitter_password
```
### 3. Run a Quick Test
```bash
# Test RSS feed posting
python rss2bsky.py --feed-url https://example.com/rss
# Test Twitter account scraping
python twitter2bsky_daemon.py --test
```
## Installation
### Standard Installation
1. Clone the repository:
```bash
git clone https://github.com/yourusername/post2bsky.git
cd post2bsky
```
2. Install Python dependencies: 2. Create and activate virtual environment:
```bash
python3 -m venv venv
source venv/bin/activate # macOS/Linux
# or
venv\Scripts\activate # Windows
```
3. Install dependencies:
```bash
pip install -r requeriments.txt
```
3. Set up environment variables: 4. Set up environment variables:
Create a `.env` file with your Bluesky credentials: Create a `.env` file in the root directory (see the [Credentials](#credentials) section)
```
## ⚙️ Configuration
### Credentials
Store your credentials in a `.env` file at the project root. Never commit this file to version control (it is already listed in `.gitignore`):
```env
BSKY_USERNAME=your_bluesky_handle
BSKY_PASSWORD=your_bluesky_password
# For Twitter scraping (email or username, and password)
TWITTER_USERNAME=your_twitter_username_or_email
TWITTER_PASSWORD=your_twitter_password
```
For Twitter scraping, additional setup may be required (see Configuration). **Security Note**: Never commit credentials to Git. The `.env` file is automatically ignored.
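As a rough illustration of how a script might consume these variables, the sketch below reads them from a mapping such as the process environment and fails early when a required one is missing. `Credentials` and `load_credentials` are hypothetical names, not part of post2bsky, and the real scripts may load `.env` differently (the dependency list mentions `python-dotenv`). The default base URL is the one given in the `.env` comments above.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Credentials:
    """Hypothetical container for the settings listed in `.env`."""
    bsky_username: str
    bsky_password: str
    bsky_base_url: str
    twitter_username: Optional[str] = None
    twitter_password: Optional[str] = None


def load_credentials(env: Mapping[str, str] = os.environ) -> Credentials:
    """Build Credentials from a mapping, raising if a required value is absent."""
    required = ("BSKY_USERNAME", "BSKY_PASSWORD")
    missing = [name for name in required if not env.get(name, "").strip()]
    if missing:
        raise RuntimeError("Missing required settings: " + ", ".join(missing))
    return Credentials(
        # Strip stray whitespace, a common cause of login failures.
        bsky_username=env["BSKY_USERNAME"].strip(),
        bsky_password=env["BSKY_PASSWORD"].strip(),
        bsky_base_url=env.get("BSKY_BASE_URL", "https://bsky.social").strip(),
        # Twitter settings are optional: only needed for Twitter syncing.
        twitter_username=env.get("TWITTER_USERNAME") or None,
        twitter_password=env.get("TWITTER_PASSWORD") or None,
    )
```

Passing the mapping as a parameter keeps the function easy to exercise with a plain dictionary instead of real secrets.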
## Configuration ### RSS Feed Configuration
### RSS Feeds Run `rss2bsky.py` to post from RSS feeds:
Use `rss2bsky.py` to post from RSS feeds. Configure the feed URL and other options via command-line arguments.
Example:
```bash
# Basic usage
python rss2bsky.py --feed-url https://example.com/rss --bsky-handle your_handle
# With advanced options
python rss2bsky.py \
--feed-url https://example.com/rss \
--bsky-handle your_handle \
--max-posts 5 \
--limit-age 3 # Only posts from last 3 days
```
### Twitter Accounts **State Management**: The tool tracks posted entries in `twitter2bsky_state.json` to prevent duplicates. This file is updated automatically on each run.
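The state tracking described here can be pictured with a small sketch. It assumes a JSON file holding a flat list of posted IDs under a `posted_ids` key; that layout and the function names are illustrative assumptions, and the actual structure of `twitter2bsky_state.json` may differ.

```python
import json
import tempfile
from pathlib import Path
from typing import Iterable, List, Set


def load_posted_ids(state_path: Path) -> Set[str]:
    """Return the IDs recorded in the state file, or an empty set on first run."""
    if not state_path.exists():
        return set()
    with state_path.open(encoding="utf-8") as handle:
        return set(json.load(handle).get("posted_ids", []))


def save_posted_ids(state_path: Path, posted_ids: Set[str]) -> None:
    """Write to a temporary file, then rename, so an interrupted write
    does not leave a half-written state file behind."""
    tmp_path = state_path.with_suffix(".tmp")
    with tmp_path.open("w", encoding="utf-8") as handle:
        json.dump({"posted_ids": sorted(posted_ids)}, handle, indent=2)
    tmp_path.replace(state_path)


def select_new(entry_ids: Iterable[str], posted_ids: Set[str]) -> List[str]:
    """Keep only entries that have not been posted yet, preserving order."""
    return [entry_id for entry_id in entry_ids if entry_id not in posted_ids]


# Demo against a throwaway directory rather than the real state file.
with tempfile.TemporaryDirectory() as workdir:
    demo_state = Path(workdir) / "demo_state.json"
    first_run = select_new(["a", "b"], load_posted_ids(demo_state))
    save_posted_ids(demo_state, set(first_run))  # record only after a successful post
    second_run = select_new(["a", "b", "c"], load_posted_ids(demo_state))
```

Saving the IDs only after a post succeeds means a failed post is picked up again on the next run.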
Use `twitter2bsky_daemon.py` for Twitter-to-Bluesky posting. It requires browser automation for scraping.
Configure Twitter accounts in the script or via environment variables. ### Twitter Account Configuration
### Workflows Configure Twitter accounts in `twitter2bsky_daemon.py`. The script uses Playwright for browser automation to scrape tweets:
The `workflows/` directory contains Jenkins pipeline configurations for automated runs. Each `.yml` file defines a pipeline for a specific source (e.g., `324.yml` for 324 RSS feed).
To run a workflow manually, use the `sync_runner.sh` script or execute the Python scripts directly.
## Usage
### Running RSS Sync
```bash
python rss2bsky.py [options]
# Run Twitter daemon
python twitter2bsky_daemon.py
# Run with test mode (dry-run, no posting)
python twitter2bsky_daemon.py --test
# Specify custom state file
python twitter2bsky_daemon.py --state-file custom_state.json
```
Options: **Twitter Scraping Details**:
- `--feed-url`: URL of the RSS feed - Uses Playwright Chromium for headless browser automation
- `--bsky-handle`: Your Bluesky handle - Handles t.co URL redirects and link metadata
- Other options for filtering, formatting, etc. - Includes screenshot capture for error debugging
- Automatic retry with exponential backoff on failures
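To make "retry with exponential backoff" concrete, here is a minimal sketch, not the daemon's actual code. The function name, the default delays, and the choice to retry on any exception are assumptions; a real implementation would more likely retry only on errors known to be transient.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    action: Callable[[], T],
    max_retries: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `action`; on failure wait base_delay * 2**attempt seconds
    (capped at max_delay) and try again, up to max_retries retries."""
    for attempt in range(max_retries + 1):
        try:
            return action()
        except Exception:
            if attempt == max_retries:
                raise  # retries exhausted: surface the last error
            sleep(min(max_delay, base_delay * (2 ** attempt)))
    raise AssertionError("unreachable")
```

With the defaults, the waits between attempts are 1, 2, 4, 8, and 16 seconds. The `sleep` parameter exists only so the delays can be observed or skipped in tests.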
### Running Twitter Daemon ### Workflow Pipelines
The `workflows/` directory contains YAML pipeline configurations that define:
- Data source (RSS feed URL or Twitter handle)
- Posting schedule and frequency
- Content filtering rules
- Target Bluesky account
Example: `workflows/324.yml` defines the pipeline for the "324" RSS feed.
Each workflow typically has a corresponding Jenkins configuration in `jenkins/` for CI/CD integration.
**Running Workflows**:
```bash
python twitter2bsky_daemon.py [options]
# Manual execution
./sync_runner.sh
# Run specific workflow (takes the first `url:` value, regardless of indentation)
python rss2bsky.py --feed-url "$(sed -n 's/^.*url:[[:space:]]*//p' workflows/324.yml | head -1)"
```
Options: ### Media Handling
- Configure Twitter accounts and Bluesky credentials
- Run in daemon mode for continuous operation The tool automatically optimizes media for Bluesky's constraints:
| Constraint | Value |
|-----------|-------|
| Image size limit | 950 KB per image |
| Image max dimension | 2000px (width or height) |
| Max images per post | 4 |
| Video size limit | 45 MB |
| Video max duration | 3 minutes |
| Thumbnail size | 950 KB |
| Text length | 300 characters (grapheme clusters) |
Images are automatically converted to JPEG, with the quality lowered as needed to meet the size limit (down to a minimum JPEG quality of 40-45).
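The limits in the table can be expressed as constants plus two small checks. This is a sketch only: the names are hypothetical, "KB" and "MB" are assumed to mean multiples of 1024, and the limits are treated as inclusive, none of which the table confirms.

```python
from typing import Tuple

# Values taken from the constraints table; byte conversions are an assumption.
MAX_IMAGE_BYTES = 950 * 1024
MAX_IMAGE_DIMENSION = 2000
MAX_IMAGES_PER_POST = 4
MAX_VIDEO_BYTES = 45 * 1024 * 1024
MAX_VIDEO_SECONDS = 3 * 60


def fit_within(width: int, height: int, max_dim: int = MAX_IMAGE_DIMENSION) -> Tuple[int, int]:
    """Scale (width, height) down so the longer side is at most max_dim,
    keeping the aspect ratio. Images already small enough are unchanged."""
    longest = max(width, height)
    if longest <= max_dim:
        return width, height
    scale = max_dim / longest
    return max(1, round(width * scale)), max(1, round(height * scale))


def video_is_acceptable(size_bytes: int, duration_seconds: float) -> bool:
    """Check a video against the size and duration limits."""
    return size_bytes <= MAX_VIDEO_BYTES and duration_seconds <= MAX_VIDEO_SECONDS
```

For example, a 4000×3000 image would be resized to 2000×1500 before any JPEG quality adjustment.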
## 💻 Usage
### RSS to Bluesky (`rss2bsky.py`)
Post entries from RSS feeds to Bluesky:
```bash
# Simple usage
python rss2bsky.py --feed-url https://example.com/feed.xml --bsky-handle @your_handle
# Limit to recent posts
python rss2bsky.py --feed-url https://example.com/feed.xml --limit-age 7
# Dry run (preview without posting)
python rss2bsky.py --feed-url https://example.com/feed.xml --dry-run
```
**Output**: The script logs all actions to `twitter2bsky.log` and maintains state in `twitter2bsky_state.json`.
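The effect of `--limit-age` can be sketched as a simple date filter. The entry representation (a title paired with a timezone-aware timestamp) and the function name are assumptions for illustration; how `rss2bsky.py` actually represents feed entries is not shown in this README.

```python
from datetime import datetime, timedelta, timezone
from typing import Iterable, List, Optional, Tuple

Entry = Tuple[str, datetime]  # (title, timezone-aware publication time)


def recent_entries(
    entries: Iterable[Entry],
    limit_age_days: float,
    now: Optional[datetime] = None,
) -> List[Entry]:
    """Return entries published within the last limit_age_days, oldest first."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=limit_age_days)
    fresh = [entry for entry in entries if entry[1] >= cutoff]
    fresh.sort(key=lambda entry: entry[1])  # post in chronological order
    return fresh
```

Passing `now` explicitly makes the cutoff reproducible, which helps when checking behaviour around the boundary.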
### Twitter to Bluesky (`twitter2bsky_daemon.py`)
Run continuously to sync tweets from specified accounts:
```bash
# Start daemon mode (continuous monitoring)
python twitter2bsky_daemon.py
# Run once and exit
python twitter2bsky_daemon.py --once
# Test mode (no actual posts to Bluesky)
python twitter2bsky_daemon.py --test
# Custom configuration
python twitter2bsky_daemon.py --max-retries 5 --timeout 30
```
**Features**:
- Automatically fetches new tweets from configured accounts
- Handles retweets, quotes, and threaded tweets
- Downloads and optimizes media attachments
- Resolves shortened t.co links to actual URLs
- Prevents duplicate posts with state tracking
### Running with Sync Runner
### Using Sync Runner
```bash
./sync_runner.sh
```
This script can be used to run multiple syncs or integrate with cron jobs. This script can orchestrate multiple sources and is suitable for integration with cron jobs or systemd timers.
## Dependencies ### Daemon Mode Setup (systemd)
All Python dependencies are listed in `requeriments.txt`. Key packages include: To run `twitter2bsky_daemon.py` continuously as a system service on Linux:
- `atproto`: For Bluesky API interaction
- `fastfeedparser`: For RSS parsing
- `playwright`: For browser automation (Twitter scraping)
- `beautifulsoup4`: For HTML parsing
- And many others for media processing, logging, etc.
## License 1. Create service file `/etc/systemd/system/post2bsky.service`:
```ini
[Unit]
Description=post2bsky Twitter to Bluesky Daemon
After=network.target
[Service]
Type=simple
User=your_user
WorkingDirectory=/path/to/post2bsky
Environment="PATH=/path/to/post2bsky/venv/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"
ExecStart=/path/to/post2bsky/venv/bin/python twitter2bsky_daemon.py
Restart=always
RestartSec=60
[Install]
WantedBy=multi-user.target
```
2. Enable and start:
```bash
sudo systemctl daemon-reload
sudo systemctl enable post2bsky
sudo systemctl start post2bsky
sudo systemctl status post2bsky
```
3. View logs:
```bash
tail -f twitter2bsky.log
```
### Cron Job Integration
Add to crontab with `crontab -e`:
```bash
# Run RSS sync every 30 minutes
*/30 * * * * cd /path/to/post2bsky && venv/bin/python rss2bsky.py --feed-url https://example.com/rss
# Run all workflows at 9 AM daily
0 9 * * * cd /path/to/post2bsky && ./sync_runner.sh
```
## 📦 Dependencies
All Python dependencies are listed in `requeriments.txt`. Key packages:
| Package | Purpose |
|---------|---------|
| `atproto` | Bluesky API client for posting |
| `fastfeedparser` | RSS/Atom feed parsing |
| `playwright` | Browser automation for Twitter scraping |
| `beautifulsoup4` | HTML parsing and content extraction |
| `pillow` | Image optimization and processing |
| `moviepy` | Video processing and duration detection |
| `grapheme` | Unicode grapheme cluster counting for Bluesky's text limits |
| `httpx` | HTTP client for URL resolution and media downloads |
| `python-dotenv` | Environment variable management |
| `arrow` | Date/time handling with timezone support |
Install all dependencies with:
```bash
pip install -r requeriments.txt
```
## 📁 Project Structure
```
post2bsky/
├── rss2bsky.py # RSS feed → Bluesky posting script
├── twitter2bsky_daemon.py # Twitter → Bluesky daemon (main logic)
├── twitter_login.py # Twitter authentication helper
├── cookie_login.py # Alternative login method
├── sync_runner.sh # Orchestration script for multiple sources
├── twitter2bsky_state.json # State file tracking posted content (auto-generated)
├── twitter2bsky.log # Application logs (auto-generated)
├── requeriments.txt # Python dependencies
├── README.md # This file
├── LICENSE # GNU GPLv3 license
├── jenkins/ # Jenkins CI/CD configurations
│ └── [account_name]Tw/ # Config for each account
├── workflows/ # YAML pipeline definitions
│ ├── 324.yml # Example: RSS feed for "324"
│ ├── fcbarcelona.yml # Example: Twitter account for FC Barcelona
│ └── ...
└── venv/ # Python virtual environment (created during setup)
```
## 🔧 Troubleshooting
### Authentication Issues
**Problem**: `Login failed: Invalid credentials`
**Solution**:
1. Verify that the credentials in `.env` are correct (no extra spaces)
2. Check whether your Bluesky account requires an app password (Settings → App passwords)
3. If using 2FA, generate an app-specific password
4. For Twitter, ensure the account isn't rate-limited or restricted
### Twitter Scraping Issues
**Problem**: `Playwright browser failed` or screenshot errors
**Solution**:
1. Ensure Chromium is properly installed: `playwright install chromium`
2. Check available disk space (Playwright requires ~500MB)
3. Run script with `--debug` flag for detailed output
4. Check browser error screenshots in `screenshot_*.png` files
**Problem**: `No tweets found` or `Tweets already posted`
**Solution**:
1. Verify Twitter account handle is correct in configuration
2. Check `twitter2bsky_state.json` for deduplication data
3. Delete state file to reset tracking (careful: may cause re-posting)
4. Review `twitter2bsky.log` for detailed debugging
### Media Processing Issues
**Problem**: `Image upload failed` or `Video too large`
**Solution**:
1. Images are auto-optimized, but source should be <100MB
2. Videos must be <45MB and <3 minutes
3. Check available disk space for temporary files
4. Enable debug logging in the script for detailed info
### Performance Issues
**Problem**: Script runs slowly or times out
**Solution**:
1. Check network connectivity
2. Reduce `SCRAPE_TWEET_LIMIT` in `twitter2bsky_daemon.py` (default: 30)
3. Increase timeout constants if on slow connection
4. Run with `--once` instead of daemon mode to diagnose
5. Check system resources (CPU, memory, disk I/O)
### Log Analysis
Check `twitter2bsky.log` for detailed debugging:
```bash
# View recent errors
grep ERROR twitter2bsky.log | tail -20
# View all warnings
grep WARNING twitter2bsky.log | tail -20
# Watch logs in real-time
tail -f twitter2bsky.log
# Count successful posts
grep -c "✅ Posted to Bluesky" twitter2bsky.log
```
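For a quick overview across all severities, the same log can be summarised in Python. The sketch assumes each log line contains a standard level name (`INFO`, `ERROR`, and so on) as a separate word; the exact line format of `twitter2bsky.log` is not documented here, so treat the pattern as an assumption.

```python
import re
from collections import Counter
from typing import Iterable

# Standard Python logging level names, matched as whole words.
LEVEL_PATTERN = re.compile(r"\b(DEBUG|INFO|WARNING|ERROR|CRITICAL)\b")


def count_levels(lines: Iterable[str]) -> Counter:
    """Count log lines per level, using the first level name found on each line."""
    counts: Counter = Counter()
    for line in lines:
        match = LEVEL_PATTERN.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts
```

To run it against the real log, open the file and pass the handle: `with open("twitter2bsky.log", encoding="utf-8") as handle: print(count_levels(handle))`.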
## 🐛 Debugging
Enable debug logging by modifying the logging level in the script:
```python
# In twitter2bsky_daemon.py, change:
level=logging.INFO,
# To:
level=logging.DEBUG,
```
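An alternative to editing the source each time is to read the level from the environment. The sketch below is a suggestion rather than existing behaviour: the `LOG_LEVEL` variable and the `resolve_log_level` helper are hypothetical and would need to be wired into the script's logging setup.

```python
import logging
import os
from typing import Mapping


def resolve_log_level(env: Mapping[str, str] = os.environ, default: int = logging.INFO) -> int:
    """Translate a LOG_LEVEL value such as "debug" into a logging constant,
    falling back to `default` when the value is unset or unrecognised."""
    name = env.get("LOG_LEVEL", "").strip().upper()
    level = getattr(logging, name, None) if name else None
    return level if isinstance(level, int) else default
```

It could then be used as `logging.basicConfig(level=resolve_log_level())`, so that `LOG_LEVEL=DEBUG python twitter2bsky_daemon.py` turns on verbose output without a code change.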
Run with verbose output:
```bash
python twitter2bsky_daemon.py 2>&1 | tee debug.log
```
Error screenshots are automatically saved as `screenshot_YYYYMMDD_HHMMSS.png` for investigation.
## 📄 License
This project is licensed under the GNU General Public License v3.0. See [LICENSE](LICENSE) for details.
## Contributing **Summary**: You are free to use, modify, and distribute this software, but any modifications must also be open-source under GPLv3.
Contributions are welcome! Please open issues or submit pull requests on GitHub. ## 🤝 Contributing
Contributions are welcome! To contribute:
1. Fork the repository
2. Create a feature branch (`git checkout -b feature/amazing-feature`)
3. Make your changes with clear commit messages
4. Push to your fork (`git push origin feature/amazing-feature`)
5. Open a Pull Request with a description of your changes
**Before submitting**:
- Test your changes thoroughly
- Ensure code follows existing style conventions
- Add comments for complex logic
- Update README if adding new features
## ❓ FAQ
**Q: Can I use this on Windows?**
A: Yes, but ensure you have Python 3.9+ and Chromium/Playwright support. Use `venv\Scripts\activate` instead of `source venv/bin/activate`.
**Q: How do I avoid posting duplicates?**
A: The state file (`twitter2bsky_state.json`) tracks all posted content. It's automatically maintained; just don't delete it between runs.
**Q: Can I post to multiple Bluesky accounts?**
A: Currently, the tool posts to one account per instance. Run multiple instances with different `.env` configurations to handle multiple accounts.
**Q: What happens if posting fails?**
A: The script has automatic retry logic with exponential backoff. Failed posts are logged, but the state file is not updated, so they are retried on the next run.
**Q: Is my content optimized for Bluesky?**
A: Yes. The tool automatically:
- Truncates text to 300 characters (grapheme-aware)
- Optimizes images to Bluesky specs
- Handles video conversion and compression
- Resolves shortened URLs
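Grapheme-aware truncation differs from slicing by code point because an accented letter may be stored as a base character plus combining marks. The sketch below is a standard-library approximation that keeps combining marks attached to their base character. It is not the tool's implementation (the dependency list names the `grapheme` package for this) and does not cover every rule of Unicode text segmentation, such as emoji joined with zero-width joiners. The trailing ellipsis is also an assumption.

```python
import unicodedata
from typing import List


def split_clusters(text: str) -> List[str]:
    """Group each base character with the combining marks that follow it."""
    clusters: List[str] = []
    for char in text:
        if clusters and unicodedata.combining(char):
            clusters[-1] += char  # attach the mark to the preceding character
        else:
            clusters.append(char)
    return clusters


def truncate_text(text: str, limit: int = 300, ellipsis: str = "…") -> str:
    """Shorten text to at most `limit` clusters, ending with an ellipsis if cut."""
    clusters = split_clusters(text)
    if len(clusters) <= limit:
        return text
    return "".join(clusters[: limit - 1]) + ellipsis
```

Text within the limit is returned unchanged; longer text is cut one cluster early so the ellipsis still fits inside the limit.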
**Q: How do I run this on a server?**
A: Use the systemd service example in the [Usage](#-usage) section, or set up a cron job.
**Q: Can I schedule posts?**
A: Not directly through this tool. Instead, use cron/scheduler to run the script at desired times.
## 🎯 Use Cases
- **Content Creators**: Automatically repost your RSS feeds to Bluesky for wider reach
- **News Aggregation**: Create Bluesky bots that share news from multiple RSS sources
- **Account Management**: Keep social media accounts synchronized across platforms
- **Content Distribution**: Distribute content from Twitter to Bluesky without manual copying
## 🔐 Security Notes
- **Never commit `.env`**: Credentials are automatically gitignored
- **Secure your state file**: `twitter2bsky_state.json` may contain URLs; protect it like credentials
- **Use app passwords**: For Bluesky, use app-specific passwords instead of main account password
- **Monitor logs**: Regularly review `twitter2bsky.log` for unauthorized access attempts
## 📞 Support
- **Issues**: Open an issue on GitHub with detailed reproduction steps
- **Documentation**: Check this README and inline code comments
- **Logs**: Attach relevant log excerpts when reporting issues
- **Testing**: Test with `--test` flag before running in production
## 📝 Changelog
See Git commit history for detailed changes. Notable versions:
- **v2.0**: Added Twitter scraping with media support, daemon mode
- **v1.5**: Improved RSS parsing and media handling
- **v1.0**: Initial release with basic RSS→Bluesky posting
## Disclaimer
This tool is for personal use and automation. Ensure compliance with the terms of service of Bluesky, Twitter, and any RSS sources you use. Respect rate limits and avoid spamming.