# post2bsky

A Python-based automation tool for reposting content to Bluesky from RSS feeds and Twitter accounts. It includes a daemon mode for continuous operation, media support, deduplication, and detailed logging.

**Note**: This tool is designed for content creators and maintainers who need to synchronize feeds or accounts to Bluesky automatically. Make sure you have permission to repost the content and that you comply with all platform terms of service.

## ✨ Features

- **RSS → Bluesky**: Parse RSS feeds and automatically post new entries with proper formatting
- **Twitter → Bluesky**: Scrape tweets from Twitter accounts and repost them to Bluesky (with media)
- **Daemon Mode**: Run continuously as a background service for unattended operation
- **Media Support**: Handle images, videos, and other media with automatic optimization
- **Deduplication**: Track posted content to prevent duplicates across runs
- **Configurable Workflows**: YAML-based pipelines for each source with scheduling
- **Media Constraints**: Auto-handles Bluesky's limits (300 characters, 4 images, 45 MB video, etc.)
- **Error Recovery**: Automatic retries with exponential backoff for transient failures
- **Comprehensive Logging**: Detailed logs for monitoring and troubleshooting

## 📋 Prerequisites

- Python 3.9 or higher
- macOS, Linux, or Windows
- Chromium for Playwright (only needed for Twitter scraping; install it with `playwright install chromium`)
- A Bluesky account and its credentials
- A Twitter account (only if you use Twitter → Bluesky syncing)

## 🚀 Quick Start

### 1. Clone & Set Up the Environment

```bash
git clone <repository-url>
cd post2bsky
python3 -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
pip install -r requeriments.txt
playwright install chromium  # Only needed for Twitter scraping
```

### 2. Configure Credentials

Create a `.env` file in the project root:

```env
# Bluesky Authentication
BSKY_USERNAME=your_bluesky_handle
# An app password (Settings → App passwords) is recommended over your main password
BSKY_PASSWORD=your_bluesky_password

# Optional: Custom Bluesky instance (default: https://bsky.social)
BSKY_BASE_URL=https://bsky.social

# Twitter Authentication (if using Twitter syncing)
TWITTER_USERNAME=your_twitter_handle
TWITTER_PASSWORD=your_twitter_password
```

### 3. Run a Quick Test

```bash
# Preview RSS feed posting without publishing anything
python rss2bsky.py --feed-url https://example.com/rss --dry-run

# Test Twitter account scraping (test mode: nothing is posted)
python twitter2bsky_daemon.py --test
```

## Installation

### Standard Installation

1. Clone the repository:

   ```bash
   git clone https://github.com/yourusername/post2bsky.git
   cd post2bsky
   ```

2. Create and activate a virtual environment:

   ```bash
   python3 -m venv venv
   source venv/bin/activate   # macOS/Linux
   # or
   venv\Scripts\activate      # Windows
   ```

3. Install dependencies:

   ```bash
   pip install -r requeriments.txt
   ```

4. Set up environment variables:

   Create a `.env` file in the root directory (see the [Credentials](#credentials) section).

## ⚙️ Configuration

### Credentials

Store your credentials in a `.env` file at the project root. This file must never be committed to version control (it is already listed in `.gitignore`):

```env
BSKY_USERNAME=your_bluesky_handle
BSKY_PASSWORD=your_bluesky_password

# For Twitter scraping (email or username, and password)
TWITTER_USERNAME=your_twitter_username_or_email
TWITTER_PASSWORD=your_twitter_password
```

**Security Note**: Never commit credentials to Git. The `.env` file is ignored automatically.

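If a run fails at login, it can help to confirm that `.env` actually defines the expected variables. The sketch below is illustrative only: it is not part of post2bsky, it uses a simplified `KEY=VALUE` parser instead of python-dotenv (inline `# comments` after a value are not stripped), and the helper names `parse_env_file` and `missing_keys` are hypothetical.

```python
from typing import Dict, List

# Variable names taken from the .env example above; TWITTER_* only matter for Twitter syncing.
REQUIRED = ("BSKY_USERNAME", "BSKY_PASSWORD")
TWITTER = ("TWITTER_USERNAME", "TWITTER_PASSWORD")


def parse_env_file(text: str) -> Dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments (simplified parser)."""
    values: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        value = value.strip()
        # Drop one pair of matching surrounding quotes, if present
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "'\"":
            value = value[1:-1]
        values[key.strip()] = value
    return values


def missing_keys(values: Dict[str, str], use_twitter: bool = False) -> List[str]:
    """Return the required variable names that are absent or empty."""
    wanted = REQUIRED + (TWITTER if use_twitter else ())
    return [key for key in wanted if not values.get(key)]
```

For example, `missing_keys(parse_env_file(open(".env", encoding="utf-8").read()), use_twitter=True)` returns the names that still need a value; an empty list means the basic variables are in place.
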
### RSS Feed Configuration

Run `rss2bsky.py` to post from RSS feeds:

```bash
# Basic usage
python rss2bsky.py --feed-url https://example.com/rss --bsky-handle your_handle

# With advanced options
python rss2bsky.py \
  --feed-url https://example.com/rss \
  --bsky-handle your_handle \
  --max-posts 5 \
  --limit-age 3  # Only posts from the last 3 days
```

**State Management**: The tool tracks posted entries in `twitter2bsky_state.json` to prevent duplicates. This file is updated automatically on each run.

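The exact schema of `twitter2bsky_state.json` is not documented here, so the following is only a sketch of the general deduplication pattern under an assumed layout: a JSON object with a `posted` list of entry IDs or URLs. It also shows the behaviour described in the FAQ: an item is recorded only after it has been posted successfully, and the file is written atomically (temporary file plus `os.replace`) so an interrupted run cannot leave half-written JSON. `load_state`, `save_state`, and `post_new` are hypothetical names.

```python
import json
import os
import tempfile
from pathlib import Path
from typing import Callable, Iterable, List, Set


def load_state(path: Path) -> Set[str]:
    """Return the set of already-posted IDs (empty if the file does not exist yet)."""
    if not path.exists():
        return set()
    data = json.loads(path.read_text(encoding="utf-8"))
    return set(data.get("posted", []))


def save_state(path: Path, posted: Set[str]) -> None:
    """Write the state atomically: dump to a temp file, then replace the original."""
    fd, tmp = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as fh:
        json.dump({"posted": sorted(posted)}, fh, indent=2)
    os.replace(tmp, path)


def post_new(items: Iterable[str], post: Callable[[str], None], path: Path) -> List[str]:
    """Post items not seen before; record each one only after `post` succeeds."""
    posted = load_state(path)
    newly: List[str] = []
    for item_id in items:
        if item_id in posted:
            continue  # duplicate: already posted in an earlier run
        try:
            post(item_id)
        except Exception:
            continue  # not recorded, so it will be retried on the next run
        posted.add(item_id)
        newly.append(item_id)
        save_state(path, posted)  # persist after every success
    return newly
```

Saving after every successful post, rather than once at the end, means a crash midway through a batch loses at most the item currently being posted.
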
### Twitter Account Configuration

Configure Twitter accounts in `twitter2bsky_daemon.py`. The script uses Playwright for browser automation to scrape tweets:

```bash
# Run Twitter daemon
python twitter2bsky_daemon.py

# Run in test mode (dry run, no posting)
python twitter2bsky_daemon.py --test

# Specify a custom state file
python twitter2bsky_daemon.py --state-file custom_state.json
```

**Twitter Scraping Details**:

- Uses Playwright Chromium for headless browser automation
- Handles t.co URL redirects and link metadata
- Captures screenshots for error debugging
- Retries automatically with exponential backoff on failures

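A minimal stdlib sketch of t.co link resolution, not the daemon's actual code: it finds `t.co` links with a regular expression and replaces each with the URL reached after following redirects, keeping the short link if resolution fails. It assumes t.co answers with an ordinary HTTP redirect; the names `find_tco_links`, `resolve_url`, and `expand_links` are hypothetical, and the resolver is injectable so the substitution logic can be tried offline.

```python
import re
import urllib.request
from typing import Callable, List

# Matches short links such as https://t.co/AbC123
TCO_RE = re.compile(r"https?://t\.co/[A-Za-z0-9]+")


def find_tco_links(text: str) -> List[str]:
    return TCO_RE.findall(text)


def resolve_url(url: str, timeout: float = 10.0) -> str:
    """Follow redirects (urllib handles 301/302/303/307/308) and return the final URL."""
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.geturl()


def expand_links(text: str, resolver: Callable[[str], str] = resolve_url) -> str:
    """Replace every t.co link with its resolved target, leaving it unchanged on network errors."""
    def substitute(match: re.Match) -> str:
        try:
            return resolver(match.group(0))
        except OSError:  # URLError, HTTPError and socket timeouts all derive from OSError
            return match.group(0)

    return TCO_RE.sub(substitute, text)
```

Falling back to the original short link keeps a post readable even when the resolver is offline or rate-limited.
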
### Workflow Pipelines

The `workflows/` directory contains YAML pipeline configurations that define:

- Data source (RSS feed URL or Twitter handle)
- Posting schedule and frequency
- Content filtering rules
- Target Bluesky account

Example: `workflows/324.yml` defines the pipeline for the "324" RSS feed.

Each workflow typically has a corresponding Jenkins configuration in `jenkins/` for CI/CD integration.

**Running Workflows**:

```bash
# Manual execution
./sync_runner.sh

# Run a specific workflow (uses the value on the first line containing "url:";
# awk also copes with indented YAML keys, unlike cut -d' ')
python rss2bsky.py --feed-url "$(grep -m1 'url:' workflows/324.yml | awk '{print $2}')"
```

### Media Handling

The tool automatically optimizes media to fit Bluesky's constraints:

| Constraint | Value |
|------------|-------|
| Image size limit | 950 KB per image |
| Image max dimension | 2000 px (width or height) |
| Max images per post | 4 |
| Video size limit | 45 MB |
| Video max duration | 3 minutes |
| Thumbnail size limit | 950 KB |
| Text length | 300 characters (grapheme clusters) |

Images are automatically converted to JPEG and recompressed as needed, with JPEG quality lowered no further than about 40-45.

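The image limits in the table suggest a two-step approach: scale the longest side down to 2000 px, then lower JPEG quality until the file fits under 950 KB, stopping at a quality floor. The sketch below shows only that search logic and is not post2bsky's code. The encoder is passed in as a function so the logic stays independent of Pillow; with Pillow, such an encoder could save into an `io.BytesIO` buffer via `img.save(buf, format="JPEG", quality=q)`. The starting quality (90), step (5), and floor (40) are assumptions chosen to match the range mentioned above.

```python
from typing import Callable, Optional, Tuple

MAX_DIMENSION = 2000          # px, longest side (from the table above)
MAX_IMAGE_BYTES = 950 * 1024  # 950 KB per image (from the table above)


def fit_within(width: int, height: int, max_side: int = MAX_DIMENSION) -> Tuple[int, int]:
    """Scale (width, height) down proportionally so neither side exceeds max_side."""
    longest = max(width, height)
    if longest <= max_side:
        return width, height
    scale = max_side / longest
    return max(1, round(width * scale)), max(1, round(height * scale))


def compress_to_limit(
    encode: Callable[[int], bytes],
    limit: int = MAX_IMAGE_BYTES,
    start: int = 90,
    floor: int = 40,
    step: int = 5,
) -> Optional[bytes]:
    """Try decreasing JPEG qualities; return the first encoding within `limit`, or None."""
    for quality in range(start, floor - 1, -step):
        data = encode(quality)
        if len(data) <= limit:
            return data
    return None  # even the floor quality is too large: downscale further or skip the image
```

Trying the highest quality first keeps images as sharp as the size budget allows; a binary search over quality would need fewer encodes but may not return the highest passing quality when file size does not shrink monotonically with quality.
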
## 💻 Usage

### RSS to Bluesky (`rss2bsky.py`)

Post entries from RSS feeds to Bluesky:

```bash
# Simple usage
python rss2bsky.py --feed-url https://example.com/feed.xml --bsky-handle @your_handle

# Limit to recent posts (last 7 days)
python rss2bsky.py --feed-url https://example.com/feed.xml --limit-age 7

# Dry run (preview without posting)
python rss2bsky.py --feed-url https://example.com/feed.xml --dry-run
```

**Output**: The script logs all actions to `twitter2bsky.log` and maintains state in `twitter2bsky_state.json`.

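`--limit-age` is described only as keeping recent posts, so here is a hedged sketch of what a day-based cutoff typically looks like; it is not the script's implementation. It assumes each feed entry exposes a timezone-aware publication datetime. The `Entry` type and `recent_entries` function are hypothetical, and entries without a date are kept in this sketch, which the real script may handle differently.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Iterable, List, Optional


@dataclass
class Entry:
    title: str
    link: str
    published: Optional[datetime]  # timezone-aware; None if the feed gave no date


def recent_entries(
    entries: Iterable[Entry], max_age_days: int, now: Optional[datetime] = None
) -> List[Entry]:
    """Keep entries published within the last `max_age_days` days (undated entries are kept)."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=max_age_days)
    return [e for e in entries if e.published is None or e.published >= cutoff]
```

Passing `now` explicitly makes the filter deterministic, which is convenient for testing and for replaying an older run.
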
### Twitter to Bluesky (`twitter2bsky_daemon.py`)

Run continuously to sync tweets from the configured accounts:

```bash
# Start daemon mode (continuous monitoring)
python twitter2bsky_daemon.py

# Run once and exit
python twitter2bsky_daemon.py --once

# Test mode (no actual posts to Bluesky)
python twitter2bsky_daemon.py --test

# Custom configuration
python twitter2bsky_daemon.py --max-retries 5 --timeout 30
```

**Features**:

- Automatically fetches new tweets from configured accounts
- Handles retweets, quotes, and threaded tweets
- Downloads and optimizes media attachments
- Resolves shortened t.co links to actual URLs
- Prevents duplicate posts with state tracking

### Running with Sync Runner

```bash
./sync_runner.sh
```

This script can orchestrate multiple sources and is suitable for use with cron jobs or systemd timers.

### Daemon Mode Setup (systemd)

To run `twitter2bsky_daemon.py` continuously as a system service on Linux:

1. Create the service file `/etc/systemd/system/post2bsky.service`:

   ```ini
   [Unit]
   Description=post2bsky Twitter to Bluesky Daemon
   Wants=network-online.target
   After=network-online.target

   [Service]
   Type=simple
   User=your_user
   WorkingDirectory=/path/to/post2bsky
   # Put the venv first but keep the standard system paths so other binaries stay reachable
   Environment="PATH=/path/to/post2bsky/venv/bin:/usr/local/bin:/usr/bin:/bin"
   ExecStart=/path/to/post2bsky/venv/bin/python twitter2bsky_daemon.py
   Restart=always
   RestartSec=60

   [Install]
   WantedBy=multi-user.target
   ```

2. Enable and start the service:

   ```bash
   sudo systemctl daemon-reload
   sudo systemctl enable post2bsky
   sudo systemctl start post2bsky
   sudo systemctl status post2bsky
   ```

3. View logs:

   ```bash
   tail -f /path/to/post2bsky/twitter2bsky.log
   # Output captured by systemd (stdout/stderr):
   journalctl -u post2bsky -f
   ```

### Cron Job Integration

Add entries to your crontab with `crontab -e`. Cron runs commands with `/bin/sh`, where `source` may not be available, so call the virtual environment's Python directly:

```bash
# Run RSS sync every 30 minutes
*/30 * * * * cd /path/to/post2bsky && venv/bin/python rss2bsky.py --feed-url https://example.com/rss

# Run all workflows at 9 AM daily
0 9 * * * cd /path/to/post2bsky && ./sync_runner.sh
```

## 📦 Dependencies

All Python dependencies are listed in `requeriments.txt`. Key packages:

| Package | Purpose |
|---------|---------|
| `atproto` | Bluesky API client for posting |
| `fastfeedparser` | RSS/Atom feed parsing |
| `playwright` | Browser automation for Twitter scraping |
| `beautifulsoup4` | HTML parsing and content extraction |
| `pillow` | Image optimization and processing |
| `moviepy` | Video processing and duration detection |
| `grapheme` | Unicode grapheme cluster counting for Bluesky's text limits |
| `httpx` | HTTP client for URL resolution and media downloads |
| `python-dotenv` | Environment variable management |
| `arrow` | Date/time handling with timezone support |

Install all dependencies with:

```bash
pip install -r requeriments.txt
```

## 📁 Project Structure

```
post2bsky/
├── rss2bsky.py               # RSS feed → Bluesky posting script
├── twitter2bsky_daemon.py    # Twitter → Bluesky daemon (main logic)
├── twitter_login.py          # Twitter authentication helper
├── cookie_login.py           # Alternative login method
├── sync_runner.sh            # Orchestration script for multiple sources
├── twitter2bsky_state.json   # State file tracking posted content (auto-generated)
├── twitter2bsky.log          # Application logs (auto-generated)
├── requeriments.txt          # Python dependencies
├── README.md                 # This file
├── LICENSE                   # GNU GPLv3 license
├── jenkins/                  # Jenkins CI/CD configurations
│   └── [account_name]Tw/     # Config for each account
├── workflows/                # YAML pipeline definitions
│   ├── 324.yml               # Example: RSS feed for "324"
│   ├── fcbarcelona.yml       # Example: Twitter account for FC Barcelona
│   └── ...
└── venv/                     # Python virtual environment (created during setup)
```

## 🔧 Troubleshooting

### Authentication Issues

**Problem**: `Login failed: Invalid credentials`

**Solution**:

1. Verify that the credentials in `.env` are correct (no extra spaces)
2. Use a Bluesky app password (Settings → App passwords) instead of your main account password
3. If two-factor authentication is enabled, log in with an app password
4. For Twitter, make sure the account isn't rate-limited or restricted

### Twitter Scraping Issues

**Problem**: `Playwright browser failed` or screenshot errors

**Solution**:

1. Make sure Chromium is installed for Playwright: `playwright install chromium`
2. Check available disk space (Playwright requires ~500 MB)
3. Enable debug logging (see [Debugging](#-debugging)) for detailed output
4. Inspect the browser error screenshots saved as `screenshot_*.png`

**Problem**: `No tweets found` or `Tweets already posted`

**Solution**:

1. Verify that the Twitter account handle is correct in the configuration
2. Check `twitter2bsky_state.json` for deduplication data
3. Delete the state file to reset tracking (careful: this may cause re-posting)
4. Review `twitter2bsky.log` for detailed debugging information

### Media Processing Issues

**Problem**: `Image upload failed` or `Video too large`

**Solution**:

1. Images are optimized automatically, but source images should be under 100 MB
2. Videos must be under 45 MB and shorter than 3 minutes
3. Check available disk space for temporary files
4. Enable debug logging in the script for detailed information

### Performance Issues

**Problem**: Script runs slowly or times out

**Solution**:

1. Check network connectivity
2. Reduce `SCRAPE_TWEET_LIMIT` in `twitter2bsky_daemon.py` (default: 30)
3. Increase the timeout constants if you are on a slow connection
4. Run with `--once` instead of daemon mode to diagnose
5. Check system resources (CPU, memory, disk I/O)

### Log Analysis

Check `twitter2bsky.log` for detailed debugging information:

```bash
# View recent errors
grep ERROR twitter2bsky.log | tail -20

# View recent warnings
grep WARNING twitter2bsky.log | tail -20

# Watch logs in real time
tail -f twitter2bsky.log

# Count successful posts
grep -c "✅ Posted to Bluesky" twitter2bsky.log
```

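The grep commands above can also be combined into one summary. This is a sketch that assumes each log line contains its level name (`DEBUG`, `INFO`, `WARNING`, `ERROR`, `CRITICAL`) as a separate word and that successful posts contain the `✅ Posted to Bluesky` marker used in the grep example; adjust the patterns if the actual log format differs. `summarize_log` is a hypothetical helper.

```python
import re
from collections import Counter
from typing import Iterable

LEVEL_RE = re.compile(r"\b(DEBUG|INFO|WARNING|ERROR|CRITICAL)\b")
POSTED_MARKER = "✅ Posted to Bluesky"  # same marker as the grep -c example


def summarize_log(lines: Iterable[str]) -> Counter:
    """Count log lines per level (first level word on each line) plus successful posts."""
    counts: Counter = Counter()
    for line in lines:
        match = LEVEL_RE.search(line)
        if match:
            counts[match.group(1)] += 1
        if POSTED_MARKER in line:
            counts["posted"] += 1
    return counts
```

For example: `with open("twitter2bsky.log", encoding="utf-8") as fh: print(summarize_log(fh))`.
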
## 🐛 Debugging

Enable debug logging by changing the logging level in the script:

```python
# In twitter2bsky_daemon.py, change:
level=logging.INFO,
# To:
level=logging.DEBUG,
```

Run with verbose output:

```bash
python twitter2bsky_daemon.py 2>&1 | tee debug.log
```

Error screenshots are saved automatically as `screenshot_YYYYMMDD_HHMMSS.png` for investigation.

## 📄 License

This project is licensed under the GNU General Public License v3.0. See [LICENSE](LICENSE) for details.

**Summary**: You are free to use, modify, and distribute this software. If you distribute a modified version, you must also release it, with its source code, under the GPLv3.

## 🤝 Contributing

Contributions are welcome! To contribute:

1. Fork the repository
2. Create a feature branch (`git checkout -b feature/amazing-feature`)
3. Make your changes with clear commit messages
4. Push to your fork (`git push origin feature/amazing-feature`)
5. Open a Pull Request describing your changes

**Before submitting**:

- Test your changes thoroughly
- Follow the existing code style conventions
- Add comments for complex logic
- Update the README if you add new features

## ❓ FAQ

**Q: Can I use this on Windows?**

A: Yes, as long as you have Python 3.9+ and Playwright's Chromium installed. Use `venv\Scripts\activate` instead of `source venv/bin/activate`.

**Q: How do I avoid posting duplicates?**

A: The state file (`twitter2bsky_state.json`) tracks all posted content. It is maintained automatically; just don't delete it between runs.

**Q: Can I post to multiple Bluesky accounts?**

A: Currently, the tool posts to one account per instance. To handle multiple accounts, run multiple instances with different `.env` configurations.

**Q: What happens if posting fails?**

A: The script retries automatically with exponential backoff. Failed posts are logged, but the state file is NOT updated for them, so they are retried on the next run.

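For readers unfamiliar with the pattern, here is a generic sketch of retry with exponential backoff and jitter. It illustrates the technique the answer above refers to and is not taken from post2bsky; the decorator name, attempt count, delays, and the choice of which exceptions count as transient are all assumptions. The `sleep` parameter is injectable so the behaviour can be checked without waiting.

```python
import functools
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def retry(
    attempts: int = 4,
    base_delay: float = 2.0,
    max_delay: float = 60.0,
    transient: Tuple[Type[BaseException], ...] = (ConnectionError, TimeoutError),
    sleep: Callable[[float], None] = time.sleep,
) -> Callable[[Callable[..., T]], Callable[..., T]]:
    """Retry a function on transient errors, doubling the wait each time (capped, with jitter)."""
    def decorator(func: Callable[..., T]) -> Callable[..., T]:
        @functools.wraps(func)
        def wrapper(*args, **kwargs) -> T:
            for attempt in range(1, attempts + 1):
                try:
                    return func(*args, **kwargs)
                except transient:
                    if attempt == attempts:
                        raise  # out of attempts: let the caller log the failure
                    delay = min(max_delay, base_delay * 2 ** (attempt - 1))
                    sleep(delay * random.uniform(0.5, 1.0))  # jitter avoids synchronized retries
            raise RuntimeError("retry() requires attempts >= 1")
        return wrapper
    return decorator
```

With the defaults, waits grow roughly 2 s, 4 s, 8 s before the final attempt; errors outside `transient` (for example, invalid credentials) propagate immediately instead of being retried.
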
**Q: Is my content optimized for Bluesky?**

A: Yes. The tool automatically:

- Truncates text to 300 characters (grapheme-aware)
- Optimizes images to Bluesky's specs
- Handles video conversion and compression
- Resolves shortened URLs

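Bluesky counts the 300-character limit in grapheme clusters (what a reader perceives as one character), so an accented letter built from two code points, or a multi-part emoji, counts once. post2bsky relies on the `grapheme` package for this. The stdlib-only sketch below approximates the idea with a few simplified segmentation rules (combining marks, variation selectors, zero-width-joiner sequences, skin-tone modifiers, flag pairs). It is not a full Unicode UAX #29 implementation (Hangul syllable rules, for instance, are missing), and `clusters` and `truncate` are hypothetical names.

```python
import unicodedata
from typing import List

ZWJ = "\u200d"  # zero-width joiner used in multi-part emoji


def _is_regional_indicator(ch: str) -> bool:
    return "\U0001F1E6" <= ch <= "\U0001F1FF"


def _joins(cluster: str, ch: str) -> bool:
    """Simplified rule: does `ch` extend the cluster built so far?"""
    last = cluster[-1]
    if last == ZWJ or ch == ZWJ:
        return True  # emoji ZWJ sequences such as family emoji
    if "\ufe00" <= ch <= "\ufe0f" or "\U0001F3FB" <= ch <= "\U0001F3FF":
        return True  # variation selectors and skin-tone modifiers
    if _is_regional_indicator(ch) and len(cluster) == 1 and _is_regional_indicator(last):
        return True  # two regional indicators form one flag
    return unicodedata.category(ch) in ("Mn", "Mc", "Me")  # combining marks


def clusters(text: str) -> List[str]:
    """Split text into approximate grapheme clusters."""
    parts: List[str] = []
    for ch in text:
        if parts and _joins(parts[-1], ch):
            parts[-1] += ch
        else:
            parts.append(ch)
    return parts


def truncate(text: str, limit: int = 300, ellipsis: str = "…") -> str:
    """Cut text to at most `limit` clusters, counting the ellipsis as one of them."""
    parts = clusters(text)
    if len(parts) <= limit:
        return text
    return "".join(parts[: limit - 1]) + ellipsis
```

Cutting on cluster boundaries rather than code points avoids leaving a dangling combining accent or half of an emoji at the end of a truncated post.
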
**Q: How do I run this on a server?**

A: Use the systemd service example in the [Usage](#-usage) section, or set up a cron job.

**Q: Can I schedule posts?**

A: Not directly through this tool. Instead, use cron or another scheduler to run the script at the desired times.

## 🎯 Use Cases

- **Content Creators**: Automatically repost your RSS feeds to Bluesky for wider reach
- **News Aggregation**: Create Bluesky bots that share news from multiple RSS sources
- **Account Management**: Keep social media accounts synchronized across platforms
- **Content Distribution**: Distribute content from Twitter to Bluesky without manual copying

## 🔐 Security Notes

- **Never commit `.env`**: Credentials are automatically gitignored
- **Secure your state file**: `twitter2bsky_state.json` may contain URLs; protect it like credentials
- **Use app passwords**: For Bluesky, use app-specific passwords instead of your main account password
- **Monitor logs**: Regularly review `twitter2bsky.log` for failed logins and other unexpected activity

## 📞 Support

- **Issues**: Open an issue on GitHub with detailed reproduction steps
- **Documentation**: Check this README and inline code comments
- **Logs**: Attach relevant log excerpts when reporting issues
- **Testing**: Test with the `--test` flag before running in production

## 📝 Changelog

See the Git commit history for detailed changes. Notable versions:

- **v2.0**: Added Twitter scraping with media support, daemon mode
- **v1.5**: Improved RSS parsing and media handling
- **v1.0**: Initial release with basic RSS→Bluesky posting

## Disclaimer

This tool is for personal use and automation. Ensure compliance with the terms of service of Bluesky, Twitter, and any RSS sources you use. Respect rate limits and avoid spamming.