# post2bsky

A Python-based automation tool for reposting content to Bluesky from RSS feeds and Twitter accounts. Includes a daemon mode for continuous operation with comprehensive media support, deduplication, and extensive logging.

**Note**: This tool is designed for content creators and maintainers who need to automatically synchronize feeds/accounts to Bluesky. Ensure you have permission to repost content and comply with all platform terms of service.

## ✨ Features

- **RSS → Bluesky**: Parse RSS feeds and automatically post new entries with proper formatting
- **Twitter → Bluesky**: Scrape tweets from Twitter accounts and repost to Bluesky (with media)
- **Daemon Mode**: Run continuously as a background service for unattended operation
- **Media Support**: Handle images, videos, and other media with automatic optimization
- **Deduplication**: Track posted content to prevent duplicates across runs
- **Configurable Workflows**: YAML-based pipelines for each source with scheduling
- **Media Constraints**: Automatically handles Bluesky's limits (300 chars, 4 images, 45MB video, etc.)
- **Error Recovery**: Automatic retries with exponential backoff for transient failures
- **Comprehensive Logging**: Detailed logs for monitoring and troubleshooting

## 📋 Prerequisites

- Python 3.9 or higher
- macOS, Linux, or Windows with Chromium support (for Twitter scraping)
- Bluesky account with credentials
- Twitter account (if using Twitter → Bluesky syncing)

## 🚀 Quick Start

### 1. Clone & Setup Environment

```bash
git clone <repository-url>
cd post2bsky
python3 -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
pip install -r requeriments.txt
```

### 2. Configure Credentials

Create a `.env` file in the project root:

```env
# Bluesky Authentication
BSKY_USERNAME=your_bluesky_handle
BSKY_PASSWORD=your_bluesky_password

# Optional: Custom Bluesky instance (default: https://bsky.social)
BSKY_BASE_URL=https://bsky.social

# Twitter Authentication (if using Twitter syncing)
TWITTER_USERNAME=your_twitter_handle
TWITTER_PASSWORD=your_twitter_password
```

### 3. Run a Quick Test

```bash
# Test RSS feed posting
python rss2bsky.py --feed-url https://example.com/rss

# Test Twitter account scraping
python twitter2bsky_daemon.py --test
```

## Installation

### Standard Installation

1. Clone the repository:

   ```bash
   git clone https://github.com/yourusername/post2bsky.git
   cd post2bsky
   ```

2. Create and activate a virtual environment:

   ```bash
   python3 -m venv venv
   source venv/bin/activate  # macOS/Linux
   # or
   venv\Scripts\activate  # Windows
   ```

3. Install dependencies:

   ```bash
   pip install -r requeriments.txt
   ```

4. Set up environment variables: create a `.env` file in the root directory (see the [Credentials](#credentials) section).

## ⚙️ Configuration

### Credentials

Store your credentials in a `.env` file at the project root. This file should never be committed to version control (it is already listed in `.gitignore`):

```env
BSKY_USERNAME=your_bluesky_handle
BSKY_PASSWORD=your_bluesky_password

# For Twitter scraping (email or username, and password)
TWITTER_USERNAME=your_twitter_username_or_email
TWITTER_PASSWORD=your_twitter_password
```

**Security Note**: Never commit credentials to Git. The `.env` file is automatically ignored.
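
Before a first run it can help to confirm that the variables above are actually set. The following is a minimal sketch, not part of the project: the helper name `missing_credentials` is hypothetical, and it only inspects a mapping such as `os.environ` (loading `.env` into the environment, for example with `python-dotenv`, is assumed to have happened already).

```python
import os

# Variable names taken from the `.env` examples in this README.
REQUIRED_BSKY = ("BSKY_USERNAME", "BSKY_PASSWORD")
REQUIRED_TWITTER = ("TWITTER_USERNAME", "TWITTER_PASSWORD")


def missing_credentials(env, need_twitter=False):
    """Return the names of required variables that are unset or blank.

    Hypothetical helper for illustration; `env` is any mapping of
    variable names to string values, e.g. os.environ.
    """
    required = REQUIRED_BSKY + (REQUIRED_TWITTER if need_twitter else ())
    return [name for name in required if not env.get(name, "").strip()]


if __name__ == "__main__":
    missing = missing_credentials(os.environ, need_twitter=True)
    if missing:
        print("Missing credentials:", ", ".join(missing))
    else:
        print("All credentials present")
```

Treating whitespace-only values as missing is a deliberate choice here, since stray spaces in `.env` are a common cause of login failures.
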

### RSS Feed Configuration

Run `rss2bsky.py` to post from RSS feeds:

```bash
# Basic usage
python rss2bsky.py --feed-url https://example.com/rss --bsky-handle your_handle

# With advanced options
python rss2bsky.py \
  --feed-url https://example.com/rss \
  --bsky-handle your_handle \
  --max-posts 5 \
  --limit-age 3  # Only posts from last 3 days
```

**State Management**: The tool tracks posted entries in `twitter2bsky_state.json` to prevent duplicates. This file is updated automatically on each run.

### Twitter Account Configuration

Configure Twitter accounts in `twitter2bsky_daemon.py`. The script uses Playwright for browser automation to scrape tweets:

```bash
# Run Twitter daemon
python twitter2bsky_daemon.py

# Run with test mode (dry-run, no posting)
python twitter2bsky_daemon.py --test

# Specify custom state file
python twitter2bsky_daemon.py --state-file custom_state.json
```

**Twitter Scraping Details**:

- Uses Playwright Chromium for headless browser automation
- Handles t.co URL redirects and link metadata
- Includes screenshot capture for error debugging
- Automatic retry with exponential backoff on failures

### Workflow Pipelines

The `workflows/` directory contains YAML pipeline configurations that define:

- Data source (RSS feed URL or Twitter handle)
- Posting schedule and frequency
- Content filtering rules
- Target Bluesky account

Example: `workflows/324.yml` defines the pipeline for the "324" RSS feed. Each workflow typically has a corresponding Jenkins configuration in `jenkins/` for CI/CD integration.
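
As a rough illustration of reading a workflow file from Python without a YAML library, the sketch below returns the value on the first line that contains `url:`. It assumes the workflow keeps its feed address on such a line, as the `grep 'url:'` one-liner in this README implies; the function name `first_feed_url` and the sample content are hypothetical, and the real workflow schema may differ.

```python
def first_feed_url(workflow_text):
    """Return the value on the first line containing `url:`, or None.

    Illustrative only: a naive text scan, not a YAML parser.
    """
    for line in workflow_text.splitlines():
        _, sep, value = line.partition("url:")
        if sep:
            # Drop surrounding whitespace and optional quotes
            return value.strip().strip("'\"") or None
    return None


# Hypothetical workflow content; real files may use other keys
sample = "name: 324\nurl: https://example.com/rss\nschedule: hourly\n"
print(first_feed_url(sample))
```

A proper YAML parser would be the safer choice once the workflow layout is known; the text scan simply mirrors what the shell pipeline does.
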

**Running Workflows**:

```bash
# Manual execution
./sync_runner.sh

# Run specific workflow
python rss2bsky.py --feed-url "$(grep 'url:' workflows/324.yml | head -1 | cut -d' ' -f2)"
```

### Media Handling

The tool automatically optimizes media for Bluesky's constraints:

| Constraint | Value |
|-----------|-------|
| Image size limit | 950 KB per image |
| Image max dimension | 2000px (width or height) |
| Max images per post | 4 |
| Video size limit | 45 MB |
| Video max duration | 3 minutes |
| Thumbnail size | 950 KB |
| Text length | 300 characters (grapheme clusters) |

Images are automatically converted to JPEG, with quality reduced as needed down to a minimum of 40-45.

## 💻 Usage

### RSS to Bluesky (`rss2bsky.py`)

Post entries from RSS feeds to Bluesky:

```bash
# Simple usage
python rss2bsky.py --feed-url https://example.com/feed.xml --bsky-handle @your_handle

# Limit to recent posts
python rss2bsky.py --feed-url https://example.com/feed.xml --limit-age 7

# Dry run (preview without posting)
python rss2bsky.py --feed-url https://example.com/feed.xml --dry-run
```

**Output**: The script logs all actions to `twitter2bsky.log` and maintains state in `twitter2bsky_state.json`.
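
The state-based deduplication described above can be pictured with a small sketch. The actual layout of `twitter2bsky_state.json` is not documented here, so this assumes the simplest possible form, a flat JSON list of entry IDs; the function names are hypothetical and the demo writes to a temporary directory rather than the real state file.

```python
import json
import tempfile
from pathlib import Path


def load_posted(state_path):
    """Load the set of already-posted entry IDs; empty if no state file yet."""
    path = Path(state_path)
    if not path.exists():
        return set()
    return set(json.loads(path.read_text(encoding="utf-8")))


def save_posted(state_path, posted):
    """Persist the posted IDs as a sorted JSON list (assumed format)."""
    Path(state_path).write_text(json.dumps(sorted(posted)), encoding="utf-8")


def select_new(entry_ids, posted):
    """Keep only IDs not posted before, preserving feed order."""
    return [entry_id for entry_id in entry_ids if entry_id not in posted]


# Demo: two consecutive runs against a throwaway state file
with tempfile.TemporaryDirectory() as tmp:
    state_file = Path(tmp) / "state.json"
    posted = load_posted(state_file)          # first run: nothing recorded
    first_run = select_new(["a", "b"], posted)
    posted.update(first_run)                  # record only what was posted
    save_posted(state_file, posted)
    second_run = select_new(["a", "b", "c"], load_posted(state_file))

print(first_run, second_run)
```

Recording an ID only after a successful post means a failed entry is picked up again on the next run, which is why deleting the state file can cause re-posting.
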

### Twitter to Bluesky (`twitter2bsky_daemon.py`)

Run continuously to sync tweets from specified accounts:

```bash
# Start daemon mode (continuous monitoring)
python twitter2bsky_daemon.py

# Run once and exit
python twitter2bsky_daemon.py --once

# Test mode (no actual posts to Bluesky)
python twitter2bsky_daemon.py --test

# Custom configuration
python twitter2bsky_daemon.py --max-retries 5 --timeout 30
```

**Features**:

- Automatically fetches new tweets from configured accounts
- Handles retweets, quotes, and threaded tweets
- Downloads and optimizes media attachments
- Resolves shortened t.co links to actual URLs
- Prevents duplicate posts with state tracking

### Running with Sync Runner

```bash
./sync_runner.sh
```

This script can orchestrate multiple sources and is suitable for integration with cron jobs or systemd timers.

### Daemon Mode Setup (systemd)

To run `twitter2bsky_daemon.py` continuously as a system service on Linux:

1. Create the service file `/etc/systemd/system/post2bsky.service`:

   ```ini
   [Unit]
   Description=post2bsky Twitter to Bluesky Daemon
   After=network.target

   [Service]
   Type=simple
   User=your_user
   WorkingDirectory=/path/to/post2bsky
   Environment="PATH=/path/to/post2bsky/venv/bin"
   ExecStart=/path/to/post2bsky/venv/bin/python twitter2bsky_daemon.py
   Restart=always
   RestartSec=60

   [Install]
   WantedBy=multi-user.target
   ```

2. Enable and start:

   ```bash
   sudo systemctl daemon-reload
   sudo systemctl enable post2bsky
   sudo systemctl start post2bsky
   sudo systemctl status post2bsky
   ```

3. View logs:

   ```bash
   tail -f twitter2bsky.log
   ```

### Cron Job Integration

Add to crontab with `crontab -e`:

```bash
# cron usually runs jobs with /bin/sh, so activate the venv with `.` rather than `source`
# Run RSS sync every 30 minutes
*/30 * * * * cd /path/to/post2bsky && . venv/bin/activate && python rss2bsky.py --feed-url https://example.com/rss

# Run all workflows at 9 AM daily
0 9 * * * cd /path/to/post2bsky && ./sync_runner.sh
```

## 📦 Dependencies

All Python dependencies are listed in `requeriments.txt`.
Key packages:

| Package | Purpose |
|---------|---------|
| `atproto` | Bluesky API client for posting |
| `fastfeedparser` | RSS/Atom feed parsing |
| `playwright` | Browser automation for Twitter scraping |
| `beautifulsoup4` | HTML parsing and content extraction |
| `pillow` | Image optimization and processing |
| `moviepy` | Video processing and duration detection |
| `grapheme` | Unicode grapheme cluster counting for Bluesky's text limits |
| `httpx` | HTTP client for URL resolution and media downloads |
| `python-dotenv` | Environment variable management |
| `arrow` | Date/time handling with timezone support |

Install all dependencies with:

```bash
pip install -r requeriments.txt
```

## 📁 Project Structure

```
post2bsky/
├── rss2bsky.py                  # RSS feed → Bluesky posting script
├── twitter2bsky_daemon.py       # Twitter → Bluesky daemon (main logic)
├── twitter_login.py             # Twitter authentication helper
├── cookie_login.py              # Alternative login method
├── sync_runner.sh               # Orchestration script for multiple sources
├── twitter2bsky_state.json      # State file tracking posted content (auto-generated)
├── twitter2bsky.log             # Application logs (auto-generated)
├── requeriments.txt             # Python dependencies
├── README.md                    # This file
├── LICENSE                      # GNU GPLv3 license
├── jenkins/                     # Jenkins CI/CD configurations
│   └── [account_name]Tw/        # Config for each account
├── workflows/                   # YAML pipeline definitions
│   ├── 324.yml                  # Example: RSS feed for "324"
│   ├── fcbarcelona.yml          # Example: Twitter account for FC Barcelona
│   └── ...
└── venv/                        # Python virtual environment (created during setup)
```

## 🔧 Troubleshooting

### Authentication Issues

**Problem**: `Login failed: Invalid credentials`

**Solution**:

1. Verify credentials in `.env` are correct (no extra spaces)
2. Check if your Bluesky account requires an app password (Settings → App passwords)
3. If using 2FA, generate an app-specific password
4. For Twitter, ensure the account isn't rate-limited or restricted

### Twitter Scraping Issues

**Problem**: `Playwright browser failed` or screenshot errors

**Solution**:

1. Ensure Chromium is properly installed: `playwright install chromium`
2. Check available disk space (Playwright requires ~500MB)
3. Run the script with the `--debug` flag for detailed output
4. Check browser error screenshots in `screenshot_*.png` files

**Problem**: `No tweets found` or `Tweets already posted`

**Solution**:

1. Verify the Twitter account handle is correct in the configuration
2. Check `twitter2bsky_state.json` for deduplication data
3. Delete the state file to reset tracking (careful: this may cause re-posting)
4. Review `twitter2bsky.log` for detailed debugging

### Media Processing Issues

**Problem**: `Image upload failed` or `Video too large`

**Solution**:

1. Images are auto-optimized, but the source should be <100MB
2. Videos must be <45MB and <3 minutes
3. Check available disk space for temporary files
4. Enable debug logging in the script for detailed info

### Performance Issues

**Problem**: Script runs slowly or times out

**Solution**:

1. Check network connectivity
2. Reduce `SCRAPE_TWEET_LIMIT` in `twitter2bsky_daemon.py` (default: 30)
3. Increase timeout constants if on a slow connection
4. Run with `--once` instead of daemon mode to diagnose
5. Check system resources (CPU, memory, disk I/O)

### Log Analysis

Check `twitter2bsky.log` for detailed debugging:

```bash
# View recent errors
grep ERROR twitter2bsky.log | tail -20

# View recent warnings
grep WARNING twitter2bsky.log | tail -20

# Watch logs in real-time
tail -f twitter2bsky.log

# Count successful posts
grep -c "✅ Posted to Bluesky" twitter2bsky.log
```

## 🐛 Debugging

Enable debug logging by modifying the logging level in the script:

```python
# In twitter2bsky_daemon.py, change:
level=logging.INFO,
# To:
level=logging.DEBUG,
```

Run with verbose output:

```bash
python twitter2bsky_daemon.py 2>&1 | tee debug.log
```

Error screenshots are automatically saved as `screenshot_YYYYMMDD_HHMMSS.png` for investigation.

## 📄 License

This project is licensed under the GNU General Public License v3.0. See [LICENSE](LICENSE) for details.

**Summary**: You are free to use, modify, and distribute this software, but any modified versions you distribute must also be released under GPLv3.

## 🤝 Contributing

Contributions are welcome! To contribute:

1. Fork the repository
2. Create a feature branch (`git checkout -b feature/amazing-feature`)
3. Make your changes with clear commit messages
4. Push to your fork (`git push origin feature/amazing-feature`)
5. Open a Pull Request with a description of your changes

**Before submitting**:

- Test your changes thoroughly
- Ensure code follows existing style conventions
- Add comments for complex logic
- Update the README if adding new features

## ❓ FAQ

**Q: Can I use this on Windows?**
A: Yes, but ensure you have Python 3.9+ and Chromium/Playwright support. Use `venv\Scripts\activate` instead of `source venv/bin/activate`.

**Q: How do I avoid posting duplicates?**
A: The state file (`twitter2bsky_state.json`) tracks all posted content. It's automatically maintained; just don't delete it between runs.

**Q: Can I post to multiple Bluesky accounts?**
A: Currently, the tool posts to one account per instance.
Run multiple instances with different `.env` configurations to handle multiple accounts.

**Q: What happens if posting fails?**
A: The script has automatic retry logic with exponential backoff. Failed posts are logged but the state file is NOT updated, so they are retried on the next run.

**Q: Is my content optimized for Bluesky?**
A: Yes. The tool automatically:

- Truncates text to 300 characters (grapheme-aware)
- Optimizes images to Bluesky specs
- Handles video conversion and compression
- Resolves shortened URLs

**Q: How do I run this on a server?**
A: Use the systemd service example in the [Usage](#-usage) section, or set up a cron job.

**Q: Can I schedule posts?**
A: Not directly through this tool. Instead, use cron or another scheduler to run the script at the desired times.

## 🎯 Use Cases

- **Content Creators**: Automatically repost your RSS feeds to Bluesky for wider reach
- **News Aggregation**: Create Bluesky bots that share news from multiple RSS sources
- **Account Management**: Keep social media accounts synchronized across platforms
- **Content Distribution**: Distribute content from Twitter to Bluesky without manual copying

## 🔐 Security Notes

- **Never commit `.env`**: Credentials are automatically gitignored
- **Secure your state file**: `twitter2bsky_state.json` may contain URLs; protect it like credentials
- **Use app passwords**: For Bluesky, use app-specific passwords instead of your main account password
- **Monitor logs**: Regularly review `twitter2bsky.log` for unauthorized access attempts

## 📞 Support

- **Issues**: Open an issue on GitHub with detailed reproduction steps
- **Documentation**: Check this README and inline code comments
- **Logs**: Attach relevant log excerpts when reporting issues
- **Testing**: Test with the `--test` flag before running in production

## 📝 Changelog

See Git commit history for detailed changes.
Notable versions:

- **v2.0**: Added Twitter scraping with media support, daemon mode
- **v1.5**: Improved RSS parsing and media handling
- **v1.0**: Initial release with basic RSS → Bluesky posting

## Disclaimer

This tool is for personal use and automation. Ensure compliance with the terms of service of Bluesky, Twitter, and any RSS sources you use. Respect rate limits and avoid spamming.