diff --git a/README.md b/README.md
index 59200d3..7c37631 100644
--- a/README.md
+++ b/README.md
@@ -1,104 +1,497 @@
 # post2bsky
-post2bsky is a Python-based tool for automatically posting content from RSS feeds and Twitter accounts to Bluesky (AT Protocol). It supports both RSS-to-Bluesky and Twitter-to-Bluesky synchronization, with configurable workflows for various sources.
+A Python-based automation tool for reposting content to Bluesky from RSS feeds and Twitter accounts. Includes a daemon mode for continuous operation with comprehensive media support, deduplication, and extensive logging.
-## Features
+**Note**: This tool is designed for content creators and maintainers who need to automatically synchronize feeds/accounts to Bluesky. Ensure you have permission to repost content and comply with all platform terms of service.
-- **RSS to Bluesky**: Parse RSS feeds and post new entries to Bluesky with proper formatting and media handling.
-- **Twitter to Bluesky**: Scrape tweets from specified Twitter accounts and repost them to Bluesky, including media attachments.
-- **Daemon Mode**: Run as a background service for continuous monitoring and posting.
-- **Configurable Workflows**: Use YAML-based workflows to define sources, schedules, and posting rules.
-- **Media Support**: Handle images, videos, and other media from feeds and tweets.
-- **Deduplication**: Prevent duplicate posts using state tracking.
-- **Logging**: Comprehensive logging for monitoring and debugging.
+## ✨ Features
+
+- **RSS → Bluesky**: Parse RSS feeds and automatically post new entries with proper formatting
+- **Twitter → Bluesky**: Scrape tweets from Twitter accounts and repost to Bluesky (with media)
+- **Daemon Mode**: Run continuously as a background service for unattended operation
+- **Media Support**: Handle images, videos, and other media with automatic optimization
+- **Deduplication**: Track posted content to prevent duplicates across runs
+- **Configurable Workflows**: YAML-based pipelines for each source with scheduling
+- **Media Constraints**: Auto-handles Bluesky's limits (300 chars, 4 images, 45MB video, etc.)
+- **Error Recovery**: Automatic retries with exponential backoff for transient failures
+- **Comprehensive Logging**: Detailed logs for monitoring and troubleshooting
+
+## 📋 Prerequisites
+
+- Python 3.9 or higher
+- macOS, Linux, or Windows with Chromium support (for Twitter scraping)
+- Bluesky account with credentials
+- Twitter account (if using Twitter→Bluesky syncing)
+
+## 🚀 Quick Start
+
+### 1. Clone & Setup Environment
+
+```bash
+git clone
+cd post2bsky
+python3 -m venv venv
+source venv/bin/activate  # On Windows: venv\Scripts\activate
+pip install -r requeriments.txt
+```
+
+### 2. Configure Credentials
+
+Create a `.env` file in the project root:
+
+```env
+# Bluesky Authentication
+BSKY_USERNAME=your_bluesky_handle
+BSKY_PASSWORD=your_bluesky_password
+
+# Optional: Custom Bluesky instance (default: https://bsky.social)
+BSKY_BASE_URL=https://bsky.social
+
+# Twitter Authentication (if using Twitter syncing)
+TWITTER_USERNAME=your_twitter_handle
+TWITTER_PASSWORD=your_twitter_password
+```
+
+### 3. Run a Quick Test
+
+```bash
+# Test RSS feed posting
+python rss2bsky.py --feed-url https://example.com/rss
+
+# Test Twitter account scraping
+python twitter2bsky_daemon.py --test
+```
 ## Installation
+### Standard Installation
+
 1. Clone the repository:
    ```bash
    git clone https://github.com/yourusername/post2bsky.git
    cd post2bsky
    ```
-2. Install Python dependencies:
+2. Create and activate virtual environment:
+   ```bash
+   python3 -m venv venv
+   source venv/bin/activate  # macOS/Linux
+   # or
+   venv\Scripts\activate  # Windows
+   ```
+
+3. Install dependencies:
    ```bash
    pip install -r requeriments.txt
    ```
-3. Set up environment variables:
-   Create a `.env` file with your Bluesky credentials:
-   ```
-   BSKY_USERNAME=your_bluesky_handle
-   BSKY_PASSWORD=your_bluesky_password
-   ```
+4. Set up environment variables:
+   Create a `.env` file in the root directory (see the [Credentials](#credentials) section)
-   For Twitter scraping, additional setup may be required (see Configuration).
+## ⚙️ Configuration
-## Configuration
+### Credentials
-### RSS Feeds
-Use `rss2bsky.py` to post from RSS feeds. Configure the feed URL and other options via command-line arguments.
+Your credentials should be stored in a `.env` file at the project root. This file should never be committed to version control (already in `.gitignore`):
+
+```env
+BSKY_USERNAME=your_bluesky_handle
+BSKY_PASSWORD=your_bluesky_password
+
+# For Twitter scraping (email or username, and password)
+TWITTER_USERNAME=your_twitter_username_or_email
+TWITTER_PASSWORD=your_twitter_password
+```
+
+**Security Note**: Never commit credentials to Git. The `.env` file is automatically ignored.
+
+### RSS Feed Configuration
+
+Run `rss2bsky.py` to post from RSS feeds:
-Example:
 ```bash
+# Basic usage
 python rss2bsky.py --feed-url https://example.com/rss --bsky-handle your_handle
+
+# With advanced options
+python rss2bsky.py \
+  --feed-url https://example.com/rss \
+  --bsky-handle your_handle \
+  --max-posts 5 \
+  --limit-age 3  # Only posts from last 3 days
 ```
-### Twitter Accounts
-Use `twitter2bsky_daemon.py` for Twitter-to-Bluesky posting. It requires browser automation for scraping.
+**State Management**: The tool tracks posted entries in `twitter2bsky_state.json` to prevent duplicates. This file is updated automatically on each run.
-Configure Twitter accounts in the script or via environment variables.
+### Twitter Account Configuration
-### Workflows
-The `workflows/` directory contains Jenkins pipeline configurations for automated runs. Each `.yml` file defines a pipeline for a specific source (e.g., `324.yml` for 324 RSS feed).
+Configure Twitter accounts in `twitter2bsky_daemon.py`. The script uses Playwright for browser automation to scrape tweets:
-To run a workflow manually, use the `sync_runner.sh` script or execute the Python scripts directly.
-
-## Usage
-
-### Running RSS Sync
 ```bash
-python rss2bsky.py [options]
+# Run Twitter daemon
+python twitter2bsky_daemon.py
+
+# Run with test mode (dry-run, no posting)
+python twitter2bsky_daemon.py --test
+
+# Specify custom state file
+python twitter2bsky_daemon.py --state-file custom_state.json
 ```
-Options:
-- `--feed-url`: URL of the RSS feed
-- `--bsky-handle`: Your Bluesky handle
-- Other options for filtering, formatting, etc.
+**Twitter Scraping Details**:
+- Uses Playwright Chromium for headless browser automation
+- Handles t.co URL redirects and link metadata
+- Includes screenshot capture for error debugging
+- Automatic retry with exponential backoff on failures
-### Running Twitter Daemon
+### Workflow Pipelines
+
+The `workflows/` directory contains YAML pipeline configurations that define:
+- Data source (RSS feed URL or Twitter handle)
+- Posting schedule and frequency
+- Content filtering rules
+- Target Bluesky account
+
+Example: `workflows/324.yml` defines the pipeline for the "324" RSS feed.
+
+Each workflow typically has a corresponding Jenkins configuration in `jenkins/` for CI/CD integration.
+
+**Running Workflows**:
 ```bash
-python twitter2bsky_daemon.py [options]
+# Manual execution
+./sync_runner.sh
+
+# Run specific workflow
+python rss2bsky.py --feed-url $(grep 'url:' workflows/324.yml | head -1 | cut -d' ' -f2)
 ```
-Options:
-- Configure Twitter accounts and Bluesky credentials
-- Run in daemon mode for continuous operation
+### Media Handling
+
+The tool automatically optimizes media for Bluesky's constraints:
+
+| Constraint | Value |
+|-----------|-------|
+| Image size limit | 950 KB per image |
+| Image max dimension | 2000px (width or height) |
+| Max images per post | 4 |
+| Video size limit | 45 MB |
+| Video max duration | 3 minutes |
+| Thumbnail size | 950 KB |
+| Text length | 300 characters (grapheme clusters) |
+
+Images are automatically converted to JPEG with quality optimization (min 40-45 JPEG quality).
+
+## 💻 Usage
+
+### RSS to Bluesky (`rss2bsky.py`)
+
+Post entries from RSS feeds to Bluesky:
+
+```bash
+# Simple usage
+python rss2bsky.py --feed-url https://example.com/feed.xml --bsky-handle @your_handle
+
+# Limit to recent posts
+python rss2bsky.py --feed-url https://example.com/feed.xml --limit-age 7
+
+# Dry run (preview without posting)
+python rss2bsky.py --feed-url https://example.com/feed.xml --dry-run
+```
+
+**Output**: The script logs all actions to `twitter2bsky.log` and maintains state in `twitter2bsky_state.json`.
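The state tracking mentioned in the Output note can be pictured with a minimal sketch. It is not the script's actual implementation: the `posted_ids` key, the JSON layout, and all three helper names are assumptions made for illustration, since the real schema of `twitter2bsky_state.json` is not shown in this README.

```python
import json
from pathlib import Path


def load_posted_ids(state_path: Path) -> set[str]:
    """Return the IDs recorded by earlier runs, or an empty set if no state exists."""
    if not state_path.exists():
        return set()
    with state_path.open(encoding="utf-8") as fh:
        # "posted_ids" is a hypothetical key, not the tool's documented schema.
        return set(json.load(fh).get("posted_ids", []))


def save_posted_ids(state_path: Path, posted_ids: set[str]) -> None:
    """Write to a temporary file first, then rename it over the old state file."""
    tmp_path = state_path.with_suffix(".tmp")
    with tmp_path.open("w", encoding="utf-8") as fh:
        json.dump({"posted_ids": sorted(posted_ids)}, fh, indent=2)
    tmp_path.replace(state_path)


def select_new_entries(entry_ids: list[str], posted_ids: set[str]) -> list[str]:
    """Keep feed order, dropping IDs that were already posted."""
    return [entry_id for entry_id in entry_ids if entry_id not in posted_ids]
```

Writing to a temporary file and renaming it means an interrupted run is unlikely to leave a half-written state file behind, which would otherwise risk re-posting everything on the next run.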
+
+### Twitter to Bluesky (`twitter2bsky_daemon.py`)
+
+Run continuously to sync tweets from specified accounts:
+
+```bash
+# Start daemon mode (continuous monitoring)
+python twitter2bsky_daemon.py
+
+# Run once and exit
+python twitter2bsky_daemon.py --once
+
+# Test mode (no actual posts to Bluesky)
+python twitter2bsky_daemon.py --test
+
+# Custom configuration
+python twitter2bsky_daemon.py --max-retries 5 --timeout 30
+```
+
+**Features**:
+- Automatically fetches new tweets from configured accounts
+- Handles retweets, quotes, and threaded tweets
+- Downloads and optimizes media attachments
+- Resolves shortened t.co links to actual URLs
+- Prevents duplicate posts with state tracking
+
+### Running with Sync Runner
-### Using Sync Runner
 ```bash
 ./sync_runner.sh
 ```
-This script can be used to run multiple syncs or integrate with cron jobs.
+This script can orchestrate multiple sources and is suitable for integration with cron jobs or systemd timers.
-## Dependencies
+### Daemon Mode Setup (systemd)
-All Python dependencies are listed in `requeriments.txt`. Key packages include:
-- `atproto`: For Bluesky API interaction
-- `fastfeedparser`: For RSS parsing
-- `playwright`: For browser automation (Twitter scraping)
-- `beautifulsoup4`: For HTML parsing
-- And many others for media processing, logging, etc.
+To run `twitter2bsky_daemon.py` continuously as a system service on Linux:
-## License
+1. Create service file `/etc/systemd/system/post2bsky.service`:
+   ```ini
+   [Unit]
+   Description=post2bsky Twitter to Bluesky Daemon
+   After=network.target
+
+   [Service]
+   Type=simple
+   User=your_user
+   WorkingDirectory=/path/to/post2bsky
+   Environment="PATH=/path/to/post2bsky/venv/bin"
+   ExecStart=/path/to/post2bsky/venv/bin/python twitter2bsky_daemon.py
+   Restart=always
+   RestartSec=60
+
+   [Install]
+   WantedBy=multi-user.target
+   ```
+
+2. Enable and start:
+   ```bash
+   sudo systemctl daemon-reload
+   sudo systemctl enable post2bsky
+   sudo systemctl start post2bsky
+   sudo systemctl status post2bsky
+   ```
+
+3. View logs:
+   ```bash
+   tail -f twitter2bsky.log
+   ```
+
+### Cron Job Integration
+
+Add to crontab with `crontab -e`:
+
+```bash
+# Run RSS sync every 30 minutes
+# Call the venv's interpreter directly: cron runs jobs with /bin/sh, which may not provide `source`
+*/30 * * * * cd /path/to/post2bsky && venv/bin/python rss2bsky.py --feed-url https://example.com/rss
+
+# Run all workflows at 9 AM daily
+0 9 * * * cd /path/to/post2bsky && ./sync_runner.sh
+```
+
+## 📦 Dependencies
+
+All Python dependencies are listed in `requeriments.txt`. Key packages:
+
+| Package | Purpose |
+|---------|---------|
+| `atproto` | Bluesky API client for posting |
+| `fastfeedparser` | RSS/Atom feed parsing |
+| `playwright` | Browser automation for Twitter scraping |
+| `beautifulsoup4` | HTML parsing and content extraction |
+| `pillow` | Image optimization and processing |
+| `moviepy` | Video processing and duration detection |
+| `grapheme` | Unicode grapheme cluster counting for Bluesky's text limits |
+| `httpx` | HTTP client for URL resolution and media downloads |
+| `python-dotenv` | Environment variable management |
+| `arrow` | Date/time handling with timezone support |
+
+Install all dependencies with:
+```bash
+pip install -r requeriments.txt
+```
+
+## 📁 Project Structure
+
+```
+post2bsky/
+├── rss2bsky.py               # RSS feed → Bluesky posting script
+├── twitter2bsky_daemon.py    # Twitter → Bluesky daemon (main logic)
+├── twitter_login.py          # Twitter authentication helper
+├── cookie_login.py           # Alternative login method
+├── sync_runner.sh            # Orchestration script for multiple sources
+├── twitter2bsky_state.json   # State file tracking posted content (auto-generated)
+├── twitter2bsky.log          # Application logs (auto-generated)
+├── requeriments.txt          # Python dependencies
+├── README.md                 # This file
+├── LICENSE                   # GNU GPLv3 license
+├── jenkins/                  # Jenkins CI/CD configurations
+│   └── [account_name]Tw/     # Config for each account
+├── workflows/                # YAML pipeline definitions
+│   ├── 324.yml               # Example: RSS feed for "324"
+│   ├── fcbarcelona.yml       # Example: Twitter account for FC Barcelona
+│   └── ...
+└── venv/                     # Python virtual environment (created during setup)
+```
+
+## 🔧 Troubleshooting
+
+### Authentication Issues
+
+**Problem**: `Login failed: Invalid credentials`
+
+**Solution**:
+1. Verify credentials in `.env` are correct (no extra spaces)
+2. Check if Bluesky account requires app password (Settings → App passwords)
+3. If using 2FA, generate an app-specific password
+4. For Twitter, ensure account isn't rate-limited or restricted
+
+### Twitter Scraping Issues
+
+**Problem**: `Playwright browser failed` or screenshot errors
+
+**Solution**:
+1. Ensure Chromium is properly installed: `playwright install chromium`
+2. Check available disk space (Playwright requires ~500MB)
+3. Run script with `--debug` flag for detailed output
+4. Check browser error screenshots in `screenshot_*.png` files
+
+**Problem**: `No tweets found` or `Tweets already posted`
+
+**Solution**:
+1. Verify Twitter account handle is correct in configuration
+2. Check `twitter2bsky_state.json` for deduplication data
+3. Delete state file to reset tracking (careful: may cause re-posting)
+4. Review `twitter2bsky.log` for detailed debugging
+
+### Media Processing Issues
+
+**Problem**: `Image upload failed` or `Video too large`
+
+**Solution**:
+1. Images are auto-optimized, but source should be <100MB
+2. Videos must be <45MB and <3 minutes
+3. Check available disk space for temporary files
+4. Enable debug logging in the script for detailed info
+
+### Performance Issues
+
+**Problem**: Script runs slowly or times out
+
+**Solution**:
+1. Check network connectivity
+2. Reduce `SCRAPE_TWEET_LIMIT` in `twitter2bsky_daemon.py` (default: 30)
+3. Increase timeout constants if on slow connection
+4. Run with `--once` instead of daemon mode to diagnose
+5. Check system resources (CPU, memory, disk I/O)
+
+### Log Analysis
+
+Check `twitter2bsky.log` for detailed debugging:
+
+```bash
+# View recent errors
+grep ERROR twitter2bsky.log | tail -20
+
+# View all warnings
+grep WARNING twitter2bsky.log | tail -20
+
+# Watch logs in real-time
+tail -f twitter2bsky.log
+
+# Count posts by status
+grep -c "✅ Posted to Bluesky" twitter2bsky.log
+```
+
+## 🐛 Debugging
+
+Enable debug logging by modifying the logging level in the script:
+
+```python
+# In twitter2bsky_daemon.py, change:
+level=logging.INFO,
+# To:
+level=logging.DEBUG,
+```
+
+Run with verbose output:
+```bash
+python twitter2bsky_daemon.py 2>&1 | tee debug.log
+```
+
+Error screenshots are automatically saved as `screenshot_YYYYMMDD_HHMMSS.png` for investigation.
+
+## 📄 License
 This project is licensed under the GNU General Public License v3.0. See [LICENSE](LICENSE) for details.
-## Contributing
+**Summary**: You are free to use, modify, and distribute this software, but if you distribute modified versions, they must also be licensed under GPLv3.
-Contributions are welcome! Please open issues or submit pull requests on GitHub.
+## 🤝 Contributing
+
+Contributions are welcome! To contribute:
+
+1. Fork the repository
+2. Create a feature branch (`git checkout -b feature/amazing-feature`)
+3. Make your changes with clear commit messages
+4. Push to your fork (`git push origin feature/amazing-feature`)
+5. Open a Pull Request with a description of your changes
+
+**Before submitting**:
+- Test your changes thoroughly
+- Ensure code follows existing style conventions
+- Add comments for complex logic
+- Update README if adding new features
+
+## ❓ FAQ
+
+**Q: Can I use this on Windows?**
+A: Yes, but ensure you have Python 3.9+ and Chromium/Playwright support. Use `venv\Scripts\activate` instead of `source venv/bin/activate`.
+
+**Q: How do I avoid posting duplicates?**
+A: The state file (`twitter2bsky_state.json`) tracks all posted content. It's automatically maintained; just don't delete it between runs.
+
+**Q: Can I post to multiple Bluesky accounts?**
+A: Currently, the tool posts to one account per instance. Run multiple instances with different `.env` configurations to handle multiple accounts.
+
+**Q: What happens if posting fails?**
+A: The script has automatic retry logic with exponential backoff. Failed posts are logged, but the state file is NOT updated, so they are retried on the next run.
+
+**Q: Is my content optimized for Bluesky?**
+A: Yes. The tool automatically:
+  - Truncates text to 300 characters (grapheme-aware)
+  - Optimizes images to Bluesky specs
+  - Handles video conversion and compression
+  - Resolves shortened URLs
+
+**Q: How do I run this on a server?**
+A: Use the systemd service example in the [Usage](#-usage) section, or set up a cron job.
+
+**Q: Can I schedule posts?**
+A: Not directly through this tool. Instead, use cron/scheduler to run the script at desired times.
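The retry behavior described in the FAQ answer on failed posts can be sketched roughly as follows. This is an illustration under assumptions, not the daemon's code: `post_with_backoff`, its parameters, and the injectable `sleep` argument are hypothetical names chosen for the example.

```python
import time
from typing import Callable


def post_with_backoff(
    post: Callable[[], None],
    max_retries: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Call post() until it succeeds or the attempts run out.

    The wait doubles after each failure (base_delay, 2*base_delay, ...).
    Returns True on success and False once every attempt has failed.
    """
    for attempt in range(max_retries):
        try:
            post()
            return True
        except Exception:
            # No point sleeping after the final attempt.
            if attempt < max_retries - 1:
                sleep(base_delay * 2 ** attempt)
    return False
```

A caller would record an entry in the state file only when this returns `True`, which matches the FAQ's description that failed posts stay out of the state file and are picked up again on the next run.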
+
+## 🎯 Use Cases
+
+- **Content Creators**: Automatically repost your RSS feeds to Bluesky for wider reach
+- **News Aggregation**: Create Bluesky bots that share news from multiple RSS sources
+- **Account Management**: Keep social media accounts synchronized across platforms
+- **Content Distribution**: Distribute content from Twitter to Bluesky without manual copying
+
+## 🔐 Security Notes
+
+- **Never commit `.env`**: Credentials are automatically gitignored
+- **Secure your state file**: `twitter2bsky_state.json` may contain URLs; protect it like credentials
+- **Use app passwords**: For Bluesky, use app-specific passwords instead of main account password
+- **Monitor logs**: Regularly review `twitter2bsky.log` for unauthorized access attempts
+
+## 📞 Support
+
+- **Issues**: Open an issue on GitHub with detailed reproduction steps
+- **Documentation**: Check this README and inline code comments
+- **Logs**: Attach relevant log excerpts when reporting issues
+- **Testing**: Test with `--test` flag before running in production
+
+## 📝 Changelog
+
+See Git commit history for detailed changes. Notable versions:
+
+- **v2.0**: Added Twitter scraping with media support, daemon mode
+- **v1.5**: Improved RSS parsing and media handling
+- **v1.0**: Initial release with basic RSS→Bluesky posting
 ## Disclaimer
-This tool is for personal use and automation. Ensure compliance with the terms of service of Bluesky, Twitter, and any RSS sources you use. Respect rate limits and avoid spamming.
\ No newline at end of file
+This tool is for personal use and automation. Ensure compliance with the terms of service of Bluesky, Twitter, and any RSS sources you use. Respect rate limits and avoid spamming.
+