Added new yml

This commit is contained in:
Guillem Hernandez Sola
2026-04-05 09:04:46 +02:00
parent cce6ff558c
commit 6fdd588179
2 changed files with 164 additions and 61 deletions

README.md

@@ -1,37 +1,104 @@
-RSS to Bluesky - in Python
---------------------------
-
-This is a proof-of-concept implementation for posting RSS/Atom content to Bluesky. Some hacking may be required. Issues and pull requests welcome to improve the system.
-
-## Built with:
-
-* [arrow](https://arrow.readthedocs.io/) - Time handling for humans
-* [atproto](https://github.com/MarshalX/atproto) - AT protocol implementation for Python. The API of the library is still unstable, but the version is pinned in requirements.txt
-* [fastfeedparser](https://github.com/kagisearch/fastfeedparser) - For feed parsing with a unified API
-* [httpx](https://www.python-httpx.org/) - For grabbing remote media
-
-## Features:
-
-* Deduplication: The script queries the target timeline and only posts RSS items that are more recent than the latest top-level post by the handle.
-* Filters: Easy to extend code to support filters on RSS contents for simple transformations and limiting cross-posts.
-* Minimal rich-text support (links): Rich text is represented in a typed hierarchy in the AT protocol. This script currently performs post-processing on filtered string content of the input feeds to support links as long as they stand as a single line in the text. This definitely needs some improvement.
-* Threading for long posts
-* Tags
-* Image references: Can forward image links from RSS to Bsky
-
-## Usage and configuration
-
-1. Start by installing the required libraries `pip install -r requirements.txt`
-2. Copy the configuration file and then edit it `cp config.json.sample config.json`
-3. Run the script like `python rss2bsky.py`
-
-The configuration file accepts the configuration of:
-
-* a feed URL
-* bsky parameters for a handle, username, and password
-  * Handle is like name.bsky.social
-  * Username is the email address associated with the account.
-  * Password is your password. If you have a literal quote it can be escaped with a backslash like `\"`
-* sleep - the amount of time to sleep while running
+# post2bsky
+
+post2bsky is a Python-based tool for automatically posting content from RSS feeds and Twitter accounts to Bluesky (AT Protocol). It supports both RSS-to-Bluesky and Twitter-to-Bluesky synchronization, with configurable workflows for various sources.
+
+## Features
+
+- **RSS to Bluesky**: Parse RSS feeds and post new entries to Bluesky with proper formatting and media handling.
+- **Twitter to Bluesky**: Scrape tweets from specified Twitter accounts and repost them to Bluesky, including media attachments.
+- **Daemon Mode**: Run as a background service for continuous monitoring and posting.
+- **Configurable Workflows**: Use YAML-based workflows to define sources, schedules, and posting rules.
+- **Media Support**: Handle images, videos, and other media from feeds and tweets.
+- **Deduplication**: Prevent duplicate posts using state tracking.
+- **Logging**: Comprehensive logging for monitoring and debugging.
+
+## Installation
+
+1. Clone the repository:
+```bash
+git clone https://github.com/yourusername/post2bsky.git
+cd post2bsky
+```
+2. Install Python dependencies:
+```bash
+pip install -r requeriments.txt
+```
+3. Set up environment variables. Create a `.env` file with your Bluesky credentials:
+```
+BSKY_USERNAME=your_bluesky_handle
+BSKY_PASSWORD=your_bluesky_password
+```
+For Twitter scraping, additional setup may be required (see Configuration).
+
+## Configuration
+
+### RSS Feeds
+
+Use `rss2bsky.py` to post from RSS feeds. Configure the feed URL and other options via command-line arguments.
+
+Example:
+```bash
+python rss2bsky.py --feed-url https://example.com/rss --bsky-handle your_handle
+```
+
+### Twitter Accounts
+
+Use `twitter2bsky_daemon.py` for Twitter-to-Bluesky posting. It requires browser automation for scraping. Configure Twitter accounts in the script or via environment variables.
+
+### Workflows
+
+The `workflows/` directory contains pipeline configurations for automated runs. Each `.yml` file defines a pipeline for a specific source (e.g., `324.yml` for the 324 RSS feed).
+
+To run a workflow manually, use the `sync_runner.sh` script or execute the Python scripts directly.
+
+## Usage
+
+### Running RSS Sync
+
+```bash
+python rss2bsky.py [options]
+```
+
+Options:
+- `--feed-url`: URL of the RSS feed
+- `--bsky-handle`: your Bluesky handle
+- other options for filtering, formatting, etc.
+
+### Running the Twitter Daemon
+
+```bash
+python twitter2bsky_daemon.py [options]
+```
+
+Options:
+- configure Twitter accounts and Bluesky credentials
+- run in daemon mode for continuous operation
+
+### Using the Sync Runner
+
+```bash
+./sync_runner.sh
+```
+
+This script can run multiple syncs or be integrated with cron jobs.
+
+## Dependencies
+
+All Python dependencies are listed in `requeriments.txt`. Key packages include:
+
+- `atproto`: Bluesky API interaction
+- `fastfeedparser`: RSS parsing
+- `playwright`: browser automation (Twitter scraping)
+- `beautifulsoup4`: HTML parsing
+- and others for media processing, logging, etc.
+
+## License
+
+This project is licensed under the GNU General Public License v3.0. See [LICENSE](LICENSE) for details.
+
+## Contributing
+
+Contributions are welcome! Please open issues or submit pull requests on GitHub.
+
+## Disclaimer
+
+This tool is for personal use and automation. Ensure compliance with the terms of service of Bluesky, Twitter, and any RSS sources you use. Respect rate limits and avoid spamming.
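The deduplication described above keys each post on a SHA-256 hash of the normalized text plus a media fingerprint, as in the daemon's `build_text_media_key` helper. A minimal sketch — the whitespace/lowercase normalization and the fingerprint values shown here are illustrative assumptions:

```python
import hashlib

def normalize_text(text):
    # Collapse whitespace and lowercase so trivially reformatted
    # duplicates hash to the same key (normalization policy assumed).
    return " ".join(text.split()).lower()

def build_text_media_key(normalized_text, media_fingerprint):
    # Same text with different media (or vice versa) yields a new key.
    return hashlib.sha256(
        f"{normalized_text}||{media_fingerprint}".encode("utf-8")
    ).hexdigest()

key_a = build_text_media_key(normalize_text("Hello  World"), "img:abc123")
key_b = build_text_media_key(normalize_text("hello world"), "img:abc123")
key_c = build_text_media_key(normalize_text("hello world"), "img:def456")
assert key_a == key_b  # same text + media -> treated as duplicate
assert key_a != key_c  # different media -> treated as a new post
```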

View File

@@ -8,6 +8,7 @@ import httpx
 import time
 import os
 import subprocess
+import tempfile
 from urllib.parse import urlparse
 from dotenv import load_dotenv
 from atproto import Client, client_utils, models
@@ -21,6 +22,9 @@ SCRAPE_TWEET_LIMIT = 30
 DEDUPE_BSKY_LIMIT = 30
 TWEET_MAX_AGE_DAYS = 3
+STATE_MAX_ENTRIES = 5000
+STATE_MAX_AGE_DAYS = 180
 
 # --- Logging Setup ---
 logging.basicConfig(
     format="%(asctime)s [%(levelname)s] %(message)s",
@@ -261,6 +265,27 @@ def build_text_media_key(normalized_text, media_fingerprint):
     return hashlib.sha256(f"{normalized_text}||{media_fingerprint}".encode("utf-8")).hexdigest()
 
+def safe_remove_file(path):
+    if path and os.path.exists(path):
+        try:
+            os.remove(path)
+            logging.debug(f"🧹 Removed temp file: {path}")
+        except Exception as e:
+            logging.warning(f"⚠️ Could not remove temp file {path}: {e}")
+
+def build_temp_video_output_path(tweet):
+    """
+    Create a unique temp mp4 path for this tweet.
+    """
+    canonical_url = canonicalize_tweet_url(tweet.tweet_url) or ""
+    seed = canonical_url or f"{tweet.created_on}_{tweet.text[:50]}"
+    suffix = hashlib.sha256(seed.encode("utf-8")).hexdigest()[:12]
+    temp_dir = tempfile.gettempdir()
+    return os.path.join(temp_dir, f"twitter2bsky_{suffix}.mp4")
+
 # --- Local State Management ---
 def default_state():
     return {
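The new `build_temp_video_output_path` helper derives a collision-free temp file name by hashing a per-tweet seed. The same idea in isolation — this sketch takes a plain seed string rather than a tweet object:

```python
import hashlib
import os
import tempfile

def build_temp_video_output_path(seed):
    # A short, stable suffix from the seed: the same tweet always maps
    # to the same temp file, and different tweets never collide.
    suffix = hashlib.sha256(seed.encode("utf-8")).hexdigest()[:12]
    return os.path.join(tempfile.gettempdir(), f"twitter2bsky_{suffix}.mp4")

path_a = build_temp_video_output_path("https://twitter.com/user/status/1")
path_b = build_temp_video_output_path("https://twitter.com/user/status/2")
assert path_a != path_b
assert path_a == build_temp_video_output_path("https://twitter.com/user/status/1")
```

Compared with the previous hard-coded `temp_video.mp4`, this keeps two concurrent daemon runs from clobbering each other's downloads.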
@@ -357,47 +382,59 @@ def candidate_matches_state(candidate, state):
     if canonical_tweet_url and canonical_tweet_url in posted_tweets:
         return True, "state:tweet_url"
 
-    for _, record in posted_tweets.items():
+    for record in posted_tweets.values():
         if record.get("text_media_key") == text_media_key:
             return True, "state:text_media_fingerprint"
 
-    for _, record in posted_tweets.items():
+    for record in posted_tweets.values():
         if record.get("normalized_text") == normalized_text:
             return True, "state:normalized_text"
 
     return False, None
 
-def prune_state(state, max_entries=5000):
+def prune_state(state, max_entries=STATE_MAX_ENTRIES, max_age_days=STATE_MAX_AGE_DAYS):
     """
     Keep state file from growing forever.
-    Prunes oldest records by posted_at if necessary.
+    Prunes:
+    - entries older than max_age_days
+    - entries beyond max_entries, keeping newest first
+    - orphan posted_by_bsky_uri keys
     """
     posted_tweets = state.get("posted_tweets", {})
+    cutoff = arrow.utcnow().shift(days=-max_age_days)
 
-    if len(posted_tweets) <= max_entries:
-        return state
-
-    sortable = []
-    for key, record in posted_tweets.items():
-        posted_at = record.get("posted_at") or ""
-        sortable.append((key, posted_at))
-    sortable.sort(key=lambda x: x[1], reverse=True)
-    keep_keys = {key for key, _ in sortable[:max_entries]}
-
-    new_posted_tweets = {}
-    for key, record in posted_tweets.items():
-        if key in keep_keys:
-            new_posted_tweets[key] = record
-
-    new_posted_by_bsky_uri = {}
-    for bsky_uri, key in state.get("posted_by_bsky_uri", {}).items():
-        if key in keep_keys:
-            new_posted_by_bsky_uri[bsky_uri] = key
-
-    state["posted_tweets"] = new_posted_tweets
-    state["posted_by_bsky_uri"] = new_posted_by_bsky_uri
+    kept_items = []
+    for key, record in posted_tweets.items():
+        posted_at_raw = record.get("posted_at")
+        keep = True
+        if posted_at_raw:
+            try:
+                posted_at = arrow.get(posted_at_raw)
+                if posted_at < cutoff:
+                    keep = False
+            except Exception:
+                pass
+        if keep:
+            kept_items.append((key, record))
+
+    kept_items.sort(key=lambda item: item[1].get("posted_at", ""), reverse=True)
+    kept_items = kept_items[:max_entries]
+
+    keep_keys = {key for key, _ in kept_items}
+    state["posted_tweets"] = {key: record for key, record in kept_items}
+
+    posted_by_bsky_uri = state.get("posted_by_bsky_uri", {})
+    state["posted_by_bsky_uri"] = {
+        bsky_uri: key
+        for bsky_uri, key in posted_by_bsky_uri.items()
+        if key in keep_keys
+    }
 
     return state
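The rewritten `prune_state` applies three rules: drop entries older than `max_age_days`, cap the total at `max_entries` keeping the newest, and discard `posted_by_bsky_uri` index entries whose target was pruned. A condensed sketch of the same policy using only the standard library (the daemon itself parses timestamps with arrow):

```python
from datetime import datetime, timedelta, timezone

def prune_posted(posted, posted_by_uri, max_entries=2, max_age_days=180):
    cutoff = datetime.now(timezone.utc) - timedelta(days=max_age_days)
    kept = []
    for key, record in posted.items():
        raw = record.get("posted_at")
        keep = True
        if raw:
            try:
                if datetime.fromisoformat(raw) < cutoff:
                    keep = False  # rule 1: aged out
            except ValueError:
                pass  # unparsable timestamps are kept, as in the daemon
        if keep:
            kept.append((key, record))
    # Rule 2: newest first, capped at max_entries.
    kept.sort(key=lambda item: item[1].get("posted_at", ""), reverse=True)
    kept = kept[:max_entries]
    keep_keys = {key for key, _ in kept}
    # Rule 3: drop orphan bsky-URI index entries.
    by_uri = {uri: k for uri, k in posted_by_uri.items() if k in keep_keys}
    return dict(kept), by_uri

now = datetime.now(timezone.utc)
posted = {
    "t1": {"posted_at": (now - timedelta(days=365)).isoformat()},  # too old
    "t2": {"posted_at": (now - timedelta(days=2)).isoformat()},
    "t3": {"posted_at": (now - timedelta(days=1)).isoformat()},
}
tweets, by_uri = prune_posted(posted, {"at://u/1": "t1", "at://u/2": "t2"})
assert set(tweets) == {"t2", "t3"}   # t1 aged out
assert by_uri == {"at://u/2": "t2"}  # orphan index entry dropped
```

Unlike the old version, which only acted once the entry count exceeded the cap, this prunes by age on every call, which is why the sync loop can now run it at startup as well as after each post.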
@@ -898,12 +935,8 @@ def download_and_crop_video(video_url, output_path):
         return None
     finally:
-        for path in [temp_input, temp_output]:
-            if os.path.exists(path):
-                try:
-                    os.remove(path)
-                except Exception:
-                    pass
+        safe_remove_file(temp_input)
+        safe_remove_file(temp_output)
 
 def candidate_matches_existing_bsky(candidate, recent_bsky_posts):
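`safe_remove_file` replaces the inline cleanup loop with an idempotent helper that tolerates missing files and `None` paths, logging a warning instead of raising. A close variant (catching `OSError` rather than bare `Exception`) and a check of that contract:

```python
import logging
import os
import tempfile

def safe_remove_file(path):
    # Remove a file if it exists; never raise, only warn.
    if path and os.path.exists(path):
        try:
            os.remove(path)
        except OSError as e:
            logging.warning("Could not remove temp file %s: %s", path, e)

fd, path = tempfile.mkstemp(suffix=".mp4")
os.close(fd)
safe_remove_file(path)
assert not os.path.exists(path)
safe_remove_file(path)  # second call is a harmless no-op
safe_remove_file(None)  # None is tolerated too
```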
@@ -938,6 +971,8 @@ def sync_feeds(args):
     logging.info("🔄 Starting sync cycle...")
     try:
         state = load_state(STATE_PATH)
+        state = prune_state(state)
+        save_state(state, STATE_PATH)
 
         tweets = scrape_tweets_via_playwright(
             args.twitter_username,
@@ -1028,7 +1063,7 @@ def sync_feeds(args):
         return
 
     new_posts = 0
-    state_file = "twitter_browser_state.json"
+    browser_state_file = "twitter_browser_state.json"
 
     with sync_playwright() as p:
         browser = p.chromium.launch(
@@ -1043,8 +1078,8 @@ def sync_feeds(args):
             ),
             "viewport": {"width": 1920, "height": 1080},
         }
-        if os.path.exists(state_file):
-            context_kwargs["storage_state"] = state_file
+        if os.path.exists(browser_state_file):
+            context_kwargs["storage_state"] = browser_state_file
 
         context = browser.new_context(**context_kwargs)
@@ -1078,7 +1113,7 @@ def sync_feeds(args):
                 logging.warning("⚠️ Tweet has video marker but no tweet URL. Skipping video.")
                 continue
 
-            temp_video_path = "temp_video.mp4"
+            temp_video_path = build_temp_video_output_path(tweet)
             try:
                 real_video_url = extract_video_url_from_tweet_page(context, tweet.tweet_url)
@@ -1099,8 +1134,9 @@ def sync_feeds(args):
                 video_embed = build_video_embed(video_blob, dynamic_alt)
             finally:
-                if os.path.exists(temp_video_path):
-                    os.remove(temp_video_path)
+                safe_remove_file(temp_video_path)
+                safe_remove_file(temp_video_path.replace(".mp4", "_source.mp4"))
+                safe_remove_file(temp_video_path.replace(".mp4", "_cropped.mp4"))
 
             try:
                 post_result = None
@@ -1116,7 +1152,7 @@ def sync_feeds(args):
         bsky_uri = getattr(post_result, "uri", None)
         remember_posted_tweet(state, candidate, bsky_uri=bsky_uri)
-        state = prune_state(state, max_entries=5000)
+        state = prune_state(state)
         save_state(state, STATE_PATH)
 
         recent_bsky_posts.insert(0, {