Added new yml

Guillem Hernandez Sola
2026-04-05 09:04:46 +02:00
parent cce6ff558c
commit 6fdd588179
2 changed files with 164 additions and 61 deletions

README.md

@@ -1,37 +1,104 @@
# post2bsky

post2bsky is a Python-based tool for automatically posting content from RSS feeds and Twitter accounts to Bluesky (AT Protocol). It supports both RSS-to-Bluesky and Twitter-to-Bluesky synchronization, with configurable workflows for various sources. It began as a proof-of-concept implementation for posting RSS/Atom content to Bluesky, so some hacking may be required; issues and pull requests to improve the system are welcome.
## Features

- **RSS to Bluesky**: Parse RSS feeds and post new entries to Bluesky with proper formatting and media handling.
- **Twitter to Bluesky**: Scrape tweets from specified Twitter accounts and repost them to Bluesky, including media attachments.
- **Daemon Mode**: Run as a background service for continuous monitoring and posting.
- **Configurable Workflows**: Use YAML-based workflows to define sources, schedules, and posting rules.
- **Media Support**: Handle images, videos, and other media from feeds and tweets. Image links from RSS can be forwarded to Bluesky.
- **Deduplication**: Prevent duplicate posts using state tracking. The RSS script also queries the target timeline and only posts items more recent than the latest top-level post by the handle.
- **Minimal rich-text support (links)**: Rich text is represented as a typed hierarchy in the AT protocol. The script post-processes the filtered string content of input feeds to support links, but only when a link stands alone on a single line; this needs improvement.
- **Threading** for long posts, and **tags**.
- **Filters**: The code is easy to extend with filters on RSS contents for simple transformations and for limiting cross-posts.
- **Logging**: Comprehensive logging for monitoring and debugging.

## Built with:

* [arrow](https://arrow.readthedocs.io/) - Time handling for humans
* [atproto](https://github.com/MarshalX/atproto) - AT protocol implementation for Python. The API of the library is still unstable, but the version is pinned in requirements.txt
* [fastfeedparser](https://github.com/kagisearch/fastfeedparser) - For feed parsing with a unified API
* [httpx](https://www.python-httpx.org/) - For grabbing remote media

## Installation

1. Clone the repository:
```bash
git clone https://github.com/yourusername/post2bsky.git
cd post2bsky
```
2. Install Python dependencies:
```bash
pip install -r requirements.txt
```
3. Set up environment variables:
Create a `.env` file with your Bluesky credentials:
```
BSKY_USERNAME=your_bluesky_handle
BSKY_PASSWORD=your_bluesky_password
```
For Twitter scraping, additional setup may be required (see Configuration).

## Configuration

### RSS Feeds

Use `rss2bsky.py` to post from RSS feeds. Configure the feed URL and other options via command-line arguments, or copy the sample configuration file and edit it (`cp config.json.sample config.json`). The configuration file accepts:

* a feed URL
* bsky parameters for a handle, username, and password
  * The handle looks like `name.bsky.social`
  * The username is the email address associated with the account.
  * The password is your account password. A literal quote can be escaped with a backslash, like `\"`
* sleep - how long to wait between sync cycles

Example:
```bash
python rss2bsky.py --feed-url https://example.com/rss --bsky-handle your_handle
```
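Based on the fields listed above, a configuration file might look like the following. The key names here are assumptions for illustration; consult `config.json.sample` for the authoritative schema.

```json
{
  "feed_url": "https://example.com/rss",
  "bsky": {
    "handle": "name.bsky.social",
    "username": "you@example.com",
    "password": "your_password"
  },
  "sleep": 900
}
```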
### Twitter Accounts
Use `twitter2bsky_daemon.py` for Twitter-to-Bluesky posting. It requires browser automation for scraping.
Configure Twitter accounts in the script or via environment variables.
### Workflows
The `workflows/` directory contains Jenkins pipeline configurations for automated runs. Each `.yml` file defines a pipeline for a specific source (e.g., `324.yml` for the 324 RSS feed).
To run a workflow manually, use the `sync_runner.sh` script or execute the Python scripts directly.
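The exact schema of these workflow files is not documented here; a hypothetical sketch of what a per-source file like `324.yml` might contain (all field names are assumptions, not the actual format):

```yaml
# hypothetical workflow definition; the real schema may differ
source: rss
feed_url: https://example.com/rss
bsky_handle: your_handle.bsky.social
schedule: "*/15 * * * *"
```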
## Usage
### Running RSS Sync
```bash
python rss2bsky.py [options]
```
Options:
- `--feed-url`: URL of the RSS feed
- `--bsky-handle`: Your Bluesky handle
- Other options for filtering, formatting, etc.
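The minimal link handling described in the Features section only recognizes a URL that stands alone on its own line. A self-contained sketch of that idea (illustrative only, not the script's actual code):

```python
import re

# Matches a line that consists solely of an http(s) URL.
URL_LINE = re.compile(r"^(https?://\S+)$")

def extract_link_lines(text):
    """Return URLs that stand alone on a single line of the given text."""
    links = []
    for line in text.splitlines():
        m = URL_LINE.match(line.strip())
        if m:
            links.append(m.group(1))
    return links
```

URLs embedded mid-sentence are ignored, which mirrors the limitation noted above.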
### Running Twitter Daemon
```bash
python twitter2bsky_daemon.py [options]
```
Options:
- Configure Twitter accounts and Bluesky credentials
- Run in daemon mode for continuous operation
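Conceptually, daemon mode amounts to running one sync cycle, sleeping, and repeating. A minimal sketch of that loop (the names `run_daemon` and `sync_once` are illustrative, not the script's actual API; `max_cycles` exists only to make the sketch finite):

```python
import time

def run_daemon(sync_once, interval_seconds=900, max_cycles=None):
    """Repeatedly run one sync cycle, sleeping between runs."""
    cycles = 0
    while max_cycles is None or cycles < max_cycles:
        sync_once()  # one pass: scrape sources, dedupe, post to Bluesky
        cycles += 1
        if max_cycles is not None and cycles >= max_cycles:
            break
        time.sleep(interval_seconds)
    return cycles
```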
### Using Sync Runner
```bash
./sync_runner.sh
```
This script can be used to run multiple syncs or integrate with cron jobs.
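For example, a crontab entry (the path is a placeholder) that runs the sync every 15 minutes and appends output to a log file:

```
*/15 * * * * cd /path/to/post2bsky && ./sync_runner.sh >> sync.log 2>&1
```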
## Dependencies
All Python dependencies are listed in `requirements.txt`. Key packages include:
- `atproto`: For Bluesky API interaction
- `fastfeedparser`: For RSS parsing
- `playwright`: For browser automation (Twitter scraping)
- `beautifulsoup4`: For HTML parsing
- And many others for media processing, logging, etc.
## License
This project is licensed under the GNU General Public License v3.0. See [LICENSE](LICENSE) for details.
## Contributing
Contributions are welcome! Please open issues or submit pull requests on GitHub.
## Disclaimer
This tool is for personal use and automation. Ensure compliance with the terms of service of Bluesky, Twitter, and any RSS sources you use. Respect rate limits and avoid spamming.


@@ -8,6 +8,7 @@ import httpx
 import time
 import os
 import subprocess
+import tempfile
 from urllib.parse import urlparse
 from dotenv import load_dotenv
 from atproto import Client, client_utils, models
@@ -21,6 +22,9 @@ SCRAPE_TWEET_LIMIT = 30
 DEDUPE_BSKY_LIMIT = 30
 TWEET_MAX_AGE_DAYS = 3
 STATE_MAX_ENTRIES = 5000
+STATE_MAX_AGE_DAYS = 180

 # --- Logging Setup ---
 logging.basicConfig(
     format="%(asctime)s [%(levelname)s] %(message)s",
@@ -261,6 +265,27 @@ def build_text_media_key(normalized_text, media_fingerprint):
     return hashlib.sha256(f"{normalized_text}||{media_fingerprint}".encode("utf-8")).hexdigest()

+def safe_remove_file(path):
+    if path and os.path.exists(path):
+        try:
+            os.remove(path)
+            logging.debug(f"🧹 Removed temp file: {path}")
+        except Exception as e:
+            logging.warning(f"⚠️ Could not remove temp file {path}: {e}")
+
+def build_temp_video_output_path(tweet):
+    """
+    Create a unique temp mp4 path for this tweet.
+    """
+    canonical_url = canonicalize_tweet_url(tweet.tweet_url) or ""
+    seed = canonical_url or f"{tweet.created_on}_{tweet.text[:50]}"
+    suffix = hashlib.sha256(seed.encode("utf-8")).hexdigest()[:12]
+    temp_dir = tempfile.gettempdir()
+    return os.path.join(temp_dir, f"twitter2bsky_{suffix}.mp4")

 # --- Local State Management ---
 def default_state():
     return {
@@ -357,47 +382,59 @@ def candidate_matches_state(candidate, state):
     if canonical_tweet_url and canonical_tweet_url in posted_tweets:
         return True, "state:tweet_url"

-    for _, record in posted_tweets.items():
+    for record in posted_tweets.values():
         if record.get("text_media_key") == text_media_key:
             return True, "state:text_media_fingerprint"

-    for _, record in posted_tweets.items():
+    for record in posted_tweets.values():
         if record.get("normalized_text") == normalized_text:
             return True, "state:normalized_text"

     return False, None
-def prune_state(state, max_entries=5000):
+def prune_state(state, max_entries=STATE_MAX_ENTRIES, max_age_days=STATE_MAX_AGE_DAYS):
     """
     Keep state file from growing forever.
-    Prunes oldest records by posted_at if necessary.
+    Prunes:
+    - entries older than max_age_days
+    - entries beyond max_entries, keeping newest first
+    - orphan posted_by_bsky_uri keys
     """
     posted_tweets = state.get("posted_tweets", {})
+    cutoff = arrow.utcnow().shift(days=-max_age_days)

-    if len(posted_tweets) <= max_entries:
-        return state
-    sortable = []
+    kept_items = []
     for key, record in posted_tweets.items():
-        posted_at = record.get("posted_at") or ""
-        sortable.append((key, posted_at))
+        posted_at_raw = record.get("posted_at")
+        keep = True
+        if posted_at_raw:
+            try:
+                posted_at = arrow.get(posted_at_raw)
+                if posted_at < cutoff:
+                    keep = False
+            except Exception:
+                pass
+        if keep:
+            kept_items.append((key, record))

-    sortable.sort(key=lambda x: x[1], reverse=True)
-    keep_keys = {key for key, _ in sortable[:max_entries]}
-    new_posted_tweets = {}
-    for key, record in posted_tweets.items():
-        if key in keep_keys:
-            new_posted_tweets[key] = record
-    new_posted_by_bsky_uri = {}
-    for bsky_uri, key in state.get("posted_by_bsky_uri", {}).items():
-        if key in keep_keys:
-            new_posted_by_bsky_uri[bsky_uri] = key
-    state["posted_tweets"] = new_posted_tweets
-    state["posted_by_bsky_uri"] = new_posted_by_bsky_uri
+    kept_items.sort(key=lambda item: item[1].get("posted_at", ""), reverse=True)
+    kept_items = kept_items[:max_entries]
+    keep_keys = {key for key, _ in kept_items}
+    state["posted_tweets"] = {key: record for key, record in kept_items}
+    posted_by_bsky_uri = state.get("posted_by_bsky_uri", {})
+    state["posted_by_bsky_uri"] = {
+        bsky_uri: key
+        for bsky_uri, key in posted_by_bsky_uri.items()
+        if key in keep_keys
+    }
     return state
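The pruning idea above can be exercised in isolation. This is a simplified, self-contained re-statement (it omits the age cutoff and the real function's `arrow` dependency): keep only the newest `max_entries` records and drop orphaned URI keys.

```python
def prune(state, max_entries):
    """Keep the newest max_entries posted_tweets; drop orphan URI keys."""
    items = sorted(
        state["posted_tweets"].items(),
        key=lambda kv: kv[1].get("posted_at", ""),
        reverse=True,  # newest first
    )[:max_entries]
    keep = {k for k, _ in items}
    state["posted_tweets"] = dict(items)
    # Remove reverse-index entries whose tweet record was pruned.
    state["posted_by_bsky_uri"] = {
        uri: k for uri, k in state["posted_by_bsky_uri"].items() if k in keep
    }
    return state
```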
@@ -898,12 +935,8 @@ def download_and_crop_video(video_url, output_path):
         return None
     finally:
-        for path in [temp_input, temp_output]:
-            if os.path.exists(path):
-                try:
-                    os.remove(path)
-                except Exception:
-                    pass
+        safe_remove_file(temp_input)
+        safe_remove_file(temp_output)

 def candidate_matches_existing_bsky(candidate, recent_bsky_posts):
@@ -938,6 +971,8 @@ def sync_feeds(args):
     logging.info("🔄 Starting sync cycle...")
     try:
         state = load_state(STATE_PATH)
+        state = prune_state(state)
+        save_state(state, STATE_PATH)
         tweets = scrape_tweets_via_playwright(
             args.twitter_username,
@@ -1028,7 +1063,7 @@ def sync_feeds(args):
         return

     new_posts = 0
-    state_file = "twitter_browser_state.json"
+    browser_state_file = "twitter_browser_state.json"

     with sync_playwright() as p:
         browser = p.chromium.launch(
@@ -1043,8 +1078,8 @@ def sync_feeds(args):
             ),
             "viewport": {"width": 1920, "height": 1080},
         }

-        if os.path.exists(state_file):
-            context_kwargs["storage_state"] = state_file
+        if os.path.exists(browser_state_file):
+            context_kwargs["storage_state"] = browser_state_file

         context = browser.new_context(**context_kwargs)
@@ -1078,7 +1113,7 @@ def sync_feeds(args):
                 logging.warning("⚠️ Tweet has video marker but no tweet URL. Skipping video.")
                 continue

-            temp_video_path = "temp_video.mp4"
+            temp_video_path = build_temp_video_output_path(tweet)

             try:
                 real_video_url = extract_video_url_from_tweet_page(context, tweet.tweet_url)
@@ -1099,8 +1134,9 @@ def sync_feeds(args):
                 video_embed = build_video_embed(video_blob, dynamic_alt)
             finally:
-                if os.path.exists(temp_video_path):
-                    os.remove(temp_video_path)
+                safe_remove_file(temp_video_path)
+                safe_remove_file(temp_video_path.replace(".mp4", "_source.mp4"))
+                safe_remove_file(temp_video_path.replace(".mp4", "_cropped.mp4"))

             try:
                 post_result = None
@@ -1116,7 +1152,7 @@ def sync_feeds(args):
             bsky_uri = getattr(post_result, "uri", None)
             remember_posted_tweet(state, candidate, bsky_uri=bsky_uri)
-            state = prune_state(state, max_entries=5000)
+            state = prune_state(state)
             save_state(state, STATE_PATH)

             recent_bsky_posts.insert(0, {
@@ -1186,4 +1222,4 @@ def main():
 if __name__ == "__main__":
-    main()
+    main()