Added new yml
119
README.md
```diff
@@ -1,37 +1,104 @@
-RSS to Bluesky - in Python
---------------------------
-
-This is a proof-of-concept implementation for posting RSS/Atom content to Bluesky. Some hacking may be required. Issues and pull requests welcome to improve the system.
-
-## Built with:
-
-* [arrow](https://arrow.readthedocs.io/) - Time handling for humans
-* [atproto](https://github.com/MarshalX/atproto) - AT protocol implementation for Python. The API of the library is still unstable, but the version is pinned in requirements.txt
-* [fastfeedparser](https://github.com/kagisearch/fastfeedparser) - For feed parsing with a unified API
-* [httpx](https://www.python-httpx.org/) - For grabbing remote media
-
-## Features:
-
-* Deduplication: The script queries the target timeline and only posts RSS items that are more recent than the latest top-level post by the handle.
-* Filters: Easy to extend code to support filters on RSS contents for simple transformations and limiting cross-posts.
-* Minimal rich-text support (links): Rich text is represented in a typed hierarchy in the AT protocol. This script currently performs post-processing on filtered string content of the input feeds to support links as long as they stand as a single line in the text. This definitely needs some improvement.
-* Threading for long posts
-* Tags
-* Image references: Can forward image links from RSS to Bsky
+# post2bsky
+
+post2bsky is a Python-based tool for automatically posting content from RSS feeds and Twitter accounts to Bluesky (AT Protocol). It supports both RSS-to-Bluesky and Twitter-to-Bluesky synchronization, with configurable workflows for various sources.
+
+## Features
+
+- **RSS to Bluesky**: Parse RSS feeds and post new entries to Bluesky with proper formatting and media handling.
+- **Twitter to Bluesky**: Scrape tweets from specified Twitter accounts and repost them to Bluesky, including media attachments.
+- **Daemon Mode**: Run as a background service for continuous monitoring and posting.
+- **Configurable Workflows**: Use YAML-based workflows to define sources, schedules, and posting rules.
+- **Media Support**: Handle images, videos, and other media from feeds and tweets.
+- **Deduplication**: Prevent duplicate posts using state tracking.
+- **Logging**: Comprehensive logging for monitoring and debugging.
+
+## Installation
+
+1. Clone the repository:
+```bash
+git clone https://github.com/yourusername/post2bsky.git
+cd post2bsky
+```
+
+2. Install Python dependencies:
+```bash
+pip install -r requeriments.txt
+```
+
+3. Set up environment variables:
+Create a `.env` file with your Bluesky credentials:
+```
+BSKY_USERNAME=your_bluesky_handle
+BSKY_PASSWORD=your_bluesky_password
+```
```
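The removed README described deduplication as posting only feed items newer than the handle's most recent top-level post, whereas the new one describes state tracking. A minimal sketch of the timestamp rule, assuming each entry is a dict with a timezone-aware `published` datetime and that the latest post time has already been read from the target timeline (`items_newer_than` is a hypothetical name, not a function from the repository):

```python
from datetime import datetime, timezone


def items_newer_than(entries, latest_post_time):
    """Return entries published strictly after latest_post_time, oldest first.

    Hypothetical helper: `entries` is a list of dicts with a timezone-aware
    "published" datetime; `latest_post_time` is the timestamp of the newest
    top-level post on the target account, or None if it has no posts yet.
    """
    if latest_post_time is None:
        fresh = list(entries)
    else:
        fresh = [entry for entry in entries if entry["published"] > latest_post_time]
    # Oldest first, so reposts land on Bluesky in the order they were published.
    return sorted(fresh, key=lambda entry: entry["published"])


if __name__ == "__main__":
    utc = timezone.utc
    feed = [
        {"title": "c", "published": datetime(2024, 5, 3, tzinfo=utc)},
        {"title": "a", "published": datetime(2024, 5, 1, tzinfo=utc)},
        {"title": "b", "published": datetime(2024, 5, 2, tzinfo=utc)},
    ]
    latest = datetime(2024, 5, 1, 12, tzinfo=utc)
    print([entry["title"] for entry in items_newer_than(feed, latest)])  # prints ['b', 'c']
```

The strict `>` comparison treats an item stamped exactly at the latest post time as already posted. The weakness of this rule, and a plausible reason for moving to explicit state, is that an item back-dated in the feed after a newer post went out is silently skipped.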
```diff
-## Usage and configuration
-
-1. Start by installing the required libraries `pip install -r requirements.txt`
-2. Copy the configuration file and then edit it `cp config.json.sample config.json`
-3. Run the script like `python rss2bsky.py`
-
-The configuration file accepts the configuration of:
-
-* a feed URL
-* bsky parameters for a handle, username, and password
-  * Handle is like name.bsky.social
-  * Username is the email address associated with the account.
-  * Password is your password. If you have a literal quote it can be escaped with a backslash like `\"`
-* sleep - the amount of time to sleep while running
+For Twitter scraping, additional setup may be required (see Configuration).
+
+## Configuration
+
+### RSS Feeds
+
+Use `rss2bsky.py` to post from RSS feeds. Configure the feed URL and other options via command-line arguments.
+
+Example:
+```bash
+python rss2bsky.py --feed-url https://example.com/rss --bsky-handle your_handle
+```
+
+### Twitter Accounts
+
+Use `twitter2bsky_daemon.py` for Twitter-to-Bluesky posting. It requires browser automation for scraping.
+
+Configure Twitter accounts in the script or via environment variables.
+
+### Workflows
+
+The `workflows/` directory contains Jenkins pipeline configurations for automated runs. Each `.yml` file defines a pipeline for a specific source (e.g., `324.yml` for 324 RSS feed).
+
+To run a workflow manually, use the `sync_runner.sh` script or execute the Python scripts directly.
+
+## Usage
+
+### Running RSS Sync
+
+```bash
+python rss2bsky.py [options]
+```
+
+Options:
+- `--feed-url`: URL of the RSS feed
+- `--bsky-handle`: Your Bluesky handle
+- Other options for filtering, formatting, etc.
+
+### Running Twitter Daemon
+
+```bash
+python twitter2bsky_daemon.py [options]
+```
+
+Options:
+- Configure Twitter accounts and Bluesky credentials
+- Run in daemon mode for continuous operation
+
+### Using Sync Runner
+
+```bash
+./sync_runner.sh
+```
+
+This script can be used to run multiple syncs or integrate with cron jobs.
+
+## Dependencies
+
+All Python dependencies are listed in `requeriments.txt`. Key packages include:
+- `atproto`: For Bluesky API interaction
+- `fastfeedparser`: For RSS parsing
+- `playwright`: For browser automation (Twitter scraping)
+- `beautifulsoup4`: For HTML parsing
+- And many others for media processing, logging, etc.
+
+## License
+
+This project is licensed under the GNU General Public License v3.0. See [LICENSE](LICENSE) for details.
+
+## Contributing
+
+Contributions are welcome! Please open issues or submit pull requests on GitHub.
+
+## Disclaimer
+
+This tool is for personal use and automation. Ensure compliance with the terms of service of Bluesky, Twitter, and any RSS sources you use. Respect rate limits and avoid spamming.
```
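The new README moves configuration from `config.json` to command-line flags (`--feed-url`, `--bsky-handle`) plus a `.env` file holding `BSKY_USERNAME` and `BSKY_PASSWORD`. A sketch of how an entry point could combine the two using only `argparse` and the environment. The function name `load_settings`, the returned dict, and the error handling are assumptions rather than the actual code of `rss2bsky.py`; a real script would typically call `python-dotenv`'s `load_dotenv()` first so that `.env` values reach `os.environ`.

```python
import argparse
import os


def load_settings(argv=None, environ=None):
    """Combine CLI flags with credentials taken from the environment.

    Hypothetical sketch: only --feed-url and --bsky-handle come from the README;
    the structure of the result is illustrative.
    """
    environ = os.environ if environ is None else environ

    parser = argparse.ArgumentParser(description="Post RSS entries to Bluesky")
    parser.add_argument("--feed-url", required=True, help="URL of the RSS feed")
    parser.add_argument("--bsky-handle", required=True, help="Your Bluesky handle")
    args = parser.parse_args(argv)  # argparse exposes --feed-url as args.feed_url

    # Credentials stay out of argv (and shell history); read them from the
    # environment, e.g. after load_dotenv() has loaded the .env file.
    username = environ.get("BSKY_USERNAME")
    password = environ.get("BSKY_PASSWORD")
    if not username or not password:
        raise SystemExit("BSKY_USERNAME and BSKY_PASSWORD must be set")

    return {
        "feed_url": args.feed_url,
        "handle": args.bsky_handle,
        "username": username,
        "password": password,
    }
```

For example, `load_settings(["--feed-url", "https://example.com/rss", "--bsky-handle", "your_handle"])` returns the feed URL, handle, and both credentials when the two environment variables are set, and exits with a message otherwise. Accepting `environ` as a parameter keeps the function testable without touching the real process environment.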
```diff
@@ -8,6 +8,7 @@ import httpx
 import time
 import os
 import subprocess
+import tempfile
 from urllib.parse import urlparse
 from dotenv import load_dotenv
 from atproto import Client, client_utils, models
```
```diff
@@ -21,6 +22,9 @@ SCRAPE_TWEET_LIMIT = 30
 DEDUPE_BSKY_LIMIT = 30
 TWEET_MAX_AGE_DAYS = 3
 
+STATE_MAX_ENTRIES = 5000
+STATE_MAX_AGE_DAYS = 180
+
 # --- Logging Setup ---
 logging.basicConfig(
     format="%(asctime)s [%(levelname)s] %(message)s",
```
```diff
@@ -261,6 +265,27 @@ def build_text_media_key(normalized_text, media_fingerprint):
     return hashlib.sha256(f"{normalized_text}||{media_fingerprint}".encode("utf-8")).hexdigest()
 
 
+def safe_remove_file(path):
+    if path and os.path.exists(path):
+        try:
+            os.remove(path)
+            logging.debug(f"🧹 Removed temp file: {path}")
+        except Exception as e:
+            logging.warning(f"⚠️ Could not remove temp file {path}: {e}")
+
+
+def build_temp_video_output_path(tweet):
+    """
+    Create a unique temp mp4 path for this tweet.
+    """
+    canonical_url = canonicalize_tweet_url(tweet.tweet_url) or ""
+    seed = canonical_url or f"{tweet.created_on}_{tweet.text[:50]}"
+    suffix = hashlib.sha256(seed.encode("utf-8")).hexdigest()[:12]
+
+    temp_dir = tempfile.gettempdir()
+    return os.path.join(temp_dir, f"twitter2bsky_{suffix}.mp4")
+
+
 # --- Local State Management ---
 def default_state():
     return {
```
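`build_temp_video_output_path` replaces the fixed `temp_video.mp4` name used in `sync_feeds` with a name derived from a SHA-256 of the canonical tweet URL, so two tweets no longer write to the same file. A standalone sketch of the same idea that takes a plain seed string instead of a tweet object. `temp_path_for` and `remove_quietly` are hypothetical names; unlike the diff's `safe_remove_file`, which catches `Exception` and logs, this version catches only `OSError` and reports success as a boolean.

```python
import hashlib
import os
import tempfile


def temp_path_for(seed, prefix="twitter2bsky_", ext=".mp4"):
    """Map a stable seed (e.g. a canonical tweet URL) to a per-item temp path.

    The same seed always yields the same path; different seeds yield different
    paths unless the first 12 hex digits of their SHA-256 digests collide.
    """
    suffix = hashlib.sha256(seed.encode("utf-8")).hexdigest()[:12]
    return os.path.join(tempfile.gettempdir(), f"{prefix}{suffix}{ext}")


def remove_quietly(path):
    """Delete path if it exists; return True only when a file was actually removed."""
    if not path or not os.path.exists(path):
        return False
    try:
        os.remove(path)
    except OSError:
        return False
    return True
```

Twelve hex digits carry 48 bits, so an accidental collision between two tweets is very unlikely at the state sizes used here (`STATE_MAX_ENTRIES = 5000`). Because the path is deterministic, a retry for the same tweet reuses, and then cleans up, the same file instead of leaving orphans behind.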
```diff
@@ -357,47 +382,59 @@ def candidate_matches_state(candidate, state):
     if canonical_tweet_url and canonical_tweet_url in posted_tweets:
         return True, "state:tweet_url"
 
-    for _, record in posted_tweets.items():
+    for record in posted_tweets.values():
         if record.get("text_media_key") == text_media_key:
             return True, "state:text_media_fingerprint"
 
-    for _, record in posted_tweets.items():
+    for record in posted_tweets.values():
         if record.get("normalized_text") == normalized_text:
             return True, "state:normalized_text"
 
     return False, None
 
 
```
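`candidate_matches_state` checks three keys in a fixed order: the canonical tweet URL, then the text-plus-media key, then the normalized text. Switching from `.items()` to `.values()` is a readability change; each check still scans every stored record, up to `STATE_MAX_ENTRIES` per candidate. A sketch of the same check order backed by sets built once per cycle. The field names `text_media_key` and `normalized_text` and the SHA-256 key construction come from the diff; `build_indexes` and `match_state` are hypothetical helpers.

```python
import hashlib


def build_text_media_key(normalized_text, media_fingerprint):
    # Same construction as build_text_media_key in the diff's context line.
    return hashlib.sha256(f"{normalized_text}||{media_fingerprint}".encode("utf-8")).hexdigest()


def build_indexes(posted_tweets):
    """Hypothetical: precompute lookup sets from state["posted_tweets"] records."""
    media_keys, texts = set(), set()
    for record in posted_tweets.values():
        if record.get("text_media_key") is not None:
            media_keys.add(record["text_media_key"])
        if record.get("normalized_text") is not None:
            texts.add(record["normalized_text"])
    return media_keys, texts


def match_state(canonical_url, normalized_text, media_fingerprint, posted_tweets, indexes):
    """Return (matched, reason) using the same check order as candidate_matches_state."""
    media_keys, texts = indexes
    # posted_tweets is keyed by canonical tweet URL, so this is already a dict lookup.
    if canonical_url and canonical_url in posted_tweets:
        return True, "state:tweet_url"
    if build_text_media_key(normalized_text, media_fingerprint) in media_keys:
        return True, "state:text_media_fingerprint"
    if normalized_text in texts:
        return True, "state:normalized_text"
    return False, None
```

The trade-off is bookkeeping: the sets must be rebuilt, or updated in step, whenever `remember_posted_tweet` or `prune_state` changes the state. With a few thousand records the linear scans are cheap, so the original loops are a reasonable choice.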
```diff
-def prune_state(state, max_entries=5000):
+def prune_state(state, max_entries=STATE_MAX_ENTRIES, max_age_days=STATE_MAX_AGE_DAYS):
     """
     Keep state file from growing forever.
-    Prunes oldest records by posted_at if necessary.
+    Prunes:
+    - entries older than max_age_days
+    - entries beyond max_entries, keeping newest first
+    - orphan posted_by_bsky_uri keys
     """
     posted_tweets = state.get("posted_tweets", {})
+    cutoff = arrow.utcnow().shift(days=-max_age_days)
 
-    if len(posted_tweets) <= max_entries:
-        return state
+    kept_items = []
 
-    sortable = []
     for key, record in posted_tweets.items():
-        posted_at = record.get("posted_at") or ""
-        sortable.append((key, posted_at))
+        posted_at_raw = record.get("posted_at")
+        keep = True
 
-    sortable.sort(key=lambda x: x[1], reverse=True)
-    keep_keys = {key for key, _ in sortable[:max_entries]}
+        if posted_at_raw:
+            try:
+                posted_at = arrow.get(posted_at_raw)
+                if posted_at < cutoff:
+                    keep = False
+            except Exception:
+                pass
 
-    new_posted_tweets = {}
-    for key, record in posted_tweets.items():
-        if key in keep_keys:
-            new_posted_tweets[key] = record
+        if keep:
+            kept_items.append((key, record))
 
-    new_posted_by_bsky_uri = {}
-    for bsky_uri, key in state.get("posted_by_bsky_uri", {}).items():
-        if key in keep_keys:
-            new_posted_by_bsky_uri[bsky_uri] = key
+    kept_items.sort(key=lambda item: item[1].get("posted_at", ""), reverse=True)
+    kept_items = kept_items[:max_entries]
+
+    keep_keys = {key for key, _ in kept_items}
+
+    state["posted_tweets"] = {key: record for key, record in kept_items}
+
+    posted_by_bsky_uri = state.get("posted_by_bsky_uri", {})
+    state["posted_by_bsky_uri"] = {
+        bsky_uri: key
+        for bsky_uri, key in posted_by_bsky_uri.items()
+        if key in keep_keys
+    }
 
-    state["posted_tweets"] = new_posted_tweets
-    state["posted_by_bsky_uri"] = new_posted_by_bsky_uri
     return state
 
 
```
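The new `prune_state` runs three steps: drop records whose `posted_at` is older than `max_age_days` (records with a missing or unparseable timestamp are kept), keep the newest `max_entries`, then drop `posted_by_bsky_uri` entries that point at removed keys. A stdlib-only sketch of the same steps, swapping `arrow` for `datetime.fromisoformat` and assuming `posted_at` is an ISO 8601 string with a UTC offset such as `+00:00` (before Python 3.11, `fromisoformat` rejects a trailing `Z`). `prune_state_sketch` and `_parse` are hypothetical names.

```python
from datetime import datetime, timedelta, timezone


def _parse(ts):
    """Parse an ISO 8601 timestamp; return None if it is missing or malformed."""
    if not ts:
        return None
    try:
        dt = datetime.fromisoformat(ts)
    except (TypeError, ValueError):
        return None
    # Treat naive timestamps as UTC so comparisons never mix naive and aware values.
    return dt if dt.tzinfo else dt.replace(tzinfo=timezone.utc)


def prune_state_sketch(state, max_entries=5000, max_age_days=180, now=None):
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=max_age_days)
    oldest = datetime.min.replace(tzinfo=timezone.utc)

    kept = []
    for key, record in state.get("posted_tweets", {}).items():
        posted_at = _parse(record.get("posted_at"))
        # As in the diff, records without a usable timestamp are not age-pruned.
        if posted_at is None or posted_at >= cutoff:
            kept.append((key, record, posted_at or oldest))

    # Newest first; undated records sort last, so the size cap drops them first.
    kept.sort(key=lambda item: item[2], reverse=True)
    kept = kept[:max_entries]

    keep_keys = {key for key, _, _ in kept}
    state["posted_tweets"] = {key: record for key, record, _ in kept}
    # Remove reverse-index entries whose tweet record was pruned.
    state["posted_by_bsky_uri"] = {
        uri: key
        for uri, key in state.get("posted_by_bsky_uri", {}).items()
        if key in keep_keys
    }
    return state
```

Sorting on parsed datetimes rather than on the raw `posted_at` string, as the diff does, only changes the outcome when stored timestamps mix formats or UTC offsets; for uniformly formatted ISO strings both orders agree. Passing `now` explicitly keeps the function deterministic under test.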
```diff
@@ -898,12 +935,8 @@ def download_and_crop_video(video_url, output_path):
         return None
 
     finally:
-        for path in [temp_input, temp_output]:
-            if os.path.exists(path):
-                try:
-                    os.remove(path)
-                except Exception:
-                    pass
+        safe_remove_file(temp_input)
+        safe_remove_file(temp_output)
 
 
 def candidate_matches_existing_bsky(candidate, recent_bsky_posts):
```
```diff
@@ -938,6 +971,8 @@ def sync_feeds(args):
     logging.info("🔄 Starting sync cycle...")
     try:
         state = load_state(STATE_PATH)
+        state = prune_state(state)
+        save_state(state, STATE_PATH)
 
         tweets = scrape_tweets_via_playwright(
             args.twitter_username,
```
```diff
@@ -1028,7 +1063,7 @@ def sync_feeds(args):
         return
 
     new_posts = 0
-    state_file = "twitter_browser_state.json"
+    browser_state_file = "twitter_browser_state.json"
 
     with sync_playwright() as p:
         browser = p.chromium.launch(
```
```diff
@@ -1043,8 +1078,8 @@ def sync_feeds(args):
         ),
         "viewport": {"width": 1920, "height": 1080},
     }
-    if os.path.exists(state_file):
-        context_kwargs["storage_state"] = state_file
+    if os.path.exists(browser_state_file):
+        context_kwargs["storage_state"] = browser_state_file
 
     context = browser.new_context(**context_kwargs)
 
```
```diff
@@ -1078,7 +1113,7 @@ def sync_feeds(args):
         logging.warning("⚠️ Tweet has video marker but no tweet URL. Skipping video.")
         continue
 
-    temp_video_path = "temp_video.mp4"
+    temp_video_path = build_temp_video_output_path(tweet)
 
     try:
         real_video_url = extract_video_url_from_tweet_page(context, tweet.tweet_url)
```
```diff
@@ -1099,8 +1134,9 @@ def sync_feeds(args):
         video_embed = build_video_embed(video_blob, dynamic_alt)
 
     finally:
-        if os.path.exists(temp_video_path):
-            os.remove(temp_video_path)
+        safe_remove_file(temp_video_path)
+        safe_remove_file(temp_video_path.replace(".mp4", "_source.mp4"))
+        safe_remove_file(temp_video_path.replace(".mp4", "_cropped.mp4"))
 
     try:
         post_result = None
```
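The new `finally` block removes the temp video plus `_source.mp4` and `_cropped.mp4` variants derived with `str.replace`. Because `replace` rewrites every `.mp4` occurrence in the path, a directory name containing `.mp4` would be altered as well (unlikely under `tempfile.gettempdir()`, but possible). A sketch of the same cleanup as a context manager that derives the variants with `os.path.splitext` instead. The suffix list mirrors the diff; `variant_paths` and `cleaned_up` are hypothetical names.

```python
import contextlib
import os


def variant_paths(path, suffixes=("_source", "_cropped")):
    """Return path plus sibling files sharing its stem, e.g. clip.mp4 -> clip_source.mp4."""
    stem, ext = os.path.splitext(path)  # splits only the last path component
    return [path] + [f"{stem}{suffix}{ext}" for suffix in suffixes]


@contextlib.contextmanager
def cleaned_up(path, suffixes=("_source", "_cropped")):
    """Yield path, then delete it and its variants even if the with-block raises."""
    try:
        yield path
    finally:
        for candidate in variant_paths(path, suffixes):
            # Mirror safe_remove_file: a missing or locked file must not mask
            # an exception raised inside the with-block.
            with contextlib.suppress(OSError):
                os.remove(candidate)
```

Inside `sync_feeds` this would wrap the download, crop, and upload steps, e.g. `with cleaned_up(build_temp_video_output_path(tweet)) as temp_video_path:`, so cleanup no longer depends on each call site listing every derived filename.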
```diff
@@ -1116,7 +1152,7 @@ def sync_feeds(args):
     bsky_uri = getattr(post_result, "uri", None)
 
     remember_posted_tweet(state, candidate, bsky_uri=bsky_uri)
-    state = prune_state(state, max_entries=5000)
+    state = prune_state(state)
     save_state(state, STATE_PATH)
 
     recent_bsky_posts.insert(0, {
```
```diff
@@ -1186,4 +1222,4 @@ def main():
 
 
 if __name__ == "__main__":
     main()
```