Creating plugins
This guide walks through creating a plugin for a new platform.
Plugin architecture
Megaloader uses a registry pattern to map domains to plugin classes. To add support for a new platform, you create a class that inherits from BasePlugin and implements the extraction logic.
In other words, every plugin:
- Inherits from BasePlugin
- Implements extract() to yield DownloadItem objects
- Optionally overrides _configure_session() for platform-specific setup
- Gets registered in PLUGIN_REGISTRY to map domains to the plugin
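To make the registry idea concrete, here is a minimal, self-contained sketch of a domain lookup. It is not Megaloader's actual resolution code: resolve_plugin, ExamplePlugin, the stand-in BasePlugin, and the rule that strips a leading "www." are all assumptions made for illustration. Only the idea of a PLUGIN_REGISTRY dict mapping domains to plugin classes comes from this guide.

```python
from urllib.parse import urlparse


class BasePlugin:
    """Stand-in for megaloader's BasePlugin (illustration only)."""

    def __init__(self, url: str, **options: object) -> None:
        self.url = url
        self.options = options


class ExamplePlugin(BasePlugin):
    """Hypothetical plugin for a fictional example.com host."""


# Domain -> plugin class, mirroring the shape of PLUGIN_REGISTRY.
PLUGIN_REGISTRY: dict[str, type[BasePlugin]] = {
    "example.com": ExamplePlugin,
}


def resolve_plugin(url: str) -> type[BasePlugin]:
    """Return the plugin class registered for the URL's domain."""
    host = (urlparse(url).hostname or "").lower()
    # Assumption: a leading "www." is ignored when matching domains.
    host = host.removeprefix("www.")
    try:
        return PLUGIN_REGISTRY[host]
    except KeyError:
        raise ValueError(f"No plugin registered for domain: {host!r}") from None
```

With a lookup of this shape, supporting an extra domain for the same platform only requires another dictionary entry pointing at the same class.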
Minimal example
Your plugin must implement the extract method and yield DownloadItem objects.
    from collections.abc import Generator

    from megaloader.item import DownloadItem
    from megaloader.plugin import BasePlugin


    class SimpleHost(BasePlugin):
        def extract(self) -> Generator[DownloadItem, None, None]:
            response = self.session.get(self.url, timeout=30)
            response.raise_for_status()
            data = response.json()

            for file_data in data["files"]:
                yield DownloadItem(
                    download_url=file_data["url"],
                    filename=file_data["name"],
                    size_bytes=file_data.get("size"),
                )

That's it. BasePlugin handles session creation, retry logic, and default headers. Your plugin just needs to fetch data and yield DownloadItem objects.
What BasePlugin provides:
- self.url - The URL passed to extract()
- self.options - Dictionary of keyword arguments
- self.session - Lazy-loaded requests.Session with retry logic. Always use it; don't reimplement retry logic yourself.
What you implement:
- extract() - Must yield DownloadItem objects
- _configure_session(session) - Optional, for custom headers/cookies
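The division of labour above (a lazily created session plus an optional configuration hook) can be sketched as follows. This is only an illustration of the pattern, not Megaloader's real BasePlugin: FakeSession stands in for requests.Session so the example runs without network access or third-party packages, and SketchBasePlugin, RefererPlugin, and the header values are hypothetical names.

```python
class FakeSession:
    """Stand-in for requests.Session (illustration only)."""

    def __init__(self) -> None:
        self.headers: dict[str, str] = {"User-Agent": "sketch/0.0"}


class SketchBasePlugin:
    """Illustrative base class; not megaloader's real BasePlugin."""

    def __init__(self, url: str, **options: object) -> None:
        self.url = url
        self.options = options
        self._session = None  # created on first access

    @property
    def session(self) -> FakeSession:
        # Build the session lazily, run the subclass hook once, then reuse it.
        if self._session is None:
            session = FakeSession()
            self._configure_session(session)
            self._session = session
        return self._session

    def _configure_session(self, session: FakeSession) -> None:
        """Hook for subclasses; the default does nothing."""


class RefererPlugin(SketchBasePlugin):
    """Hypothetical plugin that only customises the session headers."""

    def _configure_session(self, session: FakeSession) -> None:
        session.headers["Referer"] = "https://example.com/"


plugin = RefererPlugin("https://example.com/album/1", api_key="k")
first = plugin.session
second = plugin.session  # same object: the hook ran exactly once
```

Because the hook runs on first access rather than in the constructor, a subclass can set attributes in its own __init__ (after calling super().__init__) and still read them inside _configure_session().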
Building a plugin
Let's build a plugin for a fictional platform called "FileBox" that has album URLs like https://filebox.com/album/abc123 and file URLs like https://filebox.com/file/xyz789.
Start by creating the plugin class and parsing the URL:
    import logging
    import re
    from collections.abc import Generator
    from typing import Any

    from megaloader.item import DownloadItem
    from megaloader.plugin import BasePlugin

    logger = logging.getLogger(__name__)


    class FileBox(BasePlugin):
        API_BASE = "https://api.filebox.com/v1"

        def __init__(self, url: str, **options: Any) -> None:
            super().__init__(url, **options)
            self.content_type, self.content_id = self._parse_url(url)

        def _parse_url(self, url: str) -> tuple[str, str]:
            if match := re.search(r"filebox\.com/(album|file)/([\w-]+)", url):
                return match.group(1), match.group(2)
            raise ValueError(f"Invalid FileBox URL: {url}")

        def extract(self) -> Generator[DownloadItem, None, None]:
            if self.content_type == "album":
                yield from self._extract_album()
            else:
                yield from self._extract_file()

Implement album extraction:
    def _extract_album(self) -> Generator[DownloadItem, None, None]:
        response = self.session.get(
            f"{self.API_BASE}/albums/{self.content_id}",
            timeout=30,
        )
        response.raise_for_status()
        data = response.json()

        album_name = data.get("name", self.content_id)
        for file_data in data.get("files", []):
            yield DownloadItem(
                download_url=file_data["download_url"],
                filename=file_data["filename"],
                collection_name=album_name,
                source_id=file_data["id"],
                size_bytes=file_data.get("size"),
            )

For single files:
    def _extract_file(self) -> Generator[DownloadItem, None, None]:
        response = self.session.get(
            f"{self.API_BASE}/files/{self.content_id}",
            timeout=30,
        )
        response.raise_for_status()
        data = response.json()

        yield DownloadItem(
            download_url=data["download_url"],
            filename=data["filename"],
            source_id=data["id"],
            size_bytes=data.get("size"),
        )

If the platform requires specific headers, override _configure_session():
    import requests  # at module level; needed for the type annotation below

    def _configure_session(self, session: requests.Session) -> None:
        session.headers.update({
            "Referer": "https://filebox.com/",
            "Origin": "https://filebox.com",
        })

Finally, register your plugin in packages/core/megaloader/plugins/__init__.py:
    from megaloader.plugins.filebox import FileBox

    PLUGIN_REGISTRY: dict[str, type[BasePlugin]] = {
        # ... existing plugins ...
        "filebox.com": FileBox,
        "filebox.cc": FileBox,  # if it has other domains but same structure
    }

Now you can use it:
    import megaloader as mgl

    for item in mgl.extract("https://filebox.com/album/test123"):
        print(item.filename)

Adding authentication
For platforms requiring authentication, accept credentials through options or environment variables:
    import os

    import requests


    class FileBox(BasePlugin):
        def __init__(self, url: str, **options: Any) -> None:
            super().__init__(url, **options)
            # Prefer explicit option, fall back to environment variable
            self.api_key = self.options.get("api_key") or os.getenv("FILEBOX_API_KEY")
            self.content_type, self.content_id = self._parse_url(url)

        def _configure_session(self, session: requests.Session) -> None:
            session.headers["Referer"] = "https://filebox.com/"
            # Only send the Authorization header when a key is available
            if self.api_key:
                session.headers["Authorization"] = f"Bearer {self.api_key}"

Users can then pass credentials:
    mgl.extract("https://filebox.com/album/test", api_key="your-key")

Or set the environment variable:
    export FILEBOX_API_KEY="your-key"

DownloadItem fields
When creating items, you must provide:
- download_url (str) - Direct download URL
- filename (str) - Original filename
Optional fields:
- collection_name (str | None) - Album/gallery/user grouping
- source_id (str | None) - Platform-specific identifier
- headers (dict[str, str]) - Additional HTTP headers needed for download
- size_bytes (int | None) - File size in bytes
Example with all fields:
    yield DownloadItem(
        download_url="https://cdn.filebox.com/files/abc123.jpg",
        filename="photo.jpg",
        collection_name="vacation_2024",
        source_id="abc123",
        headers={"Referer": "https://filebox.com/"},
        size_bytes=1024000,
    )

Common patterns
Handle pagination within extract() so the consumer sees a single continuous stream of items. This is useful when an API returns results across multiple pages, as is the case with the Rule34 plugin.
    def extract(self) -> Generator[DownloadItem, None, None]:
        page = 1

        while True:
            response = self.session.get(
                f"{self.API_BASE}/albums/{self.content_id}",
                params={"page": page},
                timeout=30,
            )
            response.raise_for_status()

            files = response.json().get("files", [])
            if not files:
                break

            for file_data in files:
                yield self._create_item(file_data)

            page += 1

Nested collections (when albums contain sub-galleries):
    def extract(self) -> Generator[DownloadItem, None, None]:
        album_data = self._fetch_album(self.content_id)

        for gallery in album_data.get("galleries", []):
            for file_data in gallery["files"]:
                yield DownloadItem(
                    download_url=file_data["url"],
                    filename=file_data["name"],
                    collection_name=f"{self.content_id}/{gallery['name']}",
                )

Deduplication (when a host supports uploading files with the same name):
    def extract(self) -> Generator[DownloadItem, None, None]:
        seen_urls: set[str] = set()

        for file_data in self._fetch_files():
            url = file_data["download_url"]
            if url in seen_urls:
                continue

            seen_urls.add(url)
            yield self._create_item(file_data)

HTML parsing (when there's no API):
    from bs4 import BeautifulSoup

    def extract(self) -> Generator[DownloadItem, None, None]:
        response = self.session.get(self.url, timeout=30)
        response.raise_for_status()

        soup = BeautifulSoup(response.text, "html.parser")
        for img in soup.select("div.gallery img"):
            if src := img.get("src"):
                yield DownloadItem(
                    download_url=src,
                    filename=src.split("/")[-1],
                )

Error handling
Let errors propagate naturally. The package-level extract() function (the one called as mgl.extract() above) wraps plugin errors in ExtractionError, so plugins do not need their own try/except blocks:

    def extract(self) -> Generator[DownloadItem, None, None]:
        response = self.session.get(self.url, timeout=30)
        response.raise_for_status()
        data = response.json()

        for item in data["files"]:
            yield self._create_item(item)

Validate input early and raise ValueError for invalid URLs:
    def __init__(self, url: str, **options: Any) -> None:
        super().__init__(url, **options)
        if not self._is_valid_url(url):
            raise ValueError(f"Invalid FileBox URL: {url}")

Use logging for debug information:
    import logging

    logger = logging.getLogger(__name__)

    def extract(self) -> Generator[DownloadItem, None, None]:
        logger.debug("Starting extraction for album: %s", self.content_id)
        # Use self.url; a bare `url` name is not defined in this method
        response = self.session.get(self.url, timeout=30)
        logger.debug("Received %d files", len(response.json()["files"]))

Testing your plugin
See Testing plugins for manual testing, writing unit tests, and adding live tests.
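As a rough illustration of the unit-testing idea (the Testing plugins page remains the authoritative reference), the sketch below swaps the network session for a stub that returns canned JSON. Everything here is hypothetical: StubSession, StubResponse, extract_items, and the payload shape are stand-ins, and plain dicts take the place of DownloadItem objects so the example runs without Megaloader installed.

```python
from collections.abc import Iterator
from typing import Any


class StubResponse:
    """Mimics the small part of a requests response that plugins use."""

    def __init__(self, payload: dict[str, Any]) -> None:
        self._payload = payload

    def raise_for_status(self) -> None:
        # The canned response always succeeds.
        return None

    def json(self) -> dict[str, Any]:
        return self._payload


class StubSession:
    """Returns one canned response and records the requested URLs."""

    def __init__(self, payload: dict[str, Any]) -> None:
        self.payload = payload
        self.requested: list[str] = []

    def get(self, url: str, timeout: float = 30) -> StubResponse:
        self.requested.append(url)
        return StubResponse(self.payload)


def extract_items(session: StubSession, url: str) -> Iterator[dict[str, Any]]:
    """Hypothetical extraction logic, shaped like the minimal example."""
    response = session.get(url, timeout=30)
    response.raise_for_status()
    for file_data in response.json()["files"]:
        yield {
            "download_url": file_data["url"],
            "filename": file_data["name"],
            "size_bytes": file_data.get("size"),
        }


def test_extract_items() -> None:
    payload = {"files": [{"url": "https://cdn.example.com/a.jpg", "name": "a.jpg"}]}
    session = StubSession(payload)
    items = list(extract_items(session, "https://example.com/album/1"))

    # Exactly one request was made, and the missing size became None.
    assert session.requested == ["https://example.com/album/1"]
    assert items == [
        {
            "download_url": "https://cdn.example.com/a.jpg",
            "filename": "a.jpg",
            "size_bytes": None,
        }
    ]


test_extract_items()
```

Keeping the HTTP layer behind self.session is what makes this kind of substitution cheap: the extraction logic never needs to know whether the response came from the network or from a fixture.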
Best practices
Use type hints for better IDE support and type checking:

    from collections.abc import Generator
    from typing import Any

    def extract(self) -> Generator[DownloadItem, None, None]:
        ...

Add docstrings to document behaviour:
    def extract(self) -> Generator[DownloadItem, None, None]:
        """
        Extract files from FileBox albums and files.

        Yields:
            DownloadItem: Metadata for each file

        Raises:
            ValueError: Invalid URL format
            requests.HTTPError: Network request failed
        """

Use constants for magic values:
    class FileBox(BasePlugin):
        API_BASE = "https://api.filebox.com/v1"
        TIMEOUT = 30

Extract helper methods to keep code readable:
    def extract(self) -> Generator[DownloadItem, None, None]:
        data = self._fetch_album_data()
        for file_data in data["files"]:
            yield self._create_item(file_data)

Handle missing data:
    yield DownloadItem(
        download_url=file_data["url"],
        filename=file_data.get("name", "unknown"),
        size_bytes=file_data.get("size"),  # None is fine
    )

Contributing
Once your plugin works:
- Add tests (see Testing plugins)
- Update documentation (add to platforms.md)
- Run linting: ruff format and ruff check --fix
- Run type checking: mypy packages/core/megaloader
- Submit a pull request
See the Contributing Guide for details.