ArchiveBox

Open source self-hosted web archiving.

ArchiveBox takes URLs, browser history, bookmarks, Pocket or Pinboard exports, and similar sources, and saves HTML, JS, PDFs, media, and more.

Features

Saves the entire page, including HTML, JS, CSS, media, and more
Supports a variety of input sources, including URLs, browser history, bookmarks, Pocket, and Pinboard
Can be self-hosted on your own server or in the cloud
Is open source and free to use
Is actively developed and maintained

Benefits

Preserve your online history. ArchiveBox can help you save your important web pages before they disappear.
Protect your privacy. ArchiveBox does not track or store your personal information.
Control your own data. Your archives are stored on your own server, so you have complete control over them.

How it works

ArchiveBox visits each page you want to archive with a headless browser, alongside tools such as wget and youtube-dl. It then saves the HTML, JS, CSS, media, and other resources from the page to your server.
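The save step can be pictured with a deliberately simplified sketch. It is not ArchiveBox's own code: the function names `fetch_html` and `save_snapshot`, the `url.txt` file, and the one-folder-per-timestamp layout are assumptions made for illustration. It also downloads only the raw HTML with the standard library, whereas a headless browser would execute JavaScript and capture more resources.

```python
from pathlib import Path
from typing import Callable
from urllib.request import urlopen


def fetch_html(url: str) -> bytes:
    """Download the raw response body for `url` (no JavaScript is executed)."""
    with urlopen(url, timeout=30) as response:
        return response.read()


def save_snapshot(
    url: str,
    root: Path,
    timestamp: str,
    fetch: Callable[[str], bytes] = fetch_html,
) -> Path:
    """Store one copy of `url` under root/<timestamp>/ and return that folder.

    `fetch` is injectable so the storage logic can be tried without network
    access. The folder layout here is hypothetical, not ArchiveBox's format.
    """
    folder = root / timestamp
    folder.mkdir(parents=True, exist_ok=True)
    (folder / "index.html").write_bytes(fetch(url))
    # Keep the source URL next to the saved copy so the snapshot is self-describing.
    (folder / "url.txt").write_text(url + "\n", encoding="utf-8")
    return folder
```

Calling `save_snapshot("https://example.com", Path("archive"), "20240101T000000")` would write the page into `archive/20240101T000000/`. Keeping each capture in its own folder means later captures of the same URL never overwrite earlier ones.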

You can access your archives via a web interface or through the API.

Getting started

To get started with ArchiveBox, you will need to install it on your server. You can find detailed installation instructions on the ArchiveBox website.

Once you have installed ArchiveBox, you can start archiving web pages by adding them to your queue. You can do this via the web interface, the command line, or the API.
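As a rough sketch of the command-line route, the snippet below wraps the `archivebox add` command from Python. The helper names `build_add_command` and `add_urls` are hypothetical, and the sketch assumes that `archivebox` is on your `PATH` and that `data_dir` is a folder already set up with `archivebox init`. By default it only returns the command it would run, so you can inspect it first.

```python
import shlex
import subprocess
from typing import List


def build_add_command(urls: List[str], binary: str = "archivebox") -> List[str]:
    """Return the argument list for `archivebox add <url> ...`."""
    if not urls:
        raise ValueError("at least one URL is required")
    return [binary, "add", *urls]


def add_urls(urls: List[str], data_dir: str, dry_run: bool = True) -> str:
    """Queue `urls` for archiving from inside `data_dir`.

    With dry_run=True (the default) nothing is executed; the shell-quoted
    command is returned for inspection. With dry_run=False the command is
    run and a non-zero exit status raises CalledProcessError.
    """
    command = build_add_command(urls)
    if not dry_run:
        # Assumption: the CLI is run from within the archive's data folder.
        subprocess.run(command, cwd=data_dir, check=True)
    return shlex.join(command)
```

For example, `add_urls(["https://example.com"], "/path/to/data")` returns the command string without touching the archive; pass `dry_run=False` once the output looks right.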

Contributing

ArchiveBox is an open source project and we welcome contributions from the community. You can find more information about contributing on the ArchiveBox website.