If you need to download a website from the Wayback Machine but don’t know how, you’re in the right place! After my friend asked me to download her old WordPress blog, I knew wget was probably the best way to do it. (Turns out it is.)
Below, find a complete tutorial on how to download an entire website from the Internet Archive’s Wayback Machine.
What Is wget?
Real quick, wget is a small computer program that allows you to download content from web servers. It runs as a command-line application, meaning you type commands instead of clicking buttons. Even without buttons, wget is simple, lightweight, and safe to install.
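If you’re not sure whether wget is already on your computer, a quick check like the one below can tell you. This is just a minimal sketch for a Unix-style shell (Terminal on Mac or Linux); the have_command helper is a name I made up for illustration, not part of wget.

```shell
#!/bin/sh
# have_command is a hypothetical helper: it succeeds if a program is on the PATH.
have_command() {
  command -v "$1" >/dev/null 2>&1
}

if have_command wget; then
  # Print only the first line of wget's version banner.
  wget --version | head -n 1
else
  echo "wget is not installed yet"
fi
```

If the script prints a version line, you’re ready to go. Otherwise, install wget first and run the check again.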
Use wget to Download Wayback Machine Website
Once you have wget installed on your computer, launch Terminal if you’re on a Mac, or Command Prompt if you’re on Windows.
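Type in the following command. At this point, I’m assuming that you already have the URL of your own Wayback Machine snapshot. Most likely, it will be in the same format as the one in the command below: https://web.archive.org/web/ followed by a 14-digit timestamp and then the original address of the site.
Be sure to replace the URL in the wget command below with your own.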
wget --recursive --no-clobber --page-requisites --convert-links --domains web.archive.org --no-parent https://web.archive.org/web/20180314211747/https://tonyflorida.com
When everything looks good, hit Enter. The wget program will begin to recursively retrieve the contents of your website from the Wayback Machine as it existed at that point in time.
The options that we passed to the wget program do the following:
- --recursive: follow the links in each downloaded HTML page from one page to the next
- --no-clobber: don’t download a file again if a copy already exists on your computer
- --page-requisites: download all the files (images, stylesheets, and so on) that are necessary to properly display a page
- --convert-links: convert the links in the downloaded pages to make them suitable for offline viewing
- --domains: only follow links to these domains
- --no-parent: never climb above the starting URL; only download files and directories below it
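To make it easier to swap in your own snapshot, the same command can be put together from its parts. The sketch below is only one way to do it: the variable names and the snapshot_url helper are made up for illustration, and the timestamp and site are the ones from the example above. It prints the finished command instead of running it, so you can look it over first.

```shell
#!/bin/sh
# Hypothetical helper: joins a Wayback Machine timestamp and the original
# site address into a snapshot URL.
snapshot_url() {
  printf 'https://web.archive.org/web/%s/%s\n' "$1" "$2"
}

TIMESTAMP="20180314211747"        # when the snapshot was taken
SITE="https://tonyflorida.com"    # original address of the website
URL=$(snapshot_url "$TIMESTAMP" "$SITE")

# Dry run: "echo" only prints the command. Remove "echo" to start the download.
echo wget --recursive --no-clobber --page-requisites --convert-links \
  --domains web.archive.org --no-parent "$URL"
```

Change TIMESTAMP and SITE to match your own snapshot, check the printed command, and then delete the word echo when you’re ready to download for real.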
Examine Your Website Download
After wget finishes downloading your website archive to your computer, you can check it out. Open the folder containing your downloaded website. In most cases, it defaults to a folder called web.archive.org.
You will have to navigate into a few folders until you reach the folder with your domain name. In here you’ll find a file called index.html. Double-click this file to open the home page of your website in your web browser.
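If clicking through the folders gets tedious, the find command can track down the index.html files for you. This is a rough sketch for Mac or Linux that assumes the download landed in a folder called web.archive.org inside your current folder; list_home_pages is just a name I picked for the example.

```shell
#!/bin/sh
# Hypothetical helper: list every index.html file under the given folder.
list_home_pages() {
  find "$1" -type f -name 'index.html'
}

DOWNLOAD_DIR="web.archive.org"   # assumed default download folder

if [ -d "$DOWNLOAD_DIR" ]; then
  list_home_pages "$DOWNLOAD_DIR"
else
  echo "No $DOWNLOAD_DIR folder here; run this from the folder where you ran wget"
fi
```

Each line of output is the path to one index.html file. On a Mac, you can pass one of those paths to the open command to view it in your browser.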
Because we used the “convert links” wget option, you can navigate around your website as if it were being hosted on a server. The only difference here is that the pages are being loaded straight from your local computer.
Potential Drawbacks to Wayback Machine
The Wayback Machine isn’t perfect. Far from it, actually. The internet is huge, and the Wayback Machine is only so smart.
In its simplest form, the Wayback Machine automatically follows links from one website to the next while saving copies of the web pages that it visits. Some websites or web pages may not exist in the Wayback Machine because it isn’t aware of their existence. In other words, you may only be able to download a portion of a website, if anything at all.
Consider yourself lucky if you find a copy of an old website on the Wayback Machine. My hope is that this blog post has taught you how to download sites from the Wayback Machine.
If you have any questions about Wayback Machine downloads, let me know in the comments below.