Ever wondered whether there is a way to download a website without a web browser? You’re in luck. With the power of the Linux command line, it can be done in a single command. There are multiple methods to complete this task, but we’re focusing on wget in this article.
What Is wget?
wget is a GNU command-line utility for retrieving content from web servers. As a downloader, wget is very powerful in its own right. wget is capable of working with multiple protocols, such as HTTP, HTTPS and FTP. Other capabilities of the wget utility include:
- running quietly or in the background
- integrating with shell scripts and cron jobs
- fetching multiple URLs in a single run
- downloading files that require a password
While many tools can retrieve web content, wget covers a broad range of tasks. It lets you work without a web browser by:
- downloading a full copy of a website
- downloading a specific file from a website
- automating the retrieval of a file on demand
- obtaining a document from an authentication portal
wget also comes preinstalled on most Linux distros, so it is usually available right from the start with no further installation required.
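Minimal installs and containers sometimes leave it out, so a quick check can save confusion. A minimal sketch, assuming a POSIX shell; the helper name have_wget is hypothetical:

```shell
# Hypothetical helper: succeeds if a command named wget can be found.
have_wget() {
    command -v wget >/dev/null 2>&1
}

if have_wget; then
    echo "wget found at: $(command -v wget)"
else
    echo "wget not found; install it with your distribution's package manager"
fi
```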
Getting started with wget is fairly simple. First, open a terminal.
Once a terminal window is open, run wget followed by the address you want to fetch: wget URL
Replace “URL” with the exact URL of the website.
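As a sketch of the basic call, plus one common way to pull down a whole site rather than a single page: the helper names and URLs below are hypothetical, and the mirroring options are one reasonable combination, not the only one.

```shell
# Hypothetical helpers; pass the site's URL as the first argument.

# Fetch a single page and save it in the current directory.
download_page() {
    wget "$1"
}

# Fetch a browsable local copy of a whole site:
#   --mirror           recurse through the site with timestamping
#   --convert-links    rewrite links so they work locally
#   --page-requisites  also fetch the images and CSS each page needs
#   --no-parent        do not climb above the starting directory
mirror_site() {
    wget --mirror --convert-links --page-requisites --no-parent "$1"
}

# Example (placeholder URLs):
#   download_page "https://example.com/"
#   mirror_site "https://example.com/docs/"
```

Mirroring can generate a lot of requests, so it is worth trying it on a small section of a site first.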
To resume a partially downloaded file, use the -c switch in your command as follows:
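A minimal sketch of a resumed download; the helper name and the URL are placeholders:

```shell
# Hypothetical helper: continue a partial download instead of starting over.
resume_download() {
    # -c picks up where the existing local file leaves off
    wget -c "$1"
}

# Example (placeholder URL); run it from the directory holding the partial file:
#   resume_download "https://example.com/files/large-image.iso"
```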
To make your wget download silent, add the -q switch to your initial wget command:
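A sketch of a quiet download, again with a hypothetical helper name and a placeholder URL:

```shell
# Hypothetical helper: download without progress or status output.
quiet_download() {
    # -q turns off wget's output
    wget -q "$1"
}

# Because nothing is printed, check the exit status to see whether it worked:
#   quiet_download "https://example.com/" || echo "download failed" >&2
```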
If you are not sure how to use wget's options, run wget --help for a summary, or man wget for the full manual.
Besides websites, you can also download a single file with wget by passing it the file's URL. wget simply grabs the file and saves it to the current directory.
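A sketch of such a download; the file URL and the helper name are hypothetical:

```shell
# Hypothetical file URL; substitute the file you actually need.
FILE_URL="https://example.com/files/archive.tar.gz"

# Download the given URL, or FILE_URL when no argument is passed.
download_file() {
    # With no other options, wget saves the file under its remote name
    # (here archive.tar.gz) in the current directory.
    wget "${1:-$FILE_URL}"
}

# Example:
#   download_file
```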
If you want to save to a different filename or different location, use the
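wget has two options that cover this: -O FILE writes the download to the named file, and -P DIR saves it under the given directory while keeping the remote file name. A minimal sketch of both; the helper names, URL, and paths are hypothetical:

```shell
# Save the download under a name of your choosing.
# Usage: save_as URL FILE
save_as() {
    wget -O "$2" "$1"
}

# Save the download into another directory, keeping the remote file name.
# Usage: save_into URL DIR
save_into() {
    wget -P "$2" "$1"
}

# Examples (placeholder URL and paths):
#   save_as   "https://example.com/files/archive.tar.gz" backup.tar.gz
#   save_into "https://example.com/files/archive.tar.gz" "$HOME/Downloads"
```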
As noted earlier, wget supports FTP as well. If you just specify an FTP site, wget will assume you want an anonymous login. Alternatively, you can manually specify things like username and password with the following flags:
--ftp-user=USER: specifies the username for login
--ftp-password=PASS: specifies the password for login
--no-passive-ftp: disables passive transfer mode
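The FTP cases above can be sketched as follows. The helper names, host, and credentials are hypothetical placeholders:

```shell
# Anonymous login: just pass the ftp:// URL.
ftp_anonymous() {
    wget "$1"
}

# Authenticated login. Usage: ftp_login USER PASS URL
# Note: a password given on the command line is visible to other users
# in the process list while wget runs.
ftp_login() {
    wget --ftp-user="$1" --ftp-password="$2" "$3"
}

# Same, but with passive transfer mode disabled.
ftp_login_active() {
    wget --ftp-user="$1" --ftp-password="$2" --no-passive-ftp "$3"
}

# Examples (placeholder host and credentials):
#   ftp_anonymous "ftp://ftp.example.com/pub/readme.txt"
#   ftp_login alice "$FTP_PASS" "ftp://ftp.example.com/private/report.pdf"
```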
Timeouts, Retries, and Failed Downloads
Finally, wget comes with several options relating to server connection problems and timeouts. Not all failures can be dealt with, of course, but the following flags are all intended to help with server issues:
--tries=NUMBER: specifies the number of times to attempt a download
--retry-connrefused: retries the download even if the server refuses the connection
--timeout=SECONDS: sets the network timeout (DNS lookup, connect, and read) in seconds
--wait=SECONDS: how long to wait between retrievals when fetching more than one file
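These flags can be combined in a single call. A sketch, where the helper name and the specific values (5 tries, 30-second timeout, 2-second wait) are illustrative assumptions rather than recommendations:

```shell
# Hypothetical helper: download with retries and timeouts applied.
# Any arguments (URLs or extra options) are passed through to wget.
robust_download() {
    wget --tries=5 --retry-connrefused --timeout=30 --wait=2 "$@"
}

# Example (placeholder URL):
#   robust_download "https://example.com/files/archive.tar.gz"
```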
Who Would Use wget?
In reading this post, you may be thinking, “This sounds complicated and far more difficult than using a web browser,” but anyone can find a use for this utility, whether as a systems admin or a programmer. Below are two examples of how I use this command throughout my day, with my role sometimes changing.
It makes my work as a security researcher easier because I can schedule this command to download multiple websites in one run. I can do this by creating a text file (using any text editor) that contains a list of URLs, one per line. Run with the -i switch and the name of that file, wget will download each website in the list.
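A sketch of that workflow. The list location, the URLs, and the helper name are hypothetical, and the cron line in the comments is only one way the scheduling might look:

```shell
# Hypothetical list location and placeholder URLs.
list_file="${TMPDIR:-/tmp}/wget-urls.txt"

# Write the list, one URL per line.
cat > "$list_file" <<'EOF'
https://example.com/
https://example.org/
EOF

# Download every URL named in the given file.
download_list() {
    wget -i "$1"
}

# Example:
#   download_list "$list_file"
# A crontab entry along these lines would run it nightly at 02:00
# (the path is a placeholder):
#   0 2 * * * wget -q -i /home/user/wget-urls.txt
```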
As a systems administrator, I can obtain documents from password-protected locations with ease. This will not help you much offline, but wget lets you supply credentials for a site right on the command line.
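A sketch of an authenticated download, assuming the site accepts a username and password that wget can send directly (for example HTTP basic authentication); the helper name, user, and URL are hypothetical:

```shell
# Hypothetical helper. Usage: fetch_protected USER URL
fetch_protected() {
    # --ask-password prompts for the password, which keeps it out of
    # your shell history and the process list.
    wget --user="$1" --ask-password "$2"
}

# Example (placeholder user and URL):
#   fetch_protected alice "https://intranet.example.com/reports/q3.pdf"
```

For unattended jobs, where nobody is there to answer the prompt, wget also accepts --password=PASS, at the cost of exposing the password on the command line.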
There you have it! Was it as difficult as you thought? Being able to automate your actions with wget will save you time and give you the ability to also work offline. What do you have to lose?
Leave a comment below and let us know whether you found this useful.