Website Migration Using Wget

There are occasions when you need to move a website from one hosting provider to another and the more standard approach of using FTP to collect all of the files isn’t available.

This can happen because the owner of the website and the existing web host have fallen out, the access details have been lost, the web host can’t be contacted, the migration is urgent, and so on.

Wget is a common Unix tool that is also available on Windows. It works from the command line and has many configuration options that control exactly how much it downloads from the starting point it is given, and what it then does with what it finds.

Wget works by starting at the homepage and trawling through the site, getting a copy of every HTML or image file that it can find a link to and that is part of the website it started at.

We often use wget to completely mirror remote sites: when a new customer comes over to us from another web hosting provider, we often copy the site for them using wget. To use it on our server, log in using ssh. From the command prompt, run wget with the URL of the file you want to download. This will download the file directly to our server. As a hosting provider we operate very fast internet connections, so using wget directly from our servers is much faster than downloading the files to your local machine and then re-uploading them to our servers.
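As a rough sketch of the single-file case described above, the small shell function below wraps the download. The function name fetch_file and the URL in the usage line are made up for illustration. The -c option is not mentioned above; it asks wget to resume a partly downloaded file instead of starting again.

```shell
# Hypothetical helper: download one file into the current directory.
fetch_file() {
    # -c (--continue) resumes a partial download if one already exists.
    wget -c "$1"
}

# Example call (placeholder URL, uncomment to run):
# fetch_file "http://www.example.com/site-backup.zip"
```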

Another common use is, as I said, to mirror an entire site. Let’s assume you are moving your website from hosting company A to hosting company B. You have your new account set up, and you have logged in via ssh to B’s server. Now to mirror your site, run

wget -r http://www.example.com

Here www.example.com stands in for your own domain. Wget will recursively download your website to the new account.

Now you should have a complete copy of your website, but be warned: wget does not read JavaScript, so files that are referenced only from scripts are not downloaded, and all those fancy rollover effects will not work unless you copy the relevant files manually.

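By default wget will create a directory named after the site it is downloading. You probably want to put the files in the directory you are in at the moment, so just add -nH to the command. This tells wget not to create the host-named directory, so directories are created only when needed for your website. (The similar-looking -nd option goes further and puts every file into one directory, which breaks sites that use subdirectories.) Adding -np stops wget from climbing above the starting URL into parent directories.

The final command should look something like this

wget -r -np -nH http://www.example.com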
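For sites that are larger or more deeply nested, a few more options can help. The sketch below is one possible variation, not the command used above: the function name mirror_site and the choice of extra options are assumptions. By default wget only follows links five levels deep, so --level=inf lifts that limit. --page-requisites also fetches the images and stylesheets each page needs, and --wait=1 pauses one second between requests to go easy on the old server.

```shell
# Hypothetical helper: mirror a whole site into the current directory.
mirror_site() {
    # --recursive            follow links within the site
    # --level=inf            no depth limit (the default depth is 5)
    # --no-parent            never climb above the starting URL
    # --no-host-directories  do not create a directory named after the host
    # --page-requisites      also fetch images, CSS, etc. used by each page
    # --wait=1               pause one second between requests
    wget --recursive --level=inf --no-parent --no-host-directories \
         --page-requisites --wait=1 "$1"
}

# Example call (placeholder domain, uncomment to run):
# mirror_site "http://www.example.com"
```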

Another word of warning relates to websites that are generated by programming languages. Wget is really only useful for mirroring sites in a specific set of circumstances. If the website has been built using ASP, PHP, Perl, Java, etc., wget will only download the HTML that these programs render, not the original source files. This is important to take note of, since these programs may be performing tasks such as changing the content of the page based on the user, interacting with a database to collect statistics, or accepting orders.
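One quick way to spot this situation is to look through the mirrored copy for file names that suggest a server-side script produced them. The sketch below is only a heuristic: the function name list_dynamic_pages and the list of extensions are assumptions, and a site can be dynamic without showing any of these extensions in its URLs.

```shell
# Hypothetical helper: list mirrored files whose names hint at
# server-side scripts (wget saved their rendered output, not the source).
list_dynamic_pages() {
    # Patterns end in * so names with query strings, such as
    # "item.asp?id=3", are matched as well.
    find "${1:-.}" -type f \( \
        -name '*.php*' -o -name '*.asp*' -o -name '*.jsp*' \
        -o -name '*.cgi*' -o -name '*.pl' \) | sort
}

# Example call on the current directory:
# list_dynamic_pages .
```

Any file this lists holds only the rendered HTML, so the original source has to be obtained some other way.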

Once you’ve used wget to make a copy of your website, it’s important that you test the files in the new location to ensure the site behaves in the same way that the original did.
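Before testing in a browser, a simple sanity check on the copied files can catch obvious problems. The sketch below counts the files in the mirror and how many of them are empty. The function name check_mirror is made up for illustration, and the check is no substitute for clicking through the new site.

```shell
# Hypothetical helper: report how many files were copied and how many are empty.
check_mirror() {
    dir="${1:-.}"
    # Count all regular files under the directory.
    total=$(find "$dir" -type f | wc -l | tr -d ' ')
    # Count zero-byte files, which often mean a download failed.
    empty=$(find "$dir" -type f -size 0 | wc -l | tr -d ' ')
    echo "files: $total"
    echo "empty: $empty"
}

# Example call on the current directory:
# check_mirror .
```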
