Wysmedia.com


~ We make IT easy for you ~

Mirroring a Website with Httrack

I have been using this for quite a long time. Before that, I used GNU Wget (let’s call it wget) to download (mirror) websites, because in the past my internet connection was limited: very expensive, because it was billed by time, and slow. Wget is good for those who love Linux and the console (it is available for Windows as well).

[Image: the HTTrack window]

HTTrack is free software with capabilities similar to wget's (I use WinHTTrack, its Windows version). It can mirror a website, downloading all related pages. I use it especially for downloading documentation sites.

Here are my rules for downloading codex.wordpress.org (just the Developer Documentation and its related pages):

+*.png +*.gif +*.jpg +*.css +*.js -ad.doubleclick.net/* -mime:application/foobar
-http://codex.wordpress.org/Mailing_Lists
-http://codex.wordpress.org/IRC
-http://wordpress.org/support
-http://codex.wordpress.org/Contributing_to_WordPress
-http://codex.wordpress.org/Automated_Testing
-*action=*
-http://codex.wordpress.org/Codex:Community_Portal
-*Help:*
-http://codex.wordpress.org/Current_events
-http://codex.wordpress.org/Special:Recentchanges
-http://codex.wordpress.org/Special:Randompage
-http://codex.wordpress.org/Development_Team

From there I only need the Function Reference, so I don’t want to download “unused” pages that I won’t read, such as http://codex.wordpress.org/Current_events and http://codex.wordpress.org/Special:Recentchanges. As you can see, the rules accept * wildcards (HTTrack filters are simple wildcard patterns, not full regular expressions). I also limited the mirror depth to 3 so that I don’t need to download a large number of pages.

[Image: Rules for downloading codex.wordpress.org]
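To see how the include (+) and exclude (-) rules above combine, here is a simplified Python model. It assumes, following my reading of HTTrack's filter documentation, that rules are checked in order and the last matching rule wins. It also assumes plain `*` wildcard matching on the URL without the `http://` prefix, and it simply skips `mime:` rules. `is_allowed`, `strip_scheme`, and the `default` parameter (standing in for whatever HTTrack's own scope settings would decide when no rule matches) are hypothetical names; this is a sketch, not HTTrack's actual matcher.

```python
from fnmatch import fnmatchcase
from typing import Iterable

# The scan rules from the post (this sketch ignores the mime: rule).
RULES = [
    "+*.png", "+*.gif", "+*.jpg", "+*.css", "+*.js",
    "-ad.doubleclick.net/*", "-mime:application/foobar",
    "-http://codex.wordpress.org/Mailing_Lists",
    "-http://codex.wordpress.org/IRC",
    "-http://wordpress.org/support",
    "-http://codex.wordpress.org/Contributing_to_WordPress",
    "-http://codex.wordpress.org/Automated_Testing",
    "-*action=*",
    "-http://codex.wordpress.org/Codex:Community_Portal",
    "-*Help:*",
    "-http://codex.wordpress.org/Current_events",
    "-http://codex.wordpress.org/Special:Recentchanges",
    "-http://codex.wordpress.org/Special:Randompage",
    "-http://codex.wordpress.org/Development_Team",
]


def strip_scheme(text: str) -> str:
    """Drop a leading http:// or https:// so rules and URLs compare alike."""
    for prefix in ("http://", "https://"):
        if text.startswith(prefix):
            return text[len(prefix):]
    return text


def is_allowed(url: str, rules: Iterable[str], default: bool = False) -> bool:
    """Apply +/- wildcard rules in order; the last matching rule decides."""
    target = strip_scheme(url)
    decision = default  # used when no rule matches at all
    for rule in rules:
        sign, pattern = rule[:1], rule[1:]
        if sign not in ("+", "-") or pattern.startswith("mime:"):
            continue
        # fnmatchcase: '*' matches any run of characters, including '/'
        if fnmatchcase(target, strip_scheme(pattern)):
            decision = sign == "+"
    return decision
```

For example, `is_allowed("http://ad.doubleclick.net/banner.gif", RULES, default=True)` is `False`: `+*.gif` matches first, but the later `-ad.doubleclick.net/*` rule overrides it. Note that in this model a rule without `*` only matches that exact URL.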

The nice part is that you can update the mirror later without downloading everything again one by one. Just don’t delete the files generated by HTTrack (including the hts-cache folder in the project directory).
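I don’t know HTTrack’s update internals in detail, but the usual HTTP mechanism behind this kind of incremental update is the conditional request: the client sends back the `ETag` or `Last-Modified` value it saved earlier, and the server answers `304 Not Modified` if the page hasn’t changed. Here is a minimal sketch of that general idea with Python’s standard library; `CachedPage`, `conditional_headers`, and `fetch_if_changed` are hypothetical names, not part of HTTrack.

```python
import urllib.error
import urllib.request
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class CachedPage:
    """Hypothetical record of what an earlier download stored for one URL."""
    etag: Optional[str] = None
    last_modified: Optional[str] = None


def conditional_headers(cached: CachedPage) -> Dict[str, str]:
    """Build validators so the server can answer 304 if nothing changed."""
    headers: Dict[str, str] = {}
    if cached.etag:
        headers["If-None-Match"] = cached.etag
    if cached.last_modified:
        headers["If-Modified-Since"] = cached.last_modified
    return headers


def fetch_if_changed(url: str, cached: CachedPage) -> Optional[bytes]:
    """Return the new body, or None if the server says it is unchanged."""
    request = urllib.request.Request(url, headers=conditional_headers(cached))
    try:
        with urllib.request.urlopen(request) as response:
            # 200: the page changed, so remember the new validators.
            cached.etag = response.headers.get("ETag")
            cached.last_modified = response.headers.get("Last-Modified")
            return response.read()
    except urllib.error.HTTPError as err:
        # urllib raises HTTPError for 304; it means "keep the local copy".
        if err.code == 304:
            return None
        raise
```

This is why keeping the saved metadata matters: without it there is nothing to send back, and every page has to be downloaded again.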

Here is the complete command generated by HTTrack:

winhttrack -qwr3%e0C2%Ps0u1%s%uN0%I0p3DaK0H0%kf2A25000%f#f -F "Mozilla/4.5 (compatible; HTTrack 3.0x; Windows 98)" -%F "" -%l "en, en, *" http://codex.wordpress.org/Developer_Documentation -O1 "C:\Downloads\codex-wordpress\codex wordpress" +*.png +*.gif +*.jpg +*.css +*.js -ad.doubleclick.net/* -mime:application/foobar -http://codex.wordpress.org/Mailing_Lists -http://codex.wordpress.org/IRC -http://wordpress.org/support -http://codex.wordpress.org/Contributing_to_WordPress -http://codex.wordpress.org/Automated_Testing -*action=* -http://codex.wordpress.org/Codex:Community_Portal -*Help:* -http://codex.wordpress.org/Current_events -http://codex.wordpress.org/Special:Recentchanges -http://codex.wordpress.org/Special:Randompage -http://codex.wordpress.org/Development_Team

P.S. You can also run HTTrack from the console, so you can set up a scheduler to download, update, or mirror websites regularly.
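A scheduled job can then be a small script. Here is a minimal Python sketch, assuming the console `httrack` binary is on your PATH. It only uses the `-O` (output path) and `-rN` (mirror depth) options plus the scan rules; `build_httrack_args`, `run_mirror`, and the settings below are hypothetical names, and the rule list is shortened for brevity.

```python
import shlex
import subprocess
from typing import List, Sequence

# Hypothetical project settings, taken from the command above.
START_URL = "http://codex.wordpress.org/Developer_Documentation"
OUTPUT_DIR = r"C:\Downloads\codex-wordpress\codex wordpress"
RULES = [
    "+*.png", "+*.gif", "+*.jpg", "+*.css", "+*.js",
    "-*action=*", "-*Help:*",
    # ...append the remaining scan rules from the list above
]


def build_httrack_args(url: str, output_dir: str,
                       rules: Sequence[str], depth: int = 3) -> List[str]:
    """Return the argv list for a console httrack run."""
    # -O: output path for the mirror; -rN: mirror depth N.
    return ["httrack", url, "-O", output_dir, f"-r{depth}", *rules]


def run_mirror() -> None:
    """Run (or re-run) the mirror; raises if httrack exits with an error."""
    args = build_httrack_args(START_URL, OUTPUT_DIR, RULES)
    print("Running:", shlex.join(args))
    subprocess.run(args, check=True)
```

Passing a list to `subprocess.run` (no shell) means the `*` in the rules reaches httrack unchanged instead of being expanded by a shell. You could call `run_mirror()` from a cron job on Linux or from Task Scheduler on Windows.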

Category: Fun, Others, software


Responses

  1. adwin says:

    This produces around 20 MB of files, which you can use as an offline reference.

  2. ya_wes says:

    Awesome….

