jbikker

Apparently there is software that can recursively fetch webpages from the Google cache and from other sources, combining several sources when the individual copies are partial. The software is called Warrick:
Unfortunately, Warrick is currently undergoing a major update, required by changes in the Google APIs and at Archive.org. An updated version is expected 'in a couple of weeks' (see the URL above). Apparently, Google may delete a site from its cache when it fails to crawl it, so let's hope a couple of weeks is fast enough. It may provide us with a full copy of ompf.
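
For anyone wondering what "combining sources when copies are partial" could look like, here is a rough sketch of the idea. It is not Warrick's actual code: the function names, the dict-based snapshot format and the page keys are all made up for illustration, and it assumes the pages have already been fetched from each source.

```python
from typing import Dict, List

# Hypothetical format: one dict per source (e.g. a search-engine cache,
# a web archive), mapping a page URL to the HTML recovered from it.
Snapshot = Dict[str, str]


def merge_partial_copies(sources: List[Snapshot]) -> Snapshot:
    """Combine partial copies; sources listed earlier take priority."""
    merged: Snapshot = {}
    for snapshot in sources:
        for url, html in snapshot.items():
            # Keep the first non-empty copy seen for each URL.
            if html and url not in merged:
                merged[url] = html
    return merged


def missing_pages(wanted: List[str], merged: Snapshot) -> List[str]:
    """List the wanted URLs that no source could provide."""
    return [url for url in wanted if url not in merged]


if __name__ == "__main__":
    # Made-up page keys standing in for real forum URLs.
    cache_a = {"thread/1": "<html>one</html>", "thread/2": ""}
    cache_b = {"thread/2": "<html>two</html>", "thread/3": "<html>three</html>"}
    merged = merge_partial_copies([cache_a, cache_b])
    print(sorted(merged))
    print(missing_pages(["thread/1", "thread/2", "thread/4"], merged))
```

The order of the list decides which source wins when two of them hold the same page, so the most trusted or most recent source should go first.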

nhm

I tried downloading and running Warrick but, as stated, it no longer works with the Internet Archive. I stopped playing with it pretty quickly, as I didn't want Google to ban my server's IP if I happened to get it working without IA. Might be worth looking into the technique they use here (ignore the spammy sounding URL):


Not sure I'll have time to look into this before the next version of Warrick is released, but I thought I'd mention it in case anyone else has free time.


davepermen

Just in case: does Bing have a web archive? If we mess up the Google one...

