The World Wide Web is a wonderfully huge, messy, continuously changing place. Can you remember (and, when needed, prove) exactly what a Web page looked like when you found it? This week I will show you one of the simplest ways to solve this problem, and a little-known use for it.
Remembering what Web pages looked like? What do you mean?
Online services like InstaPaper or the Evernote Web Clipper do save complete copies of pages for later reading, either on the Web or in corporate intranets. Saving as PDF serves a similar purpose, in a completely different way. And, of course, if a page is 100% static and you bookmark it, you can always reload it whenever you are online. “Remembering exactly how a Web page looked”, however, is something different from all these services and actions. It’s about preserving something exactly as it was in the moment when you saw it, even if it includes dynamic parts that a PDF can’t reproduce, the original page disappears from the Web, or you have no Internet access.
The ScrapBook solution
The ScrapBook extension for Firefox is a really easy way to make perfect snapshots of Web pages. Its online documentation is very clear, so I will only explain what are, in my opinion, its main pros and cons, and some useful ways to customize it.
The ScrapBook user interface (Figure A) resembles the one for Firefox bookmarks. A dedicated menu in the top bar lets you save the current page, or open the ones you already saved. Saving the content of all your open tabs takes just one click. It is possible to copy just parts of a page, or to save different versions of the same page at different moments. You can organize the copies in as many levels of folders as you wish and then browse them in a Sidebar, which also includes a text search function. Figure B shows how flexible the actual Save operation is. You can add comments, load only the text, exclude certain attachments, and define how many levels of links to copy.
ScrapBook isn’t just an archiver. When you select a folder in the Sidebar and click on Combined View, you get all the copies it contains, one after another, in the current Firefox tab. It can also edit and/or annotate the local copies: you can highlight text, remove unwanted parts of a page, and add Sticky or Inline annotations. The former look like Post-It notes; the latter appear when you hover the mouse over the corresponding text.
The limits of ScrapBook…
As cool as it is, ScrapBook has some limits that, for me at least, are hard to ignore. To begin with, it is browser-specific. Besides, by default your local copies are, well, just local: they won’t be accessible from your smartphone, or from a borrowed laptop. A partial solution is to select Tools | Options | Organize in the Sidebar, immediately after installation, and tell ScrapBook to use the right folder: one that is automatically synchronized, via services like UbuntuOne or plain old rsync, to a remote location. This way, you will always have the same personal Web archive on any computer that can run Firefox and ScrapBook.
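If you go the rsync route, one command in a cron job or post-browsing script is enough. This is only a sketch: the local folder and the remote destination below are placeholders you must adapt to your own setup.

```shell
# Mirror the local ScrapBook archive to a remote location.
# Both paths are examples: use the folder you configured in
# Tools | Options | Organize, and your own remote host.
SCRAPBOOK_DIR="$HOME/scrapbook"
REMOTE="user@example.com:backup/scrapbook"

# -a preserves permissions and timestamps, --delete removes
# remote files that no longer exist locally, keeping an exact mirror.
rsync -a --delete "$SCRAPBOOK_DIR/" "$REMOTE/"
```

Run the same command in reverse (remote source, local destination) on another computer to pull the archive down before browsing it there.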
As far as I am concerned, however, ScrapBook’s biggest issues are lack of integration with one’s bookmarks and the fact that you can’t add multiple tags to each page. My dream personal Web-archival solution would be one completely integrated with personal, online bookmarks systems like SemanticScuttle: I would like to both bookmark a Web page and make a complete copy of it in my own Web server with one click, no matter what browser or computer I’m using. If you already know how to do this, please tell us in the comments!
…and one great way to use it
In spite of these limits, ScrapBook has two other features that are very useful to me. The second thing to do right after installation, in the same panel where you set the archive location, is to enable the Multi-ScrapBook mode. When coupled with “Export as HTML”, this function makes ScrapBook a great assistant for courseware preparation, which is one of my activities. I create separate ScrapBook folders for each of my online courses, then annotate, highlight or comment each copied page as needed. When everything is ready, I click on Tools | Export as HTML, select the right folder(s) and save. Next, I run this script, with the name of a new, non-existing folder as first argument:
 1 #! /bin/bash
 2 COURSE_FOLDER="$1"               # new, non-existing folder; pass an absolute path
 3 SCRAPBOOK_HOME="$HOME/scrapbook" # example value: set this to your ScrapBook folder
 4 mkdir "$COURSE_FOLDER"
 5 cd "$SCRAPBOOK_HOME"
 6 tar cf scrap.tar tree `grep ../data/ tree/index.html | cut -d/ -f2-3`
 7 mv scrap.tar "$COURSE_FOLDER"
 8 cd "$COURSE_FOLDER"
 9 tar xf scrap.tar
10 rm scrap.tar
Line 6 is the only important one. ScrapBook keeps all the copies it makes inside one “data” folder, one subfolder per page. The grep command in line 6 finds exactly which of those subfolders are linked from the HTML index, and passes only those names to tar.
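To see why that pipeline works, here is a hypothetical link line like the ones ScrapBook writes into tree/index.html (the numeric folder name is an example ID, one per saved page). Splitting the line on “/” with cut, fields 2 and 3 are exactly the data subfolder path that tar needs:

```shell
# A made-up example of a link line from ScrapBook's exported tree/index.html:
LINE='<a href="../data/20130212120000/index.html">Some saved page</a>'

# Fields split on "/": 1 is everything up to "..", 2 is "data",
# 3 is the per-page subfolder, so fields 2-3 give the relative path.
echo "$LINE" | cut -d/ -f2-3
# prints: data/20130212120000
```

Since grep emits one such line per page actually linked from the index, pages saved in other ScrapBook folders are never copied into the archive.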
The Konqueror window in Figure C shows the result (note the content of the location bar): a complete copy of all and only the annotated pages I wanted, with a dynamic menu. It’s all in one folder that my students can download, receive by email or get on a CD, to study and do homework even when they are offline!