I Know I Read It Somewhere
Posted by indroneel on March 5, 2007
An Internet clipbook is a collection of Web page snippets, documents and links organized for easy browsing and quick accessibility. The imagery is drawn from a “real” paper scrapbook that contains, among other things, photographs, handwritten memos, and newspaper and magazine cuttings.
Within this article, the term “Internet clipbook” refers both to the clipbook data and to the application(s) used to maintain that data.
The Internet clipbook is an essential tool for any serious net user. Most users spend an average of 45 seconds per Web page, browse through 45-50 pages and change context at least 2-3 times per session. With such an aggressive browsing profile, the user has just enough time to comprehend and appreciate the contents of a page, but not enough retention to apply them effectively at a later point in time.
So you stumble upon an interesting tidbit on the Web. You read it … and then you move on.
Later, you are in a situation where that tidbit would be a useful reference in solving a problem or achieving a goal. Sure, you do remember what you had read earlier in “broad outlines”, but a verbatim copy would have been handy.
And you think: “I should have saved it somewhere.”
An Internet clipbook can maximize the output of general purpose browsing (consulting sources that have a high likelihood of items of interest) through continuous collection of information.
The Vanilla Guide to Clipbook Management
This is the simplest of all possible techniques and is consequently used by a wide section of users to maintain a repository of online information. The process can be summed up as follows:
- Save each web page as an individual file. The output format can be one of HTML, RTF or DOC if the emphasis is on textual content. For accurate page duplication, use the MHT or PDF document format.
- Organize saved pages into a folder hierarchy on the local file system. Each folder represents a common topic that is shared across all documents within the folder.
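The two steps above can be sketched in a few lines of Python. This is only an illustration of the manual routine, not part of any clipbook product: the function name `file_page`, the `clipbook_root` folder and the slash-separated `topic` string are all assumptions made for the example.

```python
from pathlib import Path
import shutil


def file_page(saved_file: Path, clipbook_root: Path, topic: str) -> Path:
    """Copy a saved page into the folder for its topic.

    `topic` is a slash-separated path such as "programming/python"
    (a convention assumed for this sketch).
    """
    # Map the topic onto nested folders below the clipbook root.
    target_dir = clipbook_root.joinpath(*topic.split("/"))
    # Create the whole folder chain if it does not exist yet.
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / saved_file.name
    # copy2 also preserves the file's timestamps where possible.
    shutil.copy2(saved_file, target)
    return target
```

Calling `file_page(Path("article.mht"), Path("clipbook"), "programming/python")` would place a copy at `clipbook/programming/python/article.mht`.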
The next level of sophistication and automation involves a variety of information management tools for storage and classification of web content. Popular options include:
While the vanilla approach serves its purpose for simple scenarios with newbie users, it lacks the necessary degree of integration and automation required for serious and result-oriented browsing. In such cases specialized applications need to be considered for clipbook maintenance.
Clipbook Application Features
- Browser integration using right-click context menu, toolbars and sidebars. During capture, an entire page, a portion of the page or a link to the page can go in as clipbook content. Multiple sections from the same or different page(s) can be combined into a single clipbook item.
- Clipboard integration allows content capture from any source that allows text to be selected (e.g. viewers and editors).
- Drag and drop capture is an extension of clipboard integration. Documents and images dropped from the file system are captured as attachments.
- Preserve original formatting and styles to the maximum extent possible. For example, while capturing an entire web page, associated images and stylesheets should also be downloaded and stored as part of the clip.
- Visual editing of web clips. The editing can occur either during capture or on a previously captured item.
- Associate metadata with clipbook items. Some metadata is generated automatically (e.g. the clip title, source URL and timestamp); other metadata needs to be entered manually (e.g. comments and tags).
- Common clipbook for multiple item types. Supported item types include web clips, links, documents and images.
- Non-proprietary storage formats to reduce application dependency and avoid issues like (application) non-availability, licensing issues and version mismatch. Preferred formats include HTML, RTF and XML optionally compressed using a ZIP compatible format.
- Outline-based organization using a familiar tree-like structure to classify clips into categories and sub-categories. The folder structure can be defined manually, or automatically mapped onto tags associated with clipped items.
- Data synchronization and migration across multiple clipbooks. This is useful in case you maintain two clipbooks on different machines (e.g. one at home and one at office) and need to move data between the two.
- Automated backups capture snapshots of the clipbook to removable media (tape or CDROM) or to a file system on another machine using FTP. Automation means running the backup task as a scheduled operation at defined intervals.
- Export and share in a format (PDF, Zipped HTML sets and CHM to name a few) that does not require the clipbook application for viewing the collection.
- Fast search and lookup: the contents of a clipbook must be indexed so that it is possible to access relevant content using a full-text search.
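To make the metadata and search features more concrete, here is a minimal sketch of a clip record and a word index over it. Everything in it (the `Clip` and `ClipIndex` names, the fields, the simple word-splitting rule) is an assumption chosen for illustration; real clipbook applications will store and index their data differently.

```python
import re
from collections import defaultdict
from dataclasses import dataclass, field
from datetime import datetime, timezone


def _tokenize(text: str) -> list[str]:
    # Lower-case the text and split it into runs of word characters.
    return re.findall(r"\w+", text.lower())


@dataclass
class Clip:
    # Metadata that can be generated automatically at capture time.
    title: str
    source_url: str
    text: str
    captured_at: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)
    )
    # Metadata that the user enters manually.
    tags: list[str] = field(default_factory=list)
    comment: str = ""


class ClipIndex:
    """A tiny in-memory inverted index: word -> ids of clips containing it."""

    def __init__(self) -> None:
        self.clips: list[Clip] = []
        self._words: dict[str, set[int]] = defaultdict(set)

    def add(self, clip: Clip) -> int:
        clip_id = len(self.clips)
        self.clips.append(clip)
        for word in _tokenize(f"{clip.title} {clip.text} {clip.comment}"):
            self._words[word].add(clip_id)
        return clip_id

    def search(self, query: str) -> list[Clip]:
        """Return clips that contain every word of the query."""
        words = _tokenize(query)
        if not words:
            return []
        ids = set.intersection(*(self._words.get(w, set()) for w in words))
        return [self.clips[i] for i in sorted(ids)]
```

The index is built once as clips are added, so a lookup only touches the sets for the query words instead of rescanning every clip.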
The following applications support most of the features of clipbook maintenance as described above.
Net Snippets Free Edition
Net Snippets Free Edition is a totally free research tool designed for anyone looking for a quick, easy way to conduct web research and collect, organize and share online information. The application integrates seamlessly with Internet Explorer as a sidebar item. For other browsers like Mozilla Firefox, Net Snippets supports drag and drop capture of web pages and a desktop bar (external application) for outline-based organization.
Note: Marketing and further development of Net Snippets have been discontinued as of March 2007. You may still access the download page from Google cache. Grab your copy of this great application before it is lost forever.
Keepoint Web Research Engine
Keepoint provides you with many of the features that can be expected of a good web information management tool. It serves as a viable alternative to the now defunct Net Snippets application. The free edition of Keepoint, however, limits you to only 99 pages and, on the whole, provides fewer features than Net Snippets Free Edition.
Mozilla Firefox with Scrapbook Extension
If you are looking for a completely free (as in freedom) clipbook application, you should seriously consider Mozilla Firefox 2 with the Scrapbook extension installed. It gives you the same browser-level integration (Firefox only), organization and automation as Net Snippets. Support for annotations, DOM level clip editing and an open storage format (files are stored in their native format) are some of the features to look out for.
Limitations of this application include the inability to save anything other than a web page opened within Firefox. Firefox needs to be running for the clipbook functionalities to be available. Since the browser and its extensions are undergoing rapid development, there is also a high probability of issues related to version incompatibility and data mismatch.
 The Web archive for email (MHT) format was popularized by Internet Explorer and is now supported by all popular browsers. MHT internally uses MIME to combine HTML, textual and binary data into a single human readable file. Binary data are converted to an ASCII form, usually through Base64 encoding.
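 The idea behind MHT can be illustrated with Python's standard `email` package, which builds the same kind of MIME container. This is a rough sketch under assumptions: the function name `build_mht` and the single-PNG layout are invented for the example, and files written by real browsers carry more headers than shown here.

```python
from email.mime.image import MIMEImage
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


def build_mht(html: str, image_bytes: bytes, image_name: str) -> bytes:
    """Bundle an HTML page and one PNG image into a single MIME document."""
    # "multipart/related" groups a root document with its resources.
    root = MIMEMultipart("related", type="text/html")
    root["Subject"] = "Saved page"

    # The HTML itself is stored as a text part.
    root.attach(MIMEText(html, "html"))

    # MIMEImage applies Base64 encoding to the binary data by default.
    image = MIMEImage(image_bytes, _subtype="png")
    # Content-Location lets the HTML refer to the image by name.
    image["Content-Location"] = image_name
    root.attach(image)

    return root.as_bytes()
```

Opening the resulting bytes in a text editor shows the HTML followed by the image as a block of Base64 text, separated by MIME boundary lines.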
 The easiest way to convert from HTML to PDF is to install and use a PDF printer (e.g. FreePDF XP). The conversion process is as simple as invoking the print command from the Web browser.