
Saving compliance officers, helping analysts!

By pkefi

Web archiving is fairly new compared with other data retention practices, which have existed for a long time.

That makes it harder for a consumer to gather the necessary information and make a smart purchasing decision.

This guide contains basic information about web archiving, lists some of the most popular providers on the market, and sets out some criteria and choices.

Different web archiving approaches

Web archiving has made significant progress during the last five to seven years.

It now offers a choice of approach to both policy and supporting technology.

These choices should be considered carefully before making a purchase.

Client-side Archiving

The market

Although most, if not all, of the solutions available on the market today use a client-side archiving approach, we can split them into two categories.

We'll call the first category website copiers because of certain similarities with HTTrack. This technology consists mainly of taking snapshots of websites and archiving them.

Pros:
-low-cost solution
-small disk usage

Cons:
-no dynamic content played
-does not replay the archives
-low level of depth

Examples of companies on the market: Iterasi, Nextpoint. These solutions are suitable for litigation support and compliance (not dynamic media).

The second category is content archiving. This web archiving method allows the capture of rich and highly dynamic content.

It uses web bots (i.e., crawlers) that capture all web pages (including social media). The web pages are stored exactly as they are captured (including links, rich media, video, and Flash).

Pros:
-a technology that captures multiple web formats in dynamic websites,
-high-quality archive accessibility and rendering,
-full-text search for large web archive collections,
-deduplicated full-text search results in real time,
-daily archiving capabilities,
-support for the WARC ISO file format,
-In-House solutions.

Cons:
-these solutions cost more than the ones from the first category,
-they consume more resources (disk space, CPU, etc.).
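
As a rough illustration of the crawler-based capture described above, here is a minimal Python sketch. It is not how any of the vendors named in this guide actually work. The `crawl` function, the `SITE` dictionary, and the `example.test` URLs are all made up for the example, pages are served from memory instead of the network, captures are kept in a plain list rather than written to WARC files, and dynamic content is not handled at all. It only shows the basic loop: fetch a page, store it as captured, follow its links, and skip bodies that have already been stored.

```python
import hashlib
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(start_url, fetch, max_pages=100):
    """Breadth-first capture starting at start_url.

    `fetch` is a caller-supplied function (hypothetical here) that
    returns the raw HTML of a URL, or None if the page is unavailable.
    Returns a list of (url, sha256_digest, body) records.
    """
    queue = deque([start_url])
    seen_urls = {start_url}
    seen_digests = set()
    archive = []

    while queue and len(archive) < max_pages:
        url = queue.popleft()
        body = fetch(url)
        if body is None:
            continue

        # Store the page exactly as captured, unless an identical
        # body has already been archived (simple deduplication).
        digest = hashlib.sha256(body.encode("utf-8")).hexdigest()
        if digest not in seen_digests:
            seen_digests.add(digest)
            archive.append((url, digest, body))

        # Queue every link on the page that has not been visited yet.
        parser = LinkExtractor()
        parser.feed(body)
        for href in parser.links:
            absolute = urljoin(url, href)
            if absolute not in seen_urls:
                seen_urls.add(absolute)
                queue.append(absolute)

    return archive


# Hypothetical in-memory site standing in for real HTTP requests.
SITE = {
    "http://example.test/": '<a href="/about">About</a> <a href="/copy">Copy</a>',
    "http://example.test/about": "<p>About us</p>",
    "http://example.test/copy": "<p>About us</p>",  # same body as /about
}

records = crawl("http://example.test/", SITE.get)
print(len(records))  # 2: the /copy page is a duplicate and is not stored twice
```

The cost side of this approach is visible even in the toy version: every distinct page body is kept in full, which is where the disk and CPU usage listed above comes from.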

Some companies push web archiving beyond a compliance solution. Aleph Archives, for example, has implemented business intelligence tools to take advantage of the tremendous amount of data gathered on the internet.

Some companies claim to offer an In-House solution, but most of them only store the data In-House.

Examples of companies offering this solution: Aleph Archives, Hanzo Archives, Pagefreezer.

Cloud-based:

Most of the solutions mentioned above are cloud-based. Pricing differs from one company to another, since only a few competitors offer complete solutions.

As you would expect, solutions that take only snapshots are usually more affordable, because the technology is less complicated and uses little disk space.

Solutions offering a full, in-depth capture of websites are more expensive. They usually charge per URL and base their price on the archiving frequency, the scope (list of URLs), and the operation fees (maintenance, data security, retention, etc.).

Some companies base their prices on the amount of data stored.
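
To see how the per-URL pricing factors above interact, here is a small Python sketch. Everything in it is an assumption made for illustration: the `CloudQuote` fields, the formula, and above all the rates, which are invented and do not come from any vendor. Amounts are kept in cents to avoid rounding surprises.

```python
from dataclasses import dataclass


@dataclass
class CloudQuote:
    url_count: int                    # scope: number of URLs to archive
    captures_per_month: int           # archiving frequency
    cents_per_capture: int            # hypothetical rate per URL per capture
    monthly_operation_fee_cents: int  # hypothetical flat fee (maintenance, retention, ...)


def annual_cost_cents(quote: CloudQuote) -> int:
    """Assumed formula: 12 x (scope x frequency x rate + flat monthly fee)."""
    capture_cost = (
        quote.url_count * quote.captures_per_month * quote.cents_per_capture
    )
    return 12 * (capture_cost + quote.monthly_operation_fee_cents)


# Same scope and fees, different archiving frequency (made-up numbers).
daily = CloudQuote(url_count=200, captures_per_month=30,
                   cents_per_capture=1, monthly_operation_fee_cents=5000)
weekly = CloudQuote(url_count=200, captures_per_month=4,
                    cents_per_capture=1, monthly_operation_fee_cents=5000)

print(annual_cost_cents(daily) / 100)   # 1320.0
print(annual_cost_cents(weekly) / 100)  # 696.0
```

With a model like this, frequency and scope multiply each other, so it is worth asking a vendor for quotes at more than one archiving frequency before committing.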

In-House solutions: very few companies provide a fully automated In-House solution. The price of an In-House solution is hard to determine, and it is usually more expensive than the cloud-based option. However, it can be considered a one-time fee (the customer purchases the licence), plus maintenance and support if the customer chooses a support plan.

Recommendations before buying a web archiving solution

-Specify your needs. If you are subject to a regulation, you can be compliant using any of the solutions mentioned above. However, in making your decision, consider that a web archiving solution can go beyond "the compliance and litigation support need", for example by providing relevant data to other departments and preserving your corporate heritage. Numerous corporations and enterprises that are not subject to any regulation choose to acquire a web archiving solution for business intelligence and social media monitoring, in order to enhance their customer service and avoid false disclaimers.
-Acquiring a web archiving solution can be an investment or an annual expense. The In-House solution can be a long-term investment, and it gives you more freedom and security, with no latency. Some companies that offer technologically competitive solutions recommend In-House deployment.
-Ask about the archiving process and judge whether it is suitable for your company.
-Compare the capabilities of all solutions.
