Analysis

Deduplicating archived databases

posted on 14 July 2008 11:09


Structured database information deduped to 5% of its size.

Clearpace Software, US-owned but UK-located, offers compression and deduplication technology in its NParchive software that can cut a database down to 5 percent of its original size. This is deduplication for structured information.

It is aimed at archiving database information, on the view that eighty percent or more of a database's contents may be accessed infrequently, even very infrequently. That data can be moved to a separate archive database store, freeing up fast online disk storage so that the production database runs faster and DBMS backups complete much more quickly.

A Clearpace release states 'Research by storage analysts ESG highlights the growing pressure on IT managers to provide long term access to structured data. Their recent report states that in 2005 none of their survey respondents had been required to provide structured data in response to discovery requests. By 2007 this figure shifted to 57 per cent. Typically, retrieving archived data necessitates restoring data from tape, sometimes the reconstruction of old hardware configurations and almost always the redeployment of developers who should be working on revenue generating initiatives.'

History

Clearpace was founded in the 2000/2001 timeframe by Tom Longshaw, now Chief Scientist, and Andy Ben-Dyke, now CTO. Both worked at the UK's DERA (Defence Evaluation and Research Agency) and developed software technology to remove duplicated data from a relational database's rows and columns. The algorithms combine compression and deduplication with sophisticated pattern-matching techniques.

They left DERA and, with Gary Pratley, an ex-London Underground software man, founded Clearpace to productize the technology. By 2002 they had a headquarters in Milton Keynes, a development center in Gloucester, a small sales office in London, plus a 2-man office in Westchester, Illinois.

Funding from two venture capital firms, Doughty Hanson and Dow Investments, came in 2004. Three years later, in 2007, the firm's management was strengthened by the arrival of John Bantleman as CEO. The Milton Keynes office closed and the HQ functions transferred to Gloucester. Both the London and Westchester, IL, offices closed. The NParchive product was released that same year.

Julian Cook, VP marketing, and Martin Blackmore, VP Sales, joined in 2008, as did a Chief Financial Officer, Jamie Andrews.

Bantleman said: "We made a very conscious decision to assemble a new management team of seasoned professionals who also have experience defining and growing new markets. I believe we've assembled a world-class management team that rivals any technology company in Europe and sends a clear statement of intent. The talent, ambition and drive of the individuals that we have attracted to Clearpace is a clear indicator of the sizeable market opportunity for the company."

Upon joining, Andrews said: "There is very little white space in the technology industry at present. Therefore, the opportunity to join a company that is defining a new category of software designed specifically to address the problem of retaining large and growing volumes of structured data is an exciting prospect. Given the caliber of the team I'll be joining and the strength of our product offering, Clearpace is well positioned to make a big impact in the data management and storage markets."

Blackmore said: "Clearpace invested heavily in researching market requirements and productizing their unique patent-protected technology during 2007 before initiating outbound sales and marketing activity. The result is that the outbound team we have been able to hit the road running and we're already seeing significant interest from prospective customers, partners, analysts as we go-to-market with a fully functional product and compelling proposition."

An ISV partner program was announced in May this year. At that time Blackmore said: "Clearpace has been successful marketing and selling to customers directly, however, as a partner-centric organisation we recognise the value that partners add to Clearpace and the end customer. ... we're always keen to explore partnership opportunities that enable Clearpace and its partners to maximise revenue opportunities and better serve our joint customers."

A new version of NParchive with improved algorithms has just been announced.

What does NParchive do?

It provides an online archive store for inactive structured data that has been relocated from production databases, data warehouses or log files. It de-duplicates and compresses data down to 5 percent or less of its original size. NParchive stores each record, at ingest time, as a series of pointers to the location of a single instance of a data value, or pattern of data values. The NParchive data store comprises a tree-based structure that links the various instances of the patterns together to establish the data records.
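The pointer idea can be illustrated with a minimal Python sketch. It is not Clearpace's implementation: the `ValueStore` class and its method names are hypothetical, and a flat dictionary stands in for the tree-based structure the company describes. Each distinct value is held once, and each ingested record becomes a tuple of pointers to those single instances.

```python
from typing import Dict, List, Tuple


class ValueStore:
    """Hypothetical store that keeps one instance of each distinct value.

    Assumption: a flat list plus a lookup dictionary replaces the
    tree-based structure mentioned in the article.
    """

    def __init__(self) -> None:
        self.values: List[str] = []              # single instance of each value
        self.index: Dict[str, int] = {}          # value -> position in self.values
        self.records: List[Tuple[int, ...]] = []  # each record is a tuple of pointers

    def _intern(self, value: str) -> int:
        """Return the pointer for a value, storing the value only if it is new."""
        pos = self.index.get(value)
        if pos is None:
            pos = len(self.values)
            self.values.append(value)
            self.index[value] = pos
        return pos

    def ingest(self, row: Tuple[str, ...]) -> None:
        """Store a record as pointers rather than as the values themselves."""
        self.records.append(tuple(self._intern(v) for v in row))

    def record(self, i: int) -> Tuple[str, ...]:
        """Rebuild record i by following its pointers."""
        return tuple(self.values[p] for p in self.records[i])


rows = [
    ("2008-07-14", "GBP", "London", "settled"),
    ("2008-07-14", "GBP", "London", "pending"),
    ("2008-07-14", "USD", "London", "settled"),
]

store = ValueStore()
for row in rows:
    store.ingest(row)

print(len(store.values), "distinct values held for", len(rows) * 4, "fields")
print(store.records)
```

In this toy data set twelve fields reduce to six stored values plus small integer pointers. The saving grows with the amount of repetition in the data, which is why the approach suits transactional records with many recurring dates, codes and statuses.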

The resulting archives are fully queryable, online, without needing to re-expand the data. No special tools are needed to query the data – NParchive supports ANSI standard SQL queries, as well as database-specific extensions. These queries can be submitted using existing Business Intelligence or reporting tools, including Business Objects, Crystal Reports, COGNOS and Microstrategy.
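The article does not say how NParchive executes queries internally. As a rough illustration of why a pointer-based archive can be queried without re-expanding it, the Python sketch below assumes a hypothetical layout: a list of distinct values and records stored as tuples of positions in that list. The names `values`, `records`, `COLUMNS`, `select_where` and `count_by` are invented for the example. A filter resolves the search value to a pointer once, then compares only integers.

```python
from typing import Dict, List, Tuple

# Hypothetical archive contents: each distinct value is stored once,
# and every record is a tuple of pointers into `values`.
values: List[str] = ["2008-07-14", "GBP", "London", "settled", "pending", "USD"]
records: List[Tuple[int, ...]] = [(0, 1, 2, 3), (0, 1, 2, 4), (0, 5, 2, 3)]
COLUMNS: Dict[str, int] = {"date": 0, "currency": 1, "city": 2, "status": 3}


def select_where(column: str, wanted: str) -> List[int]:
    """Rough analogue of SELECT ... WHERE column = wanted.

    Returns the numbers of the matching records. The search value is
    resolved to a pointer once; records are never expanded to strings.
    """
    try:
        pointer = values.index(wanted)
    except ValueError:
        return []  # the value is not in the archive, so nothing can match
    col = COLUMNS[column]
    return [i for i, rec in enumerate(records) if rec[col] == pointer]


def count_by(column: str) -> Dict[str, int]:
    """Rough analogue of SELECT column, COUNT(*) ... GROUP BY column."""
    col = COLUMNS[column]
    counts: Dict[int, int] = {}
    for rec in records:
        counts[rec[col]] = counts.get(rec[col], 0) + 1
    # Only the distinct group keys are turned back into values.
    return {values[p]: n for p, n in counts.items()}


print(select_where("status", "settled"))
print(count_by("currency"))
```

A real engine would also need an SQL parser, indexes and range predicates. The sketch only shows that equality filters and grouping can run directly on the compact form.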

Clearpace believes this can halve the cost of regulatory records management for transactional data.

The product is database-agnostic. Its NPbase component takes data from any source and creates a single query space that can be interrogated from any perspective at, Clearpace believes, unparalleled speeds.

NParchive is the first deduplicating and compressing technology product for structured information that B&F has encountered. Diligent, Data Domain, NetApp's A-SIS, FalconStor, and others are all focussed on unstructured information and backups. Deduplication algorithms designed for unstructured information are not well suited to databases; the algorithms have to work with database structures.

This gives Clearpace a technology lead that could amount to several years over any competing technology. We may well be hearing a lot more of Clearpace in the future.

[Chris Mellor.]




tags:  deduplication DBMS compression