Friday, February 16, 2007

E-Discovery Meets Web 2.0

The more I think about the electronic discovery process, the more it seems that we're operating under an outmoded construct, particularly in how we harness information technology.  After all, is it not the evidence that ultimately drives the story?  While litigators are certainly wont to frame it toward a particular result, the route one takes is ultimately constrained by what is discovered, whether that be documents, photographs, video, lab results, or eyewitness testimony.  The entire litigation, including expert testimony, demonstrative evidence, and closing statements, must be based on that discovered, rather than manufactured, evidence.  Nonetheless, the IT infrastructure we set up assumes what we will find.  Our data calls typically survey the usual suspects (i.e., management types) who supposedly know where everything is and who knows what.

Unfortunately, data is not usually created by edict from the top.  Instead, mountains of information are generated by the masses, information that may never flow up the chain but is instead consumed horizontally across the organization.  And this data is not just trivial gossip or discussions about minor matters.  Instead, the information may reflect the basis for critical contract decisions, the generation of a key piece of intellectual property, or evidence of harassment.  Employees are often empowered to make important decisions that affect the life of a company.  While most organizations relegate the dispositive actions of approving contracts, hiring and firing, compensation decisions, patent applications, and the like to certain individuals, the basis for those approvals, which may themselves be little more than "I concur", is found far down the pecking order.

In typical litigation, the basis for a decision and those who contributed to it may be more important than the actual decision.  The basis for a wrongful termination complaint, for example, may make sense until light is shone on all the e-mail exchanges that led up to a person's termination.  In this world, one needs to gather up all the evidence.  However, typical litigation support technology assumes a significant amount of structure.  Typically, electronic evidence was collected from a variety of sources, then printed out or converted to a TIFF image that was then Bates numbered.  No attempt was made to capture the structure in which that electronic document actually existed.  Instead, that web of relationships in space and time must be laboriously pieced together by attorneys at a later date.

In many ways, this is how data in an organization has typically been arranged.  Structure was imposed, and it was assumed that any data that didn't fit that structure wasn't important and was either discarded or relegated to those vast realms of file shares that no one ever dares to look at again.  We dump data there, but once it's used for that initial purpose, it might as well exist in never-never land.

That's what the whole Web 2.0 concept was meant to address for the Internet: to link together content, particularly user-created content, using standard tools but with a loose structure that allows the content to define the structure.  We see this with sites like MySpace, SlashDot, and del.icio.us.  Within the corporation, this growth of content relationships has come more slowly but is being urged on by proponents of what they herald as Enterprise 2.0.  For the enterprise, this means a recognition that content cannot always be structured from the top.  Useful data is created at all levels and often needs to be shared with others.

This same challenge is presented with electronic evidence.  Litigation technology must be standardized but also be able to form around the structure of the evidence in its natural environment.  While preservation concerns may dictate that the data be moved or copied to a litigation archive, the technology's ability to first discover that data while in its natural state and then replicate that structure in the litigation repository is critical if the technology is to actually help drive the case rather than serve as just a dumping ground for evidence to be sorted out later.  Of course, effective content analytics to help expose relationships and identify relevant and privileged information would also help, but it is in that initial collection where organizations have the best opportunity to really exploit the benefits of technology by letting it define the structure based on the evidence rather than by a presupposed structure envisioned by a technology vendor.
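To make the idea of replicating structure a little more concrete, here is a minimal sketch in Python of what a collection step might look like for e-mail alone.  It is an illustration under stated assumptions, not a description of any actual litigation support product: it assumes the messages are available as raw RFC 822 text, it uses only the standard Message-ID and In-Reply-To headers to recover who replied to whom, and the names CollectedItem, collect, build_threads, outline, the sample addresses, and the source paths are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime
from email import message_from_string
from email.utils import parsedate_to_datetime
from typing import Optional


@dataclass
class CollectedItem:
    """One collected e-mail plus the context it was found in (hypothetical schema)."""
    message_id: str
    parent_id: Optional[str]   # Message-ID this e-mail replies to, if any
    sender: str
    sent: datetime
    source_path: str           # where the item lived before collection


def collect(raw_message: str, source_path: str) -> CollectedItem:
    """Parse the headers that carry the reply relationship and keep the origin."""
    msg = message_from_string(raw_message)
    return CollectedItem(
        message_id=msg["Message-ID"].strip(),
        parent_id=(msg["In-Reply-To"] or "").strip() or None,
        sender=msg["From"],
        sent=parsedate_to_datetime(msg["Date"]),
        source_path=source_path,
    )


def build_threads(items):
    """Return (roots, children): thread starters and a parent-to-replies map."""
    known = {item.message_id for item in items}
    children = defaultdict(list)
    roots = []
    for item in sorted(items, key=lambda i: i.sent):
        if item.parent_id in known:
            children[item.parent_id].append(item)
        else:
            roots.append(item)  # no parent, or the parent was never collected
    return roots, children


def outline(roots, children, depth=0):
    """Render each thread as indented lines, oldest message first."""
    lines = []
    for item in roots:
        lines.append("  " * depth + f"{item.sent:%Y-%m-%d %H:%M} {item.sender} [{item.source_path}]")
        lines.extend(outline(children[item.message_id], children, depth + 1))
    return lines


def sample(mid, sender, date, parent=None):
    """Build a tiny made-up message for demonstration purposes only."""
    headers = [f"Message-ID: {mid}", f"From: {sender}", f"Date: {date}"]
    if parent:
        headers.append(f"In-Reply-To: {parent}")
    return "\n".join(headers) + "\n\nbody text\n"


# Made-up messages, deliberately collected out of order from different places.
ITEMS = [
    collect(sample("<c@example.com>", "hr@example.com",
                   "Wed, 07 Feb 2007 10:15:00 -0500", "<b@example.com>"), "hr-share/inbox"),
    collect(sample("<a@example.com>", "supervisor@example.com",
                   "Mon, 05 Feb 2007 09:00:00 -0500"), "supervisor/sent"),
    collect(sample("<b@example.com>", "manager@example.com",
                   "Tue, 06 Feb 2007 16:30:00 -0500", "<a@example.com>"), "manager/sent"),
]

if __name__ == "__main__":
    print("\n".join(outline(*build_threads(ITEMS))))
```

The point of the sketch is only that the reply chain and the original location travel with each item into the repository, so the "I concur" at the end of a thread is not separated from the discussion that preceded it.  Real collections would also need attachments, file-share documents, and far messier threading than two headers can capture.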

1 comment:

Paul Harris said...

I have been coming across a variety of new technologies that target the lengthy pre-discovery process of collecting data from both online and offline sources. The retrieval of data from backup or archive tape media presents a significant challenge to time-constrained litigation demands because of the process involved in extracting actionable information from tapes. Retrieving data from tapes is a multi-step process where the data must first be restored to disk in order to search the contents. However, certain technologies exist that compress the e-discovery process when it comes to tape. Index Engines, for example, allows counsel to perform highly granular searches directly from tape, completely eliminating the need to restore tape content to disk in order to search. This can reduce the search time from weeks to days, according to companies like National Data Conversion that use their tape indexing. Driving this are the incredible costs involved in the search and recovery of suspect content, which I suspect will drive further innovation.