Friday, October 31, 2008

Unstructured Information Management: What You Don’t Know Can Hurt You

Companies large and small create an impressive amount of data, including email messages, documents and presentations. Most of that data is unstructured, existing primarily on corporate file servers and on employee desktop and notebook computers. Industry analysts estimate that this unstructured data accounts for 80% of all corporate information, and they expect it to grow 50% or more each year.

Unstructured Information is Unmanaged Information
Unstructured data is typically unmanaged. The file systems on which this information resides are usually not monitored, and the content is practically invisible to employees, auditors and corporate compliance officers. In an effort to provide greater visibility, control and management of this information to meet compliance reporting requirements, companies have implemented one or more of the following technologies, each of which has advantages and disadvantages:

Enterprise search – An enterprise search engine is an effective way to index and find documents that contain certain terms. Most are easy to implement and require only a modicum of regular maintenance. Unfortunately, most enterprise search engines are tuned to find all the documents that may contain a particular term, rather than the specific document that may be required by an auditor. It is left to the user to winnow through all the returned documents to find what they need, which can be a time-consuming and costly exercise. Additionally, most search engines provide little or no ability to manage the documents they index.

Enterprise Content Management – ECM systems can effectively manage many types of content and can provide access and version control, both of which are effective aspects of information management. ECM systems also tend to be very expensive to set up and maintain. These systems typically require an organization to purchase server and user licenses, implement policies and processes for using the system, and train its users. Because of these costs, companies often limit their ECM implementations to specific areas of their business or types of data, such as documents that pertain to finance. According to many analyst organizations, ECM systems are being used to manage approximately five percent of today’s corporate information.

File Backup – Many companies attempt to solve the problem of document retention by creating regular backups of all the data on the network. These backups are saved to tapes, which are then stored offsite for disaster recovery purposes. Backing up all data regardless of its business value is an inefficient use of time and resources, increases the cost of tape storage and decreases the likelihood of rapid single-file recovery, which is the most-used aspect of file backup.

Doing nothing – This is the “solution” that many companies choose for handling unstructured information. Unfortunately, the prevailing thought among many has been that unstructured information is insignificant and therefore does not require management. After all, most of this information ranges from personal files to draft documents or one of dozens of copies of sales presentations, the majority of which aren’t worth the cost required to manage them.

While most files aren’t worth managing, the risk comes from the small number of files that do matter. For instance, your Sarbanes-Oxley policy and procedure manual, which took valuable internal resources, a consulting firm and many months to create, has likely been copied out of the content management system specially created for finance-related documents. The next time you update that manual with critical information, you have fulfilled one aspect of the Act by tracking and recording those changes in your records management system. However, what about the dozens of copies that may have spread across the network on shared file servers? How can you be certain those copies are deleted or updated to keep people from following old procedures or controls? If you aren’t doing anything to manage that data, you are leaving your company exposed and vulnerable.

Recognizing Valuable Information
Addressing these issues is key to an effective solution for Sarbanes-Oxley or any information governance initiative. Obviously, doing nothing is not the answer. At the same time, it would be cost-prohibitive to manage all files as though they were critical business records. Therefore, the ability to specify which data is critical and worthy of this level of management is a crucial first step. If you are aware of the data’s value, you can make educated decisions about the disposition of important data and create an appropriate retention policy.

Determining data’s value is a result of effective information visibility and control.

Information Visibility – The first aspect of recognizing valuable data requires that it be visible. While your compliance office may have access to all corporate information across the network, the sheer amount of data necessitates the use of technology to find and manage the appropriate documents.

Information Control – To effectively manage and control unstructured information, you need a solution that allows you to copy, move, delete or tag documents with custom metadata, i.e., information about the document. Even better, the solution should provide an integrated policy engine that can be customized with your company’s information governance regulations. For instance, you might create a policy mandating that any document on the employee network that contains a customer account number must be 1) tagged with the custom metadata value “Customer,” and 2) moved to a secured server or file archive system.
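To make the policy idea concrete, here is a minimal sketch of such a rule in Python. It is an illustration only, not any vendor’s actual policy engine; the account-number pattern, file paths and sidecar tag store are all hypothetical placeholders.

```python
# Minimal sketch of a content-based policy rule: if a document appears to
# contain a customer account number, tag it "Customer" and move it to a
# secured share. The pattern, paths and sidecar tag store are hypothetical.
import json
import re
import shutil
from pathlib import Path

ACCOUNT_PATTERN = re.compile(r"\bACCT-\d{8}\b")   # assumed account-number format
SECURE_DIR = Path("/mnt/secure_archive")          # assumed secured server / archive
TAG_STORE = Path("/var/metadata/tags.jsonl")      # assumed metadata store

def apply_customer_policy(file_path: Path) -> bool:
    """Tag and relocate a document that contains a customer account number."""
    try:
        text = file_path.read_text(errors="ignore")
    except OSError:
        return False
    if not ACCOUNT_PATTERN.search(text):
        return False
    # 1) Tag the document with the custom metadata value "Customer".
    TAG_STORE.parent.mkdir(parents=True, exist_ok=True)
    with TAG_STORE.open("a") as store:
        store.write(json.dumps({"file": str(file_path), "tag": "Customer"}) + "\n")
    # 2) Move the document to the secured server / file archive.
    SECURE_DIR.mkdir(parents=True, exist_ok=True)
    shutil.move(str(file_path), str(SECURE_DIR / file_path.name))
    return True

if __name__ == "__main__":
    for doc in Path("/mnt/employee_share").rglob("*.txt"):   # assumed employee share
        apply_customer_policy(doc)
```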

Data classification is an important aspect of information visibility and control. Several products have emerged or expanded into this space to offer a comprehensive solution for complying with Sarbanes-Oxley and other regulations. By implementing one of these data classification systems, you can locate, open and tag documents on your network according to the content found within each document. A typical classification workflow might look something like this (a simplified code sketch follows the list):

1. Catalog – The system scans the file systems, finding and collecting file metadata from hundreds of file types.

2. Classify – Opening each document, the system classifies data according to file attributes, keywords or word patterns, and tags each document with custom metadata according to pre-set policies.

3. Search – The system allows users to find the information they need based on a combination of metadata and full document text, while respecting standard Windows and UNIX access control lists.

4. Report – The system should allow appropriate users to create and access summary or detailed reports.

5. Act – Finally, the system should integrate actions such as tagging files with custom metadata, setting retention and monitoring policies, and offering move, copy and delete functionality, again governed by access control lists.
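The sketch below walks through a stripped-down version of this catalog/classify/search/report loop in Python. The keyword rules, the scanned share and the in-memory index are assumptions made purely for illustration; a real classification system would use a persistent index, far richer classifiers and proper ACL checks.

```python
# Stripped-down sketch of the catalog -> classify -> search -> report loop
# described above. Keyword rules, scanned paths and the in-memory index are
# illustrative assumptions only; the "Act" step is left as a comment.
from pathlib import Path

RULES = {"Finance": ["invoice", "sarbanes-oxley"],      # assumed keyword rules
         "Customer": ["account number"]}

def catalog(root):
    """1. Catalog: walk the file system and collect basic file metadata."""
    for path in Path(root).rglob("*"):
        if path.is_file():
            info = path.stat()
            yield {"path": path, "size": info.st_size, "mtime": info.st_mtime, "tags": []}

def classify(record):
    """2. Classify: open the document and tag it according to the keyword rules."""
    try:
        text = record["path"].read_text(errors="ignore").lower()
    except OSError:
        return record
    record["tags"] = [tag for tag, words in RULES.items() if any(w in text for w in words)]
    return record

def search(index, tag):
    """3. Search: find records carrying a given metadata tag."""
    return [rec for rec in index if tag in rec["tags"]]

def report(index):
    """4. Report: summarize how many bytes carry each tag."""
    totals = {}
    for rec in index:
        for tag in rec["tags"]:
            totals[tag] = totals.get(tag, 0) + rec["size"]
    return totals

if __name__ == "__main__":
    index = [classify(rec) for rec in catalog("/mnt/file_server")]   # assumed share
    print(report(index))
    # 5. Act: move, copy, delete or re-tag the files returned by search(index, "Finance").
```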

By contrast, an enterprise search engine provides an efficient way to find content that contains the search term you need. But then what? If you wanted to copy, move, delete or perhaps tag the document with custom metadata, you would have to do so manually.

Data Retention, Availability and Recovery
Retention is another aspect of corporate information that cannot be overlooked. While many companies elect to back up all data on a weekly or monthly basis, the cost in time and resources increases as the amount of data grows. Knowing what is in your data (by making information visible, tagging it with metadata and controlling access) allows you to intelligently create a retention policy that moves or backs up only the data needed to comply with your corporate information governance policy or government regulation.

Most organizations use a backup solution that periodically copies data to tape or disk drives. An organization may back up its mission-critical data every night and all of its data every week. It may store the backup tapes for up to six months to guard against accidental deletions, send tape copies offsite as a safeguard against disaster and retain backup tapes long-term to meet regulatory requirements.
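A schedule like this can be pictured as a small policy table. The sketch below is only an illustration of that idea; the category names and retention windows (including the assumed seven-year regulatory hold) are examples, not a recommendation or any product’s actual configuration.

```python
# Illustrative policy table for the backup schedule described above. The
# category names and retention windows (including the assumed seven-year
# regulatory hold) are examples, not a recommendation or product setting.
from datetime import timedelta

BACKUP_POLICY = {
    "mission_critical": {"frequency": timedelta(days=1),        # nightly backup
                         "retention": timedelta(days=182),      # roughly six months
                         "offsite_copy": True},
    "full":             {"frequency": timedelta(days=7),        # weekly full backup
                         "retention": timedelta(days=182),
                         "offsite_copy": True},
    "regulatory":       {"frequency": timedelta(days=7),
                         "retention": timedelta(days=365 * 7),  # assumed long-term hold
                         "offsite_copy": True},
}

def tape_expired(category: str, age: timedelta) -> bool:
    """Return True once a backup tape is older than its retention window."""
    return age > BACKUP_POLICY[category]["retention"]

print(tape_expired("full", timedelta(days=200)))        # True: past the six-month window
print(tape_expired("regulatory", timedelta(days=200)))  # False: still on regulatory hold
```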

Lacking the means to gauge the value of the data, companies often take the safe route and back up all of it. Not only is this approach an inefficient way to manage data, it also creates a potential risk: data that is retained even though it is not required to be kept can be used against a company in the event of a lawsuit or regulatory compliance issue. In this respect, backing up data in its entirety creates a liability.

Corporations can meet regulatory data retention requirements, cut backup and recovery costs and manage risk by introducing file archiving into the mix.

A file archiving system uses data classification to determine the content’s value, then moves or copies files according to that value. File archiving systems can find and retrieve files based on their content. Any number of parameters can be used, including author, date, and customized tags such as “SEC 17a-4” or “Sarbanes-Oxley.”

This leads naturally to the tiering of storage services. Backup and file archiving are logical places to start when providing tiered storage services based upon the value of the data on your network.

As an example, consider a company that has 10 terabytes (TB) of data on production file servers. In the past, the company may have backed up critical files onto disk storage and then backed up all files onto tape once a week. The company catalogued the tapes, kept them for three months and then cycled them back through the process. New government regulations mandate that all data related to quarterly financial results must be kept for five years. Unfortunately, the company has no way to differentiate among the disparate types of data on its network. The company is forced to retain all of the data for five years, expanding the amount retained from 10 TB to 2.5 petabytes (PB). As data amounts double annually, so will the amount that must be stored. The company will find itself devoting more and more time and resources to data backup.
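The 2.5 PB figure appears to come from keeping every weekly 10 TB full backup for the full five years. A quick back-of-the-envelope check, assuming 52 weeks per year and 1 PB = 1,024 TB (both assumptions, since the article does not spell them out):

```python
# Back-of-the-envelope check of the 2.5 PB figure: a 10 TB full backup taken
# weekly and retained for five years (assumed: 52 weeks/year, 1 PB = 1024 TB).
weekly_full_tb = 10
weeks_retained = 52 * 5
total_tb = weekly_full_tb * weeks_retained                  # 2,600 TB
print(f"{total_tb} TB is about {total_tb / 1024:.2f} PB")   # ~2.54 PB, i.e. roughly 2.5 PB
```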

To solve this problem, let us assume that the company implemented a data classification system. By discovering the value of its unstructured information and tagging it accordingly, the company copied 500 GB of financial reporting data to WORM storage for long-term retention and moved 7 TB to tiered storage, which is backed up to tape every three months. The data in three-month storage would total 42 TB, compared with the 2.5 PB that would have been required if the data had not been archived. With tiered storage, the company significantly reduced backup time and resources, shrank the cost of production file storage and increased its IT service levels by freeing up personnel and resources for other tasks.

Tiering your data storage services allows you to put SOX controls only around the data that pertains to your financial information and lock down the appropriate data on compliance-specific storage boxes.

Proving Compliance

The old adage is true: the best defense is a good offense. In the case of Sarbanes-Oxley compliance, the best offense is to create and implement provable policies. Having a data classification system allows you to produce standard reports that show duplicate copies of applicable documents, show who has accessed a file within a specific time period, and monitor implementation of your information governance policies. With reporting functionality available in a dashboard, you can think of your system as a burglar alarm: a deterrent to potential wrongdoing and a way to prove that you’re actively checking for compliance-related issues.
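As one example of the kind of report described here, the sketch below finds duplicate copies of documents by hashing file contents. It is a bare-bones illustration rather than a compliance product; the scanned share is a hypothetical path, and a real system would also report access history and policy exceptions.

```python
# Bare-bones sketch of a "duplicate copies" report: hash every file's contents
# and list documents that exist in more than one place. The scanned share is a
# hypothetical path; a real system would also report access history.
import hashlib
from collections import defaultdict
from pathlib import Path

def duplicate_report(root):
    by_hash = defaultdict(list)
    for path in Path(root).rglob("*"):
        if not path.is_file():
            continue
        try:
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
        except OSError:
            continue
        by_hash[digest].append(path)
    return {h: paths for h, paths in by_hash.items() if len(paths) > 1}

if __name__ == "__main__":
    for digest, copies in duplicate_report("/mnt/file_server").items():
        print(f"{len(copies)} copies of {copies[0].name}:")
        for copy in copies:
            print(f"  {copy}")
```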

Best Practices
Implementing one of today’s data classification systems should be an integral part of your Sarbanes-Oxley best practices. Setting information governance policies fulfills a basic requirement. Active management of your unstructured data will find, tag and move content according to your corporate policies, lowering the risk that information will “fall through the cracks” and potentially protecting you from breaking the law. Creating a tiered storage system will allow you to set retention policies according to the value of the content, saving money and reducing risk. And proving compliance, or at least showing that you’re attempting to comply, is sometimes the best way to meet and exceed current and future government regulations, not only around financial systems but around employee and customer privacy as well.

Reducing Risk and Lowering Costs
In the end, visibility into and control of your unstructured information reduce risk (of compliance violations, litigation exposure, untimely responses, and privacy and security breaches) and lower costs through streamlined storage operations, improved service levels and automated, policy-driven data management.

Friday, October 24, 2008

The eDiscovery Struggle - Pricing and Small Jobs

With the change in the FRCP, all companies are required to discover, search and produce electronically stored information. For Fortune 2000 companies, the price tag for eDiscovery jobs typically runs in the millions of dollars. eDiscovery vendors compete aggressively for Fortune 2000 business.
However, a seedier side of eDiscovery has been identified by Craig Ball of LTN in his new article, Too Little, Too Late.

Small companies facing litigation proceedings must meet the new FRCP regulations, too, but they are finding it challenging to even get the attention of eDiscovery vendors. They find themselves in a Catch-22: needing to meet federal regulations while their smaller eDiscovery jobs are ignored by the big vendors.

Read Craig's full article and find out the truth about eDiscovery vendors! - Too Little, Too Late.

Monday, October 20, 2008

Kazeon eDiscovery Solutions featured in Information Week

With its award-winning eDiscovery software and a growing list of Global 2000 companies that rely on it for proactive and reactive litigation processes, Kazeon makes the news again in a new article in Information Week. To find out the whole story, see the full article below.

http://www.informationweek.com/news/business_intelligence/mining/showArticle.jhtml?articleID=210604040&pgno=1&queryText=&isPrev=

Karthik Kannan Quoted in eWeek on the Financial Crisis and Its Impact on eDiscovery

Karthik's Quote....

"The effects of the bailout package passed by the House and Senate last week [on the e-discovery industry] are probably going to come in this [fourth] quarter," Karthik Kannan, vice president of marketing and business development for Kazeon, a respected e-discovery software provider, told me."In the last couple of years, we've had significant traction and sufficient growth. But we have barely touched the tip of the iceberg in terms of market availability, from an addressable market standpoint." Kannan said he thinks this Wall Street-centered meltdown clearly will be a further impetus "to get people to call us. We are seeing that more people are adopting a proactive position on information management," he said. "We are also seeing more hits on our Web site the last couple of months."


Read the whole story: http://www.eweek.com/c/a/Data-Storage/eDiscovery-Search-and-Storage-Providers-Await-a-Litigation-Boom-Following-Financial-Crash/

Kazeon and Quantum Form Solutions Alliance

Kazeon announced a solutions alliance with Quantum Corp., the leading global specialist in backup, recovery and archive. Kazeon's ability to perform automated eDiscovery collection, preservation, processing, analysis and review on all electronically stored information (ESI) residing within an enterprise works seamlessly with Quantum's DXi-Series disk backup systems, which incorporate data de-duplication and replication technology. The combination provides a comprehensive solution for eDiscovery, archival and compliance while minimizing the data actually stored.

http://www.kazeon.com/newsroom2/2008-10-07.php

Federated eDiscovery

Common challenges encountered by corporations in managing their enterprise-wide information:

1. Unnecessary data movement and collection before first-pass review by inside and outside counsel – untargeted over-collection leads to liabilities, high preservation costs, delays in decision making and an inability to make crucial early-case assessments

2. Too much data being outsourced to legal service providers for eDiscovery and hosting for analysis/review – corporations have no in-house control over eDiscovery and costs are driven sky-high

3. Data is brought over the WAN to central locations for review, analysis & processing – too much network bandwidth is consumed and the infrastructure costs are high

The ideal solution will provide a federated architecture that obviates unnecessary data movement, allows in-place analysis and first-pass review, enables early-case assessments before collection, and enables targeted collection of relevant data. The result is not only efficiency and cost reduction, but also accuracy, defensibility and control for the corporation.

http://www.kazeon.com/newsroom2/2008-10-20.php

Kazeon’s Agent-less eDiscovery

After reviewing the needs and demands of corporations, Kazeon’s eDiscovery approach leverages the agent-less discovery paradigm and resolves the associated agent-less challenges to deliver a robust eDiscovery capability. With its rapid installation, agent-less discovery accelerates “time to information” for early assessments, helping corporations meet shrinking discovery windows. The robust auto-discovery capability identifies and searches all information repositories on the network to reduce litigation risk, increase litigation defense capabilities, and mitigate the risk of omitting relevant material. To handle compute-intensive discovery operations, Kazeon has designed and architected a high-performance, scalable clustered platform that meets intensive corporate litigation demands while maintaining control and security of the discovery process. With an extensible platform architecture and EDRM compliance, Kazeon’s patented connector engine technology delivers a seamless solution to the agent-less “Live and Locked” file challenge, as well as numerous other application discovery capabilities. With its agent-less-centric approach and robust technology, Kazeon delivers the best of both worlds to solve today’s eDiscovery challenges.
Kazeon’s Agent-less Benefits
• Rapid Deployment – no agent deployment
• Network control – no user termination
• Ease of Use
• Auto-Discovery of all Sources – highest discovery coverage
• Clustered Architecture for Scalability and High-Performance
• Live and Locked Files discovery

Friday, October 17, 2008

Agent-less Paradigm in eDiscovery

Agent-less Paradigm
The agent-less-centric approach does not require the time-consuming deployment and installation of an application on every information source. With agent-less discovery, the eDiscovery solution is deployed in the corporate network, auto-discovers all available data sources, and commences information discovery. Because agent-less eDiscovery leaves no stone unturned, it gives corporations a holistic information management view and the most complete and defensible information possible. With the agent-less approach, laptop and desktop users cannot disable the discovery process, as they can in the agent-centric approach. With agent-less search, there are no compatibility issues to threaten the operational health of critical systems. The downside to the agent-less approach is that the discovery process is conducted by the eDiscovery servers alone, which typically reduces the performance and scalability of the eDiscovery process. Another challenge with the agent-less approach is the inability to search and preserve “Live and Locked” files due to file system permission settings.
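As a rough illustration of the auto-discovery idea (and only that), the sketch below probes a subnet for hosts that expose common file-sharing ports without installing anything on them. The subnet, port list and timeout are assumptions for the example; a commercial product would use far more sophisticated discovery than a simple TCP probe.

```python
# Rough illustration of agent-less auto-discovery: probe a subnet for hosts
# exposing common file-sharing ports, without installing anything on them.
# The subnet, port list and timeout are assumptions for this example; a real
# product would use far more sophisticated (and faster) discovery.
import ipaddress
import socket

FILE_SHARE_PORTS = {445: "SMB", 2049: "NFS"}

def discover_shares(subnet="192.168.1.0/28", timeout=0.3):
    found = []
    for host in ipaddress.ip_network(subnet).hosts():
        for port, proto in FILE_SHARE_PORTS.items():
            try:
                with socket.create_connection((str(host), port), timeout=timeout):
                    found.append((str(host), proto))
            except OSError:
                continue
    return found

if __name__ == "__main__":
    for host, proto in discover_shares():
        print(f"Potential {proto} data source at {host}")
```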

Agent-less Benefits
• Rapid Deployment – no agent deployment
• Auto-discovery of sources
• Network controlled – no user termination capability
Agent-less Challenges
• Performance
• Scalability
• Live and Locked File access

Agent Paradigm in eDiscovery

Agent Paradigm
Many companies use agent-centric discovery. In the agent-centric paradigm, a corporation must deploy and install a small application on each and every device it wishes to perform discovery upon. The advantage of agent-centric discovery is that the discovery effort leverages the source compute engine (CPU) to conduct the search. Agent-based search also solves the challenge of discovering Live and Locked Files, since the agent runs on the client and has root access. The downside to the agent-centric approach is that corporations have to know each and every device on their network before discovery begins. The discovery agent must be deployed and installed on every device; a corporation with thousands of desktops, laptops, servers and file servers will need significant planning to deploy the agent, not including the time needed to test the agent on critical operational systems to ensure compatibility. Since corporations can only collect data from known sources, the agent approach exposes the corporation to omission liability if relevant litigation materials are found on an unknown source. Another challenge with agent-centric discovery is user intervention: agents are often disabled by users because they slow system performance, which further impedes information discovery and collection.


Agent Benefits
• Distributes Compute Load
• Searches Locked and Live Files

Agent Challenges
• Searches only known sources
• Agent compatibility challenges
• Agent application termination by users
• Agent deployment and installation on all information sources

Managing Information “in the Wild”: Ten Tips for Lite-ECM

Traditional Enterprise Content Management (ECM) products and solutions have delivered, and continue to deliver, the ability to manage content, documents and records within the enterprise to meet today’s business regulations. ECM platforms are a corporate necessity and will continue to be the backbone for critical, high-value information management and collaboration capabilities for the foreseeable future. However, there is a growing challenge in enterprise content creation and management. With the increase in knowledge workers and user-friendly content creation tools, electronically stored information (ESI) volume is growing at nearly one hundred percent year over year. At this volume, over eighty percent of informational assets reside outside the ECM repositories, i.e., “in the wild.” With today’s dispersed mobile workforce, documents and records are now created and scattered across databases, servers, email applications, laptops, desktops and storage systems around the world. The problem is how to find the key critical documents and records in today’s informational ocean and pull them into the ECM workflow, while leaving the rest in place and indexed, because importing all documents and records into the corporation’s central ECM repository is no longer feasible given the sheer volume. A solution to the “in the wild” information problem would provide ECM functionality across any device (information access, information management, records management, metadata capture and management, full-text and full-content indexing, and auto-classification) in order to separate critical from non-critical information and either transition it seamlessly into the ECM repository or index and manage it in place.

To help manage information growth while maintaining enterprise content management control of high-value documents and records, corporations need visibility into important data outside the ECM repository, i.e., “in the wild” information. Gartner coined the term “Lite-ECM” to describe a cooperative Information Access and Management suite that seamlessly integrates with and extends today’s ECM platform capabilities to provide identification, search, analysis and auto-classification of information stored throughout the organization outside the ECM repository. Adding in-place indexing and management capability delivers the ability to virtually organize distributed information into a single, easy-to-use information viewpoint, regardless of where the information resides. But adding another tool to the toolbox can be challenging, especially on today’s constrained IT budgets. What are the key features needed to determine the right Lite-ECM product fit for your corporation? David Morris, Director of Product Marketing at Kazeon, provider of Information Management and eDiscovery technologies, offers insight into the top criteria for choosing a “Lite-ECM” software suite that augments and extends your ECM capabilities while reducing deployment and management headaches.

1. Enterprise-class Scalability & Performance – Most Lite-ECM information platforms were architected with a reactive, do-it-once ideology, which causes significant scalability challenges when attempting to deploy continuous Lite-ECM capabilities to manage today’s dynamic information environments. A Lite-ECM suite must scale to search across hundreds of terabytes of electronically stored information and billions of documents, and have the performance to process that data fast enough to keep pace with today’s information growth.

2. Auto-Discovery of Data Sources – The Lite-ECM suite must have the capability to auto-discover information sources anywhere on the network, since critical data may reside on an enterprise file server or on a laptop in Shanghai. To truly manage all information, auto-discovery is a critical feature of any enterprise-level Lite-ECM suite.

3. Holistic and Dynamic Organizational Information Map – Since network topology can change rapidly, having a dynamic, continuous auto-discovery capability is critical for information indexing, internal investigations, litigation procedures and information capacity planning.

4. Agent-less Information Management – Organizations have enough critical data running on servers, laptops and desktops today. Putting yet another agent on all devices simply reduces operational health and increases risk, not to mention that a device has to be “known” before an agent can be installed on it. Agent-less search has a low impact on the IT infrastructure and is quicker to deploy. With the scalability challenges solved, it is the lowest-risk, highest-reward approach to identifying all data sources on the network. Since users cannot disable agent-less search, it provides a rapid and powerful investigation and litigation capability to find potentially relevant information and hold it in place for review.

5. Robust Search, Analysis and Classification – Searching, analyzing and classifying information are complex challenges; however, a Lite-ECM suite needs all three to truly add value and help separate relevant from non-relevant information within an organization. A strong analysis and auto-classification capability that can sort large data sets based on metadata, document content, file type and so on is necessary to accurately and quickly reduce the volume of data to a relevant, manageable set for review and processing.

6. Tagging – Automating the tagging of individual content, or grouping content into relevant virtual folders, with a robust policy-based engine allows administrators to simplify the review and reporting process by delivering a virtualized organizational information overview.

7. Workflow Management – After gaining insight into and classifying critical information, bringing the “in the wild” data into the ECM platform for workflow management and preservation is a key capability. With the ability to automate move, copy, encrypt and delete actions, an automated policy-based methodology accelerates the otherwise manual processing of enterprise data. Furthermore, it allows corporate governance policies and IT policies and procedures to be managed and enforced through the existing platform.

8. Unified Management – With billions of documents and petabytes of storage, corporations can easily be overwhelmed by the volume of data and its presentation. A robust Lite-ECM suite must have a unified management view across the entire network and the ECM platform to simplify operational management. Without a unified management approach, the management task is overly burdensome and not feasible.

9. In-Place Record Hold – Being able to tag and hold potentially critical information at the source, i.e., on the server or laptop where it resides, is a capability that separates efficient Lite-ECM suites from unusable ones. It is not reasonable to move all potentially critical data back to a repository before review; in-place hold and review, followed by targeted collection, streamlines and accelerates the process to meet today’s demands and reduce infrastructure costs (a rough sketch of the in-place hold idea appears after this list).

10. Enterprise-Wide Critical Information Capture – With 80% of a corporation’s informational assets outside the control of the ECM platform, a Lite-ECM suite needs the flexibility to identify, access, search and review information that resides in databases, email archives, servers, email systems and storage systems across the network. With an automated workflow policy engine, capture and movement of critical information to the ECM repository can be accomplished on a daily, weekly or monthly basis. Having an extensible architecture to facilitate search, collection and review across existing and emerging applications and data types is a critical capability.
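To make the in-place hold concept from item 9 concrete, here is a deliberately crude Python sketch: it records a hold tag for a file and strips write permission where the file sits, rather than copying it to a central repository first. The hold log path and the example file are hypothetical, and a real Lite-ECM suite would enforce holds through its own index and ACLs rather than simple permission bits.

```python
# Deliberately crude sketch of an "in-place record hold": record a hold tag
# and strip write permission where the file sits, instead of copying it to a
# central repository first. The hold register path and example file are
# hypothetical; a real suite would enforce holds via its index and ACLs.
import json
import os
import stat
import time
from pathlib import Path

HOLD_LOG = Path("/var/metadata/holds.jsonl")   # assumed hold register

def place_hold(file_path: Path, matter_id: str) -> None:
    """Tag a file as on legal hold and make it read-only at the source."""
    HOLD_LOG.parent.mkdir(parents=True, exist_ok=True)
    with HOLD_LOG.open("a") as log:
        log.write(json.dumps({"file": str(file_path),
                              "matter": matter_id,
                              "held_at": time.time()}) + "\n")
    mode = file_path.stat().st_mode
    os.chmod(file_path, mode & ~(stat.S_IWUSR | stat.S_IWGRP | stat.S_IWOTH))

if __name__ == "__main__":
    place_hold(Path("/mnt/branch_share/forecast.xlsx"), "matter-2008-17")   # example only
```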

Deploying a Lite-ECM suite is a complex process, since it impacts the IT, legal, human resources, records management and security teams. To meet stakeholder needs, Mr. Morris advocates convening a cross-functional team to gather requirements, review solutions and manage deployment of a Lite-ECM suite, as well as to create a sense of ownership of, and responsibility for, the enduring usage of the new suite.