Wednesday, November 12, 2008

Pitfalls in Data Mapping and Relevant Information Collection

With the growth in the use of computers and content creation tools, the volume of electronically stored information has outstripped corporate IT administrators’ ability to keep up. Keeping all electronically generated materials is no longer viable, from either a cost perspective or a litigation-risk perspective. Many corporations have instituted electronic information policies that delete documents and emails after a specific period of time, e.g. three to six months after creation. However, many employees hoard information and have found ways around these information management policies, which increases litigation risk exposure for corporations.

How do employees get around the IT enforcement policies?

1. Rename an Exchange archive PST to another extension – most enforcement software does not cross-check the content of a file against its extension. Employees have learned to create an archive of their email on a regular basis and simply rename AAA.pst to AAA.doc to circumvent the email enforcement policy. Make sure that the eDiscovery software can not only find documents, but also verify that the content and the file extension are not in conflict, so that all relevant information on the network is found.

2. Save PSTs to a USB thumb drive or external hard drive – if the data is not on a company share or computer, the enforcement policy is circumvented. However, opposing counsel can depose senior officers and uncover “personal” copies of company information for discovery purposes.

3. Non-IT storage – with strict IT enforcement policies, some corporate divisions have deployed a “divisional” storage server outside the knowledge and purview of corporate IT. Although storing electronic information this way can be useful, it poses a significant litigation exposure risk. Defense and opposing counsel need to ensure that a complete organizational Data Map has been created and that the network has been searched for “rogue storage sites” to fully comply with the FRCP. Make sure that the eDiscovery software can identify all information sources on the network; otherwise the organizational Data Map is worthless.

With eDiscovery costs being reduced by automated software tools, corporations now have the ability to deploy systems to manage their informational assets efficiently. Furthermore, with the dropping cost of eDiscovery, opposing counsel can trust the defense’s Data Map but ask for verification with a robust eDiscovery suite, like Kazeon’s eDiscovery software, to ensure and verify compliance.

Tuesday, November 11, 2008

Info Management Technology

21st century e-discovery technology clearly has value in that it facilitates discovery for a host of initiatives that require information access and classification. In short, it is a technology with broad information governance and management applicability, a claim that niche players can never make. This technology integrates directly with the most common data storage, email, database and data archival platforms in existence today to provide corporate clients with flexibility without re-tooling, an approach that not only extends the lifecycle, utility and value of any technology investment, but will be sure to put a smile on even the most austere CFO’s face.

Use case: IT storage management

While storage managers have any number of issues on their front burners at any given time, the one issue that may subsume them all is the management of storage growth. In an environment where one click of the send button can result in a file proliferating five hundred fold across enterprise and geographical boundaries, the concept of data deduplication has never been more relevant or important. In fact, storage managers should take heart: the pedigree of information access technology, and the core competencies of its architects, can be traced back to storage management. Data de-duplication is an important factor, and file type mismatch reports help uncover MP3 music files cleverly renamed to evade detection – conduct that could adversely impact the organization.
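The core of data de-duplication is grouping files by a content digest. A minimal sketch, hashing with SHA-256 (real systems typically deduplicate at the block level and use more elaborate indexing):

```python
# Sketch: hash-based duplicate detection, the heart of file-level de-duplication.
import hashlib
from collections import defaultdict
from pathlib import Path

def find_duplicates(root: Path) -> dict[str, list[Path]]:
    """Group files under `root` by SHA-256 digest; any group larger
    than one is a set of byte-identical duplicates."""
    by_digest: dict[str, list[Path]] = defaultdict(list)
    for f in sorted(root.rglob("*")):
        if f.is_file():
            by_digest[hashlib.sha256(f.read_bytes()).hexdigest()].append(f)
    return {d: paths for d, paths in by_digest.items() if len(paths) > 1}
```

A file emailed to five hundred recipients shows up as one digest with five hundred paths, which is exactly the report a storage manager needs.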

These tools have other uses too, particularly file size and aging reports which are of enormous benefit to the planning and budgeting for IT multi-tiered storage strategies and “green data center” initiatives, which in turn have a significant impact on disaster recovery and business continuity planning.

Use case: Records retention policy and management protocol

For many organizations, the perennial information management challenge they face is records retention. Some organizations may have no cognizable records retention policy; others have a state-of-the-art records management protocol they can never seem to get effectively implemented. Regardless of what state its records policy or management protocols are in, an organization that seeks to implement one should have a sense of the following fundamental characteristics of its data and organization:
a. File create, modify and access times;
b. File “owner”;
c. file content;
d. The rules that define which group within the organization’s internal data taxonomy the classified information belongs to, i.e. HR, finance, taxes, customers, operations, marketing, etc.;
e. The applicable regulatory framework for the organization – this consists of nondiscretionary externally mandated records retention requirements; and
f. Pending (to the extent they exist) litigation hold requirements.
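Characteristics (a) through (d) above can be collected directly from the filesystem. A minimal sketch, assuming a hypothetical keyword-based taxonomy (the `TAXONOMY_RULES` groups and keywords are illustrative, not a standard):

```python
# Sketch: collect per-file characteristics (a)-(d) from the filesystem.
from pathlib import Path

TAXONOMY_RULES = {                    # (d) hypothetical taxonomy keywords
    "HR": ["salary", "employee"],
    "finance": ["invoice", "ledger"],
}

def profile_file(path: Path) -> dict:
    """Build a profile record for one file."""
    st = path.stat()
    text = path.read_text(errors="ignore").lower()      # (c) file content
    group = next((g for g, kws in TAXONOMY_RULES.items()
                  if any(k in text for k in kws)), "unclassified")
    return {
        "created": st.st_ctime,       # (a) create/change time
        "modified": st.st_mtime,      # (a) modify time
        "accessed": st.st_atime,      # (a) access time
        "owner_uid": st.st_uid,       # (b) file "owner" (POSIX uid; 0 on Windows)
        "group": group,               # (d) taxonomy group
    }
```

The regulatory framework (e) and litigation holds (f) are organizational inputs rather than filesystem facts, which is why they do not appear in the sketch.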
1. Defensible data remediation; getting rid of files with little business value and high exposure.
Organizations that have implemented records retention programs without classification are likely over-capturing information. This means that while they capture what they should, they may also capture superfluous information of no business value that could well represent significant risk to the organization.
Records retention managers who leverage search and classification technology will be able to report on a host of information, including, but not limited to, the age of files encountered during a network scan as well as file access and modify times. This allows records retention managers to begin data remediation based on file age and utility. Arguably, old files that are never modified or accessed likely have little or no business value. In fact, these files are the ones that constitute the greatest organizational threat in the form of dormant data liability. However, simply because they exist, they may be responsive to a litigation or regulatory request.
2. Data classification; defining that which is an organizational record.
The precursor step to defining what constitutes an organizational record is data classification. Prior to data classification, the data must be discovered and its contents accessed. In today’s world, file metadata and content become part of an enterprise index. Policies, driven by corporate stakeholder criteria, can then identify the items that qualify as “records” based on the organization’s rules and apply user-specific metadata to them.
In the records retention use case, records retention managers or policy makers can create and automate simple or complex classification rule sets that will classify and tag relevant documents with values such as “tax record,” “HR record,” “final contract,” etc. The resulting record can then be migrated to its archival resting point on secondary storage or read-only media. In short, powerful metadata tagging capabilities now allow records managers and policy makers to get their arms around vast amounts of data and apply flexible and elegant classification schemes that heretofore would have been inconceivable.
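A classification rule set of the kind just described can be modeled as a small list of rules combining metadata conditions (file type) with content conditions (keywords). The tag names and keywords below are illustrative assumptions, not a built-in vocabulary:

```python
# Sketch: a minimal rule engine that tags documents as records.
from dataclasses import dataclass, field

@dataclass
class Rule:
    tag: str
    keywords: list[str]
    extensions: set[str] = field(default_factory=set)  # empty set = any file type

    def matches(self, name: str, text: str) -> bool:
        ext_ok = not self.extensions or name.rsplit(".", 1)[-1] in self.extensions
        return ext_ok and any(k in text.lower() for k in self.keywords)

RULES = [
    Rule("tax record", ["form 1120", "schedule k-1"], {"pdf", "xls"}),
    Rule("HR record", ["performance review", "offer letter"]),
    Rule("final contract", ["executed agreement"], {"doc", "pdf"}),
]

def classify(name: str, text: str) -> list[str]:
    """Return every tag whose rule matches the document."""
    return [r.tag for r in RULES if r.matches(name, text)]
```

A document that matches a rule would then carry the tag as custom metadata and be migrated to its archival resting point.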
II. Use case: Information security
In a recent study conducted by TIP, an independent IT research group founded by Gartner, EMC, Giga, and Bell Labs alumni, upwards of 70% of information security (infosec) professionals interviewed confirmed that their focus has shifted from external threats to internal threats. Some hot points for information storage managers include stemming intellectual property (IP) leakage, identifying network security gaps and managing information access. Giving infosec professionals insight into the nature of data at rest is a core foundation for infosec solutions. Questions such as “who in the enterprise has data related to project X, and where is it?” can easily be answered by leveraging regular expression content filter engines that identify and alert infosec professionals to the existence and network coordinates of certain types of information that meet the organization’s risk profiles. Infosec managers should be able to scan, locate and sequester information such as credit card data, social security information or any other pattern- or keyword-based sensitive and proprietary material. Even more importantly, they should be able to conduct automated risk rankings that combine multiple metadata and content conditions. Information access technology delivers these types of solutions. Now infosec personnel can ascertain data ownership and access rights and be in a better position to help make policy decisions about avoiding future organizational exposure.
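The regular-expression content filters described above take only a few lines to sketch. The two patterns below (a loose sixteen-digit card-number form and the US Social Security number format) are illustrative, not production-grade validators:

```python
# Sketch: regex content filters for pattern-based sensitive data.
import re

PATTERNS = {
    "credit_card": re.compile(r"\b(?:\d[ -]?){15}\d\b"),   # 16 digits, loose separators
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),           # US SSN format
}

def scan_text(text: str) -> dict[str, list[str]]:
    """Map each pattern name to the matches found in `text`."""
    hits = {name: pat.findall(text) for name, pat in PATTERNS.items()}
    return {name: found for name, found in hits.items() if found}
```

A real engine would add checksum validation (e.g. the Luhn check for card numbers) and combine these content hits with metadata conditions for risk ranking.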

Use case: Litigation discovery

1. Implementing a legal hold.
When the specter of litigation looms, counsel has a common law obligation to prevent spoliation. Legal hold implementation has been notoriously taxing on IT department operations and disruptive to custodian workflow. Effective litigation holds are predicated on the ability to identify responsive information and the ability to sequester the information in a defensible manner.

The implications and benefits of this functionality for in-house counsel and IT are:
a. Centralized and consistent collection methodology;

b. Uniformity of collection criteria;

c. Scalability – to collect from 10 or a thousand employees; from 10GB to 100TB;

d. Collection transparency; removal of the onus of collection from organizational custodians;

e. Centralized auditing of the collection steps, criteria and the data collected;

f. The ability to “tag” data at the point of collection by collector, matter, privilege, responsive term, issue or any other relevant criteria;

g. The ability to de-duplicate data prior to providing access to outside counsel;

h. The ability to use a single interface to search, “logically tag” and sequester email, loose electronic files and database records across multiple geographic locations; and

i. The ability to implement multiple “lights-out litigation holds,” a process whereby multiple automated rules (policy) engines can scan locations and identify, copy or move responsive information to litigation hold locations.
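The “lights-out” hold in item (i) reduces, at its simplest, to an automated rule that scans a location and sequesters responsive files. A minimal sketch; the search terms and directory names are placeholders, and a real implementation would also record custodian, matter and chain-of-custody metadata:

```python
# Sketch: an automated hold rule that copies responsive files to a hold directory.
import shutil
from pathlib import Path

def apply_hold(source: Path, hold_dir: Path, terms: list[str]) -> list[Path]:
    """Copy files containing any responsive term into `hold_dir`.
    `hold_dir` should live outside `source` to avoid re-scanning copies."""
    hold_dir.mkdir(parents=True, exist_ok=True)
    held = []
    for f in source.rglob("*"):
        if f.is_file():
            text = f.read_text(errors="ignore").lower()
            if any(t.lower() in text for t in terms):
                shutil.copy2(str(f), str(hold_dir / f.name))  # copy2 preserves timestamps
                held.append(f)
    return held   # the audit trail: what was collected and from where
```

The returned list is the seed of the centralized audit described in item (e).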

2. Addressing the disclosure requirements of FRCP 26(a) – Identifying data storage points, quantifying data volumes and types to facilitate initial disclosures.
Simplify the entire litigation data mapping process by automatically creating a “data profile” of:

a. Network file servers;
b. User desktops and laptops;
c. Email (MS-Exchange); and
d. Databases (Oracle).

Products should automatically identify all data storage devices on a network and begin indexing the contents of the devices to which they have access, then automatically extract all metadata and file ownership information from the files they scan, including email and PST files. Depending upon the speed of the network, the configuration of the related target data source and the mix of data types, the indexing and classification should proceed at a pace of approximately one terabyte of data per day. Lastly, the solution should have built-in reporting to provide counsel with summary or detailed reports on the data, supplying outside counsel with the requisite information for disclosures.
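The “data profile” counsel needs for initial disclosures is essentially an aggregate of counts and volumes over the scanned sources. A minimal sketch, summarizing by file type (a real profile would also break out custodians, locations and date ranges):

```python
# Sketch: summarize a scanned location by file type -- counts and volume.
from collections import Counter
from pathlib import Path

def data_profile(root: Path) -> dict[str, dict[str, int]]:
    """Per-extension file counts and total bytes under `root`."""
    files, size = Counter(), Counter()
    for f in root.rglob("*"):
        if f.is_file():
            ext = f.suffix.lower() or "(none)"
            files[ext] += 1
            size[ext] += f.stat().st_size
    return {ext: {"files": files[ext], "bytes": size[ext]} for ext in files}
```

Run against each file server, desktop image and mailbox export, the resulting tables quantify data volumes and types for the 26(a) disclosure discussion.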

3. Facilitating counsel effectiveness in FRCP rule 26(b)(2)(B) and 26(b)(5)(B) meet and confer-Provide a substantive data landscape that facilitates keyword and form of production negotiations.

Now that counsel has unfettered access to data, s/he can gauge the responsiveness of electronically stored information (ESI) to various terms. This can greatly assist in developing and finalizing the production criteria as well as help with the form of production.

4. Reducing the likelihood of motions to compel from accessible sources.
The solution functionality and audit trail will substantiate the thoroughness of any discovery initiative to the extent the relevant data sources were accessible.

5. Saving upwards of 40% on litigation support services disbursements.
Today, corporations will pay up to $2,000 per gigabyte to “process” electronic files so they can be reviewed by outside counsel. The bulk of the processing fee is associated with: data de-duplication (getting rid of duplicate files during review), keyword searching and metadata extraction.

Newer products using information access technology perform all of these services in house at a cost of approximately $4 per gigabyte, representing a major evolution. They also generate output that can be used by common litigation support and document review applications like Concordance and Summation.

Information Management - The Big Picture

Given the explosive growth of electronic information in corporate America, managing electronic discovery is increasingly a challenge for corporate IT departments, in-house and outside counsel, all of whom are stakeholders. In December of 2006, the Judicial Conference of the US amended the Federal Rules of Civil Procedure (FRCP) to clarify the roles, responsibilities and discovery obligations of the various parties to litigation. The amendments, for the first time, made specific reference to electronically stored information, or ESI, as it is now commonly known. The changes in attitudes toward e-discovery are noticeable and the amendments have, without question, helped create an unprecedented level of dialog and collaboration to understand how electronic information is created, used, managed and disposed of in the corporate environment.

Why, then, have the amendments, intended to reduce confusion, also introduced a level of complexity to the e-discovery process that has left a lot of people scratching their heads?

For example, corporate counsel in a defense posture is keyed in on everything from creating corporate data maps to handling multiple and complex litigation holds, as well as establishing repeatable and defensible guidelines for discovery. What happens the following week when the storage administrator retires a key server and implements his data consolidation strategy? How good is the data map then?
Records retention managers have also been significantly affected. For years, they have been seen as silent corporate operatives who had murky roles and dealt with boxes of old documents. Today, nothing could be further from the truth. They are on the front lines of protecting an organization from a data management policy perspective.

Another role that has seen significant evolution is that of the “storage administrator.” Corporate data storage administrators are IT personnel whose roles are largely characterized by their knowledge of an organization’s data growth and proliferation patterns – key factors that allow them to make recommendations as to how, when and if an organization’s data management hardware and associated software platforms need modification or change.

Another driver is the evolution of technology for e-discovery to serve both proactive and reactive use cases. The vast majority of matters today are addressed in a reactive fashion with a mind to quickly address pressing, active concerns that demand rapid retrieval of responsive ESI for early case assessments, meet and confer and other matter-specific requirements. However, the future is clear in that there is a need for consistent, repeatable and targeted e-discovery processes that can also be deployed across a company, creating an “e-discovery ready,” proactive environment.

Therefore, the answer may lie in the fact that while the amendments impose obligations on the parties, they don’t specifically state how one should go about fulfilling them. When it comes to corporations today, the old silo-based information management paradigms will not work when it comes to information discovery of any kind, for any reason. The bottom line is: litigation, storage management/data consolidation, records retention, regulatory responses, internal investigations, information security initiatives, personnel policy management, business intelligence, data mining, compliance and monitoring are all effectively subsets of what we call “e-discovery.” This new paradigm of e-discovery subsumes many previously compartmentalized departmental initiatives that are under the auspices of legal, IT, records management, HR and finance. It is predicated on the degree to which an organization has information access and the ability to perform effective data classification. In short, companies should be able to leverage enterprise data for multiple business needs from a common underlying information access and classification platform.

Friday, October 31, 2008

Unstructured Information Management: What You Don’t Know Can Hurt You

Companies large and small create an impressive amount of data, including email messages, documents and presentations. Most of that data is unstructured, existing primarily on corporate file servers and employee desktop and notebook computers. Industry analysts estimate that this unstructured data accounts for 80% of all corporate information, and expect it to grow 50% or more each year.

Unstructured Information is Unmanaged Information

Unstructured data is typically unmanaged. The file system on which this information resides typically is not monitored and the content is practically invisible to employees, auditors or corporate compliance officers. In an effort to provide a greater degree of visibility, control and management of this information to meet compliance reporting requirements, companies have implemented one or more technologies, each of which has advantages and disadvantages:

Enterprise search – An enterprise search engine is an effective way to index and find documents that contain certain terms. Most are easy to implement and require only a modicum of regular maintenance. Unfortunately, most enterprise search engines are tuned to find all the documents that may contain a particular term, rather than a specific document that may be required by an auditor. It is left to the user to winnow through all the returned documents to find what they need, which can be a time-consuming and costly exercise. Additionally, most search engines lack the ability to manage the documents they index.

Enterprise Content Management – ECM systems can effectively manage many types of content and can provide access and version control, both of which are effective aspects of information management. ECM systems also tend to be very expensive to set up and maintain. These systems typically require an organization to purchase server and user licenses, implement policies and processes for using the system, and train its users. Because of these costs, companies often limit their ECM implementations to specific areas of their business or types of data, such as documents that pertain to finance. According to many analyst organizations, ECM systems are being used to manage approximately five percent of today’s corporate information.

File Backup – Many companies attempt to solve the problem of document retention by creating regular backups of all the data on the network. These backups are saved to tapes, which are then stored offsite for disaster recovery purposes. Backing up all data regardless of its business value is an inefficient use of time and resources, increases the cost of tape storage and decreases the likelihood of rapid single file recovery, which is the most-used aspect of file backup.

Doing nothing – This is the “solution” that many companies choose for handling unstructured information. Unfortunately, the prevailing thought among many has been that unstructured information is insignificant and therefore does not require management. After all, most of this information ranges from personal files to draft documents or one of dozens of copies of sales presentations, the majority of which aren’t worth the cost required to manage them.

While most files aren’t worth managing, the risk comes from the small number of files that do matter. For instance, your Sarbanes-Oxley policy and procedure manual, which took valuable internal resources, a consulting firm, and many months to create, has likely been copied from the content management system specially created for finance-related documents. The next time you update that manual with critical information, you have fulfilled one aspect of the act by tracking and recording those changes in your records management system. However, what about the dozens of copies that may have spread across the network on shared file servers? How can you be certain those copies are deleted or updated to keep people from following old procedures or controls? If you aren’t doing anything to manage that data, you are leaving your company exposed and vulnerable.

Recognizing Valuable Information
Addressing these issues is key to an effective solution for Sarbanes-Oxley or any information governance initiative. Obviously doing nothing is not the answer. At the same time, it would be cost-prohibitive to manage all files as though they were critical business records. Therefore, the ability to specify which data is critical and worthy of this level of management is a crucial first step. If you are aware of the data’s value, you can make educated decisions as to the disposition of important data and create an appropriate retention policy.

Determining data’s value is the result of effective information visibility and control.

Information Visibility – The first aspect of recognizing valuable data requires that it be visible. While your compliance office may have access to all corporate information across the network, the sheer amount of data necessitates the use of technology to find and manage the appropriate documents.

Information Control – To effectively manage and control unstructured information, you need a solution that allows you to copy, move, delete or tag documents with custom metadata, i.e., information about the document. Even better, the solution should provide an integrated policy engine that can be customized with your company’s information governance regulations. For instance, a policy might mandate that any document on the employee network that contains a customer account number must be 1) tagged with custom metadata of “Customer,” and 2) moved to a secured server or file archive system.
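The customer-account-number policy just described can be sketched as a single rule: detect, tag, move. The account-number format (`ACCT-` plus seven digits) and the directory names are hypothetical examples:

```python
# Sketch: tag documents containing an account number and move them to
# a secured location, per the example policy above.
import re
import shutil
from pathlib import Path

ACCOUNT = re.compile(r"\bACCT-\d{7}\b")   # hypothetical account-number format

def enforce_policy(doc: Path, secured: Path, tags: dict) -> bool:
    """Apply the policy to one document; return True if it was actioned."""
    text = doc.read_text(errors="ignore")
    if ACCOUNT.search(text):
        tags[doc.name] = ["Customer"]                    # 1) tag with custom metadata
        secured.mkdir(parents=True, exist_ok=True)
        shutil.move(str(doc), str(secured / doc.name))   # 2) move to secured store
        return True
    return False
```

In a real policy engine the tag store would be a metadata index rather than an in-memory dictionary, but the two policy actions are the same.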

Data classification is an important aspect of information visibility and control. Several products have emerged or expanded into this space, to offer an all-embracing solution for complying with Sarbanes-Oxley and other regulations. By implementing one of these data classification systems, documents on your network can be located, opened and tagged according to the content found within each document. A typical classification workflow might look something like this:

1. Catalog – The system scans the file systems, finding and collecting file metadata from hundreds of file types.

2. Classify – Opening each document, the system classifies data according to file attributes and keywords or word patterns, and tags it with custom metadata according to pre-set policies.

3. Search – The system allows users to find desired information based on a combination of metadata and full document text, respecting standard Windows and UNIX access control lists.

4. Report – The system should allow appropriate users to create and access summary or detailed reporting functionality.

5. Act – Finally, the system should integrate actions, such as tagging files with custom metadata, setting retention and monitoring policies, and offering move, copy and delete functionality, again based upon an access control list.
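The first four steps of this workflow can be compressed into one pass over the filesystem: build an index carrying metadata, content and tags, then answer searches and reports from it. The keyword rules are illustrative placeholders:

```python
# Sketch: catalog + classify into an index, then search and report from it.
from pathlib import Path

KEYWORDS = {"finance": ["invoice"], "hr": ["salary"]}   # illustrative rules

def build_index(root: Path) -> list[dict]:
    """Steps 1-2: catalog file metadata and classify by keyword."""
    index = []
    for f in sorted(root.rglob("*")):
        if f.is_file():
            text = f.read_text(errors="ignore").lower()
            tags = [t for t, kws in KEYWORDS.items()
                    if any(k in text for k in kws)]
            index.append({"path": f, "size": f.stat().st_size,
                          "text": text, "tags": tags})
    return index

def search(index: list[dict], term: str) -> list[Path]:
    """Step 3: full-text search over the index."""
    return [e["path"] for e in index if term in e["text"]]

def report(index: list[dict]) -> dict[str, int]:
    """Step 4: summary report of files per tag."""
    out: dict[str, int] = {}
    for e in index:
        for t in e["tags"]:
            out[t] = out.get(t, 0) + 1
    return out
```

Step 5 (act) would then operate on the indexed entries, as in the policy sketch earlier, subject to access control lists.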

To contrast, an enterprise search engine provides an efficient method to find content that contains the search term you need. But then what? If you wanted to copy, move, delete or perhaps tag the document with customized metadata, you would have to manually do so.

Data Retention, Availability and Recovery

Retention is another aspect of corporate information that cannot be overlooked. While many companies elect to back up all data on a weekly or monthly basis, the cost of time and resources increases as the amount of data grows. Knowing what is in your data – by making information visible, by tagging with metadata and by controlling access – allows you to intelligently create a retention policy that moves or backs up only the data needed to comply with your corporate information governance policy or government regulation.

Most organizations use a backup solution that periodically copies data to tape or disk drives. An organization may back up its mission-critical data every night and all of its data every week. It may store the backup tapes for up to six months to guard against accidental deletions, send tape copies offsite as a safeguard against disaster and retain backup tapes long-term to meet regulatory requirements.

Lacking the means to gauge the value of the data, companies often take the safe route and back up all of it. Not only is this approach ineffective, it reflects inefficient data management and creates a potential risk. Data that is stored without being required to be kept can be used against a company in the event of a lawsuit or regulatory compliance issue. In this respect, backing up data in its entirety creates a liability.

Corporations can meet regulatory data retention requirements, cut backup and recovery costs and manage risk by introducing file archiving into the mix.

A file archiving system uses data classification to determine the content’s value, then moves or copies files according to that value. File archiving systems can find and retrieve files based on their content. Any number of parameters can be used, including author, date, and customized tags such as “SEC 17a-4” or “Sarbanes-Oxley.”

This naturally leads us to the tiering of storage services. Backup and file archiving are natural places to start for providing tiered storage services, based upon the value of the data in your network.

As an example, consider a company that has 10 terabytes (TB) of data on production file servers. In the past, the company may have backed up critical files onto disk storage and then backed up all files onto tape once a week. The company catalogued the tapes, kept them for three months and then cycled them back through the process. New government regulations mandate that all data related to quarterly financial results must be kept for five years. Unfortunately, the company has no way to differentiate among the disparate types of data on its network. The company is forced to retain all of the data for five years, expanding the amount retained from 10 TB to 2.5 petabytes (PB). As data amounts double annually, so will the amount that must be stored. The company will find itself devoting more and more time and resources to data backup.

To solve this problem, let us assume the company implemented a data classification system. By discovering the value of its unstructured information and tagging it accordingly, the company copied 500 GB of financial reporting data to WORM storage for long-term retention and moved 7 TB to tiered storage, which is backed up to tape every three months. The data in three-month storage would total 42 TB, compared with the 2.5 PB that would have been required if the data had not been archived. With tiered storage, the company significantly reduced backup time and resources, shrank the cost of production file storage and increased its IT service levels by freeing up personnel and data for other tasks.
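The back-of-the-envelope arithmetic behind the untiered figure is simply volume times backup frequency times retention period:

```python
# Sketch: retained backup volume = volume per backup x backups/year x years.
def retained_tb(volume_tb: float, backups_per_year: float, years: float) -> float:
    """Total terabytes retained across all backup copies."""
    return volume_tb * backups_per_year * years

# Keeping weekly full backups of 10 TB for five years:
untiered = retained_tb(10, 52, 5)   # 2600 TB, i.e. roughly the 2.5 PB above
```

The same formula applied to the 7 TB quarterly tier shows why classification pays for itself so quickly.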

Tiering your data storage services allows you to put SOX controls only around the data that pertains to your financial information and lock down the appropriate data on compliance-specific storage boxes.

Proving Compliance

The old adage is true: the best defense is a good offense. In the case of Sarbanes-Oxley compliance, the best offense is to create and implement provable policies. Having a data classification system allows you to produce standard reports that show duplicate copies of applicable documents, show who has accessed a file within a specific time period, and monitor implementation of your information governance policies. With reporting functionality available in a dashboard implementation, you can think of your system as a burglar alarm: a deterrent to potential wrongdoing and a way to prove that you’re actively checking for compliance-related issues.

Best Practices
Implementing one of today’s data classification systems should be an integral part of your Sarbanes-Oxley best practices. Setting information governance policies fulfills a basic requirement. Active management of your unstructured data will find, tag and move content according to your corporate policies, lowering the risk that information will “fall through the cracks” and potentially protecting you from breaking the law. Creating a tiered storage system will allow you to set retention policies according to the value of the content, saving money and reducing risk. And proving compliance – or at least showing that you’re attempting to comply – is sometimes the best way to meet and exceed current and future government regulations, not only around financial systems but around employee and customer privacy as well.

Reducing Risk and Lowering Costs
In the end, visibility and control of your unstructured information reduce risk – of compliance violations, litigation exposure, untimely responses, and privacy and security breaches – and lower costs through streamlined storage operations, improved service levels and automated policy-driven data management.