
More Efficient E-Discovery with Document Clustering

Clustify analyzes the text in your electronic documents and groups related documents together into clusters. This provides litigators with a quick overview of the document set, and makes document review more efficient and consistent because related documents can be reviewed together. Clustify can generate concept-based clusters, or it can require documents in the same cluster to contain significant passages of identical text. The latter option is useful for identifying near-duplicates, which can cut the cost of electronic discovery further than simple deduplication. Clustify also offers real-time predictive coding.
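Clustify's near-duplicate method is proprietary and is not described here. To illustrate the general idea of requiring "significant passages of identical text," here is a minimal Python sketch of a common, generic technique: word shingling combined with Jaccard similarity. The shingle length `k` and the `threshold` value are illustrative assumptions, not Clustify settings.

```python
import re


def shingles(text, k=5):
    """Return the set of k-word shingles (overlapping runs of k words) in text."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < k:
        # Texts shorter than k words yield one shingle holding all their words.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity |A & B| / |A | B| of two sets (1.0 if both are empty)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def is_near_duplicate(doc1, doc2, k=5, threshold=0.8):
    """Treat two documents as near-dupes if they share enough identical word runs.

    k and threshold are illustrative assumptions, not values used by Clustify.
    """
    return jaccard(shingles(doc1, k), shingles(doc2, k)) >= threshold


original = "Please review the attached draft of the supply agreement before Friday."
revised = "Please review the attached draft of the supply agreement before Monday."
print(jaccard(shingles(original), shingles(revised)))
```

Changing one word at the end of this 11-word sentence leaves 6 of the 8 distinct shingles shared, a Jaccard similarity of 0.75. The same small edit in a longer document would score closer to 1, which is why shingle-based measures tolerate minor revisions.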

[Screenshot: Cluster list restricted to a selected set of keywords]

During clustering, Clustify labels each cluster with a few keywords that tell you what the documents have in common at a conceptual level (this is done even when clustering for near-dupe detection). The keywords give you a quick idea of what each cluster is about and make it easy to identify the themes of your document set. Clustify reports the frequency of each keyword and lets you browse the clusters containing a set of keywords you specify.
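How Clustify chooses its keywords is not described here. One simple, generic way to label a cluster is to rank terms that appear in many of its documents but in comparatively few documents across the whole collection. The sketch below does that and also shows keyword frequency counts and keyword-based browsing. The scoring formula and all function names are illustrative assumptions.

```python
import math
import re
from collections import Counter


def terms(text):
    """Distinct lowercase alphabetic words in a document."""
    return set(re.findall(r"[a-z]+", text.lower()))


def label_cluster(cluster_docs, all_docs, n_labels=3):
    """Rank terms that are common in the cluster but rarer across the collection.

    cluster_docs must be a subset of all_docs. The scoring formula is an
    illustrative assumption, not Clustify's method.
    """
    cluster_df = Counter(t for d in cluster_docs for t in terms(d))
    total_df = Counter(t for d in all_docs for t in terms(d))

    def score(t):
        in_cluster = cluster_df[t] / len(cluster_docs)  # share of cluster docs containing t
        overall = total_df[t] / len(all_docs)  # share of all docs containing t
        # Terms as common everywhere as in the cluster score 0.
        return in_cluster * math.log(in_cluster / overall)

    # Highest score first; ties broken alphabetically for stable output.
    ranked = sorted(cluster_df, key=lambda t: (-score(t), t))
    return ranked[:n_labels]


def keyword_frequencies(labels_by_cluster):
    """Count how many clusters each keyword labels."""
    return Counter(kw for labels in labels_by_cluster.values() for kw in labels)


def clusters_with_keywords(labels_by_cluster, required):
    """Return IDs of clusters whose labels include every required keyword."""
    required = set(required)
    return [cid for cid, labels in labels_by_cluster.items()
            if required <= set(labels)]
```

For example, with the documents "health plan enrollment", "health plan dental", "bid for project", and "project bid deadline", labeling a cluster made of the first two yields "health" and "plan" as its top two keywords.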

[Screenshot: Clusters with representative documents]

Clustify also identifies a “representative document” for each cluster. Every document in the cluster is similar to the representative document to within a limit that you specify. This lets you make decisions about all of the documents in a cluster by looking only at its representative document, reducing the labor needed for review.

For example, if you are a litigator hunting for evidence of price fixing by a manufacturer and the representative document for a cluster is about the company's health plan, the other documents in that cluster aren't worth spending time on since they are probably about the health plan too. On the other hand, if the representative document describes a bid for a project, a detailed review of the other documents in that cluster is warranted. Think of Clustify as a tool for organizing your documents into boxes so that you can make decisions one box at a time, instead of one document at a time.
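Clustify's clustering algorithm is proprietary. To illustrate the idea of a representative document with a user-specified similarity limit, here is a sketch of a well-known generic approach, single-pass "leader" clustering: each document joins the first cluster whose representative it is sufficiently similar to, or otherwise starts a new cluster as that cluster's representative. Cosine similarity on word counts and the default `threshold` are assumptions for illustration.

```python
import math
import re
from collections import Counter


def vectorize(text):
    """Word-count vector for a document."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a, b):
    """Cosine similarity of two word-count vectors (0.0 if either is empty)."""
    dot = sum(count * b[t] for t, count in a.items())
    norm = (math.sqrt(sum(c * c for c in a.values()))
            * math.sqrt(sum(c * c for c in b.values())))
    return dot / norm if norm else 0.0


def leader_cluster(docs, threshold=0.5):
    """Group documents so every member is within `threshold` of its representative.

    Returns a list of (representative_index, [member indices]) pairs.
    The threshold is a hypothetical default.
    """
    vectors = [vectorize(d) for d in docs]
    clusters = []
    for i, v in enumerate(vectors):
        for rep, members in clusters:
            if cosine(vectors[rep], v) >= threshold:
                members.append(i)
                break
        else:
            # No representative is similar enough: start a new cluster.
            clusters.append((i, [i]))
    return clusters


docs = [
    "health plan benefits",
    "health plan enrollment benefits",
    "bid for the bridge project",
    "revised bid for the bridge project",
]
print(leader_cluster(docs))
```

This simple scheme is order-dependent (the first document seen in a group becomes its representative), and it compares each document against every existing representative, so it does not by itself address the scalability described below.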

During review, a single mouse click lets you categorize or tag one document, a whole cluster of documents, or every cluster containing a specific combination of keywords. Clustify can also categorize automatically: it takes all documents that are sufficiently similar to a set of already-categorized documents and categorizes them the same way. This can greatly reduce the labor needed when new documents are added to a case, because you can leverage the work you have already put into categorizing the older documents.
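The page does not say how Clustify decides that a new document is "sufficiently similar." A minimal generic sketch is nearest-neighbor propagation with a cutoff: each new document takes the category of its most similar already-categorized document, but only if that similarity clears a threshold. Cosine similarity on word counts, the threshold value, and the `auto_categorize` name are assumptions for illustration.

```python
import math
import re
from collections import Counter


def vectorize(text):
    """Word-count vector for a document."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a, b):
    """Cosine similarity of two word-count vectors (0.0 if either is empty)."""
    dot = sum(count * b[t] for t, count in a.items())
    norm = (math.sqrt(sum(c * c for c in a.values()))
            * math.sqrt(sum(c * c for c in b.values())))
    return dot / norm if norm else 0.0


def auto_categorize(categorized, new_docs, threshold=0.6):
    """Copy the category of the most similar reviewed document onto each new one.

    categorized: list of (text, category) pairs already reviewed by a person.
    Returns one category per new document, or None when nothing reviewed is
    similar enough (those documents still need human review).
    """
    known = [(vectorize(text), category) for text, category in categorized]
    results = []
    for doc in new_docs:
        v = vectorize(doc)
        best_sim, best_category = 0.0, None
        for kv, category in known:
            sim = cosine(kv, v)
            if sim > best_sim:
                best_sim, best_category = sim, category
        results.append(best_category if best_sim >= threshold else None)
    return results


reviewed = [
    ("health plan benefits enrollment", "not responsive"),
    ("bid for the bridge project", "responsive"),
]
incoming = [
    "health plan benefits enrollment form",
    "lunch menu for friday",
    "revised bid for the bridge project",
]
print(auto_categorize(reviewed, incoming))
```

Returning None rather than forcing a category keeps dissimilar new documents in the manual review queue.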

[Screenshot: Document comparison tool]

When using Clustify for near-dupe detection, you choose what to do with the near-dupes. You can discard them and simply focus on the cluster's representative document, or you can examine them for meaningful changes between revisions of the document. Clustify's document comparison tool displays documents side by side with corresponding sections highlighted, so you can easily spot the differences between them.
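Clustify's comparison tool is its own. As a generic illustration of locating the changes between two revisions, Python's standard `difflib.SequenceMatcher` can align the words of two texts and report the spans that differ. The `word_differences` helper is hypothetical; for a side-by-side HTML view, the standard library's `difflib.HtmlDiff` class provides `make_table` and `make_file`.

```python
import difflib


def word_differences(old_text, new_text):
    """List (change_type, old_words, new_words) for each non-matching span.

    change_type is one of difflib's opcode tags: 'replace', 'delete', 'insert'.
    """
    old_words, new_words = old_text.split(), new_text.split()
    # autojunk=False so frequent words are not ignored when aligning.
    matcher = difflib.SequenceMatcher(a=old_words, b=new_words, autojunk=False)
    changes = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            changes.append((tag, " ".join(old_words[i1:i2]), " ".join(new_words[j1:j2])))
    return changes


print(word_differences("the price is 10 dollars", "the price is 12 dollars"))
```

Comparing at the word level rather than the character level keeps the reported changes readable for prose documents.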

Search engines may let some relevant documents slip through the cracks if your choice of keywords is imperfect or if you fail to account for synonyms. When you do find a responsive document with a search engine, you can use Clustify to look at the other documents in that document's cluster, which may be relevant even if they don't match the search query exactly. Because Clustify looks at the entire text of each document to decide which documents are similar to each other, it is less affected by the presence or absence of a single word, and it is not influenced by the user's preconceptions. You can also use Clustify to identify keywords that might be useful when constructing search queries.
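A minimal sketch of that search-then-expand workflow follows. It assumes cluster assignments already exist as a mapping from document index to cluster ID, a hypothetical structure since Clustify's data model is not described here: run a plain keyword search, then pull in the other members of each cluster that contains a hit.

```python
import re


def keyword_search(docs, query_terms):
    """Return indices of documents containing every query term as a whole word."""
    wanted = {t.lower() for t in query_terms}
    return [i for i, d in enumerate(docs)
            if wanted <= set(re.findall(r"\w+", d.lower()))]


def expand_by_cluster(hits, cluster_of):
    """Return documents that share a cluster with a search hit but were not hits.

    cluster_of: hypothetical mapping {document index: cluster ID}.
    """
    hit_set = set(hits)
    hit_clusters = {cluster_of[i] for i in hit_set}
    return sorted(i for i, c in cluster_of.items()
                  if c in hit_clusters and i not in hit_set)


docs = [
    "price fixing discussion",
    "agreement on pricing with competitor",
    "health plan",
]
cluster_of = {0: "A", 1: "A", 2: "B"}
hits = keyword_search(docs, ["price"])
print(hits, expand_by_cluster(hits, cluster_of))
```

Here the search for "price" misses the second document, which says "pricing," but because it shares a cluster with the hit it is surfaced for review anyway.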

Clustify uses a proprietary mathematical model to measure the similarity of document pairs, a critical step in achieving accurate clustering. It builds on that accuracy with a proprietary clustering algorithm designed from the ground up for excellent scalability: on a desktop computer, it can cluster 1.3 million non-trivial Wikipedia entries in 20 minutes under Linux, or in 50 minutes under Windows.

Clustify is currently available for Windows and Linux; please inquire about availability for other operating systems. On Windows, Clustify uses IFilter technology to handle many common document formats, including PDF, MDI, searchable TIFF (i.e., with embedded OCR output), all Microsoft Office formats, OpenDocument, and WordPerfect.

To learn more about how Clustify can improve the e-discovery process, contact us.