Reducing ROT can make an organization more productive and cost-effective while mitigating legal, privacy and regulatory risks, and it’s not that difficult…
What is ROT?
IT and records managers use a lot of acronyms, and ROT is one of them. So what does it mean?
ROT stands for Redundant, Obsolete, and Trivial. It refers to all the data that clogs up an organization’s file shares and email servers without adding any business or legal value.
Why is finding ROT important?
- There is a financial cost to storage, and ROT grows quickly
- Unidentified ROT muddies search results and makes it take longer to find the files or emails that matter for productivity
- It’s risky to keep ROT around as it may contain:
  - Personally Identifiable Information (PII)
  - Intellectual property
  - Legal liabilities
- ROT may exist outside of secured storage and be exposed to hackers
Redundant data is essentially duplicated data. Think of the terabytes of email attachments and multiple versions of the same documents that exist across your enterprise.
We can find duplicates by first performing a metadata crawl to understand the basics of the data, and then adding a hash (a short identifier computed from the binary contents of a file, effectively unique to those contents) to the results stored in an index. Matching hashes tell us with near certainty whether a file is an exact duplicate of another. Next, we can perform full-text and entity extraction to identify even more ROT by looking at document versions and nearly identical files.
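To make the crawl-then-hash idea concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the Shinydocs implementation: the function names `file_digest` and `find_exact_duplicates` are hypothetical, SHA-256 is assumed as the hash, and an in-memory dictionary stands in for the index. It covers exact duplicates only; versions and nearly identical files need the full-text step.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_exact_duplicates(root: Path) -> list[list[Path]]:
    """Group files under root whose binary contents are identical."""
    # Step 1: metadata crawl. Files of different sizes cannot be
    # exact duplicates, so size narrows the candidates cheaply.
    by_size = defaultdict(list)
    for p in root.rglob("*"):
        if p.is_file():
            by_size[p.stat().st_size].append(p)

    # Step 2: hash only files that share a size, and group by digest.
    by_hash = defaultdict(list)
    for paths in by_size.values():
        if len(paths) > 1:
            for p in paths:
                by_hash[file_digest(p)].append(p)

    # Keep only digests seen more than once.
    return [sorted(group) for group in by_hash.values() if len(group) > 1]
```

Grouping by size first is a design choice to avoid reading every file in full; on a large file share, most files can be ruled out from metadata alone.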
Shinydocs recently worked with a financial services client where over 80% of their data was found to be duplicated across a 90TB data set.
Obsolete data includes both data that is “old” and data that has not been modified in a very long time. Generally, obsolete data is deemed to have limited strategic value, with more risk than reward.
As a start, we can find obsolete data by querying properties like Creation Date, Last Modified Date and Last Accessed Date, and then combining them with other properties, like the path where the files exist or the type of document they are.
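A date-based query like this can be sketched in a few lines of Python. This is a simplified illustration, not a production crawler: `find_obsolete`, `max_age_days` and `suffixes` are hypothetical names, and it reads timestamps straight from the file system rather than from an index. Creation date is left out because it is not available portably, and last-accessed times can be unreliable on volumes where access-time updates are disabled.

```python
import time
from pathlib import Path
from typing import Optional

SECONDS_PER_DAY = 86_400


def find_obsolete(
    root: Path,
    max_age_days: int,
    suffixes: Optional[set] = None,
    now: Optional[float] = None,
) -> list:
    """Return files not modified or accessed within max_age_days.

    suffixes optionally restricts the search to certain document
    types, e.g. {".doc", ".xls"} (lowercase, with the leading dot).
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * SECONDS_PER_DAY
    stale = []
    for p in root.rglob("*"):
        if not p.is_file():
            continue
        if suffixes is not None and p.suffix.lower() not in suffixes:
            continue
        st = p.stat()
        # A candidate must be old by both measures: last modified
        # and last accessed.
        if max(st.st_mtime, st.st_atime) < cutoff:
            stale.append(p)
    return sorted(stale)
```

For example, `find_obsolete(Path("/shares/finance"), max_age_days=7 * 365, suffixes={".doc"})` would list Word documents under a (hypothetical) finance share that nobody has touched in roughly seven years. The age threshold is an assumption; the right value depends on your retention policy.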
We recently worked with a large government client with 600 TB of data and found that over 50% of it was obsolete and presented a PII compliance risk!
Trivial data is data that is simply no longer required: it has little importance or value. Trivial data can typically be identified by file extension (e.g. “.tmp” or “.log” files), keywords (e.g. “draft”), path or file size.
We have a predefined list of typical trivial files by levels 0, 1 and 2. These range from files that are almost certainly trivial and should probably be deleted immediately (level 0), up to those that are probably trivial (level 2) and likely need some human verification before being acted on.
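A rule-based check along these lines might look like the Python sketch below. The rules are invented for illustration and are not the actual predefined list: which extensions or keywords belong at which level, and treating zero-byte files as level 0, are all assumptions, as is the function name `trivial_level`.

```python
from pathlib import PurePath
from typing import Optional

# Hypothetical rule sets, for illustration only. A real list would be
# much longer and tuned to the organization.
LEVEL_0_SUFFIXES = {".tmp"}   # almost certainly trivial
LEVEL_1_SUFFIXES = {".log"}   # very likely trivial
LEVEL_2_KEYWORDS = ("draft",)  # probably trivial, needs human review


def trivial_level(path: str, size_bytes: int) -> Optional[int]:
    """Return 0, 1 or 2 for a trivial file, or None if no rule matches."""
    p = PurePath(path)
    suffix = p.suffix.lower()
    # Assumption: empty files carry no value, so treat them as level 0.
    if suffix in LEVEL_0_SUFFIXES or size_bytes == 0:
        return 0
    if suffix in LEVEL_1_SUFFIXES:
        return 1
    if any(keyword in p.name.lower() for keyword in LEVEL_2_KEYWORDS):
        return 2
    return None
```

Checking the most certain rules first means a file that matches several rules is reported at its most confident (lowest) level.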
There is clearly no place in an organization for ROT(ten) data.
To get a handle on ROT, we have a series of tactics to find each kind of ROT, each with increasing levels of detail and sophistication. After that, we can look at either deleting the data outright or archiving it in a less expensive, less risky way!