[email protected] | +45 91 76 72 94

Data mine your drives for sensitive data

In Confidential 1.2.31 we combined the search based on regular expressions, with the search in file content that we introduced in Confidential 1.2. So basically you can now tell Confidential to “scrape my disks and auto-tag everything that looks like credit number, an email address, invoice number, a name from a list or anything that can be describe by a regular expression”.

This feature can help you reaching GDPR compliance for your unstructured data: after deleting  customers’ entries from your databases, this tool can help you find every bit of information about your customers that his contained on any file on your local and shared drives.

We wrote a guide and list of working regular expressions on our wiki.

  • A quick example of what a RegEx can do for you is:
    If your invoices look like “Inv2010-001” to “Inv2018-999“, you can setup one rule to get your invoices automatically tagged based on year, by matching the pattern “Inv(year)-###”. That rule would look like this: .*Inv(201\d)-\d{3}.*
  • You can now do proper data-mining on the tons of files on your drives and shared drives, telling Confidential for example to find and auto-tag “a social security number” or “any file that contains any email address” or “a phone number from a certain country or from a list or within a range” or “a city from a list”.
  • You can easily create a rule containing a logical OR, that is a rule that matches, for example “Jon OR Jonathan” in the document (the rule would look like .*Jon.*|.*Jonathan.*). In the same way you can create a list that matches thousands of words (names, streets, cities etc.)
  • Each rule can tag files with more tags and you can have multiple rules running at the same time… so this could result in tagging the same document with, for example, a customer id, a city name and invoicing year.

We are building sets of rules that you can import straight into your database from this page.

A couple of screenshots for what data-mining rules can look like 🙂

Data mining files containing social security numbers

Data mining social security number (Danish CPR)

Data mining files containing names or addresses from a list

Data mining names or addresses from a list

DOWNLOAD AUTO-TAGGING RULES

(Here you can download several sets of rules to start scraping your disks)

By |2018-05-03T22:42:48+00:00April 16th, 2018|0 Comments

About the Author:

Co-Founder of Tag Forge. Once a videogame developer, worked with sales for 3D Scanners and 3D printers. Zelda fan. https://www.linkedin.com/in/andrea-d-intino/

Leave A Comment

We use technology (including cookies) to collect, use and transfer information to support this Site, including for data analytics purposes, as described in this Site's updated Privacy Policy. Your continued use of the Site signifies your acceptance of such cookies. To learn more about how to manage your cookie settings and how to exercise your rights under GDPR please see our Privacy Policy

The cookie settings on this website are set to "allow cookies" to give you the best browsing experience possible. If you continue to use this website without changing your cookie settings or you click "Accept" below then you are consenting to this.

Close