Microsoft Presidio: Context aware, pluggable and customizable data protection and anonymization service for text and images

github.com

This is cool! It parses unstructured text, identifies PII, and filters or anonymizes it. With the recent focus on data ownership driven by laws like CCPA and GDPR, lots of companies are trying to minimize their PII surface area, and unstructured text is a big part of that surface. A recent client didn't want to sync Zendesk ticket data into their warehouse because of the unstructured nature of ticket contents and file attachments (not at all unreasonable). This tool could help address concerns like this.
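
To make the detect-then-anonymize idea concrete, here is a minimal sketch in plain Python. It is not Presidio's API: the `PII_PATTERNS` table, the `find_pii` and `anonymize` helpers, and the sample ticket are all hypothetical, and two simple regexes stand in for the much richer recognizers a real tool would use.

```python
import re

# Hypothetical, deliberately simple patterns for illustration only.
# A real detector would combine NER models, checksums, and context words.
PII_PATTERNS = {
    "EMAIL_ADDRESS": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "PHONE_NUMBER": re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"),
}


def find_pii(text):
    """Return a sorted list of (start, end, entity_type) spans found in text."""
    findings = []
    for entity_type, pattern in PII_PATTERNS.items():
        for match in pattern.finditer(text):
            findings.append((match.start(), match.end(), entity_type))
    return sorted(findings)


def anonymize(text):
    """Replace each detected span with a <ENTITY_TYPE> placeholder."""
    # Work right to left so earlier offsets stay valid after each replacement.
    for start, end, entity_type in reversed(find_pii(text)):
        text = text[:start] + f"<{entity_type}>" + text[end:]
    return text


# Made-up support ticket text.
ticket = "Hi, I'm locked out. Reach me at jane.doe@example.com or 555-867-5309."
print(anonymize(ticket))
```

Run on ticket bodies before they land in the warehouse, something like this keeps the text useful for analysis while stripping the obvious identifiers. Regexes alone will miss names, addresses, and anything that depends on context, which is exactly the gap a context-aware tool is meant to close.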

Ideally, off-the-shelf pipeline tools like Stitch and Fivetran would implement functionality like this(!).
