Microsoft Presidio: Context aware, pluggable and customizable data protection and anonymization service for text and images

This is cool! It parses unstructured text, identifies PII, and filters or anonymizes it. With the recent focus on data ownership driven by laws like CCPA and GDPR, lots of companies are trying to minimize their PII surface area, and unstructured text is a big part of that surface. A recent client didn't want to sync Zendesk ticket data into their warehouse because of the unstructured nature of ticket contents and file attachments (not at all unreasonable). This tool could help address concerns like this.
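
To make the detect-then-redact idea concrete, here's a minimal sketch in plain Python. To be clear, this is not Presidio's API and not how Presidio works internally: the `PATTERNS` table and the `redact` function are hypothetical names made up for illustration, and the regexes only cover a few US-style formats. A real detector layers on much more (named-entity models, checksums, surrounding context), but the overall shape of "find PII spans in free text, swap in placeholders before loading" is the same.

```python
import re

# Toy patterns for illustration only (assumption: US-style phone and SSN formats).
# A production tool would use far more robust detection than these regexes.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "PHONE": re.compile(r"\b\d{3}[-. ]\d{3}[-. ]\d{4}\b"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def redact(text: str) -> str:
    """Replace every match of each pattern with a <LABEL> placeholder."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"<{label}>", text)
    return text


if __name__ == "__main__":
    ticket = "Customer jane.doe@example.com called from 555-867-5309."
    print(redact(ticket))
```

Something like this could run as a transform step on ticket bodies before they land in the warehouse, so the raw text never gets stored.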

Ideally, off-the-shelf pipeline tools like Stitch and Fivetran would implement functionality like this(!).


