Update
The location of the JSON files has been changed to development branch
I’ll cut right to it, I need your help.
I’m developing a new command for dbatools to scan for PII.
I already have a wide variety of different patterns and ways to check on possible personal information but I want to be as thorough and complete as possible.
The command is called Invoke-DbaDbPiiScan and it does two things:
- it scans the columns in the tables and sees if it is named in such a way that it could contain personal information
- it retrieves a given amount of rows and goes through the rows to do pattern recognition
How does it work
The command uses two files:
- pii-knownnames.json; Used for the column name recognition
- pii-patterns.json; Used for the pattern recognition
You can find the files here in the GitHub repository.
The patterns and known names are setup using regex to make the scan really fast.
Also, using regex this way with the JSON files makes the solution modular and easy to extend.
pii-knownnames.json
An example of a known name regex is this:
^.*name.*$
What this does is, it tries to match anything with “name”.
pii-patterns.json
The pattern regexes tend to be more complex than the know names. This is because we have to deal with more complex data.
An example of a pattern:
(3[47]\d{13})|(3[47]\d{2}[-| ]\d{6}[-| ]\d{5})
This particular pattern is used to find any MasterCard credit card numbers.
How can you help
What I need from you is to see if you can come up with more patterns that could lead to a more exact result.
I opened an issue in the Github repository where you can leave a comment with the pattern.
If this pattern is only used in a certain country, make sure you include which country this applies to.
I want to thank beforehand for any input.
If you have any questions leave a comment here, contact me through SQL Community Slack Channel or Twitter both as @SQLStad.