Memory improvements in data masking for dbatools


If you’ve used the data masking command in dbatools you’ve probably noticed that the PowerShell session becomes memory intensive when it has to handle larger tables with one or more unique indexes.

The reason that happens is that during the data masking process the command looks for any unique indexes in the table. If it finds a unique index it will create a unique row for all the columns in the unique index.

The command creates these values in memory. This means that all the values that eventually get inserted into the table are held in memory, which can lead to massive amounts of memory being used when you have wide unique indexes with large data types.

There was another problem I had to fix: the command created those unique values for every unique index separately, even when multiple unique indexes shared overlapping columns. This was not efficient and I wanted to improve that too.

I’ve been thinking about solutions for this problem because I think this command should be usable in almost every situation.

I was able to cut down the memory usage of the command from over 10 GB to less than 2 GB for a reasonably sized table. The process memory usage no longer keeps growing, because the values are no longer handled in memory.

Here is how I made memory improvements in data masking for dbatools.

Problem #1: Moving away from memory

The solution for this problem was pretty obvious: move away from memory and use another storage solution to temporarily save the values. There are a couple of solutions we can use.

Store the values

  1. in a file on disk
  2. in a database

There were pros and cons for each solution

Files on disk

Storing the values in rows on disk, in a format like CSV, is really easy in PowerShell. CSV would have been my format of choice; I didn’t even consider something like JSON because that would create very large text files.

Once imported, we can easily iterate through the selection by row number to make it work in the data masking process as well.

The problem comes when we have to read the values. We would have to read the entire file again and therefore use more memory. This was not what I wanted.

Use a database

Using a database seems very logical in this context. We’re already connected to a server and are able to create and use tables.

The downside is that we use extra storage, because we temporarily recreate part of the table, and this could get big with larger tables.

One upside to using a database is that I can create an identifier for each row. I can then query a single row from that table and get all the unique values fast and efficiently.

The solution

The decision was made: I was going to move the process to a database.

The next decision I had to make was to either use the database that was going to be masked or use a separate one.

Both options again have their pros and cons, but I did not want to handle the cleanup of a new database. I also didn’t want the database being masked to become larger, because that would grow the data file.

The solution was to start using “tempdb” to create the temporary tables in.
Tempdb is great because:

  • it’s always there
  • you can optimize tempdb
  • even if cleanup fails, the data is destroyed when the session ends

Now we had to figure out how to create the tables for the unique indexes in tempdb, which was the next problem.

Problem #2: Creating a single unique row for all unique columns

One thing I didn’t like about the data masking command was the way it handles the unique indexes.

It created an object in memory for each unique index and that added up in processing time and memory usage.

We tackled the problem of the memory usage by using SQL Server’s tempdb database. I still had the problem of the multiple objects/tables for each unique index.

This was a bit harder to solve. I had to

  1. rewrite the process of retrieving all the unique indexes in a table
  2. collect all the columns and remove the duplicates
  3. create a table statement for all the unique columns
  4. add an identifier to make it easy to look up the row
  5. add an index to the identifier to make the lookup query fast

That is quite a bit of work to go through. In the end I decided to make another command to handle that process for me. Because that process is too far removed from the actual data masking itself, it was not a good idea to put it in the data masking command.

The command I created is called “Convert-DbaIndexToTable” and is an internal function in dbatools.

By default you cannot call this command directly; there are obviously ways to do it, but it’s only built for the data masking process.

The command does the following

  1. Get all the unique indexes on a particular table
  2. Get all the columns from those indexes in an array
  3. Checks each column for its data type or user-defined data type (UDDT)
    1. In the case of a UDDT it will look at the underlying type to get the actual data type
    2. In the case of a normal data type it will just use those properties
  4. Adds a column to the array to be the row identifier
  5. Put together the
    1. Create table statement
    2. Create unique index statement for the temporary table
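
To give an idea of what that produces, here is a minimal sketch (not the actual internal function) that builds the create table and index statements for a made-up set of unique index columns:

# Hypothetical example: the columns collected from the unique indexes of a table
$uniqueColumns = @(
    @{ Name = 'FirstName'; DataType = 'nvarchar(50)' },
    @{ Name = 'LastName'; DataType = 'nvarchar(50)' },
    @{ Name = 'Email'; DataType = 'nvarchar(100)' }
)

$tempTableName = 'PersonUniqueValues'

# Start the column list with a row identifier, then add the unique columns
$columnDefinitions = @('[RowNr] int IDENTITY(1,1) NOT NULL')
$columnDefinitions += $uniqueColumns | ForEach-Object { "[$($_.Name)] $($_.DataType) NULL" }

# Create table statement for the temporary table in tempdb
$createTable = "CREATE TABLE tempdb.dbo.[$tempTableName] ($($columnDefinitions -join ', '));"

# Unique index across all the unique columns, so SQL Server guards the uniqueness for us
$uniqueColumnList = ($uniqueColumns | ForEach-Object { "[$($_.Name)]" }) -join ', '
$createUniqueIndex = "CREATE UNIQUE NONCLUSTERED INDEX [UIX_$tempTableName] ON tempdb.dbo.[$tempTableName] ($uniqueColumnList);"

# Index on the row identifier to make single-row lookups fast
$createRowNrIndex = "CREATE CLUSTERED INDEX [CIX_$($tempTableName)_RowNr] ON tempdb.dbo.[$tempTableName] ([RowNr]);"

$createTable
$createUniqueIndex
$createRowNrIndex

The index on the row identifier is what makes looking up a single row by its number fast during the masking process.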

But wait a minute! Why do we need another unique index on the temporary table?

The answer to that is: Because we want to make sure each row is unique across all the unique index columns.

This was a solution I implemented because of the way the unique values are generated.
When the data masking command generates a unique row for all the columns, I want that row to be unique throughout the entire data set.

I could have created a process to check all the values in that table, but I could just as easily let SQL Server return an error when the unique values were already present in the table.
When it returns an error, the data masking command performs another iteration for that row until the insert succeeds.
This is very fast, efficient and less memory consuming than handling the process myself.
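
A minimal sketch of that retry idea, using a hypothetical temporary table and placeholder value generation (the real command builds the values with its own masking logic):

# Sketch: keep generating new values until SQL Server accepts the insert
$inserted = $false

while (-not $inserted) {
    # Placeholder value generation; the real command uses its masking logic here
    $firstName = "First$(Get-Random -Maximum 100000)"
    $lastName = "Last$(Get-Random -Maximum 100000)"

    $query = "INSERT INTO dbo.[PersonUniqueValues] ([FirstName], [LastName]) VALUES (N'$firstName', N'$lastName');"

    try {
        # -EnableException turns failures into terminating errors we can catch
        Invoke-DbaQuery -SqlInstance 'localhost' -Database 'tempdb' -Query $query -EnableException
        $inserted = $true
    }
    catch {
        # The unique index rejected the row; loop and generate another set of values
        $inserted = $false
    }
}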

But Sander, what if the unique column is not present in the data masking configuration? Are we still going to generate the unique value for that column?

The answer to that is: No.

When you have a unique index in your table and you don’t add the columns of the unique index to the configuration file, the data masking command will not generate a value for that column.

This again comes back to efficiency and speed. Say we have a unique index with 4 columns and we only add 1 of them to the configuration file. Because we make sure the generated value is unique across the entire data set, the combination of values in the index is still unique, even if we don’t add the other columns to the configuration file.

Wow! That’s a lot to take in and I’ve been banging my head on the process for a while to make this work in the code.


Along the way I sometimes stepped out of the main change and changed some other parts of the data masking command too:

  1. Moved repetitive code to separate functions
  2. Implemented a static values feature
  3. Improved the randomized value function
  4. Added more unit tests

This change was a lot of work but it was definitely necessary to be able to use the command for larger databases.

You can look into the pull request to get more info about the changes.

I hope you found this informative and happy data masking!

If you have any questions about the data masking commands in dbatools let me know. You can ping me on Twitter and I’m always present in the “SQL Server Community” slack channel. You can join this channel through this link.

Generating SSDT Solutions From Templates


Consider the following scenario: you’re a database developer and your company has just decided to implement DevOps for databases. You have multiple databases that need to be put into source control and each database needs its own database project.

The first thing you’ll need to do is decide whether or not you want to use migration based or state based deployments.

This post is not going to discuss the pros and cons of these different methods; instead we’re going to use state based deployments using SQL Server Data Tools (SSDT) solutions.

If you want to know more about state vs migration based deployments, follow this link:

Having to create multiple solutions for multiple databases quickly becomes a tedious task. Besides being repetitive, there is a good chance of making mistakes.

That is where templates come in.


Yes, templates. But how are we going to create a template for an SSDT solution in such a way that it can be reused?

That’s where the PowerShell module called “PSModuleDevelopment” comes in. PSModuleDevelopment is part of the PSFramework PowerShell module.

The PSModuleDevelopment module enables you to create templates for files and even entire directories. Using placeholders, you can replace values in the template, making it possible to set the name and other variables when the template is invoked.

This is where the SSDT template comes in. I have created a template for SSDT that contains two projects: one project for the data model and the other for the unit tests.

I did not tell you about that yet: the template enables you to use tSQLt to create your unit tests. In the next blog post I will demonstrate how to generate basic unit tests using the PStSQLTTestGenerator PowerShell module.

The template can be downloaded from this GitHub repository.

Generate the SSDT solution

But to make things easier for you, I created a script that downloads the template from GitHub, installs it for you and creates the SSDT solution in one go.

Set the value of the “projectName” variable to the name of your database and run the script.

# URL for SSDT project template
$url = ""

# Import the global variables
$templateName = "SSDT-With-tSQLt"
$templateDescription = "SSDT project template including tSQLt"
$projectDestinationPath = "c:\temp\SSDT\"

# Replace this value with the name of your database
$projectName = "YOURDATABASENAME"

# DON'T CHANGE ANYTHING BELOW                                                          #
if ((Get-Module -ListAvailable).Name -notcontains 'PSFramework') {
    Write-Warning "Please install PSFramework using 'Install-Module PSFramework'"
    return
}

# Setup variables
$zipFilePath = "$($Env:TEMP)\SSDT-With-tSQLt-Template.zip"
$archiveDestPath = "$($Env:TEMP)\SSDT-With-tSQLt-Template"
$pathToTemplate = "$($archiveDestPath)\SSDT-With-tSQLt-Template-master"

# Check if the template is not already there
$templates = Get-PSMDTemplate
if ($templates.Name -contains $templateName) {
    try {
        Write-PSFMessage -Level Host -Message "Removing PSF template"
        Remove-PSMDTemplate -TemplateName $templateName -Confirm:$false
    }
    catch {
        Stop-PSFFunction -Message "Could not remove template"
    }
}

# Check if the directory is already there
$projectPath = Join-Path -Path $projectDestinationPath -ChildPath $projectName
if ((Test-Path -Path $projectPath)) {
    try {
        Write-PSFMessage -Level Host -Message "Removing project destination path '$projectPath'"
        $null = Remove-Item -Path $projectPath -Recurse -Force
    }
    catch {
        Stop-PSFFunction -Message "Could not remove project destination path '$projectPath'"
    }
}

# Remove the template directory
if (Test-Path -Path $archiveDestPath) {
    try {
        Write-PSFMessage -Level Host -Message "Removing existing archive destination path '$archiveDestPath'"
        $null = Remove-Item -Path $archiveDestPath -Recurse -Force
    }
    catch {
        Stop-PSFFunction -Message "Could not remove archive destination directory '$archiveDestPath'"
    }
}

# Create the project dir
try {
    Write-PSFMessage -Level Host -Message "Creating project directory '$projectPath'"
    $null = New-Item -Path $projectPath -ItemType Directory
}
catch {
    Stop-PSFFunction -Message "Could not create project destination directory"
}

# Download the file
try {
    Write-PSFMessage -Level Host -Message "Downloading file to '$zipFilePath'"
    $null = Invoke-WebRequest -Uri $url -OutFile $zipFilePath
}
catch {
    Stop-PSFFunction -Message "Something went wrong downloading the template archive" -Target $url -ErrorRecord $_
}

# Extract the archive
try {
    Write-PSFMessage -Level Host -Message "Extracting '$zipFilePath' to '$archiveDestPath'"
    Expand-Archive -Path $zipFilePath -DestinationPath $archiveDestPath -Force
}
catch {
    Stop-PSFFunction -Message "Something went wrong extracting the template" -Target $url -ErrorRecord $_
}

# Create the template
try {
    Write-PSFMessage -Level Host -Message "Creating new PSF template '$templateName' from '$pathToTemplate'"
    New-PSMDTemplate -ReferencePath $pathToTemplate -TemplateName $templateName -Description $templateDescription -Force
}
catch {
    Stop-PSFFunction -Message "Something went wrong creating the template" -Target $url -ErrorRecord $_
}

# Create the project
try {
    Write-PSFMessage -Level Host -Message "Creating solution from template '$templateName'"
    Invoke-PSMDTemplate -TemplateName $templateName -OutPath $projectDestinationPath -Name $projectName -Force
}
catch {
    Stop-PSFFunction -Message "Something went wrong creating the project" -Target $url -ErrorRecord $_
}

# Open the windows explorer with this solution
explorer (Join-Path -Path $projectDestinationPath -ChildPath $projectName)

After running the script you should see something like this

The result

After the script ran successfully, it will open an explorer window showing the just generated SSDT solution.

As you can see the solution has the name you gave it in the script. This is done throughout the entire solution.

Opening up the solution with Visual Studio we can see the following in the Solution Explorer

As you can see it has two projects:

  1. YOURDATABASENAME-Data: meant for the data model
  2. YOURDATABASENAME-Tests: meant for the unit tests

Importing the current data model

The next step will be to import your current database into the solution.

Right-click the “-Data” project, go to “Import” and click on “Database”.

Then click on the “Select Connection”, select the database and click on “Ok”.

For smaller databases with the same schema I set the “Folder Structure” to “Object Type”. If you have many different schemas then selecting “Schema\Object Type” may be better.

Click on “Start” and the result should look something like this:

Now the only thing that remains is to put your database in source control. Preferably you’re going to use Git, because Git… is awesome.

You are now done creating the initial project. You can now do the same thing for the next database.

I hope this helps you and any comment is appreciated.

Searching For SMO Objects With Certain Properties


The problem

In some situations I want to search through lots of objects to look for certain properties in SMO (SQL Server Management Objects).

This is also the case in this situation. I wanted to know all the different objects that had a property called “Schema”.

But what to do with all those different properties and methods we could look up? I mean, there are hundreds of objects in there and each of them has many methods and properties.

Getting the objects

$server = Connect-DbaInstance -SqlInstance localhost

$db = $server.Databases['WWI']

$objects = $db | Get-Member

Counting all the stuff we got back we have a count of 284. Going through each of the items is not going to work.

The first thing we have to do is filter out all the properties that are actual objects. We want to exclude all the properties that would return values like boolean, string etc.

Let’s change the object selection

$objects = $db | Get-Member | Where-Object { $_.MemberType -eq 'Property' -and $_.Definition -like 'Microsoft*' }

That only leaves us with 82 objects which makes things a lot easier.

Now for the last part we’ll iterate through the objects and get the properties and check for the name “Schema”

foreach ($object in $objects) {
    $dbObjectProperties = $null

    $dbObjectProperties = $db.($object.Name) | Get-Member -ErrorAction SilentlyContinue | Where-Object MemberType -eq 'Property' | Select-Object Name

    if ($dbObjectProperties.Name -contains 'Schema') {
        # Output the name of the object that has the 'Schema' property
        $object.Name
    }
}
The result of the objects that have that property

  1. ExtendedStoredProcedures
  2. SecurityPolicies
  3. Sequences
  4. StoredProcedures
  5. Tables
  6. UserDefinedFunctions
  7. UserDefinedTableTypes

Cleaning up the script

I can’t help myself and I always want my scripts to be able to have parameters and have some error handling in them.

The script uses the Connect-DbaInstance command from dbatools.

The end result:



[CmdletBinding()]
param (
    [string]$SqlInstance,
    [System.Management.Automation.PSCredential]$SqlCredential,
    [string]$Database,
    [string[]]$PropertyName
)

if (-not $SqlInstance) {
    Write-Error "Please enter a SQL Server instance"
    return
}

if (-not $PropertyName) {
    Write-Error "Please enter a property to search for"
    return
}

try {
    $server = Connect-DbaInstance -SqlInstance $SqlInstance -SqlCredential $SqlCredential
}
catch {
    Write-Error "Could not connect to instance '$SqlInstance'"
    return
}

if (-not $Database) {
    Write-Verbose "Setting database to 'master'"
    $Database = 'master'
}

if ($Database -notin $server.Databases.Name) {
    Write-Warning "Database could not be found"
    return
}

$db = $server.Databases[$Database]

$objects = $db | Get-Member | Where-Object { $_.MemberType -eq 'Property' -and $_.Definition -like 'Microsoft*' }

foreach ($object in $objects) {
    Write-Verbose "Retrieving properties for $($object.Name)"
    $dbObjectProperties = $null

    $dbObjectProperties = $db.($object.Name) | Get-Member -ErrorAction SilentlyContinue | Where-Object MemberType -eq 'Property' | Select-Object Name

    if ($dbObjectProperties) {
        $results = Compare-Object -ReferenceObject $dbObjectProperties.Name -DifferenceObject $PropertyName -IncludeEqual

        if ($results.SideIndicator -contains "==") {
            # Output the name of the object that contains the property
            $object.Name
        }
    }
}
Just run the command like this

.\Get-SMOObjectWithProperty.ps1 -SqlInstance localhost -PropertyName Schema

Making it public

For anyone who wants to do something similar, here is the code


Use Azure To Store SQL Server Backups Offsite


You always think your environment is setup correctly and that you’re able to recover in case of a disaster. You make backups, test your backups, setup DR solutions and in the end test the DR plan (very important).

But have you ever considered a situation where all your data is unusable? If you get infected with ransomware and the trojan gets hold of your backups, all your precautions and preparations have been for nothing.

A solution for this would be to use Azure to store SQL Server backups offsite. That way at least your backup files will not be easily infected and encrypted and you will at least have your data.

Thanks to Stuart Moore for pointing me in the right direction.

Possible Solutions

Directly Backup to Azure Blob Storage

Since SQL Server 2012 SP1 CU2, you can write SQL Server backups directly to the Azure Blob Storage service. This is very convenient when you want to save your backups offsite right away.

To do this, instead of using a path, you specify a URL to back up to, which would look similar to this:

BACKUP DATABASE [WideWorldImporters] 
TO  URL = N'' 

Ola Hallengren’s Backup Solution

The SQL Server backup solution Ola Hallengren has created also supports this feature. You specify a URL and a credential to set up the connection.

An example of the command would look like this

EXECUTE dbo.DatabaseBackup
@Databases = 'USER_DATABASES',
@URL = '',
@Credential = 'MyCredential',
@BackupType = 'FULL',
@Compress = 'Y',
@Verify = 'Y'

Azure AzCopy

Another tool we can use to write our backups to Azure Blob Storage is the command line utility AzCopy. The utility is free and can be downloaded from here.

The advantage of this tool is that it can be used next to any other tool that is used to create the backups.

In most situations we back up files to a local disk or network location. With the direct backup and Ola Hallengren’s solution you can choose to either back up to the file system or to Azure Blob Storage.

Setting up the solution

In my ideal solution I would like to do both: back up the databases to the local file system or network location and copy the files offsite.

To have all the flexibility and the security of the offsite backups I want one job to do all the work.

In normal circumstances I would use my go-to hammer and script everything in PowerShell. Although that’s totally possible, our database servers are set up with Ola Hallengren’s backup solution to create the backups.

To accomplish my solution I want to start another process to copy the files right after the backup job step successfully completes.


Most of the scripting will be done in PowerShell for creating the storage account, the container and getting the access key.

Create the storage account

# We're going to use splatting
$NewStorageAccount = @{
    ResourceGroupName      = "dbatools"
    AccountName            = "azblogdemo"
    Location               = "Uk South"
    SKUName                = "Standard_GRS"
    Kind                   = "StorageV2"
    AccessTier             = "Hot"
    EnableHttpsTrafficOnly = $true
}

# Create the Storage Account
New-AzStorageAccount @NewStorageAccount

You can also create one or more containers to hold your backups. In my case I created a container called “sqlbackup”, but that’s not required.
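
If you want to script that part too, a short sketch using the Az.Storage cmdlets (with the account and container names from my example) could look like this:

# Get the context of the storage account we just created
$storageAccount = Get-AzStorageAccount -ResourceGroupName "dbatools" -Name "azblogdemo"

# Create the container that will hold the backup files
New-AzStorageContainer -Name "sqlbackup" -Context $storageAccount.Context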

Get access to the storage account

Each storage account has two access keys which give a resource the ability to access it.

Although very handy, these keys give too many privileges to the resource that wants to access the storage account.

Instead you can create a shared access signature (SAS) that enables you to specify the privileges in a more granular way, including services, resource types, permissions and even the expiration time.

Select the proper permission, set the expiration and hit the “Generate SAS…” button.

This will generate the connection string

We will use the “SAS token” in the next step
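
If you prefer to script the SAS generation instead of using the portal, a sketch with the Az.Storage cmdlets could look like this; adjust the permissions and expiry time to your own needs:

# Get the context of the storage account
$storageAccount = Get-AzStorageAccount -ResourceGroupName "dbatools" -Name "azblogdemo"

# Generate an account SAS token for the blob service, valid for one year
$sasToken = New-AzStorageAccountSASToken -Service Blob `
    -ResourceType "Service,Container,Object" `
    -Permission "rwc" `
    -ExpiryTime (Get-Date).AddYears(1) `
    -Context $storageAccount.Context

$sasToken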

Create the job step

You can use the example code below regardless of the application used to execute “AzCopy.exe”.

In my case I wanted to use a SQL Server Agent job to do all the work. I scheduled the job to run every 30 minutes.

Make sure that the SQL Server Agent service account has access to the location of AzCopy.exe, with at least read and execute permissions.

Create a new job step of the type “Operating system (CmdExec)”.

The command

"[location to azcopy]\AzCopy.exe" /Source:"[backup file location]" /Dest:"[yourstorageaccount]" /DestSAS:"[yoursassignature]" /Y /S /XO

An example

"C:\Program Files (x86)\Microsoft SDKs\Azure\AzCopy\AzCopy.exe" /Source:"V:\SQLServer\Backup\" /Dest:"" /DestSAS:"?sv=2018-03-28&ss=bfqt&srt=sco&sp=rwc&se=2020-08-06T14:43:18Z&st=2019-12-06T06:43:18Z&spr=https&sig=th6lchHLC6pH4TZhVrFHwWaazzddzMLakWkxUydOtH%2FdQo%3D" /Y /S /XO

Some other options

In my case I wanted to separate the full backup files and the log files. To do that we can apply the “/Pattern” option. The code below only copies the “.bak” files.

"C:\Program Files (x86)\Microsoft SDKs\Azure\AzCopy\AzCopy.exe" /Source:"V:\SQLServer\Backup\" /Dest:"" /DestSAS:"?sv=2018-03-28&ss=bfqt&srt=sco&sp=rwc&se=2020-08-06T14:43:18Z&st=2019-12-06T06:43:18Z&spr=https&sig=th6lchHLC6pH4TZhVrFHwWaazzddzMLakWkxUydOtH%2FdQo%3D" /Pattern:"*.bak" /Y /S /XO


This concludes the Azure BLOB storage setup to copy our backup files off site.

I hope you enjoyed this and maybe this comes in handy in your daily work.

T-SQL Tuesday #116: Why adopt SQL Server on Linux


My T-SQL contribution for this month discusses why you should consider adopting SQL Server on Linux.

This month’s T-SQL Tuesday is hosted by Tracy Boggiano. Tracy invites us all to write about what we think everyone should know when working with SQL Server on Linux, or anything else related to SQL running on Linux.

You can read more about the invite in detail by clicking on the T-SQL Tuesday logo on the left.

I have been working with Linux on and off for about 20 years now.

The first time I got in contact with Linux was when Red Hat released version 5 of their distribution back in 1997, and I fell in love with it. For the first time I was able to do things outside of a GUI.

I must say that back then it was kind of hard to update Linux with a new kernel. I remember spending hours and hours compiling new kernels, crossing my fingers that I’d done it right and that it wouldn’t crash my entire server.

Nowadays this process is a lot easier and the distributions are so good that you don’t even have to think about it anymore. Installing a distribution is as easy as it comes and updating applications is a breeze.

I have been using Linux at college, at work places and at home for various reasons. I like to work in the command line interface and rarely use the GUI.

That’s probably the reason that I like PowerShell so much too.

Back to 2019

SQL Server on Linux is a fact. If you had told me 10 years ago that SQL Server on Linux would be a fact, I would’ve probably grinned and walked on.

But Microsoft has changed its perspective and is actively joining the open-source community.

Microsoft has mentioned recently that they have more Linux VMs running than Windows Server in Azure. That’s all because of the change in mindset to work with the administrators and enable them to use Linux.

Why adopt SQL Server on Linux

If you’re a Linux shop that’s going to be a no-brainer. Many companies are using this in production as we speak. It runs just as fast, maybe even faster, than the Windows version.

The installation of SQL Server on Linux is a matter of running a few small scripts and you have SQL Server running on Linux.

You can run SQL Server on Linux with Active Directory to do the authentication:

Another big thing that has been around for a while is Docker and the ability to run SQL Server on Linux in Docker.

If you haven’t seen Bob Ward’s session about SQL Server on Linux with containers you should visit his OneDrive and take a look at it. I went to this session at SQL Bits 2018 and was amazed by the ease of it.  He was able to switch between instances, update instances and drop them again in minutes.

I tried out his demos and was able to run multiple instances in a matter of minutes. No longer do I have to go through an entire installation of SQL Server on Windows. It just works!

This is a big advantage for the CI/CD pipeline you have been wanting to build with SQL Server where you can just start and stop instances of SQL Server whenever it’s needed.

The next level would be to run SQL Server on Linux in Kubernetes and have a production setup to make sure your instance of SQL Server is always running.

You can of course run containers on Windows, but I would advise running Docker on a Linux machine. I have had some trouble with Docker on Windows. The biggest reason was that I also use VMware Workstation on my laptop, which makes it impossible to run Docker on Windows, because you cannot have two hypervisors on a single machine.


I love SQL Server on Linux and this is probably the best thing that has happened with SQL Server for a long time.

We as a pro-Linux shop are looking into running SQL Server on Linux for our production environments. That’s a big thing, because we’ve been running SQL Server on Windows forever.

Microsoft has done a great job to make it very easy for us to implement it within our enterprises.

If you’re still hesitant about trying it out, just take a look at all the articles that have been written about it and you’ll probably want to try it out for yourself.



Scanning for PII with dbatools


Recently a brand new command was released that can help you scan for PII (Personally Identifiable Information) in your databases.

What Is Personally Identifiable Information (PII)?

Personally identifiable information (PII) is, like the name implies, data that can be used to identify a person. It is typically actively collected, meaning the information is provided directly by the individual.

Here are a couple of identifiers that qualify as PII-based data:

  • Name
  • Email address
  • Postal address
  • Phone number
  • Personal ID numbers (e.g., social security, passport, driver’s license, bank account)

Why was this command developed

The idea came from the set of data masking commands that are present in dbatools. Although these commands are great, going through all of the tables and looking through the data was a bit tedious for me.

Especially when you’re dealing with databases that have hundreds to thousands of tables, you easily run into thousands to tens of thousands of columns.

So that’s how I came up with the command to scan for PII. It’s called Invoke-DbaDbPiiScan and is present in dbatools from version 0.9.819.

The command returns all the columns that potentially contain PII. I say potentially, because the results still need to be assessed to confirm they indeed contain PII. But it takes care of eliminating the majority of the columns, saving you a lot of time.

This information is very valuable when you have to deal with the GDPR, but also when you have to deal with things like HIPAA.

How does the command work

I recently wrote about the command to ask for help from the community to come up with patterns and known names to improve the scan.

It’s set up in such a way that, to improve the scan, we only need to look at the known names and the patterns. The known names and patterns are set up using regex (regular expressions).

A regular expression is a sequence of characters that defines a search pattern. It can be used to match a series of characters, from very simple to very complex.
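
As a quick illustration, the Mastercard pattern shown further down can be tried directly in PowerShell (the card number below is made up):

# The pattern from pii-patterns.json (in the JSON file the backslashes are escaped as \\d)
$pattern = '(5[1-5]\d{14})|(5[1-5]\d{2}[-| ]\d{4}[-| ]\d{4}[-| ]\d{4})'

# A made-up card number in the 51-55 range matches, random text does not
'5500 1234 5678 9010' -match $pattern   # True
'not a card number' -match $pattern     # False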

The files with the regular expressions are located in the bin\datamasking folder.

During the scan the command will go through two phases:

  1. Scan for known names
  2. Scan for data patterns

If the command comes across a column that matches in phase one, it will skip that column for phase 2. Because it already flagged this column to potentially have PII, it would not make sense to also try to match all the patterns on it. Avoiding this makes the process fast and efficient.

Known Names

The file that contains all the known column names is called pii-knownnames.json.

A known name has the following properties:

  1. Name
  2. Category
  3. Pattern

An example of a known name is:

    "Name": "Name",
    "Category": "Personal",
    "Pattern": [

In this example, if the name of the column matches anything like firstname, fname, lastname etc, it will return in the scan.

Data Patterns

The file that contains all the data patterns is called pii-patterns.json.

A pattern has the following properties:

  1. Name
  2. Category
  3. Country
  4. CountryCode
  5. Pattern
  6. Description (not yet in production at the time of writing this article)

The pattern has a little more information than the known name. The reason for that is that the known name is not bound to countries and only applies to language. Because a language can be used in multiple countries, adding a country to the known name wouldn’t make sense.

The second reason there are country and country code properties is that they enable the user to filter on specific countries. Imagine you have a database with only data from a specific country; going through a very large set of patterns would take a very long time.

With the country and country code, the command is able to filter on the patterns and only try to match those that make sense for the user.
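
As a sketch of what that filtering looks like, assuming you only want the patterns that apply to the Netherlands (check the command’s help for the exact parameter names):

# Limit the pattern matching to patterns that apply to a single country
Invoke-DbaDbPiiScan -SqlInstance [yourserver] -Database [yourdatabase] -Country Netherlands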

An example of a pattern is:

    "Name": "Creditcard Mastercard",
    "Category": "Financial",
    "Country": "All",
    "CountryCode": "All",
    "Pattern": "(5[1-5]\\d{14})|(5[1-5]\\d{2}[-| ]\\d{4}[-| ]\\d{4}[-| ]\\d{4})",
    "Description": ""

Running the command

Enough talk about how it all works; let’s get to the point and execute the command.

The easiest way of running the command is by executing the following line (replacing the brackets of course)

Invoke-DbaDbPiiScan -SqlInstance [yourserver] -Database [yourdatabase]

The result would look something like this

As you can see the database has a variety of columns that fall into the category of PII.

In the example above the command finished within a couple of seconds. But when you have a wide range of tables and columns this process can take a little longer. To keep you informed of the progress, the command displays a progress bar showing the current status of the scan.

The example above uses the Out-GridView commandlet to output the results to a GUI matrix. This makes it easy to look at the results. The command would look like this

Invoke-DbaDbPiiScan -SqlInstance [yourserver] -Database [yourdatabase] | Out-GridView

The result would look something like this

What’s next?

The next step with this command is to implement the functionality in the New-DbaDbDataMaskingConfig command. I want the user to be able to go straight to the usual suspects in the database and only create a config for those columns that potentially have PII.

The command has several other parameters to make more specific scans. Take a look at the help from the command to get to know more about the other parameters.

Get-Help Invoke-DbaDbPiiScan

There are also several examples in the help that can get you very far.

I hope this helps you out a bit, especially when you’re dealing with the entire GDPR jungle and finding all the little pieces within your organization that hold PII.

If you think we’re missing some patterns or known names, please help us out. Together we can make this scan really thorough. Please read my previous blog post to find out how to help.


Help needed for new PII command



The location of the JSON files has been changed to the development branch.

I’ll cut right to it, I need your help.

I’m developing a new command for dbatools to scan for PII.

I already have a wide variety of different patterns and ways to check on possible personal information but I want to be as thorough and complete as possible.

The command is called Invoke-DbaDbPiiScan and it does two things:

  1. it scans the columns in the tables and checks whether they are named in such a way that they could contain personal information
  2. it retrieves a given amount of rows and goes through the rows to do pattern recognition


How does it work

The command uses two files:

  1. pii-knownnames.json; Used for the column name recognition
  2. pii-patterns.json; Used for the pattern recognition

You can find the files here in the GitHub repository.

The patterns and known names are set up using regex to make the scan really fast.
Also, using regex this way with the JSON files makes the solution modular and easy to extend.


An example of a known name regex is this:


What this does is try to match anything containing “name”.


The pattern regexes tend to be more complex than the known names. This is because we have to deal with more complex data.

An example of a pattern:

(3[47]\d{13})|(3[47]\d{2}[-| ]\d{6}[-| ]\d{5})

This particular pattern is used to find American Express credit card numbers.

How can you help

What I need from you is to see if you can come up with more patterns that could lead to a more exact result.

I opened an issue in the Github repository where you can leave a comment with the pattern.

If this pattern is only used in a certain country, make sure you include which country this applies to.

I want to thank you beforehand for any input.

If you have any questions, leave a comment here or contact me through the SQL Server Community Slack channel or Twitter, both as @SQLStad.


T-SQL Tuesday #110 – Automate All the Things


This month’s T-SQL Tuesday is about automating all the things.

I love automating processes, especially when it comes to managing SQL Server, and I think that most processes should be automated.

So technically there are two tasks for this month:

  • What is your go-to technology for automation?
  • What do you want to automate or what automation are you proud of completing?

For me, a process can be automated when:

  • It’s repetitive
  • It takes a long time to execute
  • It needs human interaction

I will go far to automate a process. I’d rather spend a week developing a process to get it automated, than to execute it twice and spend lots of time on it.

What is your go-to technology for automation?

In the last couple of years I’ve automated a whole bunch of processes using different technologies to accomplish that. My go-to technologies for automating are PowerShell, DevOps, tSQLt and dbatools.

You might say, “dbatools is PowerShell, right?”, and you’re correct, but this module has grown so much that it has to be mentioned on its own.


PowerShell

I’ve been working with PowerShell since version 1. I fell in love with it especially because it filled a gap: we needed something other than a GUI to manage our infrastructure.

PowerShell is my hammer to automate about anything. It crosses multiple domains within the Windows universe, from managing your domains, your users, servers, services.

The PowerShell team has done a great job to make this platform as versatile as possible for anyone to use.


DevOps

This part is still kind of new to me, and I find that it combines different technologies to accomplish the job.

I used Jenkins and Octopus Deploy within my CI/CD pipeline to automate releases. The thing is that within that pipeline I used several technologies to accomplish that.

Technologies like Groovy for the pipeline script, PowerShell to execute certain tasks, dbatools to execute some other.

I like Jenkins because it lets me define exactly what I want to do and it doesn’t have a big learning curve.

Octopus Deploy picks up where Jenkins finishes. Creating releases and deploying them to different servers makes this CI/CD process complete.


tSQLt

This technology has saved my life. I never did a lot of development, but I know how to develop a database. In the past I made changes, pushed them to a development server and crossed my fingers, hoping I didn’t break anything.

With tSQLt I have the ability to create unit tests for SQL Server databases to test every little aspect of my database.

If a small change in the schema breaks a procedure I will know before I release it.

I will be presenting about this subject in the near future and help people along the way because I think this needs to be used a lot more.


dbatools

Last but not least, dbatools. This project changed my life in several ways.

First, it taught me how to develop in a large open-source project: how to follow standards and how to handle commits. I did not know Git that well before I started contributing, and now I’m one of the major contributors.

Second, it gave me the ability to help others. I had several scripts lying around that could easily be transformed into proper functions/commands for other people to use. If I ran into a problem, there was bound to be someone else with the same one, and I could save that person a lot of headache.

Third, it made me automate a lot of processes for myself. Think of the backup testing command “Test-DbaLastBackup“. This command tests your backups by restoring them, running a DBCC check and returning the result back to you.
You will know whether your backup can be restored and whether the data within the backup is corrupted.
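
As a quick example of how little effort that takes (the instance names below are placeholders):

# Restore the latest backups from SQL1 on a test instance and run DBCC CHECKDB on the result
Test-DbaLastBackup -SqlInstance 'SQL1' -Destination 'SQLTEST1'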

If you have not worked with dbatools yet, make sure you start with it today. It will change your life as a database administrator or developer.

Fourth, and this one comes with contributing to the project, is that I met a lot of incredible people from the community. I love the community how they share their knowledge and help other people out.

What do I want to automate

I’ve been working on automating our releases for database development.

This process has taken a considerable amount of time because I had no knowledge and experience in that area.

I have almost finished that process and I’m glad to say that this year I’ll have a CI/CD pipeline with Visual Studio, Git, Jenkins and Octopus Deploy.

My next project will be to implement data masking, making it possible to ship databases to other servers and people without exposing production data.

What automation are you proud of completing

These are several projects that I’m proud of:

The pattern is getting obvious. Automate repetitive, painstaking processes.

Change your mindset to embrace automation and you never want to go back.

I have less and less time to get things done.
My agenda is filled with meetings and project deadlines, and more and more things land on my plate.

Start automating your work today. You have to put some energy into automating your processes, but you’ll never look back.