Memory improvements in data masking for dbatools


If you’ve used the data masking command in dbatools you’ve probably noticed that the PowerShell session becomes memory intensive when it has to handle larger tables with one or more unique indexes.

The reason that happens is that during the data masking process the command looks for any unique indexes in the table. If it finds a unique index it will create a unique row for all the columns in the unique index.

The command creates the values in memory. This means that you’ll have all the values that eventually get inserted into the table in memory. This can lead to a massive amount of memory being used when you have wider unique indexes with large data types.

There was another problem I had to fix: the command would create those unique values for every unique index separately, even when multiple unique indexes had overlapping columns. That was not efficient and I wanted to make something better for that too.

I’ve been thinking about solutions for this problem because I think this command should be usable in almost every situation.

I was able to cut the memory usage of the command from over 10 GB to less than 2 GB for a reasonably sized table. The process memory usage no longer keeps growing, because those values are no longer handled in memory.

Here is how I made memory improvements in data masking for dbatools.

Problem #1: Moving away from memory

The solution for this problem was pretty obvious: move away from memory and use another storage solution to temporarily save the values. There are a couple of solutions we can use.

Store the values

  1. in a file on disk
  2. in a database

There were pros and cons for each solution

Files on disk

Storing the values in rows on disk, in something like a CSV format, is really easy in PowerShell. The CSV format would’ve been my choice; I was not even considering something like JSON, because that would create very large text files.

Once imported, we can easily iterate through the selection by looking at the row numbers, which would make it work in the data masking process as well.

The problem comes when we have to read the values. We would have to read the entire file again and therefore use more memory. This was not what I wanted.

Use a database

Using a database seems very logical in this context. We’re already connected to a server and are able to create and use tables.

The downside is that we would be using extra storage, because we’re temporarily recreating a part of the table, and this could get big with larger tables.

One upside to using a database is that I can create an identifier for each row. I can then query a single row from that table and get all the unique values quickly and efficiently.

The solution

The decision was made: I was going to move the process to a database.

The next decision I had to make was to either use the database that was going to be masked or use a separate one.

Again, both solutions have their pros and cons, but I did not want to handle the cleanup of a new database. I also didn’t want the database being masked to become larger, because I would be growing its data file.

The solution was to start using “tempdb” to create the temporary tables in.
Tempdb is great because:

  • it’s always there
  • you can optimize tempdb
  • in case of bad cleanup it will destroy data when the session is destroyed

Now we had to create the tables for the unique indexes in tempdb, which was the next problem.

Problem #2: Creating a single unique row for all unique columns

One thing I didn’t like about the data masking command was the way it handles the unique indexes.

It created an object in memory for each unique index and that added up in processing time and memory usage.

We tackled the problem of the memory usage by using SQL Server’s tempdb database. I still had the problem of the multiple objects/tables for each unique index.

This was a bit harder to solve. I had to

  1. rewrite the process of retrieving all the unique indexes in a table
  2. collect all the columns and remove the duplicates
  3. create a table statement for all the unique columns
  4. add an identifier to make it easy to look up the row
  5. add an index to the identifier to make the lookup query fast

That is quite a bit of work to go through. In the end I decided to make another command to handle that process for me. Because that process is too far removed from the actual data masking itself, it was not a good idea to put it in the data masking command.

The command I created is called “Convert-DbaIndexToTable” and is an internal function in dbatools.

By default you cannot call this command directly. There are obviously ways to do it, but it’s only built for the data masking process.
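One of those ways, shown purely for illustration, is to step into the module’s scope and look at the function from there:

# Runs the script block inside the dbatools module scope, where internal functions are visible
& (Get-Module dbatools) { Get-Command Convert-DbaIndexToTable -Syntax }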

The command does the following (the sketch after this list makes it concrete):

  1. Gets all the unique indexes on a particular table
  2. Gets all the columns from those indexes in an array
  3. Checks each column for the data type or user-defined data type
    1. In case of a UDDT it will look into that property to get the actual data type
    2. In case of a normal data type it will just use those properties
  4. Adds a column to the array to be the row identifier
  5. Puts together the
    1. Create table statement
    2. Create unique index statement for the temporary table
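Here is a hedged illustration of the kind of statement that could end up in tempdb. The table and column names are made up for the example; only the idea (deduplicated unique columns, a row identifier and a unique index across all of them) comes from the steps above.

$query = @"
CREATE TABLE tempdb.dbo.UniqueValues_Person (
    RowNr     INT IDENTITY(1,1) PRIMARY KEY, -- row identifier for fast single-row lookups
    FirstName NVARCHAR(50),
    LastName  NVARCHAR(100),
    Email     NVARCHAR(200)
);

CREATE UNIQUE NONCLUSTERED INDEX UIX_UniqueValues_Person
    ON tempdb.dbo.UniqueValues_Person (FirstName, LastName, Email);
"@

Invoke-DbaQuery -SqlInstance 'localhost' -Database 'tempdb' -Query $query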

But wait a minute! Why do we need another unique index on the temporary table?

The answer to that is: Because we want to make sure each row is unique across all the unique index columns.

This was a solution I implemented because of the way the unique values are generated.
When the data masking command generates a unique row for all the columns, I want that row to be unique throughout the entire data set.

I could have created a process to check all the values in that table, but I could just as easily let SQL Server return an error when the unique values were already present in the table.
When it returns an error, the data masking command performs another iteration for that row until the insert succeeds.
This is fast, efficient and consumes less memory than handling the process myself.
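To illustrate the idea, here is a hedged sketch (not the actual dbatools code) that reuses the made-up table from the earlier example:

$inserted = $false

while (-not $inserted) {
    # Stand-ins for the real value generators
    $firstName = Get-Random -InputObject 'John', 'Jane', 'Alex'
    $lastName  = Get-Random -InputObject 'Doe', 'Smith', 'Jones'
    $email     = "$firstName.$lastName@example.com"

    try {
        Invoke-DbaQuery -SqlInstance 'localhost' -Database 'tempdb' -EnableException -Query "
            INSERT INTO dbo.UniqueValues_Person (FirstName, LastName, Email)
            VALUES ('$firstName', '$lastName', '$email');"
        $inserted = $true
    }
    catch {
        # Unique index violation: generate a new row and try again
    }
}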

But Sander, what if the unique column is not present in the data masking configuration? Are we still going to generate the unique value for that column?

The answer to that is: No.

When you have a unique index in your table and you don’t add the columns of the unique index to the configuration file, the data masking command will not generate a value for that column.

This again comes back to efficiency and speed. When we have a unique index with 4 columns and we only add 1 of them to the configuration, we make sure that that column’s value is unique across the entire data set. That way the combination of values is still unique, even if we don’t add the other columns to the configuration file.

Wow! That’s a lot to take in and I’ve been banging my head on the process for a while to make this work in the code.

Conclusion

Along the way I sometimes stepped out of the main change and changed some other parts of the data masking command too:

  1. Moved repetitive code to separate functions
  2. Implemented static values feature
  3. Improved randomized value function
  4. Added more unit tests

This change was a lot of work but it was definitely necessary to be able to use the command for larger databases.

You can look into the pull request to get more info about the changes.

I hope you found this informative and happy data masking!

If you have any questions about the data masking commands in dbatools let me know. You can ping me on Twitter and I’m always present in the “SQL Server Community” slack channel. You can join this channel through this link.

Tips and Tricks for StreamLab OBS at BITS


Recently I learned that SQL BITS was going to be an online event. The organizers also decided to take a different approach and let the presenters record their own sessions.

I’m pretty familiar with SLOBS and have been streaming content for the last few months.

This gave me some experience in setting up scenes and other parts of SLOBS a little more efficiently, which made it really easy to record my session.

Here are some of the things I did to make things easier.

  1. Setup with a green screen
  2. Setup multiple scenes
  3. Setup hotkeys

Setup with a green screen

If you don’t have a green screen yet, you can get a green screen from Amazon for about 60 euro which includes a stand and the green screen itself.

The nice thing about this is that you’ll be able to set the opacity of your background and make it transparent.

Make sure your lighting is correct to make the quality of your video as good as it can be. Without the proper lighting you may see some sections in your video that look distorted.

Using the green screen makes it possible to show more of your screen and it distracts the viewer less, letting them focus on the content.

I’m assuming you already have your webcam source setup and that it’s receiving input.

Right-click on the webcam source in your scene and select “Filters”

Click the “+” sign, look for “Chroma Key” in the filter dropdown list and click “Done”.

You’ll see something like this

You may need to tweak some settings based on your lighting, the color of the green screen, etc., but in the end you’ll have a transparent background.

Setup multiple scenes

One thing that really helps, and what BITS wanted people to do, is to create multiple scenes.

In their document they mentioned two scenes, one with your camera filling the screen and one for your slides/demos.

Fortunately I have multiple screens, laptop and a separate monitor, which enables me to separate the slides and my demos.

This will make the transition smoother and I don’t have to close the presentation.

This is what I created:

  1. Full Camera
  2. Presentation Slides
  3. Presentation Demo

SQL Bits Camera

This scene was setup with only the webcam and the audio input capture source.

SQL Bits Presentation Slides

This scene had the webcam, the audio input device but also the display capture set to my screen that would show the slides.

SQL Bits Presentation Demo

This scene was almost exactly the same as the presentation slides, the only difference was I created a new display capture source that would show my screen that contained the demos.

Setup Hotkeys

One other thing I enabled was the setup of hotkeys in SLOBS.

When you’re recording your session you want the transition between scenes to be as seamless as possible.

In my case it was a bit difficult to switch the scenes without showing the SLOBS screen somewhere in the recording. Instead I wanted to use hotkeys that would switch me from my presentation scene to the demo scene.

This turned out to be very easy to do.

  1. Go to your settings in SLOBS
  2. Click on Hotkeys

Find the scene you want to set a hotkey for and look for the field “Switch to scene”.

In my case I used the combination Shift-1, 2 or 3. Why the Shift button?

Well I also use ZoomIt and the default settings for that application are Ctrl-1, 2 and 3.

Of course you can use any key you want, but this made sense to me. My full webcam display is set to number 1, my slides scene is set to number 2 and my demo scene is set to number 3.

You don’t have to have the SLOBS screen active to switch between the scenes.

I hope this was useful for you. If you have any comments, let me know.

Generating SSDT Solutions From Templates


Consider the following scenario: you’re a database developer and your company has just decided that they want to implement DevOps for databases. You have multiple databases that need to be put into source control and each database needs its own database project.

The first thing you’ll need to do is decide whether you want to use migration-based or state-based deployments.

This post is not going to discuss the pros and cons of these different methods; instead we’re going to use state-based deployments with SQL Server Data Tools (SSDT) solutions.

If you want to know more about state vs migration based deployments, follow this link: https://lmgtfy.com/?q=database+state+vs+migration

Having to create multiple solutions for multiple databases can quickly become a tedious task. Besides being repetitive, there is a chance of making mistakes.

That is where templates come in.

Templates?!

Yes, templates. But how are we going to create a template for an SSDT solution in such a way that it can be reused?

That’s where the PowerShell module called “PSModuleDevelopment” comes in. PSModuleDevelopment is part of the PSFramework PowerShell module.

The PSModuleDevelopment module enables you to create templates for files but also for entire directories. Using placeholders you can replace values in the template, making it possible to set the name and other variables.

This is where the SSDT template comes in. I have created a template for SSDT that contains two projects. One project is meant for the data model and the other project is meant for the unit tests.

One thing I did not tell you about yet: the template enables you to use tSQLt to create your unit tests. In the next blog post I will demonstrate how to generate basic unit tests using the PStSQLTTestGenerator PowerShell module.

The template can be downloaded from this GitHub repository.

Generate the SSDT solution

But to make things easier for you, I created a script that downloads that template from Github, installs it for you and creates the SSDT solution in one go.

Replace the value of the “projectName” variable with the name of your database and run the script.
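The script itself is not reproduced here, but a hedged sketch of the idea could look like this. The template name and the PSModuleDevelopment parameter names are assumptions, so double-check them against the actual script and the module help.

# Replace this with the name of your database
$projectName = 'YOURDATABASENAME'
$outputPath  = 'C:\Projects'

# Make sure the template tooling is available (one-time)
Install-Module -Name PSModuleDevelopment -Scope CurrentUser

# Invoke the installed SSDT template and fill in the placeholders
Invoke-PSMDTemplate -TemplateName 'SSDTProject' -OutPath $outputPath -Name $projectName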

After running the script you should see something like this

The result

After the script has run successfully, it will open an Explorer window showing the freshly generated SSDT solution.

As you can see the solution has the name you gave it in the script. This is done throughout the entire solution.

Opening up the solution with Visual Studio we can see the following in the Solution Explorer

As you can see it has two projects:

  1. YOURDATABASENAME-Data: Meant for the data model
  2. YOURDATABASENAME-Tests: Meant for the unit tests

Importing the current data model

The next step will be to import your current database into the solution.

Right-click the “-Data” project, go to “Import” and click on “Database”.

Then click on the “Select Connection”, select the database and click on “Ok”.

For smaller databases with the same schema I set the “Folder Structure” to “Object Type”. If you have many different schemas then selecting “Schema\Object Type” may be better.

Click on “Start” and the result should look something like this:

Now the only thing that remains is to put your database in source control. Preferably you’re going to use Git, because Git… is awesome.

You are now done creating the initial project. You can now do the same thing for the next database.

I hope this helps you and any comment is appreciated.

T-SQL Tuesday #123 – Life hacks that make your life easier


It’s that time of the month for another T-SQL Tuesday.

In case you are new to T-SQL Tuesday this is the monthly blog party started by Adam Machanic (b|t) and now hosted by Steve Jones (b|t). It’s a way of encouraging blog posts from the community and helping to share the knowledge.

This month’s T-SQL Tuesday is hosted by Jess Pomfret (b|t). Jess invites us all to write about your favorite life hack.

 

My favorite life hack

My first life hack would be PowerShell itself. I use PowerShell throughout the day automating anything that’s repetitive. If your hammer is big enough any idea will be a nail.

But that would be too easy right?! Let’s see what we can do with PowerShell profiles.

PowerShell Profiles

Because I use PowerShell regularly, I find myself doing the same thing every time within the console. As a good automator, I get frustrated when I have to do things multiple times. We can fix that by adding functionality to our PowerShell profile.

Your profile is a little script that you can find (or create) which will be loaded every time you start PowerShell.

You can find your profile in your PowerShell directory.

For PowerShell 5 and lower that would be "$env:USERPROFILE\Documents\WindowsPowerShell\" and for PowerShell 6 and above that would be "$env:USERPROFILE\Documents\PowerShell\".
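You don’t even have to remember those paths; the $PROFILE automatic variable points to the profile of the current host:

# Show the profile path for the current host
$PROFILE

# Create the profile if it doesn't exist yet and open it in Notepad
if (-not (Test-Path -Path $PROFILE)) {
    New-Item -Path $PROFILE -ItemType File -Force | Out-Null
}
notepad $PROFILE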

Profiles for console and VS Code

I have different profiles for the console and VS Code.

The profile for the console is named: Microsoft.PowerShell_profile.ps1

The profile for VS Code is named: Microsoft.VSCode_profile.ps1

Both the profiles have similar code, but I sometimes need different functionality in the VS Code profile.

The console will automatically find your PowerShell profile when you name it correctly and place it in the right directory.

For VS Code, make sure you have enabled “Enable Profile Loading”. Go to the settings and search for “Profile” to find this setting.

What’s in my profile

With PowerShell profiles you can create little shortcuts to your favorite commands, write functions that do specific things, change your prompt to show specific information, and so on.

My profile contains the following items:

Create PS drives to navigate to specific folders quickly

I spend a lot of time in certain directories, like my repositories. Having a PS drive that points to that location makes things easier.
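For example (the path is just an illustration):

# A "Repos:" drive that points to my repository folder
New-PSDrive -Name Repos -PSProvider FileSystem -Root "$env:USERPROFILE\source\repos" | Out-Null

# Jump there from anywhere
Set-Location Repos: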

Set aliases for regularly used programs 

Oh aliases make things so much easier. Just type “np” to open Notepad for example.
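A couple of examples (the SSMS path will differ per machine):

Set-Alias -Name np -Value notepad.exe
Set-Alias -Name ssms -Value 'C:\Program Files (x86)\Microsoft SQL Server Management Studio 18\Common7\IDE\Ssms.exe'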

Change the location to C:\

It’s very annoying when PowerShell decides that your starting directory should be the system directory or your user profile. I want my console to always open in the root of the C-drive.
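That is a one-liner at the end of the profile:

# Always start in the root of the C-drive
Set-Location -Path C:\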

Change the prompt

How annoying are those very long paths that are shown when we go about 5 levels deep? You barely have any room to type anything before the cursor jumps to the next line.

You can change the prompt by creating a prompt function. In my case I changed the maximum length of the path shown in the prompt to 20 characters.
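A minimal sketch of such a prompt function (the exact truncation logic is up to you):

function prompt {
    $path = (Get-Location).Path

    if ($path.Length -gt 20) {
        # Keep the drive letter and the tail end of the path
        $path = $path.Substring(0, 3) + '...' + $path.Substring($path.Length - 14)
    }

    "PS $path> "
}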

Show git status in the prompt

Oh, I love git and use it all the time, and seeing the status of my git repository at a glance is something that makes things easier. Fortunately there is a module called “posh-git” that shows the status of the repo.

We can use that module to display the result in our prompt by using the prompt function again.
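A hedged sketch of what that looks like in the profile; posh-git needs to be installed once, and recent versions decorate the default prompt automatically when imported:

# One-time installation
Install-Module -Name posh-git -Scope CurrentUser

# In the profile: importing the module adds the repo status to the prompt
Import-Module posh-git

# If you define your own prompt function, recent posh-git versions expose a script block
# you can call from it, for example: function prompt { & $GitPromptScriptBlock }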

My prompt looks something like this:

Re-import certain modules

Doing development on certain modules makes me type the same command, “Import-Module ….”, many times during the development process. What if I wrote a little function that would import all the modules I use in development in one go?
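Something along these lines (the module names are just examples):

function Import-DevModule {
    $modules = 'dbatools', 'PSDatabaseClone'

    foreach ($module in $modules) {
        # -Force re-imports the module even when it is already loaded
        Import-Module -Name $module -Force
        Write-Host "Re-imported module $module"
    }
}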

Open websites

Now I’ve just become too lazy. I wanted to open my favorite search websites from PowerShell when I was dealing with some questions.

So I created little functions that would open the DuckDuckGo, Google or StackOverflow website.
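They are nothing more than a Start-Process call with a search URL:

function Find-DuckDuckGo ([string]$Query) {
    Start-Process "https://duckduckgo.com/?q=$([uri]::EscapeDataString($Query))"
}

function Find-StackOverflow ([string]$Query) {
    Start-Process "https://stackoverflow.com/search?q=$([uri]::EscapeDataString($Query))"
}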

Get loaded assemblies

I’m one of those people that wants to see what is being loaded and in some cases that can help debug certain problems in code.

Running that script becomes tedious so I created a little function to get all the assemblies by a certain name.
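The function boils down to something like this:

function Get-LoadedAssembly ([string]$Name = '') {
    # Return the assemblies loaded in the current session, filtered by (partial) name
    # Example: Get-LoadedAssembly -Name 'SqlServer'
    [System.AppDomain]::CurrentDomain.GetAssemblies() |
        Where-Object { $_.FullName -like "*$Name*" } |
        Sort-Object -Property FullName
}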

The profile

My PowerShell profile looks like this:

That’s about it

I have a lot more life hacks than just the PowerShell profile, but this is one that does its work without me noticing when I start the console or VS Code, and it helps me through the day.

Take advantage of the profile and make your life easier.

Searching For SMO Objects With Certain Properties


The problem

In some situations I want to search through lots of objects to look for certain properties in SMO (SQL Server Management Objects).

That was the case here: I wanted to know all the different objects that have a property called “Schema”.

But what do you do with all those different properties and methods we could look up? I mean, there are hundreds of objects in there and each of them has many methods and properties.

Getting the objects

Counting everything we got back, we end up with 284 items. Going through each of them by hand is not going to work.

The first thing we have to do is filter out all the properties that are actual objects. We want to exclude all the properties that would return values like boolean, string etc.

Let’s change the object selection

That only leaves us with 82 objects which makes things a lot easier.

Now for the last part: we’ll iterate through the objects, get their properties and check for the name “Schema”.
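The scripts themselves are not reproduced here, but a hedged sketch of the whole approach looks like this (the instance name is a placeholder):

# Connect and grab an SMO database object to inspect
$server = Connect-DbaInstance -SqlInstance 'localhost'
$database = $server.Databases['master']

# All properties of the database object (roughly the 284 mentioned above)
$allProperties = $database | Get-Member -MemberType Property

# Keep only the properties that are themselves SMO objects (the 82 that remain)
$smoProperties = $allProperties | Where-Object { $_.Definition -like 'Microsoft.SqlServer.Management.Smo.*' }

# Check every SMO object for a property called "Schema"
foreach ($property in $smoProperties) {
    $object = $database.($property.Name)

    if ($object | Get-Member -MemberType Property -Name Schema -ErrorAction SilentlyContinue) {
        $property.Name
    }
}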

These are the objects that have that property:

  1. ExtendedStoredProcedures
  2. SecurityPolicies
  3. Sequences
  4. StoredProcedures
  5. Tables
  6. UserDefinedFunctions
  7. UserDefinedTableTypes
  8. Views

Cleaning up the script

I can’t help myself: I always want my scripts to have parameters and some error handling in them.

The script uses the Connect-DbaInstance command from dbatools.

The end result:

Just run the command like this
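A hypothetical call could look like this; the script name and parameter names below are placeholders, the real ones are in the repository linked below.

.\Get-SmoObjectWithProperty.ps1 -SqlInstance 'localhost' -PropertyName 'Schema'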

Making it public

For anyone who wants to do something similar, here is the code:

https://github.com/sanderstad/SMOProperties

 

T-SQL Tuesday #122 – Imposter syndrome


My T-SQL contribution for this month discusses imposter syndrome.

This month’s T-SQL Tuesday is hosted by Jon Shaulis. Jon invites us all to write about when we have seen, experienced or overcome imposter syndrome.

You can read more about the invite in detail by clicking on the T-SQL Tuesday logo.

 

My Experience

I’ve had my fair share of experiences with imposter syndrome in my career.

My first time was when I first went on SQL Cruise, now called Tech Outbound, and I had the privilege to meet people like Aaron Bertrand, Grant Fritchey, Kevin Kline etc.

I remember walking up to the group and not knowing how to react to them. These were the people whose books I had read, whose articles had helped me in my career. How do you talk to people you idolize?

The good thing though, and now that I’m more involved in the community I see it happening, is that they’re just people like you and me. I was welcomed in the group like one of them and I am still honored to call them my friends.

They told me that I should not put them on a pedestal because I would know a lot of things they would not know. At first I thought that was just to make it easier on me, but during the trip I was actually able to teach people things I knew.

That made me think about the experience and knowledge I had gained during my career, and I started listing it all. That was the point where I decided I wanted to present sessions at conferences, which changed my life.

As the years passed, the imposter syndrome showed up less frequently than before. I still think that some people are way more experienced than I am and I have big respect for them. The imposter syndrome has been replaced with respect for those individuals for their contributions to the field and the community.

Some advice to get you going

If you experience the imposter syndrome, don’t be intimidated. Do not compare yourself to others, but compare yourself to the person you were yesterday. In the end, be humble because that’s what will make you go the furthest.

 

Use Azure To Store SQL Server Backups Offsite


You always think your environment is setup correctly and that you’re able to recover in case of a disaster. You make backups, test your backups, setup DR solutions and in the end test the DR plan (very important).

But have you ever considered a situation where all your data is unusable? If you get infected with ransomware, and the trojan gets hold of your backups, all your precautions and preparations have been for nothing.

A solution for this would be to use Azure to store SQL Server backups offsite. That way at least your backup files will not be easily infected and encrypted and you will at least have your data.

Thanks to Stuart Moore for pointing me in the right direction.

Possible Solutions

Directly Backup to Azure Blob Storage

Since SQL Server 2012 SP1 CU2, you can now write SQL Server backups directly to the Azure Blob storage service. This is very convenient when you directly want to save your backups offsite.

To do this, instead of using a path, you back up to a URL, which would look similar to this:
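A hedged sketch (the storage account, container and credential name are placeholders):

$query = @"
BACKUP DATABASE [AdventureWorks]
TO URL = 'https://mystorageaccount.blob.core.windows.net/sqlbackup/AdventureWorks.bak'
-- With a SAS-based credential (named after the container URL) you leave out WITH CREDENTIAL
WITH CREDENTIAL = 'AzureBackupCredential', COMPRESSION, STATS = 10;
"@

Invoke-DbaQuery -SqlInstance 'SQL01' -Database 'master' -Query $query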

Ola Hallengren’s Backup Solution

The SQL Server backup solution Ola Hallengren has created also supports this feature. You specify a URL and a credential to set up the connection.

An example of the command would look like this:
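A hedged sketch; the URL and credential name are placeholders, and the parameter names are the @URL and @Credential options of the DatabaseBackup procedure as I remember them, so verify against the current documentation:

$query = @"
EXECUTE dbo.DatabaseBackup
    @Databases  = 'USER_DATABASES',
    @URL        = 'https://mystorageaccount.blob.core.windows.net/sqlbackup',
    @Credential = 'AzureBackupCredential',
    @BackupType = 'FULL',
    @Compress   = 'Y';
"@

Invoke-DbaQuery -SqlInstance 'SQL01' -Database 'master' -Query $query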

Azure AzCopy

Another tool we can use to write our backups to Azure Blob storage is the command-line utility AzCopy. The utility is free and can be downloaded from here.

The advantage of this tool is that it can be used next to any other tool that is used to create the backups.

In most situations we back up files to a local disk or a network location. With the direct backup and with Ola Hallengren’s solution you have to choose whether to back up to the file system or to Azure Blob storage.

Setting up the solution

In my ideal solution I would like to do both: back up the databases to the local file system or network location and copy the files offsite.

To have all the flexibility and the security of the offsite backups I want one job to do all the work.

In normal circumstances I would use my go-to hammer and script everything in PowerShell. Although that’s totally possible, our database servers are set up with Ola Hallengren’s backup solution to make the backups.

To accomplish my solution I want to start another process to copy the files right after the backup job step successfully completes.

Preparations

Most of the scripting will be done in PowerShell for creating the storage account, the container and getting the access key.

Create the storage account

In addition to the storage account you can create one or more containers to hold your backups. In my case I created a container called “sqlbackup”, but that’s not necessary.
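A hedged sketch with the Az PowerShell module; the resource group, account name and region are placeholders:

$resourceGroup = 'rg-sqlbackups'
$accountName   = 'mysqlbackupstorage'

# Resource group and storage account
New-AzResourceGroup -Name $resourceGroup -Location 'westeurope'
$storageAccount = New-AzStorageAccount -ResourceGroupName $resourceGroup -Name $accountName `
    -Location 'westeurope' -SkuName 'Standard_LRS'

# Container that will hold the backup files
New-AzStorageContainer -Name 'sqlbackup' -Context $storageAccount.Context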

Get access to the storage account

Each storage account has two access keys which gives a resource the ability to access it.

Although very handy, these keys give too many privileges to the resource that wants to access the storage account.

Instead, you can create a shared access signature (SAS) that lets you specify the privileges more granularly, including the services, resource types, permissions and even the expiration time.

Select the proper permission, set the expiration and hit the “Generate SAS…” button.

This will generate the connection string

We will use the “SAS token” in the next step
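If you prefer to script it instead of clicking through the portal, a hedged sketch with the Az module looks like this (the permissions and expiration are examples):

$storageAccount = Get-AzStorageAccount -ResourceGroupName 'rg-sqlbackups' -Name 'mysqlbackupstorage'

# A SAS token limited to the blob service, valid for one year
New-AzStorageAccountSASToken -Service Blob -ResourceType Service, Container, Object `
    -Permission 'rwl' -ExpiryTime (Get-Date).AddYears(1) -Context $storageAccount.Context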

Create the job step

You can use the example code below regardless of the application used to execute “AzCopy.exe”.

In my case I wanted to use a SQL Server Agent job to do all the work. I scheduled the job to run every 30 minutes.

Make sure that the SQL Server Agent service account has access to the location of AzCopy.exe, with at least read and execute permissions.

Create a new job step of the type “Operating system (CmdExec)”.

The command

An example
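A hedged example using the AzCopy v8 syntax; the paths, account and SAS token are placeholders, so check the AzCopy documentation for the exact flags of your version:

# /S copies subfolders, /XO skips files that are older than the destination, /Y suppresses prompts
& 'C:\Program Files (x86)\Microsoft SDKs\Azure\AzCopy\AzCopy.exe' `
    /Source:"D:\Backup" `
    /Dest:"https://mystorageaccount.blob.core.windows.net/sqlbackup" `
    /DestSAS:"<your SAS token>" `
    /S /XO /Y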

Some other options

In my case I wanted to separate the full backup files from the log files. To do that we can apply the “/Pattern” option. The code below copies only the “.bak” files.
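The same command with the pattern added (again a hedged sketch):

& 'C:\Program Files (x86)\Microsoft SDKs\Azure\AzCopy\AzCopy.exe' `
    /Source:"D:\Backup" `
    /Dest:"https://mystorageaccount.blob.core.windows.net/sqlbackup" `
    /DestSAS:"<your SAS token>" `
    /Pattern:"*.bak" /S /XO /Y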

 

This concludes the Azure BLOB storage setup to copy our backup files off site.

I hope you enjoyed this and maybe this comes in handy in your daily work.

T-SQL Tuesday #116: Why adopt SQL Server on Linux


My T-SQL contribution for this month discusses why you should consider adopting SQL Server on Linux.

This month’s T-SQL Tuesday is hosted by Tracy Boggiano. Tracy invites us all to write about what we think everyone should know when working with SQL Server on Linux, or anything else related to SQL running on Linux.

You can read more about the invite in detail by clicking on the T-SQL Tuesday logo on the left.

I have been working with Linux on and off for about 20 years now.

The first time I got in contact with Linux was when RedHat released version 5 of their distribution back in 1997, and I fell in love with it. For the first time I was able to do things outside of a GUI.

I must say that back then it was kind of hard to update Linux with a new kernel. I remember spending hours and hours compiling new kernels, crossing my fingers that I had done it right and that it wouldn’t crash my entire server.

Nowadays this process is a lot easier and the distributions are so good that you don’t even have to think about it anymore. Installing a distribution is as easy as it gets and updating applications is a breeze.

I have been using Linux at college, at work places and at home for various reasons. I like to work in the command line interface and rarely use the GUI.

That’s probably the reason that I like PowerShell so much too.

Back to 2019

SQL Server on Linux is a fact. If you had told me 10 years ago that SQL Server on Linux would be a fact, I would’ve probably grinned and walked on.

But Microsoft has changed its perspective and is actively joining the open-source community.

Microsoft has mentioned recently that they have more Linux VMs running than Windows Server in Azure. That’s all because of the change in mindset to work with the administrators and enable them to use Linux.

Why adopt SQL Server on Linux

If you’re a Linux shop that’s going to be a no-brainer. Many companies are using this in production as we speak. It runs just as fast, maybe even faster, than the Windows version.

The installation of SQL Server on Linux is a matter of running a few small scripts and you have SQL Server running on Linux.

You can run SQL Server on Linux with Active Directory to do the authentication.

Another big thing that has been around for a while is Docker and the ability to run SQL Server on Linux in Docker.

If you haven’t seen Bob Ward’s session about SQL Server on Linux with containers you should visit his OneDrive and take a look at it. I went to this session at SQL Bits 2018 and was amazed by the ease of it.  He was able to switch between instances, update instances and drop them again in minutes.

I tried out his demos and was able to run multiple instances in a matter of minutes. No longer do I have to go through an entire installation of SQL Server on Windows. It just works!
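To give an idea, spinning up an instance takes a single command (a hedged sketch; the password and names are just examples):

# First instance on the default port
docker run -e 'ACCEPT_EULA=Y' -e 'SA_PASSWORD=Your$trongPassw0rd!' `
    -p 1433:1433 --name sql2019 `
    -d mcr.microsoft.com/mssql/server:2019-latest

# A second instance on another port runs side by side in seconds
docker run -e 'ACCEPT_EULA=Y' -e 'SA_PASSWORD=Your$trongPassw0rd!' `
    -p 14333:1433 --name sql2019-2 `
    -d mcr.microsoft.com/mssql/server:2019-latest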

This is a big advantage for the CI/CD pipeline you have been wanting to build with SQL Server where you can just start and stop instances of SQL Server whenever it’s needed.

The next level would be to run SQL Server on Linux in Kubernetes and have a production setup to make sure your instance of SQL Server is always running.

You can of course run containers on Windows, but I would advise running Docker on a Linux machine. I have had some trouble with Docker on Windows. The biggest reason was that I also use VMware Workstation on my laptop, which makes it impossible to run Docker on Windows, because you cannot have two hypervisors on a single machine.

Conclusion

I love SQL Server on Linux and this is probably the best thing that has happened with SQL Server for a long time.

We as a pro-Linux shop are looking into running SQL Server on Linux for our production environments. That’s a big thing, because we’ve been running SQL Server on Windows forever.

Microsoft has done a great job to make it very easy for us to implement it within our enterprises.

If you’re still hesitant about whether you should try it out, just take a look at all the articles that have been written about it and you’ll probably want to try it out for yourself.