It’s Sunday morning around 9:00 AM (I just can’t sleep in) and I find myself looking through e-mails, reading some interesting articles. I take a good sip of my big cup of coffee when my company cellphone rings.
Normally nobody calls me in the weekend so this had to be important otherwise Monday is going to be a bitch to the person calling me.
“Hi Sander, up already?”
It’s one of the junior DBA’s who had standby duty this weekend.
“Hi Steven, what’s up? I know you’re not calling to have a chat” I replied.
“I’m sorry to bother you but we’re having some weird issues with one of the production database servers and I can’t put my finger on it. Can you maybe take a look because the server performance is really bad and things seem to change”.
“OK, tell me which server it is and what you’ve already investigated and found.” I replied
“It’s dbserver01 and I looked at the performance reports from our monitoring tool and around 6:00 AM this morning the server started to create a lot of disk I/O, more than normal due to the baselines. The server has sufficient disk space, memory and backups all finished successfully.” Steven replied.
I was so pleased that we created these baselines about a month ago because otherwise we wouldn’t have known anything what this server did. I knew that during the weekend several big jobs would be running on this server but the last part Steven told me made the hair on the nape of my neck prickle. These settings don’t change suddenly and somebody or something must have initiated the change.
The first thing that pops in my mind when something doesn’t run like normal is “What changed?”.
I logged into the server and it was sooooo slow. It took about 3 minutes before I was fully logged in and got SSMS started (normally this takes less than a minute).
In the last month we created the baselines but we also documented all the settings of all the database servers. We did this by selecting all the values from the sys.configurations with this query
SELECT * FROM sys.configurations ORDER BY name ; GO
and save these values to an Excel sheet. This is not the best way to do it but we at least had something.
I did the same selection again and I found that some settings were changed and the one that caught my eye was the max memory had changed.
The jobs running on the server are SSIS packages executing an ETL process which costs a lot of memory at certain points. When the SSIS package doesn’t have enough memory all the data is swapped to disk and the process can take more than a day instead of a hour.
The next thing I wanted to know if anything changed in the databases. I looked properties of the database which was known to be queried a lot during the nightly jobs and something changed in the database and one of the changes was the “IsAutoShrink” property which was set to true.
OK. enough for the investigation the database and the server need to get back to the previous settings. I called Steven again.
“Steven, Sander here. Did we have any changes done on the production server in the last week? Did any of the other DBA’s change anything on the server?”
“No, no changes are planned on the server and nobody in their right mind would change anything on that server.” Steven replied.
“OK! I know enough and I will get back to you when I fixed the issues.”
The ETL process was deigned to be able to restart at any moment and continue where it was. That was a great win. I stopped the process, changed all the settings of the server and database back to the previous documented settings.
Now lets restart the processes and see what happens. Not to my surprise everything finished in time and the day was saved.
What or why initiated the changing settings?
As you may know, server and database settings don’t change that easily and you have to have elevated rights on the server and/or database to make it even possible.
I got that underbelly feeling and started asking myself who could’ve changed the settings. I didn’t suspect any automated script to be executed. I want to know if anything changes in the future and who’s initiates the change.
Since SQL Server 2008 it’s possible to create audits on server and database level where changes to can be logged to a file or windows server log. I didn’t want to use triggers or some other extensive kind of logging.
But what do I need to log because the audits can be quite extensive for several actions. I decided that the following changes needed to be logged:
- Changes in server settings
- Changes in database roles
- Changes in database permissions
- Changes in permissions of database objects
- Changes in schema permissions
- Changes of database principals
- Changes of database objects (create, alter or drop databases)
- Changes in schemas
- Changes in database ownership
- Changes in database object ownership
- Changes in schema ownership
- Changes in user passwords
The information about the groups can be found here
The audit it self is logged to a file. I wanted it to have a low impact on the server so I created a folder named “D:\_Audits” which would save the text files. I know this isn’t a secure way to save your audits and probably save it in the Security log of Windows. Unfortunately this server was a production server and when you want to save something in the security log it means you have to restart SQL Server due to some policy settings on the server.
Another reason that saving the files on disk was that nobody but me knew that the audits were set in place. A little security by obscurity.
The script for the audit:
USE [master] GO CREATE SERVER AUDIT [Audit] TO FILE ( FILEPATH = N'D:\_Audits\' ,MAXSIZE = 0 MB ,MAX_ROLLOVER_FILES = 2147483647 ,RESERVE_DISK_SPACE = OFF ) WITH ( QUEUE_DELAY = 1000 ,ON_FAILURE = CONTINUE ,AUDIT_GUID = '3779f882-f6f8-481a-8333-0f58dcc357ba' ) ALTER SERVER AUDIT [Audit] WITH (STATE = ON) GO
The script for the audit specification:
USE [master] GO CREATE SERVER AUDIT SPECIFICATION [ServerAuditSpec] FOR SERVER AUDIT [Audit] ADD (DATABASE_ROLE_MEMBER_CHANGE_GROUP), ADD (DATABASE_PERMISSION_CHANGE_GROUP), ADD (DATABASE_OBJECT_PERMISSION_CHANGE_GROUP), ADD (SCHEMA_OBJECT_PERMISSION_CHANGE_GROUP), ADD (DATABASE_PRINCIPAL_IMPERSONATION_GROUP), ADD (DATABASE_OBJECT_CHANGE_GROUP), ADD (DATABASE_PRINCIPAL_CHANGE_GROUP), ADD (SCHEMA_OBJECT_CHANGE_GROUP), ADD (SERVER_OPERATION_GROUP), ADD (DATABASE_OWNERSHIP_CHANGE_GROUP), ADD (DATABASE_OBJECT_OWNERSHIP_CHANGE_GROUP), ADD (SCHEMA_OBJECT_OWNERSHIP_CHANGE_GROUP), ADD (USER_CHANGE_PASSWORD_GROUP) WITH (STATE = ON) GO
By creating these objects I managed to log anything that would change in a database. To test this I created a database named “MyDB” and executed the following script:
USE [myDB] GO CREATE USER [NETWORK\User1] FOR LOGIN [NETWORK\User1] ALTER ROLE [db_owner] ADD MEMBER [NETWORK\User1] GO
This must have done something in the audits. To view the results right clicked the audit and clicked on “View Audit Logs”.
This window will give all the audits that have happened sorted in time.
This gave me the following result:
As you can see, the audits picked up on the change and logged it.
This will make sure that on a database level nothing will change without me knowing who executed it and when. It even gives back the command that was executed.
A few days later the same issues occurred on the server but this time I was ready. I went to take a look at the audit logs and “Ladies and gentlemen, we’ve got him”. The audits logged the changes made to the server memory and these were not logical or planned.
Hope this helps you out and I appreciate any comment.