As DBA’s one of the jobs is making sure all the databases and database servers are working properly and you have a controlled environment. Once in a while you have skeleton in the closet where suddenly a database server comes around and you’re scratching your head where it came from.
Your manager (and maybe your colleagues) expect you to support it because you’re the DBA. You don’t know the server, your colleague doesn’t (really) know the server. So now what. Are you going to take responsibility for everything on the database server of the database itself when you don’t know anything about it?
I have had this so many times that I decided not to support a database (or entire database server) until I did thorough scan of the environment to see if the database server is up-to-date, the settings in the server are set properly etc.
I use this checklist to make sure I cover all the aspects of the server:
- Is the server a virtual of physical server?
- What version of Windows (hopefully Server edition) is being used?
- What version and edition of SQL Server is being used?
- When was the last time the Windows OS was updated?
- When was the last time the SQL server instance was updated?
- How much memory is in the system
- How is the storage configured?
- What disks are being used?
- What kind of disks are being used?
- How many cores are present?
- What services of SQL server are running?
- What accounts are being used for the services?
- What logins are present in SQL Server and what are their permission?
- And maybe the most improtant one, is there a SLA on this server?
- What’s the size of the database?
- When was the last backup taken?
- When was the last time an integrity check was executed
- How are the files set up?
- What compatibility setting is being used?
- What are the options in the database, i.e. auto close, updates of statistics, page verification etc?
- What users are present in the database and what are their permissions?
This is just a small list of the things I’d like to know about a database or database server. Based on that information I can make a decision either to support the current state of the database or server or not to support it (support it best effort) until I’ve gotten the chance to change settings I don’t approve.
Normally you’d get a discussion why you’re not fully supporting the database or database server. People know you’re the DBA and that it’s your job to support the database servers.
Unfortunately we’re not living in a perfect world and as soon as shit hits the fan, and you’ve taken full responsibility for a database that’s not up to your standards, then you have explain yourself and don’t say the database is not up to your standards because you should’ve have said that from the beginning.
In these situations I would put all the findings in writing and inform the IT manager of the “new” database or database server and what to do to support it. This may sound political but you’ve got to protect yourself. I’ve seen people get lower performance reviews because they supported items that were not up to standards and had gone haywire.
One of the situations I had was a database server installed by a system administrator working at the software vendor. No documentation about the installation, configuration, databases, logins etc. It was all installed during a project and now the project was finished, maintenance was the responsibility of the DBA’s. Logins used as service accounts for the application running under sysadmin privileges, no min or max memory set, SQL Server service running with the local system account etc etc.
I wasn’t going to fully support this server and mentioned this to the IT manager in writing and personally. It wasn’t that the server was buggy but it wasn’t up to standards. After a while we had an issue with the server. Turned out that the service account of the application had destroyed the system database because a consultant forgot to include the “USE [database]” clause in a script and because it had sysadmin privileges, the default database was “master”, it dropped and truncated a lot of tables. Besides the fact that the consultant did all that DDL with a dynamic query, if the privileges were set correctly, the login would never been able to corrupt the system databases.
It took us many hours to get everything back up and I was called into an office with some managers who wanted to know why the server went down the way it did. The managers were angry because this incident cost the business a lot of money, a full department couldn’t work and foremost customers couldn’t be helped.
All the findings were put in changes, the changes were scheduled in a low priority by the business because the chance of things going down was pretty slim.
I mentioned the findings and the correspondence about the low priority changes for the findings. If I hadn’t done that I would probably have been fired because of the impact the incident had on the company.
It’s not about playing the blame game but if it’s not in your power to change things that should be changed than you have to protect yourself. At the end of the day you want to the best job you can do and have a good night sleep without the thought in the back of your head that hell could break loose at any moment.
If you have any stories like this please share them in the comments. I’d like to hear if anybody had the same experience I had and maybe it helps someone else to avoid such a situation.