SQL-Server Temp Database Space Issue
Every SQL-Server professional knows that the Master database is the
brain of SQL-Server and the Temp database is the spinal cord of SQL-Server.
Once it stops, everything starts failing.
In mid-Aug-2021, We faced an interesting problem. The temp database was
getting filled every week. It was failing EMR (Data Warehouse and Application)
Jobs and Application. My team was under tremendous pressure to get it to fix.
One thing I would like to mention is that there is a difference in size
allocation and temp data size use.
Below are the points we had tried before the real fix: -
·
We had monitored the Jobs, like which jobs were filling the database.
·
We had Checked and Tweaked heavy queries with the help of Query-Store.
·
Temp database captures the space on the server till it needs it and then
Stops. So, we had increased the Temp database size.
·
We had started to check the SQL server configuration. Everything was
correct at the configuration level. Like 250 GB drive was allocated to Temp
database only, having multiple logs file and auto increase property set to
true.
The above changes gave us some relaxation. But it had not stopped
completely. So after four days, We again got the "full temp database"
issue.
Then we have checked all the databases configuration for the recovery
model for all our databases was set to "FULL". Because the client
needed the "In Time Recovery”. So, we had thought this might be this can
be causing the issue.
Although the back of the mind, we were clear that the recovery model is
only related to a specific database transaction log and has no relation to the
temp database. But with a valid reason, that reporting database and data hub
can be regenerated again using Jobs and a day-old backup. So, we had set the
Simple Recovery model to Datahub and Reporting Databases. But it had not fixed
the issue.
Then we had checked all the open sleeping session, which were not
allowing temp database to release the space using below query.
SELECT DES.session_id AS [SESSION ID],Db_name(DDSSU.database_id) AS
[DATABASE Name],host_name AS [System Name],program_name AS [Program
Name], login_name AS [USER Name],status,total_elapsed_time AS [Elapsed
TIME (in milisec)], ( memory_usage * 8 ) AS [Memory USAGE (in KB)],CASE
is_user_process WHEN 1 THEN 'user session' WHEN 0 THEN 'system session' END AS
[SESSION Type], row_count AS [ROW COUNT] FROM
tempdb.sys.dm_db_session_space_usage AS DDSSU INNER JOIN
sys.dm_exec_sessions AS DES ON DDSSU.session_id = DES.session_id ORDER BY
7 DESC
These screenshots are
not real, they are just for a reference purpose to show query output.
From query output, We were able to verify each session query. All queries were of short duration and mostly related to application sessions or user executed small queries. So, nothing was blocking the temp database from release space.
Query to Check session query
FROM sys.dm_exec_connections
CROSS APPLY sys.dm_exec_sql_text(most_recent_sql_handle)
WHERE session_id = (53)
Now we had started checking what type of objects were taking space on
temp database. using below query
We found that two spaces were filled (Internal objects and version store GB). When the application had a huge load or jobs were running. After all the jobs were finished and the application load was settled down. The internal objects released their space back to the server. But row version store not.
Then we have checked the default Isolation Level of every database and
found default Isolation was Snapshot Isolation. Now, this was a real problem.
Because we were aware that EMR Jobs were moving large data here and there and
keep the row version through snapshot isolation didn't make sense. Also,
Snapshot Isolation is good when you have a multi-user environment. However, on
Datahub and Reporting databases most data is not changeable after the job
complete. So, we have changed the Isolation level of reporting and datahub
databases to "Read Committed". This change was in transition mode for
five hours. After that, we had shrunk the temp database.
Now all Jobs are running fine and the application databases are still
“Full recovery” model for In Time Recovery. So, our changes were no code change
and fixed the Temp Database Issue.
It is wonderful learning, So thought to share it with the
community.
© 2021 Chinmay Kant Jain. All rights reserved.
Comments
Post a Comment