Why? And how can we tell in advance whether a query will cross the threshold into being a long-running query? Or, how can we prevent it?
The system is Windows XP Pro with 4GB RAM (/3GB switch), and SQL Server 2005 Standard. Log files, swap files, and data files are on separate drives. The system is dedicated to SQL Server. No other queries are running at the same time. The database is in Simple recovery mode. Each table is a few GB with 60 million rows.
An example problem query is: (updating fewer than 10 bytes)
UPDATE bigtable
SET bigtable.custage = scores.custage, bigtable.custscore = scores.custscore
FROM bigtable
JOIN scores ON bigtable.custid = scores.custid
In this case, each table has 60 million rows. 'custid' is a sequential, unique integer. The SCORES table is clustered on 'custid' and is 1.5GB in size. BIGTABLE has an index on 'custid' and is 6GB in size. There is a one-to-one match between the tables on 'custid', but it is not enforced. The SCORES table was created by exporting a few fields (but all 60 million records) from BIGTABLE, updating the values in a separate program, then importing them back into SQL Server as the SCORES table.
The first time this query was run, we stopped it after it ran 16 hours. When we broke bigtable up into 10-million-record chunks (big1, big2, big3, ..., big6), each update took only 15 minutes, for 90 minutes total.
* How can we tell in advance that the full update would take more than a few hours?
* Why is it taking SO MUCH LONGER than in smaller chunks?
* When a query is taking that long to run, is there any way to tell where in the plan it is?
* What should we do differently?
Thanks for any help; this is a real head scratcher for us.
|||
Resources on a server are finite.
Each statement needs a certain amount of resources in order to complete.
Each server has a sort of 'magic limit' where resources become scarce. This limit is largely dictated by hardware (i.e. disks, amount of RAM, etc.). What usually happens when this limit is crossed is that the server can no longer handle its work 'gracefully', but has to do a lot of management work just to continue - i.e. it starts swapping like crazy, appears to sleep, etc.
The scenario you describe suggests exactly that: the limit is crossed and the server isn't capable of handling the full 60-million-row update all at once in a graceful manner.
The solution is to 'chunk' up the work into smaller portions, just like you've done, assuming we stay on the same hardware. That way you feed the system smaller bites, and it doesn't choke.
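Just to illustrate (a rough sketch only - the table and column names are taken from your post, and it assumes custid really is a sequential, unique integer), the same chunking can be done inside a single script by walking custid ranges instead of physically splitting the table:
-- Sketch: chunk the update by custid range rather than by splitting bigtable.
-- Assumes custid is a sequential, unique integer, as described above.
DECLARE @lo INT, @hi INT, @chunk INT;
SET @chunk = 10000000;   -- 10 million rows per pass, like the manual big1..big6 split
SELECT @lo = MIN(custid), @hi = MAX(custid) FROM bigtable;
WHILE @lo <= @hi
BEGIN
    UPDATE b
    SET    b.custage   = s.custage,
           b.custscore = s.custscore
    FROM   bigtable AS b
    JOIN   scores   AS s ON b.custid = s.custid
    WHERE  b.custid >= @lo AND b.custid < @lo + @chunk;
    SET @lo = @lo + @chunk;
END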
How to tell in advance?
I don't think you can in any other way than 'gut feel'. With time you get to know your particular system, and you 'feel' where the limits are.
Why it takes so much longer?
When you get into this situation, the extra time consumed by all the overhead (swapping and other management work) tends to grow far out of proportion to the actual work being done.
Not sure about how to tell where in the plan a long running query is.. sorry =:o/
There's not much to do differently.
You have to manage your workload in ways that your hardware can handle. If the load is too great, partitioning it into smaller chunks like you've done is one way to solve it.
/Kenneth
|||Surely there's gotta be something? We have to do mostly ad hoc processing, so many of our queries will only be used a couple of times, or even just once. We're often on a tight schedule, and we can't handle too many surprises like having to recover from a query that suddenly went from an hour to running overnight without finishing. Maybe some way of reading the query plan I/O and CPU costs?
Maybe a way of rewriting update queries to force the updates to run in small batches, without manually breaking the tables up into magic-number chunks - similar in intent to how we break bcp imports into ten-thousand-row chunks?
Maybe a database setting that will automatically stop and roll back a query that crosses a time threshold, so that if an overnight query enters thrash-land, at least it'll be back to a stable point in the morning?
Ideally, the server would tell us, "Hey, this will be really hard for me-- please make some adjustments and try again." and just STOP instead of thrashing, or whatever it's doing. :)
|||
How many processors? I had an issue with a query that did this kind of serious thrashing and had to set MAXDOP 1 and do a bit of tuning. Running this query:
select der.wait_type, der.wait_time,
       der.status as requestStatus,
       des.login_name,
       cast(db_name(der.database_id) as varchar(30)) as databaseName,
       des.program_name,
       execText.text as objectText,
       case when der.statement_end_offset = -1 then '--see objectText--'
            else SUBSTRING(execText.text, der.statement_start_offset/2 + 1,
                 (der.statement_end_offset - der.statement_start_offset)/2)
       end AS currentExecutingCommand
from sys.dm_exec_sessions des
join sys.dm_exec_requests as der
     on der.session_id = des.session_id
cross apply sys.dm_exec_sql_text(der.sql_handle) as execText
where des.session_id <> @@SPID  --eliminate the current connection
will give you an idea of what it is doing (the really cool thing is that it will show you the exact statement that is being executed). Look especially at the wait_type. I had just slews of CXPACKET waits, which are a parallelism (inter-thread communication) thing.
Like someone else said, there is a sweet spot for log writes, and you have probably gone past it. First things first: check the plan, and take a look at the wait type and the statement being executed (some of the wait times I was seeing were in millions of milliseconds - simply nuts! Of course, I was testing on a test server that isn't close to my prod server). I would guess you will hit something that is taking a long time pretty quickly.
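If the waits do turn out to be CXPACKET (parallelism), the MAXDOP 1 fix can be tried on just this statement as a query hint instead of changing the server-wide setting. A sketch, reusing the table names from the original post:
-- Sketch: force this one update to run single-threaded to rule out
-- parallelism (CXPACKET) thrashing.
UPDATE bigtable
SET    bigtable.custage   = scores.custage,
       bigtable.custscore = scores.custscore
FROM   bigtable
JOIN   scores ON bigtable.custid = scores.custid
OPTION (MAXDOP 1);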
>>There is a one-to-one match between the tables on 'custid', but not enforced. <<
Consider enforcing it with UNIQUE constraints. This could easily do wonders for the query, especially if the data isn't overly volatile.
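A sketch of what that might look like (the constraint names here are made up, and this assumes custid is NOT NULL and truly has no duplicates in either table):
-- Sketch: make the one-to-one match explicit to the optimizer.
-- Constraint names are invented; bigtable's custid may already be covered
-- by its existing index or a primary key.
ALTER TABLE scores   ADD CONSTRAINT UQ_scores_custid   UNIQUE (custid);
ALTER TABLE bigtable ADD CONSTRAINT UQ_bigtable_custid UNIQUE (custid);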
|||You can batch the updates even if you have a single big table. Updating, say, 10,000 rows at a time will reduce the logging and index maintenance overhead and provide better performance. With SQL Server 2005, you can use the TOP clause in the UPDATE statement to modify N rows at a time; with older versions of SQL Server, you can use SET ROWCOUNT to limit the rows that are affected (see the sketch after this list). Other things to consider:
1. Dropping unnecessary indexes on the big table
2. Eliminate trigger logic, if any, since that can slow down the updates
3. Batch updates as mentioned above
4. For joined updates, make sure you have indexes on the primary join conditions
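A rough sketch of the TOP-based batching (SQL Server 2005 syntax; table and column names are taken from the original query, and the WHERE filter that keeps already-updated rows out of later batches is an assumption - it only works if the new values actually differ and are not NULL; otherwise batch on custid ranges instead):
-- Sketch: update in 10,000-row batches with UPDATE TOP.
-- The "not already updated" filter is an assumption - it requires the new
-- values to really differ and to be non-NULL. If that doesn't hold,
-- batch on custid ranges instead.
DECLARE @rows INT;
SET @rows = 1;
WHILE @rows > 0
BEGIN
    UPDATE TOP (10000) b
    SET    b.custage   = s.custage,
           b.custscore = s.custscore
    FROM   bigtable AS b
    JOIN   scores   AS s ON b.custid = s.custid
    WHERE  b.custage <> s.custage OR b.custscore <> s.custscore;
    SET @rows = @@ROWCOUNT;
END
With the database in Simple recovery mode, the log space from each committed batch can be reused after a checkpoint, which is what keeps the log from growing the way it does under one giant transaction.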
|||2 processors, with Hyperthreading left On.|||
Not sure if this might help or not:
I am guessing that your performance issues are related to the size of your monolithic transaction - that updating every row of bigtable at once is causing rollback/transaction log issues.
I might experiment with flipping the driver around so that the small table drives your update.
Remove the index on the small table altogether, or alternatively create a concatenated index covering all three columns. The reason: if we know we are going to load the entire small table to iterate over it, then let's not load an index as well. Alternatively, if you need the index, then let's use one that has all the data we need, in the hope that the index can satisfy the request without loading the table.
Now, let's iterate over the records of the small table to update bigtable. At this point we could (and might prefer to) treat the small table as a disconnected recordset.
Some pseudo-code to think about:
==============
Dim i As Integer = 0
For Each myRecord In mySmallTable_RecordSet
    ' one small update per record of the small table
    Update bigtable Set
        bigtable.custage = myRecord.custage,
        bigtable.custscore = myRecord.custscore
    Where bigtable.custid = myRecord.custid;

    i = i + 1
    If i = 10 Then      ' commit every 10 updates (tune this batch size)
        Commit;
        i = 0
    End If
Next
Commit;                 ' pick up the final partial batch
==============
Of course, trading a monolithic transaction for 60 million bite-sized ones might potentially make the problem worse, not better, but it is food for thought. You will also probably want to use a parameterized stored procedure to handle the update (see the sketch below), but I wanted to give you an overview of what I was thinking about.
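A sketch of that single-row procedure (the procedure name and parameter types are guesses; match them to the real column definitions in bigtable):
-- Sketch only: the procedure name and parameter types are invented for illustration.
CREATE PROCEDURE dbo.usp_UpdateCustScore
    @custid    INT,
    @custage   INT,
    @custscore INT
AS
BEGIN
    UPDATE bigtable
    SET    custage   = @custage,
           custscore = @custscore
    WHERE  custid = @custid;
END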
|||Of course, trading a monolithic transaction for 60 million bite-sized ones might potentially make the problem worse
Neither extreme is likely to be good, but the bite-sized ones are likely to be worse. For performance you want to size the transactions so that you make the most of each disk write to the log, since every transaction requires a write to the log. The more you do at a time the faster it goes, up to the point where you run out of resources to cache the writes - then things slow down fast.
I highlighted performance because, for consistency, you need to use transactions as needed to protect your data. If 10 million rows failed but the other 50 million succeeded, are you hosed?
A last thought: if you have to update many, many rows like this, there is a good chance that you need to do a bit of redesign. There are probably easier ways to do this sort of thing (possibly using data warehousing/OLAP techniques).
|||Ya, and the point of the "i" counter in my pseudo-code is to avoid a commit after every update. In my p-code I was committing every 10 updates, but that would probably be better after every 10k updates.
As you say, commits are relatively expensive.
Consistency should always be a concern, but in this case the poster is transferring the source of truth from the small table back to the big table.
If the update had been bigtable.field = bigtable.field + smalltable.field then there would be a much bigger issue with performing the update with many transactions rather than 1 monolithic one.