Friday, March 9, 2012

Everything that will cause a queue to shut down

I have a queue that, after running fine for several days will mysteriously turn off. It doesn't seem to be related to a poison message because I can restart the queue and processing resumes just fine. What are all the scenarios that would cause a queue to turn itself off, so I can 1) take preemptive action to prevent it from happening in the first place and 2) respond appropriately when it occurs.

Also, how to properly setup and verify that the BROKER_QUEUE_DISABLED is working properly. This is the SQL that I have so far, but is there a more direct way to raise the event other than writing an activated stored procedure that rolls back 5 times?

CREATE QUEUE [EventNotificationsQueue];

GO

CREATE SERVICE [EventNotificationsService]

ON QUEUE [EventNotificationsQueue]

([http://schemas.microsoft.com/SQL/Notifications/PostEventNotification]);

GO

CREATE EVENT NOTIFICATION [QueueDisabled]

ON QUEUE [MyQueue]

FOR BROKER_QUEUE_DISABLED

TO SERVICE 'EventNotificationsService', 'MyDatabase';

GO

I certainly can't give you that list... Maybe Remus.

However, I did have something similar happen to me. It turned out it was my poison message handling mechanism.

Here's what I did...

Enable Queue (not activate)

Remove top message (the poison message)

write poison message to a log table

Re-Enable Queue & Activate

Works great, just don't do it all in the same transaction. I pulled the Re-Enable and & activate outside of the trans and it worked fine. I was able to track this down from error message written to the SQL Server Log. Looking at the SQL Server log may help you.

Gary

|||

The only thing that can take a queue offline is five consecutive rollbacks of a RECEIVE

As about testing the event notification, you can simply execute this five times to trigger it:

BEGIN TRANSACTION

RECEIVE * FROM ...

ROLLBACK;

|||

OK, then is this pattern incrorrect or susceptible to unnecessary rollbacks? I suspect that something other than a poison message is causing the rollback because I can restart the queue with existing messages and it runs just fine.

So, for example, if the queue in the following sample does not receive a message for 2+ seconds, will it take the queue offline? Maybe I need to decrease my timeout OR COMMIT instead of ROLLBACK if rowcount = 0?

WHILE (1=1)

BEGIN

BEGIN TRANSACTION;

SAVE TRANSACTION rollbackReceive ;

WAITFOR(RECEIVE TOP(5)

conversation_handle,

message_type_name,

message_body

FROM [MyQueue]

INTO @.tableMessages), TIMEOUT 500;

IF (@.@.ROWCOUNT = 0)

BEGIN

ROLLBACK TRANSACTION;

BREAK;

END

|||

The ROLLBACK of an empty RECEIVE will no trigger the poison message, only ROLLBACK of RECEIVEs that returned messages.

You need to monitor the execution of the stored proc to see why is rolling back the messages. The fact that later it succeeds indicates that the cause of the rollbacks is transient, but does not indicate that no rollbacks happen.

|||

why not move your transaction savepoint following by the Receive Statement. and alway rollback to the savepoint, this can avoid you to shutdown the queue.

WAITFOR(RECEIVE TOP(5)

conversation_handle,

message_type_name,

message_body

FROM [MyQueue]

INTO @.tableMessages), TIMEOUT 500;

SAVE TRANSACTION rollbackReceive ;

suggeste, nevel rollbek the receive statement, you can log the error message and then process it. also you can use try/Catch block.|||

Although this makes some sense, it goes against all the other recommendations I have seen from Remus and others. What about cases where you really do need to rollback the receive? In your example, I assume you would have some counter to determine how many times you have rolled back, in order to perform some action? It seems there should be some happy medium between the two approaches.

|||

I think it really depends on what you are trying to do. For the highest possible performance you must avoid rollbacks at (almost) all cost especially when you are working with large batches of messages. That said, we do have some stuff that if the message is not processed, there is no point in processing any further so a rollback is permitted. If your system is pretty mature and you are fairly certain if you rollback the chance of repeat processing of the message will fail why not just go ahead and do whatever you would do if the recieve rolled back 5 times, poison message got activated. You can always get a timeout though and at times a retry will be successful. It may be possible (although more complicated) to retry without an expensive rollback.

Gary

No comments:

Post a Comment