Thanks, see inline...
Hi David,
do you have a jconsole to observe the two nodes?
I don't have a jconsole, but can I get the same information using the admin messages? I will check.
If yes, please check the number of subscriptions node A has forwarded to node B
(look into node B and check the number of subscriptions of client A) during such a case.
In case the subscribeQos has set
<multiSubscribe>true</multiSubscribe>
(which is the default), it could be that the subscriptions multiplied during small connection errors and reconnects. This is just a guess. If that is the case, please set multiSubscribe to false.
I believe that we set all to false.
Is there a high CPU load during the 1001 message case?
No
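Marcel's guess about multiplied subscriptions can be illustrated with a small, purely hypothetical Java model (the class and method names are invented for illustration and are not part of the xmlBlaster API). It assumes that every reconnect re-sends the subscribe: with multiSubscribe true, each re-sent subscribe adds another subscription, so one published heartbeat yields that many callback entries; with `<multiSubscribe>false</multiSubscribe>` in the subscribe QoS, a repeated subscribe is assumed to reuse the existing one.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model, NOT the xmlBlaster API: a node keeps a list of
// subscription ids for one client and queues one callback entry per subscription.
public class MultiSubscribeModel {

    private final boolean multiSubscribe;
    private final List<Integer> subscriptions = new ArrayList<>();
    private int nextId = 1;

    MultiSubscribeModel(boolean multiSubscribe) {
        this.multiSubscribe = multiSubscribe;
    }

    // Called when the client (or a reconnecting cluster node) sends its subscribe.
    void subscribe() {
        if (multiSubscribe || subscriptions.isEmpty()) {
            subscriptions.add(nextId++); // a new, additional subscription
        }
        // with multiSubscribe=false a repeated subscribe reuses the existing one
    }

    // Number of callback-queue entries one published heartbeat produces.
    int entriesPerPublish() {
        return subscriptions.size();
    }

    // Assumption: each reconnect re-sends the original subscribe once.
    static int entriesAfterReconnects(boolean multiSubscribe, int reconnects) {
        MultiSubscribeModel node = new MultiSubscribeModel(multiSubscribe);
        node.subscribe(); // initial subscription
        for (int i = 0; i < reconnects; i++) {
            node.subscribe(); // subscribe re-sent after a reconnect
        }
        return node.entriesPerPublish();
    }

    public static void main(String[] args) {
        System.out.println(entriesAfterReconnects(true, 5));  // 6
        System.out.println(entriesAfterReconnects(false, 5)); // 1
    }
}
```

In this model a burst of short reconnects multiplies every later heartbeat at once, which would make the queue jump rather than grow steadily; whether xmlBlaster behaves this way is exactly what checking the subscription count on node B would confirm.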
Are the heartbeat messages persistent messages?
Yes, but they only live 30 seconds. At any given time there should be at most 2 in the history queue.
Was the client connected or offline during this message overflow?
No, the client was online.
Does your heartbeat have a unique id so that you can tell for sure if the same published message is cloned many times (try a peek on the callback queue with jconsole)?
No, but the content of the message has a timestamp, so I knew they were duplicates. Can the peek be done with the admin messages?
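Since the heartbeat carries no unique id, one way to quantify the cloning is to group the queued entries by the timestamp inside their content. The sketch below assumes a hypothetical payload layout `heartbeat ts=<millis>` and that the payloads have already been obtained somehow (for example from a queue peek); both are illustrative assumptions, not xmlBlaster facts.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: the payload format "heartbeat ts=<millis>" is an assumption.
public class DuplicateCounter {

    // Extracts the timestamp text after "ts=" (assumed payload layout).
    static String timestampOf(String payload) {
        int i = payload.indexOf("ts=");
        if (i < 0) {
            throw new IllegalArgumentException("no timestamp in: " + payload);
        }
        return payload.substring(i + 3).trim();
    }

    // Maps each heartbeat timestamp to how many queue entries carry it.
    static Map<String, Integer> copiesPerHeartbeat(List<String> queuedPayloads) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String p : queuedPayloads) {
            counts.merge(timestampOf(p), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> queue = List.of(
            "heartbeat ts=1000", "heartbeat ts=1000", "heartbeat ts=1000",
            "heartbeat ts=16000");
        System.out.println(copiesPerHeartbeat(queue)); // {1000=3, 16000=1}
    }
}
```

If every timestamp shows the same copy count, that points to a fixed fan-out (such as multiplied subscriptions) rather than random redelivery.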
We will try this in house; unfortunately, the problem nodes are in a production environment.
A final option is to use the current svn xmlBlaster and switch on the checkpoint logging
to get a better idea of what is going on.
And finally, it could be a problem with your client not taking the callback messages.
That could be, but what I don't see is the queue gradually growing. Instead, it appears to be full "all of a sudden".
Another idea: the callback queue contains only a reference to the message.
If the message expires, the message 'meat' is destroyed, but the reference remains in the queue
until it is looked at during delivery (and then thrown away as garbage). Michele, could this be?
Thanks, Marcel
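Marcel's expired-reference idea can be sketched as a queue of references with lazy removal. This is a hypothetical model, not xmlBlaster's internal queue: expiry only nulls the content, the reported size still counts the stale references, and delivery discards them one by one until it finds a live message. The 15 s period and 30 s lifetime come from the original report; the non-consuming client is an assumption for the demo.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical model, NOT xmlBlaster internals: the queue stores references;
// expiry drops the message content, but the reference stays until delivery sees it.
public class ExpiringRefQueue {

    static final class MsgRef {
        final long expiresAt;
        String content; // set to null once the message "meat" expires

        MsgRef(String content, long expiresAt) {
            this.content = content;
            this.expiresAt = expiresAt;
        }
    }

    private final Deque<MsgRef> queue = new ArrayDeque<>();

    void put(String content, long now, long lifetime) {
        queue.addLast(new MsgRef(content, now + lifetime));
    }

    // Expiry destroys only the content; the reference stays queued.
    void expire(long now) {
        for (MsgRef r : queue) {
            if (now >= r.expiresAt) {
                r.content = null;
            }
        }
    }

    int size() {
        return queue.size(); // counts stale references too
    }

    // Delivery discards stale references and returns the first live message, or null.
    String deliverNext() {
        while (!queue.isEmpty()) {
            MsgRef r = queue.pollFirst();
            if (r.content != null) {
                return r.content;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        ExpiringRefQueue q = new ExpiringRefQueue();
        // Heartbeats every 15 s with a 30 s lifetime; the client is not consuming.
        for (long t = 0; t <= 150_000; t += 15_000) {
            q.put("heartbeat ts=" + t, t, 30_000);
        }
        q.expire(150_000);
        System.out.println(q.size());        // 11: stale references still counted
        System.out.println(q.deliverNext()); // heartbeat ts=135000
        System.out.println(q.size());        // 1
    }
}
```

Under this model the reported queue size can stay high even though only a few messages are still alive, which would fit a queue that looks full while the reader cannot seem to drain it.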
David R Robison wrote:
We are experiencing something strange in xmlBlaster 1.6.1. We have two nodes; node A subscribes to messages from node B. These are heartbeat messages, generated every 15 seconds with a lifetime of 30 seconds. A client connects to node A and subscribes to the messages; node A then passes the subscription on to node B. Watching the callback message queue, everything seems to run well, with at most 1 message in the queue waiting to be sent. It can run like this for days. Then, unexpectedly, the callback queue shows as full (in this case 1001 messages). The queue contains many duplicated messages with different timestamps. From there, the server struggles to deliver the messages and keep the queue empty. The reader never seems to read enough messages to get the queue back down to zero. If I stop the client and reconnect, it recreates its queue and is back to normal. I know this is a bit sketchy, but it is becoming a real problem for us.
Any thoughts on what might be the problem? Any idea of where to start looking?
One more note: when the client subscribes to heartbeats that are generated on node A, it never fails in this manner; it only fails when it subscribes on node A to a message generated on node B.
Thanks in advance, David Robison
--
David R Robison Open Roads Consulting, Inc. 708 S. Battlefield Blvd., Chesapeake, VA 23322 phone: (757) 546-3401 e-mail: drrobison at openroadsconsulting.com web: http://openroadsconsulting.com blog: http://therobe.blogspot.com book: http://www.xulonpress.com/book_detail.php?id=2579