Re: [xmlblaster] Callback message queue fills up

I think part of the problem might be that the subscriptions, even when you specify a domain, are not domain specific. What I mean is that a user connected to B subscribes to messages for a domain that is mastered on A. However, when the subscription is forwarded to A, it matches messages from all domains, even those generated on B and sent to A. Does this make sense? Could this be part of the problem?
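To make the suspected problem concrete, here is a minimal Java sketch. It assumes that each message held on node A knows its domain and the node where it was first published. The class, field, and method names, the node ids nodeA/nodeB, and the domain DomainA are all hypothetical; this is not the xmlBlaster API. The sketch contrasts a forwarded subscription that matches every key, as an XPath like //key would, with one that checks the requested domain and refuses to echo a message back to its originating node.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, NOT the xmlBlaster API: contrasts a forwarded
// subscription that matches every domain with one that checks the domain
// and refuses to send a message back to the node it came from.
public class DomainFilterSketch {

    // Minimal stand-in for a message held on node A.
    static final class Msg {
        final String oid;
        final String domain;     // the key's domain attribute
        final String originNode; // node that first published it (assumed known)

        Msg(String oid, String domain, String originNode) {
            this.oid = oid;
            this.domain = domain;
            this.originNode = originNode;
        }
    }

    // Naive: like an XPath such as "//key", matches regardless of domain.
    static List<String> naiveMatches(List<Msg> onNodeA) {
        List<String> out = new ArrayList<>();
        for (Msg m : onNodeA) {
            out.add(m.oid);
        }
        return out;
    }

    // Domain-aware: only the requested domain, and never echo to the origin.
    static List<String> domainAwareMatches(List<Msg> onNodeA, String wantedDomain,
                                           String subscriberNode) {
        List<String> out = new ArrayList<>();
        for (Msg m : onNodeA) {
            if (m.domain.equals(wantedDomain) && !m.originNode.equals(subscriberNode)) {
                out.add(m.oid);
            }
        }
        return out;
    }

    static List<Msg> sample() {
        List<Msg> msgs = new ArrayList<>();
        // Heartbeat published on node B and forwarded to A (oid from the dump in this thread).
        msgs.add(new Msg("DomainHeartbeat-Albemarle911", "Albemarle911", "nodeB"));
        // Hypothetical message belonging to a domain mastered on node A.
        msgs.add(new Msg("SomeEvent-DomainA", "DomainA", "nodeA"));
        return msgs;
    }

    public static void main(String[] args) {
        List<Msg> onNodeA = sample();
        // The naive subscription from node B also matches B's own heartbeat.
        System.out.println("naive:        " + naiveMatches(onNodeA));
        System.out.println("domain-aware: " + domainAwareMatches(onNodeA, "DomainA", "nodeB"));
    }
}
```

With the naive match, node B's own heartbeat is sent back to B; the domain-aware match forwards only the DomainA message.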


From: David R Robison [mailto:drrobison at openroadsconsulting.com]
To: xmlblaster at server.xmlBlaster.org
Sent: Wed, 21 Nov 2007 10:41:10 -0500
Subject: Re: [xmlblaster] Callback message queue fills up

Here is a dump of one of the messages:

<MsgUnit index='0'>

<key oid='DomainHeartbeat-Albemarle911' contentMime='text/xml'
contentMimeExtended='1.0' domain='Albemarle911'/>
<content size='46'>Domain Albemarle911 ALIVE at 11/21/07

<subscribe id='__subId:StauntonSTC-XPATH1195628463329000000'/>
<expiration lifeTime='30000' remainingLife='22703' forceDestroy='true'/>
<rcvTimestamp nanos='1195659482613000002'/>
<queue index='0' size='1'/>

The message was created on node B and sent to node A because of a
subscription on node A, but it is now in the callback queue on A, waiting
to go back to B. Also, I have never seen the route data in the messages.
Is there a way to turn this on?
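Here is a hedged sketch of how a route list, like the <route><node id='...'/></route> QoS section Marcel describes in the quoted reply, can break such a cycle. A node forwards a message only if the target node is not already on the message's route. The class and method names and the node ids are hypothetical. The sketch does not claim to reproduce xmlBlaster's cluster code. The 1000-hop cap exists only to stop the unguarded simulation.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: loop prevention using the list of traversed nodes,
// modeled loosely on the <route><node id='...'/></route> QoS section.
public class RouteLoopSketch {

    // The route carried with a message: node ids in the order traversed.
    static final class Route {
        final List<String> nodes = new ArrayList<>();

        boolean contains(String nodeId) {
            return nodes.contains(nodeId);
        }

        // Returns a copy of this route with one more hop appended.
        Route plus(String nodeId) {
            Route r = new Route();
            r.nodes.addAll(nodes);
            r.nodes.add(nodeId);
            return r;
        }
    }

    // Forward only if the target has not already seen this message.
    static boolean mayForward(Route route, String targetNode) {
        return !route.contains(targetNode);
    }

    // Simulates ping-pong between two nodes whose subscriptions match each
    // other's messages; returns how many hops happen (capped at maxHops).
    static int simulate(boolean checkRoute, int maxHops) {
        Route route = new Route().plus("nodeB"); // message published on B
        String current = "nodeB";
        int hops = 0;
        while (hops < maxHops) {
            String target = current.equals("nodeB") ? "nodeA" : "nodeB";
            if (checkRoute && !mayForward(route, target)) {
                break; // cycle detected: target is already on the route
            }
            route = route.plus(target);
            current = target;
            hops++;
        }
        return hops;
    }

    public static void main(String[] args) {
        System.out.println("without route check: " + simulate(false, 1000) + " hops");
        System.out.println("with route check:    " + simulate(true, 1000) + " hops");
    }
}
```

Without the check, the message bounces between the two nodes until the cap is reached. With the check, it makes the single B-to-A hop and then stops.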


Marcel Ruff wrote:
> David R Robison wrote:
>> One other thought. Heartbeat messages are published on node B and
>> subscribed to by clients on node A. Also, there are clients on node B
>> that subscribe to messages on node A. However, it appears that the
>> subscriptions the clients on node B are using are also matching the
>> heartbeat messages from node B that have been sent to node A. Could I
>> have some kind of circular queue? A message is posted on B then sent
>> to A because a subscription by a client on A. Then sent back to B
>> because of a subscription by a client on B for messages on A. Then
>> the message gets sent back to A and the whole cycle repeats?
> Could be; normally the cluster should prevent this.
> The messages carry the traversed nodes in their QoS:
> <qos>
> <sender>joe</sender>
> <route>
> <node id='bilbo' stratum='2' timestamp='34460239640'/>
> <node id='frodo' stratum='1' timestamp='34460239661'/>
> <node id='heron' stratum='0' timestamp='34460239590'/>
> </route>
> </qos>
> It would be nice to see a dump of such messages.
> Use jconsole, the logging output from your receiving client, or the
> message sniffer, e.g.:
> java javaclients.simplereader.SimpleReaderGui -xpath "//key"
> -session.name simpleReader -passwd secret -protocol SOCKET
> -dispatch/connection/plugin/socket/hostname -dumpToFile true
> or peek into the callback queue with administrative messages, as
> described in one of your last posts.
> thanks
> Marcel
>> Is this possible?
>> David
>> David R Robison wrote:
>>> Thanks, See in line...
>>> Marcel Ruff wrote:
>>>> Hi David,
>>>> do you have jconsole to observe the two nodes?
>>> I don't have jconsole, but can I get the same information using the
>>> admin messages?
>>>> If yes, please check the number of subscriptions node A has
>>>> forwarded to node B when this happens (look into node B and
>>>> check the number of subscriptions of client A).
>>> I will check.
>>>> If the subscribeQos has set
>>>> <multiSubscribe>true</multiSubscribe>
>>>> (which is the default), the subscriptions may have multiplied
>>>> during short connection errors and reconnects.
>>>> This is just a guess.
>>>> If this is the case, please set multiSubscribe to false.
>>> I believe that we set all of them to false.
>>>> Is there a high CPU load when the queue reaches 1001 messages?
>>> No
>>>> Are the heartbeat messages persistent?
>>> Yes, but they only live 30 seconds. At any given time there should
>>> be at most 2 in the history queue.
>>>> Was the client connected or offline during this message overflow?
>>> No, the client was online
>>>> Does your heartbeat have a unique id, so that you can tell for sure
>>>> whether the same published message is cloned many times (try a peek
>>>> on the callback queue with jconsole)?
>>> No, but the content of the message has a timestamp, so I knew they
>>> were duplicates.
>>> Can this be done with the admin messages?
>>>> A final option is to use the current svn xmlBlaster and switch on
>>>> the checkpoint logging to get a better idea of what is going on.
>>> We will try this in-house; unfortunately, the problem nodes are in a
>>> production environment.
>>>> And finally, it could be a problem with your client not taking the
>>>> callback messages.
>>> Could be, but I don't see the queue growing gradually. Instead, it
>>> "all of a sudden" appears to be full.
>>>> Another idea: the callback queue contains only a reference to the
>>>> message. If the message expires, its 'meat' is destroyed, but the
>>>> reference remains in the queue until it is looked at during delivery
>>>> (and is then thrown away). Michele, could this be?
>>>> thanks
>>>> Marcel
>>>> David R Robison wrote:
>>>>> We are experiencing something strange in xmlBlaster 1.6.1. We have
>>>>> two nodes; node A subscribes to messages from node B. These are
>>>>> heartbeat messages, generated every 15 seconds with a lifetime of
>>>>> 30 seconds. A client connects to node A and subscribes to the
>>>>> messages, and node A then passes the subscription on to node B.
>>>>> Watching the callback message queue, everything seems to run well,
>>>>> with at most 1 message in the queue waiting to be sent. It can run
>>>>> like this for days. Then, unexpectedly, the callback queue shows as
>>>>> full (in this case 1001 messages). The queue contains many
>>>>> duplicated messages with different timestamps. From there, the
>>>>> server struggles to deliver the messages and keep the queue empty.
>>>>> The reader never seems to read enough messages to get the queue
>>>>> back down to zero. If I stop the client and reconnect, it
>>>>> recreates its queue and is back to normal. I know this is a bit
>>>>> sketchy, but it is becoming a real problem for us.
>>>>> Any thoughts on what might be the problem? Any idea where to
>>>>> start looking?
>>>>> One more note: when the client subscribes to heartbeats that are
>>>>> generated on node A, it never fails in this manner; it fails only
>>>>> when it subscribes to node A for messages generated on node B.
>>>>> Thanks, in advance,
>>>>> David Robison
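Marcel's idea that expired messages keep occupying callback-queue slots until delivery inspects them can be illustrated with a small Java sketch. Everything here is hypothetical: the class names, the 5-entry limit, and the explicit clock passed in as a parameter. The sketch only models the behavior he describes, not xmlBlaster's actual queue implementation.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical sketch of the "reference stays queued after expiry" idea:
// expiry releases the message content, but the queue slot is only freed
// when the delivery loop looks at the entry and discards it.
public class LazyExpirySketch {

    static final class MsgHolder {
        final String content;
        final long expiresAtMillis;
        boolean destroyed; // set when the "meat" is released on expiry

        MsgHolder(String content, long expiresAtMillis) {
            this.content = content;
            this.expiresAtMillis = expiresAtMillis;
        }
    }

    final Deque<MsgHolder> callbackQueue = new ArrayDeque<>();
    final int maxEntries;

    LazyExpirySketch(int maxEntries) {
        this.maxEntries = maxEntries;
    }

    // Returns false when the queue is (or looks) full.
    boolean put(MsgHolder ref) {
        if (callbackQueue.size() >= maxEntries) {
            return false;
        }
        callbackQueue.addLast(ref);
        return true;
    }

    // Expiry timer: destroys the content but leaves the reference queued.
    void expire(long nowMillis) {
        for (MsgHolder h : callbackQueue) {
            if (h.expiresAtMillis <= nowMillis) {
                h.destroyed = true;
            }
        }
    }

    // Delivery: only here are stale references noticed and thrown away.
    List<String> deliver(int batchSize) {
        List<String> delivered = new ArrayList<>();
        while (delivered.size() < batchSize && !callbackQueue.isEmpty()) {
            MsgHolder h = callbackQueue.pollFirst();
            if (!h.destroyed) {
                delivered.add(h.content);
            }
        }
        return delivered;
    }

    public static void main(String[] args) {
        LazyExpirySketch q = new LazyExpirySketch(5);
        // Five heartbeats, 15 s apart, each with a 30 s lifetime.
        for (int i = 0; i < 5; i++) {
            q.put(new MsgHolder("heartbeat " + i, i * 15_000L + 30_000L));
        }
        q.expire(120_000L); // every lifetime has passed
        System.out.println("size after expiry:   " + q.callbackQueue.size());
        System.out.println("accepts new entry:   " + q.put(new MsgHolder("late", 150_000L)));
        System.out.println("delivered:           " + q.deliver(10));
        System.out.println("size after delivery: " + q.callbackQueue.size());
    }
}
```

In this model, the queue reports itself full even though none of its entries has any content left. Only a delivery pass frees the slots.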
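Marcel's multiSubscribe guess can be sketched in the same way. The sketch assumes, as he suggests, that each reconnect re-sends the same subscription and that duplicates are kept when multiSubscribe is enabled. The class and method names are hypothetical, and queries are matched against an exact oid for simplicity. It is not how xmlBlaster evaluates subscriptions.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the multiSubscribe effect: if every reconnect
// re-sends the same subscription and duplicates are allowed, the node holds
// several identical subscriptions and queues each published message once per
// subscription.
public class MultiSubscribeSketch {

    final boolean multiSubscribe;
    final List<String> subscriptions = new ArrayList<>();

    MultiSubscribeSketch(boolean multiSubscribe) {
        this.multiSubscribe = multiSubscribe;
    }

    // Called each time the remote node (re)connects and re-subscribes.
    void subscribe(String query) {
        if (!multiSubscribe && subscriptions.contains(query)) {
            return; // identical subscription already registered: ignore it
        }
        subscriptions.add(query);
    }

    // Callback-queue entries created for one published message
    // (queries are exact oids here, for simplicity).
    int entriesPerPublish(String oid) {
        int n = 0;
        for (String q : subscriptions) {
            if (q.equals(oid)) {
                n++;
            }
        }
        return n;
    }

    public static void main(String[] args) {
        for (boolean multi : new boolean[] {true, false}) {
            MultiSubscribeSketch node = new MultiSubscribeSketch(multi);
            // Initial subscription plus three reconnects after short network errors.
            for (int i = 0; i < 4; i++) {
                node.subscribe("DomainHeartbeat-Albemarle911");
            }
            System.out.println("multiSubscribe=" + multi + ": "
                + node.entriesPerPublish("DomainHeartbeat-Albemarle911")
                + " queue entries per heartbeat");
        }
    }
}
```

Under these assumptions, each reconnect adds one more copy of every heartbeat to the callback queue. With multiSubscribe set to false, the count stays at one.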


David R Robison
Open Roads Consulting, Inc.
708 S. Battlefield Blvd., Chesapeake, VA 23322
phone: (757) 546-3401
e-mail: drrobison at openroadsconsulting.com
web: http://openroadsconsulting.com
blog: http://therobe.blogspot.com
book: http://www.xulonpress.com/book_detail.php?id=2579