
Re: [xmlblaster] ClusterNode.isConnected()



Michael Lum wrote:
Hi,
I was working on a custom load balancer plugin for clustering and message routing. I noticed some behavior that didn't seem intuitive to me.


I modified line 144 of src/java/org/xmlBlaster/engine/cluster/simpledomain/RoundRobin.java:

if (log.TRACE) log.trace(ME, "Selected master node id='" + nodeDomainInfo.getClusterNode().getId() + "' from a choice of " + nodeDomainInfoSet.size() + " nodes");

to

if (log.TRACE) log.trace(ME, "Selected master node id='" + nodeDomainInfo.getClusterNode().getId() + "' from a choice of " + nodeDomainInfoSet.size() + " nodes. isConnected() = " + nodeDomainInfo.getClusterNode().isConnected() + ", and isPolling() = " + nodeDomainInfo.getClusterNode().isPolling());


I did this to get a better picture of the runtime state of these variables when I shut down cluster members and simulate node crashes, so that I could write my own plugin to dynamically reroute messages to another cluster node of a lower stratum by adding some checks for isConnected() on the cluster node being evaluated as a master candidate.

My setup was two masters of stratum 0 and one 'relay' as their slave. I start up all 3 instances and set up a subscriber on both of the masters. I then publish a message on the relay. All is well, as one of the masters is chosen and picks up the message. isConnected() is true and isPolling() is false for that cluster node, as expected. Next, I kill the master instance that was receiving the message. Then I republish to the relay. However, the trace output on the relay says that isConnected() is still true for that node, and isPolling() is also true! I dug a little deeper and saw that it's looking at the connection state of XmlBlasterAccess, but didn't go any deeper than that. My intuition tells me that isConnected() and isPolling() should be mutually exclusive, but maybe my understanding of those methods is incorrect.

One potential problem that I see is in this method in ClusterNode.java:

   public int getConnectionState() throws XmlBlasterException {
      if (isConnected())
         return 0;
      if (isPolling())
         return 1;
      return 2;
   }

which would return '0' even if the node is in the Polling state, because in my example above, even though I killed the node, it is still returning 'true' for isConnected(), as well as 'true' for isPolling().
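
To make the masking effect concrete, here is a minimal, self-contained sketch. It does not use the real ClusterNode class: NodeStatus, status(), and the constant names CONNECTED/POLLING/OTHER are hypothetical stand-ins for illustration, and the reordered variant is only one possible way to avoid the masking, not necessarily what xmlBlaster does.

```java
class ConnectionStateSketch {
    // Hypothetical stand-in for the two queries used by ClusterNode.
    interface NodeStatus {
        boolean isConnected();
        boolean isPolling();
    }

    // Names are assumptions; the original method only returns 0, 1 or 2.
    static final int CONNECTED = 0;
    static final int POLLING = 1;
    static final int OTHER = 2;

    // Same check order as the method quoted above.
    static int stateConnectedFirst(NodeStatus n) {
        if (n.isConnected()) return CONNECTED;
        if (n.isPolling()) return POLLING;
        return OTHER;
    }

    // Alternative order: polling is tested first, so it cannot be masked.
    static int statePollingFirst(NodeStatus n) {
        if (n.isPolling()) return POLLING;
        if (n.isConnected()) return CONNECTED;
        return OTHER;
    }

    // Builds a fixed status for the demonstration.
    static NodeStatus status(final boolean connected, final boolean polling) {
        return new NodeStatus() {
            public boolean isConnected() { return connected; }
            public boolean isPolling() { return polling; }
        };
    }

    public static void main(String[] args) {
        // The killed master from the scenario: both flags report true.
        NodeStatus killed = status(true, true);
        System.out.println(stateConnectedFirst(killed)); // prints 0
        System.out.println(statePollingFirst(killed));   // prints 1
    }
}
```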

As a workaround, my custom load balancer checks to see if isPolling() is true, and then skips that node if it is true. If it doesn't find any nodes that have isPolling() false, it will pick the first one on the list that isPolling() so that at least the messages are queued for delivery when it reappears.
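
The selection rule described in this workaround could be sketched roughly as follows. This is not the actual plugin code: Candidate and select() are hypothetical names, and a real load balancer would work on the NodeDomainInfo set and ClusterNode.isPolling() instead of a plain list with a boolean field.

```java
import java.util.Arrays;
import java.util.List;

class PollingAwareSelection {
    // Hypothetical minimal view of a master candidate.
    static final class Candidate {
        final String id;
        final boolean polling;

        Candidate(String id, boolean polling) {
            this.id = id;
            this.polling = polling;
        }
    }

    // Prefer the first candidate that is not polling. If every candidate
    // is polling, fall back to the first one so that messages are at
    // least queued until it reappears. Returns null for an empty list.
    static Candidate select(List<Candidate> candidates) {
        if (candidates.isEmpty()) {
            return null;
        }
        for (Candidate c : candidates) {
            if (!c.polling) {
                return c;
            }
        }
        return candidates.get(0);
    }

    public static void main(String[] args) {
        List<Candidate> nodes = Arrays.asList(
            new Candidate("master1", true),   // killed, client is polling
            new Candidate("master2", false)); // still reachable
        System.out.println(select(nodes).id); // prints master2
    }
}
```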

Mike


The javadoc of I_XmlBlasterAccess states:

   /**
    * Has the connect() method successfully passed?
    * <p>
    * Note that this contains no information about the current connection state
    * of the protocol layer.
    * </p>
    * @return true If the connection() method was invoked without exception
    * @see I_ConnectionHandler#isAlive()
    * @see I_ConnectionHandler#isPolling()
    * @see I_ConnectionHandler#isDead()
    */
   boolean isConnected();

isConnected() is correctly doing what it should.

But our usage in ClusterNode.java is buggy;
I have fixed and committed it.

Thanks for finding this bug,

Marcel


-- http://www.xmlBlaster.org