[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [xmlblaster-devel] Lock issue with corba protocoll



This seems to be really tricky. My latest tests indicates that the
problem is not in the "sending" end, but in the receiving end. I can for
example duplicate the error when client connect, sends a ping and the
whole connection process end up beeing locked while the serverside
callback server tries to do the initial ping back.

I have trace instructions on all the Corba parts, and there really is no
answer (no notify runaway conditions), the client callback server simple
locks the reply.


// Creating serveside callback server
11776:[Sep 19, 2003 2:57:10 PM TRACE
CbDeliveryConnection-callback:/node/xmlBlaster_80_72_2_80_3412/client/test/-53] Created callback plugin 'IOR'
// Orb wait to receive reply
11804:ReplyReceiver.getInputStream:Entry(org.jacorb.orb.ReplyReceiver at 166340c:RequestProcessor-11) ready=false
11805:ReplyReceiver.getInputStream:Waiting(org.jacorb.orb.ReplyReceiver at 166340c:RequestProcessor-11) ready=false
// Here is kill -3 ffor it, it is locked in 
12269:"RequestProcessor-11" daemon prio=1 tid=0x0x83e3dd0 nid=0x847 in
Object.wait() [4eb25000..4eb25840]
(	at java.lang.Object.wait(Native Method)
	- waiting on <0x4483b1f8> (a org.jacorb.orb.ReplyReceiver)
	at java.lang.Object.wait(Object.java:426)
	at
org.jacorb.orb.giop.ReplyPlaceholder.getInputStream(ReplyPlaceholder.java:146)
	- locked <0x4483b1f8> (a org.jacorb.orb.ReplyReceiver)
	at org.jacorb.orb.ReplyReceiver.getReply(ReplyReceiver.java:237)
	- locked <0x4483b1f8> (a org.jacorb.orb.ReplyReceiver)
	at org.jacorb.orb.Delegate.invoke_internal(Delegate.java:876)
	at org.jacorb.orb.Delegate.invoke(Delegate.java:787)
	at org.omg.CORBA.portable.ObjectImpl._invoke(ObjectImpl.java:457)
	at
org.xmlBlaster.protocol.corba.clientIdl._BlasterCallbackStub.ping(_BlasterCallbackStub.java:31)
	at
org.xmlBlaster.protocol.corba.CallbackCorbaDriver.ping(CallbackCorbaDriver.java:186)
	at
org.xmlBlaster.engine.dispatch.CbDeliveryConnection.connectLowlevel(CbDeliveryConnection.java:92)
	at
org.xmlBlaster.util.dispatch.DeliveryConnection.initialize(DeliveryConnection.java:107)
	at
org.xmlBlaster.util.dispatch.DeliveryConnectionsHandler.initialize(DeliveryConnectionsHandler.java:143)
	- locked <0x4482fbe0> (a java.util.ArrayList)
	at
org.xmlBlaster.util.dispatch.DeliveryManager.<init>(DeliveryManager.java:115)
	at
org.xmlBlaster.authentication.SessionInfo.<init>(SessionInfo.java:144)
	at
org.xmlBlaster.authentication.Authenticate.connect(Authenticate.java:319)
	- locked <0x448180c8> (a org.xmlBlaster.authentication.SubjectInfo)
	at
org.xmlBlaster.protocol.corba.AuthServerImpl.connect(AuthServerImpl.java:225)
	at
org.xmlBlaster.protocol.corba.AuthServerImpl.connect(AuthServerImpl.java:185)
	at
org.xmlBlaster.protocol.corba.authenticateIdl.AuthServerPOATie.connect(AuthServerPOATie.java:48)
	at
org.xmlBlaster.protocol.corba.authenticateIdl.AuthServerPOA._invoke(AuthServerPOA.java:55)
	at
org.jacorb.poa.RequestProcessor.invokeOperation(RequestProcessor.java:247)
	at org.jacorb.poa.RequestProcessor.process(RequestProcessor.java:496)
	at org.jacorb.poa.RequestProcessor.run(RequestProcessor.java:635)
)

// A long while later the client is killed
// Suddenly we receive an answer on the channel
12768:ReplyReceiver.getInputStream:ReleasedWaiting(org.jacorb.orb.ReplyReceiver at 166340c:RequestProcessor-11) ready=true
12770:[Sep 19, 2003 3:24:13 PM TRACE
DeliveryConnection-callback:/node/xmlBlaster_80_72_2_80_3412/client/test/-53] XmlBlasterException errorCode=[communication.noConnection] serverSideException=true node=[xmlBlaster_80_72_2_80_3412] location=[CallbackCorbaDriver] message=[CORBA callback ping failed : org.omg.CORBA.TRANSIENT: Retries exceeded, couldn't reconnect to 80.72.2.80:41041  vmcid: 0x0  minor code: 0  completed: No]
12771:[Sep 19, 2003 3:24:13 PM TRACE
DeliveryConnection-callback:/node/xmlBlaster_80_72_2_80_3412/client/test/-53] Connection transition UNDEF -> toReconnected=false byDeliveryConnectionsHandler=true
12772:[Sep 19, 2003 3:24:13 PM WARN 
DeliveryConnection-callback:/node/xmlBlaster_80_72_2_80_3412/client/test/-53] Connection transition UNDEF -> DEAD: retryCounter=0, maxRetries=0
12774:[Sep 19, 2003 3:24:13 PM TRACE
DeliveryConnectionsHandler-callback:/node/xmlBlaster_80_72_2_80_3412/client/test/-53] Destroyed one callback connection, 0 remain.
12775:[Sep 19, 2003 3:24:13 PM TRACE
DeliveryConnectionsHandler-callback:/node/xmlBlaster_80_72_2_80_3412/client/test/-53] updateState() oldState=UNDEF conList.size=0
12776:[Sep 19, 2003 3:24:13 PM TRACE
DeliveryConnectionsHandler-callback:/node/xmlBlaster_80_72_2_80_3412/client/test/-53] Load IOR:000000000000004049444C3A6F72672E786D6C426C61737465722E70726F746F636F6C2E636F7262612F636C69656E7449646C2F426C617374657243616C6C6261636B3A312E3000000000020000000000000070000102000000000B38302E37322E322E38300000A05100000000001B5374616E64617264496D706C4E616D652F00112447490F3C2A210500000000020000000000000008000000004A414300000000010000001C00000000000100010000000105010001000101090000000105010001000000010000002C0000000000000001000000010000001C00000000000100010000000105010001000101090000000105010001: XmlBlasterException errorCode=[communication.noConnection.dead] serverSideException=true node=[xmlBlaster_80_72_2_80_3412] location=[CallbackCorbaDriver] message=[CORBA callback ping failed : org.omg.CORBA.TRANSIENT: Retries exceeded, couldn't reconnect to 80.72.2.80:41041  vmcid: 0x0  minor code: 0  completed: No]
12777:[Sep 19, 2003 3:24:13 PM TRACE
DeliveryConnectionsHandler-callback:/node/xmlBlaster_80_72_2_80_3412/client/test/-53] updateState() oldState=DEAD conList.size=0
12778:[Sep 19, 2003 3:24:13 PM TRACE
DeliveryConnectionsHandler-callback:/node/xmlBlaster_80_72_2_80_3412/client/test/-53] Reached state = DEAD
12779:[Sep 19, 2003 3:24:13 PM TRACE
SessionInfo-/node/xmlBlaster_80_72_2_80_3412/client/test/-53] Setting
expiry timer for test to 86400000 msec
12781:[Sep 19, 2003 3:24:13 PM TRACE
RequestBroker-/node/xmlBlaster_80_72_2_80_3412] Login event for client
/node/xmlBlaster_80_72_2_80_3412/client/test/-53
12788:[Sep 19, 2003 3:24:13 PM INFO 

// And finally the login succedes
 Authenticate-/node/xmlBlaster_80_72_2_80_3412] Successful login for
client /node/xmlBlaster_80_72_2_80_3412/client/test/-53, session expires
after [ 24 h 0 millis ], 1 of 20 sessions are in use.
13917:[Sep 19, 2003 3:24:49 PM TRACE
DeliveryConnection-callback:/node/xmlBlaster_80_72_2_80_3412/client/test/-53] finalize - garbage collected


However, on the client, I can not find any locks that seems to be the
result of the ping callback.

So either there is a lock problem in some other part of Jacorb, either
at the sending end or the replying end, or there is some strange lockup
in XmlBlaster which translates to these lockups.

Simply put. I do not think the problem is in ReplyPlaceholder or
ReplyReceiver, but I can not understand where else to look. I have a
testclass which can be used to triggers these lockups with the
standalone XmlBlaster, and which dumps (kill -3) when it detects a lock.

Why not, here it is:


package org.xmlBlaster.test.memmoryleak;
import org.jutils.runtime.ThreadLister;
import org.jutils.log.LogChannel;
import org.xmlBlaster.util.Global;
import org.xmlBlaster.util.XmlBlasterException;
import org.xmlBlaster.client.qos.ConnectQos;
import org.xmlBlaster.client.qos.DisconnectQos;
import org.xmlBlaster.client.qos.ConnectReturnQos;
import org.xmlBlaster.util.qos.SessionQos;
import org.xmlBlaster.client.I_XmlBlasterAccess;
import org.xmlBlaster.client.I_Callback;
import org.xmlBlaster.client.key.UpdateKey;
import org.xmlBlaster.client.qos.UpdateQos;
import org.xmlBlaster.util.MsgUnit;
import org.xmlBlaster.test.Util;

import junit.framework.*;

import java.util.ArrayList;
import java.util.Iterator;
import java.io.File;
import java.io.FileReader;
import java.io.BufferedReader;

/**
 * This does a twofold test by createing a number of connection, an
almost infinite number of times. It tests two stuff:
 * 1. if the client side is leaking threads.
 * 2. if Jacorb contains a locking bug.
 *
 *  at author <a href="mailto:pra at tim.se";>Peter Antman</a>
 *  at version $Revision$
 */

public class TestThreadLeak extends TestCase implements I_Callback {
   private static String ME = "TestThreadLeak";
   private final Global glob;
   private final LogChannel log;
   private String fileName;
   private int noConnections = 10;
   private boolean noError = true;
   private ArrayList connections = new ArrayList();
   private int maxThreadDiff = 500;
   private String pid;
   private String osName;
   /** Time a connection should live, before beeing taken down */
   private long cttl = 5000;
   
   public TestThreadLeak (Global glob, String testName) throws
Exception{
      super(testName);
      this.glob = glob;
      this.log = this.glob.getLog("test");
      fileName = glob.getProperty().get("pidFileName", (String)null);
   }

   /**
    * Sets up the fixture. 
   */
   protected void setUp() throws Exception 
   {
      String[] args = {
         "-protocol", 
         "IOR",
         "-session.maxSessions",
         "20"
      };
      glob.init(args);

      // if we have a filename where a pid is, wait until that file has
shown up
      // But now more that 5 times
      if ( fileName != null) {
         int i = 0;
         File file = new File(fileName);
         while (!file.exists()) {
         i++;
         if ( i > 4) {
            Assert.fail("We where given a pid filename " + file + " but
could not find it, giving up");
         } // end of if ()
         
         Thread.sleep(2000);
         } // end of while ()
         BufferedReader r = new BufferedReader(new FileReader(file));
         pid = r.readLine();
      } // end of if ()

      osName = System.getProperty("os.name");

   }
   

   void dumpThreadStack() throws Exception {
      if ( pid != null && !osName.startsWith("Window")) {
         Runtime runtime = Runtime.getRuntime();
         Process p = runtime.exec("kill -3 " + pid);
         p.waitFor();
      } else {
         log.info(ME,"Could not dump stack pid="+pid+" os="+osName);
      } // end of else
      

   }

   void handleLock(ConnectorWorker wr) throws Exception {
      dumpThreadStack();
      //Assert.fail("Thread lock in connector worker "+wr + " giving
up");
   }

   public void testThreadLeakage() throws Exception {
      int startNoThreads = -1;
      int lastNoThreads = -1;
      int round = 0;
      while ( noError ) {
         round++;
         log.info(ME, "Doing a new connection round no " + round);
         for ( int i = 0; i< noConnections;i++) {
            ConnectorWorker conn = new ConnectorWorker(glob, cttl);
            connections.add(conn);
         } // end of for ()
         
         // Wait a while
         Thread.sleep(2000);
         
         // Count threads, if more than maxThreadDiff has been created
since
         // the first round: fail
         int noThreads = ThreadLister.countThreads();
         if (startNoThreads != -1 ) {
            startNoThreads = noThreads;
            lastNoThreads = noThreads;
         } else {
            // Check how many since first round
            int firstDiff = noThreads - startNoThreads;
            int lastDiff = noThreads - lastNoThreads;
            log.info(ME,"No of thread created since start:
"+firstDiff+"; number of threads created since last round: " +
lastDiff);
            lastNoThreads = noThreads;
            if ( firstDiff > maxThreadDiff) {
               ThreadLister.listAllThreads(System.out);
               Assert.fail("Max number of new threads reached " +
firstDiff + " number of threads created since first round: XmlBlaster is
leaking huge numbers of threads. Happened in round " + round);
            } // end of if ()
            
         } // end of else
         
         // Wait a  while for connections to stop
         Thread.sleep( noConnections*1000 );
         
         // Check that all connections are finished.
         Iterator c = connections.iterator();
         while ( c.hasNext() ) {
            ConnectorWorker w = (ConnectorWorker)c.next();

            // Check that its NOT alive
            if ( w.isAlive() ) {
               // Opps, do we have a lock here

               // We give it five rounds if its still in Connecting
state we abort
               int j = 0;
               while (w.isAlive() && j < 4) {
                  log.warn(ME,"Possible lock of connection " + w + "
detected, waiting 30 s round "+j);
                  j++;
                  Thread.sleep(30*1000);
                  if ( j > 3) {
                     log.error(ME,"Possible lock of connection " + w + "
detected, aborting");
                     noError = false;
                     handleLock(w);
                  } // end of if ()
                  
               } // end of while ()
               
            } // end of if ()
            Throwable t = w.getException();
            if ( t != null) {
               log.error(ME,"Connection had exception, giving up : "+t);
               t.printStackTrace();
               Assert.fail("Connection had exception, giving up : "+t);
            } // end of if ()
            
         } // end of while ()
         connections.clear();



      } // end of while ()
      


   }

   /**
    * This is the callback method invoked from xmlBlaster
    * delivering us a new asynchronous message. 
    *  at see org.xmlBlaster.client.I_Callback#update(String, UpdateKey,
byte[], UpdateQos)
    */
   public String update(String cbSessionId, UpdateKey updateKey, byte[]
content, UpdateQos updateQos)
   {
      log.info(ME, "Receiving update of a message " + updateKey.getOid()
+ " for subId: " + updateQos.getSubscriptionId() );
      log.trace(ME,"Got message " + new String(content));
      return "";
   }
   /**
    * Method is used by TestRunner to load these tests
    */
   public static Test suite() throws Exception 
   {
      
      TestSuite suite= new TestSuite();
      suite.addTest(new TestThreadLeak(new Global(),
"testThreadLeakage"));
      return suite;
   }

   /**
    * Invoke: 
    * <pre>
    *   java org.xmlBlaster.test.mime.TestXPathSubscribeFilter
    *   java -Djava.compiler= junit.textui.TestRunner -noloading
org.xmlBlaster.test.mime.TestXPathSubscribeFilter
    * <pre>
    */
   public static void main(String args[]) throws Exception 
   {
      try {
         Global glob = new Global();
         if (glob.init(args) != 0) {
            System.err.println(ME + ": Init failed");
            System.exit(1);
         }
         TestThreadLeak testSub = new TestThreadLeak(glob,
"testThreadLeak");
         testSub.setUp();
         testSub.testThreadLeakage();
         testSub.tearDown();
      } catch (Throwable e) {
         e.printStackTrace();
         System.exit(0);
      } // end of try-catch

   }
   

   /**
    * Connect to XmlBlaster and disconnect after timeout milis.
    */
   class ConnectorWorker implements Runnable {
      Global glob;
      long timeout;
      private I_XmlBlasterAccess con = null;
      Thread internalThread;
      Throwable ie;
      ConnectReturnQos retQos;
      volatile String state = "CREATED";
      volatile long started;
      public ConnectorWorker(Global glob, long timeout) {
         this.glob = glob.getClone(null);
         this.timeout = timeout;
         internalThread = new Thread(this);
         internalThread.start();
      }

      public void run() {
         started = System.currentTimeMillis();
         state = "RUNNING";
         if ( Thread.currentThread() != internalThread ) {
            ie = new RuntimeException("Only internal thread allowed");
            throw (RuntimeException)ie;
         } // end of if ()
         try {
            // Connect
            state = "CONNECTING";
            con = glob.getXmlBlasterAccess();
            ConnectQos qos = new ConnectQos(glob, "test", "dummy");

            retQos = con.connect(qos, TestThreadLeak.this); // Login to
xmlBlaster
            log.info(ME,"Connected "+ this);
            state = "CONNECTED";
            Thread.sleep(timeout);

            // Disconnect
            state = "DISCONNECTING";
            con.disconnect(null);
            state = "DISCONNECTED";
            con=null;
         } catch (Throwable e) {
            ie = e;
            log.error(ME,"Giving up " + e);
         } // end of try-catch
         
         
      }
      public long getAgeSeconds() {
         return (System.currentTimeMillis()-started)/1000;
         
      }

      public String getState() {
         return state;
      }

      public String toString() {
         String s = super.toString();
         String q = "";
         if ( retQos != null) {
            SessionQos ses = retQos.getSessionQos() ;
            q =  "secretId="+ses.getSecretSessionId() + "
publicId="+ses.getPublicSessionId();
         } // end of if ()
         
         return s+":"+state+" age="+getAgeSeconds()+" seconds old,
(thread="+internalThread.getName()+") "+q;
      }
      
      public Throwable getException() {
         return ie;
      }

      public boolean isAlive() {
         return internalThread.isAlive();
      }
      
   }
} // TestThreadLeak


Start it with a shell script in its directory:

#!/bin/bash
WAY="../../../../../../.."
java -classpath
$WAY/lib/junit.jar:$WAY/lib/testsuite.jar:$WAY/lib/xmlBlaster.jar
-Dtrace=false org.xmlBlaster.test.memmoryleak.TestThreadLeak
-pidFileName pidFile &
echo $! > pidFile

For example like this:

./threadLeak.sh  > log 2>&1


//Peter

On Sun, 2003-09-14 at 17:27, Marcel Ruff wrote:
> Peter Antman wrote:
> > sön 2003-09-14 klockan 13.00 skrev Marcel Ruff:
> > 
> >>Peter Antman wrote:
> >>
> >>>Hm, can get jacorb from that date to build. Any tips?
> >>>
> >>>(cvs -d :pserver:anonymous at www.jacorb.org/cvsroot/jacorb checkout -D
> >>>'2003-03-27 18:00' JacORB)
> >>
> >>Hmm, what do you mean ??
> > 
> > 
> > I wanted to get the source for the jacorb used in XmlBlaster to test if
> > I could fix the problem. Checking it out with the above cvs command
> > however gets me a copy that won't compile (missing
> > org.jacorb.orb.connection package). Has the CVS repository of jacorb
> > been reorganized since you did your checkout, or am I doing this the
> > wrong way?
> > 
> > eg:
> > ant
> > [...]
> >      [echo] Compiling for orb
> >     [javac] Compiling 82 source files to
> > /home/pra/src/instant-2.2/ro/jacorb-2003-03-27/JacORB/classes
> >     [javac]
> > /home/pra/src/instant-2.2/ro/jacorb-2003-03-27/JacORB/src/org/jacorb/orb/dii/Request.java:30: package org.jacorb.orb.connection does not exist
> >     [javac] import org.jacorb.orb.connection.*;
> > [...]
> 
> Peter,
> 
> i have updated xmlBlaster to use the current JacORB from cvs
> (Jacorb 2.0 beta2+ from cvs: cvs -d :pserver:anonymous at www.jacorb.org/cvsroot/jacorb checkout -D 
> '2003-09-14 15:00' -P JacORB),
> but i don't think that JacORB has fixed the bug.
> 
> If you can still reproduce the dead lock (use current xmlBlaster from cvs)
> please drop a note so we can update the bugzilla report
>   http://www.jacorb.org/bugzilla/
> #380
> 
> thanks
> 
> Marcel
> 
> -- 
> http://www.xmlBlaster.org
-- 
------------------------------------------------------------
Peter Antman	Chief Technology Officer, Development
Technology in Media, Box 34105 100 26 Stockholm
WWW: http://www.tim.se	WWW: http://www.backsource.org
Email: pra at tim.se	 
Phone: +46-(0)8-506 381 11 Mobile: +46-(0)704 20 58 11
------------------------------------------------------------