Re: [xmlblaster] missing volatile messages
Hi, Marcel:
I wrote another script (testsubpub.sh, attached) that subscribes,
publishes and validates automatically. The script keeps running until
messages get lost.
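The attached script is not reproduced here. As a rough illustration of the
"keep looping until a message goes missing" idea only, here is a minimal
Python sketch. The names run_until_loss and fake_round are hypothetical and
are not taken from testsubpub.sh; the round function stands in for one
subscribe/publish/validate cycle against the server.

```python
from typing import Callable, List


def run_until_loss(run_round: Callable[[int], List[str]],
                   expected_per_round: int,
                   max_loops: int = 1000) -> int:
    """Repeat a sub/pub round until one delivers fewer messages than expected.

    Returns the 1-based number of the first lossy loop, or 0 if no loss
    was seen within max_loops.
    """
    for loop in range(1, max_loops + 1):
        received = run_round(loop)
        if len(received) < expected_per_round:
            return loop
    return 0


# Hypothetical stand-in for a real round: it "loses" one message in loop 5.
def fake_round(loop: int) -> List[str]:
    sent = [f"msg-{loop}-{i}" for i in range(10)]
    return sent[:-1] if loop == 5 else sent


first_lossy = run_until_loss(fake_round, expected_per_round=10)
print("first lossy loop:", first_lossy)
```

In a real run, the stand-in round would be replaced by code that publishes
a known number of messages and counts what the subscriber actually received.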
Three types of servers were tested. They all ran Linux, jdk 1.5.0 and
xmlBlaster 1.1.1.
Server1: one CPU: AMD Athlon 64 3000+ 2.0GHz, 1.0GB RAM
Server2: two CPUs: Pentium III (Coppermine) 866MHz, 1.0GB RAM
Server3: two CPUs: Intel Xeon 3.06GHz, 3.8GB RAM
Test results:
The test script needs more than 70 loops before messages get lost on
server1. On server2 it needs only 2 to 7 loops, and on server3 6 to 30
loops.
Message loss was always accompanied by the following exception
(by the way, the erase flag was false in the test):
[Apr 5, 2006 4:06:01 PM ERROR
XmlBlaster.SOCKET.tcpListener-ewu TopicHandler/topic/AABB] PANIC: invoke
callback is strange in state 'UNREFERENCED'
java.lang.Exception: Stack trace
    at java.lang.Thread.dumpStack(Thread.java:1158)
    at org.xmlBlaster.engine.TopicHandler.checkIfAllowedToSend(TopicHandler.java:1198)
    at org.xmlBlaster.engine.TopicHandler.invokeCallback(TopicHandler.java:1310)
    at org.xmlBlaster.engine.TopicHandler.invokeCallbackAndHandleFailure(TopicHandler.java:1170)
    at org.xmlBlaster.engine.TopicHandler.publish(TopicHandler.java:645)
    at org.xmlBlaster.engine.RequestBroker.publish(RequestBroker.java:1677)
    at org.xmlBlaster.engine.RequestBroker.publish(RequestBroker.java:1483)
    at org.xmlBlaster.engine.RequestBroker.publish(RequestBroker.java:1477)
    at org.xmlBlaster.engine.XmlBlasterImpl.publishArr(XmlBlasterImpl.java:185)
    at org.xmlBlaster.util.protocol.RequestReplyExecutor.receiveReply(RequestReplyExecutor.java:402)
    at org.xmlBlaster.protocol.socket.HandleClient.handleMessage(HandleClient.java:231)
    at org.xmlBlaster.protocol.socket.HandleClient.run(HandleClient.java:352)
    at java.lang.Thread.run(Thread.java:595)
Analysis:
1. One CPU is more robust than two CPUs, whether or not it is faster.
2. Two fast CPUs are better than two slow CPUs.
Based on our test results, we think it might be a multithreading race
condition.
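To make the suspicion concrete, here is a minimal Python sketch of a
check-then-act race of the kind that could produce such a symptom. It is a
toy model only: ToyTopic, unreference and race are hypothetical names, and
nothing here is taken from xmlBlaster's TopicHandler or claims to be the
actual cause. The interleaving is forced so the outcome is repeatable.

```python
import threading


class ToyTopic:
    """Toy topic with two states; an illustration, not xmlBlaster code."""

    def __init__(self):
        self.state = "ALIVE"
        self.lock = threading.Lock()
        self.delivered = []
        self.lost = []

    def unreference(self):
        # Another thread moves the topic out of the ALIVE state.
        with self.lock:
            self.state = "UNREFERENCED"

    def _check_then_deliver(self, msg, window):
        if self.state != "ALIVE":       # check: topic looks usable
            self.lost.append(msg)
            return
        window()                        # gap in which another thread may run
        if self.state == "ALIVE":       # act: callback re-checks the state
            self.delivered.append(msg)
        else:
            self.lost.append(msg)       # state changed in between: message dropped

    def publish(self, msg, window, locked):
        if locked:
            with self.lock:             # check and act as one atomic step
                self._check_then_deliver(msg, window)
        else:
            self._check_then_deliver(msg, window)


def race(locked):
    topic = ToyTopic()
    eraser = threading.Thread(target=topic.unreference)

    def window():
        # Start the competing thread exactly inside the gap and give it
        # time to run; with the lock held it cannot get in before the timeout.
        eraser.start()
        eraser.join(timeout=0.2)

    topic.publish("m1", window, locked)
    eraser.join()
    return topic


unsafe = race(locked=False)
safe = race(locked=True)
print("unlocked:", unsafe.delivered, unsafe.lost)
print("locked:  ", safe.delivered, safe.lost)
```

Without the lock, the state change lands between the check and the
delivery and the message is dropped. With the lock, the state change has to
wait until delivery is done. On a single CPU such a gap is hit far less
often, which would be consistent with the loop counts above.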
Regards,
Eugene
Marcel Ruff wrote:
Eugene,
thanks for this detailed test.
We couldn't reproduce losing messages with your scripts.
We tried on the same server machine and distributed over two machines.
But we once got a non-reproducible stack trace.
So the combination of XPath, volatile messages and
erasing the topic during operation (by the first of your publishers
finishing)
seems to be an open issue.
We have now added some code to handle the described NPE.
> "but some messages(<1%) completely lost."
That could happen because the first publisher is erasing the topic
while the second publisher is firing.
Anyhow, I couldn't reproduce any loss here.
We'll keep the case open,
Marcel
Attachment:
testsubpub.sh
Description: application/shellscript