Re: [xmlblaster] ssl and compression with SOCKET
PBal wrote:
Hi Marcel!
this is a useful extension to our basic SOCKET implementation.
If you are willing to donate it under LGPL license we would be
happy to add it to the xmlBlaster distribution.
I'm willing.
There are many variants of compression.
--------------------------------------
1. Compress MsgUnit above the protocol plugin layer
This is not easily possible as for example the CORBA publish(MsgUnit) call
can't set a flag that the MsgUnit is compressed.
We would need to set a compression flag on login to tell that all
publishes are compressed.
On subscription we could set a compression flag that all updates
shall be compressed.
+ This runs with all protocol plugins.
- No fine grained control
- CPU overhead (see below)
2. Compress in the SOCKET protocol plugin
This is your way. The SOCKET spec supports it with a compression flag
and a 'lenUnzipped' field.
+ Simple
++ Compresses everything, Key+content+Qos and even MsgUnit[] in a bulk
- The CPU overhead in the xmlBlaster server increases for each subscriber
as the received message is uncompressed on arrival and needs to be
compressed for each subscriber separately.
3. Compress in MessageUnit.setContent()
Here the Key and Qos are transferred uncompressed and only the
message content is compressed. We could support this in our
C/C++/Java MessageUnit struct. We could use a ClientProperty "__gzip"
to mark it.
+ The xmlBlaster server would never uncompress the content
(as it never looks into it) when receiving
it and forwarding it to the subscribers.
+ This runs with all protocol plugins.
++ No CPU overhead
- Key and Qos are not compressed
4. Compress messages in the security plugin
+ This runs with all protocol plugins.
- More a specialized case (similar to 5.)
5. The client developers do it themselves.
An xmlBlaster user can implement compression similar to 3.
+ This runs with all protocol plugins
+ Every fine grained combination is possible
- Reinvent the wheel
The solutions 2 and 3 are probably our ways to go.
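
Variant 3 could look roughly like the following Python sketch. The dict-based
message unit, the helper names and the "true" property value are assumptions
made for illustration and are not xmlBlaster's MsgUnit API; only the "__gzip"
property name comes from the proposal above.

```python
import gzip

GZIP_PROPERTY = "__gzip"  # marker name taken from the proposal above


def compress_content(msg_unit: dict) -> dict:
    """Return a copy whose content is gzipped; key and qos stay untouched."""
    out = dict(msg_unit)
    out["content"] = gzip.compress(msg_unit["content"])
    props = dict(msg_unit.get("client_properties", {}))
    props[GZIP_PROPERTY] = "true"  # assumed value; any marker would do
    out["client_properties"] = props
    return out


def decompress_content(msg_unit: dict) -> dict:
    """Undo compress_content(); unmarked messages are returned as they are."""
    props = dict(msg_unit.get("client_properties", {}))
    if props.pop(GZIP_PROPERTY, None) != "true":
        return msg_unit
    out = dict(msg_unit)
    out["content"] = gzip.decompress(msg_unit["content"])
    out["client_properties"] = props
    return out


def broker_forward(msg_unit: dict, subscribers: list) -> dict:
    """Stand-in for the server: it routes the message and never opens the content."""
    return {name: msg_unit for name in subscribers}
```

The publisher would call compress_content() once and each subscriber would call
decompress_content() on its update, so the server in between spends no CPU on
compression, at the price of sending Key and Qos uncompressed.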
Configuration:
--------------
Typically only publish() and update() and get()-return (and their Array & Oneway variants)
need compression.
Other requests like connect(), disconnect(), subscribe(),
unSubscribe(), ping() and erase() don't need compression.
Your configuration switches on/off compression for a publisher
as a whole.
Updates for all subscribers are always delivered uncompressed OR compressed depending
on the SOCKET plugin configuration in the server.
In a future step we will add a <compress type="gzip"/> flag to PublishQos and
to SubscribeQos to have fine grained control.
If so, you can send the patch directly to my mail address.
You would need to add a test case as well and some documentation in
the SOCKET requirement (xmlBlaster/doc/requirements/protocol.socket.xml).
Is there a compatible C compression library with a free license around
which could be added to the C/C++ client library?
Do you have a property for a minimum message size to switch on
compressing?
Yes, there is a compatible C compression library. I mentioned that jzlib
was used as the Java library. As it is based on the zlib C library
(actually, it is quite a copy-paste job), the libs and their
inputs/outputs have to be fully compatible. This is also stated on their
website: http://www.jcraft.com/jzlib/
Let me explain the implementation details:
- Currently, compression is turned on if and only if SSL sockets are
used. However, these features can be separated easily.
- The _whole_ TCP stream is compressed, because that was the easiest way
to implement it. This means that the "compression window" is not
reinitialized on every message; the dictionary isn't flushed in the
deflater, so repeating sequences are compressed even if they occurred in a
previous message. I thought that would enable the deflater to compress
the beginning of a message, too, and help to achieve better results
when the same type of small message is sent frequently (in my case,
that is very true).
What I don't know is how all this behaves when random messages are sent
on a connection. I mean, what if virtually every message would have some
ugly binary content, for example? I still believe this would not make
the messages bigger on average.
I don't understand this. Do you say that the message stream
is compressed on the fly as bytes are pushed in?
I'll look into the code you send to understand it.
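
The on-the-fly behaviour asked about here can be sketched with Python's zlib
module, which wraps the zlib C library. This is only an illustration of the
idea and not the jzlib-based patch under discussion; the function names are
made up. One deflater lives as long as the connection and is sync-flushed after
each message, so the receiver can decode every message as it arrives while the
dictionary carries over to the next one.

```python
import zlib


def compress_stream(messages):
    """One deflater for the whole connection, sync-flushed after each message."""
    deflater = zlib.compressobj()
    chunks = []
    for msg in messages:
        # Z_SYNC_FLUSH emits everything pending but keeps the dictionary
        chunks.append(deflater.compress(msg) + deflater.flush(zlib.Z_SYNC_FLUSH))
    return chunks


def decompress_stream(chunks):
    """Matching inflater: each chunk decodes to exactly one message."""
    inflater = zlib.decompressobj()
    return [inflater.decompress(chunk) for chunk in chunks]


def compress_individually(messages):
    """For comparison: a fresh deflater (empty dictionary) per message."""
    return [zlib.compress(msg) for msg in messages]
```

Running both functions over a list of similar small messages and comparing the
summed chunk lengths shows the effect described above. For incompressible
binary content, deflate can fall back to stored (uncompressed) blocks, so the
growth should be limited to a few bytes of framing per flush.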
A deeper understanding (and more modification) of the SOCKET
implementation would be required if I wanted to compress individual
messages. This is not impossible, either. Moreover, if I put the
compression filter in the right place, we would have compression for
every protocol, wouldn't we?
See discussion above.
Thinking about this, I already have another implementation which, in a
way, supports compressing individual messages: it only compresses a
block of bytes when the stream is flushed or when its buffer is full.
When xmlBlaster sends a message, it writes the whole message to the
OutputStream, then flushes it. When flush is invoked, the buffer state
would enable it to decide whether to compress the current buffer (which
should be a message, or a big part of it) or not. This way, we would
lose the state of the deflater object and start with a new one on every
flush, compress the buffer, and see if it's smaller than the original
buffer. If it is, we'd send it compressed, otherwise not. Some simple
protocol wrapper is also needed, but that is no problem.
If the message is bigger after compression, the attempt only wastes CPU;
a minimal message size could probably help here.
(The minimal message size could be adjusted dynamically after
each unnecessary compression...)
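
The per-flush decision and the minimal message size could be combined roughly
as in the Python sketch below. The 5-byte wrapper (one flag byte plus a 4-byte
length), the class name and the threshold rule are assumptions for
illustration; neither the SOCKET spec nor the patch defines them.

```python
import struct
import zlib

HEADER = struct.Struct("!BI")  # hypothetical wrapper: flag byte + payload length
RAW, DEFLATED = 0, 1


class FlushCompressor:
    """Decides on every flush whether the buffered bytes go out compressed."""

    def __init__(self, min_size: int = 0):
        self.min_size = min_size  # buffers shorter than this are never tried

    def frame(self, buffer: bytes) -> bytes:
        payload, flag = buffer, RAW
        if len(buffer) >= self.min_size:
            candidate = zlib.compress(buffer)  # fresh deflater state per flush
            if len(candidate) < len(buffer):
                payload, flag = candidate, DEFLATED
            else:
                # Wasted attempt: skip buffers of this size or smaller from now on.
                # Crude on purpose; a real rule would probably cap or decay this.
                self.min_size = len(buffer) + 1
        return HEADER.pack(flag, len(payload)) + payload


def unframe(data: bytes) -> bytes:
    """Receiver side: read the wrapper and inflate only if the flag says so."""
    flag, length = HEADER.unpack_from(data)
    payload = data[HEADER.size:HEADER.size + length]
    return zlib.decompress(payload) if flag == DEFLATED else payload
```

A sender would keep one FlushCompressor per connection and pass each flushed
buffer through frame(); the receiver only needs unframe().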
I think these are our options.
I will assemble a patch for you as soon as I have separated the SSL and
compression layers and cleaned up some code. If you like the idea of
deciding compression on a per-flush basis, then that will be included, too.
greetings,
Balázs
Regards
Marcel
--
http://www.xmlBlaster.org