TL;DR: I couldn’t make a custom BlazorPack editor work in Burp, so I used Mallet instead. From an indecipherable binary mess to this, in about 100 lines:
For details on how to do this yourself, even for other protocols, read on!
On a recent assessment, Marianka ran into a website using BlazorPack. As Microsoft describes it: “Today’s modern apps are expected to deliver up-to-date information without hitting a refresh button. Add real-time functionality to your dashboards, maps, games and more.”
The initial login used Office365 credentials, via OAuth, downloaded some resources, then transitioned to WebSockets for the rest of the application. After a quick JSON-formatted protocol negotiation, the remainder of the communication was in a binary format, making it really difficult to try to tamper with it, or even to understand what was really going on.
It turns out that there are two versions/implementations of Blazor, client-side and server-side. The client-side version transfers a large WebAssembly blob from the server, which then interacts with the server using a series of HTTP requests. The server-side version actually keeps the application state on the server, and simply sends a presentation layer down to the browser. Having done that, all user interaction (clicks, keystrokes, etc.) gets sent back to the server over WebSockets, and the server then sends new rendering instructions back to the browser. We were dealing with this server-side implementation of Blazor.
(You can find a demo site using the server-side approach here, which can be used to walk through the rest of this blog post.)
Now, while you can sometimes change individual bytes in a binary payload, without too much risk of breaking it horribly, actually figuring out which bytes do what can be quite a task. Ideally, we’d like to figure out how to decode this binary stream into something (somewhat) more comprehensible, and then how to re-encode any changes that we make.
A bit of research turned up the DotNet source code for the blazorpack protocol, part of the Microsoft DotNet repository on GitHub. From there I could see that the binary protocol was constructed using a Protobuf-style varint representing the length of the message, with the indicated number of bytes following being a MessagePack-encoded blob.
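To make the framing concrete, here is a minimal sketch of varint32 length-prefixed framing using only the JDK. The class and method names (VarintFraming, split, frame) are made up for illustration and are not part of Mallet, Netty or the DotNet source. It assumes the whole stream is already buffered in memory.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

/** Sketch of varint32 length-prefixed framing (hypothetical helper, JDK only). */
final class VarintFraming {

    /** Reads a base-128 varint: low 7 bits first, high bit set means "more bytes follow". */
    static int readVarint32(ByteBuffer in) {
        int result = 0;
        for (int shift = 0; shift < 35; shift += 7) {
            byte b = in.get();
            result |= (b & 0x7F) << shift;
            if ((b & 0x80) == 0) {
                return result;
            }
        }
        throw new IllegalArgumentException("varint32 longer than 5 bytes");
    }

    /** Writes a non-negative value as a base-128 varint. */
    static void writeVarint32(ByteArrayOutputStream out, int value) {
        while ((value & ~0x7F) != 0) {
            out.write((value & 0x7F) | 0x80);
            value >>>= 7;
        }
        out.write(value);
    }

    /** Splits a fully buffered stream into the individual message payloads. */
    static List<byte[]> split(byte[] stream) {
        ByteBuffer in = ByteBuffer.wrap(stream);
        List<byte[]> frames = new ArrayList<>();
        while (in.hasRemaining()) {
            int length = readVarint32(in);
            if (length < 0 || length > in.remaining()) {
                throw new IllegalArgumentException("truncated or corrupt frame");
            }
            byte[] payload = new byte[length];
            in.get(payload);
            frames.add(payload);
        }
        return frames;
    }

    /** Prefixes a payload with its varint32 length. */
    static byte[] frame(byte[] payload) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeVarint32(out, payload.length);
        out.write(payload, 0, payload.length);
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] framed = frame(new byte[300]);
        // A 300-byte payload gets the two-byte prefix 0xAC 0x02.
        System.out.printf("prefix: %02x %02x, total: %d bytes%n",
                framed[0] & 0xFF, framed[1] & 0xFF, framed.length);
    }
}
```

Each payload returned by split would then be handed to a MessagePack decoder. In a real proxy the bytes arrive incrementally, so the splitter also has to wait for more data rather than throw on a short read.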
My initial thought was to use BurpSuite’s extension API to make an editor that would decode the various WebSocket frames and present them in a readable form, perhaps JSON-encoded for easy tampering. However, I was stumped almost immediately when I realised that the WebSocket frames shown by Burp were a maximum of 4096 bytes each, but the actual message could be far larger than that, spread over several frames. From what I could see, Burp had no support for aggregating multiple WebSocket frames (Continuation Frames) into a single entity, so any attempt to decode a message that was spread over multiple frames would be doomed to fail. Perhaps PortSwigger could consider adding this to BurpSuite. (This post was written before PortSwigger announced their new API, but from what I can see of the Montoya API, aggregating WebSocket frames is still not supported. It would also be nice to see whether a WebSocket frame is Text or Binary, but I digress!)
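For readers unfamiliar with WebSocket fragmentation, the following sketch shows the kind of reassembly that is needed before a large message can be decoded. It follows the fragmentation rules in RFC 6455: a Text or Binary frame with FIN clear starts a message, and Continuation frames extend it until one arrives with FIN set. The FragmentAggregator class is hypothetical and JDK-only. It is not Burp’s or Mallet’s code, and it ignores control frames.

```java
import java.io.ByteArrayOutputStream;
import java.util.Optional;

/** Sketch of WebSocket message reassembly (hypothetical class, JDK only). */
final class FragmentAggregator {
    static final int OP_CONTINUATION = 0x0;
    static final int OP_TEXT = 0x1;
    static final int OP_BINARY = 0x2;

    private final ByteArrayOutputStream buffer = new ByteArrayOutputStream();
    private boolean inMessage = false;

    /** Feeds one data frame; returns the whole message once the FIN frame arrives. */
    Optional<byte[]> accept(int opcode, boolean fin, byte[] payload) {
        if (opcode == OP_CONTINUATION) {
            if (!inMessage) {
                throw new IllegalStateException("continuation without a start frame");
            }
        } else if (opcode == OP_TEXT || opcode == OP_BINARY) {
            if (inMessage) {
                throw new IllegalStateException("new message before the previous one finished");
            }
            inMessage = true;
        } else {
            throw new IllegalArgumentException("control frames are not aggregated");
        }
        buffer.write(payload, 0, payload.length);
        if (!fin) {
            return Optional.empty(); // more fragments to come
        }
        byte[] message = buffer.toByteArray();
        buffer.reset();
        inMessage = false;
        return Optional.of(message);
    }

    public static void main(String[] args) {
        FragmentAggregator agg = new FragmentAggregator();
        agg.accept(OP_BINARY, false, new byte[] {1, 2});
        agg.accept(OP_CONTINUATION, false, new byte[] {3});
        byte[] whole = agg.accept(OP_CONTINUATION, true, new byte[] {4, 5}).get();
        System.out.println(whole.length + " bytes reassembled");
    }
}
```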
Of course, this was not the end of the road! Mallet is a tool that I have been working on for several years, aimed at exactly this problem – proxying and intercepting arbitrary protocols!
To solve this problem, we’d need to put a few building blocks in place first. Mallet already had support for HTTP (1.0 and 1.1), as well as WebSockets. It also had support for decoding and encoding JSON-formatted messages – needed for the initial handshake. Of course, you could simply assume that the protocol negotiation proceeded as expected, and skip the first request and response before starting the BlazorPack decoding, but for completeness, actually handling the JSON messages would probably be good.
And then finally, we’d need a ProtobufVarint32FrameDecoder, which breaks up the stream into actual message-sized chunks by reading the preceding Varint32, and then that many bytes following. Fortunately, Netty already has that, along with the corresponding encoder (ProtobufVarint32LengthFieldPrepender). That just left decoding the MessagePack format itself.
My first approach was to use the MessagePack Java implementation, and simply wrap it in a couple of Netty classes to convert the Netty way of doing things to the MessagePack way. Unfortunately, I immediately ran into a problem: a round trip from bytes to a decoded Object and back again produced differently encoded output. Trying to make sense of the MessagePack library implementation, so that I could understand where the difference had crept in, also had me scratching my head in frustration. It seemed far more complicated than it needed to be!
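One general reason a MessagePack round trip can change the bytes is that the format allows several encodings of the same value. For example, the integer 5 can be written as the single byte 0x05 (positive fixint) or as 0xCC 0x05 (uint 8). The sketch below illustrates that effect with a deliberately tiny, hypothetical codec. It is an assumption about one possible cause, not a diagnosis of what actually happened here.

```java
import java.util.Arrays;

/** Sketch: two valid encodings of one value need not round-trip (hypothetical demo). */
final class RoundTripDemo {

    /** Decodes a positive fixint or a uint 8; every other type is out of scope. */
    static int decode(byte[] in) {
        int first = in[0] & 0xFF;
        if (first <= 0x7F) {
            return first;          // positive fixint: the byte is the value
        }
        if (first == 0xCC) {
            return in[1] & 0xFF;   // uint 8: marker byte, then one value byte
        }
        throw new IllegalArgumentException("type not covered by this sketch");
    }

    /** Encodes 0..255 using the shortest available form. */
    static byte[] encode(int value) {
        if (value < 0 || value > 0xFF) {
            throw new IllegalArgumentException("value not covered by this sketch");
        }
        if (value <= 0x7F) {
            return new byte[] {(byte) value};
        }
        return new byte[] {(byte) 0xCC, (byte) value};
    }

    /** True if decoding and re-encoding reproduces the input bytes exactly. */
    static boolean roundTrips(byte[] in) {
        return Arrays.equals(in, encode(decode(in)));
    }

    public static void main(String[] args) {
        System.out.println(roundTrips(new byte[] {0x05}));               // shortest form
        System.out.println(roundTrips(new byte[] {(byte) 0xCC, 0x05}));  // longer but valid form
    }
}
```

Both inputs decode to 5, but only the first survives a round trip unchanged, because the encoder always picks the shortest form.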
I then decided to try implementing my own MessagePack decoder and encoder, directly from the specification. It couldn’t be *that* hard, could it?
Famous last words, normally! But in this case, a few hundred lines of code in two classes later, I was decoding and encoding, round tripping back to the exact same input byte array! Fantastic!
This is a great advantage of the Netty framework, and its philosophy. While the MessagePack library needed to cater for decoding in a streaming form, adding chunk after chunk, the Netty approach of knowing up front how many bytes to read before trying to decode simplified the decoder immensely! Not having to be able to record exactly where you are in the object tree, so that you can restart from that point, cuts out an enormous amount of complexity.
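To show how little state is needed once a complete frame is in hand, here is a sketch of a recursive MessagePack decoder for a small subset of the format (nil, booleans, small integers, short strings, bin 8, and small arrays and maps). It is not the codec described in this post: the MiniMsgPack name and the choice of Java types are assumptions made for the example, and many formats from the specification are left out.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Sketch of a MessagePack decoder for a subset of types (hypothetical, JDK only). */
final class MiniMsgPack {

    /** Decodes one value from a buffer that already holds the complete frame. */
    static Object decode(ByteBuffer in) {
        int b = in.get() & 0xFF;
        if (b <= 0x7F) return b;                                      // positive fixint
        if (b >= 0xE0) return b - 0x100;                              // negative fixint
        if (b >= 0xA0 && b <= 0xBF) return readString(in, b & 0x1F);  // fixstr
        if (b >= 0x90 && b <= 0x9F) return readArray(in, b & 0x0F);   // fixarray
        if (b >= 0x80 && b <= 0x8F) return readMap(in, b & 0x0F);     // fixmap
        switch (b) {
            case 0xC0: return null;                                   // nil
            case 0xC2: return Boolean.FALSE;
            case 0xC3: return Boolean.TRUE;
            case 0xC4: return readBytes(in, in.get() & 0xFF);         // bin 8
            case 0xCC: return in.get() & 0xFF;                        // uint 8
            case 0xCD: return in.getShort() & 0xFFFF;                 // uint 16 (big-endian)
            case 0xD0: return (int) in.get();                         // int 8
            case 0xD1: return (int) in.getShort();                    // int 16
            case 0xD2: return in.getInt();                            // int 32
            case 0xD9: return readString(in, in.get() & 0xFF);        // str 8
            default:
                throw new IllegalArgumentException(String.format("unsupported format 0x%02x", b));
        }
    }

    private static byte[] readBytes(ByteBuffer in, int length) {
        byte[] raw = new byte[length];
        in.get(raw);
        return raw;
    }

    private static String readString(ByteBuffer in, int length) {
        return new String(readBytes(in, length), StandardCharsets.UTF_8);
    }

    private static List<Object> readArray(ByteBuffer in, int size) {
        List<Object> items = new ArrayList<>(size);
        for (int i = 0; i < size; i++) {
            items.add(decode(in)); // plain recursion; no resume state to track
        }
        return items;
    }

    private static Map<Object, Object> readMap(ByteBuffer in, int size) {
        Map<Object, Object> entries = new LinkedHashMap<>();
        for (int i = 0; i < size; i++) {
            Object key = decode(in);
            entries.put(key, decode(in));
        }
        return entries;
    }

    public static void main(String[] args) {
        // The bytes below encode the map {"a": [1, true, nil]}.
        byte[] frame = {(byte) 0x81, (byte) 0xA1, 0x61, (byte) 0x93, 0x01, (byte) 0xC3, (byte) 0xC0};
        System.out.println(decode(ByteBuffer.wrap(frame)));
    }
}
```

Because the frame is complete before decode is called, a short read is simply an error, and the decoder never has to remember where in the object tree it stopped.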
(I did decide to skip a few of the more esoteric MessagePack protocol extension features, though, so it isn’t an entirely complete MessagePack implementation, I’m afraid!)
And unfortunately, after getting it all set up in a pipeline, it turned out that I was doing something wrong in my encoding or decoding, and Blazor was reporting errors about “no object ID: 9”, and similar. I made a test suite, with a variety of object types and values, but all that did was confirm that I was decoding things the same way that I was encoding them! I even used the “official” MessagePack Java implementation to convert the objects to serialised bytes, passed those through my codec, and confirmed that the decoded object was the same as the original test object and that the re-encoded bytes were the same as those generated by the official library.
Eventually, still not knowing exactly what data type I was processing incorrectly, I realised that I had been using an older version of the MessagePack Java library, because the artifact had been renamed at some point to msgpack-core! Tearing out my own implementation, I wrapped the latest version of the library into a Netty codec, and we were in business! Everything was working, and no errors were being reported!
To give you an idea of what the codec ended up looking like, and how much effort it was to integrate, this is the MessagePackCodec. The Decoder wraps a Netty ByteBuf containing the data with an InputStream, then uses the library to read the objects from it. (I did have to fight a bit with the Groovy scripting engine, which was invoking the wrong method for some reason!) The Encoder simply invokes the MessagePack library to serialize the Object to a byte array, and then writes that into a Netty ByteBuf. And finally, the Codec simply combines the two into a single class.
So, the Mallet processing pipeline looks like this, from client to proxy:
- A SOCKS handler to figure out where the connection is going to.
- An SSLSniffHandler to determine whether the connection is encrypted or not. This provides a branching capability, so that the necessary SSL handlers can be added to those connections.
- An HttpServerCodec, to decode the incoming bytes into HTTP Request objects, and encode HTTP Response objects to bytes.
- An HttpObjectAggregator, to combine chunked HTTP Content objects into a single entity.
- A WebSocketServerUpgradeHandler, to manage the WebSocket upgrade negotiation, and remove the HTTP codec when the negotiation completes.
- A Groovy ScriptHandler, to install the ProtobufVarInt FrameCodec, and the MessagePackCodec once the WebSocket connection has been negotiated.
- The Intercept handler, that allows us to see and tamper with the messages.
And then effectively the same on the outbound/upstream connection, just with Client implementations of the SSL and HTTP codecs instead of Server implementations. The full graph is available in the Mallet examples.
Note that the non-SSL branch does not have the BlazorPack handlers. This was created purely in case any non-HTTPS resources were requested.
In the above image, we can see the connection being established, SOCKS negotiation, SSL negotiation, and then the initial HTTP request and response, performing the WebSocket upgrade handshake.
After the WebSocket channel is established, we can see the initial text frame with a JSON message in it, followed by what is apparently an acknowledgement message (in a binary frame).
Finally, the BlazorPack handlers are added to the pipeline, and the encoded binary messages can be deserialised into representative Java objects. Note that the Mallet Reflection editor allows us to drill down into the individual Objects that are decoded. At this point, we can start trying to understand how this protocol actually works, and look for any possible vulnerabilities.
For reference, here is an example of the Groovy ScriptHandler that was used to skip the first two WebSocket messages, before installing the Blazor protocol interpreters. (The full script is available in the Mallet GitHub repo, linked under the image.) The same script was used both on the client-to-proxy pipeline, and on the proxy-to-server pipeline.
The basic approach was to increment a state variable until we get to the point where the initial handshake has been completed, then just forward messages back and forth. The one remaining wrinkle was dealing with the WebSocket close frame that either end might send. If we see one, we should just close the channel. And we only need to worry about closing one channel. Mallet will see that, and close the other one for us.
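The counting logic described above can be sketched independently of Netty. In the hypothetical HandshakeGate class below, the caller reports each WebSocket frame and gets back an action. The names, the Action values and the idea of returning an action (rather than modifying the pipeline directly, as the real script does) are assumptions made to keep the example self-contained.

```java
/** Sketch of the "skip N handshake messages, then decode" state counter (hypothetical). */
final class HandshakeGate {

    enum Action {
        FORWARD_RAW,                  // still in the handshake: pass the frame through untouched
        FORWARD_THEN_INSTALL_CODECS,  // last handshake frame: pass it on, then add the codecs
        DECODE,                       // handshake done: the frame goes through the codecs
        CLOSE                         // a close frame was seen: close this channel
    }

    private final int messagesToSkip;
    private int seen = 0;

    HandshakeGate(int messagesToSkip) {
        this.messagesToSkip = messagesToSkip;
    }

    /** Decides what to do with the next WebSocket frame. */
    Action onFrame(boolean isCloseFrame) {
        if (isCloseFrame) {
            return Action.CLOSE;
        }
        if (seen < messagesToSkip) {
            seen++;
            return seen == messagesToSkip
                    ? Action.FORWARD_THEN_INSTALL_CODECS
                    : Action.FORWARD_RAW;
        }
        return Action.DECODE;
    }

    public static void main(String[] args) {
        HandshakeGate gate = new HandshakeGate(2); // the JSON negotiation and its acknowledgement
        for (int i = 0; i < 3; i++) {
            System.out.println(gate.onFrame(false));
        }
        System.out.println(gate.onFrame(true));
    }
}
```

In a Netty handler, the FORWARD_THEN_INSTALL_CODECS step is where the frame codec and the MessagePack codec would be added to the channel pipeline.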
One last detail to note is that we need to unwrap the BinaryWebSocketFrames to get just the raw bytes, and rewrap them again on the way back. This is implemented using another small script, BinaryWebSocketFrameCodec.groovy, shown below.
As you can see, it is very simple! Note: The necessary import statements have been omitted for brevity, but are available by clicking the link above.
In the interests of easy development, the MessagePackCodec was implemented in the same script as the BlazorPack UpgradeHandler. The ScriptHandlers are even specifically designed to make this easy, as they will reload the script from disk if given a filename, every time a new connection is made (and therefore a new ScriptHandler instance is created). Mallet has also been updated to automatically load libraries from the ./libext/ directory on startup (and there is a reload button if you update the libraries after startup too!) So if you are looking at a new protocol, drop any existing libraries into the libext directory, write a bit of Groovy (or other scripting language of your choice! Jython, JavaScript and other JSR-223 engines should all work!) and hack up a quick script to make the changes you need! Of course, for proper integration in your IDE, you may want to add those libraries to the pom.xml and have everything automatically included.
And this ultimately is the power that Netty brings to this arena. A powerful, clean API that makes it very easy to compose small, self-contained handlers, that do one thing only, but do it well. My sincere thanks to the authors and contributors of the Netty project!
Footnote: These were the artifacts used to decode the MessagePack frames, available from the Maven repositories:
- com.fasterxml.jackson.core:jackson-databind version [2.8.11.1,)
- org.msgpack:msgpack-core version 0.9.0
- org.msgpack:jackson-dataformat-msgpack version 0.9.0
If you use the demo site suggested above for testing, please make sure that the SOCKS proxy configuration in your browser is set up for remote DNS resolution. cdn.syncfusion.com will not negotiate a TLS connection unless the correct SNI name is provided, and the proxy cannot provide it when it is only given the IP address. Ask me how I know this!