While working on an Android project, one of my tasks was to integrate a home camera into the app. It’s a popular Chinese product, and the vendor provides an SDK and (barely) some English documentation. The SDK uses an “audio pair” feature, where sound is used to transfer the Wi-Fi type, SSID and password to the camera. However, after an SDK upgrade the pairing process started to work without sound, like magic. This is a dive into this mystery.
The what
For some more context on why this was an “adventure”: the vendor provides the SDK for various platforms, among others Android, iOS, Windows and Linux. Given that the functionality in the SDK includes at least network protocol implementation, encoding/decoding, remote file handling and device pairing, it’s almost natural that the bulk of the code would be provided in a cross-platform manner. So the SDK provides a few .so (.dll) libraries and some platform-specific wrappers; in the case of Android, some JNI bindings. Naturally, the .so files were NOT accompanied by their respective C source equivalent.
The why
As said, after an (incompatible) upgrade of the SDK the pairing process changed. There was no more sound carrying the Wi-Fi data; however, the camera still received the entered Wi-Fi parameters and successfully connected to the network.
How is that possible? How could it be? Are the Chinese sending the data over some spy satellite??
What can a guy do? Reverse it.
The tools
Seeing as we’ll only get an answer from the code, this seemed a great opportunity to try out something that had just come up on my radar. I had read about Ghidra, available at https://ghidra-sre.org/. It’s a free and open-source reverse engineering tool from the NSA.
After struggling a bit with the Java madness, learning about setup, options and navigation, I could look at the (not great, not terrible) restored C code of the pairing routine.
What we can see is that it constructs an address (sprintf) and sends some static data to it with sendto, typical for UDP traffic. After digging through some more code I knew that the routine took a pre-constructed byte array of data containing the Wi-Fi parameters and was processing this byte by byte, effectively encoding one byte at a time into the sending address. This can also be seen on a network dump using Wireshark.
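To make the idea concrete, here is a minimal sketch of what such a sending routine could look like. It is not the SDK's code. It assumes the `239.<checksum>.<sequence>.<data>` address layout that the rest of this post arrives at, and the port, the payload and the checksum function are placeholders invented purely for illustration.

```python
import socket

PORT = 8000        # hypothetical; the real port is not given in this post
PAYLOAD = b"\x00"  # hypothetical stand-in for the SDK's static payload
REPEATS = 4        # each data byte is sent four times


def placeholder_checksum(seq: int, value: int) -> int:
    """Made-up checksum; the SDK's real algorithm is not shown in this post."""
    # Only 7 bits, because the top bit of the second IP octet never
    # reaches the multicast MAC address.
    return (seq ^ value) & 0x7F


def build_addresses(data: bytes) -> list:
    """Return one multicast destination address per data byte."""
    addresses = []
    for index, value in enumerate(data):
        seq = index & 0xFF  # sequence number must fit in one octet
        check = placeholder_checksum(seq, value)
        addresses.append(f"239.{check}.{seq}.{value}")
    return addresses


def send_all(data: bytes) -> None:
    """Send the static payload to every encoded address over UDP."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        for address in build_addresses(data):
            for _ in range(REPEATS):
                sock.sendto(PAYLOAD, (address, PORT))
```

The payload itself carries nothing useful; all of the information sits in the destination address, which is why the address construction is kept separate from the actual sending.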
But for the love of it, how did the camera, which was not connected to the Wi-Fi and did not know its encryption parameters, get this information?
The how
Actually, after getting this far, I already knew the answer. I knew that 239.x.x.x was multicast, and that when sending multicast the low 23 bits of the IP address get copied into the destination MAC address. An example for the first packet from Wireshark would read:
Ethernet II, Src: 0x:xx:xx:xx:xx:xx (0x:xx:xx:xx:xx:xx), Dst: IPv4mcast_2d:00:29 (01:00:5e:2d:00:29)
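The mapping from an IPv4 multicast address to its Ethernet destination MAC is fixed: the prefix 01:00:5e followed by the low 23 bits of the IP address. A small sketch of that mapping follows; the function name is mine, and 239.45.0.41 is just one address that would produce the MAC above (239.173.0.41 would too, since the top bit of the second octet is dropped).

```python
import ipaddress


def multicast_mac(ip: str) -> str:
    """Map an IPv4 multicast address to its Ethernet destination MAC."""
    addr = ipaddress.IPv4Address(ip)
    if not addr.is_multicast:
        raise ValueError(f"{ip} is not an IPv4 multicast address")
    low23 = int(addr) & 0x7FFFFF  # only the low 23 bits survive
    octets = (0x01, 0x00, 0x5E, low23 >> 16, (low23 >> 8) & 0xFF, low23 & 0xFF)
    return ":".join(f"{octet:02x}" for octet in octets)


print(multicast_mac("239.45.0.41"))  # prints 01:00:5e:2d:00:29
```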
And a little bit of research confirmed the other part of the mystery. On Wi-Fi, even though the payload is encrypted, the layer 2 header, including the MAC addresses, is actually sent in plain text.
This means that the camera can listen on popular Wi-Fi channels for multicast packets with specific contents. The first 3 MAC octets are 01:00:5e for IPv4 multicast, the fourth is a checksum used to verify the validity of the data, the fifth is a sequence number and the last one is the data byte. And each data byte gets sent 4 times.
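The receiving side can be sketched in the same spirit. The following is only an illustration of how destination MACs observed on the air could be turned back into bytes; it works on MAC strings that have already been captured, the function names are hypothetical, and checksum validation is left out because the algorithm is not known from this post.

```python
PREFIX = "01:00:5e"  # IPv4 multicast MAC prefix


def parse_mac(mac: str):
    """Split a destination MAC into (checksum, sequence, data byte), or None."""
    mac = mac.lower()
    if not mac.startswith(PREFIX + ":"):
        return None  # not IPv4 multicast, ignore
    check, seq, value = (int(part, 16) for part in mac.split(":")[3:])
    return check, seq, value


def reassemble(macs) -> bytes:
    """Collect data bytes by sequence number and return them in order."""
    by_seq = {}
    for mac in macs:
        fields = parse_mac(mac)
        if fields is None:
            continue
        _check, seq, value = fields  # checksum not verified: algorithm unknown
        by_seq[seq] = value          # repeated packets simply overwrite
    return bytes(by_seq[seq] for seq in sorted(by_seq))
```

Sending every byte four times fits this picture: a listener hopping between channels will miss frames, and the sequence number lets it put whatever it did catch back in order.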
Conclusion
Any technology, no matter how primitive, is magic to those who don’t understand it. [1]
What did I learn: Knowing your basics is important. Ghidra.
[1] A variant of Clarke’s third law: https://en.wikipedia.org/wiki/Clarke%27s_three_laws