Mysterious Camera Pairing

November 6, 2020

While working on an Android project, one of my tasks was to integrate a home camera into the app. It's a popular Chinese product, and the vendor provides an SDK and (barely) some English documentation. The SDK uses an “audio pair” feature, where sound is used to transfer the Wi-Fi type, SSID and password to the camera. However, after an SDK upgrade the pairing process started to work without sound, like magic. This is a dive into that mystery.

The what

Some more context on why this was an “adventure”: the vendor provides the SDK for various platforms, among others Android, iOS, Windows and Linux. Given that the SDK covers at least a network protocol implementation, encoding/decoding, remote file handling and device pairing, it's almost natural that the bulk of the code is shared across platforms. So the SDK ships a few .so (.dll on Windows) libraries plus some platform-specific wrappers; on Android, these are JNI bindings. Naturally, the .so files were NOT accompanied by their C source code.

The why

As mentioned, after an (incompatible) SDK upgrade the pairing process changed. There was no longer any sound carrying the Wi-Fi data, yet the camera still received the entered Wi-Fi parameters and successfully connected to the network.

How is that possible? How could it be? Are the Chinese sending the data over some spy satellite??

What can a guy do? Reverse it.

The tools

Seeing as the only answer would come from the code, this seemed a great opportunity to try out something that had just come up on my radar: Ghidra, available at https://ghidra-sre.org/, a free and open-source reverse engineering tool from the NSA.

After struggling a bit with the Java madness and learning about setup, options and navigation, I could finally look at the (not great, not terrible) decompiled C code of the pairing routine.

The routine builds an IP address of the form 239.x.x.x with sprintf and sends some static data to it with sendto, which is typical for UDP traffic. After digging through some more code, I found that the routine takes a pre-built byte array containing the Wi-Fi parameters and processes it byte by byte, effectively encoding one byte at a time into the destination address. This is also visible in a network capture with Wireshark.
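To make the byte-by-byte encoding concrete, here is a minimal Python sketch of what such a sender could look like. It is not the vendor's code: the 239.<checksum>.<seq>.<data> layout is inferred from the MAC breakdown later in this post, while the `checksum` function, the port, the one-byte datagram and sending each address four times in a row are hypothetical placeholders. The checksum is kept to 7 bits because only the lower 23 bits of a multicast IP end up in the MAC address.

```python
import socket


def checksum(seq: int, value: int) -> int:
    """Hypothetical 7-bit checksum; the vendor's real algorithm is not known."""
    return (seq ^ value ^ 0x5A) & 0x7F


def pairing_addresses(payload: bytes, repeats: int = 4) -> list[str]:
    """Encode each payload byte into a 239.<checksum>.<seq>.<data> address."""
    if len(payload) > 256:
        raise ValueError("the sequence number is a single octet")
    addrs = []
    for seq, value in enumerate(payload):
        addr = f"239.{checksum(seq, value)}.{seq}.{value}"
        # Assumption: the repeats are sent back to back.
        addrs.extend([addr] * repeats)
    return addrs


def send_pairing(payload: bytes, port: int = 10000) -> None:
    """Send a tiny static datagram to every encoded address (hypothetical port)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    # TTL 1 keeps the multicast traffic on the local network.
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL, 1)
    try:
        for addr in pairing_addresses(payload):
            sock.sendto(b"\x00", (addr, port))
    finally:
        sock.close()
```

The datagram content is irrelevant here; only the destination address carries information. The real payload layout (Wi-Fi type, SSID, password) is whatever the SDK builds and is not reproduced in this sketch.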

But for the love of it, how did the camera, which was not connected to the Wi-Fi and did not know its encryption parameters, get this information?

The how

Actually, by this point I already knew the answer. 239.x.x.x is an IPv4 multicast range, and when sending multicast, the lower 23 bits of the IP address are copied into the destination MAC address. The first packet in Wireshark, for example, reads:

Ethernet II, Src: 0x:xx:xx:xx:xx:xx (0x:xx:xx:xx:xx:xx), Dst: IPv4mcast_2d:00:29 (01:00:5e:2d:00:29)
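The IP-to-MAC mapping itself is standard (RFC 1112): the prefix 01:00:5e followed by the lower 23 bits of the IPv4 address. A short Python sketch of it follows. The IP of that first packet is not shown above, so the two addresses below are only illustrations; because the top bit of the second octet is dropped, both produce the captured destination MAC.

```python
import ipaddress


def multicast_mac(ip: str) -> str:
    """Map an IPv4 multicast address to its Ethernet MAC (RFC 1112)."""
    addr = ipaddress.IPv4Address(ip)
    if not addr.is_multicast:
        raise ValueError(f"{ip} is not a multicast address")
    low23 = int(addr) & 0x7FFFFF  # keep only the lower 23 bits
    octets = [0x01, 0x00, 0x5E, low23 >> 16, (low23 >> 8) & 0xFF, low23 & 0xFF]
    return ":".join(f"{o:02x}" for o in octets)


print(multicast_mac("239.45.0.41"))   # 01:00:5e:2d:00:29
print(multicast_mac("239.173.0.41"))  # 01:00:5e:2d:00:29 as well
```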

A little research confirmed the other part of the mystery: on Wi-Fi, even with encryption enabled, the layer 2 MAC addresses in each frame's header are sent in plain text; only the payload is encrypted.

This means the camera can listen on common Wi-Fi channels for multicast frames with specific destination addresses. The first three MAC octets are 01:00:5e, the IPv4 multicast prefix; the fourth is a checksum used to verify the data, the fifth is a sequence number, and the last is the data byte itself. In the example above, that is checksum 0x2d, sequence 0x00 and data byte 0x29. Each data byte is sent four times.
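On the camera side, the decoding could be sketched in Python as below, assuming it already has the destination MACs of sniffed frames as strings. Capturing the frames (monitor mode, channel hopping) is left out, and `checksum` is a hypothetical placeholder that must match the sender's; it does not reproduce the 0x2d from the capture above, since the vendor's algorithm is unknown.

```python
def checksum(seq: int, value: int) -> int:
    """Hypothetical 7-bit checksum; must match whatever the sender used."""
    return (seq ^ value ^ 0x5A) & 0x7F


def decode_macs(macs: list[str]) -> bytes:
    """Rebuild the payload from sniffed destination MAC addresses."""
    received: dict[int, int] = {}
    for mac in macs:
        octets = [int(part, 16) for part in mac.split(":")]
        if octets[:3] != [0x01, 0x00, 0x5E]:
            continue  # not an IPv4 multicast MAC
        check, seq, value = octets[3], octets[4], octets[5]
        if check != checksum(seq, value):
            continue  # corrupted frame or unrelated multicast traffic
        received.setdefault(seq, value)  # the repeated copies are ignored
    # Assemble bytes in sequence order, stopping at the first gap.
    data = bytearray()
    seq = 0
    while seq in received:
        data.append(received[seq])
        seq += 1
    return bytes(data)
```

Keying on the sequence number makes the four repeats harmless and lets frames arrive out of order, which matters when the receiver may miss some of them while hopping channels.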

Conclusion

Any technology, no matter how primitive, is magic to those who don’t understand it.¹

What did I learn? Knowing your basics is important. And Ghidra.

¹ A variant of Clarke’s third law: https://en.wikipedia.org/wiki/Clarke%27s_three_laws

Damjan Cvetko