Fuzzing Redux, leveraging AFL++ Frida-Mode on Android native libraries

# Intro

Welcome fellow `antiquarians`, after detouring a little bit on the app layer we are back to the native layer. Both parts are very important for `Android userland` of course. The apps provide interactive access and can themselves be a vehicle for RCE and [LPE](https://android.googlesource.com/platform/frameworks/base/+/6a9250ec7fc9801a883cedd7860076f42fb518ac%5E%21/) (`<- wtf lol`). On the other hand, applications leverage native code to do any number of things and as we know native code has bugs (no matter how much copium people are smoking about `C`). Fuzzing these native interfaces is going to be fruitful to find more typical memory corruption issues (UAF, double free, type confusion, integer over/underflow, race conditions, etc.).

There are many scenarios to consider, some of these require manual decompilation and analysis and others lend themselves to pure brute force. This brute force aspect is what we are looking at today. I think it's important to mention that this is a follow-up on a previous post and that a lot of background information on what we are doing is explained there:

- Coverage guided fuzzing for native Android libraries (Frida & Radamsa) - [here](https://knifecoat.com/Posts/Coverage+guided+fuzzing+for+native+Android+libraries+(Frida+%26+Radamsa))

Our goal here, more or less, is to play `fall guys` with millions of characters simultaneously that are being selected on fitness.

![[afl-01.webp]]

# Preamble Ramble

In another life I studied formal logic and I have a long held belief that large scale software development is subject to Gödel's incompleteness theorems.

```
(1) Any consistent formal system 𝐹 within which a certain amount of elementary arithmetic can be carried out is incomplete; i.e., there are statements of the language of 𝐹 which can neither be proved nor disproved in 𝐹.

(2) For any consistent system 𝐹 within which a certain amount of elementary arithmetic can be carried out, the consistency of 𝐹 cannot be proved in 𝐹 itself.
```

I think this should also inform us on how we can reduce vulnerabilities. Instead of focusing only on writing code without vulnerabilities (which is impossible in an abstract sense) we should focus on writing small, well-understood, pieces of code that mitigate bug classes and exploitation primitives. You can see that this is also where the industry is going (VT-rp/PAC/..). The same is true of memory-safe languages btw, they are a mitigation for types of bug classes.

# Salting the Battlefield

Ok, what are we doing here today? Great question! As I mentioned before, I was reading [a post](https://blog.quarkslab.com/android-greybox-fuzzing-with-afl-frida-mode.html) by [Quarkslab](https://twitter.com/quarkslab) on fuzzing Android applications, on-device, using `AFL++ Frida mode`. In the [previous post](https://knifecoat.com/Posts/Coverage+guided+fuzzing+for+native+Android+libraries+(Frida+%26+Radamsa)) we jury-rigged our own fuzzer using `frida` for coverage and `radamsa` for mutation. This time we are going to actually replicate the `Quarkslab` results because the performance just far outstrips anything we could do ourselves.

# Tool Building

We will need to build `AFL++` with [`frida mode`](https://github.com/AFLplusplus/AFLplusplus/tree/stable/frida_mode). For a layperson, like me, I feel like OSS build documentation is lacking, maybe this is an artifact of working mostly on the `beautiful` Windows platform. To do our business we are going to need to set up a few things.

- We need the [Android NDK](https://developer.android.com/ndk)
- We need `AFL++ Frida-Mode`
- We need `lldb-server`

I am lazy so I don't like manual processes, especially if I will forget the process 3 months down the line 🫠. Anyway, what I did was write a small `Ansible` playbook that lets you build a release package on `mac arm64`.

![[afl-02.mov]]

See, isn't that a much nicer experience, stop it, do some product builds for your customers!
I have made my playbook available [here](https://github.com/FuzzySecurity/afl-frida-build). Notice just a few things. At the top you can set two variables.

```yml
---
- name: AFL++ Frida-Mode MAC ARM64 Build
  hosts: localhost
  gather_facts: yes
  vars:
    ANDROID_PLATFORM: 31  # Platform version to build for
    AFL_TAG: "latest"     # Use "latest" or set to specific tag (e.g. "v4.08c")
```

These two variables let you control what `platform version` you are building for and which `release tag` to use for `AFL++`. Quick, reproducible builds. You will see that I also check the `OS` and `architecture`.

```yml
  tasks:
    - name: Register OS Family and Architecture
      set_fact:
        os_family: "{{ ansible_facts['os_family'] }}"
        os_arch: "{{ ansible_architecture }}"
      tags: [ 'always' ]

    - name: Check if OS and architecture are supported
      fail:
        msg: "Unsupported OS or architecture"
      when: os_family != 'Darwin' or os_arch != 'arm64'
      tags: [ 'always' ]
```

There are a few minor changes needed to make this playbook build on different platforms. If you update the script for your purposes, why not make a pull request so others can enjoy the bounty of your labour 👀. I do want to credit [@quarkslab](https://twitter.com/quarkslab) for the original work and [@Ch0pin](https://twitter.com/Ch0pin) for the changes he highlighted to make the build work on `mac arm64` ([here](https://valsamaras.medium.com/fuzzing-android-binaries-using-afl-frida-mode-57a49cf2ca43)).

# How bout Fuzz tho?

When we want to bring brute force leverage to native libraries, there are a few cases to consider.

- *Native functions*: The library may have a pure native function we want to test. This can basically be equivalent to the top level `Java` function invocation (in practice) or it can be some sub-component of a larger native function.
- *Weakly linked JNI*: Here we fuzz the `JNI interface`, we create a `JVM` to satisfy the requirements of the `JNI interface parameters` and then we have the ability to construct simple argument objects (like `jbyteArray`).
- *Strongly linked JNI*: This is similar to the last case but the difference is that we can't manually construct the argument objects. Here we either `load the apk` or create a `custom java class`. The harness loads the dependency and uses it to make complex argument objects.

The further down this list you go, the more performance you lose. Keep in mind as well that you probably do want to target the `JNI interface` in many cases because that will just give you the most complete results. You will need the files uploaded by `Quarkslab` [here](https://github.com/quarkslab/android-fuzzing/tree/main). Let's look at the first and the last case.

#### Native Invocation

In the native android library we have a function (`fuzzMe`) which takes a `char[]` and a `char[].length`. On receiving the correct string the function calls a `null pointer` resulting in a crash.

![[afl-03.png]]

We can call this function using its export (`_Z6fuzzMePKai`) or using `base + 0x0ffs3t` (if we want). Now let's look at the `Quarkslab` harness.

```c
#include <errno.h>
#include <stdint.h>
#include <stdio.h>

#define BUFFER_SIZE 256

/* Target function */
extern void _Z6fuzzMePKai(const uint8_t*, uint64_t);

/* Persistent loop */
void fuzz_one_input(const uint8_t *buf, int len) {
  _Z6fuzzMePKai(buf, len);
}

int main(void) {
  const uint8_t buffer[BUFFER_SIZE];

  ssize_t rlength = fread((void*) buffer, 1, BUFFER_SIZE, stdin);
  if (rlength == -1)
    return errno;

  fuzz_one_input(buffer, rlength);

  return 0;
}
```

Very easy to understand, the harness imports the native library and then defines the function by its export. The fuzzing itself will pass inputs to the intermediate wrapper `fuzz_one_input` that then calls the library function.

The `JavaScript` then is pretty boilerplate.
`Frida` [CModule](https://frida.re/docs/javascript-api/#cmodule) is used to prepare `fuzz_one_input` to receive data from `afl`. It manipulates the registers to simulate the calling convention for the intermediate function: the test case is copied into the buffer that `x0` points to and `x1` is set to its length.

```js
Afl.print(`[*] Starting FRIDA config for PID: ${Process.id}`);

const cm = new CModule(`
  #include <string.h>
  #include <gum/gumdefs.h>

  #define BUF_LEN 256

  void afl_persistent_hook(GumCpuContext *regs, uint8_t *input_buf,
    uint32_t input_buf_len) {

    uint32_t length = (input_buf_len > BUF_LEN) ? BUF_LEN : input_buf_len;
    memcpy((void *)regs->x[0], input_buf, length);
    regs->x[1] = length;
  }
  `,
  {
    memcpy: Module.getExportByName(null, "memcpy")
  }
);

const pStartAddr = DebugSymbol.fromName("fuzz_one_input").address;

Afl.setPersistentHook(cm.afl_persistent_hook);
Afl.setPersistentAddress(pStartAddr);
Afl.setEntryPoint(pStartAddr);
Afl.setInMemoryFuzzing();
Afl.setInstrumentLibraries();
Afl.done();
Afl.print("[*] All done!");
```

For the finer points you should look at the original post but these are the key parts to understand:

- The harness loads the `apk native library`
- The harness loads `afl-frida-trace.so` that we compiled
- `afl-fuzz` uses `frida` to fuzz the intermediate function
- Coverage is generated using `frida`

![[afl-04.mov]]

The performance is really incredible on my physical `Pixel 7`, we have `27-28k` executions per second. This is much more than the original `Quarkslab` post which had about `10k/s`. One assumes this is just because of the power of ~~`love`~~ better hardware. The original testing was done on a `Samsung Galaxy A32 (SM-A325F)`.

![[afl-05.png]]

Now that we have these amazing results, I want to quickly address the performance of our previous `DIY fuzzer`. It's sort of embarrassing but in that scenario (which was quite different, keep that in mind) we had around `30-60` executions per second.
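To put those two rates side by side, here is a quick back-of-the-envelope calculation. The workload of one million test cases is an assumed round number chosen purely for illustration, and the two setups were not fuzzing the same target, so read this as a rough order-of-magnitude comparison and nothing more. With $N$ test cases at $r$ executions per second the runtime is:

$$
t = \frac{N}{r}
$$

Taking the slow end of the `AFL++` range and the fast end of the `DIY` range:

$$
\begin{aligned}
t_{\text{AFL++}} &= \frac{10^{6}}{27{,}000\ \text{exec/s}} \approx 37\ \text{s} \\
t_{\text{DIY}} &= \frac{10^{6}}{60\ \text{exec/s}} \approx 16{,}667\ \text{s} \approx 4.6\ \text{h}
\end{aligned}
$$

That is a factor of about `450` in the least flattering comparison ($27{,}000 / 60$) and about `930` in the most flattering one ($28{,}000 / 30$).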
![[afl-06.webp]]

Just consider for a moment the amount of leverage we have here, we tested `3.32m` cases in less than `2 minutes` of runtime 🔥.

#### Strong JNI Invocation

I encourage you to try the `Quarkslab` weak and strong `JNI` cases but let's look at something a little bit different here. Once I had finished testing everything from the post I applied what I had learned to a `native` android library from a different `apk`. Let's have a look at some `anonymized` details. In the `apk` there is a `Java` interface that calls the native module like this:

```java
public static native JumanjiObject dragon(JumanjiObject jumanji);
```

Now, it is difficult for us to make a `JumanjiObject` and just call the `JNI` export, in other words we are in the third category. Let's look at parts of the adjusted fuzzer harness (again details are `anonymized`). Here, we create a `JVM` and because we are lazy we just load the `apk` as a dependency.

```c
#include "jenv.h"

static JavaCTX ctx;

// JNI function export in the native library
extern jobject Java_com_jumanji_bats_object_dragon(JNIEnv*, jclass, jobject);

...

int main(void) {
  char* options = "-Djava.class.path=/data/local/tmp/jumanji/jumanji.apk";

  ....

  if ((status = init_java_env(&ctx, &options, 1)) != 0) {
    fprintf(stderr, "Failed to initialize JNI environment: %d\n", status);
    return status;
  }

  fuzz_one_input(buffer, rlength);

  return 0;
}
```

You can see this is very similar to the `Quarkslab` example. Now, let's look at the `fuzz_one_input` function itself.
```c
void fuzz_one_input(const uint8_t* buffer, size_t length) {
  JNIEnv *env = ctx.env;

  // Create a byte array from the buffer
  jbyteArray jBuffer = (*env)->NewByteArray(env, length);
  (*env)->SetByteArrayRegion(env, jBuffer, 0, length, (const jbyte*)buffer);

  // Load the class
  jclass byteArrayJumanjiClass = (*env)->FindClass(env, "com/jumanji/bats/object");
  if (byteArrayJumanjiClass == NULL) {
    fprintf(stderr, "Could not find jumanji/bats/object class\n");
    return;
  }

  // Find the constructor that takes a byte array
  jmethodID ctor = (*env)->GetMethodID(env, byteArrayJumanjiClass, "<init>", "([B)V");
  if (ctor == NULL) {
    fprintf(stderr, "Could not find the constructor for JumanjiObject\n");
    return;
  }

  // Create the object
  jobject oJumanjiObject = (*env)->NewObject(env, byteArrayJumanjiClass, ctor, jBuffer);
  if (oJumanjiObject == NULL) {
    fprintf(stderr, "Could not create JumanjiObject\n");
    return;
  }

  // Directly call the JNI exported function (same symbol as the extern declaration)
  jobject roJumanjiObject = Java_com_jumanji_bats_object_dragon(env, NULL, oJumanjiObject);

  // Clean up local references
  (*env)->DeleteLocalRef(env, oJumanjiObject);
  (*env)->DeleteLocalRef(env, jBuffer);
  if (roJumanjiObject != NULL) {
    (*env)->DeleteLocalRef(env, roJumanjiObject);
  }
}
```

Again, this isn't exactly what my code looks like but it gives you an idea. It is pretty easy to understand, we resolve the `class` that has the constructor for our `JumanjiObject`, we take the `byte[]` and pass it to the constructor so it can make the correct object for us and then we directly call the `JNI` function. Notice that this is still a pretty simple example and actually the prototype for `fuzz_one_input` is still just a buffer and a length.

![[afl-07.mov]]

We just cut the tape here ok 😅..

![[afl-08.webp]]

I didn't analyse the crashes too diligently because I previously did some manual analysis on this library but I want to point out that `AFL++` found an `OOB write` that I knew was present in the library.
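One thing that stands out in the harness above is that `FindClass` and `GetMethodID` run on every single iteration even though their results never change between inputs. A common pattern is to resolve that kind of thing once, before the fuzzing loop starts, and leave only the per-input work in the hot path. The sketch below shows the shape of that pattern in plain C so it compiles without a `JVM`. `HarnessCache`, `harness_init`, `slow_lookup`, `lookup_count` and `fuzz_one_input_cached` are all made-up names standing in for the `JNI` calls, they are not part of the real harness.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for an expensive, input-independent lookup
 * (think FindClass / GetMethodID). It counts how often it is called. */
static int lookup_count = 0;

static int slow_lookup(const char *name) {
    (void)name;
    lookup_count++;
    return 42; /* placeholder handle */
}

/* Everything that does not depend on the test case lives here. */
typedef struct {
    int ready;
    int class_handle; /* stand-in for a cached jclass    */
    int ctor_handle;  /* stand-in for a cached jmethodID */
} HarnessCache;

static HarnessCache cache;

/* Meant to be called once from main(), before the persistent loop. */
static int harness_init(void) {
    cache.class_handle = slow_lookup("com/jumanji/bats/object");
    cache.ctor_handle = slow_lookup("<init>");
    cache.ready = 1;
    return 0;
}

/* Per-input work only, no lookups in the hot path.
 * Returns the number of bytes handled, or -1 if harness_init was skipped. */
static int fuzz_one_input_cached(const uint8_t *buffer, size_t length) {
    if (!cache.ready) {
        return -1;
    }
    (void)buffer;
    /* A real harness would build the argument object from `buffer` using
     * the cached handles and call the target function here. */
    return (int)length;
}
```

In a real `JNI` harness the usual advice is to also promote the cached `jclass` with `NewGlobalRef` so it is safe to keep around, while the byte array and the object built from it still have to be created (and released) per input.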
Something else here is that a few crashes are stability related but overall that doesn't really matter, it's much faster to triage `15 crashes` than to look for `5 bugs`. This scenario is much closer to the setup we did before with our `DIY fuzzer` and you can see that we lose a lot of performance compared to the `pure native` case but we are still running at `13-14k` executions per second so `¯\_(ツ)_/¯`..

Something else to keep in mind is this exchange with [@Ch0pin](https://twitter.com/Ch0pin).

![[afl-09.png]]

We should try to reduce complexity and implement our own `Java` wrapper. Actually the `Quarkslab` post does go into this but here we were lazy of course. The other important point is that, where possible, we should initialize as much of the object we need to create outside of the tight loop. Anything extra that doesn't have to be called thousands of times per second is going to cost us performance if we leave it in there.

# How bout Triage tho?

Great .. we have to learn a new debugging stack `lldb` (the `k00l kids` are using this, that is the word on the streets). Doesn't `WinDBG` support connecting to a [gdb server](https://learn.microsoft.com/en-us/windows-hardware/drivers/debugger/linux-live-remote-process-debugging) now? `please(!) Microsoft` get support in there for `lldb`. Show these `UNIX types` what a `real debugger` is ok.

#### Improving toolchains

Anyway, we are stuck with what we have for now I guess. Still, we don't have to put up with the bare-bones `lldb` experience, we can at least use [llef](https://github.com/foundryzero/llef) which is a plugin similar to `GEF` for `gdb`. You can get more details about this from the [@FoundryZero](https://twitter.com/foundryzero) [blogpost](https://foundryzero.co.uk/2023/07/13/llef.html).

#### Debugging Quark-Native

First let's adjust the `fuzz` harness so we can actually pass an input as an argument and compile it.
```c
#include <errno.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define BUFFER_SIZE 256

/* Target function */
extern void _Z6fuzzMePKai(const uint8_t*, uint64_t);

/* Persistent loop */
void fuzz_one_input(const uint8_t *buf, int len) {
  _Z6fuzzMePKai(buf, len);
}

int main(int argc, char *argv[]) {
  if (argc != 2) {
    fprintf(stderr, "Usage: %s <path_to_input_file>\n", argv[0]);
    exit(EXIT_FAILURE);
  }

  const char *file_path = argv[1];
  FILE *file = fopen(file_path, "rb");
  if (!file) {
    fprintf(stderr, "Error opening file: %s\n", strerror(errno));
    exit(EXIT_FAILURE);
  }

  uint8_t buffer[BUFFER_SIZE];
  /* fread returns a size_t and never -1, so check ferror for failures */
  size_t rlength = fread(buffer, 1, BUFFER_SIZE, file);
  if (ferror(file)) {
    fprintf(stderr, "Error reading file: %s\n", strerror(errno));
    fclose(file);
    exit(EXIT_FAILURE);
  }
  fclose(file);

  fuzz_one_input(buffer, (int)rlength);

  return 0;
}
```

![[afl-10.png]]

Next transfer the `lldb-server` to the phone, if you ran the `Ansible` playbook you should have the binary ready to go.

```
panther:/data/local/tmp/quark-repro # ./aarch64-lldb-server version
lldb version 17.0.2 (/mnt/disks/build-disk/src/android/llvm-r487747/out/llvm-project/lldb revision d9f89f4d16663d5012e5c09495f3b30ece3d2362)
  clang revision d9f89f4d16663d5012e5c09495f3b30ece3d2362
  llvm revision d9f89f4d16663d5012e5c09495f3b30ece3d2362
```

Launch the `lldb-server` and forward the port.

```
adb shell /data/local/tmp/quark-repro/aarch64-lldb-server platform --server --listen \*:12345
adb forward tcp:12345 tcp:12345
```

Finally, on our localhost we can launch `lldb` and interface with the server.

```
➜ ~ lldb
Stop hook #1 added.
(lldb) platform select remote-android
  Platform: remote-android
  Connected: no
(lldb) platform connect connect://localhost:12345
  Platform: remote-android
  Triple: aarch64-unknown-linux-android
  OS Version: 34 (5.10.189-android13-4-00012-g1217bb583cc5-ab11174560)
  Hostname: localhost
  Connected: yes
  WorkingDir: /data/local/tmp
  Kernel: #1 SMP PREEMPT Mon Dec 4 18:59:42 UTC 2023
(lldb) target create /data/local/tmp/quark-repro/fuzz
warning: (aarch64) /Users/b33f/.lldb/module_cache/remote-android/.cache/8D2C789E-447E-3AA1-419F-4D4F529540EF/libm.so No LZMA support found for reading .gnu_debugdata section
warning: (aarch64) /Users/b33f/.lldb/module_cache/remote-android/.cache/6754D6E7-000D-2408-43DB-2AF9537B7527/libdl.so No LZMA support found for reading .gnu_debugdata section
warning: (aarch64) /Users/b33f/.lldb/module_cache/remote-android/.cache/56B60BF7-F712-8BC8-F262-62845040279B/ld-android.so No LZMA support found for reading .gnu_debugdata section
Current executable set to '/data/local/tmp/quark-repro/fuzz' (aarch64).
(lldb) settings set -- target.run-args /data/local/tmp/quark-repro/case.bin
(lldb) process launch
```

![[afl-11.png]]

At this point we can step into `frame 1` and do whatever we need there to analyse what happened. One issue we have is that we need to deal with `arm assembly`. This is not a big problem, we will survive this ordeal like many that have come before it but it would be nice if we could write a headless `binja` plugin for `lldb` so we can get some `pseudo-c` or `HLIL`.

# Conclusions

Good, good, our knowledge progresses at a geometric rate 🙇‍♂️! We learned a lot of things here. Previously we decompiled the `apk` and `native library` to analyse them and construct automation wrappers around the target code, this gave us a better foundational understanding of the problem-space. In this post we built on that knowledge to apply some very powerful leverage with `AFL++ Frida-Mode`.
Additionally, putting this into practice let us get a better understanding of the toolchain, build some automation, enhance our processes and apply all of this to a totally new target (reproducing an `OOB write`).

```
"The etching of micro-integrated circuits can only occur at the macro-scale and only on a macroscopic two-dimensional plane, thus we must unfold a proton into two dimensions."

"Unfold a nine-dimensional structure into two dimensions? How big would the area be?"

"Very big, as you will see." The science consul smiled.

- Trisolaris, pendulum monument
```