Runtime Android Object Instrumentation

# Intro

This year I have been doing quite a bit of Android userland analysis. Android is a wonderful platform to work on: great decompiler support ([JEB](https://www.pnfsoftware.com/)), easy access to rooted devices (unless you buy NA locked bootloaders, ~~thanks Samsung~~ 😅), Java and of course dynamic instrumentation with [Frida](https://frida.re/). Doing this research, exploring large black-box (obfuscated) applications, requires some tactical warfare strategies. It's hard to overstate how important Frida is for discovery of functionality, runtime logging of sensitive application classes/methods, POC reproduction, debugging and fuzzing.

In this post I want to repackage a small portion of my learnings in short form and touch briefly on Frida, dynamic analysis and data enrichment.

We will spin this post `just a little bit` and put it in the context of the research that may go into post-exploitation capability development. As vendors get more aggressive with device mitigations (e.g. [Memory Integrity Enforcement](https://security.apple.com/blog/memory-integrity-enforcement/)), some portion of capabilities will inevitably move to the target app layer. One wonders, from time to time, whether device mitigation improvements always result in safer users or whether they instead push regulators to legislate for mandated remote access capabilities. Just something to think about.

```
There's a line somewhere in the desert, invisible, shifting with the heat, and by God, we didn’t just cross it, we built a bar on the other side and started charging admission. By nightfall we were serving gasoline in crystal glasses and calling it progress, because once you've sold the sin, the sermon writes itself.
```

# Dynamic Instrumentation

We will focus on `SQLite` as a case study because it is useful for `in-app` post-exploitation (and anti-forensics) but also because it is generally a great resource to map application functionality. However, it is important to understand that the same approach can be applied in a generalized way to a variety of scenarios.

### Mischief Managed

If you look online, you can find some pretty good code snippets to get you started. This `codeshare` entry for example:

- [https://codeshare.frida.re/@ninjadiary/sqlite-database/](https://codeshare.frida.re/@ninjadiary/sqlite-database/)

A lot of the `SQLiteDatabase` methods hooked there have overloads. If you look at how this is implemented in the `codeshare` entry you will see that each overload is defined specifically by its argument composition. For example [`SQLiteDatabase.execSQL`](https://developer.android.com/reference/android/database/sqlite/SQLiteDatabase#execSQL(java.lang.String)) has two overloads:

```
SQLiteDatabase.execSQL(java.lang.String)
SQLiteDatabase.execSQL(java.lang.String, java.lang.Object[])
```

The `codeshare` has entries that correspond to each of these:

```js
// execSQL(String sql)
sqliteDatabase.execSQL.overload('java.lang.String').implementation = function(var0) {
    console.log("[*] SQLiteDatabase.exeqSQL called with query: " + var0 + "\n");
    var execSQLRes = this.execSQL(var0);
    return execSQLRes;
};

// execSQL(String sql, Object[] bindArgs)
sqliteDatabase.execSQL.overload('java.lang.String', '[Ljava.lang.Object;').implementation = function(var0, var1) {
    console.log("[*] SQLiteDatabase.exeqSQL called with query: " + var0 + " and arguments: " + var1 + "\n");
    var execSQLRes = this.execSQL(var0, var1);
    return execSQLRes;
};
```

This is great and I would recommend this approach when implementing these types of hooks. I just want to add a note that you can also reconstruct the overloads dynamically at runtime (even if you don't know the arguments).
```js
Java.perform(function () {
  const SQLiteDatabase = Java.use('android.database.sqlite.SQLiteDatabase');

  // Enumerate every execSQL overload instead of naming each signature by hand.
  SQLiteDatabase.execSQL.overloads.forEach((overload) => {
    const paramTypes = overload.argumentTypes.map(t => t.className);
    console.log('[+] Hooking execSQL overload: (' + paramTypes.join(', ') + ')');

    overload.implementation = function () {
      const sql = arguments[0];
      const args = arguments[1] || null;
      console.log('[*] SQLiteDatabase.execSQL [' + paramTypes.join(', ') + ']: ' + sql + (args ? ' | args: ' + args : ''));
      // `overload.implementation` reads back as null until a replacement is set,
      // so call the overload wrapper itself to reach the original method.
      return overload.apply(this, arguments);
    };
  });
});
```

You can use this to create a single handler that processes all overloads for a specific function. There are some advantages to doing this but it also adds a lot of complexity to your code.

### Data Enrichment

Printing text is a good first step but we may experience some problems here:

- High-volume hooks produce a lot of data just streaming past on the console. Saving text to disk is `boring` and `not insightful` as a vehicle for scalable analysis. What we really need is to package the data entries into `objects we can process programmatically`.
- Raw data is great and very helpful, but we should remember that we are executing `in the context of the target application` and we can actually `enrich our events` with custom information.

Any time we hook a function we capture the arguments, invoke the original implementation and then populate a layered event envelope. In my case, all hooks have a minimal set of properties represented by a `BaseEvent`.

```js
/**
 * Composite event object with common metadata.
 *
 * @param {string} funcName
 * @returns {Record<string, unknown>}
 */
function createBaseEvent(funcName) {
  const now = new Date();
  const event = {
    ts: now.toISOString(),
    pid: Process.id,
    tid: getThreadId(),
    package: state.packageName,
    process: state.processName,
    layer: 'java',
    func: funcName,
    stack: []
  };

  // Java-level thread name and id, if an accessor was resolved at startup.
  const threadInfo = state.threadAccessor ? safeInvoke(state.threadAccessor) : null;
  if (threadInfo) {
    event.thread = threadInfo.name;
    event.java_tid = threadInfo.javaId;
  }

  if (CONFIG.includeJavaStack) {
    event.stack = captureJavaStack();
  }

  return event;
}
```

This creates a normalized event envelope with common fields that we can modify and augment with function-specific information. For example, if we return to one of the `execSQL` overloads, we can do something like this.

```js
SQLiteDatabase.execSQL.overload('java.lang.String').implementation = function (sql) {
  // Run the original first, then report what happened.
  const result = this.execSQL.overload('java.lang.String').call(this, sql);
  dispatchDatabaseEvent(this, 'SQLiteDatabase.execSQL', {
    sql: sanitizeSql(sql)
  });
  return result;
};
```

Here we don't call `createBaseEvent` directly; instead we invoke a special-case event handler (`dispatchDatabaseEvent`) that wraps the base event.

```js
/**
 * Dispatch a database-level event.
 *
 * @param {Java.Wrapper} database
 * @param {string} funcName
 * @param {Record<string, unknown>} details
 * @returns {void}
 */
function dispatchDatabaseEvent(database, funcName, details) {
  const event = createBaseEvent(funcName);
  event.db_path = safeInvoke(() => database.getPath && database.getPath());
  Object.assign(event, details);
  dispatchEvent(event);
}
```

Now, instead of seeing this on the console:

```
[*] SQLiteDatabase.exeqSQL called with query: "CREATE TEMP TABLE room_table_modification_log (table_id INTEGER PRIMARY KEY, invalidated INTEGER NOT NULL DEFAULT 0)"
```

We instead get this:

```json
{
  "type": "sqlite-event",
  "event": {
    "ts": "2025-11-06T05:32:31.250Z",
    "pid": 32222,
    "tid": 32324,
    "package": "jp.naver.line.android",
    "process": "jp.naver.line.android",
    "layer": "java",
    "func": "SQLiteDatabase.execSQL",
    "stack": [
      "android.database.sqlite.SQLiteDatabase.execSQL(Native Method)",
      "k9.c.execSQL(SourceFile:1)",
      "a9.x.n(Unknown Source:31)",
      "com.linecorp.line.generalsetting.room.GeneralSettingDatabase_Impl$a.d(Unknown Source:8)",
      "a9.a0.f(Unknown Source:117)",
      "k9.d$b.onOpen(Unknown Source:15)",
      "android.database.sqlite.SQLiteOpenHelper.getDatabaseLocked(SQLiteOpenHelper.java:427)",
      "android.database.sqlite.SQLiteOpenHelper.getWritableDatabase(SQLiteOpenHelper.java:316)",
      "k9.d$b.e(Unknown Source:4)",
      "k9.d$b.f(Unknown Source:34)",
      "k9.d$b.c(Unknown Source:23)",
      "k9.d.getWritableDatabase(Unknown Source:9)"
    ],
    "thread": "arch_disk_io_0",
    "java_tid": "117",
    "db_path": "/data/user/0/jp.naver.line.android/databases/GeneralStorageSettings",
    "sql": "CREATE TEMP TABLE room_table_modification_log (table_id INTEGER PRIMARY KEY, invalidated INTEGER NOT NULL DEFAULT 0)"
  }
}
```

Hopefully this provides some insight into the `why` and `how`: we can go from printing plaintext arguments to shipping standardized JSON objects for our hooks. Keep in mind as well that on the receiving end we can apply type validation to the event stream so we can capture any parsing issues and drop invalid events (using [`Zod`](https://zod.dev/) in my case).

# Let's Talk

Let's have a look at some sample output from the [Line](https://www.line.me/en/) messenger app. Here we just pick some end-to-end encryption (`e2ee`) data as a case study. On opening a chat with a third party, we see the application load a database dedicated to `e2ee` (if it isn't loaded already).
```json
{
  "db_path": "/data/user/0/jp.naver.line.android/databases/e2ee",
  "flags": null,
  "func": "SQLiteDatabase.openDatabase(Ljava/lang/String;,Landroid/database/sqlite/SQLiteDatabase$OpenParams;)",
  "java_tid": "132",
  "layer": "java",
  "package": "jp.naver.line.android",
  "pid": 26560,
  "process": "jp.naver.line.android",
  "sqlite_handle": "android.database.sqlite.SQLiteDatabase@f3646e",
  "stack": [
    "android.database.sqlite.SQLiteDatabase.openDatabase(Native Method)",
    "android.database.sqlite.SQLiteDatabase.openDatabase(SQLiteDatabase.java:991)",
    "android.database.sqlite.SQLiteOpenHelper.getDatabaseLocked(SQLiteOpenHelper.java:373)",
    "android.database.sqlite.SQLiteOpenHelper.getReadableDatabase(SQLiteOpenHelper.java:340)",
    "wp5.b.getReadableDatabase(Unknown Source:11)",
    "wp5.d.b(Unknown Source:23)",
    "aq5.b.c(Unknown Source:2)",
    "oq5.j.<init>(Unknown Source:64)",
    "oq5.t.a(Unknown Source:48)",
    "o20.d.b(Unknown Source:73)",
    "e1.c.F(Unknown Source:12)",
    "ei1.h.<init>(Unknown Source:65)"
  ],
  "thread": "RxCachedThreadScheduler-2",
  "tid": 26633,
  "ts": "2025-10-12T21:14:31.643Z"
}
```

I hope you can see that it was worth it to create standardized events and enrich them. We get a lot of very valuable information here.

- We of course see that the operation type is `SQLiteDatabase.openDatabase`.
- We have the full path to the database on disk.
- We have a reference to a managed [`OpenParams`](https://developer.android.com/reference/android/database/sqlite/SQLiteDatabase.OpenParams) object that tells us how the database is accessed.
- We get a reference to the database itself.
- We get a `callstack` for the operation. This allows us to understand which classes caused that call to happen, even if they are obfuscated.

`LINE` then queries the database for the user's `e2ee` key based on their `mid` to secure the communication.
```json
{
  "cursor_id": "android.database.sqlite.SQLiteCursor@a90f1a5",
  "db_path": "/data/user/0/jp.naver.line.android/databases/e2ee",
  "func": "SQLiteDatabase.query(Ljava/lang/String;,[object Object],Ljava/lang/String;,[object Object],Ljava/lang/String;,Ljava/lang/String;,Ljava/lang/String;,Ljava/lang/String;)",
  "java_tid": "132",
  "layer": "java",
  "package": "jp.naver.line.android",
  "pid": 27238,
  "process": "jp.naver.line.android",
  "query_args": {
    "columns": [
      { "type": "text", "value": "version" },
      { "type": "text", "value": "key_id" },
      { "type": "text", "value": "pubkey" },
      { "type": "text", "value": "prikey" },
      { "type": "text", "value": "created_time" }
    ],
    "distinct": false,
    "groupBy": { "type": "null", "value": null },
    "having": { "type": "null", "value": null },
    "limit": { "type": "null", "value": null },
    "orderBy": { "type": "text", "value": "key_id desc, created_time desc" },
    "selection": { "type": "text", "value": "mid=? AND version=?" },
    "selectionArgs": [
      { "type": "text", "value": "uf4e771dc82d8cae43d01cf00cd331d18" },
      { "type": "text", "value": "1" }
    ],
    "table": { "type": "text", "value": "keystore" }
  },
  "stack": [
    "android.database.sqlite.SQLiteDatabase.query(Native Method)",
    "wp5.m$d$d.b(Unknown Source:52)",
    "aq5.b.c(Unknown Source:198)",
    "oq5.j.<init>(Unknown Source:64)",
    "oq5.t.a(Unknown Source:48)",
    "o20.d.b(Unknown Source:73)",
    "e1.c.F(Unknown Source:12)",
    "ei1.h.<init>(Unknown Source:65)",
    "rh1.h.O1(Unknown Source:169)",
    "n20.a.c(Unknown Source:90)",
    "rh1.e$a.a(Unknown Source:5)",
    "o20.d.b(Unknown Source:73)"
  ],
  "thread": "RxCachedThreadScheduler-2",
  "tid": 27311,
  "ts": "2025-10-12T21:28:18.101Z"
}
```

Keep in mind that we don't actually care about crypto here as we live inside the `LINE` process, but the key material may be useful for a variety of reasons. Notice also that we have access to a `cursor_id` which allows us to track corresponding operations over time. Again, our post-hook enrichment is adding a lot of additional value to this event.
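
To make the `cursor_id` point a bit more concrete, here is a minimal host-side sketch of how such correlation could look. It assumes that later cursor hooks (the `SQLiteCursor.close` event below is hypothetical) copy the same `cursor_id` string into their events; `groupByCursor` and the sample data are illustrative names, not taken from any existing tool.

```javascript
/**
 * Group events by cursor_id and order each group by timestamp.
 * Assumption: every cursor-related hook writes the same cursor_id
 * string into its event; events without one are skipped.
 *
 * @param {Array<Record<string, unknown>>} events
 * @returns {Map<string, Array<Record<string, unknown>>>}
 */
function groupByCursor(events) {
  const groups = new Map();
  for (const event of events) {
    const id = event.cursor_id;
    if (typeof id !== 'string') continue;
    if (!groups.has(id)) groups.set(id, []);
    groups.get(id).push(event);
  }
  // ISO-8601 UTC timestamps in the same format order correctly as plain strings.
  for (const group of groups.values()) {
    group.sort((a, b) => (a.ts < b.ts ? -1 : a.ts > b.ts ? 1 : 0));
  }
  return groups;
}

// Hypothetical sample: only the query event mirrors the real output above.
const sampleEvents = [
  { ts: '2025-10-12T21:28:18.140Z', func: 'SQLiteCursor.close', cursor_id: 'android.database.sqlite.SQLiteCursor@a90f1a5' },
  { ts: '2025-10-12T21:28:18.101Z', func: 'SQLiteDatabase.query', cursor_id: 'android.database.sqlite.SQLiteCursor@a90f1a5' },
  { ts: '2025-10-12T21:28:34.216Z', func: 'SQLiteDatabase.execSQL' }
];

const timeline = groupByCursor(sampleEvents)
  .get('android.database.sqlite.SQLiteCursor@a90f1a5')
  .map((e) => e.func);
console.log(timeline.join(' -> '));
```

One caveat with this kind of key: the `@a90f1a5` suffix comes from the Java identity hash, which is not guaranteed to be unique over the lifetime of a process, so it is probably wise to scope the grouping by `pid` and a time window as well.
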
Finally, we can capture messages exchanged between both parties if we want (we can also extract them directly from the database).

```json
{
  "binds": [
    { "type": "text", "value": "" },
    { "type": "text", "value": "Jumanji" },
    { "type": "text", "value": "1760304513538" },
    { "type": "text", "value": "1760304513538" },
    { "type": "text", "value": "0" },
    { "type": "text", "value": "0" },
    { "type": "text", "value": "ufceb2b192233285b67362bbfeecb6613" }
  ],
  "db_path": "/data/user/0/jp.naver.line.android/databases/naver_line",
  "func": "SQLiteDatabase.execSQL",
  "java_tid": "171",
  "layer": "java",
  "package": "jp.naver.line.android",
  "pid": 27238,
  "process": "jp.naver.line.android",
  "sql": "update chat set last_message_meta_data=?,last_message=?,last_message_display_time=?,last_created_time=?,message_count=MAX(message_count+?,0),latest_mentioned_position=MAX(latest_mentioned_position+?,0) where chat_id=?",
  "stack": [
    "android.database.sqlite.SQLiteDatabase.execSQL(Native Method)",
    "gq5.d.t(Unknown Source:260)",
    "qn5.j.w(Unknown Source:150)",
    "qn5.j.y(Unknown Source:12)",
    "ht5.w2$e.invoke(Unknown Source:52)",
    "qn5.j.l(Unknown Source:36)",
    "ht5.w2.n(Unknown Source:111)",
    "ht5.w2.y(Unknown Source:31)",
    "ht5.w2.x(Unknown Source:28)",
    "ht5.c3.b(Unknown Source:10)",
    "jp.naver.line.android.thrift.client.impl.LegacyTalkServiceClientImpl$j.b(Unknown Source:8)",
    "qt5.c.a(Unknown Source:72)"
  ],
  "thread": "[LINE] #5 FWorker k4.a",
  "tid": 27369,
  "ts": "2025-10-12T21:28:34.216Z"
}
```

The data we can record like this is extensive (in any application). In the example I use `e2ee` only because `it sparks joy` but a lot of sensitive runtime information can be recovered this way. For example, it is possible to coerce the `LINE` application into producing events that reveal the general geographic location of the phone.

![[droid-inst-01.webp]]

Recording for less than a minute without any intervention at all produces more than `1000` individual calls for a variety of `SQLite operations` and `application classes`. This is a very nice way to catalogue some application functionality (based on stack traces) and use that as a springboard to dive into specific areas of the app in a decompiler (JEB, JADX, etc.). This may in fact be an iterative process where you pilot the app and explore classes that emit events based on your actions (not just for `SQLite`).

### Can hazz API call?

Of course, here we are in a position to also call `managed` and `native` code. Let's revisit the `e2ee` example. When the application opened the database we saw this:

```js
SQLiteDatabase.openDatabase(Ljava/lang/String;,Landroid/database/sqlite/SQLiteDatabase$OpenParams;)
```

What we would like to do here is capture data about this call that would allow us to open the database ourselves with the same access as the application. The key point is that we need to be able to parse the [`OpenParams`](https://developer.android.com/reference/android/database/sqlite/SQLiteDatabase.OpenParams) object. This is something we can do easily through `reflection`.

```
➜ frida -U -f jp.naver.line.android -l e2ee.js
     ____
    / _  |   Frida 17.2.17 - A world-class dynamic instrumentation toolkit
   | (_| |
    > _  |   Commands:
   /_/ |_|       help      -> Displays the help system
   . . . .       object?   -> Display information about 'object'
   . . . .       exit/quit -> Exit
   . . . .
   . . . .   More info at https://frida.re/docs/home/
   . . . .
   . . . .   Connected to Pixel 7 (id=2C161FDH200CY0)
Spawned `jp.naver.line.android`. Resuming main thread!
[Pixel 7::jp.naver.line.android ]-> [e2ee] Hook installed on android.database.sqlite.SQLiteDatabase.openDatabase
[e2ee] handle = SQLiteDatabase: /data/user/0/jp.naver.line.android/databases/e2ee
[e2ee] OpenParams object:
  getCursorFactory = null
  getErrorHandler = yw.a@70cfb28
  getIdleConnectionTimeout = -1
  getJournalMode = null
  getLookasideSlotCount = -1
  getLookasideSlotSize = -1
  getOpenFlags = 268435456 (0x10000000) [OPEN_READWRITE | CREATE_IF_NECESSARY]
  getSynchronousMode = null
[e2ee] Reusing existing database object
[e2ee] Reading e2ee keymaterial from keystore..
  row 0: version=1, mid=ufceb2b192233285b67362bbfeecb6613, key_id=4776671, created_time=0, pubkey=<blob size=48 preview=7cc7ecc7516d628541721744c6fa4f384df85d77b61de01656561020de5aefdec4814e35f1f22018e1fee04489c7dedb>, prikey=null
  row 1: version=1, mid=uf4e771dc82d8cae43d01cf00cd331d18, key_id=4754484, created_time=1713115452683, pubkey=<blob size=48 preview=25acc0bdbb372f9af7a5b982664d3aba972b3b8a41c9e38f88503068f9d293ce86d81bde770f487554f9aa120e2ef4de>, prikey=<blob size=48 preview=e49234c16d8baf2f66e3bd2604cfec998855abe92e993148a4b0a99eab2e4082a36ed8fbc8c58a87ef31212138742aba>
```

Here we simply reuse the existing connection to dump all `e2ee key material` from the database but we could also create our own `OpenParams` object if we wanted to. This can be very useful in cases where we want to reuse existing application functionality to do something else. It's possible, for example, to write a generic harness to have a feature-complete database reader/writer in memory. Notice however that the application opens the database with `OPEN_READWRITE`, so you would have to design the harness to deal with concurrency (if you need to write). This is beyond the scope of this small blog post.

Again, we could also do something like this at the native layer. I would have to do some research but I assume the call chain is something like this:

```js
SQLiteDatabase.openDatabase(...)
  -> SQLiteConnectionPool.open(...)
    -> SQLiteConnection.nativeOpen(...)
      -> sqlite3_open_v2(...)
```

You may have to use `stalker` to actually track that execution flow down, but you can do that and you should be able to open the database with native calls only.

Something else I was thinking about (this is more theoretical, I haven't tried it): if you had a read primitive, you could scan memory sections with specific flags for `sqlite3*` objects. You could use `SQLITE_MAGIC_OPEN (0xa029a697)` as a marker and then validate known values at offsets from the marker to see if the match is a true positive. For example, you could validate `sqlite3->*aDb->*zName`. Sounds like an interesting engineering problem.

# The ART of Automation

We have seen that it is useful to have enriched data but you may ask yourself, *aren't we just looking at a lot of large JSON blobs now*? Yes, this is true of course, but like I said, we now have the ability to process the data programmatically. This means that we can write tools on top of our collection capabilities to do visualization and exploration.

Earlier this year I wrote a `node express app` that integrates directly with [`frida-node`](https://github.com/frida/frida-node) to do all of the collection and analysis for me. `SQLite` is only one of its use-cases. You can see a screenshot below.

![[droid-inst-02.png]]

This makes it highly convenient to explore many thousands of records, even for `tired minds`. If you apply some graph theory to the problem you can also get a lot of interesting insights to focus your efforts in the decompiler.

![[droid-inst-03.png]]

Finally, we live in `the age of the AIs`. I just want to recommend that, if you build tools like this, you spend some time thinking about API design. If you have a robust implementation that allows for targeted searches, multi-layer filtering and special data transformations then you can multiply the value of your tools with integrations for remote (`MCP`) clients.
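
Coming back to the graph idea for a moment, a hedged sketch of the simplest possible version: turn the `stack` arrays of collected events into weighted caller/callee edges and rank them. It assumes frames are ordered innermost first, as in the samples in this post; `frameName`, `buildCallGraph` and `topEdges` are hypothetical helper names and this is not how any particular tool implements it.

```javascript
/**
 * Strip the "(SourceFile:1)" / "(Unknown Source:31)" suffix from a frame.
 * @param {string} frame
 * @returns {string}
 */
function frameName(frame) {
  const paren = frame.indexOf('(');
  return paren === -1 ? frame : frame.slice(0, paren);
}

/**
 * Count caller -> callee edges across all event stacks.
 * Assumption: stack[0] is the innermost frame (the hooked method).
 *
 * @param {Array<{stack?: string[]}>} events
 * @returns {Map<string, number>} keys look like "caller -> callee"
 */
function buildCallGraph(events) {
  const edges = new Map();
  for (const event of events) {
    const frames = (event.stack || []).map(frameName);
    for (let i = 0; i + 1 < frames.length; i++) {
      const key = frames[i + 1] + ' -> ' + frames[i];
      edges.set(key, (edges.get(key) || 0) + 1);
    }
  }
  return edges;
}

/**
 * Return the most frequently observed edges, highest count first.
 * @param {Map<string, number>} edges
 * @param {number} limit
 * @returns {Array<[string, number]>}
 */
function topEdges(edges, limit) {
  return [...edges.entries()].sort((a, b) => b[1] - a[1]).slice(0, limit);
}

// Frames copied from two events shown earlier in this post (stacks truncated).
const sampleStacks = [
  { stack: [
    'android.database.sqlite.SQLiteDatabase.query(Native Method)',
    'wp5.m$d$d.b(Unknown Source:52)',
    'aq5.b.c(Unknown Source:198)',
    'oq5.j.<init>(Unknown Source:64)'
  ] },
  { stack: [
    'wp5.d.b(Unknown Source:23)',
    'aq5.b.c(Unknown Source:2)',
    'oq5.j.<init>(Unknown Source:64)'
  ] }
];

console.log(topEdges(buildCallGraph(sampleStacks), 3));
```

In this tiny sample the edge `oq5.j.<init> -> aq5.b.c` shows up in both stacks and therefore ranks first. On real captures the same counting highlights the obfuscated classes that sit in front of most database traffic, which makes them reasonable first stops in the decompiler.
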

Hopefully this has given you some ideas on how you can apply Frida to your Android research efforts or application assessments!

![[droid-inst-04.gif]]