Yuru Shao 2018-03-20T01:09:15+00:00 shaoyuru@gmail.com

Mobile Advertising and Tracking Ecosystem
2017-12-28T00:00:00+00:00 Yuru Shao http://windflyer.github.io/2017/12/app-trackers

I recently read a paper that has been accepted into NDSS 2018. This paper presents some interesting insights into the mobile advertising and tracking ecosystem and its stakeholders.

The authors developed automated methods to detect third-party advertising and tracking services (ATS) at the traffic level. User data were collected with the Lumen privacy monitor, an Android app that runs locally on the device and intercepts all network traffic over both WiFi and the mobile network. It’s essentially a VPN proxy that sits between apps and network interfaces. The app itself doesn’t need root access, but it asks the user to grant it the VPN permission. In summary, with the Lumen privacy monitor, the authors collected 8.5M flows from 14,599 apps to 40,533 unique fully-qualified domain names (FQDNs) with 13,454 unique second-level domains (SLDs). After classification, 2,121 ATSes (233 of them previously unknown) and 730 ATS-capable services were identified. By characterizing ATS domains, the paper uncovers business relationships between service providers. Eight of the top ten organizations reserve the right to sell or share data with other organizations, while all of them reserve the right to share data with their subsidiaries.

Third-party domains

This paper makes a reasonable assumption that first-party domains are considered to be trusted by users when they install apps. Therefore, only third-party domains are in the scope of study. Two categories of third-party domains are defined:

  1. ATS domains: ones that belong to companies whose primary service is providing advertising and tracking services.
  2. ATS-capable domains (ATS-C): domains that collect tracking information, but whose primary service is not specifically providing ads and analytics to app developers.

An example of an ATS-C is a map API that collects location data and other information to provide area maps and directions. Such a service doesn’t necessarily rely on tracking users to monetize itself. However, ATS-Cs may later share data with other parties.

Domain classification

The goal of domain classification is to identify third-party domains and then to identify the ATS-related domains among them (i.e., ATSes and ATS-Cs).

Existing ATS blacklists and URL categorization services both have limitations. Rather than relying on them completely, the authors use them to train, test, and curate their domain classifier and results. Their approach consists of three major steps: (1) identifying third-party domains, (2) classifying third-party domains to identify ATS domains (with manual validation of 200 domains), and (3) labeling as ATS-C the domains that receive unique identifiers from user devices but were not identified in the previous step.


The mobile ATS ecosystem

The paper reports that 292 parent organizations own nearly 2000 ATS and ATS-C domains. Alphabet (Google’s parent company) has penetration in over 73% of all measured apps with ownership of only 3.6% of all ATS/ATS-C domains. Domains belonging to the same organization are more likely to co-occur.

Application characteristics

The numbers of trackers: free apps with in-app purchases > free apps > paid apps. Apps with in-app purchases may have more aggressive monetization strategies.

Games and educational apps are the two categories with the highest number of ATS/ATS-C domains.

Cross-device tracking is widespread: 39% of the identified ATSes are present as third parties in at least one of the Alexa top 1000 websites.

Regulatory challenge

ATS services in the US have disproportionately high access to ATS-related data: 40% of ATS servers are the endpoint of 73% of ATS-related flows. Flows of UIDs from nations in the European Union mostly go to the US (89.27%) and China (4.02%). They are likely to be the most impacted by the upcoming regulations.

Xposed cannot find my module's class
2016-05-13T00:00:00+00:00 Yuru Shao http://windflyer.github.io/2016/05/Xposed-cannot-find-class

If you’re pretty sure that you did everything right but Xposed still cannot load your module’s class, check the apk file generated by Android Studio.

In my case, Xposed could not find the class when my module’s apk contained multiple dex files. After forcing Android Studio to output an apk with only one classes.dex, it works like a charm.

error only position independent executables (PIE) are supported
2016-03-14T00:00:00+00:00 Yuru Shao http://windflyer.github.io/2016/03/Android-PIE

After pushing some binaries onto devices that run newer versions of Android, a frustrating error often occurs:

error: only position independent executables (PIE) are supported.

The error message is quite straightforward: the binary is not position independent. If you have the source code, take a look at this Stack Overflow answer and edit your Android.mk accordingly.

Things get more complicated if you only have the binary or it’s not feasible to rebuild it with PIE support. That is where this post can help. I stumbled across a workaround on a Chinese website. It’s very easy, even less painful than rebuilding the binary with PIE enabled.

All you need is a binary editor like 010 Editor. Open the binary file with it, count carefully to find the 17th byte (offset 0x10, the low byte of the ELF e_type field), change the value from 02 to 03, and that’s it! The header then declares a shared object (ET_DYN) instead of an executable (ET_EXEC).

$ xxd gdb | head -2
0000000: 7f45 4c46 0101 0100 0000 0000 0000 0000  .ELF............
0000010: 0200 2800 0100 0000 b06a 0100 3400 0000  ..(......j..4...

$ xxd gdb_pie | head -2
0000000: 7f45 4c46 0101 0100 0000 0000 0000 0000  .ELF............
0000010: 0300 2800 0100 0000 b06a 0100 3400 0000  ..(......j..4...

Save your changes and the modified binary should work well on your Android device.
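
The same one-byte patch can be scripted instead of done by hand in a hex editor. Below is a minimal Python sketch, assuming a little-endian ELF binary like the one in the xxd dump above. The function name patch_elf_to_pie is made up for illustration; it only flips the byte and does nothing else to the binary.

```python
ELF_MAGIC = b"\x7fELF"
E_TYPE_OFFSET = 0x10   # the 17th byte of the file
ET_EXEC = 0x02
ET_DYN = 0x03


def patch_elf_to_pie(data: bytes) -> bytes:
    """Return a copy of data with e_type changed from ET_EXEC to ET_DYN.

    Hypothetical helper: assumes a little-endian ELF header, so the low
    byte of the 16-bit e_type field sits at offset 0x10.
    """
    if data[:4] != ELF_MAGIC:
        raise ValueError("not an ELF file")
    if data[5] != 1:  # EI_DATA: 1 means little-endian
        raise ValueError("only little-endian ELF files are handled here")
    if data[E_TYPE_OFFSET] != ET_EXEC:
        raise ValueError("e_type is not ET_EXEC (02); refusing to patch")
    patched = bytearray(data)
    patched[E_TYPE_OFFSET] = ET_DYN
    return bytes(patched)


# The first 32 bytes of the gdb binary from the xxd dump above.
header = bytes.fromhex(
    "7f454c46010101000000000000000000"
    "0200280001000000b06a010034000000"
)
patched_header = patch_elf_to_pie(header)
print(patched_header[16:18].hex())
```

For a real file, read it in binary mode, pass the bytes through the function, and write the result to a new file so the original stays untouched.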

Download Software from Dreamspark in Linux
2015-10-05T00:00:00+00:00 Yuru Shao http://windflyer.github.io/2015/10/Linux-SDM

After making a purchase, you’ll only be able to download a .sdx file. This file can be opened by SDM (Secure Download Manager), which is not available on Linux. Fortunately, we have an alternative: Linux SDM Downloader. Clone Linux SDM Downloader’s source code, enter its folder, and run:

$ python main.py ms-software.sdx

You’ll get a .sdc file and a .sdc.key file. Next you need another tool, xSDM, to decrypt the sdc file with the key:

$ xsdm ms-software.sdc

Make sure that your sdc and sdc.key are located in the same folder.

New Features of Debian Apport
2015-07-03T00:00:00+00:00 Yuru Shao http://windflyer.github.io/2015/07/Debian-Apport-GSoC

Apport intercepts program crashes, collects debugging information about the crash and the runtime environment, and sends it to bug trackers. It also lets the user report a bug against a package, again collecting as much information about it as possible.

I’m working on improving Apport in Debian this summer. So far, a few new features have been implemented.

  1. Ensure only one instance of apport-notifyd is running
  2. Invoke apport-retrace to install absent debug symbols
  3. Leverage system cache to save bandwidth
  4. Integrate Apport with Debian BTS

1. One apport-notifyd Instance

Previously, the apport-notifyd notification daemon could not shut itself down. If a user logged out of their desktop session and then logged back in, more than one instance of the daemon would be running, resulting in multiple Apport popups when a crash occurred.

Now apport-notifyd detects whether a notification daemon is already running. If not, it starts a new one; otherwise it quits silently.
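
One common way to get this behaviour is an exclusive lock on a well-known file. The Python sketch below is only an assumption about how such a check could look, not Apport's actual code; acquire_single_instance_lock and the lock file name are hypothetical.

```python
import fcntl
import os
import tempfile


def acquire_single_instance_lock(lock_path):
    """Try to take an exclusive, non-blocking lock on lock_path.

    Returns the open file object on success (keep it open for the
    daemon's lifetime) or None if another instance already holds it.
    """
    handle = open(lock_path, "w")
    try:
        fcntl.flock(handle, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except OSError:
        handle.close()
        return None
    return handle


# Demo: two attempts on the same (temporary, made-up) lock file.
lock_path = os.path.join(tempfile.mkdtemp(), "notifyd.lock")
first = acquire_single_instance_lock(lock_path)
second = acquire_single_instance_lock(lock_path)
print("first instance runs:", first is not None)
print("second instance quits:", second is None)
```

The kernel drops the lock when the holder exits, so a crashed daemon never leaves a stale lock behind, which is the main reason to prefer this over a plain pid file.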

2. Install Debug Symbols

Sometimes Apport fails to report a crash because debugging symbols are missing for the crashed package or its dependencies. When this happens, Apport shows a message dialog telling the user to install the missing debug symbols. We added an Install button to the dialog, which lets the user install the debug symbols immediately, without starting apport-retrace manually.

-> Fig 1. Click “Install” to install debug symbols <-

3. Leverage System Cache

apport-retrace downloads and installs the necessary packages and debug symbols. It used to download all required packages directly from Debian servers.

The cache folder /var/cache/apt contains Debian packages downloaded by apt-get. It’s very likely that packages apport-retrace wants to download already exist in /var/cache/apt. We don’t need to download such packages again; we just copy them. This saves bandwidth and shortens the waiting time.
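
The cache-first lookup can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the code that went into apport-retrace: fetch_package, its parameters and the download callback are all hypothetical names. On a typical system the cached .deb files sit in the archives subdirectory of /var/cache/apt.

```python
import os
import shutil
import tempfile


def fetch_package(deb_name, cache_dir, dest_dir, download):
    """Copy deb_name from cache_dir if present, otherwise call download().

    download is a caller-supplied function (hypothetical here) that takes
    (deb_name, dest_path) and fetches the package over the network.
    Returns "cache" or "network" to say where the file came from.
    """
    dest_path = os.path.join(dest_dir, deb_name)
    cached_path = os.path.join(cache_dir, deb_name)
    if os.path.isfile(cached_path):
        shutil.copy2(cached_path, dest_path)
        return "cache"
    download(deb_name, dest_path)
    return "network"


# Demo with temporary directories standing in for the real cache.
cache_dir = tempfile.mkdtemp()
dest_dir = tempfile.mkdtemp()
with open(os.path.join(cache_dir, "foo_1.0_amd64.deb"), "wb") as f:
    f.write(b"cached package bytes")

downloaded = []


def fake_download(deb_name, dest_path):
    # Stand-in for a real network download.
    downloaded.append(deb_name)
    with open(dest_path, "wb") as out:
        out.write(b"downloaded package bytes")


first_source = fetch_package("foo_1.0_amd64.deb", cache_dir, dest_dir, fake_download)
second_source = fetch_package("bar_2.0_amd64.deb", cache_dir, dest_dir, fake_download)
print(first_source, second_source, downloaded)
```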

4. Integrate Debian BTS

Debian has a bug tracking system (BTS) in which we file details of bugs reported by users and developers. Each bug is given a number, and is kept on file until it is marked as having been dealt with.

We’ve integrated Debian BTS into Apport. Before reporting a bug in the crashed package, users can browse existing bug reports fetched from Debian BTS. They don’t need to report it again if the bug has already been reported by someone else!

-> Fig 2. Click “Existing Reports” to view reported bugs <-

-> Fig 3. All existing bug reports <-

-> Fig 4. Display details of one bug report <-

Debian Font Configuration Warning
2015-04-01T00:00:00+00:00 Yuru Shao http://windflyer.github.io/2015/04/Debian-Fontconfig-Warning

I’ve been learning to develop GUI applications for Debian with PyQt4 these days. When I ran the program in a shell, two annoying warnings showed up.

Fontconfig warning: "/etc/fonts/conf.d/65-droid-sans-fonts.conf", line 103: Having multiple values in <test> isn't supported and may not work as expected
Fontconfig warning: "/etc/fonts/conf.d/65-droid-sans-fonts.conf", line 138: Having multiple values in <test> isn't supported and may not work as expected

As the warning messages hinted, to get rid of them I should convert multiple values in one <test> into multiple <test> tags, each of which only has one value. So I edited the file /etc/fonts/conf.d/65-droid-sans-fonts.conf.


<test name="lang">


<test name="lang">
<test name="lang">
<test name="lang">
<test name="lang">
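
Installing an SSD in a MacBook
2014-01-03T00:00:00+00:00 Yuru Shao http://windflyer.github.io/2014/01/macbook-ssd

I recently ordered an SSD, a Crucial M500 240G, from Amazon US. Including shipping it cost 1250 HKD, which is a very good deal compared with the price in Hong Kong (around 1400 HKD). Today I installed it in my MacBook Pro (Mid 2012, 13-inch) and migrated the system.


Removing the optical drive is fiddly, so follow a teardown guide from the web. Once the SSD is installed, reboot, format it with Disk Utility, and the system will automatically mount it as a second drive.


1. Clone the system

I used the Carbon Copy Cloner app, which is very easy to operate.


2. Set the default startup disk

Next, select the SSD as the default startup disk under System Preference - Startup Disk. After a reboot you will notice that the system starts faster.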

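Collaboration vs Cooperation
2013-12-23T00:00:00+00:00 Yuru Shao http://windflyer.github.io/2013/12/collaboration-vs-cooperation

Both words are translated into Chinese as 合作 (“working together”), and they can sometimes be used interchangeably. However, they represent two fundamentally different ways of contributing to a group.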

When collaborating, people work together (co-labor) on a single shared goal. When cooperating, people perform together (co-operate) while working on selfish yet common goals.



  1. http://cloudhead.headmine.net/post/3279118157/cooperation-vs-collaboration
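
JDWP Thread Startup and the Dalvik Stack
2013-09-27T00:00:00+00:00 Yuru Shao http://windflyer.github.io/2013/09/jdwp-thread-and-dalvik-stack

1. The JDWP thread

1.1 Startup modes

Two options given when the Dalvik VM starts, server and suspend, determine how the jdwp thread starts:

  1. server=n suspend=n: try to connect to host:port directly and give up if the connection fails.
  2. server=n suspend=y: same as above, but VM execution is suspended once the connection succeeds.
  3. server=y suspend=n: wait for a debugger to connect.
  4. server=y suspend=y: same as above, but VM execution is suspended once the debugger has connected.

The jdwp thread of an app forked from the zygote process starts in the third mode above. So when you look at thread states in DDM (Dalvik Debug Monitor), the jdwp thread is shown as Runnable.

The Status column shows the state of each thread, and the IDs of daemon threads are marked with an asterisk (*). The possible states are [1]: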

  • running [?] - executing application code
  • sleeping - called Thread.sleep()
  • monitor - waiting to acquire a monitor lock
  • wait - in Object.wait()
  • native - executing native code
  • vmwait - waiting on a VM resource
  • zombie - thread is in the process of dying
  • init - thread is initializing (you shouldn’t see this)
  • starting - thread is about to start (you shouldn’t see this either)

[?] The documentation lists only a running state, not a runnable one; I verified that they are the same thing.


static void* jdwpThreadStart(void* arg)
{
    /* ... */
    while (state->run) {
        if (state->params.server) {
            /*
             * Block forever, waiting for a connection.  To support the
             * "timeout=xxx" option we'll need to tweak this.
             */
            if (!dvmJdwpAcceptConnection(state))
                break;
        } else {
            /* ... */
        }
        /* ... */
    }
    /* ... */
}


static bool acceptConnection(JdwpState* state)
{
    /* ... */
    // loop until a connection arrives
    do {
        sock = accept(netState->listenSock, &addr.addrPlain, &addrlen);
        if (sock < 0 && errno != EINTR) {
            // When we call shutdown() on the socket, accept() returns with
            // EINVAL.  Don't gripe about it.
            if (errno == EINVAL)
                LOGVV("accept failed: %s", strerror(errno));
            else
                ALOGE("accept failed: %s", strerror(errno));
            return false;
        }
    } while (sock < 0);
    /* ... */
}


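1.2 How an app can prevent a debugger from attaching


  1. Load a native library (.so) in the main thread, then use the kill system call from inside the library to send a TERM or QUIT signal to the jdwp thread. Although kill takes the pid of a process, passing the tid of a thread also works. All threads of the current process can be found by reading the /proc/self/task directory; so far I have not found a way to obtain a tid reliably from Java code. In my tests, however, the jdwp thread did not react to the QUIT signal, and the TERM signal terminated the whole app process.

  2. Start a new thread that poses as a debugger, so that no other debugger can connect to the jdwp thread any more. I have not tested this method yet, and it may require root.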
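
Enumerating threads through /proc is easy to try out. The Python sketch below only illustrates the directory layout on Linux (one numeric subdirectory per tid, with the thread name in its comm file); the approach described above does this from native code in the .so, and list_threads is a hypothetical name.

```python
import os


def list_threads(task_dir="/proc/self/task"):
    """Return {tid: thread_name} for every thread listed under task_dir."""
    threads = {}
    for entry in os.listdir(task_dir):
        if not entry.isdigit():
            continue
        comm_path = os.path.join(task_dir, entry, "comm")
        try:
            with open(comm_path) as f:
                threads[int(entry)] = f.read().strip()
        except OSError:
            # The thread exited between listdir() and open(),
            # or this kernel does not expose a comm file.
            continue
    return threads


if os.path.isdir("/proc/self/task"):
    for tid, name in sorted(list_threads().items()):
        print(tid, name)
```

Matching on the thread name is an assumption about how one would pick out the jdwp thread; check the names on the target device before relying on it.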

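2. Data obtainable through debugging


This method uses five registers in total: in2, in1, and in0 are the method’s arguments and occupy three registers, and local variables occupy the other two. This layout corresponds to the p and v naming schemes for registers [2].

breakSaveBlock records the address of the break frame. Its purpose is to locate and trace exceptions when a method returns or an exception occurs. That is why, when an exception is raised, we can use printStackTrace to print the structure of the current stack and trace where the exception came from. The figure below [3] illustrates this process more clearly.


To read the register data of a method you need the threadID, the frameID, and the index of the register; no debug information is required [4].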

Typically, this index can be determined for method arguments from the method signature without access to the local variable table information.

3. Conditional breakpoints



invoke-virtual {p1}, Landroid/view/MotionEvent;->getRawX()F

move-result v0

4. References

  1. http://www.kandroid.org/guide/developing/tools/ddms.html
  2. http://code.google.com/p/smali/wiki/Registers
  3. http://myresearch-exe.blogspot.com/2010/10/threads-stack-management-in-dalvik-vm.html
  4. http://docs.oracle.com/javase/1.5.0/docs/guide/jpda/jdwp/jdwp-protocol.html#JDWP_StackFrame
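
Android Breakpoints
2013-09-12T00:00:00+00:00 Yuru Shao http://windflyer.github.io/2013/09/android-breakpoints

This post tries to answer one question: how does the Dalvik VM implement breakpoints? It first briefly introduces the Android app debugging model and the JDWP protocol, then walks through the breakpoint implementation by analyzing the source code.

1. The Android app debugging model

Debugging an Android app is remote debugging: the app process being debugged and the debugger process run on different systems. adb, the Android Debug Bridge, plays the role of the “bridge” its name suggests. The figure below illustrates the debugging model for Android apps.

The adb server running on the PC and the adbd daemon running on the Android device or emulator connect over USB or a wireless network; they communicate with the debugger and with the app’s Dalvik VM, respectively. More precisely, every Dalvik VM has a jdwp thread that handles the debugging commands sent by the debugger, so the debugger is really talking to the jdwp thread.

Once the connection is established, the debugger and the Dalvik VM exchange debugging data over the “bridge”, and the adb server and adbd are transparent to them.

2. The JDWP protocol

Like the Java VM, the Dalvik VM uses [JDWP (Java Debug Wire Protocol)][jdwp], and data is exchanged in units of jdwp packets. The debugger sends debugging commands as Command Packets, which have the following format: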

  • Header
    • length (4 bytes)
    • id (4 bytes)
    • flags (1 byte)
    • command set (1 byte)
    • command (1 byte)
  • data (Variable)

The VM returns data as a Reply Packet, which has the following format:

  • Header
    • length (4 bytes)
    • id (4 bytes)
    • flags (1 byte)
    • error code (2 bytes)
  • data (Variable)

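Both Command Packets and Reply Packets consist of a header plus data. The header has a fixed size, while the size of the data varies. The type of the data depends on the command set and command fields of the Command Packet; see [the protocol section of the JDWP documentation][jdwp-details] for details.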
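
The two header layouts are compact enough to pack and unpack by hand. Here is a minimal Python sketch of that, assuming big-endian fields as JDWP specifies; build_command_packet and parse_packet are hypothetical helper names, and the 0x80 reply bit in the flags byte comes from the JDWP specification rather than from the lists above.

```python
import struct

HEADER_SIZE = 11      # 4 + 4 + 1 + 1 + 1 (command) or 4 + 4 + 1 + 2 (reply)
FLAG_REPLY = 0x80     # set in the flags byte of a reply packet


def build_command_packet(packet_id, command_set, command, data=b""):
    """Pack a JDWP command packet; all integers are big-endian."""
    length = HEADER_SIZE + len(data)
    header = struct.pack(">IIBBB", length, packet_id, 0, command_set, command)
    return header + data


def parse_packet(raw):
    """Split a raw JDWP packet into a dict describing its header and data."""
    length, packet_id, flags = struct.unpack(">IIB", raw[:9])
    if length != len(raw):
        raise ValueError("length field does not match packet size")
    packet = {"length": length, "id": packet_id, "flags": flags,
              "data": raw[HEADER_SIZE:]}
    if flags & FLAG_REPLY:
        (packet["error_code"],) = struct.unpack(">H", raw[9:11])
    else:
        packet["command_set"], packet["command"] = raw[9], raw[10]
    return packet


# VirtualMachine.Version is command set 1, command 1 and carries no data.
version_cmd = build_command_packet(packet_id=1, command_set=1, command=1)
print(version_cmd.hex())
print(parse_packet(version_cmd))
```

Note that the length field counts the header itself, so an empty command is 11 bytes long.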

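JDWP has one special command set, the [EventRequest Command Set][er-cmd-set]: the debugger sends an event request to the VM, and the request describes the event in detail. While the VM is running, whenever an event requested by the debugger occurs, the VM sends the data related to that event back to the debugger.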

3. Breakpoints

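The following uses a breakpoint as an example to show how the Dalvik VM handles a request, from the moment the debugger sends it until the breakpoint is set.

3.1 Event Request

First, the debugger sends an event request, which has the following format:

  • eventKind (1 byte)
  • suspendPolicy (1 byte)
  • modifiers (4 bytes)

For a breakpoint, eventKind is BREAKPOINT, suspendPolicy specifies how the program is suspended when the breakpoint is hit (only the current thread, or all threads), and modifiers is the number of Modifiers. An event request can contain zero or more Modifiers.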


/*
 * Event modifiers.  A JdwpEvent may have zero or more of these.
 */
union JdwpEventMod {
    u1      modKind;                /* JdwpModKind */
    /* ... */
    struct {
        u1          modKind;
        JdwpLocation loc;
    } locationOnly;
    /* ... */
};


struct JdwpLocation {
    u1          typeTag;        /* class or interface? */
    RefTypeId   classId;        /* method->clazz */
    MethodId    methodId;       /* method in which "idx" resides */
    u8          idx;            /* relative index into code block */
};

In practice, an event request that sets a breakpoint at a given location can be described as:

  • eventKind: BREAKPOINT
  • suspendPolicy: SP_THREAD
  • modifiers: 1
  • JdwpLocation:
    • typeTag
    • classId
    • methodId
    • idx
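
To make the layout concrete, here is a hedged Python sketch that serializes such a request into the data part of an EventRequest.Set command. The numeric constants are the values from the JDWP specification; build_breakpoint_request and pack_id are hypothetical names, and the ID sizes (8 bytes for a reference type ID, 4 bytes for a method ID) are assumptions, since a real debugger asks the VM for them.

```python
import struct

EVENT_KIND_BREAKPOINT = 2     # JDWP EventKind.BREAKPOINT
SUSPEND_POLICY_THREAD = 1     # JDWP SuspendPolicy.EVENT_THREAD
MOD_KIND_LOCATION_ONLY = 7    # JDWP modifier kind LocationOnly
TYPE_TAG_CLASS = 1            # JDWP TypeTag.CLASS


def pack_id(value, size):
    """Encode an ID as a big-endian integer of the given byte size."""
    return value.to_bytes(size, "big")


def build_breakpoint_request(class_id, method_id, idx,
                             ref_type_id_size=8, method_id_size=4):
    """Build the data part of an EventRequest.Set command for a breakpoint.

    The ID sizes are assumptions; a real debugger learns them from the
    VM with the VirtualMachine.IDSizes command.
    """
    location = (struct.pack(">B", TYPE_TAG_CLASS)
                + pack_id(class_id, ref_type_id_size)
                + pack_id(method_id, method_id_size)
                + struct.pack(">Q", idx))
    return (struct.pack(">BBI", EVENT_KIND_BREAKPOINT,
                        SUSPEND_POLICY_THREAD, 1)
            + struct.pack(">B", MOD_KIND_LOCATION_ONLY)
            + location)


# Made-up IDs, purely for illustration.
request = build_breakpoint_request(class_id=0x1234, method_id=0x56, idx=3)
print(len(request), request.hex())
```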

3.2 Handling the breakpoint request

The Dalvik VM’s jdwp thread handles the event request in the handleER_Set function in platform_dalvik/vm/jdwp/JdwpHandler.cpp:

static JdwpError handleER_Set(JdwpState* state,
    const u1* buf, int dataLen, ExpandBuf* pReply)
{
    const u1* origBuf = buf;

    // fields are read in the same order as in the event request described above
    u1 eventKind = read1(&buf);
    u1 suspendPolicy = read1(&buf);
    u4 modifierCount = read4BE(&buf);
    /* ... */
    // allocate a JdwpEvent that describes the event
    JdwpEvent* pEvent = dvmJdwpEventAlloc(modifierCount);
    pEvent->eventKind = static_cast<JdwpEventKind>(eventKind);
    pEvent->suspendPolicy = static_cast<JdwpSuspendPolicy>(suspendPolicy);
    pEvent->modCount = modifierCount;
    // read the modifiers in order
    for (u4 idx = 0; idx < modifierCount; idx++) {
        u1 modKind = read1(&buf);

        pEvent->mods[idx].modKind = modKind;

        switch (modKind) {
        case MK_LOCATION_ONLY: // location breakpoint
            {
                JdwpLocation loc;
                // read the breakpoint location
                jdwpReadLocation(&buf, &loc);
                pEvent->mods[idx].locationOnly.loc = loc;
            }
            break;
        /* ... other modifier kinds ... */
        }
    }
    /* ... */
    // register the event
    JdwpError err = dvmJdwpRegisterEvent(state, pEvent);
    /* ... */
}


JdwpError dvmJdwpRegisterEvent(JdwpState* state, JdwpEvent* pEvent)
{
    /* ... */
    for (int i = 0; i < pEvent->modCount; i++) {
        const JdwpEventMod* pMod = &pEvent->mods[i];
        if (pMod->modKind == MK_LOCATION_ONLY) {
            // tell the Dalvik VM to "watch" the corresponding location
            dvmDbgWatchLocation(&pMod->locationOnly.loc);
        }
        /* ... */
    }
    /* ... */
    // insert the event into the event list
    // (a doubly linked list, inserted at the head)
    if (state->eventList != NULL) {
        pEvent->next = state->eventList;
        state->eventList->prev = pEvent;
    }
    state->eventList = pEvent;
    /* ... */
}

bool dvmDbgWatchLocation(const JdwpLocation* pLoc)
{
    // build the Method object from the method ID
    Method* method = methodIdToMethod(pLoc->classId, pLoc->methodId);
    /* ... */
    // add a breakpoint at offset idx within the method
    dvmAddBreakAddr(method, pLoc->idx);
    return true;        /* assume success */
}

Next, the code moves into the interpreter part of the Dalvik VM, platform_dalvik/vm/interp/interp.cpp:

void dvmAddBreakAddr(Method* method, unsigned int instrOffset)
{
    // there is one global breakpoint set
    BreakpointSet* pSet = gDvm.breakpointSet;
    /* ... */
    // pLoc->idx from the previous function is now called instrOffset
    dvmBreakpointSetAdd(pSet, method, instrOffset);
    /* ... */
}

static bool dvmBreakpointSetAdd(BreakpointSet* pSet, Method* method,
    unsigned int instrOffset)
{
    // the vector grows by 10 entries at a time
    const int kBreakpointGrowth = 10;
    // compute the absolute address of the breakpoint: method address + offset
    const u2* addr = method->insns + instrOffset;
    // is there already a breakpoint at the target location?
    int idx = dvmBreakpointSetFind(pSet, addr);
    Breakpoint* pBreak;

    if (idx < 0) { // no breakpoint yet
        if (pSet->count == pSet->alloc) { // vector is full, allocate more space
            int newSize = pSet->alloc + kBreakpointGrowth;
            Breakpoint* newVec;
            newVec = (Breakpoint*)realloc(pSet->breakpoints,
                newSize * sizeof(Breakpoint));
            if (newVec == NULL)
                return false;

            pSet->breakpoints = newVec;
            pSet->alloc = newSize;
        }
        // fill in the Breakpoint struct
        // (the Breakpoint struct is defined in this same file)
        pBreak = &pSet->breakpoints[pSet->count++];
        pBreak->method = method;
        pBreak->addr = (u2*)addr;
        pBreak->originalOpcode = *(u1*)addr;
        pBreak->setCount = 1;

        // patch the opcode
        assert(*(u1*)addr != OP_BREAKPOINT);
        // the VM never executes unverified code,
        // so if the class is not verified, do not insert the breakpoint
        if (dvmIsClassVerified(method->clazz)) {
            if (instructionIsMagicNop(addr)) {
                // If it's a "magic" NOP, indicating the
                // start of switch or array data in
                // the instruction stream,
                // we don't want to set a breakpoint.
            } else {
                ANDROID_MEMBAR_FULL(); // ??
                // change the opcode at addr to OP_BREAKPOINT, i.e. 0xec
                dvmDexChangeDex1(method->clazz->pDvmDex, (u1*)addr,
                    OP_BREAKPOINT);
            }
        } else {
            ALOGV("Class %s NOT verified, deferring breakpoint at %p",
                method->clazz->descriptor, addr);
        }
    } else {
        /*
         * Breakpoint already exists, just increase the count.
         */
        pBreak = &pSet->breakpoints[idx];
        pBreak->setCount++;
    }

    return true;
}

bool dvmDexChangeDex1(DvmDex* pDvmDex, u1* addr, u1 newVal)
{
    if (*addr == newVal) { // old and new values are the same, nothing to change
        ALOGV("+++ byte at %p is already 0x%02x", addr, newVal);
        return true;
    }

    // change the memory protection; by default the instruction area is not writable
    // the third argument, true, means read-write
    if (sysChangeMapAccess(addr, 1, true, &pDvmDex->memMap) != 0) {
        ALOGD("NOTE: DEX page access change (->RW) failed");
        /* expected on files mounted from FAT; keep going (may crash) */
    }

    *addr = newVal; // write the new opcode value to addr

    // restore the protection; false means read-only
    if (sysChangeMapAccess(addr, 1, false, &pDvmDex->memMap) != 0) {
        ALOGD("NOTE: DEX page access change (->RO) failed");
        /* expected on files mounted from FAT; keep going */
    }

    return true;
}
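
The bookkeeping in dvmBreakpointSetAdd (save the original opcode, overwrite it with OP_BREAKPOINT, and count repeated requests for the same address) can be modelled in a few lines. The Python sketch below is a toy model over a bytearray, not a port of the Dalvik code; the class name, the dictionary layout and the remove method are my own additions to show what the saved opcode and the counter are for.

```python
OP_BREAKPOINT = 0xEC   # opcode value quoted in the walkthrough above


class BreakpointSet:
    """Toy model of opcode-patching breakpoints over a bytearray of code."""

    def __init__(self, code):
        self.code = code
        self.entries = {}   # addr -> {"original": opcode, "count": n}

    def add(self, addr):
        entry = self.entries.get(addr)
        if entry is not None:
            # Breakpoint already exists, just increase the count.
            entry["count"] += 1
            return
        self.entries[addr] = {"original": self.code[addr], "count": 1}
        self.code[addr] = OP_BREAKPOINT

    def remove(self, addr):
        entry = self.entries[addr]
        entry["count"] -= 1
        if entry["count"] == 0:
            # Last user is gone: put the saved opcode back.
            self.code[addr] = entry["original"]
            del self.entries[addr]


code = bytearray([0x6E, 0x10, 0x0A, 0x00])   # made-up instruction bytes
breakpoints = BreakpointSet(code)
breakpoints.add(0)
breakpoints.add(0)
print(hex(code[0]), breakpoints.entries[0]["count"])
breakpoints.remove(0)
breakpoints.remove(0)
print(hex(code[0]))
```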

4. Summary

As the analysis above shows, the jdwp thread handles breakpoints in much the same way as int 3 breakpoints in OllyDbg. However, Android did not use this approach from the start. [The Dalvik part of the Android 2.0 documentation][android-2.0-debugger] contains this passage:

>Because Dalvik maps bytecode into memory read-only, some common techniques are difficult to implement without allocating additional memory. For example, suppose the debugger sets a breakpoint in a method. The quick way to handle this is to insert a breakpoint instruction directly into the code. When the instruction is reached, the breakpoint handler engages. Without this, it's necessary to perform an "is there a breakpoint here" scan. Even with some optimizations, the debug-enabled interpreter is much slower than the regular interpreter (perhaps 5x).

Later, it became:

>Pre-Froyo implementations of the Dalvik VM used read-only memory mappings for all bytecode, which made it necessary to scan for breakpoints by comparing the program counter to a set of addresses. In Froyo this was changed to allow insertion of breakpoint opcodes. This allows the VM to execute code more quickly, and does away with the hardcoded limit of 20 breakpoints.

[jdwp]: http://docs.oracle.com/javase/6/docs/technotes/guides/jpda/jdwp-spec.html
[jdwp-details]: http://docs.oracle.com/javase/6/docs/platform/jpda/jdwp/jdwp-protocol.html
[er-cmd-set]: http://docs.oracle.com/javase/6/docs/platform/jpda/jdwp/jdwp-protocol.html#JDWP_EventRequest
[android-2.0-debugger]: http://www.netmite.com/android/mydroid/2.0/dalvik/docs/debugger.html