SkyEye : SkyeyeDBCTen

Most recent edit on 2007-07-12 06:49:06 by FaiF

Additions:
An introduction to the internals of skyeye can be found at skyeye study notes (Chinese).
The acronym DBCT will be used in all the following text to stand for Dynamic Binary Code Translation.
The implementation of DBCT is not perfect. If you're interested in dynamic translation, I personally recommend reading the QEMU source code.

2. Abstract

The design of DBCT is influenced by QEMU (http://fabrice.bellard.free.fr/qemu/), but there are differences in the implementation. I will note the differences in the discussion of each component.
DBCT combines several consecutive emulated instructions into a group (called a Translation Block, TB). Each instruction is translated directly, according to its function, into several micro instructions. (This differs from QEMU, which uses intermediate code during the translation process.) Each micro instruction represents one operation and consists of several local (host) instructions. The result is a group of local instructions that corresponds to the TB, plus a return instruction at the end. To emulate, we make a function call to the beginning of this group of local instructions.
There are other types of emulation methods. The most common is to fetch one instruction, interpret it and perform its designated operations, then fetch the next instruction, and repeat. The normal instruction emulation mode in SkyEye works this way.
Another method translates the target hardware operations into a language such as C and compiles the result in order to emulate instructions.

3. Micro Instructions

3.1. Structure and Initialization of Micro Instructions.

In the function arm2x86_init, all the functions that are called before tb_insn_len_max_init are used to initialize the micro instructions.
In the DBCT code, each micro instruction is packaged in an op_table_t structure, which is defined in arm2x86.h. The op field in this structure points to the micro instruction, and the len field gives its length. Each micro instruction is initialized by a function with a name of the form get_op_xxx. This function returns the address of the micro instruction, to be stored in op, and sets len through its pointer parameter. These functions are called during micro instruction initialization.
Each get_op_xxx function uses two macros defined in arm2x86.h, OP_BEGIN and OP_END; both expand to inline x86 assembly. The code between these two macros implements the micro instruction:
#define OP_BEGIN(f) __asm__ __volatile__ ("jmp ."f"_teawater_op_end\n\t""."f"_teawater_op_begin:\n\t")
#define OP_END(f) __asm__ __volatile__ ("."f"_teawater_op_end:\n\t""movl $."f"_teawater_op_begin,%0\n\t""movl $."f"_teawater_op_end,%1\n\t":"=g"(begin), "=g"(end));

OP_BEGIN starts with a branch instruction that jumps to the symbol "."f"_teawater_op_end declared in OP_END. The purpose is to jump over the code between OP_BEGIN and OP_END so that it is not executed. Why don't I use a simple goto statement? If you use goto, the C compiler knows that the micro instruction's implementation code will never be executed, and optimizes it out. But if I use in-line assembler code, which the C compiler cannot understand, the code between the two macros is preserved. After the branch, there's a pseudo instruction that declares a symbol marking the start of the micro instruction.
OP_END starts with a pseudo instruction that declares a symbol marking the end of the micro instruction; it's also the target of the branch instruction in OP_BEGIN. This is followed by two assignment instructions that assign the begin and end addresses of the micro instruction to the variables begin and end, which are declared at the beginning of the initialization function. This is how the initialization function obtains the begin and end addresses of the micro instruction.
From the description of these two macros we can understand the process of generating the micro instructions -- we can calculate the length of the micro instruction from its begin and end addresses. QEMU's process of generating micro instructions differs from DBCT's: QEMU puts each micro instruction in its own function. After compilation, it uses a special procedure to gather the begin and end addresses of the micro instruction.
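The relationship between begin, end, op and len can be sketched in portable C. This is only an illustration under assumptions: op_table_sketch_t, get_op_demo and the byte array are hypothetical stand-ins, and the array replaces the machine code that the real OP_BEGIN/OP_END macros would delimit with assembler labels.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirror of op_table_t: the text only names the op and len fields. */
typedef struct {
    const uint8_t *op;  /* start of the micro instruction's machine code */
    int len;            /* length of that code in bytes */
} op_table_sketch_t;

/* Placeholder bytes standing in for the code between OP_BEGIN and OP_END. */
static const uint8_t op_demo_code[] = { 0x89, 0xf3 };

/* Same shape as a get_op_xxx function: return the start, report the length. */
const uint8_t *get_op_demo(int *len)
{
    const uint8_t *begin = op_demo_code;
    const uint8_t *end = op_demo_code + sizeof op_demo_code;

    assert(len != NULL);
    *len = (int)(end - begin);  /* length = end address - begin address */
    return begin;
}

op_table_sketch_t op_demo;

/* Mirrors the initialization step: fill in op and len for one micro instruction. */
void op_demo_init(void)
{
    op_demo.op = get_op_demo(&op_demo.len);
}
```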

3.2. Variables in Micro Instructions

The implementation code of the micro instructions uses no local variables. Instead, register variables are used. When there are not enough registers, global variables are used. This method is similar to how QEMU uses variables in micro instructions.
I think this is because a single instruction is composed of several micro instructions (especially in the ARM architecture, which performs several operations inside a single instruction). These micro instructions need to pass computed values between them. We could do this using a stack, but that would be relatively complicated. Also, frequent memory access would hurt speed.
These register variables are declared in the header file arm2x86_self.h. Because register variable declarations may affect the C compiler, only files related to DBCT include this header file.
The EBP register is declared to point to the global variable state, which is of the struct type ARMul_State and stores all information about CPU emulation in SkyEye. This way the micro instructions can easily reference this struct. EBX, ESI and EDI are bound to the variables T0, T1 and T2 of the type uint32_t. These 3 variables are frequently used by the micro instructions. Note that the C calling convention normally uses these registers for other purposes (EBP, for example, usually holds the frame pointer), so we need to save their values before executing the micro instructions (explained in detail below).
The other registers, such as EAX, are only local registers in GCC and cannot be declared as global register variables, so they are not used as variables in the micro instructions.

3.3. Calling Functions in Micro Instructions

Sometimes we need to call functions inside micro instructions. Because of the way that micro instructions are implemented, we cannot directly call regular functions and must go through a special process. The following is an example of how arm2x86.c:get_op_begin calls the function tea_begin:
First, we subtract 0xc from ESP to allocate 0xc bytes of space on the stack. Then EBP, which points to ARMul_State, is pushed onto the stack. These two in-line assembler instructions pass st as a parameter to the function tea_begin. The 0xc bytes allocated first ensure that the parameter passing is aligned to 0x10 (0xc bytes plus the 4-byte parameter equals 0x10).
There's no need to preserve EBP, EBX, ESI and EDI, because these global registers are callee-saved -- if the function that we call modifies these registers, it will save and restore them.
The next step is to assign the address of tea_begin to T2 and then call it through T2 (i.e., by its absolute address). Usually functions are called using relative branches. However, when a micro instruction is copied into the TB, its address changes and such relative branches no longer work. Therefore, we must use absolute addresses to make the call. For similar reasons, there are some other functions that, in order to port to Cygwin, use function pointers to make function calls.
Finally, if we need the return value, which is stored in the register EAX, we store it into a register variable such as T0.
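A short arithmetic sketch shows why a relative call stops working once the code is copied. It assumes the standard 5-byte x86 "call rel32" encoding, where the displacement is measured from the end of the instruction; the function names and the addresses in the usage note are made up for illustration.

```c
#include <stdint.h>

/* Displacement stored in a 5-byte "call rel32" located at call_site.
   Returned as the raw 32-bit two's-complement pattern. */
uint32_t call_rel32(uint32_t call_site, uint32_t target)
{
    return target - (call_site + 5u);
}

/* Address actually reached when those same bytes execute at run_site. */
uint32_t call_destination(uint32_t run_site, uint32_t rel32)
{
    return run_site + 5u + rel32;
}
```

With a displacement computed for a call at 0x08048000 targeting 0x08049000, the same bytes copied to 0x40000000 would reach 0x40001000 instead. A call through a register that holds the absolute address does not have this problem.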

3.4. Exception Handling in Micro Instructions

Usually emulators must emulate exceptions. This is also true for DBCT.
In DBCT, when an exception happens, we first set st->trap or state->trap to the exception type (see TRAP_XXX in arm2x86.h). Then, we use the X86 ret instruction to return to the non-DBCT emulation mode, which performs the actual exception processing. This reduces the complexity of the micro instructions. Special cases such as TRAP_SETS_R15, TRAP_SET_CPSR and TRAP_SET_R15 are handled the same way.

3.5. Branches inside Micro Instructions

Here "branch" doesn't mean the emulated branch instructions. Rather, inside the micro instructions, we have branches whose lengths may need to be adjusted. For example, in an emulated instruction that has a condition check, you will need to skip the compiled micro instruction code if the specified condition does not match the PSR. Here, we need to know the length of the branch. In DBCT, we first write a line like the following: __asm__ __volatile__ ("jmp 0xffffffff");. Usually, the last 4 bytes contain the branch length. We just have to fill in the length at translation time. (There's more detail below when we describe the translation process).
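The fill-in step can be sketched as follows. This is a minimal illustration, not the DBCT code: it assumes the compiled placeholder is a 5-byte x86 "jmp rel32" (opcode 0xE9 followed by four displacement bytes), and emit_skip_jump and jmp_template are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed template: opcode 0xE9 plus a 4-byte placeholder displacement. */
static const uint8_t jmp_template[5] = { 0xE9, 0xff, 0xff, 0xff, 0xff };

/* Copy the template to dst and overwrite its last 4 bytes with skip_len,
   the number of bytes to jump over. Returns the number of bytes written. */
size_t emit_skip_jump(uint8_t *dst, uint32_t skip_len)
{
    size_t n = sizeof jmp_template;

    assert(dst != NULL);
    memcpy(dst, jmp_template, n);
    /* x86 displacements are little-endian; write byte by byte so the
       sketch behaves the same on any host. */
    dst[n - 4] = (uint8_t)(skip_len & 0xffu);
    dst[n - 3] = (uint8_t)((skip_len >> 8) & 0xffu);
    dst[n - 2] = (uint8_t)((skip_len >> 16) & 0xffu);
    dst[n - 1] = (uint8_t)((skip_len >> 24) & 0xffu);
    return n;
}
```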

3.6. Categories of Micro Instructions

3.6.1. Overview
Due to design issues, the categories of the micro instructions are not well organized. Here we categorize them according to their initialization functions.
3.6.2. arm2x86.c:op_init
The most important micro instructions we initialize here are op_begin and op_begin_test_T0. These two micro instructions are placed in front of each translated ARM instruction.
op_begin is used when the ARM instruction's condition is AL or NV (i.e., no condition check is needed). Here, we first call the function arm2x86.c:tea_begin, which calls arm2x86.c:tea_check_out. As in the normal emulation mode, tea_check_out checks whether single stepping is needed, whether there are any hardware exceptions, and whether the current TB is dirty. (About dirty: if the micro instructions in the current TB modify memory associated with the current TB, we need to return to normal emulation mode so that this TB is automatically re-translated.) As the last step, armio.c:io_do_cycle invokes all virtual devices. After the tea_begin function returns, we check its return value. If it's TRUE, we return.
The micro instruction op_begin_test_T0 is used if the translated ARM instruction needs a condition check. Before this micro instruction is executed, the condition of the translated instruction is already stored in T0. Here we use st and T0 as parameters to invoke arm2x86.c:tea_begin_test, which also uses tea_check_out to check if there are any exceptions that require a return to the normal execution mode. Otherwise, we call arm2x86_psr.h:gen_op_condition to check if the current translated instruction needs to be executed. After tea_begin_test returns, we first check if we need to return to normal execution mode due to exceptions. Then we check if the instruction should be skipped due to the condition, in which case we execute a jmp instruction (as described above).
There are a few other rather simple micro instructions so we'll not describe them here.
3.6.3. arm2x86_test.c:arm2x86_test_init
Here we initialize a few simple micro instructions that first make a condition check and then perform some operations.
3.6.4. arm2x86_shift.c:arm2x86_shift_init
Here we initialize various shift micro instructions.
If the shift count is variable, it's rather simple. The preceding micro instructions will have stored the shift count in a register, which can be used directly in this micro instruction. If the shift count is an immediate value, we handle it similarly to the branches inside micro instructions described above. We first write a line similar to “T1 = T1 << 31;”, whose compiled code ends with an 8-bit shift count. During translation, we replace it with the actual immediate shift count.
3.6.5. arm2x86_psr.c:arm2x86_psr_init
Here we initialize micro instructions related to the ARM status registers (CPSR and SPSR).
3.6.6. arm2x86_movl.c:arm2x86_movl_init
Here we initialize the micro instructions used to assign values to the emulated registers or the register variables.
The handling of immediate values is similar to the branch micro instructions described above. We first use a statement like “T2 = ULONG_MAX;”. This way the last 32 bits of the compiled code will be the immediate value ULONG_MAX, which is replaced with the actual immediate value during translation.
3.6.7. arm2x86_mul.c:arm2x86_mul_init
Here we initialize the micro instructions for emulating multiplication.
3.6.8. arm2x86_mem.c:arm2x86_mem_init
Here we initialize the micro instructions for emulating memory access operations.
To access memory, we call SkyEye's memory access functions and use the returned values for various operations. We do it this way instead of directly accessing memory. This is because if the MMU is emulated, we need to use the TLB and page table for address translation. Also, an address could be memory mapped IO. Therefore, it's much simpler to just call the memory access functions.
3.6.9. arm2x86_dp.c:arm2x86_dp_init
Here we initialize the micro instructions for emulating the ARM data processing (DP) instructions.
3.6.10. arm2x86_coproc.c:arm2x86_coproc_init
Here we initialize the micro instructions for emulating the ARM co-processor instructions.
3.6.11. arm2x86_other.c:arm2x86_other_init
Here we initialize the other micro instructions.

4. Translation Block (TB)

4.1. Overview

DBCT translates each block of emulated instructions of length tb.h:TB_LEN into a series of micro instructions (the maximum micro instruction length for one emulated instruction is tb.h:TB_INSN_LEN_MAX, which is computed in tb.c:tb_insn_len_max_init). These micro instructions (stored in memory called TBP), together with other information such as addresses, are packaged into a TB. During DBCT initialization, we initialize the TB according to the configuration files and the current status. We will see more details in the description of tb_memory_init.

4.2. tb.h:struct tb_s

This is the core structure in a TB. Each TB has a corresponding tb_s struct.
The following is a description of each field:
struct list_head list;
When TBP is allocated dynamically (the second way of using the TBs), all the TBs that are in use are put into the linked list tb.c:tbp_dynamic_list. Each time a TB is used, it is moved to the end of the list. When we need to execute code at an untranslated address and there are no free TBs left, we must pick a TB that's currently in use -- we always pick the first TB in the list. The advantage is that the head of the list is the least recently used TB, so choosing it has the smallest impact on performance. There are more details below when we discuss the translation process.
int ted;
A value 0 for this field means this TB does not contain translated data, 1 means otherwise. When we need to mark a TB as dirty, we set ted to 0.
uint8_t *insn_addr[TB_LEN / sizeof(uint8_t *)];
During translation, the starting micro instruction address of each translated instruction is stored in this array. When we execute a translated TB, we can obtain the corresponding micro instruction start address and start executing from there.
Originally, the DBCT did not store the addresses of the micro instructions. Rather, during execution, it reran the translation to obtain the addresses, perhaps saving the obtained addresses (translator: temporarily?). Later, I realized that storing all the addresses does not take much space, so that's what we do now.
uint8_t *tbp;
This field points to the memory where the TB stores the micro instructions. The size of the memory block is TB_LEN / sizeof (ARMword) * TB_INSN_LEN_MAX.
ARMword addr;
This field is the emulated address that corresponds to this TB.
ARMword tran_addr;
During translation, we do not translate all the emulated instructions covered by a TB into micro instructions at once. Instead, translation begins at the starting address of the TB and stops when we reach an instruction that unconditionally returns (one whose micro instructions always return to the caller), provided the micro instruction address that was actually requested is already available. Later, when a requested address is higher than the highest address translated so far, we resume translation of this TB. The tran_addr field stores the address that immediately follows the highest address that has been translated, i.e., the next address that would be translated.
uint8_t *tbp_now;
Points to the address where we can write the next micro instruction into. When tbt->ted is 0 (i.e, we just started translating this TB at tbt->addr) this field is initialized to be tbt->tbp, and is incremented each time a micro instruction is added. Also, when the translation is resumed, as described at tran_addr above, you can continue to use this field.
ARMword last_addr;
uint8_t *last_tbp;
These two fields are used during translation. last_addr stores the (emulated) address that was used when the TB was invoked the last time, and last_tbp stores the address of the corresponding micro instruction. This way, when the TB is invoked the next time with the same address as last_addr, we can quickly determine the micro instruction address using last_tbp. This improves performance.
ARMword ret_addr;
When we discussed tran_addr above, we mentioned that the translation stops when it reaches an instruction that unconditionally returns. However, there's one more case that we have to consider -- the DBCT translates an emulated branch instruction into a regular branch micro instruction (not by modifying the PC register and then returning). When such a branch is in the forward direction, it could jump past the end of the currently translated code, to a target for which no micro instructions exist yet.
To prevent the premature ending of translation, we initialize ret_addr to 0 when translation starts. Whenever we translate a branch instruction whose target address is higher than ret_addr, we update ret_addr to this new address. Each time after an instruction is translated, we check the value of ret_addr, and terminate translation only if ret_addr is lower than the address of the next instruction to be translated.
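The ret_addr rule can be written down as a small predicate. The sketch below is a simplified model, not the code in tb.c: tb_sketch_t, note_branch and may_stop are hypothetical names, and the two boolean parameters stand in for the checks the translator makes on the instruction just translated and on the requested address.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint32_t tran_addr;  /* next emulated address to be translated */
    uint32_t ret_addr;   /* highest in-TB branch target seen so far */
} tb_sketch_t;

/* Called when a branch to `target` has just been translated. */
void note_branch(tb_sketch_t *tb, uint32_t target)
{
    assert(tb != NULL);
    if (target > tb->ret_addr)
        tb->ret_addr = target;
}

/* Translation may stop only if the last instruction always returns, the
   requested address has been translated, and no branch targets code at
   or beyond the next address to translate. */
bool may_stop(const tb_sketch_t *tb, bool insn_always_returns,
              bool requested_addr_done)
{
    assert(tb != NULL);
    return insn_always_returns && requested_addr_done &&
           tb->ret_addr < tb->tran_addr;
}
```

The comparison is strict: a branch whose target equals tran_addr points at an instruction that has not been translated yet, so translation has to continue.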

4.3. TB_TBT_SIZE and TB_TBP_SIZE

tb.c contains TB_TBT_SIZE and TB_TBP_SIZE, which control the initialization of TB. Their definitions are:
#define TB_TBT_SIZE skyeye_config.tb_tbt_size
#define TB_TBP_SIZE skyeye_config.tb_tbp_size
TB_TBT_SIZE defines the total number of TBs inside the DBCT, i.e., the space taken up by the tb_t structs. When it is set to 0, the TBs are allocated on demand and stored in the field armmem.h:mem_state_t->tbt of the data structure used by the emulated memory; i.e., whenever a block of memory is executed, its tb_t structure is allocated. If the value is non-zero, the TBs are dynamically allocated using the memory in tb.c:tbt_table and tb.c:tbt_table_size.
TB_TBP_SIZE is the space in the TB that actually stores the micro instructions. When set to 0, the space is allocated in the emulated memory structure armmem.h:mem_state_t->tbp; i.e., when a memory block is executed, its tbp is allocated. When it is set to non-zero and the flag tbp_dynamic is 1, TBP is dynamically allocated, using the memory in tb.c:tbp_begin, tb.c:tbp_now and tbp_now_size.
skyeye_config.tb_tbt_size and skyeye_config.tb_tbp_size are read from the configuration file. Initially they are set to their default values in the function skyeye_options.c:skyeye_option_init. There we can see that config->tb_tbt_size (the same as TB_TBT_SIZE) is initialized to 0, because the tb_t structure is small. config->tb_tbp_size (the same as TB_TBP_SIZE) is initialized to TB_TBP_DEFAULT (1024 * 1024 * 64). Because an emulated ARM instruction can expand to several micro instructions, the space needed to store the micro instructions for a TB could be rather large, sometimes exceeding the 32-bit addressing range (translator: really?), so we usually don't set config->tb_tbp_size to 0.
Note that TB_TBT_SIZE and TB_TBP_SIZE are not used immediately after they are read from the configuration file. They are used after tb_memory_init has started initialization. A few related variables are also initialized in tb_memory_init.

4.4. tb.c:tb_memory_init

This function is used to initialize the TB in DBCT. The procedure is as follows:
Step 1, it checks if TB_TBT_SIZE is 0. If it is not 0, it runs some code related to TB_TBT_SIZE. (Note that here I made a rather big mistake. The struct should be tb_t, but by mistake I used tb_cache_t here. This tb_cache_t is obsolete and should have been removed from the source code. The following discussion assumes that all occurrences of tb_cache_t have been replaced with tb_t.) First we perform some basic processing and checking of TB_TBT_SIZE. After that, we compute the space needed for the tb_t of all the emulated memory, and compare this value against TB_TBT_SIZE. If TB_TBT_SIZE is larger, the DBCT does not need to dynamically allocate tb_t to save space; in this case, we set TB_TBT_SIZE to 0 and use static allocation. If TB_TBT_SIZE is smaller, we initialize tbt_table, which stores the tb_t structs, as well as tbt_table_size, which is the number of tb_t structs.
This way, the initialization of TB_TBT_SIZE is complete. We also configured tbt_table and tbt_table_size.
Step 2, if TB_TBP_SIZE is not 0, perform related basic processing and checking.
Step 3, check once again if TB_TBT_SIZE is 0, and process TB_TBP_SIZE.
If TB_TBT_SIZE is not 0, we first compute tmp_u64, the size of the tbp needed by TB_TBT_SIZE tb_t structs. If TB_TBP_SIZE is larger than tmp_u64, or if TB_TBP_SIZE is 0, we set TB_TBP_SIZE to this value. We do this because once tb_t is dynamically allocated, the TBs cannot manage micro instruction storage (TBP) that lies outside their management limit. If TB_TBP_SIZE is smaller than tmp_u64, we set tb.c:tbp_dynamic to 1, which means the DBCT dynamically allocates TBP.
If TB_TBT_SIZE is 0, we check if TB_TBP_SIZE is 0. If it is also 0, there's no need for any further initialization, and tbp_dynamic keeps its default value of 0. When the DBCT operates, it allocates mem_state_t->tbt and mem_state_t->tbp on demand. If it's not 0, we first compute tmp_u64, which is the size of the TBP needed for all the emulated memory, and compare this value against TB_TBP_SIZE. If TB_TBP_SIZE is larger than or equal to tmp_u64, there's no need for dynamic allocation, and we set TB_TBP_SIZE to 0. If TB_TBP_SIZE is smaller than tmp_u64, dynamic allocation is necessary, so we set tbp_dynamic to 1.
Now the initialization of TB_TBP_SIZE is finished, and tbp_dynamic is configured accordingly.
Step 4, now that the value of TB_TBP_SIZE is determined, we allocate space for tbp_begin. Note that when we use mmap to allocate the space, we set the permission to be executable, i.e., PROT_EXEC. After that, we initialize tbp_now_size and tbp_now.
This way, tbp_begin, tbp_now_size and tbp_now, which are used for dynamically allocating TBP, are initialized.
In conclusion, the way that DBCT uses the variables for managing dynamic allocation of TBT and TBP has become a bit convoluted.
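The decisions made in Steps 1 and 3 can be condensed into one function. This is a sketch under assumptions, not the actual tb_memory_init: it counts tb_t structs rather than bytes, treats the "equal" case in Step 1 as static allocation (the text does not say), and the names tb_plan_t, plan_tb_memory, total_tbs and tbp_per_tb are hypothetical.

```c
#include <stdint.h>

typedef struct {
    uint64_t tbt_count;  /* final TB_TBT_SIZE; 0 means tb_t allocated on demand */
    uint64_t tbp_size;   /* final TB_TBP_SIZE; 0 means TBP allocated on demand */
    int tbp_dynamic;     /* 1 if TBP must be handed out dynamically */
} tb_plan_t;

/* total_tbs: number of TBs needed to cover all emulated memory.
   tbp_per_tb: bytes of TBP needed by one TB. */
tb_plan_t plan_tb_memory(uint64_t tbt_count, uint64_t tbp_size,
                         uint64_t total_tbs, uint64_t tbp_per_tb)
{
    tb_plan_t p = { tbt_count, tbp_size, 0 };

    /* Step 1: a table large enough for every TB needs no dynamic tb_t. */
    if (p.tbt_count != 0 && p.tbt_count >= total_tbs)
        p.tbt_count = 0;

    /* Step 3: settle the TBP size and whether it is dynamic. */
    if (p.tbt_count != 0) {
        uint64_t need = p.tbt_count * tbp_per_tb;
        if (p.tbp_size == 0 || p.tbp_size > need)
            p.tbp_size = need;        /* cap at what the table can manage */
        else if (p.tbp_size < need)
            p.tbp_dynamic = 1;
    } else if (p.tbp_size != 0) {
        uint64_t need = total_tbs * tbp_per_tb;
        if (p.tbp_size >= need)
            p.tbp_size = 0;           /* enough for everything: go static */
        else
            p.tbp_dynamic = 1;
    }
    return p;
}
```

With the defaults described above (TB_TBT_SIZE 0, TB_TBP_SIZE 64 MiB) and an emulated memory whose TBP requirement exceeds 64 MiB, this model ends with tbp_dynamic set to 1.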

4.5 tb.c:tb_insn_len_max_init

This function initializes tb.c:tb_insn_len_max, which is also TB_INSN_LEN_MAX.
For every (ARM) instruction, it computes the length of the corresponding translated micro instructions. tb_insn_len_max is set to the length of the longest sequence.
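The arithmetic that links this maximum to the TBP size of one TB can be sketched as below. The lengths and the TB_LEN value used in the usage note are made-up example numbers, and both function names are hypothetical; only the formula TB_LEN / sizeof (ARMword) * TB_INSN_LEN_MAX comes from the text.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Longest translated length among the candidate instructions. */
size_t insn_len_max(const size_t *lens, size_t n)
{
    size_t max = 0;
    size_t i;

    assert(lens != NULL || n == 0);
    for (i = 0; i < n; i++)
        if (lens[i] > max)
            max = lens[i];
    return max;
}

/* TBP bytes reserved for one TB: one worst-case slot per emulated
   instruction word (ARMword is assumed to be 32 bits wide). */
size_t tbp_bytes_per_tb(size_t tb_len, size_t len_max)
{
    return tb_len / sizeof(uint32_t) * len_max;
}
```

For example, with translated lengths of 12, 48 and 30 bytes and a TB_LEN of 1024, each TB would reserve 256 * 48 = 12288 bytes.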

5. Initialization function arm2x86.c:arm2x86_init

This is the initialization function for DBCT. It is called by the function arminit.c:ARMul_Reset.
This function first calls the micro instruction initialization functions described above. It then calls the function tb_insn_len_max_init, and lastly, the function tb_memory_init.

6. Translation Process

6.1. armemu.c:ARMul_Emulate32_dbct

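This is the core function of DBCT translation and execution. Like ARMul_Emulate32, its counterpart in the normal instruction emulation mode, it is called from arminit.c:ARMul_DoProg and arminit.c:ARMul_DoInstr. It proceeds as follows:
Step 1, increase R15 (the PC register) by one instruction length, INSN_SIZE. Because of ARM's multi-stage pipeline, the PC register is not transparent to applications, whereas the functions outside this one all treat R15 as the current PC value. So we adjust R15 before execution starts.
Step 2, set state->trap to 0.
Step 3, call the function tb.c:tb_find. Given the PC value passed as a parameter, this function does all the work of allocating the TB and translating instructions, and returns the micro instruction address that corresponds to the PC. A return value of NULL means failure; in that case we set state->trap to TRAP_INSN_ABORT (an instruction fetch exception) and jump to the code below that handles state->trap.
Step 4, save the registers that serve as variables in the micro instructions. As explained earlier, these registers are callee-saved, so we save them here.
Call gen_func, the pointer to the micro instruction memory that we obtained.
After it returns, restore the values of these registers.
Step 5, as mentioned in the discussion of micro instructions, exceptions and other special cases first set state->trap and then return. This is where they are actually handled. This part of the code is fairly clear: it handles each case according to state->trap, so we do not describe it in detail.
Step 6, decide whether to continue executing or to return from the function. To continue, go back to Step 2.
Step 7, subtract INSN_SIZE from state->Reg[15] so that the PC again points to the address currently being executed, then return.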
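The seven steps can be sketched as a loop. This is a heavily simplified model rather than the real ARMul_Emulate32_dbct: the register save and restore of Step 4 is omitted, every trap simply ends the loop, the PC value handed to the lookup function is an assumption, and cpu_sketch_t, demo_find and demo_block are hypothetical stand-ins for ARMul_State, tb_find and a translated TB.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define INSN_SIZE 4u
enum { TRAP_NONE = 0, TRAP_INSN_ABORT = 1 };  /* hypothetical values */

typedef struct {
    uint32_t reg[16];
    int trap;
    int stop;  /* hypothetical flag: leave the loop */
} cpu_sketch_t;

typedef void (*gen_func_t)(cpu_sketch_t *);
typedef gen_func_t (*find_fn_t)(cpu_sketch_t *, uint32_t pc);

void emulate_dbct_sketch(cpu_sketch_t *st, find_fn_t find)
{
    assert(st != NULL && find != NULL);
    st->reg[15] += INSN_SIZE;                      /* Step 1 */
    do {
        gen_func_t gen_func;

        st->trap = TRAP_NONE;                      /* Step 2 */
        /* Step 3 (assumption: look up the address being executed) */
        gen_func = find(st, st->reg[15] - INSN_SIZE);
        if (gen_func == NULL)
            st->trap = TRAP_INSN_ABORT;
        else
            gen_func(st);                          /* Step 4 */
        if (st->trap != TRAP_NONE)                 /* Step 5 */
            st->stop = 1;
    } while (!st->stop);                           /* Step 6 */
    st->reg[15] -= INSN_SIZE;                      /* Step 7 */
}

/* Demo "translated block": one instruction that increments r0. */
void demo_block(cpu_sketch_t *st)
{
    st->reg[0]++;
    st->reg[15] += INSN_SIZE;
}

/* Demo lookup: only address 0 has translated code. */
gen_func_t demo_find(cpu_sketch_t *st, uint32_t pc)
{
    (void)st;
    if (pc == 0)
        return demo_block;
    return NULL;
}
```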
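6.2. tb.c:tb_find
Given the PC value passed as a parameter, this function does all the work of allocating the TB and translating instructions, and finally returns the micro instruction address that corresponds to the PC. It proceeds as follows:
Step 1, call armmmu.c:mmu_v2p_dbct, which uses SkyEye's MMU emulation to obtain the emulated physical address addr that corresponds to the execution address ADDR. If this fails, the function returns with an error. Then use TB_ALIGN to obtain align_addr, the address aligned to TB_LEN; this is the address of the TB that contains addr.
Step 2, check whether align_addr equals the static local variable save_align_addr. If so, the TB for this physical address has been requested before, and all the pointers needed for translation are already stored in static local variables, so we skip the TB allocation code and go straight to the instruction translation code. Note that save_align_addr is initialized to 0x1 to guarantee that it does not match any address.
Step 3, the TB allocation code begins here. First, check whether tbt_table_size is 0 to determine whether tb_t is dynamically allocated.
Step 4, if it is dynamically allocated, use a hash computation to fetch from tbt_table the tb_t that corresponds to align_addr.
Compare tbt->addr with align_addr. If they differ, this tb_t previously served as the TB of another address, so we clear its old records, set tbt->ted to 0, and set tbt->addr to align_addr.
Next, obtain tbt->tbp, i.e., the TBP. If tbt->tbp is NULL, the TBP of this TB has not been allocated or has been taken by another TB, so we call tb.c:tb_get_tbp to allocate a TBP. If tbt->tbp is not NULL, the TBP has already been allocated, and, as described for tbt->list, we first remove the TB from the tbp_dynamic_list linked list.
Step 5, if it is not dynamically allocated, first call tb.c:tb_get_mbp to obtain mbp, a pointer to the mem_bank_t structure of the emulated memory that corresponds to align_addr.
Check whether state->mem.tbt[bank_num] is empty. If so, no space has been allocated for tbt and tbp. If tbp_dynamic is 0, i.e., TBP is statically allocated, we first allocate space for state->mem.tbp[bank_num], and then allocate space for state->mem.tbt[bank_num].
Once the space is allocated, set up the TB structure.
After obtaining the TB structure, check whether tbt->tbp (the TBP) is empty. If so, set it up according to tbp_dynamic: for dynamic allocation, use tb.c:tb_get_tbp as before; for static allocation, take it from state->mem.tbp[bank_num]. If it is not empty, then, as before, check tbp_dynamic and, where applicable, remove the TB structure from the list.
Now both the TB structure and its TBP have been obtained.
Step 6, perform some setup with the TB just obtained.
Set state->tb_now to the TB structure just obtained. This lets the micro instructions access the currently running TB at run time. For example, once the TB has been marked dirty, a micro instruction can detect this immediately and exit.
Set save_align_addr to align_addr, for the purpose described in Step 2.
If tbp_dynamic is true, i.e., TBP is dynamically allocated, append the TB structure to the end of the tbp_dynamic_list linked list, for the reason given in the description of tbt->list.
Step 7, the code that translates the emulated instructions begins here. First, check the value of tbt->ted to determine whether this TB structure has been translated.
Step 8, if this TB structure has already been translated:
First check whether tbt->last_addr equals addr. If so, return tbt->last_tbp. This was explained in the description of tbt->last_addr and tbt->last_tbp.
Then check whether the physical address addr to be translated is greater than or equal to tbt->tran_addr, which was also described above.
If addr is less than tbt->tran_addr, the micro instruction code already in the TB covers addr, so we simply take the TBP address that corresponds to addr from tbt->insn_addr and store it in ret as the return value.
If addr is greater than or equal to tbt->tran_addr, translation must continue. First obtain real_begin_addr, the pointer into the emulated memory block that corresponds to tbt->tran_addr, and real_addr, the pointer into the emulated memory block that corresponds to addr. Then call tb.c:tb_translate to translate the memory starting at real_begin_addr. Finally, store the micro instruction address that corresponds to addr in ret.
Step 9, if this TB structure has not been translated yet, it must be translated from scratch.
Again, first obtain real_begin_addr, the pointer into the emulated memory block that corresponds to tbt->tran_addr, and real_addr, the pointer into the emulated memory block that corresponds to addr. Then initialize tbt->tran_addr to align_addr and tbt->tbp_now to tbp; these two fields were described above. Call tb.c:tb_translate to translate the memory starting at real_begin_addr. Finally, store the micro instruction address that corresponds to addr in ret, and set tbt->ted to 1 to indicate that this TB has been translated.
Now the return value ret, i.e., the micro instruction address that corresponds to ADDR, has been obtained.
Step 10, store addr and ret in tbt->last_addr and tbt->last_tbp, and return ret.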
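The alignment in Step 1 and the lookup path of Steps 8 and 10 can be sketched as follows. Several things here are assumptions: TB_LEN is taken to be a power of two (1024 in this sketch), TB_ALIGN is modelled as a simple mask, the insn_addr index is computed as one slot per 4-byte instruction, and tb_lookup_t and both function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TB_LEN_SKETCH 1024u                  /* hypothetical TB_LEN */
#define INSNS_PER_TB  (TB_LEN_SKETCH / 4u)

/* Assumed behaviour of TB_ALIGN: round down to a TB_LEN boundary. */
uint32_t tb_align_sketch(uint32_t addr)
{
    return addr & ~(TB_LEN_SKETCH - 1u);
}

typedef struct {
    uint32_t addr;       /* emulated address of the TB */
    uint32_t tran_addr;  /* next address still to be translated */
    uint32_t last_addr;  /* address of the previous request */
    uint8_t *last_tbp;   /* micro instruction address of that request */
    uint8_t *insn_addr[INSNS_PER_TB];
} tb_lookup_t;

/* For an already translated TB: return the micro instruction address for
   addr, or NULL when translation has to be resumed first. */
uint8_t *lookup_translated(tb_lookup_t *tb, uint32_t addr)
{
    uint8_t *ret;

    assert(tb != NULL);
    if (addr == tb->last_addr)        /* same request as last time */
        return tb->last_tbp;
    if (addr >= tb->tran_addr)        /* not translated yet */
        return NULL;
    ret = tb->insn_addr[(addr - tb->addr) / 4u];
    tb->last_addr = addr;             /* Step 10: remember this request */
    tb->last_tbp = ret;
    return ret;
}
```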
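6.3. tb.c:tb_get_tbp
This function dynamically allocates TBP. It proceeds as follows:
Step 1, check whether tbp_now_size is 0. As described above, tbp_now_size records the amount of TBP space that can still be allocated.
Step 2, if tbp_now_size is not 0, a TBP can still be allocated directly from tbp_now.
Step 3, if tbp_now_size is 0, the space in tbp_now is used up. In this case we take the TBP of the first TB structure in tbp_dynamic_list, which is the least recently used TB structure in the whole list (see the description of tbt->list above). After taking it, we remove the TB structure whose TBP was taken from the list, and mark its tbp as NULL and its ted as 0.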
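A minimal sketch of this allocation scheme follows. It is not the code in tb.c: the list is a hand-written circular doubly linked list instead of struct list_head, all TBPs are assumed to have the same size, and tb_node_t, tbp_pool_t and the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct tb_node {
    struct tb_node *prev, *next;  /* position in the in-use list */
    uint8_t *tbp;
    int ted;
} tb_node_t;

typedef struct {
    tb_node_t head;       /* sentinel; head.next is the least recently used TB */
    uint8_t *tbp_now;     /* next unallocated chunk */
    size_t tbp_now_size;  /* bytes still unallocated */
    size_t chunk;         /* bytes per TBP */
} tbp_pool_t;

void pool_init(tbp_pool_t *p, uint8_t *mem, size_t size, size_t chunk)
{
    assert(p != NULL && chunk > 0);
    p->head.prev = p->head.next = &p->head;
    p->tbp_now = mem;
    p->tbp_now_size = size;
    p->chunk = chunk;
}

void list_unlink(tb_node_t *n)
{
    n->prev->next = n->next;
    n->next->prev = n->prev;
}

/* Append at the tail: the most recently used position. */
void list_append(tbp_pool_t *p, tb_node_t *n)
{
    n->prev = p->head.prev;
    n->next = &p->head;
    p->head.prev->next = n;
    p->head.prev = n;
}

uint8_t *get_tbp(tbp_pool_t *p)
{
    tb_node_t *victim;
    uint8_t *r;

    if (p->tbp_now_size >= p->chunk) {  /* Step 2: fresh space is left */
        r = p->tbp_now;
        p->tbp_now += p->chunk;
        p->tbp_now_size -= p->chunk;
        return r;
    }
    victim = p->head.next;              /* Step 3: take from the list head */
    if (victim == &p->head)
        return NULL;                    /* nothing to evict */
    r = victim->tbp;
    list_unlink(victim);
    victim->tbp = NULL;
    victim->ted = 0;
    return r;
}
```

Moving a TB to the tail on every use (list_unlink followed by list_append) is what keeps the head of the list the least recently used entry.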
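6.4. tb.c:tb_translate
This function translates instructions from memory starting at the parameter tb_begin_addr, and finally returns the micro instruction address that corresponds to addr. It proceeds as follows:
Step 1, from tb_begin_addr, compute tb_end_addr, the end address of the emulated memory block that corresponds to this TB.
Step 2, initialize the linked list tb_branch_save_list. This list records every branch inside the TB. During translation the addresses of later instructions are not yet known, so the branch lengths cannot be computed. We therefore record the address to be written, the branch target address and related information here, and after translation ends we loop over the list and set each branch length.
Set the global variable now_tbt to tbt, so that every translation function can conveniently access the current TB structure.
Set tbt->ret_addr to 0; see the description of ret_addr above for the purpose.
Step 3, the translation loop begins. Each iteration checks whether tb_begin_addr is less than tb_end_addr; if so, it translates, otherwise the loop ends. At the end of each instruction's translation, tb_begin_addr is advanced by the size of an ARMword to the next instruction. The translation of one instruction proceeds as follows.
Check whether tb_begin_addr equals addr. If so, the instruction about to be translated is the one that corresponds to addr, so set the return value ret to tbt->tbp_now.
Set tbt->insn_addr; see the description of tbt->insn_addr above for the purpose.
Call tb.c:translate_word with the instruction to translate, *tb_begin_addr, the current micro instruction write pointer, tbt->tbp_now, and other parameters. This function translates a single instruction and returns len, the length of the micro instructions written.
Add len to tbt->tbp_now to skip over the micro instruction storage that has just been used.
Add 4 to tbt->tran_addr so that it corresponds to the next instruction.
Finally, as described earlier, translation stops if the instruction is certain to return. A note on state->trap: as mentioned above, during micro instruction execution it carries the exception type; during translation it marks that the instruction just translated is certain to return. Here, in addition to state->trap, the code also checks ret, and whether tbt->tran_addr is greater than tbt->ret_addr; the purpose of these checks was described above. If translation can stop, the loop is terminated.
Step 4, translation has now finished. If the TB has been translated in full, i.e., state->trap is 0, append the op_return micro instruction so that execution returns when the TB ends.
Step 5, take each branch record from the tb_branch_save_list list described above, in turn, and set its branch length.
Finally, return ret, the micro instruction address that corresponds to addr.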
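The record-then-patch idea behind tb_branch_save_list can be sketched as below. This is an illustration under assumptions, not the real data structure: a fixed array replaces the linked list, offsets into the TBP replace pointers, and the patched field is assumed to be the last 4 bytes of an x86 relative jump, so the displacement is measured from the byte after that field. All names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_FIXUPS 16

typedef struct {
    size_t patch_off;    /* offset in the TBP of the 4-byte displacement */
    size_t target_insn;  /* index of the target instruction inside the TB */
} fixup_t;

typedef struct {
    fixup_t items[MAX_FIXUPS];
    size_t count;
} fixup_list_t;

/* During translation: remember a branch whose length is not known yet. */
int fixup_save(fixup_list_t *l, size_t patch_off, size_t target_insn)
{
    assert(l != NULL);
    if (l->count == MAX_FIXUPS)
        return -1;
    l->items[l->count].patch_off = patch_off;
    l->items[l->count].target_insn = target_insn;
    l->count++;
    return 0;
}

/* After translation: insn_off[i] is the TBP offset of instruction i, so
   every recorded displacement can now be computed and written. */
void fixup_resolve(const fixup_list_t *l, uint8_t *tbp, const size_t *insn_off)
{
    size_t i;

    assert(l != NULL && tbp != NULL && insn_off != NULL);
    for (i = 0; i < l->count; i++) {
        const fixup_t *f = &l->items[i];
        /* Relative to the end of the jump, i.e. patch_off + 4. */
        uint32_t disp = (uint32_t)(insn_off[f->target_insn] - (f->patch_off + 4));

        tbp[f->patch_off]     = (uint8_t)(disp & 0xffu);
        tbp[f->patch_off + 1] = (uint8_t)((disp >> 8) & 0xffu);
        tbp[f->patch_off + 2] = (uint8_t)((disp >> 16) & 0xffu);
        tbp[f->patch_off + 3] = (uint8_t)((disp >> 24) & 0xffu);
    }
}
```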
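6.5. tb.c:translate_word
This function translates one emulated instruction, the parameter insn, into micro instructions, stores them at the parameter tbp, and returns the length of the micro instructions written.
The function is rather large and consists mostly of instruction translation work, so we do not describe it in detail.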


Deletions:
An introduction to the internals of skyeye can be found at [[http://www.linuxforum.net/forum/gshowthreaded.php?Cat=



Edited on 2007-07-04 20:58:12 by Id5Fia

Additions:
An introduction to the internals of skyeye can be found at [[http://www.linuxforum.net/forum/gshowthreaded.php?Cat=

Deletions:
An introduction to the internals of skyeye can be found at skyeye study notes (Chinese).
The acronym DBCT will be used in all the following text to stand for Dynamic Binary Code Translation.
The implementation of DBCT is not perfect. If you're interested in dynamic translation, I personally recommend reading the QEMU source code.

2. Abstract

The design of DBCT is influenced by QEMU( http://fabrice.bellard.free.fr/qemu/ ), but there are differences in the implementation. I will note the differences in the discussion of the each component.
DBCT combines several consecutive emulated instructions into a group (called a Translation Block, TB). According to its functionality, each instruction is directly translated into several micro instructions. (This is different than QEMU, which uses intermediate code during the translation process). Each micro instruction represents one operation and consists of several local instructions. Finally, we get a group of local instructions that corresponding the the TB, plus the return instruction at the end. To emulate, we make a function call to the beginning of this group of (local) instructions.
There are other types emulation methods. The most common method is to fetch one instruction, interpret it and perform its designated operations, and fetch the next instruction, and repeat. The normal instruction emulation mode in SkyEye is done this way.
There is also a method that translates the target hardware operations into languages such as C, and compiles it in order to emulate instructions.

3. Micro Instructions

3.1. Structure and Initialization of Micro Instructions.

In the function arm2x86_init, all the functions that are called before tb_insn_len_max_init are used to initialize the micro instructions.
In the DBCT code, each micro instruction is packaged in a op_table_t structure, which is defined in arm2x86.h. The op field in this structure points to the micro instruction, and the len field gives its length. The initialization of each micro instruction is done in a function named in the form of get_op_xxx. This function returns the address of the micro instruction, to be stored in op. The function's parameter pointer is for setting up len. These functions are called during micro instruction initialization.
This function uses two macros defined in arm2x86.h: OP_BEGIN and OP_END; both of them are X86 instructions. The code between these two macros implements the micro instruction:
#define OP_BEGIN(f) __asm__ __volatile__ ("jmp ."f"_teawater_op_end\n\t""."f"_teawater_op_begin:\n\t")
#define OP_END(f) __asm__ __volatile__ ("."f"_teawater_op_end:\n\t""movl $."f"_teawater_op_begin,%0\n\t""movl $."f"_teawater_op_end,%1\n\t":"=g"(begin), "=g"(end));

OP_BEGIN starts with a branch instruction that jumps to the symbol "."f"_teawater_op_end", which is declared in OP_END. The purpose is to jump over the code between OP_BEGIN and OP_END so that it is not executed during initialization. Why don't I use a simple goto statement? If you use goto, the C compiler knows that the micro instruction's implementation code can never be executed and will optimize it away. In-line assembler code is opaque to the C compiler, so the code between the two macros is preserved. After the branch, there is a pseudo instruction that declares a symbol marking the start of the micro instruction.
OP_END starts with a pseudo instruction that declares a symbol, which marks the end of the micro instruction; it is also the target of the branch instruction in OP_BEGIN. This is followed by two assignment instructions that assign the begin and end addresses of the micro instruction to the variables begin and end, which are declared at the beginning of the initialization function. This is how the initialization function obtains the begin and end addresses of the micro instruction.
From the description of these two macros we can understand how the micro instructions are generated: the length of a micro instruction is computed from its begin and end addresses. QEMU generates its micro instructions differently from DBCT: QEMU puts each micro instruction in its own function and, after compilation, uses a special procedure to gather the begin and end addresses of the micro instruction.
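The bookkeeping described above can be sketched in portable C. This is only an illustration, not the code in arm2x86.h: op_entry, op_from_range and emit_op are hypothetical names, and the begin and end addresses are passed in as ordinary pointers instead of being produced by OP_END.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for op_table_t: start of the op's code and its length. */
typedef struct {
    const uint8_t *op;
    size_t len;
} op_entry;

/* Build an entry from the two addresses that OP_END would store in begin and end. */
static op_entry op_from_range(const uint8_t *begin, const uint8_t *end)
{
    assert(end >= begin);
    op_entry e = { begin, (size_t)(end - begin) };
    return e;
}

/* Append one op to a translation buffer and return the new write position. */
static uint8_t *emit_op(uint8_t *out, const op_entry *e)
{
    memcpy(out, e->op, e->len);
    return out + e->len;
}
```

Because a micro instruction is just a byte range, translating an emulated instruction amounts to calling something like emit_op once per micro instruction.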

3.2. Variables in Micro Instructions

None of the micro instruction implementations use local variables. Instead, register variables are used. Of course, when there are not enough registers, global variables are used. This is similar to how QEMU uses variables in micro instructions.
I think this is because a single instruction is composed of several micro instructions (especially in the ARM architecture, which performs several operations inside a single instruction). These micro instructions need to pass computed values between them. We could do this using a stack, but that would be relatively complicated. Also, frequent memory access would impact speed.
These register variables are declared in the header file arm2x86_self.h. Because global register declarations can affect the code that the C compiler generates, only files related to DBCT include this header file.
The EBP register is declared to point to the global variable state, which is of the struct type ARMul_State and stores all information about CPU emulation in SkyEye. This way the micro instructions can easily reference this struct. EBX, ESI and EDI are bound to the variables T0, T1 and T2 of the type uint32_t. These 3 variables are frequently used by the micro instructions. Note that under the C calling convention these registers normally hold other values (EBP, for example, is usually the frame pointer), so we need to save their values before executing the micro instructions (explained in detail below).
The other registers such as EAX are only local (scratch) registers in GCC and cannot be declared as global register variables, so they are not used as variables in the micro instructions.

3.3. Calling Functions in Micro Instructions

Sometimes we need to call functions inside micro instructions. Because of the way that micro instructions are implemented, we cannot directly call regular functions and must go through a special process. The following describes how arm2x86.c:get_op_begin calls the function tea_begin:
First, we subtract 0xc from ESP to allocate 0xc bytes of space on the stack. Then EBP, which points to ARMul_State, is pushed onto the stack. These two in-line assembler instructions pass st as a parameter to the function tea_begin. The 0xc bytes allocated beforehand ensure that the parameter area is aligned to 0x10 (0xc plus the 4 bytes of the 32-bit parameter equals 0x10).
There's no need to preserve EBP, EBX, ESI and EDI around the call, because these global registers are callee-saved: if the function that we call modifies these registers, it saves and restores them itself.
The next step is to assign the address of tea_begin to T2 and then call it through T2, i.e., by its absolute address. Usually functions are called using relative branches. However, when the micro instructions are copied into the TB, their addresses change and such relative branches no longer work. Therefore, we must use absolute addresses to make the call. For similar reasons, some other functions are called through function pointers so that the code can be ported to CYGWIN.
Finally, if we need the return value, which is stored in the register EAX, we store it into a register variable such as T0.

3.4. Exception Handling in Micro Instructions

Usually emulators must emulate exceptions. This is also true for DBCT.
In DBCT, when an exception happens, we first set st->trap or state->trap to the exception type (see TRAP_XXX in arm2x86.h). Then, we use the X86 ret instruction to return to the non-DBCT emulation mode, which performs the actual exception processing. This keeps the micro instructions simple. Special cases such as TRAP_SETS_R15, TRAP_SET_CPSR and TRAP_SET_R15 are handled the same way.

3.5. Branches inside Micro Instructions

Here "branch" doesn't mean the emulated branch instructions. Rather, some micro instructions contain branches whose lengths must be adjusted. For example, for an emulated instruction that has a condition check, the compiled micro instruction code must be skipped if the specified condition does not match the PSR, and for that we need to know the length of the branch. In DBCT, we first write a line like the following: __asm__ __volatile__ ("jmp 0xffffffff");. Usually, the last 4 bytes of the resulting instruction contain the branch length, so we just have to fill in the actual length at translation time. (There's more detail below when we describe the translation process.)
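As a minimal sketch of this patching step, assuming that the copied micro instruction ends with a 4-byte branch length stored in host byte order (little-endian on X86), the placeholder can be overwritten as follows. patch_tail32 and branch_length are hypothetical helper names, not functions from the DBCT source.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Overwrite the last 4 bytes of a copied micro instruction with a 32-bit value. */
static void patch_tail32(uint8_t *op_start, size_t op_len, uint32_t value)
{
    assert(op_len >= 4);
    memcpy(op_start + op_len - 4, &value, sizeof value);
}

/* Number of bytes a branch that ends at branch_end must skip to reach target.
 * A backward branch yields the two's complement encoding of a negative length. */
static uint32_t branch_length(const uint8_t *branch_end, const uint8_t *target)
{
    return (uint32_t)(target - branch_end);
}
```

The length is measured from the end of the branch instruction because that is how an X86 relative jmp interprets its operand.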

3.6. Categories of Micro Instructions

3.6.1. Overview
Due to design issues, the categories of the micro instructions are not well organized. Here we categorize them according to their initialization functions.
3.6.2. arm2x86.c:op_init
The most important micro instructions we initialize here are op_begin and op_begin_test_T0. These two micro instructions are placed in front of each translated ARM instruction.
op_begin is used when the ARM instruction's condition is AL or NV (i.e., no condition check is needed). Here, we first call the function arm2x86.c:tea_begin, which calls arm2x86.c:tea_check_out, which (as we do in the normal emulation mode) checks if single stepping is needed, checks if there are any hardware exceptions, and checks if the current TB is dirty. (About dirty: if the micro instructions in the current TB modify memory associated with the current TB, we need to return to normal emulation mode so that this TB is automatically re-translated.) As the last step, armio.c:io_do_cycle invokes all virtual devices. After the tea_begin function returns, we check its return value. If it's TRUE, we return.
The micro instruction op_begin_test_T0 is used if the translated ARM instruction needs a condition check. Before this micro instruction is executed, the condition of the translated instruction is already stored in T0. Here we use st and T0 as parameters to invoke arm2x86.c:tea_begin_test, which also uses tea_check_out to check if there are any exceptions that require a return to the normal execution mode. Otherwise, it calls arm2x86_psr.h:gen_op_condition to check if the current translated instruction needs to be executed. After tea_begin_test returns, we first check if we need to return to normal execution mode due to exceptions. Then we check if the instruction should be skipped due to the condition, in which case we execute a jmp instruction (as described above).
The few other micro instructions initialized here are rather simple, so we will not describe them.
3.6.3. arm2x86_test.c:arm2x86_test_init
Here we initialize a few simple micro instructions that first make a condition check and then perform some operations.
3.6.4. arm2x86_shift.c:arm2x86_shift_init
Here we initialize various shift micro instructions.
If the shift count is variable, it's rather simple. The preceding micro instructions will have stored the shift count in a register, which can be used directly in this micro instruction. If the shift count is an immediate value, we handle it similarly to the branches inside micro instructions described above. We first write a line similar to “T1 = T1 << 31;”, which compiles to an instruction that ends with an 8-bit shift count. During translation, we replace it with the actual immediate shift count.
3.6.5. arm2x86_psr.c:arm2x86_psr_init
Here we initialize micro instructions related to the ARM status registers (CPSR and SPSR).
3.6.6. arm2x86_movl.c:arm2x86_movl_init
Here we initialize the micro instructions used to assign values to the emulated registers or the register variables.
The handling of immediate values is similar to the branch micro instructions described above. We first use a statement like “T2 = ULONG_MAX;”. This way the last 32 bits of the compiled instruction are the immediate value ULONG_MAX, which is replaced with the actual immediate value during translation.
3.6.7. arm2x86_mul.c:arm2x86_mul_init
Here we initialize the micro instructions for emulating multiplication.
3.6.8. arm2x86_mem.c:arm2x86_mem_init
Here we initialize the micro instructions for emulating memory access operations.
To access memory, we call SkyEye's memory access functions and use the returned values for various operations, instead of accessing memory directly. This is because if the MMU is emulated, we need to use the TLB and page table for address translation. Also, an address could be memory-mapped I/O. Therefore, it's much simpler to just call the memory access functions.
3.6.9. arm2x86_dp.c:arm2x86_dp_init
Here we initialize the micro instructions for emulating the ARM data processing (DP) instructions.
3.6.10. arm2x86_coproc.c:arm2x86_coproc_init
Here we initialize the micro instructions for emulating the ARM co-processor instructions.
3.6.11. arm2x86_other.c:arm2x86_other_init
Here we initialize the other micro instructions.

4. Translation Block (TB)

4.1. Overview

DBCT translates a block of emulated instructions of length tb.h:TB_LEN into a series of micro instructions. The micro instructions for one emulated instruction have a maximum length of tb.h:TB_INSN_LEN_MAX, which is computed in tb.c:tb_insn_len_max_init. These micro instructions (stored in memory called TBP), together with other information such as addresses, are packaged into a TB. During DBCT initialization, we initialize the TBs according to the configuration files and the current status. We will see more details in the description of tb_memory_init.

4.2. tb.h:struct tb_s

This is the core structure in a TB. Each TB has a corresponding tb_s struct.
The following is a description of each field:
struct list_head list;
When TBP is dynamically allocated (tbp_dynamic is 1), all the TBs that are in use are kept in the linked list tb.c:tbp_dynamic_list. Each time a TB is used, it is moved to the end of the list. When we need to execute code at an untranslated address and there is no free TBP space left, we take the TBP of a TB that is currently in use, and we always pick the first TB in the list. The advantage is that the head of the list is the least recently used TB, so choosing it has the smallest impact on performance. There are more details below when we discuss the translation process.
int ted;
A value of 0 means that this TB does not contain translated code; 1 means that it does. To mark a TB as dirty, we set ted to 0.
uint8_t *insn_addr[TB_LEN / sizeof(uint8_t *)];
During translation, the starting micro instruction address of each translated instruction is stored in this array. When we execute a translated TB, we can look up the micro instruction start address for the requested instruction and start executing from there.
Originally, DBCT did not store the addresses of the micro instructions. Rather, during execution, it reran the translation to obtain the addresses, perhaps saving the addresses obtained. Later, I realized that storing all the addresses does not take much space, so that's what we do now.
uint8_t *tbp;
This field points to the memory where the TB stores the micro instructions. Obviously, the size of the memory block is TB_LEN / sizeof (ARMword) * TB_INSN_LEN_MAX.
ARMword addr;
This field is the address of the emulated instructions that correspond to this TB.
ARMword tran_addr;
During translation, we do not translate all the emulated instructions covered by a TB into micro instructions at once. Instead, when we reach an instruction that always causes a return from the translated code, and the micro instruction address for the requested address is already available (note that translation begins at the starting address of the TB), we stop translating. Later, when a requested address is higher than the addresses translated so far, we resume translation of this TB. The tran_addr field stores the address that immediately follows the highest address that has been translated, i.e., the next address to be translated.
uint8_t *tbp_now;
Points to the address where the next micro instruction can be written. When tbt->ted is 0 (i.e., we have just started translating this TB at tbt->addr), this field is initialized to tbt->tbp, and it is incremented each time a micro instruction is added. When translation is resumed, as described for tran_addr above, writing simply continues from this field.
ARMword last_addr;
uint8_t *last_tbp;
These two fields are used during translation. last_addr stores the (emulated) address that was used when the TB was invoked the last time, and last_tbp stores the address of the corresponding micro instruction. This way, when the TB is invoked the next time with the same address as last_addr, we can quickly determine the micro instruction address using last_tbp. This improves performance.
ARMword ret_addr;
When we discussed tran_addr above, we mentioned that translation stops when it reaches an instruction that always causes a return. However, there's one more case that we have to consider: when an emulated branch instruction has its target inside the current TB, DBCT translates it into a regular branch micro instruction (instead of modifying the PC register and then returning). When such a branch is in the forward direction, it could jump past the end of the code translated so far, which must not happen.
To prevent translation from ending prematurely, we initialize ret_addr to 0 when translation starts. Whenever we translate such a branch instruction whose target address is higher than ret_addr, we update ret_addr to this new address. After an instruction is translated and translation could otherwise end, we check the value of ret_addr and terminate translation only if ret_addr is lower than the address of the next instruction to be translated.
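The ret_addr rule can be sketched as two small helpers. This is an illustration under assumptions, not the code in tb.c: tb_progress is a hypothetical subset of tb_s, and note_branch_target and may_stop are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t ARMword;

/* Hypothetical subset of tb_s that the stop rule needs. */
typedef struct {
    ARMword tran_addr; /* next emulated address to translate */
    ARMword ret_addr;  /* highest in-TB branch target seen so far */
} tb_progress;

/* Call when a branch to an address inside the TB has been translated. */
static void note_branch_target(tb_progress *tb, ARMword target)
{
    assert(tb != NULL);
    if (target > tb->ret_addr)
        tb->ret_addr = target;
}

/* Call after translating an instruction that always returns.
 * requested_addr_done says whether the requested address has been translated. */
static bool may_stop(const tb_progress *tb, bool requested_addr_done)
{
    assert(tb != NULL);
    return requested_addr_done && tb->ret_addr < tb->tran_addr;
}
```

With ret_addr starting at 0, a TB without forward in-TB branches stops at the first instruction that always returns, which is the behavior described for tran_addr.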

4.3. TB_TBT_SIZE and TB_TBP_SIZE

tb.c contains TB_TBT_SIZE and TB_TBP_SIZE, which control the initialization of TB. Their definitions are:
#define TB_TBT_SIZE skyeye_config.tb_tbt_size
#define TB_TBP_SIZE skyeye_config.tb_tbp_size
TB_TBT_SIZE is the space taken up by all the TB entries in DBCT, i.e., by the tb_t structures. When it is set to 0, the TBs are allocated on demand and stored in the field armmem.h:mem_state_t->tbt of the data structure used by the emulated memory, i.e., whenever a block of memory is executed, its tb_t structure is allocated. If the value is non-zero, the TBs are dynamically allocated using the memory in tb.c:tbt_table and tb.c:tbt_table_size.
TB_TBP_SIZE is the space in the TBs that actually stores the micro instructions. When it is set to 0, the space is allocated in the emulated memory structure armmem.h:mem_state_t->tbp, i.e., when a memory block is executed, its tbp is allocated. When it is non-zero and the flag tbp_dynamic is 1, TBP is dynamically allocated, using the memory in tb.c:tbp_begin, tb.c:tbp_now and tbp_now_size.
skyeye_config.tb_tbt_size and skyeye_config.tb_tbp_size are read from the configuration file. Their default values are set in the function skyeye_options.c:skyeye_option_init. There we can see that config->tb_tbt_size (the same as TB_TBT_SIZE) is initialized to 0, because the tb_t structure is small. config->tb_tbp_size (the same as TB_TBP_SIZE) is not initialized to 0 as well, but to TB_TBP_DEFAULT (1024 * 1024 * 64). Because each emulated ARM instruction expands to several micro instructions, the space needed to store the micro instructions for the instructions of a TB is rather large, in some cases even exceeding the size of a 32-bit address space, so we usually don't set config->tb_tbp_size to 0.
Note that TB_TBT_SIZE and TB_TBP_SIZE are not used directly after they are read from the configuration file. They are used only after tb_memory_init has initialized them. A few related variables are also initialized in tb_memory_init.

4.4. tb.c:tb_memory_init

This function is used to initialize the TB in DBCT. The procedure is as follows:
Step 1, check whether TB_TBT_SIZE is 0. If it is not 0, run the code related to TB_TBT_SIZE. (Note that I made a rather big mistake here: the struct used should be tb_t, but I mistakenly used tb_cache_t everywhere. tb_cache_t is obsolete and should have been removed from the source code. The following discussion assumes that all occurrences of tb_cache_t have been replaced with tb_t.) First we perform some basic processing and checking of TB_TBT_SIZE. After that, we compute the space needed for the tb_t structures of all the emulated memory and compare this value against TB_TBT_SIZE. If TB_TBT_SIZE is larger than or equal to this value, DBCT does not need to allocate tb_t dynamically to save space; in this case, we set TB_TBT_SIZE to 0 and use static allocation. If TB_TBT_SIZE is smaller, we initialize tbt_table, the storage for the tb_t structures, and tbt_table_size, the number of tb_t structures.
This way, the initialization of TB_TBT_SIZE is complete. We also configured tbt_table and tbt_table_size.
Step 2, if TB_TBP_SIZE is not 0, perform related basic processing and checking.
Step 3, check once again if TB_TBT_SIZE is 0, and process TB_TBP_SIZE.
If TB_TBT_SIZE is not 0, we first compute tmp_u64, the size of the TBP needed by the tb_t structures that fit in TB_TBT_SIZE, and compare it against TB_TBP_SIZE. If TB_TBP_SIZE is larger than tmp_u64, or if TB_TBP_SIZE is 0, we set TB_TBP_SIZE to tmp_u64. We do this because once tb_t is dynamically allocated, the TBs cannot manage micro instruction storage (TBP) beyond the range they cover. If TB_TBP_SIZE is smaller than tmp_u64, we set tb.c:tbp_dynamic to 1, which means that DBCT allocates TBP dynamically.
If TB_TBT_SIZE is 0, we check if TB_TBP_SIZE is 0. If it is also 0, obviously there's no need for any further initialization, and tbp_dynamic keeps its default value of 0. When the DBCT operates, it allocates mem_state_t->tbt and mem_state_t->tbp on demand. If it's not 0, we first compute tmp_u64, which is the size of the TBP needed for all the emulated memory, and compare this value against TB_TBP_SIZE. If TB_TBP_SIZE is larger than or equal to tmp_u64, there's no need for dynamic allocation, and we set TB_TBP_SIZE to 0. If TB_TBP_SIZE is smaller than tmp_u64, dynamic allocation is necessary, so we set tbp_dynamic to 1.
Now the initialization of TB_TBP_SIZE is finished, and tbp_dynamic is configured accordingly.
Step 4, now that the value of TB_TBP_SIZE is determined, we allocate space for tbp_begin. Note that when we use mmap to allocate the space, we set the permission to be executable, i.e., PROT_EXEC. After that, we initialize tbp_now_size and tbp_now.
This way, tbp_begin, tbp_now_size and tbp_now, which are used for dynamically allocating TBP, are initialized.
In conclusion, the way that DBCT uses the variables for managing dynamic allocation of TBT and TBP has become a bit convoluted.
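Since the rules in Step 3 are easy to lose track of, here is a minimal sketch of them as a pure function. It is not the code of tb_memory_init: tbp_plan, plan_tbp and the parameter names are hypothetical, and the two "needed" sizes (the tmp_u64 of each case) are passed in instead of being computed from the emulated memory map.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical result type: final TB_TBP_SIZE and the tbp_dynamic flag. */
typedef struct {
    uint64_t tbp_size;
    int tbp_dynamic;
} tbp_plan;

/* need_for_tbt: TBP bytes needed by the tb_t structures that fit in tbt_size.
 * need_for_all_mem: TBP bytes needed to cover all emulated memory. */
static tbp_plan plan_tbp(uint64_t tbt_size, uint64_t tbp_size,
                         uint64_t need_for_tbt, uint64_t need_for_all_mem)
{
    tbp_plan p = { tbp_size, 0 };
    assert(tbt_size == 0 || need_for_tbt > 0);
    if (tbt_size != 0) {
        if (tbp_size == 0 || tbp_size > need_for_tbt)
            p.tbp_size = need_for_tbt;   /* no more TBP than the TBs can manage */
        else if (tbp_size < need_for_tbt)
            p.tbp_dynamic = 1;           /* TBP must be shared dynamically */
    } else if (tbp_size != 0) {
        if (tbp_size >= need_for_all_mem)
            p.tbp_size = 0;              /* enough for everything: static allocation */
        else
            p.tbp_dynamic = 1;
    }
    return p;
}
```

Writing the decision this way makes the four outcomes explicit: static allocation, a TBP size clamped to what the TBs can manage, or dynamic TBP in either branch.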

4.5. tb.c:tb_insn_len_max_init

This function initializes tb.c:tb_insn_len_max, which is also TB_INSN_LEN_MAX.
For every (ARM) instruction that can be translated, it computes the length of the corresponding translated micro instruction sequence. tb_insn_len_max is set to the length of the longest sequence.
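A minimal sketch of this computation, together with the TBP size formula given for the tbp field, could look like this. insn_len_max and tbp_bytes are hypothetical names, and the per-instruction lengths are supplied as an array instead of being produced by the translator.

```c
#include <assert.h>
#include <stddef.h>

/* Longest translated sequence among the given per-instruction lengths. */
static size_t insn_len_max(const size_t *lens, size_t n)
{
    assert(lens != NULL || n == 0);
    size_t max = 0;
    for (size_t i = 0; i < n; i++)
        if (lens[i] > max)
            max = lens[i];
    return max;
}

/* Worst-case TBP bytes for one TB: TB_LEN / sizeof(ARMword) * TB_INSN_LEN_MAX. */
static size_t tbp_bytes(size_t tb_len, size_t armword_size, size_t len_max)
{
    assert(armword_size != 0);
    return tb_len / armword_size * len_max;
}
```

Sizing every TBP for the worst case wastes some memory, but it means a TB can never overflow its micro instruction storage.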

5. Initialization function arm2x86.c:arm2x86_init

This is the initialization function for DBCT. It is called by the function arminit.c:ARMul_Reset.
This function first calls the micro instruction initialization functions described above. It then calls the function tb_insn_len_max_init, and lastly, the function tb_memory_init.

6. Translation Process

6.1. armemu.c:ARMul_Emulate32_dbct

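This is the core function of DBCT translation and execution. Like ARMul_Emulate32, its counterpart in the normal instruction emulation mode, it is called from arminit.c:ARMul_DoProg and arminit.c:ARMul_DoInstr. The procedure is as follows:
Step 1, add one instruction length, INSN_SIZE, to the R15 register (the PC). Because of ARM's multi-stage pipeline, the PC is not transparent to applications, whereas the functions outside this one treat R15 as the current PC value, so R15 is adjusted before execution starts.
Step 2, set state->trap to 0.
Step 3, call the function tb.c:tb_find. Based on the PC value passed in, this function does all the work of allocating the TB and translating instructions, and finally returns the micro instruction address that corresponds to the PC. A return value of NULL means failure: we set state->trap to TRAP_INSN_ABORT (an instruction fetch exception) and jump to the code below that handles state->trap.
Step 4, save the registers that are used as variables in the micro instructions. The reason was explained above: the values of these registers are to be preserved by the called function, so they are saved here.
Call gen_func, the pointer to the micro instruction memory that was just obtained.
After it returns, restore the values of these registers.
Step 5, as mentioned in the discussion of micro instructions, exceptions and other special cases just set state->trap and return. This is where they are actually handled. This part of the code is fairly clear: it performs different processing depending on state->trap, so we will not describe it in detail.
Step 6, decide whether to continue executing or to return from the function. To continue, go back to Step 2.
Step 7, subtract INSN_SIZE from state->Reg[15] so that the PC again points to the address currently being executed, and return.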
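The seven steps can be sketched as a loop in portable C. This is an illustration only: the cpu struct, run_dbct, demo_block and demo_find are hypothetical, the numeric values of INSN_SIZE and the TRAP constants are assumptions, the register save and restore of Step 4 is omitted, and Step 5 simply stops on any trap instead of dispatching on its type.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define INSN_SIZE 4u                               /* assumed: one 32-bit ARM instruction */
enum { TRAP_NONE = 0, TRAP_INSN_ABORT = 1 };       /* assumed values */

typedef struct cpu cpu;
typedef void (*gen_func_t)(cpu *);
struct cpu {
    uint32_t reg15;                                /* emulated PC */
    int trap;
    int stop;                                      /* set when the loop should end */
    gen_func_t (*find)(cpu *, uint32_t pc);        /* stands in for tb_find */
};

static void run_dbct(cpu *st)
{
    assert(st != NULL && st->find != NULL);
    st->reg15 += INSN_SIZE;                        /* Step 1 */
    do {
        st->trap = TRAP_NONE;                      /* Step 2 */
        gen_func_t fn = st->find(st, st->reg15);   /* Step 3 */
        if (fn == NULL)
            st->trap = TRAP_INSN_ABORT;
        else
            fn(st);                                /* Step 4: run translated code */
        if (st->trap != TRAP_NONE)                 /* Step 5: handle the trap */
            st->stop = 1;
    } while (!st->stop);                           /* Step 6 */
    st->reg15 -= INSN_SIZE;                        /* Step 7 */
}

/* Demo translated block: pretends to execute one instruction, then asks to stop. */
static void demo_block(cpu *st)
{
    st->reg15 += INSN_SIZE;
    st->stop = 1;
}

/* Demo lookup: only addresses below 0x100 can be translated. */
static gen_func_t demo_find(cpu *st, uint32_t pc)
{
    (void)st;
    return pc < 0x100 ? demo_block : NULL;
}
```

The point of the structure is that the translated code never handles exceptions itself; it only records them in trap and returns to this loop.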
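
6.2. tb.c:tb_find

Based on the PC value passed in, this function does all the work of allocating the TB and translating instructions, and finally returns the micro instruction address that corresponds to the PC. The procedure is as follows:
Step 1, call the function armmmu.c:mmu_v2p_dbct, which uses SkyEye's MMU support to obtain addr, the emulated physical address that corresponds to the execution address ADDR. If this fails, the function returns with an error. Then use TB_ALIGN to obtain align_addr, the address aligned to TB_LEN. This is the address of the TB that contains addr.
Step 2, check whether align_addr equals the static local variable save_align_addr. If it does, the TB for this physical address was requested by the previous call, and all the pointers needed before translation are already stored in static local variables, so we skip the TB allocation code and go straight to the instruction translation code. Note that save_align_addr is initialized to 0x1 to make sure that it does not match any address.
Step 3, the TB allocation code starts here. First check whether tbt_table_size is 0 to determine whether tb_t is dynamically allocated.
Step 4, if it is dynamically allocated, the tb_t that corresponds to align_addr is taken from tbt_table by a hash computation.
Compare tbt->addr with align_addr. If they differ, this tb_t was previously the TB of another address, so we clear its old records, set tbt->ted to 0 and set tbt->addr to align_addr.
Then obtain tbt->tbp, i.e., the TBP. If tbt->tbp is NULL, the TBP of this TB has not been allocated or has been taken by another TB, and we call tb.c:tb_get_tbp to allocate one. If tbt->tbp is not NULL, the TBP has already been allocated, and, as described for tbt->list, we first remove the TB from the linked list tbp_dynamic_list.
Step 5, if it is not dynamically allocated, first call the function tb.c:tb_get_mbp to obtain mbp, the pointer to the mem_bank_t structure of the emulated memory that contains align_addr.
Check whether state->mem.tbt[bank_num] is NULL. If it is, no space has been allocated for tbt and tbp yet. If tbp_dynamic is 0 (TBP is statically allocated), we first allocate space for state->mem.tbp[bank_num]; then we allocate space for state->mem.tbt[bank_num].
Once the space is allocated, set up the TB structure.
After obtaining the TB structure, check whether tbt->tbp, i.e., the TBP, is NULL. If it is, set it according to tbp_dynamic: with dynamic allocation, use the function tb.c:tb_get_tbp as before; with static allocation, take it from state->mem.tbp[bank_num]. If it is not NULL, then as before, check tbp_dynamic and, if appropriate, remove the TB structure from the list.
Now both the TB structure and its TBP have been obtained.
Step 6, do some setup with the TB just obtained.
Set state->tb_now to the TB structure just obtained. This lets the micro instructions access the currently running TB at run time. For example, after the TB has been marked dirty, the micro instructions can detect this immediately and exit.
Set save_align_addr to align_addr, for the purpose described in Step 2.
If tbp_dynamic is true (TBP is dynamically allocated), append the TB structure to the end of the linked list tbp_dynamic_list, for the purpose described for tbt->list.
Step 7, the code that translates the emulated instructions starts here. First check the value of tbt->ted to determine whether this TB has been translated.
Step 8, handle the case where this TB has already been translated.
First check whether tbt->last_addr equals addr. If it does, return tbt->last_tbp. This was covered in the description of tbt->last_addr and tbt->last_tbp.
Then check whether the physical address to be translated, addr, is greater than or equal to tbt->tran_addr, which was also described above.
If addr is less than tbt->tran_addr, the micro instruction code already in the TB covers addr, and we simply take the TBP address that corresponds to addr from tbt->insn_addr and assign it to the return value ret.
If addr is greater than or equal to tbt->tran_addr, translation has to be continued. First obtain real_begin_addr, the pointer into the emulated memory block that corresponds to tbt->tran_addr, and real_addr, the pointer into the emulated memory block that corresponds to addr. Then call tb.c:tb_translate to translate the memory starting at real_begin_addr. Finally, assign the micro instruction address that corresponds to addr to ret.
Step 9, if this TB has not been translated yet, it has to be translated from scratch.
Again, first obtain real_begin_addr, the pointer into the emulated memory block that corresponds to tbt->tran_addr, and real_addr, the pointer into the emulated memory block that corresponds to addr. Then initialize tbt->tran_addr to align_addr and tbt->tbp_now to tbp; these two fields were described above. Call tb.c:tb_translate to translate the memory starting at real_begin_addr. Finally, assign the micro instruction address that corresponds to addr to ret, and set tbt->ted to 1 to indicate that this TB has been translated.
Now the return value ret, i.e., the micro instruction address that corresponds to ADDR, has been obtained.
Step 10, store addr and ret in tbt->last_addr and tbt->last_tbp, and return ret.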
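The address arithmetic and the lookup in an already translated TB can be sketched as follows. This is an illustration under assumptions: the value of TB_LEN is made up (the real one is in tb.h), tb_align is a hypothetical function standing in for TB_ALIGN and assumes TB_LEN is a power of two, and tb_view is a hypothetical subset of tb_s.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TB_LEN 1024u                     /* assumed value; must be a power of two here */
typedef uint32_t ARMword;

/* Hypothetical stand-in for TB_ALIGN: start address of the TB that contains addr. */
static ARMword tb_align(ARMword addr)
{
    return addr & ~(TB_LEN - 1u);
}

/* Hypothetical subset of tb_s used by the lookup. */
typedef struct {
    ARMword addr;                        /* aligned start address of this TB */
    ARMword tran_addr;                   /* next emulated address to translate */
    ARMword last_addr;                   /* address of the previous request */
    uint8_t *last_tbp;                   /* its micro instruction address */
    uint8_t *insn_addr[TB_LEN / sizeof(ARMword)];
} tb_view;

/* Returns the micro instruction address for addr, or NULL if translation
 * has to be continued first (addr >= tran_addr). */
static uint8_t *lookup_translated(tb_view *tb, ARMword addr)
{
    assert(tb_align(addr) == tb->addr);
    if (addr == tb->last_addr)
        return tb->last_tbp;             /* fast path: same address as last time */
    if (addr >= tb->tran_addr)
        return NULL;
    uint8_t *ret = tb->insn_addr[(addr - tb->addr) / sizeof(ARMword)];
    tb->last_addr = addr;                /* Step 10: remember this request */
    tb->last_tbp = ret;
    return ret;
}
```

Initializing last_addr to an odd value such as 0x1 plays the same role as the initial value of save_align_addr: it can never equal an instruction address.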
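
6.3. tb.c:tb_get_tbp

This function dynamically allocates a TBP. The procedure is as follows:
Step 1, check whether tbp_now_size is 0. As described above, tbp_now_size records the amount of TBP space that can still be allocated.
Step 2, if tbp_now_size is not 0, a TBP can still be allocated directly from tbp_now.
Step 3, if tbp_now_size is 0, the space in tbp_now has been used up. In this case we take the TBP of the first TB structure in tbp_dynamic_list, which is the least used TB structure in the whole list (see the description of tbt->list above). After taking the TBP, we remove that TB structure from the list, set its tbp to NULL and set its ted to 0.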
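A minimal sketch of this allocation scheme, assuming fixed-size TBP slots and a hand-written doubly linked list in place of struct list_head, might look like this. tb, tbp_pool, pool_init, pool_touch and pool_get_tbp are hypothetical names, not the ones in tb.c.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct tb {
    struct tb *prev, *next;     /* position in the in-use list, NULL if not linked */
    uint8_t *tbp;
    int ted;
} tb;

typedef struct {
    tb head;                    /* sentinel: head.next is the least recently used TB */
    uint8_t *tbp_now;           /* next never-used TBP slot */
    size_t tbp_now_size;        /* bytes that have never been handed out */
    size_t slot;                /* bytes per TBP */
} tbp_pool;

static void pool_init(tbp_pool *p, uint8_t *mem, size_t size, size_t slot)
{
    assert(slot != 0 && size % slot == 0);
    p->head.prev = p->head.next = &p->head;
    p->tbp_now = mem;
    p->tbp_now_size = size;
    p->slot = slot;
}

static void list_unlink(tb *t)
{
    t->prev->next = t->next;
    t->next->prev = t->prev;
    t->prev = t->next = NULL;
}

/* Mark t as most recently used by moving it to the tail of the list. */
static void pool_touch(tbp_pool *p, tb *t)
{
    if (t->next != NULL)
        list_unlink(t);
    t->prev = p->head.prev;
    t->next = &p->head;
    p->head.prev->next = t;
    p->head.prev = t;
}

/* Hand out a TBP: fresh space first, otherwise steal from the list head. */
static uint8_t *pool_get_tbp(tbp_pool *p)
{
    if (p->tbp_now_size != 0) {
        uint8_t *ret = p->tbp_now;
        p->tbp_now += p->slot;
        p->tbp_now_size -= p->slot;
        return ret;
    }
    tb *victim = p->head.next;
    assert(victim != &p->head);
    uint8_t *ret = victim->tbp;
    list_unlink(victim);
    victim->tbp = NULL;         /* the victim must be re-translated later */
    victim->ted = 0;
    return ret;
}
```

Because every use moves a TB to the tail, the head is always the TB that has gone unused the longest, so stealing its TBP is the cheapest choice.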
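
6.4. tb.c:tb_translate

This function translates instructions from the memory that starts at the parameter tb_begin_addr, and finally returns the micro instruction address that corresponds to addr. The procedure is as follows:
Step 1, compute tb_end_addr, the end address of the emulated memory block that corresponds to this TB, from tb_begin_addr.
Step 2, initialize the linked list tb_branch_save_list. This list records every branch within the TB. During translation the addresses of later instructions are not yet known, so the branch length cannot be computed. Instead, the location to be written, the branch target address and related information are recorded here. After translation finishes, we loop over the list and set each branch length.
Set the global variable now_tbt to tbt, so that every translation function can conveniently access the current TB structure.
Set tbt->ret_addr to 0, for the purpose given in the description of ret_addr above.
Step 3, the translation loop starts. Each iteration checks whether tb_begin_addr is less than tb_end_addr; if it is, we translate, otherwise the loop ends. At the end of each instruction's translation, tb_begin_addr is advanced by the size of an ARMword to the next instruction. The translation of one instruction proceeds as follows.
Check whether tb_begin_addr equals addr. If it does, the instruction about to be translated is the one that addr refers to, and we set the return value ret to tbt->tbp_now.
Set tbt->insn_addr, for the purpose given in the description of tbt->insn_addr above.
Call the function tb.c:translate_word with the instruction to be translated, *tb_begin_addr, the current micro instruction write pointer, tbt->tbp_now, and other parameters. This is the function that translates a single instruction; it returns len, the length of the micro instructions written.
Add len to tbt->tbp_now to skip the micro instruction storage that has been used.
Add 4 to tbt->tran_addr so that it corresponds to the next instruction.
Finally, as described earlier, translation is interrupted if the instruction is certain to return. A word about state->trap first: as mentioned above, during micro instruction execution it carries the exception type; during instruction translation it marks that the instruction just translated is certain to return. Here we can see that, along with state->trap, the code also checks ret and whether tbt->tran_addr is greater than tbt->ret_addr. The purpose of these checks was described above. If translation can stop, we break out of the loop.
Step 4, instruction translation is now finished. If the TB has been translated completely, i.e., state->trap is 0, we append the op_return micro instruction so that execution returns when it reaches the end of this TB.
Step 5, take each branch record out of the tb_branch_save_list described above, one by one, and set its branch length.
Finally, return ret, the micro instruction address that corresponds to addr.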
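The deferred branch handling of Steps 2 and 5 can be sketched with a fixed-size array in place of the linked list. This is only an illustration: fixup, fixup_list, fixup_add and fixup_resolve are hypothetical names, the target is identified by its instruction index within the TB, and the branch length is assumed to be a 4-byte value relative to the byte that follows it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_FIXUPS 32

typedef struct {
    uint8_t *patch_at;      /* where the 4-byte branch length will be written */
    size_t target_index;    /* index of the target emulated instruction in the TB */
} fixup;

typedef struct {
    fixup items[MAX_FIXUPS];
    size_t count;
} fixup_list;

/* During translation: remember a branch whose target address is not known yet. */
static void fixup_add(fixup_list *l, uint8_t *patch_at, size_t target_index)
{
    assert(l->count < MAX_FIXUPS);
    l->items[l->count].patch_at = patch_at;
    l->items[l->count].target_index = target_index;
    l->count++;
}

/* After translation: insn_addr[i] is the micro instruction address of instruction i. */
static void fixup_resolve(fixup_list *l, uint8_t *const *insn_addr)
{
    for (size_t i = 0; i < l->count; i++) {
        uint8_t *end = l->items[i].patch_at + 4;
        uint32_t len = (uint32_t)(insn_addr[l->items[i].target_index] - end);
        memcpy(l->items[i].patch_at, &len, sizeof len);
    }
    l->count = 0;
}
```

Resolving all branches in one pass after the loop is what makes forward branches possible: by then every target inside the translated range has an entry in insn_addr.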
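
6.5. tb.c:translate_word

This function translates one emulated instruction, the parameter insn, into micro instructions, stores them at the parameter tbp, and returns the length of the micro instructions written.
The function is rather large and consists mainly of instruction translation work, so we will not describe it in detail.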




Edited on 2006-09-01 16:05:51 by TomeiNingen

Edited on 2006-08-12 21:00:31 by TomeiNingen [More translations]

Deletions:
介绍tran_addr的时候,已经提到了只翻译到必定发生返回的指令就不再继续翻译,但是这里还有一种情况需要考虑到,就是DBCT在翻译当前 TB范围内的被模拟指令跳转的时候,都是将这个跳转指令翻译为微指令跳转指令,而不是通常的设置PC寄存器然后返回的方式,这样的跳转如果是向后跳转,并且超过了翻译结束的地址肯定是不行的。
如何防止提前翻译结束?在翻译开始的时候设置ret_addr为0,一旦有TB内跳转出现,并且这个地址的值比ret_addr大,就设置 ret_addr为这个地址。在翻译完一条指令并确定翻译也许可以结束的时候,对ret_addr进行检查,只有在ret_addr小于下一条将翻译的指令地址的时候,翻译才结束。
4.3.TB_TBT_SIZE和TB_TBP_SIZE
在tb.c中有TB_TBT_SIZE和TB_TBP_SIZE,TB的初始化就要根据其的值来进行,他们的定义为:
TB_TBT_SIZE是DBCT中所有TB的条目也就是tb_t结构所占空间。如果设置为0则就在使用的时候分配在被模拟内存结构 armmem.h:mem_state_t->tbt上,也就是只要运行某块内存,就分配其的tb_t结构。如果设置为非0就使用tb.c: tbt_table和tb.c:tbt_table_size的内存进行动态分配。
TB_TBP_SIZE是TB中实际储存微指令的内存所占空间。如果设置为0则就在使用的时候分配在被模拟内存结构armmem.h: mem_state_t->tbp上,也就是只要运行某块内存,就分配其的tbp。如果设置为非0会同时标记tbp_dynamic为1表明是 TBP动态分配,并且TBP所用的内存使用tb.c:tbp_begin、tb.c:tbp_now和tbp_now_size进行动态分配。
skyeye_config.tb_tbt_size和skyeye_config.tb_tbp_size是从配置文件中读出的值,他们的初始化也就是设置默认值在skyeye_options.c:skyeye_option_init函数中。在这里我们可以看到config-> tb_tbt_size也就是TB_TBT_SIZE初始化为0,因为tb_t结构占用空间不大;config->tb_tbp_size也就是 TB_TBP_SIZE初始化为TB_TBP_DEFAULT(1024 * 1024 * 64),这里没有也初始化因为每条被模拟的ARM指令都包含若干条微指令,这样跟一个TB存储的指令对应的微指令存储空间会比较大,甚至有超过32位寻址空间大小的情况,所以这里一般不设置为0。
注意TB_TBT_SIZE和TB_TBP_SIZE并不是从配置文件中读出后直接使用,而是在tb_memory_init进行过初始化后才使用,和其相关的几个变量也是在tb_memory_init中进行的初始化。
4.4.tb.c:tb_memory_init
这个函数用来对DBCT中的TB进行初始化。下面介绍执行过程:




Edited on 2006-08-10 23:38:54 by TomeiNingen [More translation.]

Additions:
The following is a description of each field:
struct list_head list;
When we use the second method (sic?) to use the TBs, all the TBs that are in use are put into the linked list tb.c:tbp_dynamic_list. Each time a TB is used, it is removed from the list and appended at the end. When we need to execute code at an untranslated address, and there are no free TBs left, we need to pick a TB that's currently in use -- we always pick the first TB in the list. The advantage is that the head of the list is always the translated TB that has gone unused the longest (the least recently used one), so choosing it will have the smallest impact on performance. There are more details below when we discuss the translation process.
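The move-to-tail, evict-from-head policy can be sketched with a small circular doubly linked list. This is only an illustration under assumptions: the list helpers mimic the usual list_head idiom but are written out here, and tb_sketch, tb_touch and tb_pick_victim are hypothetical names, not functions from tb.c.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal circular doubly linked list, in the spirit of list_head. */
struct list_node {
    struct list_node *prev, *next;
};

static void list_init(struct list_node *head)
{
    head->prev = head->next = head;
}

/* Unlink n and leave it self-linked, so unlinking twice is harmless. */
static void list_del(struct list_node *n)
{
    n->prev->next = n->next;
    n->next->prev = n->prev;
    n->prev = n->next = n;
}

static void list_add_tail(struct list_node *n, struct list_node *head)
{
    n->prev = head->prev;
    n->next = head;
    head->prev->next = n;
    head->prev = n;
}

/* Hypothetical TB with only the fields this sketch needs. */
struct tb_sketch {
    struct list_node list;  /* kept as first member so the cast below is valid */
    int id;
};

/* Mark a TB as just used: unlink it and append it at the tail. */
static void tb_touch(struct tb_sketch *tb, struct list_node *head)
{
    assert(tb != NULL && head != NULL);
    list_del(&tb->list);
    list_add_tail(&tb->list, head);
}

/* Pick the TB to recycle: the head of the list, or NULL if it is empty. */
static struct tb_sketch *tb_pick_victim(struct list_node *head)
{
    if (head->next == head)
        return NULL;
    return (struct tb_sketch *)head->next;
}
```

Because every use moves a TB to the tail, the list stays ordered from least to most recently used without any timestamps or counters.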
int ted;
A value 0 for this field means this TB does not contain translated data, 1 means otherwise. When we need to mark a TB as dirty, we set ted to 0.
uint8_t *insn_addr[TB_LEN / sizeof(uint8_t *)];
During translation, the starting micro instruction address of each translated instruction is stored in this array. When we execute a translated TB, we can obtain the corresponding micro instruction start address and start executing from there.
Originally, the DBCT did not store the addresses of the micro instructions. Rather, during execution, it reran the translation to obtain the addresses, perhaps saving the obtained addresses (translator: temporarily?). Later, I realized that even if I store all the addresses, it won't take too much space, so that's what we do now.
uint8_t *tbp;
This field points to the memory where the TB stores the micro instructions. Obviously, the size of the memory block is TB_LEN / sizeof (ARMword) * TB_INSN_LEN_MAX.
ARMword addr;
This field is the address of the emulated instructions that correspond to this TB.
ARMword tran_addr;
During translation, we do not translate the emulated instructions covered by a TB into micro instructions all at once. Instead, when we reach an unconditional return instruction (translator: backward branch instr??), and the actual target micro instruction address is already available (note that translation begins at the starting address of the TB), we stop translating. Later, when a requested address is higher than the address we have translated thus far, we will resume translation of this TB. The tran_addr field stores the address that immediately follows the highest address that has been translated, i.e., the next address that would be translated.
uint8_t *tbp_now;
Points to the address where we can write the next micro instruction into. When tbt->ted is 0 (i.e, we just started translating this TB at tbt->addr) this field is initialized to be tbt->tbp, and is incremented each time a micro instruction is added. Also, when the translation is resumed, as described at tran_addr above, you can continue to use this field.
ARMword last_addr;
uint8_t *last_tbp;
These two fields are used during translation. last_addr stores the (emulated) address that was used when the TB was invoked the last time, and last_tbp stores the address of the corresponding micro instruction. This way, when the TB is invoked the next time with the same address as last_addr, we can quickly determine the micro instruction address using last_tbp. This improves performance.
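How insn_addr, last_addr and last_tbp could work together is sketched below. This is an assumption-laden illustration, not the lookup in tb.c: the TB size of 1024 bytes, the type tb_sketch and the function tb_lookup are all hypothetical, and the array is sized with sizeof(uint32_t) so that it has one slot per ARM instruction on any host.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_TB_LEN 1024u  /* assumed TB size in bytes, for illustration only */

struct tb_sketch {
    uint32_t addr;       /* emulated start address covered by this TB */
    uint8_t *insn_addr[SKETCH_TB_LEN / sizeof(uint32_t)]; /* one slot per instruction */
    uint32_t last_addr;  /* emulated address of the previous lookup */
    uint8_t *last_tbp;   /* micro instruction address found for last_addr */
};

/* Return the micro instruction address for the emulated address addr,
 * or NULL if that instruction has not been translated yet. */
static uint8_t *tb_lookup(struct tb_sketch *tb, uint32_t addr)
{
    uint8_t *p;

    assert(addr >= tb->addr && addr - tb->addr < SKETCH_TB_LEN);
    if (tb->last_tbp != NULL && addr == tb->last_addr)
        return tb->last_tbp;  /* fast path: same address as last time */

    p = tb->insn_addr[(addr - tb->addr) / sizeof(uint32_t)];
    if (p != NULL) {          /* remember the hit for the next lookup */
        tb->last_addr = addr;
        tb->last_tbp = p;
    }
    return p;
}
```

A NULL result is the case in which translation of the TB would have to be started or resumed.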
ARMword ret_addr;

6. Translation Process

6.1. armemu.c:ARMul_Emulate32_dbct



Deletions:
下面对其的每个成员变量进行介绍: struct list_head list;
当使用第二种方法使用TB的时候,这个list将所有使用过的TB全部用tb.c:tbp_dynamic_list以及这个list结构组成的链表连接起来,在每次使用某个TB的时候,都将其先从链表中删除,然后连接到链表的最后。当对某地址进行执行,而所有TB都不是这个地址相关的,并且没有未使用过的TB,需要从现有TB中选择一个TB使用的时候,就会使用链表第一个TB。这样做的好处是,最不常用的已翻译过的TB肯定是在链表第一个,选择其影响会最小,提高了效率。在后面介绍翻译执行的时候会进行更具体的介绍。
int ted;
这个变量为0表明这个TB中数据是没有翻译过的,为1表明其中数据是翻译过的。在需要标记某个TB为脏的时候,就可以通过设置ted为0来实现。
uint8_t *insn_addr[TB_LEN / sizeof(uint8_t *)];
在翻译过程中,将把每条指令对应的微指令地址都存储到这个数组中,这样在执行已经翻译过的TB的时候,可以直接取得对应地址的微指令地址开始执行。
原来的DBCT中也采用过不存储微指令地址,在执行的时候再重新翻译取得地址的方法,最多是将取得后的地址存储起来,后来考虑即使将所有地址都存起来也不会使用很多内存,所以就用了所有翻译过的地址都存起来的方法。
uint8_t *tbp;
这个成员变量指向当前TB存储微指令的内存,显然这里指向内存块的大小为TB_LEN / sizeof (ARMword) * TB_INSN_LEN_MAX。
ARMword addr;
这个成员变量是当前TB对应的被模拟指令的地址。
ARMword tran_addr;
在TB翻译过程中,并不是一次将整个TB范围内的被模拟指令都翻译成微指令,而是每次翻译到一个必定发生返回的指令,并且实际要取得的微指令地址 (注意翻译是从TB开始的地址开始的 )也已经取得,就不再继续进行翻译,等下次请求一个地址比翻译到的地址大的时候,就继续对TB进行翻译。
这个成员变量tran_addr记录的就是翻译到的指令地址的下一个指令的地址,也就是如果继续翻译的地址。
uint8_t *tbp_now;
在成员变量指向当前可以写入微指令的地址,在tbt->ted为0也就是这个TB从tbt->addr开始翻译的时候初始化为tbt->tbp,每增加一个微指令都顺序增加,并且在向上面tran_addr提到的那种继续翻译的时候,可以继续使用。
ARMword last_addr;
uint8_t *last_tbp;
这2个成员在指令翻译的时候使用。last_addr存储这个TB上次被使用时候的地址,而last_tbp就是对应的微指令地址。这样如果下次还使用这个TB的这个地址last_addr,就可以快速取得微指令地址last_tbp,提高执行速度。
ARMword ret_addr;
6.翻译执行过程
6.1.armemu.c:ARMul_Emulate32_dbct




Edited on 2006-08-10 01:03:55 by TomeiNingen [More translations before nap time.]

Additions:
Here we initialize the micro instructions used to assign values to the emulated registers or the register variables.
The handling of immediate values is similar to the branch micro instructions described above. We first use a statement like “T2 = ULONG_MAX;”. This way the last 32 bits of the generated code will be the immediate value ULONG_MAX, which will be replaced with the actual immediate value during translation.
3.6.7. arm2x86_mul.c:arm2x86_mul_init
Here we initialize the micro instructions for emulating multiplication.
3.6.8. arm2x86_mem.c:arm2x86_mem_init
Here we initialize the micro instructions for emulating memory access operations.
To access memory, we call SkyEye's memory access functions and use the returned values for various operations. We do it this way instead of directly accessing memory. This is because if the MMU is emulated, we need to use the TLB and page table for address translation. Also, an address could be memory mapped IO. Therefore, it's much simpler to just call the memory access functions.
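The reason for going through a function can be illustrated with a small dispatcher. Everything in this sketch is hypothetical (the address map, the names mem_read_word and io_read_word, the counter device); it is not SkyEye's memory interface, and MMU address translation is left out entirely.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define RAM_SIZE 0x1000u
#define IO_BASE  0x10000000u  /* hypothetical memory mapped device */

static uint8_t ram[RAM_SIZE];
static uint32_t io_reads;     /* device state that changes on every read */

/* Hypothetical device read: reading has a side effect, which a plain
 * host pointer dereference could not model. */
static uint32_t io_read_word(uint32_t addr)
{
    (void)addr;
    return ++io_reads;
}

/* Single entry point for loads. A micro instruction would call a function
 * like this instead of dereferencing a host pointer. */
static uint32_t mem_read_word(uint32_t addr)
{
    uint32_t v = 0;

    if (addr >= IO_BASE)
        return io_read_word(addr);
    if (addr <= RAM_SIZE - sizeof v)
        memcpy(&v, &ram[addr], sizeof v);
    return v;  /* unmapped addresses read as 0 in this sketch */
}
```

The translated code stays the same whether an address turns out to be RAM or a device; only the function behind the call decides.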
3.6.9. arm2x86_dp.c:arm2x86_dp_init
Here we initialize the micro instructions for emulating the ARM DP instructions.
3.6.10. arm2x86_coproc.c:arm2x86_coproc_init
Here we initialize the micro instructions for emulating the ARM co-processor instructions.
3.6.11. arm2x86_other.c:arm2x86_other_init
Here we initialize the other micro instructions.

4. Translation Block (TB)

4.1. Overview

DBCT translates emulated instructions of length tb.h:TB_LEN into a series of micro instructions (whose maximum length is tb.h:TB_INSN_LEN_MAX, which is obtained in tb.c:tb_insn_len_max_init). These micro instructions (stored in memory called TBP), together with other information such as addresses, are packaged into a TB. During DBCT initialization, we initialize the TB according to the configuration files and the current status. We will see more details in the description of tb_memory_init.
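As a rough worked example of the sizes involved: the description of the tbp field gives the worst-case micro instruction storage of one TB as TB_LEN / sizeof (ARMword) * TB_INSN_LEN_MAX. The sketch below only does that arithmetic. The function names are hypothetical, the values 1024 and 256 used in the usage note are made up (the real TB_INSN_LEN_MAX is computed at start-up), and whether tb_memory_init divides the pool exactly this way is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Worst-case bytes of micro instruction storage (TBP) for one TB.
 * tb_len:       bytes of emulated code covered by one TB (TB_LEN)
 * insn_len_max: longest translation of one instruction (TB_INSN_LEN_MAX) */
static size_t tbp_bytes_per_tb(size_t tb_len, size_t insn_len_max)
{
    return tb_len / sizeof(uint32_t) * insn_len_max;
}

/* Number of worst-case TBs that fit into a pool of pool_bytes. */
static size_t tbs_in_pool(size_t pool_bytes, size_t tb_len, size_t insn_len_max)
{
    size_t per_tb = tbp_bytes_per_tb(tb_len, insn_len_max);

    return per_tb ? pool_bytes / per_tb : 0;
}
```

With the made-up values tb_len = 1024 and insn_len_max = 256, one TB needs up to 65536 bytes, so a 64 MB pool (the default mentioned for TB_TBP_SIZE) would hold 1024 such TBs.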

4.2. tb.h:struct tb_s

This is the core structure in a TB. Each TB has a corresponding tb_s struct.


Deletions:
这里初始化是用来对某模拟寄存器或者某寄存器变量等赋值的微指令。
其中的立即数赋值也比较类似前面的微指令跳转,先用类似“T2 = ULONG_MAX;”的语句,这样其最后一个值就是32位长度的立即数ULONG_MAX,然后在实际翻译的过程中替换成实际翻译的立即数就可以了。
3.6.7.arm2x86_mul.c:arm2x86_mul_init
这里初始化是用来对乘法指令进行模拟的微指令。
3.6.8.arm2x86_mem.c:arm2x86_mem_init
这里初始化是用来对内存操作进行模拟的微指令。
这里对内存的操作采用的办法是调用SKYEYE原有的内存操作函数,直接取得返回值,然后进行各种操作。这里这么作而不是直接访问相应的内存地址因为有MMU的时候,需要先通过MMU中的TLB和页表等转换地址,而且还有地址是IO地址,所以直接调用内存操作函数是比较简单的实现方法。
3.6.9.arm2x86_dp.c:arm2x86_dp_init
这里初始化的是对ARM中DP指令进行模拟的微指令。
3.6.10.arm2x86_coproc.c:arm2x86_coproc_init
这里初始化的是对ARM中协处理器指令进行模拟的微指令。
3.6.11.arm2x86_other.c:arm2x86_other_init
这里对所有其他微指令进行初始化。
4.翻译块(TB)
4.1.概述
在DBCT中,将tb.h:TB_LEN长度的被模拟指令翻译成一系列微指令(这一系列微指令的长度最大值为tb.h: TB_INSN_LEN_MAX,由tb.c:tb_insn_len_max_init取得),这一系列微指令(存储这些微指令的内存称为TBP)以及地址信息等其他信息封装在一起称为一个TB。在DBCT初始化的时候,会根据配置文件以及实际情况对TB进行初始化,介绍tb_memory_init的时候会详细进行介绍。
4.2.tb.h:struct tb_s
这个结构是TB的核心结构,每个TB块都将对应一个tb_s结构。




Edited on 2006-08-09 18:51:23 by TomeiNingen [More translations.]

Additions:

3.1. Structure and Initialization of Micro Instructions.

In DBCT, when an exception happens, we first set st->trap or state->trap to the exception type (see TRAP_XXX in arm2x86.h). Then, we use the X86 ret instruction to return to the non-DBCT emulation mode, which performs the actual exception processing. This keeps the micro instructions simple. For the same reason, several operations that are not exceptions, such as TRAP_SETS_R15, TRAP_SET_CPSR and TRAP_SET_R15, are handled the same way.
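The "record the trap, return, handle it outside" pattern looks roughly like this in plain C. This is a sketch under assumptions: the enum values, state_sketch and both functions are hypothetical stand-ins, and an ordinary C function plays the role of the translated code that would really end with an X86 ret.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { TRAP_SKETCH_NONE = 0, TRAP_SKETCH_UNDEF = 1 };  /* hypothetical values */

struct state_sketch {
    uint32_t trap;     /* pending trap type, TRAP_SKETCH_NONE if none */
    uint32_t handled;  /* number of traps processed outside translated code */
};

/* Stand-in for translated code: on an exceptional condition it only
 * records the trap type and returns. */
static void run_translated(struct state_sketch *st, int hit_undefined)
{
    assert(st != NULL);
    if (hit_undefined) {
        st->trap = TRAP_SKETCH_UNDEF;
        return;
    }
    /* ... normal micro instructions would run here ... */
}

/* Stand-in for the non-DBCT side: after translated code returns,
 * look at trap and do the real work there. */
static void after_translated(struct state_sketch *st)
{
    assert(st != NULL);
    if (st->trap != TRAP_SKETCH_NONE) {
        st->handled++;
        st->trap = TRAP_SKETCH_NONE;
    }
}
```

The translated code never needs to know how a trap is processed; it only needs one store and a return.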

3.5. Branches inside Micro Instructions

Here "branch" doesn't mean the emulated branch instructions. Rather, inside the micro instructions, we have branches whose lengths may need to be adjusted. For example, in an emulated instruction that has a condition check, you will need to skip the compiled micro instruction code if the specified condition does not match the PSR. Here, we need to know the length of the branch. In DBCT, we first write a line like the following: __asm__ __volatile__ ("jmp 0xffffffff");. Usually, the last 4 bytes contain the branch length. We just have to fill in the length at translation time. (There's more detail below when we describe the translation process).
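Filling in the branch length can be sketched as below. The sketch assumes the placeholder is an X86 near jump with a 32-bit displacement (opcode 0xE9 followed by four bytes), whose displacement is counted from the end of the jump instruction. The function name patch_jmp_rel32 and the buffer layout are hypothetical, not taken from the DBCT source.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Patch the 32-bit displacement of a jump whose encoding ends at offset
 * jmp_end in buf, so that the jump lands on offset target.
 * The displacement occupies the last 4 bytes of the jump instruction. */
static void patch_jmp_rel32(uint8_t *buf, size_t jmp_end, size_t target)
{
    int32_t rel = (int32_t)((int64_t)target - (int64_t)jmp_end);

    assert(jmp_end >= sizeof rel);
    memcpy(buf + jmp_end - sizeof rel, &rel, sizeof rel);
}
```

For example, if the jump occupies bytes 0 to 4 of the buffer and the micro instructions to be skipped occupy bytes 5 to 14, calling patch_jmp_rel32(buf, 5, 15) stores a displacement of 10, which is exactly the length of the skipped code.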

3.6. Categories of Micro Instructions

3.6.1. Overview
Due to design issues, the categories of the micro instructions are not well organized. Here we categorize them according to their initialization functions.
3.6.2. arm2x86.c:op_init
The most important micro instructions we initialize here are op_begin and op_begin_test_T0. These two micro instructions are placed in front of each translated ARM instruction.
op_begin is used when the ARM instruction's condition is AL or NV (i.e., no condition check is needed). Here, we first call the function arm2x86.c:tea_begin, which calls arm2x86.c:tea_check_out. Like the start of instruction execution in the normal emulation mode, tea_check_out checks whether we need to return for single stepping, checks whether a hardware interrupt has occurred that requires exception processing, and checks whether the current TB is dirty. (About dirty: if the micro instructions in the current TB modify memory associated with the current TB, the TB is marked dirty; returning to normal emulation mode then causes this TB to be re-translated automatically.) As the last step, armio.c:io_do_cycle is called to run all virtual devices. After the tea_begin function returns, we check its return value. If it's TRUE, we return.
The micro instruction op_begin_test_T0 is used if the translated ARM instruction needs a condition check. Before this micro instruction is executed, the condition of the translated instruction is already stored in T0. Here we use st and T0 as parameters to invoke arm2x86.c:tea_begin_test, which also uses tea_check_out to check if there are any exceptions which require a return from DBCT mode to the normal execution mode. If there are none, it calls arm2x86_psr.h:gen_op_condition to check if the current translated instruction needs to be executed. After tea_begin_test returns, we first check if we need to return to normal execution mode due to exceptions. Then we check if the instruction is to be skipped due to the condition, in which case we execute a jmp instruction (the branch inside micro instructions described above).
There are a few other rather simple micro instructions so we'll not describe them here.
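The condition test that op_begin_test_T0 depends on can be sketched as a plain C function over the ARM condition field and the N, Z, C and V flags in bits 31 to 28 of the CPSR. This is not gen_op_condition itself; the name cond_passed is hypothetical. Following the text above, AL and NV are both treated as "no check needed".

```c
#include <assert.h>
#include <stdint.h>

/* Return 1 if an instruction with the 4-bit condition field cond should
 * execute given the flags in cpsr, 0 if it should be skipped. */
static int cond_passed(uint32_t cond, uint32_t cpsr)
{
    int n = (cpsr >> 31) & 1;
    int z = (cpsr >> 30) & 1;
    int c = (cpsr >> 29) & 1;
    int v = (cpsr >> 28) & 1;

    switch (cond & 0xf) {
    case 0x0: return z;              /* EQ */
    case 0x1: return !z;             /* NE */
    case 0x2: return c;              /* CS */
    case 0x3: return !c;             /* CC */
    case 0x4: return n;              /* MI */
    case 0x5: return !n;             /* PL */
    case 0x6: return v;              /* VS */
    case 0x7: return !v;             /* VC */
    case 0x8: return c && !z;        /* HI */
    case 0x9: return !c || z;        /* LS */
    case 0xa: return n == v;         /* GE */
    case 0xb: return n != v;         /* LT */
    case 0xc: return !z && n == v;   /* GT */
    case 0xd: return z || n != v;    /* LE */
    default:  return 1;              /* AL and NV: no check, as in the text */
    }
}
```

When this kind of test fails for a translated instruction, the patched jmp skips the rest of that instruction's micro instructions.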
3.6.3. arm2x86_test.c:arm2x86_test_init
Here we initialize a few simple micro instructions that first make a condition check and then perform some operations.
3.6.4. arm2x86_shift.c:arm2x86_shift_init
Here we initialize various shift micro instructions.
If the shift count is variable, it's rather simple. The preceding micro instructions will have stored the shift count in a register variable, which can be used directly in this micro instruction. If the shift count is an immediate value, we handle it similarly to the branches inside micro instructions described above. We first use a line similar to “T1 = T1 << 31;”, so that the generated code ends with an 8-bit shift count. During translation, we replace it with the actual immediate shift count.
3.6.5. arm2x86_psr.c:arm2x86_psr_init
Here we initialize micro instructions related to the ARM status registers (CPSR and SPSR).
3.6.6. arm2x86_movl.c:arm2x86_movl_init


Deletions:
3.1. Structure and Initialization of Micro Instructions.
DBCT的做法是当有异常处理的时候,设置st->trap或者state->trap为异常的类型(定义在arm2x86.h,TRAP_XXX的都是),然后根据情况调用X86汇编指令ret返回到正常模式,然后在非DBCT运行模式中进行实际的处理。
因为这种处理减少了微指令中实现的难度,所以类似TRAP_SETS_R15、TRAP_SET_CPSR以及TRAP_SET_R15等几个非异常处理也采用了同一种处理方式。
3.5.微指令中的跳转
这里的跳转不是指模拟的被模拟指令的跳转,而是指微指令根据需要跳转指定的长度。比如某条被模拟指令有condition判断,其中就会需要这种跳转,当condition和PSR中的值不符合的时候,就需要跳过当前指令被翻译成的微指令代码,这个时候就需要跳转过指令的长度。
在DBCT中的做法是先写一条类似__asm__ __volatile__ ("jmp 0xffffffff");的指令,一般来说后4个字节就是跳转长度,在翻译指令的时候将要跳转的长度写入就可以。
在后面介绍指令翻译过程的时候,还会对微指令跳转的使用再作详细介绍。
3.6.微指令分类介绍
3.6.1.概述
因为规划设计的问题,微指令的分类有点乱,所以就以初始化函数为分类基础进行分类介绍。
3.6.2.arm2x86.c:op_init
这里初始化的最重要的2个微指令是op_begin和op_begin_test_T0,这2个指令都是在翻译ARM指令的时候放在每条指令最开始的部分的。
op_begin是被翻译的ARM指令的condition是AL或者NV也就是不进行条件判断时候使用的微指令。这里先调用了函数 arm2x86.c:tea_begin,而这个函数调用了arm2x86.c:tea_check_out,这个函数类似普通指令执行模式中指令开始执行时候作的操作一样,检查是否需要单步返回,检查是否有硬件中断发生进行异常处理,检查当前TB是否已经标记为脏(如果当前TB中执行的微指令写了当前 TB相关的内存,就会有这样的情况发生,这时返回到普通模式重新执行,就会自动对这个TB进行重新翻译),最后一步执行armio.c: io_do_cycle调用全部虚拟设备执行。tea_begin函数返回以后会判断返回值,如果为真就返回。
op_begin_test_T0是被翻译的ARM指令需要条件判断时候使用的微指令。在这条指令运行以前,被翻译指令的condition已经被存到T0中,这里首先以st和T0为参数调用arm2x86.c:tea_begin_test,这个函数也是调用tea_check_out检查是否有异常等需要从DBCT模式返回普通执行模式,如果没有则调用arm2x86_psr.h:gen_op_condition判断当前被翻译指令是否可以执行。tea_begin_test返回后,先判断是否有异常需要返回普通执行模式;然后判断是否这条指令是否因为condition而不被执行,如果是则就执行一条jmp指令,也就是前面介绍过的微指令跳转。
其他几个微指令比较简单不作详细介绍。
3.6.3.arm2x86_test.c:arm2x86_test_init
这里初始化了几个简单的测试情况然后进行一些处理的微指令。
3.6.4.arm2x86_shift.c:arm2x86_shift_init
这里初始化的是各种移位微指令。
这里移位的长度如果是变量的处理起来比较简单,前面在别的微指令中存到某个寄存器变量,然后在这条微指令中直接操作就可以。
如果移位的长度是立即数,则处理办法有点类似前面的微指令跳转。先用一条类似“T1 = T1 << 31;”的语句,这样其最后一个值是一个8位的移位长度,然后在实际翻译的过程中替换成实际翻译的立即数就可以了。
3.6.5.arm2x86_psr.c:arm2x86_psr_init
这里初始化的是跟ARM的状态寄存器PSR(包括CPSR和SPSR)相关的微指令。
3.6.6.arm2x86_movl.c:arm2x86_movl_init




Edited on 2006-08-09 14:27:21 by TomeiNingen [More translation.]

Additions:
An introduction to the internals of skyeye can be found at skyeye study notes (Chinese).
OP_END starts with a pseudo instruction that declares a symbol, which marks the end of the micro instruction; it's also the target of the branch instruction in OP_BEGIN. This is followed by two assignment instructions that assign the begin and end addresses of the micro instruction to the variables begin and end, which are declared at the beginning of the initialization function. This is how the initialization function obtains the begin and end addresses of the micro instruction.
From the description of these two macros we can understand the process of generating the micro instructions -- we can calculate the length of the micro instruction from its begin and end addresses and return it. QEMU's process of generating micro instructions is different from DBCT's: QEMU puts each micro instruction in its own function. After compilation, it uses a special program to extract the start address and length of each micro instruction.
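What the begin and end addresses are used for can be sketched as follows. The type op_table_sketch_t is a reduced stand-in for op_table_t with just the op and len fields mentioned earlier; make_op and emit_op are hypothetical helper names, and the real code obtains begin and end through the OP_BEGIN and OP_END macros rather than from ordinary pointers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Reduced stand-in for op_table_t: op points at the micro instruction's
 * machine code and len is its length in bytes. */
typedef struct {
    const uint8_t *op;
    int len;
} op_table_sketch_t;

/* Build a table entry from the begin and end addresses that an
 * initialization function obtained. */
static op_table_sketch_t make_op(const uint8_t *begin, const uint8_t *end)
{
    op_table_sketch_t t;

    assert(end >= begin);
    t.op = begin;
    t.len = (int)(end - begin);
    return t;
}

/* Append one micro instruction to a translation buffer and return the
 * new write position. */
static uint8_t *emit_op(uint8_t *dst, const op_table_sketch_t *t)
{
    memcpy(dst, t->op, (size_t)t->len);
    return dst + t->len;
}
```

Translation then amounts to calling something like emit_op once per micro instruction, which is why every micro instruction needs a known address and length.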

3.2. Variables in Micro Instructions

The implementation code of the micro instructions uses no automatic local variables (i.e., stack addresses). Instead, register variables are used directly. Of course, when there are not enough registers, global variables are used as well. This approach follows the way QEMU uses variables in its micro instructions.
I think this is because a single instruction is composed of several micro instructions (especially in the ARM architecture which performs several operations inside a single instruction). These micro instructions need to pass computed values between them. We could do this using a stack, but that would be relatively complicated. Also, frequent memory access will impact speed.
These registers are declared in the separate header file arm2x86_self.h. Because register declarations can sometimes affect how code is compiled, only files related to DBCT include this header file.
The EBP register is declared as st, a pointer to the struct type ARMul_State. Before the micro instruction code runs, it is set to point to the global variable state, which stores all information about CPU emulation in SkyEye. This way the micro instructions can easily reference this struct. EBX, ESI and EDI are declared as the variables T0, T1 and T2 of the type uint32_t. These 3 variables are frequently used by the micro instructions. Note that because these registers normally hold things such as the stack frame pointer under the C calling convention, we need to save their values before executing the micro instructions (explained in detail below).
The other registers such as EAX are only local registers in GCC and cannot be declared to be global variables, so they are not used as variables in the micro instructions.

3.3. Calling Functions in Micro Instructions

Sometimes we need to call functions inside micro instructions. Because of the way that micro instructions are implemented, we cannot directly call regular functions and must go through a special process. The following is an example of how arm2x86.c:get_op_begin calls the function tea_begin:
First, we subtract 0xc from ESP to allocate 0xc bytes on the stack. Then, EBP, which points to ARMul_State, is pushed onto the stack. These two in-line assembler instructions pass st as a parameter to the function tea_begin. The 0xc bytes allocated beforehand ensure that the parameter passing is aligned to 0x10 (0xc plus the 4 bytes of the 32-bit parameter equals 0x10).
There's no need to preserve EBP, EBX, ESI and EDI, because these global registers are callee-saved -- if the function that we call modifies these registers, it will save them and restore them before it returns; if it does not use them, nothing is saved or restored.
The next step is to assign the address of tea_begin to T2 before calling it (by its absolute address). Usually functions are called using relative addresses, so the call depends on the address of the call instruction itself. However, when the micro instructions are copied into the TB, the address of the call instruction changes and such relative calls will not work. Therefore, we must use absolute addresses to make the call. For the same reason, some functions use function pointers to make calls in order to work under CYGWIN.
Finally, if we need the return value, which is stored in the register EAX, we would store it into a register variable such as T0.
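The problem with relative calls can be checked with a little arithmetic. An X86 near call with a 32-bit displacement (opcode 0xE8) is 5 bytes long and its displacement is counted from the end of the instruction. The two helper functions and all addresses below are hypothetical and only serve as a worked example.

```c
#include <assert.h>
#include <stdint.h>

/* Destination of a 5-byte "call rel32" located at insn_addr. */
static uint32_t call_rel32_target(uint32_t insn_addr, int32_t rel32)
{
    return insn_addr + 5u + (uint32_t)rel32;
}

/* Displacement needed to reach target from a call located at insn_addr. */
static int32_t call_rel32_disp(uint32_t insn_addr, uint32_t target)
{
    return (int32_t)(target - (insn_addr + 5u));
}
```

Suppose the compiler placed the call at 0x08048000 and the function at 0x08049000, giving a displacement of 0xffb. If the same bytes are copied to 0x40000000, call_rel32_target(0x40000000, 0xffb) yields 0x40001000, which is no longer the function. A call through a register that holds the absolute address is unaffected by the copy.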

3.4. Exception Handling in Micro Instructions

Usually emulators must emulate exceptions. This is also true for DBCT.


Deletions:
An introduction to the internals of skyeye can be found at skyeye study notes.
OP_END先是一条声明符号的伪指令,这个符号的地址就是指向微指令结束的地址,OP_BEGIN中的跳转指令也是跳到了这个地址。然后跟着是两条赋值指令,将微指令开始和结束的地址存到初始化函数开始声明的begin和end变量中,这样在函数中就取得微指令的开始和结束地址。
从上面对2个宏的的介绍就可以基本清楚微指令的生成过程,在取得了微指令的开始和接受地址后,就可以通过计算取得其的长度最后返回。
在QEMU的微指令的生成过程跟DBCT不同,QEMU采用的方式是每个微指令都在一个函数中,编译完成后,通过特定的程序将微指令的开始地址和长度取出。
3.2.微指令中的变量
在微指令的代码中,都没有使用动态局部变量,也就是栈中的地址,而直接使用了寄存器,当然在寄存器不够的情况下也使用了全局变量。这个使用的方法是参考的QEMU中微指令对变量的使用方法。
个人认为这样作因为若干条微指令才能组成一条指令(尤其是在ARM这种单条指令实现若干功能的体系结构更是如此),这些微指令之间要相互传递计算等得到的值,如果通过栈传递也可以,但是实现相对复杂,而且频繁的内存操作速度也回受到影响。
对这些寄存器的声明在arm2x86_self.h这个单独的头文件中,因为对寄存器的声明有时候会影响代码编译,所以只在DBCT相关的文件中包含了这个头文件。
ebp寄存器声明为ARMul_State结构的指针st,其将在微指令的代码开始之前指向state也就是存储了SKYEYE模拟CPU所有信息的结构,这样微指令就可以方便的访问这个结构。ebx、esi和edi声明为uint32_t类型的变量T0,T1和T2,这几个变量在微指令中可以灵活使用。注意因为这几个寄存器保存了栈指针等信息,所以在运行微指令以前需要进行保存,具体在介绍DBCT运行的时候会进行介绍。
其他eax等寄存器因为是在GCC中是局部寄存器,不能作为全局变量声明,所以不能作为微指令中的变量使用。
3.3.微指令中的函数调用
在微指令中有时候需要对函数进行调用,因为微指令使用方法的原因,这里不能进行普通的函数调用,而需要特殊处理。
下面就以arm2x86.c:get_op_begin中对tea_begin函数的调用为例子进行介绍。
这里首先是对esp减0xc也就是在栈中分配0xc的空间,然后将ebp也就是指向ARMul_State结构的指针push到栈中,这2条汇编指令是将st作为参数传递给被调用函数tea_begin,前面在栈中分配0xc的空间是要保证传递参数的0x10对齐,0xc加32位的长度就是 0x10。
没有存储使用的ebp、ebx、esi和edi,因为这几个全局寄存器的值是由被调用函数进行保存的,如果调用函数修改了这几个寄存器的值就进行保存,在函数返回时恢复,如果没有就使用就不进行保存和恢复。
然后是将tea_begin的地址赋值给T2,然后再对其进行调用,这样做的目的是进行直接地址调用。因为一般的函数调用都是直接地址调用,所以调用跟当前这个调用指令的地址是相关的,但是微指令会被拷贝到TB上的某地址执行,调用指令的地址发生了变化,所以如果进行相对地址调用,肯定要产生错误,所以这里都使用直接地址调用。在某些函数中为了适应CYGWIN使用函数指针进行调用也是出于同样的目的。
最后是取得返回值,一般一个32位数的返回值都是放在寄存器eax中,如果有需要就将其存入T0等寄存器变量然后使用。
3.4.微指令中的异常处理
模拟指令一般来说都模拟异常处理,DBCT也不例外。




Oldest known version of this page was edited on 2006-08-09 12:03:34 by TomeiNingen [Started translation from Chinese to English, based on SkyeyeDBCT as of 2006/08/09.]
Page view:

Introduction to the Implementation of SKYEYE Dynamic Binary Code Translation (DBCT) v0.0

teawater(teawater@gmail.com)
If you cross post this article elsewhere, please indicate its source as http://www.linuxforum.net

Translator's note:
This page is being translated by TomeiNingen. The original Chinese text can be found here

Revision History
v0.0
2006-06-16,v0.0 finished
2006-05-27,initial revision

Table of Contents

1. Foreword
2. Abstract
3. Micro Instructions
3.1. Structure and Initialization of Micro Instructions
3.2. Variables in Micro Instructions
3.3. Calling Functions inside Micro Instructions
3.4. Exception Handling in Micro Instructions
3.5. Branches in Micro Instructions
3.6. Categories of Micro Instructions
3.6.1. Overview
3.6.2. arm2x86.c:op_init
3.6.3. arm2x86_test.c:arm2x86_test_init
3.6.4. arm2x86_shift.c:arm2x86_shift_init
3.6.5. arm2x86_psr.c:arm2x86_psr_init
3.6.6. arm2x86_movl.c:arm2x86_movl_init
3.6.7. arm2x86_mul.c:arm2x86_mul_init
3.6.8. arm2x86_mem.c:arm2x86_mem_init
3.6.9. arm2x86_dp.c:arm2x86_dp_init
3.6.10. arm2x86_coproc.c:arm2x86_coproc_init
3.6.11. arm2x86_other.c:arm2x86_other_init
4. Translation Block (TB)
4.1. Overview
4.2. tb.h:struct tb_s
4.3. TB_TBT_SIZE and TB_TBP_SIZE
4.4. tb.c:tb_memory_init
4.5. tb.c:tb_insn_len_max_init
5. Initialization Function arm2x86.c:arm2x86_init
6. Translation Process
6.1. armemu.c:ARMul_Emulate32_dbct
6.2. tb.c:tb_find
6.3. tb.c:tb_get_tbp
6.4. tb.c:tb_translate
6.5. tb.c:translate_word

1. Foreword

This article is based on skyeye-1.2-RC7-3.
An introduction to the internals of skyeye can be found at skyeye study notes.

The acronym DBCT will be used in all the following text to stand for Dynamic Binary Code Translation.

The implementation of DBCT is not perfect. If you're interested in dynamic translation, I personally recommend reading the QEMU source code.

2. Abstract


The design of DBCT is influenced by QEMU( http://fabrice.bellard.free.fr/qemu/ ), but there are differences in the implementation. I will note the differences in the discussion of each component.

DBCT combines several consecutive emulated instructions into a group (called a Translation Block, TB). According to its functionality, each instruction is directly translated into several micro instructions. (This is different from QEMU, which uses intermediate code during the translation process). Each micro instruction represents one operation and consists of several local instructions. Finally, we get a group of local instructions that corresponds to the TB, plus the return instruction at the end. To emulate, we make a function call to the beginning of this group of (local) instructions.

There are other types of emulation methods. The most common method is to fetch one instruction, interpret it and perform its designated operations, then fetch the next instruction, and repeat. The normal instruction emulation mode in SkyEye is done this way.

There is also a method that translates the target hardware operations into languages such as C, and compiles it in order to emulate instructions.

3. Micro Instructions


3.1. Structure and Initialization of Micro Instructions.

In the function arm2x86_init, all the functions that are called before tb_insn_len_max_init are used to initialize the micro instructions.

In the DBCT code, each micro instruction is packaged in an op_table_t structure, which is defined in arm2x86.h. The op field in this structure points to the micro instruction, and the len field gives its length. Each micro instruction is initialized by a function with a name of the form get_op_xxx. This function returns the address of the micro instruction, to be stored in op. The function's pointer parameter is used to set len. These functions are called during micro instruction initialization.

This function uses two macros defined in arm2x86.h: OP_BEGIN and OP_END; both of them are X86 instructions. The code between these two macros implements the micro instruction:

#define OP_BEGIN(f) __asm__ __volatile__ ("jmp ."f"_teawater_op_end\n\t""."f"_teawater_op_begin:\n\t")
#define OP_END(f) __asm__ __volatile__ ("."f"_teawater_op_end:\n\t""movl $."f"_teawater_op_begin,%0\n\t""movl $."f"_teawater_op_end,%1\n\t":"=g"(begin), "=g"(end));


OP_BEGIN starts with a branch instruction that jumps to the symbol "."f"_teawater_op_end declared in OP_END. The purpose is to jump over the code between OP_BEGIN and OP_END to prevent them from being executed. Why don't I use the simple goto statement? The reason is if you use goto, the C compiler knows that the micro instruction's implementation code will not be executed, and will optimize it out. But if I use the in-line assembler code which the C compiler cannot understand, then the code between the two macros will be preserved. After that, there's a pseudo instruction that declares a symbol that marks the start of the micro instruction.

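OP_END starts with a pseudo instruction that declares a symbol, which marks the end of the micro instruction; it is also the target of the branch instruction in OP_BEGIN. This is followed by two assignment instructions that store the begin and end addresses of the micro instruction in the variables begin and end, which are declared at the beginning of the initialization function. This is how the function obtains the begin and end addresses of the micro instruction.
From the description of these two macros, the process of generating a micro instruction should be fairly clear: once we have its begin and end addresses, we can compute its length and return it.
QEMU generates micro instructions differently from DBCT: QEMU puts each micro instruction in its own function and, after compilation, uses a special program to extract the start address and length of each micro instruction.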

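3.2. Variables in Micro Instructions
The code of the micro instructions uses no automatic local variables (i.e., stack addresses). Instead, registers are used directly. Of course, when there are not enough registers, global variables are used as well. This approach follows the way QEMU uses variables in its micro instructions.
I think this is done because an instruction is made up of several micro instructions (especially in an architecture like ARM, where a single instruction performs several operations). These micro instructions need to pass computed values to each other. Passing them on the stack would also work, but it is relatively complicated to implement, and frequent memory access would affect speed.
These registers are declared in the separate header file arm2x86_self.h. Because register declarations can sometimes affect how code is compiled, only files related to DBCT include this header file.
The ebp register is declared as st, a pointer to the ARMul_State structure. Before the micro instruction code starts, it is set to point to state, the structure that stores all information about the CPU emulated by SkyEye, so the micro instructions can access this structure easily. ebx, esi and edi are declared as the uint32_t variables T0, T1 and T2, which the micro instructions can use freely. Note that because these registers hold information such as the stack pointer, they must be saved before the micro instructions run; this is covered in the description of how DBCT runs.
The other registers, such as eax, are local registers in GCC and cannot be declared as global variables, so they cannot be used as variables in the micro instructions.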



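3.3. Calling Functions in Micro Instructions
Sometimes we need to call functions inside micro instructions. Because of the way micro instructions are used, we cannot make ordinary function calls here and must handle them specially.
The following uses the call to the function tea_begin in arm2x86.c:get_op_begin as an example.
First, we subtract 0xc from esp to allocate 0xc bytes on the stack. Then ebp, the pointer to the ARMul_State structure, is pushed onto the stack. These two assembler instructions pass st as a parameter to the called function tea_begin. The 0xc bytes allocated beforehand ensure that the parameter passing is aligned to 0x10: 0xc plus the 4 bytes of the 32-bit parameter equals 0x10.
ebp, ebx, esi and edi are not saved here, because these global registers are preserved by the called function: if the called function modifies them, it saves them and restores them on return; if it does not use them, nothing is saved or restored.
Next, the address of tea_begin is assigned to T2, and the call is made through it, so that the call uses an absolute address. Ordinary function calls use relative addresses, so the call depends on the address of the call instruction itself. But micro instructions are copied to some address in the TB and executed there, so the address of the call instruction changes, and a relative call would certainly go wrong. Therefore absolute addresses are always used here. For the same reason, some functions use function pointers to make calls in order to work under CYGWIN.
Finally, we fetch the return value. A 32-bit return value is normally placed in the register eax; if it is needed, it is stored in a register variable such as T0 and then used.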



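3.4. Exception Handling in Micro Instructions
Emulators generally emulate exception handling, and DBCT is no exception.
In DBCT, when an exception has to be handled, we set st->trap or state->trap to the exception type (the TRAP_XXX values defined in arm2x86.h), then use the X86 assembler instruction ret as appropriate to return to normal mode, where the actual processing is done in the non-DBCT execution mode.
Because this approach makes the micro instructions easier to implement, several operations that are not exceptions, such as TRAP_SETS_R15, TRAP_SET_CPSR and TRAP_SET_R15, are handled the same way.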



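3.5. Branches in Micro Instructions
Here "branch" does not mean the branch instructions of the emulated code. Rather, it means that a micro instruction jumps over a given length as needed. For example, an emulated instruction that has a condition check needs such a branch: when the condition does not match the value in the PSR, we must skip the micro instruction code that the current instruction was translated into, so we need to jump over the length of that code.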
在DBCT中的做法是先写一条类似__asm__ __volatile__ ("jmp 0xffffffff");的指令,一