Introduction to the Implementation of SKYEYE Dynamic Binary Code Translation (DBCT) v0.0
teawater(teawater@gmail.com)
If you cross post this article elsewhere, please indicate its source as
http://www.linuxforum.net
Translator's note:
This page is being translated by TomeiNingen. The original Chinese text can be found here.
Revision History
v0.0
2006-06-16,v0.0 finished
2006-05-27,initial revision
Table of Contents
1. Foreword
2. Abstract
3. Micro Instructions
3.1. Structure and Initialization of Micro Instructions
3.2. Variables in Micro Instructions
3.3. Calling Functions inside Micro Instructions
3.4. Exception Handling in Micro Instructions
3.5. Branches in Micro Instructions
3.6. Categories of Micro Instructions
3.6.1. Overview
3.6.2. arm2x86.c:op_init
3.6.3. arm2x86_test.c:arm2x86_test_init
3.6.4. arm2x86_shift.c:arm2x86_shift_init
3.6.5. arm2x86_psr.c:arm2x86_psr_init
3.6.6. arm2x86_movl.c:arm2x86_movl_init
3.6.7. arm2x86_mul.c:arm2x86_mul_init
3.6.8. arm2x86_mem.c:arm2x86_mem_init
3.6.9. arm2x86_dp.c:arm2x86_dp_init
3.6.10. arm2x86_coproc.c:arm2x86_coproc_init
3.6.11. arm2x86_other.c:arm2x86_other_init
4. Translation Block (TB)
4.1. Overview
4.2. tb.h:struct tb_s
4.3. TB_TBT_SIZE and TB_TBP_SIZE
4.4. tb.c:tb_memory_init
4.5. tb.c:tb_insn_len_max_init
5. Initialization Function arm2x86.c:arm2x86_init
6. Translation Process
6.1. armemu.c:ARMul_Emulate32_dbct
6.2. tb.c:tb_find
6.3. tb.c:tb_get_tbp
6.4. tb.c:tb_translate
6.5. tb.c:translate_word
1. Foreword
This article is based on skyeye-1.2-RC7-3.
An introduction to the internals of SkyEye can be found in the skyeye study notes (Chinese).
The acronym DBCT will be used in all the following text to stand for Dynamic Binary Code Translation.
The implementation of DBCT is not perfect. If you're interested in dynamic translation, I personally recommend reading the QEMU source code.
2. Abstract
The design of DBCT is influenced by QEMU (http://fabrice.bellard.free.fr/qemu/), but there are differences in the implementation. I will note the differences in the discussion of each component.
DBCT combines several consecutive emulated instructions into a group called a Translation Block (TB). According to its functionality, each instruction is directly translated into several micro instructions. (This differs from QEMU, which uses intermediate code during the translation process.) Each micro instruction represents one operation and consists of several local (host) instructions. Finally, we get a group of local instructions that corresponds to the TB, plus a return instruction at the end. To emulate, we make a function call to the beginning of this group of local instructions.
There are other emulation methods. The most common one is to fetch one instruction, interpret it and perform its designated operations, then fetch the next instruction and repeat. The normal instruction emulation mode in SkyEye works this way.
Another method translates the target hardware operations into a language such as C and compiles the result in order to emulate the instructions.
3. Micro Instructions
3.1. Structure and Initialization of Micro Instructions
In the function arm2x86_init, all the functions that are called before tb_insn_len_max_init are used to initialize the micro instructions.
In the DBCT code, each micro instruction is packaged in an op_table_t structure, which is defined in arm2x86.h. The op field in this structure points to the micro instruction, and the len field gives its length. Each micro instruction is initialized in a function named get_op_xxx. This function returns the address of the micro instruction, to be stored in op, and sets len through its pointer parameter. These functions are called during micro instruction initialization.
Each such function uses two macros defined in arm2x86.h, OP_BEGIN and OP_END, both of which expand to X86 inline assembly. The code between these two macros implements the micro instruction:
#define OP_BEGIN(f) __asm__ __volatile__ ("jmp ."f"_teawater_op_end\n\t""."f"_teawater_op_begin:\n\t")
#define OP_END(f) __asm__ __volatile__ ("."f"_teawater_op_end:\n\t""movl $."f"_teawater_op_begin,%0\n\t""movl $."f"_teawater_op_end,%1\n\t":"=g"(begin), "=g"(end));
OP_BEGIN starts with a branch instruction that jumps to the symbol "."f"_teawater_op_end declared in OP_END. The purpose is to jump over the code between OP_BEGIN and OP_END so that it is not executed. Why not use a simple goto statement? Because with goto, the C compiler knows that the micro instruction's implementation code will never be executed and will optimize it out. In-line assembler code, which the C compiler cannot understand, keeps the code between the two macros intact. After the branch, there's a pseudo instruction that declares a symbol marking the start of the micro instruction.
OP_END starts with a pseudo instruction that declares a symbol marking the end of the micro instruction; it's also the target of the branch instruction in OP_BEGIN. This is followed by two assignment instructions that assign the begin and end addresses of the micro instruction to the variables begin and end, which are declared at the beginning of the initialization function. This is how the initialization function obtains the begin and end addresses of the micro instruction.
The description of these two macros explains how the micro instructions are generated: the length of a micro instruction is calculated from its begin and end addresses. QEMU generates micro instructions differently from DBCT: QEMU puts each micro instruction in its own function. After compilation, it uses a special procedure to gather the begin and end addresses of the micro instruction.
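The bookkeeping described above can be sketched in portable C. This is only an illustration under assumptions: the article says that op points to the micro instruction and len holds its length, but the field types, and the helpers op_table_set and op_emit, are hypothetical and not taken from the SkyEye source.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumed layout: only the field names op and len come from the article. */
typedef struct {
    uint8_t *op;   /* start of the micro instruction's machine code */
    int len;       /* its length in bytes */
} op_table_t;

/* Hypothetical helper: fill an entry from the begin and end addresses
 * that OP_BEGIN/OP_END capture. */
static void op_table_set(op_table_t *t, uint8_t *begin, uint8_t *end)
{
    assert(end >= begin);
    t->op = begin;
    t->len = (int)(end - begin);
}

/* Hypothetical helper: append one micro instruction to a code buffer by
 * copying its bytes; returns the number of bytes written. */
static int op_emit(uint8_t *dst, const op_table_t *t)
{
    memcpy(dst, t->op, (size_t)t->len);
    return t->len;
}
```

Because every micro instruction is just a byte range, translating an emulated instruction amounts to a sequence of such copies into the TB.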
3.2. Variables in Micro Instructions
The implementation code of the micro instructions uses no local variables. Instead, register variables are used. Of course, when there are not enough registers, global (sic?) variables are used. This method is similar to how QEMU uses variables in micro instructions.
I think this is because a single instruction is composed of several micro instructions (especially in the ARM architecture, which performs several operations inside a single instruction). These micro instructions need to pass computed values between them. We could do this using a stack, but that would be relatively complicated. Also, frequent memory access would hurt speed.
These registers are declared in the header file arm2x86_self.h. Because register variable declarations may affect the C compiler, only files related to DBCT include this header file.
The EBP register is declared to point to the global variable state, which is of the struct type ARMul_State and stores all information about CPU emulation in SkyEye. This way the micro instructions can easily reference this struct. EBX, ESI and EDI are declared as the variables T0, T1 and T2 of the type uint32_t. These 3 variables are frequently used by the micro instructions. Note that under the C calling convention these registers normally hold other things, such as the frame pointer, so we need to save their values before executing the micro instructions (explained in detail below).
The other registers, such as EAX, are only local registers in GCC and cannot be declared as global variables, so they are not used as variables in the micro instructions.
3.3. Calling Functions in Micro Instructions
Sometimes we need to call functions inside micro instructions. Because of the way that micro instructions are implemented, we cannot directly call regular functions and must go through a special process. The way arm2x86.c:get_op_begin calls the function tea_begin serves as an example.
First, we subtract 0xc from ESP to allocate 0xc bytes on the stack. Then EBP, which points to ARMul_State, is pushed onto the stack. These two in-line assembler instructions pass st as a parameter to the function tea_begin. The 0xc bytes allocated beforehand ensure that the parameter area is aligned to 0x10 (0xc plus the 4 bytes of the 32-bit parameter equals 0x10).
There's no need to preserve EBP, EBX, ESI and EDI, because these global registers are callee-saved: if the function that we call modifies these registers, it will save and restore them.
The next step is to assign the address of tea_begin to T2 and call it through T2, that is, by its absolute address. Usually functions are called using relative branches. However, when a micro instruction is copied into the TB, its address changes and such relative branches no longer work. Therefore, we must use absolute addresses to make the call. For similar reasons, there are some other functions that, in order to port to CYGWIN, are called through function pointers.
Finally, if we need the return value, which is stored in the register EAX, we store it into a register variable such as T0.
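A small arithmetic sketch may help show why a relative call breaks once the code is copied, and where the 0xc padding comes from. It assumes the standard 5-byte x86 "call rel32" encoding; the helper names are hypothetical, and the generalization of the padding rule to more than one argument is my own inference, not something the article states.

```c
#include <assert.h>
#include <stdint.h>

/* Displacement stored in an x86 "call rel32" (5 bytes long): it is
 * relative to the address of the instruction that follows the call. */
static int32_t call_rel32(uint32_t site, uint32_t target)
{
    return (int32_t)(target - (site + 5u));
}

/* Address that a "call rel32" located at site actually reaches. */
static uint32_t call_rel32_target(uint32_t site, int32_t rel32)
{
    return site + 5u + (uint32_t)rel32;
}

/* Padding needed so that padding plus nargs 4-byte arguments fill a
 * multiple of 0x10 bytes (assumed generalization of the 0xc case). */
static unsigned arg_padding(unsigned nargs)
{
    assert(nargs > 0);
    return (16u - (4u * nargs) % 16u) % 16u;
}
```

For example, a call compiled at 0x08048000 that targets 0x08049000 stores the displacement 0xffb. If the same bytes are copied to 0x40000000, they reach 0x40001000 instead of the intended function, which is why the call goes through an absolute address held in T2.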
3.4. Exception Handling in Micro Instructions
Usually emulators must emulate exceptions. This is also true for DBCT.
In DBCT, when an exception happens, we first set st->trap or state->trap to the exception type (see TRAP_XXX in arm2x86.h). Then we use the X86 ret instruction to return to the non-DBCT emulation mode, which performs the actual exception processing. This keeps the micro instructions simple. TRAP_SETS_R15, TRAP_SET_CPSR, TRAP_SET_R15, etc., are handled the same way.
3.5. Branches inside Micro Instructions
Here "branch" doesn't mean the emulated branch instructions. Rather, inside the micro instructions, we have branches whose lengths may need to be adjusted. For example, in an emulated instruction that has a condition check, you will need to skip the compiled micro instruction code if the specified condition does not match the PSR. Here, we need to know the length of the branch. In DBCT, we first write a line like the following: __asm__ __volatile__ ("jmp 0xffffffff");. Usually, the last 4 bytes contain the branch length. We just have to fill in the length at translation time. (There's more detail below when we describe the translation process).
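The placeholder patching can be sketched as follows. This is a minimal illustration, assuming the copied micro instruction ends with the 4-byte little-endian displacement of an x86 jmp; patch_rel32 is a hypothetical name, not a function from the DBCT source.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Overwrite the last 4 bytes of a copied micro instruction with the
 * branch length, stored in little-endian order as x86 expects. */
static void patch_rel32(uint8_t *op, size_t op_len, int32_t skip)
{
    assert(op_len >= 4);
    uint32_t v = (uint32_t)skip;
    uint8_t *p = op + op_len - 4;
    p[0] = (uint8_t)(v & 0xffu);
    p[1] = (uint8_t)((v >> 8) & 0xffu);
    p[2] = (uint8_t)((v >> 16) & 0xffu);
    p[3] = (uint8_t)((v >> 24) & 0xffu);
}
```

At translation time, skip would be the total length of the micro instructions that the branch has to jump over.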
3.6. Categories of Micro Instructions
3.6.1. Overview
Due to design issues, the categories of the micro instructions are not well organized. Here we categorize them according to their initialization functions.
3.6.2. arm2x86.c:op_init
The most important micro instructions we initialize here are op_begin and op_begin_test_T0. These two micro instructions are placed in front of each translated ARM instruction.
op_begin is used when the ARM instruction's condition is AL or NV (i.e., no condition check is needed). Here, we first call the function arm2x86.c:tea_begin, which calls arm2x86.c:tea_check_out, which (as in the normal emulation mode) checks whether single stepping is needed, whether there are any hardware exceptions, and whether the current TB is dirty. (About dirty: if the micro instructions in the current TB modify memory associated with the current TB, we need to return to normal emulation mode so that this TB is automatically re-translated.) As the last step, armio.c:io_do_cycle invokes all virtual devices. After the tea_begin function returns, we check its return value. If it's TRUE, we return.
The micro instruction op_begin_test_T0 is used if the translated ARM instruction needs a condition check. Before this micro instruction is executed, the condition of the translated instruction is already stored in T0. Here we use st and T0 as parameters to invoke arm2x86.c:tea_begin_test, which also uses tea_check_out to check whether there are any exceptions that require a return to the normal execution mode. Otherwise, we call arm2x86_psr.h:gen_op_condition to check whether the current translated instruction needs to be executed. After tea_begin_test returns, we first check whether we need to return to normal execution mode due to exceptions. Then we check whether the instruction should be skipped due to the condition, in which case we execute a jmp instruction (as described above).
There are a few other rather simple micro instructions so we'll not describe them here.
3.6.3. arm2x86_test.c:arm2x86_test_init
Here we initialize a few simple micro instructions that first make a condition check and then perform some operations.
3.6.4. arm2x86_shift.c:arm2x86_shift_init
Here we initialize various shift micro instructions.
If the shift count is variable, it's rather simple. The preceding micro instructions will have stored the shift count in a register, which can be used directly in this micro instruction. If the shift count is an immediate value, we handle it similarly to the branches inside micro instructions described above. We first write a line similar to “T1 = T1 << 31;”, whose generated code ends with an 8-bit shift count. During translation, we replace it with the actual immediate shift count.
3.6.5. arm2x86_psr.c:arm2x86_psr_init
Here we initialize micro instructions related to the ARM status registers (CPSR and SPSR).
3.6.6. arm2x86_movl.c:arm2x86_movl_init
Here we initialize the micro instructions used to assign values to the emulated registers or the register variables.
The handling of immediate values is similar to the branch micro instructions described above. We first use a statement like “T2 = ULONG_MAX;”. This way the last 32 bits of the generated code will be the immediate value ULONG_MAX, which is replaced with the actual immediate value during translation.
3.6.7. arm2x86_mul.c:arm2x86_mul_init
Here we initialize the micro instructions for emulating multiplication.
3.6.8. arm2x86_mem.c:arm2x86_mem_init
Here we initialize the micro instructions for emulating memory access operations.
To access memory, we call SkyEye's memory access functions and use the returned values for various operations. We do it this way instead of directly accessing memory because, if the MMU is emulated, we need to use the TLB and page table for address translation. Also, an address could be memory-mapped IO. Therefore, it's much simpler to just call the memory access functions.
3.6.9. arm2x86_dp.c:arm2x86_dp_init
Here we initialize the micro instructions for emulating the ARM DP instructions.
3.6.10. arm2x86_coproc.c:arm2x86_coproc_init
Here we initialize the micro instructions for emulating the ARM co-processor instructions.
3.6.11. arm2x86_other.c:arm2x86_other_init
Here we initialize the other micro instructions.
4. Translation Block (TB)
4.1. Overview
DBCT translates emulated instructions of length tb.h:TB_LEN into a series of micro instructions (whose maximum length is tb.h:TB_INSN_LEN_MAX, which is obtained in tb.c:tb_insn_len_max_init). These micro instructions (stored in memory called TBP), together with other information such as addresses, are packaged into a TB. During DBCT initialization, we initialize the TB according to the configuration files and the current status. We will see more details in the description of tb_memory_init.
4.2. tb.h:struct tb_s
This is the core structure in a TB. Each TB has a corresponding tb_s struct.
The following is a description of each field:
struct list_head list;
When we use the second method (sic?) to use the TBs, all the TBs that are in use are put into the linked list tb.c:tbp_dynamic_list. Each time a TB is used, it is moved to the end of the linked list. When we need to execute code at an untranslated address and there are no free TBs left, we need to pick a TB that's currently in use, and we always pick the first TB in the list. The advantage is that the least recently used TB is at the head of the list, so choosing it has the smallest impact on performance. There are more details below when we discuss the translation process.
int ted;
A value 0 for this field means this TB does not contain translated data, 1 means otherwise. When we need to mark a TB as dirty, we set ted to 0.
uint8_t *insn_addr[TB_LEN / sizeof(uint8_t *)];
During translation, the starting micro instruction address of each translated instruction is stored in this array. When we execute a translated TB, we can obtain the corresponding micro instruction start address and start executing from there.
Originally, the DBCT did not store the addresses of the micro instructions. Rather, during execution, it reran the translation to obtain the addresses, perhaps saving the obtained addresses (translator: temporarily?). Later, I realized that even if I store all the addresses, it won't take too much space, so that's what we do now.
uint8_t *tbp;
This field points to the memory where the TB stores the micro instructions. The size of the memory block is TB_LEN / sizeof (ARMword) * TB_INSN_LEN_MAX, i.e., the number of emulated instructions in a TB times the maximum translated length of one instruction.
ARMword addr;
This field is the address of the emulated instructions that correspond to this TB.
ARMword tran_addr;
During translation, we do not translate the emulated instructions covered by a TB into micro instructions all at once. Instead, when we reach an unconditional return instruction (translator: backward branch instr??), and the actual target micro instruction address is already available (note that translation begins at the starting address of the TB), we stop translating. Later, when a requested address is higher than the addresses we have translated thus far, we resume translation of this TB. The tran_addr field stores the address that immediately follows the highest address that has been translated, i.e., the next address that would be translated.
uint8_t *tbp_now;
Points to the address where the next micro instruction can be written. When tbt->ted is 0 (i.e., we have just started translating this TB at tbt->addr), this field is initialized to tbt->tbp, and it is incremented each time a micro instruction is added. When the translation is resumed, as described at tran_addr above, writing simply continues from this field.
ARMword last_addr;
uint8_t *last_tbp;
These two fields are used during translation. last_addr stores the (emulated) address that was used when the TB was invoked the last time, and last_tbp stores the address of the corresponding micro instruction. This way, when the TB is invoked the next time with the same address as last_addr, we can quickly determine the micro instruction address using last_tbp. This improves performance.
ARMword ret_addr;
When we discussed tran_addr above, we mentioned that the translation stops when it reaches an unconditional return instruction. However, there's one more case that we have to consider: the DBCT translates an emulated branch instruction into a regular branch micro instruction (not by modifying the PC register and then returning). When such a branch is in the forward direction, it could branch past the end of the currently translated code. This would be bad.
To prevent the premature ending of translation, we initialize ret_addr to 0 when translation starts. Whenever we translate a branch instruction whose target address is higher than ret_addr, we update ret_addr to this new address. Each time after an instruction is translated, we check the value of ret_addr, and terminate translation only if ret_addr is lower than the next instruction to be translated.
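The interplay of tran_addr and ret_addr can be sketched with a toy model. Everything here is hypothetical scaffolding: the insn_kind enum and insn_t struct stand in for decoded ARM instructions, and the check that the requested address has already been translated is left out for brevity.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t ARMword;

/* Hypothetical, simplified view of a decoded instruction. */
typedef enum { INSN_PLAIN, INSN_RETURN, INSN_BRANCH } insn_kind;

typedef struct {
    insn_kind kind;
    ARMword target;   /* only meaningful for INSN_BRANCH */
} insn_t;

/* Walk the instructions of one TB starting at base and return the final
 * tran_addr, i.e. the address following the last translated instruction. */
static ARMword translate_until_stop(const insn_t *code, size_t n, ARMword base)
{
    assert(code != NULL || n == 0);
    ARMword ret_addr = 0;       /* highest in-TB branch target seen so far */
    ARMword tran_addr = base;   /* next address to translate */

    for (size_t i = 0; i < n; i++) {
        if (code[i].kind == INSN_BRANCH && code[i].target > ret_addr)
            ret_addr = code[i].target;
        tran_addr += 4;
        /* Stop at an unconditional return only when no earlier branch
         * targets code at or beyond the next instruction. */
        if (code[i].kind == INSN_RETURN && ret_addr < tran_addr)
            break;
    }
    return tran_addr;
}
```

With a forward branch to 0x100c at 0x1000 and a return at 0x1004, translation continues past the first return, because the branch target has not been translated yet.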
4.3. TB_TBT_SIZE and TB_TBP_SIZE
tb.c contains TB_TBT_SIZE and TB_TBP_SIZE, which control the initialization of TB. Their definitions are:
#define TB_TBT_SIZE skyeye_config.tb_tbt_size
#define TB_TBP_SIZE skyeye_config.tb_tbp_size
TB_TBT_SIZE defines the total number of TBs inside the DBCT, i.e., the space taken up by the tb_t structs. When it is set to 0, the TBs are allocated on demand and stored in the field armmem.h:mem_state_t->tbt in the data structure used by the emulated memory, i.e., whenever a block of memory is executed, its tb_t structures are allocated. If the value is non-zero, the TBs are dynamically allocated using the memory in tb.c:tbt_table and tb.c:tbt_table_size.
TB_TBP_SIZE is the space in the TBs that actually stores the micro instructions. When set to 0, the space is allocated in the emulated memory structure armmem.h:mem_state_t->tbp, i.e., when a memory block is executed, its tbp is allocated. When it is non-zero and the flag tbp_dynamic is 1, TBP is dynamically allocated, using the memory in tb.c:tbp_begin, tb.c:tbp_now and tbp_now_size.
skyeye_config.tb_tbt_size and skyeye_config.tb_tbp_size are read from the configuration file. Initially they are set to their default values in the function skyeye_options.c:skyeye_option_init. There we can see that config->tb_tbt_size (the same as TB_TBT_SIZE) is initialized to 0, because the tb_t structure is small. config->tb_tbp_size (the same as TB_TBP_SIZE) is initialized to TB_TBP_DEFAULT (1024 * 1024 * 64). Because an emulated ARM instruction can expand to several micro instructions, the space needed to store the micro instructions for a TB could be rather large, sometimes exceeding the 32-bit addressing range (translator: really?), so we usually don't set config->tb_tbp_size to 0.
Note that TB_TBT_SIZE and TB_TBP_SIZE are not used immediately after they are read from the configuration file. They are used after tb_memory_init has started initialization. A few related variables are also initialized in tb_memory_init.
4.4. tb.c:tb_memory_init
This function is used to initialize the TB in DBCT. The procedure is as follows:
Step 1, it checks if TB_TBT_SIZE is 0. If it is not 0, it runs some code related to TB_TBT_SIZE. (Note that here I made a rather big mistake. The struct should be used as tb_t, but by mistake, I used tb_cache_t here. This tb_cache_t is obsolete and should have been removed from the source code. The following discussion assumes that all occurrences of tb_cache_t have been replaced with tb_t.) First we perform some basic processing and checking of TB_TBT_SIZE. After that, we compute the space needed for the tb_t of all the emulated memory, and compare this value against TB_TBT_SIZE: if TB_TBT_SIZE is larger, the DBCT does not need to dynamically allocate tb_t for the purpose of saving space. In this case, we set TB_TBT_SIZE to 0 and use static allocation. If TB_TBT_SIZE is smaller, we initialize the tbt_table for storing the tb_t, as well as tbt_table_size, which is the number of tb_t.
This way, the initialization of TB_TBT_SIZE is complete. We also configured tbt_table and tbt_table_size.
Step 2, if TB_TBP_SIZE is not 0, perform related basic processing and checking.
Step 3, check once again if TB_TBT_SIZE is 0, and process TB_TBP_SIZE.
If TB_TBT_SIZE is not 0, we first compute tmp_u64, the size of the tbp needed by TB_TBT_SIZE number of tb_t structs. If TB_TBP_SIZE is larger than tmp_u64, or if TB_TBP_SIZE is 0, we set TB_TBP_SIZE to this value. We do this because once the tb_t are dynamically allocated, the TBs cannot manage any micro instruction storage (TBP) beyond that limit. If TB_TBP_SIZE is smaller than tmp_u64, we set tb.c:tbp_dynamic to 1, which means the DBCT dynamically allocates TBP.
If TB_TBT_SIZE is 0, we check if TB_TBP_SIZE is 0. If it is also 0, there's no need for any further initialization, and tbp_dynamic keeps the default value of 0. When the DBCT operates, it allocates mem_state_t->tbt and mem_state_t->tbp on demand. If it's not 0, we first compute tmp_u64, which is the size of the TBP needed for all the emulated memory, and compare this value against TB_TBP_SIZE. If TB_TBP_SIZE is larger than or equal to tmp_u64, there's no need for dynamic allocation, and we set TB_TBP_SIZE to 0. If TB_TBP_SIZE is smaller than tmp_u64, dynamic allocation is necessary, so we set tbp_dynamic to 1.
Now the initialization of TB_TBP_SIZE is finished, and tbp_dynamic is configured accordingly.
Step 4, now that the value of TB_TBP_SIZE is determined, we allocate space for tbp_begin. Note that when we use mmap to allocate space, we set the permission to be executable, i.e., PROT_EXEC. After that, we initialize tbp_now_size and tbp_now.
This way, tbp_begin, tbp_now_size and tbp_now, which are used for dynamically allocating TBP, are initialized.
In conclusion, the way that DBCT uses the variables for managing dynamic allocation of TBT and TBP has become a bit convoluted.
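Since the decision logic is admittedly convoluted, a compact sketch of steps 1 and 3 may help. It follows only the prose above: the struct tb_cfg, the function tb_cfg_resolve and its parameters are hypothetical, the "basic processing and checking" of step 2 is omitted, and treating an equal size as "large enough" in step 1 is an assumption.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical bundle of the values that tb_memory_init settles. */
typedef struct {
    uint64_t tbt_size;        /* TB_TBT_SIZE in bytes, 0 = per-bank allocation */
    uint64_t tbp_size;        /* TB_TBP_SIZE in bytes, 0 = per-bank allocation */
    uint64_t tbt_table_size;  /* number of tb_t in tbt_table */
    int tbp_dynamic;          /* 1 = TBPs are handed out dynamically */
} tb_cfg;

/* mem_tb_count: number of TBs needed to cover all emulated memory.
 * tb_t_bytes:   size of one tb_t.
 * tbp_bytes:    size of one TBP. */
static void tb_cfg_resolve(tb_cfg *c, uint64_t mem_tb_count,
                           uint64_t tb_t_bytes, uint64_t tbp_bytes)
{
    assert(tb_t_bytes > 0 && tbp_bytes > 0);
    c->tbt_table_size = 0;
    c->tbp_dynamic = 0;

    /* Step 1: a table that could hold every tb_t is pointless. */
    if (c->tbt_size != 0) {
        if (c->tbt_size >= mem_tb_count * tb_t_bytes)
            c->tbt_size = 0;
        else
            c->tbt_table_size = c->tbt_size / tb_t_bytes;
    }

    /* Step 3: settle TB_TBP_SIZE and tbp_dynamic. */
    if (c->tbt_size != 0) {
        uint64_t need = c->tbt_table_size * tbp_bytes;
        if (c->tbp_size == 0 || c->tbp_size > need)
            c->tbp_size = need;       /* one TBP per table entry */
        else if (c->tbp_size < need)
            c->tbp_dynamic = 1;       /* fewer TBPs than tb_t */
    } else if (c->tbp_size != 0) {
        uint64_t need = mem_tb_count * tbp_bytes;
        if (c->tbp_size >= need)
            c->tbp_size = 0;          /* enough room: allocate per bank */
        else
            c->tbp_dynamic = 1;
    }
}
```

With the defaults described above (TB_TBT_SIZE of 0 and a 64 MB TB_TBP_SIZE), any emulated memory whose full translation would need more than 64 MB ends up with tbp_dynamic set to 1.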
4.5. tb.c:tb_insn_len_max_init
This function initializes tb.c:tb_insn_len_max, which is also TB_INSN_LEN_MAX.
For every (ARM) instruction, it computes the length of the corresponding translated micro instruction sequence. tb_insn_len_max is set to the length of the longest sequence.
5. Initialization function arm2x86.c:arm2x86_init
This is the initialization function for DBCT. It is called by the function arminit.c:ARMul_Reset.
This function first calls the micro instruction initialization functions described above. It then calls the function tb_insn_len_max_init, and lastly, the function tb_memory_init.
6. Translation Process
6.1. armemu.c:ARMul_Emulate32_dbct
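This is the core function that drives DBCT translation and execution. Like ARMul_Emulate32, its counterpart in the normal emulation mode, it is called from arminit.c:ARMul_DoProg and arminit.c:ARMul_DoInstr. It proceeds as follows:
Step 1, add one instruction length, INSN_SIZE, to R15 (the PC register). Because of ARM's multi-stage pipeline, the PC is not transparent to applications, whereas the functions outside this one treat R15 as the current PC value, so R15 is adjusted before execution starts.
Step 2, set state->trap to 0.
Step 3, call tb.c:tb_find. Given the PC value passed as its argument, this function does all the work of allocating a TB and translating instructions, and finally returns the address of the micro instructions that correspond to the PC. A NULL return value means failure; in that case we set state->trap to TRAP_INSN_ABORT (an instruction fetch abort) and jump to the code below that handles state->trap.
Step 4, save the registers that serve as variables inside the micro instructions. The reason was explained earlier: the values of these registers must be preserved by the called function, so we save them here.
Call gen_func, the pointer to the micro instruction memory that we obtained.
After it returns, restore the saved registers.
Step 5, as mentioned in the discussion of micro instructions, exceptions and other special cases just set state->trap and return. This is where they are actually handled. The code is fairly clear: it performs different processing depending on state->trap, so we will not describe it in detail.
Step 6, decide whether to continue executing or to return from the function. To continue, go back to step 2.
Step 7, subtract INSN_SIZE from state->Reg[15] so that the PC again points to the address currently being executed, then return.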
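The seven steps can be sketched as a portable loop. This is only a model under assumptions: cpu_state stands in for ARMul_State, the TRAP_ values are made up, ordinary C functions play the role of translated code, the register save and restore of step 4 is omitted, and which PC value is handed to the lookup function is a guess.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define INSN_SIZE 4u
enum { TRAP_NONE = 0, TRAP_INSN_ABORT = 1, TRAP_EXIT = 2 }; /* hypothetical */

typedef struct {
    uint32_t Reg[16];
    int trap;
    int aborts;   /* counts handled fetch aborts, for illustration only */
} cpu_state;

typedef void (*gen_func_t)(cpu_state *);
typedef gen_func_t (*find_fn)(cpu_state *, uint32_t pc);

static void emulate_dbct(cpu_state *st, find_fn find)
{
    assert(st != NULL && find != NULL);
    st->Reg[15] += INSN_SIZE;                          /* step 1 */
    for (;;) {
        st->trap = TRAP_NONE;                          /* step 2 */
        gen_func_t gen_func = find(st, st->Reg[15] - INSN_SIZE); /* step 3 */
        if (gen_func == NULL)
            st->trap = TRAP_INSN_ABORT;
        else
            gen_func(st);                              /* step 4 */
        if (st->trap == TRAP_INSN_ABORT) {             /* step 5 */
            st->aborts++;
            break;
        }
        if (st->trap == TRAP_EXIT)                     /* step 6 */
            break;
    }
    st->Reg[15] -= INSN_SIZE;                          /* step 7 */
}

/* Demo "translated block": counts in R0, advances the PC by one
 * instruction, and asks to stop after three runs. */
static void demo_block(cpu_state *st)
{
    st->Reg[0]++;
    st->Reg[15] += INSN_SIZE;
    if (st->Reg[0] == 3)
        st->trap = TRAP_EXIT;
}

/* Demo lookup: only addresses below 0x100 have "translated" code. */
static gen_func_t demo_find(cpu_state *st, uint32_t pc)
{
    (void)st;
    return pc < 0x100u ? demo_block : NULL;
}
```

Calling emulate_dbct(&st, demo_find) on a zeroed state runs demo_block three times and leaves the PC at 12; starting at PC 0x100 takes the fetch abort path instead.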
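6.2. tb.c:tb_find
Given the PC value passed as its argument, this function does all the work of allocating a TB and translating instructions, and finally returns the address of the micro instructions that correspond to the PC. It proceeds as follows:
Step 1, call armmmu.c:mmu_v2p_dbct, which uses SkyEye's MMU support to obtain the emulated physical address addr that corresponds to the execution address ADDR. If this fails, the function returns with an error. Then use TB_ALIGN to obtain align_addr, the address aligned to TB_LEN; this is the address of the TB that contains addr.
Step 2, check whether align_addr equals the static local variable save_align_addr. If it does, the TB for this physical address was requested previously and all the pointers needed for translation are already held in static local variables, so we skip the TB allocation code and go straight to the translation code. Note that save_align_addr is initialized to 0x1 to guarantee that it does not match any address.
Step 3, the TB allocation code starts here. First check whether tbt_table_size is 0 to determine whether tb_t is dynamically allocated.
Step 4, if it is dynamically allocated, a hash computation selects the tb_t in tbt_table that corresponds to align_addr.
Compare tbt->addr with align_addr. If they differ, the tb_t previously served as the TB of another address, so we clear its old records, set tbt->ted to 0 and set tbt->addr to align_addr.
Next, obtain tbt->tbp, the TBP. If tbt->tbp is NULL, the TBP of this TB has not been allocated or has been taken by another TB, and we call tb.c:tb_get_tbp to allocate one. If tbt->tbp is not NULL, the TBP is already allocated, and as described for tbt->list above, we first remove the TB from the tbp_dynamic_list linked list.
Step 5, if it is not dynamically allocated, first call tb.c:tb_get_mbp to obtain mbp, the pointer to the mem_bank_t structure of the emulated memory that contains align_addr.
Check whether state->mem.tbt[bank_num] is empty. If so, no space has been allocated for tbt and tbp. If tbp_dynamic is 0, TBP is statically allocated, so we first allocate space for state->mem.tbp[bank_num], and then allocate space for state->mem.tbt[bank_num].
After the space is allocated, set up the TB structure.
Once the TB structure is obtained, check whether tbt->tbp (the TBP) is empty. If it is, set it according to tbp_dynamic: for dynamic allocation use tb.c:tb_get_tbp as above; for static allocation take it from state->mem.tbp[bank_num]. If it is not empty, then as above, check tbp_dynamic and, where applicable, remove the TB structure from the list.
Now both the TB structure and its TBP have been obtained.
Step 6, perform some setup with the TB just obtained.
Set state->tb_now to the TB structure just obtained. This lets the micro instructions access the currently running TB at run time; for example, after the TB is marked dirty, the micro instructions can detect this immediately and exit.
Set save_align_addr to align_addr, for the purpose described in step 2.
If tbp_dynamic is true, TBP is dynamically allocated, and the TB structure is appended to the end of the tbp_dynamic_list linked list; the reason was given in the description of tbt->list.
Step 7, the code that translates the emulated instructions starts here. First check tbt->ted to determine whether this TB structure has been translated.
Step 8, if the TB structure has already been translated:
First check whether tbt->last_addr equals addr; if so, return tbt->last_tbp. This was covered in the description of tbt->last_addr and tbt->last_tbp.
Then check whether the physical address addr to be translated is greater than or equal to tbt->tran_addr, which was also described above.
If addr is less than tbt->tran_addr, the micro instruction code already in the TB covers addr, so we simply take the TBP address for addr from tbt->insn_addr and store it in ret as the return value.
If addr is greater than or equal to tbt->tran_addr, translation must continue. First obtain real_begin_addr, the pointer into the emulated memory block that corresponds to tbt->tran_addr, and real_addr, the pointer into the emulated memory block that corresponds to addr. Then call tb.c:tb_translate to translate the memory starting at real_begin_addr. Finally, store the micro instruction address that corresponds to addr in ret.
Step 9, if the TB structure has not been translated yet, it must be translated from scratch.
Again, first obtain real_begin_addr, the pointer into the emulated memory block that corresponds to tbt->tran_addr, and real_addr, the pointer that corresponds to addr. Then initialize tbt->tran_addr to align_addr and tbt->tbp_now to tbp; both fields were described above. Call tb.c:tb_translate to translate the memory starting at real_begin_addr. Finally, store the micro instruction address that corresponds to addr in ret, and set tbt->ted to 1 to mark the TB as translated.
Now the return value ret, the micro instruction address that corresponds to ADDR, has been obtained.
Step 10, store addr and ret in tbt->last_addr and tbt->last_tbp, and return ret.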
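A reduced sketch of the lookup path (steps 4 and 8) is shown below. It makes several assumptions that the article does not confirm: TB_LEN is taken to be 1024 and a power of two, the hash is a plain modulo, "clearing old records" is taken to include the last_addr cache, and the function names are invented. Allocation of TBP and the actual translation are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t ARMword;
#define TB_LEN 1024u               /* assumed value, power of two */
#define TB_INSNS (TB_LEN / 4u)     /* emulated instructions per TB */

/* Reduced tb_t: only the fields used by the lookup path. */
typedef struct {
    int ted;
    ARMword addr;
    ARMword tran_addr;
    ARMword last_addr;
    uint8_t *last_tbp;
    uint8_t *insn_addr[TB_INSNS];
} tb_t;

/* Step 4: pick the table slot for addr and recycle it when it currently
 * belongs to another TB-aligned address. */
static tb_t *tbt_lookup(tb_t *table, size_t table_size, ARMword addr)
{
    assert(table != NULL && table_size > 0);
    ARMword align_addr = addr & ~(TB_LEN - 1u);
    tb_t *tbt = &table[(align_addr / TB_LEN) % table_size];
    if (tbt->addr != align_addr) {
        tbt->ted = 0;
        tbt->addr = align_addr;
        tbt->last_addr = 0x1;   /* unaligned, so it never matches */
        tbt->last_tbp = NULL;
    }
    return tbt;
}

/* Step 8 fast paths: return the cached micro instruction address, or
 * NULL when translation has to start or resume. */
static uint8_t *tb_cached_entry(tb_t *tbt, ARMword addr)
{
    assert(addr >= tbt->addr && addr - tbt->addr < TB_LEN);
    if (!tbt->ted)
        return NULL;
    if (addr == tbt->last_addr)
        return tbt->last_tbp;
    if (addr >= tbt->tran_addr)
        return NULL;
    uint8_t *ret = tbt->insn_addr[(addr - tbt->addr) / 4u];
    tbt->last_addr = addr;      /* step 10: remember the last request */
    tbt->last_tbp = ret;
    return ret;
}
```

Two addresses that hash to the same slot evict each other here, which is the price of a table smaller than the emulated memory.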
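6.3. tb.c:tb_get_tbp
This function dynamically allocates TBP. It proceeds as follows:
Step 1, check whether tbp_now_size is 0. As described above, tbp_now_size records how much TBP space is still available for allocation.
Step 2, if tbp_now_size is not 0, a TBP can still be allocated directly from tbp_now.
Step 3, if tbp_now_size is 0, the space in tbp_now is exhausted, and we take the TBP of the first TB structure in tbp_dynamic_list. This is the least recently used TB structure in the whole list, for the reason given in the description of tbt->list. After taking it, remove the TB structure whose TBP was taken from the list, and set its tbp to NULL and its ted to 0.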
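The allocation policy can be sketched as a bump allocator that falls back to stealing from the head of a recency list. The list helpers mimic the Linux-style list_head that tb_s uses, but the tbp_pool struct and the functions pool_get_tbp and tb_mark_used are hypothetical names for this sketch, and testing for "enough space for one TBP" instead of "not 0" is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct list_head { struct list_head *prev, *next; };

typedef struct tb_s {
    struct list_head list;  /* first member: a list pointer is a tb pointer */
    int ted;
    uint8_t *tbp;
} tb_t;

typedef struct {
    uint8_t *tbp_now;       /* next unallocated TBP */
    size_t tbp_now_size;    /* bytes still unallocated */
    size_t tbp_len;         /* bytes per TBP */
    struct list_head lru;   /* stand-in for tbp_dynamic_list */
} tbp_pool;

static void list_init(struct list_head *h) { h->prev = h->next = h; }

static void list_del(struct list_head *e)
{
    e->prev->next = e->next;
    e->next->prev = e->prev;
    e->prev = e->next = e;
}

static void list_add_tail(struct list_head *e, struct list_head *h)
{
    e->prev = h->prev;
    e->next = h;
    h->prev->next = e;
    h->prev = e;
}

/* Called whenever a TB is used: move it to the tail of the list.
 * tb->list must have been set up with list_init beforehand. */
static void tb_mark_used(tbp_pool *p, tb_t *tb)
{
    list_del(&tb->list);
    list_add_tail(&tb->list, &p->lru);
}

/* Hand out a TBP: from the pool while space remains, otherwise from the
 * TB at the head of the list, which then loses its translation. */
static uint8_t *pool_get_tbp(tbp_pool *p)
{
    assert(offsetof(tb_t, list) == 0);
    if (p->tbp_now_size >= p->tbp_len) {
        uint8_t *tbp = p->tbp_now;
        p->tbp_now += p->tbp_len;
        p->tbp_now_size -= p->tbp_len;
        return tbp;
    }
    if (p->lru.next == &p->lru)
        return NULL;                      /* nothing to steal from */
    tb_t *victim = (tb_t *)p->lru.next;   /* head = least recently used */
    list_del(&victim->list);
    uint8_t *tbp = victim->tbp;
    victim->tbp = NULL;
    victim->ted = 0;
    return tbp;
}
```

A victim that is requested again later finds tbp set to NULL and ted set to 0, so it asks for a new TBP and is translated again from the start.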
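6.4. tb.c:tb_translate
This function translates instructions from the memory starting at the argument tb_begin_addr and finally returns the micro instruction address that corresponds to addr. It proceeds as follows:
Step 1, compute from tb_begin_addr the address tb_end_addr at which the emulated memory block of this TB ends.
Step 2, initialize the linked list tb_branch_save_list. This list records every branch within the TB. During translation the addresses of later instructions are not yet known, so branch lengths cannot be computed. We therefore record the location to be written, the branch target and related information here, and once translation is finished we loop over the list and fill in each branch length.
Set the global variable now_tbt to tbt, so that every translation function can conveniently access the current TB structure.
Set tbt->ret_addr to 0; see the description of ret_addr above.
Step 3, the translation loop starts. Each iteration checks whether tb_begin_addr is less than tb_end_addr; if so, it translates, otherwise the loop ends. At the end of each instruction's translation, tb_begin_addr is advanced by the size of an ARMword to the next instruction. The translation of one instruction goes as follows:
Check whether tb_begin_addr equals addr. If so, the instruction about to be translated is the one that addr refers to, so set the return value ret to tbt->tbp_now.
Set tbt->insn_addr; see the description of tbt->insn_addr above.
Call tb.c:translate_word with the instruction to translate (*tb_begin_addr), the current micro instruction write pointer tbt->tbp_now, and other arguments. This function translates a single instruction and returns len, the length of the micro instructions written.
Add len to tbt->tbp_now to skip the micro instruction storage just used.
Add 4 to tbt->tran_addr so that it corresponds to the next instruction.
Finally, as described above, translation is interrupted if the instruction always returns. A word about state->trap first: as mentioned, during micro instruction execution it carries the exception type back; during translation it marks that the instruction just translated always returns. Here the code checks state->trap, and also checks ret and whether tbt->tran_addr is greater than tbt->ret_addr; the purpose of these checks was described above. If translation can stop, the loop is terminated.
Step 4, instruction translation has ended. If the whole TB has been translated, i.e., state->trap is 0, append the op_return micro instruction so that execution returns when the TB finishes.
Step 5, take each branch record from the tb_branch_save_list linked list described above in turn and fill in its branch length.
Finally, return ret, the micro instruction address that corresponds to addr.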
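The deferred branch fixing of steps 2 and 5 is a classic backpatching scheme, sketched below. A fixed array replaces the tb_branch_save_list linked list for brevity, and the struct and function names are hypothetical. The sketch assumes that each recorded location is the 4-byte displacement of an x86 jmp, which is relative to the end of that instruction.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_FIXUPS 16
#define MAX_INSNS 8

/* One pending branch: where the 4-byte length goes and which emulated
 * instruction (by index within the TB) it must reach. */
typedef struct {
    uint8_t *patch_at;
    size_t target_insn;
} branch_fixup;

typedef struct {
    uint8_t *insn_addr[MAX_INSNS];   /* filled in as translation proceeds */
    branch_fixup fixups[MAX_FIXUPS];
    size_t n_fixups;
} tb_fix;

/* During translation: remember a branch whose target may not be
 * translated yet. */
static void fixup_record(tb_fix *tb, uint8_t *patch_at, size_t target_insn)
{
    assert(tb->n_fixups < MAX_FIXUPS && target_insn < MAX_INSNS);
    tb->fixups[tb->n_fixups].patch_at = patch_at;
    tb->fixups[tb->n_fixups].target_insn = target_insn;
    tb->n_fixups++;
}

/* After translation: every target address is known, so fill in each
 * displacement in little-endian order. */
static void fixup_resolve(tb_fix *tb)
{
    for (size_t i = 0; i < tb->n_fixups; i++) {
        const branch_fixup *f = &tb->fixups[i];
        const uint8_t *next = f->patch_at + 4;   /* end of the jmp */
        uint32_t rel = (uint32_t)(int32_t)(tb->insn_addr[f->target_insn] - next);
        f->patch_at[0] = (uint8_t)(rel & 0xffu);
        f->patch_at[1] = (uint8_t)((rel >> 8) & 0xffu);
        f->patch_at[2] = (uint8_t)((rel >> 16) & 0xffu);
        f->patch_at[3] = (uint8_t)((rel >> 24) & 0xffu);
    }
    tb->n_fixups = 0;
}
```

Recording first and patching later means forward and backward branches are handled by the same code path.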
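6.5. tb.c:translate_word
This function translates the emulated instruction insn passed as an argument into micro instructions, stores them at the argument tbp, and returns the length of the micro instructions written.
The function itself is large and consists mostly of instruction translation work, so we will not describe it in detail.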