Non-deterministic events are one cause of system state transitions. In our system, we treat all deterministic events (pure computation, in other words) as the inside of a black box and record only the non-deterministic events that enter it. The states of external devices, such as hard disks, network cards, keyboards, and mice, can be ignored because they are too complicated to synchronize with the previous execution. It is also impossible to regenerate a real hardware signal during replay. We therefore record the data in the VM hardware emulator and replay the returned data at exactly the same point in execution. Our system handles only hardware interrupts, port I/O, memory-mapped I/O, and DMA (direct memory access). We discuss each in detail below.
5.1. Hardware Interrupt
Computers need interrupts to communicate with external devices. Many tasks rely on them, for example context switching, disk I/O, and DMA data transfer.
An interrupt disturbs CPU execution and is therefore a source of non-deterministic events. However, not all interrupts are non-deterministic. Exceptions and software interrupts, for example, are always deterministic. Exceptions are generated by the deterministic execution of instructions in memory, whose contents stay the same until a hardware interrupt arrives. A software interrupt can likewise only be raised by instructions in memory, which is internal data storage. In other words, we can ignore all exceptions and software interrupts, which reduces the time and space cost of recording.
QEMU translates guest binary code into translation blocks for execution. A translation block cannot be interrupted at an arbitrary point, so QEMU checks whether interrupts are pending before calling the execution function. We modify this check to record each interrupt, including the instruction counter, which represents how many instructions have been executed, and the interrupt number, which indicates what the PIC (programmable interrupt controller) should handle.
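The record described above can be sketched as a pair of values per interrupt. The following is a minimal illustration in Python, not QEMU code: the names `InterruptRecord`, `InterruptLog`, and `InterruptReplayer` are hypothetical, and the sketch assumes the pending-interrupt check runs at the same instruction counts in both runs.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class InterruptRecord:
    instruction_count: int  # instructions executed when the interrupt was taken
    interrupt_number: int   # number handed to the emulated PIC


class InterruptLog:
    """Recording side: called from the (hypothetical) pending-interrupt check."""

    def __init__(self) -> None:
        self.records: List[InterruptRecord] = []

    def record(self, instruction_count: int, interrupt_number: int) -> None:
        self.records.append(InterruptRecord(instruction_count, interrupt_number))


class InterruptReplayer:
    """Replay side: returns the interrupts to inject at a given instruction count."""

    def __init__(self, records: List[InterruptRecord]) -> None:
        self._records = list(records)
        self._next = 0

    def pending(self, instruction_count: int) -> List[int]:
        due: List[int] = []
        while (self._next < len(self._records)
               and self._records[self._next].instruction_count <= instruction_count):
            rec = self._records[self._next]
            if rec.instruction_count < instruction_count:
                # The replay passed the recorded point without checking there,
                # so the two executions have diverged.
                raise RuntimeError("replay diverged from the recorded run")
            due.append(rec.interrupt_number)
            self._next += 1
        return due
```

Exceptions and software interrupts never appear in such a log, because the deterministic re-execution regenerates them by itself.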
5.2. External Input
A non-deterministic event is one of the reasons for system state transitions. In our system, we consider only the differences in execution between runs and record the factors that cause them.
A deterministic state transition, however, is pure computation, so it cannot produce a different result from identical inputs. We therefore do not need to record the outputs of every device. We also care only about the state of the VM, not that of external devices. The states of devices outside the VM, such as hard disks, network cards, keyboards, and mice, can be ignored because they are too complicated to synchronize when replaying. For example, we can simply generate a keyboard interrupt in the VM without synchronizing the state of the real keyboard; leaving those states unsynchronized has no effect on the replay. Therefore, we record all non-deterministic events during the original execution and replay them at exactly the same points. To record and replay efficiently, our system takes only hardware interrupts, port I/O, memory-mapped I/O, and DMA (direct memory access) into account.
Port I/O and memory-mapped I/O are conceptually the same: both deliver inputs to the system. Almost all non-deterministic inputs come from the user or from other computers, for example when a user types a command or the machine receives a packet from another computer. Programs cannot decide when such data arrives, which is why external inputs are non-deterministic.
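As an illustration of how an input read can be recorded and later replayed without touching the real device, consider the sketch below. It is a simplified model under our own assumptions: `PortInput` and its `read_device` callback are hypothetical names, not part of QEMU, and a real implementation would also need to cover memory-mapped reads.

```python
from collections import deque
from typing import Callable, Deque, Iterable, Optional, Tuple


class PortInput:
    """Records port reads when a device callback is given, replays them otherwise."""

    def __init__(self,
                 read_device: Optional[Callable[[int], int]] = None,
                 log: Optional[Iterable[Tuple[int, int]]] = None) -> None:
        self.read_device = read_device  # present only in record mode
        self.log: Deque[Tuple[int, int]] = deque() if log is None else deque(log)

    def read(self, port: int) -> int:
        if self.read_device is not None:
            # Record mode: ask the device, then remember what it returned.
            value = self.read_device(port)
            self.log.append((port, value))
            return value
        # Replay mode: return the logged value; the device is never consulted.
        logged_port, value = self.log.popleft()
        if logged_port != port:
            raise RuntimeError("replay diverged: unexpected port read")
        return value
```

In replay mode the guest sees the same values in the same order, even though no keyboard or network card is producing them.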
5.3. Time Related Instruction & Clock
Some instructions, such as rdtsc (read time-stamp counter) and rdpmc (read performance-monitoring counters), read CPU data directly. They may return different values on re-execution, which makes them non-deterministic events. Because they cannot be trapped by hooking system calls or monitoring hardware interrupts, these instructions need to be handled explicitly to achieve record and replay. We want the time observed by the replayed system to be identical to that of the original run, so each return value must be replayed with the same value at the same relative time.
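One way to picture the handling of rdtsc is a table from the instruction count at which the instruction executed to the value it returned. The sketch below is illustrative only: `TimestampCounter` is a hypothetical name, and `time.perf_counter_ns()` stands in for the host time-stamp counter.

```python
import time
from typing import Dict, Optional


class TimestampCounter:
    """Stand-in for an emulated rdtsc handler (hypothetical, not QEMU's API)."""

    def __init__(self, replay_log: Optional[Dict[int, int]] = None) -> None:
        self.replaying = replay_log is not None
        self.log: Dict[int, int] = dict(replay_log) if replay_log is not None else {}

    def rdtsc(self, instruction_count: int) -> int:
        if self.replaying:
            # Same value at the same relative time; a KeyError here would mean
            # the replay executed rdtsc at a point the recording did not.
            return self.log[instruction_count]
        value = time.perf_counter_ns()  # stand-in for the host counter
        self.log[instruction_count] = value
        return value
```

Keying the log by instruction count ties each value to its position in the execution rather than to wall-clock time, which differs between runs.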
All hardware devices need clocks to work correctly. QEMU emulates the clock interrupt, which is used for context switching, with the host system timer. When the host timer sends a signal to QEMU, it checks which task has expired and selects the next task from the waiting queue.
Replaying clocks is difficult because we need to trigger each clock interrupt with the correct value at the correct time. A VM executes tens of millions of instructions per second, and we need to decide which instruction each clock interrupt for context switching corresponds to. Without the same context-switch timing, the order of instructions cannot be reproduced because the task dependencies differ.
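The effect of clock-interrupt timing on instruction order can be seen in a toy model. The sketch below is not how QEMU or a guest scheduler works; it assumes two tasks, a round-robin switch, and a hypothetical `run` function that delivers a clock interrupt at the given instruction counts.

```python
from typing import Iterable, Iterator, List


def task(name: str, steps: int) -> Iterator[str]:
    # Each yielded label stands for one executed instruction.
    for i in range(steps):
        yield f"{name}{i}"


def run(switch_points: Iterable[int]) -> List[str]:
    """Run two tasks, switching whenever a clock interrupt is due."""
    tasks = [task("A", 3), task("B", 3)]
    switch_at = set(switch_points)
    trace: List[str] = []
    current = 0
    executed = 0
    while tasks:
        if executed in switch_at:
            switch_at.remove(executed)  # each interrupt fires once
            if len(tasks) > 1:
                current = (current + 1) % len(tasks)
        try:
            trace.append(next(tasks[current]))
            executed += 1
        except StopIteration:
            # The current task finished; continue with whatever remains.
            tasks.pop(current)
            if tasks:
                current %= len(tasks)
    return trace
```

Calling `run([2])` twice yields the same trace, while `run([1])` yields a different interleaving, which is why the replay must deliver each clock interrupt at the recorded instruction count.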
5.4. DMA and Multi-processor
DMA is another non-deterministic event in our system. DMA allows devices within the computer to access system memory independently of the CPU. We have to handle it because it affects the integrity of the data being read and written. This event is very difficult to synchronize between the original execution and the replay of the recording. When DMA occurs, QEMU reads the host disk to emulate the guest system's disk. In practice, the disk access is unlikely to happen at the same time in both runs. However, we aim for a transparent replay system in which the whole execution sequence is the same as in the previous run.
Synchronizing memory state with device state transitions is tricky, so our implementation does not address this problem. Instead, all DMA events are blocked during recording.
With DMA blocked, no remaining factor can affect the correctness of the replay system.
Multi-processor support is the current trend in operating systems. Synchronizing processor resources makes system debugging more complicated than before. In our system, the instruction order of the re-execution must be exactly the same as that of the recorded run. Replaying multi-processor execution transparently is therefore so tricky that we do not solve this problem in our system. We disable multi-processor support in QEMU to keep the replay correct.