• 沒有找到結果。

Table 4: Gas Cost With Formula

Instruction Gas Formula

EXP(x y) if(exp==0)?10:10+(10*(1+log256(xy)))

SHA3(x y) 30+6y

CALLDATACOPY(x y z) 2+3z

CODECOPY(x y z) 2+3z

EXTCODECOPY(x y z a) 700+3z

SSTORE(x y) if(x=0 and y!=0)?20000:5000

LOG0(x y) 375+8y

LOG1(x y a) 2*375+8y

LOG2(x y a b) 3*375+8y

LOG3(x y a b c) 4*375+8y

LOG4(x y a b c d) 5*375+8y

However, some EVM opcode consumes gas based on its parameter (Table 4). For instance, instruction CALLDAT ACOP Y take three parameters t, f and s to copy s bytes from ”calldata” at position f to memory at position t.

2 + 3s (1)

If we want to find out the exact value of the gas consumption on the positive cycle path, we have to know the parameter for all those instruction in order to calculate the answer of the gas formula. So we will convert that gas formula into the symbolic form like the equation above and use stack-based symbolic simulation to get the parameter for the gas formula.

4.5 Stack-based Symbolic Simulation

Symbolic execution is a program analysis technique to determine what input values cause each path of a program to execute. Symbolic execution assumes symbolic values for the program inputs rather than using concrete values as normal execution of the program would. Expressions encountered during symbolic execution are expressed as functions of the symbolic variables. At any point during symbolic execution, program state is described with the value of the program counter and with a symbolic expression known

Algorithm 4: Stack Count and Gas Estimation input : Cycle nodes

output: Stack size usage, gas consumption and gas constraint

1 import opcodeTable;

7 /* Using Algorithm 5’s function stackSimulation */

8 S, ST, MEM, G ← se.stackSimulation(ins, S, ST, MEM);

9 if isinstance(G, str): gasConstraint ← G;

10 else: gasTotal += G;

11 if ’LOG’ in ins:

12 LN ← ins.split(’LOG’);

13 stackSum += (LN + 2);

14 else: stackSum += opcodeTable[ins];

15 return stackSum, gasTotal, gasConstraint;

as the path condition (PC). A PC is a constraint on input values that must be satisfied in order for a program to reach the location that PC corresponds. The set of all possible executions of a program is represented by a symbolic execution tree.

EVM is a stack-based machine so we will use a program to simulate the state of the EVM stack. Meanwhile, the state of memory or storage will all be update if the opcode use it. Inside the control flow graph of the positive cycle path, there are some instructions which load data from the user input. For example, CALLDAT ALOAD will load a user input once at a time. User input is in a format of 16 bits hexadecimal-digit and all the input will be append together as one parameter for the execution. As we know, most of the functions inside a smart contract need parameters to determine the program logic. However, we will not be able to know the exact value of function parameter or user input from EVM opcode during static analysis. Function parameter will be present as some symbolic variable during analysis. We will carry those symbolic variables as a constraint along with the path. When facing a branch in the path, we will need to know the condition that determines which path should the EVM choose to execute. If the condition is a symbolic variable, we will union all the constraints in the cycle path and use

input : instruction I, stack S, storage ST, memory MEM output: stack S, storage ST, memory MEM, gas G

1 import gasTable, z3;

2 def stackSimulation(I, S, ST, MEM ):

3 INS ← I.split(’ ’), OPCODE ← INS[0], G ← 0;

21 oldSize ← index.length // 32;

22 newSize ← ceil32(index+32) // 32;

23 memExtend ← (newSize − oldSize) ∗ 32;

24 MEM.extend([0] ∗ memExtend);

立 政 治 大 學

N a tio na

l C h engchi U ni ve rs it y

SMT solver to find if there is a likely optimal answer that satisfied all the constraints. In the end, apply the answer to those constraint, we will know if the cycle path is feasible and finally we can calculate the stack variation and total gas consumption on this path. We develop a program to symbolically simulate the behavior of the EVM stack by implement every EVM opcode. Stack size and gas consumption are both accumulating bases on rules. Below are some instructions with their rules. (s: Stack, m: Memory, S: Storage, G: gas table, g:gas consumption)

For arithmetic instructions like ADD and SU B, first, we pop the top two items on the stack as parameter x and parameter y. Second, do the arithmetic operation on x and y. Finally, we will get a new value, which is the result of the arithmetic, and push it on the EVM stack. Thus, we can tell that arithmetic instructions normally will decrease the EVM stack size by one. In addition, the gas consumption will add by a number base on table 3.

x = s.pop(0) y = s.pop(0) s0 ← s.push(x + y) g0 ← g + G(ADD)

(2)

x = s.pop(0) y = s.pop(0) s0 ← s.push(x − y) g0 ← g + G(SU B)

(3)

Some instructions such as M LOAD and M ST ORE will store or load data using the EVM memory. For M LOAD, it will load a value v index from memory location p to p + 32, and push it on the EVM stack. The state of the stack will be updated as s0 and

立 政 治 大 學

N a tio na

l C h engchi U ni ve rs it y

the memory will stay the same after executing the M LOAD instruction. M ST ORE has a totally different behavior from M LOAD. It will pop the top two items from the stack, p is the head of the storing index, and v is the value which is going to be stored in the memory. So after execute M ST ORE, a new value v will be stored at memory location p to p + 32. The state of the memory will be updated as m0 and nothing is going to push on the EVM stack. The gas consumption will be change following the rules on table 3.

p = s.pop(0) v = m[p...(p + 32)]

s0 ← s.push(v) g0 ← g + G(M LOAD)

(4)

p = s.pop(0) v = s.pop(0) m0 ← m[p...(p + 32)].insert(v) g0 ← g + G(M ST ORE)

(5)

Instructions for the EVM storage are similar to the memory instructions. SLOAD load the value v at index p, which is pop from top of the stack, from the storage and push it on the stack. SST ORE store v at index p of the storage. So SST ORE will update the state of the EVM storage once it was executed. Gas consumption for executing SLOAD will be concrete value. However, for SST ORE, gas consumption is a formula which is depend on the value of p and v (row 6 on table 4). During the symbolic simulation, we will pop the top two item on the stack represent p and v. If both the value are symbolic value, this formula will be the constraint for the gas consumption. Otherwise, we can calculate the exact value of the gas consumption for SST ORE. The gas consumption for

立 政 治 大 學

N a tio na

l C h engchi U ni ve rs it y

operating the EVM storage is relatively massive than other instructions.

p = s.pop(0) v = S[p]

s0 ← s.push(v) g0 ← g + G(SLOAD)

(6)

p = s.pop(0) v = s.pop(0) S0 ← S[p].insert(v) g0 ← g + G(SST ORE)

(7)

For jumping instructions, we want to know which block will be the destination after the jump. So the first step for executing J U M P and J U M P I will pop the top item from the stack, which is the label LABEL for the destination block. For J U M P I, the condition value will be pop from top of the stack. If the condition value is T RU E, LABEL indicate the destination block. As for F ALSE, the label of jump destination will be the next block after the block of J U M P I. Now we know the label of the next block, the simulation will be relocating to the new block after executing jump instruction.

label = s.pop(0) label0 ← label g0 ← g + G(JU M P ) (label : label of node block)

(8)

立 政 治 大 學

N a tio na

l C h engchi U ni ve rs it y

label = s.pop(0) condition = s.pop(0) if (condition = true)label0 ← label if (condition = f alse)next label ← label g0 ← g + G(JU M P I)

(9)

Other important instruction such as P U SH, P OP , DU P ∗ and SW AP ∗ will modified the state of the stack once it was executed. EVM is a state machine, so it rely on those instruction to shift the value on the stack for other instructions. For example, as mentioned early, CALLDAT ALOAD will load a user input during execution. If the user input data is being use multiple times, EVM will use DU P ∗ or SW AP ∗ to get the data to the top of the stack for further execution.

s0 ← s.push(X) g0 ← g + G(P U SH)

(10)

s0 ← s.pop(0) g0 ← g + G(P OP )

(11)

v = s.idx(∗) s0 ← s.push(v) g0 ← g + G(DU P ∗)

(12)

立 政 治 大 學

N a tio na

l C h engchi U ni ve rs it y

v = s.idx(∗) s.idx(∗) = s.idx(0) s.idx(0) = v g0 ← g + G(SW AP ∗)

(13)

5 Experiment

Despite the example provided in the last section, there are still lots of open source smart contract on blockchain that everyone can reach. Once a smart contract was mined by miners, the compiled bytecode will be recorded on the block. By using blockchain platform such as Etherscan, people can easily find those smart contract. Etherscan allows smart contract creator to upload their source code and the authenticity of code will be verified automatically. Therefore, all those smart contracts with additional information will be our experimental data. As mentioned in our analysis process, Solidity Remix will be our development tool and compiler due to the variety of information provided and fully functional it is.

5.1 Source Code Fetching

In order to get all the source code on the web-based blockchain explorer, we develop a content crawler, written in Python, to accelerate the download procedure. All the source code will be download and store in our database. The reachable smart contracts on Etherscan are limited, but once in a while, the platform will update block information and the new smart contract will be download simultaneously by the crawler.

相關文件