智能合約堆疊溢位偵測與靜態分析 - 政大學術集成
全文
(2) Static Stack Overflow Detection on Smart Contracts Chung Yu Huang August 26, 2018. 立. 政 治 大. ‧. ‧ 國. 學. n. er. io. sit. y. Nat. al. Ch. engchi. i n U. v. DOI:10.6814/THE.NCCU.MIS.024.2018.A05.
(3) Abstract Decentralized cryptocurrencies are based on a distributed ledger (a publiclyreadable record) of a sequence of transactions. Recently, applications based on blockchain technologies have grown fast in many fields in our life, not only in technical industries but also in others such as financial industries. Smart contracts are programs that can be triggered by blockchain transactions. The transaction logic is coding inside the smart contract and can be executed automatically. Transactions will be kept on blockchain and cannot be modified by others. However, just like other programs, there exist some dangerous actions that can make the smart. 政 治 大. contract vulnerable. It is essential for having a rigorous approach to checking the. 立. correctness of smart contracts. In this work, we investigate static analysis tech-. ‧ 國. 學. niques to detect vulnerabilities that could be exploited by malicious executions at runtime. Mainly, we focus on unexpected exceptions that occur inside a smart con-. ‧. tract, such as EVM stack overflow or massive consumption of gas, which may lead to abnormal termination or abortion of contract execution. We target on contract. y. Nat. sit. opcodes and build instruction level symbolic execution for gas consumption and. al. er. io. stack simulation. The analysis process consists of control flow graph construction. v i n detect cycles that may raise C stack size and generateU h e n g c h i their path constraints for execution accordingly. By solving the path constraints, we identify the inputs to trigger n. with gas and stack status associated blocks. We then apply depth-first search to. the execution and detect potential vulnerabilities or massive gas consumption. Our model checking approach benefits from soundness from formal reasoning, as well as automation from systematic symbolic execution. We report our analysis results against various contracts on Etherscan.. DOI:10.6814/THE.NCCU.MIS.024.2018.A05.
(4) Contents Abstract 1 Introduction. 1. 2 Related works. 3. 2.1. Blockchain and Smart Contract Applications . . . . . . . . . . . . . . . . .. 3. 2.2. Smart Contract Security . . . . . . . . . . . . . . . . . . . . . . . . . . . .. 3. 2.3. Security Issue Detection . . . . . . . . . . . . . . . . . . . . . . . . . . . .. 5. 3 Motivating Example. 立. 政 治 大. 7. Cycle Detection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. 3.2. Stack Simulation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. 3.3. Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11. ‧. ‧ 國. 學. 3.1. 4 Methodology. 9 9. Nat. y. 13. 4.2. Control flow graph construction . . . . . . . . . . . . . . . . . . . . . . . . 16. 4.3. Stack overflow detection . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18. 4.4. Gas Estimation. 4.5. Stack-based Symbolic Simulation . . . . . . . . . . . . . . . . . . . . . . . 23. al. er. sit. Contract opcode extraction . . . . . . . . . . . . . . . . . . . . . . . . . . 13. io. 4.1. n. v i n C . . . . . . . . . . U. . . . . . . . . . . . . . . . . . . . . . .h engchi. 5 Experiment. 22. 30. 5.1. Source Code Fetching . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30. 5.2. Source Code Comparison . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30. 5.3. Compile Result . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31. 6 Conclusions and Future Work. 33. References. 35. DOI:10.6814/THE.NCCU.MIS.024.2018.A05.
(5) List of Figures 1. CFG of cycle path . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10. 2. Smart Contract Analyze Process . . . . . . . . . . . . . . . . . . . . . . . . 14. 立. 政 治 大. ‧. ‧ 國. 學. n. er. io. sit. y. Nat. al. Ch. engchi. i n U. v. DOI:10.6814/THE.NCCU.MIS.024.2018.A05.
(6) List of Tables 1. Symbolic Simulation on Stack . . . . . . . . . . . . . . . . . . . . . . . . . 12. 2. Instruction set . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15. 3. Gas Cost . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19. 4. Gas Cost With Formula . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23. 5. Tested Smart Contracts from Ethereum Blockchain . . . . . . . . . . . . . 31. 6. Tested Smart Contracts from III . . . . . . . . . . . . . . . . . . . . . . . . 33. 7. Maximum Gas Consumption for Smart Contracts from Ethereum Blockchain 34. 立. 政 治 大. ‧. ‧ 國. 學. n. er. io. sit. y. Nat. al. Ch. engchi. i n U. v. DOI:10.6814/THE.NCCU.MIS.024.2018.A05.
(7) 1. Introduction. The idea of smart contracts was first mentioned in an article written by Nick Szabo [1] in 1997. Adding an additional software layer between clients and blockchain, a distributed ledger and publicly-readable record of a sequence of transactions, which directed the requests of client to some scripts, perform the logic needed to provide a various aspect of service, are appealing in many categories, especially in those which require transfers of money between accounts. Ethereum is a decentralized platform that runs smart contracts, with a Turing-complete. 政 治 大 from non-termination. The most widely used is a high-level programming language for 立 virtual machine which charges clients for contract running times to protected program. ‧ 國. 學. developing a smart contract, called Solidity[2]. Smart Contract and their effects are recorded in the blockchain history, and they can be used to encode or execute agreements. ‧. between parties. For example, an organization invoking a smart contract could cause cryptocurrency to be transferred to another organization or could record a state change. y. Nat. sit. which makes the other party eligible to invoke other transactions.. n. al. er. io. There are many platforms for smart contract available other than Ethereum such as. i n U. v. Counterparty[3], a platform embeds its data into Bitcoin transactions. Stellar[4], features. Ch. engchi. a public blockchain with its own cryptocurrency, governed by a consensus algorithm. Monax supports the execution of Ethereum contracts and Lisk[5], a public blockchain with a delegated proof-of-stake consensus mechanism. While Etherum’s smart contracts have reached tens of thousands of contracts with billions of dollars-worth of virtual coins, enhancing security of smart contracts is of paramount importance. Due to the openness of blockchain and the smart contract, potentially dangerous compositions of trusted and untrusted code may occur when anonymous users or programs call into the public methods of other applications. The Etherum community has been focusing on systematic mechanisms preventing the similar errors. Furthermore, a contract may leak money in corner cases or encounter some logic errors in encoding state machines. Furthermore, Ethers may have lost due to those unex-. DOI:10.6814/THE.NCCU.MIS.024.2018.A05 1.
(8) pected exceptions. Miners serially execute smart contracts before appending them to the blockchain. Some malicious behavior will make use of the intermediate state the caller is in which may not be immediately apparent when writing the contract. Such scenario is call reentrancy that is a well-known vulnerability with TheDao hack[6], where the attacker exploited the reentrancy vulnerability to steal over 3,600,000 Ether, or 60 million US Dollars at the time the attack happened. Therefore, to better understand the behavior of a smart contract before running the program, reduce the risk of loss, we investigate static analysis and model checking tech-. 政 治 大. niques in this part with the aim of providing a sound approach to ensure the correctness of smart contracts at the program level. Specifically, we propose the very first automatic. 立. model checking approach that combines static flow analysis and automata-based numer-. ‧ 國. 學. ical and string analysis techniques directly applied on smart contract source codes and opcodes. Using the model, we can either calculate the stack size variation or the gas con-. ‧. sumption during the execution. From the calculating result, we can tell if the program. sit. y. Nat. contained unexpected exceptions, such as stack overflow or block gas exceeding the upper. io. er. limit , before executing the smart contract. We believe with more information, more confidently the user will feel when using the smart contract.. n. al. Ch. i n U. v. The remainder of the paper is organized as follows. In Section 3, we list some recent. engchi. researches on the blockchain. From those previous works, we found there were still many issues about blockchain security. We will use a short example to illustrate the static analysis procedure we proposed for smart contract vulnerability detection in Section 4. Section 5 describes the process we proposed to convert the Solidity grammar into a more regular form. In Section 6, we present the result by using the analyze process mentioned in section 5 on some source code we gathers from the blockchain network. Finally, Section 7 presents the conclusions and outlines ideas of future work.. DOI:10.6814/THE.NCCU.MIS.024.2018.A05 2.
(9) 2. Related works. 2.1. Blockchain and Smart Contract Applications. According to Bartoletti’s et al.[7] research, they offer the first systematic exposition of the security vulnerabilities of Ethereum and its high-level programming language, Solidity. The semantics of Solidity and the intuition of programmers being misalignment seems to be the major vulnerabilities in developing smart contract. That’s because Solidity implements some of these features in an unusual way.. 治 政 which organized smart contract into five categories, reflects 大 their application domain. In 立 financial category, program manage, gather or distribute money. For instance, ColorCoins In addition, from their analysis, usage of smart contracts, proposing a classification. ‧ 國. 學. [8], it certifies the ownership of asset, endorse its value, and keep track of trades on Bitcoin. The Ethereum DAO[9] project create crowdfunding services, gathering money. ‧. from investors so users can fund projects.. sit. y. Nat. Contracts in notary category take advantage of the immutability of the blockchain. io. er. to store some data persistently or certify their ownership. For instance, Monegraph[10]. al. allow clients to declare copyrights on photos or music. Other categories like game, gath-. n. v i n ers contracts which implement C some games which mix chance and skill. On the other hengchi U hand, wallet enable users to send transactions and manage money all together in just one application. And finally, library implement general-purpose operations to be used by other contracts. Ethereum are mainly focusing on financial contracts. Mostly, the design pattern of financial contract are token, authorization, and time constraint.. 2.2. Smart Contract Security. Due to the openness of Ethereum, potentially dangerous compositions of trusted and untrusted code may be occurred when anonymous users or programs call into the public methods of other programs. Delmolino et al.[11] summarized some common programming errors noticed in smart. DOI:10.6814/THE.NCCU.MIS.024.2018.A05 3.
(10) contracts. For example, a contract may leak money in corner cases or encounter some logic errors in encoding state machines. Furthermore, Ethers may have lose due to those unexpected exceptions. Smart contracts are serially executed by miners before appending them to the blockchain. None concurrency limits system throughput and fails to exploit recent concurrent multicore and cluster architectures. Therefore, Dickerson et al.[12] propose to allow miners to execute contract codes in parallel by adapting techniques from Software Transactional Memory (STM)[13]. Luu et al.[14] identify some security vulnerabilities and pitfalls in the Ethereum smart. 政 治 大. contract which allow malicious miners or users to exploit and gain profit. First, in blockchian system, miner who mines the block decide the order of these transactions.. 立. As a result, the final state of a contract depends on how the miner orders the transactions. ‧ 國. 學. invoking it. Thus, the state of the contract that invoke at and the actual state execution happens may be inconsistence. Such contracts are call as transaction ordering dependent. ‧. (or TOD) contracts. A benign invocation to the contract only cause an unexpected result,. sit. y. Nat. but a malicious one can exploit the TOD contracts to gain more profits, even steal money.. io. er. In addition, timestamp-dependent problem happened when miner try to modify the block timestamp by changing the time on its local mining machine to be a specific value. n. al. Ch. i n U. v. within 900 seconds which influences the value of the timestamp-dependent condition inside. engchi. the contract and somehow favor the miner.. Lastly, when a contract calls another, the current execution waits for the call to finish. Some malicious behavior will make use of the intermediate state the caller is in which may not be immediately obvious when writing the contract. Such scenario is call reentrancy that is a well-known vulnerability with TheDao hack, where the attacker exploited the reentrancy vulnerability to steal over 3,600,000 Ether, or 60 million US Dollars at the time the attack happened.. DOI:10.6814/THE.NCCU.MIS.024.2018.A05 4.
(11) 2.3. Security Issue Detection. In order to detect those security issues on Ethereum smart contracts, Luu et al. build a symbolic execution tool called Oyente[15] to find potential bugs pre-deploy. Oyente ran on 19,366 smart contracts and found that 8,833 contracts potentially have bugs. These contracts currently have a total balance of about 3,068,654 million Ethers, approximately equivalent to 30 million USD. From a different perspective, K. Bhargavan et al.[16] outline a framework to analyze and formally verify Ethereum smart contracts using F*[17], a functional programming. 政 治 大 system to generate automated queries for a SMT solver that can then statically verify 立 language targeted at program verification. The F* language comes with a rich type. such properties of contracts.. ‧ 國. 學. Other than Oyente, there are many alternative tools for smart contract security check-. ‧. ing. Motivated by the security issues and recurring financial losses in smart contracts, A.Dika[18] provide a taxonomy of all known security issues and security code analy-. y. Nat. sit. sis tools used to identify smart contract vulnerabilities. Based on previous attacks on. er. io. Ethereum smart contracts, they also categorizes all known vulnerabilities within their ar-. al. n. v i n C hanalysis tools onUEthereum. the investigation of security code engchi. chitectural and severity level. By assessing their effectiveness and accuracy, they conduct They analyze four ma-. jor Ethererum smart contracts security detecting tools, Oyente, Securify[19], Remix, and SmartCheck[20]. We can tell from the research result that there are overall inconsistencies between the tools on different security properties. SmartCheck obviously is the most effectiveness tool. Whereas in terms of accuracy, Oyente give the best performance. Also, they propose future improvements within the user interfaces, interpretation of results, and additional vulnerability checks. There are several other approaches to analyze security of smart contracts. Not only using source code, some existing tools turn to find out the problems from EVM bytecode. Porosity[21] is a decompiler for EVM bytecode that generates readable Solidity syntax contracts. Enabling static and dynamic analysis of such compiled contracts.. DOI:10.6814/THE.NCCU.MIS.024.2018.A05 5.
(12) KEVM[22], the first fully executable formal semantics of the bytecode in which smart contracts are executed. The K framework, a semantic definition of EVM for use in verifying properties of smart contracts, not only passes the official test suite for EVM implementations, but also reveals ambiguities and potential sources of error in the existing on-paper formalization of EVM semantics. These properties make it an ideal formal reference implementation. S.Amani et al.[23] extend an existing EVM formalisation in Isabelle/HOL by a sound program logic at the level of bytecode. They target unstructured bytecode rather than. 政 治 大. a high-level programming language. They structure bytecode sequences into blocks of straight-line code and create a program logic to reason about these.. 立. To promote the development of secure smart contracts, A.Mavridou et al.[24] have. ‧ 國. 學. created the FSolidM framework, that allows developers to define contracts as finite state machines (FSMs)[25] with strict and clear semantics. It provides an user friendly editor. ‧. for specifying FSMs, a code generator for creating smart contracts, and a set of plugins. sit. y. Nat. that developers may add to their FSMs to enhance security and functionality. There goal. io. er. is to helps developers to create smart contracts that are secure by design. For some popular scripting languages used in existing cryptocurrencies such as Bitcoin,. n. al. Ch. i n U. v. Nxt[26] and Ethereum, Seijas, Pablo Lamela et al.[27] survey different approaches, and. engchi. give an overview of critiques of existing languages. They also cover technologies that might be used to strengthen extensions and innovations in scripting and contracts, including technologies for verification, such as zero knowledge proofs[28], proof-carrying code[29] and static analysis, as well as approaches to making systems more efficient, e.g. Merkelized Abstract Syntax Trees[30]. Cryptocurrency is a virtual form of money but it create a total different transacting method due to the power of blockchain. However, same as concrete currency, trust between user and blockchain system has to be established to make sure people won’t have to face the situation that their money were stolen due to system vulnerability. Public blockchain may be the most used chain in recent ecosystem because of the ability for. DOI:10.6814/THE.NCCU.MIS.024.2018.A05 6.
(13) every person to join the network. To use public chain, user has to obey those basic rules created by the creator of public chain. So there might be some limitation to prevent any unintentional mistakes or malicious attack. The transaction amounts and total capital will be rich in public blockchain. That is why hackers might try to get benefit from it. Rather using public chain, an enterprise or a group of company might be interested to using private chain as there blockchain infrastructure. Unlike public chain, private chain only allowed specific user, such as company’s employee, to access the network. Transaction and total capital will be relatively low because of less participants, but the flexibility. 政 治 大. is higher so there might be less restriction. Although more customizable, the problem of exceeding the limitation of blockchain system might happened.. 立. Based on previous research, we want to check whether a smart contract contains any. ‧ 國. 學. positive cycle that might cause EVM stack to overflow during execution. EVM stack has 1,024 frames. When EVM execute an instruction, it will push or pop certain instructions.. ‧. By keep executing the part of positive cycle in a smart contract, stack size will eventually. sit. y. Nat. exceed the upper bound and cause an exception which makes smart contract terminate. io. er. with all data rolling back to initial state. But, to make a stack overflow, we need enough gas to consume for the execution. During creation of private chain, gas limit can be. n. al. Ch. i n U. v. customize to any possible number. So in a private chain environment with enormous gas. engchi. limit, it is possible that stack overflow occur in a smart contract. So our approach is to use static analysis strategies to find out the stack increment and estimate gas consumption when execute positive cycle.. 3. Motivating Example. We propose automatic model checking techniques on smart contracts. Our model checking approach benefits from soundness like formal reasoning approaches, as well as automation like symbolic execution approaches. This is achieved by conducting automatic reachability analysis on the opcode-level control flow graphs along with arithmetic and string automata.. DOI:10.6814/THE.NCCU.MIS.024.2018.A05 7.
(14) 1 pragma solidity ^0.4.10; 2 3 contract SimpleMember { 4. struct member {. 5. uint id ;. 6. string name ;. 7. }. 8. mapping ( address = > member ) memberlist ;. 9. address [] public members ;. 10. function addMember ( address addr , uint memberId , string memberName ). 政 治 大. public {. member storage m = memberlist [ addr ];. 12. m . id = memberId ;. 13. m . name = memberName ;. 14. members . push ( addr ) ;. ‧ 國. 立. 學. 11. }. 16. function removeMember ( uint memberId , uint len ) public {. ‧. 15. 18. if ( i > members . length ) { break ;. n. al. er. io. 19. sit. for ( uint i = 0; i < len ; i ++) {. 20. }. 21. else {. y. Nat. 17. Ch. engchi. i n U. v. 22. address memberFind = members [ i ];. 23. if ( memberlist [ memberFind ]. id == memberId ) {. 24. memberFind . transfer (100) ;. 25. delete members [ i ];. 26. }. 27. assembly {. 28. timestamp. 29. }. 30. }. 31 32. } }. 33 }. DOI:10.6814/THE.NCCU.MIS.024.2018.A05 8.
(15) The automata associated with each program point characterize all potential values of target variables on executions up to that program point. The reachability analysis may not terminate when such states are infinite, but when it terminates, the automata represent a fixpoint of reachable states on values of target variables. We can then apply automata operations for checking properties that are related to target variables. Compared to symbolic executions on observable paths, our objective is to have a sound approach to compute all potential status and behaviors of smart contracts. This is achieved by conducting the reachability analysis on the control flow graphs that abstract. 政 治 大. all potential execution paths. This challenges our opcode analysis to having a sound approximation on flows but tight enough to prevent a high false positive rate.. 立. In this section, we will use a short example to demonstrate how we analysis smart. ‧ 國. 學. contract. Contract SimpleM ember is an example code written in high-level language, Solidity.. ‧. Cycle Detection. sit. y. Nat. 3.1. er. io. For the first step, we compile above example source code into opcode and use the opcode to. al. v i n Cinhthe example source reveal that one cycle is contained e n g c h i U code, starting from line 21 to 30. n. construct control flow graph. After running the analysis program, our analyzer’s algorithm. Figure 1 is the output of the analyzer, which is a control flow graph of the cycle part. A cycle may cause some unexpected things to happen during the execution. That is to say, if the code of that cycle part is executing multiple times, an exception might appear and make the whole process stopped. We will use the control flow graph of the cycle path for further analyze and finally we will calculate the stack size usage and gas consumption to confirm our hypothesis.. 3.2. Stack Simulation. So as to know the precise value of the stack size variation and the gas consumption, we simulating stack state in our process to find the answer. By implement all the EVM. DOI:10.6814/THE.NCCU.MIS.024.2018.A05 9.
(16) 立. 政 治 大. ‧. ‧ 國. 學. n. er. io. sit. y. Nat. al. Ch. engchi. i n U. v. Figure 1: CFG of cycle path. DOI:10.6814/THE.NCCU.MIS.024.2018.A05 10.
(17) instructions in our program’s algorithm, the final results of simulation show that executing the cycle part will increase the stack size by one and consume 1272 gas, per cycle. Table 1 is the result of symbolic simulation on the EVM stack. We can see there are two path constraints for the loop path. If all the constraints have a possible answer, this loop path will have a chance of being executed during a transaction. For unknown parameter one, if it matches the function hash of function removeM ember, then the execution path will jump to the cycle path. Simulate through the cycle path, we encounter the second unknown parameter, which is the parameter len of function removeM ember. Parameter. 政 治 大. len determines the time of iteration of the loop. After simulating the cycle path for 1024 times, which make the EVM stack overflow, the path constraint with parameter len is. 立. ((1023 < P arameter2) == F alse) == F alse. That is to say, if we set the parameter len. ‧ 國. 學. over 1023, the EVM stack will overflow and thus cause the execution to stop.. ‧. 3.3. Results. Nat. sit. y. According to Ethstat.net [31] at the time of writing, current gas limit of each block is. er. io. approximately 8,000,000 gas. So if we set the function parameter of the cycle we found. al. v i n code, an exception of exceeding C gashlimit is going to happen. From another perspective, if engchi U n. to a number over 6290, which means there are more than 6290 members using this source. this situation is happening in a private blockchain environment, according to the creator. of the private blockchain, the block gas limit could be set to a vast number way over 8 million to prevent gas limitation assertion. Thus, setting the function parameter len of function removeM ember in the example SimpleM ember to a number over 1024 will cause the program to execute cycle part over 1024 times, which will trigger stack overflow exception, base on our analyze results. So that is an example to demonstrate how a few lines of code inside a smart contract make an impact on the execution and the result.. DOI:10.6814/THE.NCCU.MIS.024.2018.A05 11.
(18) Table 1: Symbolic Simulation on Stack Opcode PUSH 60 MSTORE PUSH 0 CALLDATALOAD PUSH 10000000. . . SWAP1 DIV AND. Constraints X X X X X X X X. DUP1. X X 立 X. ... X X X .... y. n. al. [’(FFFFFFFF &(Parameter/1000...))’] ... [36; 4;...] [’Parameter2’;36;...] [36; ’Parameter2’;...] .... sit. X. er. ‧ 國 io. ... SWAP1 CALLDATALOAD SWAP1 ... tag 15 JUMPDEST DUP3 DUP3 LT ISZERO PUSH [tag] 16 JUMPI ... POP POP PUSH [tag] 15 JUMP. ‧. X (5CB2384C= (FFFFFFFF& (Parameter/1000... )))==True. Nat. tag 2 JUMPDEST. 政 治 大. 學. PUSH 5CB2384C EQ PUSH [tag] 2 JUMPI. Stack [60] [] [0] [’Parameter’] [1000...; ’Parameter’] [’Parameter’; 1000...] [’(Parameter/1000...)’] [’(FFFFFFFF& (Parameter/1000...))’] [’(FFFFFFFF& (Parameter/1000...))’;...] [5CB2384C;...] [’(5CB2384C=(FFFFFFFF...))’;...] [2;...] [’(FFFFFFFF &(Parameter/1000...))’]. Ch. engchi. X X X X X X ((0<Parameter2) ==False)==False ... X X X X. i n U. v. [0; 0;...] [’Parameter2’;0;...] [0; ’Parameter2’;0;...] [’(0<Parameter2)’; 0;...] [’((0<Parameter2)==False)’; 0;...] [22; ’((0<Parameter2)==False)’;...] [0; 0;...] ... [0;0;...] [0;...] [21;0;...] [0; 1;...]. DOI:10.6814/THE.NCCU.MIS.024.2018.A05 12.
(19) 4. Methodology. The EVM opcode runs on a stack-based virtual machine. Unlike ArmV7 instructions for iOS mobile applications, the EVM uses a fixed size stack of 1,024, rather than registers. We need to keep track of values in the stack to model the flow precisely. In the first phase, we gather some smart contracts from Ethereum blockchain and compile them into EVM opcode. Once we have the EVM opcode, we investigate static flow analysis techniques and develop our own analysis tool to generate the control flow graph directly from the EVM opcode. EVM opcode has around 70 kinds of instructions. Using the control flow graph,. 政 治 大 If the program contains any positive cycle, the EVM opcode and the control flow graph 立. we will traverse through all the nodes inside the graph to find the potential positive cycle.. ‧ 國. 學. fragment will be extracted for further analysis, such as gas consumption calculation and symbolic simulation on the stack. Note that the graph may not be acyclic since jumps. ‧. may result in cycles due to recursive calls or loop.. To know more details about the process of EVM opcode execution, we keep track of. y. Nat. sit. values in the stack to model the flow precisely. In the first phase, we gather some smart. n. al. er. io. contracts from Ethereum blockchain and compile them into EVM opcode. With the. i n U. v. EVM opcode, we investigate static flow analysis techniques and develop our analysis tool. Ch. engchi. to generate the control flow graph directly from the EVM opcode. Figure2 is our analysis process. In the following content, we will explain the idea of each analysis procedure.. 4.1. Contract opcode extraction. Most of the information about Ethereum blockchain can be found on Etherscan[32]. It is a powerful blockchain explorer and platform which contain lots of valuable information about Ethereum. For example, recent market capital, transactions status, blocks detail and smart contract source code. It will be our data sources for the experiment part. With the smart contract source code, our first step is to compile them into the EVM opcode. There are two major approaches to convert source code into bytecode or EVM opcode. The first one is using Solidity compiler(solc)[33], all you need to do is downloading the. DOI:10.6814/THE.NCCU.MIS.024.2018.A05 13.
(20) 立. 政 治 大. ‧ 國. 學 ‧. Figure 2: Smart Contract Analyze Process. y. sit. Nat. package from the Internet and install it on your computer.. io. er. After installed, use solc command to compile smart contract into the EVM opcode. With argument –asm or –asm-json, you can generate the EVM opcode from the smart. n. al. Ch. i n U. v. contract source code using the solidity compiler. If you want to get more information. engchi. about this source code, just use the help argument for more details about the compiler. However, we prefer to use another option in our analysis process. Without installing anything, just a web browser with Solidity Remix IDE[34], you can compile it right after finishing writing the source code and deploy it onto either the testing, public or private blockchain. There are some unusable parts inside the output from Solidity Remix IDE, which is called the pre-loader of the smart contract. Those opcode will be trigger during the deploy of the smart contract. Because we do not deploy the smart contract during our analysis, so those pre-loader opcodes will be ignore. Therefore, we will do some preprocessing to extract the most important fragment of the EVM opcode from the result of the compiled opcode. EVM opcode is a set of operations that tell us how the logic of source code work. In. DOI:10.6814/THE.NCCU.MIS.024.2018.A05. 14.
(21) Table 2: Instruction set. 政 治 大. n. Ch. δ α 1 1 4 0 1 1 0 1 0 1 0 1 0 1 0 1 1 0 1 1 2 0 2 0 1 1 2 0 1 0 2 0 0 1 0 1 0 1 0 0 0 1 * *+1 *+1 *+1 2 0 3 0 4 0 5 0 6 0 3 1 7 1 7 1 2 0 6 1 NA NA 1 0. y. sit. io. Mnemonic EXTCODESIZE EXTCODECOPY BLOCKHASH COINBASE TIMESTAMP NUMBER DIFFICULTY GASLIMIT POP MLOAD MSTORE MSTORE8 SLOAD SSTORE JUMP JUMPI PC MSIZE GAS JUMPDEST PUSH* DUP* SWAP* LOG0 LOG1 LOG2 LOG3 LOG4 CREATE CALL CALLCODE RETURN DELEGATECALL INVALID SELFDESTRUCT. er. Nat. al. α 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 0 1. ‧. ‧ 國. 立. δ 0 2 2 2 2 2 2 2 3 3 2 2 2 2 2 2 2 1 2 2 2 1 2 2 0 1 0 0 0 1 0 3 0 3 0. 學. Mnemonic STOP ADD MUL SUB DIV SDIV MOD SMOD ADDMOD MULMOD EXP SIGNEXTEND LT GT SLT SGT EQ ISZERO AND OR XOR NOT BYTE SHA3 ADDRESS BALANCE ORIGIN CALLER CALLVALUE CALLDATALOAD CALLDATASIZE CALLDATACOPY CODESIZE CODECOPY GASPRICE. engchi. i n U. v. DOI:10.6814/THE.NCCU.MIS.024.2018.A05 15.
(22) addition, EVM is a stack machine. Every instruction will use the identical stack to push or pop item to calculate to an exact value. After executing an instruction, the stack state will be changed depending on each instruction. Table 2 show how each instruction change the state of the stack size. Follow the instruction set, we can find out how many items will be place on stack (α) or remove (δ) when the instruction executed. For example, ADD instruction will pop two times from the stack, using the first and second value on stack to calculate the sum of those two value, and push the result answer to the stack. So after the EVM executed the instruction ADD, the stack size will decrease by one. Take another. 政 治 大. instruction for example, DU P ∗ will pop * items from stack, which depend on the * value. If the * value equal to three, it will pop the top three value from stack and duplicate the. 立. third item. Then push all the value back to the stack along with the duplicated item.. ‧ 國. 學. Therefore, DU P will increase the stack size by one. For further analysis, we want to transform those program logic into control flow graph, so by traversing through it, we can. ‧. construct the flow of the source code and tell if any malicious things are hiding inside the. y. sit. Control flow graph construction. al. er. io. 4.2. Nat. source code.. n. v i n C h It starts toUexecute the EVM opcode from the EVM opcode is executing sequentially. engchi top of the opcode file. When it encounters a jump instruction, it will change to another position. So after having the EVM opcode, we will load instruction from the beginning of the opcode file once at a time and build the graph. Eventually, it will be able to describe how this smart contract works. During analysis, we will read instructions from a preprocessed file and divide them into groups following these rules: 1. JUMPDEST indicates a jump destination and the beginning of a block. 2. JUMP, JUMPI, STOP, RETURN, REVERT and INVALID indicate the end of a block. 3. All remaining instructions.. DOI:10.6814/THE.NCCU.MIS.024.2018.A05 16.
(23) Algorithm 1: CFG Construction input : Opcode File in txt format output: CFG nodes and edges in list format. 6 7 8 9 10. 12 13. 15 16. 18. 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36. al. n. 20. io. 19. Nat. 17. ‧. 14. 學. 11. 立. 政 治 大. y. 5. sit. 4. er. 3. def cfgConstruction(inputOpcode): N ← node list, E ← edge list, C ← node content ; tagNum ← 0, prevIns ← ’ ’, prevTag ← 0 ; fromJumpi ← False ; blockIns ← [’tag’, ’JUMP’, ’JUMPI’, ’STOP’, ’REVERT’, ’INVALID’, ’RETURN’]; open(inputOpcode, ’r’) as F: for idx, ins in F : S ← ins.split(’ ’) ; if S[0] in blockIns: if S[0] == ’tag’ : if C == ’ ’ : tagNum ← S[1]; C.append(ins); else: N.append((tagNum, ’label’: C)); if fromJumpi == True: E.append((prevTag, tagNum)); fromJumpi ← False; C ← ’ ’; C.append(ins); prevTag ← tagNum; tagNum ← S[1]; E.append((prevTag, tagNum)); if S[0] == ’JUMP’ or ’JUMPI’ : C.append(ins); N.append((tagNum, ’label’: C)); jumpTag ← prevIns[2]; E.append((tagNum, jumpTag)); C ← ’ ’; if S[0] == ’JUMPI’ : prevTag ← tagNum; fromJumpi ← True; tagNum ← idx + 10000; else: C.append(ins); return N, E;. ‧ 國. 1 2. Ch. engchi. i n U. v. DOI:10.6814/THE.NCCU.MIS.024.2018.A05 17.
(24) Our algorithm for constructing the control flow graph (1) is based on this rules. In the algorithm, in line 5, we use a list to record that special instruction. If the instruction being read is one of the special instruction, for example (line 24 to line 33), rule two, the JU M P or JU M P I instruction will mark ask the end of a execution block, thus a new block will be appended to the list of node with label (opcodes for the block) and a new edge which starts from the current block to the jumping destination will also be appended to the list of edge. Another tag f romJU M P I will be marked as true for the instruction JU M P I because it will branch the execution path for the different condition. So after. 政 治 大. executing JU M P I, two new blocks will be appended to the list of the node. One for the truth condition path, and one for the false condition path. In algorithm line 10 to line. 立. 23, JU M P DEST instruction will be the first instruction inside a block. Furthermore,. ‧ 國. 學. if the block is not ended with JU M P or JU M P I, we ill append a new block into the list of node and a new edge from the last block to the current block when we execute the. ‧. JU M P DEST instruction. For none special instruction (rule three), they will be record. sit. y. Nat. and append as the block’s label (line 34 to line 35).. io. er. Using our algorithm we will be able to construct EVM opcode into control flow graph. Inside the graph, EVM opcode is grouping as a block and at the same time, the variation. n. al. Ch. i n U. v. of stack size and the total gas consumption of this instruction group will be calculate and. engchi. recorded underneath the block. How we calculate the stack size and the gas consumption for each block is based on table 2 and table 3. Here we use Graphviz[35], an open source control flow graph construct and visualization module, to generate a control flow graph from EVM opcode. By using this module, all the nodes and edges will be stored in a list data structure individually, and we can plot out the entire graph to check if there is any vulnerability inside it.. 4.3. Stack overflow detection. Using the control flow graph, we want to find out if any positive cycle is contain in the graph. We use the depth-first-search algorithm to traverse through the graph. Depth-first. DOI:10.6814/THE.NCCU.MIS.024.2018.A05 18.
(25) Table 3: Gas Cost. 政 治 大. y. sit. io. n. Ch. er. Nat. al. Mnemonic EXTCODESIZE EXTCODECOPY BLOCKHASH COINBASE TIMESTAMP NUMBER DIFFICULTY GASLIMIT POP MLOAD MSTORE MSTORE8 SLOAD SSTORE JUMP JUMPI PC MSIZE GAS JUMPDEST PUSH* DUP* SWAP* LOG0 LOG1 LOG2 LOG3 LOG4 CREATE CALL CALLCODE RETURN DELEGATECALL INVALID SELFDESTRUCT RETURNDATACOPY. ‧. ‧ 國. 立. Gas Used 0 3 5 3 5 5 5 5 8 8 *f 5 3 3 3 3 3 3 3 3 3 3 3 *f 2 400 2 2 2 3 2 *f 2 *f 2 2. 學. Mnemonic STOP ADD MUL SUB DIV SDIV MOD SMOD ADDMOD MULMOD EXP SIGNEXTEND LT GT SLT SGT EQ ISZERO AND OR XOR NOT BYTE SHA3 ADDRESS BALANCE ORIGIN CALLER CALLVALUE CALLDATALOAD CALLDATASIZE CALLDATACOPY CODESIZE CODECOPY GASPRICE RETURNDATASIZE. engchi. i n U. v. Gas Used 700 *f 20 2 2 2 2 2 2 3 3 3 200 *f 8 10 2 2 2 1 3 3 3 *f *f *f *f *f 32000 700 700 0 700 NA 5000 *f. DOI:10.6814/THE.NCCU.MIS.024.2018.A05 19.
(26) search (DFS) is an algorithm for traversing or searching tree or graph data structures. The algorithm starts at the root node (the entry point of the opcode will be our root node) and explores as far as possible along each branch before backtracking. During the traversal, if any block is visiting more than one time, then there is a cycle in this path. Also, we will accumulate the stack size usage and gas consumption during depth-first-search traversal to calculate if the path is a positive cycle. For instance, if a block’s edge is a backward visit edge, that is a cycle contained in the graph. We will check the accumulate stack size, and if the number is increasing after visit the block of. 政 治 大. the backward edge, then this cycle is a positive cycle, that means traverse this cycle path too many time will make stack overflow happen. On the other hand, if the path is not. 立. a cycle, we will proceed to visit those unvisited block, and in the end, all blocks will be. ‧ 國. 學. visited and checked.. al. y. n. 5. io. 4. sit. 3. Nat. 2. def cycleDetection(N, E ): V ← []; RS ← []; SN ← N[0]; DFS(SN, N, E, V, RS);. er. 1. ‧. Algorithm 2: Cycle Detection input : Two lists; nodes list N, edges list E. Ch. engchi. i n U. v. In the beginning of the algorithm (algorithm 2), we will initiate two lists, visited node and record of the stack. Total five parameter will be use as input of the DFS algorithm (algorithm 3. Here is the details about the DFS positive cycle checking process. We break the process into four steps. 1. Put the block (father block) into the visited list and record stack(line 5 to line 6). 2. Find the next block (child block) that is connected to the father block(line 7 to line 11). 3. If the child block is not inside the visited list, go back and do step one(line 12 to line 14).. DOI:10.6814/THE.NCCU.MIS.024.2018.A05 20.
(27) 4. If the child block is inside the visited list and stack, then a cycle is found. At the same time, sum up the stack size usage, if the number is bigger than the previous number, this path is a positive cycle(line 15 to line 22). Following the above steps, we can detect any positive cycle inside the control flow graph. Using the path, we want to check if it is reachable when the smart contract is executing. Maybe the positive cycle path is not going to be executed over 1024 time due to the limitation of the parameter of the loop condition. Or the gas consumption is going to reach the gas limitation before the EVM stack overflow. As the result, the positive. 政 治 大 more accurate checking on the stack state and gas consumption. 立. cycle path is extracting from the original control flow graph after the DFS checking for a. 9 10 11 12 13 14 15 16 17 18 19 20 21 22. ‧ 國. al. n. 7 8. io. 6. Ch. engchi. y. sit. 5. er. 4. Nat. 3. def depthFirstSearch(SN, N, E, V, RS ): SS ← 0; SI ← 0; GC ← 0; V.append(SN); RS.append(SN); for e in E : if e[0][1] == SN[0] : for n in N : if n[0] == e[0][1] : NN ← n; if NN not in V : if DFS(NN, N, E, V, RS): break; elif NN in RS : TS ← SS; if SS > 0 : DFS(NN, N, E, V, RS); if SS > 1024 : return True; else: return True;. ‧. 1 2. 學. Algorithm 3: Depth-First-Search input : staring node SN, nodes list N, edges list E, visited list V, record stack RS. i n U. v. DOI:10.6814/THE.NCCU.MIS.024.2018.A05 21.
(28) 4.4. Gas Estimation. Gas is the unit in Ethereum ecosystem for counting transaction fee. If someone wants to transfer cryptocurrencies, using a function inside a smart contract will be an option. When triggering a function inside a smart contract, a transaction fee is calculated by gas usage multiply with gas price. Gas price is a custom parameter defined by the user who wants to make a transaction by smart contract. Higher the gas price is, faster the miners will mine the transaction due to higher transaction fees they will get. On the other hand, the gas used by a transaction is determining by the functional complexity. Executing a. 政 治 大. complicated function cost more fees than a short one because EVM calculates the gas by. 立. opcodes involved in the function.. As we know in blockchain system, each block will contain several transactions. Since. ‧ 國. 學. there is an upper bound of total gas used for each block to prevent smart contract con-. ‧. sumes too many gases that cause the total gas used to exceed the upper bound, blockchain system will reject the request to deploy the smart contract to the block. Besides, when. y. Nat. sit. the user wants to use a smart contract to make a transaction, they have to set a gas. er. io. limit, a value of estimation that this transaction might consume. If the actual gas used is. al. n. v i n transaction and still charge theC user fee it cost. h ethentransaction gchi U. larger than the gas limit, EVM will rollback all the status that had changed during this. In most case, the user can set the gas limit to a very high value, so the chance of. outnumbering is relatively low. However, as we discovered, many smart contracts contain a function with a loop inside. Moreover, the condition value which determines the precise iteration of this loop in most of the smart contract is a function parameter or a local variable. Sometimes we do not even know the exact value when we trigger this function. That is to say, if we can estimate how much gas will be used by each function inside the smart contract, we will get a reference gas value when setting the gas limit. Each opcode has its specific gas consumption. Table 3 shows how much gas will be used during executing of each instruction. Most opcode consume a specific number of gas. It is easy to calculate the gas consumption for those opcodes.. DOI:10.6814/THE.NCCU.MIS.024.2018.A05 22.
(29) Table 4: Gas Cost With Formula Instruction EXP(x y) SHA3(x y) CALLDATACOPY(x y z) CODECOPY(x y z) EXTCODECOPY(x y z a) SSTORE(x y) LOG0(x y) LOG1(x y a) LOG2(x y a b) LOG3(x y a b c) LOG4(x y a b c d). Gas Formula if(exp==0)?10:10+(10*(1+log256(xy))) 30+6y 2+3z 2+3z 700+3z if(x=0 and y!=0)?20000:5000 375+8y 2*375+8y 3*375+8y 4*375+8y 5*375+8y. 政 治 大 However, some EVM opcode consumes gas based on its parameter (Table 4). 立. For. instance, instruction CALLDAT ACOP Y take three parameters t, f and s to copy s. ‧ 國. 學. bytes from ”calldata” at position f to memory at position t.. (1). Nat. sit. y. ‧. 2 + 3s. io. er. If we want to find out the exact value of the gas consumption on the positive cycle path,. al. we have to know the parameter for all those instruction in order to calculate the answer. n. v i n So we will C convert that gas formula into the symbolic form like the hengchi U. of the gas formula.. equation above and use stack-based symbolic simulation to get the parameter for the gas formula.. 4.5. Stack-based Symbolic Simulation. Symbolic execution is a program analysis technique to determine what input values cause each path of a program to execute. Symbolic execution assumes symbolic values for the program inputs rather than using concrete values as normal execution of the program would. Expressions encountered during symbolic execution are expressed as functions of the symbolic variables. At any point during symbolic execution, program state is described with the value of the program counter and with a symbolic expression known. DOI:10.6814/THE.NCCU.MIS.024.2018.A05 23.
(30) Algorithm 4: Stack Count and Gas Estimation input : Cycle nodes output: Stack size usage, gas consumption and gas constraint 2 3 4 5 6 7 8 9 10 11 12 13 14. 立. */. 政 治 大. 學. 15. import opcodeTable; def gasEstimation(CN ): S ← stack, ST ← storage, MEM ← memory; stackSum ← 0, gasTotal ← 0, gasConstraint ← ’ ’; open(CN, ’r’) as F: for ins in F : /* Using Algorithm 5’s function stackSimulation S, ST, MEM, G ← se.stackSimulation(ins, S, ST, MEM); if isinstance(G, str): gasConstraint ← G; else: gasTotal += G; if ’LOG’ in ins: LN ← ins.split(’LOG’); stackSum += (LN + 2); else: stackSum += opcodeTable[ins]; return stackSum, gasTotal, gasConstraint;. ‧ 國. 1. ‧. as the path condition (PC). A PC is a constraint on input values that must be satisfied in order for a program to reach the location that PC corresponds. The set of all possible. y. Nat. sit. executions of a program is represented by a symbolic execution tree.. n. al. er. io. EVM is a stack-based machine so we will use a program to simulate the state of the. i n U. v. EVM stack. Meanwhile, the state of memory or storage will all be update if the opcode. Ch. engchi. use it. Inside the control flow graph of the positive cycle path, there are some instructions which load data from the user input. For example, CALLDAT ALOAD will load a user input once at a time. User input is in a format of 16 bits hexadecimal-digit and all the input will be append together as one parameter for the execution. As we know, most of the functions inside a smart contract need parameters to determine the program logic. However, we will not be able to know the exact value of function parameter or user input from EVM opcode during static analysis. Function parameter will be present as some symbolic variable during analysis. We will carry those symbolic variables as a constraint along with the path. When facing a branch in the path, we will need to know the condition that determines which path should the EVM choose to execute. If the condition is a symbolic variable, we will union all the constraints in the cycle path and use. DOI:10.6814/THE.NCCU.MIS.024.2018.A05 24.
(31) Algorithm 5: Stack Simulation input : instruction I, stack S, storage ST, memory MEM output: stack S, storage ST, memory MEM, gas G. 5 6 7 8 9 10 11 12 13 14. 16 17. 19 20. 22. 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45. al. n. 24. io. 23. Nat. 21. ‧. 18. 學. 15. 立. 政 治 大. y. 4. sit. 3. er. 2. import gasTable, z3; def stackSimulation(I, S, ST, MEM ): INS ← I.split(’ ’), OPCODE ← INS[0], G ← 0; switch OPCODE do case PUSH do if INS[1] == ’[tag]’ : pushValue ← INS[2]; else: pushValue ← INS[1]; S.insert(0, pushValue); G += gasTable[’PUSH’]; break; case MSTORE do if S.length > 1 : index ← S.pop(0); value ← S.pop(0); if isinstance(index, int): sol ← Solver(); sol.add(index); index ← sol.check(); oldSize ← index.length // 32; newSize ← ceil32(index+32) // 32; memExtend ← (newSize − oldSize) ∗ 32; MEM.extend([0] ∗ memExtend); for i in range(31, -1, -1): MEM[index + i] ← value; G += gasTable[’MSTORE’]; else: Error: stack underflow; break; case SSTORE do if S.length > 1 : index ← S.pop(0); value ← S.pop(0); if isinstance(index, str): sol ← Solver(); sol.add(INDEX); index ← sol.check(); ST[index] ← value; G += gasTable[’SSTORE’]; else: Error: stack underflow; break; DOI:10.6814/THE.NCCU.MIS.024.2018.A05 /* rest of the instructions */ return S, ST, MEM, G; 25. ‧ 國. 1. Ch. engchi. i n U. v.
(32) SMT solver to find if there is a likely optimal answer that satisfied all the constraints. In the end, apply the answer to those constraint, we will know if the cycle path is feasible and finally we can calculate the stack variation and total gas consumption on this path. We develop a program to symbolically simulate the behavior of the EVM stack by implement every EVM opcode. Stack size and gas consumption are both accumulating bases on rules. Below are some instructions with their rules. (s: Stack, m: Memory, S: Storage, G: gas table, g:gas consumption) For arithmetic instructions like ADD and SU B, first, we pop the top two items on. 政 治 大. the stack as parameter x and parameter y. Second, do the arithmetic operation on x and y. Finally, we will get a new value, which is the result of the arithmetic, and push it on. 立. the EVM stack. Thus, we can tell that arithmetic instructions normally will decrease the. ‧ 國. 學. EVM stack size by one. In addition, the gas consumption will add by a number base on. ‧. Nat. y. x = s.pop(0). io. 0. n. al. s ← s.push(x + y). Ch. i n U. g 0 ← g + G(ADD). engchi. sit. y = s.pop(0). (2). er. table 3.. v. x = s.pop(0) y = s.pop(0) (3) 0. s ← s.push(x − y) g 0 ← g + G(SU B). Some instructions such as M LOAD and M ST ORE will store or load data using the EVM memory. For M LOAD, it will load a value v index from memory location p to p + 32, and push it on the EVM stack. The state of the stack will be updated as s0 and. DOI:10.6814/THE.NCCU.MIS.024.2018.A05 26.
(33) the memory will stay the same after executing the M LOAD instruction. M ST ORE has a totally different behavior from M LOAD. It will pop the top two items from the stack, p is the head of the storing index, and v is the value which is going to be stored in the memory. So after execute M ST ORE, a new value v will be stored at memory location p to p + 32. The state of the memory will be updated as m0 and nothing is going to push on the EVM stack. The gas consumption will be change following the rules on table 3. p = s.pop(0) v = m[p...(p + 32)]. 政 s ←治 s.push(v) 大. (4). 0. 立. g 0 ← g + G(M LOAD). y. (5). m0 ← m[p...(p + 32)].insert(v). n. al. er. io. sit. Nat. v = s.pop(0). ‧. ‧ 國. 學 p = s.pop(0). g 0 ← g + G(M ST ORE). Ch. engchi. i n U. v. Instructions for the EVM storage are similar to the memory instructions. SLOAD load the value v at index p, which is pop from top of the stack, from the storage and push it on the stack. SST ORE store v at index p of the storage. So SST ORE will update the state of the EVM storage once it was executed. Gas consumption for executing SLOAD will be concrete value. However, for SST ORE, gas consumption is a formula which is depend on the value of p and v (row 6 on table 4). During the symbolic simulation, we will pop the top two item on the stack represent p and v. If both the value are symbolic value, this formula will be the constraint for the gas consumption. Otherwise, we can calculate the exact value of the gas consumption for SST ORE. The gas consumption for. DOI:10.6814/THE.NCCU.MIS.024.2018.A05 27.
(34) operating the EVM storage is relatively massive than other instructions. p = s.pop(0) v = S[p] (6) 0. s ← s.push(v) g 0 ← g + G(SLOAD). p = s.pop(0). (7). 0. S ← S[p].insert(v). 學. ‧ 國. 立. 政 v治 = s.pop(0) 大 g 0 ← g + G(SST ORE). ‧. For jumping instructions, we want to know which block will be the destination after the. y. Nat. sit. jump. So the first step for executing JU M P and JU M P I will pop the top item from the. er. io. stack, which is the label LABEL for the destination block. For JU M P I, the condition. al. n. v i n C h the label of jump the destination block. As for F ALSE, e n g c h i U destination will be the next block. value will be pop from top of the stack. If the condition value is T RU E, LABEL indicate. after the block of JU M P I. Now we know the label of the next block, the simulation will be relocating to the new block after executing jump instruction. label = s.pop(0) label0 ← label. (8) 0. g ← g + G(JU M P ) (label : label of node block). DOI:10.6814/THE.NCCU.MIS.024.2018.A05 28.
(35) label = s.pop(0) condition = s.pop(0) if (condition = true)label0 ← label. (9). if (condition = f alse)next label ← label g 0 ← g + G(JU M P I). 政 治 大 the state of the stack once it was executed. EVM is a state machine, so it rely on 立 those instruction to shift the value on the stack for other instructions. For example, as. Other important instruction such as P U SH, P OP , DU P ∗ and SW AP ∗ will modified. ‧ 國. 學. mentioned early, CALLDAT ALOAD will load a user input during execution. If the user input data is being use multiple times, EVM will use DU P ∗ or SW AP ∗ to get the data. ‧. to the top of the stack for further execution.. y. Nat. n. Ch. engchi. er. io. al. g 0 ← g + G(P U SH). sit. s0 ← s.push(X). i n U. (10). v. s0 ← s.pop(0) (11) 0. g ← g + G(P OP ). v = s.idx(∗) s0 ← s.push(v). (12). g 0 ← g + G(DU P ∗). DOI:10.6814/THE.NCCU.MIS.024.2018.A05 29.
(36) v = s.idx(∗) s.idx(∗) = s.idx(0) (13) s.idx(0) = v g 0 ← g + G(SW AP ∗). 5. Experiment. 治 政 大a smart contract was mined by contract on blockchain that everyone can reach. Once 立 miners, the compiled bytecode will be recorded on the block. By using blockchain platform Despite the example provided in the last section, there are still lots of open source smart. ‧ 國. 學. such as Etherscan, people can easily find those smart contract. Etherscan allows smart contract creator to upload their source code and the authenticity of code will be verified. ‧. automatically. Therefore, all those smart contracts with additional information will be. sit. y. Nat. our experimental data. As mentioned in our analysis process, Solidity Remix will be. al. n. 5.1. io. functional it is.. er. our development tool and compiler due to the variety of information provided and fully. Ch. engchi. i n U. v. Source Code Fetching. In order to get all the source code on the web-based blockchain explorer, we develop a content crawler, written in Python, to accelerate the download procedure. All the source code will be download and store in our database. The reachable smart contracts on Etherscan are limited, but once in a while, the platform will update block information and the new smart contract will be download simultaneously by the crawler.. 5.2. Source Code Comparison. In the next procedure, we use the build-in Solidity compiler to compile smart contract source code we’ve got. Before the process, we found out that the similarity of some. DOI:10.6814/THE.NCCU.MIS.024.2018.A05 30.
(37) Table 5: Tested Smart Contracts from Ethereum Blockchain Balance 6.08960513010001 212.510667068223383179 0 1.131884071130000015 100.633469345624563517 406.607596164599999 0.0141 4.9886 2.23225700000000001 3.266396182497500183 3.00001 0 0 2.011800000727 0.004030007539536001 22.755499999999999 0.354711104826512158 0.507739999999513218 0 0. 政 治 大. Ins 8650 6441 387 7642 1856 2537 2496 842 5039 1290 1234 828 80 1186 5584 2505 1820 11809 2313 1653. Block 458 321 32 438 140 161 166 79 338 81 103 56 8 71 421 181 1320 714 128 113. Cycle Y N N Y Y Y Y N Y N Y N N N Y Y N Y N Y. Nat. sit. y. ‧. ‧ 國. 立. Tx 6647 18296 1 359 1559 176 17 31 54 86 3 380 1 29 35 48 2 8 465 143. 學. Name MoonCatRescue CryptoPunksMarket SafeMathLibExt PromissoryToken DigiPulse PreTgeExperty ProtectTheCastle NoFeePonzi NamiMarket PonzICO MultiplyX10 EthergotchiOwnershipV2 Ownable QuizTime Dragon PublicSaleManager Insurance DiceRoll XPS MultiMerkleMine. io. er. verified smart contract is high. Further observation, those highly similar smart contract is an updated version of the same source. Considering this circumstance, we treated those. n. al. Ch. smart contract as the same source.. 5.3. engchi. i n U. v. Compile Result. After doing the comparison, total numbers of unique smart contract we had downloaded are 292. Compiling those source code, we encounter several situations such as semantic error or uncompilable source code which the code is an interface or library rather than a callable smart contract we want. In the end, we use our analysis tool to test total twenty successfully compiled smart contracts. Among those smart contract, half of them contained at least one cycle path inside the control flow graph. The information such the balance of the smart contract, total transactions, instructions in the smart contract, and blocks in the control flow graph of the cycle path for those smart contract all list in table. DOI:10.6814/THE.NCCU.MIS.024.2018.A05 31.
(38) 5. Although some of the smart contract contained cycle path, no positive cycle was found In those twenty on-chain smart contracts. But we generate the path constraints for the cycle path. (size(calldata) < 4) == 0 ∧ (146CA531 = (F F F F F F F F &(P arameter/10000000...))) == 0 ∧ (3BCF 7D22 = (F F F F F F F F &(P arameter/10000000...))) == 0 ∧ (4316ABBB = (F F F F F F F F &(P arameter/10000000...))) == 0 ∧. 政 治 大 (94181D27 = (F F F F F F F F &(P arameter/10000000...))) == 0 ∧ 立. (7E32A592 = (F F F F F F F F &(P arameter/10000000...))) == 0 ∧. ‧ 國. 學. (9C8A5A56 = (F F F F F F F F &(P arameter/10000000...))) == 0 ∧ (A2E7241C = (F F F F F F F F &(P arameter/10000000...))) == 0 ∧. ‧. (B7D5D4C0 = (F F F F F F F F &(P arameter/10000000...))) == 0 ∧. (14). sit. y. Nat. (BD7B09E4 = (F F F F F F F F &(P arameter/10000000...))) == 0 ∧. io. er. (D466A03F = (F F F F F F F F &(P arameter/10000000...))) == 0 ∧ (D4D5D32A = (F F F F F F F F &(P arameter/10000000...))) == 0 ∧. al. n. v i n (D94395E2 = (F F FCFh F F F F &(P arameter/10000000...))) == 0 ∧ engchi U. (D954CBCB = (F F F F F F F F &(P arameter/10000000...))) == 0 ∧ (E2202A4D = (F F F F F F F F &(P arameter/10000000...))) == 0 ∧ (E684AA5C = (F F F F F F F F &(P arameter/10000000...))) == 0 ∧ (E811F 50A = (F F F F F F F F &(P arameter/10000000...))) == 1 ∧ Above equations are the path constraints for example P rotectT heCastle. If all the constraints are feasible, the cycle path will be executed and it will consume 435 gas. To solve those constraints, we have to change the format of the constraint to match to rules for SMT solver. Let SMT solver to find out the possible answer for those symbolic parameters. With the answer, we will know if this cycle path is feasible or not.. DOI:10.6814/THE.NCCU.MIS.024.2018.A05 32.
(39) Table 6: Tested Smart Contracts from III Name purchaseOrder Ownable Bank BankAttacker Exchange Owned. Ins 1629 239 354 578 453 378. Block 129 16 26 46 22 30. Gas 235650 23001 13645 20375 7455 19637. We also run our analysis tool on some smart contract provided by III. There are total six source code being tested and only contract purchaseOrder contained a non-. 政 治 大 we generated for this path. If all the constraints are feasible, executing this path will 立 positive cycle. The result is show in table 6. Equation 15 show all the path constraints. consume massive gas.. ‧. ‧ 國. 學 (size(calldata) < 4) == 0 ∧. sit. y. Nat. (188E9EF 3 = (F F F F F F F F &(P arameter/10000000...))) == 0 ∧ (15). al. er. io. (2804B2C0 = (F F F F F F F F &(P arameter/10000000...))) == 0 ∧. v. n. (2F 2770DB = (F F F F F F F F &(P arameter/10000000...))) == 1 ∧. Ch. engchi. i n U. On the other hand, we calculate the possible maximum gas consumption for each source code we tested from Ethereum blockchain (table 7). None of the results contained the symbolic gas constraint. All the gas consumption are a constant number. We will continue using our tool to verified those untested smart contracts on the blockchain.. 6. Conclusions and Future Work. We have presented our approach to the vulnerability detection of Ethereum smart contracts at the level of EVM source code. We demonstrate a smart contract checking model that check EVM stack size and gas consumption. Also we generate an execution path of source code to determine whether the gas consumption of this path exceed the gas limit of. DOI:10.6814/THE.NCCU.MIS.024.2018.A05 33.
(40) Table 7: Maximum Gas Consumption for Smart Contracts from Ethereum Blockchain Name MoonCatRescue. Gas 48076. CryptoPunksMarket. 48554. PromissoryToken. 4281. DigiPulse. 43589. PreTgeExperty. 121563. ProtectTheCastle. 40924. 立879 20735. 11582. 41424. n. al. Ch. EthergotchiOwnershipV2. 7588. Ownable. 399. QuizTime. 711. Dragon. 48076. PublicSaleManager. 21228. DiceRoll. 29817. XPS. 48554. MultiMerkleMine. 89781. er. io. sit. y. Nat MultiplyX10. ‧. PonzICO. 政 治 大. 學. NamiMarket. ‧ 國. NoFeePonzi. PC (size(calldata)<4)==0 (((Parameter/1000...)&FFFFFFFF)=36AE31EC)==1 (size(calldata)<4)==0 (((Parameter/1000...)&FFFFFFFF)=6FDDE03)==1 (size(calldata)<4)==0 (((Parameter/1000...)&FFFFFFFF)=53F14DA)==1 (size(calldata)<4)==0 (((Parameter/1000...)&FFFFFFFF)=A9059CBB)==1 (size(calldata)<4)==0 (((Parameter/1000...)&FFFFFFFF)=DE6746A5)==0 (((Parameter/1000...)&FFFFFFFF)=653C95B8)==1 (size(calldata)<4)==0 (((Parameter/1000...)&FFFFFFFF)=7E32A592)==1 (((Parameter/1000...)&FFFFFFFF)=63BDE24)==0 (29652E86=((Parameter/1000...)&FFFFFFFF))==1 (Parameter2<0)==1 (size(calldata)<4)==0 (Parameter2<0)==0 (20EA8D86=((Parameter/1000...)&FFFFFFFF))==1’ (size(calldata)<4)==0 (((Parameter/1000...)&FFFFFFFF)=27E235E3)==0 (2DDBD13A=((Parameter/1000...)&FFFFFFFF))==0 (3CCFD60B=((Parameter/1000...)&FFFFFFFF))==1 (size(calldata)<4)==0 (((Parameter/1000...)&FFFFFFFF)=EF67887)==0 (A1FB1DE9=((Parameter/1000...)&FFFFFFFF))==1 (size(calldata)<4)==0 (((Parameter/1000...)&FFFFFFFF)=095EA7B3)==0 (2B8BBBE8=((Parameter/1000...)&FFFFFFFF))==1 (size(calldata)<4)==0 (((Parameter/1000...)&FFFFFFFF)=8DA5CB5B)==1 (size(calldata)<4)==0 (((Parameter/1000...)&FFFFFFFF)=3FAD9AE0)==1 (size(calldata)<4)==0 (6FDDE03=((Parameter/1000...)&FFFFFFFF))==1 (size(calldata)<4)==0 (B63E0CD8=((Parameter/1000...)&FFFFFFFF))==1 (size(calldata)<4)==0 (ACFFF377=((Parameter/1000...)&FFFFFFFF))==1 (size(calldata)<4)==0 (((Parameter/1000...)&FFFFFFFF)=6FDDE03)==1 (size(calldata)<4)==0 (((Parameter/1000...)&FFFFFFFF)=F725839F)==1. engchi. i n U. v. DOI:10.6814/THE.NCCU.MIS.024.2018.A05 34.
(41) blockchain. That give the information of gas usage before executing the smart contract. Not only prevent the situation of wasting Ether when exception happened, but provide a reference of estimated cost to smart contract user. Recently, our model can only simulate a single smart contract status. Like other program, Ethereum smart contract is able to communicate with other callable smart contract. So for more comprehensive analysis, we hope to trace the interaction between different contract. That means, as in the future, we want to know the stack status or gas usage when a smart contract involve another smart contract. Either simultaneously. 政 治 大. checking both smart contracts or our model should be able to deal with message call inside a smart contract.. 立. Furthermore, the main focus for theoretical contributions, where we shift our focus to. ‧ 國. 學. model checking techniques after having dependency graphs or path constraints. Specifically, we will investigate automata-based numeric and string constraint solving as the. ‧. fundamental solving techniques for vulnerability detection in target applications. The. sit. y. Nat. solver aims at solving a core string constraint language for numeric and string constraints. io. er. that are commonly generated in the target applications during program analysis. After vulnerabilities have been found, the next question is how to exploit the vulner-. n. al. Ch. i n U. v. ability and how to patch the vulnerability. For side channel attacks, an ideal attack is to. engchi. optimize information leakage after each try. As to reduce the risk, one should write the program in a way that has minimal information leakage (which can be quantified with our approach). On the other hand, with symbolic execution, the truth assignment of path conditions also reveals a feasible input to access the program point. The patch of the vulnerability is to restrict the condition so that it becomes infeasible when violation occurs.. References [1] P. Dai, N. Mahi, J. Earls, and A. Norta, “Smart-contract value-transfer protocols on a distributed mobile application platform,” URL: https://qtum. org/uploads/-. DOI:10.6814/THE.NCCU.MIS.024.2018.A05 35.
(42) files/cf6d69348ca50dd985b60425ccf282f3. pdf, 2017. [2] “Solidity-solidity 0.4.21 documentation.” https://solidity.readthedocs.io/en/ develop/. [3] “Counterparty.” https://counterparty.io/. [4] “Stellar - develop the world’s new financial system.” https://www.stellar.org/. [5] “Home — lisk.” https://lisk.io/.. 政 治 大. [6] “The dao (organization) - wikipedia.” https://en.wikipedia.org/wiki/The_DAO_ (organization).. 立. ‧ 國. 學. [7] N. Atzei, M. Bartoletti, and T. Cimoli, “A survey of attacks on ethereum smart contracts (sok),” in International Conference on Principles of Security and Trust,. ‧. pp. 164–186, Springer, 2017.. sit. y. Nat. [8] “coloredcoins.org.” http://coloredcoins.org/.. n. al. er. io. [9] “Create a democracy contract in ethereum.” https://www.ethereum.org/dao. [10] “Monegraph.” https://monegraph.com/.. Ch. engchi. i n U. v. [11] K. Delmolino, M. Arnett, A. Kosba, A. Miller, and E. Shi, “Step by step towards creating a safe smart contract: Lessons and insights from a cryptocurrency lab,” in International Conference on Financial Cryptography and Data Security, pp. 79–94, Springer, 2016. [12] T. Dickerson, P. Gazzillo, M. Herlihy, and E. Koskinen, “Adding concurrency to smart contracts,” in Proceedings of the ACM Symposium on Principles of Distributed Computing, pp. 303–312, ACM, 2017. [13] M. Herlihy, V. Luchangco, M. Moir, and W. N. Scherer III, “Software transactional memory for dynamic-sized data structures,” in Proceedings of the twenty-second annual symposium on Principles of distributed computing, pp. 92–101, ACM, 2003.. DOI:10.6814/THE.NCCU.MIS.024.2018.A05. 36.
(43) [14] L. Luu, D.-H. Chu, H. Olickel, P. Saxena, and A. Hobor, “Making smart contracts smarter,” in Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security, pp. 254–269, ACM, 2016. [15] “Oyente by oyente.” https://oyente.github.io/benchmarks/. [16] K. Bhargavan, A. Delignat-Lavaud, C. Fournet, A. Gollamudi, G. Gonthier, N. Kobeissi, N. Kulatova, A. Rastogi, T. Sibut-Pinote, N. Swamy, et al., “Formal verification of smart contracts: Short paper,” in Proceedings of the 2016 ACM Workshop on Programming Languages and Analysis for Security, pp. 91–96, ACM, 2016.. 政 治 大 [17] N. Swamy, C. Hrit¸cu, C. Keller, A. Rastogi, A. Delignat-Lavaud, S. Forest, K. Bhar立 gavan, C. Fournet, P.-Y. Strub, M. Kohlweiss, et al., “Dependent types and multi-. ‧ 國. 學. monadic effects in f,” in ACM SIGPLAN Notices, vol. 51, pp. 256–270, ACM, 2016.. ‧. [18] A. Dika, “Ethereum smart contracts: Security vulnerabilities and security tools,” Master’s thesis, NTNU, 2017.. sit. y. Nat. io. n. al. er. [19] “Securify - formal verification of ethereum smart contracts.” https://securify.ch/.. i n U. v. [20] “Smartdec — smart contracts security audit.” http://smartcontracts.smartdec. net/.. Ch. engchi. [21] M. Suiche, “Porosity: A decompiler for blockchain-based smart contracts bytecode,” DEF CON, vol. 25, 2017. [22] E. Hildenbrandt, M. Saxena, X. Zhu, N. Rodrigues, P. Daian, D. Guth, and G. Rosu, “Kevm: A complete semantics of the ethereum virtual machine,” tech. rep., 2017. [23] S. Amani, M. B´egel, M. Bortin, and M. Staples, “Towards verifying ethereum smart contract bytecode in isabelle/hol,” CPP. ACM. To appear, 2018. [24] A. Mavridou and A. Laszka, “Tool demonstration: Fsolidm for designing secure ethereum smart contracts,”. DOI:10.6814/THE.NCCU.MIS.024.2018.A05 37.
(44) [25] “Finite-state. machine. -. wikipedia.”. https://en.wikipedia.org/wiki/. Finite-state_machine. [26] “Nxt - the blockchain application platform.” https://nxtplatform.org/. [27] P. L. Seijas, S. J. Thompson, and D. McAdams, “Scripting smart contracts for distributed ledger technology.,” IACR Cryptology ePrint Archive, vol. 2016, p. 1156, 2016. [28] “Zero-knowledge. proof. -. Zero-knowledge_proof.. 立. [29] “Proof-carrying. code. -. wikipedia.”. https://en.wikipedia.org/wiki/. 政 治 大 wikipedia.”. ‧ 國. 學. Proof-carrying_code.. https://en.wikipedia.org/wiki/. [30] J. Rubin, M. Naik, and N. Subramanian, “Merkelized abstract syntax trees.”. Nat. sit. y. ‧. [31] “Ethereum network status.” https://ethstats.net/.. io. er. [32] “Ethereum blockchain explorer and search.” https://etherscan.io/.. al. n. v i n Ch solidity.readthedocs.io/en/v0.4.21/installing-solidity.html. engchi U. [33] “Installing the solidity compiler — solidity 0.4.21 documentation.” http://. [34] “Remix - solidity ide.” https://remix.ethereum.org/#optimize=false&version= builtin. [35] “Graphviz - graph visualization software.” https://www.graphviz.org/.. DOI:10.6814/THE.NCCU.MIS.024.2018.A05 38.
(45)
相關文件
To complete the “plumbing” of associating our vertex data with variables in our shader programs, you need to tell WebGL where in our buffer object to find the vertex data, and
In these lessons, students will evaluate the impacts of genetic engineering on our daily life, and analyze the moral issues raised in its development, especially those related
• We will look at ways to exploit the text using different e-learning tools and multimodal features to ‘Level Up’ our learners’ literacy skills.. • Level 1 –
In our AI term project, all chosen machine learning tools will be use to diagnose cancer Wisconsin dataset.. To be consistent with the literature [1, 2] we removed the 16
We also propose a Unified Code Management Schemes to eliminate code blocking completely and the reassignment cost will be reduced as far as possible based on CIDP.. Our schemes
We shall show that after finite times of switching, the premise variable of the fuzzy system will remain in the universe of discourse and stability of the adaptive control system
Writing texts to convey simple information, ideas, personal experiences and opinions on familiar topics with some elaboration. Writing texts to convey information, ideas,
Moderation of test papers and appropriate assessment tools will be provided if needed to monitor the progress of our students and to provide assistance as appropriate...