Energy-Efficient Gait Generator for Humanoid Robots

Full Text

Department of Electrical Engineering, College of Technology and Engineering, National Taiwan Normal University
Master's Thesis

An Energy-Efficient Gait Generation for The Humanoid Robot

Eko Rudiawan Jamzuri (姜奧開)
Advisor: Prof. Jacky Baltes (包傑奇)

June 2020 (Republic of China year 109)

Acknowledgment

This work was financially supported by the 'Chinese Language and Technology Center' of National Taiwan Normal University (NTNU) from the Featured Areas Research Center Program within the framework of the Higher Education Sprout Project by the Ministry of Education (MOE) in Taiwan, and by the Ministry of Science and Technology, Taiwan, under Grant Nos. MOST 108-2634-F-003-002, MOST 108-2634-F-003-003, and MOST 108-2634-F-003-004 (administered through Pervasive Artificial Intelligence Research (PAIR) Labs) as well as MOST 107-2811-E-003-503. We are grateful to the National Center for High-performance Computing for computer time and facilities to conduct this research.

ABSTRACT

Energy efficiency is a central issue in robotics, especially for humanoid robots, because the battery is a limited power source. Reducing power consumption is therefore key to extending the robot's endurance. In a humanoid robot, the main electrical load is the joint actuators, so energy consumption can be reduced through gait optimization, that is, by selecting optimal values for the parameters of the gait engine. This thesis proposes a method for generating a stable and energy-efficient gait for a humanoid robot that can be applied to variable-speed and omnidirectional walking. The gait pattern is generated by a Zero Moment Point (ZMP) preview controller and a Bezier function. The gait engine is parameterized to adjust the Centre of Mass (CoM) height, body posture, and walking speed. Covariance Matrix Adaptation Evolution Strategies (CMA-ES) is used to find the parameter values that yield a stable and energy-efficient gait in a safe simulation environment. The optimal gait parameters were verified in simulation and on the real robot; during training they reduced energy by about 29.813 % and improved stability by 20 %. Verification on the real robot confirmed the result, with energy savings of about 19.905 % compared to the non-optimized gait. Moreover, the optimal parameters generalize: they can be applied to variable-speed and omnidirectional walking without stability problems.

Keywords: humanoid robot, gait generation, gait optimization, ZMP preview controller, CMA-ES.

Table of Contents

Acknowledgment
ABSTRACT
Table of Contents
Figure Index
Table Index
Chapter 1. Introduction
    1.1 Background
    1.2 Related Works
    1.3 Problem Statement
    1.4 Research Aim & Objective
    1.5 Contribution of This Thesis
    1.6 Outline of This Thesis
Chapter 2. Kinematics Analysis
    2.1 Humanoid Robot Platform
    2.2 Simplified Model of Humanoid Robot
    2.3 Forward Kinematics
    2.4 Inverse Kinematics
Chapter 3. Gait Generation
    3.1 Walking Method of Humanoid Robot
    3.2 Parameterized Walking Pattern Generation
    3.3 Footstep Pattern Generation
    3.4 CoM Trajectory Generation
    3.5 Swing Foot Trajectory Generation
    3.6 Joint Space Trajectory Generation
Chapter 4. Gait Optimization
    4.1 Optimization Objective
    4.2 Optimization Parameters
    4.3 Covariance Matrix Adaptation Evolution Strategies
    4.4 Training in the Simulation-based Environment
Chapter 5. Experimental Result and Discussion
    5.1 Generated Walking Trajectory
    5.2 Training Performance
    5.3 Evaluation Performance
    5.4 Comparison Performance Before and After Optimization
    5.5 Straight Walk with Variable Step Length
Chapter 6. Conclusion and Future Work
References
Autobiography
Academic Achievement

Figure Index

Figure 2-1. 3D model of Darwin OP3 robot.
Figure 2-2. Hardware block diagram of Darwin OP3.
Figure 2-3. The 12-DoF model of Darwin OP3 (a) 3D model, (b) links description.
Figure 2-4. The URDF tree of 12-DoF humanoid robot.
Figure 3-1. Relationship between ZMP, CoM, and support polygon.
Figure 3-2. Omnidirectional walking pattern generator.
Figure 3-3. Gait parameters for adjusting CoM and body posture.
Figure 3-4. Cart-table model of the humanoid robot.
Figure 3-5. ZMP preview controller block diagram.
Figure 3-6. Illustration of the swing foot trajectory from a lateral view.
Figure 4-1. Data flow of simulation-based learning.
Figure 5-1. Generated CoM and sole trajectories (a) walking forward, (b) walking sideways, (c) walking diagonal, and (d) walking with turning.
Figure 5-2. Reward and success rate performance during training.
Figure 5-3. Evaluation of simulation model (a) torque, (b) roll and pitch.

Table Index

Table 2-1. Link notation and length dimension.
Table 2-2. Frame notation and description.
Table 3-1. Gait parameters and definition.
Table 3-2. Notation and definition of the cart-table model.
Table 4-1. Gait parameters search space.
Table 4-2. The hyperparameters setting for optimization.
Table 5-1. Parameter values for generating trajectories.
Table 5-2. Result of optimal walking parameters.
Table 5-3. Comparison performance between initial and final generation on simulation.
Table 5-4. Comparison before and after optimization.
Table 5-5. Comparison walking forward with variable step length.
Table 5-6. Comparison walking backward with variable step length.

Chapter 1. Introduction

1.1 Background

The ability to walk is the most crucial part of developing a humanoid robot. Because the structure of the robot is similar to that of a human, establishing walking stability takes considerable effort, so a stable and agile gait pattern is needed. The gait generator is expected to let the robot move omnidirectionally, that is, forward, backward, sideways, and turning, in a smooth manner. In general, formulations of dynamic walking in humanoid robots fall into two categories: those based on the Zero Moment Point (ZMP) and those based on a Central Pattern Generator (CPG). In ZMP-based gait generation, the trajectories are generated by modeling the humanoid robot with a simple dynamics model, for instance the 3D Linear Inverted Pendulum Model (3D-LIPM), the Passive Inverted Pendulum Model (PIPM), the Extended Inverted Pendulum Model (EIPM), or the cart-table model. The trajectories are then obtained by controlling the Centre of Mass (CoM) motion of the dynamic model so that the ZMP criterion is met. Common methods for generating ZMP-based trajectories are preview control [1], Model Predictive Control (MPC) [2], the Linear Quadratic Gaussian (LQG) controller [3], analytical solutions [4], and spline polynomials [5]. In ZMP-based gait, the trajectories are generated in world space and translated into joint space using inverse kinematics. In contrast, CPG-based gait mimics the biological systems of the human body: a central non-linear oscillator drives the joint actuators directly, without relying on world-space trajectory generation. The non-linear oscillators commonly used in CPG-based gait are the Matsuoka oscillator [6, 7] and the Hopf oscillator [8]. Over time, researchers have extended the standard gait pattern to realize omnidirectional walking. ZMP-based omnidirectional walking has been applied successfully to the Nao, the Darwin OP, and a teen-size robot [3, 9-17].
Moreover, omnidirectional CPG-based gait also performs well on the Nao robot [18, 19]. However, to realize a dynamically stable walk, the gait parameters have to be tuned carefully, which is usually done by experts through trial-and-error experiments [20]. Trial-and-error experiments on the real robot carry a high risk of hardware damage, because the robot may become unstable and fall when the selected parameters are unsuitable. Therefore, [11] suggests a stochastic optimization approach to find the optimal parameters rather than tuning them manually.

In robotics competitions such as the Federation of International Robot-Sport Association (FIRA) robot world cup, which includes a marathon challenge, stability and agility are not enough. The winner of this competition is the robot that can run as long as possible without maintenance or battery replacement. In this case, the gait engine must be energy-efficient, meaning that it consumes less power so that the robot has long endurance. The energy efficiency of the robot can be estimated by measuring the power consumed by its loads, such as the motors, computer, and sensors. While the robot walks, the joint actuators (electric motors), which are the main electrical load, move according to the rhythm set by the gait, and power consumption increases drastically. The energy usage of the robot therefore depends on the rhythm produced by the gait generator, which in turn depends on the gait parameters.

In this research, the gait parameters are evolved with an Evolution Strategies (ES) optimization algorithm to find the optimal values that yield an energy-efficient gait. The optimization uses simulation software to avoid human intervention and hardware damage to the real robot. The optimal parameter values found through learning in simulation are then applied to the real robot to validate the energy reduction.

1.2 Related Works

The first energy minimization study for humanoid robots was conducted by [21], which proposed a genetic algorithm for optimizing seven critical parameters in hip and foot trajectory generation. The simulation results verified that the proposed method could generate stable and energy-efficient walking gaits on level ground and on slopes.
However, this approach was verified only on a simulation model and has not been verified on a real robot. A different technique, policy parameterization, was introduced by [22] and yielded an 18 % power reduction. The idea is to optimize the spline parameters for a variable CoM height trajectory using the PoWER algorithm [23-25]. Moreover, the learning process runs directly on the physical robot with 200 training rollouts. The authors of [26] proposed a Passive Inverted Pendulum Model (PIPM) to generate energy-efficient gait. Even though it consumed less energy than the conventional method, the gait parameters have to be determined through trial and error. Instead of proposing pattern generation only, [27] proposed an Extended Inverted Pendulum Model (EIPM) that includes optimization. Six key parameters are optimized in a simulator using policy gradient reinforcement learning [28]. When verified on the Nao robot, power consumption was 44 % lower than the baseline and 23.8 % lower than the hand-tuned gait. Even though the learning algorithm is sample efficient, it still needs human intervention to define an initial policy and can become stuck in local optima.

In contrast to [21, 22], which evaluated the objective value from a simulation or a robot, [29] used a neural network predictor to estimate torque, roll and pitch oscillation, and total traveled length from walking experience. Based on the estimated values, optimization with a Steady-State Genetic Algorithm (SSGA) is run offline. However, in this approach data collection plays a vital role in generalizing the neural network model, which affects the optimization result.

Similar in idea to [22], [30] proposed a vertical CoM trajectory optimization generated by a hybrid ZMP-CPG approach. The ZMP tridiagonal approach generated the horizontal trajectory, and the Hopf oscillator controlled the vertical trajectory. Covariance Matrix Adaptation Evolution Strategies (CMA-ES) [31] is used to find the CPG parameters. The power reduction reaches 25 % compared to the baseline gait. In this approach, the parameters are learned with variable step length and step period, which makes them suitable for walking forward at variable speeds.
However, it does not guarantee the same performance in omnidirectional walking, because of the step width used for learning. Moreover, the foot trajectory is not optimized, which probably has a significant impact on energy cost.

1.3 Problem Statement

ZMP preview control gait generation, which uses the cart-table as its dynamic model, can guarantee a stable gait on a flat surface because the CoM motion follows the given ZMP reference. However, the gait parameters that define the CoM height, body posture, foot trajectories, and time constant are unknown and need to be determined. Moreover, the gait parameter values must yield a stable and energy-efficient gait that can be applied to omnidirectional walking with variable speed on a real robot.

1.4 Research Aim & Objective

This research aims to reduce energy consumption in omnidirectional gait generation based on a cart-table model and a preview controller. To that end, we develop an omnidirectional gait engine parameterized by gait parameters that affect speed, stability, and energy consumption. To find the optimal gait parameters that yield a stable and energy-efficient walk, we propose an optimization approach that runs safely in a simulator.

1.5 Contribution of This Thesis

In this work, we use the cart-table model with the ZMP preview controller to generate the CoM trajectory. We improve this method by determining gait parameters suitable for adjusting the CoM height, body posture, swing foot trajectory, and walking phase timing. Moreover, we propose an optimization approach using CMA-ES to find the optimal gait parameters that achieve a stable and energy-efficient gait in omnidirectional walking with variable speed.

1.6 Outline of This Thesis

The remainder of this thesis is organized as follows. Chapter 2 describes the kinematics analysis of the humanoid robot. Chapter 3 presents the omnidirectional gait generation. Chapter 4 explains the gait optimization. Chapter 5 reports the experimental findings of this research. Chapter 6 closes with the conclusion and future work.

Chapter 2. Kinematics Analysis

This chapter presents the kinematics analysis of the humanoid robot. Section 2.1 gives the specification of the humanoid robot, followed by Section 2.2, which describes the simplified model of the robot. Sections 2.3 and 2.4 cover forward and inverse kinematics.

2.1 Humanoid Robot Platform

In this research, we used the Darwin OP3 humanoid robot platform, shown in Figure 2-1. Mechanically, the design resembles a human, with a body, head, arms, and legs. The camera for visual perception is attached to the pan and tilt actuators in the head. The main controller, the sub-controller, and an Inertial Measurement Unit (IMU) sensor are located inside the body, which is protected by bumpers to minimize hardware damage when the robot falls.

Figure 2-1. 3D model of Darwin OP3 robot.

Figure 2-2 shows the hardware block diagram of the robot. The main controller is connected to the vision sensor and the sub-controller through Universal Serial Bus (USB) ports. The sub-controller is connected to actuator and sensor devices through several buses and is also equipped with an RS-485 bus for connecting to the joint actuators.

Figure 2-2. Hardware block diagram of Darwin OP3.

2.2 Simplified Model of Humanoid Robot

The segment that matters most for gait generation is the lower body, which consists of 12-DoF joints that move the legs. Figure 2-3 shows a simplified model that illustrates the relationship between joints and links. The CoM is drawn as a round ball and serves as the base of the legs, as seen in Figure 2-3(a). The joints are drawn as grey tubes, and the links are shown in brown. Each frame indicates orientation: red is the x-axis, green is the y-axis, and blue is the z-axis. At the lowest part, the soles are the end-effectors of the legs.

Figure 2-3. The 12-DoF model of Darwin OP3 (a) 3D model, (b) links description.

Table 2-1. Link notation and length dimension.

| Link notation | Description | Length |
|---|---|---|
| $h_c$ | Distance from the center of the hip to the CoM, measured along the z-axis | 0.091 m |
| $h_o$ | Distance from the center of the hip to the left/right hip, measured along the y-axis | 0.035 m |
| $l_1$ | Distance from the center of the hip to the hip frame, measured along the z-axis | 0.029 m |
| $l_2$ | Thigh length | 0.110 m |
| $l_3$ | Tibia length | 0.110 m |
| $l_4$ | Distance from the ankle frame to the sole, measured along the z-axis | 0.031 m |

Figure 2-3(b) and Table 2-1 give the link parameters of the robot. In total, the legs have 11 links; links $l_1$ to $l_4$ and $h_o$ have the same length in both legs.

2.3 Forward Kinematics

The forward kinematics equation calculates the position and orientation (pose) of the end-effector from given joint angles. It is derived by chaining homogeneous transformations from the base frame to the end-effector frame. Equation (2.1) gives the transformation matrix $T$, which consists of a rotation matrix $R$ and a translation vector $\mathbf{p}$. The matrix $R$ in equation (2.2) results from rotations about the x-axis, y-axis, and z-axis, given in (2.4)-(2.6). The vector $\mathbf{p}$ in equation (2.3) has components $x$, $y$, and $z$, the translations along the x-axis, y-axis, and z-axis.

$$T = \begin{bmatrix} R & \mathbf{p} \\ 0 & 1 \end{bmatrix} \tag{2.1}$$

$$R = R_z(\psi)\, R_y(\theta)\, R_x(\phi) \tag{2.2}$$

$$\mathbf{p} = \begin{bmatrix} x & y & z \end{bmatrix}^{T} \tag{2.3}$$

$$R_x(\phi) = \begin{bmatrix} 1 & 0 & 0 \\ 0 & \cos(\phi) & -\sin(\phi) \\ 0 & \sin(\phi) & \cos(\phi) \end{bmatrix} \tag{2.4}$$

$$R_y(\theta) = \begin{bmatrix} \cos(\theta) & 0 & \sin(\theta) \\ 0 & 1 & 0 \\ -\sin(\theta) & 0 & \cos(\theta) \end{bmatrix} \tag{2.5}$$

$$R_z(\psi) = \begin{bmatrix} \cos(\psi) & -\sin(\psi) & 0 \\ \sin(\psi) & \cos(\psi) & 0 \\ 0 & 0 & 1 \end{bmatrix} \tag{2.6}$$

The transformation of each frame is listed in the Unified Robot Description Format (URDF) file, which contains information about the frames, joints, links, kinematics, and dynamics of the robot. Figure 2-4 visualizes the URDF tree built from this file. Elliptical blocks represent joints, and rectangular blocks represent links. The transformation of each frame is written below the link blocks, where the term "xyz" gives the translation and the term "rpy" gives the Z-Y-X Euler rotation. The text beside each joint block indicates the joint rotation axis; for a revolute joint the possible axes are the x-axis, y-axis, or z-axis, and for a fixed joint the rotation axis is none. Table 2-2 describes the transformation of each frame in the URDF tree of Figure 2-4, and the matrix elements are listed in equations (2.7)-(2.14). The forward kinematics equation of the left leg, expressed as the transformation from the CoM to the left sole frame, is given in equation (2.15).
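The building blocks in equations (2.1) to (2.6) can be sketched in a few lines of plain Python. This is only an illustrative sketch, not the implementation used in the thesis; the function names (`rot_x`, `rot_zyx`, `homogeneous`) and the nested-list matrix representation are assumptions made for this example.

```python
import math


def rot_x(a):
    """Rotation about the x-axis, as in (2.4)."""
    c, s = math.cos(a), math.sin(a)
    return [[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]]


def rot_y(a):
    """Rotation about the y-axis, as in (2.5)."""
    c, s = math.cos(a), math.sin(a)
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]


def rot_z(a):
    """Rotation about the z-axis, as in (2.6)."""
    c, s = math.cos(a), math.sin(a)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def matmul(a, b):
    """Product of two matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def rot_zyx(roll, pitch, yaw):
    """Z-Y-X composition R = Rz(yaw) Ry(pitch) Rx(roll), as in (2.2)."""
    return matmul(rot_z(yaw), matmul(rot_y(pitch), rot_x(roll)))


def homogeneous(rot, p):
    """4x4 transform [[R, p], [0, 1]], as in (2.1)."""
    return [rot[i] + [p[i]] for i in range(3)] + [[0.0, 0.0, 0.0, 1.0]]


# Example: a 90 degree yaw followed by a 0.035 m offset along y.
T = homogeneous(rot_zyx(0.0, 0.0, math.pi / 2), [0.0, 0.035, 0.0])
point = matmul(T, [[1.0], [0.0], [0.0], [1.0]])  # homogeneous column vector
```

In the example, the unit vector along x is rotated onto the y-axis and then shifted by the translation part of the transform.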

Figure 2-4. The URDF tree of 12-DoF humanoid robot.

Table 2-2. Frame notation and description.

| Frame notation | Description | Joint notation | Description |
|---|---|---|---|
| ${}^{0}_{1}T$ | $T$ from CoM to pelvis | $\theta_0$ | Hip yaw joint |
| ${}^{1}_{2}T$ | $T$ from the pelvis to hip yaw | $\theta_1$ | Hip roll joint |
| ${}^{2}_{3}T$ | $T$ from hip yaw to hip roll | $\theta_2$ | Hip pitch joint |
| ${}^{3}_{4}T$ | $T$ from hip roll to hip pitch | $\theta_3$ | Knee joint |
| ${}^{4}_{5}T$ | $T$ from hip pitch to knee | $\theta_4$ | Ankle pitch joint |
| ${}^{5}_{6}T$ | $T$ from knee to ankle pitch | $\theta_5$ | Ankle roll joint |
| ${}^{6}_{7}T$ | $T$ from ankle pitch to ankle roll | | |
| ${}^{7}_{8}T$ | $T$ from ankle roll to sole | | |
| ${}^{0}_{8}T$ | $T$ from CoM to sole | | |

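$${}^{0}_{1}T = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & h_c \\ 0 & 0 & 0 & 1 \end{bmatrix} \tag{2.7}$$

$${}^{1}_{2}T = \begin{bmatrix} \cos(\theta_0) & -\sin(\theta_0) & 0 & 0 \\ \sin(\theta_0) & \cos(\theta_0) & 0 & h_o \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix} \tag{2.8}$$

$${}^{2}_{3}T = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & \cos(\theta_1) & -\sin(\theta_1) & 0 \\ 0 & \sin(\theta_1) & \cos(\theta_1) & l_1 \\ 0 & 0 & 0 & 1 \end{bmatrix} \tag{2.9}$$

$${}^{3}_{4}T = \begin{bmatrix} \cos(\theta_2) & 0 & \sin(\theta_2) & 0 \\ 0 & 1 & 0 & 0 \\ -\sin(\theta_2) & 0 & \cos(\theta_2) & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix} \tag{2.10}$$

$${}^{4}_{5}T = \begin{bmatrix} \cos(\theta_3) & 0 & \sin(\theta_3) & 0 \\ 0 & 1 & 0 & 0 \\ -\sin(\theta_3) & 0 & \cos(\theta_3) & l_2 \\ 0 & 0 & 0 & 1 \end{bmatrix} \tag{2.11}$$

$${}^{5}_{6}T = \begin{bmatrix} \cos(\theta_4) & 0 & \sin(\theta_4) & 0 \\ 0 & 1 & 0 & 0 \\ -\sin(\theta_4) & 0 & \cos(\theta_4) & l_3 \\ 0 & 0 & 0 & 1 \end{bmatrix} \tag{2.12}$$

$${}^{6}_{7}T = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & \cos(\theta_5) & -\sin(\theta_5) & 0 \\ 0 & \sin(\theta_5) & \cos(\theta_5) & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix} \tag{2.13}$$

$${}^{7}_{8}T = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & l_4 \\ 0 & 0 & 0 & 1 \end{bmatrix} \tag{2.14}$$

$${}^{0}_{8}T = {}^{0}_{1}T\; {}^{1}_{2}T\; {}^{2}_{3}T\; {}^{3}_{4}T\; {}^{4}_{5}T\; {}^{5}_{6}T\; {}^{6}_{7}T\; {}^{7}_{8}T \tag{2.15}$$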

$${}^{0}_{8}T = \begin{bmatrix} m_{00} & m_{01} & m_{02} & m_{03} \\ m_{10} & m_{11} & m_{12} & m_{13} \\ m_{20} & m_{21} & m_{22} & m_{23} \\ m_{30} & m_{31} & m_{32} & m_{33} \end{bmatrix} \tag{2.16}$$

The components of the matrix ${}^{0}_{8}T$ are listed in equations (2.17)-(2.30). The notation $s_0$ stands for $\sin(\theta_0)$, $c_0$ for $\cos(\theta_0)$, $s_{234}$ for $\sin(\theta_2 + \theta_3 + \theta_4)$, and so on.

$$m_{00} = -s_0 s_1 s_{234} + c_0 c_{234} \tag{2.17}$$

$$m_{01} = (s_0 s_1 c_{234} + s_{234} c_0)\, s_5 - s_0 c_1 c_5 \tag{2.18}$$

$$m_{02} = (s_0 s_1 c_{234} + s_{234} c_0)\, c_5 + s_0 s_5 c_1 \tag{2.19}$$

$$m_{03} = l_2 (s_0 s_1 c_2 + s_2 c_0) + l_3 (s_0 s_1 c_{23} + s_{23} c_0) + l_4 \big( (s_0 s_1 c_{234} + s_{234} c_0)\, c_5 + s_0 s_5 c_1 \big) \tag{2.20}$$

$$m_{10} = s_0 c_{234} + s_1 s_{234} c_0 \tag{2.21}$$

$$m_{11} = (s_0 s_{234} - s_1 c_0 c_{234})\, s_5 + c_0 c_1 c_5 \tag{2.22}$$

$$m_{12} = (s_0 s_{234} - s_1 c_0 c_{234})\, c_5 - s_5 c_0 c_1 \tag{2.23}$$

$$m_{13} = h_o + l_2 (s_0 s_2 - s_1 c_0 c_2) + l_3 (s_0 s_{23} - s_1 c_0 c_{23}) - l_4 \big( (-s_0 s_{234} + s_1 c_0 c_{234})\, c_5 + s_5 c_0 c_1 \big) \tag{2.24}$$

$$m_{20} = -s_{234} c_1 \tag{2.25}$$

$$m_{21} = s_1 c_5 + s_5 c_1 c_{234} \tag{2.26}$$

$$m_{22} = -s_1 s_5 + c_1 c_5 c_{234} \tag{2.27}$$

$$m_{23} = h_c + l_1 + l_2 c_1 c_2 + l_3 c_1 c_{23} - l_4 (s_1 s_5 - c_1 c_5 c_{234}) \tag{2.28}$$

$$m_{30} = m_{31} = m_{32} = 0 \tag{2.29}$$

$$m_{33} = 1 \tag{2.30}$$
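As a numerical sanity check, the chain product (2.15) can be compared with the closed-form position entries. The sketch below is not the thesis implementation: it is plain Python written for this illustration, with hypothetical helper names (`transform`, `fk_chain`, `fk_closed_form`), link lengths taken from Table 2-1 and indexed as in the matrices (2.7) to (2.14), and an arbitrary test posture.

```python
import math

# Link dimensions from Table 2-1, in metres.
HC, HO, L1, L2, L3, L4 = 0.091, 0.035, 0.029, 0.110, 0.110, 0.031


def matmul(a, b):
    """Product of two 4x4 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def transform(axis, angle, p):
    """Homogeneous transform: rotation about one axis (None = identity) and translation p."""
    c, s = math.cos(angle), math.sin(angle)
    rot = {
        None: [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]],
        "x": [[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]],
        "y": [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]],
        "z": [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]],
    }[axis]
    return [rot[i] + [p[i]] for i in range(3)] + [[0.0, 0.0, 0.0, 1.0]]


def fk_chain(th):
    """CoM-to-left-sole transform as the chain product (2.15)."""
    chain = [
        transform(None, 0.0, [0.0, 0.0, HC]),   # (2.7)
        transform("z", th[0], [0.0, HO, 0.0]),  # (2.8) hip yaw
        transform("x", th[1], [0.0, 0.0, L1]),  # (2.9) hip roll
        transform("y", th[2], [0.0, 0.0, 0.0]), # (2.10) hip pitch
        transform("y", th[3], [0.0, 0.0, L2]),  # (2.11) knee
        transform("y", th[4], [0.0, 0.0, L3]),  # (2.12) ankle pitch
        transform("x", th[5], [0.0, 0.0, 0.0]), # (2.13) ankle roll
        transform(None, 0.0, [0.0, 0.0, L4]),   # (2.14)
    ]
    T = chain[0]
    for link in chain[1:]:
        T = matmul(T, link)
    return T


def fk_closed_form(th):
    """Sole position [m03, m13, m23] from the expanded closed form."""
    s0, c0 = math.sin(th[0]), math.cos(th[0])
    s1, c1 = math.sin(th[1]), math.cos(th[1])
    s2, c2 = math.sin(th[2]), math.cos(th[2])
    s5, c5 = math.sin(th[5]), math.cos(th[5])
    s23, c23 = math.sin(th[2] + th[3]), math.cos(th[2] + th[3])
    a = th[2] + th[3] + th[4]
    s234, c234 = math.sin(a), math.cos(a)
    m02 = (s0 * s1 * c234 + s234 * c0) * c5 + s0 * s5 * c1
    m12 = (s0 * s234 - s1 * c0 * c234) * c5 - s5 * c0 * c1
    m22 = -s1 * s5 + c1 * c5 * c234
    x = L2 * (s0 * s1 * c2 + s2 * c0) + L3 * (s0 * s1 * c23 + s23 * c0) + L4 * m02
    y = HO + L2 * (s0 * s2 - s1 * c0 * c2) + L3 * (s0 * s23 - s1 * c0 * c23) + L4 * m12
    z = HC + L1 + L2 * c1 * c2 + L3 * c1 * c23 + L4 * m22
    return [x, y, z]


# Arbitrary (hypothetical) posture in radians, used only to compare both forms.
posture = [0.2, -0.1, 0.4, -0.8, 0.4, 0.1]
T = fk_chain(posture)
difference = [T[i][3] - fk_closed_form(posture)[i] for i in range(3)]
```

For any posture, `difference` should vanish up to floating-point rounding if the expanded entries agree with the chain product.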

The vector $\mathbf{p} = \begin{bmatrix} m_{03} & m_{13} & m_{23} \end{bmatrix}^{T}$ represents the position of the left sole along the x-axis, y-axis, and z-axis. Equations (2.31)-(2.33) determine the orientation of the sole as Z-Y-X Euler angles, where $\phi$, $\theta$, and $\psi$ denote the rotations about the x-axis, y-axis, and z-axis. For the forward kinematics of the right leg, the variable $h_o$ in equation (2.8) is replaced by $-h_o$.

$$\theta = \operatorname{atan2}\!\left(-m_{20},\ \sqrt{m_{00}^{2} + m_{10}^{2}}\right) \tag{2.31}$$

$$\psi = \operatorname{atan2}\!\left(\frac{m_{10}}{\cos(\theta)},\ \frac{m_{00}}{\cos(\theta)}\right) \tag{2.32}$$

$$\phi = \operatorname{atan2}\!\left(\frac{m_{21}}{\cos(\theta)},\ \frac{m_{22}}{\cos(\theta)}\right) \tag{2.33}$$

In general form, the forward kinematics equation can be rewritten as equation (2.34), where $\boldsymbol{\theta}$ is the joint vector of the left leg and $\mathbf{s} = \begin{bmatrix} x & y & z & \phi & \theta & \psi \end{bmatrix}^{T}$ is the end-effector pose of the left sole.

$$\mathbf{s} = f(\boldsymbol{\theta}) \tag{2.34}$$

2.4 Inverse Kinematics

This research used a Newton-Raphson inverse kinematics solver [32], which is available in the Orocos Kinematics and Dynamics Library (https://www.orocos.org/kdl). Algorithm 1 presents the pseudocode, where the input is a target pose of the end-effector $\mathbf{g}$ and the output is the joint angles $\boldsymbol{\theta}$. The algorithm requires an initial joint angle value $\boldsymbol{\theta}_{init}$, a minimum error $e_{min}$, a maximum number of iterations $i_{max}$, and a step $\Delta\boldsymbol{\theta}$. The values $e_{min}$ and $i_{max}$ determine the stopping criterion, and $\Delta\boldsymbol{\theta}$ is the change applied when updating the joint angles. The joint angles are updated iteratively until the stopping criterion is met.
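Before turning to Algorithm 1, the Z-Y-X Euler angle extraction of equations (2.31) to (2.33) can be checked with a round trip: build a rotation matrix from known angles, then recover them. This is a minimal sketch in plain Python; the function names are assumptions for this illustration, and the extraction is only valid away from the singular case $\cos(\theta) = 0$.

```python
import math


def rot_zyx(roll, pitch, yaw):
    """Expanded form of R = Rz(yaw) Ry(pitch) Rx(roll)."""
    cr, sr = math.cos(roll), math.sin(roll)
    cp, sp = math.cos(pitch), math.sin(pitch)
    cy, sy = math.cos(yaw), math.sin(yaw)
    return [
        [cy * cp, cy * sp * sr - sy * cr, cy * sp * cr + sy * sr],
        [sy * cp, sy * sp * sr + cy * cr, sy * sp * cr - cy * sr],
        [-sp, cp * sr, cp * cr],
    ]


def euler_zyx(m):
    """Recover (roll, pitch, yaw) from a rotation matrix, following (2.31)-(2.33).

    Assumes cos(pitch) != 0; at pitch = +/- 90 degrees the division below fails.
    """
    pitch = math.atan2(-m[2][0], math.sqrt(m[0][0] ** 2 + m[1][0] ** 2))
    cp = math.cos(pitch)
    yaw = math.atan2(m[1][0] / cp, m[0][0] / cp)
    roll = math.atan2(m[2][1] / cp, m[2][2] / cp)
    return roll, pitch, yaw


# Round trip with arbitrary test angles (radians).
recovered = euler_zyx(rot_zyx(0.1, -0.4, 1.2))
```

Because the square root in (2.31) is non-negative, the recovered pitch always lies between -90 and +90 degrees, which is why the round trip only holds for pitch angles in that range.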

Algorithm 1. Newton-Raphson Inverse Kinematics Solver

Input: goal end-effector pose $\mathbf{g}$
Initialize: $\boldsymbol{\theta}_{init}$, $e_{min}$, $i_{max}$, $\Delta\boldsymbol{\theta}$
While $e > e_{min}$ and $i < i_{max}$ do
    Calculate current end-effector pose $\mathbf{s} = f(\boldsymbol{\theta})$
    Calculate $J$
    Calculate $J^{+}$
    $\mathbf{e} = \mathbf{g} - \mathbf{s}$
    $\Delta\boldsymbol{\theta} = J^{+}\, \mathbf{e}$
    $\boldsymbol{\theta} = \boldsymbol{\theta} + \Delta\boldsymbol{\theta}$
    Calculate $e$
Return $\boldsymbol{\theta}$

Here $J$ denotes the Jacobian matrix of the forward kinematics equation. Equation (2.35) defines the pseudo-inverse $J^{+}$ of the Jacobian. The error $e$ between the goal and current poses is the Euclidean distance between them, as given in equation (2.36).

$$J^{+} = \left(J^{T} J\right)^{-1} J^{T} \tag{2.35}$$

$$e = \sqrt{\sum_{i=1}^{n} \left(g_i - s_i\right)^{2}} \tag{2.36}$$
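The iteration in Algorithm 1 can be illustrated on a much smaller problem. The sketch below is not the Orocos KDL solver and not the 6-DoF leg: it assumes a hypothetical planar two-link leg (hip pitch and knee only, using the thigh and tibia lengths from Table 2-1) so that the pseudo-inverse of (2.35) reduces to 2x2 arithmetic. All function names are invented for this example.

```python
import math

L_THIGH, L_TIBIA = 0.110, 0.110  # thigh and tibia lengths from Table 2-1 (m)


def fk(theta):
    """Toy planar forward kinematics s = f(theta): ankle position (x, z) below the hip."""
    a, b = theta
    return [L_THIGH * math.sin(a) + L_TIBIA * math.sin(a + b),
            -L_THIGH * math.cos(a) - L_TIBIA * math.cos(a + b)]


def jacobian(theta):
    """Analytic Jacobian of fk with respect to the two joint angles."""
    a, b = theta
    return [[L_THIGH * math.cos(a) + L_TIBIA * math.cos(a + b), L_TIBIA * math.cos(a + b)],
            [L_THIGH * math.sin(a) + L_TIBIA * math.sin(a + b), L_TIBIA * math.sin(a + b)]]


def pinv_2x2(J):
    """Pseudo-inverse (J^T J)^-1 J^T of (2.35), specialised to a 2x2 Jacobian."""
    Jt = [[J[0][0], J[1][0]], [J[0][1], J[1][1]]]
    A = [[sum(Jt[i][k] * J[k][j] for k in range(2)) for j in range(2)] for i in range(2)]
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]  # zero at a singular (straight) leg
    A_inv = [[A[1][1] / det, -A[0][1] / det], [-A[1][0] / det, A[0][0] / det]]
    return [[sum(A_inv[i][k] * Jt[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


def solve_ik(goal, theta_init, e_min=1e-9, i_max=100):
    """Newton-Raphson iteration: update theta until the pose error (2.36) is small."""
    theta = list(theta_init)
    for _ in range(i_max):
        s = fk(theta)
        err = [goal[0] - s[0], goal[1] - s[1]]
        if math.hypot(err[0], err[1]) <= e_min:  # Euclidean error, as in (2.36)
            break
        J_pinv = pinv_2x2(jacobian(theta))
        delta = [J_pinv[k][0] * err[0] + J_pinv[k][1] * err[1] for k in range(2)]
        theta = [theta[k] + delta[k] for k in range(2)]
    return theta


# Example: recover joint angles for a pose generated from known angles.
goal_pose = fk([0.3, -0.6])
solution = solve_ik(goal_pose, theta_init=[0.2, -0.4])
```

The initial guess matters: starting from a fully straight leg makes the Jacobian singular, so the example starts from a slightly bent posture.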

Chapter 3. Gait Generation

This chapter describes the omnidirectional gait generation proposed in this research. The walking method of the humanoid robot is explained in Section 3.1, followed by Section 3.2, which explains the parameterized walking pattern generation. Sections 3.3, 3.4, and 3.5 explain the methods for generating the footsteps, the CoM trajectory, and the swing foot trajectory. Joint space trajectory generation is explained in Section 3.6.

3.1 Walking Method of Humanoid Robot

A humanoid robot differs considerably from other types of robot. One difference is that both the base and the end-effectors move during walking. At the same time, the robot has to maintain balance by keeping contact between the sole and the ground. This makes motion generation difficult because, besides planning the motion, balance has to be considered. Walking methods for humanoid robots are classified into two types: static walking and dynamic walking. In static walking, the ground projection of the CoM must stay inside the support polygon during movement. This method is usually used in toy humanoid robots, which have wide soles, but it cannot be applied to a humanoid robot with proportional feet. In dynamic walking, the CoM may be located outside the support polygon for a short time, which better represents human walking behavior.

Vukobratović et al. introduced the concept of the Zero Moment Point (ZMP) for planning a stable gait in humanoid robots [33]. The ZMP is a point, always located inside the support polygon, at which the moment of the ground reaction force becomes zero; it is related to the momentum and angular momentum of the CoM. Figure 3-1 illustrates the ZMP, the CoM, and the support polygon for a human in a standing pose. In Figure 3-1(a), the CoM projection line and the ZMP are both located inside the support polygon, a situation called statically stable.
In contrast, in Figure 3-1(b), the ground projection of the CoM is located outside the support polygon, but the ZMP is still inside it. This situation is called statically unstable, and body balance is difficult to maintain.
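The two situations of Figure 3-1 can be expressed as a point-in-polygon test on the ground plane. The sketch below is an illustration only: the sole rectangle dimensions, the function names, and the label returned when the ZMP falls outside the polygon are assumptions, not values or terminology from this thesis.

```python
def inside_convex_polygon(point, polygon):
    """True if point lies inside or on a convex polygon given as counter-clockwise vertices."""
    px, py = point
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Cross product of the edge with the vector to the point; negative means outside.
        if (x2 - x1) * (py - y1) - (y2 - y1) * (px - x1) < 0:
            return False
    return True


def balance_state(com_xy, zmp_xy, support_polygon):
    """Classify a posture in the terms used for Figure 3-1."""
    if not inside_convex_polygon(zmp_xy, support_polygon):
        return "zmp outside support polygon"  # hypothetical label, not covered by Figure 3-1
    if inside_convex_polygon(com_xy, support_polygon):
        return "statically stable"            # Figure 3-1(a)
    return "statically unstable"              # Figure 3-1(b)


# Hypothetical single-support polygon: a 0.12 m x 0.07 m sole centred at the origin.
sole = [(-0.06, -0.035), (0.06, -0.035), (0.06, 0.035), (-0.06, 0.035)]
state_a = balance_state(com_xy=(0.0, 0.0), zmp_xy=(0.01, 0.0), support_polygon=sole)
state_b = balance_state(com_xy=(0.08, 0.0), zmp_xy=(0.05, 0.0), support_polygon=sole)
```

Here `state_a` corresponds to the statically stable case and `state_b` to the case where the CoM projection has left the polygon while the ZMP remains inside it.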

(22) (a) (b) Figure 3-1. Relationship between ZMP, CoM, and support polygon.. The walking phase in the humanoid robot is divided into two cycles or phases, called Double Support Phase (DSP) and Single Support Phase (SSP). In DSP, the robot is statically balanced because both feet touch to the ground. In SSP, there is only support feet contacted with the ground, while other feet swing to the front. In this condition, the motion of CoM has to be determined carefully to ensure the ZMP position located inside support polygon to prevent the robot from falling.. 3.2 Parameterized Walking Pattern Generation The proposed walking pattern generation is illustrated in Figure 3-2, which divided into footstep generator, ZMP reference generator, ZMP tracking controller, foot trajectory generator, and inverse kinematics solver. The walking pattern is generated by giving a robot motion command that consists of three components ( cmd x , cmd y , and cmd ). Where the cmd x and cmd y described the length of the next footstep position of. the previous footstep position and cmd described the incremental angle of the next footstep orientation from the previous footstep orientation. From this motion command, the footstep array will be generated and will be assigned as a reference for generating CoM and foot trajectory. While generating the CoM trajectory, the footstep array 15.

is converted to a ZMP reference, which the tracking controller uses as input to obtain the CoM trajectory. The foot trajectory generator, on the other hand, takes its input directly from the footstep array. The inverse kinematics solver then converts these trajectories to joint angles.

Figure 3-2. Omnidirectional walking pattern generator.

The walking pattern generation has several predefined parameters that adjust the CoM height, the body posture, and the phase timing. The body offset parameters affect the CoM height and the posture of the robot, while the timing parameters control the durations of the SSP and the DSP.

Figure 3-3. Gait parameters for adjusting CoM and body posture.

Figure 3-3 gives a visual representation of the gait parameters for CoM and body posture adjustment, and Table 3-1 lists their definitions. The parameter $z_c$ controls the CoM height, $x_c$ directly shifts the CoM position along the $x$-axis, and $\theta_c$ changes the orientation of the CoM. Changing $z_c$ affects how much the knees bend during standing and walking. Moreover, changing the CoM position affects static balance while standing.

Table 3-1. Gait parameters and definition.

| Parameter Name | Definition |
|---|---|
| $z_c$ | Height of CoM measured from ground |
| $z_f$ | Maximum swing height of swing foot measured from ground |
| $x_c$ | CoM position offset in $x$-axis |
| $y_s$ | Sole position offset in $y$-axis |
| $\theta_c$ | CoM orientation offset in the pitch direction |
| $t_{step}$ | The time constant for one step gait |
| $DSP_{ratio}$ | The ratio of DSP (in percent) |

The parameter $y_s$ in Table 3-1 is also related to the stability of the robot. Adjusting this parameter widens the support polygon during the DSP. The wider the support polygon, the more likely the CoM projection is to lie inside it. The parameter $z_f$ defines the maximum height of the swing foot during the SSP. Changing this parameter affects the energy consumption of the swing leg: the higher the foot is lifted, the more torque is required from the hip and knee joints.

The timing parameters are $t_{step}$ and $DSP_{ratio}$, where $t_{step}$ is the total time required to generate one gait cycle and $DSP_{ratio}$ is the share of that time spent in the DSP, in percent. These parameters control the walking speed of the robot when the step length and width are constant. From these two parameters, the total time of the DSP, $t_{DSP}$, is given by equation (3.1), and the total time of the SSP, $t_{SSP}$, by equation (3.2).

$$t_{DSP} = DSP_{ratio} \times t_{step} \tag{3.1}$$

$$t_{SSP} = t_{step} - t_{DSP} \tag{3.2}$$
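As a small worked example of equations (3.1) and (3.2), the sketch below splits one step period into its DSP and SSP durations. It assumes that the percentage in $DSP_{ratio}$ is converted to a fraction before the multiplication; the function name and the input values are illustrative only.

```python
def phase_durations(t_step: float, dsp_ratio_percent: float) -> tuple:
    """Split one step period into DSP and SSP durations.

    Assumes DSP_ratio is given in percent and is converted to a fraction
    before applying equation (3.1).
    """
    if t_step <= 0.0:
        raise ValueError("t_step must be positive")
    if not 0.0 <= dsp_ratio_percent <= 100.0:
        raise ValueError("dsp_ratio_percent must lie in [0, 100]")
    t_dsp = (dsp_ratio_percent / 100.0) * t_step  # equation (3.1)
    t_ssp = t_step - t_dsp                        # equation (3.2)
    return t_dsp, t_ssp


# Illustrative values: a 0.5 s step with 20 % of the time in double support.
t_dsp, t_ssp = phase_durations(t_step=0.5, dsp_ratio_percent=20.0)
print(f"t_DSP = {t_dsp:.3f} s, t_SSP = {t_ssp:.3f} s")
```

With these inputs the step spends 0.1 s in the DSP and 0.4 s in the SSP.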

3.3 Footstep Pattern Generation

To generate one walking gait cycle, a footstep pattern of at least three footsteps is required: the pose of the support foot and the initial and target poses of the swing foot. The purpose of footstep pattern generation is to convert the motion command into the footstep pattern that the CoM trajectory generator and the foot trajectory generator take as input. A vector $p^n$ represents a footstep as a 2D pose, where $n$ is the index of the footstep. The footstep pattern is stored in a First In First Out (FIFO) buffer of length 3, which holds the current support foot pose, the initial swing foot pose, and the target swing foot pose. The footstep pattern depends on the motion command, $h_o$, and $y_s$, and is calculated using equation (3.3). Here $p_x^n$ and $p_y^n$ denote the position of footstep $p^n$ on the $x$-axis and $y$-axis, while $p_\psi^n$ denotes the orientation of the footstep about the $z$-axis. These variables are updated from the step length $s_x^n$, step width $s_y^n$, step rotation $s_\psi^n$, and support foot indicator $sf^n$ in equations (3.4)-(3.7).

$$\begin{bmatrix} p_x^n \\ p_y^n \\ p_\psi^n \end{bmatrix} = \begin{bmatrix} p_x^{n-1} \\ p_y^{n-1} \\ 0 \end{bmatrix} + \begin{bmatrix} \cos(s_\psi^n) & -\sin(s_\psi^n) & 0 \\ \sin(s_\psi^n) & \cos(s_\psi^n) & 0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} s_x^n \\ sf^n \, s_y^n \\ s_\psi^n \end{bmatrix} \tag{3.3}$$

$$s_x^n = cmd_x \tag{3.4}$$

$$s_y^n = 2h_o + 2y_s + cmd_y \tag{3.5}$$

$$s_\psi^n = s_\psi^{n-1} + cmd_\psi \tag{3.6}$$

$$sf^n = \begin{cases} -1 & \text{if support foot = left} \\ 1 & \text{if support foot = right} \end{cases} \tag{3.7}$$

3.4 CoM Trajectory Generation

In this research, the CoM trajectory generator uses the ZMP preview control proposed in [1]. The idea behind this approach is to simplify the dynamic model of the humanoid robot to a cart-table model, as illustrated in Figure 3-4. The cart with a

mass $M$ located on top of the table represents the CoM of the humanoid robot, while the table is massless and has a small foot that represents the leg. The balance of the cart-table depends on the acceleration $\ddot{x}$ of the cart as it moves along the $x$-axis.

Figure 3-4. Cart-table model of the humanoid robot.

The ZMP preview control method can be described as follows. First, the dynamic model of the cart-table shown in Figure 3-4 is defined by equation (3.8). The notation used in equation (3.8) is explained in Table 3-2.

$$p = x - \frac{z_c}{g}\,\ddot{x} \tag{3.8}$$

Table 3-2. Notation and definition of the cart-table model.

| Notation | Definition |
|---|---|
| $p$ | Position of ZMP written in O (origin of world coordinate frame) |
| $x$ | Position of the ground projection of CoM written in O |
| $z_c$ | Height of CoM measured from ground |
| $g$ | Gravity |
| $\ddot{x}$ | CoM acceleration |
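Equation (3.8) can be evaluated directly to see how the CoM acceleration shifts the ZMP. The sketch below is illustrative only: the value of $g$, the CoM height, and the helper that solves (3.8) for the largest admissible forward acceleration are assumptions added for the example and are not part of the preview controller itself.

```python
GRAVITY = 9.81  # m/s^2, assumed value of g


def zmp_from_cart(x: float, x_ddot: float, z_c: float, g: float = GRAVITY) -> float:
    """ZMP position of the cart-table model, equation (3.8)."""
    return x - (z_c / g) * x_ddot


def max_forward_acceleration(x: float, p_min: float, z_c: float,
                             g: float = GRAVITY) -> float:
    """Largest forward CoM acceleration that keeps the ZMP at or ahead of
    the rear border p_min of the support polygon. Obtained by solving
    p >= p_min in equation (3.8) for x_ddot (hypothetical helper)."""
    return g * (x - p_min) / z_c


z_c = 0.32  # CoM height in metres (illustrative)

# Without acceleration the ZMP coincides with the CoM projection.
print(zmp_from_cart(x=0.10, x_ddot=0.0, z_c=z_c))

# A forward acceleration moves the ZMP backward, behind the CoM projection.
print(zmp_from_cart(x=0.10, x_ddot=1.0, z_c=z_c))

# Acceleration at which the ZMP reaches a rear sole border at p_min = 0.05 m.
a_max = max_forward_acceleration(x=0.10, p_min=0.05, z_c=z_c)
print(a_max, zmp_from_cart(x=0.10, x_ddot=a_max, z_c=z_c))
```

The last line illustrates why the CoM motion has to be planned: beyond this acceleration the ZMP of the model would leave the support polygon.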

Figure 3-5. ZMP preview controller block diagram.

With the ZMP dynamic model determined, the controller block diagram can be constructed as shown in Figure 3-5, where the ZMP reference is the system input and the output is the ZMP position $p$. Preview control is similar to a servo tracking system, except that the output of the preview control system reacts before the reference changes. To achieve this, the future reference must be defined in advance. As seen in Figure 3-5, the ZMP reference is stored in a FIFO buffer before entering the preview controller. The preview controller calculates an input signal $u$ that drives the ZMP output toward the reference. The state $x$ describes the behavior of the cart and contains its position, velocity, and acceleration. The position of the cart defines the CoM trajectory in world coordinates.

The mathematical formulation of the preview controller is as follows. From the dynamic model in (3.8), the discrete state-space representation with control cycle $\Delta t$ is defined in equations (3.9) and (3.10).

$$x_{k+1}^* = \tilde{A} x_k^* + \tilde{b}\, \Delta u_k \tag{3.9}$$

$$p_k = \tilde{c}\, x_k^* \tag{3.10}$$

The modified input $\Delta u_k$ and the modified state vector $x_k^*$ are given in equations (3.11)-(3.13).

$$\Delta u_k = u_k - u_{k-1} \tag{3.11}$$

$$x_k^* = \begin{bmatrix} p_k & \Delta x_k \end{bmatrix}^T \tag{3.12}$$

$$\Delta x_k = x_k - x_{k-1} \tag{3.13}$$

The matrices $\tilde{A}$, $\tilde{b}$, and $\tilde{c}$ are the modified system matrices defined in equations (3.14)-(3.16).

$$\tilde{A} = \begin{bmatrix} 1 & cA \\ 0 & A \end{bmatrix} \tag{3.14}$$

$$\tilde{b} = \begin{bmatrix} cb & b \end{bmatrix}^T \tag{3.15}$$

$$\tilde{c} = \begin{bmatrix} 1 & 0 & 0 & 0 \end{bmatrix} \tag{3.16}$$

The matrices $A$, $b$, and $c$, which are the original system matrices of the cart-table model, are given in equations (3.17)-(3.19).

$$A = \begin{bmatrix} 1 & \Delta t & \Delta t^2/2 \\ 0 & 1 & \Delta t \\ 0 & 0 & 1 \end{bmatrix} \tag{3.17}$$

$$b = \begin{bmatrix} \Delta t^3/6 & \Delta t^2/2 & \Delta t \end{bmatrix}^T \tag{3.18}$$

$$c = \begin{bmatrix} 1 & 0 & -z_c/g \end{bmatrix} \tag{3.19}$$

The cost function $J$ in equation (3.20) defines the performance of the ZMP preview controller, where $Q$ and $R$ are positive weights. The resulting control law of the ZMP preview controller is given in equation (3.21).

$$J = \sum_{j=1}^{\infty} \left\{ Q \left( p_j^{ref} - p_j \right)^2 + R\, \Delta u_j^2 \right\} \tag{3.20}$$

$$u_k = -K_s \sum_{i=0}^{k} \left( p_i^{ref} - p_i \right) - K_x x_k + \sum_{j=1}^{N} g_j\, p_{k+j}^{ref} \tag{3.21}$$

$K_s$ and $K_x$ are the feedback gains, and $g_j$ is the preview gain of the ZMP preview controller; they can be calculated by solving the discrete-time algebraic Riccati equation. In practice, the DARE function of the Python Control Library (https://python-control.readthedocs.io/) finds the gain values from given $\tilde{A}$, $\tilde{b}$, $Q$, and $R$.

The ZMP preview controller described above generates the CoM trajectory along a single axis. In practice, the CoM trajectory has to be generated

in 3D coordinates. The CoM positions along the $x$-axis and $y$-axis are determined by two identical preview controllers whose input references are the ZMP positions on the $x$-axis and $y$-axis. The resulting CoM trajectory is the position vector $p = \begin{bmatrix} x & y & z \end{bmatrix}^T$, where $x$ and $y$ are the CoM positions on the $x$-axis and $y$-axis, while $z$ equals the constant CoM height $z_c$.

Because the walking pattern generator produces omnidirectional walking, not only the CoM position but also the CoM orientation has to be determined. The CoM orientation about the $z$-axis (yaw) defines the body heading during walking. Setting it to zero keeps the body always facing forward. This orientation has to be synchronized with the orientation of the sole in order to perform the omnidirectional walk. To generate the CoM orientation trajectory, a Bezier curve function is used, and it is active only during the SSP. The CoM yaw orientation $b(t)$ in equation (3.22) is generated by the Bezier curve function parameterized by the control points $a_i$ in equations (3.23)-(3.25), with $n = 3$ for the four control points. The initial CoM yaw orientation is $\psi^n$, and the final CoM yaw orientation is $\psi^{n+1}$. Both $\psi^n$ and $\psi^{n+1}$ are obtained from the footstep pattern as formulated in equations (3.26)-(3.27).

$$b(t) = \sum_{i=0}^{n} \binom{n}{i} (1-t)^{n-i}\, t^i\, a_i, \quad 0 \le t \le 1 \tag{3.22}$$

$$a_0 = \begin{bmatrix} 0 & \psi^n \end{bmatrix}^T \tag{3.23}$$

$$a_1 = \begin{bmatrix} 0 & \psi^{n+1} \end{bmatrix}^T \tag{3.24}$$

$$a_2 = a_3 = \begin{bmatrix} t_{SSP} & \psi^{n+1} \end{bmatrix}^T \tag{3.25}$$

$$\psi^n = \frac{p_\psi^n + p_\psi^{n-1}}{2} \tag{3.26}$$

$$\psi^{n+1} = \frac{p_\psi^n + p_\psi^{n+1}}{2} \tag{3.27}$$

The CoM pitch orientation $\theta(t)$, on the other hand, depends on the body offset parameter $\theta_c$. The CoM roll angle $\phi(t)$ never changes

and stays constant at zero. With this, the CoM orientation in roll, pitch, and yaw is fully defined.

Adjusting the body offset parameter $x_c$ shifts the CoM position. This shift has to be applied to the initial CoM position obtained from the ZMP preview controller. Equations (3.28) and (3.29) give the new CoM position for a given body offset $x_c$, where $x_{init}$ and $y_{init}$ are the initial CoM positions on the $x$-axis and $y$-axis, and $\psi$ is the current yaw orientation of the CoM.

$$x = x_{init} + x_c \cos(\psi) \tag{3.28}$$

$$y = y_{init} + x_c \sin(\psi) \tag{3.29}$$

The final CoM position and orientation can be assembled into a homogeneous transformation matrix ${}^{O}_{C}T$, as described in equation (2.1), which defines the transformation of the CoM frame from the origin of the world coordinate.

3.5 Swing Foot Trajectory Generation

The swing foot trajectory generation is illustrated in Figure 3-6, which shows one walking cycle that starts in the left DSP, continues with the SSP, and ends in the DSP. The red line in Figure 3-6 is the swing foot trajectory generated while the robot is in the SSP. As seen in Figure 3-6, when the robot walks forward without changing its orientation, the trajectory is determined by the control points $a_i$, the maximum swing height $z_f$, and the step length $s_x^n$.

Figure 3-6. Illustration of the swing foot trajectory from a lateral view.

The Bezier curve function in equation (3.22) generates the swing foot trajectory, which consists of a position path and an orientation path. The position path generator yields the sole position expressed in world coordinates. The orientation path generator yields only the sole orientation about the $z$-axis (yaw); the roll and pitch orientations are constant at zero.

The position path produced by the Bezier curve function is a vector $p = \begin{bmatrix} x & y & z \end{bmatrix}^T$ that defines the sole position of the swing foot. This path is determined by the control points $a_i$, which are built from the initial footstep position, the target footstep position, and the maximum swing height parameter. Equations (3.30)-(3.33) define the control points $a_i$.

$$a_0 = \begin{bmatrix} p_x^{n-1} & p_y^{n-1} & 0 \end{bmatrix}^T \tag{3.30}$$

$$a_1 = \begin{bmatrix} p_x^{n-1} & p_y^{n-1} & z_f \end{bmatrix}^T \tag{3.31}$$

$$a_2 = \begin{bmatrix} p_x^{n+1} & p_y^{n+1} & z_f \end{bmatrix}^T \tag{3.32}$$

$$a_3 = \begin{bmatrix} p_x^{n+1} & p_y^{n+1} & 0 \end{bmatrix}^T \tag{3.33}$$

To generate the orientation path, the control points $a_i$ are replaced by equations (3.34)-(3.36), which contain only two elements each. The elements of these point vectors consist of the initial footstep orientation, the target footstep orientation, and the timing variable $t_{SSP}$. The yaw orientation of the sole is taken from the second element of the Bezier function.

$$a_0 = \begin{bmatrix} 0 & p_\psi^{n-1} \end{bmatrix}^T \tag{3.34}$$

$$a_1 = \begin{bmatrix} 0 & p_\psi^{n+1} \end{bmatrix}^T \tag{3.35}$$

$$a_2 = a_3 = \begin{bmatrix} t_{SSP} & p_\psi^{n+1} \end{bmatrix}^T \tag{3.36}$$

Both generated paths can be combined into a frame by constructing a transformation matrix ${}^{O}_{S}T$, which represents the translation and rotation of the sole frame expressed from the origin of the world coordinate.
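The position path can be sketched by evaluating the Bezier function of equation (3.22) with the control points of equations (3.30)-(3.33). This is a minimal illustration and not the thesis implementation; the footstep coordinates and the swing height are hypothetical values in metres, and the function names are chosen for the example.

```python
from math import comb


def bezier(control_points, t):
    """Evaluate the Bezier curve of equation (3.22) at parameter t in [0, 1]."""
    n = len(control_points) - 1
    dim = len(control_points[0])
    point = [0.0] * dim
    for i, a_i in enumerate(control_points):
        weight = comb(n, i) * (1.0 - t) ** (n - i) * t ** i
        for d in range(dim):
            point[d] += weight * a_i[d]
    return point


def swing_control_points(p_start, p_target, z_f):
    """Control points a_0..a_3 of equations (3.30)-(3.33).

    p_start and p_target are (x, y) sole positions of the swing foot at
    footsteps n-1 and n+1; z_f is the maximum swing height parameter.
    """
    (x0, y0), (x1, y1) = p_start, p_target
    return [(x0, y0, 0.0), (x0, y0, z_f), (x1, y1, z_f), (x1, y1, 0.0)]


# Hypothetical forward step of 6 cm with z_f = 3 cm.
points = swing_control_points((0.00, 0.05), (0.06, 0.05), z_f=0.03)
for k in range(5):
    t = k / 4
    x, y, z = bezier(points, t)
    print(f"t = {t:.2f}: x = {x:.4f}, y = {y:.4f}, z = {z:.4f}")
```

One property worth noting for this choice of control points: a cubic Bezier curve does not pass through its inner control points, so the apex of the path at $t = 0.5$ is $0.75\,z_f$ rather than $z_f$.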

3.6 Joint Space Trajectory Generation

Both resulting trajectories are defined as frames expressed in world coordinates. They have to be translated into joint space, because the robot is commanded through joint angles. The conversion from the world space trajectory to the joint space trajectory starts by finding ${}^{C}_{S}T$, which describes the transformation of the sole frame expressed in the CoM frame. Based on ${}^{O}_{C}T$ and ${}^{O}_{S}T$ constructed in Section 3.4 and Section 3.5, ${}^{C}_{S}T$ is given by equation (3.37) and has a similar meaning to ${}^{0}_{8}T$ in (2.15). The leg joint angles can then be determined using the inverse kinematics solver in Section 2.4.

$${}^{C}_{S}T = \left( {}^{O}_{C}T \right)^{-1} \, {}^{O}_{S}T \tag{3.37}$$
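Equation (3.37) can be illustrated with plain 4x4 homogeneous matrices. The sketch below is a simplification under stated assumptions: the frames are built from a position and a yaw angle only (the pitch offset $\theta_c$ is ignored), the poses are hypothetical, and the helper names are not taken from the thesis.

```python
from math import cos, sin


def pose_to_matrix(x, y, z, yaw):
    """Homogeneous transform for a frame at (x, y, z) rotated by yaw about z.
    Simplification: roll and pitch are assumed to be zero."""
    c, s = cos(yaw), sin(yaw)
    return [[c, -s, 0.0, x],
            [s, c, 0.0, y],
            [0.0, 0.0, 1.0, z],
            [0.0, 0.0, 0.0, 1.0]]


def invert_transform(T):
    """Invert a rigid transform using R^T and -R^T p instead of a general inverse."""
    R_t = [[T[j][i] for j in range(3)] for i in range(3)]
    p = [T[i][3] for i in range(3)]
    p_inv = [-sum(R_t[i][j] * p[j] for j in range(3)) for i in range(3)]
    return [R_t[i] + [p_inv[i]] for i in range(3)] + [[0.0, 0.0, 0.0, 1.0]]


def matmul(A, B):
    """Product of two 4x4 matrices."""
    return [[sum(A[i][k] * B[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


# Hypothetical world-frame poses of the CoM and of one sole (metres, radians).
T_oc = pose_to_matrix(0.10, 0.00, 0.32, 0.0)
T_os = pose_to_matrix(0.13, 0.05, 0.00, 0.0)

# Equation (3.37): sole frame expressed in the CoM frame.
T_cs = matmul(invert_transform(T_oc), T_os)
print([round(T_cs[i][3], 3) for i in range(3)])  # [0.03, 0.05, -0.32]
```

The translation part of the result is the sole position seen from the CoM, which is the quantity the inverse kinematics solver needs for each leg.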

Chapter 4. Gait Optimization

This chapter introduces a method for optimizing the gait generated in Chapter 3. Section 4.1 and Section 4.2 explain the optimization objective and parameters. Section 4.3 explains the optimization algorithm, and Section 4.4 describes the environment used for optimization.

4.1 Optimization Objective

We propose a single objective function, defined in equation (4.1), for measuring gait performance. The notation $r$ refers to the reward value that the robot obtains for one walking rollout. $E$ is the total energy consumed by the leg actuators. The values $\sigma_{roll}$ and $\sigma_{pitch}$ are the standard deviations of the IMU data in roll and pitch orientation. The weights $w_1$, $w_2$, and $w_3$ are negative and have to be adjusted manually. The optimization goal is to maximize $r$ by selecting the optimal gait parameters.

$$r = \begin{cases} w_1 E + w_2 \sigma_{roll} + w_3 \sigma_{pitch} & \text{if robot not falling} \\ -10 & \text{otherwise} \end{cases} \tag{4.1}$$

Walking stability is determined from the torso vibration, which is measured by the IMU sensor. This sensor measures and records the roll and pitch orientation while the robot walks. A lower standard deviation of the recorded data indicates that the robot walks more stably.

There are two ways of estimating energy consumption: from electric power and from torque measurement. To estimate energy from electric power, the joint actuators must be equipped with current and voltage sensors. The energy is measured by integrating the electric power over time and normalizing by the measurement duration, as defined in equation (4.2). The notation $E$ is the total electric power consumed per second by the leg actuators, where $V$ is the voltage, $I$ is the current, $\Delta t$ is the sampling time, and $T$ is the total measurement time.

$$E = \frac{1}{T} \sum_{0}^{T} V I \, \Delta t \tag{4.2}$$

Energy estimation from torque can be used in the simulation model, where current and voltage sensor models are not provided. Equation (4.3) gives the energy consumption estimate from torque, where $E$ is the total torque per second of the leg joints and $\tau$ is the torque of the leg joints.

$$E = \frac{1}{T} \sum_{0}^{T} \tau \, \Delta t \tag{4.3}$$

4.2 Optimization Parameters

As described in Section 3.2, the gait parameters affect the stability, speed, and energy consumption of the robot. Moreover, the gait parameters are continuous values that are not known in advance. For the optimization, the search space of the parameters has to be determined. Based on previous experience with hand-tuned parameters, we defined the search space of the gait parameters in Table 4-1; the parameter definitions are given in Table 3-1. These ranges define the possible parameter values with which the robot can walk in the real environment.

Table 4-1. Gait parameters search space.

| Parameter Name | Minimum Value | Maximum Value |
|---|---|---|
| $z_c$ | 31 cm | 34 cm |
| $t_{step}$ | 0.2 s | 1.0 s |
| $DSP_{ratio}$ | 10 % | 90 % |
| $z_f$ | 2.5 cm | 5 cm |
| $x_c$ | 0 cm | 5 cm |
| $y_s$ | 0 cm | 2 m |
| $\theta_c$ | 10 degrees | 20 degrees |

4.3 Covariance Matrix Adaptation Evolution Strategies

Covariance Matrix Adaptation Evolution Strategies (CMA-ES) is an evolution-based optimization technique proposed in [31, 34, 35]. CMA-ES addresses optimization problems with non-linear and non-convex functions in a

continuous domain. To find the optimal solution, CMA-ES applies mutation, recombination, and selection to a population of individuals. A single individual contains an $n$-dimensional candidate solution. During the evolution process, new offspring are generated iteratively from the current population until the best individual, which contains the best solution, is obtained.

Algorithm 2. Covariance Matrix Adaptation – Evolution Strategies (CMA-ES)
    Input: $m \in \mathbb{R}^n$, $\sigma \in \mathbb{R}^+$, $\lambda$
    Initialize: $C = I$, $p_c = 0$, $p_\sigma = 0$
    While not terminate
        Sample $x_k$ and evaluate reward $r_k$ for $k = 1, \ldots, \lambda$
        Rank $x_k$ based on reward
        Update mean of search distribution $m$
        Update evolution path $p_c$ and covariance matrix $C$
        Update evolution path $p_\sigma$ and step-size $\sigma$
    Return $m$ or $x_1$

The CMA-ES algorithm is divided into four processes: generating the individuals (sampling), moving the search distribution, adapting the covariance matrix, and controlling the step-size. These processes are executed iteratively for a given number of generations or until the stopping criterion is met. Algorithm 2 gives the pseudo-code of CMA-ES. At the beginning, the hyperparameters $m$, $\sigma$, and $\lambda$ have to be defined, where $m$ is the mean of the search distribution, $\sigma$ is the initial step-size, and $\lambda$ is the total number of individuals per generation. In the initialization step, the covariance matrix $C$ is initialized with an identity matrix, while $p_c$ and $p_\sigma$ are initialized with zeros.

In the sampling process, the new individuals, or offspring, are generated by sampling a multivariate normal distribution specified by $m$ and $C$ of generation $g$. Equation (4.4) gives the mathematical formulation of the sampling process. Here $x_k^{(g+1)}$ is individual $k$ of the next generation $(g+1)$, which contains a candidate solution from the sampling process, and $\sigma$ is the step-size control. Each individual is evaluated by the objective function to obtain an objective value $r_k$, and the individuals are ranked based on this value.
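The sampling and ranking steps described above can be sketched in a few lines of standard-library Python. This is a minimal illustration, not the implementation used in the thesis: the Cholesky factor is used to draw from the multivariate normal distribution, the seven-dimensional mean vector is a hypothetical normalized parameter vector, and `toy_reward` is a stand-in for the rollout reward of equation (4.1).

```python
import random
from math import sqrt


def cholesky(C):
    """Lower-triangular factor L with L L^T = C (C symmetric positive definite)."""
    n = len(C)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = sqrt(C[i][i] - s)
            else:
                L[i][j] = (C[i][j] - s) / L[j][j]
    return L


def sample_population(m, sigma, C, lam, rng):
    """Draw lam individuals x_k = m + sigma * L z with z ~ N(0, I)."""
    n = len(m)
    L = cholesky(C)
    population = []
    for _ in range(lam):
        z = [rng.gauss(0.0, 1.0) for _ in range(n)]
        y = [sum(L[i][j] * z[j] for j in range(i + 1)) for i in range(n)]
        population.append([m[i] + sigma * y[i] for i in range(n)])
    return population


def rank_by_reward(population, reward_fn):
    """Sort individuals from highest to lowest reward."""
    return sorted(population, key=reward_fn, reverse=True)


def toy_reward(x):
    """Hypothetical reward: closer to 0.3 in every dimension is better."""
    return -sum((xi - 0.3) ** 2 for xi in x)


rng = random.Random(0)
n = 7                                   # one entry per gait parameter
mean = [0.5] * n                        # hypothetical normalized start point
C = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]  # C = I
population = sample_population(mean, sigma=0.5, C=C, lam=10, rng=rng)
ranked = rank_by_reward(population, toy_reward)
print(toy_reward(ranked[0]), toy_reward(ranked[-1]))
```

With $C = I$, as in the initialization of Algorithm 2, the Cholesky factor is the identity and the samples are isotropic; the factorization only matters once the covariance matrix has been adapted.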

$$x_k^{(g+1)} \sim m^{(g)} + \sigma^{(g)} \mathcal{N}\left(0, C^{(g)}\right) \quad \text{for } k = 1, \ldots, \lambda \tag{4.4}$$

Equation (4.5) describes the process of moving the mean of the search distribution. In this process, the elite individuals are selected based on their rank. The notation $\mu$ denotes the total number of selected elite individuals. For an ideal selection, the elite individuals are taken from 25 % of the population, or $\mu_{eff} \approx \lambda / 4$. The mean value $m^{(g+1)}$ is updated using a weighted mean of the elite individuals, where $w_i$ is a positive weight and $c_m$ is the learning rate.

$$m^{(g+1)} = m^{(g)} + c_m \sum_{i=1}^{\mu} w_i \left( x_{i:\lambda}^{(g+1)} - m^{(g)} \right) \tag{4.5}$$

The covariance matrix adaptation process consists of the evolution path, the rank-one update, and the rank-$\mu$ update. Equation (4.6) gives the mathematical model for adapting the covariance matrix, where $c_1$ is the coefficient of the rank-one update and $c_\mu$ is the coefficient of the rank-$\mu$ update. The term $p_c^{(g+1)}$ in equation (4.8) is the evolution path for adapting the covariance matrix.

$$C^{(g+1)} = \left( 1 - c_1 - c_\mu \sum w_j \right) C^{(g)} + c_1\, p_c^{(g+1)} p_c^{(g+1)T} + c_\mu \sum_{i=1}^{\mu} w_i\, y_{i:\lambda}^{(g+1)} \left( y_{i:\lambda}^{(g+1)} \right)^T \tag{4.6}$$

$$y_{i:\lambda}^{(g+1)} = \frac{x_{i:\lambda}^{(g+1)} - m^{(g)}}{\sigma^{(g)}} \tag{4.7}$$

$$p_c^{(g+1)} = (1 - c_c)\, p_c^{(g)} + \sqrt{c_c (2 - c_c)\, \mu_{eff}}\; \frac{m^{(g+1)} - m^{(g)}}{\sigma^{(g)}} \tag{4.8}$$

The last process is the step-size control, whose purpose is to change the scale of the distribution. In CMA-ES, the step-size control uses Cumulative Step-length Adaptation (CSA). Equation (4.9) gives the CSA formulation, which depends on the evolution path $p_\sigma^{(g+1)}$ from equation (4.10). The coefficient $c_\sigma$ is the learning rate of the step-size control, and the coefficient $d_\sigma$ is the damping parameter of the step-size update.

$$\sigma^{(g+1)} = \sigma^{(g)} \exp\left( \frac{c_\sigma}{d_\sigma} \left( \frac{\left\| p_\sigma^{(g+1)} \right\|}{E \left\| \mathcal{N}(0, I) \right\|} - 1 \right) \right) \tag{4.9}$$

$$p_\sigma^{(g+1)} = (1 - c_\sigma)\, p_\sigma^{(g)} + \sqrt{c_\sigma (2 - c_\sigma)\, \mu_{eff}}\; C^{(g)\,-\frac{1}{2}}\, \frac{m^{(g+1)} - m^{(g)}}{\sigma^{(g)}} \tag{4.10}$$

The step-size value can serve as an indicator of convergence. A step-size value close to zero indicates that the optimal solution has been found, and the solution can be taken from the mean $m$ or from the best individual $x_1$.

4.4 Training in the Simulation-based Environment

For the optimization, the objective function must be evaluated while the robot walks. Training in a real-world environment is time-consuming and costly. Because the robot may fall and be damaged, we used a simulated environment to prevent this issue. The Gazebo simulator (http://gazebosim.org/) was used to simulate the walking behavior of the robot during the optimization process.

Figure 4-1. Data flow of simulation-based learning.

The data flow of the training process is illustrated in Figure 4-1. The learning algorithm, CMA-ES, generates stochastic gait parameters, which are used to generate a walking trajectory for a random motion command. The world space trajectory is translated into joint angles by the inverse kinematics solver and sent to the simulation environment to execute the walking motion. While the robot walks, the

data of the IMU and the joint states are recorded and later used for calculating the reward function. The reward value and the rollout history obtained from past experience are used by CMA-ES to resample the gait parameters.

For the training process, the hyperparameters of the CMA-ES algorithm, listed in Table 4-2, have to be determined. The steps per cycle define how many steps make up one walking cycle, while the cycles per rollout define how many walking cycles are executed in one rollout. Each cycle has one of five different walking commands: stepping, walking forward, walking backward, walking sideways, and turning. The walking commands are also generated with random step length, step size, and step rotation, so that the speed and movement vary during walking. The maximum generation defines the maximum number of iterations of the learning process, where each iteration contains one population of individuals. In this case, we used a small population to reduce the evaluation time, because the simulation runs slower than the real environment.

Table 4-2. The hyperparameters setting for optimization.

| Hyperparameter | Definition | Value |
|---|---|---|
| Step per cycle | Total steps per one cycle walk | 30 |
| Cycle per rollout | Total cycle per one rollout evaluation | 5 |
| Maximum generation | Total maximum iteration for training | 100 |
| $\lambda$ | Number of individuals | 10 |
| $\sigma^{(0)}$ | Initial sigma value | 0.5 |
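The reward evaluation of one rollout, combining equation (4.1) with the torque-based estimate of equation (4.3), might look as follows. This is a sketch under several assumptions that the text does not fix: the torque magnitudes of all leg joints are summed at each sample, the population standard deviation is used for the IMU data, and the weights `(-1.0, -1.0, -1.0)` are placeholders for the manually tuned negative weights.

```python
from statistics import pstdev

FALL_REWARD = -10.0  # reward assigned when the robot falls, equation (4.1)


def torque_energy(torques, dt):
    """Torque-based energy estimate of equation (4.3).

    torques: list of samples, each a list of leg joint torques in Nm.
    Assumption: absolute torques are summed over all joints per sample.
    """
    total_time = len(torques) * dt
    accumulated = sum(sum(abs(tau) for tau in sample) * dt for sample in torques)
    return accumulated / total_time


def rollout_reward(torques, roll, pitch, dt, fell, weights=(-1.0, -1.0, -1.0)):
    """Reward of one rollout; weights are hypothetical negative values."""
    if fell:
        return FALL_REWARD
    w1, w2, w3 = weights
    energy = torque_energy(torques, dt)
    return w1 * energy + w2 * pstdev(roll) + w3 * pstdev(pitch)


# Tiny hypothetical rollout: two samples, two joints, 0.5 s sampling time.
torques = [[1.0, -1.0], [2.0, 0.0]]
roll = [0.0, 0.2]
pitch = [0.1, 0.1]

print(rollout_reward(torques, roll, pitch, dt=0.5, fell=False))
print(rollout_reward(torques, roll, pitch, dt=0.5, fell=True))  # -10.0
```

Because all weights are negative, lower energy and smaller torso oscillation both raise the reward, and any rollout that ends in a fall is penalized with the fixed value of -10.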

Chapter 5. Experimental Result and Discussion

This chapter presents and discusses the experimental results of the proposed method. Section 5.1 describes the output of the gait generator, visualized as world space trajectories. Section 5.2 explains the training performance in the simulator, followed by Section 5.3, which evaluates the result in the simulation and on the physical robot. Section 5.4 compares the hand-tuned gait with the trained gait obtained from CMA-ES, and Section 5.5 extends that experiment with variable step length.

5.1 Generated Walking Trajectory

This section describes the result of the walking pattern generation illustrated in Section 3.2. In this experiment, we ran the walking pattern generator with the hand-tuned parameters listed in Table 5-1.

Table 5-1. Parameter values for generating trajectories.

| Parameter Name | Value |
|---|---|
| $z_c$ | 34 cm |
| $t_{step}$ | 0.25 s |
| $DSP_{ratio}$ | 20 % |
| $z_f$ | 3 cm |
| $x_c$ | 0 cm |
| $y_s$ | 0 cm |
| $\theta_c$ | 0 degrees |

An example trajectory for a walking forward command is illustrated in Figure 5-1. The CoM frame moves forward in the $x$-axis direction, followed by the left and right sole frames, when the robot receives the motion command $cmd_x = 0.05$, $cmd_y = 0.00$, and $cmd_\psi = 0.15$ rad, as shown in Figure 5-1 (a). The curve in Figure 5-1 (a) shows that the sole moves up along the $z$-axis and touches down on the ground during the single support phase.

Figure 5-1 (b) illustrates the trajectory of a walking sideways motion, where the robot is given the motion command $cmd_y = 0.015$, while $cmd_x$ and $cmd_\psi$ are

equal to zero. As shown in Figure 5-1 (b), the CoM frame moves to the left side along the $y$-axis direction. The left and right soles also follow the direction of the CoM, while the position on the $x$-axis stays constant at 0 m.

A diagonal path is produced when both $cmd_x$ and $cmd_y$ are non-zero, as illustrated in Figure 5-1 (c). In this case, the CoM frame keeps pointing forward along the $x$-axis, but its position moves along both the $x$-axis and the $y$-axis. Figure 5-1 (d) shows a trajectory where the $cmd_x$ and $cmd_\psi$ components are set to non-zero values. The CoM trajectory follows a curved path, as shown in Figure 5-1 (d). In contrast to Figure 5-1 (c), not only the position of the CoM frame changes, but also its orientation. In the real environment, this trajectory results in a turning motion of the robot.

(a) (b) (c) (d)
Figure 5-1. Generated CoM and sole trajectories (a) walking forward, (b) walking sideways, (c) walking diagonal, and (d) walking with turning.

5.2 Training Performance

Figure 5-2 shows the statistics of the reward and the average success rate during the training process. The training reward statistics are illustrated in Figure 5-2 (a). The red line indicates the median reward of all individuals, and the red band indicates the 25th and 75th percentiles of the reward. The blue line in Figure 5-2 (a) shows the Root Mean Square (RMS) standard deviation of all parameters during training. Figure 5-2 (b) shows the success rate during training.

As seen in Figure 5-2 (a), the median, the 25th percentile, and the 75th percentile of the reward increased significantly during training. In the first generation, the median reward was -8.946, with a 25th percentile of -10 and a 75th percentile of -6.894, and the success rate at the beginning of training was about 28 %. In the final generation, the median reward increased to -3.160, the 25th percentile rose to -3.212, and the 75th percentile rose to -3.127, while the success rate held steady at 98 %. Based on the training record, the reward value remained stable after the 40th generation, while the mean success rate stayed above 90 %. The RMS standard deviation decreased significantly and was close to zero in the last generation, which indicates that the parameters converged. The best parameters obtained from training in the simulation are shown in Table 5-2.

(a) (b)
Figure 5-2. Reward and success rate performance during training.

Table 5-2. Result of optimal walking parameters.

| Parameter Name | Value |
|---|---|
| $z_c$ | 32.220 cm |
| $t_{step}$ | 0.543 s |
| $DSP_{ratio}$ | 62.382 % |
| $z_f$ | 3.628 cm |
| $x_c$ | 3.158 cm |
| $y_s$ | 0.259 cm |
| $\theta_c$ | 12.651 degrees |

5.3 Evaluation Performance

The best parameters of each generation were evaluated in simulation to verify the energy reduction based on the torque value. We evaluated every generation from the first to the last in order to study the reduction over the training iterations. Figure 5-3 shows the evaluation on the simulation model. The leg torque is shown in Figure 5-3 (a), and the $\sigma_{roll}$ and $\sigma_{pitch}$ values are shown in Figure 5-3 (b). As seen in Figure 5-3 (a), the leg torque decreased significantly from the first to the last generation, which indicates that the optimization process successfully minimized the energy.

(a) (b)
Figure 5-3. Evaluation of simulation model (a) torque (b) $\sigma_{roll}$ and $\sigma_{pitch}$.

The $\sigma_{roll}$ value decreased from the first generation onward, while the $\sigma_{pitch}$ value remained stable at 0.04 - 0.06. As a result, the robot walked more stably in the last generation. Table 5-3 shows a definite improvement in energy and stability.

The reduction of energy can be seen in the torque value, which decreased by about 29.813 %. The improvement in stability can be seen in the success rate, which increased by about 20 % from the initial to the final generation.

Table 5-3. Comparison performance between initial and final generation on simulation.

| Key Parameter | Initial Generation | Final Generation | Change |
|---|---|---|---|
| Torque (Nm/s) | 1.579 | 1.108 | Decrease 29.813 % |
| $\sigma_{roll}$ | 0.051 | 0.027 | Decrease 47.445 % |
| $\sigma_{pitch}$ | 0.040 | 0.051 | Increase 25.891 % |
| Success Rate (%) | 80 % | 100 % | Increase 20 % |

5.4 Comparison Performance Before and After Optimization

We evaluated the optimization result by comparing the performance of the optimized and non-optimized gaits on the robot. For the non-optimized gait, we used hand-tuned parameters obtained from trial-and-error experiments; for the optimized gait, we used the best parameters in Table 5-2. In this experiment, we used a statistical approach with four different types of omnidirectional walking: forward, backward, sideways, and turning. Each walking type was applied to the robot for 50 trials on a thin carpet with a starting line and a finish line. The robot was controlled manually by joystick to walk from the starting line to the finish line. The distance between the starting and finish lines was 2 meters for walking forward and backward, and 1 meter for walking sideways. For the turning motion, the robot walked on a circular path with a diameter of 0.74 m.

During the experiment, the robot was given a constant motion command. The commands for walking forward and backward were $cmd_x = \{0.03, -0.03\}$. For walking sideways, we ran the robot both to the left and to the right with the commands $cmd_y = \{0.01, -0.01\}$. For walking on a circular path, we gave the motion command $cmd_x = 0.03$ and $cmd_\psi = \{5, -5\}$ degrees to walk in the clockwise (CW) and counter-clockwise (CCW) directions. We recorded the voltage, current, and IMU sensor data during walking to compare the gait performance, as presented in Table 5-4.

Table 5-4. Comparison before and after optimization.

| Walking Type | Before: Success Rate (%) | Before: Energy (W/s) | Before: Speed (m/s) | After: Success Rate (%) | After: Energy (W/s) | After: Speed (m/s) | Saving Energy (%) |
|---|---|---|---|---|---|---|---|
| Walking Forward | 96 | 12.776 | 0.097 | 100 | 10.230 | 0.046 | 19.928 |
| Walking Backward | 22 | 12.962 | 0.093 | 100 | 10.426 | 0.042 | 19.565 |
| Walking Sideways | 68 | 13.127 | 0.034 | 100 | 10.293 | 0.015 | 21.589 |
| Turning | 60 | 13.075 | 0.076 | 100 | 10.651 | 0.044 | 18.539 |
| Average Saving Energy | | | | | | | 19.905 |

Table 5-4 compares the performance before and after optimization. The success rate is the percentage of successful trials, in which the robot reached the finish line without falling. The energy is the total electric power consumed by the leg actuators per second.

As shown in Table 5-4, before optimization the average success rate was about 61.5 %, which indicates an unstable gait with a high chance of falling. The non-optimized gait yielded a high success rate only for walking forward, which indicates that the parameters did not generalize to other types of walking. In terms of speed and energy, before optimization the average energy was about 12.985 W/s and the average speed was about 0.075 m/s.

After optimization, the gait performance improved significantly. The success rate was 100 % for every type of walking, which indicates that the gait parameters yielded a stable gait with which the robot never fell. The average consumed energy was 10.4 W/s, lower than for the non-optimized gait. However, the average speed was 0.037 m/s, slower than for the non-optimized gait. Comparing the consumed energy before and after optimization, the optimized gait saved about 19.905 % of the energy. In summary, Table 5-4 shows that after optimization the walking gait was slower, more stable, and consumed less energy than the non-optimized gait.
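The saving percentages in Table 5-4 follow from the energy columns as (before - after) / before. The short check below recomputes them from the tabulated values; the dictionary layout and variable names are only for illustration.

```python
# Energy values (W/s) copied from Table 5-4: (before, after) optimization.
energy = {
    "forward": (12.776, 10.230),
    "backward": (12.962, 10.426),
    "sideways": (13.127, 10.293),
    "turning": (13.075, 10.651),
}

# Relative saving in percent for each walking type.
savings = {
    name: (before - after) / before * 100.0
    for name, (before, after) in energy.items()
}
average_saving = sum(savings.values()) / len(savings)

for name, value in savings.items():
    print(f"{name:9s} {value:.3f} %")
print(f"average   {average_saving:.3f} %")
```

The recomputed values agree with the table to three decimal places, including the average saving of 19.905 %.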

5.5 Straight Walk with Variable Step Length

We extended the experiment in Section 5.4 by varying the step length while the robot walked forward and backward, to study the effect of step length on gait performance. We used step lengths from 1 cm to 5 cm in 1 cm increments. The results of the forward-walking experiment are shown in Table 5-5, and those of the backward-walking experiment are summarized in Table 5-6.

Table 5-5. Comparison walking forward with variable step length.

                Before Optimization              After Optimization
Step Length     Success   Energy   Speed         Success   Energy   Speed
(cm)            Rate (%)  (W/s)    (m/s)         Rate (%)  (W/s)    (m/s)
1               100       12.914   0.041         100       10.907   0.016
2               100       12.111   0.070         100       11.004   0.029
3               96        12.776   0.097         100       10.230   0.046
4               70        11.954   0.113         100       10.498   0.060
5               36        12.573   0.129         100       10.249   0.072
6               -         -        -             100       10.021   0.073
7               -         -        -             100       10.099   0.072

Based on the data in Table 5-5, before optimization, increasing the step length reduced the success rate: the longer the step, the higher the chance that the robot fell. After optimization, however, changing the step length did not affect stability. The robot walked at every step length from the minimum to the maximum with a constant success rate. With the optimized gait, the robot reached a speed of about 0.072 m/s at the largest step length of 7 cm. Meanwhile, the consumed energy remained stable at around 10.021 – 11.004 W/s even though the step length varied.

Table 5-6. Comparison walking backward with variable step length.

                Before Optimization              After Optimization
Step Length     Success   Energy   Speed         Success   Energy   Speed
(cm)            Rate (%)  (W/s)    (m/s)         Rate (%)  (W/s)    (m/s)
1               44        13.404   0.038         100       11.271   0.014
2               48        13.823   0.052         100       10.788   0.025
3               22        12.962   0.093         100       10.426   0.042
4               36        12.336   0.086         100       11.128   0.058
5               8         12.373   0.113         100       10.990   0.070
6               -         -        -             100       10.823   0.072
7               -         -        -             100       11.301   0.072

The backward-walking results in Table 5-6 are similar to those for forward walking. When the step length was varied with the non-optimized gait, the success rate was reduced. With the optimized gait, the robot was able to walk backward at step lengths from 1 cm to 7 cm without falling. However, compared to forward walking, the consumed energy was somewhat higher, about 10.426 – 11.301 W/s. In contrast, the maximum speed was about 0.072 m/s, equal to that of forward walking. Together, Table 5-5 and Table 5-6 show that the walking parameters obtained from optimization were successfully applied to walking with variable step length, which means that the robot can walk stably with low energy at different speeds.
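The energy ranges and the constant success rate reported for the optimized gait can be checked against the table rows with a few lines of Python. This is only an illustrative check. The dictionary layout and the helper names are assumptions, while the numbers are copied from the "After Optimization" columns of Tables 5-5 and 5-6.

```python
# Optimized-gait rows copied from Tables 5-5 and 5-6:
# step length (cm) -> (success rate %, energy W/s, speed m/s).
FORWARD_AFTER = {
    1: (100, 10.907, 0.016),
    2: (100, 11.004, 0.029),
    3: (100, 10.230, 0.046),
    4: (100, 10.498, 0.060),
    5: (100, 10.249, 0.072),
    6: (100, 10.021, 0.073),
    7: (100, 10.099, 0.072),
}
BACKWARD_AFTER = {
    1: (100, 11.271, 0.014),
    2: (100, 10.788, 0.025),
    3: (100, 10.426, 0.042),
    4: (100, 11.128, 0.058),
    5: (100, 10.990, 0.070),
    6: (100, 10.823, 0.072),
    7: (100, 11.301, 0.072),
}


def energy_range(table):
    """Smallest and largest energy value over all step lengths."""
    energies = [energy for _, energy, _ in table.values()]
    return min(energies), max(energies)


def always_succeeds(table):
    """True if every step length has a 100 % success rate."""
    return all(success == 100 for success, _, _ in table.values())


for label, table in (("forward", FORWARD_AFTER), ("backward", BACKWARD_AFTER)):
    low, high = energy_range(table)
    print(f"{label}: energy {low:.3f} to {high:.3f} W/s, "
          f"always succeeds: {always_succeeds(table)}, "
          f"speed at 7 cm: {table[7][2]:.3f} m/s")
```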

Chapter 6. Conclusion and Future Work

This thesis presented a stable, energy-efficient, and omnidirectional gait generation method for the humanoid robot. The ZMP preview controller with a Bezier function was used to generate the walking gait, and the CMA-ES algorithm was proposed for optimizing the gait parameters in a simulation model. The resulting gait engine was verified on the real robot to measure its stability and energy consumption. Based on the experimental results, the proposed gait generation achieved a stable and energy-efficient gait. During training in simulation, energy consumption was reduced by about 29.813 % and stability increased by 20 %. On the real robot, the optimized gait reduced energy consumption by 19.905 % compared to the non-optimized gait. Moreover, the optimized gait remained stable when applied to variable-speed and omnidirectional walking. Although the gait engine is stable, it cannot guarantee the rejection of external disturbances because the gait generation is open loop. In future work, model-free reinforcement learning will be studied to improve the dynamic balance of the robot.
