Moving Large Size and Heavy Object with Deep Reinforcement Learning


Department of Electrical Engineering, College of Technology and Engineering, National Taiwan Normal University
Master's Thesis

Moving Large Size and Heavy Object with Deep Reinforcement Learning

Hanjaya Mandala (許哲菡)
Advisor: Prof. Jacky Baltes (包傑奇)
June 2020

Acknowledgment

This work was financially supported by the 'Chinese Language and Technology Center' of National Taiwan Normal University (NTNU) from The Featured Areas Research Center Program within the framework of the Higher Education Sprout Project by the Ministry of Education (MOE) in Taiwan, and by the Ministry of Science and Technology, Taiwan, under Grant Nos. MOST 108-2634-F-003-002, MOST 108-2634-F-003-003, and MOST 108-2634-F-003-004 (administered through Pervasive Artificial Intelligence Research (PAIR) Labs) as well as MOST 107-2811-E-003-503. We are grateful to the National Center for High-performance Computing for computer time and facilities to conduct this research.

Moving Large Size and Heavy Object with Deep Reinforcement Learning
Student: Hanjaya Mandala. Advisor: Prof. Jacky Baltes.
Department of Electrical Engineering, National Taiwan Normal University

ABSTRACT

Humanoid robots are designed and expected to work alongside humans. In daily life, Moving Large Size and Heavy Objects (MLHO) is a common activity that can be dangerous to humans. In this thesis, we propose a novel hierarchical learning-based algorithm in which an adult-sized humanoid robot transports an object by dragging it. The proposed method proves robust on a THORMANG-Wolf adult-sized humanoid robot, which manages to drag a massive object with a mass double its own weight (84.6 kg) for 2 meters. The approach consists of three hierarchical deep learning-based algorithms that solve the MLHO problem, distributed between robot vision and behavior control. In robot vision control, we first propose deep learning algorithms for 3D object classification and surface detection.

For 3D object classification, we propose a Three-layers Convolution Volumetric Network (TCVN). The input to the TCVN model is a voxel grid representation of point cloud data acquired from the robot's LiDAR scanner. For surface detection, we propose a lightweight real-time instance segmentation model called Tiny-YOLACT (You Only Look at Coefficients) to segment the floor in images from the robot's camera. The Tiny-YOLACT model is adapted from the YOLACT model and uses ResNet-18 as the backbone network. Furthermore, for robot behavior control, which is the main part of this thesis, we address the MLHO problem on an adult-sized humanoid robot using a deep reinforcement learning algorithm for the first time. Here we propose a Deep Q-Learning algorithm, named DQL-COB, that trains a deep model for a control policy that offsets the Centre of Body (CoB) of the robot when dragging different objects. The CoB offset is applied so that it keeps tracking the robot's center of mass. As a result, the robot can keep its balance by maintaining the Zero Moment Point (ZMP) within the support polygon. The DQL-COB algorithm was first trained in the ROS Gazebo simulator to avoid experiments that are costly in time and constrained by the real environment, and was then deployed on a real robot on three different types of surfaces.

To evaluate the stability of the THORMANG-Wolf robot with the proposed methods, we ran two types of experiments on three types of surfaces with eight different objects. In the first scenario, we use the IMU together with the foot pressure (F/T) sensor as input to the learning algorithm; in the second scenario, we use only IMU data. The success rates of the DQL-COB algorithm on the real robot are 92.91% with the F/T sensor and 83.75% without it. Moreover, the TCVN model achieved 90% accuracy on 3D object classification in real time, and the Tiny-YOLACT model achieved 34.16 mAP on validation data at an average of 29.56 fps on a single NVIDIA GTX-1060 GPU.

Keywords: humanoid robot, deep reinforcement learning, dragging object, deep learning.

Table of Contents

Acknowledgment
ABSTRACT
Table of Contents
List of Figures
List of Tables
Chapter 1: Introduction
  1.1. Background
  1.2. Problem statement
  1.3. The objective of the study
  1.4. Limitation of the study
Chapter 2: Literature Review
  2.1. Related work
    2.1.1. Pushing object
    2.1.2. Pivoting object
    2.1.3. Teleoperation manipulation
    2.1.4. Walking Balance (Learning-Based)
    2.1.5. Push Recovery (Learning-Based)
    2.1.6. Summary of related work
  2.2. Inverse Kinematic
  2.3. Walking Gait
  2.4. Neural Network
  2.5. Deep Learning
  2.6. Object Detection
  2.7. Reinforcement Learning
  2.8. Deep Reinforcement Learning
Chapter 3: Methodology
  3.1. THORMANG-Wolf Robot
    3.1.1. Hardware Description
    3.1.2. Software Description
    3.1.3. The Proposed Algorithm Design
  3.2. Robot Vision Process
    3.2.1. 3D Object Detection (Deep Learning)
    3.2.2. Floor Detection (Deep Learning)
  3.3. Robot Motion Control
    3.3.1. Object Grasping
    3.3.2. Walking Control
  3.4. Robot Behavior Control
    3.4.1. DQL-COB Algorithm Design
Chapter 4: Experimental Result
  4.1. Experimental Setup
  4.2. Experimental Result for Robot Vision
    4.2.1. 3D Object Classification Result
    4.2.2. Floor Detection Result
  4.3. Experimental Results for Robot Behavior
    4.3.1. DQL-COB Training Results
    4.3.2. DQL-COB Empirical Evaluation Result
Chapter 5: Closing
  5.1. Conclusion
  5.2. Future Work
Bibliographies
Autobiography
Academic Achievement

List of Figures

Figure 1-1 Comparison motion pose on the moving object.
Figure 2-1 Example of inverse kinematic on the left leg of a biped robot;
Figure 2-2 Tree structure of the humanoid links connection [3].
Figure 2-3 Sagittal plane view of walking gait cycle [38].
Figure 2-4 ZMP support polygon [3].
Figure 2-5 Projection of the Centre of Mass on Zero Moment Point.
Figure 2-6 Neural network architecture.
Figure 2-7 Operations done by neurons on a single layer perceptron.
Figure 2-8 Deeper network architecture of ANN or called Deep Learning.
Figure 2-9 Convolutional Neural Network subclass of deep learning.
Figure 2-10 Various types of 2D image object detection.
Figure 2-11 Types of 3D point cloud object detection by [47].
Figure 2-12 Markov Decision Process of Reinforcement Learning.
Figure 2-13 Deep Q-Network architecture [50].
Figure 3-1 THORMANG3 adult-sized humanoid robot.
Figure 3-2 THORMANG-Wolf hardware architecture.
Figure 3-3 THORMANG-Wolf electrical components system.
Figure 3-4 ROS graph architecture performing a dragging task.
Figure 3-5 THORMANG-Wolf hierarchical framework data flow diagram.
Figure 3-6 Block diagram of the proposed DL algorithms to solve MLHO problem.
Figure 3-7 Flowchart DL algorithm of 3D object classification.
Figure 3-8 Example process of preprocessing 3D point cloud data.
Figure 3-9 The network architecture of the TCVN model.
Figure 3-10 Types of floors used in this experiment.
Figure 3-11 YOLACT network architecture [46].
Figure 3-12 Building block (residual function) of ResNet [56].
Figure 3-13 Sample pre-recorded motion for grasping different types of objects.
Figure 3-14 Cart-table model.
Figure 3-15 Walking pattern generation based on preview control.
Figure 3-16 Walking gait pattern generation process.
Figure 3-17 Integration walking module with the DQL-COB algorithm.
Figure 3-18 IMU sensor as a state of the environment.
Figure 3-19 Torque vector on both feet.
Figure 3-20 Action offset on COB X.
Figure 3-21 Reward based on robot pitch state and finished distance.
Figure 3-22 Hierarchical Q-Network architecture.
Figure 3-23 Experience replay illustration on training data.
Figure 3-24 Solution to non-stationary target DQN.
Figure 3-25 Block diagram of Deep Q-Network on ROS Gazebo simulator.
Figure 4-1 The 4 types of the 3D object after voxel grid filter.
Figure 4-2 The comparison of TCVN model performances during training.
Figure 4-3 The comparison of the TCVN model in a confusion matrix of the validation data.
Figure 4-4 Example results of floor detection using the Tiny-YOLACT model.
Figure 4-5 Result of validation mAP and FPS using different ResNet backbone.
Figure 4-6 Snapshot of dragging in the Gazebo simulator.
Figure 4-7 Comparison of accumulated reward during training.
Figure 4-8 Comparison of Euclidean error during training.
Figure 4-9 Snapshot during training in the Gazebo simulator.
Figure 4-10 Recorded (states, actions) pair by the learned DQN during testing.
Figure 4-11 Snapshots testing on plywood surfaces.
Figure 4-12 Snapshots testing on green carpet surfaces.
Figure 4-13 Snapshots testing on tile surfaces.
Figure 4-14 The success rate of dragging all objects per each surface.
Figure 4-15 Recorded (states, actions) pair without using the F/T sensor.
Figure 4-16 Recorded (states, actions) pair using the F/T sensor.
Figure 4-17 Example of a failure condition in dragging an empty small suitcase.
Figure 4-18 Example of a failure condition in dragging a big suitcase with a human.
Figure 4-19 The pre-defined CoB-X to dragging a big suitcase with a human.

List of Tables

Table 3-1 THORMANG-Wolf Specification Details.
Table 3-2 The network details of the TCVN model.
Table 3-3 The adopted ResNet architecture and number of parameters [56].
Table 3-4 Details of the Q-Network architecture.
Table 4-1 Experimental surfaces.
Table 4-2 Experimental objects.
Table 4-3 Foot-steps parameter.
Table 4-4 Deep-learning computer hardware specifications.
Table 4-5 OPC (laptop) hardware specifications.
Table 4-6 List of hyperparameters and values of the DQN.
Table 4-7 Summary and comparison of the success rate result for all experiments.

Chapter 1: Introduction

1.1. Background

Humanoid robots have become an important type of robot that researchers are developing and improving rapidly. In [1], possible real-life applications of humanoid robots are described. In [2], the authors review the applications and influence of humanoid robots over the last decade in the social, healthcare, and education domains. In 2019, humanoid robot applications in real-world scenarios were chosen as a special issue topic of IEEE Robotics and Automation Magazine (RAM) (https://www.ieee-ras.org/publications/ram/special-issues/humanoid-robot-applications-in-real-world-scenarios). The development of humanoid robots therefore offers significant potential for alleviating tedious and tough tasks that are currently performed by humans. An important question in developing a humanoid robot is "Why a humanoid robot? Why not another type of robot?". The answer lies in the functions of humanoid robots themselves. Three fundamental functions of a humanoid robot are identified in [3]: (i) humanoid robots are able to work in human environments, (ii) humanoid robots are capable of using human tools, and (iii) humanoid robots are structurally designed to resemble the human shape. A humanoid robot should therefore mimic a human in aspects such as interaction, perception, locomotion, manipulation, and behavior.

Generally, humanoid robots are expected to work alongside humans, or as an alternative to humans in any circumstances. For example, heavy-duty work such as civil engineering and construction in hazardous environments requires Moving Large and Heavy Objects (MLHO). Moreover, in rescue applications it is necessary to remove large debris during evacuation. Although biped humanoid robots have high mobility like humans, a robot that walks while moving an object may fall because of the disturbance to its Centre of Mass (COM), and may suffer serious damage. So far, MLHO has remained a challenging problem in many humanoid robot development projects [4-12]. The challenge can be summarized as how to develop a stable walking gait on a biped robot while the robot is dragging a large object. Admittedly, the dragging problem is more challenging than carrying because uncertainty in surface friction compounds the complexity of the problem.

1.2. Problem statement

Biped humanoid robots may not walk stably under varying real-time environment conditions, even when the desired walking pattern has been planned to achieve stable walking on a flat floor. In the MLHO problem, it is further assumed that some objects are too heavy to lift, or that their shape or size makes them very hard to carry, for a humanoid robot with limited joint torque. To deal with this problem, we have the humanoid robot pull the object, and we specifically call this pull motion dragging. Although drag and pull have similar meanings, the term dragging is more specific than pull. An important question for this type of MLHO motion is "Why do we choose dragging the object rather than pushing it?". The answer is illustrated in Figure 1-1: dragging has an advantage over pushing, and this is the basis for the main target of this thesis. A comparison of the forces involved in pushing and pulling an object on a flat horizontal surface is provided in [13, 14].

Figure 1-1 shows that friction and the forces acting on the object differ between the two tasks. In the push motion shown in Figure 1-1(a), the vertical component of the pushing force acts on the object in the vertically downward direction. It therefore increases the effective weight of the object, as written in Eq. (1-1), and it also increases the friction force between the object and the ground. The effective weight W of the object in the pushing motion is as follows:

$$ W = m \cdot g + F \sin\theta \tag{1-1} $$

where $m$ is the mass of the object, $g$ is the gravitational acceleration, $F$ is the pushing force, and $\theta$ is the elevation angle of the force applied to the object.

In the pulling motion, on the other hand, the vertical component of the force acts on the object in the vertically upward direction. It therefore reduces the effective weight of the object, as shown in Eq. (1-2), and it also decreases the friction between the object and the ground. The effective weight W of the object in the pulling motion is as follows:

$$ W = m \cdot g - F \sin\theta \tag{1-2} $$

Based on these two equations, dragging an object on a horizontal plane is easier than pushing it. Note that although pushing the object can be beneficial for a humanoid robot under other conditions, it is not the objective of this research study.

Figure 1-1 Comparison of motion poses for moving an object: (a) pushing the object; (b) dragging the object.
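As a rough numerical illustration of Eqs. (1-1) and (1-2), the Python sketch below compares the friction force for pushing and pulling. It assumes simple Coulomb friction, i.e. friction equal to a coefficient `mu` times the effective weight, which the text does not state explicitly. The function names, the applied force, the angle, and the friction coefficient are all hypothetical values chosen for the example; only the 84.6 kg mass is taken from the abstract.

```python
import math

G = 9.81  # gravitational acceleration in m/s^2


def effective_weight(mass_kg, force_n, angle_deg, pulling):
    """Effective weight from Eq. (1-1) (pushing) or Eq. (1-2) (pulling)."""
    vertical = force_n * math.sin(math.radians(angle_deg))
    if pulling:
        return mass_kg * G - vertical
    return mass_kg * G + vertical


def friction_force(mass_kg, force_n, angle_deg, pulling, mu):
    """Coulomb friction, assumed here to be mu times the effective weight."""
    return mu * effective_weight(mass_kg, force_n, angle_deg, pulling)


# Illustrative values only: 84.6 kg is the mass quoted in the abstract;
# the force (200 N), angle (30 deg), and mu (0.4) are assumptions.
push = friction_force(84.6, 200.0, 30.0, pulling=False, mu=0.4)
pull = friction_force(84.6, 200.0, 30.0, pulling=True, mu=0.4)
print(f"friction when pushing: {push:.1f} N")
print(f"friction when pulling: {pull:.1f} N")
```

Under these assumptions, the two friction forces differ by 2·mu·F·sin(θ), so the gap between pushing and pulling grows with both the applied force and its elevation angle.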

1.3. The objective of the study

In this work, we present an adult-sized bipedal humanoid robot that is capable of moving a large and heavy object. The objectives of this project are divided into two parts. The first is to propose robot vision algorithms for 3D object detection and 2D instance segmentation that use a deep learning approach. In 3D object detection, the 3D data of the object are acquired in real time with a LiDAR scanner on the robot's head. The 2D instance segmentation is expected to run in real time and is used for floor detection from the robot's webcam. The second is to propose a deep reinforcement learning algorithm, specifically a Deep Q-Learning algorithm, to improve the robot's whole-body manipulation behavior when transporting large and heavy objects. In the training process, we used a simulated robot model and environment in Gazebo (http://gazebosim.org/). The advantage of the Gazebo simulator is that it can simulate conditions very close to the real environment. As a result, the training result can be applied directly to the real robot without any parameter adjustment. This thesis discusses an approach to MLHO with a bipedal adult-sized humanoid robot, in which the robot drags different objects, including a massive object, on various flat surfaces while walking backward.

The rest of the thesis is organized as follows. Chapter 2 gives an overview of the literature on moving objects with bipedal humanoid robots. Chapter 3 explains the methodology of the algorithms that solve the MLHO problem: the architecture of the THORMANG-Wolf robot, the proposed deep learning vision for 3D object classification and floor detection, the bipedal humanoid robot walking control, and the proposed deep reinforcement learning method. Chapter 4 provides the experimental results of the 3D object classification and of the proposed Deep Q-Network (DQN) method on the THORMANG-Wolf robot. Finally, Chapter 5 concludes the thesis and outlines future work.

1.4. Limitation of the study

There are four major limitations in this research that could be addressed in future research. First, the robot vision processing is based on a deep learning approach and is divided into 3D voxel object classification from LiDAR point cloud data and real-time instance segmentation for floor detection. Second, robot manipulation control uses only a static grasp motion for grasping the object. Third, robot walking control uses the original ZMP walking controller provided by ROBOTIS for the THORMANG3 robot. Finally, robot behavior control specifically uses deep reinforcement learning with the DQN algorithm to learn the control policy for the Centre of Body (CoB) parameter.

Chapter 2: Literature Review

2.1. Related work

Balancing in Bipedal Humanoid Robot (BHR) systems is a challenging research problem that has been approached in a variety of ways. Stable walking for biped robots has been studied extensively [15], but walking under disturbances such as pushing [4-8], carrying [9, 10], or lifting [10-12] large or heavy objects is still an open problem. Maintaining the balance of a humanoid robot while it transports objects is therefore one of the critical problems to investigate for adult-sized BHRs. The following literature review confirms that MLHO presents a problem that goes beyond mere balancing, discusses specific proposed solutions, and concludes that specific approaches and robust initiatives are required for widespread deployment of BHRs in the real world.

In the rest of this section, related work on the MLHO problem is discussed in several sub-sections. Each sub-section groups related work in a more specific field, described briefly as follows: (i) pushing an object, the most common method for transporting large objects; (ii) pivoting an object, an alternative motion for precise movement of large objects; (iii) teleoperation manipulation, manual control of the whole-body humanoid robot to move large objects; and humanoid robot control using a learning-based approach for (iv) walking control and (v) push recovery control.

2.1.1. Pushing object

In [4], the authors studied how a humanoid robot pushes a heavy object while considering the reaction force acting on the end-effectors (both hands). The reaction force was considered during the single support phase of walking. They proposed the Dynamically Complemental Zero Moment Point (DCZMP), which dynamically modifies the position of the COM. The COM trajectory of the HRP-2 humanoid robot is modified based on the forces acting on the robot's hands. These findings were replicated in [16], in which the authors proposed the Generalized Zero-Moment Point (GZMP), which enables stability when the robot's hands are in contact with objects. They use contact force without grasping to help keep the robot balanced during a disturbance, and used an HRP-2 humanoid robot in a simulation environment to push an object. However, these solutions were tested on large objects but not on heavy ones.

In [7], the authors utilized dual-arm force control on a humanoid robot to push a heavy wheelchair. They used a zero-moment-point (ZMP) offset approach to maintain the balance of the robot. This correction allows the humanoid robot to stabilize dynamically against the reaction forces. With this method, a real HRP-2 humanoid robot was able to push a wheelchair weighing up to 90 kg without slipping. The friction forces were captured by an expensive force sensor on the robot's arms. However, such sensors are difficult to maintain, and not all humanoid robots have force sensors on the arms. Moreover, the contact force of a robot can be obtained from joint torque measurements without an additional force sensor. Similarly, in [9], the authors investigated whole-body pushing motion by a humanoid robot, considering force and balance at different contact points. They used a humanoid robot to push heavy objects with sensor-less regions of the body: both hands, the forearm, or the hip. In this research, the authors manually generated the posture with which the robot tries to push an object of unknown mass and centre of gravity (COG). A stable pushing force equation based on the feet force sensors and the external force was used for closed-loop feedback. In this way, the HRP-2 humanoid robot was able to push a non-wheeled heavy object.
They achieved the highest force from the robot by pushing backward with hip contact. However, the large external reaction force (slip) caused by the transported object was not discussed in this study.

In [17], the authors provided a solution for the large external reaction forces (slip) generated at the feet and hands when pushing a heavy object. In this study, a quadratic programming (QP) optimizer was used to optimize the joint torques for predicting the maximum value of the external force. This research formulated the problem with a free-floating humanoid robot model simulated in the OpenHRP simulator. They used a virtual mass (VM) as an alternative to the computationally expensive calculation of the friction inequality constraint. The VM was attached to the ends of the limbs to estimate the contact force between the free-floating robot model and the object. However, this work was presented only in simulation; in a practical scenario, a QP solver cannot directly handle the joint torque limits, because the design variable is the joint acceleration. In [18], the authors evaluated torque-based balancing for a high-force interaction task. Instead of controlling the COM, the proposed controller directly uses information from the gravito-inertial wrench cone (GIWC) to ensure the feasibility of the balancing forces. They tested it on the TORO humanoid robot, which was able to push a table weighing 50 kg with a force of up to 250 N (≈ 1/3 of the robot's weight). However, one limitation of this approach is that not all humanoid robots support torque control.

2.1.2. Pivoting object

Most previous studies on pushing manipulation show that the range of pushing force is wide when pushing with the hands, because the many joints between the contact points and the feet make it easy for the robot to shift its COG. Pushing heavy and large objects on a plane also requires generating a large force to compensate for the friction force between the ground and the object. This is a challenge because reaction forces from a heavy object can easily cause foot slippage or cause the robot to lose balance and fall. For this reason, pushing large and heavy objects may not perform well in some situations. In [19, 20], the authors validated pivoting as an alternative to pushing a large object. The robot performed whole-body manipulation of a large object by forward pivoting. This research maintains whole-body balance using resolved momentum control (RMC) [6].
RMC was adopted for stepping motion keeping both hands in contact with the object. They tested the result on pivoting heavy objects in both simulation and real robot HRP-2 with displacement in x-direction was around 0.06[m]. The proposed motion had a good performance where there is no slipping occurs during transporting objects. However, pivoting motion took. 8.

more time to move objects over a given distance, as it slowly moves an object through a sequence of pivoting motions to the right and left.

2.1.3. Teleoperation manipulation
Manipulation poses for transporting large and heavy objects were generally generated manually with human assistance [8]; finding a suitable configuration this way is time-consuming. In [21], the authors address this problem by using teleoperation to control the HRP-2 humanoid robot through a joystick. Meanwhile, the robot's self-balance was learned from the dynamic friction model of the manipulated object. They showed that the robot was able to turn, rotate, and push a table with casters. However, this research required identifying a dynamic friction model [22] at every initial interaction with a new load, and dynamic friction modeling for a variety of objects is still a difficult task [23, 24].

Another solution, proposed in [25], is for the humanoid robot to imitate walking based on recognition of human walking. Motion capture used 16 inertial measurement unit (IMU) sensors placed on the human's head, torso, and each limb. The humanoid robot successfully imitated the recognized stance and movement direction, but with a time delay of 2.5 s, which is very slow. Considering that streaming even a single IMU sensor requires high-frequency processing, it can be concluded that multi-IMU approaches incur a high computational cost for acquiring real-time data. Moreover, neither of these approaches takes advantage of any learning algorithm.

2.1.4. Walking Balance (Learning-Based)
Machine learning (ML) algorithms drive current technology by enabling a computer to perform a specific task without explicit instructions.
On humanoid robots, Reinforcement Learning (RL) algorithms provide robot intelligence through reward and punishment for the actions taken by the robot. The results have shifted much current research towards this approach. To date, however, no learning-based algorithm has been reported for adult-sized humanoid robots transporting a large object. Nevertheless, the problem of maintaining stability against disturbances during walking and stance is closely related to the problem of transporting large and heavy objects with a humanoid robot. Prior work on RL-based applications for BHR that relates to this thesis is described below.

In [26], the authors designed an RL walking balance policy, which learns the ankle joint position of the stance leg and determines the swing foot placement during walking. In [27], the authors used Q-Learning to control the dynamic walking gait balance and acceleration of a biped robot without prior knowledge of the environment. In [28], the authors proposed a posture self-stabilizer for a biped robot subjected to amplitude-limited random disturbances, using a hierarchical stabilizer based on RL. In [29], the authors targeted posture-based imitation with balance learning, allowing humanoid robots to imitate demonstrated motions, with Q-Learning as the balance learning algorithm. In [30], the authors used Deep Deterministic Policy Gradient (DDPG)-based deep reinforcement learning to keep a biped robot from falling over so that it walks steadily on a slope. In [31], the authors utilized a Q-learning algorithm to obtain a straightforward gait pattern for training a humanoid robot to walk straight, where the turning direction is viewed as a gait parameter.

2.1.5. Push Recovery (Learning-Based)
The main objective in the MLHO problem is to develop a balance system for a BHR. It is therefore similar to push recovery, which is also an essential method for maintaining BHR stability. In general, the model-free RL method has the advantage that no predefined model needs to be given to the robot. The robot learns the optimal policy from the cumulative reward by trial and error.
The rest of this section reviews RL applications to push recovery control on BHR to show the similarity between their stability problem and that of transporting large objects. In both problems, the robot must withstand perturbations from external and friction forces.

In [32], the authors applied an RL algorithm on a humanoid robot to learn arm rotations for adapting to perturbations in push recovery. They used an off-line Q-learning process to handle the computationally expensive learning and applied online execution on the robot. In [33], the authors address the need of learning-based approaches for large amounts of data, which is severely restricted on a physical humanoid robot. They implemented an online RL system on a full-body push recovery controller performing omnidirectional walking. In [34], the authors presented Dynamical Movement Primitives (DMP)-based push recovery for a biped humanoid robot, where the DMP learned bio-inspired push recovery strategies, such as the hip-ankle strategy and the step strategy. In [35], the authors developed a full-body push recovery system using a Neural-Fuzzy (NF) controller on a general humanoid robot without specialized sensors and actuators. This method uses RL to update the parameters of the NF controller. In [36], the authors employ the Deep Q-Network (DQN) algorithm for high-level push recovery control in small-size humanoid robots, where the reward formula is based on an equation that analyzes the Linear Inverted Pendulum Model (LIPM) from the energy point of view.

2.1.6. Summary of related work
To the best of our knowledge, moving large objects has mostly been done using the adult-sized HRP-2 humanoid robot platform [4, 7-9, 16, 17, 19-22], with one approach employing the TORO humanoid robot [18]. The most common approach to transporting large and heavy objects is the pushing motion. Based on this literature, no approach has exploited a learning-based algorithm for whole-body large object transportation using an adult-sized BHR. Meanwhile, RL algorithms applied to humanoid robots have shown promise in stabilizing walking and stance posture (push recovery) under given perturbations.
Building on that benefit, in this thesis we introduce the problem of transporting a large object with an adult-sized BHR and propose an RL algorithm to deal with it. The robot drags the large and heavy object, as a novel alternative to pushing.

2.2. Inverse Kinematics
Inverse Kinematics (IK) calculates the joint angles that place a specific link, such as a foot or hand of the robot, at a given Cartesian position and orientation of the end effector [3]. An example of the IK problem is shown in Figure 2-1. The question is which set of joint angles produces the configuration in Figure 2-1(b), in which the left foot is raised by 0.2 m and rotated by 10 deg in pitch.

(a) Initial joint configuration. (b) The left foot is moved up by 0.2 m and rotated 10 deg in pitch.
Figure 2-1 Example of inverse kinematics on the left leg of a biped robot.

A humanoid robot is a mechanism consisting of many links connected by joints. The relationship between the position and orientation of each link is analyzed using coordinate transformations and rotations. The basic rotations are the rotations around the x, y, and z axes, which are called Roll, Pitch, and Yaw respectively.

To Roll, Pitch, or Yaw a point by a given angle ($\phi$, $\theta$, or $\psi$), the point is multiplied by the corresponding rotation matrix, written here in homogeneous form:

$$R_x(\phi) = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & \cos\phi & -\sin\phi & 0 \\ 0 & \sin\phi & \cos\phi & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix} \tag{2-1}$$

$$R_y(\theta) = \begin{bmatrix} \cos\theta & 0 & \sin\theta & 0 \\ 0 & 1 & 0 & 0 \\ -\sin\theta & 0 & \cos\theta & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix} \tag{2-2}$$

$$R_z(\psi) = \begin{bmatrix} \cos\psi & -\sin\psi & 0 & 0 \\ \sin\psi & \cos\psi & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix} \tag{2-3}$$

Applying Roll, Pitch, and then Yaw to a point $p$ around the origin moves the point to

$$p' = R_z(\psi)\, R_y(\theta)\, R_x(\phi)\, p \tag{2-4}$$

A translation by $a$, $b$, $c$ in the x, y, and z directions respectively has the transformation matrix:

$$\mathrm{Trans}(a, b, c) = \begin{bmatrix} 1 & 0 & 0 & a \\ 0 & 1 & 0 & b \\ 0 & 0 & 1 & c \\ 0 & 0 & 0 & 1 \end{bmatrix} \tag{2-5}$$

If we translate the point $p = (x, y, z, 1)^T$, the new coordinate becomes:

$$p' = \mathrm{Trans}(a, b, c)\, p \tag{2-6}$$

In general, IK can be solved by either an analytical method or a numerical method. The position and orientation of a set of links are defined by nonlinear equations in the joint angles. Since the joints of most humanoid robots are rotational, these nonlinear equations in many variables are unlikely to be solvable analytically. However, the relationship between the derivatives of the position and orientation of a link and the derivatives of the joint angles can be represented by linear

equations, and the IK problem can therefore be solved numerically by solving these linear equations.

[Figure 2-2 diagram: BODY (parent); R ARM, L ARM, R LEG, and L LEG (child 1); R HAND, L HAND, R FOOT, and L FOOT (child 2).]
Figure 2-2 Tree structure of the humanoid links connection [3].

The humanoid robot's kinematic structure, shown in Figure 2-2, forms a tree of joined links. This is also called the kinematic chain of the robot model. Nowadays, the most common way to acquire an IK solution from a kinematic chain is the numerical approach. One well-known numerical IK solver uses the Jacobian Pseudo Inverse (JPI) and is available as open source in the Orocos Kinematics and Dynamics Library (KDL) [37]. This approach gives an IK solution based on the kinematic chain that the user provides.

2.3. Walking Gait
The walking gait cycle of a biped humanoid robot consists of two phases: the Single Support Phase (SSP) and the Double Support Phase (DSP). In the SSP, only one leg touches the ground; the leg that touches the ground is called the support foot and the leg that does not is called the swing foot. In the DSP, both legs touch the ground. The walking sequence is illustrated in Figure 2-3, starting with the SSP, followed by the DSP, and repeating continuously [38].
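The homogeneous rotation and translation matrices of Eqs. (2-1) to (2-6) in Section 2.2 can be sketched in a few lines of plain Python. This is only an illustrative sketch: the function names (`rot_x`, `rot_y`, `rot_z`, `trans`, `rpy`, `apply`) are hypothetical and do not come from the thesis or from any robot library, and the angle names follow the common roll, pitch, yaw convention.

```python
import math


def rot_x(phi):
    """Roll: homogeneous rotation about the x axis, as in Eq. (2-1)."""
    c, s = math.cos(phi), math.sin(phi)
    return [[1.0, 0.0, 0.0, 0.0],
            [0.0, c, -s, 0.0],
            [0.0, s, c, 0.0],
            [0.0, 0.0, 0.0, 1.0]]


def rot_y(theta):
    """Pitch: homogeneous rotation about the y axis, as in Eq. (2-2)."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, 0.0, s, 0.0],
            [0.0, 1.0, 0.0, 0.0],
            [-s, 0.0, c, 0.0],
            [0.0, 0.0, 0.0, 1.0]]


def rot_z(psi):
    """Yaw: homogeneous rotation about the z axis, as in Eq. (2-3)."""
    c, s = math.cos(psi), math.sin(psi)
    return [[c, -s, 0.0, 0.0],
            [s, c, 0.0, 0.0],
            [0.0, 0.0, 1.0, 0.0],
            [0.0, 0.0, 0.0, 1.0]]


def trans(a, b, c):
    """Translation by (a, b, c), as in Eq. (2-5)."""
    return [[1.0, 0.0, 0.0, a],
            [0.0, 1.0, 0.0, b],
            [0.0, 0.0, 1.0, c],
            [0.0, 0.0, 0.0, 1.0]]


def matmul(A, B):
    """Product of two 4x4 matrices stored as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def apply(T, p):
    """Apply a 4x4 transform T to a homogeneous point p = (x, y, z, 1)."""
    return [sum(T[i][k] * p[k] for k in range(4)) for i in range(4)]


def rpy(phi, theta, psi):
    """Roll, then pitch, then yaw: Rz(psi) Ry(theta) Rx(phi), as in Eq. (2-4)."""
    return matmul(rot_z(psi), matmul(rot_y(theta), rot_x(phi)))


if __name__ == "__main__":
    # Pose change from Figure 2-1(b): raise by 0.2 m and pitch by 10 deg.
    T = matmul(trans(0.0, 0.0, 0.2), rot_y(math.radians(10.0)))
    print(apply(T, [0.1, 0.0, 0.0, 1.0]))
```

The matrices are composed right to left, so in `matmul(trans(...), rot_y(...))` the point is rotated first and translated second.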

Figure 2-3 Sagittal plane view of walking gait cycle [38].

A humanoid robot is structurally similar to a human, but controlling its walking is not as simple as it looks. A humanoid robot needs to maintain balanced contact between the foot and the ground while walking. For this purpose, the Zero Moment Point (ZMP) is the most widely used criterion in biped humanoid walking control [39]. The ZMP is the reference point on the ground for the robot's combined gravity and inertial forces. During walking, if the ZMP stays within the support polygon, the robot will not fall.

(a) Full contact on both feet. (b) Partial contact.
Figure 2-4 ZMP support polygon [3].

Figure 2-4 illustrates the support polygon, the region formed by enclosing all the contact points between the robot and the ground with an elastic cord. The ground projection of the Centre of Mass (CoM) can lie outside the support polygon, whereas the ZMP always lies inside it. Humanoid robots can keep their balance if the ground projection of the CoM is located inside the support polygon, as shown in Figure 2-5.
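The support polygon test described above can be sketched as a convex hull of the ground contact points followed by a point-in-polygon check. The sketch below is illustrative only: the foot dimensions are made-up values, the function names are hypothetical, and it checks only the static condition on the CoM ground projection, not a full ZMP computation.

```python
def cross(o, a, b):
    """z component of (a - o) x (b - o); positive if o -> a -> b turns left."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def inside_support_polygon(point, contacts):
    """True if the 2D point lies inside or on the hull of the contact points."""
    hull = convex_hull(contacts)
    if len(hull) < 3:
        return False
    n = len(hull)
    return all(cross(hull[i], hull[(i + 1) % n], point) >= 0 for i in range(n))


# Hypothetical foot corners in metres (x forward, y to the left).
LEFT_FOOT = [(-0.1, 0.05), (0.1, 0.05), (0.1, 0.15), (-0.1, 0.15)]
RIGHT_FOOT = [(-0.1, -0.15), (0.1, -0.15), (0.1, -0.05), (-0.1, -0.05)]

if __name__ == "__main__":
    com = (0.0, 0.0)  # ground projection of the CoM
    print("DSP stable:", inside_support_polygon(com, LEFT_FOOT + RIGHT_FOOT))
    print("SSP (left foot only) stable:", inside_support_polygon(com, LEFT_FOOT))
```

With these assumed dimensions, a CoM projected midway between the feet is inside the polygon during the DSP but outside it during the SSP, which is why the CoM has to be shifted over the support foot before the swing foot is lifted.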

[Figure 2-5 panels: Stable (DSP) and Stable (SSP), each marking the COM and the ZMP.]
Figure 2-5 Projection of the Centre of Mass on Zero Moment Point.

2.4. Neural Network
Artificial intelligence (AI) has become one of the most prominent techniques in robotics applications. One of the most powerful and widely used AI algorithms is the Neural Network (NN). The main reason is that an NN gives a machine a form of intelligence that works similarly to the human brain. Briefly, the architecture of an NN consists of a number of interconnected nodes called neurons, organized in layers to process data.

[Figure 2-6 labels: Input Layer, Hidden Layer, Output Layer.]
Figure 2-6 Neural network architecture.

Figure 2-6 shows what an NN architecture looks like. Zooming in on one of the hidden or output nodes, each node is called a perceptron, as illustrated in Figure 2-7.

[Figure 2-7 labels: inputs x1, x2, ..., xn; weights w1, w2, ..., wn; bias b; summation Σ producing v; activation function f; output y.]
Figure 2-7 Operations done by neurons on a single layer perceptron.

The computation performed by the single-layer perceptron in Figure 2-7 is given by the equation below:

$$y = f\left(\sum_{i=1}^{n} w_i x_i + b\right) \tag{2-7}$$

As shown in Figure 2-7 and Eq (2-7), the process can be described briefly as follows. (i) First, the inputs $x_1, x_2, \ldots, x_n$ are multiplied by the weights $w_1, w_2, \ldots, w_n$ and summed. Each neuron connection has its own weight $w_i$, and these weights are the parameters tuned during the learning process. (ii) Next, a bias value $b$ is added to the sum; it is not a value from a specific neuron. (iii) Finally, the neuron applies a function called the "activation function" to the resulting value.

2.5. Deep Learning
Deep learning (DL) is a subset of machine learning built on artificial neural networks (ANN). DL networks are similar to ANNs but have a deeper architecture (multiple hidden layers). Learning in DL can be supervised (using labeled data) or unsupervised (not requiring labeled data). Additionally, a DL algorithm requires a large dataset to train the model. An example of a deeper network architecture is shown in Figure 2-8.
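The three perceptron steps of Eq (2-7) in Section 2.4 (weighted sum, bias, activation) can be written out directly. This is a minimal sketch: the sigmoid is only one possible choice of activation function and is an assumption here, since the text does not name one, and the function names are hypothetical.

```python
import math


def sigmoid(v):
    """One common activation function (an assumption, not named in the text)."""
    return 1.0 / (1.0 + math.exp(-v))


def perceptron(x, w, b, f=sigmoid):
    """Single-layer perceptron output y = f(sum_i w_i * x_i + b), Eq. (2-7)."""
    if len(x) != len(w):
        raise ValueError("inputs and weights must have the same length")
    v = sum(wi * xi for wi, xi in zip(w, x)) + b  # steps (i) and (ii)
    return f(v)                                   # step (iii)


if __name__ == "__main__":
    print(perceptron([1.0, 2.0, 3.0], [0.5, -1.0, 0.5], b=0.1))
```

Passing a different `f`, for example `lambda v: max(0.0, v)`, swaps the activation without touching the weighted sum.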

Figure 2-8 Deeper network architecture of an ANN, also called Deep Learning.

Besides the functions of an ANN, automatic feature extraction from data is another function of a deeper network architecture. Feature extraction layers in DL models are widely applied in image processing tasks. Such a layer is called a convolutional layer, which obtains feature maps by applying several filters to the image (see Figure 2-9).

Figure 2-9 Convolutional Neural Network subclass of deep learning (see footnote 3).

Beyond image processing, well-known applications of DL include automatic speech recognition, visual art processing, natural language processing, recommendation systems, bioinformatics, fraud detection, and mobile advertising.

3. https://www.mathworks.com/solutions/deep-learning/convolutional-neural-network.html

2.6. Object Detection
Object detection in computer vision is a method for finding a target object in a digital image or video. One or multiple target objects can be detected. In robotics applications, object detection has become fundamental to robot perception. Object detection can be divided into different types (see Figure 2-10), and most approaches are based on DL with Convolutional Neural Networks (CNN).

Figure 2-10 Various types of 2D image object detection (see footnote 4).

The well-known types of object detection in 2D frames shown in Figure 2-10 are briefly introduced below. (i) Semantic segmentation is a technique that labels each pixel in the image with a category label; it does not differentiate instances and only considers pixels. The most notable semantic segmentation approach is based on a fully convolutional network architecture [40]. (ii) Classification and localization is the common object detection technique that finds the object position and simultaneously classifies the object. Well-known work on this approach includes Faster R-CNN [41], Single Shot MultiBox Detector (SSD) [42], and You Only Look Once (YOLO) [43]. (iii) Instance segmentation differs from semantic segmentation in that it identifies the boundaries of each individual object at the pixel level. Work on instance segmentation includes Mask R-CNN [44], FCIS [45], and YOLACT [46].

4. http://cs231n.stanford.edu/slides/2017/cs231n_2017_lecture11.pdf

Object detection is not limited to 2D frames; it extends to 3D object detection. In computer vision, 3D object detection operates on point cloud data that form a 3D model. Figure 2-11 shows examples of 3D object classification, part segmentation, and semantic segmentation from [47].

Figure 2-11 Types of 3D point cloud object detection by [47].

2.7. Reinforcement Learning
Reinforcement learning (RL) is a subset of machine learning that differs from other types of machine learning. The main difference is that it is based on trial and error: there is no supervisor, and learning depends only on a reward signal. The environment is initially unknown, and time matters. While the agent interacts with the environment, it also improves its policy.

[Figure 2-12 labels: agent, environment, observation Ot, action At, reward Rt.]
Figure 2-12 Markov Decision Process of Reinforcement Learning (see footnote 5).

5. http://www0.cs.ucl.ac.uk/staff/d.silver/web/Teaching_files/intro_RL.pdf

The flow of RL is shown in Figure 2-12; the process is divided into agent and environment interaction. At each step $t$ the agent: (i) executes an action $A_t$, (ii) receives an observation $O_t$, and (iii) receives a scalar reward $R_t$. Meanwhile, the environment receives the action $A_t$, then emits the observation $O_{t+1}$, and finally emits a scalar reward $R_{t+1}$. These learning steps are repeated within episodes: in every episode, the agent takes a sequence of actions, one per environment step $t$. The mathematical formulation of the RL problem can be defined as a Markov Decision Process (MDP).

A reward $R_t$ is a scalar feedback signal that indicates how well the agent is doing at step $t$. All goals can be described by the maximization of the expected cumulative reward, Eq (2-8):

$$R_t = r_t + r_{t+1} + r_{t+2} + \dots + r_n \tag{2-8}$$

During training, the agent must weigh immediate rewards against future rewards. This is done with a discount factor $\gamma \in [0, 1]$ in the cumulative reward, Eq (2-9). If $\gamma = 0$, the agent only cares about the immediate reward. If $\gamma = 1$, the agent cares about all future rewards equally.

$$R_t = r_t + \gamma\, r_{t+1} + \gamma^{2}\, r_{t+2} + \dots + \gamma^{n-t}\, r_n \tag{2-9}$$

The agent's job is to maximize the cumulative reward. To achieve that, RL tries to obtain the optimal value function, i.e. the maximum expected cumulative reward. The Bellman equation [Eq (2-10)] helps the agent obtain the optimal value function.

$$Q(s, a) = r + \gamma \max_{a'} Q(s', a') \tag{2-10}$$

In model-free RL, Temporal-Difference (TD) learning can be used to learn with no prior knowledge of the environment. The method learns directly from episodes of experience. It is formulated in the equation below [Eq (2-11)] as the difference between the new estimate and the previous estimate:

$$TD = \left[r + \gamma \max_{a'} Q(s', a')\right] - Q(s, a) \tag{2-11}$$

Moreover, to learn the optimal value function off-policy (while exploring the environment randomly), TD learning is combined with the Bellman equation. This is called Q-Learning and is expressed in the equation below [Eq (2-12)], where $\alpha$ is the learning rate:

$$Q(s, a) = (1 - \alpha)\, Q(s, a) + \alpha \left[r + \gamma \max_{a'} Q(s', a')\right] \tag{2-12}$$

The Q-Learning algorithm requires a Q-table to store the Q-values for each state and action, updated from the rewards acquired during the training process [48].

2.8. Deep Reinforcement Learning
The major limitation of traditional RL algorithms is that they are restricted to small problem spaces with few possible states in the environment [48]. The limitation comes from the Q-table, whose size depends on the number of actions and states, so this method is not suitable when the state space of the environment is large. Later on, the well-known Deep Reinforcement Learning (DRL) algorithm was introduced by [49]. The authors used a deep neural network within RL to replace the Q-table and called it the Deep Q-Network (DQN). The benefit of DQN is that agents can learn in more complex environments. It generalizes better to unknown states and can act in states never seen before. The DQN algorithm is illustrated in Figure 2-13.

Figure 2-13 Deep Q-Network architecture [50].
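The discounted return of Eq (2-9) and the tabular Q-Learning update of Eqs. (2-11) and (2-12) from Section 2.7 can be sketched as follows. This is a generic textbook-style sketch, not the training code of this thesis: the state and action labels are placeholders, the Q-table is a plain dictionary, and terminal states are not treated specially.

```python
from collections import defaultdict


def discounted_return(rewards, gamma):
    """Cumulative reward r_t + gamma*r_{t+1} + gamma^2*r_{t+2} + ..., Eq. (2-9)."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g


def q_learning_update(Q, s, a, r, s_next, actions, alpha, gamma):
    """Apply Eq. (2-12) in place and return the TD error of Eq. (2-11)."""
    target = r + gamma * max(Q[(s_next, a2)] for a2 in actions)
    td_error = target - Q[(s, a)]
    Q[(s, a)] = (1.0 - alpha) * Q[(s, a)] + alpha * target
    return td_error


if __name__ == "__main__":
    # Placeholder labels; unseen (state, action) pairs default to Q = 0.
    actions = ["offset_minus", "default", "offset_plus"]
    Q = defaultdict(float)
    td = q_learning_update(Q, "s0", "default", r=1.0, s_next="s1",
                           actions=actions, alpha=0.5, gamma=0.9)
    print(td, Q[("s0", "default")])
    print(discounted_return([1.0, 1.0, 1.0], gamma=0.5))
```

The table grows with every new (state, action) pair, which is the scaling problem that motivates replacing it with a neural network in the DQN of Section 2.8.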

Chapter 3: Methodology

This chapter discusses the methodology of the proposed algorithm. It is divided into four parts. First, it gives an overview of the architecture of the THORMANG-Wolf adult-sized humanoid robot in terms of hardware and software, and then describes the proposed novel hierarchical learning method for solving the Moving Large size and Heavy Object (MLHO) problem. Second, it explains the robot vision process, which comprises two proposed DL object detection methods: 3D object detection and floor detection. Third, it presents the robot walking control, which is based on the ZMP walking controller for the biped robot. Finally, it details the robot behavior control, which uses the proposed DRL algorithm to achieve the dragging task. Deep Q-Learning is chosen as the DRL method to control walking while dragging an object.

3.1. THORMANG-Wolf Robot
THORMANG (Tactical Hazardous Operations Robot) is a full-size, commercially available bipedal humanoid robot developed by ROBOTIS, Inc. [51]. The robot was designed as an adult-sized humanoid research platform. The latest version of this robot is called THORMANG3, shown in Figure 3-1(a). Due to the application requirements of this project, we slightly modified THORMANG3 and named the result THORMANG-Wolf. The rest of this sub-section discusses our THORMANG-Wolf robot in three parts: the hardware description, the software description, and the proposed hierarchical learning-based algorithm design.

3.1.1. Hardware Description
The main difference in mechanical appearance between the THORMANG3 robot and the THORMANG-Wolf robot is the webcam. As the original THORMANG3 mechanical design has a Logitech C920 HD USB webcam that has a

limited field of view (FOV), we replaced it with the Logitech C930E, which has a wide 90-degree FOV (see footnote 6). The dimensions of the robot remain the same as the original and are illustrated in Figure 3-1(b). It has a height of 137.5 cm, a width of 42.4 cm, and a weight of 42 kg including the two batteries. The detailed specification of the robot is shown in Table 3-1.

(a) THORMANG3 robot. (b) THORMANG3 robot dimension (mm), with labeled values 1686, 1375, 424, and 222.
Figure 3-1 THORMANG3 adult-sized humanoid robot.

The THORMANG-Wolf robot has a wide range of manipulation and walking motions, with 29 Degrees of Freedom (DOF) in total. The actuators consist of three different models of the Dynamixel-PRO series and 1-DOF two-fingered Dynamixel hands. The robot is equipped with substantial computational power and sophisticated sensors (see Figure 3-2). It has two mini-computers, one monocular webcam, one depth camera, one LiDAR scanner, two force/torque (F/T) sensors, and one speaker. Electrical power is supplied by two batteries: 22.2 V for the actuators and 18.5 V for the controllers and sensors.

6. https://www.logitech.com/en-us/product/c930e-webcam

Table 3-1 THORMANG-Wolf Specification Details.

Category    Specification     Value
Dimension   Weight            42 kg
            Height            137.5 cm
DOF         Head              2 DOF
            Arm               2 × 7 DOF
            Leg               2 × 6 DOF
            Waist             1 DOF
Actuator    H54-200-S500-R    10 × 200 W
            H54-100-S500-R    11 × 100 W
            H42-020-S300-R    8 × 20 W
            RH-P12-RN         2 × 80 W

[Figure 3-2 component labels (front and back views): LiDAR scanner (Hokuyo UTM-30LX-EW); webcam (Logitech C930e HD); RGB-D camera (Intel RealSense R200); wireless router (Dlink DIR-806A); two mini-PCs (Intel NUC Kit NUC5i5RYK: Intel i5-5250U at 2.70 GHz, DDR4 RAM 80GB, M.2 SSD 128 GB); IMU (MicroStrain 3DM-GX4-25); two grippers (RH-P12-RN); two F/T sensors (ATi Mini58-SI-2800-120); 18.5 V and 22.2 V batteries; speaker; E-stop switch; power switch.]
Figure 3-2 THORMANG-Wolf hardware architecture.

The electrical components of the THORMANG-Wolf robot are shown in Figure 3-3. The main controller is distributed across three computers: (I) the Perception Personal Computer (PPC), (II) the Motion Personal Computer (MPC), and (III) the Operating Personal Computer (OPC). Two computers (MPC and PPC) are mounted on the robot and one computer (OPC) is located outside the robot. The MPC handles the kinematics and dynamics of the robot, computing every joint movement and translating it into actuator positions. The PPC handles perception processing, acquiring data from the robot's sensors. The OPC acts as the manager that integrates the MPC and PPC. To accommodate this multi-computer communication system, a router located

on the back of the robot provides the Ethernet connection across this collaborative computer system.

[Figure 3-3 labels: PPC (Perception PC, IP address 10.17.3.35) with depth camera, USB RGB camera, and LiDAR scanner; OPC (Operating PC, IP address 10.17.3.10); MPC (Motion PC, IP address 10.17.3.30) with IMU sensor (USB) and an FTDI USB-COM485-PLUS4 adapter providing RS485 ports COM1 to COM4 for the arm, torso, head, and leg actuators and the F/T sensors (external port); Ethernet links through the router. The diagram also shows the IP address 10.17.3.20.]
Figure 3-3 THORMANG-Wolf electrical components system.

3.1.2. Software Description
The software system of the THORMANG-Wolf robot is built on the Robot Operating System (ROS). The ROS Kinetic Kame version is used, as it is compatible with the Ubuntu 16.04 (Xenial) operating system. ROS is a set of software libraries and tools designed for robotic applications (see footnote 7). The main advantage of using ROS is its message passing, which makes it easy to develop a multi-computer communication system. Another benefit is its support for multiple programming languages.

7. https://www.ros.org/about-ros/

The software architecture of the THORMANG-Wolf robot is illustrated in Figure 3-4, which presents the simplified ROS graph of the robot during the dragging task. The core management of the system is distributed across three computers, as follows: (i) The PPC computer preprocesses the sensor data acquired from the webcam and LiDAR scanner and publishes them as two ROS topics: “/rgb_image” and “/point_cloud”. (ii) The MPC

computer provides “/robot_state” directly from the F/T and IMU sensors, and then computes the kinematics and dynamics in the “/CONTROL_MANAGER” node to read and write the position of each joint. (iii) The OPC computer acts as the manager: its “/MAIN” node maps the sensor input from the PPC to the movement actions of the MPC. Overall, ROS ties these computers into a synchronized system for the humanoid robot's perception, behavior, manipulation, and locomotion.

[Figure 3-4 labels: /FT Sensor, /IMU, /robot_state, /rgb_image, /WEBCAM, /LiDAR, /point_cloud, /robot/status, /manipulation/static_pose, /CONTROL MANAGER, /MAIN, /walking/foot_step_generator, /walking/balance_param; computers PPC, MPC, and OPC.]
Figure 3-4 ROS graph architecture performing a dragging task. Note: ROS nodes and topics are represented by ellipses and rectangles respectively.

3.1.3. The Proposed Algorithm Design
The algorithm design for the dragging task on the THORMANG-Wolf robot is a hierarchical framework of three independent levels: the robot vision process, behavior control, and motion control. The details of the framework are illustrated in Figure 3-5. First, the vision process is divided into two subcategories: (I) object detection and (II) floor detection. Object detection reads point cloud data from the LiDAR scanner and feeds it into the proposed DL model to classify the object type. The object detection result is used to select a pre-recorded manipulation motion for grasping the object. Floor detection, on the other hand, uses RGB images obtained from the robot's webcam. It feeds the images into the proposed lightweight DL algorithm for real-time instance segmentation to detect the floor type. The floor type result then goes to the DQN to adjust the offset coefficient of the Centre of Body (CoB).

[Figure 3-5 diagram: data flows from the inputs (LiDAR, webcam) through the Robot Vision Process (laser assembly, filtering, object detection, floor detection) and Robot Behavior Control (DQN, instructions control) to Robot Motion Control (IMU and F/T sensor status, balancing and balance control, inverse kinematics, footstep generator, motion planner, static motion, walking gait) and the actuator outputs.]
Figure 3-5 THORMANG-Wolf hierarchical framework data flow diagram. Note: The term “Ack” indicates a process termination acknowledgment, and red shapes mark the proposed hierarchical deep learning algorithms.

Second, behavior control is the main proposed DRL method for the MLHO problem. We use a DQN algorithm to learn the parameter of the walking balance control policy. The algorithm learns the behavior control from robot states acquired from the IMU and F/T sensors. As a result, the setpoint of the $CoB_X$ parameter is tuned automatically by the DQN algorithm in real time during the dragging procedure.

Finally, motion control handles all robot movement. It takes the control instructions and executes actions in sequential order. Motion control has two main functions. First, the grasping motion acts as a motion manager that can store and play back recorded grasping motions for various objects. Then the walking control generates walking footsteps by solving the inverse kinematics of the legs using the Jacobian Pseudo Inverse and generates the walking gait pattern.

[Figure 3-6 blocks: 3D object classification (point cloud, filter, voxel, deep learning); object grasping (static grasp motion); floor detection (deep learning; plywood, green carpet, tile); dragging for 2 meters from START to FINISH with Deep Q-Learning setting the CoM offset on the x-axis (Offset (-), Default, Offset (+)).]
Figure 3-6 Block diagram of the proposed DL algorithms to solve MLHO problem. Note: the red blocks are the proposed algorithms.

Based on the hierarchical framework illustrated in Figure 3-5, our proposed learning-based approach consists of three learning phases, as follows: (i) A Deep Learning algorithm for 3D object classification. (ii) A Deep Learning algorithm for real-time instance

segmentation for floor detection. (iii) A Deep Reinforcement Learning algorithm for the walking balance control policy. To clarify the MLHO process in sequential order, Figure 3-6 shows a block diagram of the proposed hierarchical learning-based algorithm. Each part of the data flow diagram in Figure 3-5, together with the proposed hierarchical methods for the MLHO problem, is described individually below.

3.2. Robot Vision Process
A humanoid robot needs a vision system to perceive the environment and identify objects. In this dragging task, the robot needs to know what kind of object it will move and on which type of surface. In this section, the vision processing of the robot is divided into (I) 3D object detection and (II) floor detection (instance segmentation). Both processes are based on DL approaches and are described below.

3.2.1. 3D Object Detection (Deep Learning)
A single two-dimensional (2D) image from a robot camera can provide some visual information for this problem. However, the information in a 2D image is limited to the two-dimensional projection of length and width, whereas the third dimension (the object's height) is indistinguishable. Unlike 2D images, point cloud data contains 3D data that provides a rich source of information. Therefore, point clouds are acquired using the robot's LiDAR scanner. The main objective of this approach is to use the LiDAR scanner to classify objects. After the object has been classified, the output is used to select a pre-recorded manipulation motion to grasp the object.

It is well known that state-of-the-art 2D image object recognition is based on Convolutional Neural Networks (CNN) [40-44]. The same concept has also been applied to object detection on 3D point cloud data [47, 52, 53].
On the other hand, in contrast to the camera, LiDAR is not affected by lighting conditions, which increases the robustness of the system. Therefore, the implementation of this object classification algorithm is based on CNN. The general flowchart of the proposed object classification is illustrated in Figure 3-7.

Figure 3-7 Flowchart of the DL algorithm for 3D object classification. (Block labels: Start; LiDAR Point Clouds; Filter Object (Yes/No); Voxel Grid Filter; Deep Learning 3D Classification; Object Name; End.)

The working process of the DL 3D object classification is as follows. First, the LiDAR point cloud data are acquired from the scanning process of the robot's head. As shown in Figure 3-8(a), the scanned LiDAR point cloud data include additional information about the environment. To tackle this issue, it is appropriate to filter the point cloud data. Removing additional features from the raw point clouds, in other words extracting only the important information, increases the robustness of the DL models. For this reason, a heuristic algorithm is applied to filter and extract the object from a cluttered environment, as shown in Figure 3-8(b). In this process, Euclidean distance-based filtering is used to extract the object from the environment, given by the following formula:

\[
d(p, q) = \sqrt{\sum_{i=1}^{3} (p_i - q_i)^2} \tag{3-1}
\]

where p and q are two points in Euclidean space, and the distance d from p to q is computed over each axis i, which indexes the axes (x, y, z) respectively. In the filtering process, if there is no object in front of the robot, the robot repeats the scanning process to collect the point cloud data again.

In general, DL models require an input of fixed size. However, the total number of points returned by each LiDAR scan has no fixed shape. To deal with this, a Voxel Grid (VG) filter is used to downsample the point cloud data into a regular voxel grid representation [54]. In the VG process, the filtered object point clouds are discretized spatially into a 30×30×30 volumetric occupancy grid (see Figure 3-8(c)), where each voxel has a binary state (occupied or unoccupied). Next, the fixed-size voxel data (occupancy grid) are passed to the input of the DL model, which performs the computations for the 3D classification task. Finally, the output of the DL model gives the predicted object type, chosen as the class with the highest probability.

Volumetric representations of voxel data have become popular for 3D shape recognition in DL-based approaches [52, 53]. The most notable 3D shape recognition method, which integrates a volumetric occupancy grid representation with a supervised 3D CNN, is provided by [52]. In [52], the authors introduced VoxNet, a 3D CNN for multi-class classification on binary voxel data, with a simple network architecture that achieves real-time performance. Another study on volumetric 3D object multi-class classification was presented in [53]. In [53], the authors proposed a lightweight Volumetric-CNN1 (V-CNN1) model. In this method, the volumetric 3D object is represented as a set of spatially convolved 2D images (known as feature maps).
So, instead of using a 3D convolutional layer, the authors use a 2D convolutional layer to convolve the 3D volumetric occupancy grid, which yields a faster training process because fewer parameters are used. Although 2D convolutions perform very well in 2D image classification [40-44], the results were not as good on the 3D occupancy grid [53]. The classification accuracy of V-CNN1 was slightly lower than that of VoxNet [52], which uses 3D convolutions, on the same 3D data set.
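The preprocessing steps described in this section (Euclidean distance-based filtering as in Eq. (3-1), followed by discretization into a 30×30×30 binary occupancy grid) can be illustrated with a minimal sketch. This is not the thesis implementation: the function names, the reference point `origin`, the distance threshold `max_dist`, and the bounding-box scaling used in `voxelize` are all assumptions made for illustration.

```python
import math

GRID = 30  # voxels per axis, matching the 30 x 30 x 30 occupancy grid


def euclidean_distance(p, q):
    """Eq. (3-1): distance between two 3D points p and q."""
    return math.sqrt(sum((pi - qi) ** 2 for pi, qi in zip(p, q)))


def filter_by_distance(points, origin=(0.0, 0.0, 0.0), max_dist=1.0):
    """Keep only points within max_dist of origin (hypothetical parameters)."""
    return [p for p in points if euclidean_distance(p, origin) <= max_dist]


def voxelize(points, grid=GRID):
    """Discretize points into a binary grid x grid x grid occupancy grid."""
    occupancy = [[[0] * grid for _ in range(grid)] for _ in range(grid)]
    if not points:
        return occupancy  # empty grid: no object was found in the scan
    mins = [min(p[a] for p in points) for a in range(3)]
    maxs = [max(p[a] for p in points) for a in range(3)]
    # Assumption: one common scale for all axes, so the object keeps its aspect ratio.
    extent = max(maxs[a] - mins[a] for a in range(3)) or 1.0
    for p in points:
        idx = [min(grid - 1, int((p[a] - mins[a]) / extent * grid)) for a in range(3)]
        occupancy[idx[0]][idx[1]][idx[2]] = 1  # binary state: occupied
    return occupancy


# Hypothetical scan: three object points near the robot and one background point.
cloud = [(0.5, 0.0, 0.1), (0.6, 0.1, 0.2), (3.0, 0.0, 0.0), (0.5, 0.2, 0.4)]
object_points = filter_by_distance(cloud, max_dist=1.0)
voxels = voxelize(object_points)
```

Whatever the number of points in the scan, the resulting grid always has the same fixed shape, which is the property the DL model needs at its input.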

Figure 3-8 Example of the preprocessing of 3D point cloud data, panels (a)-(f). Note: left: raw point cloud data; middle: filtered point cloud data; right: after the voxel grid filter.

The two models mentioned in the previous paragraph perform well on multi-class classification of 3D volumetric occupancy grids. However, the existing models have the following shortcomings. In the VoxNet model [52], the authors consider only a very small network that contains two 3D convolutional layers and two fully connected layers. Such a shallow network architecture limits the ability of the model to generalize, that is, to learn features at various levels [55]. On the other hand, in V-CNN1 [53], the authors use a deeper network architecture (depth of 5 layers) but, without 3D convolutions, fail to capture the relationships within the 3D data.

Therefore, in this thesis, we propose a new model called the Three-layers Convolution Volumetric Network (TCVN) as a robust learning method that addresses the issues of the previous models. As shown in Figure 3-9, TCVN builds on the concepts of VoxNet and V-CNN1 by using a deeper network architecture together with 3D convolution layers. The input of the proposed model is a volumetric occupancy grid of size 30 × 30 × 30. The model consists of three 3D convolutional layers, all with 32 filters of size 3 and stride 1. The convolution layers are followed by batch normalization and three ReLU activation functions, along with two max-pooling layers. The ReLU layer introduces non-linearity into the model by activating only positive neurons. The pooling layer that follows the ReLU keeps the model from learning redundant spatial voxel information. There are also two fully connected layers in the last part of the model, where the final fully connected layer uses a SoftMax function to normalize the class scores into a probability distribution. During training, dropout with a probability of 0.5 is used to prevent overfitting, and an Adam optimizer with a standard base learning rate of 0.001 is employed to update the model weights. The details of the proposed model are presented in Table 3-2.

Table 3-2 The network details of the TCVN model.

| Layer type | Filter size / Dropout rate | Stride | Output size | Number of parameters |
|---|---|---|---|---|
| Convolution 3D | 3×3×3 | 1×1×1 | 32 × 30 × 30 × 30 | 896 |
| Batch Norm. | - | - | 32 × 30 × 30 × 30 | 128 |
| ReLU | - | - | 32 × 30 × 30 × 30 | - |
| Max pooling 3D | 2×2×2 | 2×2×2 | 32 × 15 × 15 × 15 | - |
| Convolution 3D | 3×3×3 | 1×1×1 | 32 × 13 × 13 × 13 | 27680 |
| Batch Norm. | - | - | 32 × 13 × 13 × 13 | 128 |
| ReLU | - | - | 32 × 13 × 13 × 13 | - |
| Convolution 3D | 3×3×3 | 1×1×1 | 32 × 11 × 11 × 11 | 27680 |
| Batch Norm. | - | - | 32 × 11 × 11 × 11 | 128 |
| Max pooling 3D | 2×2×2 | 2×2×2 | 32 × 5 × 5 × 5 | - |
| Dropout | 0.5 | - | 32 × 5 × 5 × 5 | - |
| Fully connected | - | - | 2048 | 8194048 |
| Batch Norm. | - | - | 2048 | 8192 |
| Fully connected | - | - | 5 | 10245 |

Figure 3-9 The network architecture of the TCVN model. (Labels: input 1 × 30 × 30 × 30; Conv(32,3,1) / ReLU / BN / Max-Pool(2,2); Conv(32,3,1) / ReLU / BN; Conv(32,3,1) / ReLU / BN / Max-Pool(2,2); feature sizes 32 × 12 × 12 × 12, 32 × 10 × 10 × 10, 32 × 5 × 5 × 5; fully connected 2048; output 5.)
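The output sizes and parameter counts listed in Table 3-2 can be reproduced with a few lines of arithmetic. The sketch below is only a consistency check under stated assumptions, not the thesis code: it assumes that the first convolution uses padding of 1 (since the table lists a 30 × 30 × 30 output), that the other two convolutions use no padding, and that each batch normalization layer is counted with four values per channel (scale, shift, running mean, running variance). All function names are hypothetical.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Output length along one axis of a convolution or pooling window."""
    return (size + 2 * padding - kernel) // stride + 1


def conv3d_params(c_in, c_out, kernel):
    """Kernel weights plus one bias per output channel."""
    return c_out * (c_in * kernel ** 3 + 1)


def dense_params(n_in, n_out):
    """Weights plus one bias per output unit."""
    return n_out * (n_in + 1)


def batch_norm_params(channels):
    # Assumption: scale, shift, running mean and running variance are all counted.
    return 4 * channels


def tcvn_summary(grid=30, filters=32, hidden=2048, classes=5):
    """Return (layer name, output shape, parameter count) rows."""
    rows = []

    def cube(s):
        return (filters, s, s, s)

    s = conv_out(grid, 3, padding=1)      # assumed padding of 1: 30 -> 30
    rows.append(("conv1", cube(s), conv3d_params(1, filters, 3)))
    rows.append(("bn1", cube(s), batch_norm_params(filters)))
    s = conv_out(s, 2, stride=2)          # max pooling: 30 -> 15
    rows.append(("pool1", cube(s), 0))
    s = conv_out(s, 3)                    # no padding: 15 -> 13
    rows.append(("conv2", cube(s), conv3d_params(filters, filters, 3)))
    rows.append(("bn2", cube(s), batch_norm_params(filters)))
    s = conv_out(s, 3)                    # no padding: 13 -> 11
    rows.append(("conv3", cube(s), conv3d_params(filters, filters, 3)))
    rows.append(("bn3", cube(s), batch_norm_params(filters)))
    s = conv_out(s, 2, stride=2)          # max pooling: 11 -> 5
    rows.append(("pool2", cube(s), 0))
    flat = filters * s ** 3               # flattened feature vector: 32 * 5^3
    rows.append(("fc1", (hidden,), dense_params(flat, hidden)))
    rows.append(("bn4", (hidden,), batch_norm_params(hidden)))
    rows.append(("fc2", (classes,), dense_params(hidden, classes)))
    return rows


for name, shape, params in tcvn_summary():
    print(f"{name:6s} {str(shape):20s} {params}")
```

Under these assumptions the computed values agree with the table. Almost all of the parameters sit in the first fully connected layer, which maps the flattened 32 × 5 × 5 × 5 feature volume to 2048 units.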

3.2.2. Floor Detection (Deep Learning)

An important requirement in MLHO tasks is that the humanoid robot knows on what type of floor it performs the dragging task. In this section, an instance segmentation technique is employed to detect the floor from the robot camera. The algorithm partitions the image pixels into a segmentation mask of the floor area. As a result, the segmented pixels indicate the type of ground on which the robot performs the dragging task. Figure 3-10 shows the three types of floors on which the humanoid robot is evaluated in the dragging task.

Figure 3-10 Types of floors used in this experiment: (a) plywood, (b) green carpet, (c) tile.

State-of-the-art instance segmentation was introduced in [44]. In [44], the authors introduced Mask R-CNN, which was built with a focus on performance. Although this instance segmentation is accurate, it runs at only 5 frames per second (fps) on modern computer hardware. The main reason is its reliance on the expensive re-pooling operation in RoIAlign. In addition, Mask R-CNN uses a two-stage detector, which means that the computation in the model happens sequentially.

Later, in [45], the authors introduced a real-time instance segmentation method called YOLACT (You Only Look at Coefficients), which beats state-of-the-art instance segmentation in terms of speed. They use a one-stage detector and split the mask computation into two parallel parts. First, a set of "prototype" masks is generated for the whole image. Second, those prototypes are linearly combined using coefficients from the prediction head.

The design of the YOLACT network architecture is shown in Figure 3-11. First, the model uses a standard Residual Network (ResNet) with a Feature Pyramid Net (FPN) as the backbone network. Then, a fully convolutional network (FCN, the "Proto Net") is attached to the largest FPN layer to produce the prototype masks for the whole image. Second, in parallel, the standard "Prediction Head" predicts the linear combination coefficients for each anchor box. Finally, the model performs minimal post-processing (crop and threshold) to obtain the final mask.

Figure 3-11 YOLACT network architecture [46]. (Labels: Input Image; ResNet stages C1-C5; Feature Pyramid Net levels P3-P7; Prediction Head on W×H×256 features with outputs Class W×H×ca, Box W×H×4a, Mask W×H×ka; Proto Net with feature sizes 69×69×256, 138×138×256 and output 138×138×k; Crop; Threshold; Result Image.)

For training the mask branch, a pixel-wise loss is applied only to the final assembled mask. Thus, the prototypes and the linear combination coefficients receive supervision only downstream, from the mask loss. This means that the combination is not constrained to any particular semantics. As a result, the prototypes take on various translation-variant behaviors within a fully convolutional network.

Furthermore, the original ResNet work [56] showed that deeper residual networks lead to lower loss values and thus improved accuracy. Therefore, in [46], the authors use ResNet-101 as the default backbone of the YOLACT model. They trained the model with a base image size of 550 × 550 on the Microsoft Common Objects in Context (COCO) dataset. Their method achieved results above 30 fps on the COCO dataset using an NVIDIA Titan Xp graphics processing unit (GPU).
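The mask assembly described above (a linear combination of prototype masks with per-instance coefficients, followed by crop and threshold) can be sketched in plain Python. This is an illustrative sketch only: the sigmoid applied to the combined prototypes follows the YOLACT formulation as we understand it, while the function names, the half-open box convention `(x1, y1, x2, y2)` and the 0.5 threshold are assumptions made for this example.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def assemble_mask(prototypes, coefficients, box, threshold=0.5):
    """Combine k prototype masks into one binary instance mask.

    prototypes:   list of k masks, each an H x W list of lists of floats
    coefficients: list of k floats predicted for one detected instance
    box:          (x1, y1, x2, y2) pixel bounds, half-open (assumed convention)
    """
    height, width = len(prototypes[0]), len(prototypes[0][0])
    x1, y1, x2, y2 = box
    mask = [[0] * width for _ in range(height)]
    # Crop: only pixels inside the predicted box can belong to the instance.
    for y in range(max(0, y1), min(height, y2)):
        for x in range(max(0, x1), min(width, x2)):
            # Linear combination of the prototypes at this pixel.
            logit = sum(c * proto[y][x] for c, proto in zip(coefficients, prototypes))
            # Threshold the sigmoid response to obtain a binary mask.
            mask[y][x] = 1 if sigmoid(logit) > threshold else 0
    return mask


# Hypothetical example: two 2 x 3 prototypes and one instance.
protos = [
    [[1.0, 1.0, 0.0], [1.0, 1.0, 0.0]],
    [[0.0, 1.0, 1.0], [0.0, 1.0, 1.0]],
]
floor_mask = assemble_mask(protos, coefficients=[2.0, -3.0], box=(0, 0, 2, 2))
```

A negative coefficient lets one prototype subtract from another, which is how a small set of shared prototypes can produce a different mask for each instance.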