在足球場景下多機器人同時定位與追蹤物體中的關鍵幅偵測

(1)

國立臺灣大學電機資訊學院資訊工程學系碩士論文

Department of Computer Science and Information Engineering College of Electrical Engineering and Computer Science

National Taiwan University Master Thesis

在足球場景下多機器人同時定位與追蹤物體中的關鍵幅偵測 Keyframe Detection for Multi-robot Simultaneous Localization

and Tracking in Robot Soccer Scenario

傅為剛 Wei-Kang Fu

指導教授：施吉昇博士 Advisor: Chi-Sheng Shih, Ph.D.

中華民國 107 年 1 月

January, 2018

(2)

(3)

致致致謝謝謝

在此論文完成之際，首先要感謝我的指導教授施吉昇博士，在進

行研究的路上不僅給我耐心的指導以及建議，更引領我突破自我。在

這段過程中受益良多，謹此致最深之感謝，蒙師督促鼓勵，銘記在

心。

在研究的過程中，感謝實驗室所有的學長以及學弟。特別是林坤

立學長，在我遇到實驗瓶頸的時候給予我寶貴意見，並且提供硬體平

台以及模擬器讓我們在研究的路上沒有後顧之憂。也謝謝實驗室江洵安、胡鈞、胡子浩、陳浩宇、鄭雅元、蔣承劭、湯梵平幫忙取得實驗

數據。

最後必須感謝我的家人、女朋友以及摯友林蔚城，在這段期間，在我失去人生方向的時候，給予我支持以及關心。再次感謝所有在這段

路上幫助過我的人，謝謝。

台北, 冬, 2017 傅為剛

(4)

摘摘摘要要要

近年來，多機器人合作定位相關的演算法日趨強大，但是在實作上通常需要考慮有限的計算資源以及網路資源。因此本研究利用關鍵幅偵測作為處理定位之前資訊重要性的判斷，以減少冗餘的資訊的使用。在機器人足球場景下，我們採用多機器人協同定位與追蹤的理論架構，期能在完美的通訊條件下提供良好的定位結果。但該理論在量

測更新過程中，隨著機器人之間分享的量測的數量增加，會對系統產

生顯著的負擔。

在關鍵幅偵測的研究中，針對影像內容做判斷的方法大致上分為以顏色為基礎或是以特徵點為基礎兩種。以顏色為基礎的特徵幅偵測雖然執行時最有效率，但是會忽略掉感興趣對象的空間資訊。而特徵點基礎的方法雖然提供更好且穩定的效能，但在計算特徵點時會消耗大

量的計算資源。本論文在多機器人協同定位與追蹤的理論架構下，於

影像前處理部分提出一關鍵幅選取架構，其中基於資訊熵理論設計評估量測的效用性函數計算方式。此效用函數不使用特徵點，只針對個別機器人從影像上所擷取較底層的動態與靜態物體資料分別進行效用性評估。

結果顯示我們提出的關鍵幅偵測架構可以測出場景變化。並且即使

在不使用從所有影像擷取出的量測進行量測更新步驟，機器人定位的

結果相較於里程計可以達到3倍的軌跡誤差修正，而當使用所有影像以

及測距值的軌跡誤差比值接近1的時候，使用關鍵幅架構的定位系統的最好效能是使用全部影像的85%。

關關關鍵鍵鍵字字字 - 關鍵幅偵測，多機器人合作定位，資訊熵，效用性函數

(5)

Abstract

The algorithm of multi-robot cooperative localization has become more and more powerful. However, it is needed to consider the limited computing resources and network resources in implementation. Therefore, in order to reduce the use of redundant information, many studies make use keyframe detection to measure the importance of information before localization process.

In robot soccer scenario, we adopt multi-robot cooperative localization and tracking, or MR-SLAT, which can provide good localization result under stable communication situation. But, in the measurement update step, the number of measurements sharing among robots can cause significant overhead to the system. Existing keyframe detection solutions against image content can be mainly classified into two categories, color-based and feature-based.

Color-based approaches execute efficiently, but they lose the spacing information of interested objects. Feature-based approaches provide more robust performance but spend more computational cost for extracting features. This thesis base on the MR-SLAT framework and propose a keyframe selection mechanism as a preprocessing for MR-SLAT. We design an utility function based on Shannon Entropy to provide a mechanism for calculating the utility of measurements. The proposed utility function, without the use of feature points, measure low-level data including moving objects and stationary objects which are extracted from an image. The experimental results show that, our method can detect the change of scene. In localization experiment, the trajectory of localization can be significantly improved for up to 3x from odometry by using 25% of image data. when the trajectory error ratio between using all frames and odometry is close to 1, the best performance of the keyframe-based localization system is about 85% of by using all frames.

(6)

Keywords-Keyframe Detection, Multi-robot Cooperative Localization,Shanon Entropy, Utility Function

(7)

List of Figures

1.1 The scenario of RoboCup Standard Platform League (SPL). . . 2

2.1 MR-SLAT process flow . . . 5

3.1 The NTU RoboPAL perception architecture. . . 11

3.2 Flowchart of keyframe detection system . . . 12

3.3 The comparison flow of keyframe detection . . . 13

4.1 The comparison of different compositions of the field lines by using the histogram-based method . . . 16

4.2 The comparison of a moving object scenario by using the histogram-based method. . . 17

4.3 An example of moving coordinate system. . . 20

4.4 An example of bipartite matching. . . 21

4.5 The comparison of two entropy tendency. . . 23

5.1 The two scenarios for showing keyframe selection results . . . 26

5.2 The timeline and the keyframes the system selected with various parameters in the first scenario. . . 27

5.3 The timeline and the keyframes the system selected with various parameters in the second scenario. . . 28

5.4 The relation between keyframe selection parameters and the result number of keyframes selected for both scenarios. . . 29

5.5 The timeline of keyframes which are selected by the system against thresholds in stationary object scenario. . . 30

5.6 The keyframes and entropy with threshold τ = 0.8 in stationary object scenario. . . 33

5.7 The timeline of keyframes which are selected by the system against thresholds in moving object scenario. . . 34

5.8 The keyframes and entropy with threshold τ = 0.8 and predicted distance of 900mm in moving object scenario. . . 35

(10)

5.9 Setting for localization experiment. . . 36

5.10 Trajectory of Robot 58 . . . 37

5.13 The ATE RMSE and keyframe selection ratio of Robot 58. . . 38

(11)

List of Tables

5.1 Camera projection errors for labeled points. . . 31 5.2 Comparison of absolute trajectory error (ATE) RMSE (unit: mm) and the

keyframe percentage under different threshold (τ ) selection. . . 32

(12)

Chapter 1 Introduction

1.1 Motivation

Localization is one of the most important capabilities for robotic applications. In single-robot localization, the pose of the robot relative to a given map can be estimated by its motion controls and measurements of landmarks. Many localization algorithms mostly follow a Bayesian framework which, by the control update and the measurement update, calculates the state in a probabilistic manner called belief distribution:

belt= p(xt|z1:t, u1:t) (1.1)

where bel_tis the posterior probability distribution over the state of the robot x_tat time t, z_1:tdenotes all past measuremaents and u_1:t denotes all past motion controls.

In multi-robot cooperative localization, each robot shares its observations through networks and jointly estimates all robots’ poses. Moreover, the cooperative localization algorithms have been proved that it effectively improves localization accuracy for all team members by incorporating relative measurements between a troop of robots [1] [2]. How- ever, the complexity of localization algorithms depend on the number of measurements shared among the robots. Therefore, the advantage of cooperative localization comes at the cost of computation and communication. In addition, multi-robot systems, especially for embedded robots, typically operate under resource constraints. As a result, how to

(13)

(a) The humanoid NAO robot used in the current SPL (b) RoboCup SPL competition Figure 1.1: The scenario of RoboCup Standard Platform League (SPL).

allocate these resources efficiently is an important issue.

Most studies regard this issue as a sensor management problem or sensor selection problem [3] [4]. The main objective of sensor management is a dynamic selection of a sensor or a group of sensors from among a set of available sensors to perform the task at each time instance during an interval. This issue occurs in various applications, such as target tracking, localization, and wireless sensor networks. Besides, considering different criteria and scenario, there are several areas relevant to sensor management approaches.

Heuristic approaches [5], Markov decision processes [6], optimization of estimation error [7], entropy or mutual information [8] are examples. However, the main purpose of these approaches in sensor selection is yielding a state of information which can quantify the data.

In this thesis, we focus on the RoboCup Spandard Platform League (SPL) scenario.

The scenario of RoboCup SPL is shown in Figure 1.1. Figure 1.1a shows the NAO robots.

The robot senses the nearby environment by upper and lower cameras. The upper camera is facing forward, and the lower camera is facing down. In Figure 1.1b, the five to five soccer game is shown. The field consists of white field lines and two goals. The measurements of objects of interest are extracted from the images by a series of processes of computer vision. Then, these measurements will be shared to other robots in the same team by UDP broadcast, and take as an input of localization. This implies that, for this image-based localization approach, not only maintaining the probabilistic descriptions of the state variable but extracting the measurements from images cost high computational

(14)

resources.

Therefore, in order to efficiently use resources including computation and communication, we select a group of robots to perform cooperative localization for each time instance by using the keyframe detection technique which can identify the next image that is sufficiently different than the last keyframe and reduce the content redundancy among a sequence of images.

1.2 Contribution

We develop a keyframe detection framework in RoboCup scenario. Each frame is given a measure to represent the difference from the last keyframe according to the proposed utility function. Through our design, the change of scene can be detected. More- over, the trajectory of localization can be improved for up to 3x for poor odometry data.

Moreover, the performance of localization by using 25% of frames is 85% of using all frames in the best case.

1.3 Thesis Organization

In Chapter 2, we present the background and related works for this thesis. In the background parts, we introduce the multi-robot simultaneous localization and tracking (MR- SLAT) and an overview of keyframe detection. In Chapter 3, we introduce the perception system in the humanoid robot system. Furthermore, we model the keyframe detection framework based on the perception system. Finally, we design the target problem. In Chapter 4, we explain why the classical keyframe selection methods do not suit RoboCup scenario. Furthermore, we introduce Shannon entropy and design the utility function for keyframe detection. In Chapter 5, first, we show the results of keyframe detection under specified scenarios. Second, the comparison of localization results between with and without keyframe detection is shown. In Chapter 6, we conclude the work in this thesis.

(15)

Chapter 2 Background and Related Works

2.1 Multi-Robot Simultaneous Localization and Track- ing

In [9], a multi-robot simultaneous localization and tracking (MR-SLAT) algorithm based on the extended Kalman filter is proposed. Also, to deal with the data association problem, the multiple hypothesis tracking (MHT) is integrated into MR-SLAT as same as simultaneous localization, mapping and moving object tracking (SLAM-MOT) [10].

Most of the state-of-the-art multi-robot localization systems either treat moving objects as outliers or track moving object separately from localization. Howecer, to solve several localization challenges such as insufficient map features, no map features, and symmetric maps, MR-SLAT estimates the dynamics of the teammate robots and moving objects in one framework and maintains all of the pairwise correlations between them through the augmented state covariance matrix.

The process flow of MR-SLAT system is illustrated in figure 2.1. Each robot collects the measurements and odometries from teammate robots through the communication de- vice and then update each hypothesis in the set of hypotheses by aggregating the data form teammate and form itself. On the other hand, MR-SLAT is a centralized system and each robot acts as a fusion center. In the following paragraphs, we introduce the necessary components of MR-SLAT system.

(16)

Figure 2.1: MR-SLAT process flow

A. Augmented State Representation

In MR-SLAT, the states of teammate robots and moving objects are augmented into one state vector Xtas follow:

X_t= [(R¹_t)^T · · · (R^N_t )^T (O_t¹)^T · · · (O_t^M)^T]_T (2.1)

where t denotes the time index, N denotes the number of robots, M denotes the number of moving objects, Rⁱ_t = [xⁱ, yⁱ, θⁱ]^T is the pose of i^th robot at time t, and O_tⁱ = [x^j, y^j, vx^j_t, vy^j_t]^T contains the position and velocity of the j^thmoving object at time t.

B. Motion model

There are two different motion models for moving objects in RoboCup scenario. First, for the teammate robot which can communicate each other, the odometry motion model is used for motion prediction:

R¯ⁱ_t = g_R(Rⁱ_t−1, o_t) + w_tⁱ (2.2)

(17)

where ¯R_tⁱ denotes the predicted state of i^th robot at time t which is derived from previous state Rⁱ_t−1 and odometry offset between two timestamps o_t = [∆xⁱ_t, ∆y_tⁱ, ∆θⁱ_t], and g_R(Rⁱ_t−1, o_t) denotes the motion model where

g_R(R_t−1ⁱ , oⁱ_t) =







cos(θⁱ_t−1) − sin(θ_t−1ⁱ ) 0 sin(θⁱ_t−1) cos(θⁱ_t−1) 0

0 0 1













∆xⁱ_t

∆y_tⁱ

∆θ_tⁱ





 +





 xⁱ_t−1 y_t−1ⁱ θ_t−1ⁱ







(2.3)

and wⁱ_t ∼ N (0, Qⁱ₀) is the Gaussian white noise of i^th robot’s motion with covariance Qⁱ₀. Second, for the moving objects which are not in the same team, the multiple model tracking, including constant velocity model and stationary model, is used to model the unknown motion mode of moving objects since there is no available odometry information.

C. Measurement model

There are three types of measurements: (1) relative pose between the robot and the map (robot-to-map), (2) relative pose between two teammate robots (robot-to-robot), and (3) relative pose between the robot and the moving object (robot-to-moving-object). How- ever, the measurement models of these types can be defined similarly. Taking robot-to- moving-object as an example, the sensor model is computed as follow:

z_R^O^j

i = h_RO(Rⁱ_t, O^j_t) + w_RO_t (2.4)

h_RO( ¯Rⁱ_t, ¯O_tⁱ) =







q

(x¯^j_t − ¯xⁱ_t)²− (y¯_t^j − ¯y_tⁱ)² atan2((y¯^j_t − ¯y_tⁱ)²− (x¯^j_t− ¯xⁱ_t)²) − ¯θ_tⁱ





 (2.5)

where z_R^O_i^j denotes the sensor model that allows the i^throbot to detect j^thmoving object.

h_RO( ¯Rⁱ_t, ¯Oⁱ_t) is a conversion function that transforms the expected position of j^thmoving object in the global coordinate system to the local coordinate system relative to the i^th robot. w_RO_t is the Gaussian white noise of this measurement.

(18)

D. Track Management

Since incorporating moving objects makes the state much more dynamic, it is necessary to manage the tracks of the moving objects in the augmentated state. Similar to few mapping algorithms, each track of moving object keeps a log odds ratio, and updates it for each MR-SLAT loop. If the moving object is observed in this frame, it’s log odds ratio grows; otherwise, the log odds ratio declines. For a predetermined period of time, the track with the log odds ratio smaller than zero would be deleted by marginalizing out from the current state. Besides, when new moving objects are detected, the tracks should be initialized.

E. Multiple hypothesis tracking (MHT)

To solve data association and tracking problems, MHT is applied in the MR-SLAT system. MHT maintains a hypothesis tree and makes the best decision based on a sequence of received measurements. The new hypothesis is created when the Mahalanobis distance D_mbetween the observed measurement and the predicted measurement is greater than the gating threshold T . However, the overhead grows exponentially depending on the size of hypothesis tree. In order to update more efficiently, the hypothesis management component maintains a compact posterior by pruning the low-scored hypotheses and merging similar hypotheses.

2.2 Related Works

Keyframe detection for SLAM has been studied by several researches. Hong Zhang and Bo Li [11] give an overview of keyframe detection in visual SLAM and compare five similarity methods in content-based image retrieval algorithms to the keyframe detection problem which are pixel-wise, global-historgram, local-historgram, feature matching, and Bag-of-Words. Feature matching method performs the best among the five but has most computational cost.

Lu, Guoliang and Zhou [12] combine pixel-wise and historgram based methods to

(19)

measure the abnormalities in a given video for surveillance cameras. However, they can extract interested objects by using the simple frame difference since the camera is standstill.

For feature-based methods, Zilong Dong and G. Zhang [13] select keyframes in order to reduce content redundancy. However, the proposed framework has to select the optimal keyframes offline. Arun Das and Steven L. Waslander [14] design two entropy-based approachs which seek to select keyframes based on two concepts. First, cumulative point entropy reduction (CPER) attempts to reduce map point entropy in the currently existing map. Second, in order to initialize the most favorable new feature points, they use the point pixel flow discrepancy (PPFD) to evaluate the new points between the currently existing map points and the predicted location of new points triangulated from multiple views.

To reduce bandwidth consumption and network latency, W. J. Beksi and N. Papanikolopou- los [15] control data transmission in a robotic network by selecting keyframes with an adaptive threshold. They also define an incoming frame by a Shannon entropy of the den- sity of the voxels from the RGB-D data. Keyframe detection approachs alleviate not only network bandwidth but also the computational requirements. For example, M. Hsiao and E. Westman [16] propose a keyframe-based dense planar SLAM (KDP-SLAM) system.

The frame rate of the system reaches 30f ps with a CPU (Intel Core i7-4790) by allo- cating different tasks with different computational requirements on four types of frames.

However, the type of the frames are distinguished by the distance of visual odometry.

Since the motherboard of NAO v. 4.0 is ATOM Z530 1.6GHz CPU, the feature-based algorithms are not applicable on such platform. Thus, we design the utility function according to the preprocessed data extracted from the images with prior knowledge, such as color and geometries, of the interested objects.

(20)

Chapter 3 System Architecture and Problem Definition

3.1 System Architecture

In our software framework, perception, behavior, and motion are separated processes run on NAO robot. Each process communicates with one another using shared memory interprocess communication. Perception provides the knowledge of the field and enables the robot to estimate the state of itself and also to keep tracks of other moving objects.

Behavior is composited by a planner and role state machines. the planner will update roles and waypoints depending on perception and motion inputs. These outputs will be brought to select the corresponding role state machine. Motion takes commands requested by behavior to change the actual motion mode between walking and special actions and control joints to provide stable walking or complicated pose.

3.2 Perception System

Fig 3.1 illustrates our perception architecture in a block diagram. The system can be divided into four layers: data preprocessing layer, image preprocessing layer, object detection layer, and localization and tracking layer.

(21)

3.2.1 Data Preprocessing Layer

In the data preprocessing layer, Frame Info. Provider handles the raw images. Cam- era Model Provider takes the robot model as camera extrinsic parameters, with intrinsic parameters calibrated in advance, to provide camera projection or reprojection. Color Segment Provider applies the predefined color table and the acquired raw images to provide color segments such as green, white, and black.

3.2.2 Image Preprocessing Layer

The image preprocessing layer provides the low-level feature information to higher level obeject detection. The data generally involves two types: field line candidates and moving object candidates which are extracted by color segmentation and object classi- fication by geometry. The properties of color and geometry of the field are the prior knowledge in the RoboCup scenario, and this heuristic method helps for accelerating to get the candidates from images. For example, the white pixels whose projected distances are no more than 5 cm would label as field line candidates, and these pixel coordinates are recorded in a list. In addition, the pixels which are green and are the field boundary are partitioned into connected components. Each component is labeled as a moving object candidate and the bounding boxes of connected components are maintained in the dataset.

3.2.3 Object Detection Layer

The object detection layer provides accurate measurements to localization and tracking. For moving object and ball detection, scoring is used to filter false positive detection.

Field line candidates are clustered horizontally and vertically, and changed into hough space using 2D straight line model to fit for actual lines. For center circle detection, RanSac is applied with the possible pixels to fit as a circle.

(22)

3.2.4 Localization and Tracking Layer

After a series of image processing, the localization sysytem introduced in Section 2.1 takes the odometry from motion, and the measurement from object-detection layer and from teammate to estimate the state. However, since the ball moves much faster than other moving objects on the field, we use an independent tracker to keep the balls.

Figure 3.1: The NTU RoboPAL perception architecture.

3.3 Keyframe Detection System

The proposed keyframe detection method selects keyframes based on image contents.

A video sequence can be regarded as a composition of time-sequential substreams. Each

(23)

Figure 3.2: Flowchart of keyframe detection system

substream is composed by similar video contents, for example, objects or activities [12].

The change points from one substream to the next are selected as keyframes.

To lower the use of computational resource, we do not consider any accurate yet com- putationally expensive object detection methods for extracting data such as robots, ball, or corner. Only approximated or sufficient results are produced for the input of the system since the role of keyframe detection in this framework is a preprocessing for other high-level tasks. Moreover, the two types of extracted data can not only be the input of keyframe detection but also be the seed of other higher-level object detection receiving more accurate measurements. Thus, there is no additional cost for preparing the input data of object detection modules. Therefore, this keyframe detection system measures temporal fluctuation of low-level data in a sequence of images only by the preprocessing data of object detection. Figure 3.2 illustrates the overall keyframe detection procedure.

For each frame at time t, It, we compute an entropy value etto represent the difference of data distribution between Itand the last keyframe Ki by comparing similarities of two types of data.

e_t= (1 − λ)H_{M O}+ λH_SO (3.1)

where the moving obeject term, H_{M O}, is derived from the moving object candidates of I_t and the last keyframe. In the same way, the stationary object term, H_SO, is derived from

(24)

Figure 3.3: The comparison flow of keyframe detection

the field line candidates of I_tand the last keyframe. The weighting λ is used to adjust the relative importance of the two types of data. For more detail, we discuss in Chaper 4.

In other words, in Figure 3.3, each frame in the interval Li, which extends from keyframe Ki through the last keyframe Ki+1, would be given an entropy value by in- tegrating Ki’s data and its own. A frame is selected as a keyframe if its entropy is lager than a threshold τ . The selection of a new keyframe K_i+1triggers the obeject detection and launches MR-SLAT. On the contrary, the data of the other frames would be dropped.

3.4 Problem Definition

Problem Definition. Given a sequence of frames F = {F1, F2, ..., FN} sampled at a constant rate, the target problem is to select a set of frames as keyframesK = {K1, K2, ..., Km}

from the inputF i.e. K ⊆ F . Each K_i ∈ K should have the ability to detect the scene change such as the increase of the number of objects of interest or the distances of objects of interest between two frames are greater than a threshold.

For the input F , we make two assumptions which are described below:

Assumption 1. In a segment of consecutive frames, they contain similar contents due to the continuity of content in a sequence of frames along the temporal extent.

Assumption 2. the robot capturing images do not have any sudden move such as moved

(25)

by a human or fallen. However, if a sudden move happens, the robot’s sensors, such as the force sensors under feet, can detect the assumption is broken to stop the program.

In this theses, we focus on the problem of keyframe detection in the image-based perception system introduced in Sec 3.2 that robots sense the environment by using camera sensors without depth data. Moreover, since the computing resource on the embedded robot is limited, the high computational computer vision algorithm is not used such as SIFT feature matching which costs at least O(N log N ), where N is the resolution of the image.

(26)

Chapter 4 Design of Entropy-Based Keyframe Detection

In most research of keyframe detection, feature points matching, such as SIFT, and the image histogram of grayscale are two classical methods. For feature points matching, it performs more robust than the image histogram. However, feature matching does not suit for the embedded system without powerful compute unit. For histogram-based methods, they are more efficient than feature-matching-based methods but lose spatial information within which difference between two images and is easily affected by the image noise [11].

For examples, in Figure 4.1(a), the entropy value of the image which only consists white line is 6.621, but, in Figure 4.1(b), the entropy of image with a T-corner which is indicated by the red circle is 6.5842. The measures are too close to detect the difference of the image content for the field lines. In figure 4.2, the robot indicated by the red rectangle moved from from (a), the right of the center circle, to (b), the left of the center circle. The mean of the entropy values in the sequence of images is 7.3681 with covariance 7.932e⁻⁴. The difference of entropy values in given sequence of images is approximate to zero. As a result, keyframes are difficult to detect by histogram-based methods. In this work, we propose an utility function, which makes use of the spatial information between the last keyframe and the current image, based on Shannon entropy by using preprocessing data

(27)

discussed in Section 3.3.

Figure 4.1: The comparison of two images with different compositions of the field lines by using the histogram-based method.

4.1 Entropy Computations

The Shannon entropy of discrete random variables X = {x₁, x₂, · · · , x_N} is defined by

H(X) = −

N

X

i=1

p(x_i) log₂p(x_i) (4.1) where

N = kXk

p(x_i) = P r[X = x_i], i ∈ [1, 2, · · · , N ] PN

i=1p(x_i) = 1

Equation 4.1 is the expected value of information content of X and represent a measure of unpredictability or uncertainty of the state. For instance, the entropy of a fair coin tossing is the maximum value of the entropy of one-bit information, since the probability of heads and tails are the same. Thus, it is difficult to predict the outcome in this case. In contrast, the entropy of the coin with two heads or two tails is zero, because the coin always comes up the same result.

The basic idea of the designed method is as follow. First, for the current frame, we

(28)

Figure 4.2: The comparison of a moving object scenario by using the histogram-based method.

define the random variable according to the image content. The random variable includes two main parts: redundant data and new information. Redundant data indicates the same content of the last keyframe and the current frame. In contrast, new information represents the difference of the interested objects between the two frames. Furthermore, the probability distribution of the random variable means the proportions of the new and the redundant content in the current frame compared to the last keyframe, and is defined by following the properties of Shannon entropy. Thus, when there are two similar images or the redundant part takes most of the current image content, the entropy is approximate to zero. On the contrary, when there comes out many new pieces of information, the uncertainty of the distribution goes high, and it makes large entropy.

According to Equation 3.1, we introduce two entropy terms, moving objects in Sec- tion 4.2 and stationary objects in Section 4.3, separately, and each section describes its random variable and the probability mass function of itself.

4.2 Entropy Function for Moving Objects

For the moving object term, since each moving object candidate represents a possible moving object or a object of interest, the entropy is derived by comparing each candidate in the current frame c^{M O}_t with candidates in the last keyframe c^{M O}_k . Therefore, the

(29)

random variable consists of each element in the c^{M O}_t and, besides, redundant data we mention before. Here, redundant data are called the background which means the similar content that the last keyframe, intersected to the current frame, remained. In other words, if the moving object candidates in the given frame do not provide any information, i.e. p(x_i) = 0, ∀x_i ∈ c^{M O}_t , p(background) = 1, thus the entropy is 0 according to equation 4.1.

4.2.1 The Relation Between Two Moving Object Sets

In order to define the probability mass function, each candidate in c^{M O}_t corresponds to the candidate in the last keyframe to find the proportion of new information part that the candidate in c^{M O}_t can provide. To find the correspondence, the most robust method must be feature matching; however, it is not efficient for embedded robots, such as Nao.

Therefore, we postulate the similarity of two candidates by distance, i.e. the closer the distance is, the more similar the candidates are. This postulation is based on that the content in the videos or image sequence is continuous that means no sudden move the object acts. The assumptions will be broken when the self-robot falls or is moved away manually. However, the sensors of the robot can be used for detecting the sudden move to stop the program in the scenarios.

We create a graph in order to describe the relation between two candidate sets. Here, we assume that the candidates in the same frame are independent from each other. Based on the assumption, we can build a weighted bipartite graph G = (V, E, W ), where V = c^{M O}_t ∪ c^{M O}_k and c^{M O}_t ∩ c^{M O}_k = ∅, the edges set E ⊆ c^{M O}_t × c^{M O}_k and the weighting of the edges set W are constructed by the following steps.

Step 1: Derive The Similarity Function of Candidates in c^{M O}_k

The similarity of candidates is a function of Euclidean distance on the self’s coordinate system. Hence, we first project each candidate u_k,i in c^{M O}_k onto the self’s coordinates with calibrated camera parameters and get projected candidates set ˜c^{M O}_k . The extrinsic camera parameters are calculated by accumulating the readings from the robot’s joints

(30)

measurements. The candidate ˜u_k,iin ˜c^{M O}_k is a point on the X-Y plane of three-dimensional Euclidean space, and the similarity function for each u_k,iis defined as:

S_k,i(x) =











1 − d_k,i(x)/r_i, d_k,i(x) ≤ r_i.

0 , otherwise.

x ∈ R³, ˜u_k,i∈ ˜c^{M O}_k (4.2)

where

d_k,i(x) = q

(x − ˜u_k,i)^|(x − ˜u_k,i) (4.3) The parameter r_i is a predicted range (PR) which means the maximum distance that the candidate u_k,iwill move over a specified time horizon ∆t, and, using the constant velocity model with the maximum velocity vmax, ri is written as ri = vmax∆t. In this thesis, we refer to [17] and set the maximum velocity of robots to 300mm/s.

However, the self-robot also has the mobility that the coordinate system which candidates projected onto has the velocity v_s. This causes that the path the candidate moving from the last keyframe to the given frame is longer than the distance that it moves on the world coordinate system when the direction of the path is opposite to the direction of v_s. In the contrast, it is shorter.

For example, in Fig 4.3a, the self-robot moves forward d along the y-axis of the world coordinates but the candidate stays at the same world coordinate. rk and rt are the self- robot coordinates at time k and t. The coordinate of the candidate c on the self-robot coordinates at two times are c_kand c_t. As a result, in the field of view (FOV) of the self- robot, the candidate moves back with distance d. To reduce the difference of the candidate that the self-robot made, we introduce the relative velocity model to rewrite the predicted range as:

ri(∆θ) = (vmax− vscos ∆θ)∆t (4.4) where ∆θ is the angle between the direction of self-robot’s velocity and the candidate’s path. The velocity vsis calculated with its odometry.

Continued to the last example, the self velocity v_s is 200mm/s. We predicted the

(31)

moving objects with v_max = 300mm/s in a specified time horizon ∆t = 3s. The difference of predicted ranges without and with the relative velocity model is shown in Fig 4.3b and Fig 4.3c. The similarity of c_tagainst c_kis 0.667 in (b) and 0.8 in (c).

Figure 4.3: (a) An example of moving coordinate system. (b) The predicted range without relative velocity model(c) The predicted range with relative velocity model

Step 2: Create the Edge Weight Set W

For each candidate u_t,i in c^{M O}_t , first, we project u_t,i on the self’s coordinate system and get ˜u_t,i. Second, the edge e_i,j is created if d_k,j(˜u_t,i) is not zero for each ˜u_k,j in ˜c^{M O}_k . As a result, the bipartite graph G is created.

Step 3: Find the Worst Case of Similarity of Two Candidates Sets

In order to define the probability distribution of XM O, we need a one-to-one relation between two candidate sets. The one-to-one relation is the worst case which indicates the most similar situation of two images. Therefore, we use maximum bipartite matching algorithm to find the worst case of weighted bipartite graph G to represent the similarity of moving object candidates between the two images. A bipartite matching problem can be regarded as a flow network problem. Fig 4.4 gives an example of bipartite matching and it can be solved by following steps. The first step is adding source s and sink t in graph G. Second, solve the flow network problem. Finally, two nodes are matching if they are on the same path from s to t. If the candidate ut,i is not matched, i.e. there is no correspondence between ut,iand any candidate in the last keyframe, it is regarded as a

(32)

new candidate. The complexity of bipartite matching is O((m + n)n); |E| = m, |V | = n and the output is a matching matrix m ∈ R^lx2; |c^{M O}_t | = l where the i^th row records the matched candidate u_k,j and the similarity s^j_i.

Figure 4.4: An example of bipartite matching.

4.2.2 The Probability Mass Function for Moving Objects

The probability distribution of random variables XM O can be derived by two main steps: initialization and update. First, the initialization step sets the probability distribution to maximize the entropy value. In our case, the maximum measure occurs when the proportion of the background is minimized and the objects in the given frame are all newly added objects. Furthermore, the update step adjusts the probability distribution depending on the similarity we discussed in Section 4.2.1. Next, we discuss two steps in detail.

Step 1: Initialization

The maximum value of Shannon entropy occurs if all the outcomes are equally likely which means the uncertainty of probability distribution is highest. Therefore, we initialize

(33)

the probability of each element in XM Owith the equal value: p(x_i) = _kX¹

M Ok, ∀x_i ∈ X_{M O} . This implies that, for each moving object candidate, the proportion of the new information it provided is no larger than _kX¹

M Ok. In addition, the proportion of the background is no less than _kX¹

M Ok. In other words, if one of the candidate takes the proportion which is larger than _kX¹

M Ok, the uncertainty of the distribution goes lower and the entropy value decreases.

Step 2: Update

For each moving object candidate u_t,i, we update the probability according to its similarity introduced in Sec 4.2.1:

p(u_t,i) =











1

kXM Ok(1 − s^j_i), u_t,i matches the candidate x_j ∈ c^{M O}_k

1

kXM Ok , otherwise.

∀u_t,i ∈ c^{M O}_t ⊂ X_{M O}

(4.5) and the probability of the background is updeted by:

p(background) = 1 −X

i

p(u_t,i), ∀u_t,i ∈ c^{M O}_t (4.6)

This step reduces the proportions of the candidates based on the similarities, and, furthermore, moves the proportion of the similar parts to the background. After the updating, the distribution tends to concentrate to the background. Hence, the entropy goes lower.

Once the probability distribution is defined, we can calculate the entropy of X_{M O} according to the equation 4.1. However, since the maximum of entropy grows with the number of the random variables, we use the normalized entropy or metric entropy to measure the uncertainty of the probability distribution without considering the number of the random variable. Thus the moving object term HM O in equation 3.1 can be written as

H_{M O} = − 1 log₂kX_{M O}k

N

X

i=1

p(x_i) log₂p(x_i), ∀x_i ∈ X_{M O} (4.7)

However, there is a problem that the entropy is low even with newly added candidates

(34)

if there are candidates with extremely low probability. For example, if there are some standstill moving objects or permanent false positives when the new moving object come in the FOV, it is hard to detect this frame. As a result, in order to reduce the effect of the candidates with extremely low probability, we merge these candidates into the background which means that, before calculating the normalized entropy, the random variables is modified by eliminating the low-probability candidates: XM O0 = {background, u_t,i | p(u_t,i) > }. Fig 4.5 shows the comparison of two entropy tendency with two random variables: the original one X_{M O} and the modified one X_{M O}0. Both of them are under the situation that there is only one newly added candidate, and the others are the same candidates. The entropy of X_{M O}dramatically drops if the number of candidates increases.

Figure 4.5: The comparison of two entropy tendency.

4.3 Entropy Function for Stationary Objects

The stationary objects of interest include different types of corner and the center circle.

However, field lines candidates extracted by preprocessing are only white pixels. Hence, we consider these candidates all together. As a result, the random variables for stationary object entropy X_SOconsists two tuples. One is the set of field lines candidates in the given frame c^SO_t , the other is the background which has the same meaning as we mentioned in section 4.2.

The algorithm of calculating the probability distribution of XSOis similar to XM Oin

(35)

section 4.2.2. First of all, we use image registration algorithm, such as Iterative closest point (ICP), to find the similarity between c^SO_t and c^SO_k . The registration algorithm provide the mean squared error (MSE) or mean absolute error (MAE) of the registration result between two points sets. The similarity of two points sets, c^SO_t and c^SO_k , is defined as:

S(c^SO_t , c^SO_k ) = exp^−c·E×100% (4.8)

where S is similarity measure between two points sets, and lies between 0 and 1, E is the mean absolute error of registration result between two points sets, and c is a experi- mentally determined parameter. The time complexity of ICP with the k-d tree structure is O(N log N ), where N is the number of sample points. The sample points are a set of white points by downsampling the stationary object set, and the number of sample points is less than the number of whole image pixels for 30x at least.

Second, each probability of random variable is initialized by the value of 0.5. Then, the probability of c^SO_t is updated by p(c^SO_t ) = 0.5 · (1 − S(c^SO_t , c^SO_k )), and the probability of background is 1 − p(c^SO_t ). As a result, the stationary object term HSO in equation 3.1 can be written as

H_SO = −p(c^SO_t ) log₂p(c^SO_t ) + −(1 − p(c^SO_t )) log₂(1 − p(c^SO_t )) (4.9)

To summarize, two tuples of Equation 3.1 are defined. The weighting λ in the exper- iments is followed the laws below: if the odometry of the self-robot is zero, λ is set to zero. That means that the FOV of the robot does not change, so we do not consider the change of stationary objects. On the contrast, if there is no moving object candidates, λ is set to one. Otherwise, λ is 0.5.

(36)

Chapter 5 Evaluation

To evaluate the performance of keyframe selected by the proposed method, first, we analyze the detection results under different thresholds and predicted ranges. Further- more, we design different scenarios for two types of candidates to evaluate the image contents of keyframes. Finally, we compare the localization results of different input image sets that are selected with different parameters.

5.1 Moving Object Selection Analysis

In this experiment, we set up two different scenarios depicted in Figure 5.1 on the soccer field to demonstrate the keyframe selection results under different situations. For each setting with different keyframe selection parameter, three types of figure are provided to show the results: the number of keyframes, the timeline of keyframes, and the keyframe images selected by the system.

The first scenario is shown in Figure 5.1(a). In this scenario the robot marked by the red circle stays at the same place. Another robot marked by the yellow circle walk along the yellow arrow. Figure 5.2 shows the keyfrme selection results with six different predicted ranges (300 mm, 600 mm, 900 mm, 1200 mm, 1500 mm, and 1800 mm) and five different entropy thresholds τ (from 0.5 to 0.9). All applied to the same image sequence from the robot marked with red circle in Figure 5.1(a). Each colored dot represents a selected keyframe by the system with different parameters. The color of the dot corresponds

(37)

Figure 5.1: The two scenarios for showing keyframe selection results. (a) Scenario for depicting keyframes selected from a stationary robot. (b) Scenario for depicting keframes selected from a moving robot.

to the applied entropy threshold. The x-axis represents the time instances and the y-axis refers to predicted walking range. Some of the continuous selections of keyframes are caused by preprocessing failure, such as false positive of moving objects. However, the selection process can reduce the computational requirement for up to 10x. The keyframe images shown in Figure 5.2 is selected τ = 0.8 and predicted range of 900 mm. The keyframe whose time difference is less than five frames compared to the last keyframe is not shown.

The other experimental scenario is shown in Figure 5.1(b). The robot which captures images is marked by the red rectangle and moves along the red arrow. The other robot stood still at the beginning and then moves along the yellow arrow starting at the frame number 180. The keyframe selection results is show in Figure 5.3. The situation of the continuous selections of keyframes became worse because the extrinsic camera parameters are noisier to cause the larger error of coordinate translation when the robot is moving.

The most obvious part is in the interval before the robot marked by the yellow rectangle was moving. However, when the entropy threshold is larger than 0.7, the uncertainty that the projected error caused is filtered out by the threshold but some of objects of interest are lost in a long interval. In this setting, the selection process reduce the computational requirement for up to 3x without losing objects of interest. The keyframe images shown in Figure 5.3 is selected τ = 0.9 and predicted range of 900 mm.

(38)

Figure 5.4 shows that, with different entropy thresholds, the relation between the increment of predicted difference range and the reduction of the amount of selected keyframes is obvious in both scenarios. The number of selected keyframes are affected by both parameters: the predicted range and the entropy threshold. Different ranges affect the similarity of candidates. Thus, they also affect the uncertainty of the distribution according to Equation 4.5. In addition, the entropy threshold decides the uncertainty of the probability distribution of the moving object candidates in the current frame.

Figure 5.2: The timeline and the keyframes the system selected with various parameters in the first scenario.

(39)

Figure 5.3: The timeline and the keyframes the system selected with various parameters in the second scenario.

5.2 Keyframe Content Analysis

In order to evaluate the content quality of the keyframe, we generate sequences of events for stationary objects and moving objects separately. For the event detection task, events are defined as the change of composition of objects of interest. For example, the event, in the moving object case, is the change of robots, and, in the stationary object case, is the change of composition of different landmarks.

(40)

Figure 5.4: The relation between keyframe selection parameters and the result number of keyframes selected for both scenarios.

5.2.1 Stationary Objects

The dataset covers a quarter field and the selection results are shown in Figure 5.5.

Each colored dot represents a selected keyframe by the system and the color of the dot corresponds to the applied entropy threshold. Each colored region is labeled manually and represents that, for each frame in the same region, the compositions of landmarks are the same. The red region represents the landmarks are within 3000mm, in contrast, if there is any landmark over 3000mm in the frame, it will be labeled by green region.

In addition, the frame which is not in the red and green region has no landmark in the image content. Therefore, the change between two region represents an event. If there is a color dot in the color region which is after the event, we define that the event has been detected. As a result, Figure 5.5 shows that the lower threshold can detect more events.

In particular, by using τ = 0.6, the recall of detecting events is 100%.

Moreover, the change of shapes of field line candidates can be detected. For example, in Figure 5.6, keyframes (I) to (III) was in the same color region, but the image positions of the same T-corner changed among these three images. In addition, since the FOV of the robot is change more quickly when the robot turn or spin, more keyframes are selected in the duration compared to walking straightly. For instance, keyframe (VI) to (IX) are selected in less than 20 frame interval.

(41)

Figure 5.5: The timeline of keyframes which are selected by the system against thresholds in stationary object scenario.

5.2.2 Moving Objects

The data include the combination of robot actions, such as standstill, walking into the FOV of the camera, occlusion, and walk away from the FOV of the camera. Fig- ure 5.7 shows the keyframe selection results against time with with three different predicted ranges (300 mm, 900 mm, 1500 mm) and three different entropy thresholds (from 0.6 to 0.8). The color dots represent the same meaning as in Figure 5.5. The frames in the same color region have the same robots in the FOV of the robot. The red region represents that the number of robots does not rise compared to the last color region. In contrast, the green region represents the increase of the number of robots. The number marked in or above the color region indicates the number of robots in the interval. The results show that the the keyframe is selected when not only the number of ROI increase but also the distance of moving object candidates in two frames increase. For example, in Figure 5.8, keyframe (IV) and (VII) are selected with the increase of number of ROI and others are selected with the increase of distance.

To summarize, there are four considerations to select a frame as a keyframe: for stationary object candidates, the change of compositions of landmarks or the shapes, for moving object candidates, the change of the number of robots and the distance of candi-

(42)

dates between two frames.

5.3 Trajectory Accuracy

In this experiment, in order to see how keyframes affect trajectory accuracy, we perform single robot localization with three robots on the field and compare the result of different input images sets which are extracted from the given sequence of images by using the proposed method with different keyframe selection parameters. The images are captured by robots and localization results are calculated on the simulator. The ground truth system used to provide the benchmark for quantitative comparison is briefly introduced. A wide-angle camera was placed over the field to provide the position of the robots’ head which are carried color boards including blue, yellow, and red. Figure 5.9a is an image captured by the overhead camera. The ground truth position is calculated by projecting the center position of the color board from the image coordinates onto the world coordinates.Figure 5.9b shows the labeled blue points we projected from the image of Figure 5.9a to the world coordinates which are marked by red circles. The projection errors for the labeled points are recorded in Table 5.1.

point number 1 2 3 4 5 6 7 8 9

projection error (mm) 10.04 30.90 9.11 3.40 4.97 4.02 6.03 24.14 23.82 Table 5.1: Camera projection errors for labeled points.

Table 5.2 shows the absolute trajectory RMSE (ATE RMSE) and keyframe selection ratio with different thresholds. The adaptive threshold τais defined by

τ_a ≡











0.6, c^{M O}_t = ∅ 0.8, otherwise.

(5.1)

which means that if the moving object candidates set is empty, in order to keep more stationary objects, the threshold is set to lower value. By using τa, the keyframe ratio is less than τ = 0.6 about 7%. However, the accuracy is worse and using more number of frame by using τ = 0.7 than using τa. For Robot 58 and Robot 340, trajectories was

(43)

improved significantly (for up to 3x ATE RMSE) from odometry data by using keyframes but the most accurate results are calculated by using all frames. For Robot 58, the best performance of keyframe-based localization is about 43% of using all frames, and 33%

for Robot 340. The trajectories are shown in Figure 5.10 and Figure 5.12. The trade- ff between the localization correctness and resource saving is shown in Figure 5.13 for Robot 58, and in Figure 5.15 for Robot 340. They have the same tendency that the the gold cross is at τ ≈ 0.65.

For Robot 235, the best ATE RMSE is by using odometry only and even better than using all frames. The reason is that there are noisy measurements in the interval after the first spinning shown in Figure 5.11, and these measurements make the robot’s be- lieve not to move forward. As a result, the trade-off shown in Figure 5.15 has no gold cross. However, the best performance with the keyframe-based method is 85% of using all frames.

All frames Odometry τ = 0.6 τ_a τ = 0.7 τ = 0.8

Robot 58

52.53 323.51 82.43 85.57 119.94 140.07

100% 0% 26.6% 19.3% 20.8% 12.6%

Robot 235

376.89 327.32 432.47 406.249 416.53 424.74

100% 0% 23.6% 16.9% 18.5% 11.2%

Robot 340

122.49 638.28 204.59 219.72 228.46 263.22

100% 0% 24.9% 17.2% 18.8% 13.3%

Table 5.2: Comparison of absolute trajectory error (ATE) RMSE (unit: mm) and the keyframe percentage under different threshold (τ ) selection.

(44)

(a) The images which are selected as keyframes with τ = 0.8.

(b) Entropy over time.

Figure 5.6: The keyframes and entropy with threshold τ = 0.8 in stationary object scenario.

(45)

Figure 5.7: The timeline of keyframes which are selected by the system against thresholds in moving object scenario.

(46)

(a) The images which are selected as keyframes with τ = 0.8. and predicted distance of 900 mm.

(b) Entropy over time.

Figure 5.8: The keyframes and entropy with threshold τ = 0.8 and predicted distance of 900mm in moving object scenario.

(47)

(a)

(b)

Figure 5.9: (a) Setting for localization experiment. (b) Camera projection for labeled points (red circles) from (a) against the world coordinates (blue points).

(48)

Figure 5.10: Trajectory of Robot 58

(49)

Figure 5.13: The ATE RMSE and keyframe selection ratio of Robot 58.

(50)

(51)

Chapter 6 Conclusion

In this thesis, we designed a keyframe selection mechanism with a utility function as a preprocessing for MR-SLAT. The utility function measures low-level data extracted from an image by incorporating Shannon Entropy theory. If the data extracted from an image has high utility, meaning the image has significant information change compared with the previous keyframe, the image will be selected as the next keyframe and the data extracted from the image will be used for object detection modules as well as for measurement update of localization. Our method can detect the change of scene such as the shape of field line, the number of robots and the distance of robots between two frames. In localization experiment, the trajectory of localization can be significantly improved for up to 3x from odometry by using 25% of image data. when the trajectory error ratio between using all frames and odometry is close to 1, the best performance of the keyframe-based localization system is about 85% of by using all frames.

(52)

Bibliography

[1] S. I. Roumeliotis and G. A. Bekey, “Distributed multirobot localization,” IEEE Transactions on Robotics and Automation, vol. 18, no. 5, pp. 781–795, Oct 2002.

[2] A. Howard, M. J. Matark, and G. S. Sukhatme, “Localization for mobile robot teams using maximum likelihood estimation,” in IEEE/RSJ International Conference on Intelligent Robots and Systems, vol. 1, 2002, pp. 434–439 vol.1.

[3] G. Ng and K. Ng, “Sensor management – what, why and how,” Infor- mation Fusion, vol. 1, no. 2, pp. 67 – 75, 2000. [Online]. Available:

http://www.sciencedirect.com/science/article/pii/S1566253500000099

[4] A. O. Hero and D. Cochran, “Sensor management: Past, present, and future,” IEEE Sensors Journal, vol. 11, no. 12, pp. 3064–3075, Dec 2011.

[5] H. Wang, G. Pottie, K. Yao, and D. Estrin, “Entropy-based sensor selection heuristic for target localization,” in Third International Symposium on Information Process- ing in Sensor Networks, 2004. IPSN 2004, April 2004, pp. 36–45.

[6] D. S. Zois, M. Levorato, and U. Mitra, “A pomdp framework for heterogeneous sensor selection in wireless body area networks,” in 2012 Proceedings IEEE INFO- COM, March 2012, pp. 2611–2615.

[7] S. Joshi and S. Boyd, “Sensor selection via convex optimization,” IEEE Transactions on Signal Processing, vol. 57, no. 2, pp. 451–462, Feb 2009.

[8] X. Shen and P. K. Varshney, “Sensor selection based on generalized information gain for target tracking in large sensor networks,” IEEE Transactions on Signal Pro- cessing, vol. 62, no. 2, pp. 363–375, Jan 2014.

[9] C. H. Chang, S. C. Wang, and C. C. Wang, “Exploiting moving objects: Multi-robot simultaneous localization and tracking,” IEEE Transactions on Automation Science and Engineering, vol. 13, no. 2, pp. 810–827, April 2016.

[10] C.-C. Wang, C. Thorpe, S. Thrun, M. Hebert, and H. Durrant-Whyte, “Simultaneous localization, mapping and moving object tracking,” The International Journal of Robotics Research, vol. 26, no. 9, pp. 889–916, 2007.

(53)

[11] H. Zhang, B. Li, and D. Yang, “Keyframe detection for appearance-based visual slam,” in 2010 IEEE/RSJ International Conference on Intelligent Robots and Sys- tems, Oct 2010, pp. 2071–2076.

[12] G. Lu, Y. Zhou, X. Li, and P. Yan, “Unsupervised, efficient and scalable key-frame selection for automatic summarization of surveillance videos,” Multimedia Tools and Applications, vol. 76, no. 5, pp. 6309–6331, Mar 2017. [Online]. Available:

https://doi.org/10.1007/s11042-016-3263-z

[13] Z. Dong, G. Zhang, J. Jia, and H. Bao, “Keyframe-based real-time camera tracking,”

in 2009 IEEE 12th International Conference on Computer Vision, Sept 2009, pp.

1538–1545.

[14] A. Das and S. L. Waslander, “Entropy based keyframe selection for multi-camera visual slam,” in 2015 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Sept 2015, pp. 3676–3681.

[15] W. J. Beksi and N. Papanikolopoulos, “Point cloud culling for robot vision tasks under communication constraints,” in 2014 IEEE/RSJ International Conference on Intelligent Robots and Systems, Sept 2014, pp. 3747–3752.

[16] M. Hsiao, E. Westman, G. Zhang, and M. Kaess, “Keyframe-based dense planar slam,” in 2017 IEEE International Conference on Robotics and Automation (ICRA), May 2017, pp. 5110–5117.

[17] “Adaptive dynamic walking and motion optimization for humanoid robots,” Ph.D.

thesis, University of Miami, Department of Computer Science, 2015.