**Square Root SAM**Simultaneous Localization and Mapping via Square Root Information Smoothing Frank Dellaert and Michael Kaess Center for Robotics and Intelligent Machines, College of Computing Georgia Institute of Technology, Atlanta, GA 30332-0280 To appear in the Intl. Journal of Robotics Research Abstract Solving the SLAM problem is one way to enable a robot to explore, map, and navigate in a previously unknown environment. We investigate smoothing approaches as a viable alternative to extended Kalman filter-based solutions to the problem. In particular, we look at approaches that factorize either the associated information matrix or the measurement Jacobian into square root form. Such techniques have several significant advantages over the EKF: they are faster yet exact, they can be used in either batch or incremental mode, are better equipped to deal with non-linear process and measurement models, and yield the entire robot trajectory, at lower cost for a large class of SLAM problems. In addition, in an indirect but dramatic way, column ordering heuristics automatically exploit the locality inherent in the geographic nature of the SLAM problem. In this paper we present the theory underlying these methods, along with an interpretation of factorization in terms of the graphical model associated with the SLAM problem. We present both simulation results and actual SLAM experiments in large-scale environments that underscore the potential of these methods as an alternative to EKF-based approaches. 1 Introduction The problem of simultaneous localization and mapping (SLAM) [69, 51, 76] has received con- siderable attention in mobile robotics as it is one way to enable a robot to explore and navigate previously unknown environments. In addition, in many applications the map of the environment itself is the artifact of interest, e.g. in urban reconstruction, search-and-rescue operations, battle- field reconnaissance etc. As such, it is one of the core competencies of autonomous robots [77]. We will primarily be concerned with landmark-based SLAM, for which the earliest and most popular methods are based on the extended Kalman filter (EKF) [70, 62, 61, 2, 71, 53, 51]. The EKF recursively estimates a Gaussian density over the current pose of the robot and the position of all landmarks (the map). However, it is well known that the computational complexity of the EKF becomes intractable fairly quickly, and hence a large number of efforts have focused on 11 INTRODUCTION 2 modifying and extending the filtering approach to cope with larger-scale environments [64, 17, 21, 8, 45, 49, 38, 65, 50, 5, 78, 39, 75, 80, 67]. However, filtering itself has been shown to be inconsistent when applied to the inherently non-linear SLAM problem [44], i.e., even the average taken over a large number of experiments diverges from the true solution. Since this is mainly due to linearization choices that cannot be undone in a filtering framework, there has recently been considerable interest in the smoothing version of the SLAM problem. Smoothing and Mapping or SAM A smoothing approach to SLAM involves not just the most current robot location, but the entire robot trajectory up to the current time. A number of authors considered the problem of smoothing the robot trajectory only [9, 56, 57, 40, 46, 24], which is particularly suited to sensors such as laser- range finders that easily yield pairwise constraints between nearby robot poses. More generally, onecanconsiderthefullSLAMproblem[77], i.e., theproblemofoptimallyestimatingtheentireset of sensor poses along with the parameters of all features in the environment. In fact, this problem has a long history in surveying [36], photogrammetry [7, 37, 68, 10], where it is known as “bundle adjustment”, and computer vision [25, 72, 73, 79, 41], where it is referred to as “structure from motion”. Especially in the last five years there has been a flurry of work where these ideas were applied in the context of SLAM [16, 22, 23, 43, 30, 31, 26, 27, 28, 29, 77]. In this paper we show how smoothing can be a very fast alternative to filtering-based methods, and that in many cases keeping the trajectory around helps rather than hurts performance. In particular, the optimization problem associated with full SLAM can be concisely stated in terms of sparse linear algebra, which is traditionally concerned with the solution of large least-squares problems [35]. In this framework, we investigate factorizing either the information matrix I or the measurement Jacobian A into square root form, as applied to the problem of simultaneous smoothing and mapping (SAM). Because they are based on matrix square roots, we will refer to this family of approaches as square root SAM, or√SAM for short, first introduced in [18]. We propose that√SAM is a fundamentally better approach to the problem of SLAM than the EKF, based on the realization that, • in contrast to the extended Kalman filter covariance or information matrix, which both be- come fully dense over time [65, 78], the information matrix I associated with smoothing is and stays sparse; • in typical mapping scenarios (ie. not repeatedly traversing a small environment) this matrix I or, alternatively, the measurement Jacobian A, are much more compact representations of the map covariance structure • I or A, both sparse, can be factorized efficiently using sparse Cholesky or QR factorization, respectively, yielding a square root information matrix R that can be used to immediately obtain the optimal robot trajectory and map; Factoring the information matrix is known in the sequential estimation literature as square root information filtering (SRIF), and was developed in 1969 for use in JPL’s Mariner 10 missions to Venus (as recounted by [3]). The use of square roots of either the covariance or information matrix results in more accurate and stable algorithms, and, quoting Maybeck [59] “a number of

2 SLAM AND ITS GRAPHS 3 practitioners have argued, with considerable logic, that square root filters should always be adopted in preference to the standard Kalman filter recursion”. Maybeck briefly discusses the SRIF in a chapter on square root filtering, and it and other square root type algorithms are the subject of a book by Bierman [3]. However, as far as this can be judged by the small number of references in the literature, the SRIF and the square root information smoother (SRIS) are not often used. Sparse Linear Algebra, Graph Theory, and Sparse√SAM The key to performance is to capitalize on the large body of work in sparse linear algebra and fully exploit the sparseness of the matrices associated with the smoothing SLAM problem. The most dramatic improvement in performance comes from choosing a good variable ordering when factorizing a matrix. To understand this fact better, one needs to examine the close relationship between SLAM-like problems, linear algebra, and graph theory. Graph theory has been central to sparse linear algebra for more than 30 years [32]. For example, while most often used as a “black box” algorithm, QR factorization is in fact an elegant computation on a graph. This is a recurring theme: the state of the art in linear algebra is a blend of numerical methods and advanced graph theory, sharing many characteristics with inference algorithms typically found in the graphical models literature [11]. In particular, the graph-theoretic algorithm underlying the solution of sparse least-squares is variable elimination, in which each variable (such as a robot pose or a landmark position) is expressed in terms of other variables. The order in which the variables are eliminated has a large impact on the running time of matrix factorization algorithms such as QR and Cholesky factorization. Finding an optimal ordering is an NP-complete problem, but there are several ordering heuristics and approximate algorithms that perform well on general problems [1, 42] and are built into programs like MATLAB [58]. While general-purpose ordering methods drastically improve performance, yet another order- of-magnitude improvement can be obtained by exploiting the specific graphical structure of the SLAM problem when ordering the variables. Looking at SLAM in terms of graphs has a rich history in itself [6, 74, 33, 34] and has especially lead to several novel and exciting developments in the past few years [63, 60, 30, 65, 26, 31, 78, 27, 77]. Below we more closely examine the tight connection between the graphical model view and the sparse linear algebra formulation of SLAM. It is well known that the information matrix I is associated with the undirected graph connecting robot poses and landmarks (see e.g [78]). Less readily appreciated is the fact that the measurement Jacobian A is the matrix of the factor graph associated with SLAM. In addition, the square root information matrix R, the result of factorizing either I or A, is essentially in correspondence with a junction tree, known from inference in graphical models [11] and also recently applied in SLAM [65]. Exploiting domain knowledge to obtain good orderings is also a trend in linear algebra (e.g. [15]), and we believe that even more efficient algorithms can be developed by viewing the problem as one of computation on a graph. 2 SLAM and its Graphs SLAM refers to the problem of localizing a robot while simultaneously mapping its environment, illustratedbytheexampleofFigure1. InthissectionweintroducetheSLAMproblem, thenotation we use, and show how the three main graphical model representations known in the literature each

2 SLAM AND ITS GRAPHS 4 Figure 1: Bayesian belief network of a small SLAM problem (top left) and a more complex syn- thetic environment. The synthetic environment contains 24 landmarks with a simulated robot tak- ing 422 bearing & range measurements along a trajectory of 95 poses. The objective of SLAM is to localize the robot while simultaneously building a map of the environment. In addition to this, Full SLAM seeks to recover the entire robot trajectory. In the Bayesian belief network representation, the state x of the robot is governed by a Markov chain, on top, while the environment of the robot is represented at the bottom by a set of landmarks l. The measurements z, in the middle layer, are governed both by the state of the robot and the parameters of the landmark measured. yield a unique view on the SLAM problem that emphasizes a certain aspect of the problem. Later in the paper the connection is made between these graphs and their (sparse) matrix equivalents. Below we assume familiarity with EKF-based approaches to SLAM [71, 51, 8, 21]. We do not re- derive the extended Kalman filter. Rather, in Section 3 we immediately take a smoothing approach, in which both the map and the robot trajectory are recovered. SLAM As a Belief Net Following the trend set by FastSLAM and others [63, 60, 65, 30, 26, 77], we formulate the problem by referring to a belief net representation. A belief net is a directed acyclic graph that encodes the conditional independence structure of a set of variables, where each variable only directly depends on its predecessors in the graph. The model we adopt is shown in the top left of Figure 1. Here we denote the state of the robot at the ithtime step by xi, with i ∈ 0..M, a landmark by lj, with j ∈ 1..N, and a measurement by zk, with k ∈ 1..K. The joint probability model corresponding to this network is M Y where P(x0) is a prior on the initial state, P(xi|xi−1,ui) is the motion model, parameterized by a control input ui, and P(z|x,l) is the landmark measurement model. The above assumes a uniform prior over the landmarks l. Furthermore, it assumes that the data-association problem has been solved, i.e., that the indices ikand jkcorresponding to each measurement zkare known. K Y (1) P(xi|xi−1,ui) P(zk|xik,ljk) P(X,L,Z) = P(x0) i=1 k=1

2 SLAM AND ITS GRAPHS 5 Figure 2: Factor graph representation of the Full SLAM problem for both the simple example and the synthetic environment in Figure 1. The unknown poses and landmarks correspond to the circular and square variable nodes, respectively, while each measurement corresponds to a factor node (filled black circles). As is standard in the SLAM literature [71, 51, 8, 21], we assume Gaussian process and mea- surement models [59], defined by P(xi|xi−1,ui) ∝ exp−1 (2) 2kfi(xi−1,ui) − xik2 ⇔ xi= fi(xi−1,ui) + wi Λi where fi(.) is a process model, and wiis normally distributed zero-mean process noise with co- variance matrix Λi, and P(zk|xik,ljk) ∝ exp−1 (3) 2khk(xik,ljk) − zkk2 ⇔ zk= hk(xik,ljk) + vk Σk where hk(.) is a measurement equation, and vkis normally distributed zero-mean measurement noise with covariance Σk. Above kek2 givenacovariancematrixΣ. Theequationsabovemodeltherobot’sbehaviorinresponsetocontrol input, and its sensors, respectively. ∆= eTΣ−1e is defined as the squared Mahalanobis distance Σ SLAM as a Factor Graph While belief nets are a very natural representation to think about the generative aspect of the SLAM problem, factor graphs have a much tighter connection with the underlying optimization problem. As the measurements zkin Figure 1 are known (evidence, in graphical model parlance), we are free to eliminate them as variables. Instead we consider them as parameters of the joint probability factors over the actual unknowns. This naturally leads to the well known factor graph representation, a class of bipartite graphical models that can be used to represent such factored

2 SLAM AND ITS GRAPHS 6 Figure 3: The undirected Markov random fields associated with the SLAM problems from Figure 1. densities [48]. In a factor graph there are nodes for unknowns and nodes for the probability factors defined on them, and the graph structure expresses which unknowns are involved in each factor. The factor graphs for the examples from Figure 1 are shown in Figure 2. As can be seen, there are factor nodes for both landmark measurements zkand odometry links ui. In the SLAM problem we (typically) consider only single and pairwise cliques, leading to the following factor graph expression of the joint density over a set of unknowns Θ: Y Typically the potentials φi(θi) encode a prior or a single measurement constraint at an unknown θi∈ Θ, whereas the pairwise potentials ψij(θi,θj) relate to measurements or constraints that in- volve the relationship between two unknowns θiand θj. Note that the second product is over pairwise cliques {i,j}, counted once. The equivalence between equations (1) and (4) can be read- ily established by taking φ0(x0) ∝ P(x0) ψ(i−1)i(xi−1,xi) ∝ P(xi|xi−1,ui) ψikjk(xik,ljk) ∝ P(zk|xik,ljk) Y (4) P(Θ) ∝ φi(θi) ψij(θi,θj) i {i,j},i<j SLAM as a Markov Random Field Finally, a third way to express the SLAM problem in terms of graphical models is via Markov random fields, in which the factor nodes themselves are eliminated. The graph of an MRF is undirected and does not have factor nodes: its adjacency structure indicates which variables are linked by a common factor (measurement or constraint). At this level of abstraction, the form (4)

3 SAM AS A LEAST SQUARES PROBLEM 7 corresponds exactly to the expression for a pairwise Markov random field [82], hence MRFs and factor graphs are equivalent representations here. The MRFs for the examples from Figure 1 are shown in Figure 4. Note that it looks very similar to Figure 2, but an MRF is a fundamentally different structure from the factor graph (undirected vs. bipartite). 3 SAM as a Least Squares Problem While the previous section was concerned with modeling, we now discuss inference, i.e., obtaining an optimal estimate for the set of unknowns given all measurements available to us. We are con- cerned with smoothing rather than filtering, i.e., we would like to recover the maximum a posteriori (MAP) estimate for the entire trajectory X Z Θ estimate by maximizing the joint probability P(X,L,Z) from Equation 1, ∆= {xi} and the map L ∆= {ui}. Let us collect all unknowns in X and L in the vector ∆= (X,L). Under the assumptions made above, we obtain the maximum a posteriori (MAP) ∆= {lj}, given the measurements ∆= {zk} and control inputs U Θ∗∆= argmax P(X,L|Z) = argmax P(X,L,Z) Θ Θ = argmin −logP(X,L,Z) Θ which leads us via (2) and (3) to the following non-linear least-squares problem: (M i=1 ) K X X Θ∗∆= argmin (5) kfi(xi−1,ui) − xik2 khk(xik,ljk) − zkk2 Λi+ Σk Θ k=1 Regarding the prior P(x0), we will assume that x0is given and hence it is treated as a constant below. This considerably simplifies the equations in the rest of this document. This is what is often done in practice: the origin of the coordinate system is arbitrary, and we can then just as well fix x0at the origin. The exposition is easily adapted to the case where this assumption is invalid. In practice one always considers a linearized version of problem (5). If the process models fiand measurement equations hkare non-linear and a good linearization point is not available, non-linear optimization methods such as Gauss-Newton iterations or the Levenberg-Marquardt algorithm will solve a succession of linear approximations to (5) in order to approach the minimum [20]. This is similar to the extended Kalman filter approach to SLAM as pioneered by [70, 71, 52], but allows for iterating multiple times to convergence while controlling in which region one is willing to trust the linear assumption (hence, these methods are often called region-trust methods). We now linearize all terms in the non-linear least-squares objective function (5). In what fol- lows, we will assume that either a good linearization point is available or that we are working on one iteration of a non-linear optimization method. In either case, we can linearize the process terms in (5) as follows: fi(xi−1,ui) − xi≈?fi(x0 where Fi−1 i is the Jacobian of fi(.) at the linearization point x0 ?−?x0 ?=?Fi−1 ?− ai i−1,ui) + Fi−1 (6) δxi−1− δxi δxi−1 i+ δxi i i i−1, defined by ???? ∆=∂fi(xi−1,ui) ∂xi−1 Fi−1 i x0 i−1

3 SAM AS A LEAST SQUARES PROBLEM 8 ∆= x0 and ai constant). The linearized measurement terms in (5) are obtained similarly, hk(xik,ljk) − zk≈?hk(x0 where Hik evaluated at the linearization point (x0 i−1,ui) is the odometry prediction error (note that uihere is given and hence i− fi(x0 ?− zk=?Hik ?− ck kδxik+ Jjk kδxik+ Jjk jk) + Hik (7) ik,l0 kδljk kδljk kand Jjk kare respectively the Jacobians of hk(.) with respect to a change in xikand ljk, ik,l0 ???? ik,l0 Using the linearized process and measurement models (6) and (7), respectively, (5) begets (M i=1 jk): ???? ∆=∂hk(xik,ljk) ∂xik ∆=∂hk(xik,ljk) ∂ljk Jjk k Hik k (x0 ik,l0 (x0 ik,l0 jk) jk) ∆= zk− hk(x0 and ck jk) is the measurement prediction error. ) K X X kδxik+ Jjk kHik kFi−1 δ∗= argmin (8) δxi−1+ Gi iδxi− aik2 kδljk− ckk2 Λi+ Σk i δ k=1 i.e., we obtain a linear least-squares problem in δ that needs to be solved efficiently. To avoid treating δxiin a special way, we introduce the matrix Gi By a simple change of variables we can drop the covariance matrices Λiand Σkfrom this point forward. With Σ−1/2the matrix square root of Σ we can rewrite the Mahalanobis norm as follows, ∆= eTΣ−1e = (Σ−T/2e)T(Σ−T/2e) =??Σ−T/2e??2 Λ−T/2 i , and similarly for the measurement covariance matrices Σk. For scalar measurements this simply means dividing each term by the measurement standard deviation. Below we assume that this has been done and drop the Mahalanobis notation. Finally, after collecting the Jacobian matrices into a matrix A, and the vectors aiand ckinto a right-hand side (RHS) vector b, we obtain the following standard least-squares problem, i= −Id×d, with d the dimension of xi. kek2 Σ 2 i.e., we can always eliminate Λifrom (8) by pre-multiplying Fi−1 , Gi i, and aiin each term with i kAδ − bk2 δ∗= argmin (9) 2 δ which is our starting point below. A can grow to be very large, but is quite sparse, as illustrated in Figure 4. If dx, dl, and dzare the dimensions of the state, landmarks, and measurements, A’s size is (Ndx+ Kdz) × (Ndx+ Mdl). In addition, A has a typical block structure, e.g., with M = 3, N = 2, and K = 4: F2 3 G3 3 H1 1 H1 2 H2 3 H3 4 Above the top half describes the robot motion, and the bottom half the measurements. A mixture of landmarks and/or measurements of different types (and dimensions) is easily accommodated. G1 F1 a1 a2 a3 c1 c2 c3 c4 1 G2 2 2 J1 A = , b = 1 J2 2 J1 3 J2 4

3 SAM AS A LEAST SQUARES PROBLEM 9 Figure 4: On the left, the measurement Jacobian A associated with the problem in Figure 1, which has 3×95+2×24 = 333unknowns. Thenumberofrows, 1126, isequaltothenumberof(scalar)measurements. On the right: (top) the information matrix I (bottom) an alternative factor amdR obtained with a better variable ordering (colamd from [14]). ∆ = ATA; (middle) its upper triangular Cholesky triangle R;

4 A LINEAR ALGEBRA PERSPECTIVE 10 4 A Linear Algebra Perspective In this section we briefly review Cholesky and QR factorization and their application to the full rank linear least-squares (LS) problem in (9). This material is well known and is given primarily for review and to contrast the linear algebra algorithms with the graph-theoretic view of the next section. The exposition closely follows [35], which can be consulted for a more in-depth treatment. For a full-rank m × n matrix A, with m ≥ n, the unique LS solution to (9) can be found by solving the normal equations: ATAδ∗= ATb (10) ThisisnormallydonebyCholeskyfactorizationoftheinformationmatrixI, definedandfactorized as follows: I The Cholesky triangle R is an upper-triangular n × n matrix1and is computed using Cholesky factorization, a variant of LU factorization for symmetric positive definite matrices. For dense matrices Cholesky factorization requires n3/3 flops. After this, δ∗can be found by solving ∆= ATA = RTR (11) first RTy = ATb and then Rδ∗= y by back-substitution. The entire algorithm, including computing half of the symmetric ATA, re- quires (m + n/3)n2flops. For the example of Figure 1, both I and its Cholesky triangle R are shown alongside A in Figure 4. Note the very typical block structure of I when the columns of A are ordered in the standard way [79], e.g., trajectory X first and then map L (to which we refer below as the XL ordering): ?AT Here IXL blocks are band-diagonal. A variant of Cholesky factorization which avoids computing square roots is LDL factorization, which computes a lower triangular matrix L and a diagonal matrix D such that ? IXL AT XAX IT I = LAL XL ∆= AT XALencodes the correlation between robot states X and map L, and the diagonal I = RTR = LDLT An alternative to Cholesky factorization that is both more accurate and numerically stable is to proceed via QR-factorization without computing the information matrix I. Instead, we compute the QR-factorization of A itself along with its corresponding RHS: ?R Here Q is an m × m orthogonal matrix, and R is the upper-triangular Cholesky triangle. The preferred method for factorizing a dense matrix A is to compute R column by column, proceeding ? ?d ? QTA = QTb = 0 e 1Some treatments, including [35], define the Cholesky triangle as the lower-triangular matrix G = RT, but the other convention is more convenient here.

5 A GRAPHICAL MODEL PERSPECTIVE 11 from left to right. For each column j, all non-zero elements below the diagonal are zeroed out by multiplying A on the left with a Householder reflection matrix Hj. After n iterations A is completely factorized: ?R ? (12) Hn..H2H1A = QTA = 0 The orthogonal matrix Q is not usually formed: instead, the transformed RHS QTb is computed by appending b as an extra column to A. Because the Q factor is orthogonal, we have: 2=??QTAδ − QTb??2 kAδ − bk2 2= kRδ − dk2 2+ kek2 2 Clearly, kek2 the square system 2will be the least-squares residual, and the LS solution δ∗can be obtained by solving (13) Rδ = d via back-substitution. The cost of QR is dominated by the cost of the Householder reflections, which is 2(m − n/3)n2. Comparing QR with Cholesky factorization, we see that both algorithms require O(mn2) op- erations when m ? n, but that QR-factorization is a factor of 2 slower. While these numbers are valid for dense matrices only, we have seen that in practice LDL and Cholesky factorization far outperform QR factorization on sparse problems as well, and not just by a constant factor. 5 A Graphical Model Perspective 5.1 Matrices ⇔ Graphs From the exposition above it can now be readily appreciated that the measurement Jacobian A is the matrix of the factor graph associated with SLAM. We can understand this statement at two levels. First, every block of A corresponds to one term in the least-squares criterion (8), either a landmark measurement or an odometry constraint, and every block-row corresponds to one factor in the factor graph. Within each block-row, the sparsity pattern indicates which unknown poses and/or landmarks are connected to the factor. Hence, the block-structure of A corresponds exactly to the adjacency matrix of the factor graph associated with SAM. Second, at the scalar level, every row Aiin A (see Figure 4) corresponds to a scalar term kAiδ − bik2 kAδ − bk2 2in the sparse matrix least-squares criterion (9) , as X kAiδ − bik2 2= 2 i Hence, this defines a finely structured factor graph, via Y P(δ) ∝ exp−1 exp−1 2kAδ − bk2 2kAiδ − bik2 2= 2 i It is important to realize, that in this finer view, the block-structure of the SLAM problem is dis- carded, and that it is this graph that is examined by general purpose linear algebra methods. By working with the block-structure instead, we will be able to do better.

5 A GRAPHICAL MODEL PERSPECTIVE 12 As noted before in [78] and by others, the information matrix I = ATA is the matrix of the Markov random field associated with the SLAM problem. Again, at the block-level the spar- sity pattern of ATA is exactly the adjacency matrix of the associated MRF. The objective func- tion in Equation 5 corresponds to a pairwise Markov random field (MRF) [81, 82] through the Hammersley-Clifford theorem [81], and the nodes in the MRF correspond to the robot states and the landmarks. Links represent either odometry or landmark measurements. In [65, 78] the MRF graph view is taken to expose the correlation structure inherent in the filtering version of SLAM. It is shown there that inevitably, when marginalizing out the past tra- jectory X1:M−1, the information matrix becomes completely dense. Hence, the emphasis in these approaches is to selectively remove links to reduce the computational cost of the filter, with great success. In contrast, in this paper we consider the MRF associated with the smoothing infor- mation matrix I, which does not become dense, as past states are never marginalized out. 5.2 Factorization ⇔ Variable Elimination Figure 5: The triangulated graph for a good ordering (colamd, as in Figure 4). This is a directed graph, where each edge corresponds to a non-zero in the Cholesky triangle R. Note that we have dropped the arrows in the simulation example for simplicity. The one question left is what graph the square root information matrix R corresponds to? Remember that R is the result of factorizing either I or A as in Section 4. Cholesky or QR factorization are most often used as “black box” algorithms, but in fact they are similar to much more recently developed methods for inference in graphical models [11]. It will be seen below that R is essentially in correspondence with a junction tree, known from inference in graphical models and also recently applied in SLAM [65]. Both factorization methods, QR and Cholesky (or LDL), are based on the variable elimination algorithm [4, 11]. The difference between these methods is that QR eliminates variable nodes from thefactorgraphandobtainsA = QR, whileCholeskyorLDLstartfromtheMRFandhenceobtain

5 A GRAPHICAL MODEL PERSPECTIVE 13 Figure 6: The triangulated graph corresponding to the well known heuristic of first eliminating landmarks, and then poses. ATA = RTR. Both methods eliminate one variable at a time, starting with δ1, corresponding to the leftmost column of either A or I. The result of the elimination is that δ1is now expressed as a linear combination of all other unknowns δj>1, with the coefficients residing in the corresponding row R1of R. In the process, however, new dependencies are introduced between all variables connected to δ1, which causes edges to be added to the graph. The next variable is then treated in a similar way, until all variables have been eliminated. This is exactly the process of moralization and triangulation familiar from graphical model inference. The result of eliminating all variables is a directed, triangulated (chordal) graph, shown for our example in Figure 5. Once the chordal graph (R!) is obtained, one can obtain the elimination tree of R, defined as a depth-first spanning tree of the chordal graph after elimination, and which is useful in illustrating the flow of computation during the back-substitution phase. To illustrate this, Figure 6 shows the chordal graph obtained by using the well-known heuristic of first eliminating landmarks [79], and then poses (which we will call the LX ordering). The corresponding elimination tree is shown in Figure 7. The root of the tree corresponds to the last variable δnto be eliminated, which is the first to be computed in back-substitution (Equation 13). Computation then proceeds down the tree, and while this is typically done in reverse column order, variables in disjoint subtrees may be computed in any order, see [19] for a more detailed discussion. In fact, if one is only interested in certain variables, there is no need to compute any of the subtrees that do not contain them. However, the analysis does not stop there. The graph of R has a clique-structure which can be completely encapsulated in a rooted tree data-structure called a clique tree [66, 4], also known as the junction tree in the AI literature [11]. As an example, the clique-tree for the LX-ordering on the problem of Figure 1 is shown in Figure 8. The correspondence is almost one-to-one: every R corresponds to exactly one clique tree, and vice versa, modulo column re-orderings within cliques. The clique tree is also the basis for multifrontal QR methods [58], which we have also evaluated in our simulations below. In multifrontal QR factorization, computation progresses in this tree

5 A GRAPHICAL MODEL PERSPECTIVE 14 Figure 7: The elimination tree for the LX ordering, showing how the state and landmarks estimates will be computed via back-substitution: the root is computed first - here the first pose on the left - and proceeds down the Markov chain to the right. Landmarks can be computed as soon as the pose they are connected to has been computed. [ l44 x20 x19 x18 x17 x16 x15 x14 x13 x12 x11 x10 x9 x8 x7 x6 x5 x4 x3 x2 x1 x0 ] [ l37 x22 x21 ] [ l10 x25 x24 x23 ] [ l17 ] [ l30 x42 x41 x40 x39 x38 x37 x36 x35 x34 x33 x32 x31 x30 x29 x28 x27 x26 ] [ l46 x52 x51 x50 x49 x48 x47 x46 x45 x44 x43 ] [ l12 ] [ l29 x57 x56 x55 x54 x53 ] [ l32 x61 x60 x59 x58 ] [ l36 ] [ l45 x73 x72 x71 x70 x69 x68 x67 x66 x65 x64 x63 x62 ] [ l1 ] [ l31 ] [ l38 ] [ l39 ] [ l41 ] [ l33 x84 x83 x82 x81 x80 x79 x78 x77 x76 x75 x74 ] [ l4 ] [ l9 ] [ l14 ] [ l16 x86 x85 ] [ l18 x94 x93 x92 x91 x90 x89 x88 x87 ] [ l22 ] [ l25 ] Figure 8: The clique-tree corresponding to the LX ordering, i.e., Figures 6 and 7, showing at each node the frontal variables of each clique (those that do not also appear in cliques below that node). It can be seen that the root consists of the small piece of trajectory from the root to where the first landmark “subtree” branches off.

5 A GRAPHICAL MODEL PERSPECTIVE 15 from the leaves to the root to factorize a sparse matrix, and then from the root to the leaves to perform a backward substitution step. A complete treatment of the relationship between square root information matrices and clique trees is beyond the scope of the current paper, but in other work we have used the clique-tree structure in novel algorithms for distributed inference [19]. 5.3 Improving Performance ⇔ Reducing Fill-in The single most important factor to good performance is the order in which variables are elim- inated. Different variable orderings can yield dramatically more or less fill-in, defined as the amount of edges added into the graph during factorization. As each edge added corresponds to a non-zero in the Cholesky triangle R, both the cost of computing R and back-substitution is heavily dependent on how much fill-in occurs. Unfortunately, finding an optimal ordering is NP-complete. Discoveringalgorithmsthatapproximatetheoptimalorderingisanactiveareaofresearchinsparse linear algebra. A popular method for medium-sized problems is colamd [1], which works on the columns of A. Another popular method, based on graph theory and often used to speed up finite element methods, is generalized nested dissection [55, 54]. Given that an optimal ordering is out of reach in general, heuristics or domain-knowledge can do much better than general-purpose algorithms. A simple idea is to use a standard method such as colamd, but have it work on the sparsity pattern of the blocks instead of passing it the original measurement Jacobian A. In other words we treat a collection of scalar variables such as the x and y position and orientation α as a single variable and create a smaller graph which encapsulates the constraints between these blocks rather than the individual variables. Not only is it cheaper to call colamd on this smaller graph, it also leads to a substantially improved ordering. As we mentioned above, the block-structure is real knowledge about the SLAM problem and is not accessible to colamd or any other approximate ordering algorithm. While the effect on the performance of colamd is negligible, we have found that making it work on the SLAM MRF instead of on the sparse matrix I directly can yield improvements of 2 to sometimes a 100-fold, with 15 being a good rule of thumb. Note that there are cases in which any ordering will result in the same large fill-in. The worst- case scenario is a fully connected bipartite2MRF: every landmark is seen from every location. In that case, eliminating any variable will completely connect all variables on the other side, and after that the structure of the clique tree is completely known: if a pose was chosen first, the root will be the entire map, and all poses will be computed once the map is known. Vice versa, if a landmark is chosen, the trajectory will be the clique tree root clique, and computation will proceed via an (expensive) trajectory optimization, followed by (very cheap) computation of landmarks. Interest- ingly, these two cases form the basis of the standard partitioned inverse, or “Schur complement”, which is well-known in structure from motion applications [79, 41] and also used in GraphSLAM [77]. However, the worst-case scenario outlined above is an exceptional case in robotics: sensors have limited range and are occluded by walls, objects, buildings, etc. This is especially true in large-scale mapping applications, and it essentially means that the MRF will in general be sparsely connected, even though it is one large connected component. 2bipartite is here used to indicate the partition of variables into poses and landmarks, not in the factor graph sense.

6 SQUARE ROOT SAM 16 6 Square Root SAM √SAM variants In this section we take everything we know from above and state three simple depending on whether they operate in batch or incremental mode, and on whether non-linearities are involved or not. Batch√SAM 6.1 A batch-version of square root information smoothing and mapping is straightforward and a com- pletely standard way of solving a large, sparse least-squares problem: Algorithm 1 Batch√SAM 1. Build the measurement Jacobian A and the RHS b as explained in Section 3. p 2. Find a good column ordering p, and reorder Ap ← A p= argminδkApδp− bk2 3. Solve δ∗ from Section 4 2using either the Cholesky or QR factorization method r 4. Recover the optimal solution by δ ← δp, with r = p−1 In tests we have obtained the best performance with sparse LDL factorization [13], which, as mentioned above, is a variant on Cholesky factorization that computes I = LDLT, with D a diagonal matrix and L a lower-triangular matrix with ones on the diagonal. The same algorithm applies in the non-linear case, but is simply called multiple times by the non-linear optimizer. Note that the ordering has to be computed only once, however. Linear Incremental√SAM 6.2 In a robotic mapping context, an incremental version of the above algorithm is of interest. The treatment below holds for either the linear case, or when a good linearization point is available. It is well known that factorizations can be updated incrementally. One possibility is to use a rank 1 Cholesky update, a standard algorithm that computes the factor R0corresponding to a I0= I + aaT where aTis a new row of the measurement Jacobian A, corresponding to new measurements that come in at any given time step. However, these algorithms are typically implemented for dense matrices only, and it is imperative that we use a sparse storage scheme for optimal performance. While sparse Cholesky updates exist [12], they are relatively complicated to implement. A second possibility, easy to implement and suited for sparse matrices, is to use a series of Givens rotations (see [35]) to eliminate the non-zeros in the new measurement rows one by one. A third possibility, which we have adopted for the simulations below, is to update the matrix I and simply use a full Cholesky (or LDL) factorization. While this seems expensive, we have found that with good orderings the dominant cost is no longer the factorization but rather the updating of

7 SIMULATION RESULTS 17 I. For example, the experimental results in Section 8 show that the sparse multiplication to obtain the Hessian is about 5 times as expensive than the factorization at the end of the run. A more detailed discussion of asymptotic bounds can be found in [47]. A comparison with the Givens scheme above would be of interest, but we have not done so yet. Importantly, because the entire measurement history is implicit in I, one does not need to factorize at every time-step. In principle, we can wait until the very end and then compute the entire trajectory and map. At any time during an experiment, however, the map and/or trajectory can be computed by a simple factorization and back-substitution, e.g., for visualization and/or path planning purposes. Non-linear Incremental√SAM 6.3 If in an incremental setting the SLAM problem is non-linear and a linearization point is available, then the linear incremental solution above applies. Whether this is the case very much depends on the sensors and prior knowledge available to the robot. For example, in indoor mobile robots there is often no good absolute orientation sensor, which means we have to linearize around the current guess for the orientation. This is exactly why EKF’s have problems in larger-scale environments, as these potentially wrong linearization points are “baked into” the information matrix when past robot poses are eliminated from the MRF. A significant advantage of the smoothing approach is the fact that we never commit to a given linearization, as no variables are ever eliminated from the graph. There are two different ways to re-linearize: In certain scenarios, like closing of a large loop along the complete trajectory, it is cheaper to re-linearize all variables. Essentially this means that we have to call the batch version above each time. On the upside, our experimental results will show that even this seemingly expensive algorithm is quite practical on large problems where an EKF approach is intractable. The alternative way is favorable when only a smaller number of variables is affected by the change in the linearization point. In this case downdating and updating techniques [79] can be applied to temporarily remove these variables from the factor, followed by adding them in again using the new linearization point. We plan to evaluate this approach in future work. 7 Simulation Results Batch√SAM 7.1 We have experimented at length with different implementations of Cholesky, LDL, and QR fac- torization to establish which performed best. All simulations were done in MATLAB on a 2GHz Pentium 4 workstation running Linux. Experiments were run in synthetic environments like the one shown in Figure 9, with 180 to 2000 landmarks, for trajectories of length 200, 500, and 1000. Each experiment was run 10 times for 5 different methods: • none: no factorization performed • ldl: Davis’ sparse LDL factorization [13] • chol: MATLAB built-in Cholesky factorization

7 SIMULATION RESULTS 18 Figure 9: A synthetic Manhattan world with 500 landmarks (black dots on square city blocks) along with a 1000-step random street walk trajectory (red lines with a dot for each step), corre- sponding to 14000 measurements taken. The measurements (light green) do not line up exactly with the landmarks due to simulated noise. M N none 0.031 0.034 0.036 0.037 0.055 0.062 0.068 0.07 0.104 0.109 0.124 0.126 ldl chol 0.092 0.094 0.102 0.104 0.247 0.271 0.272 0.279 0.523 0.945 0.746 0.657 mfqr 0.868 1.19 1.502 1.543 2.785 3.559 5.143 5.548 10.297 12.112 14.151 15.914 qr 200 180 500 1280 2000 180 500 1280 2000 180 500 1280 2000 0.062 0.062 0.068 0.07 0.176 0.177 0.175 0.181 0.401 0.738 0.522 0.437 1.685 1.256 1.21 1.329 11.92 8.43 6.348 6.908 42.986 77.849 35.719 25.611 500 1000 Figure 10: Averaged simulation results over 10 trials, in seconds, for environments with various number of landmarks N and simulations with trajectory lengths M. The methods are discussed in more detail in the text. The none method corresponds to doing no factorization and measures the overhead.

7 SIMULATION RESULTS 19 Figure 11: Original information matrix I and its Cholesky triangle. Note the dense fill-in on the right, linking the entire trajectory to all landmarks. The number of non-zeroes is indicated by nz. Figure 12: Information matrix I after reordering and its Cholesky triangle. Reordering of columns (unknowns) does not affect the sparseness of I, but the number of non-zeroes in R has dropped from approximately 2.8 million to about 250 thousand.

7 SIMULATION RESULTS 20 Figure 13: By doing the reordering while taking into account the special block-structure of the SLAM problem, the non-zero count can be reduced even further, to about 130K, a reduction by a factor of 20 with respect to the original R, and substantially less than the 500K entries in the filtering covariance matrix. • mfqr: multifrontal QR factorization [58] • qr: MATLAB built-in QR factorization The results are summarized in Figure 10. We have found that, under those circumstances, 1. The freely available sparse LDL implementation by T. Davis [13] outperforms MATLAB’s built-in Cholesky factorization by a factor of 30%. 2. In MATLAB, the built-in Cholesky outperforms QR factorization by a large factor. 3. Multifrontal QR factorization is better than MATLAB’s QR, but still slower than either Cholesky or LDL. 4. While this is not apparent from the table, using a good column ordering is much more impor- tant than the choice of factorization algorithm: QR factorization with a good ordering will outperform LDL or Cholesky with a bad ordering. The latter opens up a considerable opportunity for original research in the domain of SLAM, as we found that injecting even a small amount of domain knowledge into that process yields im- mediate benefits. To illustrate this, we show simulation results for a length 1000 random walk in a 500-landmark environment, corresponding to Figure 9. Both I and R are shown in Figure 11 for the standard (and detrimental) XL ordering with states and landmarks ordered consecutively. The dramatic reduction in fill-in that occurs when using a good re-ordering is illustrated by Figure 12, where we used colamd [1]. Finally, when we use the block-oriented ordering heuristic discussed in Section 5.3, the fill-in drops by another factor of 2, significantly increasing computational effi- ciency.

8 EXPERIMENTAL RESULTS 21 Incremental√SAM 7.2 We also compared the performance of an incremental version of√SAM, described in Section 6, with a standard EKF implementation by simulating 500 time steps in a synthetic environment with 2000 landmarks. The results are shown in Figure 14. The factorization of I was done using sparse LDL [13], while for the column ordering we used symamd [1], a version of colamd for symmetric positive definite matrices. Smoothing every time step becomes cheaper than the EKF when the number of landmarks N reaches 600. At the end, with N = 1,100, each factorization took about 0.6 s, and the slope is nearly linear over time. In contrast, the computational requirements of the EKF increase quadrati- cally with N, and by the end each update of the EKF took over a second. As implementation independent measures, we have also plotted N2, as well as the number of non-zeros nz in the Cholesky triangle R. The behavior of the latter is exactly opposite to that of the EKF: when new, unexplored terrain is encountered, there is almost no correlation between new features and the past trajectory and/or map, and nz stays almost constant. In contrast, the EKF’s computation is not affected when re-entering previously visited areas -closing the loop- whereas that is exactly when R fill-in occurs. 8 We have evaluated the non-linear incremental version of√SAM on a very challenging, large-scale vision-basedSLAMproblem, inwhichamobilerobotequippedwith8camerastraversedanindoor office environment. This is probably the most challenging data-set we have ever worked with, both because of the amount of data that needed to be dealt with, as well as the logistical and calibration issues that plague multi-camera rigs. In addition, dealing with visual features is complex and prone to failure. In this case, the measurements are features extracted from 8 cameras mounted on top of an iRobot ATRV-Mini platform, as shown in Figure 15. They are matched between successive frames using RANSAC based on a trifocal camera-arrangement. The data was taken in an office envi- ronment, with a bounding box of about 30m by 50m for the robot trajectory, and the landmarks sought are a set of unknown 3D points.The measurements consisted of the odometry provided by the robot, as well as 260 joint images taken with variable distances of up to 2 meters between successive views, and an overall trajectory length of about 190m. The unknown poses were mod- eled as having 6 degrees of freedom (DOF), three translational and three rotational. Even though 3DOF seems sufficient for a planar indoor office environment, it turns out that 6DOF with a prior on pitch, roll and height is necessary, since any small bump in the floor has a clearly visible effect on the images. The standard deviations on x and y are 0.02m and on φ (yaw) 0.02rad. The priors on z, θ (pitch) and ψ (roll) are all 0 with standard deviations 0.01m and 0.02rad respectively. The camera rig was calibrated in advance. The real-time constraints of the application. Our approach was to invoke the batch√SAM algorithm after every three joint images taken by the robot. In each of these invocations, the problem was re- peatedly re-linearized and factorized to yield an optimal update. Again we used sparse LDL as the main factorization algorithm. More importantly, for ordering the variables we used colamd com- Experimental Results √SAM approach was able to comfortably deal with this large problem well within the

8 EXPERIMENTAL RESULTS 22 5 12x 10 1200 10 1000 8 800 6 600 4 400 2 200 0 0 0 50 100 150 200 250 300 350 400 450 500 0 50 100 150 200 250 300 350 400 450 500 (a) number of landmarks N seen (b) N2(circles) and nz in R (asterisks) 1 0.9 0.8 0.7 0.6 0.5 0.4 0.3 0.2 0.1 50 100 150 200 250 300 350 400 450 500 (c) averagetimepercall, insecondsfortheEKF(circles) and LDL factorization (asterisks) Figure 14: Timing results for incremental SAM in a simulated environment with 2000 landmarks, similartotheoneinFigure9, but10blocksontheside. Asthenumberoflandmarksseenincreases, theEKFbecomesquadraticallyslower. Notethatthenumberofnon-zerosnz increasesfasterwhen large loops are encountered around i = 200 and i = 350.

9 DISCUSSION 23 Figure 15: Custom made camera rig, mounted on top of an ATRV-Mini mobile robot. Eight Fire- Wire cameras are distributed equally along a circle and connected to an on-board laptop. bined with the block-structured ordering heuristic. The latter alone yielded a 15-fold improvement in the execution time of LDL with respect to colamd by itself. Processingtheentiresequencetookatotalof11minutesand10secondsona2GHzPentium-M based laptop, and the final result can be seen in Figure 16. The correspondence matching yielded in 17780 measurements on a total of 4383 unknown 3D points, taken from 260 different robot locations. In the final step, the measurement matrix A had 36,337 rows and 14,709 columns, with about 350K non-zero entries. The relevant timing results are shown in Figure 17. The unexpected result is that the factor- ization, because of the good variable ordering, is now but a minor cost in the whole optimization. Instead, the largest cost is now evaluating the measurement Jacobian A (linearizing the measure- ment equations), which was done a total of 453 times. Its computational demands over time are shown in the panel at the top. Next in line is the computation of the information matrix I = ATA, shown above by “hessian” in the bottom panel. This is done exactly as many times as LDL itself, i.e., a total of 734 times. By the end of the sequence, this sparse multiplication (yielding - by then - a 15K×15K matrix) takes about 0.6 secs. In contrast, factorizing the resulting information matrix I takes just 0.1 seconds. Clearly, further improvement must come from avoiding linearization of the entire set of mea- surement equations, and hence keeping large parts of these expensive operations constant. 9 Discussion The square root information smoothing approaches we presented in the current paper have several significant advantages over the EKF: • They are much faster than EKF-based SLAM on large-scale problems • They are exact, in contrast to approximate methods to deal with the EKF shortcomings • They can be used in either batch or incremental mode • If desired, they yield the entire smoothed robot trajectory

9 DISCUSSION 24 Figure 16: Projected trajectory and map after applying incremental√SAM using visual input and odometry only. Each robot pose is shown as an outline of the ATRV-Mini platform. The recovered 3D structure is represented by green points. For comparison the manually aligned building map is shown in gray. Given that no loop-closing occurred and considering the large scale of the environ- ment and the incremental nature of the reconstruction method, this result is of very high-quality. Note that features also occur along the ceiling, and that some features outside the building outline are caused by reflections.

9 DISCUSSION 25 Jacobian 1.4 1.2 1 0.8 time Jacobian 0.6 0.4 0.2 0 1 20 39 58 77 96 115 134 153 172 191 210 229 248 267 286 305 324 343 362 381 400 419 438 iteration A'A, LDL 0.6 0.5 0.4 hessian ldl time 0.3 0.2 0.1 0 1 32 63 94 125 156 187 218 249 280 311 342 373 404 435 466 497 528 559 590 621 652 683 714 iteration Figure 17: Timing results for incremental√SAM on the sequence from Figure 16. Top: cost of forming the Jacobian A (linearization). Bottom: surprisingly, the cost for LDL factorization is now less than forming the Hessian or information matrix I.

10 CONCLUSION AND FUTURE WORK 26 • They are much better equipped to deal with non-linear process and measurement models than the EKF • They automatically exploit locality by means of variable ordering heuristics However, there is also a price to pay: • Because we smooth the entire trajectory, computational complexity grows without bound over time, as clearly illustrated in Figure 17. In many typical mapping scenarios, however, the computational and storage demands of the EKF information or covariance matrix will grow much faster still (in the same example, it would involve storing and manipulating a dense 15K × 15K matrix) • As with all information matrix approaches, it is expensive to recover the joint covariance matrix governing the unknowns. However, we can recover their marginal densities at a much lower cost [36] In this paper we only reported on our initial experiences with this approach, and in particular the following leaves to be desired: • We have not yet established any tight or amortized complexity bounds that predict the algo- rithm’s excellent performance on problems of a given size. • Comparison of our approach to more recent and faster SLAM variants, both approximate [49, 65, 78] and exact [38], is the object of future work. Finally, in this paper we primarily concentrated on the large-scale optimization problem asso- ciated with SLAM. Many other issues are crucial in the practice of robot mapping, e.g. the data- associationproblem, exploitationvs. exploration, andthemanagementofvery-largescalemapping problems. As such, the techniques presented here are meant to complement, not replace methods that make judicious approximations in order to reduce the asymptotic complexity of SLAM. 10 Conclusion and Future Work In conclusion, we believe square root information smoothing to be of great practical interest to the SLAM community. It recovers the entire trajectory and is exact, and even the sub-optimal incremental scheme we evaluated behaves much better than the EKF as the size of the environ- ment grows. In addition, our experiments with real robot data offer proof that the possibility of re-linearizing the entire trajectory at each time-step makes√SAM cope well with noisy measure- ments governed by non-linear measurement equations. In contrast, non-optimal linearization can- not be recovered from in an EKF, which inevitably has to summarize it in a quadratic (Gaussian) approximation. Incremental implementations of the proposed method are of great interest for real-time applica- tions. We are currently exploring Givens rotations as a means of incrementally updating the factor. Downdating could also allow selective changes of variables affected by re-linearization. Finally, our ongoing work indicates that an efficient retrieval of the marginal covariances is possible based on the square root factor. This is important for data association, an aspect of the SLAM problem that we have ignored in the present work.

REFERENCES 27 Acknowledgments We would like to thank Alexander Kipp and Peter Krauthausen for their valuable help in this project, without which our task would have been much harder. In addition, we would like to thank the anonymous reviewers for their valuable comments. Finally, we gratefully acknowledge support from the National Science Foundation through awards IIS - 0448111 "CAREER: Markov Chain Monte Carlo Methods for Large Scale Correspondence Problems in Computer Vision and Robotics" and No. IIS - 0534330 "Unlocking the Urban Photographic Record Through 4D Scene Understanding and Modeling". References [1] P. R. Amestoy, T. Davis, and I. S. Duff. An approximate minimum degree ordering algorithm. SIAM Journal on Matrix Analysis and Applications, 17(4):886–905, 1996. [2] N. Ayache and O.D. Faugeras. Maintaining representations of the environment of a mobile robot. IEEE Trans. Robot. Automat., 5(6):804–819, 1989. [3] G.J. Bierman. Factorization methods for discrete sequential estimation, volume 128 of Math- ematics in Science and Engineering. Academic Press, New York, 1977. [4] J.R.S. Blair and B.W. Peyton. An introduction to chordal graphs and clique trees. In J.A. George, J.R. Gilbert, and J.W-H. Liu, editors, Graph Theory and Sparse Matrix Computa- tions, volume 56 of IMA Volumes in Mathematics and its Applications, pages 1–27. Springer- Verlag, 1993. [5] M. Bosse, P. Newman, J. Leonard, M. Soika, W. Feiten, and S. Teller. An Atlas framework for scalable mapping. In IEEE Intl. Conf. on Robotics and Automation (ICRA), 2003. [6] R.A. Brooks. Visual map making for a mobile robot. In IEEE Intl. Conf. on Robotics and Automation (ICRA), volume 2, pages 824 – 829, March 1985. [7] Duane C. Brown. The bundle adjustment - progress and prospects. Int. Archives Photogram- metry, 21(3), 1976. [8] J.A. Castellanos, J.M.M. Montiel, J. Neira, and J.D. Tardos. The SPmap: A probabilistic framework for simultaneous localization and map building. IEEE Trans. Robot. Automat., 15(5):948–953, 1999. [9] R. Chatila and J.-P. Laumond. Position referencing and consistent world modeling for mobile robots. In IEEE Intl. Conf. on Robotics and Automation (ICRA), pages 138–145, 1985. [10] M.A.R. Cooper and S. Robson. Theory of close range photogrammetry. In K.B. Atkinson, editor, Close range photogrammetry and machine vision, chapter 1, pages 9–51. Whittles Publishing, 1996.

REFERENCES 28 [11] R. G. Cowell, A. P. Dawid, S. L. Lauritzen, and D. J. Spiegelhalter. Probabilistic Networks and Expert Systems. Statistics for Engineering and Information Science. Springer-Verlag, 1999. [12] T. Davis and W. Hager. Modifying a sparse Cholesky factorization. SIAM Journal on Matrix Analysis and Applications, 20(3):606–627, 1996. [13] T. A. Davis. Algorithm 8xx: a concise sparse Cholesky factorization package. Technical Re- port TR-04-001, Univ. of Florida, January 2004. Submitted to ACM Trans. Math. Software. [14] T.A.Davis, J.R.Gilbert, S.I.Larimore, andE.G.Ng. Acolumnapproximateminimumdegree ordering algorithm. ACM Trans. Math. Softw., 30(3):353–376, 2004. [15] T.A. Davis and K. Stanley. KLU: a "Clark Kent" sparse LU factorization algorithm for circuit matrices. In 2004 SIAM Conference on Parallel Processing for Scientific Computing (PP04), 2004. [16] M. Deans and M. Hebert. Experimental comparison of techniques for localization and map- ping using a bearings only sensor. In Proc. of the ISER ’00 Seventh International Symposium on Experimental Robotics, December 2000. [17] M. Deans and M. Hebert. Invariant filtering for simultaneous localization and mapping. In IEEE Intl. Conf. on Robotics and Automation (ICRA), pages 1042–7, April 2000. [18] F. Dellaert. Square Root SAM: Simultaneous location and mapping via square root informa- tion smoothing. In Robotics: Science and Systems (RSS), 2005. [19] F. Dellaert, A. Kipp, and P. Krauthausen. A multifrontal QR factorization approach to dis- tributed inference applied to multi-robot localization and mapping. In AAAI Nat. Conf. on Artificial Intelligence, 2005. [20] J.E. Dennis and R.B. Schnabel. Numerical methods for unconstrained optimization and non- linear equations. Prentice-Hall, 1983. [21] M.W.M.G. Dissanayake, P.M. Newman, H.F. Durrant-Whyte, S. Clark, and M. Csorba. A solution to the simultaneous localization and map building (SLAM) problem. IEEE Trans. Robot. Automat., 17(3):229–241, 2001. [22] T. Duckett, S. Marsland, and J. Shapiro. Learning globally consistent maps by relaxation. In IEEE Intl. Conf. on Robotics and Automation (ICRA), San Francisco, CA, 2000. [23] T. Duckett, S. Marsland, and J. Shapiro. Fast, on-line learning of globally consistent maps. Autonomous Robots, 12(3):287–300, 2002. [24] R. Eustice, H. Singh, and J. Leonard. Exactly sparse delayed-state filters. In IEEE Intl. Conf. on Robotics and Automation (ICRA), pages 2428–2435, Barcelona, Spain, April 2005. [25] O.D. Faugeras. Three-dimensional computer vision: A geometric viewpoint. The MIT press, Cambridge, MA, 1993.

REFERENCES 29 [26] J. Folkesson and H. I. Christensen. Graphical SLAM - a self-correcting map. In IEEE Intl. Conf. on Robotics and Automation (ICRA), volume 1, pages 383 – 390, 2004. [27] J. Folkesson, P.Jensfelt, and H. I. Christensen. Graphical SLAM using vision and the mea- surement subspace. In IEEE/RSJ Intl. Conf. on Intelligent Robots and Systems (IROS), 2005. [28] U. Frese. An O(logn) Algorithm for Simultaneous Localization and Mapping of Mobile Robots in Indoor Environments. PhD thesis, University of Erlangen-Nürnberg, 2004. [29] U. Frese. Treemap: An O(logn) algorithm for simultaneous localization and mapping. In Spatial Cognition IV, pages 455–476. Springer Verlag, 2005. [30] U. Frese and T. Duckett. A multigrid approach for accelerating relaxation-based SLAM. In Proc. of the IJCAI-03 on Reasoning with Uncertainty in Robotics, 2003. [31] U. Frese, P. Larsson, and T. Duckett. A multilevel relaxation algorithm for simultaneous localisation and mapping. IEEE Trans. Robototics, 21(2):196–207, April 2005. [32] J.A. George, J.R. Gilbert, and J.W-H. Liu, editors. Graph Theory and Sparse Matrix Com- putations, volume 56 of IMA Volumes in Mathematics and its Applications. Springer-Verlag, 1993. [33] M. Golfarelli, D. Maio, and S. Rizzi. Elastic correction of dead-reckoning errors in map building. In IEEE/RSJ Intl. Conf. on Intelligent Robots and Systems (IROS), pages 905–911, 1998. [34] M. Golfarelli, D. Maio, and S. Rizzi. Correction of dead-reckoning errors in map building. IEEE Trans. Robot. Automat., 17(1), 2001. [35] G.H. Golub and C.F. Van Loan. Matrix Computations. Johns Hopkins University Press, Baltimore, third edition, 1996. [36] G.H. Golub and R.J. Plemmons. Large-scale geodetic least-squares adjustment by dissection and orthogonal decomposition. Linear Algebra and Its Applications, 34:3–28, Dec 1980. [37] S. Granshaw. Bundle adjustment methods in engineering photogrammetry. Photogrammetric Record, 10(56):181–207, 1980. [38] J. Guivant and E. Nebot. Optimization of the simultaneous localization and map building algorithm for real time implementation. IEEE Trans. Robot. Automat., 17(3):242–257, June 2001. [39] J. Guivant, E. Nebot, J. Nieto, and F. Masson. Navigation and mapping in large unstructured environments. Intl. J. of Robotics Research, 23:449–472, April 2004. [40] J.-S. Gutmann and K. Konolige. Incremental mapping of large cyclic environments. In Proc. of the IEEE Intl. Symp. on Computational Intelligence in Robotics and Automation (CIRA), pages 318–325, November 2000.

REFERENCES 30 [41] R. Hartley and A. Zisserman. Multiple View Geometry in Computer Vision. Cambridge University Press, 2000. [42] P. Heggernes and P. Matstoms. Finding good column orderings for sparse QR factorization. In Second SIAM Conference on Sparse Matrices, 1996. [43] A. Howard, M.J. Matari´ c, and G.S. Sukhatme. Relaxation on a mesh: a formalism for gener- alized localization. In IEEE/RSJ Intl. Conf. on Intelligent Robots and Systems (IROS), pages 1055 – 1060, Wailea, Hawaii, Oct 2001. [44] S.J. Julier and J.K. Uhlmann. A counter example to the theory of simultaneous localization and map building. In IEEE Intl. Conf. on Robotics and Automation (ICRA), volume 4, pages 4238–4243, 2001. [45] S.J. Julier and J.K. Uhlmann. Simultaneous localisation and map building using split co- variance intersection. In IEEE/RSJ Intl. Conf. on Intelligent Robots and Systems (IROS), volume 3, pages 1257–62, 2001. [46] K. Konolige. Large-scale map-making. In AAAI Nat. Conf. on Artificial Intelligence, San Jose, 2004. [47] Peter Krauthausen, Frank Dellaert, and Alexander Kipp. Exploiting locality by nested dis- section for square root smoothing and mapping. In Robotics: Science and Systems (RSS), 2006. [48] F.R. Kschischang, B.J. Frey, and H-A. Loeliger. Factor graphs and the sum-product algo- rithm. IEEE Trans. Inform. Theory, 47(2), February 2001. [49] J. J. Leonard and H. J. S. Feder. Decoupled stochastic mapping. IEEE Journal of Oceanic Engineering, pages 561–571, October 2001. [50] J. J. Leonard and P. M. Newman. Consistent, convergent, and constant-time SLAM. In Intl. Joint Conf. on Artificial Intelligence (IJCAI), 2003. [51] J.J. Leonard, I.J. Cox, and H.F. Durrant-Whyte. Dynamic map building for an autonomous mobile robot. Intl. J. of Robotics Research, 11(4):286–289, 1992. [52] J.J. Leonard and H.F. Durrant-Whyte. Simultaneous map building and localization for an autonomous mobile robot. In IEEE Int. Workshop on Intelligent Robots and Systems, pages 1442–1447, 1991. [53] J.J. Leonard and H.F. Durrant-Whyte. Directed Sonar Sensing for Mobile Robot Navigation. Kluwer Academic, Boston, 1992. [54] R.J. Lipton and R.E. Tarjan. Generalized nested dissection. SIAM Journal on Applied Math- ematics, 16(2):346–358, 1979. [55] R.J. Lipton and R.E. Tarjan. A separator theorem for planar graphs. SIAM Journal on Applied Mathematics, 36(2):177–189, 1979.

REFERENCES 31 [56] F. Lu and E. Milios. Globally consistent range scan alignment for environment mapping. Autonomous Robots, pages 333–349, April 1997. [57] F. Lu and E. Milios. Robot pose estimation in unknown environments by matching 2D range scans. Journal of Intelligent and Robotic Systems, page 249:275, April 1997. [58] P. Matstoms. Sparse QR factorization in MATLAB. ACM Trans. Math. Softw., 20(1):136– 159, 1994. [59] P. Maybeck. Stochastic Models, Estimation and Control, volume 1. Academic Press, New York, 1979. [60] M. Montemerlo, S. Thrun, D. Koller, and B. Wegbreit. FastSLAM: A factored solution to the simultaneous localization and mapping problem. In AAAI Nat. Conf. on Artificial Intelligence, 2002. [61] P. Moutarlier and R. Chatila. An experimental system for incremental environment mod- elling by an autonomous mobile robot. In Experimental Robotics I, The First International Symposium, Montréal, Canada, June 19-21, 1989, pages 327–346, 1989. [62] P. Moutarlier and R. Chatila. Stochastic multisensor data fusion for mobile robot location and environment modelling. In 5th Int. Symp. Robotics Research, 1989. [63] K. Murphy. Bayesian map learning in dynamic environments. In Advances in Neural Infor- mation Processing Systems (NIPS), 1999. [64] P. Newman. On the Structure and Solution of the Simultaneous Localisation and Map Build- ing Problem. PhD thesis, The University of Sydney, 1999. [65] M.A. Paskin. Thin junction tree filters for simultaneous localization and mapping. In Intl. Joint Conf. on Artificial Intelligence (IJCAI), 2003. [66] A. Pothen and C. Sun. Distributed multifrontal factorization using clique trees. In Proceed- ings of the Fifth SIAM Conference on Parallel Processing for Scientific Computing, pages 34–40. Society for Industrial and Applied Mathematics, 1992. [67] D. Rodriguez-Losada, F. Matia, and A. Jimenez. Local maps fusion for real time multirobot indoor simultaneous localization and mapping. In IEEE Intl. Conf. on Robotics and Automa- tion (ICRA), volume 2, pages 1308–1313, 2004. [68] C.C. Slama, editor. Manual of Photogrammetry. American Society of Photogrammetry and Remote Sensing, Falls Church, VA, 1980. [69] R. Smith and P. Cheeseman. On the representation and estimation of spatial uncertainty. Intl. J. of Robotics Research, 5(4):56–68, 1987. [70] R. Smith, M. Self, and P. Cheeseman. A stochastic map for uncertain spatial relationships. In Int. Symp on Robotics Research, 1987.

REFERENCES 32 [71] R. Smith, M. Self, and P. Cheeseman. Estimating uncertain spatial relationships in Robotics. In I. Cox and G. Wilfong, editors, Autonomous Robot Vehicles, pages 167–193. Springer- Verlag, 1990. [72] R. Szeliski and S.B. Kang. Recovering 3D shape and motion from image streams using non-linear least squares. Technical Report CRL 93/3, DEC Cambridge Research Lab, 1993. [73] R. Szeliski and S.B. Kang. Recovering 3D shape and motion from image streams using nonlinear least squares. Journal of Visual Communication and Image Representation, 5(1), March 1994. [74] Y. Tang and C. G. Lee. A geometric feature relation graph formulation for consistent sensor fusion. IEEE Trans. Syst., Man, Cybern., 22:115–129, January/February 1992. [75] J.D. Tardós, J. Neira, P.M. Newman, and J.J. Leonard. Robust mapping and localization in indoor environments using sonar data. Intl. J. of Robotics Research, 21(4):311–330, 2002. [76] S. Thrun. Robotic mapping: a survey. In Exploring artificial intelligence in the new millen- nium, pages 1–35. Morgan Kaufmann, Inc., 2003. [77] S. Thrun, W. Burgard, and D. Fox. Probabilistic Robotics. The MIT press, Cambridge, MA, 2005. [78] S. Thrun, Y. Liu, D. Koller, A.Y. Ng, Z. Ghahramani, and H. Durrant-Whyte. Simultane- ous localization and mapping with sparse extended information filters. Intl. J. of Robotics Research, 23(7-8):693–716, 2004. [79] B. Triggs, P. McLauchlan, R. Hartley, and A. Fitzgibbon. Bundle adjustment – a modern synthesis. In W. Triggs, A. Zisserman, and R. Szeliski, editors, Vision Algorithms: Theory and Practice, LNCS, pages 298–375. Springer Verlag, 2000. [80] S.B. Williams, G. Dissanayake, and H. Durrant-Whyte. An efficient approach to the simul- taneous localisation and mapping problem. In IEEE Intl. Conf. on Robotics and Automation (ICRA), pages 406–411, 2002. [81] G. Winkler. Image analysis, random fields and dynamic Monte Carlo methods. Springer Verlag, 1995. [82] J.S. Yedidia, W.T. Freeman, and Y.Weiss. Generalized belief propagation. In Advances in Neural Information Processing Systems (NIPS), pages 689–695, 2000.