© Copyright 2001, by American Congress on Surveying and Mapping, All Rights Reserved. Displayed on these web pages with permission.

This peer reviewed paper appeared in the June 2001 Issue of
*Surveying and Land Information Systems*

The equations and diagrams here are ASCII art, since I don't think that the typeset equations are THAT much more readable, and I don't normally put in graphics unless there are significant improvements in readability.

Least Squares Network Adjustments via QR Factorization

John Halleck P.O. Box 58488 Salt Lake City, Utah 84158-0488 801.585.9572 John.Halleck@utah.edu http://www.cc.utah.edu/~nahaj/

An introduction to orthogonal QR factorizations (via Given's
rotations) as an alternative to Normal Equations for the least
squares solution of network adjustment problems. The main advantage
is its greater numerical stability. When the ratio of redundant
shots to non-redundant shots is small^{6},
QR factorization also has lower operation counts. In addition, it
allows for efficient use of the underlying sparsity of the design
matrix and has better performance on computers with paged memory.

- Introduction
- Notes
- Basics
- Review of Solving by Normal Equations
- Equivalence of QR and Normal Equation Solutions
- Solving by QR Factorization
- Given's Rotations
- Initial Setup
- Processing
- Updates
- Memory Usage
- Operation Counts
- Conclusions
- Example
- Numeric Stability Example
- Acknowledgements
- References
- Addenda

Network adjustment problems are usually set up as a (linearized) weighted matrix problem. The least squares solution of this problem is usually computed by "the method of normal equations". While this method has great intuitive appeal, and very long tradition, numerical analysts in other problem domains usually use other methods. The condition number of the matrix is a measure of the amount of significance lost when you solve the system it represents. Solving the problem via normal equations squares the condition number of the problem, while solving by many other methods does not.

Numerical analysts have many methods appropriate to least
squares problems. A very good overview of the methods available can
be found in "*Matrix Computations*" (Golub and Van Loan,
1996). "QR factorization" is a popular method that works well for
the surveying problem. There are, in fact, several equivalent
methods of forming the QR factorization. This article will discuss
using Given's rotations for the task. Other techniques that are
found in the literature that could also be used include Given's
reflections, Householder reflections, fast Givens, Gentleman's
methods, etc. Given's rotations are used in this paper because they
have low operation counts on problems with the structure that
survey networks show, and are relatively easy to describe and
implement. For other types of least squares adjustment problems
(such as photogrammetry) other methods of obtaining the QR
factorization may be more appropriate.

The notational conventions of the main body of the numerical analysis literature have been used in this paper. This should make it easier for people who read this article to go to the actual literature. However it does make for some awkwardness in a few places. For the same reason the presentation will be in terms of an upper triangular matrix, even though a lower triangular matrix is somewhat more natural to the problem.

Instead of full three dimensional adjustments, level nets will be used in order to simplify this presentation. Extension of the adjustments to three dimensions is the same as it is for traditional normal equation adjustments. Some QR specific optimizations exist, but are beyond the scope of this paper.

Any good book about adjusting survey networks will cover the details of the problem setup, so only a quick summary of the usual background is given below.

A survey network consists of shots that connect points, along with weights for those shots. At least one point has been tied down (called a "control point"). The problem has been linearized, converting the shots into changes in coordinates. The (co)variances of the shots have also been computed so that weights can be correctly assigned.

There is a design matrix **A** that describes the
connectivity of the problem (the matrix contains only 1, -1, and
0). There is a collection of unknowns **X**, an array of
constants **L** (the constraints and the linearized measured
values). The design matrix (**A**) has n columns and m rows (one
row for each measurement, and one for each control point). The
number of rows is greater than or equal to the number of columns.
When it is greater, the problem is called an "over determined
system", and least squares techniques are used to solve it. The
redundancy of the problem r is the number of rows minus the number
of columns (m-n). Not all shots are of equal accuracy, and they are
not normally weighted the same. Most problems have a weight matrix
**W** that reflects this fact.

When a system is over determined, it is not generally possible
to find "the solution" of the problem, since there is usually no
specific solution that makes the result exactly right. There are an
infinite number of possible "answers" that are nearly right. The
amount that would have to be added to the **L** matrix to make
the problem come out exact is called the "residuals" (usually
denoted by **V**). The residuals are often specifically written
into the problem equation when people solve by normal equations,
but is usually only computed after the least squares solution is
found in numerical analysis texts. The specific solution usually
sought for adjustments is the solution that minimizes the sums of
the squares of the residuals. This is called the "least squares"
solution in survey texts, although numerical analysis literature
often refers to this as the "solution that minimizes the 2-norm".

Solving the problem:

1) Unit conversion, linearization, gross blunder detection, and any other preprocessing.

2) The problem is set up in the form:

AX=L

subject to a weight matrix **W** (which is the inverse of the
matrix of covariances).

3) The least squares solution is computed. This is the step that this paper addresses.

4) The statistics of the survey are computed, blunder detection, relinearization, or other post processing is performed.

The least squares solution of the over determined system **
A****X**=**L**, subject to a weight matrix **W** is
given by the system

**A**^{t}**W****A****X** = **A**^{t}**W****L**

(Where **A**^{t} means transpose of **A**)

This new system is square, and may now be solved by a number of
methods. The traditional method is to use a Cholesky factorization
of the matrix on the left-hand side into an upper triangular matrix
**R** and its transpose. This gives:

**R**^{t}**R****X** = **A**^{t}**W****L**

The **R** matrix above is called the "Cholesky factor" of the
normal equations. Solving this system is now easy, since solving
triangular matrices is easy. In the worst case, the operation count
for Cholesky factorization is about n^{3}/3. The actual
work in the sparse matrix case varies with the problem and is not
easy to predict in advance.

Other techniques (such as Gaussian elimination) can also be used to solve the system, but there is no advantage in operation counts or in accuracy, so Cholesky factorization will be used in this paper.

The system is now solved for **X**. This is done either by
appropriate backward and forward substitutions or by explicitly
forming the inverse matrix. A common way of doing this is to form
the matrix augmented by the identity, and solve the systems:

R^{t}Y= [A^{t}WL|I] Solving this is equivalent to multiplying both sides by the inverse ofR^{t}RZ=YSolving this is equivalent to multiplying both sides by the inverse ofRZ= [X|R^{-1}(R^{t})^{-1}] but sinceR^{t}Ris alsoA^{t}WA, (R)^{-1}(R^{t})^{-1}is also (A^{t}WA)^{-1}. So our result is [X| (A^{t}WA)^{-1}]

It is possible to take advantage of the simple structure of the survey problem and of the resulting (sparse, positive definite) matrices in order to avoid having to do the full work that the matrix multiplications above would imply. There are many simplifications and speedups available that make actual implementations of this process by normal equations the traditional method of choice.

The least squares solution to the problem, while traditionally solved by normal equations) can be computed with some advantages by various orthogonalization methods such as QR factorization.

The weighted problem is first reduced to an equivalent
unweighted problem. The weight matrix can be factored into a matrix
**W'** such that **W'**^{t}**W'** is equal to the
original **W** matrix. This is trivial, since the matrix is
diagonal (or block diagonal in three dimensional problems). By
premultiplying both sides of the problem (**A****X** = **
L**) by **W'** a problem **W'****A****X** = **
W'****L** is produced which has the same solutions as the
original weighted problem. Using **G** for **W'****A** and
using **H** for **W'****L**, the new problem is **
G****X** = **H**.

The fact that this is the same problem can be shown by setting up the solution of this new problem by normal equations:

(W'A)^{t}W'AX= (W'A)^{t}W'L

which is

A^{t}W'^{t}W'AX=A^{t}W'^{t}W'L

and since **W'**^{t}**W'** is **W**, that
reduces the problem to

A^{t}WAX=A^{t}WL

which is what the normal equation solution would have given.

Solving the weighted system (**G****X**=**H**) proceeds
by factoring **G** into two matrices, **Q** and **R**,
such that **G** = **Q****R**, where **R** is upper
triangular, and **Q** is orthogonal. "Orthogonal" means that the
transpose of **Q** is also the inverse of **Q**.

The upper triangular part of the matrix **R** that comes out
of this factorization is the same matrix that would have come out
of the Cholesky factorization in the normal equation case. This can
be demonstrated by forming the normal equations with **
Q****R** instead of **G**.

(QR)^{t}QR=R^{t}Q^{t}QR

and since **Q**^{t}**Q** is **
Q**^{-1}**Q** (from the definition of **Q** being
orthogonal) and since that is the identity, the equations in this
case reduce to **R**^{t}**R**. Since **R** is
upper triangular, this is already in the form that the Cholesky
factorization would produce. Since the Cholesky factorization is
unique, our **R** must be the same **R** that would have been
produced by normal equation methods (up to the signs of the
rows^{4}).
Since the Cholesky factorization's **R** has positive diagonal
elements, the correct signs for rows could be established
trivially. Instead of adjusting the signs of the rows after the
fact, one can ensure the correct signs merely by making the shots
in the spanning tree always shoot from a known point to an unknown
point instead of the other way around. This works because the
Given's rotations will produce positive values on the diagonal, and
the restriction on direction insures that the diagonal starts with
positive entries.

A QR factorization splits the **A** matrix up into an upper
triangular **R** matrix, and an orthogonal **Q** matrix.
Since only **R** is really needed to compute the solution, there
is no need to actually form and save **Q**. This reduces the
storage required to just that needed to store **R** and some
space for housekeeping.

Given's rotations, the QR factorization method covered in this
paper, are an orthogonalization method that can be used to zero out
items below the main diagonal in a matrix without changing the
2-norm of the problem. This means that they can be freely used to
work on the problem without making changes to the least squares
solution.^{3}

Factoring with Given's rotations takes on the order of
(m-n)n^{2}/2 operations, compared with on the order of
n^{3}/3 for the Cholesky factorization. For small amounts
of redundancy, this is a smaller operation count. (If there is no
redundancy the overhead of this step really is zero, since the
Cholesky factor is a spanning tree in that case.) The traditional
solution has now been replaced with an algorithm that is usually
faster and always more stable.

For the QR factorization, the **R** matrix starts out in the
form of a spanning tree (discussed below) followed by the
redundant shots. Most surveys are already in that form or can be
put into it with trivial effort. For a collection of shots with no
order, you would have to put it into that form, but that can be
done in linear time.

After the QR factorization, the upper triangular part of **
R** is now the same as would have computed for the Cholesky
factorization. This allows us to use the computed **R** to get
the solution exactly the same way that it would be used for a
solution by normal equations.

A "Given's Rotation" is an orthogonal transformation that can be used to selectively zero out elements of a matrix, without changing the 2-norm (least squares solution) of the problem it represents.

If the leading element in the **R** matrix row is D, and the
leading element in the redundant row is E, then the Given's
rotation of the two rows is a premultiply by the following matrix,
where S=E/P, C=D/P, where P is the Pythagorean sum of D and E
[sqrt(D^{2}+E^{2})].

[ 1 ] [ . ] [ . ] [ 1 ] [ C . . . S ] This row is changed [ . 1 . ] [ . 1 . ] [ . 1 . ] [ -S . . . C ] This row is changed [ 1 ] [ . ] [ . ] [ 1 ] All other rows are unaffected.

All other elements above are zero. Since a multiplication by this matrix only affects two rows, it can be written as a two by two matrix multiplied by those two rows:

[ C S ] [ Row fromR] = [ Updated row forR] [ -S C ] [ Row we are processing ] [ Updated current row. ]

Actually computing the scale [sqrt(D^{2}+E^{2})]
by squaring the quantities, adding them, and taking the square root
can lead to notable loss of significance in the answer and the
danger of overflow. In practice the Moler-Morrison algorithm (Moler
and Morrison, 1983) can be used to compute the scale factor,
without danger of overflow unless the scale itself overflows. Other
techniques for avoiding numerical problems computing this quantity
are given in "Matrix Computations" (Golub and Van Loan, 1966). So
that some measure of work for this part can be used when computing
operation counts, the 24 floating point operations that
Moler-Morrison uses to compute 20 significant figures will be
used^{2}. Variants of the
Given's rotation such as Fast Given's or Gentleman's transformation
(Golub and Van Loan, 1996) can be used to avoid a square root
altogether.

Readers with a computer graphics or analytic geometry background will recognize this matrix as a classic rotation matrix, where C and S are the cosine and sine of an abstract angle.

Every survey network has what graph theorists call a "rooted spanning tree". This is a minimum collection of shots that tie the network together. There may be many possible spanning trees and any such tree will do for our purposes. Separating the problem into the spanning tree and the remaining redundant shots allows us to set up the problem in a form easy to process with QR factorization. A spanning tree is a survey you would get if you started at a control point and always shot from some already surveyed point to some point not yet in the survey.

The shots in the design matrix **A** usually form a matrix
that is almost lower triangular. The matrix can be arranged to be
lower triangular, with a control point first, and then the
non-redundant shots. Arranging it in lower triangular form is
equivalent to making it start with a spanning tree order
^{7}.

Since the order of shots and the exact numbering of points make
no difference to the solution of the network, both can be
renumbered in order to make this part of the **A** matrix an
upper triangular matrix. (There is no formal requirement that upper
triangular matrices be used, since a lower triangular matrix could be made
to work just as well. Upper triangular matrices will be used here only
to match the normal conventions for numerical analysis texts, and
avoid having to rederive the standard QR properties for a QL factorization.)
It should be noted that that all the renumbering is just loop
housekeeping in the program that processes this matrix, and need
not be actual movement of the data.

The **A** matrix in now (conceptually) in the following
form:

[ * * * * ... * * ] Spanning tree part. (TheRmatrix of the spanning tree) [ 0 * * * ... * * ] (Upper triangular) [ 0 0 * * ... * * ] [ 0 0 0 * ... * * ] [ ... ... ... ] [ 0 0 0 0 ... * * ] [ 0 0 0 0 ... 0 * ] (This row is a control point) [ ] [ ... ] Redundant shots. [ ... ] [ ... ]

Note that if the survey has no redundant shots, then the **
A** matrix at this point **is** the upper triangular **R**
matrix and the factorization is finished. The QR factorization, as
it processes each shot, converts the **R** matrix from what it
was into one that reflects the addition of this shot to the
survey.

Solving the problem, given **R**, is equivalent to walking
the graph from a control point to each reachable point and adding
in a weighted difference from each involved shot. As each redundant
shot is added to the factorization, the update is equivalent to
walking from each end of the shot back to a control point, while
readjusting weights along the way so that they reflect the
influence of the shot.

Since the matrix starts out in upper triangular form, the
explanation of QR factorization is straightforward. Orthogonal
matrices (Given's rotations in this case) are then used to zero out
each non-zero element in a redundant shot being added, and also update the
**R** matrix. After a redundant row (shot) is processed, it will
not be needed further and may be discarded.

The problem is started by setting **R** to the initial
spanning tree. Processing each redundant shot (or control point)
then proceeds as follows:

- While there are non-zero items in this shot:
- If the first non-zero item of the row is the K'th item, then
get the K'th row of
**R**. - Execute a Given's rotation to clear out that non-zero item.
(This may add new non-zero elements to the right of this element in
both the row of
**R**and in the redundant shot.)

- If the first non-zero item of the row is the K'th item, then
get the K'th row of

Since the redundant rows and the rows of the **R** matrix all
start out sparse, the multiplications implied by the Given's
rotation are usually a multiply by zero. The row of **R** that
is needed is the one corresponding to the next non-zero element in
the redundant vector, so each zero element between our current
position and the next non-zero item in the row means a row of **
R** that will not be affected by this update.

Each shot is processed against the current **R** matrix. This
means that if we save the **R** matrix, adding additional shots
(or control points) later may be done directly. Shots that don't
add a new point are just processed as above. Shots that do add a
new point to the survey just add a row and column to the **R**
matrix. This new row would not have participated in any previous
update, so no special processing needs to be done with it.

Assuming a consistent numbering scheme, two **R** matrices
can be combined by taking one as redundant rows for the other. This
allows parts of a problem to be processed separately, and the
result combined later.

Normal equation solutions can also be updated on the fly, but such Normal Equation update methods are notoriously poor numerically, and lose accuracy rapidly.

QR factorization by Given's rotations makes use of the design matrix in a row oriented manner, with rows of the matrix being accessed in sequential order. The matrix itself is usually sparse, making it an ideal candidate for simple sparse matrix methods. The fill-in pattern of the result is easy to predict since the sparsity pattern of a processed row is the union of the patterns of the two rows being used for the Given's rotation. Since the operations are row operations, this leads to locality of reference, which minimizes paging overhead on computers with paged memory

On computers with paged memory (PCs, most mainframes, workstations, etc.), programs with memory accesses that are localized have less swapping overhead than those with wider ranging access. A matrix algorithm which has mostly row operations, or mostly column operations, can be optimized to take advantage of pattern of memory accesses. The Cholesky factorization used for solving normal equations (or any equivalent method such as Gaussian elimination) mixes both row and column operations and is therefore difficult to optimize. QR factorization can be easily arranged to do exclusively row operations. This allows implementations of QR factorization to have much lower page swapping than normal equations would have when solving the same problem.

Note that, while all redundant shots are incorporated into the
computation of the normal equations, such shots are added to the
the QR factorization as they are processed, so that you have many
possible trade offs. A single redundant shot can be processed
against one linear pass through the existing **R** matrix, with
a memory usage about equal to the number of points (n). Or as many
shots as will fit in memory can be processed, with an additional
memory overhead of the number of shots times the number of
redundant observations. Or any scheme between these can be
used.

The pattern of memory accesses forming the normal equations jumps around and can cause page swapping on machines with paged memory. The QR factorization does not have that property. The step of finding the spanning tree will not exhibit much page swapping unless the survey was done with shots in somewhat random order.

The analysis below is for level nets because they are simple to examine. A three dimensional problem would, of course, have higher operation counts. The comparison between the methods would still be similar since they would all have equivalent amounts of extra work.

In general, computational overhead depends on both the number of
redundant shots and also on the interconnectivity of the network.
The connectivity of the network determines the sparsity of the **
R** matrix, and therefore determines the savings available by
sparse matrix techniques.

The standard figures for the amount of work to solve a least
squares problem, by the two methods, are given in *Matrix
Computations* (Golub and Van Loan, 1996):

Given's rotations 3mn^{2}-n^{3}which is (3m - n )n^{2}Normal equations mn^{2}+n^{3}/3 which is ( m + n/3)n^{2}

this means that when the redundancy (m - n) is is low,
orthogonalization methods have a lower operation
count^{8}.

These counts, however, are only valid for full matrix problems, and the survey problem has structure that both methods can exploit, and which are usually exploited for normal equation solutions. An analysis taking this into account is more complex, but gives more realistic figures. In addition, the overhead for a Given's rotation is higher than that of the basic operation in a Cholesky factorization; therefore that overhead needs to be accounted for also. These counts are usually just a measure of how fast the work increases as a function of problem size. They have implicit constants of work that don't normally appear in the numerical analysis literature, but are needed to work out an honest comparison.

Once **R** is computed, the processing is the same for Normal
Equations and for QR factorization. Therefore that part of the
processing will not need to be considered when comparing the
methods.

For a solution by normal equations, the structure of the problem
allows formation of the **A**^{t}**W****A** matrix
without floating point multiplies or divides, and only floating
point addition. Since multiplies and divides are the operations
that take most of the time in any matrix computations, operation
counts consider them and ignore the floating point addition
operations. That convention will be followed in this analysis. This
means that the normal equation setup can be ignored completely in
this analysis. This leaves only the n^{3}/3 term. It is
difficult to trim this term any further, since prediction of the
effects of the structure of the problem are hard to analyze.

QR factorization has a weighting step that is conceptually three multiplies per shot; but, since it is always a multiply by 1 or -1, these are just data transfers in practice. So, by the reasoning that omitted the setup for normal equations, the setup for QR factorization can be ignored also.

A more exact count of operations for QR factorization is

24rn Compute each Given's rotation (by Moler-Morrison) + 4rn(n-1)/2 Carry out the rotations ================= 24rn +2rn(n-1) = 2(12rn+rn(n-1)) = 2rn(n+11)

These are worst case figures, assuming nothing special is done for sparsity, to match the worst case assumptions made above for the normal equation case. This result is general, and makes no assumptions about the structure of the problem. Without sparse matrix techniques QR factorization wins on computational time if the total redundancy of the survey is less than about 15% or so.

As an example, a survey with 1000 points and 100 redundant shots would be about 333,000,000 operations via Cholesky factorization and about 202,000,000 operations via QR factorization.

Operation counts found in simulations show much better preformance than these figures indicate. A more carefull analysis is needed to see the reasons for this. Taking advantage of the structure at hand (where each shot has just a point it comes from, and a point it goes to) can tighten up the analysis of operation counts.

The leading entries of the row are processed by just giving **
R**'s element the scale factor computed by the rotation, and the
redundant row's element zero. The remaing entries break down to
three cases:

- neither entry is non-zero.
- only one of the two elements is non-zero.
- both of the elements are non-zero.

In case one, both rows get a zero, and no floating point operations are needed. In case two, one of the elements is zero, so there are only 2 multiplications that need be done to update the elements, and both rows end up with a non-zero element. In case three, a full 4 floating point multiplications need be done.

Starting with a problem that has at most 2 non-zero elements per row, the maximum work needed to compute an adjustment (using sparse matrix techniques) is straight forward to compute. If the redundancy is greater than n-1 shots, the analysis is that given above. If the redundancy is less than n-1, then processing the Kth redundant shot has an overhead of n Given's rotations, and 2 operations for each non-zero element in each row that doesn't have a non-zero element in the other row, and 4 operations for each non-zero that appears in the same position of both rows. This is a total of 24n+2(n-k-1)+4(kn-(k-1)k/2) operations.. This simplifies to 26n+4kn-2kk-2 operations to add in the Kth shot. Note that each shot added can, in worst case, take more time than the last. This is because as each shot is added, in can cause zero items in the matrix to get filled in, causing later shots to have more items to process. In order to add in r shots, the work needed to add each shot from 1 to r needs to be summed. This gives us a total count of work that is: 26nr+2nr(r-1)-r(r+1)(2r+1)/3-2r which simplifies to r(2n(12+r)-(r+1)(2r+1)/3-2) This is assuming worst case fill-in, actual operation counts can be much less depending on the structure of the survey.

For the example given earlier (n=1000, r=100) this gives worst
case figures to compute **R** of:

Cholesky (non-sparse) about 333,000,000 QR (non-sparse) about 202,000,000 QR (sparse) about 1,560,000

I couldn't find a worst case analysis of a sparse Cholesky
factorization on problems of this structure, so I can't properly
compare what it would be^{5}.

Worst case figures can, of course, be much different than any
typical cases. A test was run of 25 sample randomly generated
surveys with 1000 points and 100 redundant shots, with
instrumented code for each method. The average number of floating
point multiplies and divides to compute **R** were:

Cholesky (non-sparse) averaged about 167,000,000 QR (non-sparse) averaged about 1,570,000 QR (sparse) averaged about 417,000

Note that the actual Cholesky figure is about half of what the published figures for it would indicate, and all methods (of course) did better than their worst case figures.

There are many techniques that can be used with QR factorizations that are not covered in this paper but that are worth mentioning. The collection of redundant shots can, themselves, be simplified by QR techniques to lower operation counts dramatically. For example, two shots between the same two points can be combined into one equivalent weighted shot with a single Given's rotation. Sorting the input shots can also have an impact on the final operation count.

Orthogonalization methods are well known in the numerical analysis community for their numerical stability. Conversely, normal equation methods are known for their lack of numerical stability.

QR factorizations can make very good use of the sparsity of the problem.

If the ratio of redundant shots to non-redundant shots is low, QR factorization has lower operation counts in all cases. When sparse matrix techniques are used the count can be dramatically lower.

With the permission of Dr. Charles D. Ghilani, I will use example 11.1 from "Adjustment Computations" (Wolf and Ghilani, 1997). This problem has a high percentage of redundant shots, so it has a fairly high operation count. However, since a detailed solution by traditional methods is available, it makes comparison of the methods easier. Refer to that book for details of solving the same problem via normal equations in the traditional manner.

The problem as given:

From To Elevation change standard deviation A B 10.509 0.006 B C 5.360 0.004 C D -8.523 0.005 D A -7.348 0.003 B D -3.167 0.004 A C 15.881 0.012 A -- B | \/ | | /\ | D -- C

Wolf and Ghilani do some setup to avoid adjusting the control point, and come up with the system (ignoring residuals for the moment) of:

[ 1 0 0 ] [ 558.105 ] [ -1 1 0 ] [ B ] [ 5.360 ] [ 0 -1 1 ] [ C ] = [ -8.523 ] [ 0 0 -1 ] [ D ] [ -444.944 ] [ -1 0 1 ] [ -3.167 ] [ 0 1 0 ] [ 453.477 ]

Subject to the weight matrix:

[ 1/(0.006^{2}) 0 0 0 0 0 ] [ 0 1/(0.004^{2}) 0 0 0 0 ] [ 0 0 1/(0.005^{2}) 0 0 0 ] [ 0 0 0 1/(0.003^{2}) 0 0 ] [ 0 0 0 0 1/(0.004^{2}) 0 ] [ 0 0 0 0 0 1/(0.012^{2}) ]

The first step is to weight the problem. For that we need a **
W**' matrix such that **W**'^{t}**W**' = **W**.
Since this is a diagonal matrix, this is trivially the matrix where
each diagonal element is the square root of that in the diagonal
matrix:

[ 1/0.006 0 0 0 0 0 ] [ 0 1/0.004 0 0 0 0 ] [ 0 0 1/0.005 0 0 0 ] [ 0 0 0 1/0.003 0 0 ] [ 0 0 0 0 1/0.004 0 ] [ 0 0 0 0 0 1/0.012 ]

which is:

[ 166.667 0 0 0 0 0 ] [ 0 250.000 0 0 0 0 ] [ 0 0 200.000 0 0 0 ] [ 0 0 0 333.333 0 0 ] [ 0 0 0 0 250.000 0 ] [ 0 0 0 0 0 83.333 ]

Premultiplying by **W'** (The same, in the case of level
nets, as multiplying each row by the inverse of the associated
standard deviation) we get:

[ 166.667 0 0 ] [ 74684.167 ] [ -250.000 250.000 0 ] [ B ] [ 1340.000 ] [ 0 -200.000 200.000 ] [ C ] = [ -1704.600 ] [ 0 0 -333.333 ] [ D ] [ -148314.667 ] [ -250.000 0 250.000 ] [ -791.750 ] [ 0 200.000 0 ] [ 37789.599 ]

This matrix is already in lower triangular form. So, to put it into the upper triangular form usually seen in numerical analysis texts, we need only renumber the points (reversing them) and in the first n rows we renumber the shots (reversing them):

[ 200.000 -200.000 0 ] [ D ] [ -1704.600 ] [ 0 250.000 -250.000 ] [ C ] = [ 1340.000 ] [ 0 0 166.667 ] [ B ] [ 74684.167 ] [ ] [ ] [ -333.000 0 0 ] [ -148314.667 ] [ 250.000 0 -250.000 ] [ -791.750 ] [ 0 200.000 0 ] [ 37789.599 ] The upper part is the spanning tree, and the lower part the redundant shots.

There is no need to store the redundant shots in expanded rows. Until each shot needs to be processed they can be kept as just a from point, a to point, a value, and a standard deviation.

Starting with the first redundant row, the first element of that
row is non-zero, so we need to process it against the first row of
**R**. The leading entries are 200.000 and -333.000, so the
Given's rotation is:

Scale = 388.730 (Sqrt (200^{2}+(-300)^{2})) C = 200.000 / Scale = 0.514 S = -333.000 / Scale = -0.857 [ 0.514 -0.857 ] [ Row fromR] = [ 388.730 -102.899 0 ] [ 126301.768 ] [ 0.857 0.514 ] [ Redundant Row ] [ 0 -171.499 0.000 ] [ -77768.949 ]

The second element is now the first non-zero element, so we now
process it against the second row of **R**.

[ 0.825 -0.566 ] [ Row fromR] = [ 0 303.170 -206.155] [ 45097.753 ] [ 0.566 0.825 ] [ Redundant Row ] [ 0 0 -141.421] [ -63371.900 ]

The third element is the first non-zero element, so we now
process it against the third row of **R**

[ 0.762 -0.647 ] [ Row fromR] = [ 0 0 218.581 ] [ 97947.549 ] [ 0.647 0.762 ] [ Redundant Row ] [ 0 0 0 ] [ -0.216 ]

The result after processing the first redundant row is:

[ 388.730 -102.899 0 ] [ D ] [ 126301.768 ] [ 0 303.170 -206.155 ] [ C ] = [ 45097.753 ] [ 0 0 218.581 ] [ B ] [ 97947.549 ] [ ] [ ] [ 0 0 0 ] [ -0.216 ] [ 250.000 0 -250.000 ] [ -791.750 ] [ 0 200.000 0 ] [ 37789.599 ]

Note that the processed shot does not need to be stored any longer, since we know all elements of it are zero.

The second redundant shot is processed in a similar manner, giving:

[ 462.181 -86.546 -135.228 ] [ D ] [ 105801.372 ] [ 0 308.237 -240.736 ] [ C ] [ 31899.621 ] [ 0 0 276.654 ] [ B ] [ 123970.872 ] [ ] [ ] [ 0 0 0 ] [ -0.216 ] [ 0 0 0 ] [ -0.809 ] [ 0 200.000 0 ] [ 37789.599 ]

The third redundant row is now processed. Note that, since its first item is zero, it doesn't have to process row one at all.

[ 462.181 -86.546 -135.228 ] [ D ] [ 105801.372 ] [ 0 319.303 -232.392 ] [ C ] = [ 40656.640 ] [ 0 0 283.698 ] [ B ] [ 127127.754 ] [ ] [ ] [ 0 0 0 ] [ -0.216 ] [ 0 0 0 ] [ -0.809 ] [ 0 0 0 ] [ 0.755 ]

The upper triangular matrix above is the same as the factor that
the Cholesky factorization would have produced. Traditional methods
could now be applied to the computed **R**, and we would have
the same overhead as the those methods. Note, however, that the
first three lines above are a triangular system that can be solved
directly to produce the values for B, C, and D.

B = 448.10871 C = 453.46847 D = 444.94361

This result can be computed solving just one triangular system,
without computing **A**^{t}**W****L** that
traditional methods use on the right hand side of the equation.
This simplification was used neither in the computation of
operation counts, nor in the comparison of the methods. Computing
the inverse matrix, needed for the statistics, will still require
both the forward and back substitutions just as it does in
traditional adjustments.

Each redundant processed shot still has a non-zero right hand side.
These values are a set of residuals of the problem. These are **not**
the residuals of the actual shots that survey books are referring
to when they talk of residuals, but are instead the residuals of
the adjustment. Since the problem was weighted before we started
the QR factorization, these are also weighted residuals. The sum of
the squares of these residuals **is** the value that would be
computed if the shot residuals (**V**) were computed and **
V**^{t}**W****V** were evaluated (1.27 in this
case). The reference standard deviation can be computed from this
in the normal manner.

There are notable differences in numerical stability between normal equation and QR factorization methods. Consider the rather contrived example of

From To Value Std Dev A 1.0 0.0001 (Control point) A B 1.0 0.0001 B C 1.0 0.0001 B C 1.0 0.0001

and we restrict our arithmetic to about 8 significant figures (approximately IEEE single precision) the problem is easily solved by either method, and one gets what one would expect:

A = 1.0 B = 2.0 C = 3.0 with a residual of zero.

If we change the problem slightly to

A 1.0 0.0001 A B 1.0 0.1 (The standard deviation here has changed.) B C 1.0 0.0001 B C 1.0 0.0001

and continue to restrict our arithmetic to about eight significant figures, there are now differences in the answer that normal equations generate, while QR factorization result is essentially unchanged. In theory both problems (with perfect arithmetic) should generate exactly the same answer, since the badly weighted shot is not in any loop, and the loop itself closes perfectly.. So why are Normal Equations so much worse in cases like this??

What changed is that the Cholesky factor computed by the traditional method (with only 8 digit arithmetic) is now

[ 14142.136 -14142.136 0.000 ] [ 9.798 -10.206 ] [ 10000.000 ]

instead of the correct

[ 14142.136 -14142.136 0.000 ] [ 10.000 -10.000 ] [ 10000.000 ]

(Note that the rows and columns here have been reordered here to
match the earlier presentation of QR factorization.) Examining the
center element shows that factoring the normal equation has left it
with only a little more than one significant figure. This is
because forming the normal equations involves adding and
subtracting the squares of the standard deviations, while QR
factorization is dealing with them directly. Since squaring the
numbers increases the differences between them, the chance of
losing significance when adding the squares is dramaticly greater
with normal equations. This loss of significance happens in the
step of *forming* the normal equation matrix, therefore any method of
solving the problem with Normal Equations *must* inevitably have
the same difficulties.

Another important reason that normal equations do worse for this example is that it mixes some quantities that must later cancel out. If precision is lost they no longer cancel. QR factorization avoids some of this mixing to begin with, particularly when Given's rotations are used for the factorization.

On the other hand, the QR factorization has retained almost all significant figures in all of the elements of the array. In fact, QR factorization is still producing the correct results to seven significant figures when the problem is badly weighted to

From To Value Std. Dev. A 1.0 0.0001 A B 1.0 100000000000000000.0 B C 1.0 0.0001 B C 1.0 0.0001

The stability problem with normal equations, of course, depends
on the underlying arithmetic. If we switch from single precision to
double precision we don't see problems this bad with Normal Equations until
the weight for the second shot is raised to about 1000.0, but the QR
factorization also becomes even more stable and still produces the
correct answer with a weight of 10^{60}. Admittedly these
sorts of weightings are not realistic, but the subtle precision
problems that can be caused are hard to show in a small
example.

I would like to thank Dr. Charles D. Ghilani, since without his encouragement and time this paper wouldn't have gotten off the ground. I would also like to give my heart felt thanks to Olly Betts, whose unique ability to spot problems and suggest improvements has contributed greatly to the accuracy and readability of this paper.

George, Alan and Joseph, W-H Liu 1981, "*Computer Solution of
Large Sparse Positive Definite Systems*" Prentice Hall, Inc.

Ghilani, Charles D. 1990, "*A Surveyor's Guide to Practical
Least-Squares Adjustments*" Surveying and Land Information
Systems, Vol. 50, No. 4, 1999, pp 287-297.

Golub, Gene H. and Van Loan, Charles F. 1996, "*Matrix
Computations"* third edition, Johns Hopkins University Press.

Lawson, Charles L. and Hanson, Richard J. 1974, *"Solving Least
Squares Problems"* Prentice-Hall.

Moler, Cleve and Morrison, Donald 1983, "*Replacing Square
Roots by Pythagorean Sums*", IBM Journal of Research and
Development, Volume 27, Number 6, November 1983.

Pissanetzky, Sergio 1984, "*Sparse Matrix Technology*",
Academic Press

Ralston, Anthony and Rabinowitz, Phillip 1978, "*A First Course
in Numerical Analysis*", McGraw-Hill.

Schneick, J. T. 1997, "*Linear Algebra with Applications*",
McGraw Hill.

Strang, Gilbert 1986, "*Introduction to Applied
Mathematics*", Wellesley-Cambridge Press.

Wolf, Paul and Ghilani, Charles 1997, "*Adjustment
Computations, Statistics and Least Squares in Surveying and
GIS*", Wiley Interscience.

The paper above is © Copyright 2001 by the American Congress on Surveying and Mapping, All Rights Reserved.

Everything from here on down is © Copyright 2002-2007 by John Halleck, All Rights Reserved.

16oct2001. After the paper was published, I encountered a paper
that really should have been referenced, but that I had not known
about: Golub, Gene H. and Plemmons, Robert J. *"Large-Scale
Geodetic Least-Squares Adjustment by Dissection and Orthogonal
Decomposition"* in the journal "Linear Algebra and its
Applications" Volume 34, December 1980. This paper covers the
subject matter (from the math point of view) in depth, and the
special topics issue it is in contains several other relevant
articles, most notably Alan George and Michael T Heath's "*Solution of
Sparse Linear Least Squares Problems Using Givens Rotations*".

3nov2004: That special issue was reprinted in 1981 by the
publishing company North Holland as: "*Large Scale Matrix Problems*",
Ake Bjorck, Robert J. Plemmons, and Hans Schneider editors.

6nov2001. Everywhere else I converted to just counting multiplications and divisions, but I somehow left the full operation count for computing the pythagorean sums. (And, in fact, left it that way in the code that checked operation counts) So, for what it is worth, the counts for QR would have come out even better if I had been consistent.

7feb2002. Several people have asked "WHY doesn't multiplying by
an orthgonal matrix change the solution?". (And alternately, "why
does multiplying by a non-orthogonal matrix change the solution")
It is probably worth a quick comment.
If we have the system **A****x**=**L**, then what least squares
solution is minimizing is the square of the length of the vector
**A****x**-**L**. So the left hand side looks like:

(Ax-L)^{t}(Ax-L)

Now, if we multipy both sides by an orthogonal matrix **G** then we
get the system **GA****x**-**GL** and the least squares
problem (by Normal Equations) has a left hand side of

(GAx-GL)^{t}(GAx-GL)

Since we know Orthogonal matrices are non-singular, we can factor G out of both terms...

(G[Ax-L])^{t}G(Ax-L)

Noting that the transpose of a product is the product of the transposes in reverse order, we get:

(Ax-L)^{t}G^{t}G(Ax-L)

This is exactly what the Normal Equation solution started with, except
that we have **G**^{t}**G** as a center term. If **G** is
orthogonal, then **G**^{t}**G** is the identity (by
the definition of orthogonal) and identity changes nothing and can be removed.
If **G** is not orthogonal, then the solution changes by an amount related
to **G**^{t}**G**, and is therefore not the same as the
original

22jul2002. The matrix that just changes the sign of a row is orthogonal,
so in **Q****R** it can be part of the **Q**. The restriction that
diagonal elements of **R** be positive makes the QR factorization totally
unique. The comment "... up to the signs of the rows ..." is there because
the proof below it only really established that, and starting the problem
with the matrix in spanning tree order could start with positive elements
trivially anyway, so nothing more needed to be established.

Had the paper made the direct restriction to positive diagonal elements
in **R**, and included the prior paragraph's information, I could have
done away with that restriction and comment. But it didn't, and still
doesn't, seem worth it.

2003nov18: "I couldn't find a worst case analysis of a sparse Cholesky
factorization on problems of this structure". To be honest, I can't
find a sparse version of the Cholesky in the literature **at all**. (If you
know of one, I'd be interested in being pointed at
it^{a}.)

What the Geodesy Normal Equation folk commonly do is renumber the problem to make it narrow band, and then do a Cholesky confined to that band. This gains them most of the advantages of a sparse Cholesky factorization, without having to face the grimness of the actual sparse Cholesky problem. People processing Large datasets for Geodesy usually make use of matrix partitioning to decrease the work load.

If there was a good fast sparse Cholesky routine, and one ignored all of the overhead it had, and you had effectively infinite memory so you didn't need to worry about locality of reference and page swapping issues, the sparse Cholesky would reduce Normal Equations's operation counts for this step to be within roughly the same range as QR factorization. But Normal equations would still require you to solve two triangular systems afterwards, where QR only requires the solution of one. AND, Normal equations would still have dramaticly less numerical stability.

2007sep27 Tim Davis, of the University of Florida, wrote to inform me that
there *is* a lot of literature on Sparse Cholesky, and he recommended the
the sparse Cholesky in MATLAB as a good example. It is online at:

http://www.cise.ufl.edu/research/sparse/cholmod
His book
http://www.cise.ufl.edu/research/sparse/CSparse/
seems to be a good starting place for those interested.

2007sep27 Most non-Geodesy Survey problems have relatively few redundant shots.

2007oct16 Actually, to form the Upper triangular matrix the point order has to be the REVERSE of a spanning tree order. Just carelessness on my part. I'm surprised nobody caught this. More proof that nobody reads this.

2007oct16 Actually the figures for Orthogonalization methods are not as bad as this in real life because dense surveys have exploitable structure. As an extreme (unrealistic) example, once a survey is fully connected (unrealistic in practice) a new row can be added with only order n operations.

This page is http://www.cc.utah.edu/~nahaj/survey/QRintro.html © Copyright 2001 by John Halleck, All Rights Reserved. This page was last modified on October 18^{th}, 2007.