In some calibration strategies we encounter a sphere fitting problem: we need to find the map that does the best job of sending the observed data to a unit sphere. This simple problem statement leaves two critical questions unanswered.

- Question 1: What kind of maps should we consider?
- Question 2: How do we decide when one map does a better job than another?
I’ll outline a wealth of options we have in answering these questions in later posts. This article will specify reasonable answers to these questions, use those answers to pose a clear problem, then describe an algorithm to solve the problem.
The naïve algorithm presented here is easy to implement but requires too much memory for many micro-controller applications. Some simple refinements give us a second algorithm that still requires space linear in the number of samples, but with a constant small enough to make it usable on an Arduino. Finally we observe that we can collect 25 summary statistics of our data and have all the information we need to carry out the Gauss-Newton algorithm, giving us a constant space, constant time streaming algorithm. These algorithms will be described in detail in later articles.
Although the focus here is on sphere fitting, the methods described, including the constant-space streaming algorithm, have straightforward (if tedious) extensions to very general polynomial mapping problems.
First, a bit about notation.
We denote real 3-vectors with boldface lower-case Latin letters and label the axes with the integers $0, 1, 2$. So $\mathbf{x} = (x_0, x_1, x_2)$. We assume that we are given a set of 3-vectors $\mathbf{x}_1, \ldots, \mathbf{x}_N$.
We work with a real vector space of parameters — 6-dimensional in this application — and will denote these parameter vectors with Greek letters that have an overbar. For example $\bar{\beta}$ and $\bar{\gamma}$ are parameter vectors.
We also work with $N$-vectors, where $N$ is the number of samples. These are denoted by boldface lower-case Latin letters with an arrow above. The residual vector $\vec{\mathbf{r}}$ is one example we will encounter soon.
Matrices of any dimension will be denoted by boldface upper-case Latin letters, and transpose is denoted by a superscript $T$, so $\mathbf{J}$ is a matrix and $\mathbf{J}^T$ is its transpose.
The Latin letter $i$ will be used to index the samples and other sets of size $N$. The letters $j$ and $k$ will index the dimensions of space: $j, k \in \{0, 1, 2\}$.
2. The Problem
To answer Question 1, we restrict our attention to a 6-parameter family of maps that allows each axis to be shifted and scaled independently.
Definition 1 (Axial Affine Transformation) Given a 3-vector $\mathbf{x}$ and parameters $\bar{\beta} = (\beta_0, \ldots, \beta_5)$, define the axial affine transformation of $\mathbf{x}$ for parameters $\bar{\beta}$ to be

$$A(\mathbf{x}; \bar{\beta})_j = \frac{x_j - \beta_j}{\beta_{3+j}}, \qquad j \in \{0, 1, 2\}.$$
Three comments are in order here.
- This is a special case of a general affine transformation on $\mathbb{R}^3$; we have just ruled out shear. The techniques described below extend in a straightforward way to handle general affine transformations, but the calculations are more involved and more difficult to follow. Also, when calibrating MEMS sensors in practice, the axial parameters are far more important than the shear parameters. It is not clear that modeling shear is worthwhile, but this needs more investigation.
- At this point it would look nicer to name the parameters the way we did the first time we looked at the Gauss-Newton method: one set of symbols for the offsets and another for the sensitivities. In the calculations that follow, however, we study functions on the six-dimensional space of parameters and want to treat the parameters uniformly. Special names for them actually make those calculations more confusing (for me anyway, YMMV). We store them as one array in the code too, so it is best to get used to that early.
- You may wonder why we divide by $\beta_{3+j}$ instead of multiplying. In theory, both choices should work equally well. In practice, samples coming from a magnetometer will generally give us offset values on the order of 10–1000 and sensitivities on the order of 100–1000. If we multiplied by our parameter instead of dividing by it, we would have to work with the inverse of the sensitivity, which would be on the order of 0.01–0.001. So we could have parameters differing by 6 orders of magnitude. In the intermediate matrices we'll see in the Gauss-Newton calculations, this can lead to matrix entries differing by 12 orders of magnitude, creating “nearly singular” matrices that can’t be handled with 32-bit entries. I know this from experience: I made that choice my first time through and watched real data break it. Choosing parameters that are all about the same order of magnitude leads to stable, reliable performance.
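As a concrete sketch of Definition 1 in code — the function name is mine, and I am assuming the parameters are stored offsets-first in a single 6-element array, as the declarations later in this article suggest:

```c
#include <assert.h>
#include <math.h>

/* Axial affine transformation of Definition 1:
 *   y[j] = (x[j] - beta[j]) / beta[3 + j]
 * beta[0..2] are the offsets, beta[3..5] the sensitivities. */
void axial_affine(const float x[3], const float beta[6], float y[3]) {
    for (int j = 0; j < 3; j++) {
        y[j] = (x[j] - beta[j]) / beta[3 + j];
    }
}
```

Note that each output axis depends only on the matching input axis, which is exactly the "no shear" restriction discussed above.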
Now we address Question 2: how can we decide when one map is better than another? We are given a set of observations $\mathbf{x}_1, \ldots, \mathbf{x}_N$ and are trying to find an axial affine transformation that takes all of them onto the unit sphere. If the map does this perfectly, then for all $i$ we will have

$$\lVert A(\mathbf{x}_i; \bar{\beta}) \rVert^2 = 1.$$
If $N > 6$, we cannot hope to make this hold exactly for every sample, but if our model is reasonable we can hope to get close. If the map is good, then $1 - \lVert A(\mathbf{x}_i; \bar{\beta}) \rVert^2$ should be small. So we define
Definition 2 (Residual) For sample $\mathbf{x}_i$ and parameters $\bar{\beta}$, the residual of $\mathbf{x}_i$ for $\bar{\beta}$ is

$$r_i(\bar{\beta}) = 1 - \lVert A(\mathbf{x}_i; \bar{\beta}) \rVert^2.$$

The residual vector $\vec{\mathbf{r}}(\bar{\beta})$ is the $N$-vector

$$\vec{\mathbf{r}}(\bar{\beta}) = \left( r_1(\bar{\beta}), \ldots, r_N(\bar{\beta}) \right).$$
This measures the error for a single sample, but we need to combine all of the residuals to get one number representing the “total error”. We do this by taking the sum of squared residuals

$$E(\bar{\beta}) = \sum_{i=1}^{N} r_i(\bar{\beta})^2 = \lVert \vec{\mathbf{r}}(\bar{\beta}) \rVert^2.$$
We can now state our problem formally.
Problem 3 Given observations $\mathbf{x}_1, \ldots, \mathbf{x}_N$, find the parameters that minimize the sum of squared residuals $E(\bar{\beta})$. That is, find

$$\bar{\beta}^* = \underset{\bar{\beta}}{\operatorname{argmin}}\; E(\bar{\beta}).$$
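To make the objective concrete, here is a small sketch computing the residual of Definition 2 and the sum of squared residuals; the function names and the offsets-first parameter layout are my own assumptions:

```c
#include <assert.h>
#include <math.h>

/* Residual for one sample: r_i = 1 - ||A(x_i; beta)||^2. */
float residual(const float x[3], const float beta[6]) {
    float norm2 = 0.0f;
    for (int j = 0; j < 3; j++) {
        float y = (x[j] - beta[j]) / beta[3 + j];
        norm2 += y * y;
    }
    return 1.0f - norm2;
}

/* Total error: the sum of squared residuals over n samples. */
float sum_sq_residuals(int n, float x[][3], const float beta[6]) {
    float e = 0.0f;
    for (int i = 0; i < n; i++) {
        float r = residual(x[i], beta);
        e += r * r;
    }
    return e;
}
```

A perfectly calibrated data set gives `sum_sq_residuals` equal to zero; Problem 3 asks for the parameters that drive it as low as possible.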
Alarms should be going off when you see this definition. Two apparently arbitrary decisions have been made that could have an important impact on our final results.
- Why did we use $1 - \lVert A(\mathbf{x}_i; \bar{\beta}) \rVert^2$ instead of, say, $1 - \lVert A(\mathbf{x}_i; \bar{\beta}) \rVert$ or something else for the residual?
- Why choose the sum of squares of residuals instead of absolute values or $p$-th powers or some other function to combine residuals?
These decisions were made for computational convenience. Using the square of the magnitude in the residual makes the residual a polynomial function of the sample $\mathbf{x}_i$, a critical fact when we develop a streaming Gauss-Newton algorithm for sphere fitting. Using the sum of squares of residuals is what allows us to use the Gauss-Newton algorithm in the first place. Making these decisions differently may lead to better (or worse) calibration results in practice — I don’t know and I would like to study it — but it would require different algorithms.
3. A Quick Recap of the Gauss-Newton Algorithm
Problem 3 can be solved using the Gauss-Newton algorithm. What we need to know about this method is
- The residual vector is a function $\vec{\mathbf{r}} : \mathbb{R}^6 \to \mathbb{R}^N$.
- The Jacobian matrix $\mathbf{J}$ of this map, with entries $J_{ik} = \partial r_i / \partial \beta_k$, lets us locally approximate $\vec{\mathbf{r}}$ with a linear function:

$$\vec{\mathbf{r}}(\bar{\beta} + \bar{\delta}) \approx \vec{\mathbf{r}}(\bar{\beta}) + \mathbf{J} \bar{\delta}.$$
- The first derivatives of $E$ are then

$$\nabla E(\bar{\beta}) = 2 \mathbf{J}^T \vec{\mathbf{r}}(\bar{\beta}).$$
- The second derivative (Hessian) matrix of $E$ can be approximated by $2 \mathbf{J}^T \mathbf{J}$. These two facts follow from the form of $E$ as a sum of square residuals.
- We can use $\nabla E$ and the approximate Hessian to approximate $E$ with its second-order Taylor expansion. This is a quadratic function of the step $\bar{\delta}$, and we can find its minimum by solving the six-dimensional matrix equation

$$\mathbf{J}^T \mathbf{J} \, \bar{\delta} = -\mathbf{J}^T \vec{\mathbf{r}}. \qquad (2)$$
So we can improve our parameters by replacing $\bar{\beta}$ with $\bar{\beta} + \bar{\delta}$.
- This gives us the Gauss-Newton algorithm: start from an initial guess for $\bar{\beta}$, compute $\mathbf{J}$ and $\vec{\mathbf{r}}$, solve Equation 2 for $\bar{\delta}$, replace $\bar{\beta}$ with $\bar{\beta} + \bar{\delta}$, and repeat until the update is small enough.
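The recap above can be sketched end to end in code. This is only my illustration, not the implementation developed in this series: it accumulates $\mathbf{J}^T\mathbf{J}$ and $\mathbf{J}^T\vec{\mathbf{r}}$ sample by sample rather than storing $\mathbf{J}$, uses partial-derivative formulas I derived from Definitions 1 and 2, and solves the 6×6 system with plain Gaussian elimination:

```c
#include <assert.h>
#include <math.h>

/* Solve a 6x6 linear system a * d = b by Gaussian elimination with
 * partial pivoting. Overwrites a and b; leaves the solution in d. */
static int solve6(float a[6][6], float b[6], float d[6]) {
    for (int col = 0; col < 6; col++) {
        int piv = col;
        for (int row = col + 1; row < 6; row++)
            if (fabsf(a[row][col]) > fabsf(a[piv][col])) piv = row;
        if (fabsf(a[piv][col]) < 1e-12f) return -1;  /* singular */
        for (int k = 0; k < 6; k++) {
            float t = a[col][k]; a[col][k] = a[piv][k]; a[piv][k] = t;
        }
        float t = b[col]; b[col] = b[piv]; b[piv] = t;
        for (int row = col + 1; row < 6; row++) {
            float f = a[row][col] / a[col][col];
            for (int k = col; k < 6; k++) a[row][k] -= f * a[col][k];
            b[row] -= f * b[col];
        }
    }
    for (int row = 5; row >= 0; row--) {
        float s = b[row];
        for (int k = row + 1; k < 6; k++) s -= a[row][k] * d[k];
        d[row] = s / a[row][row];
    }
    return 0;
}

/* One Gauss-Newton step: accumulate JtJ and Jtr one sample at a time
 * (so J itself is never stored), solve JtJ d = -Jtr, update beta.
 * beta[0..2] are offsets, beta[3..5] sensitivities. */
static int gauss_newton_step(int n, float x[][3], float beta[6]) {
    float jtj[6][6] = {{0}};
    float jtr[6] = {0};
    for (int i = 0; i < n; i++) {
        float jrow[6], r = 1.0f;
        for (int j = 0; j < 3; j++) {
            float s = beta[3 + j];
            float u = (x[i][j] - beta[j]) / s;
            r -= u * u;                      /* residual r_i            */
            jrow[j]     = 2.0f * u / s;      /* dr_i / d(offset j)      */
            jrow[3 + j] = 2.0f * u * u / s;  /* dr_i / d(sensitivity j) */
        }
        for (int p = 0; p < 6; p++) {
            jtr[p] += jrow[p] * r;
            for (int q = 0; q < 6; q++) jtj[p][q] += jrow[p] * jrow[q];
        }
    }
    float d[6], rhs[6];
    for (int p = 0; p < 6; p++) rhs[p] = -jtr[p];
    if (solve6(jtj, rhs, d) != 0) return -1;
    for (int p = 0; p < 6; p++) beta[p] += d[p];
    return 0;
}
```

This sketch already hints at the memory savings discussed below: the only per-sample work is a constant-size accumulation, so nothing of size $N$ needs to be stored beyond the observations themselves.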
4. A Naïve Implementation
The iterative step of the Gauss-Newton algorithm consists of
- computing the matrix $\mathbf{J}^T \mathbf{J}$ and the vector $\mathbf{J}^T \vec{\mathbf{r}}$ for observations $\mathbf{x}_1, \ldots, \mathbf{x}_N$ and parameters $\bar{\beta}$,
- solving Equation 2, and
- updating $\bar{\beta}$ and then deciding whether to continue.
```c
float r[N];       //residual vector
float J[N][6];    //Jacobian matrix
int16_t x[N][3];  //the observations
float JtJ[21];    //The left-hand side of equation 2. It is symmetric so
                  //we only need to store 21 of the 36 matrix entries.
float JtR[6];     //The right-hand side of equation 2.
```
I won’t write the code out for the implementation here because these declarations are enough to tell us that we are in trouble if we are implementing the algorithm on a micro-controller. These arrays require $34N + 108$ bytes of storage. If we take 10 samples we will be fine, but if we take 100 samples we need 3508 bytes. That will lead to some very undefined behavior on my Arduino Uno, which has only 2048 bytes of SRAM.
This will work for small sample sets, and the implementation is straightforward: loop over the observations and use Equations 3 and 4, the formulas for the partial derivatives of the residuals, to fill in the matrix entries. Then perform the needed matrix multiplications, solve the linear equation, update $\bar{\beta}$, and decide whether to continue. I will leave this as a warm-up exercise for the interested reader.
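For readers who want to check their warm-up answer against something, here is a minimal sketch of the fill step. The derivative formulas in the comments are my own derivation from Definitions 1 and 2, and the function name and full-matrix Jacobian layout are mine (the packed 21-entry `JtJ` is left as part of the exercise):

```c
#include <assert.h>
#include <math.h>
#include <stdint.h>

/* Fill the residual vector r and Jacobian J for n observations,
 * with beta[0..2] the offsets and beta[3..5] the sensitivities.
 * Differentiating r_i = 1 - ||A(x_i; beta)||^2 gives
 *   dr_i/d(beta_j)     = 2 (x_ij - beta_j)   / beta_{3+j}^2
 *   dr_i/d(beta_{3+j}) = 2 (x_ij - beta_j)^2 / beta_{3+j}^3  */
void fill_residuals_and_jacobian(int n, int16_t x[][3],
                                 const float beta[6],
                                 float r[], float J[][6]) {
    for (int i = 0; i < n; i++) {
        r[i] = 1.0f;
        for (int j = 0; j < 3; j++) {
            float s = beta[3 + j];
            float u = (x[i][j] - beta[j]) / s;
            r[i] -= u * u;
            J[i][j]     = 2.0f * u / s;      /* = 2 (x_ij - beta_j) / s^2   */
            J[i][3 + j] = 2.0f * u * u / s;  /* = 2 (x_ij - beta_j)^2 / s^3 */
        }
    }
}
```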
In the next post we will start writing some real code.