Implementing the Gauss-Newton Algorithm for Sphere Fitting (3 of 3)

The first post in this series made it clear that the major computational and storage costs of implementing the Gauss-Newton algorithm come from computing the matrices {\mathbf{J}_{\bar{\beta}}^{\top}\mathbf{J}_{\bar{\beta}}} and {\mathbf{J}_{\bar{\beta}}^{\top}\vec{\mathbf{r}}}. By carefully avoiding materialization of the Jacobian matrix and the residual vector, the second post gave us a viable sphere fitting algorithm that can be used to calibrate common MEMS accelerometers (e.g. ADXL335, ADXL345) and magnetometers (e.g. HMC5843). This algorithm still requires multiple passes over the observation data, and that is only possible if all of those observations are stored somewhere; its space complexity is {O(N)} where {N} is the number of observations.

In this post we will see how to eliminate that requirement and perform the Gauss-Newton algorithm in a one-pass streaming algorithm with space complexity effectively {O(1)}.

1. The heart of the matter

Notice that we can rewrite the {0,0} entry of {\mathbf{J}_{\bar{\beta}}^{\top}\mathbf{J}_{\bar{\beta}}} as follows

\displaystyle \begin{array}{rcl} [\mathbf{J}_{\bar{\beta}}^{\top}\mathbf{J}_{\bar{\beta}}]_{00} &=& \sum_{i=0}^{N-1} 4\frac{(x_{i0} - \beta_{0})^2}{\beta_{3}^4} \\ &=& \frac{4}{\beta_{3}^4}\left[ \sum_{i=0}^{N-1} x_{i0}^2 - 2\beta_0\sum_{i=0}^{N-1}x_{i0} + \sum_{i=0}^{N-1}\beta_0^2\right] \\ &=& \frac{4}{\beta_3^4}\sum_{i=0}^{N-1}x_{i0}^2 - \frac{8\beta_0}{\beta_3^4}\sum_{i=0}^{N-1}x_{i0} + \frac{4\beta_0^2}{\beta_3^4}N \end{array}

So if we know {\sum_{i=0}^{N-1}x_{i0}^2}, {\sum_{i=0}^{N-1}x_{i0}}, and {N}, then we can compute the matrix entry for arbitrary {\bar{\beta}}. We can scan the data once, compute these three numbers and then just use the formula above to compute the matrix entry while iterating through the Gauss-Newton algorithm. Once these numbers are computed, there is never a need to look at the data again.
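As a sketch of this idea (the names here are illustrative and not taken from muCSense), a streaming accumulator for these three numbers, together with a function that evaluates the {0,0} entry for any {\bar{\beta}}, might look like:

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// Streaming accumulator for the three statistics needed for the (0,0) entry:
// sum of x_{i0}, sum of x_{i0}^2, and the observation count N.
// Illustrative sketch; these names do not come from muCSense.
struct Axis0Stats {
    double sumX = 0.0;   // sum_i x_{i0}
    double sumX2 = 0.0;  // sum_i x_{i0}^2
    std::size_t n = 0;   // N

    void update(double x0) {  // one pass: call once per observation
        sumX += x0;
        sumX2 += x0 * x0;
        ++n;
    }
};

// [J^T J]_{00} = (4 / beta3^4) * (sumX2 - 2*beta0*sumX + beta0^2 * N),
// evaluated for any beta without revisiting the data.
double jtj00(const Axis0Stats& s, double beta0, double beta3) {
    const double b3sq = beta3 * beta3;
    return 4.0 / (b3sq * b3sq) *
           (s.sumX2 - 2.0 * beta0 * s.sumX + beta0 * beta0 * s.n);
}
```

After one scan over the observations, `jtj00` can be re-evaluated at every Gauss-Newton iteration without touching the data again.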

A similar situation arises for all of the matrix entries. Before proceeding, let’s introduce some new notation that will make the sequel easier to follow, easier to code, and easier to generalize.

2. Notation

To think clearly about the calculations here, we need to stop thinking about 3-vectors and start thinking about {N}-vectors. We will build on the notation introduced in post 1 of this series and define three new {N}-vectors. For an axis {j \in \{0,1,2\}} define

\displaystyle \vec{\mathbf{x}}_{\bullet j} = \left[ \begin{array}{c} x_{0j} \\ x_{1j} \\ \vdots \\ x_{(N-1)j}\end{array} \right]

This is the {N}-vector of all observations for that axis. Next define

\displaystyle \vec{\mathbf{x^2}}_{\bullet j} = \left[ \begin{array}{c} x_{0j}^2 \\ x_{1j}^2 \\ \vdots \\ x_{(N-1)j}^2\end{array} \right]

This is the {N}-vector of the squares of the observations along this axis. Finally, it will be convenient to use a vector of all ones:

\displaystyle \vec{\mathbf{1}} = \left[ \begin{array}{c}1 \\ 1 \\ \vdots \\ 1 \end{array} \right]

For any pair of {N}-vectors {\vec{\mathbf{x}}}, {\vec{\mathbf{y}}}, we define the inner product as usual

\displaystyle \langle \vec{\mathbf{x}}, \vec{\mathbf{y}} \rangle = \sum_{i=0}^{N-1} x_i y_i

With this notation, we can rewrite the formula in the last section as

\displaystyle \left[\mathbf{J}_{\bar{\beta}}^{\top}\mathbf{J}_{\bar{\beta}}\right]_{00} = \frac{4}{\beta_3^4}\left( \langle \vec{\mathbf{x}}_{\bullet 0}, \vec{\mathbf{x}}_{\bullet 0} \rangle - 2\beta_0\langle \vec{\mathbf{x}}_{\bullet 0}, \vec{\mathbf{1}} \rangle +\beta_0^2 N\right)

3. Rewriting the matrix entry formulas

We will rewrite each of the formulas derived in post 2 in terms of these {N}-vectors. It is a worthwhile exercise to verify these.

For {j,k < 3}

\displaystyle \begin{array}{rcl} \left[\mathbf{J}_{\bar{\beta}}^{\top}\mathbf{J}_{\bar{\beta}}\right]_{jk} &=& \left[\mathbf{J}_{\bar{\beta}}^{\top}\mathbf{J}_{\bar{\beta}}\right]_{kj} = \sum_{i=0}^{N-1} 4\frac{(x_{ij} - \beta_{j})(x_{ik} - \beta_{k})}{\beta_{3+j}^2\beta_{3+k}^2} \\ &=& \frac{4}{\beta_{3+j}^2\beta_{3+k}^2}\left( \langle \vec{\mathbf{x}}_{\bullet j}, \vec{\mathbf{x}}_{\bullet k} \rangle - \beta_j\langle \vec{\mathbf{x}}_{\bullet k}, \vec{\mathbf{1}} \rangle - \beta_k \langle\vec{\mathbf{x}}_{\bullet j}, \vec{\mathbf{1}} \rangle + \beta_j\beta_k N \right) \end{array}

for {j < 3 \leq k}

\displaystyle \begin{array}{rcl} \left[\mathbf{J}_{\bar{\beta}}^{\top}\mathbf{J}_{\bar{\beta}}\right]_{jk} &=& \left[\mathbf{J}_{\bar{\beta}}^{\top}\mathbf{J}_{\bar{\beta}}\right]_{kj} \\ &=& \sum_{i=0}^{N-1} 4\frac{(x_{ij} - \beta_{j})(x_{i(k-3)} - \beta_{(k-3)})^2}{\beta_{3+j}^2\beta_{k}^3} \\ &=& \frac{4}{\beta_{3+j}^2\beta_{k}^3}\left( \langle \vec{\mathbf{x}}_{\bullet j}, \vec{\mathbf{x^2}}_{\bullet (k-3)} \rangle -2 \beta_{(k-3)}\langle \vec{\mathbf{x}}_{\bullet j},\vec{\mathbf{x}}_{\bullet (k-3)} \rangle + \beta_{(k-3)}^2 \langle\vec{\mathbf{x}}_{\bullet j}, \vec{\mathbf{1}} \rangle \right. \\ & & \left. \qquad - \beta_j\langle \vec{\mathbf{x}}_{\bullet (k-3)}, \vec{\mathbf{x}}_{\bullet (k-3)}\rangle + 2\beta_j\beta_{(k-3)}\langle \vec{\mathbf{x}}_{\bullet (k-3)}, \vec{\mathbf{1}}\rangle - \beta_j\beta_{(k-3)}^2 N \right) \end{array}

and for { 3 \leq j,k}

\displaystyle \begin{array}{rcl} \left[\mathbf{J}_{\bar{\beta}}^{\top}\mathbf{J}_{\bar{\beta}}\right]_{jk} &=& \left[\mathbf{J}_{\bar{\beta}}^{\top}\mathbf{J}_{\bar{\beta}}\right]_{kj} \\ &=& \sum_{i=0}^{N-1} 4\frac{(x_{i(j-3)} - \beta_{(j-3)})^2(x_{i(k-3)} - \beta_{(k-3)})^2}{\beta_{j}^3\beta_{k}^3} \\ &=& \frac{4}{\beta_{j}^3\beta_{k}^3}\left( \langle \vec{\mathbf{x^2}}_{\bullet (j-3)}, \vec{\mathbf{x^2}}_{\bullet (k-3)} \rangle -2 \beta_{(k-3)}\langle \vec{\mathbf{x^2}}_{\bullet (j-3)},\vec{\mathbf{x}}_{\bullet (k-3)} \rangle \right. \\ && \qquad+ \beta_{(k-3)}^2 \langle\vec{\mathbf{x^2}}_{\bullet (j-3)}, \vec{\mathbf{1}} \rangle -2\beta_{(j-3)}\langle \vec{\mathbf{x}}_{\bullet (j-3)}, \vec{\mathbf{x^2}}_{\bullet (k-3)} \rangle \\ & & \qquad + 4\beta_{(j-3)}\beta_{(k-3)}\langle \vec{\mathbf{x}}_{\bullet (j-3)},\vec{\mathbf{x}}_{\bullet (k-3)} \rangle - 2\beta_{(j-3)}\beta_{(k-3)}^2 \langle \vec{\mathbf{x}}_{\bullet (j-3)}, \vec{\mathbf{1}}\rangle \\ & & \left. \qquad + \beta_{(j-3)}^2 \langle\vec{\mathbf{1}}, \vec{\mathbf{x^2}}_{\bullet (k-3)} \rangle - 2\beta_{(j-3)}^2\beta_{(k-3)}\langle \vec{\mathbf{1}}, \vec{\mathbf{x}}_{\bullet (k-3)}\rangle + \beta_{(j-3)}^2\beta_{(k-3)}^2 N \right) \end{array}
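As a hedged illustration of the first of these formulas (the names below are hypothetical, not the muCSense API), the center-center entries of {\mathbf{J}_{\bar{\beta}}^{\top}\mathbf{J}_{\bar{\beta}}} reduce to a handful of stored dot products:

```cpp
#include <cassert>
#include <cmath>

// Minimal statistics for the center-center block of J^T J: <x_j, 1>,
// <x_j, x_k>, and N. Hypothetical names, not taken from muCSense.
struct CenterStats {
    double sx[3] = {};     // <x_j, 1>
    double xx[3][3] = {};  // <x_j, x_k>
    unsigned long n = 0;   // N

    void update(const double obs[3]) {
        for (int j = 0; j < 3; ++j) {
            sx[j] += obs[j];
            for (int k = 0; k < 3; ++k) xx[j][k] += obs[j] * obs[k];
        }
        ++n;
    }
};

// [J^T J]_{jk} for j,k < 3: beta[0..2] is the center, beta[3..5] the radii.
double jtjCenter(const CenterStats& s, const double beta[6], int j, int k) {
    const double bj2 = beta[3 + j] * beta[3 + j];
    const double bk2 = beta[3 + k] * beta[3 + k];
    return 4.0 / (bj2 * bk2) *
           (s.xx[j][k] - beta[j] * s.sx[k] - beta[k] * s.sx[j] +
            beta[j] * beta[k] * s.n);
}
```

By construction the result is symmetric in {j} and {k}, so only the upper triangle needs to be computed.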

To write similar expressions for the entries of {\mathbf{J}_{\bar{\beta}}^{\top}\vec{\mathbf{r}}}, note that

\displaystyle \vec{\mathbf{r}} = \vec{\mathbf{1}} - \sum_{j=0,1,2}\frac{1}{\beta_{3+j}^2}(\vec{\mathbf{x^2}}_{\bullet j} - 2\beta_j\vec{\mathbf{x}}_{\bullet j} + \beta_j^2\vec{\mathbf{1}})

So for {j < 3}

\displaystyle \begin{array}{rcl} \left[\mathbf{J}_{\bar{\beta}}^{\top}\vec{\mathbf{r}}\right]_{j} &=& \sum_{i=0}^{N-1}2\frac{x_{ij} - \beta_{j}}{\beta_{3+j}^2} \left( 1 - \sum_{k=0,1,2} \frac{(x_{ik} - \beta_{k})^2}{\beta_{3+k}^2} \right) \\ &=& \frac{2}{\beta_{3+j}^2} \left(\langle \vec{\mathbf{x}}_{\bullet j}, \vec{\mathbf{1}} \rangle - \sum_{k=0,1,2} \frac{1}{\beta_{3+k}^2}\left(\langle \vec{\mathbf{x}}_{\bullet j}, \vec{\mathbf{x^2}}_{\bullet k} \rangle - 2\beta_k\langle \vec{\mathbf{x}}_{\bullet j}, \vec{\mathbf{x}}_{\bullet k} \rangle + \beta_k^2 \langle \vec{\mathbf{x}}_{\bullet j}, \vec{\mathbf{1}} \rangle\right) \right. \\ && \qquad \left. - \beta_j N + \beta_j\sum_{k=0,1,2} \frac{1}{\beta_{3+k}^2}\left(\langle \vec{\mathbf{1}}, \vec{\mathbf{x^2}}_{\bullet k} \rangle - 2\beta_k\langle \vec{\mathbf{1}}, \vec{\mathbf{x}}_{\bullet k} \rangle + \beta_k^2 N\right) \right) \end{array}

and for {j \geq 3}

\displaystyle \begin{array}{rcl} \left[\mathbf{J}_{\bar{\beta}}^{\top}\vec{\mathbf{r}}\right]_{j} &=& \sum_{i=0}^{N-1}2\frac{(x_{i(j-3)} - \beta_{j-3})^2}{\beta_{j}^3} \left( 1 - \sum_{k=0,1,2} \frac{(x_{ik} - \beta_{k})^2}{\beta_{3+k}^2} \right) \\ &=& \frac{2}{\beta_{j}^3} \left(\langle \vec{\mathbf{x^2}}_{\bullet (j-3)}, \vec{\mathbf{1}} \rangle - \sum_{k=0,1,2} \frac{1}{\beta_{3+k}^2}\left(\langle \vec{\mathbf{x^2}}_{\bullet (j-3)}, \vec{\mathbf{x^2}}_{\bullet k} \rangle - 2\beta_k\langle \vec{\mathbf{x^2}}_{\bullet (j-3)}, \vec{\mathbf{x}}_{\bullet k} \rangle + \beta_k^2 \langle \vec{\mathbf{x^2}}_{\bullet (j-3)}, \vec{\mathbf{1}} \rangle\right) \right. \\ && \qquad -2\beta_{(j-3)}\langle \vec{\mathbf{x}}_{\bullet (j-3)}, \vec{\mathbf{1}} \rangle + 2\beta_{(j-3)} \sum_{k=0,1,2} \frac{1}{\beta_{3+k}^2}\left(\langle \vec{\mathbf{x}}_{\bullet (j-3)}, \vec{\mathbf{x^2}}_{\bullet k} \rangle - 2\beta_k\langle \vec{\mathbf{x}}_{\bullet (j-3)}, \vec{\mathbf{x}}_{\bullet k} \rangle + \beta_k^2 \langle \vec{\mathbf{x}}_{\bullet (j-3)}, \vec{\mathbf{1}} \rangle\right) \\ && \qquad \left. + \beta_{(j-3)}^2 N - \beta_{(j-3)}^2\sum_{k=0,1,2} \frac{1}{\beta_{3+k}^2}\left(\langle \vec{\mathbf{1}}, \vec{\mathbf{x^2}}_{\bullet k} \rangle - 2\beta_k\langle \vec{\mathbf{1}}, \vec{\mathbf{x}}_{\bullet k} \rangle + \beta_k^2 N\right) \right) \end{array}
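One way to sanity-check these identities is to evaluate an entry both ways. The sketch below (hypothetical names, not the muCSense API) computes {[\mathbf{J}_{\bar{\beta}}^{\top}\vec{\mathbf{r}}]_j} for a center coordinate {j < 3} from the streamed statistics alone; a brute-force loop over the raw residuals must give the same number:

```cpp
#include <cassert>
#include <cmath>

// Statistics needed for the j < 3 entries of J^T r. Illustrative names.
struct GradStats {
    double sx[3] = {};      // <x_j, 1>
    double xx[3][3] = {};   // <x_j, x_k>; diagonal xx[k][k] = <x2_k, 1>
    double x2x[3][3] = {};  // <x2_j, x_k>
    unsigned long n = 0;    // N

    void update(const double obs[3]) {
        for (int j = 0; j < 3; ++j) {
            sx[j] += obs[j];
            for (int k = 0; k < 3; ++k) {
                xx[j][k] += obs[j] * obs[k];
                x2x[j][k] += obs[j] * obs[j] * obs[k];
            }
        }
        ++n;
    }
};

// [J^T r]_j for j < 3, from statistics only.
// beta[0..2] is the center, beta[3..5] the radii.
double jtrCenter(const GradStats& s, const double beta[6], int j) {
    double acc = s.sx[j] - beta[j] * s.n;
    for (int k = 0; k < 3; ++k) {
        const double bk = beta[k];
        const double b3k2 = beta[3 + k] * beta[3 + k];
        // sum_i x_{ij} (x_{ik} - beta_k)^2, using <x2_k, x_j> = x2x[k][j]
        const double t1 = s.x2x[k][j] - 2.0 * bk * s.xx[j][k] + bk * bk * s.sx[j];
        // sum_i (x_{ik} - beta_k)^2, using <x2_k, 1> = xx[k][k]
        const double t2 = s.xx[k][k] - 2.0 * bk * s.sx[k] + bk * bk * s.n;
        acc -= (t1 - beta[j] * t2) / b3k2;
    }
    return 2.0 * acc / (beta[3 + j] * beta[3 + j]);
}
```

The {j \geq 3} entries follow the same pattern with the {\vec{\mathbf{x^2}}} statistics in place of {\vec{\mathbf{x}}}.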

These expressions may look tedious, but they are just straightforward algebra. If we scan the observation data {\mathbf{x}_i} and compute the following statistics for all {j,k \in \{0,1,2\}} we will have all of the information we need to compute our matrix entries:

\displaystyle \begin{array}{|l|l|} \hline \text{Statistic} & \text{Count} \\ \hline \langle \vec{\mathbf{x}}_{\bullet j}, \vec{\mathbf{1}} \rangle & 3 \\ \langle \vec{\mathbf{x^2}}_{\bullet j}, \vec{\mathbf{1}} \rangle & 3 \\ \langle \vec{\mathbf{x}}_{\bullet j}, \vec{\mathbf{x}}_{\bullet k} \rangle & 6 \text{ (symmetric)} \\ \langle \vec{\mathbf{x^2}}_{\bullet j}, \vec{\mathbf{x}}_{\bullet k} \rangle & 9 \\ \langle \vec{\mathbf{x^2}}_{\bullet j}, \vec{\mathbf{x^2}}_{\bullet k} \rangle & 6 \text{ (symmetric)} \\ \hline \end{array}

In fact, the situation is even better. Notice that

\displaystyle \langle \vec{\mathbf{x}}_{\bullet j}, \vec{\mathbf{x}}_{\bullet j} \rangle = \langle \vec{\mathbf{x^2}}_{\bullet j}, \vec{\mathbf{1}} \rangle

So we do not need to store {\langle \vec{\mathbf{x^2}}_{\bullet j}, \vec{\mathbf{1}} \rangle}. Also the arrays {\langle \vec{\mathbf{x}}_{\bullet j}, \vec{\mathbf{x}}_{\bullet k} \rangle} and {\langle \vec{\mathbf{x^2}}_{\bullet j}, \vec{\mathbf{x^2}}_{\bullet k} \rangle} are symmetric, allowing us to store just 6 of the 9 entries. Altogether, we need to store 24 numbers. Of course we also need to store {N}, the number of observations, for a total of 25.
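A one-pass accumulator for this table might be sketched as follows (illustrative names, not from muCSense; for clarity it stores full 3×3 arrays rather than the packed 25-number layout, trading a few redundant doubles for simpler indexing):

```cpp
#include <cassert>

// Single-pass accumulator for the sufficient statistics of the sphere fit.
// The diagonal of xx doubles as <x2_j, 1>, as noted in the text, so that
// array is not stored separately. Names are illustrative, not muCSense's.
struct SphereFitStats {
    double sx[3] = {};       // <x_j, 1>
    double xx[3][3] = {};    // <x_j, x_k>   (symmetric; xx[j][j] = <x2_j, 1>)
    double x2x[3][3] = {};   // <x2_j, x_k>  (not symmetric: all 9 entries)
    double x2x2[3][3] = {};  // <x2_j, x2_k> (symmetric)
    unsigned long n = 0;     // N

    void update(const double obs[3]) {  // call once per observation
        for (int j = 0; j < 3; ++j) {
            const double xj = obs[j], xj2 = xj * xj;
            sx[j] += xj;
            for (int k = 0; k < 3; ++k) {
                const double xk = obs[k];
                xx[j][k] += xj * xk;
                x2x[j][k] += xj2 * xk;
                x2x2[j][k] += xj2 * xk * xk;
            }
        }
        ++n;
    }
};
```

On a RAM-starved microcontroller the symmetric arrays would of course be packed down to the 24 numbers (plus {N}) counted above.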

4. Putting it into code

Implementing this is straightforward even if it demands a bit of care. To see this implemented, look at the file BestSphereGaussNewtonCalibrator.cpp in muCSense. Here is a quick overview of what to look for.

  • The statistics are all accumulated in the method BestSphereGaussNewtonCalibrator::update (BestSphereGaussNewtonCalibrator.cpp).
  • The calibration matrices are computed in the method BestSphereGaussNewtonCalibrator::computeGNMatrices (BestSphereGaussNewtonCalibrator.cpp).

In muCSense this sphere fitting procedure is used to calibrate “spherical sensors” such as magnetometers and accelerometers. At present, the library specifically supports the inertial sensors on SparkFun’s 9DoF Sensor Stick: the ADXL345 accelerometer, HMC5843 magnetometer, and ITG-3200 gyroscope. To use this code to calibrate a different accelerometer or magnetometer, you just need to define a new subclass of Sensor — see ADXL345.h/ADXL345.cpp for an example — then follow this example sketch.

5. What’s next

The code in muCSense works, but it needs to be optimized. As it stands, a simple sketch using this calibration algorithm takes up 20K of the 32K available on an Arduino Uno. The sums and products have a clear structure that I have not attempted to exploit at all; using it should allow a valuable reduction in compiled code size. I will work on this, but would love to hear from anyone else who has ideas.

It is interesting to note that this algorithm generalizes cleanly to any residual function that is a polynomial in the observation vector. It works for higher-dimensional observations and for higher-degree polynomials. In particular, it can be adapted to handle linear maps with shear, something I plan to implement soon. The process I walked through here could be abstracted into a system that takes a polynomial residual function as input, determines the required statistics and computations in the background, and optimizes the tedious arithmetic, saving us from writing this sort of code by hand.


4 Responses to Implementing the Gauss-Newton Algorithm for Sphere Fitting (3 of 3)

  1. Ted Cheng says:

    Hi Rolfe, I’m trying your code from this page and I’m having trouble getting a correctly calibrated result.
    So I have a few questions I’m wondering if you can answer:

    1) Is your Gauss-Newton-based calibration method also applicable to other accelerometers/magnetometers? I’m using the MPU9150.

    2) In the calibrate method of BestSphereGaussNewtonCalibrator.cpp, should change be

    change = JtR[0]*JtR[0] + JtR[1]*JtR[1] + JtR[2]*JtR[2] + JtR[3]*JtR[3]/(_beta[3]*_beta[3]) + JtR[4]*JtR[4]/(_beta[4]*_beta[4]) + JtR[5]*JtR[5]/(_beta[5]*_beta[5]);

    instead of

    change = JtR[0]*JtR[0] + JtR[1]*JtR[1] + JtR[2]*JtR[2] + JtR[3]*JtR[3]*(_beta[3]*_beta[3]) + JtR[4]*JtR[4]*(_beta[4]*_beta[4]) + JtR[5]*JtR[5]*(_beta[5]*_beta[5]);

    I’d appreciate your answer.



    • Yes, change is incorrectly defined, but the program still works.

      Convergence is much faster, and “change” has a sensible meaning if it is defined as the squared magnitude of the relative parameter shifts:

        change = 0.;
        for (i = 0; i < 6; i++) change += JtR[i]*JtR[i]/(_beta[i]*_beta[i]);  // squared magnitude of relative parameter shifts
        printf("it# %d, rel. delta = %10.5f\n", 20-num_iterations, change);

      Wonder what the author has been doing for the last five years!

      Cheers, Jim Remington

      • Thanks Jim and Ted. Right, I don’t remember what I was thinking when I hacked that 🙂 Using the squared magnitude of the relative parameter shifts does look reasonable.

        Of course, since the \beta_i are parameters for an affine transformation, what we really care about is that we get good transform results. So I think the “right” thing to do is to check that the distance between these transformations is small. (e.g. we might define change to be the maximum over all vectors in the unit sphere of the magnitude of the difference in transform values between the old and the new transforms.)

        As far as what I’ve been doing, well, other hobbies and work have pulled me away from this stuff. But I did get some good results using Kalman filters to get rid of parameter drift and to exploit multiple sensors for denoising.

  2. Ted Cheng says:

    3) And how is the equation for change derived? Where can I find a reference?


