Optimal Approximations

Faculty Member
Joseph Rosenblatt

There is a wide range of interesting and challenging problems connected with optimal approximations in mathematics and statistics. In this area, there is an interplay between functions and data. This goes in both directions:

A) given a known function, we seek to find the discrete data that best approximates the function,

B) given known discrete data, we seek to find the functions that best approximate the data.

This project will have many opportunities for understanding or developing theory that addresses the interplay between data and the related functions, and also opportunities for computations and applications of this vital interplay.

We give just a few examples of what could be pursued in A) and B). At the start of the project, we will discuss in more detail what directions might interest the participants, and how it will be best to proceed in the work.

A) From Functions to Data

A1) Approximating the Riemann integral

Suppose you have a reasonably nice continuous function y = f(x) defined on [0,1]. We know that there is a well-defined Riemann integral. This integral can be approximated by taking a partition P and consider the left-endpoint Riemann sum. As the gaps in the partition go to zero, this sum gets closer and closer to the integral I of f. But what happens if we fix the number n of points in the partition and ask the following:

Question: What partition P with n points gives the smallest error between the integral I and the left-endpoint Riemann sum?

It is clear that this optimal approximation depends on the function f. But how can we describe this dependence? How do we compute this partition? In fact, is it unique? What happens as we increase n? Perhaps even more importantly, what happens with this if we use a different type of approximation to the Riemann integral like the right-endpoint sum, the midpoint sum, or Simpson's rule?

It turns out that there is a lot that can be said about these questions, but the computations and various aspects of the computations are still very poorly understood. See Joseph Rosenblatt's article 1) in the references for more details and background information.

A2) Approximating a Surface by Triangulations

Suppose we take a positive continuous function z = f(x,y) defined on the unit square. Then take a partition of the square into a possibly large number of triangles T(i). Each triangle has three vertices and the graph of f gives a height at each of these vertices. This gives a trapezoidal prism under the graph with base the triangle T(i). Now consider the areas A(i) of the triangles on top

of these trapezoidal prisms. If we take the sum A of these areas A(i) we would hopefully have an approximation of the surface area S(f) of the graph of f. As the size of the triangles T(i) shrinks, the approximation should get better and better. But suppose we have a limited amount of computational time and have to fix the number n of triangles. then we can ask this question.

Question: How do we find a (the?) choice of the triangles that gives the best approximation of S(f)?

The best choice of the triangles gives what we can call best triangulations B(n) of the surface of the graph of f. A basic question is whether or not we can relate the triangulation B(n) to each other as n increases? What does the mathematical and computational literature say about these types of optimal triangulations.

We hope we can compute B(n) quickly. If we can, we could use these triangulations to simulate a smooth surface by a surface that consists of many small triangles, and that would be useful for computer visualization and various applications of this visualization.

It is worth observing that we could strict this triangulation problem to the one-dimensional problem of approximating the length of a curve using a polygonal graph. Here with the number n fixed, we want to find the placement of points on the curve so that the sum of the distances between the points gives the best approximation of the length of the curve. This optimal approximation problem is not the same as, but certainly related to, the snow pole problem. That is, you have a limited number of snow poles with reflectors on the ends, and you want to place them on a curving road across open ground, so that when the snow is deep you can see them easily and stay on the road. Then what is the best position for placing the poles? How does it relate to the width of the road and the degree to which is curves?

B) From Data to Functions

B1) Classical Least Square with Offsets

Suppose that we have n real data points (x(i),y(i)). We assume that the values x(i) are distinct. Now we want to find a linear function y = ax + b that best approximates this data. The usual Least Squares Approximation Method (LSQ) chooses a and b such that we have the minimum sum of the squares of all the y(i) – a x(i) - b. There are unique a and b that achieve this. These can be found using a little calculus in two variables. Also, the LSQ is actually a linear optimization problem because there are formulas using matrices for these values.

But what happens if we do not insist on exactly using x(i) or y(i)? Indeed, suppose we seek the best linear approximation using offsets of that data. That is, allow the x(i) and y(i) to move a little bit to u(i) and v(i) respectively. Then find the least squares approximation. That is, find values of all the variables such that we have the minimum sum of all the squares of u(i) – x(i), v(i) – y(i), and v(i) – au(i) - b. This is what is meant by Least Squares with Offsets (LSQ with Offsets).

Question: Can the LSQ with Offsets be computed using linear algebra? Or do we need calculus? Is there a unique solution to the LSQ with Offsets?

It turns out that computations in this problem can be unpredictable. Sometimes computational results make perfect sense, and sometimes they do not. The difficulty is that it turns out that this minimization problem has at least two local minima. One of these is the best choice, and the other actually is not, even though both of them give a smaller error than the classical LSQ. How do we avoid the undesirable local minima? Is there a linear algebra method to distinguish these two local minima? And so on. A number of questions, both theoretical and computational, are worth considering here. See Susanne Schennach's survey in the references for background on this type of data problem.

The same issues considered here would apply if we were trying to consider offsets in the multilinear LSQ. This probably increases the number of local minima for the appropriate sum of squares that have to be considered, and some algorithm for deciding among them would have to be developed.

An important extension of the questions here would be to allow the least squares approximation to be a quadratic, cubic, or even higher order polynomial. These present some challenges for even the classical case. There is much yet to be understood if we want to do this higher order least squares approximation, but also allow offsets in the data. We can consider this in one and several variables. Some experiments with computations should help clarify what we can conclude about this type of least squares problem.

B2) Least Squares with Big Points

The Big Point Method (BPM) can be considered as one where there is initially only a disc (aka cloud) around a data point and no particular point in there is prescribed. That is, suppose we initially set up discs D(i) of radius d around each of our data points ((x(i),y(i)). Now assume that there is some a and b and some (u(i),v(i)) in the discs D(i) such that we have all v(i) = au(i) + b. This certainly occurs if d is big enough. But there is also a minimum value d for which this occurs. So take an a and b that corresponds to this value d, and now fix our linear approximation to be y = ax+b. Now the choice of a and b might not be unique. Moreover, this problem is no longer so easy to solve using calculus, and it is probably not a linear algebra problem either.

Questions: How predictable is the BPM Least Squares method? That is, is there a best possible solution to use in BPM? Also, does this method compare well with the Classical Least Squares, or for that matter the Least Squares with Offsets?

The idea of using cloudy data, that is the method above of setting up the discs D(i) in advance, could be useful also for higher order polynomial least squares approximation and multilinear least squares approximation. This would be worth both some computational and theoretical investigation.


1) J. Rosenblatt, Partitions for Optimal Approximations, Int. Journal of Math Analysis, Vol. 7, 2013, no 58, pp 2861-2877.

2) S. Schennach, Measurement systems, Journal of Economic Literature, to appear, preprint 65 pages.

Team Meetings
Project Difficulty
Undergrad Prerequisites

Calculus and some linear algebra, if possible. For coding, we could use Mathematica, Matlab, Python, etc, whatever students already know or are able to learn some of during the time of the project.