Using Genetic Algorithms to Determine Calculus Derivative Functions in C# and .NET

Mike Gold

Figure 1 - Parabola with Tangent Line

Introduction

It is not clear who invented calculus first. Its foundations were laid at about the same time by two mathematicians, Isaac Newton and Gottfried Leibniz, who appear to have arrived at the fundamental theorem independently. Leibniz is credited more with the notation, whereas Newton is credited with bringing the ideas of calculus together into a subject. Either way, the tools of calculus have done wonders in helping us advance our understanding of physics, chemistry, and biology.

The crux of calculus is the ability to derive a function for the change of one dimension with respect to the change of another dimension. For example, one of the great formulas in physics discovered by Newton,

Force = mass * acceleration

can be expressed as F = ma or F = m (dv/dt)

dv/dt is the change in velocity with respect to the change in time, which is the definition of acceleration. You may ask yourself, how small a change in velocity is dv? The answer lies at the heart of calculus: dv (the change in velocity) and dt (the change in time) are the changes in these two dimensions as each becomes infinitely small, approaching zero.

Let's take a more mathematical case, the parabola shown in Figure 1. The slope of a line is the ratio of the differences between the coordinates of two points on the line: slope = (y2 - y1)/(x2 - x1). If we draw a line through the parabola, it intersects at two points, and the slope of this line can be calculated using that formula. If we move the line outward until it touches the parabola at only a single point, how do we determine the slope? In this case y2 = y1 and x2 = x1, so the slope formula gives (y2 - y2)/(x2 - x2) = 0/0, which is undefined. Calculus resolves this with the concept of a limit: the numerator and denominator of the slope both approach zero but never reach it. Here dy/dx represents the infinitesimal change from one point to another on the parabola, so the line whose slope we are interested in touches the parabola at just one point. For a function f(x), the expression whose limit gives this slope is called the Newton Quotient:

f'(x) = lim (h → 0) [f(x + h) - f(x)] / h

We can plug the formula for a parabola, f(x) = x², into the Newton Quotient to determine the derivative:

f'(x) = lim (h → 0) [(x + h)² - x²] / h

= lim (h → 0) (x² + 2xh + h² - x²) / h

= lim (h → 0) (2xh + h²) / h

= lim (h → 0) (2x + h)

As h goes to 0, the slope function becomes 2x.

The Newton Quotient gives us the slope of the tangent line at any point on the parabola. For example, at x = 3 the point on the parabola is (3, 9) and the slope there is 2 * 3 = 6. Other functions are a little tougher to calculate. The slope of sin x is cos x. The slope of e^x is e^x. Many of these slope functions can be derived mathematically. In this article, we will instead find the derivative of a function through trial and error using genetic algorithms.
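The limit can also be watched numerically. The following is a minimal sketch, not part of the article's program: the class name NewtonQuotientDemo and its methods are hypothetical names chosen for illustration. It evaluates the quotient [f(x + h) - f(x)] / h for f(x) = x² at x = 3 with ever smaller h. Analytically the quotient equals 2x + h, so the printed values should approach 6.

```csharp
using System;

public static class NewtonQuotientDemo
{
    // Difference quotient [f(x + h) - f(x)] / h for a fixed, nonzero h.
    public static double Quotient(Func<double, double> f, double x, double h)
    {
        return (f(x + h) - f(x)) / h;
    }

    // Prints the quotient of f(x) = x^2 at x = 3 for h = 1, 0.1, ..., 0.0001.
    public static void Run()
    {
        Func<double, double> parabola = x => x * x;
        for (int k = 0; k <= 4; k++)
        {
            double h = Math.Pow(10.0, -k);
            Console.WriteLine($"h = {h}: quotient = {Quotient(parabola, 3.0, h)}");
        }
    }
}
```

Calling NewtonQuotientDemo.Run() from a Main method prints one line per step size. Very small values of h eventually make the result worse because of floating-point rounding, which is one reason the sketch stops at 0.0001.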

Concept

In the past we have talked about a type of genetic expression algorithm called Multiple Expression Programming (MEP). This algorithm allows us to plug a set of numbers into a genome that contains an equation. The genomes adapt to a fitness that is determined by how closely the equation's output matches the correct answer. In the case of derivatives, we can use MEP to find the equation that most closely matches the slopes along a particular function. Let's take the case of our parabola. If we measure the approximate slope at 100 points along the curve of the parabola and then plug those points into our genomes, the genome representing the equation 2x will match the set of slopes most closely. Note that we only approximate the slopes of the tangents to the parabola by using successive point pairs that are relatively close together.

Below are the results of fitting the set of slopes of a parabola to a function after 102 generations. The result: (a + a) or 2a has the highest fitness (41825).

Figure 2 - Using a genetic algorithm to determine the derivative of x squared

Finding the derivative of the parabola didn't seem like much of a challenge, so we tried the GA on some harder functions. Below is the derivative determined for the cubic function (a³).

Figure 3 - GA determining derivative of the x³ function

Soon we added some trigonometric functions to our MEP genomes to make it even more interesting. Below is the derivative of sine + cosine after 102 generations:

Figure 4 - GA determining the derivative of sine + cosine

And here is the derivative of x * sin x after 102 generations. The result is correct: by the product rule, the derivative of a product uv is u(dv/dx) + v(du/dx), which for x * sin x gives x cos x + sin x.

Figure 5 - GA determining the derivative of a * sin a
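A result like this can be sanity-checked numerically. The sketch below is not part of the article's program; ProductRuleCheck, CentralDifference, and MaxError are hypothetical names introduced for illustration. It compares a claimed derivative against a central-difference estimate of the slope and reports the largest gap over a range of sample points (at least two samples are assumed).

```csharp
using System;

public static class ProductRuleCheck
{
    // Central-difference approximation of f'(x).
    public static double CentralDifference(Func<double, double> f, double x, double h = 1e-5)
    {
        return (f(x + h) - f(x - h)) / (2.0 * h);
    }

    // Largest absolute gap between a claimed derivative and the numeric
    // estimate over evenly spaced sample points in [start, end].
    // Assumes samples >= 2.
    public static double MaxError(Func<double, double> f, Func<double, double> claimed,
                                  double start, double end, int samples)
    {
        double worst = 0.0;
        for (int i = 0; i < samples; i++)
        {
            double x = start + (end - start) * i / (samples - 1);
            double error = Math.Abs(claimed(x) - CentralDifference(f, x));
            worst = Math.Max(worst, error);
        }
        return worst;
    }
}
```

For example, ProductRuleCheck.MaxError(x => x * Math.Sin(x), x => x * Math.Cos(x) + Math.Sin(x), -5, 5, 101) should come out very close to zero, while passing an incorrect candidate such as x => Math.Cos(x) leaves a clearly visible error.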

After trying trigonometric formulas for a while, we decided to bring the natural exponential function into the picture. Below is the result of the GA finding the derivative of the function e^(2x).

Figure 6 - GA determining the derivative of e^(2x)

The Code

This article takes advantage of code from our previous MEP article. There are only two differences in the code: 1) we needed to change the genomes slightly to encourage them to use trig and exponential functions, and 2) we needed to come up with a fitness function.

When the genome is randomly generated, it generates a number representing which function to use next in its sequence. Below is the C# code that determines which operation to use based on the number inside the genome. Note that numbers 0 - 3 are used to represent constants and variables in the genome and are not included in the switch statement.

Listing 1 - Method inside the EquationGenome for performing a piece of the sequence of operations on the genome

public float DoOperation(float a, float b, int operation)
{
    float result = 0;

    // determine which operation to perform next on the genome
    switch (operation)
    {
        case 4: // add
            result = a + b;
            break;
        case 5: // subtract
            result = (a - b);
            break;
        case 6: // multiply
            result = (a * b);
            break;
        case 7: // divide
            if (b == 0)
                result = 1000000.0f; // used to prevent divide by zero error
            else
                result = (a / b);
            break;
        case 8: // sine
            result = (float)Math.Sin((double)a);
            break;
        case 9: // tangent
            result = (float)Math.Tan((double)a);
            break;
        case 10: // cosine
            result = (float)Math.Cos((double)a);
            break;
        case 11: // exponential (e^a)
            result = (float)Math.Exp((double)a);
            // NaN never compares equal to anything, so test with IsNaN;
            // also catch overflow to infinity
            if (float.IsNaN(result) || float.IsInfinity(result))
                result = 1000000.0f;
            break;
        case 12: // the constant 3
            result = 3;
            break;
        default:
            break;
    } // end switch

    return result;
}
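One detail is worth keeping in mind when guarding floating-point results in C#. Under IEEE 754 rules, NaN never compares equal to any value, including itself, so a test written with == float.NaN is always false, and an overflowing Math.Exp produces infinity rather than NaN. The sketch below illustrates this with hypothetical helper names (NaNCheckDemo, Sanitize, EqualityDetectsNaN) that do not appear in the article's code; the fallback value simply mirrors the penalty constant used in Listing 1.

```csharp
using System;

public static class NaNCheckDemo
{
    // Replaces NaN or infinite results with a fallback penalty value.
    public static float Sanitize(float value, float fallback = 1000000.0f)
    {
        return (float.IsNaN(value) || float.IsInfinity(value)) ? fallback : value;
    }

    // Shows what the equality operator reports when asked whether a value
    // is NaN. Under IEEE 754 this is false for every input, NaN included.
    public static bool EqualityDetectsNaN(float value)
    {
        return value == float.NaN;
    }
}
```

With these helpers, NaNCheckDemo.EqualityDetectsNaN(float.NaN) returns false, while NaNCheckDemo.Sanitize((float)Math.Exp(1000)) returns the fallback because the exponential overflows to infinity.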

To determine how well a genome approximates the slopes of the function whose derivative we wish to find, we first need a set of slopes to compare against the genome. To compute this set, we simply loop through a set of points generated by the function and compute the slope between each pair of adjacent points.

Listing 2 - Getting the set of slopes from the function

static PointF[] slope_fx = null;
static PointF[] GetSlopeFunctionValues()
{
    if (slope_fx == null) // only needs to be done once
    {
        // get a set of points from the function
        // whose derivative we wish to determine
        PointF[] fx = GetPointsFromFunction();

        // get the slopes between all those points
        slope_fx = GetSlopes(fx);
    }

    return slope_fx;
}

static PointF[] slopes = null;
private static PointF[] GetSlopes(PointF[] fx)
{
    if (slopes == null)
    {
        // n points yield n - 1 slopes, so the last slot of this array
        // keeps its default value and is skipped by the fitness loop
        slopes = new PointF[fx.Length];

        // loop through each point of the function and use the
        // previous point to determine the next slope
        for (int i = 1; i < fx.Length; i++)
        {
            PointF previousPoint = fx[i - 1];
            PointF point = fx[i];

            // calculate the slope and store it at the midpoint of the pair
            float slope = (point.Y - previousPoint.Y) / (point.X - previousPoint.X);
            slopes[i - 1] = new PointF((point.X + previousPoint.X) / 2, slope);
        }
    }

    // return the set of slopes
    return slopes;
}
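The listing relies on GetPointsFromFunction, which is not shown in the article. As a rough, self-contained illustration of the whole sampling step, here is a sketch under stated assumptions: SlopeSampling, SamplePoints, and AdjacentSlopes are hypothetical names, value tuples stand in for PointF, and the sampled range and point count are up to the caller (at least two points are assumed).

```csharp
using System;

public static class SlopeSampling
{
    // Hypothetical stand-in for GetPointsFromFunction: samples f at
    // `count` evenly spaced x values in [start, end]. Assumes count >= 2.
    public static (double X, double Y)[] SamplePoints(
        Func<double, double> f, double start, double end, int count)
    {
        var points = new (double X, double Y)[count];
        for (int i = 0; i < count; i++)
        {
            double x = start + (end - start) * i / (count - 1);
            points[i] = (x, f(x));
        }
        return points;
    }

    // Slope between each adjacent pair of points, stored at the pair's
    // midpoint. n points produce n - 1 slopes.
    public static (double X, double Slope)[] AdjacentSlopes((double X, double Y)[] points)
    {
        var slopes = new (double X, double Slope)[points.Length - 1];
        for (int i = 1; i < points.Length; i++)
        {
            double rise = points[i].Y - points[i - 1].Y;
            double run = points[i].X - points[i - 1].X;
            slopes[i - 1] = ((points[i].X + points[i - 1].X) / 2.0, rise / run);
        }
        return slopes;
    }
}
```

Storing each slope at the midpoint of its pair is a sensible choice: for the parabola x², the slope between x1 and x2 is (x2² - x1²)/(x2 - x1) = x1 + x2, which is exactly 2x at the midpoint, so the sampled slopes line up with the true derivative.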

If the equation represented by a genome closely fits the set of slopes, we have found a fit genome. The CalculateDerivativeFitness method of our EquationGenome loops through each slope x value and runs it through the genome equation. The result is subtracted from the actual slope value, and the absolute value of this difference is accumulated. When we have gone through the entire set of slopes, we take the reciprocal of the accumulated sum. We take the reciprocal because a larger difference between actual and approximate values means a worse fitness: we want the fitness to be higher when the differences are smaller, so we invert the sum.

Listing 3 - Calculating the fitness of the Genome against the set of slopes

public double CalculateDerivativeFitness()
{
    // the slopes are only computed on the first call, since we only need
    // them calculated one time for a particular function
    GetSlopeFunctionValues();

    CurrentFitness = 0;

    // loop through each slope x value (the final array slot is unused)
    for (int i = 0; i < slope_fx.Length - 1; i++)
    {
        // perform this genome's equation on the next slope x value
        // to determine the approximate slope at this x value
        float calc = PerformCalculation(slope_fx[i].X);

        // get the actual slope determined in listing 2
        float measure = slope_fx[i].Y;

        // determine the difference and add it to the fitness
        CurrentFitness += Math.Abs(measure - calc);
    }

    // take the reciprocal of the total set of differences,
    // since a larger difference between approximated
    // and actual should be a worse fitness.
    CurrentFitness = 1 / CurrentFitness; // alternative: Math.Exp(-Math.Abs(CurrentFitness))

    return CurrentFitness;
}
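To see how the reciprocal ranks candidate equations, here is a small standalone sketch. It is not the article's EquationGenome code: FitnessSketch and Fitness are hypothetical names, candidates are plain delegates rather than genomes, and the epsilon term is an added assumption that keeps a perfect match from making the reciprocal infinite.

```csharp
using System;

public static class FitnessSketch
{
    // Reciprocal of the summed absolute error between a candidate equation
    // and the measured slopes. The epsilon (an assumption, not in the
    // article's code) avoids dividing by zero on a perfect match.
    public static double Fitness(Func<double, double> candidate,
        double[] xs, double[] measuredSlopes, double epsilon = 1e-9)
    {
        double totalError = 0.0;
        for (int i = 0; i < xs.Length; i++)
        {
            totalError += Math.Abs(measuredSlopes[i] - candidate(xs[i]));
        }
        return 1.0 / (totalError + epsilon);
    }
}
```

With xs = { 0.5, 1.5, 2.5 } and measured slopes { 1, 3, 5 } taken from the parabola, the candidate x => x + x has no error and therefore a very large fitness, while x => x * x accumulates an error of 2.75 and scores well below 1.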

Conclusion

This program illustrates another useful way genetic algorithms can be used to come up with solutions to mathematical problems through trial and error. We found that the GA worked very well for determining a few simple derivative equations, but it didn't always perform perfectly. When we tried to determine the derivatives of very complex polynomials, the GA seemed to have trouble. I suspect the GA can be made to focus better if the trig and exponential functions are removed when you are using the program on polynomials. Otherwise, the GA may go down the wrong path and get stuck in a local optimum or converge too quickly. Anyway, have fun playing with this demonstration. Perhaps you will derive a new use for genetic algorithms in the world of mathematics in C# and .NET.