Genetic Algorithm For Icon Generation in Visual Basic

This article provides some of the basics of genetic algorithms, including what they are, what they're good for, why we call them genetic, and so on. This provides both theory and sample implementations in Visual Basic .NET.

Scope

In this article, we'll see some of the basics of genetic algorithms, in other words what they are, what they're good for, why we call them "genetic", and so on. This won't just be theory. We'll see how a simple genetic algorithm could be implemented in Visual Basic .NET. Source code is provided to and further experimentation encouraged.

Introduction

A genetic algorithm could be seen as a component in the field of Artificial Intelligence. They are ultimately a search technique using a metaheuristic approach to attain the specific goal of searching for a solution to a problem. We use the term "genetic" because, in order to solve a problem, we will use procedures inspired by the process of natural selection. Darwin meets computing. Similar in some ways to the iterative development approach except with code altering itself. Finding an ideal solution to a problem in one step is very difficult. This approach creates code that can "evolve" towards our goal, finding it's way by itself (well, almost).

Our natural world

In his "Principles of Biology", Herbert Spencer coined the expression "survival of the fittest", as a synonym for "most well adapted to the current environment". Darwin, in his "Origin of Species" used Spencer's quote alongside the concept of "natural selection", that could be intended as "better designed for an immediate, local environment". In any environment, the environment itself sets the standards to determine who is "fit".

For example, in a place with strong predators, a fundamental requirement to survive could be the ability to run away with great speed, or fight back with proficiency. In such a world, a fast runner or a strong fighter will be candidates for longer/better survival than those unprovided with those abilities. We could say the world poses a question like "how to deal with those predators?" and those who can find a good solution to the riddle will be able to survive. (That is an over-simplification, but we aren't biologists, after all!)

We can identify some parameters: a world to live in (or "a problem to solve"), a population that will experience the world's challenges (or, individuals that will try to solve the problems posed by their natural conditions), a story regarding individuals that will pass on their experiences, their successful traits (and unsuccessful, too), their DNA, hoping their successors will grow better and better.

In computer science, we could get inspiration from biological approach to solve problems. Nature doesn't brute-force solutions, it creates a multitude of possible solutions through recombinations of previous answers. Again, it's an over-simplification because in the real world challenges modify themselves, too, but the basis of the approach are still valid in a digital world. Genetic Algorithms (GAs) inherit some biology terms. We could identify then either as chromosome or individual, as a single solution to a given problem or population, as the array of possible solutions, a gene or a part (byte) of an individual or as fitness, or the degree of adequacy of a certain solution or as crossover or in-breeding, as the recombinations of solutions into new ones or as mutation, or the random change that could occur in a solution.

Genetic algorithm overall

Let's consider a simple problem. We have a numeric target, let's say the number 5. This number can be seen as our world's challenge (or the solution domain). In our world we'll have inhabitants and let's say the population is composed of a certain amount of mathematical operations.

We take, for example, 100 random operations to begin. Each operation must confront itself with the challenge, trying to solve it. This match is called "fitness". Fitness could be defined, as we saw in the preceding regarding biological beings, as the rate of adequacy in a given solution domain.

For example, the operation "0 + 0", will have a low fitness, because it cannot produce a value near to our target, but "0 + 5" will be a perfect valid answer. We must proceed in evaluating every member of our population, calculating fitness for each of them. Using this process, we'll find the best (or "fittest") answers to our problems. At this point, we take some pairs of individuals amongst the best, "in-breeding" them to produce new elements that share characteristics from both parents and repeat the process for our entire population (if we wish to maintain a specific number of elements, at the end of the process we must discard a number of the worst elements that equals the number of new elements created).

A new generation of solutions is born, nearer to our target, we hope, awaiting to be tested against our problem. Rinse and repeat until valid solutions are born. Living beings pass on their DNA, their genetic code, to their sons and daughters. In computer science, everything is composed by bytes and we could consider them like a kind of digital DNA, a representation of what an individual is. In the preceding example, we could represent an operation using six bytes. The first and the last two to represent the numbers between 0 and 255 (from 00 to hexadecimal FF), whereas the two bytes between them can represent the operator (for example 00 = +, 01 = -, and so on.).

Through the breeding phase, we extract a part of those bytes from a parent and a part from the other and combining it we will have a new item. Like in a real environment, we could think of introducing a sort of "mutation", that can alter in some ways the genes passed on. In other words, GAs are about finding the best path to solve a specific problem, creating better and better solutions towards it. For further specific information, check the Bibliography at the end of the article.

A practical implementation

Words are good, but when it comes to programming, they are pretty cheap and a piece of code is worth a thousands words. So, let's see how to profit from GAs in a practical way. In the following code, we'll discuss a graphical problem. Given a target icon and starting from a population composed of images made by random bytes, trying to "evolve" our random pixels to the target, hopefully reaching it completely.

As I said in the opening, we'll use Visual Basic for this. The final user must be able to select an ICO file of 16x16 pixels and the program will generate 100 random icons. For each of those icons, we will evaluate their fitness. This will be trivial, since an icon is an array of bytes, we'll simply match our target bytes with those of each population's element. The more they are alike, the better will be our image. For each generation of solutions, I will extract 4 elements to inbreed, generating 4 new brand individuals, in which their parent's byte are reproduced, discarding the 4 worst elements to maintain a constant population of 100 elements.

Genetic image class

Our "genetic image" class can be written like this: 
  1. Public Class GAImage  
  2.    
  3.     Const _FIXED_BYTES As Integer = 42  
  4.    
  5.     Dim _targetBytes() As Byte  
  6.    
  7.     Dim _personalBytes() As Byte  
  8.     Dim _isMutable() As Boolean  
  9.    
  10.     Public ReadOnly Property FixedBytes As Integer  
  11.         Get  
  12.             Return _FIXED_BYTES  
  13.         End Get  
  14.     End Property  
  15.    
  16.     Public Property PersonalBytes As Byte()  
  17.         Get  
  18.             Return _personalBytes  
  19.         End Get  
  20.         Set(value As Byte())  
  21.             _personalBytes = value  
  22.         End Set  
  23.     End Property  
  24.    
  25.     Public ReadOnly Property Fitness As Single  
  26.         Get  
  27.             Dim _fitness As Single = 0  
  28.             For ii As Integer = 0 To _targetBytes.Length - 1  
  29.                 If _targetBytes(ii) = _personalBytes(ii) Then  
  30.                     _fitness += 1  
  31.                     _isMutable(ii) = False  
  32.                 End If  
  33.    
  34.             Next  
  35.             Return _fitness  
  36.         End Get  
  37.     End Property  
  38.    
  39.     Public ReadOnly Property FitnessPercent As Single  
  40.         Get  
  41.             Return (Me.Fitness * 100 / _targetBytes.Length).ToString("000.00")  
  42.         End Get  
  43.     End Property  
  44.    
  45.     Public Function isByteMutable(numByte As IntegerAs Boolean  
  46.         Return _isMutable(numByte)  
  47.     End Function  
  48.    
  49.     Public Sub New(_target() As Byte, _code() As ByteOptional randomInit As Boolean =True)  
  50.         ReDim _personalBytes(_target.Length - 1)  
  51.         ReDim _targetBytes(_target.Length - 1)  
  52.         _targetBytes = _target  
  53.    
  54.         ReDim _isMutable(_target.Length - 1)  
  55.         For ii As Integer = 0 To _isMutable.Length - 1  
  56.             _isMutable(ii) = True  
  57.         Next  
  58.    
  59.         For ii As Integer = 0 To _FIXED_BYTES  
  60.             _personalBytes(ii) = _targetBytes(ii)  
  61.         Next  
  62.    
  63.         If Not randomInit Then Exit Sub  
  64.    
  65.         Randomize()  
  66.         For ii As Integer = _FIXED_BYTES + 1 To _targetBytes.Length - 1  
  67.             _personalBytes(ii) = _code(Int(Rnd() * (_code.Length - 1)))  
  68.         Next  
  69.     End Sub  
  70.    
  71. End Class  
Its constructor asks for a copy of the target's bytes, a genetic pool to extract from and a flag to determine if a random initialization must take place (we'll see why in a few lines). The class exposes some properties. PersonalBytes allows access to the "genetic code" of our individual, returning an array of its bytes; Fitness calculates how much the current image and the target are alike.

Using the _isMutable array, we will specify whether a specific byte is resistant to mutation (miming a solid genetic acquired trait). FitnessPercent simply exposes the precious value in a percent format. FixedBytes has only technical utility. Since the first 42 bytes represent the icon's header, those must be mainlined from the target to the individuals, in order to produce a set of valid icons. In that class we have all it takes to produce full-fledged genetic icons.

We need an environment in which our play could take place. A simple form that observes how our generations change will do.

The playground

Let's have a look at the final outlook our form will have:



The button "Choose target" allows the user to choose an icon that will be the image we want to reach evolving our solutions. One hundred PictureBoxes are dynamically created to contain the output of each GAImage. Random GAImages are generated and we can see none of them has apparent correlation with our target (lots of chaos in them!). Still, evaluating them, we could identify the better samples and start our loop of breeding.

Let's see a typical execution, to discuss later each part of it:



I will skip entirely the parts related to the control's creation or use that the reader could see in the source code. What will be discussed here is related exclusively to the aspects of GA. Most of the code below will be incomplete for the sake of comprehension. I suggest to download the source code to have the most possible complete overview.

Create a start solutions set

In opening our target image, we could access its bytes. The bytes of the target image will be our ideal genetic code and each byte represents an element of the genetic pool each created element draws by for its personal DNA.
  1. Private Sub LoadImage(path As String)  
  2.     Dim b As New Icon(path)  
  3.     pbOriginal.Image = b.ToBitmap  
  4.    
  5.     Dim ms As New MemoryStream  
  6.     b.Save(ms)  
  7.     ReDim _targetBytes(ms.ToArray.Length - 1)  
  8.     _targetBytes = ms.ToArray  
  9.    
  10.     _GACode = New List(Of Byte)  
  11.     Dim _tmpCode As New List(Of Byte)  
  12.     _tmpCode.AddRange(_targetBytes)  
  13.     _GACode.AddRange(From bt As Byte In _tmpCode Distinct Select bt)  
  14. End Sub  
  1. Private Sub InitializePopulation()    
  2.   _GAPopulation = New List(Of GAImage)    
  3.     For ii As Integer = 0 To _SAMPLE_NUMBER - 1    
  4.         Dim _ga As New GAImage(_targetBytes, _GACode.ToArray)    
  5.         _GAPopulation.Add(_ga)    
  6.         Dim pb As PictureBox = CType(GroupBox3.Controls("pbSample" & ii.ToString("00")), PictureBox)       
  7.         Try    
  8.            pb.Image = Image.FromStream(New MemoryStream(_ga.PersonalBytes))       
  9.         Catch    
  10.         End Try    
  11.     Next    
  12. End Sub   
We populate a start list composed of GAImages, initializing them with the target byte array (which, as we saw, is useful for fitness calculations), and passing the pool from which our bytes could be drawn. Please refer to the GAImage code above for details on the initialization procedure.

The icon I've used in the preceding example is 318 bytes long. It's very unlikely that a member of the starting population will produce our final result.

It's necessary to loop through generations, combining the best items of each set.

A proposal for evolution

Let's see my proposal for a evolutive behavior. In my code, it takes place in the click event of the Button2 control. Non-GA relevant code is omitted (refer to the source code for it).
  1. Private Sub Button2_Click(sender As Object, e As EventArgs) Handles Button2.Click    
  2.     ' omission    
  3.     While _runFlag    
  4.         ' omission    
  5.         Dim bestSamples As IEnumerable(Of GAImage) = From ga As GAImage In _GAPopulation Order By ga.Fitness Descending Take (4)    
  6.          ' omission    
  7.         If bestSamples(0).FitnessPercent > 95 Then Button3_Click(sender, e)    
  8.         Dim sample01 As GAImage = GenerateImage(bestSamples(0), bestSamples(1))    
  9.         Dim sample02 As GAImage = GenerateImage(bestSamples(1), bestSamples(0))    
  10.         Dim sample03 As GAImage = GenerateImage(bestSamples(2), bestSamples(3))    
  11.        Dim sample04 As GAImage = GenerateImage(bestSamples(3), bestSamples(2))    
  12.         _GAPopulation.Insert(0, sample01)    
  13.         _GAPopulation.Insert(1, sample02)    
  14.         _GAPopulation.Insert(2, sample03)    
  15.         _GAPopulation.Insert(3, sample04)    
  16.         Dim worstSamples As IEnumerable(Of GAImage) = From ga As GAImage In _GAPopulation Order By ga.Fitness Ascending Take (4)    
  17.         _GAPopulation.Remove(worstSamples(0))    
  18.       _GAPopulation.Remove(worstSamples(1))    
  19.         _GAPopulation.Remove(worstSamples(2))    
  20.         _GAPopulation.Remove(worstSamples(3))    
  21.         ' omission    
  22.     End While    
  23. End Sub    
As simple as that, we loop undefinitely throughout the list of GAImages (or, at least until 95% of the result has been reached), each time taking the 4 fittest samples and crossing their bytes. We insert the new created individuals on top of our list (assuming they are the bests) and removing an equal number of elements judged the worst in terms of fitness. It is possible, if something "goes wrong" during the generation phase, that a new created element is worse than a pre-existing one. In that case, it is discarded anyway.
  1. Private Function GenerateImage(firstGA As GAImage, secondGA As GAImage) As GAImage         
  2.     Dim _ga As New GAImage(_targetBytes, _GACode.ToArray, False)    
  3.     For ii As Integer = _ga.FixedBytes To firstGA.PersonalBytes.Length - 1 Step 1    
  4.         _ga.PersonalBytes(ii) = IIf(secondGA.isByteMutable(ii), Mutation(secondGA.PersonalBytes(ii), 2), Mutation(secondGA.PersonalBytes(ii), 20))    
  5.     Next    
  6.     Return _ga    
  7. End Function   
The inbreeding is pretty simple. Each accessible byte from the first element is looped and the obtained byte is calculated as follows. If the same byte from the second element is mutation-resistant, we call on the mutation routine with a probability of 1/20. Otherwise, we use a 1/2 probability rate. 
  1. Private Function Mutation(bt As Byte, rate As Byte) As Byte       
  2.     Randomize()    
  3.     Dim prob As Integer = Int(Rnd() * rate)    
  4.     If prob Mod rate = 0 Then    
  5.         Return _GACode(Int(Rnd() * (_GACode.Count - 1)))    
  6.     Else    
  7.         Return bt    
  8.     End If    
  9. End Function    

Mutation() takes place with a probability, if the mutation rate catches up, the specific byte is randomly extracted from the standard pool. Otherwise, the second element byte will be reproduced in the offspring. The loop through each generation/population for a single generation will eventually evolve towards the final (and expected) result.

This is only one of the methods that can be used to implement evolutive algorithms: each coder is the judge in defining what are the parameters to be called into question and what are the general rules to solve a specific problem.

In our case, as long as it can bring valid results, other kinds of approaches are possible.

Sample video

A sample running session could be seen here: g6JebAjtWQ 

Download source code

Bibliography