Blue Theme Orange Theme Green Theme Red Theme
 
Ads by Lake Quincy Media
Home | Forums | Videos | Photos | Downloads | Blogs | Interviews | Jobs | Beginners | Training
 | Consulting  
Submit an Article Submit a Blog 
 Login Close
User Id:
Password:
 
Forgot Password
Forgot Username
Why Register
 Jump to
Skip Navigation Links
TechnologyExpand Technology
WebsiteExpand Website
 Resources  
Close
 Our Network  
Close
Search :       Advanced Search »
Home » Visual C# » Using a Genetic Algorithm to Do Consultant Scheduling in C#

Using a Genetic Algorithm to Do Consultant Scheduling in C#

This article describes a way to use a type of genetic algorithm called PBIL (Population Based Incremenetal Learning) to optimize the scheduling of consultants on a group of 5 project.

Author Rank:
Total page views :  15957
Total downloads :  606
   Print Read/Post comments Post a comment  Similar Articles  
   Email to a friend  Bookmark  Author's other articles  
Download Files:
gascheduler.ZIP
 
Become a Sponsor

Figure 1 - Complex Schedule

Introduction

Let me present you with the following problem.  Let's say I have a company of 32 consultants.  Suppose I have 5 consulting projects over a 3 month period and I want to rotate each of my consultants through all these projects in such a way so that they all end up with equal pay at the end of the project.  One project pays $500/day, one project pays $400/day, one project pays $300/day,  one pays $200/day and finally one pays $100/day.  To simplify the problem, one consultant can work the same day and earn the sum of the two projects, but there are only 5 slots you can occupy  in a day, one for each project.  Another words on any given day, you can have at the most 5 consultants working the 5 projects.

How would I go about solving this problem?  Well I had several thoughts on it at first.  You could some how round robin your consultants.  Choose the first 5 then the next 5 then the next 5 and then have then rotate there order.  This doesn't work out very easily though, because the number of consultants does not divide easily into the 3 month period.  You could just randomly assign them to all the slots and see if it works out.  Then just keep switching them around until it evens out.  This particular solution is not as easy to resolve as it seems.

So what is the easiest solution?  Genetic Algorithms.  Let the computer do the work through trial and error.  Each time the GA will try to fill a schedule, give a fitness to the result, and the highest fitness survives. (Ain't GA's grand!)  The beauty about genetic algorithms is that you never really care how the GA gets to the answer, you only care about how accurately you represent the fitness of the solution contained in each Genome.  Although this can be solution can take a long arduous time to complete, it works.  (Imagine when quantum computing comes into play, then the long arduous part goes away!)

In this article we will use a genetic algorithm called PBIL (Population based Learning)  to converge on the solution.  Population based learning maintains a learning vector and converges on a solution using this vector in conjunction with the fitness of the genomes.

Genome Representation

As is true with every genetic algorithm, the trick is to figure out how the genome can be a representation of a possible solution.  In the case of our problem, we want the genome to represent the schedule for the entire 3 months of our 5 projects.  Each gene in the genome is a consultant in a particular slot in the schedule.  Since we conveniently have 32 consultants, then we can represent each consultant as a 5 digit binary number from 00000b - 11111b.  

The schedule has 5 slots over a 3 month period.  There are 5 working days in a week and 4 weeks in a month, so the calculation of number of slots is

(3 months ) * (20 days/month) * (5 slots) = 300 slots.

The calculation of the number of bits in our genome is:

(5 bits/consultant)  * 300 slots =  1500 bits

Now we have a representation in the genome of the entire schedule containing a consultant for each slot of the schedule over a 3 month period.  For example, here is the start of a possible genome:

11000 | 00100 | 00000 | 10001 | 00111 |  ....

The string above represents the first 5 consultants in the first 5 slots of the 300 slot schedule.  Anotherwords, these are all the consultants for a possible genome on day 1.  If we translate to base 10, we see that this comes out to

24 ,  4 ,  0 , 17 , 7   

If we assigned each consultant an ID from 0-31  then consultant  with ID 24 would be filling slot #1 on day 1,  consultant 4 fills slot #2,  consultant 0 fills slot #3, consultant 17 fills slot #4 and consultant 7 fills slot #5. 

The part of the genetic algorithm that constructs the genome will continue to fill the slots all the way up until the end of the month to represent one genome or an entire schedule of consultants for the 3 month period.  Since we want them all to get paid equally, we need to come up with a fitness function that will find a genome that balances the pay of all consultants for the 3 month period.

Fitness Function

The easiest way to figure out what the fitness of the problem is to ask ourselves "What is it exactly we are trying to accomplish?".  What we want in this problem is for all consultants to get paid equally over the 3 month period.  That means that if we add up the total billing for the period of 3 months for a consultant and compare it to another consultant, the billing should be close to the same (give or take a few hundred bucks).  How can we do this comparison?  One way to do the comparison is to first add up the amount billed each individual consultant in a particular genome.  Then take the average billing for all consultants represented in the genome.  Then sum the standard deviations (the amount each consultants total billing varies from the average).  The larger the sum of the standard deviations, the worse the fitness.  Therefore we should take the reciprocal of the calculated sum in order to make a higher deviation a worse fitness. 

Using this method, as the fitness of the genes get higher and higher as we run our algorithm,  the consultants total billing over the 3 month period should begin to equalize.

The Code for Fitness

Following the code for the fitness function, it performs just as we explained in the section above.  First the function loops through the 1500 bit genome and calculates the total amount each consultant makes in the 3 month project period.  Next it computes the average billing among all of the consultants.  Finally it computes the standard deviation for each consultant and sums all of the standard deviations together.  Finally it computes the fitness by taking the reciprocal value of the sum of the standard deviations.  For very large standard deviation sums, our fitness will be close to zero.  If we have a perfect fitness (all consultants getting paid the same exact amount) then our fitness should be close to infinity.

Listing 1 - Fitness function for Computing the Optimal Billing Schedule

double CalculateFullScheduleFitness()
{
 
float fitness = 0;                        // total fitness value for this genome
 
float standardDevSum = 0.0f; // standard deviation
 
float[] billing = new float[32]; // total billing for each consultant

int slotCount = 0; // used to keep track of the current slot # in the genome

// go through all slots in the month and calculate billing for each consultant
for (int i = 0; i < LENGTH; i += NUMBER_SIZE)
 {
  
// convert the binary part of the genome to an 5 bit integer
   
int nextConsultant = FormNumber(i, NUMBER_SIZE);
   billing[nextConsultant] += _slotValues[slotCount];
  
  
// keep a running total of the slots
   slotCount = (slotCount + 1) % NUMBER_OF_SLOTS;
 }

// Now the fitness is based on how close the billing is for all consultants
 // so we need to go through each consultants billing,
// and compute a standard deviation
// the smaller the sum of the deviations, the higher the fitness
// compute the average billing

  float totalBilling = 0.0f;
  float averageBilling = 0.0f;
  foreach (float nextBilling in billing)
   {
      totalBilling += nextBilling;
    }

   averageBilling = totalBilling / billing.Length;

// sum the standard deviations for each consultant
foreach (float nextBilling in billing)
  {
   
float sqrtOfDeviation = (nextBilling - averageBilling);
    standardDevSum += (sqrtOfDeviation * sqrtOfDeviation);
   }

  // the higher the sum of the deviations, the worse the fitness
  // take the reciprocal (add a small value to prevent a divide by 0
  // exception)

  fitness += 1.0f / (standardDevSum + .0001f);
 
return (double)fitness;
}

Results

After running the PBIL algorithm for 1000 generations with our fitness function we get the following results as shown in figure 2.  As we can see, the total billing is beginning to equalize.  All total billing for the 3 months are at least $1600 and at most $4400 for each of our 32 consultants.

Figure 2 - Generation 1000 of the PBIL algorithm

By the 14000th generation (figure 3) the total billing is even more equalized with the lowest paid consultant being paid a total of $2200 and the highest paying consultant being paid a total of $3400.

Figure 3 - Generation 14000 of the PBIL Algorithm

Finally at around generation 18000 (figure 4),  the total billing has pretty much equalized with everyone being paid around $2800. (There are a few lucky consultants who managed to squeak out an extra $100).  Note that figure 4 also shows the scheduling for all of the consultants.  Consultants 6, 4, 31, 25, and 27 fill the first 5 slots on day 1. Consultants 14, 22, 4, 11, and 25 fill the next 5 slots on day 2, and so on.  Note, the fitness function doesn't take into account that a consultant can work 2 slots on the same day or any distribution of time for each consultant.  For example, on day 3, consultant 3 is working two slots.  To solve this problem, you probably can easily add something to the program that swaps two consultants in the same slot,  if one of them is working two different slots on the same day after the PBIL algorithm has finished converging.

Figure 4 - Generation 18000 of the PBIL Algorithm (Convergence)

Conclusion

In the past we have examined how to use genetic algorithms to search for solutions to problems such as creating sudoku puzzles, electronic logic design, and playing mastermind.  In this article we discuss how to solve a scheduling problem with genetic algorithms that might normally be very difficult to solve otherwise. Perhaps you can use this solution to administrating some of your own difficult day to day activities.  Anyway, don't leave your consultants hanging in a unequal billing cycle,  get them organized with the help of C# and .NET.


Login to add your contents and source code to this article
 About the author
 
Mike Gold
Michael Gold is President of Microgold Software Inc., makers of the WithClass UML Tool. His company is a Microsoft VBA Partner and Borland Partner. Mike is a Microsoft MVP and founding member of C# Corner. He has a BSEE and MEng EE from Cornell University and has consulted for Chase Manhattan Bank, JP Morgan, Merrill Lynch, and Charles Schwab. Currently he is a senior developer at Finisar Corp. He has been involved in several .NET book projects, and is currently working on a book for using .NET with embedded systems. He can be reached at mike@c-sharpcorner.com
Looking for C# Consulting?
C# Consulting is founded in 2002 by the founders of C# Corner. Unlike a traditional consulting company, our consultants are well-known experts in .NET and many of them are MVPs, authors, and trainers. We specialize in Microsoft .NET development and utilize Agile Development and Extreme Programming practices to provide fast pace quick turnaround results. Our software development model is a mix of Agile Development, traditional SDLC, and Waterfall models.
Click here to learn more about C# Consulting.
 
Introducing MaxV - one click. infinite control. Hyper-V Hosting from MaximumASP.
Finally – a virtual platform that delivers next-generation Windows Server 2008 Hyper-V virtualization technology from a managed hosting partner you can truly depend on. Visit www.maximumasp.com/max for a FREE 30 day trial. Hurry offer ends soon. Climb aboard the MaxV platform and take advantage of High Availability, Intelligent Monitoring, Recurrent Backups, and Scalability – with no hassle or hidden fees. As a managed hosting partner focused solely on Microsoft technologies since 2000, MaximumASP is uniquely qualified to provide the superior support that our business is built on. Unparalleled expertise with Microsoft technologies lead to working directly with Microsoft as first to offer IIS 7 and SQL 2008 betas in a hosted environment; partnering in the Go Live Program for Hyper-V; and product co-launches built on WS 2008 with Hyper-V technology.
Dynamic PDF
ceTE software specializes in components for dynamic PDF generation and manipulation. The DynamicPDF™ product line allows you to dynamically generate PDF documents, merge PDF documents and new content to existing PDF documents from within your applications.
Go.NET
Build custom interactive diagrams, network, workflow editors, flowcharts, or software design tools. Includes many predefined kinds of nodes, links, and basic shapes. Supports layers, scrolling, zooming, selection, drag-and-drop, clipboard, in-place editing, tooltips, grids, printing, overview window, palette. 100% implemented in C# as a managed .NET Control. Document/View/Tool architecture with many properties&events. Optional automatic layout.
Dundas Software
Dundas Chart for .NET is the most advanced .NET charting package available today.  With an extremely complete feature set, elegant architecture and easy implementation, Dundas Chart can quickly add advanced Charting functionality to enhance and transform ASP.NET and Windows Forms applications.  Whether you are implementing charting into internal projects, or building applications for clients, Dundas Chart offers advanced technology and advanced results to get the most out of data.
Clickatell's SMS Gateway
Clickatell's Developer Solutions allow you to SMS enable any website or application via a range of API's. Learn More about our API connections.
Free access to .NET Memory Management video
Everything you need to know about Garbage Collection, Temporary Objects, Fragmentation, Finalization and common causes of memory leaks in .NET. Watch the video here.
Microsoft Visual Studio 2010 Professional
Microsoft Visual Studio 2010 Professional will launch on April 12, but you can beat the rush and secure your copy today by pre-ordering at the affordable estimated retail price of $549 (US). Pre-order now.
Nevron Chart for .NET 2010.1 Now Available
The leading .NET charting control now features PDF, Flash and Silverlight export, visualization of large datasets and more. Deliver true charting functionality to your BI, Scorecard, Presentation or Scientific apps. Download evaluation now.
Developer-Ready ASP.NET 2.0 Web Hosting with 3 MONTHS FREE
Now supporting .NET 3.0 Framework with Windows Workflow Foundation, Windows Communication Foundation (WCF), Windows Presentation Foundation (WPF), windows CardSpace (WCS)! Providing more flexibility for Developers with Web Services Support and a User/Permission Manger. Also supporting MS SQL 2005/2000 with Real-Time Backups, FREE Automated Attach .MDF Tool, FREE SQL Restore and Shrink SQL DB Tools, and SQL
 
   Print Read/Post comments Post a comment  Similar Articles  
   Email to a friend  Bookmark  Author's other articles  
Download Files:
gascheduler.ZIP
 
 Post a Feedback, Comment, or Question about this article
Subject:  
Comment:  
Become a Sponsor
 Comments
Neat stuff! How do you... by jon On April 19, 2007
Very cool... how would you modify the fitness function - based upon consultants requesting specific projects... say each consultant picks thier first to n choice of jobs to work on. thanks!
Reply | Email | Delete | Modify | 
Neat stuff! How can I... by Babut On April 26, 2007

How can I solve (using GA) the problem of coloring the nodes of a graph (with less than 100 nodes), any two adiacent nodes having a different color?

You're the best!

Thank You!

 

Reply | Email | Delete | Modify | 
Problem running the code by Hannah On April 20, 2007
Hi.. When I run the code.. I do not get the above out... No generations are displayed.. just .5,.5.... Any ideas? Thanks again... cool stuff!
Reply | Email | Delete | Modify | 
Problem running the code by jj On May 31, 2007
Hi I have the same problem like hannah, the particular output is a density of o.5,0.5 .............. Need your help thanks
Reply | Email | Delete | Modify | 
Re: Problem running the code by Mike On November 18, 2007

It takes a while for the next generation to come up, and depending upon your CPU speed it could be a while, but eventually the densities should change.

Best,

-Mike

Reply | Email | Delete | Modify | 
Re: Re: Problem running the code by Orion On August 28, 2009
Something seems a little fishy here.  1) the genome/fitness function here aren't that complicated, it shouldn't take long at all for it to step through a new generation (RE: the comment above).  How long does it take for your 18000 generations (and what are your computer specs?).

The other thing that seems off is how long it is taking to reach your solution.  Granted, since a GA (Genetic Algorithm) is a random (but intelligently random) process, you are not actually guaranteed the "best" solution, so it is understandable that it could take excessively long to get as close as you did.

But I'm am concerned about the 14,000 generation.  You did mean 14,000 there, right, not 1400?  I did an off-the-cuff coding of a GA and my own encoding for this same problem.  I use 30 consultants, 300 slots over the course of 3 months.  $100, $200, $300, $400, and $500, same as above.  I don't know what your population size is, but I chose 100. 

It took only 366 generations to get an RMSD (root mean square deviation) of $268 between the earnings of each consultant.  At 14000 generations, you have an RMSD of $273.  And based on the behavior I was seeing in my generations I suspect I have a bug or two, so I think 366 generations is even a bit high.

I'm not directly familiar with PBIL - but just with GA's in general.  What sort of population size do you have?  Rate of mutation? Crossover events?  What do you do to avoid having all the members of your population converged to a single genome?


Reply | Email | Delete | Modify | 
Re: Re: Re: Problem running the code by Mark On August 31, 2009
I have the arduous task of writing a scheduling program. It has been over 10 years since I last worked on project involving a GA. Do you have any thoughts as to how I might get back to where I need to be for this project? I noticed that you posted your comments in the last few days.
Reply | Email | Delete | Modify | 
Re: Re: Re: Re: Problem running the code by Orion On September 2, 2009
I don't claim to be an expert, I just observed different behavior than that presented in the article.

It seems to me the two most important factors of a Genetic Algorithm are 1) Sharing of "genomic" data between elements in the population, and 2) Random variation/mutation for bumping your results out of local min/maxs, 3) An appropriate fitness function

#1 above is addressed through what are called "crossover events" - 2 members of the population swap out only parts of their solution.  For a schedule this might mean 2 people trading assignments for the day.  For better results, you can enforce a meaningful cross over - which is to say you dont allow 2 people with the day off to "switch", because this is meaningless. 

#2 lets you get your solutions out of local min/max solutions.  You might converged on a solution that is decent, but not the best available.  However, without introducing random mutations into your data set, your population will quickly converge to a highly similar population, at which point cross-over events between these members accomplish very little, because they are so similar

#3 is perhaps the most difficult part of the problem.  Generally speaking, you convert the "fitness" of a member of the population to an easily evaluated integer or float value.  0-1.  0-infinity (int32.max).  You then specify whether your ideal solution is close or near that ideal value of 0, or of 1.  For the schedule in the above article, you might for example specify that your fitness of a single member of the population is RMSD (root mean square deviation) from the average $ made by a consultant over the 60 days.  Therefore, the closer the RMSD is to 0, the more similar the wage is earned by each consultant. 

It is important to note that what you define as a "population" and a member in that population is entirely up to you.  The above article is a bit tricky, because you might be inclined to think that each consultant is a member of the population defined as all consultants.  But that would be incorrect.  You have to think of a member of your population as a complete solution to your problem.  In that case, a member is actually all 30 consultants and their entire schedules.  And the population is the set of different solutions, not different consultants. 

There are a handful of attempts at "generic" genetic algorithm libraries out there.  The difficulty of a generic GA is that you can often get *much* better results by customizing your approach as it pertains to your specific problem.  For example, a generic GA might require you to encode your "genome" (your solution, which is evaluated for fitness) into bits.  Staight up 0 or 1s.  And then their cross over events and mutation events will operate purely on this bit level, without paying attention to higher order details of the problem.

This can lead to problems though, for many domain spaces.  For example, in this article randomly flipping a bit would might assign a consultant to a job that is already being attended by another consultant.  Or assigning them to two jobs on the same day.  And you end up with invalid solutions. 

One might argue that this is because of a bad encoding - that there is a way to encode the problem into bits that allows for truly random operation while maintaining semantically correct solutions.  But I find that in some cases, it makes more sense to have a more semantic description of your genome, which might mean using a higher order construct (such as a struct or a class).  The important part comes in how you choose to mutate them and do crossovers.  If you maintain higher order constructs in your model, it is easier to obey the rules of those constructs during these events - such as not having 2 jobs on the same day. 

There are trade offs here.  By using higher order constructs, you can get some tunnel vision.  One of the neat things about GA's is they might provide a solution you didn't intend.  But by using higher order constructs you've limited the way the GA can explore the possible solution space.  If you keep things generic, you might get a unique solution, but you lose the benefits of applying sensible heuristics to guide the answer to a result quicker.

One last tip - don't hard-code population sizes, mutation rates, or crossover event rates.  A very sensible GA implementation will fail if you have have a population size too small.  Or it might take eons to step through a generation if it is too large.  If your mutation rate is too small you'll easily get stuck in "decent but not great" solutions, but if it is too large you'll have trouble converging or find that your population is losing viable-solutions to fast to become integrated into the population. 
Reply | Email | Delete | Modify | 
Is this GA? by Bleu On January 23, 2010
hey bro.. u r not using population, crossover, mutation and generations..
so is ur program is still consider as GA?   
Reply | Email | Delete | Modify | 
Re: Is this GA? by Mike On January 30, 2010
TO be more technically accurate, this is not a genetic algorithm, however, a genetic algorithm could easily be substituted in this article with the same fitness function.  This is a learning algorithm, so the title should have probably been "Using a Learning Algorithm to Balance a Consultant Schedule"
Reply | Email | Delete | Modify | 

 Hosted by MaximumASP  |  Found a broken link?  |  Contact Us  |  Terms & conditions  |  Privacy Policy  |  Site Map  |  Suggest an Idea  |  Media Kit
Current Version: 5.2009.6.2
 © 2010  contents copyright of their authors. Rest everything copyright Mindcracker. All rights reserved.