A Dictionary Class Which Permits Duplicate keys : Part II

Introduction

In my previous article on this topic, I demonstrated how we can create a dictionary class (a 'Lexicon' class) that can have duplicate keys. This differs from the dictionary classes in the .NET Framework itself which do not permit duplicates.

I mentioned in the Conclusion to that article, that I was working on a number of adjuncts to the Lexicon class which I am now presenting in this article.

The OrderedLexicon class

This is a Lexicon where the keys are sorted but the corresponding lists of values are not. The values remain in the order they were originally added (unless the order has been changed since) and so I decided on the name of OrderedLexicon for this class.

OrderedLexicon<K, V> is a wrapper for a SortedDictionary<K, List<V>> and contains mostly the same members as the Lexicon<K, V> class. However, as the SortedDictionary class doesn't implement as many interfaces, or contain as many constructors, as the generic Dictionary class then the OrderedLexicon class reflects these omissions.

The main constructors it does contain are listed in the following table with a brief description of what they do :

Dictionary class

The Comparer property returns an IComparer rather than an IEqualityComparer as we are now concerned with the sort order of the keys and not just their equality.

The SortedLexicon class

This is a Lexicon in which both the keys and the corresponding lists of values are sorted. A different IComparer can be used for each or you can use the default comparers for the types in question.

Again, SortedLexicon<K, V> is a wrapper for a SortedDictionary<K, List< V>> and, apart from a few minor differences, contains the same members as the OrderedLexicon<K, V> class. In particular, the constructors are exactly the same.

The Comparer property has been renamed KeyComparer in this class and there's also a ValueComparer property which has a (private) valueComparer backing field.

All the methods and properties of the SortedLexicon class ensure that the sort order of the value lists is maintained when items are added, inserted, removed, or changed.

However, because the value lists are not copied when they are accessed, it is possible for the user to 'manually' change the order of the items in the list and so it is not therefore feasible for this class to guarantee that the values will always emerge in sorted order.

The alternative would have been to always copy or always re-sort the lists before they are accessed but I felt that the cost of this would have been far too expensive in memory and/or processing time to solve what is a relatively minor problem.

If you do want to manipulate the lists manually, you should therefore either ensure that the correct order of values in maintained or call this additional method to do it for you:

Dictionary class in .net

Lexicons where duplicate keys are relatively sparse

As mentioned in the previous article the Lexicon class, as currently implemented, is not ideal for dealing with situations where there are far less duplicate keys than single keys. This is because the value for a single key still has to be wrapped within a list whose internal array is given a capacity of four when the first element is added. This inevitably means that a Lexicon will use a lot more memory than a Dictionary would for single keys, the only type of key the latter can accommodate.

I have therefore been considering alternative ways in which the Lexicon could be implemented to avoid or reduce the impact of this problem.

One way would to modify the key so that, instead of just the key itself, a tuple consisting of both the key and the index of a particular value would be used. There would then only be a single value for each modified key. However, for this approach to be remotely efficient, you need to maintain a second dictionary so you know how many values there are for each key. Even then you still need 'n + 1' dictionary lookups to get all 'n' values for a given key whereas only one lookup is needed for the Lexicon. The 'housekeeping' for this approach, when values are inserted or deleted, is also more onerous.

As it's not very clear how much memory the 'tuple' approach would save at the expense of more processing time, I decided not to pursue it further.

A better idea would be store only the duplicate keys in lists. To do that you would need two internal dictionaries: one for single values and one for duplicate values. There would still be additional 'housekeeping' needed to move keys between the dictionaries as values are added or removed but I felt this was manageable.

The problems came when I tried to implement the 'two dictionaries' approach. It simply wasn't possible to do it in a natural, unified way. As you have no idea in advance whether a key will have duplicate values, it is necessary to check this first and then construct a list consisting of all the values. You also need different methods depending on whether you are returning a single value or a list. Moreover, this class doesn't sit well with the existing dictionary and lexicon classes; it is neither one nor the other and so can't sensibly implement either the generic IDictionary or ILexicon interfaces.

On balance, I think it is therefore best to leave it to the user to make any necessary separation between single and duplicates keys. In many situations, you simply won't know in advance whether the duplicate keys will be sparse or not and it is therefore better to load the data into a Lexicon initially. If the duplicate keys are then sparse, you can leave them in the Lexicon but move as many single keys as possible out to a separate Dictionary to save memory.

To assist with this, I've added two new members to all three Lexicon classes:

.net Dictionary class

Lexicons with empty lists of values

The initial implementation of the Lexicon class didn't deal too gracefully with these by assuming they were always errors and throwing exceptions in some circumstances. In practice, an empty list might arise temporarily if the user is manipulating lists directly though otherwise it probably is an error.

It is not, in any case, feasible to prevent empty lists from arising when a user is manipulating the lists directly because (as discussed in the SortedLexicon section) the cost of copying the lists before they are accessed would be just too high.

I have therefore 'tightened up' the code of all the Lexicon classes to ensure that empty lists cannot arise if you confine yourself to the methods and properties of those classes. Otherwise empty lists are regarded as legitimate and shouldn't cause any more problems than non-empty lists.

The following methods have been added to reduce the need for direct manipulations and to remove empty lists which may have been forgotten about:

Dictionaryclass4.gif

Notice also, that if a key does not exist in the Lexicon, the GetValueCount method now returns -1 rather than 0. This is to distinguish it from an existing empty key which still returns 0.

ToLexicon() extension method

For those using .NET 3.5 or later, I've added a ToLexicon() extension method for use in LINQ queries. This is analogous to the Enumerable.ToDictionary() extension method in the .NET framework except that it does not throw an exception if multiple values are generated for the same key.

As in the case of ToDictionary(), there are four overloads of this method.

Example of usage

The code for all three Lexicon classes, the ILexicon interface (unchanged since the previous article) and the ToLexicon() extension method can be downloaded from the link accompanying this article as it is far too long to include in the body of the article itself. All these types are included in a namespace called Lexicons and, apart from the extension method, can be used in .NET 2.0 or later.

The following console application contrasts the three Lexicon classes and also contains sample code for the new members:

using System;
using
System.Collections;
using
System.Collections.Generic;
using
Lexicons;

class City
{
    private string name;
    private string country;

    public string Name { get { return name; } }
    public string Country { get { return country; } }

    public City(string name, string country)
    {
        this.name = name;
        this.country = country;
    }
}

class
Program

{
    static void Main(string[] args)
    {
        Console.WriteLine("Unordered\n");
        Lexicon<string, int> lex = new Lexicon<string, int>();
        lex.Add("Dave", 6);
        lex.Add("John", 1);
        lex.Add("Dave", 3);
        lex.Add("Stan", 4);
        lex.Add("Dave", 2);
        lex.Add("Fred", 5);

        foreach (var kvp in lex)
        {
            Console.WriteLine("{0} : {1}", kvp.Key, kvp.Value);
        }

        Console.WriteLine("\nOrdered by key\n");

        OrderedLexicon<string, int> olex = new OrderedLexicon<string, int>(lex);
        foreach (var kvp in olex)
        {
            Console.WriteLine("{0} : {1}", kvp.Key, kvp.Value);
        }

        Console.WriteLine("\nOrdered by key, then by value\n");

        SortedLexicon<string, int> slex = new SortedLexicon<string, int>(lex);
        foreach (var kvp in slex)
        {
            Console.WriteLine("{0} : {1}", kvp.Key, kvp.Value);
        }

        Console.WriteLine();

       
// create an empty key
        slex["John"].Clear();

       
// mess up the order of another key's values
        slex["Dave"].Insert(0, 7);

       
// analyze keys
        int m, s, e;
        int t = slex.AnalyzeKeys(out m, out s, out e);
        Console.WriteLine("\nMultiple {0}, Single {1}, Empty {2}\n", m, s, e);

        // remove empty keys
        slex.RemoveEmptyKeys();

       
// transfer single keys to a Dictionary
        Dictionary<string, int> dict = new Dictionary<string, int>();
        slex.MoveSingleKeys(dict);

        Console.WriteLine("\nOrdered by key, then by value, after some removals\n")

        foreach (var kvp in slex)

        {

              Console.WriteLine("{0} : {1}", kvp.Key, kvp.Value);
        }

        // reorder "Dave"

        slex.SortValueList("Dave");

        Console.WriteLine("\nOrdered by key, then by value, after some removals and resorting\n");

        foreach (var kvp in slex)
        {
            Console.WriteLine("{0} : {1}", kvp.Key, kvp.Value);
        }

       
// create a list of City objects
        List<City> cities = new List<City>();
        cities.Add(new City("Delhi", "India"));
        cities.Add(new City("Madrid", "Spain"));
        cities.Add(new City("New York", "U.S.A"));
        cities.Add(new City("Mumbai", "India"));

       
// create a new lexicon from the list of cities
        Lexicon<string, string> lex2 = cities.ToLexicon(c => c.Country, c => c.Name)
 

        Console.WriteLine("\nNew lexicon created from list of cities\n");

        foreach (var kvp in lex2)
        {
            Console.WriteLine("{0} : {1}", kvp.Key, kvp.Value);
        }
        Console.ReadKey();
    }
}


Example output

A screenshot of the output is shown below:

Dictionaryclass.gif
Conclusion

As mentioned in the Conclusion to the original article, the Lexicon classes are not intended to be replacements for the generic Dictionary and SortedDictionary classes and should only be used in situations where keys may be duplicated.

I hope you will find them to be useful in these circumstances.