An Algorithm for Grouping Data On One Or More Fields Using A DataReader in .Net


After searching the internet for way too long on this subject, I decided I was pretty much on my own with this. What I was trying to do was to group data coming from a SQL Server data source on one or more fields, but the catch is that I must do this with a data reader. It's true, with SQL Server this could be accomplished on the backend and at first glance, doing so might appear to have been easier. That however, would violate the principles near and dear to my heart of data abstraction and the principle of doing only CRUD operations with a data source. For me, this had to be done in the business object code and I really didn't want the headache of long and complicated algorithms chock full of if...then...else statements and everything else. Further, I knew all I needed was a datareader and not the more expensive datatable, since datatables are populated by a datareader in the first place, to use more than that I felt would be extravagant.

The actual problem I was facing was that I was getting what could potentially be a large number of records, and each record would then be used to instantiate a struct which would then be added to a collection of like objects/structs. Further, the ID's of these structs might possibly be in sequence (a pre condition is that the data is sorted on this field, and then by any other relevant fields), but then again they may not. This collection is what ultimately gets bound to a datagrid for display to the user. When the ID's were in sequence, I did not want to list each one separately, instead I wanted a summary, something to the effect like "objects 1 to n", "object n+1", "objects n+2 to m" and so forth. This is saying that if I had n objects, I did not want to display all n objects, rather 1 object whose ID field (has to be a string for display) summarizes the range involved.

The basic approach then is to take the first record from the datareader. This will be the basis for our first business object against which we can make comparisons a little further on. We name this object startObj. With that out of the way, we now start looping through the rest of the records in an effort to "look ahead" for a series where the ID's are in sequence. With the next record (from our loop) we instantiate our second business object, here we name it endObj. Now, we can compare startObj to endObj to see if there is a "run" between the two, meaning that their respective ID's are sequential. To do so, we use a helper function titled "isInSequence". If there is a run found by this function, since we are using a datareader, how would we know if the current record or in this case, the object based on the current record, here endObj, is the last in the run? If startObj has ID=1, and endObj has ID=2, how can we know if the next record from our data reader loop will give us an object with ID=3, or 4? Or 100 even? We can't know at this point in the algorithm. Instead, we keep track with a simple count variable, here named ct. Once the run has been found to be terminated (a record comes from our datareader and we base our endObj object on that record, and we find that it is out of sequence when compared to startObj), we can use the value of the ct variable to see just how many ID's were in sequence, or how long the run lasted. We use the ct variable as a parameter when we call our helper function titled "AddCase" since that function will need to interrogate the value of ct variable and determine whether or not a "run" exists. If so it will take the proper actions to "summarize" that run and we will add a "summary" object to our list and so avoid listing each object individually. Next step is to reset our startObj by setting it equal to our endObj and reset the ct variable to 0. Remember, by now the endObj object is the first object AFTER the run in the previous objects (records, really) and at this point in the code we are about to loop through our datareader again. If we did not set startObj = endObj here, we would lose the reference to the current object as represented by endObj and it would be gone forever! Here is the code below:

    cn.Open()
            tr = cn.BeginTransaction
            dr = GetDataReader()
            If dr.Read() Then 'Get the first record upon which to base the starting obj.Id Number
                startObj = SomeObject.CreateObject(dr.Item(1), dr.Item(2)) 'Create 1st obj (factory method)
                While dr.Read() 'look ahead and if a run is found, keep track of it using ct (as integer)
                    endObj = SomeObject.CreateObject(dr.Item(1), dr.Item(2)) 'Create second obj, now we can

                                                                              compare the two!
                    If IsInSequence(startObj, endObj, ct) Then ' a run is found. Important to pass ct as a
                                                               ' parameter                       
ct += 1 ' count how far this run goes
                    Else
                        AddCase(startObj, ct) 'there is either no run or the run was ended, which is why we

                                                pass the var ct in this fxn call
                        startObj = endObj 'the important part is to start over in our search for
runs
                        ct = 0 'part of starting over
                    End If
                End While
                AddCase(startObj, ct) 'Add the last object
 
                dr.Close()
                tr.Commit()
            End If
    Private Function IsInSequence(ByRef startObj As SomeObject, ByRef endObj As SomeObject, ByVal runCount As Integer) As Boolean
        Dim obj(2) As Integer 'Object ID Numbers as integers
 
        If (startObj.Fieldx = endObj.Fieldx) = False Then Return False

        obj(0) = Integer.parse(startObj.ID)
        obj(1) = Integer.parse (endObj.ID)
'we need to use runCount, even if it is 0, since the startObj.ID value may differ from endObj.ID by more 'than 1, but by runCount + 1
        Return (obj (0) + 1 + runCount) = obj (1)                                                                                                        End Function
    Private Sub AddCase(ByRef myObj As SomeObject, ByVal runCount As Integer)
        If runCount = 0 Then
            InnerList.Add(myObj)
            Exit Sub
        End If
        'if we even get here we know there was a run so we need to summarize the data
        Dim summarized As String 'to summarize the data
        Dim nObj As Integer 'The actual obj Number at the end of the 'run' we know exists as
runCount > 0
        Const sep As String = "..."
 
        nObj = Integer.Parse(myObj.Id) + runCount 'Convert to int and add runCount to determine the last               
                                                  ' number in sequence
        summarized = myObj.Id + sep + nCase.ToString 'summarize the run
        myObj.Id = summarized 'myObj.Id is a string field
        InnerList.Add(myObj)
 
    End Sub

I couldn't find this algorithm myself through the '100's of thousands of crap results originally returned by the major search engines!!!