CopySchema - Part 2

CopySchema is a utility for copying all the data from one Oracle schema to another. Although similar to the exp/imp and bulk load utilities of Oracle, it has the advantage of offering a simple GUI interface and of handling changes to the schema.

Introduction

CopySchema is a utility for copying all the data from one Oracle schema to another.

Detail documentation is mostly in the code. This document is rather to explain the design as a whole, and to offer
suggestions about how to get started at familiarizing yourself with the code.

Aside from a simple C++ COM utility that does the bulk copying, the main code areas are

  • The Job classes, each of which defines a step in the process
  • The ThreadController, which initializes and controls all the jobs
  • The Worker class, which runs each worker thread
  • The DbTable classes, which implement database operations at the table level
  • The DbSchema, which implements database operations at the schema level

Sequence / Program Flow

The ThreadController and Worker classes control all the program flow. The ThreadController.Start() function creates the two first jobs (getting definitions from the two schemas) and starts the threads. The Worker classes run in a loop, getting jobs from the ThreadController, executing the job, and then returning the result to the ThreadController.

The following may be of interest:

Table 1. FormMain class  

Function Notes
A pseudo-singleton, in that a static variable stores the address of the last (and only) instance of the class. Classes can access the form using its Instance() static function
btnStart_Click ()      Saves and sets up all the settings required to start the work, then calls ThreadController.Start()

Table 2. ThreadController class   

Function Notes
Singleton, thread-safe
Start()

Begin the copy process
   1.
Set up the 2 first jobs (read schemas)
   2.
Create the appropriate number of threads
     Start each thread (the threads will pick up the pending jobs on their own)

GetNextJob()  Simple function that pops off the next job on the queue. If no jobs are available, it returns a JobWait
ReturnJob()  Decides what to do after any particular job finishes

Table 3. Worker class

Function Notes
Holds a reference to its own private logger (FormLog) and its own private CopyTable COM server
Start() Although it looks complicated because of the error handling, the function essentially does the following:
(while !quit)
{ job.DoWork;
Controller.ReturnJob(job);
job = Controller.GetNextJob();
}

Areas of Functionality

N.B.: the 'read schema' is the schema from which the data is copied from, and the 'write schema' is the destination.
Similarly, the 'read tables' and 'write tables' are the source and destination for the data, respectively.

With the exception of the DbField class (which is a simple structure and does not do any database access), all classes that start with the prefix 'Db' are classes that contain code to access the databases. There is no database access in any other class.

DbSchema

The DbSchema class contains the list of tables for that schema, along with functions to retrieve that information from the database. In addition, there are functions to modify the entire schema (i.e. all tables or sequences within the schema) and to compare the two schemas.

Figure 1. DbSchema Class Members

DbSchema.jpg

N.B.: underlined names are static members, a '+' signifies a public member and '#' a protected member.

DbTables

The DbTables have members to hold the definition of a given table. The DbTable parent is an abstract class that has all the functions and members that are shared by both the read and write tables.

The way that functions are split into the classes may seem a little confusing at first, because most of the functionality that might be associated with the DbSelectTable is actually shared, and thus ends up in the parent class. The DbSelectTable actually contains very little code.

There are only two polymorphic functions:

Table 4. DbTable Polymorphic Functions

 

Function Notes
DbGetDefinition( Gets the list and details of all the columns of the table. This function is overridden only in DbInsertTable, since it needs to retrieve supplementary information (this definition all that is required for the DbSelectTables)
OleDbConnectionString ()      The 'read' and 'write' tables each access their own database

Figure 2. DbTable Members and Derived Classes

DbTables.jpg

Job Classes

The job classes are all in the same 'Job.cs' file (rather than having 12 separate files). As described earlier, the DoWork
function is called by the worker thread and returns true if the work was successfully completed, false otherwise. If any job fails, the user is asked if he/she wants to continue or not.

The DoWork() function is the only virtual member. Job, JobSchema and JobBothSchemas are all abstract classes. JobNothing is an emergency class, used in error handling.

Figure 3. Job Members and Derived Classes

Jobs.jpg 

ILog Interface

The ILog interface is used to write data to the client windows. Currently, the only implementation of ILog is in
FormWorkLog.

Importantly, since the workers each have their own threads and thus exist outside of the main GUI thread, care must be taken when updating the text in the listboxes. Although the .Net runtime allows updates to the GUI from any thread, sometimes there will be synchronization problems.

The AppendLine call, in particular, would result in out-of-order appends, so the code uses delegates to synchronize with the main thread.

CopyTable COM Server

For robustness, CopyTable gets created as a single-instance out-of-process server, meaning that each C# worker thread gets its own separate exe / server process.

Here are the public members:

Table 5. CopyTable Public Members

Member Description
Cancel() Cancels the current copy
CopyTable()

Main function - copies the entire contents of a table, and fires off the 'RowsCopied' event each x rows

RowCount() Simple property to get the current row count (not used by the client at this time)
RowsCopied()         Event fired back to C# client each x rows

OTL, which by the way is a fantastic interface for accessing Oracle, is based entirely on the C++ streams concept. Here is a code sample to read the data from one table into another (assume that test_tab1 and test_tab2 each have two fields 'id' and 'comment'):

otl_stream in(50, "SELECT * FROM test_tab1 WHERE id >= :f<int>", db);
otl_stream
out(50, "INSERT INTO test_tab2 VALUES(:f1<int>, :f2<char[31]>)", db);
// Query parameters are written in the format :anyname<datatype[length]>
int id;
char comment[31];
in << 8; //pass in the parameter for the input stream
//n.b.: the SELECT automatically executes when all input variables are assigned
while(!i.eof())
{
in >> id >> comment; //use the stream to push the data to the buffers
out << id << comment; //similarly, push the data back into the write stream
}
//n.b.: there is never an explicit getRow() or writeRow()
// OTL automatically calls getRow() / writeRow() once the stream's fields
// have all been accessed

Luckily, although OTL offers explicit methods to do batch writes, even when data is written row by row the writes end up being batched behind the scenes. As a result, even though the COM server uses the simplest access methods, it still offers the full performance of batch writes.

The CopyTable() function has a set of buffers (one for each data type); the client passes in an array defining the data type for each field; for each field, a switch statement pushes the data to the correct buffer and then pushes that data back onto the write stream.

N.B.: currently, there is only support for integers, real numbers, char and date. Integers that are higher than 4 billion
should use the float datatype (the C# code already takes care of this). The maximum character field width is 10,000
characters. If you have wider VARCHARs, increase the CHAR_BUFF_SIZE constant in DbTable.cpp and recompile the COM server.

The DataType enum is defined in both the C# and C++ code (out of laziness). If you wish to add any new datatypes, you need to update both.

Notes

Although I explicitly close each database connection, I do not do so within try / catch blocks. Since this is a short-lived application that does not typically generate any exceptions, this has worked well and makes the code more readable.

FYI, a safer way is:

void MyFunction()
{
con.Open();
Try
{
...do work

}
Catch {}
Finally
{
con.Close();
con.Dispose();
}
}

Or, in some cases:

void MyFunction()
{
try
{
con.Open();
Try
{
...do work

}
Catch {}
Finally
{
con.Close();
con.Dispose();
}
}
catch (Exception ex)
{
...handle exception that was generated
while opening the connection
}
}

Method used in CopySchema:

void MyFunction()
{
con.Open();
...do work
con.Close();
}