CopySchema - Part 2

Introduction

 
CopySchema is a utility for copying all the data from one Oracle schema to another.
 
Detail documentation is mostly in the code. This document is rather to explain the design as a whole and to offer suggestions about how to get started at familiarizing yourself with the code.
 
Aside from a simple C++ COM utility that does the bulk copying, the main code areas are
  • The Job classes, each of which defines a step in the process
  • The ThreadController, which initializes and controls all the jobs
  • The Worker class, which runs each worker thread
  • The DbTable classes, which implement database operations at the table level
  • The DbSchema, which implements database operations at the schema level

Sequence / Program Flow

 
The ThreadController and Worker classes control all the program flow. The ThreadController.Start() function creates the two first jobs (getting definitions from the two schemas) and starts the threads. The Worker classes run in a loop, getting jobs from the ThreadController, executing the job, and then returning the result to the ThreadController.
 
The following may be of interest:
 
Table 1. FormMain class  
Function Notes
 
A pseudo-singleton, in that a static variable stores the address of the last (and only) instance of the class. Classes can access the form using its Instance() static function
btnStart_Click ()      Saves and sets up all the settings required to start the work, then calls ThreadController.Start()
 
Table 2. ThreadController class   
Function Notes
 
Singleton, thread-safe
Start()
Begin the copy process
 
   1.Set up the 2 first jobs (read schemas)
 
   2.Create the appropriate number of threads
 
     Start each thread (the threads will pick up the pending jobs on their own)
GetNextJob()  Simple function that pops off the next job in the queue. If no jobs are available, it returns a JobWait
 
ReturnJob()  Decides what to do after any particular job finishes
 
Table 3. Worker class
Function Notes
 
Holds a reference to its own private logger (FormLog) and its own private CopyTable COM server
Start() Although it looks complicated because of the error handling, the function essentially does the following:
 
(while !quit)
 
{ job.DoWork;
 
Controller.ReturnJob(job);
 
job = Controller.GetNextJob();
 
}
 
Areas of Functionality
 
N.B.: the 'read schema' is the schema from which the data is copied, and the 'write schema' is the destination.
 
Similarly, the 'read tables' and 'write tables' are the source and destination for the data, respectively.
 
With the exception of the DbField class (which is a simple structure and does not do any database access), all classes that start with the prefix 'Db' are classes that contain code to access the databases. There is no database access in any other class.
 

DbSchema

 
The DbSchema class contains the list of tables for that schema, along with functions to retrieve that information from the database. In addition, there are functions to modify the entire schema (i.e. all tables or sequences within the schema) and to compare the two schemas.
 
Figure 1. DbSchema Class Members
 
DbSchema.jpg
 
N.B.: underlined names are static members, a '+' signifies a public member and '#' a protected member.
 

DbTables

 
The DbTables have members to hold the definition of a given table. The DbTable parent is an abstract class that has all the functions and members that are shared by both the read and write tables.
 
The way that functions are split into the classes may seem a little confusing at first, because most of the functionality that might be associated with the DbSelectTable is actually shared, and thus ends up in the parent class. The DbSelectTable actually contains very little code.
 
There are only two polymorphic functions:
 
Table 4. DbTable Polymorphic Functions
 
Function Notes
DbGetDefinition( Gets the list and details of all the columns of the table. This function is overridden only in DbInsertTable, since it needs to retrieve supplementary information (this definition all that is required for the DbSelectTables)
OleDbConnectionString ()      The 'read' and 'write' tables each access their own database
 
Figure 2. DbTable Members and Derived Classes
 
DbTables.jpg
 

Job Classes

 
The job classes are all in the same 'Job.cs' file (rather than having 12 separate files). As described earlier, the DoWork function is called by the worker thread and returns true if the work was successfully completed, false otherwise. If any job fails, the user is asked if he/she wants to continue or not.
 
The DoWork() function is the only virtual member. Job, JobSchema, and JobBothSchemas are all abstract classes. JobNothing is an emergency class, used in error handling.
 
Figure 3. Job Members and Derived Classes
 
Jobs.jpg
 

ILog Interface

 
The ILog interface is used to write data to the client windows. Currently, the only implementation of ILog is in FormWorkLog.
 
Importantly, since the workers each have their own threads and thus exist outside of the main GUI thread, care must be taken when updating the text in the listboxes. Although the .Net runtime allows updates to the GUI from any thread, sometimes there will be synchronization problems.
 
The AppendLine call, in particular, would result in out-of-order appends, so the code uses delegates to synchronize with the main thread.
 

CopyTable COM Server

 
For robustness, CopyTable gets created as a single-instance out-of-process server, meaning that each C# worker thread gets its own separate exe / server process.
Here are the public members:
 
Table 5. CopyTable Public Members
Member Description
Cancel() Cancels the current copy
CopyTable()
Main function - copies the entire contents of a table, and fires off the 'RowsCopied' event each x rows
RowCount() Simple property to get the current row count (not used by the client at this time)
RowsCopied()         Event fired back to C# client each x rows
 
OTL, which by the way is a fantastic interface for accessing Oracle, is based entirely on the concept of the C++ stream. Here is a code sample to read the data from one table into another (assume that test_tab1 and test_tab2 each have two fields 'id' and 'comment'):
  1. otl_stream in (50, "SELECT * FROM test_tab1 WHERE id >= :f<int>", db);  
  2. otl_stream out(50, "INSERT INTO test_tab2 VALUES(:f1<int>, :f2<char[31]>)", db);  
  3. // Query parameters are written in the format :anyname<datatype[length]>  
  4. int id;  
  5. char comment[31]; in << 8; //pass in the parameter for the input stream  
  6. //n.b.: the SELECT automatically executes when all input variables are assigned  
  7. while (!i.eof()) { in >> id >> comment; //use the stream to push the data to the buffers  
  8.     out << id << comment; //similarly, push the data back into the write stream  
  9. }  
  10. //n.b.: there is never an explicit getRow() or writeRow()  
  11. // OTL automatically calls getRow() / writeRow() once the stream's fields  
  12. // have all been accessed 
Luckily, although OTL offers explicit methods to do batch writes, even when data is written row by row the writes end up being batched behind the scenes. As a result, even though the COM server uses the simplest access methods, it still offers the full performance of batch writes.
 
The CopyTable() function has a set of buffers (one for each data type); the client passes in an array defining the data type for each field; for each field, a switch statement pushes the data to the correct buffer and then pushes that data back onto the write stream.
 
N.B.: currently, there is only support for integers, real numbers, char, and date. Integers that are higher than 4 billion should use the float datatype (the C# code already takes care of this). The maximum character field width is 10,000 characters. If you have wider VARCHARs, increase the CHAR_BUFF_SIZE constant in DbTable.cpp and recompile the COM server
 
The DataType enum is defined in both the C# and C++ code (out of laziness). If you wish to add any new datatypes, you need to update both.
 
Notes
Although I explicitly close each database connection, I do not do so within try/catch blocks. Since this is a short-lived application that does not typically generate any exceptions, this has worked well and makes the code more readable.
 
FYI, a safer way is:
  1. void MyFunction() {  
  2.     con.Open();  
  3.     Try {  
  4.         ...do work  
  5.     }  
  6.     Catch {}  
  7.     Finally {  
  8.         con.Close();  
  9.         con.Dispose();  
  10.     }  
Or, in some cases:
  1. void MyFunction() {  
  2.     try {  
  3.         con.Open();  
  4.         Try {  
  5.             ...do work  
  6.         }  
  7.         Catch {}  
  8.         Finally {  
  9.             con.Close();  
  10.             con.Dispose();  
  11.         }  
  12.     } catch (Exception ex) {  
  13.         ...handle exception that was generated  
  14.         while opening the connection  
  15.     }  
Method used in CopySchema:
  1. void MyFunction() {  
  2.     con.Open();  
  3.     ...do work  
  4.     con.Close();  


Similar Articles