Automatic Reading And Verification Of Microsoft Word Document In C# Using Aspose.Words

Introduction

Student Course Registration and Verification (SCRV) is a tool that automatically reads and corrects Roll Number slips stored as Microsoft Word documents. SCRV takes the Roll Number slips as input, parses each document, and compares its contents with an Oracle database. If a slip needs correcting, the document is updated, and the result is saved at a different location.

  • Tools Used: Visual Studio 2005, C#.

Prerequisites

Make sure that the following components and tools are installed on the machine.

  1. Microsoft Visual Studio 2005.
  2. Aspose.Words component installed.

Adding Reference to Aspose.Words Library

Create a simple C# Windows project using the built-in wizard of Microsoft Visual Studio 2005, and give it an appropriate, meaningful name, such as SCRV (Student Course Registration and Verification). Next, add a reference to the Aspose.Words library so that it is available to the project for opening, editing, and manipulating Microsoft Word documents. The steps are as follows.

Right-click the References node in the Solution Explorer window of the project and select the Add Reference option.

Select the appropriate Aspose.Words version from the dialog. Press OK to proceed further.

The library is now available for use in the project. Add the required Aspose.Words namespaces to the source file so that the classes can be used without writing their fully qualified names.

using System.IO; // Needed for Directory.GetFiles when reading files in a batch
using Aspose.Words;
using Aspose.Words.Drawing;
using Aspose.Words.Reporting;
using Aspose.Words.Viewer;
namespace SCRV
{
    // Your code here
}

Reading Microsoft Word Documents

Reading Microsoft Word documents is straightforward with the Aspose.Words library. The following code segment reads and loads the Microsoft Word (*.doc) files in a folder as a batch.

// Reading all the Microsoft Word (*.doc) files in the folder
string[] files = Directory.GetFiles("Path", "*.doc");
// Loop through all files to process them one by one
for (int i = 0; i < files.Length; i++)
{
    // Reading the Microsoft Word file using Aspose.Words library
    Aspose.Words.Document doc = new Aspose.Words.Document(files[i]);
}
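A folder of Roll Number slips may also contain files that should not be processed, such as other file types or the hidden lock files (names starting with ~$) that Word creates while a document is open. The following is a minimal sketch, using only System.IO, of one way to narrow the batch down before the files are handed to Aspose.Words. The WordFileFinder class and FindWordFiles method are illustrative names, not part of the original tool.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

static class WordFileFinder
{
    // Returns the .doc files in a folder, skipping Word's "~$" lock files.
    // Hypothetical helper for illustration only.
    public static List<string> FindWordFiles(string folder)
    {
        List<string> result = new List<string>();
        foreach (string path in Directory.GetFiles(folder))
        {
            string name = Path.GetFileName(path);
            // Compare the extension exactly, ignoring case (.doc, .DOC)
            bool isDoc = string.Equals(Path.GetExtension(path), ".doc",
                                       StringComparison.OrdinalIgnoreCase);
            // Word names its temporary lock files "~$<document name>"
            bool isLockFile = name.StartsWith("~$", StringComparison.Ordinal);
            if (isDoc && !isLockFile)
            {
                result.Add(path);
            }
        }
        // Sort so that the batch is processed in a predictable order
        result.Sort(StringComparer.OrdinalIgnoreCase);
        return result;
    }
}
```

The returned list can then be used in place of the files array in the loop, for example foreach (string file in WordFileFinder.FindWordFiles("Path")).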

Accessing Properties of Microsoft Word document

Once the file is loaded, we need to change its properties. For example, the author of the document is changed to the name of the examination center. The following code segment shows how to access and change the built-in properties of a Microsoft Word document in C#.

// Change the Author of the Document
doc.BuiltInDocumentProperties.Author = "Examination Center";
// Change the Category of the document
doc.BuiltInDocumentProperties.Category = "New category";
// Add the comments
doc.BuiltInDocumentProperties.Comments = "Comments";
// Add the University name
doc.BuiltInDocumentProperties.Company = "University name";
// Add the Subject
doc.BuiltInDocumentProperties.Subject = "Subject";
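// Change the version of the document as required
// (depending on the Aspose.Words release, Version may be an integer rather than a string; check the type in your version)
doc.BuiltInDocumentProperties.Version = "3.0";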

Parsing Microsoft Word Document

The Microsoft Word document format is complex. A Word document consists of Sections, Pages, Paragraphs, Tables, Bookmarks, Headers, and Footers. There are many ways to access these elements using Aspose.Words, so the idea is to use the simplest and most efficient way to reach the contents (referred to as Nodes).

Accessing Sections of Microsoft Word Document

A Word document has one or more sections, and the Aspose.Words library lets us navigate through them. In our case the document has only one section, so we can get it as the first section of the document using the following code.

// Load the Microsoft Word document from a file
Aspose.Words.Document doc = new Aspose.Words.Document(filename);
// Get the first Section of the document
Section firstSection = doc.FirstSection;

Accessing Header and Footer Of Microsoft Word Document

After we get the section from the document, we have to change its header and footer. Each section has its own headers and footers, and the first page, even pages, and odd pages can each have different ones. The content of the headers and footers can be read and changed using the following code.

// Load the Microsoft Word document from a file
Aspose.Words.Document doc = new Aspose.Words.Document(filename);
// Get the first and only Section of the document
Section firstSection = doc.FirstSection;
// Loop through all the Headers and Footers of the Section
foreach (HeaderFooter headerFooter in firstSection.HeadersFooters)
{
    // If it's the header
    if (headerFooter.IsHeader)
    {
        // Modify the Header here if required
    }
}

Accessing Tables of Microsoft Word Document

Aspose.Words provides access to every element of the document through properties. The Tables property is used to read and modify the content of the tables.

The Student Roll Number slip has two tables. The first table contains the personal data of the student. The second table contains the courses the student has registered for. These are the courses our tool has to check and verify.

We can get the second table, at index 1, from the Section using the following code.

// Get the table of the document from the first section at index 1
Aspose.Words.Table courseTable = firstSection.Body.Tables[1];

Once we have the right table, we need to access each of its rows. The Rows property of the Aspose.Words.Table class is used to read and modify the rows.

// Loop through all the rows of the table one by one
foreach (Aspose.Words.Row myRow in courseTable.Rows)
{
    /*
     * Code for the manipulation of the Rows
     */
}

Similarly, the Aspose.Words.Cell class provides the functionality to read and parse the contents of a cell. Each Row has a collection of Cells, and each Cell can be accessed by its index.

We need to verify the Course Code and Course Title in that table. The code below reads them from the first two cells of each row (indexes 0 and 1); adjust the indexes if the slip has a different column layout. The course code and title are then checked against the database, and the content of the cells is updated if required.

// Get the second table of the document from the first section
Aspose.Words.Table courseTable = firstSection.Body.Tables[1];
// Loop through all the rows of the table one by one
foreach (Aspose.Words.Row myRow in courseTable.Rows)
{
    // Getting the Cells with Course code
    Aspose.Words.Cell courseCode = myRow.Cells[0]; // Assuming the first cell (index 0) contains course code
    // Getting the Cells with Course Title
    Aspose.Words.Cell courseTitle = myRow.Cells[1]; // Assuming the second cell (index 1) contains course title
    // Check if the information is correct
    if (!CheckValidity(courseCode.GetText(), courseTitle.GetText()))
    {
        // Modify the courseCode and courseTitle with correct information from the database.
    }
}
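The CheckValidity method is not shown in the listing above. The following is a minimal sketch of what such a check might look like, not the tool's actual implementation. It assumes the course data has already been loaded from the Oracle database into an in-memory dictionary; the CourseCatalog class, its method names, and any sample data are hypothetical. It also strips control characters, because the text returned by Cell.GetText() includes the end-of-cell control character (U+0007), which would otherwise make every comparison fail.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical stand-in for the database lookup, for illustration only.
class CourseCatalog
{
    // Course code -> official course title
    private readonly Dictionary<string, string> titles =
        new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);

    public void Add(string code, string title)
    {
        titles[Normalize(code)] = Normalize(title);
    }

    // Replaces control characters (such as the U+0007 end-of-cell mark)
    // with spaces and trims surrounding whitespace from raw cell text.
    public static string Normalize(string rawCellText)
    {
        if (rawCellText == null)
        {
            return string.Empty;
        }
        StringBuilder sb = new StringBuilder();
        foreach (char c in rawCellText)
        {
            sb.Append(char.IsControl(c) ? ' ' : c);
        }
        return sb.ToString().Trim();
    }

    // True when the code exists and the title matches the catalog entry.
    public bool CheckValidity(string rawCode, string rawTitle)
    {
        string expectedTitle;
        if (!titles.TryGetValue(Normalize(rawCode), out expectedTitle))
        {
            return false;
        }
        return string.Equals(expectedTitle, Normalize(rawTitle),
                             StringComparison.OrdinalIgnoreCase);
    }
}
```

With a catalog filled this way, the check in the loop could be written as catalog.CheckValidity(courseCode.GetText(), courseTitle.GetText()). The comparison here ignores case; a stricter tool might treat a difference in capitalization as an error to correct.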

Saving the Word Document

Finally, after all required modifications, we need to save the document. The Save method of the Aspose.Words.Document class is used to save the document to the specified file.

// Save the Document at the location specified by outputFilename
doc.Save(outputFilename);
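Because the corrected slip is saved at a different location from the original, the output file name has to be derived from the input file name. The following is a minimal sketch of one way to do that with System.IO. The OutputPaths class, the BuildOutputPath method, and the folder used in the usage note are illustrative assumptions rather than part of the original tool.

```csharp
using System;
using System.IO;

// Hypothetical helper for illustration only.
static class OutputPaths
{
    // Maps an input file to a file of the same name inside outputFolder,
    // creating the folder if it does not exist yet.
    public static string BuildOutputPath(string inputFile, string outputFolder)
    {
        if (string.IsNullOrEmpty(inputFile))
        {
            throw new ArgumentException("Input file path is required.", "inputFile");
        }
        if (string.IsNullOrEmpty(outputFolder))
        {
            throw new ArgumentException("Output folder is required.", "outputFolder");
        }
        // CreateDirectory does nothing if the folder already exists
        Directory.CreateDirectory(outputFolder);
        return Path.Combine(outputFolder, Path.GetFileName(inputFile));
    }
}
```

Inside the batch loop, the result could be passed straight to Save, for example doc.Save(OutputPaths.BuildOutputPath(files[i], "OutputPath")). Keeping the output folder separate from the input folder means the original slips are never overwritten.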

Screen Shot of the Application (SCRV)

[Screenshot: SCRV application]

Conclusion

Using Aspose.Words, it is very simple to create, read, and modify Microsoft Word documents in C#. This is only one case where we have automated our system. Aspose.Words gives the programmer full control of the Microsoft Word document.
