How to Convert DocX Into PDF, HTML and XPS in C#

Introduction

I am a .NET programmer, and I use DocX to create Word documents that contain all the elements I need, such as hyperlinks, tables, shapes, headers and footers.

Later, however, I needed to convert these documents into printable formats such as PDF, HTML and XPS.

I searched the internet and found that the author of DocX recommends Microsoft's Office interop libraries for this requirement. In our development environment, however, we don't want to install Microsoft Word or Office on our machines. Finally I found the free Spire.Doc, a standalone .NET Word API, to finish this task.

Application Overview

First, I use DocX to create an empty Word document, insert a paragraph with a hyperlink, rich text and a table, and then save the document. Second, I use Spire.Doc to load the generated Word document and convert it to other popular file formats, such as PDF, HTML, image, TXT, EPub and XPS.

To show how to create a Word document in DocX, I have created a sample document and here is how it looks.

Word Document

First, we need to generate a Word document using DocX, which is an open source API to work with Word 2007/2010 files.

Namespace to be used:

    using System;
    using System.Drawing;   // FontFamily, used when setting the title font
    using Novacode;

DocX makes creating and manipulating documents simple. The following snippet creates a Word document with a hyperlink and a table:

    Console.WriteLine("\tHyperlinksImagesTables()");

    // Create a document.
    using (DocX document = DocX.Create(@"Sample.docx"))
    {
        // Add a hyperlink into the document.
        Novacode.Hyperlink link = document.AddHyperlink("link", new Uri("http://www.google.com"));

        // Add a 2 x 2 table into the document and fill its cells.
        Novacode.Table table = document.AddTable(2, 2);
        // Alternative design: table.Design = TableDesign.ColorfulGridAccent2;
        table.Design = TableDesign.ColorfulList;
        table.Alignment = Alignment.center;
        table.Rows[0].Cells[0].Paragraphs[0].Append("1");
        table.Rows[0].Cells[1].Paragraphs[0].Append("2");
        table.Rows[1].Cells[0].Paragraphs[0].Append("3");
        table.Rows[1].Cells[1].Paragraphs[0].Append("4");
        // Insert a new row based on the second row.
        Row newRow = table.InsertRow(table.Rows[1]);

        // Insert a centered title paragraph into the document.
        Paragraph title = document.InsertParagraph().Append("DocX word document").FontSize(20).Font(new FontFamily("Arial"));
        title.Alignment = Alignment.center;

        // Insert a new paragraph and append content to it.
        Paragraph p1 = document.InsertParagraph();
        p1.AppendLine("Here is a URL ").AppendHyperlink(link).Append(".");
        p1.AppendLine("This line contains a ").Append("word").Bold().Append(" in bold.");
        p1.AppendLine();
        p1.AppendLine("Here is a Table");
        p1.AppendLine();

        // Insert the table after paragraph 1.
        p1.InsertTableAfterSelf(table);

        // Save this document.
        document.Save();
    }

The next step is to convert the Word document into a PDF. DocX has no conversion feature, so you need another component: Free Spire.Doc, which is free of charge and does not require Microsoft Word to be installed on the machine.

Namespace to be used:

    using Spire.Doc;

Converting with Spire.Doc is very easy. You only need to call the LoadFromFile method to load the file and then SaveToFile to save the document in PDF format. The code is as follows:

    // Load the document generated by DocX.
    Document doc = new Document();
    doc.LoadFromFile("Sample.docx");
    // Save it as a PDF.
    doc.SaveToFile("toPDF.PDF", FileFormat.PDF);

The generated PDF looks like this:

 

PDF Document

Besides converting a Word document into a PDF, the free Spire.Doc also supports converting Word documents into images, HTML, TXT, XPS, XML, EPub and other formats.
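
As a rough sketch of what that looks like for the HTML and XPS formats named in the title, the same LoadFromFile and SaveToFile pair can be reused with a different FileFormat value. The class name, the Convert helper and the output file names below are my own placeholders, and the FileFormat members (Html, XPS, Txt, EPub) are the ones I understand the Spire.Doc enum to provide, so check them against the version you have installed:

```csharp
using System;
using Spire.Doc;

// Hypothetical helper class, not part of DocX or Spire.Doc.
public static class OtherFormatsSketch
{
    // Loads a Word file and saves it in the requested format.
    // A fresh Document is used per conversion to keep each call independent.
    public static void Convert(string inputPath, string outputPath, FileFormat format)
    {
        Document doc = new Document();
        doc.LoadFromFile(inputPath);
        doc.SaveToFile(outputPath, format);
    }

    public static void Main()
    {
        // Sample.docx is the file created by the DocX step above.
        Convert("Sample.docx", "toHTML.html", FileFormat.Html);
        Convert("Sample.docx", "toXPS.xps", FileFormat.XPS);
        Convert("Sample.docx", "toTXT.txt", FileFormat.Txt);
        Convert("Sample.docx", "toEPub.epub", FileFormat.EPub);

        Console.WriteLine("Converted Sample.docx to HTML, XPS, TXT and EPub.");
    }
}
```

Only the FileFormat argument and the output file name change from one format to the next, so the PDF conversion shown earlier fits the same helper.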

Code View

The full code is as follows:

    using System;
    using System.Drawing;   // FontFamily
    using Novacode;         // DocX
    using Spire.Doc;        // Document, FileFormat

    class Program
    {
        static void Main(string[] args)
        {
            Console.WriteLine("\tHyperlinksImagesTables()");

            // Create a document.
            using (DocX document = DocX.Create(@"Sample.docx"))
            {
                // Add a hyperlink into the document.
                Novacode.Hyperlink link = document.AddHyperlink("link", new Uri("http://www.google.com"));

                // Add a 2 x 2 table into the document and fill its cells.
                Novacode.Table table = document.AddTable(2, 2);
                // Alternative design: table.Design = TableDesign.ColorfulGridAccent2;
                table.Design = TableDesign.ColorfulList;
                table.Alignment = Alignment.center;
                table.Rows[0].Cells[0].Paragraphs[0].Append("1");
                table.Rows[0].Cells[1].Paragraphs[0].Append("2");
                table.Rows[1].Cells[0].Paragraphs[0].Append("3");
                table.Rows[1].Cells[1].Paragraphs[0].Append("4");
                // Insert a new row based on the second row.
                Row newRow = table.InsertRow(table.Rows[1]);

                // Insert a centered title paragraph into the document.
                Paragraph title = document.InsertParagraph().Append("DocX word document").FontSize(20).Font(new FontFamily("Arial"));
                title.Alignment = Alignment.center;

                // Insert a new paragraph and append content to it.
                Paragraph p1 = document.InsertParagraph();
                p1.AppendLine("Here is a URL ").AppendHyperlink(link).Append(".");
                p1.AppendLine("This line contains a ").Append("word").Bold().Append(" in bold.");
                p1.AppendLine();
                p1.AppendLine("Here is a Table");
                p1.AppendLine();

                // Insert the table after paragraph 1.
                p1.InsertTableAfterSelf(table);

                // Save this document.
                document.Save();
            }

            // Convert the saved Word document to PDF with Spire.Doc.
            // This runs after the using block so the DocX document is already disposed.
            Document doc = new Document();
            doc.LoadFromFile("Sample.docx");
            doc.SaveToFile("toPDF.PDF", FileFormat.PDF);

            Console.WriteLine("\tCreated: Sample.docx and toPDF.PDF\n");
        }
    }