Converting Word Documents to PDF Using Python

lu liu

Converting Word documents to PDF format is one of the most common tasks in document processing workflows. PDF files offer universal compatibility, consistent formatting across devices, and enhanced security features, making them ideal for sharing, archiving, and printing documents.

This article demonstrates how to use Python and the Spire.Doc library to convert Word documents (.docx) to PDF format with various customization options, from basic conversion to advanced features like bookmark creation and font embedding.

Why Convert Word to PDF?

Converting Word documents to PDF addresses several practical needs:

Consistent formatting: PDF documents maintain their appearance across different devices and operating systems
Professional presentation: PDFs provide a polished, print-ready format suitable for official documents
Security: PDF files can be password-protected and configured to restrict editing or copying
Manageable file sizes: PDF content can be compressed for sharing, although the result is not always smaller than the source .docx, which is itself a compressed format
Universal accessibility: PDF readers are available on virtually all platforms

Automating this conversion process through Python eliminates repetitive manual work and ensures consistency when processing multiple documents.

Environment Setup

To get started, you need to install the Spire.Doc for Python library. This can be done easily using pip:

pip install Spire.Doc

Once installed, you can import the library in your Python scripts and access all the conversion features it provides.

Basic Word to PDF Conversion

Simple Conversion with SaveToFile

The most straightforward way to convert a Word document to PDF is by using the SaveToFile method of the Document class. This method accepts the output file path and the desired format as parameters, handling the entire conversion process automatically.

Here's a simple example demonstrating how to load a Word document and save it as a PDF:

from spire.doc import *
from spire.doc.common import *

# Define input and output file paths
inputFile = "./Data/ConvertedTemplate.docx"
outputFile = "ToPDF.pdf"

# Create a Word document object
document = Document()

# Load the Word document from disk
document.LoadFromFile(inputFile)

# Save the document to a PDF file
document.SaveToFile(outputFile, FileFormat.PDF)

# Close the document and release resources
document.Close()

This process follows three simple steps:

Create an instance of Document and load the source Word file
Convert and save to PDF format by specifying FileFormat.PDF
Close the document to release system resources
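
The third step is easy to miss when an exception interrupts the conversion partway through. One way to guard against that is a small context manager that always calls Close() on exit. This is only a sketch: the helper name open_document and its factory argument are illustrative and not part of Spire.Doc, and it assumes nothing beyond the LoadFromFile and Close methods used above.

```python
from contextlib import contextmanager


@contextmanager
def open_document(factory, path):
    """Yield a loaded document and always call Close() afterwards.

    `factory` is any zero-argument callable that returns an object with
    LoadFromFile() and Close() methods, such as spire.doc's Document class.
    This helper is a hypothetical convenience, not a Spire.Doc API.
    """
    document = factory()
    try:
        document.LoadFromFile(path)
        yield document
    finally:
        # Runs even if loading or saving raises an exception
        document.Close()
```

With Spire.Doc installed, the basic conversion could then be written as a with block: open the file using open_document(Document, inputFile), call SaveToFile(outputFile, FileFormat.PDF) inside the block, and let the context manager handle the cleanup.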

This basic conversion carries over the text, images, tables, and formatting of the original Word document. The output should closely match the source layout, although documents that rely on fonts missing from the conversion machine may render with substitutes.

Advanced Conversion Options

Creating Bookmarks During Conversion

For longer documents, bookmarks provide essential navigation aids. Spire.Doc allows you to automatically generate PDF bookmarks from Word headings or existing Word bookmarks during the conversion process, making the resulting PDF easier to navigate.

The following example shows how to create PDF bookmarks based on Word bookmarks:

from spire.doc import *
from spire.doc.common import *

# Define input and output file paths
inputFile = "./Data/BookmarkTemplate.docx"
outputFile = "ToPDFAndCreateBookmarks.pdf"

# Create a document object and load from disk
document = Document()
document.LoadFromFile(inputFile)

# Create a parameter list for PDF conversion
parames = ToPdfParameterList()

# Enable creation of Word bookmarks in the PDF
parames.CreateWordBookmarks = True

# Choose bookmark source: headings or existing Word bookmarks
# Set to True to create bookmarks from Headings
# parames.CreateWordBookmarksUsingHeadings = True

# Set to False to use existing Word bookmarks
parames.CreateWordBookmarksUsingHeadings = False

# Save the document with bookmark settings
document.SaveToFile(outputFile, parames)

# Close the document and release resources
document.Close()

This example demonstrates several important concepts:

Parameter configuration: The ToPdfParameterList class provides fine-grained control over the conversion process
Bookmark sources: You can choose between generating bookmarks from Word headings or using existing Word bookmarks
Navigation enhancement: The resulting PDF includes a bookmark panel, allowing readers to jump to specific sections

This feature is particularly valuable for technical documentation, reports, and manuals where quick navigation is essential.

Embedding Fonts in PDF

Font embedding ensures that your PDF displays correctly on any device, even if the required fonts are not installed on the viewing system. This is crucial for maintaining brand consistency and ensuring that special characters render properly.

Here's how to embed all fonts used in the Word document into the resulting PDF:

from spire.doc import *
from spire.doc.common import *

# Define input and output file paths
inputFile = "./Data/ConvertedTemplate.docx"
outputFile = "EmbedAllFontsInPDF.pdf"

# Create a document object and load the file
document = Document()
document.LoadFromFile(inputFile)

# Create a parameter list and enable full font embedding
ppl = ToPdfParameterList()
ppl.IsEmbeddedAllFonts = True

# Save the document to PDF with embedded fonts
document.SaveToFile(outputFile, ppl)

# Close the document and release resources
document.Close()

By setting IsEmbeddedAllFonts to True, the converter includes complete font data in the PDF file. This approach offers several benefits:

Consistent rendering: Text appears exactly as intended on all devices
Special character support: Ensures proper display of non-standard characters and symbols
Print reliability: Eliminates font substitution issues during professional printing

Note that embedding fonts increases the file size, so consider this trade-off based on your distribution requirements.
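
To put a number on that trade-off, you can compare the size of a PDF converted without embedding against one converted with it. The helper below is a minimal sketch using only the standard library; the function name size_increase_percent is an illustrative choice, not something Spire.Doc provides.

```python
import os


def size_increase_percent(baseline_path, candidate_path):
    """Return how much larger the candidate file is than the baseline, in percent.

    A negative result means the candidate is smaller than the baseline.
    """
    baseline = os.path.getsize(baseline_path)
    candidate = os.path.getsize(candidate_path)
    if baseline == 0:
        raise ValueError("baseline file is empty")
    return (candidate - baseline) / baseline * 100.0
```

For instance, since both earlier examples convert the same ConvertedTemplate.docx, calling size_increase_percent("ToPDF.pdf", "EmbedAllFontsInPDF.pdf") would show how much the embedded fonts add for that particular document.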

Controlling Hyperlinks in PDF

By default, hyperlinks in Word documents remain active in the converted PDF. However, there are scenarios where you might want to remove hyperlink functionality, such as when creating print-only versions or preventing navigation to external resources.

The following example demonstrates how to disable hyperlinks during conversion:

from spire.doc import *
from spire.doc.common import *

# Define input and output file paths
inputFile = "./Data/Template_Docx_5.docx"
outputFile = "DisableHyperlinks.pdf"

# Create a Word document object
document = Document()

# Load the file from disk
document.LoadFromFile(inputFile)

# Create an instance of ToPdfParameterList
pdf = ToPdfParameterList()

# Set DisableLink to True to remove hyperlink effects
# Set to False to preserve hyperlinks (default behavior)
pdf.DisableLink = True

# Save to PDF with hyperlinks disabled
document.SaveToFile(outputFile, pdf)

# Close the document and release resources
document.Close()

Setting DisableLink to True removes the clickable behavior of hyperlinks while preserving the visual appearance of the linked text. This is useful for:

Print-oriented documents: Links serve no purpose on paper, so disabling them avoids accidental clicks when the file is reviewed on screen
Archival copies: Creating static versions without external dependencies
Security considerations: Removing potential links to malicious websites
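
The bookmark, font, and hyperlink settings shown in this section all live on the same ToPdfParameterList object, so it seems reasonable to set several of them before a single SaveToFile call. The helper below is a sketch of that idea under that assumption. The function name apply_pdf_options and its keyword arguments are hypothetical; only the four property names come from the examples above.

```python
def apply_pdf_options(params, *, bookmarks=None, embed_fonts=False, disable_links=False):
    """Set conversion options on a ToPdfParameterList-like object.

    bookmarks: None (leave bookmark settings alone), "headings", or "word".
    Options that are not requested are left untouched, so the library's
    defaults stay in effect. This helper is illustrative, not a Spire.Doc API.
    """
    if bookmarks not in (None, "headings", "word"):
        raise ValueError("bookmarks must be None, 'headings', or 'word'")

    if bookmarks is not None:
        params.CreateWordBookmarks = True
        # True builds bookmarks from headings, False from existing Word bookmarks
        params.CreateWordBookmarksUsingHeadings = (bookmarks == "headings")
    if embed_fonts:
        params.IsEmbeddedAllFonts = True
    if disable_links:
        params.DisableLink = True
    return params
```

With Spire.Doc installed, you would pass in a fresh ToPdfParameterList() and hand the returned object to document.SaveToFile(outputFile, params), as in the earlier examples.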

Practical Applications

Converting Word to PDF finds numerous applications across different domains:

Automated Report Generation

Organizations can automate the generation of monthly or quarterly reports by converting Word templates to PDF programmatically. This ensures consistent formatting and eliminates manual conversion errors. A batch processing function might look like this:

from spire.doc import *
from spire.doc.common import *
import os

def ConvertFolderWordToPDF(input_folder: str, output_folder: str):
    """Convert all Word files in a folder to PDF"""

    # Create output folder if it doesn't exist
    if not os.path.exists(output_folder):
        os.makedirs(output_folder)

    # Process all files in the input folder
    for filename in os.listdir(input_folder):
        if filename.endswith(".docx") or filename.endswith(".doc"):
            # Build full file paths
            input_path = os.path.join(input_folder, filename)
            output_filename = os.path.splitext(filename)[0] + ".pdf"
            output_path = os.path.join(output_folder, output_filename)

            # Convert the file
            document = Document()
            document.LoadFromFile(input_path)
            document.SaveToFile(output_path, FileFormat.PDF)
            document.Close()

            print(f"Converted: {filename} -> {output_filename}")

# Example usage
input_folder = "./Word_Documents"
output_folder = "./PDF_Output"
ConvertFolderWordToPDF(input_folder, output_folder)

This function processes all Word files in a given folder and converts them to PDF, enabling rapid processing of large document volumes.
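
Two details of the file selection are worth tightening in real folders. The endswith check is case-sensitive, so a file named REPORT.DOCX would be skipped, and the temporary owner files that Word creates next to an open document (their names start with ~$) would be picked up and would likely fail to load. The sketch below handles both cases with the standard library only; the name find_word_files is illustrative.

```python
from pathlib import Path

WORD_EXTENSIONS = {".doc", ".docx"}


def find_word_files(folder):
    """Return the Word files directly inside folder, sorted by path.

    Extensions are matched case-insensitively, and subfolders and
    Word's temporary "~$" owner files are skipped.
    """
    results = []
    for path in Path(folder).iterdir():
        if not path.is_file():
            continue
        if path.name.startswith("~$"):
            # Temporary file Word creates while a document is open
            continue
        if path.suffix.lower() in WORD_EXTENSIONS:
            results.append(path)
    return sorted(results)
```

The returned paths can be fed to the same LoadFromFile and SaveToFile calls used in the batch function.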

Document Archiving

Law firms and government agencies often need to convert documents to PDF/A format for long-term archival. The basic conversion shown earlier produces standard PDFs; archival output additionally requires selecting a PDF/A conformance level through the conversion parameters, so check the Spire.Doc documentation for the levels your version supports.

Client Deliverables

Professional service providers frequently deliver final documents in PDF format to discourage casual modification. Automating this conversion helps keep all client-facing documents consistent in appearance.

Best Practices

To optimize your Word to PDF conversions, keep the following recommendations in mind:

Font management: Always embed fonts when distributing PDFs externally to ensure consistent rendering
Image quality: Consider the balance between image quality and file size based on your distribution method
Bookmark structure: Use heading styles in Word to generate meaningful bookmarks automatically
Testing: Always verify the converted PDF on multiple devices to ensure compatibility
Error handling: Implement robust error handling when processing batches to continue despite individual file failures
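
The error-handling recommendation can be sketched as a loop that records failures instead of stopping at the first bad file. To keep the example independent of any particular library, the conversion itself is passed in as a callable. The name convert_batch and the (input_path, output_path) signature of convert are assumptions made for illustration.

```python
import os


def convert_batch(paths, output_folder, convert):
    """Convert every path with convert(src, dst), collecting failures.

    `convert` is any callable taking (input_path, output_path); it is
    expected to raise an exception when a file cannot be converted.
    Returns (succeeded, failed), where failed holds (path, message) pairs.
    """
    os.makedirs(output_folder, exist_ok=True)
    succeeded, failed = [], []
    for src in paths:
        stem = os.path.splitext(os.path.basename(src))[0]
        dst = os.path.join(output_folder, stem + ".pdf")
        try:
            convert(src, dst)
        except Exception as exc:
            # Record the problem and move on to the next file
            failed.append((src, str(exc)))
        else:
            succeeded.append(dst)
    return succeeded, failed
```

With Spire.Doc, convert could be a small function that wraps the Document, LoadFromFile, SaveToFile, and Close calls from the basic example. Reviewing the failed list after the run shows which files need attention without losing the rest of the batch.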

Conclusion

Converting Word documents to PDF using Python and Spire.Doc provides a powerful solution for document processing workflows. Whether you need simple format conversion, bookmark generation, font embedding, or hyperlink control, these techniques enable efficient automation of document transformation tasks.

We have explored:

Basic Word to PDF conversion using SaveToFile
Creating bookmarks from Word headings or existing bookmarks
Embedding fonts for consistent rendering across devices
Controlling hyperlink behavior in the output PDF
Practical applications including batch processing and document archiving

With these techniques, you can automate your document conversion workflows and produce consistent PDF output with far less manual effort.