Introduction
This sample class reads Microsoft Word documents (Word 2007 through Microsoft 365) and returns their text content.
Starting with Office 2007, Microsoft introduced a new document format, Office Open XML. A .docx file is actually a ZIP package that contains XML files describing the document's content and formatting.
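Because a .docx file is a ZIP package, the built-in ZipArchive class can open it and list the parts inside. The following is a minimal sketch for illustration only: the DocxPackage class and its ListParts methods are hypothetical names, not part of Telerik. On .NET Framework 4.5 or later it needs a reference to System.IO.Compression.

```csharp
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;
using System.Linq;

// Hypothetical helper (not part of Telerik): opens a .docx as a plain ZIP archive
// and returns the names of the parts stored inside it.
public static class DocxPackage
{
    // Lists every entry name, e.g. "[Content_Types].xml" or "word/document.xml".
    public static List<string> ListParts(Stream docx)
    {
        // leaveOpen: true so the caller keeps ownership of the stream.
        using (var archive = new ZipArchive(docx, ZipArchiveMode.Read, leaveOpen: true))
        {
            return archive.Entries.Select(e => e.FullName).ToList();
        }
    }

    // Convenience overload that reads the package from a file on disk.
    public static List<string> ListParts(string path)
    {
        using (var stream = File.OpenRead(path))
        {
            return ListParts(stream);
        }
    }
}
```

For a typical Word file the list includes entries such as [Content_Types].xml, _rels/.rels and word/document.xml, the last of which holds the main body text.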
The code below shows how to extract the plain text from a .docx document.
This technique uses the DocxFormatProvider and TxtFormatProvider classes from Progress Telerik UI for WinForms.
- using System.IO;
- using Telerik.WinForms.Documents.FormatProviders.OpenXml.Docx;
- using Telerik.WinForms.Documents.FormatProviders.Txt;
-
- namespace Telerik.WinForms.Documents
- {
-     public static class ReadDocx
-     {
-         // Imports a .docx file and returns its content as plain text.
-         public static string ReadDocxContent(string file)
-         {
-             // Parse the .docx package into a Telerik document model.
-             var docxFormatProvider = new DocxFormatProvider();
-             using var input = File.OpenRead(file); // C# 8.0 using declaration
-             var document = docxFormatProvider.Import(input);
-
-             // Export the document model as plain text.
-             var txtFormatProvider = new TxtFormatProvider();
-             return txtFormatProvider.Export(document);
-         }
-     }
- }
Pass the path of a .docx file (absolute, or relative to the current working directory) and the method returns its text content:
- var txt = ReadDocx.ReadDocxContent(@".\desktop\document1.docx");
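The "using var" declaration above requires C# 8.0, while .NET Framework 4.x projects default to C# 7.3. To use the code with .NET Framework 4.x, replace it with a classic using block: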
- using System.IO;
- using Telerik.WinForms.Documents.FormatProviders.OpenXml.Docx;
- using Telerik.WinForms.Documents.FormatProviders.Txt;
-
- namespace Telerik.WinForms.Documents
- {
-     public static class ReadDocx
-     {
-         // Imports a .docx file and returns its content as plain text.
-         public static string ReadDocxContent(string file)
-         {
-             var docxFormatProvider = new DocxFormatProvider();
-             // Classic using block (C# 7.3 and earlier): the stream is closed when the block exits.
-             using (var input = File.OpenRead(file))
-             {
-                 var document = docxFormatProvider.Import(input);
-                 var txtFormatProvider = new TxtFormatProvider();
-                 return txtFormatProvider.Export(document);
-             }
-         }
-     }
- }
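If the Telerik libraries are not available, a rough alternative is to read the ZIP package directly with the built-in System.IO.Compression and System.Xml.Linq classes. This is a swapped-in technique, not the Telerik approach, and the DocxPlainText class is a hypothetical name. It assumes the main part is stored at word/document.xml (the usual location; strictly, it is declared in _rels/.rels) and keeps only the text of w:t elements, so tabs, line breaks, headers, footers and footnotes are dropped. On .NET Framework 4.x, add references to System.IO.Compression and System.Xml.Linq.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Linq;
using System.Xml.Linq;

// Hypothetical Telerik-free sketch: returns the text of each paragraph
// in word/document.xml, one paragraph per line.
public static class DocxPlainText
{
    // WordprocessingML namespace used by the <w:p>, <w:r> and <w:t> elements.
    private static readonly XNamespace W =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    public static string Extract(Stream docx)
    {
        using (var archive = new ZipArchive(docx, ZipArchiveMode.Read, leaveOpen: true))
        {
            // Assumption: the main document part lives at its usual location.
            ZipArchiveEntry entry = archive.GetEntry("word/document.xml")
                ?? throw new InvalidDataException("word/document.xml not found; not a .docx?");

            using (var xml = entry.Open())
            {
                XDocument doc = XDocument.Load(xml);
                // Each <w:p> is a paragraph; its visible text is split across <w:t> elements.
                // Note: text inside text boxes (nested paragraphs) may appear twice.
                var paragraphs = doc.Descendants(W + "p")
                    .Select(p => string.Concat(p.Descendants(W + "t").Select(t => t.Value)));
                return string.Join(Environment.NewLine, paragraphs);
            }
        }
    }
}
```

Usage mirrors the Telerik version: open the file with File.OpenRead and pass the stream to DocxPlainText.Extract.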
Happy coding!