Save Excel to CSV without losing Unicode characters

Kostas, Developer

Implementing a simple Excel file (.xlsx) conversion to a Comma Separated Value (.csv) file format using Microsoft Office Excel Interop seems to be a piece of cake. At first, everything seems straightforward:

var application = new Microsoft.Office.Interop.Excel.Application
{
   DisplayAlerts = false,
   Visible = false,
};

Workbook workbook = null;

try
{
   workbook = application.Workbooks.Open(@"C:\document.xlsx",
       ReadOnly: Microsoft.Office.Core.MsoTriState.msoTrue);

   var sheet = (Worksheet)application.ActiveSheet;
   sheet?.SaveAs(@"C:\result.csv", XlFileFormat.xlCSV);
}
finally
{
   workbook?.Close();
   application.Quit();
}

But in reality, it turned out to be quite a bumpy road.

The problem

Indeed, the code above does convert the selected sheet to a CSV file. But after a few tests, it turned out that the UTF8 characters were not supported. I found some old code snippets written for Visual Basic and Powershell, where setting the TextFilePlatform = 65001 did the trick, but trying to incorporate the same approach using C# did not work.

image

After many trial and error attempts, I found something that isn’t in the official Microsoft Interop docs for C#, but it is for Visual Basic. The MS Office VBA Reference describes an XlFileFormat enumerator, and surprisingly it contains a new value - xlCSVUTF8. I checked the XlFileFormat enumerator in my code again (using the official Microsoft.Office.Interop.Excel v15, latest NuGet package available for the time being), but the heavenly promising xlCSVUTF8 value was not present. My installed library is meant to work with the MS Office 2013 suite, but the VB.NET documentation where I found the xlCSVUTF8 turned out to represent MS Office 2016.

How we solved the problem

So we need to upgrade our MS Office suite, inject the v16 Interop libraries, and get the job done. Unfortunately, problems don’t end here. There is no Excel Interop v16 on NuGet for .NET 4.X. The v16 Interop libraries can only be added manually from your local machine’s resources, specifically using .NET Core. We had to work it around by extracting the Interop DLLs from a .NET Core project and importing it to a .NET 4.X project, which finally seems to work. Once you have upgraded your MS Office to at least 2016 and got the v16 Interops, all you need to change is this line of code:

sheet?.SaveAs(Path.Combine(@"C:\result.csv", XlFileFormat.xlCSVUTF8);

We have published Microsoft Office (Word, Excel and Powerpoint) v16 Interop libraries as a NuGet package for easier referencing and upgrading.

Install-Package Office2019.Interop

TL;DR

If you ever worked with MS Office Interop, you know how clumsy it might get. In this case, what seemed to be a few tweaks turned out to require the MS Office infrastructure upgrade, a few hacks to get the v16 working with .NET 4.X, and long hours of navigating through the labyrinths of interop documentation.

Fortunately, we have a simpler solution for you. We built an API that handles the tricky part and makes your conversions effortless. Let us handle your Excel to CSV conversions, so you don’t have to:

  • Install and maintain appropriate MS Office versions
  • Scale your server for large file conversions
  • Worry about the future code maintenance
  • Have to use the clunky MS Office Interop

In addition to this, we prepared a simple configuration for your XLSX to CSV conversion. You can choose to convert the active sheet, select a sheet to convert by index or a sheet name, or even combine and convert multiple sheets if the data columns have the same structure. It works with .NET 4.X as well as .NET Core using our SDK library. Oh, and you get access to 500+ more conversions out of the box. No legacy code, no server maintenance. Try our service for free!