Sunday, December 13, 2009

Office Interop Object Collection Technique – The right way.

In my previous article, I described the ghost excel process issue and its solution by releasing all interop references. In this article I will write about the way to release interop references.

Basically, every COM object that we access in .NET has a RCW (Runtime Callable Wrapper) associated with it. Each RCW has exactly one COM instance wrapped in it. The RCW is a light weight object allocated on managed heap. However, it can wrap within a COM wrapper allocated on native heap which may be resource hog.

Each RCW also has a refCount property to track the lifetime of the RCW object. Once the refCount property of RCW reaches zero, the underlying COM object is released and is no more accessible. And if at this point the code tries to access the RCW, an exception is thrown saying “COM object that has been separated from its underlying RCW cannot be used.”


Note: Using System.Runtime.InteropServices.Marshal.ReleaseComObject(object) you can decrease the refCount property of RCW. However, if not done carefully and judiciously, you can fall in trap of difficult to trace premature object killing.


The refCount property of an RCW is increased when a client references it. And it is decreased either automatically by GC or by using System.Runtime.InteropServices.Marshal.ReleaseComObject(object) method.

Below is how it works programmatically.
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using Excel = Microsoft.Office.Interop.Excel;
using System.Reflection;
using System.Runtime.InteropServices;
namespace RCW
{
    class Program
    {
        static void Main(string[] args)
        {           
            Excel.Application xlApp = new
Excel.Application xlApp = new Microsoft.Office.Interop.Excel.Application();
            Excel.Workbooks workbooks = xlApp.Workbooks;

            //The first reference of new workbook added to app
            Excel.Workbook newWorkbook = workbooks.Add(Missing.Value);

            //The second reference to the added workbook
            Excel.Workbook secondReference = workbooks[1];

            //The third refrence to the added workbook
            Excel.Workbook thirdReference = workbooks[1];

            int workbookRCWCounter = Marshal.ReleaseComObject(newWorkbook);
            Console.WriteLine(workbookRCWCounter.ToString());

            workbookRCWCounter = Marshal.ReleaseComObject(secondReference);
            Console.WriteLine(workbookRCWCounter.ToString());

            workbookRCWCounter = Marshal.ReleaseComObject(thirdReference);
            Console.WriteLine(workbookRCWCounter.ToString());

            Console.Read();
        }
    }
}
In above code, we are adding a new workbook to the excel application and collecting the reference to newWorkbook variable. This increases the RCW counter for that workbook object to 1. Again we reference the newly created workbook object from the workbooks collection variable and store it in secondReference variable. This increases the RCW counter for that workbook object to 2. We do the same again and collect the third reference to thirdReference variable which in turn increases RCW count to 3.

Observe that each time we referenced the newly created workbook object either while adding workbook or while referencing from workbooks collection it increases its RCW’s refCount. This can be termed as direct references to RCW.

Running the code the output will be 2 1 0 signifying the decrement of reference counter by one every time we called Marshal.ReleaseComObject.

In direct reference pattern, it is advised to call Marshal.ReleaseComObject for each direct reference after you are done with it.

On the other hand if we change our direct RCW reference to variable to variable references the RCW counter won’t increase.
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using Excel = Microsoft.Office.Interop.Excel;
using System.Reflection;
using System.Runtime.InteropServices;
namespace RCW
{
    class Program
    {
        static void Main(string[] args)
        {
            Excel.Application xlApp = new Microsoft.Office.Interop.Excel.Application();
            Excel.Workbooks workbooks = xlApp.Workbooks;

            //The first reference of new workbook added to app
            Excel.Workbook newWorkbook = workbooks.Add(Missing.Value);

            //The second reference to the added workbook
            Excel.Workbook secondReference = newWorkbook;

            //The third refrence to the added workbook
            Excel.Workbook thirdReference = newWorkbook;

            int workbookRCWCounter = Marshal.ReleaseComObject(newWorkbook);
            Console.WriteLine(workbookRCWCounter.ToString());

            workbookRCWCounter = Marshal.ReleaseComObject(secondReference);
            Console.WriteLine(workbookRCWCounter.ToString());

            workbookRCWCounter = Marshal.ReleaseComObject(thirdReference);
            Console.WriteLine(workbookRCWCounter.ToString());

            Console.Read();
        }
     }
}
In above code, observe that we took the first direct workbook RCW reference to newWorkbook variable. Which increased the RCW count to 1. However subsequently, we just copied the newWorkbook reference to secondReference and thirdReference variable. Since the secondReference and thirdReference variables were not directly referenced from workbook RCW object the refCount never increased and hence was 1 in spite of having three references. Thus the output of running above code will be 0 –1 –1. This is because when we called the Marshal.ReleaseComObject first time on newWorkbook reference the RCW refCount decreased to zero thus releasing the underlying COM object. Subsequenly calling Marshal.ReleaseComObject on second and third reference variables just decreased refCount to –1 stating invalid state of RCW object.

An interesting thing to note here is that if after releasing the original direct reference variable (newWorkbook) do you try to access the secondReference or thirdReference workbook references you will end up with “COM object that has been separated from its underlying RCW cannot be used.”  exception. This essentially points to premature object killing which can be annoying and difficult to trace.
In variable to variable reference copy pattern, it is advised to call the Marshal.ReleaseComObject only when all references are out of scope. This will help in preventing premature object killing.

Conclusion:
I explained the two patterns of RCW object references and behaviour of refCount in each pattern.  It may so happen that your code has a mix of these patters hence use your judgement to best while deciding which pattern to use at what time.

Thursday, December 10, 2009

Office Interop Object Collection Technique – Ghost Excel Instance

Recently, we wrote an add-in for Microsoft PowerPoint to automate several routine tasks for our business users. While developing this add-in we required to open a temp workbook in a new excel application instance to perform several tasks. Once these tasks were completed the temp workbook was supposed to be closed and the excel application needed to be quit. However to our amazement, we found that in spite of quitting excel application by calling Quit() on excel application object we found it to be silently running in task manager. It seemed that this process lives in memory forever until the AppDomain on which add-in is loaded is torn.

After this we researched a bit to see as to what is really causing the Excel process to live in memory forever. And to answer this and several other basic questions, I wrote a small windows console application as my POC. Below is how the source appears.

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using Excel = Microsoft.Office.Interop.Excel;
using System.Reflection;
using System.Runtime.InteropServices;
namespace GhostXLInstance
{
  class Program
  {
        static void Main(string[] args)
        {
            Excel.Application xlApp = new Microsoft.Office.Interop.Excel.Application();
            Excel.Workbook newWorkbook = xlApp.Workbooks.Add(Missing.Value);
            int workbookRCWCounter = Marshal.ReleaseComObject(newWorkbook);
            Console.WriteLine("RCW counter for workbook :: " + workbookRCWCounter.ToString());
            xlApp.Quit();
            int excelRCWCounter = Marshal.ReleaseComObject(xlApp);
            Console.WriteLine("RCW counter for XL application :: " + excelRCWCounter.ToString());
            Console.Read();
        }
    }
}
In above code, I am creating a new excel application, adding new workbook in it and then releasing both application and workbook references (observe that we explicitly created only two interop references xlApp and newWorkbook).

After executing the example, you will see two lines in console telling you the RCW count for each object we released. Viola!!!  It is zero which apparently means that all the objects we referenced are released. However, if you observe the task manager now the excel instance is still running. So the question still remains the same – What is holding the excel process?

Well, then I got a fantastic article http://blogs.msdn.com/geoffda/archive/2007/08/31/the-designer-process-that-would-not-terminate.aspx which explained the phenomena. So here is the phenomenon in simple words.
If you closely observe our example, we are actually getting three references. The third reference is implicitly created when we used the application’s Workbooks collection to add new workbook.

Excel.Workbook newWorkbook = xlApp.Workbooks.Add(Missing.Value);
Observe that xlApp.Workbooks actually returns Microsoft.Office.Interop.Excel.Workbooks reference. Wow! So did we got the cause of immutability of excel process? Yes! So this uninvited third reference lives forever in the memory and causes the excel process to wait for its garbage collection.
Now since we got the cause, lets modify our previous code to release third reference even.
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using Excel = Microsoft.Office.Interop.Excel;
using System.Reflection;
using System.Runtime.InteropServices;
namespace GhostXLInstance
{
    class Program
    {
        static void Main(string[] args)
        {
            Excel.Application xlApp = new Microsoft.Office.Interop.Excel.Application();
            Excel.Workbooks workbooks = xlApp.Workbooks;
            Excel.Workbook newWorkbook = workbooks.Add(Missing.Value);           
            int workbookRCWCounter = Marshal.ReleaseComObject(newWorkbook);
            Console.WriteLine("RCW counter for workbook :: " + workbookRCWCounter.ToString());
            int workbooksRCWCounter = Marshal.ReleaseComObject(workbooks);
            Console.WriteLine("RCW counter for workbooks :: " + workbooksRCWCounter.ToString());
            xlApp.Quit();
            int excelRCWCounter = Marshal.ReleaseComObject(xlApp);
            Console.WriteLine("RCW counter for XL application :: " + excelRCWCounter.ToString());
            Console.Read();
        }
    }
}
Finally in modified code, I am collecting the xlApp.Workbooks reference to a local variable and then releasing it. And now executing this code would release the ghost application while console application is waiting at Console.Read();

Conclusion:
Well so from our POC it became apparent that the so called collection references (like workbooks, shapes, charts …) needs to be explicitly referenced in local variable and garbage collected to neatly exit excel process.

Note: In my next posting I will detail the right way to release Interop references.