Friday, November 7, 2008

Interop Example: Marshalling Structures To The InChI Library

For the last week I've been finishing up CSInChI, a library for using the IUPAC InChI library from C#. For those not acquainted with it, the InChI (International Chemical Identifier) is a line notation used to represent molecular structures. Line notations are simply ways of encoding a structure as a text string. Since the official InChI API provided by the IUPAC is written in C, I thought this would be a good time to post an interop example. This tutorial will illustrate how to use Platform Invoke to call an unmanaged function that takes structures as parameters.

The InChI library can be downloaded from: http://www.iupac.org/inchi/

In this example we'll tackle the function:

int GetStructFromINCHI(inchi_InputINCHI *inpInChI, inchi_OutputStruct *outStruct)

This function takes pointers to two C structs as parameters and returns an integer error code. The first struct holds two strings: the InChI and a string of options.

typedef struct tagINCHI_InputINCHI {
    /* the caller is responsible for the data allocation and deallocation */
    char *szInChI;   /* InChI ASCIIZ string to be converted to a structure */
    char *szOptions; /* InChI options: space-delimited; each is preceded by */
                     /* '/' or '-' depending on OS and compiler */
} inchi_InputINCHI;



We'll begin by creating a matching C# structure:

public struct InChI_String_Input
{
    public string inchiString;   // char *szInChI
    public string options;       // char *szOptions
}


In this case we get surprisingly lucky: this struct marshals just fine with no additional attributes. The key thing is to make sure that the fields are listed in the same order as in the unmanaged structure and that each field's type is the same size as the corresponding C type. String fields in a struct are marshaled as ANSI char* pointers by default, which is exactly what szInChI and szOptions are. By default the C# compiler gives structs sequential layout; if you want to use a class instead, you must apply the [StructLayout(LayoutKind.Sequential)] attribute yourself.
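For comparison, here is a minimal sketch of the class-based alternative. The names InChI_String_Input_Class and LayoutCheck are hypothetical, used only for illustration. Marshal.SizeOf and Marshal.OffsetOf let you confirm that both string fields end up back to back as pointer-sized fields, matching szInChI and szOptions.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical class version of InChI_String_Input. Classes default to
// LayoutKind.Auto, so sequential layout has to be requested explicitly.
[StructLayout(LayoutKind.Sequential)]
public class InChI_String_Input_Class
{
    public string inchiString;   // char *szInChI
    public string options;       // char *szOptions
}

public static class LayoutCheck
{
    // Byte offset of a field within the unmanaged representation of T.
    public static long OffsetOf<T>(string fieldName)
    {
        return Marshal.OffsetOf(typeof(T), fieldName).ToInt64();
    }
}
```

If you pass such a class to a P/Invoke method, drop the ref keyword: a class with sequential layout is already marshaled as a pointer to the structure, though by default only in the managed-to-native direction.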

The C struct that holds the output from the function looks like this:

typedef struct tagINCHI_OutputStruct {
    inchi_Atom     *atom;
    inchi_Stereo0D *stereo0;
    S_SHORT         num_atoms;
    S_SHORT         num_stereo0;
    char           *szMessage;
    char           *szLog;
    unsigned long   WarningFlags[2][2];
} inchi_OutputStruct;

A C# equivalent looks something like this:
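using System;
using System.Runtime.InteropServices;

public struct InChI_Struct_Output
{
    public IntPtr AtomsPtr;      // inchi_Atom *atom
    public IntPtr StereoPtr;     // inchi_Stereo0D *stereo0

    public short NumAtoms;       // S_SHORT num_atoms
    public short NumStereo0D;    // S_SHORT num_stereo0

    public string Message;       // char *szMessage
    public string Log;           // char *szLog

    // unsigned long WarningFlags[2][2], flattened to 4 elements.
    // unsigned long is 32 bits on Windows, so uint (not the 64-bit ulong)
    // matches the size of the C type.
    [MarshalAs(UnmanagedType.ByValArray, SizeConst = 4)]
    public uint[] WarningFlags;
}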


The details of the inchi_Atom and inchi_Stereo0D structures will be discussed in a future post. For now we're only going to worry about how to marshal arrays. Because a C-style array is represented by a pointer to its first element, the C# equivalent is IntPtr, the pointer-sized structure from the System namespace. The WarningFlags array has to be flattened to a 1-D array with the same total capacity (2 × 2 = 4 elements) because the CLR does not support marshaling nested arrays. Note that the MarshalAs attribute specifies a size. This is required both because the array has a fixed size in C and because the marshaler needs to know how many elements to copy.
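C stores a 2-D array row by row, so element WarningFlags[i][j] of the native struct lands at flat index i * 2 + j in the managed array. A small helper can make that mapping explicit; WarningFlagsHelper is a hypothetical name and is not part of CSInChI or the InChI API. It is generic so it works whatever element type you choose for the flattened array.

```csharp
using System;

// Hypothetical helper for reading the flattened WarningFlags array.
public static class WarningFlagsHelper
{
    const int Rows = 2;
    const int Columns = 2;

    // Returns the element that C code would access as WarningFlags[i][j].
    // Row-major order puts [i][j] at flat index i * Columns + j.
    public static T Get<T>(T[] flat, int i, int j)
    {
        if (flat == null) throw new ArgumentNullException("flat");
        if (i < 0 || i >= Rows) throw new ArgumentOutOfRangeException("i");
        if (j < 0 || j >= Columns) throw new ArgumentOutOfRangeException("j");
        return flat[i * Columns + j];
    }
}
```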

To convert the pointer representing a C-style array into an array of C# structures, write a method that walks the pointer forward by the size of the structure it points to, calling Marshal.PtrToStructure at each step to copy the unmanaged data into a C# structure.

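// Member of InChI_Struct_Output: reads NumAtoms structures starting at AtomsPtr.
public InChI_Atom[] GetAtoms()
{
    int atomSize = Marshal.SizeOf(typeof(InChI_Atom));
    InChI_Atom[] iAtoms = new InChI_Atom[NumAtoms];

    IntPtr pAtom = AtomsPtr;

    for (int i = 0; i < iAtoms.Length; i++)
    {
        // Copy the unmanaged struct at pAtom into a managed InChI_Atom.
        iAtoms[i] = (InChI_Atom)Marshal.PtrToStructure(pAtom, typeof(InChI_Atom));

        // Advance to the next element. ToInt64 avoids truncating the
        // address in a 64-bit process, which a cast to int would do.
        pAtom = new IntPtr(pAtom.ToInt64() + atomSize);
    }
    return iAtoms;
}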
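The same pointer walk can be written once for any structure type. In this sketch NativeArray.ToArray is a hypothetical name, and DemoAtom is only a stand-in for InChI_Atom, whose real layout is left for the next post. The arithmetic goes through ToInt64 so it stays correct in a 64-bit process.

```csharp
using System;
using System.Runtime.InteropServices;

public static class NativeArray
{
    // Copies `count` consecutive unmanaged structures of type T,
    // starting at `first`, into a new managed array.
    public static T[] ToArray<T>(IntPtr first, int count) where T : struct
    {
        int size = Marshal.SizeOf(typeof(T));
        T[] result = new T[count];
        IntPtr p = first;
        for (int i = 0; i < count; i++)
        {
            result[i] = (T)Marshal.PtrToStructure(p, typeof(T));
            p = new IntPtr(p.ToInt64() + size);   // step to the next element
        }
        return result;
    }
}

// Hypothetical stand-in for InChI_Atom, used only to exercise ToArray.
[StructLayout(LayoutKind.Sequential)]
public struct DemoAtom
{
    public double X;
    public short Charge;
}
```

Assuming InChI_Atom is declared as a sequential struct, the body of GetAtoms() could then shrink to return NativeArray.ToArray<InChI_Atom>(AtomsPtr, NumAtoms);.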


Finally, we create a class to hold the methods that access the unmanaged DLL.

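using System.Runtime.InteropServices;

public static class LibInChI
{
    // Binds the native GetStructFromINCHI export to a C# method named ParseInChI.
    [DllImport("libinchi.dll", EntryPoint = "GetStructFromINCHI")]
    public static extern int ParseInChI(ref InChI_String_Input input, out InChI_Struct_Output output);
    ...
    ...
}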


To call the method:

// All fields need to be set to non-null values before the call
InChI_String_Input inp;
inp.inchiString = "InChI=1/H3N/h1H3";
inp.options = "";

InChI_Struct_Output outStruct;

// retVal receives the integer error code returned by GetStructFromINCHI
int retVal = LibInChI.ParseInChI(ref inp, out outStruct);

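Note the use of the ref and out keywords. When ref and out parameters are marshaled, the native function receives a pointer to the argument, just as if a C caller had written &theParam. The difference is direction: a ref struct is copied to the native side before the call and back afterwards, while an out struct is only copied back. Remember that if a method has any ref or out parameters, the keywords must be written explicitly each time the method is called.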
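To make the &theParam point concrete, the sketch below does by hand roughly what the marshaler does for a ref parameter of the input struct: copy it into unmanaged memory, where each string becomes a char* buffer, and work with the address of that copy. DemoStringInput (same layout as InChI_String_Input) and RefMarshalDemo are hypothetical names, and the real P/Invoke marshaler's allocation strategy may differ; this only shows the shape of the data the C function sees.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical copy of InChI_String_Input so the sketch compiles on its own.
public struct DemoStringInput
{
    public string inchiString;   // char *szInChI
    public string options;       // char *szOptions
}

public static class RefMarshalDemo
{
    // Builds a native copy of `input` and reads back the two char* fields
    // the way C code holding a pointer to the struct would see them.
    public static string[] RoundTrip(DemoStringInput input)
    {
        IntPtr native = Marshal.AllocHGlobal(Marshal.SizeOf(typeof(DemoStringInput)));
        bool copied = false;
        try
        {
            // Strings are converted to ANSI char* buffers here.
            Marshal.StructureToPtr(input, native, false);
            copied = true;

            IntPtr szInChI = Marshal.ReadIntPtr(native, 0);
            IntPtr szOptions = Marshal.ReadIntPtr(native, IntPtr.Size);
            return new string[] { Marshal.PtrToStringAnsi(szInChI), Marshal.PtrToStringAnsi(szOptions) };
        }
        finally
        {
            // Free the string buffers allocated by StructureToPtr, then the struct itself.
            if (copied) Marshal.DestroyStructure(native, typeof(DemoStringInput));
            Marshal.FreeHGlobal(native);
        }
    }
}
```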

That's it for today. The next interop example will look at the InChI_Atom struct and how to ensure that unmanaged resources are freed. For those who are interested, CSInChI should be available within the next week (fingers crossed!) from the ChemSharp project: http://sourceforge.net/projects/chemsharp
