Sunday, December 14, 2008

Introduction to OBDotNet: Part 1

This tutorial is intended for C# developers looking to use OBDotNet to develop chemically aware applications and libraries. It assumes that the reader is familiar with OpenBabel or similar cheminformatics toolkits.

What follows is an overview of the core classes in the C# bindings for OpenBabel, with an eye toward things that may confuse C# programmers who are unfamiliar with SWIG and C++. If you are completely new to OpenBabel, start by skimming the documentation and the examples in the project wiki.

To install OBDotNet:

1) Download the current release
2) Unzip into the directory of your choice
3) Set an environment variable named BABEL_DATADIR to point to the data folder.
4) Register OBDotNet.dll in the GAC using gacutil.exe or the .NET management console (optional).


OBDotNet currently consists of a single namespace: OpenBabel. It contains C# wrappers for all the publicly exposed classes, functions, and constants defined in the Open Babel library. It also contains a number of classes defined by SWIG that wrap arrays and STL types exposed by Open Babel. There are a few things that users need to be aware of if they are not familiar with SWIG.

1) The openbabel class

SWIG creates a module class that contains global constants and functions. The module class for OBDotNet is the openbabel class. Important methods belonging to the openbabel class include:

CartesianToInternal(vectorpInternalCoord arg0, OBMol mol)
InternalToCartesian(vectorpInternalCoord arg0, OBMol mol)
dot(OBVector3 a, OBVector3 b)
cross(OBVector3 a, OBVector3 b)

This is also where SWIG maps the values of macros and most enumerated constants from OpenBabel. For example the values in the OBGenericDataType enumeration are mapped to static fields of the openbabel class.
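As a rough illustration, the module-class functions listed above are called as static methods of openbabel. This sketch assumes OBVector3 exposes a three-argument (x, y, z) constructor, as the C++ vector3 class does; the vector values are arbitrary.

```csharp
using System;
using OpenBabel;

class ModuleClassDemo
{
    static void Main()
    {
        // Assumption: OBVector3 has an (x, y, z) constructor like the C++ vector3 class.
        OBVector3 a = new OBVector3(1.0, 0.0, 0.0);
        OBVector3 b = new OBVector3(0.0, 1.0, 0.0);

        // Global C++ functions are reached through the openbabel module class.
        double d = openbabel.dot(a, b);
        OBVector3 c = openbabel.cross(a, b);

        Console.WriteLine("dot(a, b) = " + d);
        // A cross product is perpendicular to both inputs, so this should be close to zero.
        Console.WriteLine("dot(cross(a, b), a) = " + openbabel.dot(c, a));
    }
}
```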


2) std::vector wrappers

SWIG generates a wrapper class for each type of vector and array found in OpenBabel.
For example:

std::vector<string> -> vectorString.

These wrappers implement a type-safe but non-generic IEnumerable. This means that foreach loops work without a cast, but you can't call extension methods that target generic IEnumerables or use the copy constructors of generic collection types. Use the Cast<T>() extension method to convert to a strongly typed IEnumerable<T>.

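OBConversion obc = new OBConversion();
vectorString inFormats = obc.GetSupportedInputFormat();
// Cast<string>() turns the non-generic wrapper into an IEnumerable<string>
List<string> molFormats = new List<string>(inFormats.Cast<string>());

foreach (string frmt in molFormats.Where(s => s.ToLower().Contains("mdl")))
    Console.WriteLine(frmt);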

3) The double_array class

There is only one array wrapper in OBDotNet. It uses the basic SWIG array template, which does not implement ICollection, has no indexer, and is not enumerable. Items are manipulated using the getitem(int index) and setitem(int index, double value) methods.
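A short sketch of working with double_array, assuming the wrapper was generated with SWIG's standard array-class template, so that it has a constructor taking the element count and a setitem method taking an index and a value. Because the wrapper does not know its own length, the caller has to track it. ToManaged is a hypothetical helper name, not part of OBDotNet.

```csharp
using System;
using OpenBabel;

static class DoubleArrayDemo
{
    // Hypothetical helper: copy the first 'length' items into a managed array.
    static double[] ToManaged(double_array source, int length)
    {
        double[] result = new double[length];
        for (int i = 0; i < length; i++)
            result[i] = source.getitem(i);
        return result;
    }

    static void Main()
    {
        // Assumption: the constructor takes the number of elements to allocate.
        double_array coords = new double_array(3);
        coords.setitem(0, 1.5);
        coords.setitem(1, 2.5);
        coords.setitem(2, 3.5);

        // Once copied, the values can be used with ordinary .NET collection code.
        double[] managed = ToManaged(coords, 3);
        foreach (double value in managed)
            Console.WriteLine(value);
    }
}
```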


4) SWIGTYPE classes

If you examine OBDotNet you'll see a lot of class names that begin with SWIGTYPE.


These represent places where the SWIG type map was not able to create a C# proxy for a C++ type. These placeholder classes cannot be instantiated and have no methods. Any method that takes or returns one of these types is currently unusable.


The classes you will work with most often are OBMol, OBAtom, OBBond, OBVector3, and OBConversion.

We'll start with OBConversion. Instances of the OBConversion class are used for reading and writing structure files, parsing SMILES and InChI strings, and converting between chemical file formats. Open Babel supports a large number of file formats. In C++ an instance of OBConversion can be created using the following constructor:

OBConversion(std::istream *is=NULL, std::ostream *os=NULL)

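Currently OBDotNet does not have support for wrapping std::istream and std::ostream. As a result we see our first SWIGTYPE: in the C# version of this constructor, both stream parameters are exposed as placeholder classes.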


So we'll have to use the default constructor. This stream issue is common to most of the bindings. As a workaround, the OpenBabel developers added the ReadFile/WriteFile and ReadString/WriteString methods to the class. These allow you to read and write data without exposing a stream. The ReadString/WriteString methods can be used with the "smi" or "inchi" formats to parse SMILES or InChI strings.
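As a minimal sketch of the string-based route, the following parses a SMILES string with the default constructor and ReadString. The SMILES "CCO" (ethanol) is just an arbitrary input, and NumAtoms() is used only as a quick check that parsing produced a molecule.

```csharp
using System;
using OpenBabel;

class SmilesDemo
{
    static void Main()
    {
        OBConversion obc = new OBConversion();

        // SetInFormat returns false if the format ID is not recognized.
        if (!obc.SetInFormat("smi"))
        {
            Console.WriteLine("SMILES format is not available.");
            return;
        }

        OBMol mol = new OBMol();

        // ReadString fills the molecule from the text; no stream is involved.
        if (obc.ReadString(mol, "CCO"))
            Console.WriteLine("Atoms read: " + mol.NumAtoms());
        else
            Console.WriteLine("Could not parse the SMILES string.");
    }
}
```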

Here is a first test program to check your OBDotNet install. It creates an OBConversion and calls GetSupportedInputFormat() and GetSupportedOutputFormat() to display the available file formats.

example 1: displaying the supported file formats
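A minimal sketch of such a test program, assuming OBDotNet.dll is referenced by the project and that the vectorString results can be enumerated with foreach as described earlier. The counts are tallied by hand so the sketch does not rely on any particular property of the wrapper class.

```csharp
using System;
using OpenBabel;

class FormatList
{
    static void Main()
    {
        OBConversion obc = new OBConversion();

        // Each entry is a description string for one supported format.
        int inputCount = 0;
        Console.WriteLine("Input formats:");
        foreach (string format in obc.GetSupportedInputFormat())
        {
            Console.WriteLine("  " + format);
            inputCount++;
        }

        int outputCount = 0;
        Console.WriteLine("Output formats:");
        foreach (string format in obc.GetSupportedOutputFormat())
        {
            Console.WriteLine("  " + format);
            outputCount++;
        }

        Console.WriteLine(inputCount + " input formats, " + outputCount + " output formats");
    }
}
```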

You should get a list of 93 input formats and 96 output formats.

The next example demonstrates reading structure data from an sdf file and writing out smiles strings.

example 2: reading an sdf file
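One way this might look, as a sketch rather than a definitive listing: ReadFile opens the file and reads the first record, and the single-argument Read overload is assumed to continue from the same file. The file name someMols.sdf is a placeholder.

```csharp
using System;
using OpenBabel;

class SdfToSmiles
{
    static void Main()
    {
        OBConversion obc = new OBConversion();

        // Read SD file records, write SMILES.
        if (!obc.SetInAndOutFormats("sdf", "smi"))
        {
            Console.WriteLine("Could not set the sdf/smi formats.");
            return;
        }

        OBMol mol = new OBMol();

        // Placeholder file name; ReadFile returns false if nothing could be read.
        bool hasMol = obc.ReadFile(mol, "someMols.sdf");
        while (hasMol)
        {
            // WriteString returns the molecule in the output format, here one SMILES line.
            Console.Write(obc.WriteString(mol));

            // Use a fresh OBMol for each record, then read the next one.
            mol = new OBMol();
            hasMol = obc.Read(mol);
        }
    }
}
```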

This usage definitely suggests a pattern, so our final example is a facade class to simplify reading structure files.

example 3: a facade class for reading files
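A sketch of what such a facade could look like. The class and method names OBReader, ReadMol, and ReadFiles come from the usage shown in this post; everything inside them is an assumption, in particular taking the format ID from the file extension and the CreateConversion helper, which is a hypothetical name.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using OpenBabel;

public static class OBReader
{
    // Hypothetical helper: build an OBConversion whose input format
    // is guessed from the file extension (e.g. "mol", "sdf").
    private static OBConversion CreateConversion(string fileName)
    {
        string extension = Path.GetExtension(fileName).TrimStart('.');
        OBConversion obc = new OBConversion();
        if (!obc.SetInFormat(extension))
            throw new ArgumentException("Unrecognized format: " + extension);
        return obc;
    }

    // Reads the first molecule in the file.
    public static OBMol ReadMol(string fileName)
    {
        OBConversion obc = CreateConversion(fileName);
        OBMol mol = new OBMol();
        if (!obc.ReadFile(mol, fileName))
            throw new IOException("Could not read a molecule from " + fileName);
        return mol;
    }

    // Lazily reads every molecule in the file, one per iteration.
    public static IEnumerable<OBMol> ReadFiles(string fileName)
    {
        OBConversion obc = CreateConversion(fileName);
        OBMol mol = new OBMol();
        bool hasMol = obc.ReadFile(mol, fileName);
        while (hasMol)
        {
            yield return mol;

            // A new OBMol per record keeps earlier results intact.
            mol = new OBMol();
            hasMol = obc.Read(mol);
        }
    }
}
```

Because ReadFiles is an iterator, the file is only opened when the sequence is first enumerated, and molecules are read one at a time rather than all loaded up front.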

Now we can just write

OBMol m = OBReader.ReadMol("capsaicin.mol");


IEnumerable<OBMol> dataSet = OBReader.ReadFiles("someMols.mol");

Returning the IEnumerable<OBMol> allows developers to use LINQ to Objects for filtering data sets.

foreach(OBMol mol in dataSet.Where(mol => mol.GetMW() < 500))

We'll look more at LINQ when we discuss descriptors in part 3.

That's it for part 1. After reading this you should be able to use OBDotNet to create OBMol objects from data in files or from SMILES/InChI strings. Part 2 will look at the OBAtom, OBMol, OBBond, and OBVector3 classes as well as the use of enumerators.


Bob said...

I'm having some trouble getting OBDotNet to work and am hoping for some help. I have OpenBabel-2.2.0 installed and am using VS 2008 SP1. I made the simplest program

using System;
using OpenBabel;

namespace OpenBabelTesting
{
    class Program
    {
        static void Main(string[] args)
        {
            OBConversion obc = new OBConversion();
        }
    }
}

and when I run it I get this exception:
System.TypeInitializationException was unhandled
Message="The type initializer for 'OpenBabel.openbabelPINVOKE' threw an exception."
at OpenBabel.openbabelPINVOKE.new_OBConversion__SWIG_2()
at OpenBabel.OBConversion..ctor()
at OpenBabelTesting.Program.Main(String[] args) in C:\Projects\Constellation.HTS\ChemRegistration\OpenBabelTesting\Program.cs:line 10
at System.AppDomain._nExecuteAssembly(Assembly assembly, String[] args)
at System.AppDomain.ExecuteAssembly(String assemblyFile, Evidence assemblySecurity, String[] args)
at Microsoft.VisualStudio.HostingProcess.HostProc.RunUsersAssembly()
at System.Threading.ThreadHelper.ThreadStart_Context(Object state)
at System.Threading.ExecutionContext.Run(ExecutionContext executionContext, ContextCallback callback, Object state)
at System.Threading.ThreadHelper.ThreadStart()
InnerException: System.TypeInitializationException
Message="The type initializer for 'SWIGExceptionHelper' threw an exception."
at OpenBabel.openbabelPINVOKE.SWIGExceptionHelper..ctor()
at OpenBabel.openbabelPINVOKE..cctor()
InnerException: System.DllNotFoundException
Message="Unable to load DLL 'openbabel': The specified module could not be found. (Exception from HRESULT: 0x8007007E)"
at OpenBabel.openbabelPINVOKE.SWIGExceptionHelper.SWIGRegisterExceptionCallbacks_openbabel(ExceptionDelegate applicationDelegate, ExceptionDelegate arithmeticDelegate, ExceptionDelegate divideByZeroDelegate, ExceptionDelegate indexOutOfRangeDelegate, ExceptionDelegate invalidCastDelegate, ExceptionDelegate invalidOperationDelegate, ExceptionDelegate ioDelegate, ExceptionDelegate nullReferenceDelegate, ExceptionDelegate outOfMemoryDelegate, ExceptionDelegate overflowDelegate, ExceptionDelegate systemExceptionDelegate)
at OpenBabel.openbabelPINVOKE.SWIGExceptionHelper..cctor()

I thought it had something to do with openbabel.dll but I could not add it as a reference to my project or the GAC. Any ideas?

Thanks in advance,

mesprague said...

What's happening is that the managed assembly (OBDotNet.dll) can't find the intermediate unmanaged C++ dll (openbabel.dll) that handles interop with the OpenBabel libraries.
This is a common problem.

The usual culprit is one of two things:

1) The environment variables described in the IronPython instructions are not set correctly.

2) OBDotNet.dll and all of its supporting files are installed in a directory other than the project output directory, but the Copy Local property for the assembly is set to True in VS.

If you're still having trouble, email more details about the installation directory. My address is listed in my user profile.