Friday, December 26, 2008

Implementing IEquatable

I'm currently working to implement reference equality in the SWIG bindings for OpenBabel. While doing some review I came across two pages which discuss some overlooked relationships between Object.Equals(object o), IEquatable<AType>.Equals(AType instance), and Object.GetHashCode().

So here is a little post-holiday reading:

Implementing IEquatable Properly

IEquatable(of T) and GetHashCode()
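
As a rough illustration of the relationship those two pages discuss, here is a minimal sketch. The AtomKey type is hypothetical and is not part of OBDotNet; the point is only that the typed Equals, the Object.Equals override, and GetHashCode all have to agree with each other.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical value type, used only to illustrate the pattern.
public struct AtomKey : IEquatable<AtomKey>
{
    public readonly int AtomicNumber;
    public readonly int Index;

    public AtomKey(int atomicNumber, int index)
    {
        AtomicNumber = atomicNumber;
        Index = index;
    }

    // Strongly typed comparison; the default comparer of generic collections uses it.
    public bool Equals(AtomKey other)
    {
        return AtomicNumber == other.AtomicNumber && Index == other.Index;
    }

    // Object.Equals must give the same answer as the typed overload.
    public override bool Equals(object obj)
    {
        return obj is AtomKey && Equals((AtomKey)obj);
    }

    // Values that compare equal must produce equal hash codes.
    public override int GetHashCode()
    {
        unchecked { return (AtomicNumber * 397) ^ Index; }
    }
}

public static class EqualityDemo
{
    public static void Main()
    {
        AtomKey a = new AtomKey(6, 1);
        AtomKey b = new AtomKey(6, 1);
        HashSet<AtomKey> seen = new HashSet<AtomKey>();
        seen.Add(a);

        Console.WriteLine(a.Equals(b));                        // typed Equals
        Console.WriteLine(a.Equals((object)b));                // Object.Equals
        Console.WriteLine(a.GetHashCode() == b.GetHashCode()); // hash codes agree
        Console.WriteLine(seen.Contains(b));                   // found in the set
    }
}
```

If any one of the three members is left out of step with the others, hash-based collections such as HashSet and Dictionary can silently fail to find keys that "should" be equal.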

Wednesday, December 17, 2008

Perfect Female Companion - Powered by C#

I'm going to have to award these guys the Scientific C# project of the year.

Aiko is a humanoid robot with a built-in Biometric Artificial Intelligence Neural System (Brain) designed by Le Trung in Canada. Aiko is slightly less than 5 feet tall, with a 32.24-inch bust, 22.44-inch waist and 33.07-inch hips. Aiko, which means "love child", is the perfect companion. In fact, lovely Aiko speaks Japanese and English and can respond to annoyances and questions.

Le Trung's biggest claim to fame is the technology. He combined his innovative 'Brain' technology by programming in C# and Basic which constantly updates.

Click here for the full article.

Tuesday, December 16, 2008

Factoradics as Permutation Odometers

I originally intended for this blog to cover more general scientific programming topics, but it has ended up being pretty much all cheminformatics. So I had been thinking about doing a more general post, and when I came across this blog entry while searching Google I knew I had found something to write about.

Enumerating permutations using inversion vectors and factoradic numbering

The author links to the Wikipedia entry for factoradic which gives this definition:

Factoradic is a factorial-based mixed radix numeral system: the i-th digit, counting from right, is to be multiplied by i!

radix:        7     6    5    4   3  2  1  0
place value:  7!    6!   5!   4!  3! 2! 1! 0!
in decimal:   5040  720  120  24  6  2  1  1

The first 6 factoradic numbers are:

decimal   factoradic
0         0
1         1,0
2         1,0,0
3         1,1,0
4         2,0,0
5         2,1,0

Wikipedia gives each digit a subscript for readability, but factoradic numbers are frequently written with commas dividing the place values, e.g. 2,1,0, which is more computer friendly. For more information you can also consult the entry in The On-Line Encyclopedia of Integer Sequences.
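
To make the definition concrete, here is a minimal sketch of converting between ordinary integers and factoradic digits. The class and method names are my own for illustration and are not taken from the gist mentioned later; digits are returned most significant first so they read the same way as the comma notation.

```csharp
using System;
using System.Collections.Generic;

public static class FactoradicDigits
{
    // Digits are returned most significant first, so 5 gives { 2, 1, 0 }.
    public static int[] FromInteger(ulong n)
    {
        List<int> digits = new List<int>();
        ulong radix = 1;
        do
        {
            digits.Add((int)(n % radix)); // digit for the (radix - 1)! place
            n /= radix;
            radix++;
        } while (n > 0);
        digits.Reverse();
        return digits.ToArray();
    }

    // Inverse: sum of digit * i!, with i counted from the right starting at 0.
    public static ulong ToInteger(int[] digits)
    {
        ulong value = 0;
        ulong factorial = 1; // 0!
        for (int i = 0; i < digits.Length; i++)
        {
            if (i > 0) factorial *= (ulong)i; // now holds i!
            value += (ulong)digits[digits.Length - 1 - i] * factorial;
        }
        return value;
    }

    public static void Main()
    {
        for (ulong n = 0; n < 6; n++)
            Console.WriteLine("{0} = {1}", n, string.Join(",", FromInteger(n)));
        Console.WriteLine(ToInteger(new[] { 3, 4, 1, 0, 1, 0 })); // 463
    }
}
```

The rightmost digit is always 0 because it is the remainder of a division by 1.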

What's exciting about factoradics is that they provide a simple mapping between integers and permutations. The digits of the nth factoradic number can be used to obtain the nth permutation of a set of indices in lexicographical order.
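
Here is a minimal sketch of that mapping, assuming the input list is already sorted and n is counted from zero. The name NthPermutation is hypothetical; this is one common way to do it, not necessarily how my Factoradic structure does it. Each factoradic digit, read from the left, is the index of the next item to pull out of the pool of remaining items.

```csharp
using System;
using System.Collections.Generic;

public static class PermutationSketch
{
    // Returns the n-th (zero-based) permutation of items in lexicographic
    // order, assuming items is already sorted.
    public static T[] NthPermutation<T>(IList<T> items, ulong n)
    {
        int count = items.Count;

        // digits[i] is the factoradic digit that multiplies i!
        int[] digits = new int[count];
        for (int i = 0; i < count; i++)
        {
            ulong radix = (ulong)(i + 1);
            digits[i] = (int)(n % radix);
            n /= radix;
        }
        if (n != 0)
            throw new ArgumentOutOfRangeException("n", "n must be less than items.Count!");

        // Read digits from the most significant end; each one picks
        // (and removes) an item from the pool of remaining items.
        List<T> pool = new List<T>(items);
        T[] result = new T[count];
        for (int i = 0; i < count; i++)
        {
            int d = digits[count - 1 - i];
            result[i] = pool[d];
            pool.RemoveAt(d);
        }
        return result;
    }

    public static void Main()
    {
        char[] letters = { 'a', 'b', 'c' };
        for (ulong n = 0; n < 6; n++)
            Console.WriteLine("{0}: {1}", n, new string(NthPermutation(letters, n)));
    }
}
```

For three letters this walks through abc, acb, bac, bca, cab, cba, which is exactly lexicographic order.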

I was intrigued by the idea of using factoradic numbers as permutation odometers. Lexical order is an odometer order if the set in question consists of numbers or letters. Since we're going to be working with indices, this holds true. For those not familiar with the concept, a permutation odometer is a means of controlling permutation generation that allows certain operations to be performed easily. These include setting, resetting, rewinding, and fast-forwarding from a given permutation. You can read a basic introduction to base-n and base-10 odometers at this link.

In defining a type to represent a Factoradic we want something that would act like an integral value type but readily supply the info necessary to obtain a permutation. I won't go into the details of how to get the permutation since they're given in the Wikipedia entry and will be discussed in the follow-up post. The important thing is that if I wanted to, I could write some code like this

    Factoradic fac = new Factoradic(11322);
    while (fac > 0)
        fac -= 4;
and generate every 4th permutation of a list in reverse lexicographical order starting with the 11,322nd one. The operators ++, --, +=, etc. correspond to the odometer operations. Why would I want to do this? The reasons will become abundantly clear later on. As a teaser take a look at the Wikipedia entry on combinadics, which provide a similar mapping between integers and combinations of items from a set.
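
To show how little is needed to get odometer-style syntax, here is a minimal sketch using a hypothetical Odometer struct that only tracks a position. It is a stand-in, not the Factoradic type from the gist. Note that C# derives += and -= from the binary + and - operators, so those never have to be written separately.

```csharp
using System;

// Hypothetical stand-in for the Factoradic type; it only tracks the position.
public struct Odometer
{
    private readonly long position;

    public Odometer(long position) { this.position = position; }

    public long Position { get { return position; } }

    // Fast-forward and rewind by an arbitrary step.
    public static Odometer operator +(Odometer o, long step) { return new Odometer(o.position + step); }
    public static Odometer operator -(Odometer o, long step) { return new Odometer(o.position - step); }

    // Single-step forward and back.
    public static Odometer operator ++(Odometer o) { return new Odometer(o.position + 1); }
    public static Odometer operator --(Odometer o) { return new Odometer(o.position - 1); }

    // Comparison operators must be declared in pairs.
    public static bool operator >(Odometer o, long value) { return o.position > value; }
    public static bool operator <(Odometer o, long value) { return o.position < value; }
}

public static class OdometerDemo
{
    public static void Main()
    {
        Odometer odo = new Odometer(10);
        while (odo > 0)
        {
            Console.WriteLine(odo.Position); // visits 10, 6, 2
            odo -= 4;
        }
    }
}
```

A signed position is used here so that stepping past zero simply ends the loop; a type backed by an unsigned value would need an explicit guard instead.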

Another important application of the factoradic is the generation of random permutations. Once we've defined the function GetNthPerm we can simply pass it a random integer. This has application in cryptography as well as software testing. The MSDN has an article which looks at this topic in depth.

This gist contains the source for my implementation of a Factoradic structure. Please bear in mind that it is a work in progress.

The maximum value that can be converted to/from an integral type is ulong.MaxValue, while the maximum value that can currently be stored is 20*20!. That's not bad: it gets us the approximately 10^19 permutations that can be generated by a naive implementation. I suspect that with some reordering of the list we could at least double that. We will also be able to use permutation patterns to our advantage, but that is definitely a topic for a later post.
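
The arithmetic behind the ulong limit is easy to check. This small sketch (the names are my own) finds the largest n for which n! still fits in a ulong; the answer is 20, which is why a digit of 20 in the 20! place is already more than an integral type can hold.

```csharp
using System;

public static class FactorialLimits
{
    // Largest n for which n! still fits in a ulong.
    public static int LargestFactorialIndex()
    {
        ulong factorial = 1; // holds n! for the current n
        int n = 1;
        while (true)
        {
            ulong next = (ulong)(n + 1);
            // If n! > MaxValue / (n + 1), then (n + 1)! would overflow.
            if (factorial > ulong.MaxValue / next) return n;
            factorial *= next;
            n++;
        }
    }

    public static void Main()
    {
        Console.WriteLine(LargestFactorialIndex()); // 20
    }
}
```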

Look for the next installment in a week or so. It will focus on my implementation of the Factoradic structure and give some example code for actually generating the permutations.

OBDotNet Enumerator Workaround

As some of you may have noticed, in the initial release of OBDotNet the enumerator types are not mapped correctly. The proxy classes are generated but the parameter typemaps have a problem and no operators are overloaded. As a temporary fix, this gist contains some extension methods for enumerating atoms and bonds.

I threw this together kind of quickly and only did a little testing, so please email me if you see any bugs.

To demonstrate using these enumerators, here is a simple C# version of Noel's Python script that calculates a circular fingerprint using the OBMolAtomBFSIter.

Addendum: If your application is targeting v2.0 of the framework this MSDN article contains instructions on how to add support for extension methods. The relevant section is about 90% of the way down under the heading:

Extension Methods in .NET Framework 2.0 Apps

Sunday, December 14, 2008

Introduction to OBDotNet: Part 1

This tutorial is intended for C# developers looking to use OBDotNet to develop chemically aware applications and libraries. It assumes that the reader is familiar with OpenBabel or similar cheminformatics toolkits.

What follows is an overview of the core classes in the C# bindings for OpenBabel with an eye toward things that may confuse C# programmers who are unfamiliar with SWIG and C++. If you are completely new to OpenBabel start by skimming the documentation and the examples in the project wiki.

To install OBDotNet:

1) Download the current release
2) Unzip into the directory of your choice
3) Set an environment variable named BABEL_DATADIR to point to the data folder.
4) Register OBDotNet.dll in the GAC using gacutil.exe or the .Net management console (optional).


OBDotNet currently consists of a single namespace: OpenBabel. It contains C# wrappers for all the publicly exposed classes, functions, and constants defined in the Open Babel library. It also contains a number of classes defined by SWIG that wrap arrays and STL types exposed by Open Babel. There are a few things that users need to be aware of if they are not familiar with SWIG.

1) The openbabel class

SWIG creates a module class that contains global constants and functions. The module class for OBDotNet is the openbabel class. Important methods belonging to the openbabel class include:

CartesianToInternal(vectorpInternalCoord arg0, OBMol mol)
InternalToCartesian(vectorpInternalCoord arg0, OBMol mol)
dot(OBVector3 a, OBVector3 b)
cross(OBVector3 a, OBVector3 b)

This is also where SWIG maps the values of macros and most enumerated constants from OpenBabel. For example the values in the OBGenericDataType enumeration are mapped to static fields of the openbabel class.


2) std::vector wrappers

SWIG generates a wrapper class for each type of vector and array found in OpenBabel.
For example:

std::vector<string> -> vectorString.

These wrappers implement a type safe but non-generic IEnumerable. This means that foreach loops work without a cast but you can't call extension methods that target generic IEnumerables or use the copy constructors of generic collection types. Use the Cast<T>() extension method to convert to a strongly typed IEnumerable<T>.

OBConversion obc = new OBConversion();
vectorString inFormats = obc.GetSupportedInputFormat();
List<string> molFormats = new List<string>(inFormats.Cast<string>());

foreach(string frmt in molFormats.Where(s => s.ToLower().Contains("mdl")))
    Console.WriteLine(frmt);
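
If you don't have OBDotNet installed yet, the same idea can be tried with any non-generic collection. In this minimal sketch an ArrayList stands in for vectorString, and the format descriptions are made-up sample strings, not actual OpenBabel output.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;
using System.Linq;

public static class CastDemo
{
    // Accepts any non-generic IEnumerable, which is all the SWIG wrappers offer.
    public static List<string> MdlFormats(IEnumerable formats)
    {
        return formats.Cast<string>()                         // now IEnumerable<string>
                      .Where(s => s.ToLower().Contains("mdl")) // LINQ is available
                      .ToList();
    }

    public static void Main()
    {
        // ArrayList stands in for vectorString: enumerable, but not generic.
        // The strings are made-up sample data.
        ArrayList formats = new ArrayList();
        formats.Add("mol: MDL MOL format");
        formats.Add("smi: SMILES format");
        formats.Add("sdf: MDL SD file");

        foreach (string f in MdlFormats(formats))
            Console.WriteLine(f);
    }
}
```

Cast<T>() is lazy, so nothing is copied until the results are enumerated or ToList() is called.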

3) The double_array class

There is only one array wrapper in OBDotNet. It uses the basic SWIG array template, which does not implement ICollection, has no indexer, and is not enumerable. Items are manipulated using the getitem(int index) and setitem(int index, double value) methods.


If you examine OBDotNet you'll see a lot of class names that look like this:


These represent places where the SWIG type map was not able to create a C# proxy for a C++ type. These placeholder classes cannot be instantiated and have no methods. Any method that takes or returns one of these types is currently unusable.


The classes you will work with most often are the OBMol, OBAtom, OBBond, OBVector3, and OBConversion.

We'll start with OBConversion. Instances of the OBConversion class are used for reading and writing structure files, parsing SMILES and InChI strings, and converting between chemical file formats. Open Babel supports a large number of file formats. In C++ an instance of OBConversion can be created using the following constructor

OBConversion(std::istream *is=NULL, std::ostream *os=NULL)

Currently OBDotNet does not have support for wrapping std::istream and std::ostream. As a result we see our first SWIGTYPE:


So we'll have to use the default constructor. This stream issue is common to most of the bindings. As a workaround the OpenBabel developers added the Read/WriteFile and Read/WriteString methods to the class. These allow you to read and write data without exposing a stream. The Read/WriteString methods can be used with the "smi" or "inchi" formats to parse SMILES or InChI strings.

Here is a first test program to check your OBDotNet install. It creates an OBConversion and calls GetSupportedInputFormat() and GetSupportedOutputFormat() to display the available file formats.

example 1: displaying the supported file formats

You should get a list of 93 input formats and 96 output formats.

The next example demonstrates reading structure data from an sdf file and writing out smiles strings.

example 2: reading an sdf file

This usage definitely suggests a pattern, so our final example is a facade class to simplify reading structure files.

example 3: a facade class for reading files

Now we can just write

OBMol m = OBReader.ReadMol("capsaicin.mol");


IEnumerable<OBMol> dataSet = OBReader.ReadFiles("someMols.mol");

Returning the IEnumerable<OBMol> allows developers to use LINQ to Objects for filtering data sets.

foreach(OBMol mol in dataSet.Where(candidate => candidate.GetMW() < 500))

We'll look more at LINQ when we discuss descriptors in part 3.

That's it for part 1. After reading this you should be able to use OBDotNet to create OBMol objects from data in files or smiles/inchi strings. Part 2 will look at the OBAtom, OBMol, OBBond, and OBVector3 classes as well as the use of enumerators.

Friday, November 21, 2008

ChemSharp - should have googled it first

Then I would have known about this...

"The safest way to sharpen tungsten without grinding."

Thursday, November 13, 2008

CSInChI v0.5 Released

The first product of the ChemSharp project is now available to the public. The CSInChI library allows programmers to call the IUPAC InChI library from CLR languages. It is compatible with IronPython as well, although Python programmers should read up on how IPy handles value types and on using the clr.Reference class with methods that take out and ref parameters.

CSInChI is designed as a stand alone library which is used by ChemSharp but not dependent on the rest of the project.

This is a beta release so it's a little rough and people should expect that breaking changes may be made between now and the eventual 1.0 release. Using the default constructors of the structs and then initializing the fields will be the best way to ensure compatibility with future releases.

The next few posts will contain examples of how to use this library. More examples are included in the documentation.

CSInChI can be downloaded from this link.

Questions and comments can be directed either to me or to the CSInChI mailing list at

Wednesday, November 12, 2008

Languages available for use with the CLR

A recent post to the OpenBabel mailing list reminded me that many scientists who write .Net code are not fully aware of the wide array of compatible languages. In fact most popular programming languages and many not so popular ones have been ported to the CLR. The list includes:

A# (Ada)
S# (Smalltalk)
FTN95 (Fortran)
F# (OCaml)

and many more...

A fairly complete list is posted here.

Sunday, November 9, 2008

Science Code .Net and Numerical Recipes

Here is an interesting project:

It seems to be an effort to implement the classic Numerical Recipes and provide classes to do some common Physics/Math calculations from C#. When I get a chance I'll be trying it out and posting a review. If anyone has some experience with it, comment and let me know what you thought of it.

Friday, November 7, 2008

Interop Example: Marshalling Structures To The InChI Library

For the last week I've been finishing up CSInChI, a library for using the IUPAC InChI library from C#. For those not acquainted with it, the InChI (International Chemical Identifier) is a line notation used to represent molecular structures. Line notations are simply ways of encoding a structure as a text string. Since the official InChI API provided by the IUPAC is written in C, I thought this would be a good time to post an interop example. This tutorial will illustrate how to call an unmanaged function that takes structures as parameters using Platform Invoke.

The InChI library can be downloaded from:

In this example we'll tackle the function:

int GetStructFromINCHI(inchi_InputINCHI *inpInChI, inchi_OutputStruct *outStruct)

This function takes 2 C structs as parameters and returns an integer error code. The first one holds two strings, the inchi and a string of options.

typedef struct tagINCHI_InputINCHI {
    /* the caller is responsible for the data allocation and deallocation */
    char *szInChI;   /* InChI ASCIIZ string to be converted to a structure */
    char *szOptions; /* InChI options: space-delimited; each is preceded by */
                     /* '/' or '-' depending on OS and compiler */
} inchi_InputINCHI;

We'll begin by creating a matching C# structure

public struct InChI_String_Input
{
    public string inchiString;
    public string options;
}

In this case we get surprisingly lucky and this struct marshals just fine with no additional attributes. The key thing here is to make sure that the fields are listed in the same order as in the unmanaged structure and that the type of each field is the same size as the C type. By default the C# compiler lays out the fields of a struct sequentially. If you want to use a class you must apply the [StructLayout(LayoutKind.Sequential)] attribute.
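
You can ask the marshaler what it thinks the unmanaged layout looks like without calling into any native code. This is a minimal sketch with hypothetical type names (InputAsStruct, InputAsClass) that mirror the two-string struct above; it checks that both the struct and the explicitly sequential class marshal as two consecutive pointers.

```csharp
using System;
using System.Runtime.InteropServices;

// Structs are laid out sequentially by default.
public struct InputAsStruct
{
    public string inchiString;
    public string options;
}

// A class needs the attribute before it can be marshaled as a structure.
[StructLayout(LayoutKind.Sequential)]
public class InputAsClass
{
    public string inchiString;
    public string options;
}

public static class LayoutDemo
{
    public static void Main()
    {
        // Each string field marshals as one char* pointer.
        Console.WriteLine(Marshal.SizeOf(typeof(InputAsStruct)) == 2 * IntPtr.Size);
        Console.WriteLine(Marshal.SizeOf(typeof(InputAsClass)) == 2 * IntPtr.Size);

        // Field order in C# is field order in unmanaged memory.
        Console.WriteLine(Marshal.OffsetOf(typeof(InputAsStruct), "inchiString").ToInt64() == 0);
        Console.WriteLine(Marshal.OffsetOf(typeof(InputAsStruct), "options").ToInt64() == IntPtr.Size);
    }
}
```

Checks like these are a cheap way to catch a field that is out of order or the wrong size before it turns into a crash inside the unmanaged library.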

The C struct that holds the output from the function looks like this:

typedef struct tagINCHI_OutputStruct {
    inchi_Atom     *atom;
    inchi_Stereo0D *stereo0D;
    S_SHORT         num_atoms;
    S_SHORT         num_stereo0D;
    char           *szMessage;
    char           *szLog;
    unsigned long   WarningFlags[2][2];
} inchi_OutputStruct;


A C# equivalent looks something like this:
using System;
using System.Runtime.InteropServices;

public struct InChI_Struct_Output
{
    public IntPtr AtomsPtr;
    public IntPtr StereoPtr;

    public short NumAtoms;
    public short NumStereo0D;

    public string Message;
    public string Log;

    //unsigned long is 32 bits wide on Windows, so uint is the matching size there
    [MarshalAs(UnmanagedType.ByValArray, SizeConst = 4)]
    public uint[] WarningFlags;
}

The details of the inchi_Atom and inchi_Stereo0D structures will be discussed in a future post. For now we're only going to worry about how to marshal arrays. Because a C-style array is represented by a pointer to the first item in the array, the C# equivalent is the IntPtr structure from the System namespace. The WarningFlags array has to be changed to a 1-D array with the same total capacity because the CLR does not support marshaling nested arrays. Note that the MarshalAs attribute specifies a size. This is required both because the array has a fixed size in C and because the CLR needs to know the runtime size of an array in order to marshal it.

To convert the pointer representing a C-style array into an array of C# structures, write a method that walks the pointer through the array: at each iteration call Marshal.PtrToStructure to convert the current element to a C# structure, then advance the pointer by the size of the structure.

public InChI_Atom[] GetAtoms()
{
    int atomSize = Marshal.SizeOf(typeof(InChI_Atom));
    InChI_Atom[] iAtoms = new InChI_Atom[NumAtoms];

    IntPtr pAtom = AtomsPtr;

    for (int i = 0; i < iAtoms.Length; i++)
    {
        iAtoms[i] = (InChI_Atom)Marshal.PtrToStructure(pAtom, typeof(InChI_Atom));
        //ToInt64 keeps the pointer arithmetic correct on 64-bit platforms
        pAtom = new IntPtr(pAtom.ToInt64() + atomSize);
    }
    return iAtoms;
}
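
The pointer-walking technique can be tried without the InChI dll. This is a minimal sketch that uses a hypothetical DemoAtom struct in place of InChI_Atom: it writes three structures into a block of unmanaged memory and then reads them back the same way GetAtoms does. IntPtr.Add is assumed to be available (.NET 4.0 or later).

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical blittable struct standing in for InChI_Atom.
[StructLayout(LayoutKind.Sequential)]
public struct DemoAtom
{
    public double X;
    public double Y;
    public short Charge;
}

public static class ArrayMarshalDemo
{
    // Converts a pointer to the first element of a C-style array
    // into a managed array of count structures.
    public static DemoAtom[] ReadArray(IntPtr first, int count)
    {
        int size = Marshal.SizeOf(typeof(DemoAtom));
        DemoAtom[] result = new DemoAtom[count];
        IntPtr current = first;
        for (int i = 0; i < count; i++)
        {
            result[i] = (DemoAtom)Marshal.PtrToStructure(current, typeof(DemoAtom));
            current = IntPtr.Add(current, size); // works on 32- and 64-bit
        }
        return result;
    }

    public static void Main()
    {
        int size = Marshal.SizeOf(typeof(DemoAtom));
        IntPtr buffer = Marshal.AllocHGlobal(size * 3);
        try
        {
            // Play the role of the unmanaged library: fill the array.
            for (int i = 0; i < 3; i++)
            {
                DemoAtom atom = new DemoAtom();
                atom.X = i;
                atom.Y = i * 2;
                atom.Charge = (short)(-i);
                Marshal.StructureToPtr(atom, IntPtr.Add(buffer, i * size), false);
            }

            foreach (DemoAtom a in ReadArray(buffer, 3))
                Console.WriteLine("{0} {1} {2}", a.X, a.Y, a.Charge);
        }
        finally
        {
            Marshal.FreeHGlobal(buffer); // unmanaged memory is never garbage collected
        }
    }
}
```

In the real library the memory belongs to InChI, so it has to be released with the library's own free function rather than FreeHGlobal.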

Finally we create a class to hold the methods that access the unmanaged dll.

public static class LibInChI
{
    [DllImport("libinchi.dll", EntryPoint = "GetStructFromINCHI")]
    public static extern int ParseInChI(ref InChI_String_Input input, out InChI_Struct_Output output);
}

To call the method:

//All fields need to be set to non-null values
InChI_String_Input inp;
inp.inchiString = "InChI=1/H3N/h1H3";
inp.options = "";

InChI_Struct_Output outStruct;

int retVal = LibInChI.ParseInChI(ref inp, out outStruct);

Note the use of the ref and out keywords. When ref and out parameters are marshaled they are interpreted as &theParam. Remember that if a method has any ref or out parameters the keywords must be explicitly specified each time the method is called.

That's it for today. The next interop example will look at the InChI_Atom struct and how to ensure that unmanaged resources are freed. For those who are interested, CSInChI will be available within the next week (fingers crossed!) from the ChemSharp project.

Wednesday, October 8, 2008

Using the CDK with the .Net framework and Mono

As part of my ongoing effort to create some cheminformatics options for the .Net framework I recently participated in a project to build C# bindings for the Open Babel C++ toolkit. Now that the first release of OBDotNet is out I decided to turn my attention to the other mature open source cheminformatics API that I have experience with: the Java Chemistry Development Kit. For a first pass at the integration I decided to use IKVM, an implementation of Java for .Net. The components of IKVM include:

ikvm.exe : the VM
ikvmc.exe : a compiler that converts java byte code to msil code
ikvmstub.exe : a compiler that generates java stubs that wrap .Net classes

This post will look at using ikvmc to build a .Net dll from the CDK jar file and using it in C# and IronPython.

To build the cdk_dotnet dll, first download IKVM. Then unzip it to the directory of your choice and add the directory containing the IKVM executables, the directory containing the C# compiler (csc for Windows or mcs for Mono), and the directory containing the Java compiler (javac.exe) to the PATH environment variable. Next download the current release of the CDK.

Now we're ready to build the dll:

Change to the directory containing the CDK jar file and run the following command.

ikvmc -assembly:cdk_dotnet -target:library yourcdkjar.jar

The -debug switch can be added to generate debugging info for the assembly.

If you see a lot of warnings when the dll is being built you may want to check your CLASSPATH environment variable. However, in spite of the warnings most functionality seems to be unaffected, at least in C#. So go ahead and ignore them if you want to get started.

To test the dll:

Create a new C# console application project and add references to cdk_dotnet.dll and IKVM.OpenJDK.ClassLibrary.dll. Then write some code and run it. Here is an example of how to use the cdk_dotnet dll to read a file and calculate some descriptors.

using System;

namespace CDK_DotNet_Test
{
    //Using aliases for convenience and to avoid importing whole namespaces
    using FReader =;
    using TPSA = org.openscience.cdk.qsar.descriptors.molecular.TPSADescriptor;
    using LogP = org.openscience.cdk.qsar.descriptors.molecular.XLogPDescriptor;
    using DoubleResult = org.openscience.cdk.qsar.result.DoubleResult;
    using Builder = org.openscience.cdk.DefaultChemObjectBuilder;
    using IMol = org.openscience.cdk.interfaces.IMolecule;
    using MolReader =;
    using Consts = org.openscience.cdk.CDKConstants;

    class Program
    {
        static void Main(string[] args)
        {
            FReader fReader = new FReader("some_mols.mol");
            MolReader molReader = new MolReader(fReader,Builder.getInstance());
            IMol mol;
            DoubleResult dr;
            LogP logP = new LogP();
            TPSA tpsa = new TPSA();
            double logPVal, tpsaVal;
            string name;

            while (molReader.hasNext())
            {
                mol = (
                dr = (
                logPVal = dr.doubleValue();
                dr = (
                tpsaVal = dr.doubleValue();

                //the title of each mol in the file is the name of the mol
                name = (

                Console.WriteLine("{0} {1} {2}",name,logPVal,tpsaVal);
            }
        }//end Main
    }//end class Program
}//end Namespace CDK_DotNet_Test

The output is:

Amitriptyline 20.01 3.24
antipyrine 11.827 23.55
carbamazepine 20.464 20.31
desipramine 17.511 3.24
lupitidine 11.036 76.32
phenserine 17.147 32.78
physostigmine 6.809 32.78
thioridazine 16.127 57.08
trifluoroperazine 17.027 35.02


Follow the instructions for installing IKVM and creating the cdk_dotnet dll, then copy the CDK dll along with IKVM.OpenJDK.ClassLibrary.dll and IKVM.Runtime.dll to the LIB directory of your IronPython installation.

I wanted to give an IronPython version of the C# example, but for some reason the java File object throws an exception when instantiated through IronPython. I'll be looking into this and hopefully I'll have an explanation to post soon. In the meantime here is a simpler example using IronPython.

import clr


from org.openscience.cdk import Molecule
from org.openscience.cdk.smiles import SmilesParser
#import the whole package for brevity
from org.openscience.cdk.qsar.descriptors.molecular import *
from org.openscience.cdk.qsar.result import DoubleResult

tpsa = TPSADescriptor()
logP = XLogPDescriptor()

smiles = SmilesParser()
mol = smiles.parseSmiles("N=CCC=O")
dr = tpsa.calculate(mol).getValue()
tpsaVal = dr.doubleValue()
dr = logP.calculate(mol).getValue()
logPVal = dr.doubleValue()
print logPVal,tpsaVal

The output is:
3.181 40.92

When I get a chance I'll do a post demonstrating how to use IKVM to call C# from Java and how to mix and match ChemSharp (currently under development with an alpha release coming soon) and OBDotNet with the CDK. It looks like we're finally getting a wide range of options for doing cheminformatics work using Mono and .Net. CODE ON!

Correction 11/12/08:

When I wrote:

"other mature open source cheminformatics api: The java Chemistry Development Kit. "

I did not mean to denigrate or ignore any of the other open source toolkits out there. What I meant to write was "mature open source cheminformatics api that I have experience with".
Apologies all around.

Monday, October 6, 2008

PyOpenGL blunders

I was sorting through some old files recently and I came across some screen shots from one of the first real programs I ever wrote using OpenGL back in my Python days. Since they were kind of funny I thought I would post them for the amusement of the readers.

The project was to render an electrostatic potential map on a VDW surface using atomic charges calculated with GAMESS (a computational chemistry package for all of you non-chemists). To create a prototype I had to manually issue the vertices for each atom sphere so that the color of the vertex could be set. Manually drawing the spheres required issuing the points in the order required to create the triangle strips used to approximate the spheres. This proved more challenging than I had anticipated....

The Death Star here is my favorite!

And finally success! This is the oxygen in H2O.

Sunday, October 5, 2008

COM R and C#

I recently had a need for a statistics package I could use from C#. As far as open source options go, R is considered the gold standard. So I started looking for some examples and I found a nice tutorial at

Check out the comments, some of them contain some good info.

PGP (not the crypto)

While the main subject addressed on this blog is programming with C#, from time to time I will mention some areas of research that I find interesting.

One of my major interests in psychopharmacology is the effect of P-Glycoprotein drug efflux transporters on blood brain barrier penetration. Efflux transporters are active transporters that kick things out of a cell. They most likely evolved as a defensive mechanism. However, they pose a major problem for the administration of certain drug compounds. The presence of these transporters in the intestines has been known for some time. A more recent discovery is the presence of pgp transporters at the blood brain barrier. This is particularly interesting because it provides an explanation for why clozapine succeeds in treating some schizophrenics when less toxic neuroleptics fail. It turns out that clozapine is one of the few neuroleptics that is not a pgp substrate. This implies that at least some treatment-resistant schizophrenics overproduce pgp transporters and that in the future some kind of combination therapy mixing a neuroleptic with an inhibitor of pgp may eliminate the need for clozapine. This is exciting because clozapine has a nasty tendency to cause agranulocytosis, and patients taking the drug (at least here in the US) require weekly white cell counts.

For those interested in more information see

The site doesn't seem to have been updated in quite a while but it does have a nice table of substrates and non-substrates as well as a short list of inducers.

Thursday, October 2, 2008


OK C# fans, I've been a little lax in posting here (alright, more than a little) so I'm pleased to be able to announce the first release of OBDotNet, a set of SWIG-generated C# bindings for the OpenBabel cheminformatics toolkit. Developed by yours truly in conjunction with Noel O'Boyle (his always interesting OBlog is linked on the right hand side), this represents the first open source cheminformatics option for the .Net framework.

Interested parties can download it here.

Sunday, July 20, 2008

Fortress - Not C# but it's cool

From the FAQ:

What is Fortress? Fortress is a new programming language designed for high-performance computing (HPC) with high programmability. In order to explore breakaway approaches to improving programmability, the Fortress design has not been tied to legacy language syntax or semantics; all aspects of HPC language design have been rethought from the ground up. As a result, we are able to support features in Fortress such as transactions, specification of locality, and implicit parallel computation, as integral features built into the core of the language. Features such as the Fortress component system and test framework facilitate program assembly and testing, and enable powerful compiler optimizations across library boundaries. Even the syntax and type system of Fortress are custom-tailored to modern HPC programming, supporting mathematical notation and static checking of properties such as physical units and dimensions, static type checking of multidimensional arrays and matrices, and definitions of domain-specific language syntax in libraries. Moreover, Fortress has been designed with the intent that it be a ``growable'' language, gracefully supporting the addition of future language features. In fact, much of the Fortress language itself (even the definition of arrays and other basic types) is encoded in libraries atop a relatively small core language.

My personal favorite feature is the ability to use greek letters and mathematical symbols in the source. Check it out at...

Thursday, July 17, 2008

Welcome to Scientific C#

This is the initial post of my new blog dedicated to scientific programming in C#.

I thought I'd kick things off by answering the question I'm asked most commonly.

Why C# instead of Java:

User defined value types
Multidimensional Arrays
Superior implementation of generics
The option to write unmanaged code and use explicit pointers
The Mono project is fairly well developed, allowing .Net development for Linux
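
To give a flavor of the first two items, here is a minimal sketch. Vector3 and Trace are hypothetical names made up for this example; the struct is a user-defined value type with an overloaded operator, and double[,] is a true rectangular array rather than an array of arrays.

```csharp
using System;

// User-defined value type: stored inline and copied by value.
public struct Vector3
{
    public readonly double X, Y, Z;

    public Vector3(double x, double y, double z)
    {
        X = x;
        Y = y;
        Z = z;
    }

    public static Vector3 operator +(Vector3 a, Vector3 b)
    {
        return new Vector3(a.X + b.X, a.Y + b.Y, a.Z + b.Z);
    }
}

public static class FeatureDemo
{
    // Sums the diagonal of a square rectangular array.
    public static double Trace(double[,] m)
    {
        double sum = 0;
        for (int i = 0; i < m.GetLength(0); i++)
            sum += m[i, i];
        return sum;
    }

    public static void Main()
    {
        Vector3 v = new Vector3(1, 2, 3) + new Vector3(4, 5, 6);
        Console.WriteLine("{0} {1} {2}", v.X, v.Y, v.Z); // 5 7 9

        double[,] identity = { { 1, 0, 0 }, { 0, 1, 0 }, { 0, 0, 1 } };
        Console.WriteLine(Trace(identity)); // 3
    }
}
```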

For a good if outdated introduction to C# for scientific applications see...