Friday, December 26, 2008

Implementing IEquatable

I'm currently working to implement reference equality in the SWIG bindings for OpenBabel. While doing some review I came across two pages which discuss some overlooked relationships between Object.Equals(object o), IEquatable<AType>.Equals(AType instance), and Object.GetHashCode().

So here is a little post-holiday reading:

Implementing IEquatable Properly


IEquatable(of T) and GetHashCode()

Wednesday, December 17, 2008

Perfect Female Companion - Powered by C#

I'm going to have to award these guys the Scientific C# project of the year.

(PhysOrg.com) -- Aiko is a humanoid robot with a built in Biometric Artificial Intelligence Neural System (Brain) designed by Le Trung in Canada. Aiko is slightly less than 5-feet high with 32.24-inch bust, 22.44-inch waist and 33.07-inch waist. Aiko which means "love child" is the perfect companion. In fact, lovely Aiko speaks Japanese and English and can respond to annoyances and questions.




Le Trung's biggest claim to fame is the technology. He combined his innovative 'Brain' technology by programming in C# and Basic which constantly updates.


Click here for the full article.

Tuesday, December 16, 2008

Factoradics as Permutation Odometers

I originally intended for this blog to cover more general scientific programming topics, but it has ended up being pretty much all cheminformatics. So I had been thinking about doing a more general post and when I came across this blog entry while searching google I knew I had found something to write about.

Enumerating permutations using inversion vectors and factoradic numbering


The author links to the Wikipedia entry for factoradic which gives this definition:

Factoradic is a factorial-based mixed radix numeral system: the i-th digit, counting from right, is to be multiplied by i!

radix: 7 6 5 4 3 2 1 0
place value: 7! 6! 5! 4! 3! 2! 1! 0!
in decimal: 5040 720 120 24 6 2 1 1

The first 6 factoradic numbers are:


1100

120100

121100

220100

221100

13020100

Wikipedia gives the subscript for readability but they are frequently written with commas dividing the place value as e.g., 2,1,0 which is more computer friendly. For more information you can also consult the entry in The Encyclopedia of Integer Sequences.

What's exciting about factoradics is that they provide a simple mapping between integers and permutations. The digits of the nth factoradic number can be used to obtain the nth permutation of a set of indices in lexicographical order.

I was intrigued by the idea of using factoradic numbers as permutation odometers. Lexical order is an odometer order if the set in question consists of numbers or letters. Since we're going to be working with indices, this holds true. For those not familiar with the concept, a permutation odometer is a means of controlling permutation generation that allows certain operations to be performed easily. These include: setting, reseting, rewinding, and fast-forwarding from a given permutation. You can read a basic introduction to base-n and base-10 odometers at this link.

In defining a type to represent a Factoradic we want something would act like an integral value type but readily supply the info necessary to obtain a permutation. I won't go into the details of how to get the permutation since they're given in the wikipedia entry and will be discussed in the follow up post. The important thing is that if I wanted to to, I could write some code like this

Factoradic fac = new Factoradic(11322);
while(fac < 0)
{
fac -= 4;
list.GetNthPerm(fac);
}
and generate every 4th permutation of a list in reverse lexicographical order starting with the 11,322 one. The operators ++, --, +=, etc. correspond to the operation odometer operations. Why would I want to do this? The reasons will be come abundantly clear later on. As a teaser take a look at the wikipedia entry on combinadics, which provide a similar mapping for the kth permutation of n items from a set.

Another important application of the factoradic is the generation of random permutations. Once we've defined the function GetNthPerm we can simply pass it a random integer. This has application in cryptography as well as software testing. The MSDN has an article which looks at this topic in depth.

This gist contains the source for my implementation of a Factoradic structure. Please bear in mind that it is a work in progress.

The maximum value that can be converted to/from an integral type is ulong.MaxValue while the maximum value that can be currently be stored is 20*20! Thats not bad, it gets us the approximately 10^19 permutations that can be generated by from naive implementation. I suspect that with some reordering of the list we could at least double that. We will also be able to use permutation patterns to our advantage, but that is definitely a topic for a later post.

Look for the next installment in a week or so. It will focus on the my implementation of the Factoradic structure and give some example code for actually generating the permutations.

OBDotNet Enumerator Workaround

As some of you may have noticed, in the initial release of OBDotNet the enumerator types are not mapped correctly. The proxy classes are generated but the parameter typemaps have a problem and no operators are overloaded. As a temporary fix, this gist contains some extension methods for enumerating atoms and bonds.



I threw this together kind of quickly and only did a little testing, so please email me if you see any bugs.

To demonstrate using these enumerators, here is a simple C# version of Noel's python script that calculated a circular fingerprint using the OBMolAtomBFSIter





Addendum: If your application is targeting v2.0 of the framework this MSDN article contains instructions on how to add support for extension methods. The relevant section is about 90% of the way down under the heading:

Extension Methods in .NET Framework 2.0 Apps

Sunday, December 14, 2008

Introduction to OBDotNet: Part 1

This tutorial is intended for C# developers looking to use OBDotNet to develop chemically aware applications and libraries. It assumes that the reader is familiar with OpenBabel or similar cheminformatics toolkits.

What follows is an overview of the core classes in the C# bindings for OpenBabel with an eye toward things that may confuse C# programmers who are unfamiliar with SWIG and C++. If you are completely new to OpenBabel start by skimming the documentation and the examples in the project wiki.


To install OBDotNet:

1)Download the current release
2)Unzip into the directory of your choice
3)Set an environment variable named BABEL_DATADIR to point to the data folder.
3)Register OBDotNet.dll in the GAC using gacutil.exe or the .Net management console.(optional)


Components:

OBDotNet currently consists of a single namespace: OpenBabel. It contains C# wrappers for all the publicly exposed classes, functions, and constants defined in the Open Babel library. It also contains a number of classes defined by SWIG that wrap arrays and STL types exposed by Open Babel. There are a few things that users need to be aware of if they are not familiar with SWIG.

1) The openbabel class

SWIG creates a module class that contains global constants and functions. The module class for OBDotNet is the openbabel class. Important methods belonging to the openbabel class include:

CartesianToInternal(vectorpInternalCoord arg0, OBMol mol)
InternalToCartesian(vectorpInternalCoord arg0, OBMol mol)
dot(OBVector3 a, OBVector3 b)
cross(OBVector3 a, OBVector3 b)

This is also where SWIG maps the values of macros and most enumerated constants from OpenBabel. For example the values in the OBGenericDataType enumeration are mapped to static fields of the openbabel class.

openbabel.PairData;
openbabl.RingData;
...


2) std::vector wrappers

SWIG generates wrapper it creates a class for each type of vector and array found in OpenBabel.
For example:

std::vector<string> -> vectorString.

These wrappers implement a type safe but non-generic IEnumerable. This means that foreach loops work without a cast but you can't call extension methods that target generic IEnumerables or use the copy constuctors of generic collection types. Use the Cast<T>() extension method to convert to a strongly typed IEnumerable<T>.


OBConversion obc = new OBConversion();
vectorString inFormats = obc.GetSupportedInputFormat();
List molFormats = new List(inFormat.Cast<string>());

foreach(string frmt in molFormats.Where(s => s.ToLower().Contains("mdl")))
...



3) The double_array class

This is only one array wrapper in OBDotNet. It uses the basic SWIG array template which does not implement ICollection, has no indexer, and is not enumerable. Items are manipulated using the getitem(int n) and setitem(int m) methods.

4) SWIGTYPE

If you examine OBDotNet you'll see a lot of class names that look like this:

SWIGTYPE_p_istream.

These represent places where the SWIG type map was not able to create a C# proxy for a C++ type. These placeholder classes cannot be instantiated and have no methods. Any method that takes or returns one these types is currently unusable.


OBConversion:

The classes you will work with most often are the OBMol, OBAtom, OBBond, OBVector3, and OBConversion.

We'll start with OBConversion. Instances of the OBConversion class are used for reading and writing structure files, parsing miles and InChI strings, and converting between chemical file formats. Open Babel supports a large number of file formats. In C++ a instance of OBConversion can be created using the following constructor

OBConversion(std::istream *is=NULL, std::ostream *os=NULL)

Currently OBDotNet does not have support for wrapping std:istream and std:ostream. As a result we see our first SWIGTYPE:

OBConversion(SWIGTYPE_p_std__istream,SWIGTYPE_p_std__ostream)

So we'll have to use the default constuctor. This stream issue is common to most of the bindings. As a work around the OpenBabel developers added the Read/WriteFile and Read/WriteString methods to the class. These allow you to read and write data without exposing a stream. The Read/WriteString methods can be used with the "smi" or "inchi" formats to parse smiles or strings.

Here is a first test program to check your OBDotNet install. It creates an OBConversion and calls the GetSupportedInputFormat() and GetSupportedOutputFormat() to display the available file formats.

example 1: displaying the supported file formats



You should get a list of 93 input formats and 96 output formats.


The next example demonstrates reading structure data from an sdf file and writing out smiles strings.

example 2: reading an sdf file



This usage suggests definitely suggests a pattern, so our final example is a facade class to simplify reading structure files.

example 3: a facade class for reading files



Now we can just write

OBMol m = OBReader.ReadMol("capsaicin.mol);

or

IEnumerable<OBMol> dataSet = OBReader.ReadFiles("someMols.mol");

Returning the IEnumerable<OBMol> allows developers to use LINQ to Objects for filtering data sets.

foreach(OBMol mol in dataSet.Where(mol => mol.GetMW() < 500))
...

We'll look more at LINQ when we discuss descriptors in part 3.

That's it for part 1. After reading this you should be able to use OBDotNet to create OBMol objects from data in files or smiles/inchi strings. Part 2 will look at the OBAtom, OBMol, OBBond, and OBVector3 classes as well as the use of enumerators.