Wednesday, October 8, 2008

Using the CDK with the .Net framework and Mono

As part of my ongoing effort to create cheminformatics options for the .Net framework, I recently took part in a project to build C# bindings for the Open Babel C++ toolkit. Now that the first release of OBDotNet is out, I decided to turn my attention to the other mature open source cheminformatics API that I have experience with: the Java-based Chemistry Development Kit (CDK). For a first pass at the integration I decided to use IKVM, an implementation of Java for .Net. The components of IKVM include:

ikvm.exe : the VM
ikvmc.exe : a compiler that converts Java bytecode to MSIL code
ikvmstub.exe : a tool that generates Java stubs wrapping .Net classes


This post will look at using ikvmc to build a .Net dll from the CDK jar file and using it in C# and IronPython.

To build the cdk_dotnet dll, first download IKVM. Unzip it to the directory of your choice and add the following directories to the PATH environment variable: the one containing the IKVM executables, the one containing the C# compiler (csc on Windows or mcs on Mono), and the one containing the Java compiler (javac.exe). Next, download the current release of the CDK.
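Before building anything, it can save some head-scratching to confirm that the tools are really reachable through PATH. Here is a small sketch in plain Python 3 (not IronPython) using shutil.which; the executable names ikvmc, mcs, csc and javac are assumptions about how your installation names them, and find_tools is a hypothetical helper name.

```python
import shutil


def find_tools(tools, path=None):
    """Map each tool name to its full path, or None if it can't be found.

    When path is None, shutil.which searches the PATH environment variable.
    """
    return {tool: shutil.which(tool, path=path) for tool in tools}


if __name__ == "__main__":
    # Assumed executable names: ikvmc (IKVM), mcs (Mono) or csc (Windows)
    # for C#, and javac (JDK). You only need one of mcs/csc.
    found = find_tools(["ikvmc", "mcs", "csc", "javac"])
    for tool, location in sorted(found.items()):
        print("%-6s %s" % (tool, location or "NOT FOUND on PATH"))
```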

Now we're ready to build the dll:

Change to the directory containing the CDK jar file and run the following command.

ikvmc -assembly:cdk_dotnet -target:library yourcdkjar.jar

The -debug switch can be added to generate debugging info for the assembly.

If you see a lot of warnings while the dll is being built, you may want to check your CLASSPATH environment variable. In spite of the warnings, however, most functionality seems to be unaffected, at least in C#, so go ahead and ignore them if you just want to get started.
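If you end up rebuilding the dll often, the command can be wrapped in a short script. The Python 3 sketch below is one way to do it, assuming ikvmc is on PATH. The function names are hypothetical, and running ikvmc with CLASSPATH removed from its environment is just one way to act on the CLASSPATH advice above, not a guaranteed cure for the warnings.

```python
import os
import subprocess


def ikvmc_command(jar, assembly="cdk_dotnet", debug=False):
    """Build the ikvmc argument list used in this post; -debug is optional."""
    cmd = ["ikvmc", "-assembly:" + assembly, "-target:library"]
    if debug:
        cmd.append("-debug")  # also generate debugging info for the assembly
    cmd.append(jar)
    return cmd


def env_without_classpath(env=None):
    """Return a copy of the environment with CLASSPATH removed."""
    env = dict(os.environ if env is None else env)
    env.pop("CLASSPATH", None)
    return env


def build_cdk_dll(jar, debug=False):
    """Run ikvmc on the CDK jar with CLASSPATH cleared; return its exit code."""
    return subprocess.call(ikvmc_command(jar, debug=debug),
                           env=env_without_classpath())


# Example: build_cdk_dll("yourcdkjar.jar", debug=True)
```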

To test the dll:

Create a new C# console application project and add references to cdk_dotnet.dll and IKVM.OpenJDK.ClassLibrary.dll. Then write some code and run it. Here is an example of how to use the cdk_dotnet dll to read a file of molecules and calculate two descriptors (XLogP and TPSA) for each one.

using System;

namespace CDK_DotNet_Test
{
    // Using aliases for convenience and to avoid importing whole
    // packages
    using FReader = java.io.FileReader;
    using TPSA = org.openscience.cdk.qsar.descriptors.molecular.TPSADescriptor;
    using LogP = org.openscience.cdk.qsar.descriptors.molecular.XLogPDescriptor;
    using DoubleResult = org.openscience.cdk.qsar.result.DoubleResult;
    using Builder = org.openscience.cdk.DefaultChemObjectBuilder;
    using IMol = org.openscience.cdk.interfaces.IMolecule;
    using MolReader = org.openscience.cdk.io.iterator.IteratingMDLReader;
    using Consts = org.openscience.cdk.CDKConstants;

    class Program
    {
        static void Main(string[] args)
        {
            // Iterate over the molecules in a multi-molecule MDL file
            FReader fReader = new FReader("some_mols.mol");
            MolReader molReader = new MolReader(fReader, Builder.getInstance());
            IMol mol;
            DoubleResult dr;
            LogP logP = new LogP();
            TPSA tpsa = new TPSA();
            double logPVal, tpsaVal;
            string name;

            while (molReader.hasNext())
            {
                mol = (IMol)molReader.next();

                // calculate() returns a DescriptorValue; getValue() holds the result
                dr = (DoubleResult)logP.calculate(mol).getValue();
                logPVal = dr.doubleValue();
                dr = (DoubleResult)tpsa.calculate(mol).getValue();
                tpsaVal = dr.doubleValue();

                // The title of each mol in the file is the name of the mol
                name = (string)mol.getProperty(Consts.TITLE);

                Console.WriteLine("{0} {1} {2}", name, logPVal, tpsaVal);
            }

            // Release the underlying file reader
            molReader.close();
        } //end Main
    } //end class Program
} //end Namespace CDK_DotNet_Test


The output (molecule name, XLogP, TPSA) is:

Amitriptyline 20.01 3.24
antipyrine 11.827 23.55
carbamazepine 20.464 20.31
desipramine 17.511 3.24
lupitidine 11.036 76.32
phenserine 17.147 32.78
physostigmine 6.809 32.78
thioridazine 16.127 57.08
trifluoroperazine 17.027 35.02


IronPython:

Follow the instructions above for installing IKVM and creating the cdk_dotnet dll, then copy the CDK dll along with IKVM.OpenJDK.ClassLibrary.dll and IKVM.Runtime.dll to the LIB directory of your IronPython installation.
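If you'd rather script the copy, here is a rough Python 3 sketch. The DLL names are assumptions matching the IKVM release used in this post (later IKVM releases split the class library into several IKVM.OpenJDK.*.dll files, so check the bin directory of your download), copy_dlls is a hypothetical helper, and the paths in the example comment are placeholders.

```python
import os
import shutil

# Assumed file names; adjust to what your IKVM download actually contains.
DEFAULT_DLLS = [
    "cdk_dotnet.dll",
    "IKVM.OpenJDK.ClassLibrary.dll",
    "IKVM.Runtime.dll",
]


def copy_dlls(src_dirs, lib_dir, names=DEFAULT_DLLS):
    """Copy each named DLL from the first directory in src_dirs that has it.

    Returns the destination paths; raises FileNotFoundError if a DLL is
    not found in any source directory.
    """
    copied = []
    for name in names:
        for src_dir in src_dirs:
            candidate = os.path.join(src_dir, name)
            if os.path.isfile(candidate):
                copied.append(shutil.copy(candidate, lib_dir))
                break
        else:
            raise FileNotFoundError(name)
    return copied


# Example (placeholder paths):
# copy_dlls([r"C:\ikvm\bin", r"C:\cdk"], r"C:\IronPython\Lib")
```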

I wanted to give an IronPython version of the C# example, but for some reason the Java File object throws an exception when instantiated through IronPython. I'll be looking into this and hopefully I'll have an explanation to post soon. In the meantime, here is a simpler example using IronPython.

import clr

# Load the IKVM-compiled CDK and the IKVM Java class library
clr.AddReference("cdk_dotnet.dll")
clr.AddReference("IKVM.OpenJDK.ClassLibrary.dll")

from org.openscience.cdk import Molecule
from org.openscience.cdk.smiles import SmilesParser
# import the whole package for brevity
from org.openscience.cdk.qsar.descriptors.molecular import *
from org.openscience.cdk.qsar.result import DoubleResult

tpsa = TPSADescriptor()
logP = XLogPDescriptor()

# Build the molecule from a SMILES string instead of reading a file
smiles = SmilesParser()
mol = smiles.parseSmiles("N=CCC=O")

# calculate() returns a DescriptorValue; getValue() gives the DoubleResult
dr = tpsa.calculate(mol).getValue()
tpsaVal = dr.doubleValue()
dr = logP.calculate(mol).getValue()
logPVal = dr.doubleValue()
print logPVal, tpsaVal

The output is:
3.181 40.92
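TPSA is just a sum of fixed polar-fragment contributions, so the TPSA numbers can be checked by hand. The Python sketch below does that for N=CCC=O and for amitriptyline from the C# run. The three contribution values are quoted from the fragment table of Ertl's TPSA method, and the fragment assignment for each molecule was done by hand, not by the CDK; the fragment keys and the tpsa helper are illustrative names only.

```python
# Polar surface contributions (square angstroms) from Ertl's TPSA fragment
# table; only the fragment types needed for the two checks below.
CONTRIBUTIONS = {
    "[NH]=*": 23.85,         # imine nitrogen carrying one hydrogen
    "[O]=*": 17.07,          # carbonyl oxygen
    "[N](-*)(-*)-*": 3.24,   # tertiary amine nitrogen
}


def tpsa(fragment_counts):
    """Sum contribution * count over a molecule's polar fragments."""
    total = sum(CONTRIBUTIONS[frag] * n for frag, n in fragment_counts.items())
    return round(total, 2)


# N=CCC=O: one imine NH plus one carbonyl O
print(tpsa({"[NH]=*": 1, "[O]=*": 1}))  # → 40.92
# Amitriptyline: its only heteroatom is the tertiary amine N
print(tpsa({"[N](-*)(-*)-*": 1}))  # → 3.24
```

Both values match the TPSA column of the outputs.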

When I get a chance I'll do a post demonstrating how to use IKVM to call C# from Java, and how to mix and match ChemSharp (currently under development, with an alpha release coming soon) and OBDotNet with the CDK. It looks like we're finally getting a wide range of options for doing cheminformatics work using Mono and .Net. CODE ON!

Correction 11/12/08:

When I wrote:

"other mature open source cheminformatics api: The java Chemistry Development Kit. "

I did not mean to denigrate or ignore any of the other open source toolkits out there. What I meant to write was "mature open source cheminformatics API that I have experience with".
Apologies all around.

3 comments:

Egon Willighagen said...

This would be a really nice contribution to CDK News!

Noel O'Boyle said...

Wow! I didn't know this was possible. Nice work.

Unknown said...

How stable is this solution? Is this still working with the current 2010 release?