Wednesday, October 8, 2008

Using the CDK with the .Net framework and Mono

As part of my ongoing effort to create some cheminformatics options for the .Net framework, I recently participated in a project to build C# bindings for the Open Babel C++ toolkit. Now that the first release of OBDotNet is out, I decided to turn my attention to the other mature open source cheminformatics API that I have experience with: the Java Chemistry Development Kit (CDK). For a first pass at the integration I decided to use IKVM, an implementation of Java for .Net. The components of IKVM include:

ikvm.exe : the VM
ikvmc.exe : a compiler that converts Java bytecode to MSIL code
ikvmstub.exe : a compiler that generates Java stubs that wrap .Net classes


This post will look at using ikvmc to build a .Net dll from the CDK jar file and using it in C# and IronPython.

To build the cdk_dotnet dll, first download IKVM. Then unzip it to the directory of your choice and add the directory containing the IKVM executables, the directory containing the C# compiler (csc for Windows or mcs for Mono), and the directory containing the Java compiler (javac.exe) to the PATH environment variable. Next, download the current release of the CDK.
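Before building, it can help to confirm that the tools are actually visible on the PATH. The following is a small sketch for a standard Python 3 interpreter (not IronPython). The helper name missing_tools is made up for illustration, and the tool names are simply the ones listed above.

```python
import shutil


def missing_tools(names):
    """Return the names that cannot be found on the current PATH."""
    return [name for name in names if shutil.which(name) is None]


if __name__ == "__main__":
    # csc is the Windows C# compiler; swap in "mcs" when using Mono.
    print(missing_tools(["ikvmc", "csc", "javac"]))
```

An empty list means every tool was found. Anything printed still needs its directory added to PATH.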

Now we're ready to build the dll:

Change to the directory containing the CDK jar file and run the following command.

ikvmc -assembly:cdk_dotnet -target:library yourcdkjar.jar

The -debug switch can be added to generate debugging info for the assembly.
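For anyone scripting the build, the command above can be assembled programmatically. This is a minimal Python sketch, not part of IKVM: the helper name build_ikvmc_command is hypothetical, and it uses only the switches mentioned in this post (-assembly, -target:library and -debug).

```python
def build_ikvmc_command(jar_path, assembly="cdk_dotnet", debug=False):
    """Return the ikvmc argument list that compiles a jar into a .Net library."""
    cmd = ["ikvmc"]
    if debug:
        # Optional: also generate debugging info for the assembly.
        cmd.append("-debug")
    cmd.append("-assembly:" + assembly)
    cmd.append("-target:library")
    cmd.append(jar_path)
    return cmd


if __name__ == "__main__":
    # Only prints the command line; nothing is executed here.
    print(" ".join(build_ikvmc_command("yourcdkjar.jar", debug=True)))
```

Passing the returned list to subprocess.call would run the compiler, provided ikvmc is on the PATH and the script is started from the directory containing the jar.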

If you see a lot of warnings when the dll is being built, you may want to check your CLASSPATH environment variable. However, in spite of the warnings, most functionality seems to be unaffected, at least in C#. So go ahead and ignore them if you want to get started.

To test the dll:

Create a new C# console application project and add references to cdk_dotnet.dll and IKVM.OpenJDK.ClassLibrary.dll. Then write some code and run it. Here is an example of how to use the cdk_dotnet dll to read a file and calculate some descriptors.

using System;

namespace CDK_DotNet_Test
{
    //Using aliases for convenience and to avoid importing whole
    //packages
    using FReader = java.io.FileReader;
    using TPSA = org.openscience.cdk.qsar.descriptors.molecular.TPSADescriptor;
    using LogP = org.openscience.cdk.qsar.descriptors.molecular.XLogPDescriptor;
    using DoubleResult = org.openscience.cdk.qsar.result.DoubleResult;
    using Builder = org.openscience.cdk.DefaultChemObjectBuilder;
    using IMol = org.openscience.cdk.interfaces.IMolecule;
    using MolReader = org.openscience.cdk.io.iterator.IteratingMDLReader;
    using Consts = org.openscience.cdk.CDKConstants;

    class Program
    {
        static void Main(string[] args)
        {
            FReader fReader = new FReader("some_mols.mol");
            MolReader molReader = new MolReader(fReader, Builder.getInstance());
            IMol mol;
            DoubleResult dr;
            LogP logP = new LogP();
            TPSA tpsa = new TPSA();
            double logPVal, tpsaVal;
            string name;

            while (molReader.hasNext())
            {
                mol = (IMol)molReader.next();
                dr = (DoubleResult)logP.calculate(mol).getValue();
                logPVal = dr.doubleValue();
                dr = (DoubleResult)tpsa.calculate(mol).getValue();
                tpsaVal = dr.doubleValue();

                //the title of each mol in the file is the name of the mol
                name = (String)mol.getProperty(Consts.TITLE);

                Console.WriteLine("{0} {1} {2}", name, logPVal, tpsaVal);
            }
        } //end Main
    } //end class Program
} //end Namespace CDK_DotNet_Test


The output is:

Amitriptyline 20.01 3.24
antipyrine 11.827 23.55
carbamazepine 20.464 20.31
desipramine 17.511 3.24
lupitidine 11.036 76.32
phenserine 17.147 32.78
physostigmine 6.809 32.78
thioridazine 16.127 57.08
trifluoroperazine 17.027 35.02


IronPython:

Follow the instructions for installing IKVM and creating the cdk_dotnet dll, then copy the CDK dll along with IKVM.OpenJDK.ClassLibrary.dll and IKVM.Runtime.dll to the LIB directory of your IronPython installation.

I wanted to give an IronPython version of the C# example, but for some reason the Java File object throws an exception when instantiated through IronPython. I'll be looking into this and hopefully I'll have an explanation to post soon. In the meantime, here is a simpler example using IronPython.

import clr

clr.AddReference("cdk_dotnet.dll")
clr.AddReference("IKVM.OpenJDK.ClassLibrary.dll")

from org.openscience.cdk import Molecule
from org.openscience.cdk.smiles import SmilesParser
#import the whole package for brevity
from org.openscience.cdk.qsar.descriptors.molecular import *
from org.openscience.cdk.qsar.result import DoubleResult

tpsa = TPSADescriptor()
logP = XLogPDescriptor()

smiles = SmilesParser()
mol = smiles.parseSmiles("N=CCC=O")
dr = tpsa.calculate(mol).getValue()
tpsaVal = dr.doubleValue()
dr = logP.calculate(mol).getValue()
logPVal = dr.doubleValue()
print logPVal,tpsaVal

The output is:
3.181 40.92
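To run the same calculation over several SMILES strings, the loop can be pulled into a small helper. This is only a sketch: calc_descriptors is a hypothetical name, and the helper assumes nothing about the CDK beyond the calculate(mol).getValue().doubleValue() chain used in the example above. The parser and descriptors are passed in, so the function itself has no CDK imports.

```python
def calc_descriptors(smiles_list, parse, descriptors):
    """Return one (smiles, values) pair per input SMILES string.

    parse       -- callable that turns a SMILES string into a molecule
    descriptors -- list of (label, descriptor) pairs; each descriptor must
                   support calculate(mol).getValue().doubleValue()
    """
    rows = []
    for smi in smiles_list:
        mol = parse(smi)
        values = {}
        for label, descriptor in descriptors:
            values[label] = descriptor.calculate(mol).getValue().doubleValue()
        rows.append((smi, values))
    return rows

# Under IronPython, with the imports from the example above, a call might be:
# rows = calc_descriptors(["N=CCC=O"],
#                         SmilesParser().parseSmiles,
#                         [("TPSA", TPSADescriptor()), ("XLogP", XLogPDescriptor())])
```

Keeping the CDK objects out of the helper also means it can be exercised with simple stand-in objects when the dlls are not available.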

When I get a chance I'll do a post demonstrating how to use IKVM to call C# from Java, and how to mix and match ChemSharp (currently under development with an alpha release coming soon) and OBDotNet with the CDK. It looks like we're finally getting a wide range of options for doing cheminformatics work using Mono and .Net. CODE ON!

Correction 11/12/08:

When I wrote:

"other mature open source cheminformatics api: The java Chemistry Development Kit. "

I did not mean to denigrate or ignore any of the other open source toolkits out there. What I meant to write was "mature open source cheminformatics api that I have experience with".
Apologies all around.

Monday, October 6, 2008

PyOpenGL blunders

I was sorting through some old files recently and I came across some screen shots from one of the first real programs I ever wrote using OpenGL, back in my Python days. Since they were kind of funny, I thought I would post them for the amusement of the readers.

The project was to render an electrostatic potential map on a VDW surface using atomic charges calculated with GAMESS (a computational chemistry package for all of you non-chemists). To create a prototype I had to manually issue the vertices for each atom sphere so that the color of the vertex could be set. Manually drawing the spheres required issuing the points in the order required to create the triangle strips used to approximate the spheres. This proved more challenging than I had anticipated....

The Death Star here is my favorite!

And finally success! This is the oxygen in H2O.

Sunday, October 5, 2008

COM R and C#

I recently had a need for a statistics package I could use from C#. As far as open source options go, R is considered the gold standard. So I started looking for some examples and I found a nice tutorial at

http://www.codeproject.com/KB/cs/RtoCSharp.aspx


Check out the comments; some of them contain good info.

PGP (not the crypto)

While the main subject addressed on this blog is programming with C#, from time to time I will mention some areas of research that I find interesting.

One of my major interests in psychopharmacology is the effect of P-glycoprotein (PGP) drug efflux transporters on blood brain barrier penetration. Efflux transporters are active transporters that kick things out of a cell. They most likely evolved as a defensive mechanism. However, they pose a major problem for the administration of certain drug compounds. The presence of these transporters in the intestines has been known for some time. A more recent discovery is the presence of PGP transporters at the blood brain barrier. This is particularly interesting because it provides an explanation for why clozapine succeeds in treating some schizophrenics when less toxic neuroleptics fail. It turns out that clozapine is one of the few neuroleptics that is not a PGP substrate. This implies that at least some treatment-resistant schizophrenics overproduce PGP transporters, and that in the future some kind of combination therapy mixing a neuroleptic with a PGP inhibitor may eliminate the need for clozapine. This is exciting because clozapine has a nasty tendency to cause agranulocytosis, and patients taking the drug (at least here in the US) require weekly white cell counts.

For those interested in more information see

http://www.mhc.com/PGP/

The site doesn't seem to have been updated in quite a while but it does have a nice table of substrates and non-substrates as well as a short list of inducers.

Thursday, October 2, 2008

OBDotNet

OK C# fans, I've been a little lax in posting here (alright, more than a little), so I'm pleased to be able to announce the first release of OBDotNet, a set of SWIG-generated C# bindings for the OpenBabel cheminformatics toolkit. Developed by yours truly in conjunction with Noel O'Boyle (his always interesting OBlog is linked on the right hand side), this represents the first open source cheminformatics option for the .Net framework.

Interested parties can download it here.