Thursday, November 20, 2008

Classifier Milestones

Lady's and gents, I am officially absorbed. The Classifier development is seriously picking up momentum now, to the point where I'm going to need to slow down soon and do some serious refactoring to clean things up and document them better (ie: at all). Soon I'll be implementing mundane application-wide alerts and validation checks, but for now, I want to take a moment to look back at the milestones reached in the past 50 days:
  • Subsets: users can now define subsets of their data via SQL in their properties file.
    • This allows actions like "Highlight cells with CDK knockdowns." & "Show me cells of phenotype-x from my positive controls."
  • Replaced drag-and-drop mechanism with one of my own which presents no bugs.
  • Training sets can be saved and loaded. Classifier also prompts for save on close if the current training set hasn't been saved already.
  • Overcame threading hurdles so image tiles may be loaded in a separate thread while the user does manual sorting.
  • Added many shortcuts to speed up interaction.
    • Double-click on image tiles: view full image.
    • Esc: closes image viewer.
    • Ctrl+a: select all tiles on current board.
    • Ctrl+d: deselect all tiles.
    • Ctrl+i: invert selection.
    • Delete: remove selection from board.
    • Up & down arrows: scroll board contents.
    • Ctrl+1,2,3...: Show/hide channels 1,2,3... in classifier and image viewer.
    • Double-click grid-row-label: Show image/images in row group.
  • Sorting boards can be added, removed, and named.
  • Connected the classifier backend so users can now train and fetch objects from multiple phenotypes.
  • Tested the image reader on 12 & 16bit tifs, and Cellomics dibs.
  • Changed the internal image representation to float32 numpy arrays in [0.,1.]
  • Added new colors to the channel-color mapping mechanism. Users can now choose to map image channels to red, green, blue, cyan, magenta, yellow, gray, and none/hidden.
  • Implemented scoring of phenotype hits and enrichments on a per-image basis.
    • Added a table view which can be sorted by columns, launch the image viewer from rows, and save it's contents to csv.
  • Users can now define groupings in their properties file (eg: per-well, per-gene, per-plate), and use them to group their enrichment scores.
We're almost there. Today I'm hoping to package the the code for the first time with py2app and hand it over to our image assay developers to tinker with. With my todo-list growing at the rate that it is, I can only imagine how much work will lay ahead of me with their feedback.

Tuesday, October 7, 2008

Threading a fine line

I have come to realize something about myself as a programmer, and that is that my past programming experience has not been in development but in hacking. Sure, I've written a large chunk of code here or there, but the vast majority of the code I have written has been written in a blitz. From homework assignments in college, to writing Perl scripts for molecular dynamics research, the average lifetime of my code is probably about 2 weeks.

Things are different now that I've taken on the responsibility of writing software for analyzing large, image-based biological screens. It used to be easier to just dodge code speed bumps, "Hack it, just make it work because no one else will have to learn how to use it." Now when I encounter a speed bump, I can't go around it or even drive over it, I have to get out of the car and find a way to peel this damn thing off the road or consider the nightmare of paving a new one!

At the start of the day yesterday, I was bogged down by the thought of having two highly elusive bugs hiding somewhere in my code. I knew I couldn't just look the other way and continue with development until the code was 100% stable. The frustration was that the code seemed completely stable because the errors were often difficult to reproduce. Still, they were there, and I knew they weren't going to be easy to fix, so I set to mailing the wxpython-users group for help before attacking the problem on my own.

The first bug, was related to a worker thread whose job it was to fetch image tiles and add them to a bin. I noticed that if I was resizing the window while the thread was hard at work, I would occasionally get error output in my console which didn't mean anything to me. However, even less frequently, I would get other errors which would cause the thread to crash.

This was unnerving psychologically because neither of these errors was terribly disruptive to the user. Even in the case where the worker thread would crash, it could be easily restarted by clicking a button. Still, as a developer I realized that small problems like these will snowball as an application grows. They are far easier to thwart at their outset than to put it off in hopes of them resolving themselves.

After playing with it for a while I found that triggering lots of resize events on the bin when the thread was adding a cell was more likely to cause an error. Clearly there was a lack of mutual exclusion for the bin. The solution was simple enough, do the image loading in the worker, then signal the main thread whenever an image was ready.

Today I stand with one bug left to fix. This one unpredictably crashes the entire app in a few inept-user scenarios. Lets just say I've already started to paving over that road.

Monday, September 29, 2008

Developing biologist-friendly analysis software

Welp, the tool I wrote to merge MySQL tables is pretty much done with the exception of a few niceties. Next up, I'll be returning to writing CellProfiler Analyst with my sights set on providing the user the ability to define dynamic groups.

Users will want to be able to interrogate the data with their screen in mind. In Classifier they might ask:
  • Show me cells from plate X.
  • Show me cells from plate X, well Y.
  • Show me cells treated with Z.
  • Show me cells from control wells.
  • Show me cells NOT from W.
Likewise, this will be applied to our visualizations down the road:
  • Color a plot by treatment name.
  • Plot measurement_X vs treatments.
  • Select all points from control wells.
After talking to Ray, who has a habit of seeing difficult things for their inherent simplicity, I am working on a plan to convert classifier to handle dynamic grouping under the new database schema. Ray's suggestion was to define groups by their where-clauses. Here's what I've come up with for storing the information in the properties file.

groups = EMPTY, CDKs, Accuracy75
group_where_EMPTY = CPA_per_image.well=well_id.well AND well_id.Gene=EMPTY
group_tables_EMPTY = CPA_per_image, well_id,
group_where_CDKs = CPA_per_image.well=well_id.well AND well_id.Gene REGEXP 'CDK.*'
group_tables_CDKs = CPA_per_image, well_id,
group_where_Accuracy75 = CPA_per_image.well=well_pairs.well_a AND well_pairs.accuracy>=75
group_tables_Accuracy75 = CPA_per_image, well_pairs,

With this information is available in the app, we'll be able to load cell tiles like this:

"Show me 20 random cells from control wells."
# Get the list of images that fall in control wells.
SELECT per_image.ImageNumber, meta.ImageNumber, meta.control FROM per_image, meta WHERE per_image.ImageNumber = meta.ImageNumber AND meta.control = 1;

# Use the existing data model to generate 20 random cell
# keys (tblNum,imNum,obNum) that fall in these images.

# Get the paths to the images we need
SELECT image_channel_path, image_channel_file, TableNumber, ImageNumber FROM per_image WHERE TableNumber = cellKey[0] AND ImageNumber = cellKey[1];

# Get the cell positions
SELECT pos_x, pos_y, TableNumber, ImageNumber, ObjectNumber, FROM per_object WHERE TableNumber = cellKey[0] AND ImageNumber = cellKey[1] AND ObjectNumber = cellKey[2];


# Load and crop the images.
The first query could also be broken into two and joined in python...
SELECT imagenumber FROM per_image;
SELECT imagenumber, control FROM meta WHERE control=1;

Wednesday, September 24, 2008

Merging MySQL Tables

My current task is to write a tool to allow users to merge tables created by CellProfiler into a master that will be used for the analysis software I'm writing. It's a simple objective except I'm still getting used to doing GUI development in wx.Python and learning which widgest to use and how is a lot like pulling teeth.

So! My thought is that I can use this blog to verbalize what I'm doing and how, and hopefully this will help me clear my mind and focus. This is something I haven't done before (heck, I've only made 2 posts in this blog so far), but I'm hopeful this will also help to re-instill that nerdy sense of accomplishment and learning that I felt when I first got into computer science.

The nature of our data is such that it is stored in pairs of tables (per-image-table, per-object-table). Yesterday I wrote MYSql statements to merge 3 table-pairs into a single pair.

# Create a new database where the master tables will be stored
# Create a master per-image table called "im" with the same definition as the old one
# Add a column "TableNumber" to keep track of which table each row is from
# Update the primary key
# Populate the dataset

CREATE DATABASE af;
CREATE TABLE im LIKE imageScreen.plate1_Per_Image;
ALTER TABLE im DROP PRIMARY KEY;
ALTER TABLE im ADD COLUMN TableNumber INT;
ALTER TABLE im ADD PRIMARY KEY (TableNumber, ImageNumber);
INSERT INTO im SELECT *,0 FROM
imageScreen.plate1_Per_Image;
INSERT INTO im SELECT *,1 FROM
imageScreen.plate2_Per_Image;
INSERT INTO im SELECT *,2 FROM
imageScreen.plate3_Per_Image;
ALTER TABLE im MODIFY COLUMN TableNumber INT FIRST;

# Do the same for per-object tables

CREATE TABLE ob LIKE
imageScreen.plate1_Per_Object;
ALTER TABLE ob DROP PRIMARY KEY;
ALTER TABLE ob ADD COLUMN TableNumber INT;
ALTER TABLE ob ADD PRIMARY KEY (TableNumber, ImageNumber, ObjectNumber);
INSERT INTO ob SELECT *,0 FROM
imageScreen.plate1_Per_Object;
INSERT INTO ob SELECT *,1 FROM
imageScreen.plate2_Per_Object;
INSERT INTO ob SELECT *,2 FROM
imageScreen.plate3_Per_Object;
ALTER TABLE ob MODIFY COLUMN TableNumber INT FIRST;


The next goal is to build this idea into a wx.Python GUI. A simple wizard seemed like the best way to break it down into the following steps from the user perspective:
  1. Connect to the source database. [host, dbname, user, pw]
  2. Choose which table pairs to include in the merge. [table name list]
  3. Choose a destination database to write master tables to. [dbname]
  4. Choose a prefix for your table names. eg: (prefix_per_image, prefix_per_object). [prefix]
I used wxGlade 0.6.3 to get the basic layout using a wx.notebook for each step of the process, but noted that I could not control which tabs of the notebook were enabled/disabled. I then took those same pages and plugged them into a wx.wizard using an example found here.

At present, the tool can perform steps 1 & 2 listed above, but some challenges still remain. Namely:
  1. Handle the rare case of merging only per_image tables.
  2. Ensure that the destination database does not already contain tables with the same name as the output tables.
    • Provide capability to remove these tables if they were created with this tool.
  3. Handle unpaired tables.
    • Warn for unpaired "per_image" or "per_object" tables.
    • What about metadata tables? eg: "plate_map"
So far this has been helpful for me to get a better idea of where I'm going, and it may provide a good place to reflect on where I've come from so I don't fall in any potholes more than once.

Monday, September 8, 2008

Merging RGB channels in python

How to merge 3 images which each contain grayscale data for a single channel into a single RGB image.

With PIL.Image:
import wx, numpy, Image
r = numpy.asarray(Image.open("r.tif"))
g = numpy.asarray(Image.open("g.tif"))
b = numpy.asarray(Image.open("b.tif"))
imRGB = Image.fromarray(numpy.dstack((r[:,:,0], g[:,:,0], b[:,:,0])))



With wx.Image:
import numpy, wx
def Merge(ims):
  width = ims[0].GetWidth()
  height = ims[0].GetHeight()
  dn = [numpy.fromstring(im.GetData(), 'uint8')[::3] for im in ims[:3]]
  for d in dn:
    d.shape=(height,width)
  img = wx.EmptyImage(width,height)
  img.SetData(numpy.dstack([d for d in dn]).flatten())
  return img
ims[0] = wx.Image('r.tif')
ims[1] = wx.Image('g.tif')
ims[2] = wx.Image('b.tif')
Merge(ims)

Wednesday, July 30, 2008

How to set the working directory in Eclipse

Here's an old one I don't want to forget:

How to set the working directory in Eclipse (so CPA can resolve paths to the Scripts directory).
  • Right click on the project in package explorer > properties
  • Under Run/Debug Settings click on JCPA and click the Edit Button
  • Under the Arguments tab of the new window, specify your working directory (ie: /Users/afraser/CPA)

Another Blog Saturated with a Coders Woes, Wins, and Wishes

My name is Adam Fraser. I'm a computer scientist at the Broad Institute of M.I.T. and Harvard. Moments ago I encountered a code roadblock that I've encountered before, and couldn't - for the life of me remember how I got around it last time. In the name of progress, and with the goal of not repeating history I've started this blog.