Monday, September 29, 2008

Developing biologist-friendly analysis software

Welp, the tool I wrote to merge MySQL tables is pretty much done with the exception of a few niceties. Next up, I'll be returning to writing CellProfiler Analyst with my sights set on providing the user the ability to define dynamic groups.

Users will want to be able to interrogate the data with their screen in mind. In Classifier they might ask:
  • Show me cells from plate X.
  • Show me cells from plate X, well Y.
  • Show me cells treated with Z.
  • Show me cells from control wells.
  • Show me cells NOT from W.
Likewise, this will be applied to our visualizations down the road:
  • Color a plot by treatment name.
  • Plot measurement_X vs treatments.
  • Select all points from control wells.
After talking to Ray, who has a habit of seeing difficult things for their inherent simplicity, I am working on a plan to convert classifier to handle dynamic grouping under the new database schema. Ray's suggestion was to define groups by their where-clauses. Here's what I've come up with for storing the information in the properties file.

groups = EMPTY, CDKs, Accuracy75
group_where_EMPTY = CPA_per_image.well=well_id.well AND well_id.Gene=EMPTY
group_tables_EMPTY = CPA_per_image, well_id,
group_where_CDKs = CPA_per_image.well=well_id.well AND well_id.Gene REGEXP 'CDK.*'
group_tables_CDKs = CPA_per_image, well_id,
group_where_Accuracy75 = CPA_per_image.well=well_pairs.well_a AND well_pairs.accuracy>=75
group_tables_Accuracy75 = CPA_per_image, well_pairs,

With this information is available in the app, we'll be able to load cell tiles like this:

"Show me 20 random cells from control wells."
# Get the list of images that fall in control wells.
SELECT per_image.ImageNumber, meta.ImageNumber, meta.control FROM per_image, meta WHERE per_image.ImageNumber = meta.ImageNumber AND meta.control = 1;

# Use the existing data model to generate 20 random cell
# keys (tblNum,imNum,obNum) that fall in these images.

# Get the paths to the images we need
SELECT image_channel_path, image_channel_file, TableNumber, ImageNumber FROM per_image WHERE TableNumber = cellKey[0] AND ImageNumber = cellKey[1];

# Get the cell positions
SELECT pos_x, pos_y, TableNumber, ImageNumber, ObjectNumber, FROM per_object WHERE TableNumber = cellKey[0] AND ImageNumber = cellKey[1] AND ObjectNumber = cellKey[2];


# Load and crop the images.
The first query could also be broken into two and joined in python...
SELECT imagenumber FROM per_image;
SELECT imagenumber, control FROM meta WHERE control=1;

Wednesday, September 24, 2008

Merging MySQL Tables

My current task is to write a tool to allow users to merge tables created by CellProfiler into a master that will be used for the analysis software I'm writing. It's a simple objective except I'm still getting used to doing GUI development in wx.Python and learning which widgest to use and how is a lot like pulling teeth.

So! My thought is that I can use this blog to verbalize what I'm doing and how, and hopefully this will help me clear my mind and focus. This is something I haven't done before (heck, I've only made 2 posts in this blog so far), but I'm hopeful this will also help to re-instill that nerdy sense of accomplishment and learning that I felt when I first got into computer science.

The nature of our data is such that it is stored in pairs of tables (per-image-table, per-object-table). Yesterday I wrote MYSql statements to merge 3 table-pairs into a single pair.

# Create a new database where the master tables will be stored
# Create a master per-image table called "im" with the same definition as the old one
# Add a column "TableNumber" to keep track of which table each row is from
# Update the primary key
# Populate the dataset

CREATE DATABASE af;
CREATE TABLE im LIKE imageScreen.plate1_Per_Image;
ALTER TABLE im DROP PRIMARY KEY;
ALTER TABLE im ADD COLUMN TableNumber INT;
ALTER TABLE im ADD PRIMARY KEY (TableNumber, ImageNumber);
INSERT INTO im SELECT *,0 FROM
imageScreen.plate1_Per_Image;
INSERT INTO im SELECT *,1 FROM
imageScreen.plate2_Per_Image;
INSERT INTO im SELECT *,2 FROM
imageScreen.plate3_Per_Image;
ALTER TABLE im MODIFY COLUMN TableNumber INT FIRST;

# Do the same for per-object tables

CREATE TABLE ob LIKE
imageScreen.plate1_Per_Object;
ALTER TABLE ob DROP PRIMARY KEY;
ALTER TABLE ob ADD COLUMN TableNumber INT;
ALTER TABLE ob ADD PRIMARY KEY (TableNumber, ImageNumber, ObjectNumber);
INSERT INTO ob SELECT *,0 FROM
imageScreen.plate1_Per_Object;
INSERT INTO ob SELECT *,1 FROM
imageScreen.plate2_Per_Object;
INSERT INTO ob SELECT *,2 FROM
imageScreen.plate3_Per_Object;
ALTER TABLE ob MODIFY COLUMN TableNumber INT FIRST;


The next goal is to build this idea into a wx.Python GUI. A simple wizard seemed like the best way to break it down into the following steps from the user perspective:
  1. Connect to the source database. [host, dbname, user, pw]
  2. Choose which table pairs to include in the merge. [table name list]
  3. Choose a destination database to write master tables to. [dbname]
  4. Choose a prefix for your table names. eg: (prefix_per_image, prefix_per_object). [prefix]
I used wxGlade 0.6.3 to get the basic layout using a wx.notebook for each step of the process, but noted that I could not control which tabs of the notebook were enabled/disabled. I then took those same pages and plugged them into a wx.wizard using an example found here.

At present, the tool can perform steps 1 & 2 listed above, but some challenges still remain. Namely:
  1. Handle the rare case of merging only per_image tables.
  2. Ensure that the destination database does not already contain tables with the same name as the output tables.
    • Provide capability to remove these tables if they were created with this tool.
  3. Handle unpaired tables.
    • Warn for unpaired "per_image" or "per_object" tables.
    • What about metadata tables? eg: "plate_map"
So far this has been helpful for me to get a better idea of where I'm going, and it may provide a good place to reflect on where I've come from so I don't fall in any potholes more than once.

Monday, September 8, 2008

Merging RGB channels in python

How to merge 3 images which each contain grayscale data for a single channel into a single RGB image.

With PIL.Image:
import wx, numpy, Image
r = numpy.asarray(Image.open("r.tif"))
g = numpy.asarray(Image.open("g.tif"))
b = numpy.asarray(Image.open("b.tif"))
imRGB = Image.fromarray(numpy.dstack((r[:,:,0], g[:,:,0], b[:,:,0])))



With wx.Image:
import numpy, wx
def Merge(ims):
  width = ims[0].GetWidth()
  height = ims[0].GetHeight()
  dn = [numpy.fromstring(im.GetData(), 'uint8')[::3] for im in ims[:3]]
  for d in dn:
    d.shape=(height,width)
  img = wx.EmptyImage(width,height)
  img.SetData(numpy.dstack([d for d in dn]).flatten())
  return img
ims[0] = wx.Image('r.tif')
ims[1] = wx.Image('g.tif')
ims[2] = wx.Image('b.tif')
Merge(ims)