Monday, September 29, 2008

Developing biologist-friendly analysis software

Welp, the tool I wrote to merge MySQL tables is pretty much done with the exception of a few niceties. Next up, I'll be returning to writing CellProfiler Analyst with my sights set on providing the user the ability to define dynamic groups.

Users will want to be able to interrogate the data with their screen in mind. In Classifier they might ask:
  • Show me cells from plate X.
  • Show me cells from plate X, well Y.
  • Show me cells treated with Z.
  • Show me cells from control wells.
  • Show me cells NOT from W.
Likewise, this will be applied to our visualizations down the road:
  • Color a plot by treatment name.
  • Plot measurement_X vs treatments.
  • Select all points from control wells.
After talking to Ray, who has a habit of seeing the inherent simplicity in difficult things, I am working on a plan to convert Classifier to handle dynamic grouping under the new database schema. Ray's suggestion was to define each group by its where-clause, along with the tables that clause needs. Here's what I've come up with for storing the information in the properties file.

groups = EMPTY, CDKs, Accuracy75
group_where_EMPTY = CPA_per_image.well=well_id.well AND well_id.Gene='EMPTY'
group_tables_EMPTY = CPA_per_image, well_id
group_where_CDKs = CPA_per_image.well=well_id.well AND well_id.Gene REGEXP 'CDK.*'
group_tables_CDKs = CPA_per_image, well_id
group_where_Accuracy75 = CPA_per_image.well=well_pairs.well_a AND well_pairs.accuracy>=75
group_tables_Accuracy75 = CPA_per_image, well_pairs
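
To make the idea concrete, here is a rough Python sketch of how the app might read these entries and turn a group into a query. The function names (parse_groups, images_in_group_query) and the CPA_per_image.ImageNumber key column are my own placeholders, not anything that exists in CellProfiler Analyst yet.

```python
# Hypothetical sample mirroring the properties entries above.
PROPERTIES = """
groups = EMPTY, CDKs
group_where_EMPTY = CPA_per_image.well=well_id.well AND well_id.Gene='EMPTY'
group_tables_EMPTY = CPA_per_image, well_id
group_where_CDKs = CPA_per_image.well=well_id.well AND well_id.Gene REGEXP 'CDK.*'
group_tables_CDKs = CPA_per_image, well_id
"""


def parse_groups(properties_text):
    """Return {group_name: {'tables': [...], 'where': '...'}}."""
    props = {}
    for line in properties_text.splitlines():
        line = line.strip()
        if not line or line.startswith('#'):
            continue
        # Split on the first '=' only: where-clauses contain '=' themselves.
        key, _, value = line.partition('=')
        props[key.strip()] = value.strip()

    groups = {}
    for name in props.get('groups', '').split(','):
        name = name.strip()
        if not name:
            continue
        # Empty entries are dropped, so a trailing comma is tolerated.
        tables = [t.strip()
                  for t in props['group_tables_' + name].split(',')
                  if t.strip()]
        groups[name] = {'tables': tables,
                        'where': props['group_where_' + name]}
    return groups


def images_in_group_query(group, key_column='CPA_per_image.ImageNumber'):
    """Build the SELECT that lists the images belonging to one group."""
    return 'SELECT %s FROM %s WHERE %s' % (
        key_column, ', '.join(group['tables']), group['where'])


groups = parse_groups(PROPERTIES)
cdk_query = images_in_group_query(groups['CDKs'])
```

Since the where-clauses come straight from the properties file, this only makes sense if that file is trusted; the strings are pasted into the SQL as-is.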

With this information available in the app, we'll be able to load cell tiles like this:

"Show me 20 random cells from control wells."
# Get the list of images that fall in control wells.
SELECT per_image.ImageNumber, meta.ImageNumber, meta.control FROM per_image, meta WHERE per_image.ImageNumber = meta.ImageNumber AND meta.control = 1;

# Use the existing data model to generate 20 random cell
# keys (tblNum,imNum,obNum) that fall in these images.

# Get the paths to the images we need
SELECT image_channel_path, image_channel_file, TableNumber, ImageNumber FROM per_image WHERE TableNumber = cellKey[0] AND ImageNumber = cellKey[1];

# Get the cell positions
SELECT pos_x, pos_y, TableNumber, ImageNumber, ObjectNumber FROM per_object WHERE TableNumber = cellKey[0] AND ImageNumber = cellKey[1] AND ObjectNumber = cellKey[2];


# Load and crop the images.
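
The random-key step in the middle of that outline could look something like the sketch below. It assumes the data model can hand back a per-image object count keyed by (tblNum, imNum) and that object numbers start at 1; both are assumptions on my part, and random_cell_keys is a made-up name.

```python
import random


def random_cell_keys(object_counts, n, rng=random):
    """Pick n random (tblNum, imNum, obNum) keys.

    object_counts maps (tblNum, imNum) -> number of objects in that image
    (assumed to be available from the data model). Images are weighted by
    their object count so every cell is equally likely. Sampling is with
    replacement, so the same cell can come up more than once.
    """
    image_keys = [key for key, count in object_counts.items() if count > 0]
    if not image_keys:
        return []
    weights = [object_counts[key] for key in image_keys]

    cell_keys = []
    for tbl_num, im_num in rng.choices(image_keys, weights=weights, k=n):
        # Object numbers are assumed to run from 1 to the image's count.
        ob_num = rng.randint(1, object_counts[(tbl_num, im_num)])
        cell_keys.append((tbl_num, im_num, ob_num))
    return cell_keys
```

Passing in a seeded random.Random instance as rng makes the selection repeatable, which is handy for debugging.
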
The first query could also be split into two, with the results joined in Python:
SELECT imagenumber FROM per_image;
SELECT imagenumber, control FROM meta WHERE control=1;
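
A minimal sketch of that Python-side join, using an in-memory sqlite3 database as a stand-in for MySQL so it runs anywhere. The toy rows and the control_images name are invented for illustration.

```python
import sqlite3

# Toy stand-in for the real tables; the rows are made up.
conn = sqlite3.connect(':memory:')
conn.executescript("""
CREATE TABLE per_image (ImageNumber INTEGER);
CREATE TABLE meta (ImageNumber INTEGER, control INTEGER);
INSERT INTO per_image VALUES (1), (2), (3), (4);
INSERT INTO meta VALUES (1, 1), (2, 0), (3, 1), (5, 1);
""")


def control_images(conn):
    """Run the two simple queries and intersect the results in Python."""
    image_numbers = {
        row[0] for row in conn.execute('SELECT ImageNumber FROM per_image')}
    control_numbers = {
        row[0] for row in conn.execute(
            'SELECT ImageNumber, control FROM meta WHERE control = 1')}
    # Keep only images present in both result sets (an inner join).
    return sorted(image_numbers & control_numbers)
```

The set intersection gives the same image list as the single joined query, at the cost of pulling every ImageNumber from per_image into memory first.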
