Wednesday, October 20, 2010

careful when casting numpy arrays

It's been a while since I posted here since I've been posting my questions (and eventual answers like this one) to stackoverflow.com.  In any case, I was running into a problem, that I thought I'd share with you all.

I have some code that was creating a numpy array from a list of lists. Depending on the user's interactions with our program, one of the columns in the list could contain strings, so I was casting the array to type np.object.  However, for very specific inputs, I was encountering a strange error:

ValueError: setting an array element with a sequence.

So I set out to take a closer look at what was happening and this is how I was able to reproduce the error:

>>>np.array(['asdf', 1.3456e-9]).astype('object')[1:].astype('float')
Traceback (most recent call last):
File "", line 1, in
ValueError: setting an array element with a sequence.


And more frighteningly:

>>> np.array(['asdf', 1.345678e-9]).astype('object')[1:].astype('float')
array([ 1.345678])


By this point, you probably see what's going on here.  Since I am _casting_ the array to type np.object, numpy is fitting all of the elements into equally sized blocks of memory that fit the largest string in the array.  The problem is that it doesn't seem to take the floating point values into account, so they are apparently cast to strings and tuncated when they are too large.  In the case above it appears that the underlying data representation would be numpy "|S8"since the float value coming back out is 8 characters wide.

In another example, we see that when the largest string is long enough, the whole float representation gets stored and properly casted.

>>>np.array(['asdfasdfasd', 1.34567e-9]).astype('object')[1:].astype('float')
array([ 1.34567800e-09])


The message here is never to cast a mixed type array to np.object. Rather, create it with that type to begin with so numpy knows how much memory to allocate for your floating point values.

>>> np.array(['asdf', 1.3456789123456e-9], dtype='object')[1:].astype('float')
array([  1.34567891e-09])

Thursday, January 28, 2010

Matplotlib draggable legend

I thought I'd share my solution to the draggable legend problem since it took me forever to assimilate all the scattered knowledge on the mailing lists...
class DraggableLegend:
    def __init__(self, legend):
        self.legend = legend
        self.gotLegend = False
        legend.figure.canvas.mpl_connect('motion_notify_event', self.on_motion)
        legend.figure.canvas.mpl_connect('pick_event', self.on_pick)
        legend.figure.canvas.mpl_connect('button_release_event', self.on_release)
        legend.set_picker(self.my_legend_picker)
        
    def on_motion(self, evt):
        if self.gotLegend:
            dx = evt.x - self.mouse_x
            dy = evt.y - self.mouse_y
            loc_in_canvas = self.legend_x + dx, self.legend_y + dy
            loc_in_norm_axes = self.legend.parent.transAxes.inverted().transform_point(loc_in_canvas)
            self.legend._loc = tuple(loc_in_norm_axes)
            self.legend.figure.canvas.draw()
    
    def my_legend_picker(self, legend, evt): 
        return self.legend.legendPatch.contains(evt)   

    def on_pick(self, evt): 
        if evt.artist == self.legend:
            bbox = self.legend.get_window_extent()
            self.mouse_x = evt.mouseevent.x
            self.mouse_y = evt.mouseevent.y
            self.legend_x = bbox.xmin
            self.legend_y = bbox.ymin 
            self.gotLegend = 1
            
    def on_release(self, event):
        if self.gotLegend:
            self.gotLegend = False

#...and in your code...

def draw(self): 
        ax = self.figure.add_subplot(111)
        scatter = ax.scatter(np.random.randn(100), np.random.randn(100))
        legend = DraggableLegend(ax.legend())
update:
I emailed the Matplotlib-users group and John Hunter was kind enough to add my solution it to SVN HEAD.
On Thu, Jan 28, 2010 at 3:02 PM, Adam Fraser <adam.n.fraser@gmail.com> wrote:
> I thought I'd share a solution to the draggable legend problem since
> it took me forever to assimilate all the scattered knowledge on the
> mailing lists...
Cool -- nice example.  I added the code to legend.py.  Now you can do

leg = ax.legend()
leg.draggable()

to enable draggable mode.  You can repeatedly call this func to toggle
the draggable state.

Thanks,
JDH

Thursday, October 15, 2009

Redirecting python logging to a wx.TextCtrl

Today I was googling around for a bit of code that seems to lack a published solution. I wanted to simulate a console window in wxPython for logging purposes. The closest examples I could find were for redirecting stdout to a wx.TextCtrl (http://www.blog.pythonlibrary.org/2009/01/01/wxpython-redirecting-stdout-stderr/). Eventually I managed to find a partial solution here http://www.velocityreviews.com/forums/t515815-wxpython-redirect-the-stdout-to-a-textctrl.html. Anyway, I thought I'd share my working modification:

import wx
import logging

class WxLog(logging.Handler):
def __init__(self, ctrl):
logging.Handler.__init__(self)
self.ctrl = ctrl
def emit(self, record):
self.ctrl.AppendText(self.format(record)+"\n")

class MainFrame(wx.Frame):
def __init__(self):
wx.Frame.__init__(self, None, title="logging test")
self.level = 4
log = wx.TextCtrl(self, style=wx.TE_MULTILINE)
criticalbtn = wx.Button(self, -1 , 'critical')
errorbtn = wx.Button(self, -1 , 'error')
warnbtn = wx.Button(self, -1 , 'warn')
infobtn = wx.Button(self, -1 , 'info')
debugbtn = wx.Button(self, -1 , 'debug')
togglebtn = wx.Button(self, -1 , 'toggle level')
sizer = wx.BoxSizer(wx.VERTICAL)
sizer.Add(log, 1, wx.EXPAND)
sizer.Add(criticalbtn)
sizer.Add(errorbtn)
sizer.Add(warnbtn)
sizer.Add(infobtn)
sizer.Add(debugbtn)
sizer.Add(togglebtn)
self.SetSizer(sizer)

self.logr = logging.getLogger('')
self.logr.setLevel(logging.DEBUG)
hdlr = WxLog(log)
#hdlr.setFormatter(logging.Formatter('%(levelname)s | %(name)s |%(message)s [@ %(asctime)s in %(filename)s:%(lineno)d]'))
hdlr.setFormatter(logging.Formatter('%(levelname)s |%(message)s'))
self.logr.addHandler(hdlr)

self.Bind(wx.EVT_BUTTON, self.on_toggle, togglebtn)
self.Bind(wx.EVT_BUTTON, self.on_critical, criticalbtn)
self.Bind(wx.EVT_BUTTON, self.on_error, errorbtn)
self.Bind(wx.EVT_BUTTON, self.on_warn, warnbtn)
self.Bind(wx.EVT_BUTTON, self.on_info, infobtn)
self.Bind(wx.EVT_BUTTON, self.on_debug, debugbtn)

def on_toggle(self, evt):
self.level = (self.level+1) % 5
levels = [logging.CRITICAL, logging.ERROR, logging.WARN, logging.INFO, logging.DEBUG]
self.logr.setLevel(levels[self.level])
self.logr.critical('Logging level set to: %s: %s'%(self.level, logging.getLevelName(levels[self.level])))

def on_critical(self, evt):
self.logr.critical('Test message.')

def on_error(self, evt):
self.logr.error('Test message.')

def on_warn(self, evt):
self.logr.warn('Test message.')

def on_info(self, evt):
self.logr.info('Test message.')

def on_debug(self, evt):
self.logr.debug('Test message.')

if __name__ =="__main__":
app = wx.App(0)
frame = MainFrame()
frame.Show()
app.MainLoop()

UPDATE:
I discovered that the above method will not work if you have multiple threads using the same logger. This can be remedied by using a generic handler that simply calls a function and updating your wx.TextCtrl on idle.

class FuncLog(logging.Handler):
'''
A logging handler that sends logs to an update function
'''
def __init__(self, update):
logging.Handler.__init__(self)
self.update = update

def emit(self, record):
self.update(self.format(record))


class MainGUI(wx.Frame):
def __init__(self, parent, **kwargs):
...
self.console = wx.TextCtrl(self, -1, '', style=wx.TE_MULTILINE|wx.TE_READONLY)
...
self.logr = logging.getLogger()
# recent logs will be stored in this string
self.log_text = ''
def update(x):
self.log_text += x+'\n'
hdlr = FuncLog(update)
self.logr.addHandler(hdlr)
...
self.Bind(wx.EVT_IDLE, self.on_idle)

def on_idle(self, evt=None):
if self.log_text != '':
self.console.AppendText(self.log_text)
self.log_text = '

Enjoy!

Thursday, November 20, 2008

Classifier Milestones

Lady's and gents, I am officially absorbed. The Classifier development is seriously picking up momentum now, to the point where I'm going to need to slow down soon and do some serious refactoring to clean things up and document them better (ie: at all). Soon I'll be implementing mundane application-wide alerts and validation checks, but for now, I want to take a moment to look back at the milestones reached in the past 50 days:
  • Subsets: users can now define subsets of their data via SQL in their properties file.
    • This allows actions like "Highlight cells with CDK knockdowns." & "Show me cells of phenotype-x from my positive controls."
  • Replaced drag-and-drop mechanism with one of my own which presents no bugs.
  • Training sets can be saved and loaded. Classifier also prompts for save on close if the current training set hasn't been saved already.
  • Overcame threading hurdles so image tiles may be loaded in a separate thread while the user does manual sorting.
  • Added many shortcuts to speed up interaction.
    • Double-click on image tiles: view full image.
    • Esc: closes image viewer.
    • Ctrl+a: select all tiles on current board.
    • Ctrl+d: deselect all tiles.
    • Ctrl+i: invert selection.
    • Delete: remove selection from board.
    • Up & down arrows: scroll board contents.
    • Ctrl+1,2,3...: Show/hide channels 1,2,3... in classifier and image viewer.
    • Double-click grid-row-label: Show image/images in row group.
  • Sorting boards can be added, removed, and named.
  • Connected the classifier backend so users can now train and fetch objects from multiple phenotypes.
  • Tested the image reader on 12 & 16bit tifs, and Cellomics dibs.
  • Changed the internal image representation to float32 numpy arrays in [0.,1.]
  • Added new colors to the channel-color mapping mechanism. Users can now choose to map image channels to red, green, blue, cyan, magenta, yellow, gray, and none/hidden.
  • Implemented scoring of phenotype hits and enrichments on a per-image basis.
    • Added a table view which can be sorted by columns, launch the image viewer from rows, and save it's contents to csv.
  • Users can now define groupings in their properties file (eg: per-well, per-gene, per-plate), and use them to group their enrichment scores.
We're almost there. Today I'm hoping to package the the code for the first time with py2app and hand it over to our image assay developers to tinker with. With my todo-list growing at the rate that it is, I can only imagine how much work will lay ahead of me with their feedback.

Tuesday, October 7, 2008

Threading a fine line

I have come to realize something about myself as a programmer, and that is that my past programming experience has not been in development but in hacking. Sure, I've written a large chunk of code here or there, but the vast majority of the code I have written has been written in a blitz. From homework assignments in college, to writing Perl scripts for molecular dynamics research, the average lifetime of my code is probably about 2 weeks.

Things are different now that I've taken on the responsibility of writing software for analyzing large, image-based biological screens. It used to be easier to just dodge code speed bumps, "Hack it, just make it work because no one else will have to learn how to use it." Now when I encounter a speed bump, I can't go around it or even drive over it, I have to get out of the car and find a way to peel this damn thing off the road or consider the nightmare of paving a new one!

At the start of the day yesterday, I was bogged down by the thought of having two highly elusive bugs hiding somewhere in my code. I knew I couldn't just look the other way and continue with development until the code was 100% stable. The frustration was that the code seemed completely stable because the errors were often difficult to reproduce. Still, they were there, and I knew they weren't going to be easy to fix, so I set to mailing the wxpython-users group for help before attacking the problem on my own.

The first bug, was related to a worker thread whose job it was to fetch image tiles and add them to a bin. I noticed that if I was resizing the window while the thread was hard at work, I would occasionally get error output in my console which didn't mean anything to me. However, even less frequently, I would get other errors which would cause the thread to crash.

This was unnerving psychologically because neither of these errors was terribly disruptive to the user. Even in the case where the worker thread would crash, it could be easily restarted by clicking a button. Still, as a developer I realized that small problems like these will snowball as an application grows. They are far easier to thwart at their outset than to put it off in hopes of them resolving themselves.

After playing with it for a while I found that triggering lots of resize events on the bin when the thread was adding a cell was more likely to cause an error. Clearly there was a lack of mutual exclusion for the bin. The solution was simple enough, do the image loading in the worker, then signal the main thread whenever an image was ready.

Today I stand with one bug left to fix. This one unpredictably crashes the entire app in a few inept-user scenarios. Lets just say I've already started to paving over that road.

Monday, September 29, 2008

Developing biologist-friendly analysis software

Welp, the tool I wrote to merge MySQL tables is pretty much done with the exception of a few niceties. Next up, I'll be returning to writing CellProfiler Analyst with my sights set on providing the user the ability to define dynamic groups.

Users will want to be able to interrogate the data with their screen in mind. In Classifier they might ask:
  • Show me cells from plate X.
  • Show me cells from plate X, well Y.
  • Show me cells treated with Z.
  • Show me cells from control wells.
  • Show me cells NOT from W.
Likewise, this will be applied to our visualizations down the road:
  • Color a plot by treatment name.
  • Plot measurement_X vs treatments.
  • Select all points from control wells.
After talking to Ray, who has a habit of seeing difficult things for their inherent simplicity, I am working on a plan to convert classifier to handle dynamic grouping under the new database schema. Ray's suggestion was to define groups by their where-clauses. Here's what I've come up with for storing the information in the properties file.

groups = EMPTY, CDKs, Accuracy75
group_where_EMPTY = CPA_per_image.well=well_id.well AND well_id.Gene=EMPTY
group_tables_EMPTY = CPA_per_image, well_id,
group_where_CDKs = CPA_per_image.well=well_id.well AND well_id.Gene REGEXP 'CDK.*'
group_tables_CDKs = CPA_per_image, well_id,
group_where_Accuracy75 = CPA_per_image.well=well_pairs.well_a AND well_pairs.accuracy>=75
group_tables_Accuracy75 = CPA_per_image, well_pairs,

With this information is available in the app, we'll be able to load cell tiles like this:

"Show me 20 random cells from control wells."
# Get the list of images that fall in control wells.
SELECT per_image.ImageNumber, meta.ImageNumber, meta.control FROM per_image, meta WHERE per_image.ImageNumber = meta.ImageNumber AND meta.control = 1;

# Use the existing data model to generate 20 random cell
# keys (tblNum,imNum,obNum) that fall in these images.

# Get the paths to the images we need
SELECT image_channel_path, image_channel_file, TableNumber, ImageNumber FROM per_image WHERE TableNumber = cellKey[0] AND ImageNumber = cellKey[1];

# Get the cell positions
SELECT pos_x, pos_y, TableNumber, ImageNumber, ObjectNumber, FROM per_object WHERE TableNumber = cellKey[0] AND ImageNumber = cellKey[1] AND ObjectNumber = cellKey[2];


# Load and crop the images.
The first query could also be broken into two and joined in python...
SELECT imagenumber FROM per_image;
SELECT imagenumber, control FROM meta WHERE control=1;

Wednesday, September 24, 2008

Merging MySQL Tables

My current task is to write a tool to allow users to merge tables created by CellProfiler into a master that will be used for the analysis software I'm writing. It's a simple objective except I'm still getting used to doing GUI development in wx.Python and learning which widgest to use and how is a lot like pulling teeth.

So! My thought is that I can use this blog to verbalize what I'm doing and how, and hopefully this will help me clear my mind and focus. This is something I haven't done before (heck, I've only made 2 posts in this blog so far), but I'm hopeful this will also help to re-instill that nerdy sense of accomplishment and learning that I felt when I first got into computer science.

The nature of our data is such that it is stored in pairs of tables (per-image-table, per-object-table). Yesterday I wrote MYSql statements to merge 3 table-pairs into a single pair.

# Create a new database where the master tables will be stored
# Create a master per-image table called "im" with the same definition as the old one
# Add a column "TableNumber" to keep track of which table each row is from
# Update the primary key
# Populate the dataset

CREATE DATABASE af;
CREATE TABLE im LIKE imageScreen.plate1_Per_Image;
ALTER TABLE im DROP PRIMARY KEY;
ALTER TABLE im ADD COLUMN TableNumber INT;
ALTER TABLE im ADD PRIMARY KEY (TableNumber, ImageNumber);
INSERT INTO im SELECT *,0 FROM
imageScreen.plate1_Per_Image;
INSERT INTO im SELECT *,1 FROM
imageScreen.plate2_Per_Image;
INSERT INTO im SELECT *,2 FROM
imageScreen.plate3_Per_Image;
ALTER TABLE im MODIFY COLUMN TableNumber INT FIRST;

# Do the same for per-object tables

CREATE TABLE ob LIKE
imageScreen.plate1_Per_Object;
ALTER TABLE ob DROP PRIMARY KEY;
ALTER TABLE ob ADD COLUMN TableNumber INT;
ALTER TABLE ob ADD PRIMARY KEY (TableNumber, ImageNumber, ObjectNumber);
INSERT INTO ob SELECT *,0 FROM
imageScreen.plate1_Per_Object;
INSERT INTO ob SELECT *,1 FROM
imageScreen.plate2_Per_Object;
INSERT INTO ob SELECT *,2 FROM
imageScreen.plate3_Per_Object;
ALTER TABLE ob MODIFY COLUMN TableNumber INT FIRST;


The next goal is to build this idea into a wx.Python GUI. A simple wizard seemed like the best way to break it down into the following steps from the user perspective:
  1. Connect to the source database. [host, dbname, user, pw]
  2. Choose which table pairs to include in the merge. [table name list]
  3. Choose a destination database to write master tables to. [dbname]
  4. Choose a prefix for your table names. eg: (prefix_per_image, prefix_per_object). [prefix]
I used wxGlade 0.6.3 to get the basic layout using a wx.notebook for each step of the process, but noted that I could not control which tabs of the notebook were enabled/disabled. I then took those same pages and plugged them into a wx.wizard using an example found here.

At present, the tool can perform steps 1 & 2 listed above, but some challenges still remain. Namely:
  1. Handle the rare case of merging only per_image tables.
  2. Ensure that the destination database does not already contain tables with the same name as the output tables.
    • Provide capability to remove these tables if they were created with this tool.
  3. Handle unpaired tables.
    • Warn for unpaired "per_image" or "per_object" tables.
    • What about metadata tables? eg: "plate_map"
So far this has been helpful for me to get a better idea of where I'm going, and it may provide a good place to reflect on where I've come from so I don't fall in any potholes more than once.