Wednesday, September 24, 2008

Merging MySQL Tables

My current task is to write a tool to allow users to merge tables created by CellProfiler into a master that will be used for the analysis software I'm writing. It's a simple objective except I'm still getting used to doing GUI development in wx.Python and learning which widgest to use and how is a lot like pulling teeth.

So! My thought is that I can use this blog to verbalize what I'm doing and how, and hopefully this will help me clear my mind and focus. This is something I haven't done before (heck, I've only made 2 posts in this blog so far), but I'm hopeful this will also help to re-instill that nerdy sense of accomplishment and learning that I felt when I first got into computer science.

The nature of our data is such that it is stored in pairs of tables (per-image-table, per-object-table). Yesterday I wrote MYSql statements to merge 3 table-pairs into a single pair.

# Create a new database where the master tables will be stored
# Create a master per-image table called "im" with the same definition as the old one
# Add a column "TableNumber" to keep track of which table each row is from
# Update the primary key
# Populate the dataset

CREATE DATABASE af;
CREATE TABLE im LIKE imageScreen.plate1_Per_Image;
ALTER TABLE im DROP PRIMARY KEY;
ALTER TABLE im ADD COLUMN TableNumber INT;
ALTER TABLE im ADD PRIMARY KEY (TableNumber, ImageNumber);
INSERT INTO im SELECT *,0 FROM
imageScreen.plate1_Per_Image;
INSERT INTO im SELECT *,1 FROM
imageScreen.plate2_Per_Image;
INSERT INTO im SELECT *,2 FROM
imageScreen.plate3_Per_Image;
ALTER TABLE im MODIFY COLUMN TableNumber INT FIRST;

# Do the same for per-object tables

CREATE TABLE ob LIKE
imageScreen.plate1_Per_Object;
ALTER TABLE ob DROP PRIMARY KEY;
ALTER TABLE ob ADD COLUMN TableNumber INT;
ALTER TABLE ob ADD PRIMARY KEY (TableNumber, ImageNumber, ObjectNumber);
INSERT INTO ob SELECT *,0 FROM
imageScreen.plate1_Per_Object;
INSERT INTO ob SELECT *,1 FROM
imageScreen.plate2_Per_Object;
INSERT INTO ob SELECT *,2 FROM
imageScreen.plate3_Per_Object;
ALTER TABLE ob MODIFY COLUMN TableNumber INT FIRST;


The next goal is to build this idea into a wx.Python GUI. A simple wizard seemed like the best way to break it down into the following steps from the user perspective:
  1. Connect to the source database. [host, dbname, user, pw]
  2. Choose which table pairs to include in the merge. [table name list]
  3. Choose a destination database to write master tables to. [dbname]
  4. Choose a prefix for your table names. eg: (prefix_per_image, prefix_per_object). [prefix]
I used wxGlade 0.6.3 to get the basic layout using a wx.notebook for each step of the process, but noted that I could not control which tabs of the notebook were enabled/disabled. I then took those same pages and plugged them into a wx.wizard using an example found here.

At present, the tool can perform steps 1 & 2 listed above, but some challenges still remain. Namely:
  1. Handle the rare case of merging only per_image tables.
  2. Ensure that the destination database does not already contain tables with the same name as the output tables.
    • Provide capability to remove these tables if they were created with this tool.
  3. Handle unpaired tables.
    • Warn for unpaired "per_image" or "per_object" tables.
    • What about metadata tables? eg: "plate_map"
So far this has been helpful for me to get a better idea of where I'm going, and it may provide a good place to reflect on where I've come from so I don't fall in any potholes more than once.

No comments: