This semester’s research practicum has focused on extending some prior work on dynamic social network analysis of Free/Libre Open Source Software (FLOSS) development teams. In practical terms, this means that we’re analyzing the patterns of communication in email lists for a selection of FLOSS projects. The long-term goal is to scale the analysis up to hundreds of projects, so this also becomes a matter of building the social science cyberinfrastructure to make it all possible.
Due to a number of factors, nothing has progressed according to my original expectations - all part of a valuable learning experience, no question. Despite plenty of practical delays, I’m finally making more tangible progress now that the semester is nearly over. The first thing I managed to turn out was something I’ve wanted to do for years now - a dynamic edge weight decay function and algorithm! This means that the older an email message is within the time frame of analysis, the less weight it receives, according to an exponential decay function. While it took some time to wrap my head around how to pull it off, the implementation is pretty simple.
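To make the decay idea concrete, here is a minimal Python sketch. It is not the Taverna implementation: the half-life parameterization, the function names, and the `(sender, recipient, sent)` message tuples are all assumptions made for illustration. The only thing taken from the description above is that a message's contribution to an edge shrinks exponentially with its age inside the analysis window.

```python
from collections import defaultdict
from datetime import datetime


def decayed_weight(sent, window_end, half_life_days=7.0):
    """Weight of one message, halving every `half_life_days` of age.

    The half-life form is an assumed parameterization of exponential decay.
    """
    age_days = (window_end - sent).total_seconds() / 86400.0
    if age_days < 0:
        raise ValueError("message is newer than the end of the window")
    return 0.5 ** (age_days / half_life_days)


def edge_weights(messages, window_end, half_life_days=7.0):
    """Sum decayed weights per directed (sender, recipient) pair.

    `messages` is a hypothetical list of (sender, recipient, sent) tuples.
    """
    weights = defaultdict(float)
    for sender, recipient, sent in messages:
        weights[(sender, recipient)] += decayed_weight(
            sent, window_end, half_life_days
        )
    return dict(weights)


if __name__ == "__main__":
    end = datetime(2008, 3, 31)
    sample = [
        ("alice", "bob", datetime(2008, 3, 31)),  # brand new message
        ("alice", "bob", datetime(2008, 3, 24)),  # one half-life old
        ("carol", "alice", datetime(2008, 3, 17)),  # two half-lives old
    ]
    for pair, weight in edge_weights(sample, end).items():
        print(pair, round(weight, 3))
```

With a seven-day half-life, a message sent at the end of the window counts fully, one sent a week earlier counts half as much, and so on. A shorter half-life makes the network "forget" old conversations faster.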
The work is now all being done using Taverna Workbench, an open-source scientific workflow management tool that was originally created to facilitate research collaboration in the life sciences, especially genomics. I could rave for ages about how great this application is, but here are the main highlights. I can configure a set of components (some provided, some created) for the desired analysis. The intermediate inputs and outputs of those components can be examined to identify problems with the analysis. And the application generates its own unique-in-all-the-world identifier for each workflow and keeps its own audit trail. I can’t help but compare this to my experiences using web analytics vendor applications; a web analytics application that functioned like Taverna would have been so much better for that work. It’s a real analysis tool as opposed to a black box, and believe it or not, some folks are actually smart enough to configure their own analysis if you just give them the pieces they need to do it and the support functions (like the audit features and intermediate data transparency) to make it practical.
The current results over which I’m overjoyed are simple (ugly) R graphs of network centralization. The impressive part is what had to happen to make them. First, all the email list data is run through a workflow that generates a sociomatrix for the specified dates - in this case, I’m sticking with month-by-month. I only created a small piece of that workflow, so I can’t take credit for its awesomeness. Literally hundreds of messages are being automatically parsed into events that are then assigned a weight for the network edges as the workflow queries a web database and performs all kinds of mad transformation using built-in Java widgets and other web services, until there’s a nice neat data input file for my analysis. That alone is enough to make me weak in the knees, but there’s more! The second workflow process, which I created all by myself from Rshell components, takes in a couple of lists that I would consider trivial to generate, calculates the network centralization for each matrix, and spits out a time-series graph of the network centralizations.
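For readers who want a feel for the second step, here is a rough Python sketch of computing one centralization score per monthly matrix. The real workflow uses Rshell components, so this is only an illustration. It assumes Freeman degree centralization on an undirected, binarized version of each sociomatrix, which may not be the exact measure used, and the function names and the month-keyed dictionary are hypothetical.

```python
def degree_centralization(matrix):
    """Freeman degree centralization of a square sociomatrix.

    Assumption: any positive weight in either direction counts as an
    undirected tie, so the edge weights themselves are ignored here.
    """
    n = len(matrix)
    if n < 3:
        raise ValueError("centralization needs at least three actors")
    degrees = [
        sum(
            1
            for j in range(n)
            if j != i and (matrix[i][j] > 0 or matrix[j][i] > 0)
        )
        for i in range(n)
    ]
    max_degree = max(degrees)
    # Normalize by the value a star network of the same size would reach.
    return sum(max_degree - d for d in degrees) / ((n - 1) * (n - 2))


def centralization_series(matrices_by_month):
    """Return (month, centralization) pairs in chronological order.

    `matrices_by_month` is a hypothetical dict keyed by 'YYYY-MM' strings.
    """
    return [
        (month, degree_centralization(matrix))
        for month, matrix in sorted(matrices_by_month.items())
    ]


if __name__ == "__main__":
    star = [[0, 1, 1, 1], [1, 0, 0, 0], [1, 0, 0, 0], [1, 0, 0, 0]]
    ring = [[0, 1, 0, 1], [1, 0, 1, 0], [0, 1, 0, 1], [1, 0, 1, 0]]
    for month, score in centralization_series({"2008-02": ring, "2008-01": star}):
        print(month, score)
```

A score near 1 means one person dominates the conversation (a star), while a score near 0 means communication is spread evenly. Plotting the resulting pairs over time gives the kind of time-series graph described above.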
Eventually, these pieces will be integrated and the way the workflow runs will be refined until it’s almost as easy as pushing a button. I still find it rather amazing to be able to distill so much data with such speed and replicability. I’m also really excited that I can make these things work almost entirely by myself and that I can make a “tangible” contribution to building the social science cyberinfrastructure that will some day be available for others to use. The best part of all is that it’s fun. It’s also incredibly frustrating about 80% of the time, in the way that highly interdependent analysis problem-solving has a tendency to be, but I love doing that kind of work and it’s very rewarding.






