projects

Paid Parental Leave Policies at US and Canadian Universities APR 2018

We released an open-source dataset of the paid parental leave policies for 205 US and Canadian universities. The accompanying website features a neat visualization in D3 and some preliminary analyses for those interested. This was joint work with Sam F. Way, Aaron Clauset, Dan Larremore, and Mirta Galesic.

Salsa Competition MAR 2018

Building off of our baking competition, I organized a salsa contest this semester. Sixteen dips were entered, and two won: best mild and best hot. To prepare for the competition, I attempted to make a data-driven salsa by mining salsa recipes online and looking for ingredients that are predictive of highly rated recipes. Details of this analysis and the supplementary code can be found here.

"The Great Boulder Bakeoff" DEC 2017

Eight participants contributed a batch of cookies. Each cookie was judged by thirteen critics according to four different categories: most creative, best looking, best tasting, and best texture. While there are clear standouts shown in the histograms below, they were all good cookies. Code to generate these figures can be found here.

Prediction in Projection Using Google Search Trends MAY 2017

Can Google search trends be predicted using state-space reconstruction on a scalar time series? Check out the GitHub repository or the project write-up.
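
For readers unfamiliar with the term, one common form of state-space reconstruction is delay-coordinate embedding, where each point of a scalar series is replaced by a vector of its lagged values. The sketch below illustrates that general idea only; it is not taken from the project repository, and the function name `delayEmbed` and the parameters `dim` and `tau` are assumptions made for this example.

```go
package main

import "fmt"

// delayEmbed builds delay vectors [x[t], x[t-tau], ..., x[t-(dim-1)*tau]]
// from a scalar series. It returns nil when the parameters are invalid or
// the series is too short to form a single vector.
// (Hypothetical helper for illustration; not from the project code.)
func delayEmbed(series []float64, dim, tau int) [][]float64 {
	if dim < 1 || tau < 1 {
		return nil
	}
	start := (dim - 1) * tau // first index with a full history of lags
	if start >= len(series) {
		return nil
	}
	vectors := make([][]float64, 0, len(series)-start)
	for t := start; t < len(series); t++ {
		v := make([]float64, dim)
		for k := 0; k < dim; k++ {
			v[k] = series[t-k*tau]
		}
		vectors = append(vectors, v)
	}
	return vectors
}

func main() {
	series := []float64{1, 2, 3, 4, 5, 6}
	// Embed in 3 dimensions with a lag of 2 samples.
	for _, v := range delayEmbed(series, 3, 2) {
		fmt.Println(v)
	}
	// Prints:
	// [5 3 1]
	// [6 4 2]
}
```

A forecast can then be made, for example, by finding the nearest neighbors of the latest delay vector and looking at where those neighbors went next.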

Flappy BI/O NOV-DEC 2016

Inspired by Seth Bling's (super awesome) MarI/O, we implemented Kenneth Stanley and Risto Miikkulainen's NeuroEvolution of Augmenting Topologies (NEAT) algorithm to try to win Flappy Bird. NEAT is a genetic algorithm for evolving neural networks, and it relies on three key principles: (1) tracking the history of genes so that networks with different topologies can be lined up and mated (called crossover), (2) grouping similar networks into species so that new structures have time to improve before competing with the whole population (called speciation), and (3) starting from the simplest neural networks possible and complexifying only out of necessity.

Unlike Super Mario World, Flappy Bird navigates a randomly determined playing field, posing an interesting challenge for NEAT. Our code is built on top of an existing pygame implementation of Flappy Bird, and that's about it: we implemented the neural networks ourselves.

You can follow the steps below to watch our implementation of NEAT in action. You will need Python, pygame, and git installed.

    git clone https://github.com/Brennan-M/5622_PacMan_NN.git
    cd 5622_PacMan_NN
    git checkout NEAT_Master
    python flappy_driver.py

If you'd like to learn more, here is a video of our final presentation.

Anomalyzer SEP-NOV 2014

Probabilistic anomaly detection for time series, written in Go. A blog post about the work was featured on the front page of Hacker News on August 13th.

Example

    conf := &anomalyzer.AnomalyzerConf{
        Sensitivity: 0.1,
        UpperBound:  5,
        LowerBound:  anomalyzer.NA, // ignore the lower bound
        ActiveSize:  1,
        NSeasons:    4,
        Methods:     []string{"diff", "fence", "highrank", "lowrank", "magnitude"},
    }

    // initialize with empty data or an actual slice of floats
    data := []float64{0.1, 2.05, 1.5, 2.5, 2.6, 2.55}
    anom, _ := anomalyzer.NewAnomalyzer(conf, data)

    // `Push(point)` automatically triggers a recalculation of the
    // anomalous probability.  recalculation can also be triggered
    // by a call to `Eval()`.
    prob := anom.Push(8.0)
    fmt.Println("anomalous probability:", prob)

Multiclass Naive Bayesian Classification NOV-DEC 2014

Often in document classification, a document may have more than one relevant class. For example, a question on Stack Overflow might have the tags go, map, and interface. While multinomial Bayesian classification offers a one-of-many classification, multibayes offers tools for many-of-many classification.

A simple use case for our naive Bayesian classifier is described in "Catching Clickbait: Using a Naive Bayesian Classifier in Go". Inspired by Paul Graham's "A Plan for Spam", I scraped 10,000 headlines to train our classifier to recognize clickbait (e.g. “17 Facts You Won’t Believe Are True”, “18 Pugs Who Demand To Be Taken Seriously”, etc). At the end of this article is a fun interactive classifier where you can find the posterior probability of a new headline being clickbait.

Example

    documents := []struct {
        Text    string
        Classes []string
    }{
        {
            Text:    "My dog has fleas.",
            Classes: []string{"vet"},
        },
        {
            Text:    "My cat has ebola.",
            Classes: []string{"vet", "cdc"},
        },
        {
            Text:    "Aaron has ebola.",
            Classes: []string{"cdc"},
        },
    }

    classifier := NewClassifier()

    // train the classifier
    for _, document := range documents {
        classifier.Add(document.Text, document.Classes)
    }

    // predict new classes
    probs := classifier.Posterior("Aaron's dog has fleas.")
    fmt.Printf("posterior probabilities: %+v\n", probs)

    // posterior probabilities: map[vet:0.8571 cdc:0.2727]
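
To show why many-of-many posteriors need not sum to one, here is a toy sketch that treats every class as its own yes/no naive Bayes problem. It is not the multibayes implementation, and it will not reproduce the numbers in the example above: the types, function names, tokenizer, and add-one smoothing are all assumptions made for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// doc is a labeled training document (hypothetical type for this sketch).
type doc struct {
	text    string
	classes []string
}

// classModel holds document counts for one class treated as a yes/no label.
// Index 1 means "document has the class", index 0 means it does not.
type classModel struct {
	docs      [2]float64
	tokenDocs [2]map[string]float64
}

func newClassModel() *classModel {
	return &classModel{tokenDocs: [2]map[string]float64{{}, {}}}
}

// tokenize lowercases the text, trims surrounding punctuation, and returns
// each distinct token once.
func tokenize(text string) []string {
	seen := map[string]bool{}
	var out []string
	for _, field := range strings.Fields(strings.ToLower(text)) {
		tok := strings.Trim(field, ".,!?\"'")
		if tok != "" && !seen[tok] {
			seen[tok] = true
			out = append(out, tok)
		}
	}
	return out
}

// add records one training document on the "has class" or "lacks class" side.
func (m *classModel) add(toks []string, hasClass bool) {
	idx := 0
	if hasClass {
		idx = 1
	}
	m.docs[idx]++
	for _, t := range toks {
		m.tokenDocs[idx][t]++
	}
}

// posterior returns P(class | tokens) with add-one smoothing, normalized
// over the two outcomes "has class" and "lacks class".
func (m *classModel) posterior(toks []string) float64 {
	total := m.docs[0] + m.docs[1]
	var score [2]float64
	for idx := range score {
		score[idx] = (m.docs[idx] + 1) / (total + 2)
		for _, t := range toks {
			score[idx] *= (m.tokenDocs[idx][t] + 1) / (m.docs[idx] + 2)
		}
	}
	return score[1] / (score[0] + score[1])
}

// train builds one independent yes/no model per class seen in the data.
func train(docs []doc) map[string]*classModel {
	models := map[string]*classModel{}
	for _, d := range docs {
		for _, c := range d.classes {
			if models[c] == nil {
				models[c] = newClassModel()
			}
		}
	}
	for _, d := range docs {
		has := map[string]bool{}
		for _, c := range d.classes {
			has[c] = true
		}
		toks := tokenize(d.text)
		for c, m := range models {
			m.add(toks, has[c])
		}
	}
	return models
}

func main() {
	models := train([]doc{
		{"My dog has fleas.", []string{"vet"}},
		{"My cat has ebola.", []string{"vet", "cdc"}},
		{"Aaron has ebola.", []string{"cdc"}},
	})
	toks := tokenize("Aaron's dog has fleas.")
	for _, c := range []string{"vet", "cdc"} {
		fmt.Printf("%s: %.3f\n", c, models[c].posterior(toks))
	}
}
```

Each class is scored against its own complement, so a document can score high for several classes at once or for none of them.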

Musical Staircase APR-MAY 2013

Built a musical staircase using an Arduino Uno and 16 pairs of lasers and photoresistors. Featured in Reed Magazine. Here is a video of the staircase in action.