<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">

 <title>Michael Lai</title>
 <link href="http://mdlai.github.io/atom.xml" rel="self"/>
 <link href="http://mdlai.github.io/"/>
 <updated>2017-04-10T21:15:15+00:00</updated>
 <id>http://mdlai.github.io</id>
 <author>
   <name>Michael Lai</name>
   <email></email>
 </author>

 
 <entry>
   <title>Basketball Player Tracker</title>
   <link href="http://mdlai.github.io/2016/03/29/player-tracker/"/>
   <updated>2016-03-29T00:00:00+00:00</updated>
   <id>http://mdlai.github.io/2016/03/29/player-tracker</id>
   <content type="html">&lt;p&gt;I wanted to create a system that could track players in a basketball clip and translate them to a coordinate grid.&lt;/p&gt;

&lt;h3 id=&quot;background&quot;&gt;Background&lt;/h3&gt;
&lt;p&gt;This kind of motion tracking already exists in the form of SportVU.  But the aim of my project was to build player tracking from readily accessible YouTube clips.&lt;/p&gt;

&lt;p&gt;By satisfying this constraint, the system could be applied in college stadiums and foreign leagues where a $100,000 installation of six cameras just isn’t feasible.&lt;/p&gt;

&lt;p&gt;Player tracking has already shown many applications in strategic analysis, player health monitoring, and talent evaluation.  Making it more accessible levels the analytics playing field.&lt;/p&gt;

&lt;h3 id=&quot;challenges&quot;&gt;Challenges&lt;/h3&gt;
&lt;p&gt;Creating this system means I have to overcome a few challenges.
&lt;img src=&quot;http://mdlai.github.io/images/player-tracker/challenges.png&quot; alt=&quot;img&quot; /&gt;&lt;/p&gt;

&lt;h3 id=&quot;projective-transform&quot;&gt;Projective Transform&lt;/h3&gt;
&lt;p&gt;After a series of filters, we find probabilistic Hough lines.  Using the baseline, sideline, free throw line, and bottom of the free throw lane, we can create a map.  Since the dimensions of a real court are known, we can place objects on the drawn court.
&lt;img src=&quot;http://mdlai.github.io/images/player-tracker/transform.png&quot; alt=&quot;img&quot; /&gt;&lt;/p&gt;
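&lt;p&gt;The heart of this step is a planar homography: four point correspondences between the frame and a real court determine the transform.  Here’s a minimal numpy sketch, with invented pixel coordinates standing in for the detected landmarks:&lt;/p&gt;

```python
import numpy as np

def homography_from_points(src, dst):
    # Direct linear transform: solve for the 3x3 matrix H mapping each
    # src point (x, y) to the corresponding dst point (u, v).
    A = []
    for (x, y), (u, v) in zip(src, dst):
        A.append([x, y, 1, 0, 0, 0, -u * x, -u * y, -u])
        A.append([0, 0, 0, x, y, 1, -v * x, -v * y, -v])
    _, _, vt = np.linalg.svd(np.array(A, dtype=float))
    return vt[-1].reshape(3, 3)   # null-space vector of A

def project(H, pt):
    x, y, w = H @ np.array([pt[0], pt[1], 1.0])
    return (x / w, y / w)

# Hypothetical pixel coordinates of four court landmarks in a frame...
frame_pts = [(120, 400), (520, 380), (460, 200), (180, 210)]
# ...and their known positions on a real half court, in feet.
court_pts = [(0, 0), (50, 0), (50, 19), (0, 19)]

H = homography_from_points(frame_pts, court_pts)
print(project(H, frame_pts[0]))  # maps back to roughly (0, 0)
```

Any other detected player position in the frame can then be pushed through the same `project` call to place it on the drawn court.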

&lt;h3 id=&quot;player-identification&quot;&gt;Player Identification&lt;/h3&gt;
&lt;p&gt;The next challenge we need to address is identifying players in each frame.  Issues such as overlapping players and distorted lighting make this a significant challenge.  I applied a convolutional neural network based on &lt;a href=&quot;http://www.image-net.org/challenges/LSVRC/2013/slides/overfeat_ilsvrc2013.pdf&quot;&gt;OverFeat&lt;/a&gt;.  The model is implemented using TensorBox and creates bounding boxes on identified players.&lt;/p&gt;

&lt;p&gt;Here’s a video of one of the early iterations applied to a &lt;a href=&quot;https://www.youtube.com/watch?v=eZK2_-rIzJE&quot;&gt;YouTube highlight reel&lt;/a&gt;.&lt;/p&gt;
&lt;iframe width=&quot;420&quot; height=&quot;315&quot; src=&quot;https://www.youtube.com/embed/Qd8l2MbkKnM&quot; frameborder=&quot;0&quot; allowfullscreen=&quot;&quot;&gt;&lt;/iframe&gt;

&lt;p&gt;The detection model has a confidence threshold that controls which boxes are kept.  By lowering this threshold to 30% confidence, I increased recall by 40% at the cost of some precision.  The false positives are filtered out in the next stage.
&lt;img src=&quot;http://mdlai.github.io/images/player-tracker/cnn.png&quot; alt=&quot;img&quot; /&gt;&lt;/p&gt;
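&lt;p&gt;The thresholding itself amounts to a one-line filter over the detections.  A sketch, with an invented detection format rather than TensorBox’s actual output:&lt;/p&gt;

```python
def filter_boxes(detections, threshold=0.30):
    # Keep only boxes whose confidence clears the threshold; lowering
    # the threshold trades precision for recall.
    return [d for d in detections if d["conf"] >= threshold]

# Hypothetical detections (the dict layout here is illustrative only).
detections = [
    {"box": (10, 20, 50, 90), "conf": 0.92},
    {"box": (60, 25, 95, 88), "conf": 0.41},
    {"box": (5, 5, 15, 15), "conf": 0.12},
]
print(len(filter_boxes(detections)))  # 2
```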

&lt;h3 id=&quot;filtering-and-error-detection&quot;&gt;Filtering and Error Detection&lt;/h3&gt;
&lt;p&gt;I chose two color filters, in this case purple and white, based on the teams that are playing.  By applying the white and purple filters to each bounding box, I can compute a feature, color occurrence frequency, that lets me classify players by team.&lt;/p&gt;

&lt;p&gt;Furthermore, I can filter out erroneous bounding boxes by discarding any box in which neither team color is detected.
&lt;img src=&quot;http://mdlai.github.io/images/player-tracker/filter.png&quot; alt=&quot;img&quot; /&gt;&lt;/p&gt;
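&lt;p&gt;A rough sketch of the idea, with invented reference colors and tolerances: measure the fraction of pixels near each team color, keep the better match, and reject boxes that match neither.&lt;/p&gt;

```python
import numpy as np

def color_fraction(patch, color, tol=40):
    # Fraction of pixels within tol of the reference color on every channel.
    diff = np.abs(patch.astype(int) - np.array(color))
    return np.mean(np.all(np.less_equal(diff, tol), axis=-1))

def classify_patch(patch, team_colors, min_fraction=0.05):
    # Score the crop against each team color; if neither color shows up
    # often enough, treat the box as a false positive and return None.
    scores = {team: color_fraction(patch, c) for team, c in team_colors.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] >= min_fraction else None

# Invented RGB reference colors: white jerseys vs purple jerseys.
team_colors = {"white_team": (255, 255, 255), "purple_team": (128, 0, 128)}
patch = np.ones((20, 10, 3), dtype=np.uint8) * 250   # a mostly-white crop
print(classify_patch(patch, team_colors))  # white_team
```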

&lt;h3 id=&quot;results&quot;&gt;Results&lt;/h3&gt;
&lt;p&gt;When all these steps are applied to each frame of the video, the video can be reconstructed and looks something like this.
&lt;img src=&quot;http://mdlai.github.io/images/player-tracker/animate.gif&quot; alt=&quot;img&quot; /&gt;&lt;/p&gt;

&lt;p&gt;There’s a lot of room for improvement here.  As you can see, players cut in and out, and the coordinates carry significant noise.&lt;/p&gt;

&lt;p&gt;The takeaway is that each player can be mapped to a location on the court, based on their position on the screen and the landmarks on the court.&lt;/p&gt;

&lt;h3 id=&quot;future-work&quot;&gt;Future work&lt;/h3&gt;
&lt;p&gt;The model is trained on only 75 frames of training data, so more data is the first place I’d look for improvements.&lt;/p&gt;

&lt;p&gt;To make the translation more versatile, I need to improve the handling of court detection.  This is a particular issue when crossing half court.  Detecting objects on the court rather than lines might be more resistant to obstruction and ultimately prove more flexible.&lt;/p&gt;

&lt;p&gt;Next, by leveraging the temporal nature of video, I can reduce jitter by interpolating points between frames.  A Kalman filter might be applicable here.&lt;/p&gt;
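&lt;p&gt;As a sketch of what that might look like, here’s a minimal 1-D constant-velocity Kalman filter in numpy.  The real system would run one filter per player per coordinate, with tuned noise parameters:&lt;/p&gt;

```python
import numpy as np

def kalman_smooth(xs, q=1e-3, r=0.25):
    # Minimal 1-D constant-velocity Kalman filter: state = [position, velocity].
    F = np.array([[1.0, 1.0], [0.0, 1.0]])   # motion model, one frame per step
    Hm = np.array([[1.0, 0.0]])              # we only observe position
    Q = q * np.eye(2)                        # process noise
    x = np.array([xs[0], 0.0])
    P = np.eye(2)
    out = []
    for z in xs:
        # Predict the next state from the motion model.
        x = F @ x
        P = F @ P @ F.T + Q
        # Update with the noisy observed coordinate z.
        S = Hm @ P @ Hm.T + r
        K = P @ Hm.T / S
        x = x + (K * (z - Hm @ x)).ravel()
        P = (np.eye(2) - K @ Hm) @ P
        out.append(x[0])
    return out

noisy = [0.0, 1.2, 1.9, 3.3, 3.8, 5.1, 6.0]   # jittery court coordinates
smooth = kalman_smooth(noisy)
```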

&lt;p&gt;Finally, I want to create a time series where I can link rectangles to each other.  My idea is to use either a Kalman filter or an HMM to create a continuous sequence of coordinates based on a probability function incorporating distance and color.&lt;/p&gt;

&lt;p&gt;So there’s plenty of future work to do, and if you’d like to take a look at the code and try some of it yourself, check it out on my &lt;a href=&quot;https://github.com/mdlai/player_tracker&quot;&gt;GitHub&lt;/a&gt;.&lt;/p&gt;
</content>
 </entry>
 
 <entry>
   <title>NBA Player Descriptions</title>
   <link href="http://mdlai.github.io/2016/03/03/nba-text/"/>
   <updated>2016-03-03T00:00:00+00:00</updated>
   <id>http://mdlai.github.io/2016/03/03/nba-text</id>
   <content type="html">&lt;p&gt;Draftexpress.com does a lot of analysis.  Sometimes I wonder if it’s all the same or if they’re actually coming up with some new and novel way to describe people every time.&lt;/p&gt;

&lt;p&gt;I scraped 761 player profile pages, such as &lt;a href=&quot;http://www.draftexpress.com/profile/Karl-Towns-61831/&quot;&gt;this one&lt;/a&gt;, from between 2000-2015 and used Latent Dirichlet Allocation (LDA) to model what topics writers were talking about.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;http://mdlai.github.io/images/nba-text/topics.png&quot; alt=&quot;img&quot; /&gt;&lt;/p&gt;

&lt;p&gt;Each topic encompasses a position, along with its desirable attributes.  The point guard and power forward positions oddly share the phrase “mid range”.&lt;/p&gt;

&lt;p&gt;The positions are accurately categorized by the topics!  This confirms that both the writing and the model capture positions correctly.
&lt;img src=&quot;http://mdlai.github.io/images/nba-text/byposition.png&quot; alt=&quot;img&quot; /&gt;&lt;/p&gt;

&lt;p&gt;There’s also a very odd relationship between the clusters and the years that fall into those clusters.
&lt;img src=&quot;http://mdlai.github.io/images/nba-text/byyear.png&quot; alt=&quot;img&quot; /&gt;&lt;/p&gt;

&lt;p&gt;Since the two clusters containing these years both contained the phrase “mid range”, I dug into how frequently it occurs.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;http://mdlai.github.io/images/nba-text/midrange.png&quot; alt=&quot;img&quot; /&gt;&lt;/p&gt;

&lt;p&gt;From the chart we can see that the usage of the phrase mid range is going out of style.  Today’s NBA is focused on efficiency, and it seems like writers are following that trend too.&lt;/p&gt;

&lt;p&gt;Apparently no one cares about midrange basketball anymore, not even NBA writers.  So long Kobe and MJ.  Hello Steph Curry.&lt;/p&gt;

&lt;p&gt;The chart below helped me better understand my data.&lt;/p&gt;

&lt;p&gt;It’s interactive!  Click on the vertical axis to filter out different lines.  It’s an easy way to filter each cluster in terms of year, pick, position, height, and weight.&lt;/p&gt;

&lt;iframe src=&quot;/images/nba-text/index.html&quot; marginwidth=&quot;0&quot; marginheight=&quot;0&quot; scrolling=&quot;no&quot; width=&quot;100%&quot; height=&quot;800&quot;&gt;&lt;/iframe&gt;
</content>
 </entry>
 
 <entry>
   <title>Predicting Numbers</title>
   <link href="http://mdlai.github.io/2016/02/23/Digit-Predictor/"/>
   <updated>2016-02-23T00:00:00+00:00</updated>
   <id>http://mdlai.github.io/2016/02/23/Digit-Predictor</id>
   <content type="html">&lt;p&gt;Lately I’ve been building a model that can read your mind… or just your handwriting.&lt;/p&gt;

&lt;p&gt;Using Tensorflow, Flask, and Heroku, I created an app that can guess numbers drawn on it.&lt;/p&gt;

&lt;div class=&quot;12u$&quot;&gt;&lt;span class=&quot;image fit&quot;&gt;&lt;img src=&quot;/images/lucky-number/presentation.jpg&quot; alt=&quot;&quot; /&gt;&lt;/span&gt;&lt;/div&gt;

&lt;p&gt;Here’s me presenting my app!&lt;/p&gt;

&lt;p&gt;The model is a convolutional neural network trained on the MNIST data set.  The next step would be for me to store the drawn data and add it to my model… maybe when I have a bit more time.&lt;/p&gt;

&lt;p&gt;It kinda sucks at guessing 9s and 0s, a consequence of a model that’s overfit to its specific training data.  Maybe adding some deformations to the dataset would improve performance.&lt;/p&gt;

&lt;p&gt;The instructions are simple, just click on the grid and draw a number.&lt;/p&gt;

&lt;iframe src=&quot;https://number-predictor.herokuapp.com/&quot; width=&quot;100%&quot; height=&quot;800&quot;&gt;
  &lt;p&gt;
    &lt;a href=&quot;https://number-predictor.herokuapp.com/&quot;&gt;
      Fallback link for browsers that don’t support iframes
    &lt;/a&gt;
  &lt;/p&gt;
&lt;/iframe&gt;

&lt;p&gt;If you want to see the code check out &lt;a href=&quot;https://github.com/mdlai/digit_recognition&quot;&gt;my github.&lt;/a&gt;&lt;/p&gt;
</content>
 </entry>
 
 <entry>
   <title>HTML5 To Jekyll</title>
   <link href="http://mdlai.github.io/2016/02/15/Blog-About-Blogs/"/>
   <updated>2016-02-15T00:00:00+00:00</updated>
   <id>http://mdlai.github.io/2016/02/15/Blog-About-Blogs</id>
   <content type="html">&lt;p&gt;I just finished my 5th week at Metis, so I decided what better way to celebrate than spend my weekend re-working my blog!&lt;/p&gt;

&lt;p&gt;Making Jekyll work with HTML5 is a bit of a headache.  I hope my pain can be your gain.&lt;/p&gt;

&lt;p&gt;There’s a great tutorial on how to convert a theme &lt;a href=&quot;http://jekyll.tips/guide/setup/&quot;&gt;here&lt;/a&gt;, but I’ll toss in a few things that would’ve helped past me.  These tips assume you’ve at least skimmed that guide.&lt;/p&gt;

&lt;h3 id=&quot;basics&quot;&gt;Basics&lt;/h3&gt;
&lt;ol&gt;
  &lt;li&gt;Find an HTML5 template you like.&lt;/li&gt;
  &lt;li&gt;Get Jekyll running.&lt;/li&gt;
  &lt;li&gt;Add the necessary folder structure and config files to your template.&lt;/li&gt;
  &lt;li&gt;Customize.&lt;/li&gt;
&lt;/ol&gt;

&lt;h3 id=&quot;tips&quot;&gt;Tips&lt;/h3&gt;
&lt;p&gt;I’ll skip to number three since the first two should be pretty straightforward.  The best tip I have for past me is to break each section into an _include.  It adds a lot of flexibility in how your layouts are designed.&lt;/p&gt;

&lt;p&gt;For example, a header is always necessary, but splitting out the HTML for, say, a specific menu lets you build slightly different headers to suit your needs.&lt;/p&gt;

&lt;p&gt;In terms of customization, I spent far too much time wrestling with Liquid code.  I really hope I didn’t miss the easy way, but finding indexes and looping through arrays was a monstrosity.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;{% comment %}Find the index of the current page and the total number of posts.{% endcomment %}
{% for post in site.posts %}
  {% assign max = forloop.length | minus: 2 %}
  {% if page.url == post.url %}
    {% assign current = forloop.index %}
  {% endif %}
{% endfor %}

{% comment %}If the post is less than 3 from the start or greater than 3 from
the end, use the first 5 posts or the last 5 respectively.{% endcomment %}
{% if current &amp;lt; 3 %}
  {% assign current = 3 %}
{% elsif current &amp;gt; max %}
  {% assign current = max %}
{% endif %}

{% comment %}For each post, use the two posts forward in time and the two
backward as the links in the sidebar.{% endcomment %}
{% for post in site.posts %}
  {% assign last = current | plus: 3 %}
  {% assign first = current | minus: 3 %}
  {% if forloop.index &amp;gt; first and forloop.index &amp;lt; last %}
    &lt;!-- Show Post --&gt;
  {% endif %}
{% endfor %}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This little bit was necessary just to create chronologically adjacent sidebar links further than 1 post away.&lt;/p&gt;

&lt;p&gt;Finally, copy elements from other themes.  Tons of HTML5-based themes are available &lt;a href=&quot;http://jekyll.tips/templates/&quot;&gt;here&lt;/a&gt;, and they’re already fit for Jekyll.&lt;/p&gt;

&lt;p&gt;Liquid is really annoying and Jekyll is really amazing.&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;mailto:Michael@mdlai.com&quot;&gt;E-mail me.&lt;/a&gt;&lt;/p&gt;
</content>
 </entry>
 
 <entry>
   <title>Cleaning Pictures</title>
   <link href="http://mdlai.github.io/2016/02/12/Cleaning-Pictures/"/>
   <updated>2016-02-12T00:00:00+00:00</updated>
   <id>http://mdlai.github.io/2016/02/12/Cleaning-Pictures</id>
   <content type="html">&lt;p&gt;This week I discovered that image preprocessing is a ton of work.&lt;/p&gt;

&lt;p&gt;My current project is creating a model that predicts digits using the MNIST data set.&lt;/p&gt;

&lt;p&gt;To improve the performance of my model I needed to touch up my digits with a two-fold strategy.&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;Create a bounding box&lt;/li&gt;
  &lt;li&gt;De-Skew the image&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Easy, I thought.  Even with no experience doing image processing, how hard could it be?  Well, after step one I was feeling pretty confident.  A quick drop into the &lt;code class=&quot;highlighter-rouge&quot;&gt;skimage&lt;/code&gt; library and my numbers were bounded and resized.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;from skimage.filters import threshold_otsu
from skimage.measure import label, regionprops
from skimage.transform import resize
import matplotlib.pyplot as plt

thresh = threshold_otsu(image)
binary = image &amp;gt; thresh
# regionprops expects a labeled image; take the first region's cropped box
region = regionprops(label(binary))[0]
plt.matshow(resize(region.image.astype(float), (28, 28)))
&lt;/code&gt;&lt;/pre&gt;
&lt;div class=&quot;row uniform&quot;&gt;
  &lt;div class=&quot;6u&quot;&gt;&lt;span class=&quot;image fit&quot;&gt;&lt;img src=&quot;/images/cleaning-pictures/six.png&quot; alt=&quot;&quot; /&gt;&lt;/span&gt;&lt;/div&gt;
  &lt;div class=&quot;6u$&quot;&gt;&lt;span class=&quot;image fit&quot;&gt;&lt;img src=&quot;/images/cleaning-pictures/pca.png&quot; alt=&quot;&quot; /&gt;&lt;/span&gt;&lt;/div&gt;
&lt;/div&gt;

&lt;p&gt;Then I thought about de-skewing or rotating my image.  And I kept thinking.  For three days I had no real clue where to go.  After reading a ton of papers and consulting Professor Google countless times I finally came to a workable solution.&lt;/p&gt;

&lt;p&gt;I would use PCA to determine a principal axis vector and use that to calculate an angle and rotate my image.  And it worked, sort of.  Maybe I’ll get some better results next week.&lt;/p&gt;
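&lt;p&gt;The gist of that solution, sketched in numpy: treat the foreground pixel coordinates as data, take the covariance eigenvector with the largest eigenvalue as the principal axis, and measure its angle from vertical.&lt;/p&gt;

```python
import numpy as np

def principal_angle(binary):
    # Coordinates of the foreground pixels, centered on their mean.
    ys, xs = np.nonzero(binary)
    pts = np.column_stack([xs, ys]).astype(float)
    pts -= pts.mean(axis=0)
    # PCA: the covariance eigenvector with the largest eigenvalue
    # is the principal axis of the digit.
    vals, vecs = np.linalg.eigh(np.cov(pts.T))
    axis = vecs[:, np.argmax(vals)]
    if np.sign(axis[1]) == -1:
        axis = -axis   # resolve the eigenvector's sign ambiguity
    # Angle between the principal axis and vertical, in degrees.
    return np.degrees(np.arctan2(axis[0], axis[1]))

# A vertical bar of pixels has a vertical principal axis, so the angle ~ 0.
img = np.zeros((28, 28), dtype=bool)
img[4:24, 13:15] = True
print(principal_angle(img))
```

The resulting angle can then be fed to a rotation routine such as `skimage.transform.rotate` to de-skew the digit.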

&lt;p&gt;If you have any methods or bits of knowledge you want to drop on me about image processing, I’d love to hear them.&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;mailto:Michael@mdlai.com&quot;&gt;E-mail me.&lt;/a&gt;&lt;/p&gt;
</content>
 </entry>
 
 <entry>
   <title>Determiner Noun: Number</title>
   <link href="http://mdlai.github.io/2016/01/29/Determiner-Noun/"/>
   <updated>2016-01-29T00:00:00+00:00</updated>
   <id>http://mdlai.github.io/2016/01/29/Determiner-Noun</id>
   <content type="html">&lt;p&gt;Since the dawn of time, humanity has sought the answers to what to name their movies if they want to make the most money.  Finally someone has nudged the the space of understanding.&lt;/p&gt;

&lt;p&gt;Beginning with a data set of just movie Titles, Revenue, and Theaters, the struggle began: breaking down Titles into parts of speech to find the answers I was looking for.  To accomplish this I met a friend, “Spacey”, who helped me smash my Titles into tiny parts of speech.&lt;/p&gt;

&lt;div class=&quot;12u$&quot;&gt;&lt;span class=&quot;image fit&quot;&gt;&lt;img src=&quot;/images/determiner-noun/table.png&quot; alt=&quot;&quot; /&gt;&lt;/span&gt;&lt;/div&gt;

&lt;p&gt;Armed with my tiny parts of speech, I herded the Movies into bins based on their Total Theater counts: in one bin, small movies with 0-20 theaters; in the next, 20-1000; in the final bin, 1000+ theaters.  This way all the bins were of roughly equal size.&lt;/p&gt;

&lt;p&gt;&lt;span class=&quot;image right&quot;&gt; &lt;img src=&quot;/images/determiner-noun/bins.png&quot; alt=&quot;&quot; /&gt;&lt;/span&gt;&lt;/p&gt;

&lt;p&gt;In order to put my Total Grosses in line, I mushed them all through a natural log, which oddly enough made them more normal.&lt;/p&gt;

&lt;p&gt;As my quest for the answers progressed, I decided to cross my parts of speech with my bins, which gave me a new weapon: 56 features.  Again I had to normalize them, in order to put them on the same scale.  So they’d be a better fit, of course.&lt;/p&gt;

&lt;p&gt;I took these features and trained a ridge (regression) on them.  And what a beautiful ridge regression it was.  The lambda was all the way up at 10.&lt;/p&gt;

&lt;p&gt;After training on the ridge regression, each feature produced a beta.  I knew I was close to finding the answers.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;http://mdlai.github.io/images/determiner-noun/betas.png&quot; alt=&quot;img&quot; /&gt;&lt;/p&gt;
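&lt;p&gt;For reference, the fit is only a few lines in scikit-learn.  This sketch uses random stand-in data shaped like the real thing (56 features), not the actual movie data:&lt;/p&gt;

```python
import numpy as np
from sklearn.linear_model import Ridge

# Random stand-in data: rows are movies, columns are the 56
# (part of speech x bin) features, target is log total gross.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 56))
y = X @ rng.normal(size=56) + rng.normal(scale=0.1, size=200)

model = Ridge(alpha=10)    # the lambda "all the way up at 10"
model.fit(X, y)
betas = model.coef_        # one beta per feature, as in the chart above
print(betas.shape)  # (56,)
```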

&lt;p&gt;In order to bait out the answers, I subtracted the low betas from the high betas for each part of speech.  This was because my low betas were disguised: they were actually the baseline for my comparisons.&lt;/p&gt;

&lt;p&gt;Huzzah!  The answers!&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;http://mdlai.github.io/images/determiner-noun/widerelease.png&quot; alt=&quot;img&quot; /&gt;&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;http://mdlai.github.io/images/determiner-noun/midrelease.png&quot; alt=&quot;img&quot; /&gt;&lt;/p&gt;

&lt;p&gt;For wide release movies I discovered it was best to name a movie with a noun, number, and punctuation.  My guess is the number trend is a result of wide release sequels producing a lot of money.&lt;/p&gt;

&lt;p&gt;For mid release movies, adpositions and verbs were the way to go.&lt;/p&gt;

&lt;p&gt;While the results weren’t everything I was looking for, they were something worthwhile.  Maybe I’ll be able to find more in &lt;em&gt;Determiner Noun: Number + 1&lt;/em&gt;!&lt;/p&gt;
</content>
 </entry>
 
 <entry>
   <title>Learning Hearthstone</title>
   <link href="http://mdlai.github.io/2016/01/19/Learning-Hearthstone/"/>
   <updated>2016-01-19T00:00:00+00:00</updated>
   <id>http://mdlai.github.io/2016/01/19/Learning-Hearthstone</id>
   <content type="html">&lt;p&gt;In any competitive game information is a huge advantage.  Imagine if an opponent played with an open hand.  Planning your own turn and planning your future turns would be a piece of cake.&lt;/p&gt;

&lt;p&gt;The big question is&lt;/p&gt;

&lt;h2 id=&quot;how-do-we-predict-our-opponents-next-move&quot;&gt;How do we predict our opponent’s next move?&lt;/h2&gt;

&lt;p&gt;Our data set is 50,000 games in which we have the opponent’s sequence of cards played.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;http://mdlai.github.io/images/learning-hearthstone//data.png&quot; alt=&quot;png&quot; /&gt;&lt;/p&gt;

&lt;p&gt;To do this we’re going to use n-grams.  An n-gram, similar to the word engram, stores our data in a structure from which we can easily recall relevant information.&lt;/p&gt;

&lt;p&gt;An n-gram has two components: the level and the order.  The level is the building block of the n-gram, its lowest-level unit.  The order is the length of the n-gram, measured in those units.&lt;/p&gt;

&lt;p&gt;N-grams are contiguous subsequences pulled from a longer sequence.  Here’s what the n-grams look like for our data.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;http://mdlai.github.io/images/learning-hearthstone/n-gram.png&quot; alt=&quot;png&quot; /&gt;&lt;/p&gt;

&lt;p&gt;Using the n-grams for prediction is just a stone’s throw away.  Suppose we want to know what comes after a Leper Gnome based on our data.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;http://mdlai.github.io/images/learning-hearthstone/prediction.png&quot; alt=&quot;png&quot; /&gt;&lt;/p&gt;

&lt;p&gt;Based on our data we see two cards that occur after a Leper Gnome.  So we apply equal weight, and each has a 50% chance of being played next.&lt;/p&gt;
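&lt;p&gt;This count-and-normalize scheme is only a few lines of Python.  A minimal sketch, with arbitrary card sequences standing in for the real data:&lt;/p&gt;

```python
from collections import Counter, defaultdict

def build_ngram_model(games, order=2):
    # Count which card follows each (order - 1)-card context across all games.
    model = defaultdict(Counter)
    for seq in games:
        for i in range(len(seq) - order + 1):
            context = tuple(seq[i:i + order - 1])
            model[context][seq[i + order - 1]] += 1
    return model

def predict(model, context):
    # Normalize the counts seen after this context into probabilities.
    counts = model[tuple(context)]
    total = sum(counts.values())
    return {card: n / total for card, n in counts.items()}

games = [
    ["Leper Gnome", "Knife Juggler", "Wolfrider"],
    ["Leper Gnome", "Ironbeak Owl", "Wolfrider"],
]
model = build_ngram_model(games)
print(predict(model, ["Leper Gnome"]))  # each follower gets probability 0.5
```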

&lt;p&gt;Applying this model to Hearthstone requires a slight modification.  Since we can’t interrupt our opponent’s turn it’s more important to predict what their next turn will be rather than what card they’ll play next.  The application then looks like the following.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;http://mdlai.github.io/images/learning-hearthstone/application.png&quot; alt=&quot;png&quot; /&gt;&lt;/p&gt;

&lt;p&gt;The author who implemented this model made a few other small tweaks, and you can read more about it &lt;a href=&quot;https://www.elie.net/blog/hearthstone/predicting-hearthstone-opponent-deck-using-machine-learning&quot; title=&quot;Predicting Hearthstone&quot;&gt;here!&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;N-grams have other applications in things like spam filters, text prediction, voice recognition, and more.&lt;/p&gt;
</content>
 </entry>
 
 <entry>
   <title>Turning Over MTA Data</title>
   <link href="http://mdlai.github.io/2016/01/14/Turning-Over/"/>
   <updated>2016-01-14T00:00:00+00:00</updated>
   <id>http://mdlai.github.io/2016/01/14/Turning-Over</id>
   <content type="html">&lt;p&gt;For my first data science project at Metis, I worked with a group tasked finding the best MTA station for the purpose of placing a kiosk and promoting a Women in Technology gala.&lt;/p&gt;

&lt;h2 id=&quot;goal&quot;&gt;Goal&lt;/h2&gt;
&lt;p&gt;The focus of the project was to use MTA turnstile data to find the optimal station.  To this end we decided the goal of the kiosk should be to reach the highest quantity of high-quality attendees.&lt;/p&gt;

&lt;h2 id=&quot;approach&quot;&gt;Approach&lt;/h2&gt;
&lt;p&gt;The quantity was defined simply as the foot traffic in a station.  The quality was defined as the median donation value of the surrounding area, which came from a 2012 study by a different organization.  Because the gala is scheduled for the summer, we chose data from April and May to find the best time to promote the event.&lt;/p&gt;

&lt;p&gt;Our methodology was to first find the top 5 foot traffic stations, then cross them with the median donation value for their respective areas.&lt;/p&gt;

&lt;p&gt;&lt;span class=&quot;image right&quot;&gt; &lt;img src=&quot;/images/turning-over/top_stations.png&quot; alt=&quot;&quot; /&gt;&lt;/span&gt;&lt;/p&gt;

&lt;h2 id=&quot;results&quot;&gt;Results&lt;/h2&gt;
&lt;p&gt;The top 9 stations were as follows.  The 86th St, 125 St, and 96 St stations were dropped due to the way the data was aggregated: those station names cover multiple lines at different locations, so their foot traffic counts were summed across multiple locations.&lt;/p&gt;

&lt;p&gt;Next we crudely took the product of the median contribution and the foot traffic to get some idea of the overall quality of a station.&lt;br /&gt;&lt;/p&gt;
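&lt;p&gt;With made-up traffic and donation numbers, the crude metric looks like this:&lt;/p&gt;

```python
# Hypothetical station figures: average daily foot traffic and the
# median donation for the surrounding area (invented numbers, not the
# real MTA or donation data).
stations = {
    "Wall St": {"traffic": 45000, "median_donation": 120},
    "Grand Central": {"traffic": 90000, "median_donation": 40},
    "Union Sq": {"traffic": 70000, "median_donation": 55},
}

# Crude "quality" metric: foot traffic times median donation.
quality = {name: s["traffic"] * s["median_donation"] for name, s in stations.items()}
best = max(quality, key=quality.get)
print(best)  # Wall St
```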

&lt;p&gt;We ordered them by our “quality” metric and the Wall St station turned out to be our best bet.  So we broke Wall St’s foot traffic down by the week to find a best day to send the street team.&lt;/p&gt;

&lt;div class=&quot;12u$&quot;&gt;&lt;span class=&quot;image fit&quot;&gt;&lt;img src=&quot;/images/turning-over/donations.png&quot; alt=&quot;&quot; /&gt;&lt;/span&gt;&lt;/div&gt;

&lt;p&gt;It appears Wednesdays or Thursdays are our best bet.  It turns out that other stations in the top 7 appear to follow a similar traffic trend.&lt;/p&gt;

&lt;div class=&quot;12u$&quot;&gt;&lt;span class=&quot;image fit&quot;&gt;&lt;img src=&quot;/images/turning-over/wall_st.png&quot; alt=&quot;&quot; /&gt;&lt;/span&gt;&lt;/div&gt;

&lt;p&gt;Some limitations of the analysis include the fact that median contributions for an area and MTA ridership may only be loosely connected.  Furthermore, the fact that a kiosk would be trying to catch a lot of people in a hurry was not considered.&lt;/p&gt;

&lt;h2 id=&quot;data-stuff&quot;&gt;Data Stuff&lt;/h2&gt;
&lt;p&gt;From my novice data cleaning perspective it was a significant challenge to correct the often screwed up turnstile data.  It was necessary to do things such as interpolating values when turnstiles reset, or deleting values when they bizarrely went backwards.  Automating all that was a nice learning experience.&lt;/p&gt;
</content>
 </entry>
 

</feed>
