thoughts...

  • The technology behind Tweetrad.io. |

    Wow it’s been an exciting couple of days! Tweetrad.io has received nearly 28,000 pageviews in the past 2 days thanks to a successful Hacker News article, a blog post from Mashable, a celebrity tweet from Alyssa Milano, and subsequent viral action on twitter. I’m pleased to say that Tweetrad.io for the most part has stood up well to this traffic spike.

    traffic

    Here’s a brief write-up on the technology behind Tweetrad.io, how the system evolved and the reasons behind our technology choices.

    Tweetrad.io was initially born out of a weekend hacking session where I was playing around with the idea of a ruby daemon that would search twitter and then fire off a configurable ruby script. The first script I wrote to demonstrate the capabilities of what at the time I was calling “Twobots (twitter robots)” was to just run the tweet text through OS X’s say command. I brought the project into work on Monday to show my coworkers and we all got a good laugh out of listening to the voices read various humorous and mostly NSFW tweets. Edwin immediately saw the entertainment potential and we decided to partner on it as a side project.

    Our First-pass Architecture

    We knew early on that processing text to speech on a high volume of tweets called for a queue based architecture. The initial plan for Tweetrad.io called for a lightweight application server and three types of asynchronous daemon services (Searchers, Converters, Monitors) all running on Amazon Web Services.

    first-pass architecture

    When a user ran a query on this version of Tweetrad.io the web application added a row to the database with the search query the user provided. A searcher daemon would see the unprocessed query in the database and fire off a query to the twitter search api (via the excellent Grackle library). The retrieved tweets were then pushed onto an SQS queue based on the MD5 hash of the search query. Next a converter daemon would see the new queue and burn through it, converting each tweet to an MP3 using the open source Festival TTS engine. The MP3 was then written to a public S3 bucket from whence it could be served back to the javascript client directly. A pool of Monitor services was responsible for making sure that tweets were periodically cleaned up from S3 as they reached a max age threshold. This was necessary to avoid incurring high S3 storage bills.
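    To make the "queue based on the MD5 hash of the search query" idea concrete, here is a minimal sketch. The helper name `queue_name_for`, the `tweets-` prefix, and the strip/downcase normalization are assumptions for illustration, not the original code.

```ruby
require 'digest/md5'

# Hypothetical helper: derive a stable queue name from a search query.
# The "tweets-" prefix and the strip/downcase normalization are assumptions.
def queue_name_for(query)
  normalized = query.strip.downcase
  "tweets-#{Digest::MD5.hexdigest(normalized)}"
end

puts queue_name_for("Ruby on Rails")
puts queue_name_for("  ruby on rails ")  # same name after normalization
```

    Hashing the query gives every search a fixed-length name (an MD5 hex digest is always 32 characters of 0-9 and a-f), so arbitrary user input never ends up in a queue name.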

    This architecture was cleanly organized and allowed us to tune the conversion process by managing the numbers of each type of daemon; however, as we tested we quickly discovered that some of our initial ideas would not work in production.

    Problems We Encountered

    Twitter Search API Rate Limiting
    Although running a pool of searcher bots on EC2 seemed like a good way to scale up to handle lots of concurrent queries, we quickly ran up against rate-limiting on our search API calls. My coworker Aaron suggested that we move querying down to the client to distribute the API hits. This worked like a charm and eliminated the need for the Searcher service pool altogether. Now when a user searches, the query is run on the client directly via a jsonp call to the twitter search api. The json for the retrieved tweets is posted directly to our Sinatra web application, which handles putting the tweets into the queue for conversion.

    Problems with Festival TTS
    Initially we planned to do the majority of our text to speech conversion using Festival running on our App Server and potentially scaling out additional EC2 instances as needed. Although Festival had some excellent voices and its support for SABLE allowed us to generate fairly natural sounding speech, we found that running it on the server caused our load average to spike as query intensity increased. We also found that OS X’s say command provided somewhat more natural sounding voices. Our solution was to run the conversion processes on a cluster of several OS X boxes running at our homes. This allowed us to alleviate load on the Web Server and leverage the high quality text to speech in OS X while simultaneously reducing our EC2 bill.
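    For a sense of what calling say from a ruby script might look like, here is a rough sketch. The helper `say_command`, the voice name, and the output path are placeholders made up for this example. The `-v` (voice) and `-o` (write to a file) options are part of say itself. How the audio then became an mp3 is not covered here.

```ruby
# Hypothetical helper: build the argument list for OS X's `say` command.
# `-v` selects a voice and `-o` writes audio to a file instead of the speakers.
# The voice name and output path used below are placeholders.
def say_command(text, voice: nil, output: nil)
  cmd = ["say"]
  cmd += ["-v", voice] if voice
  cmd += ["-o", output] if output
  cmd << text
end

cmd = say_command("hello from tweetrad", voice: "Alex", output: "/tmp/tweet.aiff")
puts cmd.inspect
# On an OS X machine the conversion could then be run with: system(*cmd)
```

    Passing the command to `system` as separate arguments rather than one string keeps the tweet text away from the shell, which matters when the text comes from strangers on the internet.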

    Concurrency Issues with ActiveRecord
    Each converter service runs up to 20 concurrent conversions. I initially had a lot of problems getting ActiveRecord to work properly in a multithreaded ruby script. Eventually I discovered the problem was with the Mysql ruby gem. The solution came in the form of the Mysqlplus gem from neverblock (more info). If you are tearing your hair out trying to get a multithreaded ruby script to interact with a mysql database I highly recommend checking out this project.
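    The "up to 20 concurrent conversions" part can be sketched as a fixed pool of worker threads draining a queue. This is only an illustration using ruby's built-in `Queue`; `run_pool` and its parameters are hypothetical names, and the database access that Mysqlplus fixed is left out entirely.

```ruby
# Hypothetical sketch: run jobs on a fixed pool of worker threads.
# `run_pool` and its parameters are illustrative names, not the original code.
# Jobs are assumed to be non-nil, since nil marks the end of the queue.
def run_pool(jobs, workers: 20)
  queue = Queue.new
  jobs.each { |job| queue << job }
  queue.close                      # pop returns nil once the queue is drained

  results = Queue.new              # thread-safe collection of results
  threads = Array.new(workers) do
    Thread.new do
      while (job = queue.pop)
        results << yield(job)
      end
    end
  end
  threads.each(&:join)

  Array.new(results.size) { results.pop }
end

converted = run_pool(%w[tweet-1 tweet-2 tweet-3], workers: 2) { |t| "#{t}.mp3" }
puts converted.sort.inspect
```

    Results come back in completion order rather than submission order, which is why the usage line sorts them before printing.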

    Cost of running on AWS
    Although AWS provides excellent scalable infrastructure at a reasonable price, Tweetrad.io, as a side-project with no funding, is operating on a shoestring budget. In development our AWS bills were pretty small, but I had concerns that if we started to generate real traffic the bills could go up quickly. Notably, our architecture relied heavily on SQS, and the daemon jobs constantly polling the queue for updates made SQS a surprisingly significant percentage of our bill. To alleviate these billing concerns we decided to see if we could run the service on hardware freely available to us. Edwin owns a colocated rackmount server that he uses for testing from time to time. We decided to use this machine as our web application server. Since our Monitor service ensures the number of Tweet mp3s is kept relatively small, we realized we didn’t really need the scalability of S3. We opted to just use the local disk on our Application server. Further, the Converter services were distributed out across several home computers accessing the database queue over the internet and using scp to write the converted mp3s back to the web server.
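    With the mp3s on local disk, the Monitor's cleanup job boils down to deleting files past a max age. A minimal sketch, assuming a flat directory of mp3 files and a one hour threshold (both assumptions; `expire_old_tweets` is a made-up name):

```ruby
require 'tmpdir'
require 'fileutils'

# Hypothetical monitor step: delete MP3s older than max_age seconds.
# The flat directory layout and the one-hour default are assumptions.
def expire_old_tweets(dir, max_age: 3600, now: Time.now)
  expired = Dir.glob(File.join(dir, "*.mp3")).select do |path|
    now - File.mtime(path) > max_age
  end
  FileUtils.rm_f(expired)
  expired.map { |path| File.basename(path) }
end

# Demo in a throwaway directory: one stale file, one fresh file.
Dir.mktmpdir do |dir|
  old_file = File.join(dir, "old.mp3")
  new_file = File.join(dir, "new.mp3")
  FileUtils.touch([old_file, new_file])
  two_hours_ago = Time.now - 7200
  File.utime(two_hours_ago, two_hours_ago, old_file)

  puts expire_old_tweets(dir).inspect
  puts Dir.children(dir).inspect
end
```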

    This brings us to…

    Our Current Architecture

    Our new architecture, while less impressive on paper, is simpler and more cost effective than the initial architecture we designed. Although it may be necessary to scale out on AWS at some point, the current solution is standing up to load nicely at the moment.

    current architecture

    With our current architecture, when a user hits Tweetrad.io with a search query our page-cached javascript and html client is returned to the browser. The javascript client then fires off a jsonp request to the twitter search api and loads the results into a client side tweet cache. Meanwhile a player built with Soundmanager2 starts checking the local tweet cache for unplayed tweets. When an unplayed tweet is found the player makes a get request to a canonical url based on the tweet id. If the file is found it is streamed from the server and played. If a 404 is received the tweet json is posted to the sinatra service at /convert. The sinatra service checks to see if this tweet has already been queued for conversion. If not, the tweet is written to the mysql queue. (This is the only dynamic action in the web application. Everything else is page cached.) Converter processes running on various OS X boxes outside the datacenter poll the mysql queue directly. When a converter finds a row to be processed it locks the row so other converters won’t pick up the job. The converter then runs its conversion process and pushes the file back to our application server using scp. Back on the client, polling for the converted mp3 has been continuing periodically. As the tweet is now available in the app server’s audio tweet cache, the next request for the tweet mp3 will be a 200 and sound manager begins playing the tweet. The player keeps track of how many unplayed tweets are in the queue and periodically goes back to twitter for more recent or previous page tweets for conversion.
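    The two queue rules in that flow (only queue a tweet once, and let only one converter take a given job) can be sketched without a database. What follows is an in-memory stand-in, not the real thing: the `ConversionQueue` class and its method names are hypothetical, and a `Mutex` plays the role that row locking plays in mysql.

```ruby
# Hypothetical in-memory stand-in for the MySQL queue table. The Mutex plays
# the role of the row lock; class and method names are illustrative only.
class ConversionQueue
  def initialize
    @jobs = {}          # tweet_id => { text:, state:, worker: }
    @lock = Mutex.new
  end

  # Mirrors the /convert check: queue a tweet only if it is not already there.
  def enqueue(tweet_id, text)
    @lock.synchronize do
      return false if @jobs.key?(tweet_id)
      @jobs[tweet_id] = { text: text, state: :pending, worker: nil }
      true
    end
  end

  # Mirrors a converter locking a row: the first pending job is marked as
  # claimed so that no other converter picks it up.
  def claim(worker)
    @lock.synchronize do
      tweet_id, job = @jobs.find { |_, j| j[:state] == :pending }
      return nil unless job
      job[:state] = :claimed
      job[:worker] = worker
      tweet_id
    end
  end
end

queue = ConversionQueue.new
queue.enqueue(101, "first tweet")
queue.enqueue(101, "first tweet")   # duplicate, ignored
queue.enqueue(102, "second tweet")

# Four converters race for two jobs; each job is handed out exactly once.
claims = 4.times.map { |i| Thread.new { queue.claim("converter-#{i}") } }.map(&:value)
puts claims.compact.sort.inspect
```

    Converters that lose the race simply get `nil` back and would poll again later.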

    Open Source Projects we use
    A big thank you to the developers of all the open source software that tweetrad.io runs on. Without these projects we’d never have gotten this bird off the ground.

    Ruby – our language
    Mysql – our database
    nginx – our webserver
    vlad the deployer
    passenger – easy deployment for rack based applications
    Rack – ruby webserver interface
    Sinatra – framework for building lightweight web applications and services
    sinatra-cache – page cache plugin for sinatra
    Daemons – ruby gem we use to daemonize our various services
    ActiveRecord – domain model for our database queue
    Mysqlplus – allow us to do threadsafe mysql access in ruby
    Prototype – javascript extensions and dom utilities
    Scriptaculous – effects for morphing css
    SoundManager2 – javascript library for playing audio
    Raphael – javascript vector drawing library (provides fun radio wave animation)
    Grackle – not currently using but was a big part of the early prototypes
    Festival TTS – not currently using but will use if our scalability needs grow beyond what we can support on our local os x cluster

    Thanks again for checking out Tweetrad.io and remember “At Tweetrad.io, we read the tweets so you don’t have to!”



  • Shareflow now has an Air Application |

    I just wrapped up development on version 1 of an Adobe AIR client for Shareflow. It was a great experience working with AIR. I was able to port my JS-fu to a desktop application with a minimal learning curve. On the desktop I was able to take advantage of things like offline encrypted storage, APIs to play sounds and animate application icons, and drag and drop file uploading.

    I posted about it on the Zenbe Blog here: Shareflow On Your Desktop



  • Quakespotter an App for America |

    I helped my friend and former coworker Jeremy Ashkenas finish up a submission for the Apps for America 2 competition sponsored by Sunlight Labs.

    Quakespotter is a visualization of recent earthquakes plotted on a 3d globe with some nice integrations to twitter, usgs, google news/maps and the red cross (for disaster relief donations).

    The app was built with Ruby-Processing.

    You can download it at quakespotter.org.



  • New Blog Post on blog.zenbe.com |

    There’s a new blog post by me over at blog.zenbe.com on a search feature I developed for shareflow. Check it out.

    http://blog.zenbe.com/vzwmi.



  • Recently |




  • open tape of what I'm listening to |

    I just set up open tape on my favorite new domain tunes.toodleobootsnook.info. If you are wondering about the name, I was joking around with a friend about possible domain names for an as yet unreleased side project and this was the most ridiculous one I came up with. Hey, 99 cents will still buy something.

    Anyway, feel free to check out what I’m listening to lately. If you have a song you’d like to share with me, and the millions of readers of this fine web publication you can upload songs to share.toodleobootsnook.info. Just click the admin link, the password is grimlock.



  • Life lessons from Rambo |

    excerpted from Rambo’s commencement speech…
    …sometimes in life you just need to fire "explosive laden arrows of dedication" from your "compound bow of perseverance" to overcome the "unwitting russian supply convoy of self-doubt"…



  • Random Hex Colors with Javascript |

    Generate random hex colors with javascript

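    randomHexColor : function(){
      // Math.floor keeps the value in 0..0xFFFFFF; Math.round could yield 0x1000000 (7 digits)
      var c = Math.floor(Math.random() * 16777216).toString(16);
      // left-pad with zeros to six hex digits
      while (c.length < 6){ c = '0' + c; }
      return '#' + c;
    }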
    



  • Javascript Rogaine |
    Now you can cover those unsightly bald spots on websites with Javascript Rogaine. Regrow hair the natural "javascript" way…with Javascript Rogaine

    Javascript Rogaine Bookmarklet: drag to your bookmark bar
    javascript rogaine



  • useful little script for serving up a directory in development |
    #!/usr/bin/env ruby
    
    require 'webrick'
    include WEBrick
    
    def run_server mount
      mount ||= Dir::pwd
      puts "starting http server at: #{mount}"
      s = HTTPServer.new(
        :Port            => 2000,
        :DocumentRoot    => mount
      )
    
      ## mount subdirectories
      s.mount("/", HTTPServlet::FileHandler, mount, true)
      trap("INT"){ s.shutdown }
      s.start
    end
    
    # only start the server when run directly; serve ARGV[0] or the current directory
    if __FILE__ == $0
      run_server ARGV[0]
    end
    


  • The Great Wave off Kanagawa |

    This weekend I started working on a project with a couple of friends that I’ve always wanted to try. We are painting a mural of The Great Wave off Kanagawa. So far I think we are doing pretty well. We got the initial tracing done on Friday night and painted in the first color on Saturday afternoon. Thanks to Stefanie, Susan, and Jim for all the help so far. I’ll be posting more pictures as we progress.




  • Hanging out in LIC |

    I spent Sunday back in my old neighborhood LIC helping my friends move apartments. I got a couple pics of the view of the city from their new apt.




  • first link of the new year |

    Happy New Year to my millions of imaginary readers. Here is an interesting link that describes a somewhat surprising solution to the problem of choosing the door with the car behind it instead of the goat cart on “Let’s Make a Deal”.

    Let’s make a deal dilemma
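    The surprising part, for anyone who has not seen it, is that switching doors wins about two thirds of the time while staying wins about one third. Here is a minimal simulation sketch, assuming the standard rules (the host always opens a goat door other than your pick). `win_rate` and its parameters are names made up for this example.

```ruby
# Simulate the game: the car is behind one of three doors, the host opens a
# goat door that the player did not pick, and the player may switch.
# `win_rate` is a hypothetical helper; the fixed seed only makes runs repeatable.
def win_rate(switch:, trials: 10_000, rng: Random.new(42))
  wins = trials.times.count do
    car  = rng.rand(3)
    pick = rng.rand(3)
    # Host opens a door that hides a goat and is not the player's pick.
    opened = ([0, 1, 2] - [car, pick]).sample(random: rng)
    # Switching means taking the one door that is neither picked nor opened.
    pick = ([0, 1, 2] - [pick, opened]).first if switch
    pick == car
  end
  wins.fdiv(trials)
end

puts "stay:   #{win_rate(switch: false)}"
puts "switch: #{win_rate(switch: true)}"
```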



  • Think Different |

    I believe this is the appropriate way to approach most problems.
    how to measure the height of a building with a barometer



  • Arduino for Dummies |

    I’m enjoying learning the basics of electronics and embedded systems programming with the arduino board I got for Christmas. Here’s a picture of the sample project I built. I had to assemble the prototyping shield from a kit, which required me to do some soldering. I then followed the instructions on MakerShed for building this simple pressure sensitive LED display. Pressing on the force sensing resistor pad causes the lights to go out in order based on the amount of pressure applied.


    After I get a bit more proficient with this stuff I’d like to attempt to build a simple weather monitor that would display the current conditions on an lcd screen based on a web service call.



  • Grand Army Plaza Photo |

    I took these pictures of The Soldiers’ and Sailors’ Arch at Grand Army Plaza tonight.




WillBailey::About

Hi there, I'm Will. I live in Brooklyn, NY and work as a Rails and Javascript programmer for Zenbe.com.

web stuff:

projects:

  • Shareflow : I'm the lead developer of Shareflow, Zenbe's new stream based collaboration tool. Check it out.
  • Zenbe : I work for Zenbe.com. Some of my contributions to the app include ZenPages, the tasks and contacts sidebar, and the scribd integration for viewing files.
  • Zenbe Lists : I developed the web client for Zenbe Lists
  • TweetRad.io : I created this fun way to listen to twitter with a couple of my Zenbe coworkers. Check it out.
  • pitchforkd : just a little scraper and sinatra app to make the content on pitchfork media a bit more accessible
  • quakespotter : I helped build this Ruby-Processing based submission to Apps for America 2.

photo albums: