Archive for November, 2009

Migrating an existing deployment to Heroku

November 20, 2009

It was fun — pure unadulterated fun!!

I started looking at Heroku because our current deployment on a Linode was getting a little bulky. Our client has been complaining of ‘suddenly things slowing down’ and ‘site not working’. The site gets about 36 requests per minute and we need to keep our hosting costs down, so what is the solution?

Currently, it’s hosted on a Linode (740MB memory) running nginx + a cluster of 3 thin servers. Monit ensures that ANY thin server going above 140MB is restarted. BackgrounDrb takes a lot of memory too (an unavoidable 160MB) and we have ferret to make things more complicated!! 🙂

Enter Heroku. The first thing that got me hooked was that a ‘git push heroku’ DEPLOYS your application! Fantastic. I have been a Capistrano freak till now and loved it, but it’s all gone now! (I wonder if that’s good or bad, though.) LoL 🙂

So, to get started, you sign-up on heroku and follow instructions on your My Apps page:

$ git init && git add . && git commit -m "initial checkin"
$ heroku create
$ git push heroku master

I used the Blossom account (5 MB space) and was pleasantly surprised to see it work ‘out of the box’. It created a weirdly named sub-domain for me, which I changed using

$ heroku rename <new-name>

Digging deeper, I wanted to deploy the application ‘as is’ but that was not to be:

Setup Woes

Heroku uses PostgreSQL and we had used MySQL, so you need to ensure that no special data types are being used. As Murphy’s law goes, we had used blob and had to change that to bytea (the byte array type in PG). The interesting thing I found was that there is NO way to drop the database and re-create it. The answer lies in:

$ heroku rake db:schema:load

The production environment is assumed. I was not sure which environment I was in until, after some reading up, I issued this command and saw that I was in production mode:

$ heroku config
RACK_ENV => production

I was also curious to see how I get the logs and check the console and it really ROCKS:

$ heroku logs
$ heroku console

Alas I could not find a way to access the dbconsole (like what I do sometimes with script/dbconsole -p). Well, that is the way it is!

Ferret Woes

Heroku does not use ferret – it uses solr and sunspot. I had to install these additional gems using the gem manifest i.e. the .gems file located at RAILS_ROOT.

$ git commit
$ git push heroku

Since acts_as_ferret is installed, it loads the Rails models and ‘expects’ the database to be in place. So, I had to comment out ALL the references to acts_as_ferret to be able to run a

$ heroku rake db:migrate

So far so good. Now, since ferret needs to write index files and Heroku mounts the deployment on a read-only filesystem, we need to tweak the configuration so that the index is created in tmp/ (the area where we can write stuff). Add the following lines to config/environments/production.rb:

require 'acts_as_ferret'
ActsAsFerret.index_dir = "#{RAILS_ROOT}/tmp/index"

Now, the default ferret production setup is to start a DRb server on port 9010, and Heroku will not allow this! So, we have to comment out all the configuration in config/ferret_server.yml to ensure that it loads like it does in development. I hate this, but there is no alternative for now. I do plan to migrate to the WebSolr that Heroku is offering, but it’s at $20 per month! Let’s see how that works out.

This should at least get you started on ferret – phew!

BackgrounDrb woes

BackgrounDrb comes with its own baggage I must say. Heroku proposes the use of DelayedJob (yippee) and I have just this plan, so rather than even try to figure out a way to get backgrounDrb to work on Heroku, I simply commented out the config/backgroundrb.yml settings (to ensure it does not get initialized).

I also had to comment out all the MiddleMan references in the controller. The backgrounDrb workers are easily modified so that they can work as DelayedJob objects ;). I have described this in my earlier post, Moving from BackgrounDrb to DelayedJob.

Once this was done, I did not face any problems and after some quick code fixes, git commits and git push’es, I was able to get my heroku application up and running.

All in a day’s work! That’s really amazing. Now, the next step is to get WebSolr and DelayedJob integrated and see if I can move onto a bigger storage setup. A couple of months will tell me if this move was right.


RPCFN#3 Short-circuit

November 20, 2009

It was great to see so many solutions for the problem statement I had set. After an initial ‘shock’ from all electrical engineers, the problem was clearly understood as a variant of the ‘shortest path algorithm’. I hope I was able to derive a laugh from the cryptic clue ‘short-circuit’.

Speaking of ‘derive’, I did see most solutions going for a class implementation such as ‘class Graph’ or ‘class Vector’. Though it’s pretty cool to see a complete packaged solution, I believe this was slight overkill. This was NOT the basis of judging, but I do feel that the essence of Ruby is getting the job done in far less code, i.e. fewer LOC. So, I personally did not foresee 3-4 classes with inheritance in them. I do totally agree it’s an excellent way to showcase one’s skills 😉

Dijkstra’s algorithm seems to be the popular hit for solving this problem. However, I do feel that a recursive solution, rather than an iterative one, is more appropriate here. Again, this is just an opinion, but to get a sense of the power of Ruby programming, a recursive program probably paints a better picture. It’s also very concise and readable.
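To give a feel for the recursive style, here is a minimal sketch of a recursive shortest-path search. It is neither a contestant's entry nor the solution I submitted; the graph, the node names and the shortest_path helper are all made up for illustration. It walks every simple path, so it only suits small graphs like the one in the quiz.

```ruby
# Hypothetical weighted graph: node => { neighbour => cost }
DEMO_GRAPH = {
  'A' => { 'B' => 50, 'C' => 30 },
  'B' => { 'A' => 50, 'D' => 20 },
  'C' => { 'A' => 30, 'D' => 60 },
  'D' => { 'B' => 20, 'C' => 60 }
}

UNREACHABLE = 1.0 / 0

# Returns [cost, path] for the cheapest route from +from+ to +to+.
# +visited+ lists the nodes already on the current route, so cycles are skipped.
# If no route exists, the result is [UNREACHABLE, []].
def shortest_path(graph, from, to, visited = [])
  return [0, [to]] if from == to

  best = [UNREACHABLE, []]
  graph[from].each do |neighbour, cost|
    next if visited.include?(neighbour)

    sub_cost, sub_path = shortest_path(graph, neighbour, to, visited + [from])
    total = cost + sub_cost
    best = [total, [from] + sub_path] if total < best.first
  end
  best
end

cost, path = shortest_path(DEMO_GRAPH, 'A', 'D')
puts "#{path.join(' -> ')} costs #{cost}"
```

The whole algorithm fits in one method and needs no Graph or Vector class, which is the point I was trying to make.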

A lot of effort was spent in ‘initializing’ the data-structure. This was really nice to see — unlike me, who took a short-cut. The aim was to see the algorithm but I was really proud to see ‘complete’ solutions. Test cases were written in most solutions and it was a pleasure to see them run.

It was interesting to see various forms of Infinity:

Infinity = 1 << 64
Infinity = 1 << 32
Infinity = 1.0 / 0 # simple and ideal
Infinity = 1000000 # incorrect
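A short sketch of why the floating-point form is the safe one. The constant names and the edge costs below are invented for the example; the only claim is about how Ruby compares these values.

```ruby
INF_FLOAT = 1.0 / 0    # a real floating-point infinity
INF_SHIFT = 1 << 64    # just a very large Integer
INF_MAGIC = 1_000_000  # an ordinary number

# A real infinity stays infinite whatever is added to it
puts INF_FLOAT == Float::INFINITY
puts INF_FLOAT + 1 == INF_FLOAT

# A large Integer is still finite, so adding to it changes it
puts INF_SHIFT + 1 == INF_SHIFT

# Hypothetical path made of two expensive edges
REAL_PATH_COST = 600_000 + 500_000

# With the magic number, this genuine path looks worse than "no path at all"
puts REAL_PATH_COST < INF_MAGIC
puts REAL_PATH_COST < INF_FLOAT
```

The shifted values work in practice only because no realistic path cost gets near them; the magic number fails as soon as the costs are large enough.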

It was interesting to note that very few catered for multiple shortest paths, though the question was raised in the comments earlier. Not trying to be a hypocrite here, I should say that I too did not implement multiple shortest paths – LoL.

My solution to the problem is available online and is merely a representation of my style of programming. Creating a gem was for kicks, but lib/short_circuit.rb has the core code. It was great fun AND learning while doing this and I realized that I will always be a student.


Capistrano + Nginx + Thin deployment on Linode

November 10, 2009

This is a long-lost post I had written about 8 months ago (converted from wiki to HTML, so pardon typos if any).


Capistrano is a ruby gem which helps in remote deployment. Contrary to popular belief, Capistrano can be used for any deployment, not just a rails app!

Nginx is a web-proxy server. It is a lightweight HTTP web server which receives requests over HTTP and passes them on to other applications. For this job I find it far preferable to a heavier web server like Apache! Moreover, nginx is very easily configurable and can support multiple domain names very easily. It has a built-in load balancer which can distribute requests across apps based on its internal load-balancing mechanism.

Thin is the next-generation lean, mean rails server. It’s much faster and lighter on memory than mongrel. It has an internal event-based mechanism for request processing and handles high concurrency better than other rails servers.

Linode is a VPS (Virtual Private Server) hosting provider. As the name suggests ;), it’s a “Linux Node”. We are using Ubuntu 8.10 (Tip: to find the Ubuntu release, issue the command: lsb_release -a). NOTE: The linode we had was a raw machine with no packages installed. Please read Linode RoR package installation for details.


Capistrano Configuration

Follow the steps provided by Capistrano for basic instructions: Capistrano – From The Beginning. Some modifications that you may need (as I needed for deployment):

  • Edit Capfile and add the following to it. This ensures that remote capistrano deployment does not fork a remote shell using the command “sh -c”. Some hosting servers do not allow remote shells.

    default_run_options[:shell] = false

  • In addition to the changes mentioned in the Capistrano tutorial, add the following to config/deploy.rb. This ensures that “sudo” is not used (the default for Capistrano) and the user is “root”. Not usually a good practice.. but what the hell!

    set :use_sudo, false
    set :user, "root"

  • Since capistrano uses the default script/spin and script/process/reaper, we need to override deploy:start, deploy:stop and deploy:restart to ensure that we can start/stop the thin service and the ferret_server. I know that there is copy-paste involved in deploy:restart, but I am still trying to find out how to invoke one task from another task.

namespace :deploy do
    desc "Custom AceMoney deployment: stop."
    task :stop, :roles => :app do
        invoke_command "cd #{current_path};./script/ferret_server -e production stop"
        invoke_command "service thin stop"
    end

    desc "Custom AceMoney deployment: start."
    task :start, :roles => :app do
        invoke_command "cd #{current_path};./script/ferret_server -e production start"
        invoke_command "service thin start"
    end

    # Need to define this restart ALSO as 'cap deploy' uses it
    # (Gautam) I dont know how to call tasks within tasks.
    desc "Custom AceMoney deployment: restart."
    task :restart, :roles => :app do
        invoke_command "cd #{current_path};./script/ferret_server -e production stop"
        invoke_command "service thin stop"
        invoke_command "cd #{current_path};./script/ferret_server -e production start"
        invoke_command "service thin start"
    end
end

Thin Configuration

I looked up most of the default configuration of Thin and Nginx on Ubuntu at Nginx + Thin. Some extra configuration or differences are mentioned below.

  • The init scripts for starting thin and nginx during startup are configured during package installation. Leave them as they are.
  • The following command generates /etc/thin/acemoney.yml for 3 servers starting from port 3000. Note that the -c option specifies the BASEDIR of the rails app. Do NOT change any settings in this file as far as possible.

    thin config -C /etc/thin/acemoney.yml -c /home/josh/current --servers 3 -e production -p 3000

  • Starting and stopping thin is as simple as

    service thin start
    service thin stop

  • This will read the acemoney.yml file and spawn the 3 thin processes. I noticed that each thin server took about 31MB of memory to start with and, with caching, went up to ~70MB. In contrast, a mongrel server (tested earlier) started with 31MB but exceeded 110MB later!

Nginx Configuration

Installation of nginx is simple on Ubuntu 😉

apt-get install nginx

Configure the base /etc/nginx/nginx.conf. The default settings are fine, but I added / edited a few more as recommended at Nginx Configuration

        worker_processes  4;

        gzip_comp_level 2;
        gzip_proxied any;
        gzip_types  text/plain text/html text/css application/x-javascript
                    text/xml application/xml application/xml+rss text/javascript;

According to the configuration above, nginx will spawn 4 worker processes and each worker can handle 1024 connections (the default setting). So, nginx can now hold ~4000 concurrent HTTP connections!!! See the performance article on thin at Thin Server
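The arithmetic behind that figure, as a tiny sketch. The max_connections helper is hypothetical, and 1024 is the worker_connections default quoted above rather than a value read from a live nginx.conf. Halving for proxied requests is a rough rule of thumb: when nginx proxies to thin, each request holds one connection to the client and one to the upstream.

```ruby
# Upper bound on the connections nginx can hold open at once
def max_connections(worker_processes, worker_connections)
  worker_processes * worker_connections
end

total = max_connections(4, 1024)
puts "connections nginx can hold open: #{total}"

# Each proxied request uses a client connection plus an upstream connection
puts "proxied requests in flight (rough bound): #{total / 2}"
```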

Configure the domain name and ensure that its “A record” entry points to this server! Check this by doing an nslookup or a ping for the domain. In /etc/nginx/sites-available create a file named after the domain to be hosted. In /etc/nginx/sites-enabled create a symbolic link to this file.

ln -s /etc/nginx/sites-available/<domain-name> /etc/nginx/sites-enabled/<domain-name>

Now add the following contents to the file you created in /etc/nginx/sites-available. This is the key configuration to hook up nginx with thin.

upstream thin {
    # Server list reconstructed (it was lost from the original), assuming the
    # 3 local thin servers configured above, starting from port 3000.
    server 127.0.0.1:3000;
    server 127.0.0.1:3001;
    server 127.0.0.1:3002;
}

server {
    listen 80;

    root /home/josh/current/public;

    location / {
      proxy_set_header X-Real-IP $remote_addr;
      proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
      proxy_set_header Host $http_host;
      proxy_redirect off;

      if (-f $request_filename/index.html) {
        rewrite (.*) $1/index.html break;
      }
      if (-f $request_filename.html) {
        rewrite (.*) $1.html break;
      }
      if (!-f $request_filename) {
        proxy_pass http://thin;
      }
    }

    error_page 500 502 503 504 /50x.html;
    location = /50x.html {
      root html;
    }
}

To analyze this configuration, here are some details:

The following lines tell nginx to listen on port 80 for HTTP requests to our domain. The ‘root’ is the public directory of our rails app deployed at /home/josh/current!

server {
    listen 80;

    root /home/josh/current/public;

Now, nginx will try to process all HTTP requests and respond itself. For static HTML files, it will automatically serve the data from the ‘root’. If it cannot find the HTML file, it will ‘proxy_pass’ the request to thin. “thin” in the code below refers to an ‘upstream’ directive that tells nginx where to forward a request it cannot serve directly.

if (!-f $request_filename) {
        proxy_pass http://thin;

The upstream code is where load-balancing plays a role in nginx. The following code tells nginx which all processes are running on which different ports and it forwards requests to any of the servers based on its internal load balancing algorithm. The servers can be on different machines (i.e. different IP addresses) if needed. In AceMoney, we have started 3 thin servers on 3 different ports!

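upstream thin {
    server 127.0.0.1:3000;   # reconstructed: the server list was lost; local thin ports 3000-3002 assumed
    server 127.0.0.1:3001;
    server 127.0.0.1:3002;
}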
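When no other balancing method is configured, nginx hands requests to the upstream servers in round-robin order. The sketch below only illustrates that idea in Ruby; it is not how nginx is implemented, the RoundRobin class is hypothetical, and the backend addresses are assumed from the 3 thin servers started on ports 3000 onwards.

```ruby
# Toy round-robin picker: hands out backends in order, wrapping around forever
class RoundRobin
  def initialize(backends)
    @cycle = backends.cycle  # endless enumerator over the list
  end

  def next_backend
    @cycle.next
  end
end

# Assumed addresses of the 3 local thin servers
THIN_BACKENDS = %w[127.0.0.1:3000 127.0.0.1:3001 127.0.0.1:3002]

balancer = RoundRobin.new(THIN_BACKENDS)
5.times { |i| puts "request #{i + 1} -> #{balancer.next_backend}" }
```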

Performance Statistics

Nothing is complete without them. Here is what I found out for 3 thin servers and 1 ferret_server.

top - 14:06:10 up 7 days, 22:58,  2 users,  load average: 0.00, 0.00, 0.00
Tasks:  84 total,   1 running,  83 sleeping,   0 stopped,   0 zombie
Cpu0  :  0.0%us,  0.0%sy,  0.0%ni,100.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu1  :  0.0%us,  0.0%sy,  0.0%ni,100.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu2  :  0.0%us,  0.0%sy,  0.0%ni,100.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu3  :  0.0%us,  0.0%sy,  0.0%ni,100.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:    553176k total,   530868k used,    22308k free,    16196k buffers
Swap:   524280k total,     2520k used,   521760k free,    87280k cached

12424 mysql     18   0  127m  42m 5520 S    0  7.9   0:23.01 mysqld
18338 root      15   0 77572  70m 4392 S    0 13.1   0:06.79 thin
18348 root      15   0 71176  64m 4388 S    0 11.9   0:06.51 thin
18343 root      15   0 68964  62m 4384 S    0 11.5   0:07.20 thin
18375 root      18   0 70912  54m 2660 S    0 10.0   2:34.24 ruby
 8141 www-data  15   0  5176 1736  820 S    0  0.3   0:00.07 nginx
 8142 www-data  15   0  5176 1724  816 S    0  0.3   0:00.01 nginx
 8144 www-data  15   0  5152 1720  816 S    0  0.3   0:00.06 nginx
 8143 www-data  15   0  5156 1656  784 S    0  0.3   0:00.00 nginx

As can be seen:

  • Each thin server takes around 62-70MB
  • The MySQL server takes 42MB
  • The ruby process (18375 above) is the ferret_server, which takes 54MB
  • The 4 nginx worker processes take about 1.7MB each
 Overall (3 thin server cluster + MySQL + ferret): ~300MB
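The overall figure can be checked by adding up the resident memory (RES) column from the top output above. The constant names are mine; the numbers are copied from the listing, reading "m" as MB and the plain nginx values as KB.

```ruby
# Resident memory in MB for the big processes, from the RES column
RESIDENT_MB = {
  'thin 18338'      => 70,
  'thin 18348'      => 64,
  'thin 18343'      => 62,
  'mysqld'          => 42,
  'ruby (ferret)'   => 54
}

# The 4 nginx workers report RES in KB
NGINX_RES_KB = [1736, 1724, 1720, 1656]

TOTAL_MB = RESIDENT_MB.values.inject(:+) + NGINX_RES_KB.inject(:+) / 1024.0

puts "total resident memory: about #{TOTAL_MB.round}MB"
```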

Moving from backgrounDrb to DelayedJob

November 4, 2009

In my earlier post regarding DelayedJob setup, I mentioned how to set up DelayedJob and create new tasks in your existing code. This post provides details of how to move from backgrounDrb to delayed_job. First of all, it’s important to know why:

– BackgrounDrb takes up a lot of memory. It spawns workers at start-up. You can use MiddleMan for dynamically spawning backgrounDrb tasks but, in any case, it slows things down a little. I have 4 workers and overall the parent process consumes over 120MB, which is huge considering my linode.

– Monitoring these jobs is a little bit of a pain. Moreover, running in development / production mode requires a Drb server which adds more memory overhead.

As we speak, github has introduced Resque, which is inspired by DelayedJob, but I plan to continue with DelayedJob for now because I don’t use Redis (yet). The blog post for Resque has a LOT of details about issues with current background job schedulers. Worth reading!

OK – so you’re now convinced we should use DelayedJob instead of backgrounDrb but have a lot of tasks already configured. These are the steps to follow:

1. Convert your backgrounDrb workers to standard ruby classes:

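# Before: the backgrounDrb worker
class BulkUploadWorker < BackgrounDRb::MetaWorker
  set_worker_name :bulk_upload_worker

  def upload(args)
    # ...
  end
end

# After: a plain Ruby class with a perform method
class BulkUploadWorker

  def perform
    # ...
  end
end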

2. If you earlier used arguments to be passed to the backgrounDrb jobs, you need to tweak the code a little.

Suppose I have an upload method which takes ‘arg’ as the Hash parameter, it would be invoked in the controller for backgrounDrb like this:

MiddleMan.worker(:bulk_upload_worker).async_upload(:arg => {
  'correction' => correction, 'file' => in_file,
  'user' => current_user.id })  # the user expression was lost from the original; current_user.id is assumed

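The change for DelayedJob is simple:

# Arguments partly reconstructed (they were garbled in the original); current_user.id is assumed
Delayed::Job.enqueue BulkUploadTask.new(correction, infile, current_user.id)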

And change the worker to have a perform method (which is the one which gets called on the job):

BulkUploadTask = Struct.new(:correction, :infile, :user_id) do
   def perform
     file = File.open(infile)  # assumed: the original expression was lost
     user = User.find(user_id)
     # ...
   end
end

If you look closely at the code above, even for an experienced Ruby coder, it’s no piece of cake. Now, I tried the original approach that was on github:

class BulkUploadWorker < Struct.new(:correction, :infile, :user)

but this gives me a type mismatch error. After some searching on the net, I found the answer, quite understandably from one of the Ruby greats, JEG2. James clearly explains how Struct.new returns a Class and accepts a block of code for all the methods. Note the use of the Symbol :infile in the declaration but the data member infile in the perform method.
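A small self-contained sketch of that behaviour, assuming the construct in question is Ruby's Struct.new. GreetingTask and ReloadedTask are made-up names, and the second half shows one likely (not confirmed) cause of the mismatch error: every call to Struct.new returns a different class, so evaluating the same "class X < Struct.new(...)" line twice, as happens when code is reloaded, changes the superclass.

```ruby
# Struct.new builds and returns a new Class; the block becomes the class body,
# so methods defined in it can read the members directly.
GreetingTask = Struct.new(:name, :times) do
  def perform
    Array.new(times) { "Hello, #{name}!" }
  end
end

task = GreetingTask.new('world', 2)
puts task.perform

# Two calls with identical members still give two distinct classes
puts Struct.new(:name).equal?(Struct.new(:name))

# So re-evaluating a subclass definition raises a TypeError
class ReloadedTask < Struct.new(:name); end
begin
  class ReloadedTask < Struct.new(:name); end
rescue TypeError => e
  puts "#{e.class}: #{e.message}"
end
```

Assigning the result of Struct.new to a constant, as in the first half, avoids the problem because no superclass comparison is involved.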

Since my file was in lib/workers/bulk_upload_worker.rb, we need to explicitly require this file for DelayedJob. Add this in config/initializers/delayed_job.rb. Now, before I can get down to brass tacks and incorporate it, I really need to know if this works. First ensure that the task works, directly from the console:

RAILS_ENV=production ./script/console
>> task = BulkUploadTask.new(correction, 'test_file', 3)   # first argument was lost from the original; correction is assumed
>> task.perform

Once the task performs as expected, start up the delayed_job server and test it out from your web app. If there are errors or exceptions, delayed_job stores them in the database. So, it’s a good idea for command-line users with access to the server to keep an eye out for errors and exceptions in the database.
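To make the idea concrete, here is a toy in-memory stand-in for what a job runner does with the perform method and with failures. It is not delayed_job's code: ToyQueue, ToyJob, run_all and the two task classes are hypothetical. The only thing borrowed from delayed_job is the idea of keeping an attempts count and a last_error on each failed job.

```ruby
# One queued entry: the payload object plus bookkeeping fields
ToyJob = Struct.new(:payload, :attempts, :last_error)

class ToyQueue
  attr_reader :jobs

  def initialize
    @jobs = []
  end

  def enqueue(payload)
    @jobs << ToyJob.new(payload, 0, nil)
  end

  # Runs every queued job once. Successful jobs are dropped; failed ones
  # stay in the queue with the error recorded.
  def run_all
    @jobs = @jobs.reject do |job|
      job.attempts += 1
      begin
        job.payload.perform
        true
      rescue StandardError => e
        job.last_error = "#{e.class}: #{e.message}"
        false
      end
    end
  end
end

GoodTask = Struct.new(:file) do
  def perform
    "processed #{file}"
  end
end

BadTask = Struct.new(:file) do
  def perform
    raise ArgumentError, "cannot read #{file}"
  end
end

queue = ToyQueue.new
queue.enqueue(GoodTask.new('ok.csv'))
queue.enqueue(BadTask.new('missing.csv'))
queue.run_all
queue.jobs.each { |job| puts "#{job.attempts} attempt(s): #{job.last_error}" }
```

Whatever remains in the queue after a run is what you would be looking for in the database on the real server.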