
Posts Tagged ‘backgrounDrb’

Migrating an existing deployment to Heroku

November 20, 2009

It was fun — pure unadulterated fun!!

Now, I started looking at Heroku because our current deployment on a Linode was getting a little bulky. Our client has been complaining of ‘suddenly things slowing down’ and ‘site not working’. The site gets about 36 requests per minute and we need to keep our hosting costs down, so what is the solution???

Currently, it's hosted on a Linode (740 MB of memory) running nginx and a cluster of 3 Thin servers. Monit restarts ANY Thin server that goes above 140 MB. BackgrounDrb takes a lot of memory too (an unavoidable 160 MB), and we have ferret to make things more complicated!! :)

Enter Heroku. The first thing that got me hooked was that a ‘git push heroku’ DEPLOYS your application! Fantastic. I have been a Capistrano freak till now and loved it, but it's all gone now! (I wonder if that's good or bad, though.) LoL :)

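So, to get started, you sign up on Heroku and follow the instructions on your My Apps page. Create the git repository first, so that heroku create can add the heroku remote to it:

$ git init && git add . && git commit -m "initial check-in"
$ heroku create
$ git push heroku master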

I used the Blossom account (5 MB of space) and was pleasantly surprised to see it work ‘out of the box’. It created a weirdly named subdomain for me, which I changed using:

$ heroku rename <new-name>

Digging deeper, I wanted to deploy the application ‘as is’ but that was not to be:

Setup Woes

Heroku uses PostgreSQL and we had used MySQL, so you need to ensure that no MySQL-specific data types are being used. As Murphy’s law goes, we had used blob and had to change that to bytea (PostgreSQL's byte array type). The interesting thing I found was that there is NO way to drop the database and re-create it. The answer lies in:

$ heroku rake db:schema:load
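Before pushing, it can help to scan a MySQL dump for column types PostgreSQL will not accept. Here is a minimal sketch, assuming you have a mysqldump-style SQL file to feed it (not db/schema.rb, which uses Rails' abstract types); the MYSQL_TO_POSTGRES table and find_mysql_only_types helper are hypothetical and cover only a few common types.

```ruby
# Hypothetical helper (not part of Rails or Heroku): flag MySQL-specific
# column types in a mysqldump-style SQL dump and suggest PostgreSQL types.
# The mapping is deliberately small; extend it for your own schema.
MYSQL_TO_POSTGRES = {
  'blob'       => 'bytea',
  'longblob'   => 'bytea',
  'tinyint(1)' => 'boolean',
  'datetime'   => 'timestamp',
  'longtext'   => 'text',
  'double'     => 'double precision'
}.freeze

# Returns [mysql_type, postgres_type] pairs, in table order, for every
# MySQL-only type that appears as a whole word in sql (case-insensitive).
def find_mysql_only_types(sql)
  MYSQL_TO_POSTGRES.select do |mysql_type, _pg_type|
    sql =~ /\b#{Regexp.escape(mysql_type)}(?!\w)/i
  end.to_a
end

dump = "CREATE TABLE `attachments` (`id` int(11), `data` blob, `active` tinyint(1));"
find_mysql_only_types(dump).each do |mysql_type, pg_type|
  puts "#{mysql_type} -> #{pg_type}"
end
# blob -> bytea
# tinyint(1) -> boolean
```

In Rails migrations, the abstract :binary column type maps to blob on MySQL and bytea on PostgreSQL, so declaring such columns as :binary avoids the problem at the source.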

The production environment is assumed. I was not sure which environment I was in until, after some reading up, I issued this command and saw that I was in production mode:

$ heroku config
RACK_ENV => production

I was also curious to see how to get at the logs and the console, and it really ROCKS:

$ heroku logs
$ heroku console

Alas, I could not find a way to access the dbconsole (as I sometimes do with script/dbconsole -p). Well, that is the way it is!

Ferret Woes

Heroku does not support ferret out of the box; it uses Solr and Sunspot. I had to install the additional gems myself using the gem manifest, i.e. the .gems file located at RAILS_ROOT:

ruport
acts_as_ferret
prawn

and then commit and push it:

$ git add .gems
$ git commit -m "Add gem manifest"
$ git push heroku

Since acts_as_ferret is installed, it loads the Rails models and ‘expects’ the database tables to be in place. So, I had to comment out ALL the references to acts_as_ferret to be able to run:

$ heroku rake db:migrate

So far so good. Now, ferret needs to write its index files, and Heroku mounts the deployment on a read-only filesystem, so we need to tweak the configuration to ensure that the index is created in tmp/ (the area where we can write stuff). Add the following lines to config/environments/production.rb:

require 'acts_as_ferret'
ActsAsFerret.index_dir = "#{RAILS_ROOT}/tmp/index"

Now, by default ferret's production environment starts a DRb server on port 9010, and Heroku will not allow this! So, we have to comment out all the configuration in config/ferret_server.yml so that ferret loads the way it does in development, without the DRb server. I hate this, but there is no alternative for now. I do plan to migrate to WebSolr, which Heroku is offering, but it's $20 per month! Let's see how that works out.

This should at least get you started on ferret – phew!

BackgrounDrb woes

BackgrounDrb comes with its own baggage, I must say. Heroku recommends DelayedJob (yippee) and I was already planning exactly this move, so rather than even try to figure out a way to get backgrounDrb to work on Heroku, I simply commented out the config/backgroundrb.yml settings (to ensure it does not get initialized).

I also had to comment out all the MiddleMan references in the controllers. The backgrounDrb workers are easily modified to work as DelayedJob objects ;). I have described this in my earlier post, Moving from BackgrounDrb to DelayedJob.

Once this was done, I did not face any problems, and after some quick code fixes, git commits and git pushes, I had my Heroku application up and running.

All in a day's work! That's really amazing. Now, the next step is to get WebSolr and DelayedJob integrated and see if I can move on to a bigger storage setup. A couple of months will tell me whether this move was right.

Moving from backgrounDrb to DelayedJob

November 4, 2009

In my earlier post on DelayedJob setup, I described how to set up DelayedJob and create new tasks in your existing code. This post provides details of how to move from backgrounDrb to delayed_job. First of all, it's important to know why.

- BackgrounDrb takes up a lot of memory. It spawns its workers at start-up. You can use MiddleMan to spawn backgrounDrb tasks dynamically, but either way it slows things down a little. I have 4 workers and overall the parent process consumes over 120 MB, which is huge considering my Linode.

- Monitoring these jobs is a little bit of a pain. Moreover, running in development / production mode requires a Drb server which adds more memory overhead.

As we speak, GitHub has introduced Resque, which is inspired by DelayedJob, but for now I plan to continue with DelayedJob because I don't use Redis (yet). The blog post announcing Resque has a LOT of detail about the issues with current background job schedulers. Worth reading!

OK, so you’re now convinced you should use DelayedJob instead of backgrounDrb, but you have a lot of tasks already configured. These are the steps to follow:

1. Convert your backgrounDrb workers to standard Ruby classes:

class BulkUploadWorker < BackgrounDRb::MetaWorker
  set_worker_name :bulk_upload_worker

  def upload(args)
    # ... work using args ...
  end
end

to

class BulkUploadWorker
  # delayed_job calls #perform on the enqueued object
  def perform
  end
end

2. If you previously passed arguments to your backgrounDrb jobs, you need to tweak the code a little.

Suppose I have an upload method that takes ‘arg’ as a Hash parameter. With backgrounDrb, it would be invoked in the controller like this:

MiddleMan.worker(:bulk_upload_worker).async_upload(:arg => {
  'correction' => correction, 'file' => in_file, 'user' => current_user.id} )

The change for DelayedJob is simple:

Delayed::Job.enqueue BulkUploadWorker.new(correction, in_file, current_user.id)

And change the worker to have a perform method (the method delayed_job calls when it runs the job):

BulkUploadWorker = Struct.new(:correction, :infile, :user_id) do
  def perform
    user = User.find(user_id)
    File.open(infile) do |file|
      ...
    end
  end
end

If you look closely at the code above, even for an experienced Ruby coder it's no piece of cake. Now, I first tried the original approach that was on GitHub:

class BulkUploadWorker < Struct.new (:correction, :infile, :user)

but this gave me a type mismatch error. After some searching on the net, I found the answer, quite understandably, from one of the Ruby greats, JEG2 (James Edward Gray II). James clearly explains that Struct.new returns a Class and accepts a block in which you define all the methods. Note the use of the Symbol :infile in the declaration but the accessor method infile in the perform method.
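The post doesn't show the exact error message, so this is an assumption: a common cause of a TypeError with `class X < Struct.new(...)` is the file being loaded twice (say, once by Rails and once by an explicit require). Every Struct.new call creates a brand-new anonymous class, so the second load reports a superclass mismatch. The constant-assignment form does not have this problem. A small, self-contained demonstration with hypothetical class names:

```ruby
# Each call to Struct.new builds a brand-new anonymous class.
raise 'expected distinct classes' if Struct.new(:infile) == Struct.new(:infile)

# Form 1: subclassing an anonymous Struct. If this definition is evaluated a
# second time (e.g. the file is loaded twice), the superclass differs and Ruby
# raises TypeError.
class SubclassedJob < Struct.new(:infile); end

MISMATCH_MESSAGE =
  begin
    class SubclassedJob < Struct.new(:infile); end # simulates a second load
    nil
  rescue TypeError => e
    e.message
  end

# Form 2: assign the Struct to a constant and define methods in the block.
# The Symbol :infile in the declaration becomes the reader method infile
# used inside perform.
BlockJob = Struct.new(:infile) do
  def perform
    "would process #{infile}"
  end
end

puts MISMATCH_MESSAGE              # superclass mismatch for class SubclassedJob
puts BlockJob.new('a.csv').perform # would process a.csv
```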

Since my file was in lib/workers/bulk_upload_worker.rb, it has to be required explicitly for DelayedJob. Add the require to config/initializers/delayed_job.rb. Now, before I can get down to brass tacks and wire it in, I really need to know if this works. First, ensure that the task works directly from the console:

$ RAILS_ENV=production ./script/console
>> task = BulkUploadWorker.new(false, 'test_file', 3)
>> task.perform
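For the explicit require in config/initializers/delayed_job.rb, one option is to load every file under lib/workers, so new workers are picked up automatically. This is a sketch, assuming the lib/workers layout from this post. require_workers is a hypothetical helper, and the RAILS_ROOT guard exists only so the snippet can also run outside Rails.

```ruby
# config/initializers/delayed_job.rb (sketch)
# Load every worker in lib/workers so the job classes are defined
# whenever delayed_job needs them.
def require_workers(root)
  Dir.glob(File.join(root, 'lib', 'workers', '*.rb')).sort.each do |path|
    require path
  end
end

require_workers(RAILS_ROOT) if defined?(RAILS_ROOT)
```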

Once the task performs as expected, start up the delayed_job worker and test it from your web app. If a job raises an error or exception, delayed_job stores it in the database (in the last_error column of the delayed_jobs table). So, if you have command-line access to the server, it's a good idea to keep an eye on that table for errors and exceptions.
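To make the enqueue/perform contract and the failed-job bookkeeping concrete, here is a deliberately simplified, in-memory model. It is not delayed_job's code: the real Delayed::Job persists jobs in the delayed_jobs table and retries them later. TinyQueue and its methods are hypothetical names used only for illustration.

```ruby
# A toy, in-memory model of a delayed_job-style queue (illustration only).
class TinyQueue
  Entry = Struct.new(:payload, :attempts, :last_error)

  def initialize
    @entries = []
  end

  # Like Delayed::Job.enqueue: accept any object that responds to #perform.
  def enqueue(job)
    unless job.respond_to?(:perform)
      raise ArgumentError, 'job must respond to #perform'
    end
    @entries << Entry.new(job, 0, nil)
  end

  # Runs each queued job once. Successful jobs are removed; failed jobs stay
  # queued with the attempt count and error recorded. Returns [succeeded, failed].
  def work_off
    succeeded = 0
    @entries = @entries.reject do |entry|
      entry.attempts += 1
      begin
        entry.payload.perform
        succeeded += 1
        true  # done: drop it from the queue
      rescue StandardError => e
        entry.last_error = "#{e.class}: #{e.message}"
        false # keep it, with the error recorded
      end
    end
    [succeeded, @entries.size]
  end

  # The jobs you would go looking for in the real delayed_jobs table.
  def failed
    @entries.select(&:last_error)
  end
end

queue = TinyQueue.new
queue.enqueue(Struct.new(:n) { def perform; n * 2; end }.new(21))
queue.enqueue(Struct.new(:path) { def perform; File.read(path); end }.new('/no/such/file'))
p queue.work_off                   # [1, 1]
puts queue.failed.first.last_error # an Errno::ENOENT message
```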

Enjoy!

Delayed_job for background processing in Rails

October 29, 2009

The first thing to do, obviously, is to install DelayedJob. There are plenty of forked versions available on GitHub. I chose collectiveidea's because it was recommended on Railscasts. I did refer to this site extensively for setting up delayed_job.

$ sudo gem install collectiveidea-delayed_job

Job half done. I followed the instructions on the GitHub page to make sure my environment was set up properly. I added the following to config/environment.rb:

 config.gem 'collectiveidea-delayed_job', :lib => 'delayed_job',
                     :source => 'http://gems.github.com'

and the following to the Rakefile:

begin
      require 'delayed/tasks'
rescue LoadError
      STDERR.puts "Run `rake gems:install` to install delayed_job"
end

Then issue the commands:

$ ./script/generate delayed_job
$ rake db:migrate

To start the delayed_job server in production mode, issue:

$ RAILS_ENV=production ./script/delayed_job start

To start it in development mode, issue:

$ rake jobs:work

Interesting find (obvious, but it cost me a lot of time): nothing stops you from running BOTH of the above commands. In fact, in production mode I ran delayed_job as a daemon AND also via rake. Silly me: I forgot that if I changed any code I would need to restart both. I wrongly assumed that jobs:work only ‘showed’ the console, but it actually starts another delayed_job worker. So, if you do it this way, you will have twice the number of workers floating around ;)
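One way to avoid accidentally running a second set of workers is to check whether the daemon is already up before starting rake jobs:work. This is a minimal sketch, assuming script/delayed_job writes its pid to tmp/pids/delayed_job.pid (check where your version puts it). daemon_running? is a hypothetical helper, not part of delayed_job.

```ruby
# Returns true if the pid stored in pid_file belongs to a live process.
def daemon_running?(pid_file)
  return false unless File.exist?(pid_file)
  pid = File.read(pid_file).to_i
  return false if pid <= 0
  Process.kill(0, pid) # signal 0 only checks that the process exists
  true
rescue Errno::ESRCH
  false # stale pid file: no such process
rescue Errno::EPERM
  true  # the process exists but belongs to another user
end

# Assumed pid file location; adjust for your setup.
pid_file = File.join('tmp', 'pids', 'delayed_job.pid')
if daemon_running?(pid_file)
  puts "delayed_job daemon already running (#{pid_file}); not starting another worker"
end
```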

Well, anyway, once I had this properly configured as a daemon, I set about changing code. This was the really awesome part of DelayedJob: no dependencies!! I changed the code from:

user = User.find_by_name('xyz')
user.get_daily_call_statistics

to

user = User.find_by_name('xyz')
user.send_later(:get_daily_call_statistics)

AND IT WORKS!! Awesome. While digging around a little more, I realized that there is not much scope for debugging with the default setup. I looked on GitHub and Google for help and found this useful tip:

1. Create a config/initializers/delayed_job_config.rb and add:

Delayed::Job.destroy_failed_jobs = false
Delayed::Worker.logger = Rails.logger

This logs all delayed_job output to the environment's log file and ensures that failed jobs are not destroyed. There are other settings to reduce the number of failure attempts and the maximum run time for a job, but I was too excited to try them out just yet.

2. If you have global variables set in your helpers, they are not accessible in model methods called via delayed_job. Maybe it's a bug in delayed_job, though more likely it's because the delayed_job worker runs in a separate process from the web server, so nothing set during a request ever reaches it. I do plan to dig deeper and figure this one out either way; I had to break my head over it.
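Since the worker is a separate process, one workaround is to pass whatever the job needs as arguments when you enqueue it, instead of relying on globals set during the request. This is a sketch with hypothetical names ($report_date, DailyStatsJob); in the real app the job would be handed to Delayed::Job.enqueue.

```ruby
require 'date'

# Anti-pattern: a global set while handling a web request. The delayed_job
# worker is a different process, so this assignment never happens there.
$report_date = Date.new(2009, 10, 29)

# Instead, capture the value when the job is created; the job object carries
# it along when it is serialized into the queue.
DailyStatsJob = Struct.new(:user_id, :report_date) do
  def perform
    "statistics for user #{user_id} on #{report_date}"
  end
end

job = DailyStatsJob.new(42, $report_date)
puts job.perform # statistics for user 42 on 2009-10-29
```

With send_later, the equivalent is passing the value as an extra argument, for example user.send_later(:get_daily_call_statistics, date), assuming the model method is changed to accept it.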

To conclude, what I had earlier was:

Processing DashboardController#explicit_refresh_daily_statistics (for 121.247.65.47 at 2009-10-29 13:16:33) [GET]
Parameters: {"action"=>"explicit_refresh_daily_statistics", "controller"=>"dashboard"}
last seen ...........
Redirected to http://acemoney.in/dashboard
Completed in 74414ms (DB: 16912) | 302 Found [http://acemoney.in/dashboard/explicit_refresh_daily_statistics]

Now after adding the send_later, I have:

Processing DashboardController#explicit_refresh_daily_statistics (for 121.247.65.47 at 2009-10-29 15:16:41) [GET]
Parameters: {"action"=>"explicit_refresh_daily_statistics", "controller"=>"dashboard"}
last seen ...........
Redirected to http://acemoney.in/dashboard
Completed in 420ms (DB: 99) | 302 Found [http://acemoney.in/dashboard/explicit_refresh_daily_statistics]

This means my response time fell from about 74 seconds to about 0.4 seconds (74414 ms to 420 ms).
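The speed-up comes from the request no longer doing the work. send_later just records which method to call, on which object, and with which arguments, then returns, and a worker runs the call later. Below is a simplified model of that idea. It is an illustration, not delayed_job's code (which, as far as I know, wraps the call in a Delayed::PerformableMethod and stores it as YAML in the delayed_jobs table). LATER_QUEUE, PerformableCall, Later and Report are hypothetical.

```ruby
# Toy queue shared by the "web" side and the "worker" side.
LATER_QUEUE = []

# Records a method call so it can be performed later.
PerformableCall = Struct.new(:object, :method_name, :args) do
  def perform
    object.send(method_name, *args)
  end
end

module Later
  # Hypothetical analogue of send_later: enqueue instead of calling now.
  def later(method_name, *args)
    LATER_QUEUE << PerformableCall.new(self, method_name.to_sym, args)
    nil
  end
end

class Report
  include Later
  attr_reader :runs

  def initialize
    @runs = 0
  end

  def get_daily_call_statistics
    @runs += 1 # imagine ~74 seconds of queries here
    :done
  end
end

report = Report.new
report.later(:get_daily_call_statistics) # returns immediately
puts report.runs                          # 0: nothing has run yet
LATER_QUEUE.shift.perform while LATER_QUEUE.any? # what the worker does
puts report.runs                          # 1
```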

Now, I already had backgrounDrb tasks configured earlier and want to migrate them ‘somehow’ to DelayedJob with minimal code. Stay tuned, this post will be updated.
