Archive

Posts Tagged ‘background jobs’

Moving from backgrounDrb to DelayedJob

November 4, 2009 5 comments

In my earlier post on DelayedJob setup, I described how to set up DelayedJob and create new tasks in your existing code. This post covers how to move from backgrounDrb to delayed_job. First of all, it's important to know why:

– backgrounDrb takes up a lot of memory. It spawns its workers at start-up. You can use MiddleMan to spawn backgrounDrb tasks dynamically, but in any case it slows things down a little. I have 4 workers and overall the parent process consumes over 120MB, which is huge considering my Linode.

– Monitoring these jobs is a bit of a pain. Moreover, running in development / production mode requires a DRb server, which adds more memory overhead.

As we speak, GitHub has introduced Resque, which is inspired by DelayedJob, but I plan to continue with DelayedJob for now because I don't use Redis (yet). The blog post for Resque has a LOT of details about issues with current background job schedulers. Worth reading!

OK – so you’re now convinced we should use DelayedJob instead of backgrounDrb but have a lot of tasks already configured. These are the steps to follow:

1. Convert your backgrounDrb workers to standard ruby classes:

class BulkUploadWorker < BackgrounDRb::MetaWorker
  set_worker_name :bulk_upload_worker

  def upload(args)
  end
end

to

class BulkUploadWorker

  def perform
  end
end

2. If you previously passed arguments to the backgrounDrb jobs, you need to tweak the code a little.

Suppose I have an upload method that takes ‘arg’ as a Hash parameter; it would be invoked in the controller for backgrounDrb like this:

MiddleMan.worker(:bulk_upload_worker).async_upload(:arg => {
  'correction' => correction, 'file' => in_file, 'user' => current_user.id} )

The change to DelayedJob is simple:

Delayed::Job.enqueue BulkUploadWorker.new(correction, in_file, current_user.id)

And change the worker to have a perform method (which is the one that gets called on the job):

BulkUploadWorker = Struct.new(:correction, :infile, :user_id) do
  def perform
    file = File.open(infile)
    user = User.find(user_id)
    ...
  end
end

If you look closely at the code above, it's no piece of cake, even for an experienced Ruby coder. I first tried the original approach that was on GitHub:

class BulkUploadWorker < Struct.new (:correction, :infile, :user)

but this gave me a type mismatch error. After some searching on the net, I found the answer, quite understandably from one of the Ruby greats, JEG2. James clearly explains how Struct.new returns a Class and accepts a block of code for all the methods. Note the use of the Symbol :infile in the declaration but the data member infile in the perform method.
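To make the Struct.new behaviour concrete, here is a small plain-Ruby sketch that runs without Rails or delayed_job. GreetingJob and its fields are names I made up for illustration. The last part shows one plausible reason the subclassing form can fail when a file is loaded twice; that is my assumption, since I did not record the exact error text.

```ruby
# Struct.new returns a brand-new Class; the block defines its instance methods.
# GreetingJob and its fields are hypothetical names used only for illustration.
GreetingJob = Struct.new(:name, :greeting) do
  def perform
    # Inside the block, each declared symbol is available as a reader method.
    "#{greeting}, #{name}!"
  end
end

job = GreetingJob.new('world', 'Hello')
puts GreetingJob.class   # the struct itself is a Class
puts job.perform

# Every call to Struct.new builds a distinct anonymous class, so a
# `class Foo < Struct.new(...)` line that is evaluated a second time
# names a different superclass than it did the first time.
first  = Struct.new(:name)
second = Struct.new(:name)
puts first.equal?(second)
```

Assigning the result of Struct.new to a constant, as in the worker above, avoids naming a superclass at all, which is why the block form is the one I ended up with.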

Since my file was in lib/workers/bulk_upload_worker.rb, we need to explicitly require this file for DelayedJob. Add this in config/initializers/delayed_job.rb. Now, before I get down to brass tacks and incorporate it, I really need to know whether it works. First ensure that the task works directly from the console:

RAILS_ENV=production ./script/console
>> task = BulkUploadWorker.new(false, 'test_file', 3)
>> task.perform

Once the task performs as expected, start up the delayed_job server and test it from your web app. If there are errors or exceptions, delayed_job stores them in the database. So it's a good idea for command-line users with access to the server to keep an eye on the database for errors and exceptions.
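The contract is easier to see in isolation. The sketch below is plain Ruby and does not use delayed_job at all: TinyQueue, GoodJob and BadJob are hypothetical names, and the queue lives in memory rather than in a database table. It only mimics the idea that anything responding to perform can be enqueued, and that the error text is kept when a job raises.

```ruby
# TinyQueue is a hypothetical in-memory stand-in, NOT delayed_job itself.
# It mimics the contract only: enqueue anything that responds to #perform,
# run it later, and keep the error text when a job raises.
class TinyQueue
  Entry = Struct.new(:payload, :attempts, :last_error)

  def initialize
    @entries = []
  end

  def enqueue(payload)
    unless payload.respond_to?(:perform)
      raise ArgumentError, 'job must respond to #perform'
    end
    @entries << Entry.new(payload, 0, nil)
    payload
  end

  # Run every job once and return the entries that failed.
  def work_off
    @entries.each do |entry|
      begin
        entry.attempts += 1
        entry.payload.perform
        entry.last_error = nil
      rescue StandardError => e
        entry.last_error = "#{e.class}: #{e.message}"
      end
    end
    @entries.select { |entry| entry.last_error }
  end
end

GoodJob = Struct.new(:value) do
  def perform
    value * 2
  end
end

BadJob = Struct.new(:path) do
  def perform
    raise IOError, "cannot read #{path}"
  end
end

queue = TinyQueue.new
queue.enqueue(GoodJob.new(21))
queue.enqueue(BadJob.new('missing.csv'))

failed = queue.work_off
failed.each { |entry| puts entry.last_error }
```

With the real gem the failed entries would be rows in the jobs table rather than objects in an array, so checking them means querying the database instead of calling work_off.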

Enjoy!

Delayed_job for background processing in Rails

October 29, 2009 23 comments

The first thing to do, obviously, is to install DelayedJob. There are plenty of forked versions available on GitHub. I chose collectiveidea because it was recommended on Railscasts. I did refer to this site extensively for setting up delayed_job.

$ sudo gem install collectiveidea-delayed_job

Job half done. I followed the instructions on the GitHub page and ensured that my environment was set up properly. I added the following to environment.rb:

 config.gem 'collectiveidea-delayed_job', :lib => 'delayed_job',
                     :source => 'http://gems.github.com'

and the following to the Rakefile:

begin
  require 'delayed/tasks'
rescue LoadError
  STDERR.puts "Run `rake gems:install` to install delayed_job"
end

Then issue the commands:

./script/generate delayed_job
rake db:migrate

To start the delayed_job server in production mode, issue:

$ RAILS_ENV=production ./script/delayed_job start

To start it in development mode, issue:

$ rake jobs:work

Interesting find – obvious, but it cost me a lot of time: nothing stops one from running BOTH the above commands. In fact, in production mode I ran delayed_job as a daemon AND also using rake. Silly me – I forgot that if I change any code I need to restart both. I wrongly assumed that jobs:work only ‘showed’ the console; it actually starts the delayed_job server. So, if you do it this way, you will have twice the number of background jobs floating around 😉

Well, anyway, once I had this properly configured as a daemon, I set about changing code. This was the really awesome part of DelayedJob: no dependencies!! I changed the code from:

user = User.find_by_name('xyz')
user.get_daily_call_statistics

to

user = User.find_by_name('xyz')
user.send_later(:get_daily_call_statistics)

AND IT WORKS!! Awesome. While digging around a little more, I realized that there is not enough scope for debugging in the default setup. I looked up GitHub and Google for some help and found this useful tip:

1. Create a config/initializers/delayed_job_config.rb and add:

Delayed::Job.destroy_failed_jobs = false
Delayed::Worker.logger = Rails.logger

This logs all delayed job output to the environment log files and ensures that failed jobs are not destroyed. There are other settings to reduce the failure attempts and the time for the delayed job but I was too excited to try them out immediately.

2. Suppose I have some global variables in the helpers; they are not accessible in the model methods called via delayed_job. Maybe this is a bug in delayed_job. I do plan to dig deeper and figure this one out either way. I had to break my head trying to figure this one out.
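For intuition about what send_later does, here is a rough plain-Ruby sketch that runs without Rails or delayed_job. DeferredCall, SendLaterSketch, send_later_sketch and Report are all hypothetical names, and the real gem stores the call in the database and runs it in a separate worker process rather than keeping it in an array. As far as I understand, that separation may also be why state that only exists in the web process (such as the helper globals in point 2) is not visible to the job, but treat that as a guess.

```ruby
# DeferredCall is a hypothetical stand-in for the wrapper a send_later-style
# call builds: it remembers the receiver, the method name and the arguments.
DeferredCall = Struct.new(:receiver, :method_name, :args) do
  def perform
    receiver.send(method_name, *args)
  end
end

# SendLaterSketch is a made-up mixin; the real gem persists the call to the
# database instead of pushing it onto an in-memory array.
module SendLaterSketch
  PENDING = []

  def send_later_sketch(method_name, *args)
    PENDING << DeferredCall.new(self, method_name, args)
    nil
  end
end

# Report is a hypothetical class standing in for the User model.
class Report
  include SendLaterSketch
  attr_reader :runs

  def initialize
    @runs = 0
  end

  def get_daily_call_statistics
    @runs += 1
  end
end

report = Report.new
report.send_later_sketch(:get_daily_call_statistics)
runs_before = report.runs                      # nothing has run yet
SendLaterSketch::PENDING.each { |call| call.perform }
runs_after = report.runs                       # the deferred call has now run
puts "before: #{runs_before}, after: #{runs_after}"
```

The request only pays for recording the call, which is the effect visible in the timings that follow.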

To conclude, what I had earlier was:

Processing DashboardController#explicit_refresh_daily_statistics (for 121.247.65.47 at 2009-10-29 13:16:33) [GET]
Parameters: {"action"=>"explicit_refresh_daily_statistics", "controller"=>"dashboard"}
last seen ...........
Redirected to http://acemoney.in/dashboard
Completed in 74414ms (DB: 16912) | 302 Found [http://acemoney.in/dashboard/explicit_refresh_daily_statistics]

Now after adding the send_later, I have:

Processing DashboardController#explicit_refresh_daily_statistics (for 121.247.65.47 at 2009-10-29 15:16:41) [GET]
Parameters: {"action"=>"explicit_refresh_daily_statistics", "controller"=>"dashboard"}
last seen ...........
Redirected to http://acemoney.in/dashboard
Completed in 420ms (DB: 99) | 302 Found [http://acemoney.in/dashboard/explicit_refresh_daily_statistics]

This means my response time fell from about 74 seconds to about 0.4 seconds.

Now, I already had backgrounDrb tasks configured earlier and want to migrate them ‘somehow’ to DelayedJob with minimal code. Stay tuned, this post will be updated.