Pune Rails Meetup #2
It was great to be a part of the Pune Rails Meetup which was held yesterday (19th December, 2009) at ThoughtWorks, Pune. It was an idea initiated by Anthony Hsiao of Sapna Solutions which has got the Pune Rails community up on their feet. Helping him organize was a pleasure!
It was great to see almost 35 people at this meet, probably more than we expected. It was also heartening to see a good mix in the crowd: professionals working in Rails, students working in Rails and students interested in Rails, not to forget entrepreneurs who were very helpful.
Proceedings began with Vincent and _______ (fill in the gaps please, I am really lousy with names) from ThinkDRY, who gave an excellent presentation on BlankApplication, a CMS++ that they are developing. I say CMS++ because it's not just another CMS but has quite a lot of ready-to-use features that get developers jump-started. There were interesting discussions about how ‘workspaces’ are managed and how this makes it easier to manage websites.
After this technical talk, I spoke next on my experience at the Lone Star Ruby Conference in Texas. I tried to keep the session interactive with the intention of telling everyone how important it is to know and use Ruby effectively while working in Rails. Dave Thomas’s references to the ‘glorious imperfection’ of Ruby did create quite a buzz. To quote a little from Dave’s talk:
name {}
This is a method which takes a block as a parameter, but the following line is a method which takes a hash as a parameter! A simple pair of parentheses around the curly braces makes all the difference!
name ( {} )
Similarly, the following line calls a method m(), divides its result by ‘n’ and then divides that result by ‘o’
m/n/o
but add a space before the first slash and it's a method m() which takes a regular expression as a parameter!
m /n/o
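Both pairs can be tried out in plain Ruby. The sketch below is only an illustration: the bodies of name, m, n and o are made up here (they are not from Dave's talk) so that each call reports how Ruby parsed it. Running it with warnings enabled makes Ruby complain about the ambiguous first argument on the last call, which is rather the point.

```ruby
# Hypothetical method that reports what it was called with.
def name(arg = nil, &block)
  return :block if block
  return :hash if arg.is_a?(Hash)
  :nothing
end

# Hypothetical m, n and o: m returns its argument, or 12 when called bare.
def m(arg = nil)
  arg.nil? ? 12 : arg
end

def n
  3
end

def o
  2
end

with_block = name {}      # the braces are parsed as an empty block
with_hash  = name({})     # the braces are parsed as an empty Hash

division = m/n/o          # (m() / n()) / o()
regexp   = m /n/o         # m(/n/o): a Regexp literal with the 'o' flag

puts with_block.inspect
puts with_hash.inspect
puts division.inspect
puts regexp.inspect
```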
It was nice to see everyone get involved in these interactive sessions. More details about my experience at LSRC are here.
After this there was another technical talk about a multi-app architecture that has been developed by Sapna Solutions. Anthony and Hari gave the talk and it was very interesting to see it work. By getting opensource applications like shopify, a CMS and other social networking apps to work with a shared plugin and a single database, it's possible to create a mammoth application which is easily customizable and scalable.
Hari did mention a few problems, like complexity in migrations and custom routes, which they currently ‘work around’ but would prefer to solve with a cleaner approach. Some good suggestions were provided by Scot from ThoughtWorks regarding databases. I suggested some metaprogramming to align models. With git submodules and rake scripts to sync up data, this indeed seems to have a lot of potential.
There were some new entrepreneurs from VectorBrook who have already developed a live application in Merb which they discussed and explained details of. It was good to hear about how they managed performance and scalability testing. The Q&A forum which was the next event was extremely interactive. Some of the discussions were:
Which are really great CMS in Rails?
There were some intense discussions regarding RadiantCMS, Adva and even BlankApp. The debate came down to a ‘programmable CMS’ vs WYSIWYG. Those who care more about content management prefer CMSs like Drupal and Joomla. Those who prefer more customization via programming and code prefer Radiant. This topic could not be closed and is still open for discussion. Do comment with your views. I am a Radiant fan.
What about testing? Cucumber, Rspec, others?
Usually it's still ad hoc: testing is expensive for smaller firms, so ad hoc black-box testing is what gets done. I opined that Cucumber and RSpec ROCK! Cucumber is great for scenario testing and testing controller logic and views. RSpec is great for direct model access, and Cucumber can make great use of Webrat for browser testing.
In RSpec, when do we use mocks and stubs?
It was suggested that mocks and stubs should be used when the model and code are not ready yet. If the code is ready, it's probably enough to test against it without mocks and stubs. Comments welcome on this!
How do you do stress testing?
Stress testing, concurrency testing and performance testing can be done using httperf. It was interesting to note that ____ have actually done their own implementation for stress and concurrency testing. I recommended they open source it.
How are events, scheduled jobs and delayed jobs handled?
This was my domain
Using delayed_job is the way to go. Following the leaders (github) and using Redis and resque would be great too but definitely not backgrounDrb or direct cron!
What project management tools do you use? Pivotal Tracker, Trac, Mingle?
Pivotal tracker suits startup needs. Mingle rocks but becomes expensive. Scott ?
Dhaval from TW mentioned how easy it was to coordinate using Mingle with their 200-strong team spread over distributed geographies.
Which SCM do you use? git, svn, cvs?
People have been very comfortable with git and more and more are migrating from svn to git. It was heartening to see that nobody uses CVS
Jaju (I may have misspelt the name) gave an excellent brief about how code and diffs can be squashed and ‘diff’ed against another repository before the final merge and push to master. Dhaval gave an idea about how they effectively used git for managing their 1GB source code (wow!)
Some pending questions – probably in next meet-up
- Which hosting service do you use and why?
- TDD or BDD?
Suggestions are welcome!
Migrating an existing deployment to Heroku
It was fun — pure unadulterated fun!!
Now, I started looking at heroku since our current deployment on a linode was getting a little bulky. Our client has been complaining of ’suddenly things slowing down’ and ’site not working’. The site is used at a hit rate of 36 Requests per minute and we need to keep our hosting costs down — so what is the solution???
Currently, it's hosted on a linode with 740MB of memory, running nginx + a cluster of 3 thin servers. There is monit, which ensures that ANY thin server going above 140MB is restarted. BackgrounDrb is taking a lot of memory too (an unavoidable 160MB) and we have ferret to make things more complicated!!
Enter Heroku. The first thing that got me hooked was that a ‘git push heroku’ DEPLOYS your application! Fantastic. I have been a capistrano freak till now and loved it, but it's all gone now! (wonder if that's good or bad though). LoL
So, to get started, you sign-up on heroku and follow instructions on your My Apps page:
$ heroku create
$ git init && git add . && git commit -m"initial checkins"
$ git push heroku
I used the Blossom account (5 MB space) and was pleasantly surprised to see it work ‘out of the box’. It created a weirdly named sub-domain for me, which I changed using
$ heroku rename <new-name>
Digging deeper, I wanted to deploy the application ‘as is’ but that was not to be:
Setup Woes
Heroku uses PostgreSQL and we had used MySQL, so you need to ensure that there are no special data types being used. As Murphy’s law goes, we had used blob and had to change that to bytea (byte array in PG). The interesting thing I found was that there is NO way to drop the database and re-create it. The answer lies in:
$ heroku rake db:schema:load
Production environment is assumed. I was not sure which environment I was in until I read up and issued this command, which confirmed production mode:
$ heroku config
RACK_ENV => production
I was also curious to see how I get the logs and check the console and it really ROCKS:
$ heroku logs
$ heroku console
Alas I could not find a way to access the dbconsole (like what I do sometimes with script/dbconsole -p). Well, that is the way it is!
Ferret Woes
Heroku does not use ferret – it uses solr and sunspot. I had to install these additional gems using the gem manifest i.e. the .gems file located at RAILS_ROOT.
ruport
acts_as_ferret
prawn
$ git commit
$ git push heroku
Since acts_as_ferret is installed, it loads the Rails models and ‘expects’ the database to be in place. So, I had to comment out ALL the references to acts_as_ferret to be able to run a
$ heroku rake db:migrate
So far so good. Now, since ferret needs to write index files and heroku mounts the deployment on a read-only filesystem, we need to tweak the configuration to ensure that the index is created in tmp/ (the area where we can write stuff). Add the following lines to config/environments/production.rb
require 'acts_as_ferret'
ActsAsFerret.index_dir = "#{RAILS_ROOT}/tmp/index"
Now, the default ferret production setup is to start a DRb server on port 9010, and Heroku will not allow this! So, we have to comment out all the configuration in config/ferret_server.yml to ensure that ferret loads like it does in development. I hate this but there is no alternative for now. I do plan to migrate to the WebSolr that Heroku is offering but it's at $20 per month! Let's see how that works out.
This should at least get you started on ferret – phew!
BackgrounDrb woes
BackgrounDrb comes with its own baggage I must say. Heroku proposes the use of DelayedJob (yippee) and I have just this plan, so rather than even try to figure out a way to get backgrounDrb to work on Heroku, I simply commented out the config/backgroundrb.yml settings (to ensure it does not get initialized).
I also had to comment out all the MiddleMan references in the controller. The backgrounDrb workers are easily modified so that they can work as DelayedJob objects. I have mentioned this in my earlier post, Moving from BackgrounDrb to DelayedJob.
Once this was done, I did not face any problems and after some quick code fixes, git commits and git push’es, I was able to get my heroku application up and running.
All in a day's work! That's really amazing. Now, the next step is to get WebSolr and DelayedJob integrated and see if I can move onto a bigger storage setup. A couple of months will let me know if this move was right.
RPCFN#3 Short-circuit
It was great to see so many solutions for the problem statement I had set. After an initial ’shock’ from all electrical engineers, the problem was clearly understood as a variant of the ’shortest path algorithm’. I hope I was able to derive a laugh from the cryptic clue ’short-circuit’.
Speaking of ‘derive’, I did see most solutions going for a class implementation of ‘class Graph’ or ‘class Vector’. Though it's pretty cool to see a complete packaged solution, I believe this was slight overkill. This was NOT the basis of judging, but I do feel that ruby has its essence in getting the job done in far less code, i.e. fewer LOC. So, I personally did not foresee 3-4 classes with inheritance in them. I do totally agree it's an excellent way to showcase one’s skills ;)
Dijkstra’s algorithm seems to be the popular hit for solving this problem. However, I do feel a recursive solution rather than an iterative one is more appropriate for this. Again, this is just an opinion, but to get a sense of the power of ruby programming, a recursive program probably paints a better picture. It's also very concise and readable.
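To give a feel for what a recursive approach can look like, here is a minimal sketch. It is not my submitted solution and not the challenge data: the GRAPH hash and the shortest method are made up for illustration. It simply tries every route that does not revisit a node, so it is only suitable for small graphs.

```ruby
# Hypothetical weighted, undirected graph: node => { neighbour => cost }.
GRAPH = {
  'A' => { 'B' => 50,  'D' => 150 },
  'B' => { 'A' => 50,  'C' => 250 },
  'C' => { 'B' => 250, 'D' => 50 },
  'D' => { 'A' => 150, 'C' => 50 }
}

# Returns [cost, path] for the cheapest route from node to goal,
# or nil when the goal cannot be reached.
def shortest(graph, node, goal, visited = [])
  return [0, [goal]] if node == goal

  unvisited = graph[node].reject { |nxt, _| visited.include?(nxt) }
  routes = unvisited.map do |nxt, cost|
    rest = shortest(graph, nxt, goal, visited + [node])
    rest && [cost + rest[0], [node] + rest[1]]
  end
  routes.compact.min_by { |cost, _| cost }
end

cost, path = shortest(GRAPH, 'A', 'C')
puts "#{path.join(' -> ')} costs #{cost}"
```

The whole search is one short method, which is the conciseness argument made above; the price is that it explores every simple path instead of pruning like Dijkstra does.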
A lot of effort was spent in ‘initializing’ the data-structure. This was really nice to see — unlike me, who took a short-cut. The aim was to see the algorithm but I was really proud to see ‘complete’ solutions. Test cases were written in most solutions and it was a pleasure to see them run.
It was interesting to see various forms of Infinity:
Infinity = 1 << 64
Infinity = 1 << 32
Infinity = 1.0 / 0 # simple and ideal
Infinity = 1000000 # incorrect
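A quick way to see why the floating point version is the safe one: a fixed large number is only a sentinel and can be exceeded by a real cost, while float infinity compares greater than any number. The snippet below is a small sketch; big_cost is an arbitrary value chosen for illustration.

```ruby
INFINITY = 1.0 / 0            # float division by zero gives Infinity

# An arbitrary (hypothetical) path cost that is larger than 1 << 64.
big_cost = 1 << 70

puts INFINITY == Float::INFINITY   # same value as the built-in constant
puts big_cost < INFINITY           # still smaller than infinity
puts big_cost < (1 << 64)          # but not smaller than the fixed sentinel
puts [INFINITY, 42].min            # any real cost wins a min comparison
```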
It was interesting to note that very few catered for multiple shortest paths, though the question was raised in the comments earlier. Not trying to be a hypocrite here, I should say that I too did not implement multiple shortest paths – LoL.
My solution to the problem is provided at http://github.com/gautamrege/short_circuit and it's merely a representation of my style of programming. Creating a gem was for kicks but lib/short_circuit.rb has the core code. It was great fun AND learning while doing this and I realized that I will always be a student.
Cheers!
Capistrano + Nginx + Thin deployment on Linode
This is a long-lost post I had written about 8 months ago (converted from wiki to HTML, so pardon typos if any)
Terminologies
Capistrano is a ruby gem which helps in remote deployment. Contrary to popular belief, Capistrano can be used for any deployment, not just a rails app!
Nginx is a web-proxy server. It is simply a lightweight HTTP web server which receives requests over HTTP and passes them to other applications. This is far preferable to web servers like Apache! Moreover, nginx is very easily configurable and can support multiple domain names very easily. It has a built-in load balancer which can send requests to apps based on its internal load-balancing mechanism.
Thin is the next-generation lean, mean rails server. It's much faster and lighter on memory than mongrel. It has an internal event-based mechanism for request processing and handles concurrency much better than other rails servers.
Linode is a VPS (Virtual Private Server) that is hosted by www.linode.com. As the name suggests, it's a “Linux Node”. We are using Ubuntu 8.10 (Tip: to find the Ubuntu release, issue the command: lsb_release -a). NOTE: The linode we had was a raw machine with no packages installed. Please read Linode RoR package installation for details.
Steps
Capistrano Configuration
Follow the steps provided by Capistrano for basic instructions: Capistrano – From The Beginning. Some modifications that you may need (as I needed for deployment):
- Edit Capfile and add the following to it. This ensures that remote capistrano deployment does not fork a remote shell using command “sh -c”. Some hosting servers do not allow remote shells.
default_run_options[:shell] = false
set :use_sudo, false
set :user, "root"
namespace :deploy do
  desc "Custom AceMoney deployment: stop."
  task :stop, :roles => :app do
    invoke_command "cd #{current_path};./script/ferret_server -e production stop"
    invoke_command "service thin stop"
  end
  desc "Custom AceMoney deployment: start."
  task :start, :roles => :app do
    invoke_command "cd #{current_path};./script/ferret_server -e production start"
    invoke_command "service thin start"
  end
  # Need to define this restart ALSO as 'cap deploy' uses it
  # (Gautam) I dont know how to call tasks within tasks.
  desc "Custom AceMoney deployment: restart."
  task :restart, :roles => :app do
    invoke_command "cd #{current_path};./script/ferret_server -e production stop"
    invoke_command "service thin stop"
    invoke_command "cd #{current_path};./script/ferret_server -e production start"
    invoke_command "service thin start"
  end
end
Thin Configuration
I looked up most of the default configuration of Thin and Nginx on Ubuntu at Nginx + Thin. Some extra configuration or differences are mentioned below.
- The init script for starting thin and nginx during startup is configured during package installation. Leave them as they are.
- The following command generates /etc/thin/acemoney.yml for 3 servers starting from port 3000. Note that the -c option specifies the BASEDIR of the rails app. Do NOT change any settings in this file as far as possible.
thin config -C /etc/thin/acemoney.yml -c /home/josh/current --servers 3 -e production -p 3000
service thin start
service thin stop
Nginx Configuration
Installing nginx is simple on Ubuntu
apt-get install nginx
Configure the base /etc/nginx/nginx.conf. The default settings are fine but I added / edited a few more, as recommended at Nginx Configuration
worker_processes 4;
gzip_comp_level 2;
gzip_proxied any;
gzip_types text/plain text/html text/css application/x-javascript
text/xml application/xml application/xml+rss text/javascript;
According to the configuration above, nginx will spawn 4 worker processes and each worker can process 1024 connections (default setting). So, nginx can now handle ~4000 concurrent connections (4 x 1024 = 4096)!!! See the performance article of thin at Thin Server
Configure the domain name, in our case acemoney.in. Ensure that the acemoney.in “A record” entry points to this server! Check this by doing an nslookup or a ping for the server. In /etc/nginx/sites-available create a file named after the domain to be hosted. So I added /etc/nginx/sites-available/acemoney.in. In /etc/nginx/sites-enabled create a symbolic link to this file.
ln -s /etc/nginx/sites-available/acemoney.in /etc/nginx/sites-enabled/acemoney.in
Now add the contents in /etc/nginx/sites-available/acemoney.in. This is the key configuration to hook up nginx with thin.
upstream thin {
    server 127.0.0.1:3000;
    server 127.0.0.1:3001;
    server 127.0.0.1:3002;
}
server {
    listen 80;
    server_name acemoney.in;
    root /home/josh/current/public;
    location / {
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header Host $http_host;
        proxy_redirect off;
        if (-f $request_filename/index.html) {
            rewrite (.*) $1/index.html break;
        }
        if (-f $request_filename.html) {
            rewrite (.*) $1.html break;
        }
        if (!-f $request_filename) {
            proxy_pass http://thin;
            break;
        }
    }
    error_page 500 502 503 504 /50x.html;
    location = /50x.html {
        root html;
    }
}
To analyze this configuration, here are some details:
The following lines tell nginx to listen on port 80 for HTTP requests to acemoney.in. The ‘root’ is the public directory for our rails app deployed at /home/josh/current!
server {
    listen 80;
    server_name acemoney.in;
    root /home/josh/current/public;
Now, nginx will try to process all HTTP requests and respond itself. For static HTML files it will automatically serve the data from the ‘root’. If it cannot find the file, it will ‘proxy_pass’ the request to thin. “thin” in the code below refers to the ‘upstream’ directive that tells nginx where to forward requests it cannot serve directly.
if (!-f $request_filename) {
    proxy_pass http://thin;
    break;
}
The upstream code is where load-balancing plays a role in nginx. The following code tells nginx which processes are running on which ports, and nginx forwards requests to any of these servers based on its internal load balancing algorithm. The servers can be on different machines (i.e. different IP addresses) if needed. In AceMoney, we have started 3 thin servers on 3 different ports!
upstream thin {
    server 127.0.0.1:3000;
    server 127.0.0.1:3001;
    server 127.0.0.1:3002;
}
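With no other options set, nginx hands requests to the upstream servers in round-robin order. The sketch below only illustrates that idea in Ruby; the RoundRobin class is hypothetical and has nothing to do with how nginx implements it internally.

```ruby
# Backends copied from the upstream block above.
BACKENDS = ['127.0.0.1:3000', '127.0.0.1:3001', '127.0.0.1:3002']

# Hypothetical helper class: hands out backends in turn.
class RoundRobin
  def initialize(backends)
    @backends = backends
    @next = 0
  end

  # Return the next backend and advance, wrapping around at the end.
  def pick
    backend = @backends[@next]
    @next = (@next + 1) % @backends.size
    backend
  end
end

balancer = RoundRobin.new(BACKENDS)
picks = Array.new(5) { balancer.pick }
puts picks
```

Five requests in a row go to ports 3000, 3001, 3002 and then wrap around to 3000 and 3001 again.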
Performance Statistics
Nothing is complete without them. Here is what I found out for 3 thin servers and 1 ferret_server.
top - 14:06:10 up 7 days, 22:58, 2 users, load average: 0.00, 0.00, 0.00
Tasks: 84 total, 1 running, 83 sleeping, 0 stopped, 0 zombie
Cpu0 : 0.0%us, 0.0%sy, 0.0%ni,100.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu1 : 0.0%us, 0.0%sy, 0.0%ni,100.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu2 : 0.0%us, 0.0%sy, 0.0%ni,100.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu3 : 0.0%us, 0.0%sy, 0.0%ni,100.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 553176k total, 530868k used, 22308k free, 16196k buffers
Swap: 524280k total, 2520k used, 521760k free, 87280k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
12424 mysql 18 0 127m 42m 5520 S 0 7.9 0:23.01 mysqld
18338 root 15 0 77572 70m 4392 S 0 13.1 0:06.79 thin
18348 root 15 0 71176 64m 4388 S 0 11.9 0:06.51 thin
18343 root 15 0 68964 62m 4384 S 0 11.5 0:07.20 thin
18375 root 18 0 70912 54m 2660 S 0 10.0 2:34.24 ruby
8141 www-data 15 0 5176 1736 820 S 0 0.3 0:00.07 nginx
8142 www-data 15 0 5176 1724 816 S 0 0.3 0:00.01 nginx
8144 www-data 15 0 5152 1720 816 S 0 0.3 0:00.06 nginx
8143 www-data 15 0 5156 1656 784 S 0 0.3 0:00.00 nginx
As can be seen:
- Each thin server takes between 62M and 70M
- The MySQL server takes 42M
- The ruby process (18375 above) is the ferret_server, which takes 54M
- Each of the 4 nginx worker processes takes about 1.7M of memory
Overall (3 thin server cluster + MySQL + ferret): about 300MB
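The overall figure is just the sum of the RES column for the big processes. As a quick check, here are the values read off the top output above, rounded to whole megabytes (the hash keys are only labels for this snippet):

```ruby
# Resident memory (RES) in MB, taken from the top output above.
resident_mb = {
  'thin 18338'    => 70,
  'thin 18348'    => 64,
  'thin 18343'    => 62,
  'mysqld'        => 42,
  'ferret_server' => 54
}

total = resident_mb.values.sum
puts "#{total} MB"
```

That comes to 292 MB, which is where the round figure of 300MB comes from once the small nginx workers are added.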
Moving from backgrounDrb to DelayedJob
In my earlier post regarding DelayedJob setup I mentioned how to set up DelayedJob and create new tasks in your existing code. This post provides details of how to move from backgrounDrb to delayed_job. First of all, it's important to know why:
- BackgrounDrb takes up a lot of memory. It spawns workers at start-up. You can use MiddleMan for dynamically spawning backgrounDrb tasks but in any case, it slows things down a little. I have 4 workers and overall the parent process consumes over 120MB, which is huge considering my linode.
- Monitoring these jobs is a little bit of a pain. Moreover, running in development / production mode requires a Drb server which adds more memory overhead.
As we speak, github has introduced Resque, which is inspired by DelayedJob, but I plan to continue with DelayedJob for now because I don't use Redis (yet). The blog post for Resque has a LOT of details about issues with current background job schedulers. Worth reading!
OK – so you’re now convinced we should use DelayedJob instead of backgrounDrb but have a lot of tasks already configured. These are the steps to follow:
1. Convert your backgrounDrb workers to standard ruby classes:
class BulkUploadWorker < BackgrounDRb::MetaWorker
  set_worker_name :bulk_upload_worker
  def upload(args)
  end
end
to
class BulkUploadWorker
  def perform
  end
end
2. If you earlier used arguments to be passed to the backgrounDrb jobs, you need to tweak the code a little.
Suppose I have an upload method which takes ‘arg’ as the Hash parameter, it would be invoked in the controller for backgrounDrb like this:
MiddleMan.worker(:bulk_upload_worker).async_upload(:arg => {
'correction' => correction, 'file' => in_file, 'user' => current_user.id} )
Simple change to DelayedJob
Delayed::Job.enqueue BulkUploadWorker.new(correction, in_file, current_user.id)
And change the worker to have a perform method (which is the one which gets called on the job):
BulkUploadWorker = Struct.new(:correction, :infile, :user_id) do
  def perform
    file = File.open(infile)
    user = User.find(user_id)
    ...
  end
end
If you look closely at the code above, even for an experienced Ruby coder, it's no piece of cake. Now, I tried the original approach that was on github of
class BulkUploadWorker < Struct.new (:correction, :infile, :user)
but this gives me a type mismatch error. After some searching on the net, I found the answer to this, quite understandably from one of the ruby greats, JEG2. Here James clearly explains how Struct.new returns a Class and accepts a block of code for all the methods. Note the use of the Symbol :infile in the declaration but the data member infile in the perform method.
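The Struct.new-with-a-block idiom can be tried in isolation, without Rails or DelayedJob. In the sketch below, BulkUploadJob and the body of perform are hypothetical stand-ins; a real job would open the file and look up the user instead of building a string.

```ruby
# Hypothetical job class for illustration only.
BulkUploadJob = Struct.new(:correction, :infile, :user_id) do
  # Members declared as Symbols above are available as plain readers here.
  def perform
    "uploading #{infile} for user #{user_id} (correction: #{correction})"
  end
end

job = BulkUploadJob.new(false, 'test_file', 3)
puts BulkUploadJob.class   # Struct.new returned a Class
puts job.infile
puts job.perform
```

Because Struct.new builds a brand new class on every call, assigning the result to a constant once (as above) avoids re-evaluating it when the file is loaded again.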
Since my file was in lib/workers/bulk_upload_worker.rb, we need to explicitly require this file for DelayedJob. Add this in config/initializers/delayed_job.rb. Now, before I can get down to brass tacks and incorporate it, I really need to know if this works. First ensure that the task works directly from the console:
RAILS_ENV=production ./script/console
>> task = BulkUploadWorker.new(false, 'test_file', 3)
>> task.perform
Once the task performs as expected, start up the delayed_job server and test it out from your web app. If there are errors or exceptions, delayed_job stores them in the database. So, it's a good idea for command-line users with access to the server to keep an eye out for errors and exceptions in the database.
Enjoy!
Delayed_job for background processing in Rails
The first thing to do obviously is to install DelayedJob. There are plenty of forked versions available on GitHub. I chose collectiveidea because it was recommended on railscasts. I did refer to this site extensively for setting up delayed_job.
$ sudo gem install collectiveidea-delayed_job
Job half done. I followed the instructions on the github page and ensured that my environment was set up properly. I added the following to environment.rb:
config.gem 'collectiveidea-delayed_job', :lib => 'delayed_job', :source => 'http://gems.github.com'
and the following to the Rakefile:
begin
  require 'delayed/tasks'
rescue LoadError
  STDERR.puts "Run `rake gems:install` to install delayed_job"
end
Then issue the command:
./script/generate delayed_job
rake db:migrate
To start the delayed_job server in production mode, issue:
$ RAILS_ENV=production ./script/delayed_job start
To start it in development mode, issue:
$ rake jobs:work
Interesting find, obvious but it cost me a lot of time: nothing stops one from running BOTH the above commands. In fact, in production mode I ran delayed_job as a daemon AND also using rake. Silly me: I forgot that if I change any code I would need to restart both. I wrongly assumed that jobs:work only ‘showed’ the console, but it actually starts a delayed_job worker. So, if you do it this way, you will have twice the number of background job processes floating around.
Well, anyway, once I had this properly configured as a daemon, I set about changing code. This was the really awesome part of DelayedJob: no dependencies!! I changed the code from:
user = User.find_by_name('xyz')
user.get_daily_call_statistics
to
user = User.find_by_name('xyz')
user.send_later(:get_daily_call_statistics)
AND IT WORKS !! Awesome. While digging around a little more, I realized that there is not enough scope for debugging in the default setup. I looked up github and google for some help and found this useful tip:
1. Create a config/initializers/delayed_job_config.rb and add:
Delayed::Job.destroy_failed_jobs = false
Delayed::Worker.logger = Rails.logger
This logs all delayed job output to the environment log files and ensures that failed jobs are not destroyed. There are other settings to reduce the failure attempts and the time for the delayed job but I was too excited to try them out immediately.
2. Suppose I have some global variables in the helpers; they are not accessible in the model methods called via delayed_job. Maybe it's a bug in delayed_job. I do plan to dig deeper into this and figure it out either way. I had to break my head over this one.
To conclude, what I had earlier was:
Processing DashboardController#explicit_refresh_daily_statistics (for 121.247.65.47 at 2009-10-29 13:16:33) [GET]
Parameters: {"action"=>"explicit_refresh_daily_statistics", "controller"=>"dashboard"}
last seen ...........
Redirected to http://acemoney.in/dashboard
Completed in 74414ms (DB: 16912) | 302 Found [http://acemoney.in/dashboard/explicit_refresh_daily_statistics]
Now after adding the send_later, I have:
Processing DashboardController#explicit_refresh_daily_statistics (for 121.247.65.47 at 2009-10-29 15:16:41) [GET]
Parameters: {"action"=>"explicit_refresh_daily_statistics", "controller"=>"dashboard"}
last seen ...........
Redirected to http://acemoney.in/dashboard
Completed in 420ms (DB: 99) | 302 Found [http://acemoney.in/dashboard/explicit_refresh_daily_statistics]
This means my response time fell from 74 seconds to about 0.4 seconds.
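The drop comes from the request only recording the work instead of doing it. The sketch below shows that idea with a tiny in-memory queue. It is not how delayed_job is implemented (delayed_job stores jobs in a database table and runs them in a separate worker process); TinyQueue and Report are hypothetical names made up for this illustration.

```ruby
# Hypothetical in-memory stand-in for the jobs table.
class TinyQueue
  Job = Struct.new(:receiver, :method_name, :args)

  def initialize
    @jobs = []
  end

  # Record the call instead of making it; returns immediately.
  def enqueue(receiver, method_name, *args)
    @jobs << Job.new(receiver, method_name, args)
    self
  end

  # What a worker would do later, outside the web request.
  def work_off
    results = []
    until @jobs.empty?
      job = @jobs.shift
      results << job.receiver.send(job.method_name, *job.args)
    end
    results
  end

  def size
    @jobs.size
  end
end

# Hypothetical model with a slow method.
class Report
  def daily_statistics(day)
    "statistics for #{day}"
  end
end

queue = TinyQueue.new
queue.enqueue(Report.new, :daily_statistics, '2009-10-29')  # fast: nothing runs yet
pending = queue.size
results = queue.work_off                                    # the slow part happens here
puts pending
puts results
```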
Now, I already had backgrounDrb tasks configured earlier and want to migrate them ’somehow’ to DelayedJob with minimal code. Stay tuned, this post will be updated.
Migrating Acemoney onto a different server with nginx+passenger
Acemoney is a hosted application built by us, i.e. Josh Software. Currently, it's hosted on a linode with nginx+thin configured. The problem here is that there are 3 thin servers which consume humongous ‘stagnant’ memory. We have decided to migrate to nginx+passenger so that we can control in greater detail the number of instances, the memory and the performance.
Some things in the post are specific to Josh Software and its client. Overall, this should give a good clean idea about migrating to passenger.
1. Check out the branch from the repository (the 2.3.4 version)
2. Ensure Rails 2.3.4 is installed
3. Edit the nginx configuration. /opt/nginx/conf/servers/acemoney.in
server {
    listen 80;
    server_name acemoney.in;
    passenger_enabled on;
    root <path-to-deployment>/acemoney/public;
}
4. Restart Nginx. In case you get an error — something like:
2009/10/27 16:02:07 [error] 32685#0: *9 directory index of "/home/gautam/deployment/acemoney/public/" is forbidden, client: 121.247.65.47, server: acemoney.in, request: "GET / HTTP/1.1", host: "acemoney.in"
Check syntax in the conf file (I had forgotten a ‘;’) OR check permissions of the root directory to see if you have given r+x permissions
5. The current gem setup was:
$:/opt/nginx/conf/servers$ sudo gem search
*** LOCAL GEMS ***
abstract (1.0.0)
actionmailer (2.3.4, 2.3.3, 2.2.2, 2.1.2)
actionpack (2.3.4, 2.3.3, 2.2.2, 2.1.2)
activerecord (2.3.4, 2.3.3, 2.2.2, 2.1.2)
activeresource (2.3.4, 2.3.3, 2.2.2, 2.1.2)
activesupport (2.3.4, 2.3.3, 2.2.2, 2.1.2)
capistrano (2.5.5)
cgi_multipart_eof_fix (2.5.0)
chronic (0.2.3)
contacts (1.0.13)
daemons (1.0.10)
engineyard-eycap (0.4.7)
erubis (2.6.4)
eventmachine (0.12.6)
fastthread (1.0.7)
gem_plugin (0.2.3)
heywatch (0.0.1)
highline (1.5.0)
hoe (1.12.2)
json (1.1.6)
memcache-client (1.7.2)
mislav-will_paginate (2.3.10)
mongrel (1.1.5)
mongrel_cluster (1.0.5)
mysql (2.7)
net-scp (1.0.2)
net-sftp (2.0.2)
net-ssh (2.0.11)
net-ssh-gateway (1.0.1)
packet (0.1.15)
passenger (2.2.5)
rack (1.0.0, 0.9.1)
rails (2.3.4, 2.3.3, 2.2.2, 2.1.2)
rake (0.8.4)
RedCloth (4.1.9)
right_aws (1.10.0)
right_http_connection (1.2.4)
rubyforge (1.0.3)
rubyist-aasm (2.0.5)
thin (1.0.0)
tidy (1.1.2)
xml-simple (1.0.12)
So, I had to add the following gems:
$ sudo gem install acts_as_reportable
This added the following dependencies:
Successfully installed fastercsv-1.2.3
Successfully installed archive-tar-minitar-0.5.2
Successfully installed color-1.4.0
Successfully installed transaction-simple-1.4.0
Successfully installed pdf-writer-1.1.8
Successfully installed ruport-1.6.1
Successfully installed acts_as_reportable-1.1.1
Successfully installed json_pure-1.1.9
Successfully installed rubyforge-2.0.3
Successfully installed rake-0.8.7
$ sudo gem install prawn
$ sudo gem install ferret
$ sudo gem install acts_as_ferret
6. Then I got the latest database dump from the Acemoney server and configured the database locally. Edit config/database.yml and add the following:
production:
  adapter: mysql
  user: acemoney
  password: <password>
  host: localhost
Created a new user in mysql and granted ALL permissions to acemoney database
mysql> grant all on acemoney.* to 'acemoney'@'localhost' identified by '<password>'
Then create the database and dump the contents from the backup
$ RAILS_ENV=production rake db:create
$ mysql -uacemoney acemoney -p < <backupfile>
Start the ferret_server
$ ./script/ferret_server -eproduction start
Build the index:
$ RAILS_ENV=production rake ace:rebuildFerretIndex
NOW, we are good to go. Make some local changes in /etc/hosts file on your machine to point acemoney.in to the linode IP address. Then http://acemoney.in should take you to the hosted application on nginx+passenger.
Musings on cache-money – Part I
So, I always wanted to find a way of memcaching via ActiveRecord, without having to re-invent the wheel.
My investigations initially took me via CachedModel and cache_fu, and finally I settled on cache-money. It seems to be *almost exactly* what I wanted: any lookup goes via memcache, and any update / edit goes via ActiveRecord + memcache and then, if needed, to the database.
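The lookup and update flow described above is the classic read-through / write-through pattern. Here is a minimal sketch of it, with plain Hashes standing in for memcached and the database. CachedStore is a hypothetical class written for this post; it is not cache-money's API.

```ruby
# Hypothetical stand-in: Hashes play the roles of memcached and the database.
class CachedStore
  attr_reader :db_reads

  def initialize(database)
    @database = database
    @cache = {}
    @db_reads = 0
  end

  # Lookup: try the cache first, fall back to the database and remember the result.
  def find(id)
    return @cache[id] if @cache.key?(id)
    @db_reads += 1
    @cache[id] = @database[id]
  end

  # Update: write to the database and refresh the cached copy.
  def update(id, value)
    @database[id] = value
    @cache[id] = value
  end
end

store = CachedStore.new({ 1 => 'first post' })
store.find(1)                     # miss: goes to the database
store.find(1)                     # hit: served from the cache
store.update(1, 'edited post')    # keeps cache and database in step
puts store.find(1)
puts store.db_reads
```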
This is the first of my stunts:
1. Create a basic rails project with a simple posts controller. Being as lazy as I am, I used the ./script/generate scaffold help contents to create my Posts controller!
2. Install the cache-money gem
3. Install the memcached server and configure the rails project (all from the github README of cache-money)
– config/memcached.yml –
production:
  ttl: 604800
  namespace: 'josh1'
  sessions: false
  debug: false
  servers: localhost:11211
– environments/production.rb –
# Use a different cache store in production
config.cache_store = :mem_cache_store
memcache_options = {
  :c_threshold => 10000,
  :compression => true,
  :debug => false,
  :namespace => 'josh1',
  :readonly => false,
  :urlencode => false
}
# require the new gem, this will load up latest memcache
# instead of using the built in 1.5.0
require 'memcache'
# make a CACHE global to use in your controllers instead of
# Rails.cache, this will use the new memcache-client 1.7.2
CACHE = MemCache.new memcache_options
# connect to your server that you started earlier
CACHE.servers = '127.0.0.1:11211'
# this is where you deal with passenger's forking
begin
  PhusionPassenger.on_event(:starting_worker_process) do |forked|
    if forked
      # We're in smart spawning mode, so...
      # Close duplicated memcached connections - they will open themselves
      CACHE.reset
    end
  end
# In case you're not running under Passenger (i.e. devmode with mongrel)
rescue NameError => error
end
– config/initializers/cache_money.rb –
require 'cache_money'
config = YAML.load(IO.read(File.join(RAILS_ROOT, "config",
"memcached.yml")))[RAILS_ENV]
$memcache = MemCache.new(config)
$memcache.servers = config['servers']
$local = Cash::Local.new($memcache)
$lock = Cash::Lock.new($memcache)
$cache = Cash::Transactional.new($local, $lock)
class ActiveRecord::Base
  is_cached :repository => $cache
end
4. Benchmarking – Instead of running a benchmarking tool, I wrote a few lines of code myself to create 1000 posts, each with a body of 100000 random alphanumeric characters.
Setup: Mac OS X 10.5 (Leopard) with nginx + passenger + memcached on a MacBook Pro laptop. I used Ruby 1.8.6 (the default version on Mac OS X 10.5) and Rails 2.3.3.
[term1] $ memcached -uroot -vv
$ ./script/console production
Loading production environment (Rails 2.3.3)
>> alphanumerics = [('0'..'9'),('A'..'Z'),('a'..'z')].map {
?> |range| range.to_a}.flatten
=> ["0", "1", "2", "3", "4", "5", "6", "7", "8", "9", "A", "B", "C", "D",
"E","F", "G", "H", "I", "J", "K", "L", "M", "N", "O", "P", "Q", "R", "S",
"T", "U", "V", "W", "X", "Y", "Z", "a", "b", "c", "d", "e", "f", "g", "h",
"i", "j", "k", "l", "m", "n", "o", "p", "q", "r", "s", "t", "u", "v", "w",
"x", "y", "z"]
>> 1000.times do |i|
?> Post.create(:title => i.to_s, :body => (0...100000).map {
?> alphanumerics[Kernel.rand(alphanumerics.size)] }.join)
>> end
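Note that the loop above builds every body from the full 0...100000 range, so each post gets exactly 100000 characters. If bodies of varying length are wanted, a small helper along these lines could be used instead. This is only a sketch: random_text is a hypothetical helper and was not part of the original console session.

```ruby
# Hypothetical helper: returns a random alphanumeric string whose length
# is picked uniformly between 1 and max_length (inclusive).
def random_text(max_length)
  alphanumerics = [('0'..'9'), ('A'..'Z'), ('a'..'z')].map { |range| range.to_a }.flatten
  length = 1 + rand(max_length) # rand(n) returns 0...n, so this is 1..max_length
  Array.new(length) { alphanumerics[rand(alphanumerics.size)] }.join
end

sample = random_text(100_000)
puts sample.length
```

In the console loop, the :body argument would then become random_text(100000).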
Results & Analysis
1. Memcache quickly scaled up to 64MB (the default setting), which was cool. I could see the memcache screen scrolling fast as it updated all the objects. Once its memory is ‘full’, it does LRU-style purging and gets rid of the least recently used objects. Memcache did not crash, stall or thrash, which was great!
2. For each object created in cache, we see the following memcache update:
<22 add lock/Post:1/id/15 0 30 6
>22 STORED
<22 set Post:1/id/15 0 86400 280
>22 STORED
<22 delete lock/Post:1/id/15 0
>22 DELETED
A GET for any post results in:
<23 get Post:1/id/15
>23 sending key Post:1/id/15
>23 END
An UPDATE / CREATE / DELETE for any post results in:
<23 add lock/Post:1/id/15 0 30 6
>23 STORED
<23 set Post:1/id/15 0 86400 361
>23 STORED
<23 delete lock/Post:1/id/15 0
>23 DELETED
Excellent!!
3. To confirm that we were getting it right, I checked the logs:
First time for GET:
Processing PostsController#show (for 127.0.0.1 at 2009-10-10 16:06:05) [GET]
Parameters: {"id"=>"15"}
Rendering template within layouts/posts
Rendering posts/show
Completed in 5ms (View: 2, DB: 95) | 200 OK [http://josh1.local/posts/15]
Second GET for the same post (this time the edit page):
Processing PostsController#edit (for 127.0.0.1 at 2009-10-10 16:06:42) [GET]
Parameters: {"id"=>"15"}
Rendering template within layouts/posts
Rendering posts/edit
Completed in 6ms (View: 5, DB: 0) | 200 OK [http://josh1.local/posts/15/edit]
Excellent!!
4. Just to push the pedal, I ran another iteration of 1000 posts with random body text and saw that memcache memory stayed put at 62-64MB. I could see expired cache objects hitting the database and getting cached again, and objects already in the cache NOT hitting the database. Exactly what I wished for.
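The two behaviours seen in the memcache output (the add lock / set / delete lock sequence on every write, and old objects making way once the cache is full) can be sketched in plain Ruby. This is only an illustration under stated assumptions: TinyCache and locked_write are hypothetical names, the cache is limited by entry count rather than by bytes, the 30-second lock expiry from the log is left out, and it relies on Ruby hashes keeping insertion order (Ruby 1.9 and later).

```ruby
# Hypothetical in-memory stand-in for memcached with LRU eviction.
class TinyCache
  def initialize(max_entries)
    @max_entries = max_entries
    @entries = {} # insertion order doubles as recency order, oldest first
  end

  # Like memcached's "add": store only if the key is absent.
  def add(key, value)
    return false if @entries.key?(key)
    set(key, value)
  end

  def set(key, value)
    @entries.delete(key)
    @entries[key] = value # re-insert as most recently used
    # Evict least recently used entries once over capacity.
    @entries.delete(@entries.keys.first) while @entries.size > @max_entries
    true
  end

  def get(key)
    return nil unless @entries.key?(key)
    value = @entries.delete(key)
    @entries[key] = value # a read also refreshes recency
    value
  end

  def delete(key)
    return false unless @entries.key?(key)
    @entries.delete(key)
    true
  end

  def keys
    @entries.keys
  end
end

# Mirrors the add lock / set / delete lock sequence from the log above.
def locked_write(cache, key, value)
  lock_key = "lock/#{key}"
  return false unless cache.add(lock_key, Process.pid) # someone else holds the lock
  begin
    cache.set(key, value)
  ensure
    cache.delete(lock_key)
  end
  true
end

cache = TinyCache.new(3)
locked_write(cache, "Post:1/id/1", "first")
locked_write(cache, "Post:1/id/2", "second")
cache.get("Post:1/id/1") # touch post 1, so post 2 becomes the oldest entry
locked_write(cache, "Post:1/id/3", "third")
puts cache.keys.inspect
```

Because the lock key briefly occupies a slot of its own, the third write pushes the cache over its limit and the least recently used post is dropped, while the post that was just read survives.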
Some caveats:
- Every ‘find’ request hits the database. Understandable, but I wonder if this too could go via the cache – it’s not safe or synced, but I wonder.
- cache-money still does not support joins, includes or nested attributes. (Time to contribute!!)
Overall: VERY VERY GOOD
Next Steps in Part II:
- Test cache-money on a live project with about 1 million records.
- It currently has ~35 requests per minute.
- Some controller calls take almost 50% of the time. Gotta reduce that to 5% (hopefully).
Cheers!
10 minutes that get you your dream job
So, you have just graduated or are in the final year of graduation – nice! All set to look out for a job. A ‘dream job’ is a concept of the past – it’s a job that you want right now. HOWEVER, it’s still in the first 10 minutes of the interview that you actually gain 90% of your chances of getting the job. Here are a few things I learnt and would like to tell people about interview skills:
- Be prepared. Every job position is different. It’s better to first profile the company you intend to apply to. Find out what they do, who is on the management team, who is on the core team and, primarily, what technologies they work in. For example, if you are applying to Microsoft, don’t have your skill sets shout: I have excellent skills in Java! If you are applying to a kernel or embedded systems company, they would be interested in seeing the skills they seek first: C, Unix etc. NOW, these are not hard and fast rules – mostly companies look for potential and talent, but I see no harm in this approach, especially when your resume could get screened!
- Prepare the resume properly. Prepare 2-3 different resumes, each highlighting a different skill. After profiling the company, send the relevant resume there first.
- Always mention your graduation degree and years of experience (if any), awards won, scholarships won and other extracurricular activities on your resume where they can be easily seen. This helps, as companies prefer team players and all-rounders and not just people with a scholastic aptitude.
- There is no need to prominently mention things like age, marital status and other irrelevant details upfront.
- Don’t mention each and every little tiny technical detail. The bigger the resume, the more boring it gets!
- Know every word on your resume – if you say you know Flex, you had better know something relevant in it. You are not expected to be an expert in the field, but you should at least have proficiency. If you are not confident
- Think before you talk. During the interview, usually the first few questions are general questions about you, your school etc. This is just to get you comfortable and calm your nerves. Don’t give really lengthy answers, because the interviewer’s attention span will then reduce.
Keep the talk small, precise and brief.
- Once you are asked a technical question – don’t answer immediately – pause, think and then answer.
- Speak slowly, there is no hurry.
- Very important: If you don’t know – say so! Honesty is indeed the best policy. Sometimes the interview does tend to go in directions where you are not comfortable – for example, if you are not very comfortable with digital signal processing and the interviewer is asking you something in this field – say plainly that this is not your area of expertise or interest.
- It’s also important to know when to say this is not your area of interest.
No point telling the interviewer at Cisco that networking is not your area of interest!
- Don’t open the doors. No! By this, I don’t mean be impolite and slam the door.
When you are asked a technical question, every word that you mention may be probed into. You may think it’s wise to mention a few big technologies or words in your answers that make you look good – however, think again! As you start throwing in big words, the interviewer will definitely ask for more details about what you mentioned. If you are not smart enough to ‘lead the interview’, it’s professional suicide. For example, I had an experience with a candidate who answered my question of ‘whether he knew what data structures are’ with – ‘yes, I know what AVL trees are’. Next question, I teased – ‘What does AVL stand for? Never heard of it’. He responded with ‘Don’t know, but it’s something to do with balanced trees.’ Now, I was enjoying myself as he was digging his own grave – so I cross-examined (while my colleague was silently laughing) – ‘What other trees have you heard of?’ – immediate response – ‘rb trees and binary trees’ – my question: ‘What are rb trees?’ – now he was scratching his head, as he did not know how to get out of this maze.
- Keep an ace up your sleeve. Now, here is how another candidate tackled the situation. This is an example of leading the interview. (We hired him, btw.) My question was simple – ‘What inter-process communication mechanisms are you familiar with?’ – Simple answer: ‘Threads, shared memory, pipes and mutex’. Next question: ‘What are mutexes?’ – Simple ‘leading’ answer: ‘Mutual exclusion of code via atomic operations’. Next obvious question: ‘What are atomic operations?’ – You can see the candidate was leading the interview. This was probably the ACE he had up his sleeve. ‘Atomic operations are steps which are performed within a CPU cycle using DCAS’ – whoa – next obvious question – ‘DCAS?’. Nailed the interview: ‘Double compare and set – the assembly operations which ensure that a flag is set and compared twice to ensure the lock is taken’ .. my response – ‘welcome to the family’. It takes some skill, but it ALWAYS helps to read some extra technical material and steer the interview to your strengths!!
- Watch your body language. It seems very weird how much our body talks in the interview. Sweaty hands and nervousness are obvious – in fact, if you are not nervous, you are abnormal.
What matters is that you calm yourself down – breathe deeply and maintain a constant rhythm. It helps. Trembling hands and sweating are normal. Ask for a glass of water, coffee or tea. It calms you down. Avoid repetitive movements like clicking the ball-point pen, dangling your feet while talking, tapping on the table etc. They distract!
- Watch your dressing sense. It’s better to be formally dressed in an informal interview than to be dressed in shorts when your interviewers are in suits.
This does not mean you must dress in a suit for an interview. Dress in casuals – it’s ok. A shirt and jeans or a shirt and trousers is fine. Avoid floaters (if your feet stink!). What is most important is that you are comfortable with what you are wearing – if not, you will be conscious of what you are wearing and will not be able to concentrate!
- Watch your language. Address people politely. It does not have to be ‘Sir’ or ‘Madam’ every time, but there is no harm done. Please do not use slang or abusive language (obviously) and don’t get carried away.
- You must learn from the interview. It is NOT wrong to ask for an answer in the interview. You are not expected to know all the answers, and it’s perfectly normal and a sign of confidence to ask for answers in the interview. It is generally appreciated.
Overall, you should come away from an interview feeling that you have at least learnt something from it! If you feel that you have not learnt anything from it, it has not gone well. Also remember, the longer the interview carries on, ‘usually’ it means that it’s all good. Always recap your interview later and try to find the answers to all the questions you did not know.
Giving interviews is a skill – but it’s something that can be learnt and should always be enjoyable.
All the best!