Using Ruby threads, you can make your applications do multiple things at the same time, which can make them faster.
In MRI (Matz’s Ruby Interpreter), you will only benefit from threads in I/O-bound applications. This limitation exists because of the GIL (Global Interpreter Lock), which allows only one thread to execute Ruby code at a time. Alternative Ruby interpreters like JRuby or Rubinius don’t have a GIL, so they can take full advantage of multi-threading.
So, what are threads? You can think of them as units of execution, or workers. Every process has at least one thread, and you can create more on demand.
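To see this in practice, the short snippet below compares Thread.current (the thread running the code) with Thread.main. It is a minimal illustration using only core Thread methods; Thread#value, used here to read the block’s result, waits for the thread to finish first.

```ruby
# Every Ruby process starts with a single thread: the main thread.
puts Thread.current == Thread.main # prints "true"

# Thread.new creates an additional thread; Thread#value waits for it
# to finish and returns the value of its block.
worker = Thread.new { Thread.current == Thread.main }
puts worker.value # prints "false"
```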
An I/O-bound app is one that spends much of its time waiting for an external resource: a network connection, a disk read, etc. While a thread is blocked waiting for that resource, MRI releases the GIL, so another thread can run and do its thing instead of the whole program sitting idle.
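Here is a small sketch of that effect. It assumes sleep as a stand-in for real I/O (a network request, a disk read): like blocking I/O, sleep lets MRI switch to another thread. The fake_io method is hypothetical and only for illustration.

```ruby
require 'benchmark'

# Hypothetical stand-in for an I/O wait such as a network request.
# Like blocking I/O, sleep lets MRI run other threads meanwhile.
def fake_io
  sleep 0.5
end

# Three waits one after the other.
sequential = Benchmark.realtime { 3.times { fake_io } }

# Three waits in three threads, overlapping in time.
threaded = Benchmark.realtime do
  threads = 3.times.map { Thread.new { fake_io } }
  threads.each(&:join)
end

puts format("sequential: %.2fs", sequential) # about 1.5s
puts format("threaded:   %.2fs", threaded)   # about 0.5s
```

Because the threads spend almost all their time waiting rather than running Ruby code, the GIL barely gets in the way here. A CPU-heavy loop in place of sleep would not show this speedup in MRI.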
One example of an I/O-bound app is a web crawler. For every request the crawler makes, it has to wait for the server to respond, and it can’t do anything else while waiting. But if you use threads, you could make 4 requests at a time and handle the responses as they come back, which lets you fetch pages faster. Sound interesting? Let’s get started!
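As a rough sketch of "4 requests at a time, handled as they come back", the example below uses Ruby’s built-in Queue to pass responses from the worker threads to the main thread. No real HTTP is involved: fetch is a hypothetical stand-in that sleeps for a made-up delay, and the page names are placeholders.

```ruby
# Hypothetical stand-in for an HTTP request: it sleeps to simulate
# waiting on the server and returns a fake response string.
def fetch(page, delay)
  sleep delay
  "response for #{page}"
end

# Made-up pages with different simulated response times (in seconds).
pages = { "index" => 0.4, "about" => 0.1, "contact" => 0.3, "products" => 0.2 }

responses = Queue.new

# One thread per request, so all 4 requests are in flight at once.
threads = pages.map do |page, delay|
  Thread.new { responses << fetch(page, delay) }
end

# Handle each response as soon as it arrives (completion order,
# not the order the requests were started in).
handled = []
pages.size.times do
  handled << responses.pop # blocks until the next response is ready
  puts handled.last
end

threads.each(&:join)
```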
To create a new thread, call Thread.new and pass it a block with the code that will be executed in that thread:
```ruby
Thread.new { puts "hello from thread" }
```
Pretty easy, right? However, if you run the following code, you will most likely see no output from the thread:
```ruby
t = Thread.new { puts 10**10 }
puts "hello"
```
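The problem is that Ruby doesn’t wait for threads to finish: when the main thread exits, the program ends and any other threads are killed. To fix the code above, call the join method on your thread, which makes the main thread wait until that thread is done: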
```ruby
t = Thread.new { puts 10**10 }
puts "hello"
t.join
```
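If you also need the result of a thread’s work, Thread#value is a convenient relative of join: it waits for the thread to finish and then returns the value of its block. A minimal sketch:

```ruby
# Start one thread per number; each block returns the square.
threads = [1, 2, 3].map do |n|
  Thread.new { n * n }
end

# Thread#value joins each thread and returns its block's result.
squares = threads.map(&:value)
p squares # prints [1, 4, 9]
```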
While exploring Ruby threads, you may find the Thread class documentation useful: http://ruby-doc.org/core-2.3.0/Thread.html
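If an unhandled exception happens inside a thread, that thread dies, but your program keeps running. In Ruby 2.3 (the version documented above) this happens silently, without any error message; since Ruby 2.5, Thread.report_on_exception defaults to true, so Ruby at least prints the error to stderr. Here is an example: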
```ruby
Thread.new { raise 'hell' }
```
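The exception isn’t lost, though: calling join (or value) on the dead thread re-raises it in the calling thread, so you can rescue it there. A minimal sketch, assuming Ruby 2.4 or later for report_on_exception=:

```ruby
t = Thread.new do
  # Turn off the automatic stderr report for this thread (on by
  # default since Ruby 2.5) so only our handling below prints anything.
  Thread.current.report_on_exception = false
  raise 'hell'
end

caught = nil
begin
  t.join # re-raises the thread's exception in the calling thread
rescue RuntimeError => e
  caught = e.message
  puts "thread failed: #{caught}" # prints "thread failed: hell"
end
```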
For debugging purposes, you may want your program to stop when something goes wrong. To do that, set the abort_on_exception flag on Thread to true; an unhandled exception in any thread will then be re-raised in the main thread:
```ruby
Thread.abort_on_exception = true
```
Let’s say you have hundreds of items to process. Starting a thread for each of them would quickly exhaust your system resources. It would look something like this:
```ruby
pages_to_crawl = %w( index about contact ... )

pages_to_crawl.each do |page|
  Thread.new { puts page }
end
```
If you did this, you would open hundreds of connections to the server at once, which is probably not a good idea. One solution is to use a thread pool.
Thread pools allow you to control the number of active threads at any given time.
You could build your own pool, but I wouldn’t recommend it. In the following example, we use the celluloid gem to do this for us.
```ruby
require 'celluloid'

class Worker
  include Celluloid

  def process_page(url)
    puts url
  end
end

pages_to_crawl = %w( index about contact products ... )
worker_pool = Worker.pool(size: 5)

# A plain call like worker_pool.process_page(page) blocks until it returns,
# so dispatch each page as a future to let the pool work on them concurrently.
futures = pages_to_crawl.map { |page| worker_pool.future.process_page(page) }

# Wait for every page to finish (Future#value also returns each result).
futures.each(&:value)
```
This time at most 5 threads run at once, and as each one finishes, it picks up the next item.
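To make the idea concrete, here is a minimal sketch of what a fixed-size pool does under the hood, using only Ruby’s built-in Queue (Queue#close needs Ruby 2.3 or later). The page names and the "processing" step are placeholders; for real code, a well-tested gem remains the safer choice, as recommended above.

```ruby
# Illustrative fixed-size pool: POOL_SIZE threads share one job queue.
POOL_SIZE = 5

pages_to_crawl = %w(index about contact products faq blog careers) # made-up pages
jobs = Queue.new
pages_to_crawl.each { |page| jobs << page }
jobs.close # no more jobs: pop returns nil once the queue is drained

processed = Queue.new

workers = Array.new(POOL_SIZE) do
  Thread.new do
    # Each worker keeps taking the next page until the queue is empty.
    while (page = jobs.pop)
      processed << page # placeholder for real work, like fetching the page
    end
  end
end

workers.each(&:join)
```

Only 5 threads ever exist here, no matter how many pages are queued; the queue does the bookkeeping of handing out the next item.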
This may all sound very cool, but before you go sprinkling threads all over your code, you should know that concurrent code comes with some problems of its own.
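For example, threads are prone to race conditions. A race condition happens when the result depends on the unpredictable order in which threads run: for example, two threads read and update the same variable at the same time, and one update overwrites the other.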
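A common way to protect shared data is a Mutex, which lets only one thread at a time run the code inside synchronize. The sketch below has 10 threads incrementing one shared counter; the lock guarantees that no increments are lost.

```ruby
counter = 0
lock = Mutex.new

threads = 10.times.map do
  Thread.new do
    1_000.times do
      # `counter += 1` is a read, an add, and a write; the mutex keeps
      # other threads from interleaving between those steps.
      lock.synchronize { counter += 1 }
    end
  end
end

threads.each(&:join)
puts counter # prints 10000
```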
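Another problem that can happen is a deadlock. This is when threads end up waiting on each other forever: for example, one thread holds exclusive access to a resource (using a locking mechanism like a mutex) and waits for a second resource, while another thread holds that second resource and waits for the first. Neither can ever continue.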
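One standard way to avoid this kind of deadlock is to always acquire locks in the same global order. The sketch below uses two hypothetical bank accounts, each with its own mutex; every transfer locks them sorted by name, so two transfers in opposite directions can never each hold one lock while waiting for the other.

```ruby
# Two hypothetical accounts, each protected by its own mutex.
Account = Struct.new(:name, :balance, :lock)

a = Account.new("a", 100, Mutex.new)
b = Account.new("b", 100, Mutex.new)

# Lock the accounts in a fixed order (by name), whichever way the money
# moves. Locking a-then-b in one thread and b-then-a in another could
# leave each thread waiting for the other forever.
def transfer(from, to, amount)
  first, second = [from, to].sort_by(&:name)
  first.lock.synchronize do
    second.lock.synchronize do
      from.balance -= amount
      to.balance += amount
    end
  end
end

# 50 transfers in each direction, running concurrently.
threads = 100.times.map do |i|
  Thread.new { i.even? ? transfer(a, b, 1) : transfer(b, a, 1) }
end
threads.each(&:join)

puts a.balance + b.balance # prints 200
```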
To avoid these issues, it’s often best to avoid raw threads and use a gem that already takes care of the details for you.
We already used celluloid for our thread pool, but there are many other concurrency-focused gems that you should check out:
https://celluloid.io/
https://rubygems.org/gems/thread
https://github.com/grosser/parallel
https://github.com/chadrem/workers
https://github.com/ruby-concurrency/concurrent-ruby
OK, that’s it! Hopefully you learned a thing or two about Ruby threads. If you found this article useful, please share it with your friends so they can learn too 🙂