Black Bytes
Category Archives for Programming

Battle of Interpreters: MRI vs JRuby vs Rubinius

In this post I want to talk about the different Ruby interpreters available. All the most popular languages have multiple interpreters or compilers (in the case of compiled languages like C), but what’s an interpreter?

An interpreter is a program that reads your source code, converts it into a series of executable instructions & then runs them.

In other words: it’s like a compiler, but it runs your code directly, without producing an output file.

What production-ready interpreters do we have available in Ruby?

  • MRI (The original implementation & what most people use)
  • Rubinius
  • JRuby

There are other interpreters, but most of them are experimental (like Topaz) or not actively maintained (like IronRuby).

In the rest of this article I’m going to focus on explaining the main differences between MRI, Rubinius & JRuby so that you can be more informed about the different options available to you 🙂

Meet The Interpreters

Let’s start talking about MRI, the original and most popular interpreter.

MRI stands for “Matz’s Ruby Interpreter”, but some of the core developers prefer to call it “CRuby”. It was created by Yukihiro Matsumoto (Matz), who still maintains it, & first released in 1995. Its core is written in C.

Then we have JRuby, which is written in Java & runs on the JVM (Java Virtual Machine). One thing you can do, that isn’t possible in any other Ruby interpreter, is to use Java libraries in your code.

And the last interpreter we are going to talk about is Rubinius. The main goal of Rubinius is to have a Ruby interpreter written in Ruby itself (but there are still some parts written in C++). I think it’s a great way to take a look at how some things work under the hood if you don’t want to deal with C or Java code.

Is Anything Missing?

What are the main differences in terms of features? Is there anything missing from JRuby or Rubinius that could prevent you from running them?

Well, according to the Rubinius README, the main things missing are Refinements & TracePoint. There are other things missing, but I think those two stand out the most.

How about JRuby?

JRuby 9.1 claims to be compatible with Ruby 2.3, but I can’t find more details about what level of compatibility we are talking about.

Comparing Performance

So what about performance? Is there a huge gap between the 3 main interpreters?

I ran some benchmarks for you using the latest versions of every interpreter, so you can see for yourself. The results are in iterations per second.

Code:
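The original benchmark listing is not preserved in this copy of the post. As a stand-in, here is a minimal sketch of how you could measure iterations per second with nothing but the standard library. The `iterations_per_second` helper and the workload inside the block are my own assumptions, not the code behind the published results.

```ruby
# Hypothetical helper: runs the block repeatedly for roughly `seconds`
# and returns how many iterations per second it managed.
def iterations_per_second(seconds = 0.5)
  count = 0
  elapsed = 0.0
  start = Process.clock_gettime(Process::CLOCK_MONOTONIC)

  while elapsed < seconds
    yield
    count += 1
    elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - start
  end

  count / elapsed
end

# Example workload (an assumption): sum the numbers from 1 to 1000.
ips = iterations_per_second { (1..1_000).reduce(:+) }
puts "#{RUBY_ENGINE} #{RUBY_VERSION}: #{ips.round} iterations/s"
```

Run the same file under each interpreter to compare them. The numbers will only be meaningful for this particular workload.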

[Benchmark result images, not preserved here: MRI 2.3.1, JRuby 9.1.5 (OpenJDK 1.8), Rubinius 3.6]

This doesn’t mean that MRI is too slow for regular use or that JRuby is going to be the fastest in every situation, so don’t choose your interpreter based on these results alone. The results change a lot depending on what code you are benchmarking (also, one big goal for MRI 3.0 is to triple performance!).

So if that is not a good way to choose, which interpreter should you be using? Well, most of the time you should be fine with MRI, but there are more differences worth exploring.

Error Output Differences

There are also some differences when it comes to error output & stack traces in particular.

Here is the same stack trace as it appears on each implementation.

[Screenshots, not preserved here: the same stack trace as printed by MRI, Rubinius and JRuby]

The Rubinius backtrace looks the hardest to read to me (because of the extra noise), but the color helps a little 🙂

What do you think? Let me know in the comments!

What About The GIL?

Another important difference between MRI & other interpreters is that MRI has something called the GIL (Global Interpreter Lock).

The GIL is something used internally by MRI to simplify some multi-threading code, but this also has an impact on the code you write.

But before I expand on that, let me give you some required background on concurrency theory.

Threads can work in two ways: concurrency or parallelism.

Concurrency means that, while you can have multiple tasks active, only one can use the CPU at any given moment. The tasks take turns, similar to how process scheduling works. This is what you get with MRI: the job of the GIL is to only let one thread run Ruby code at a time.

On the other hand you have parallelism. This is full-on multi-threading, where you can have multiple tasks running at the same time. For threads, this is the only way to take advantage of multi-core or multi-CPU systems.

So you may be asking, why is the GIL a thing? Well, concurrency is hard & there are many things that can go wrong (like deadlocks & race conditions). So Matz decided many years ago (when multi-threading was not as prevalent) to include the GIL to avoid most of these issues.

In summary, what this means to you as a Ruby developer:

  • You can still use Threads in MRI & they are still very effective for IO-heavy workloads.
  • If you need true parallelism you may want to try Rubinius or JRuby as your main interpreter.
  • A goal for Ruby 3.0 is to remove the GIL, so you may not need to switch interpreters anyway 🙂
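
To make the first point concrete, here is a small sketch that compares running a few blocking calls one after another against running them in threads. I use `sleep` as a stand-in for real IO (an assumption), and the helper names are made up for this example. MRI lets other threads run while one is blocked, so the threaded version should finish in roughly the time of a single call.

```ruby
require "benchmark"

# Stand-in for a blocking IO call (network request, disk read, ...).
def fake_io(seconds = 0.2)
  sleep(seconds)
end

# Runs the fake IO calls one after another; returns elapsed seconds.
def run_sequential(count)
  Benchmark.realtime { count.times { fake_io } }
end

# Runs each fake IO call in its own thread; returns elapsed seconds.
def run_threaded(count)
  Benchmark.realtime do
    count.times.map { Thread.new { fake_io } }.each(&:join)
  end
end

puts format("sequential: %.2fs", run_sequential(4))
puts format("threaded:   %.2fs", run_threaded(4))
```

If you swap `fake_io` for CPU-heavy work you should see much less of a difference on MRI, because only one thread can run Ruby code at a time.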

Conclusion

In this post you have learned about the different Ruby interpreters available (MRI, Rubinius & JRuby) and how they differ from each other.

If you found this useful don’t forget to click on those share buttons 🙂

What’s Happening in Your Ruby Application?

What would you do if you wanted to know what’s going on with your Ruby application?

In Ruby we don’t have fancy tools like Java, but we have the ObjectSpace module which can give you some information about the current state of your application.

Counting Objects

Using ObjectSpace you can know what objects are currently ‘alive’ in your program.

What does it mean for an object to be alive? An object is alive as long as it has any references pointing to it. A reference is just a way to access the object, like a variable or a constant. If an object can’t be reached then it means that it’s safe to be removed from memory.

Example:

Now let’s see an example of ObjectSpace in action:
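
The original listing is not preserved in this copy, so here is a minimal sketch of the idea. It walks every live object with `ObjectSpace.each_object` and tallies them by class. The `top_classes` helper name is my own.

```ruby
# Returns [class, count] pairs for the classes with the most live objects.
def top_classes(limit = 10)
  counts = Hash.new(0)
  ObjectSpace.each_object(Object) { |obj| counts[obj.class] += 1 }
  counts.sort_by { |_klass, count| -count }.first(limit)
end

# Print a simple two-column table: class name, then object count.
top_classes.each do |klass, count|
  puts format("%-30s %d", klass, count)
end
```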

This will print a table with the object count for your top-10 classes.

If you suspect a memory leak you could log this data every hour & find out if there is some object count that keeps increasing but never goes down.

Fun with Objects

When using ObjectSpace you get access to the actual objects, not just information about them, so you can do some fun things like printing the value of all the strings or printing the path of all your File objects.

Example:
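
The original listing is missing here, so this is a minimal sketch under my own naming. `strings_by_size` collects every live String and sorts by length. To keep the output readable, the demo only prints the five longest strings, truncated to 40 characters.

```ruby
# Returns every live String object, shortest first.
def strings_by_size
  ObjectSpace.each_object(String).sort_by(&:size)
end

# Show only the five longest strings to avoid flooding the terminal.
strings_by_size.last(5).each do |string|
  puts "#{string.size}: #{string[0, 40].inspect}"
end
```

Remove the `last(5)` call if you really want to see all of them.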

This will print all the in-memory strings, sorted by size. You will notice that there are many strings that you didn’t create yourself, they are created by the Ruby interpreter.

Practical uses? Well, this is mostly for debugging & gathering stats about your app 🙂

Object Memory Size

Another thing you can do is to use ObjectSpace.memsize_of to find the memory size of a particular object.

Example:
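
The original listing is missing, so here is a minimal sketch. Note that `memsize_of` lives in the `objspace` extension, which you have to require. I left out expected values on purpose because the exact numbers depend on your Ruby version & platform.

```ruby
require "objspace"

puts ObjectSpace.memsize_of("hello")     # a short string
puts ObjectSpace.memsize_of([1, 2, 3])   # a small array
puts ObjectSpace.memsize_of(42)          # small integers are not heap objects
```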

One thing to keep in mind is this warning from the documentation:

“Note that the return size is incomplete. You need to deal with this information as only a HINT.”

If you try this method with different types of objects you will find some interesting things, like Fixnums always returning 0.

The reason for this is that Ruby doesn’t internally create Fixnum objects. You can learn more about this in the post I wrote about numbers in Ruby.

Another interesting case is strings:
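
The original listing & its output are not preserved, so this is only a sketch of the experiment. The `string_memsize` helper and the list of lengths are my own choices. The sizes you get, and the exact length where they jump, depend on your Ruby version.

```ruby
require "objspace"

# Memory size reported for a string made of `length` copies of "A".
def string_memsize(length)
  ObjectSpace.memsize_of("A" * length)
end

[1, 10, 20, 23, 24, 30, 100].each do |length|
  puts "#{length} chars: #{string_memsize(length)} bytes"
end
```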

I use "A" * size as a way to create a longer string without having to type it out 🙂

Wait! What just happened? Well, it turns out that Ruby has a built-in optimization for strings shorter than 24 characters, which is why there is a jump in memory use after that. You can see this in more detail in this post from Pat Shaughnessy.

Finding Aliased Methods

Wouldn’t it be nice if there was a ‘master’ list of all the aliased methods in Ruby?

Wish granted! Take a look at this:
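
The listing is not preserved in this copy. The sketch below is not the Stack Overflow code. It is my own take on the same idea, and it uses a different technique: `UnboundMethod#original_name`, which returns the name a method was originally defined with. The `Greeter` class is a made-up example.

```ruby
class Module
  # Maps each original method name to the alias names that point at it.
  def aliased_methods
    instance_methods.each_with_object({}) do |name, found|
      original = instance_method(name).original_name
      (found[original] ||= []) << name unless original == name
    end
  end
end

# Hypothetical class, only here to have a known alias to look for.
class Greeter
  def hello
    "hello"
  end
  alias_method :hi, :hello
end

p Greeter.aliased_methods
```

One limitation of this approach: it only finds methods created with `alias` or `alias_method`. Two methods that were defined separately but share an implementation will not show up.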

I got this code from a Stack Overflow answer. It defines an aliased_methods method on the Module class, which uses the instance_methods method to get a list of all the instance methods defined on a class.

I know that may sound a bit confusing, but that’s metaprogramming for you!

Here is the rest of the code, which builds an array of all the class names that have at least one ‘alive’ object, then it calls aliased_methods on every class & prints the output.

This is what the output looks like:

Conclusion

I hope you enjoyed learning about the cool things you can do with ObjectSpace, now go try it out and let me know if you find anything interesting!

Don’t forget to share this post with all your programmer friends, it will help them learn something new & it will help me get more readers 🙂

Build Your Own Web Server

Have you ever built your own web server? I think this is a great learning exercise & in this post you will learn how to do this, step-by-step!

Listening For Connections

So where do we start? The first thing that we need is to listen for new connections on TCP port 80. I already wrote a post about network programming in Ruby, so I’m not going to explain how that works here.

I’m just going to give you the code:
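
The original listing is missing from this copy, so here is a minimal sketch. To keep it runnable without root privileges (and without blocking forever) the demo asks the OS for a free port, sends itself one request & then shuts down. The helper names are mine. A real server would use a fixed port like 8080 and call `accept_one` inside a `loop do ... end`.

```ruby
require "socket"

# Reads one request head (request line + headers) up to the blank line.
def read_request(client)
  lines = []
  while (line = client.gets)
    break if line == "\r\n"
    lines << line
  end
  lines.join
end

# Accepts a single connection, returns the raw request text, then hangs up.
def accept_one(server)
  client = server.accept
  read_request(client)
ensure
  client&.close
end

# Demo: port 0 asks the OS for any free port.
server = TCPServer.new("127.0.0.1", 0)
port = server.addr[1]

# Plays the role of the browser: sends a single GET request.
browser = Thread.new do
  TCPSocket.open("127.0.0.1", port) do |socket|
    socket.write("GET /index.html HTTP/1.1\r\nHost: localhost\r\n\r\n")
  end
end

puts accept_one(server)
browser.join
server.close
```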

When you run this code you will have a server that accepts connections on port 80. It doesn’t do much yet, but it will allow you to see what an incoming request looks like.

Note: To use port 80 in a Linux/Mac system you will need root privileges. As an alternative, you can use another port above 1024. I like 8080 🙂

An easy way to generate a request is to just use your browser or something like curl.

When you do that you will see this printed in your server:

This is an HTTP request. HTTP is a plain-text protocol used for communication between web browsers and web servers.

The official protocol specification can be found here: https://tools.ietf.org/html/rfc7230.

Parsing The Request

Now we need to break down the request into smaller components that our server can understand.

To do that we can build our own parser or use one that already exists. We are going to build our own so we need to understand what the different parts of the request mean.

This image should help:

http://i.imgur.com/WEhYtyK.png

The headers are used for things like browser caching, virtual hosting and data compression, but for a basic implementation we can ignore them & still have a functional server.

To build a simple HTTP parser we can take advantage of the fact that the request data is separated via new lines (\r\n). We are not going to do any error or validity checking to keep things simple.

Here is the code I came up with:
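
The original parser is not preserved here, so this is a minimal sketch of the approach described above: split on lines, take the method, path & version from the first line, then read `Name: value` pairs until the blank line. The method names and the hash keys are my own assumptions.

```ruby
# Parses "Name: value" header lines into a hash, stopping at the blank line.
def parse_headers(lines)
  headers = {}
  lines.each do |line|
    break if line == "\r\n"
    name, value = line.split(":", 2)
    next unless value # skip lines without a colon
    headers[name] = value.strip
  end
  headers
end

# Turns raw request text into a hash. No validation, to keep things simple.
def parse(request)
  lines = request.lines
  method, path, version = lines.first.split
  {
    method: method,
    path: path,
    version: version,
    headers: parse_headers(lines.drop(1))
  }
end

SAMPLE_REQUEST = "GET /index.html HTTP/1.1\r\nHost: localhost:8080\r\nAccept: */*\r\n\r\n"
p parse(SAMPLE_REQUEST)
```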

This will return a hash with the parsed request data. Now that we have our request in a usable format we can build our response for the client.

Preparing & Sending The Response

To build the response we need to see if the requested resource is available. In other words, we need to check if the file exists.

Here is the code I wrote for doing that:

There are two things happening here. First, if the path is set to / we assume that the file we want is index.html. Second, if the requested file is found, we are going to send the file contents with an OK response.

But if the file is not found then we are going to send the typical 404 Not Found response.

This table contains the most common response codes:

Code  Description
200   OK
301   Moved Permanently
302   Found
304   Not Modified
400   Bad Request
401   Unauthorized
403   Forbidden
404   Not Found
500   Internal Server Error
502   Bad Gateway
502 Bad Gateway

Here are the “send” methods that are used in the last example:

And here is the Response class:

The response is built from a template & some string interpolation.
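
The original listings for this section are missing, so here is a minimal sketch that combines the pieces described above: a Response class built with string interpolation, plus the "does the file exist?" check. The class layout, the `prepare_response` name and passing the root directory as an argument (instead of a SERVER_ROOT constant) are my assumptions. It does not sanitize the path, so it has the same weakness discussed in the security section.

```ruby
require "tmpdir"

class Response
  REASONS = { 200 => "OK", 404 => "Not Found" }.freeze

  attr_reader :code

  def initialize(code:, data: "")
    @code = code
    @data = data
  end

  # Builds the raw HTTP response: status line, one header, blank line, body.
  def to_s
    "HTTP/1.1 #{@code} #{REASONS.fetch(@code)}\r\n" \
    "Content-Length: #{@data.bytesize}\r\n" \
    "\r\n" \
    "#{@data}"
  end
end

# request is a hash with a :path key; root plays the role of SERVER_ROOT.
def prepare_response(request, root)
  path = request.fetch(:path)
  path = "/index.html" if path == "/"
  full_path = File.join(root, path)

  if File.file?(full_path)
    Response.new(code: 200, data: File.binread(full_path))
  else
    Response.new(code: 404)
  end
end

# Demo with a throwaway directory so the sketch runs anywhere.
Dir.mktmpdir do |root|
  File.write(File.join(root, "index.html"), "<h1>Hello</h1>")
  puts prepare_response({ path: "/" }, root)
  puts prepare_response({ path: "/missing.html" }, root)
end
```

To send it, you would write `response.to_s` to the client socket before closing the connection.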

At this point we just need to tie everything together in our connection-accepting loop and then we should have a functional server.

Try adding some HTML files under the SERVER_ROOT directory and you should be able to load them from your browser. This will also serve any other static assets, including images.

Of course a real web-server has many more features that we didn’t cover here.

Here is a list of some of the missing features, so you can implement them on your own as an exercise (practice is the mother of skill!):

  • Virtual hosting
  • Mime types
  • Data compression
  • Access control
  • Multi-threading
  • Request validation
  • Query string parsing
  • POST body parsing
  • Browser caching (response code 304)
  • Redirects

A Lesson on Security

Taking input from a user & doing something with it is always dangerous. In our little web server project, the user input is the HTTP request.

We have introduced a little vulnerability known as “path traversal”. People will be able to read any files that our web server user has access to, even if they are outside of our SERVER_ROOT directory.

This is the line responsible for this issue:

You can try to exploit this issue yourself to see it in action. You will need to make a “manual” HTTP request, because most HTTP clients (including curl) will pre-process your URL and remove the part that triggers the vulnerability.

One tool you can use is called netcat.

Here is a possible exploit:

This will return the contents of the /etc/passwd file if you are on a Unix-based system. This works because a double dot (..) allows you to go one directory up, so you are “escaping” the SERVER_ROOT directory.

One possible solution is to “compress” multiple dots into one:
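
The original one-liner is missing here, so this is a minimal sketch of the idea using a regular expression that matches any run of two or more dots. `sanitize_path` is a made-up helper name. The last line shows a naive string replacement for comparison.

```ruby
# Collapses any run of two or more dots into a single dot.
def sanitize_path(path)
  path.gsub(/\.{2,}/, ".")
end

puts sanitize_path("/../../etc/passwd")          # => /././etc/passwd
puts sanitize_path("/.../.../etc/passwd")        # => /././etc/passwd

# Naive version: replaces pairs of dots only, so triple dots survive as "..".
puts "/.../.../etc/passwd".gsub("..", ".")       # => /../../etc/passwd
```

This is a teaching example. For real applications you would also want to expand the final path and check that it still starts with your root directory.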

When thinking about security always put your “hacker hat” on & try to find ways to break your solution. For example, if you just did path.gsub!("..", "."), you could bypass that by using triple dots (...).

Summary

In this post you learned how to listen for new connections, what an HTTP request looks like & how to parse it. You also learned how to build the response using a response code and the contents of the required file (if available).

And finally you learned about the “path traversal” vulnerability & how to avoid it.

I hope you enjoyed this post & learned something new! Don’t forget to subscribe to my newsletter on the form below, so you won’t miss a single post 🙂

Behind The Scenes: How Numbers Work in Ruby

Ruby 2.4 will be merging both Fixnum & Bignum into the same class (Integer) so I think this is a good time to review the different number types in Ruby!

And that’s what we are going to talk about in this post 🙂

An Overview of Number Types

Let’s start by taking a look at the class hierarchy of all the number related classes in Ruby:

As you can see, the Numeric class is the parent for all the number classes. Remember that you can use the ancestors method to discover the parent classes for any class.

Example:
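
The original listing is missing, so here is a minimal sketch. I use Integer instead of Fixnum so that it also runs on Ruby versions where Fixnum & Bignum no longer exist. The exact list of ancestors depends on your Ruby version and on which libraries are loaded, so I’m not showing expected output.

```ruby
[Integer, Float, Rational, Complex].each do |klass|
  puts "#{klass}: #{klass.ancestors.inspect}"
end
```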

Now let’s see these classes in table form:

Class       Description                                        Example
Fixnum      Normal numbers that fit into the OS integer type   1
Bignum      Used for bigger numbers                            111111111111
Float       Imprecise decimal numbers                          5.0
Complex     Used for math stuff with imaginary numbers         (1+0i)
Rational    Used to represent fractions                        (2/3)
BigDecimal  Perfect precision decimal numbers                  3.0

Float Imprecision

You may have noticed that the description for the Float class says “imprecise”. What does that mean?

Let me show you with an example:

Why is this false? Let’s look at the result of 0.2 + 0.1.
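
The comparison and the sum discussed above can be reproduced like this. The `nearly_equal?` helper is my own addition, a common workaround that compares floats with a small tolerance instead of `==`.

```ruby
puts 0.1 + 0.2 == 0.3   # => false
puts 0.2 + 0.1          # => 0.30000000000000004

# Hypothetical helper: treat two floats as equal if they are very close.
def nearly_equal?(a, b, tolerance = Float::EPSILON)
  (a - b).abs < tolerance
end

puts nearly_equal?(0.1 + 0.2, 0.3)   # => true
```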

And that’s what I mean by imprecision! The reason this happens is because of the way that a float is stored. If you need decimal numbers that are always accurate you can use the BigDecimal class.

Example:
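
The original listing is missing, so here is a minimal sketch of the same sum using BigDecimal. Notice that the values are passed as strings. If you passed Float literals you would carry the imprecision over.

```ruby
require "bigdecimal"

sum = BigDecimal("0.1") + BigDecimal("0.2")

puts sum == BigDecimal("0.3")   # => true
puts sum.to_s("F")              # => 0.3
```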

Why don’t we always use BigDecimal then? Because it’s a lot slower!

Here is a benchmark:

BigDecimal is 12 times slower than Float, and that’s why it’s not the default 🙂

Fixnum vs Bignum

In this section I want to explore the differences between Fixnum and Bignum.

Let’s start with some code:

Ruby creates the correct class for us, and it will automatically promote a Fixnum to a Bignum when necessary.

Note: You may need a bigger number to get a Bignum object if you have a 64-bit Ruby interpreter.

Why do we need different classes? The answer is that to work with bigger numbers you need a different implementation, and working with big numbers is slower, so we end up with a similar situation to Float vs BigDecimal.

The Fixnum class also has some special properties. For example, the object id is calculated using a formula.

The formula is: (number * 2) + 1.

But there is more to this: when you use a Fixnum there is no object being created at all. There is no data to store in a Fixnum, because the value is derived from the object id itself. This is just an implementation detail, but I think it’s interesting to know 🙂

MRI (Matz’s Ruby Interpreter) uses these two macros to convert between value & object id:

What happens here is called “bit shifting”, which moves all the bits to the left or the right. Shifting one position to the left is equivalent to multiplying by 2 & that’s why the formula is (number * 2) + 1. The +1 comes from the FIXNUM_FLAG.
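
The C macros are not reproduced in this copy, but the same bit shifting can be sketched in Ruby. The two method names are made up for this example, and the constant simply mirrors the flag value of 1 mentioned above. The comparison with `object_id` only holds on MRI.

```ruby
FIXNUM_FLAG = 1

# Shift left one position (multiply by 2), then set the lowest bit.
def int_to_object_id(number)
  (number << 1) | FIXNUM_FLAG
end

# Shift right one position, which drops the flag bit again.
def object_id_to_int(id)
  id >> 1
end

puts int_to_object_id(5)     # => 11
puts 5.object_id             # => 11 on MRI
puts object_id_to_int(11)    # => 5
```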

In contrast, Bignum works more like a normal class & uses normal object ids:

All this means is that Fixnum objects are closer to symbols in terms of how they work at the interpreter level, while Bignum objects are closer to strings.

Summary

In this post you learned about the different number-related classes that exist in Ruby.

You learned that floats are imprecise, and that you can use BigDecimal if accuracy is a lot more important than performance. And after that you learned that Fixnum objects are special at the interpreter level, but Bignums are just regular objects.

If you found this post interesting don’t forget to sign-up to my newsletter in the form below 🙂

Writing a Shell in 25 Lines of Ruby Code

If you use Linux or Mac, every time you open a terminal you are using a shell application. A shell is just an interface that helps you execute commands in your system.

In addition to that, the shell also hosts environment variables & has useful features like a command history and auto-completion.

If you are the kind of person that likes to learn how things work under the hood, this post will be perfect for you!

How Does a Shell Work?

To build our own shell application let’s think about what a shell really is: first, there is a prompt, usually with some extra information like your current user & current directory, then you type a command & when you press enter the results are displayed on your screen.

Yeah, that sounds pretty basic, but doesn’t this remind you of something?

If you are thinking of pry then you are right! A shell is basically a REPL (Read-Eval-Print Loop) for your operating system.

So, knowing that, we can write the first version of our shell:
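
The original listing is not preserved here, so this is a minimal sketch of a read-run-repeat loop. The `run_shell` name, the `exit` command and the input/output parameters are my own assumptions. The parameters exist so the demo can feed it canned input instead of waiting on your keyboard.

```ruby
require "stringio"

# Reads commands from `input`, runs each one, and returns the list of
# commands that were executed. Stops on "exit" or end of input.
def run_shell(input: $stdin, output: $stdout)
  executed = []
  loop do
    output.print "> "
    line = input.gets
    break if line.nil?

    command = line.strip
    break if command == "exit"
    next if command.empty?

    system(command)
    executed << command
  end
  executed
end

# Demo with scripted input. Call run_shell with no arguments to type commands yourself.
run_shell(input: StringIO.new("echo hello from the sketch\nexit\n"))
```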

This will give us a minimal, but functional shell. We can improve this by using a library that many other REPL-like applications use. That library is called Readline.

Using The Readline Library

Readline is part of the Ruby Standard Library, so there is nothing to install, you just need to require it.

One of the advantages of using Readline is that it can keep a command history automatically for us. It can also take care of printing the command prompt & many other things.

Here is v2 of our shell, this time using Readline:

This is great, we got rid of the two puts for the prompt & now we have access to some powerful capabilities from Readline. For example, we can use keyboard shortcuts to delete a word (CTRL + W) or even search the history (CTRL + R)!

Let’s add a new command to print the full history:
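
The original listing is missing, so here is a minimal sketch of a Readline-based loop with a `hist` command. The method names are mine, and `readline_shell` is only defined, not called, because it waits for keyboard input. Passing `true` as the second argument to `Readline.readline` adds each line to the history.

```ruby
# Prints the history as a numbered list. `history` can be anything Enumerable.
def print_history(history, output = $stdout)
  history.each_with_index do |command, index|
    output.puts "#{index + 1}  #{command}"
  end
end

# Interactive loop: call readline_shell yourself to try it out.
def readline_shell
  require "readline"

  while (input = Readline.readline("> ", true))
    case input
    when "exit" then break
    when "hist" then print_history(Readline::HISTORY)
    else system(input)
    end
  end
end

# Non-interactive demo of the history printer.
print_history(["ls", "pwd", "hist"])
```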

Fun fact: If you try this code in pry you will get pry’s command history! The reason is that pry is also using Readline, and Readline::HISTORY is shared state.

Now you can type hist to get your command history 🙂

Adding Auto-Completion

Thanks to the auto-completion feature of your favorite shell you will be able to save a lot of typing. Readline makes it really easy to integrate this feature into your shell.

Let’s start by auto-completing commands from our history.

Example:

With this code you should be able to auto-complete previously typed commands by pressing the <tab> key. Now let’s take this a step further & add directory auto-completion.

Example:

The completion_proc returns the list of possible candidates, in this case we just need to check if the typed string is part of a directory name by using Dir.glob. Readline will take care of the rest!
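
The original listings for both completion examples are missing, so here is a minimal sketch that covers history & directory completion. All three method names are my own. The Readline hookup lives in `install_completion`, which you would call once before starting the shell loop.

```ruby
# Previously typed commands that start with the typed text.
def complete_from_history(history, input)
  history.select { |command| command.start_with?(input) }.uniq
end

# Directories in the current folder whose name starts with the typed text.
def complete_directories(input)
  Dir.glob("#{input}*").select { |path| File.directory?(path) }.sort
end

# Tells Readline to use both sources when <tab> is pressed.
def install_completion
  require "readline"

  Readline.completion_proc = proc do |input|
    complete_from_history(Readline::HISTORY.to_a, input) + complete_directories(input)
  end
end

p complete_from_history(["ls -la", "pwd", "ls /tmp"], "ls")   # => ["ls -la", "ls /tmp"]
```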

Implementing The System Method

Now you should have a working shell, with history & auto-completion, not too bad for 25 lines of code 🙂

But there is something that I want to dig deeper into, so you can get some insights on what is going on behind the scenes of actually executing a command.

This is done by the system method. In C, this method just sends your command to /bin/sh, which is a shell application. Let’s see how you can implement what /bin/sh does in Ruby.

Note: This will only work on Linux / Mac 🙂

The system method:
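
The original listing is missing, so here is a minimal sketch of the fork + exec pattern. `my_system` is a made-up name, and this is a simplified stand-in, not how Ruby’s own `system` is implemented.

```ruby
# Runs `command` in a child process and waits for it to finish.
# Returns true if the command exited successfully, false otherwise.
def my_system(command)
  pid = fork { exec(command) }  # child: replace itself with the command
  Process.wait(pid)             # parent: wait until the child is done
  $?.success?
end

puts my_system("ls")
```

The parent process keeps running after the command finishes, which is exactly what a shell needs.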

What happens here is that fork creates a new copy of the current process, then this process is replaced by the command we want to run via the exec method. This is a very common pattern in Linux programming.

If you don’t fork then the current process is replaced, which means that when the command you are running (ls, cd or anything else) is done then your Ruby program will terminate with it.

You can see that happening here:

Conclusion

In this post you learned that a shell is a REPL-like interface (think irb / pry) for interacting with your system. You also learned how to build your own shell by using the powerful Readline library, which provides many built-in features like history & auto-completion (but you have to define how that works).

And after that you learned about the fork + exec pattern commonly used in Linux programming projects.

If you enjoyed this post could you do me a favor & share it with all your Ruby friends? It will help the blog grow & more people will be able to learn 🙂