How to Make Your Classes More Powerful by Implementing Equality

6 days ago /
By Jesus Castello

How do you compare two things in Ruby? Using == as you already know… but did you know that == is a method & not just syntax?

You can implement this method in your own classes to make them more powerful. And that’s what I want to talk about in this post.

Equality Basics

As you know you can compare two strings like this:

    "foo" == "foo"

And if the content is equal then this will evaluate to true. This works because the String class implements a == method that knows how to compare strings.

But what if String didn’t implement ==?

Then Ruby would use Object’s implementation of ==, which defaults to testing for object identity, instead of object contents.

Example:

    Object.new == Object.new # false
    String.new == String.new # true

Implementing Equality

Now let’s use what you just learned to make your own classes more powerful by being able to compare them.

Thanks to the == method you can define exactly what it means for two instances of your own class to be equal.

Example:

    class Product
      attr_reader :name, :price

      def initialize(name, price)
        @name, @price = name, price
      end

      def ==(other)
        self.name == other.name &&
        self.price == other.price
      end
    end

    p1 = Product.new('book', 49)
    p2 = Product.new('book', 49)

    p1 == p2 # true

The == method says that both the name and the price must be the same for two Product objects to be considered equal.
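One thing the version above does not handle is comparing a product against something that is not a product (like nil), which would raise a NoMethodError. Here is a minimal sketch of one way to guard against that. The GuardedProduct name and the is_a? check are my own additions for illustration, not part of the original example.

```ruby
# Hypothetical variation on the Product class: the is_a? guard is an addition.
class GuardedProduct
  attr_reader :name, :price

  def initialize(name, price)
    @name, @price = name, price
  end

  def ==(other)
    # Return false early for objects that are not products at all.
    other.is_a?(GuardedProduct) &&
      name == other.name &&
      price == other.price
  end
end

book      = GuardedProduct.new('book', 49)
same_book = GuardedProduct.new('book', 49)
pen       = GuardedProduct.new('pen', 2)

book == same_book # true
book == pen       # false
book == nil       # false, instead of raising NoMethodError
```

Whether you want this guard depends on how strict you want equality to be; some people prefer checking the exact class instead of using is_a?.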

Remember:

If you don’t implement this method (or use the Comparable module, which I explain in my Ruby book) the two objects will be compared using their object IDs, instead of their values.

Also I should mention that if you use a Struct it already implements == for you.
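To see the Struct behavior in action, here is a short sketch. The Point struct is just a made-up example:

```ruby
# A hypothetical struct used only for illustration.
Point = Struct.new(:x, :y)

a = Point.new(1, 2)
b = Point.new(1, 2)
c = Point.new(1, 3)

a == b       # true, because all the members are equal
a == c       # false, because y is different
a.equal?(b)  # false, they are still two different objects
```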

What About Triple Equals?

You may be wondering if == is a method, is === also a method? And the answer is yes 🙂

So what’s the difference between the two?

In JavaScript there is a clear difference, where == will try to convert the object types to be the same if they aren’t (1 vs '1'). And === is for ‘strict’ equality.

But in Ruby there is no such thing. What === means depends on the class implementing it.

In many cases it is just an alias for ==.

Like in String and Object.

Here’s a table of built-in classes which give === a special meaning:

Class	Meaning
Range	Returns true if obj is an element of the range, false otherwise.
Regexp	Matches the regexp against a string.
Module	Returns true if obj is an instance of mod or an instance of one of mod’s descendants.
Proc	Calls the proc with obj as its argument, like Proc#call. This lets a proc object be the target of a when clause in a case statement.
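The place where you will see === most often is the case statement, since every when clause calls === on its argument. This is a small sketch that touches each row of the table; the describe method and its return strings are made up for the example.

```ruby
# Hypothetical helper: each `when` below calls === on the object to its right.
def describe(value)
  case value
  when 1..9     then "single digit"     # Range#===
  when /\Afoo/  then "starts with foo"  # Regexp#===
  when Symbol   then "a symbol"         # Module#===
  when ->(v) { v.respond_to?(:empty?) && v.empty? }
    "empty"                             # Proc#===
  else
    "something else"
  end
end

describe(5)        # "single digit"
describe("foobar") # "starts with foo"
describe(:abc)     # "a symbol"
describe([])       # "empty"
describe(42)       # "something else"
```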

Conclusion

In this post you learned how to make your classes more powerful by implementing the == method. You also learned the difference between == and ===.

Don’t forget to share this post so more people can see it 🙂

15 Weird Things About Ruby That You Should Know

a couple of weeks ago /
By Jesus Castello

Ruby is an amazing language with a lot of interesting details that you may not have seen before...

…in this post I compiled some of those details for your own enjoyment in a nice-looking list :)

1. Heredoc + Method

If you have some data that you want to embed into your program you may want to use a “heredoc”.

Like this:

    input = <<-IN
    ULL
    RRDDD
    LURDL
    IN

This will give you a string. But you may want to do some post-processing, like splitting this string into an array of strings.

Ruby lets you do this:

    input = <<-IN.split
    ULL
    RRDDD
    LURDL
    IN

Bonus tip:
Ruby 2.3 introduced the "squiggly heredoc" <<~. This will remove all the extra spaces introduced by indentation, which is a common problem when using heredocs for text.
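Here is a quick sketch comparing the two heredoc styles when the text is indented inside a method. The method names are made up for the example:

```ruby
# With <<- only the closing delimiter may be indented; the text keeps its spaces.
def dash_heredoc
  <<-TEXT
    ULL
    RRDDD
  TEXT
end

# With <<~ the common leading indentation is removed from every line.
def squiggly_heredoc
  <<~TEXT
    ULL
    RRDDD
  TEXT
end

dash_heredoc     # "    ULL\n    RRDDD\n"
squiggly_heredoc # "ULL\nRRDDD\n"
```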

2. Call a Method Using Double Colon

Apparently this is a thing…

    "abc"::size
    # 3

    [1,2,3,4,5]::size
    # 5

3. Puts with Multiple Arguments

Pretty simple, but could be useful in some situations I guess.

    puts 1,2,3
    # 1
    # 2
    # 3

4. Infinite Indexing

Example:

    words = ["abc", "foo"]
    words[0][0][0][0][0]

This works because [] is just a method & it keeps returning the first character, which is also a string.

5. De-structuring Block Arguments (or whatever you want to call this)

Want to get rid of some local variables? You will love this trick!

    a = [[1,2],[3,4]]

    a.each do |(first, last)|
      # ...
    end

This has the same effect as if we did this:

    a = [[1,2],[3,4]]

    a.each do |sub_array|
      first, last = sub_array
      # ...
    end

But it saves you one line of code 🙂

6. Special Global Variables

When you use a regular expression with capture groups it will set the global variable $1 to the first group, $2 to the second group, etc.

The thing about these is that they don't behave like normal variables. They are ‘method-local’ & ‘thread-local’, as described by the documentation.

Also they can’t be directly assigned to like regular global variables.

    $1 = 'test'
    # SyntaxError: (eval):2: Can't set variable $1
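To illustrate the ‘method-local’ part, here is a small sketch. The extract_year method and the sample strings are made up; the point is that a match inside a method does not change $1 for the caller.

```ruby
# Hypothetical helper: runs its own regexp match and returns the first group.
def extract_year(text)
  text =~ /(\d{4})-(\d{2})/
  $1
end

"version 7" =~ /(\d)/                    # sets $1 at this level
year  = extract_year("released 2016-12") # sets $1 only inside the method
outer = $1                               # the caller's $1 is untouched

year  # "2016"
outer # "7"
```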

7. Shovel Method on Strings

There is this “shovel” method on String which doesn’t do what you might expect when you use it with a number…

    "" << 97
    # "a"

…it’s interpreting the number as a character code (a codepoint, which matches ASCII for values below 128).

Here’s another way to do that:

    97.chr
    # "a"

8. Character Literals

Not sure if there are any practical uses for this one…

    ?a
    # "a"

    ?aa
    # Syntax error

Let me know in the comments what you think 🙂

9. The RbConfig Module

RbConfig is a module which is not documented & it contains some info about your Ruby installation.

    RbConfig.constants
    # [:TOPDIR, :DESTDIR, :CONFIG, :MAKEFILE_CONFIG]

There is some useful info under RbConfig::CONFIG (like compile flags, ruby version & operating system).

    RbConfig::CONFIG['host_os']
    # "linux-gnu"

    RbConfig::CONFIG['ruby_version']
    # "2.4.0"

10. Spaces, Spaces Everywhere!

You can put as many spaces as you want between a method call & the receiver of that call.

    a = [1,2,3]

    a [0]
    a .size
    a . empty?

Yes, this is valid Ruby syntax 🙂

11. Infinite Nesting of Constants

You can have an infinite amount of nested constants like this:

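    String::String::Fixnum::Float

The reason this works is that all top-level constants (defined outside any class) are contained in the Object class & every class inherits from Object by default. Note that this depends on your Ruby version: Ruby 2.5 removed this kind of top-level constant lookup through a class, and Fixnum itself was removed in Ruby 3.2.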

Try Object.constants to see what I mean.

12. Chaining the Shovel Operator

You can chain the shovel << operator multiple times:

    a = []

    a << 1 << 2 << 3

    # a = [1, 2, 3]

13. BEGIN & END

Two keywords that you don’t see very often are BEGIN & END. I believe these come from the Perl / Unix world, where it’s common to write short scripts for processing output from other programs.

Let’s see an example of how this works:

    puts 123

    BEGIN {
      puts "Program starting..."
    }

This code will print "Program starting..." before it prints 123. It could be useful if you are writing the kind of short scripts that this is meant for, but probably not very useful in web applications.

Update:
Reader Ronald sent me some interesting uses for this trick. Here is what he said:

"It is very useful, for example for fiddling with the RUBYLIB path for the 'require' statements, because it is guaranteed to be executed before all the 'require'. I also use it to set $VERBOSE to true, or set some environment variables, etc."

14. Flip-Flop

I don’t even know why this is a thing, but I would advise staying away from it because it can be confusing & most people are not familiar with this feature.

But it could be useful to know in case you find this in other people’s code.

This is the syntax:

    if (condition)..(condition)
      # do something
    end

The idea is that once the first condition is true an invisible “switch” will turn on & everything from there will evaluate as true until the 2nd condition is true.

Example:

    (1..20).each do |i|
      puts i if (i == 3)..(i == 15)
    end

This prints all the numbers from 3 to 15. If the second condition never becomes true (for example, because 15 is skipped), it keeps printing until the loop ends.
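If you do run into a flip-flop in the wild, it will probably be used to grab a section of text between two markers. Here is a sketch of that pattern; the lines array and the marker strings are made up for illustration.

```ruby
lines = ["intro", "BEGIN", "a", "b", "END", "outro"]

# The switch turns on at "BEGIN" and turns off after "END" (both included).
selected = []
lines.each do |line|
  selected << line if (line == "BEGIN")..(line == "END")
end

# If the second condition never matches, the switch stays on until the end.
open_ended = []
lines.each do |line|
  open_ended << line if (line == "a")..(line == "missing")
end

selected   # ["BEGIN", "a", "b", "END"]
open_ended # ["a", "b", "END", "outro"]
```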

15. Redo Keyword

Another keyword that you don’t see very often is redo. It allows you to repeat the same iteration inside a loop…

    10.times do |n|
      puts n
      redo
    end

…but unless you use next or break you will have an infinite loop. So I think you should not use this feature.
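For completeness, this is a sketch of what a bounded redo could look like, where one iteration is repeated exactly once. The attempts counter and the "fail on the second attempt" condition are made up to simulate something that needs a retry.

```ruby
attempts = 0
results  = []

3.times do |n|
  attempts += 1
  # Pretend the first try at n == 1 fails, so that iteration runs again.
  redo if n == 1 && attempts == 2
  results << n
end

results  # [0, 1, 2]
attempts # 4, because the iteration for n == 1 ran twice
```

Without the condition after redo this would loop forever, just like the example above.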

Conclusion

You learned about a few cool Ruby tricks & tips. If you want more see my other post here.

Don’t forget to share this post so more people can see it!

Hash Tables Explained

last month /
By Jesus Castello

One of my favorite data structures is the hash table because it’s simple & powerful. You probably have used it before since it’s an efficient way to store key-value pairs.

There are some interesting computer science concepts behind the implementation of a hash table that are worth studying, and that’s exactly what we are going to do in this article!

Buckets & The Hash Function

The basic idea of a hash table is to allow us to efficiently (in O(1)) retrieve data that is indexed by a key.

As a quick refresher, this is what using a hash table looks like in Ruby:

    prices = {
      apple: 0.50,
      ice_cream: 3,
      steak: 10
    }

To implement a hash table we need two components:

A place to store the table entries
A way to assign key/value pairs to a specific position (index) inside this data store

In other words, we need an array & a hash function.

Implementing a Simple Hash Function

The hash function is an important component of a hash table. This function transforms the key into an index that can be used to lookup or update its associated value.

This is the big difference between plain arrays & hash tables.

In an array, you can only access values via their index and the index can only be a number. In a hash table, you access values via their key & the key can be anything (string, symbol, integer…), as long as you can write a hash function for it.

You can write a simple hash function for strings by converting all the letters into their ASCII values then adding them up.

Here is an example:

    BUCKETS = 32

    def hash(input)
      input.to_s.chars.inject(0) { |sum, ch| sum + ch.ord } % BUCKETS
    end

In this method we use to_s to make sure we are working with a string. This will help us avoid ‘undefined method’ errors. Then we use a combination of chars (to convert the string into an Array of its characters) & inject to add up the values.

Inside the block I used the ord method to convert the characters into their ordinal values.

Finally, I used the modulo operator % to make sure the resulting value fits into the array. We call each entry in this array a ‘bucket’.

Bucket Distribution

Ideally we want all our buckets to be filled evenly, because this will give us the best performance when we need to retrieve a value.

Let’s look at what happens when we test our hash function using this code:

    # Create an array of size BUCKETS with all elements set to 0
    table = Array.new(BUCKETS) { 0 }
    letters = Array('a'..'z')

    10_000.times do
      # Create a random string
      input = Array.new(5) { letters.sample }.join

      # Count hash distribution
      table[hash(input)] += 1
    end

This produces the following results:

    [302, 290, 299, 309, 321, 293, 316, 301, 296, 306, 340, 321, 313, 304, 318, 296, 331, 306, 348, 330, 310, 313, 298, 292, 304, 315, 337, 325, 325, 331, 319, 291]

It looks like our keys are evenly distributed…

…but what happens if we ramp up the number of buckets?

In this example I used 256 buckets (it was 32 before):

[22, 24, 33, 36, 41, 58, 61, 66, 97, 77, 88, 110, 89, 82, 123, 121, 119, 111, 147, 178, 136, 176, 144, 180, 190, 193, 185, 192, 223, 209, 208, 196, 215, 251, 233, 226, 231, 236, 219, 218, 227, 221, 206, 220, 208, 213, 201, 191, 182, 165, 188, 141, 160, 135, 130, 117, 139, 106, 121, 85, 70, 93, 74, 61, 57, 54, 40, 46, 32, 36, 30, 21, 25, 17, 14, 16, 16, 14, 8, 11, 5, 5, 1, 1, 2, 1, 3, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 2, 0, 4, 3, 6, 0, 2, 9, 13, 11, 12, 14, 12, 23, 12, 22]

That doesn’t look like a great distribution anymore!

What’s going on?

The problem is that our hash function is not good enough: it just adds up character values, so all the strings of the same length land in a narrow range of sums. That’s why we have a lot of empty buckets in the middle.
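You can check this with a bit of arithmetic. The sketch below works out the smallest & largest possible sums for a 5-letter lowercase string, then counts how many buckets can ever be reached. The char_sum and reachable_buckets helpers are made up for this example, and 256 is used as an illustrative "big" table size.

```ruby
# Sum of character values, like the simple hash function before the modulo.
def char_sum(input)
  input.each_char.inject(0) { |sum, ch| sum + ch.ord }
end

# Count how many distinct buckets a range of sums can land in.
def reachable_buckets(sums, bucket_count)
  sums.map { |sum| sum % bucket_count }.uniq.size
end

smallest = char_sum("aaaaa")       # 5 * 97  = 485
largest  = char_sum("zzzzz")       # 5 * 122 = 610
possible = largest - smallest + 1  # 126 distinct sums

reachable_buckets(smallest..largest, 32)  # 32, every bucket gets used
reachable_buckets(smallest..largest, 256) # 126, the rest stay empty
```

With only 126 possible sums, any table with more than 126 buckets is guaranteed to have empty ones, no matter how many strings you insert.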

A Better Hash Function

We need a better way to convert our string into an index. Let’s take a look at one possible implementation.

    BUCKETS = 256

    def hash(input)
      input.to_s.each_char.inject(0) do |sum, ch|
        (sum << 8) ^ (ch.ord) ^ (sum >> 4)
      end % BUCKETS
    end

What’s happening here is bit shifting (with the >> & << operators). The values are combined using the “exclusive OR operator” (^).

This bit shifting will mix things up, which will give us a better key distribution. Not perfect, but it’s better than our simple ASCII-based function 🙂

If you want a proper hash function you would be looking at something like MurmurHash, which I believe is what Ruby uses.

Handling Collisions

We don’t have a useful hash table yet.

Why?

Well, you may have noticed that when two keys hash to the same index they will overwrite the old value, and that’s not good!

This is called a hash collision & there are a few strategies to deal with these.

For example:

Double Hashing
Linear probing
Separate chaining

Let’s take a look at separate chaining, where we use a linked-list to store the entries for a particular “bucket”.

So if we assume that :abc & :ccc hash to the same index, our hash table would look something like this:

    3: [:abc, 100] -> [:ccc, 200]
    4: nil
    5: [:yx, 50]

Then we will need a linear search to find the correct key.

This will have an impact on our performance because our lookup time can slowly degrade towards O(n), instead of the expected O(1).

If you are not familiar with this O(something) notation, it’s called “Big-O Notation”.

To prevent the linked lists from growing too long & degrading the performance of the hash table, it’s necessary to recreate the hash table using a bigger number of buckets.

Ruby does this for you automatically, but it’s still good to know.
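To make the idea more concrete, here is a minimal sketch of separate chaining with resizing. It is not how Ruby implements Hash internally. I made a few simplifying assumptions: the ChainedHashTable class is made up, each chain is a plain Array instead of a linked list, it uses Ruby's built-in Object#hash instead of our own hash function, and the "grow when there are twice as many entries as buckets" rule is arbitrary.

```ruby
# Hypothetical toy hash table using separate chaining (arrays as chains).
class ChainedHashTable
  def initialize(bucket_count = 8)
    @buckets = Array.new(bucket_count) { [] }
    @size = 0
  end

  def []=(key, value)
    grow if @size >= @buckets.size * 2
    chain = bucket_for(key)
    pair  = chain.find { |k, _| k == key } # linear search inside the bucket
    if pair
      pair[1] = value                      # key exists: overwrite its value
    else
      chain << [key, value]                # new key: append to the chain
      @size += 1
    end
  end

  def [](key)
    pair = bucket_for(key).find { |k, _| k == key }
    pair && pair[1]
  end

  def bucket_count
    @buckets.size
  end

  private

  def bucket_for(key)
    @buckets[key.hash % @buckets.size]
  end

  # Recreate the table with twice as many buckets & re-insert every pair.
  def grow
    old_pairs = @buckets.flatten(1)
    @buckets  = Array.new(@buckets.size * 2) { [] }
    old_pairs.each { |k, v| bucket_for(k) << [k, v] }
  end
end

table = ChainedHashTable.new(2)
table[:abc] = 100
table[:ccc] = 200
table[:abc] = 150

table[:abc]     # 150
table[:ccc]     # 200
table[:missing] # nil
```

Starting with only 2 buckets forces collisions early, so both the chaining & the resizing code paths get exercised quickly.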

Conclusion

The purpose of this article is not to have you writing a hash table implementation, but to help you understand how they actually work, so I hope that you found that interesting!

Don’t forget to share this post to keep the blog going 🙂

Packing & Unpacking: A Guide to Reading Binary Data in Ruby

last month /
By Jesus Castello

Working with text is a lot easier than working with binary data…

…with text you can use regular expressions & methods like scan, match & gsub.

But if you want to work with binary data there is some extra work to do. That’s where the Array#pack & String#unpack methods come into play.

Let me show you some examples, starting with just a plain string & then moving on to more interesting stuff.

String to ASCII Values

This will convert every character in the string into a decimal value:

    str = "AABBCC"
    str.unpack("c*")

    # [65, 65, 66, 66, 67, 67]

Notice the "c*" argument for unpack.

This is a “format string” which tells unpack what to do with the data. In this case, c means take one character & convert it into an integer value (the String#ord method also does this).

The asterisk * just says “repeat this format for all the input data”.

Hex to Integer

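The "l" format string takes 4 bytes of data & returns a signed 32-bit integer. One thing to notice is that these bytes are read in the machine’s native byte order, which is “little-endian” on most machines today (use "l<" to ask for little-endian explicitly).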

Examples:

    "\xff\x00\x00\x00".unpack("l").first
    # 255

    "\x90\xC0\xDD\x08".unpack("l").first
    # 148750480

I used first here because unpack returns an array.

Binary File Parsing

How do you read a binary file like an EXE, PNG or GZIP?

If you treat these like strings you will just see something that looks like random data…

…but there is a documented structure for most of these file formats & the unpack method is what you would use to read that data and convert it into something useful.

Here is an example:

    binary_data = "\x05\x00\x68\x65\x6c\x6c\x6f"

    length, message = binary_data.unpack("Sa*")
    # [5, "hello"]

In this example, the binary data (represented in hexadecimal, which is way more compact than 1s & 0s) has a two-byte (16 bit) length field that contains the length of the following string. Then there is the string itself.

It is very common for binary files & binary network protocols to have a “length” field.

This tells the parser exactly how many bytes should be read (and yes, I know in this example I read both the length & the data in one step, that’s just to keep things simple).
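If you want to read the length first & then exactly that many bytes, it could look something like this sketch. The frame helper and the two sample messages are made up, and I used "S<" (explicit little-endian) instead of "S" so the result doesn't depend on the machine.

```ruby
require 'stringio'

# Hypothetical helper: prefix a message with its length as 2 little-endian bytes.
def frame(message)
  [message.bytesize].pack("S<") + message
end

# Two length-prefixed messages back to back, like in a file or network stream.
io = StringIO.new(frame("hello") + frame("abc"))

messages = []
until io.eof?
  length = io.read(2).unpack("S<").first # read the 16-bit length field
  messages << io.read(length)            # then read exactly that many bytes
end

messages # ["hello", "abc"]
```

This is the reason the length field exists: without it the parser would have no way to know where "hello" ends & the next record begins.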

There is also the bindata gem, which is built specifically to help you parse binary structures.

Here is an example:

    require 'bindata'

    class BinaryString < BinData::Record
      endian :little
      uint16 :len
      string :name, :read_length => :len
    end

Notice the read_length parameter. This will tell bindata to work out the length from the field, so this will save you a lot of work 🙂

So if you want to write a parser for any binary format, these are the steps:

Find the specification for this format (if it’s not public you will have to reverse-engineer it, which is an entire topic on its own)
Write a bindata class for every section of the file (you will usually find a header section first with metadata & then multiple data sections)
Read the data & process it however you want (for example, in a PNG you could change the colors of the image)
Profit!

If you want to see a full example of bindata in action take a look at my PNG parser on GitHub.

Base64 Encoding

There is this type of encoding called “Base64”. You may have seen it before on a URL.

Looks something like this:

    U2VuZCByZWluZm9yY2VtZW50cw==

The double equals at the end is usually the tell-tale sign that you are dealing with Base64, although some inputs can result in the equals signs not being there (they are used as padding).

So why am I telling you this, besides being a useful thing to know in itself?

Well it turns out that you can convert a string into Base64 using the pack method.

As you can see here:

    def encode64(bin)
      [bin].pack("m")
    end

    encode64 "abcd"

    # "YWJjZA==\n"

In fact, this is the exact method used in the Base64 module from the standard library 🙂
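Going in the other direction works with unpack & the same "m" format. Here is a small sketch; the decode64 name mirrors the encode64 method above but is defined here just for the example.

```ruby
# Decode Base64 with String#unpack; unpack returns an array, so take the first item.
def decode64(str)
  str.unpack("m").first
end

encoded = ["abcd"].pack("m")   # "YWJjZA==\n"
decoded = decode64(encoded)    # "abcd"

# "m0" produces the encoded string without the trailing newline.
compact = ["abcd"].pack("m0")  # "YWJjZA=="

decode64("U2VuZCByZWluZm9yY2VtZW50cw==") # "Send reinforcements"
```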

Summary

In this post you learned about the pack & unpack methods, which help you work with binary data. They can be used to parse binary files, convert a string into ASCII values & encode data as Base64.

Don’t forget to share & subscribe so you can enjoy more blog posts like this! 🙂

How To Spy on Your Ruby Methods

a couple of months ago /
By Jesus Castello

Ruby has a built-in tracing system which you can access using the TracePoint class. Some of the things you can trace are method calls, new threads & exceptions.

Why would you want to use this?

Well, it could be useful if you want to trace the execution of a certain method. You will be able to see what other methods are being called & what their return values are.

Let’s see a few examples!

Tracing Method Calls

Most of the time you will want TracePoint to trace application code & not built-in methods (like puts, size, etc).

You can do this using the call event.

Example:

    def the_method; other_method; end
    def other_method; end

    def start_trace
      trace =
        TracePoint.new(:call) { |tp| p [tp.path, tp.lineno, tp.event, tp.method_id] }

      trace.enable
      yield
      trace.disable
    end

    start_trace { the_method }

This prints the file path, the line number, the event name & the method name.

    ["test.rb", 1, :call, :the_method]
    ["test.rb", 2, :call, :other_method]

If you don’t specify any events Ruby will call your block for all of them, resulting in more output. So I would recommend that you focus on specific events to find what you want faster 🙂

Here’s a table of TracePoint events:

Event name	Description
call	Application methods
c_call	C-level methods (like puts)
return	Method return (for tracing return values & call depth)
b_call	Block call
b_return	Block return
raise	Exception raised
thread_begin	New thread
thread_end	Thread ending
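As an example of one of the other events, this sketch traces raise to record exceptions, even ones that get rescued. The raised array and the "boom" error are made up for illustration; raised_exception is the TracePoint method that gives you the exception object for this event.

```ruby
raised = []

# Collect the class & message of every exception raised while tracing.
trace = TracePoint.new(:raise) do |tp|
  raised << [tp.raised_exception.class, tp.raised_exception.message]
end

trace.enable
begin
  raise ArgumentError, "boom"
rescue ArgumentError
  # Rescued on purpose; the trace still sees the exception.
end
trace.disable

raised # [[ArgumentError, "boom"]]
```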

TracePoint + Graphviz

Many methods will make more than just 3 method calls, especially in framework code, so the output from TracePoint can be hard to visualize.

So I made a gem that lets you create a visual call graph like this:

    require 'visual_call_graph'

    VisualCallGraph.trace { "Your method call here..." }

This generates a call_graph.png file with the results.

Keep in mind that this is not static analysis; it will actually call the method!

Showing File Paths

Would you like to know where these methods are defined?

Don’t worry, I got you covered! I added an option you can enable to show the file path for each method call.

    VisualCallGraph.trace(show_path: true) { Foo.aaa }

Which results in:

[Image: visual call graph with the file path shown for each method]

If you want to see some massive call graphs you just have to trace some Rails methods 😉

Return Values

In the intro I mentioned that you can also get return values. For this you will need to trace the return event and use the return_value method.

Example:

    def the_method; "A" * 10; end

    trace = TracePoint.new(:return) { |tp| puts "Return value for #{tp.method_id} is #{tp.return_value}." }

    trace.enable
    the_method
    trace.disable

This will print:

    Return value for the_method is AAAAAAAAAA.

Events First

Someone asked on reddit how it’s possible to avoid having the word “bar” printed when calling the foo method in the following code:

    class Thing
      def foo
        puts "foo"
        bar
      end

      def bar
        puts "bar"
      end
    end

    # your code here

    t = Thing.new
    t.foo

There are many ways to achieve this, like prepending a module, redirecting $stdout or redefining the bar method.

If you are feeling creative, comment on this post with your own idea!

But I found one of the answers particularly interesting because it used the TracePoint class.

Here it is:

    TracePoint.trace(:call) { |tp| exit if tp.method_id == :bar }

This code will call exit when the method bar is called, which prevents the string from being printed by ending the program.

Probably not something you want to use in real code, but it proves one thing about TracePoint: Events are triggered before they happen.

Something to keep in mind if you are going to build some sort of tool around this 🙂

Summary

In this post you learned about the TracePoint class, which allows you to trace a few events like method calls or new threads. This can be useful as a debugging tool or for code exploration.

Remember to share this post so more people can enjoy it 🙂
