Regular Expressions in Ruby

Ruby regular expressions (regex for short) let us find specific patterns in a chunk of data, with the intent of either extracting that data for further processing or validating it. For example, think about an email address: with regular expressions we can define what a valid email address looks like, which lets our program tell a valid email address from an invalid one.

Regular expressions are defined between two forward slashes, to differentiate them from other language syntax. The simplest expressions just match a word or even a single letter, for example:

# Find the word 'like'
"Do you like cats?" =~ /like/

This will return the index of the first occurrence of the word if it was found or nil otherwise. If we don’t care about the index we could just use the String#include? method.
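The difference between asking for an index and asking for a yes/no answer can be sketched as follows. The sentence is the one from the example above; `String#match?` is an addition that the text does not mention (it is available from Ruby 2.4 onward), and the constant name is only illustrative.

```ruby
SENTENCE = "Do you like cats?"

# =~ returns the index where the match starts, or nil when nothing matches
p(SENTENCE =~ /like/)          # 7
p(SENTENCE =~ /dogs/)          # nil

# When only a true/false answer is needed:
p SENTENCE.include?("like")    # true (plain substring check, no regex involved)
p SENTENCE.match?(/like/)      # true (regex check that returns a boolean)
```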

Character Classes

A character class lets you define either a range or a list of characters to match. For example, [aeiou] matches any vowel.

Example: Does the string contain a vowel?

def contains_vowel(str)

str =~ /[aeiou]/

end

contains_vowel("test") # returns 1

contains_vowel("sky") # returns nil

This only checks whether a vowel is present; it does not take the number of characters into account. We will see how to do that soon.
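A character class can list any characters, not just letters, and it is case-sensitive unless told otherwise. The sketch below uses two hypothetical helpers (the names are made up for illustration) to show a list of punctuation marks and the `i` option for ignoring case.

```ruby
# Hypothetical helper: index of the first punctuation mark from a hand-picked list.
# Inside a character class, '.' and '?' are literal characters.
def first_punctuation_index(str)
  str =~ /[.,!?]/
end

# Hypothetical helper: like the vowel check, but the 'i' option ignores case
def contains_vowel_any_case(str)
  str =~ /[aeiou]/i
end

p first_punctuation_index("Hello, world!")   # 5
p first_punctuation_index("no punctuation")  # nil
p("APPLE" =~ /[aeiou]/)                      # nil, the class only lists lowercase vowels
p contains_vowel_any_case("APPLE")           # 0
```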

Ranges

We can use ranges to match multiple letters or numbers without having to type them all out. In other words, a range like [2-5] is equivalent to [2345].

Some useful ranges:

[0-9] matches any digit from 0 to 9
[a-z] matches any letter from a to z (no caps)
[^a-z] negated range: matches any character that is not a lowercase letter from a to z
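As a rough illustration of the ranges listed above, the following sketch shows a negated range and several ranges combined in one class. Both helper names are hypothetical.

```ruby
# Index of the first character that is NOT a lowercase letter, or nil
def first_non_lowercase_index(str)
  str =~ /[^a-z]/
end

# Several ranges can share one class: digits plus a-f in either case
def contains_hex_digit?(str)
  str.match?(/[0-9a-fA-F]/)
end

p first_non_lowercase_index("abc123")  # 3
p first_non_lowercase_index("abc")     # nil
p contains_hex_digit?("xyz")           # false
p contains_hex_digit?("xyzF")          # true
```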

Example: Does this string contain any numbers?

def contains_number(str)

str =~ /[0-9]/

end

contains_number("The year is 2015") # returns 12

contains_number("The cat is black") # returns nil

Remember: the return value when using =~ is either the string index or nil

There is a nice shorthand syntax for specifying character ranges:

\w is equivalent to [0-9a-zA-Z_]
\d is the same as [0-9]
\s matches white space (tabs, regular space, newline)

There is also the negative form of these:

\W anything that’s not in [0-9a-zA-Z_]
\D anything that’s not a number
\S anything that’s not a space
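To see the shorthand classes side by side, here is a small sketch run against one sample string. The string and the constant name are assumptions chosen for illustration.

```ruby
# Sample text containing a space, two digits and a tab
SHORTHAND_SAMPLE = "Room 42\tis open"

p(SHORTHAND_SAMPLE =~ /\d/)      # 5, the first digit
p(SHORTHAND_SAMPLE =~ /\s/)      # 4, the first whitespace character
p(SHORTHAND_SAMPLE =~ /\W/)      # 4, a space is also a non-word character
p(SHORTHAND_SAMPLE =~ /\D/)      # 0, "R" is not a digit
p SHORTHAND_SAMPLE.scan(/\d/)    # ["4", "2"]
```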

There is also a special ‘.’ character, which matches any character except a newline. If you need to match a literal ‘.’ then you will have to escape it with a backslash.

Example: Escaping special characters

# If we don't escape, the letter will match

"5a5".match(/\d.\d/)

# In this case only the literal dot matches

"5a5".match(/\d\.\d/) # nil

"5.5".match(/\d\.\d/) # match

Modifiers

Up until now we have only been able to match a single character at a time. To match multiple characters we can use pattern modifiers (more commonly called quantifiers).

Modifier	Description
+	1 or more
*	0 or more
?	0 or 1
{3,5}	between 3 and 5
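Each row of the table can be tried out with a short sketch like the one below. It relies on `String#[]` with a regular expression, which returns the matched text or nil; the helper name `first_match` is hypothetical.

```ruby
# Returns the text matched by the pattern, or nil when there is no match
def first_match(str, pattern)
  str[pattern]
end

p first_match("aaa", /a+/)               # "aaa", one or more
p first_match("bbb", /a*/)               # "", zero occurrences still count as a match
p first_match("color colour", /colou?r/) # "color", the 'u' is optional
p first_match("1234567", /\d{3,5}/)      # "12345", takes as many as allowed
p first_match("12", /\d{3,5}/)           # nil, fewer than three digits
```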

We can combine everything we learned so far to create more complex regular expressions.

Example: Does this look like an IP address?

# Note that this will also match some invalid IP addresses

# like 999.999.999.999, but in this case we just care about the format.

def ip_address?(str)

# We use !! to convert the return value to a boolean

!!(str =~ /^\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}$/)

end

ip_address?("192.168.1.1") # returns true

ip_address?("0000.0000") # returns false

Exact Matching

If you need exact matches you will need another type of modifier. Let’s see an example so you can see what I’m talking about:

# We want to find out if this string is exactly five letters long. This will

# still match because it has at least five, but it's not what we want.

"Regexp are cool".match /\w{5}/

# Instead we will use the 'beginning of line' and 'end of line' modifiers

"Regexp are cool".match /^\w{5}$/

# This time it won't match. This is a rather contrived example, since we could just

# have used .size to compare, but I think it gets the idea across.

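If you want to match strictly at the start and end of a string, and not just at the start and end of every line (after a ‘\n’), you need to use ‘\A’ and ‘\z’ instead of ‘^’ and ‘$’. The uppercase ‘\Z’ also anchors at the end of the string, but it still matches just before a single trailing newline.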
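The difference between line anchors and string anchors shows up as soon as the input contains a newline. The following is a minimal sketch with two hypothetical helper names; it uses `String#match?`, which assumes Ruby 2.4 or newer.

```ruby
# ^ and $ anchor to the start and end of any LINE inside the string
def five_chars_on_a_line?(str)
  str.match?(/^\w{5}$/)
end

# \A and \z anchor to the start and end of the WHOLE string
def exactly_five_chars?(str)
  str.match?(/\A\w{5}\z/)
end

p five_chars_on_a_line?("hello\nworld")  # true, each line has five word characters
p exactly_five_chars?("hello\nworld")    # false, the string as a whole is longer
p exactly_five_chars?("hello")           # true
p "hello\n".match?(/\A\w{5}\Z/)          # true, \Z tolerates one trailing newline
p "hello\n".match?(/\A\w{5}\z/)          # false, \z does not
```

This matters for validation: a check written with ^ and $ can be passed by input that only has one valid line among others.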

Capture Groups

With capture groups, we can capture part of a match and reuse it later. To capture a match we enclose the part we want to capture from the regular expression in parentheses.

Example: Parsing a log file

Line = Struct.new(:time, :type, :msg)

LOG_FORMAT = /(\d{2}:\d{2}) (\w+) (.*)/

def parse_line(line)

line.match(LOG_FORMAT) { |m| Line.new(*m.captures) }

end

parse_line("12:41 INFO User has logged in.")

# This produces objects like this: #<struct Line time="12:41", type="INFO", msg="User has logged in.">

In this example, we are using .match instead of =~. This method returns a MatchData object if there is a match, nil otherwise. MatchData has many useful methods, so check out the documentation!

You can access the captured data using the .captures method or by treating the MatchData object like an array: index zero holds the full match and the subsequent indexes contain the captured groups.
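The two ways of reading captured data can be compared in a short sketch. It reuses the log pattern from the example above, wrapped in a hypothetical helper named `log_match` so that nothing from the earlier example is redefined.

```ruby
# Same pattern as the log example: time, event type, message
def log_match(line)
  line.match(/(\d{2}:\d{2}) (\w+) (.*)/)
end

m = log_match("12:41 INFO User has logged in.")

p m[0]        # "12:41 INFO User has logged in." (the full match)
p m[1]        # "12:41" (first group)
p m[2]        # "INFO" (second group)
p m.captures  # ["12:41", "INFO", "User has logged in."]

# When nothing matches, .match returns nil, so check before indexing
p log_match("not a log line")  # nil
```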

We can also have non-capturing groups. They let us group expressions together without storing the matched text as a capture. You may also find named groups useful for making complex expressions easier to read.

(?:…) – Non-capturing group
(?<name>…) – Named groups
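The effect of a non-capturing group is easiest to see by comparing the captures of two otherwise identical patterns. This is a small sketch; the constant names and the sample string are made up for illustration.

```ruby
# Both patterns match "cat" or "dog", an optional "s", a space and a number
WITH_CAPTURE    = /(cat|dog)s? (\d+)/
WITHOUT_CAPTURE = /(?:cat|dog)s? (\d+)/

p "dogs 12".match(WITH_CAPTURE).captures     # ["dog", "12"]
p "dogs 12".match(WITHOUT_CAPTURE).captures  # ["12"], the animal is grouped but not stored
```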

Example: Named Groups

m = "David 30".match /(?<name>\w+) (?<age>\d+)/

m[:age]

# => "30"

m[:name]

# => "David"

Look ahead / Look behind

This is a more advanced technique that might not be available in all regex implementations. Ruby’s regular expression engine is able to do this, so let’s see how to take advantage of that.

Lookahead and lookbehind let us peek and check whether a specific pattern appears after or before the current position, without including it in the match.

Name	Description
(?=pat)	Positive lookahead
(?<=pat)	Positive lookbehind
(?!pat)	Negative lookahead
(?<!pat)	Negative lookbehind

Example: is there a number preceded by at least one letter?

def number_after_word?(str)

!!(str =~ /(?<=\w) (\d+)/)

end

number_after_word?("Grade 99") # returns true
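All four forms from the table can be tried with .scan, which makes it easy to see that the surrounding text is checked but never included in the result. The sample strings and constant names below are assumptions chosen for illustration.

```ruby
SIZES  = "10px 20em 30px"
WORDS  = "foobar foobaz"
PRICES = "apple $5, pear $12, plum 7"

# Positive lookahead: digits that are followed by "px"
p SIZES.scan(/\d+(?=px)/)            # ["10", "30"]

# Negative lookahead: "foo" plus the rest of the word, unless "bar" comes next
p WORDS.scan(/foo(?!bar)\w+/)        # ["foobaz"]

# Positive lookbehind: digits that are preceded by "$"
p PRICES.scan(/(?<=\$)\d+/)          # ["5", "12"]

# Negative lookbehind: digits not preceded by "$" or by another digit
p PRICES.scan(/(?<![\$\d])\d+/)      # ["7"]
```

The extra \d in the last pattern is needed because otherwise the "2" in "$12" would match: it is preceded by "1", not by "$".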

Ruby’s Regexp Class

In Ruby, regular expressions are instances of the Regexp class. Most of the time you won’t be using this class directly, but it is good to know about.

puts(/a/.class) # prints Regexp
p Regexp.new("a") # prints /a/
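One case where the class is useful directly is building a pattern from a string at runtime. The sketch below uses Ruby's built-in Regexp class with `Regexp.escape` and the `Regexp::IGNORECASE` option; the helper name `literal_pattern` is hypothetical.

```ruby
# Builds a case-insensitive pattern that matches the given text literally.
# Regexp.escape backslash-escapes characters such as '.' that have a special meaning.
def literal_pattern(text)
  Regexp.new(Regexp.escape(text), Regexp::IGNORECASE)
end

pattern = literal_pattern("a.b")

p pattern.class           # Regexp
p pattern.source          # "a\\.b", the dot has been escaped
p pattern.match?("A.B")   # true, case is ignored
p pattern.match?("aXb")   # false, the dot only matches a literal dot
```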

Formatting long regular expressions

Complex regular expressions can get pretty hard to read, so it would be helpful if we broke them into multiple lines. We can accomplish this by using the ‘x’ modifier. This format also allows us to use comments.

Example:

LOG_FORMAT = %r{

(\d{2}:\d{2}) # Time

\s(\w+) # Event type

\s(.*) # Message

}x
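As a further sketch of the ‘x’ modifier, here is a smaller pattern that combines it with named groups. The constant name and the time format are assumptions made for illustration. Note that in this mode unescaped whitespace in the pattern is ignored, so a literal space has to be written as ‘\s’ or as an escaped space.

```ruby
TIME_OF_DAY = /
  \A
  (?<hours>\d{2})    # two-digit hour
  :
  (?<minutes>\d{2})  # two-digit minute
  \z
/x

m = TIME_OF_DAY.match("12:41")
p m[:hours]     # "12"
p m[:minutes]   # "41"

# Whitespace inside an extended pattern is not matched literally
p(/a b/x.match?("ab"))     # true
p(/a b/x.match?("a b"))    # false
p(/a\ b/x.match?("a b"))   # true, the escaped space is literal
```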

Ruby regex: Putting It All Together

Regular expressions can be used with many Ruby methods.

.split
.scan
.gsub
and many more…

Example: Get all words from a string using .scan

"this is some string".scan(/\w+/)
# => ["this", "is", "some", "string"]

Example: Capitalize all words in a string

str.gsub(/\w+/) { |w| w.capitalize }
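The other methods from the list can be sketched in the same way. The three helper names below are hypothetical; the last one uses backreferences (\1, \2) in the replacement string to reuse captured groups.

```ruby
# .split with a pattern: commas surrounded by optional whitespace
def split_fields(line)
  line.split(/\s*,\s*/)
end

# .gsub with a replacement string: hide every digit
def mask_digits(str)
  str.gsub(/\d/, "#")
end

# .sub with backreferences: \1 and \2 refer to the captured groups.
# Single quotes keep the backslashes intact.
def surname_first(full_name)
  full_name.sub(/(\w+) (\w+)/, '\2, \1')
end

p split_fields("a , b,c ,  d")     # ["a", "b", "c", "d"]
p mask_digits("Call 555-1234")     # "Call ###-####"
p surname_first("David Smith")     # "Smith, David"
```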

Conclusion

Regular expressions are amazing, but sometimes they can be a bit tricky. Using a tool like rubular.com can help you build your Ruby regex in a more interactive way. It also includes a Ruby regular expression cheat sheet that you will find very useful. Now it’s your turn to crack open that editor and start coding.

Oh, and don’t forget to share this with your friends if you enjoyed it, so more people can learn.

4 comments

- Bernardo on June 22, 2015 at 6:23 pm
Hi, nice post! It’s worth pointing out, I think, that you can also get named capture groups with the =~ notation:

/(?<name>\w+) (?<age>\d+)/ =~ "David 30"

name
=> "David"

age
=> "30"

As far as I know it only works this way around, and not this way:

"David 30" =~ /(?<name>\w+) (?<age>\d+)/
- echristopherson on June 29, 2015 at 7:24 pm
Bernardo, some characters got stripped out in your comment, and the quotes got changed to curly quotes. It should be

/(?<name>\w+) (?<age>\d+)/ =~ "David 30"
1. Jesus Castello on June 29, 2015 at 7:43 pm
  
  Author
  
  Hey, thanks for your comment. I edited it so it should look right now.
- jaki on July 13, 2015 at 10:17 am
It’s always better to make your regular expression more specific. For example, searching for a word at the start of a line rather than anywhere in the line can make a big performance difference if the majority of the input does not match.

Black Bytes

Mastering Ruby Regular Expressions

