Black Bytes

Category Archives for Programming

Ruby: fine grained sorting

Ruby sorting is really easy, so let's see some examples. If we had this array: ["abc", "aaa", "add", "bcc", "baa"], sorting it normally would give us:
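A plain sort with no block, for reference. The variable names words and sorted are mine, not from the original post:

```ruby
words = ["abc", "aaa", "add", "bcc", "baa"]

# With no block, sort compares the strings alphabetically
sorted = words.sort
p sorted  # => ["aaa", "abc", "add", "baa", "bcc"]
```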

Let’s say we wanted to sort by the second letter. We could do this using the sort_by method:
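One way to write this, assuming the same array is stored in a variable called words (my name for it). Keep in mind that sort_by makes no promise about the relative order of items with the same key, so "aaa" and "baa" may come out in either order:

```ruby
words = ["abc", "aaa", "add", "bcc", "baa"]

# word[1] is the second character, which becomes the sort key
by_second = words.sort_by { |word| word[1] }
p by_second

# The keys themselves always end up in order: a, a, b, c, d
p by_second.map { |word| word[1] }
```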

Now let's see a more complex example. Suppose we wanted to sort an email list:

To sort first by the host name and then by the user name, we can use the sort_by method like this:

We pass a block to sort_by with the ‘rules’ we want to sort by. Here we use two regular expressions to build the sort keys: /@.*/, which matches the at sign and everything after it, and /.*@/, which matches everything up to and including the at sign.
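Here is a small sketch of that idea. The addresses are made up for illustration, and the patterns are written with .* because matching "everything" before or after the at sign needs it:

```ruby
# Hypothetical email list, not the data from the original post
emails = [
  "carol@zeta.example",
  "bob@alpha.example",
  "alice@zeta.example",
  "dave@alpha.example"
]

# The block returns [host part, user part]; arrays are compared
# element by element, so the host decides first and the user breaks ties
sorted_emails = emails.sort_by { |email| [email[/@.*/], email[/.*@/]] }

puts sorted_emails
# bob@alpha.example
# dave@alpha.example
# alice@zeta.example
# carol@zeta.example
```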

Finally, we can apply the same idea to uniq to get unique data based on a pattern. I use this in Dirfuzz to filter the results and avoid duplicates when I have results that are nearly, but not exactly, the same.

This regexp will allow me to get rid of duplicates with this data:

This would stay the same with a simple uniq, but passing a block with that regexp will get rid of the duplicates.
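To illustrate the technique, here is a sketch with invented data. Neither the lines nor the pattern below come from Dirfuzz; they are only assumptions that show how uniq behaves when given a block:

```ruby
# Hypothetical results: the two /admin lines differ only in the size
results = [
  "/admin  [200]  size: 1532",
  "/admin  [200]  size: 1540",
  "/login  [200]  size: 877"
]

# A plain uniq compares whole strings, so nothing is removed
p results.uniq.size  # => 3

# With a block, uniq compares the block's return value instead.
# Here that is the leading path, so only the first /admin line is kept
unique = results.uniq { |line| line[/\A\S+/] }
p unique.size  # => 2
```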

Ruby vs Java – Strings

Let’s see how we can do some basic string operations in two languages, starting with how we declare a string variable. This is Ruby vs Java!

As you may know, Java is a statically typed language, which means you need to declare the type of each variable. Ruby is dynamically typed: a variable has no declared type and simply refers to whatever object you assign to it.
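A quick sketch of the Ruby side, with the Java counterpart shown as a comment. The variable name is my own choice:

```ruby
# Java: String language = "Java";
language = "Ruby"
p language.class  # => String

# The same variable can later refer to an object of another class
language = 42
p language.class  # => Integer (on modern Ruby versions)
```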

– Length

This one is almost identical. In fact, you could use () in Ruby, but it’s not required, so we leave it out.
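For example, with a made-up string:

```ruby
greeting = "Hello, world"

# Java: greeting.length()
p greeting.length    # => 12
p greeting.length()  # => 12, the parentheses are optional
```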

– Obtaining individual characters

You can index a Ruby string like an array. In Java you need to use the charAt method.
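A short sketch of string indexing in Ruby, using an example word of my own:

```ruby
word = "Ruby"

# Java: word.charAt(0)
p word[0]     # => "R"

# Negative indexes count from the end
p word[-1]    # => "y"

# A start position and a length give you a substring
p word[1, 2]  # => "ub"
```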

– Comparing

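Notice that in Java you shouldn’t use == to compare strings: it compares object references rather than contents, so use the equals method instead. In Ruby, == compares the contents.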
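On the Ruby side it looks something like this (the Java call in the comment is the content comparison you would use there). Ruby also has equal?, which checks object identity, much like == does for objects in Java:

```ruby
a = "ruby"
b = "ru" + "by"

# Java: a.equals(b)
p a == b       # => true, same contents

# equal? asks whether both variables refer to the very same object
p a.equal?(b)  # => false, two different String objects
```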

– Replacing

These will only replace the first occurrence of the word ‘strings’. If you want to replace every occurrence, you need to use gsub in Ruby and replaceAll in Java (which treats its first argument as a regular expression).

It’s important to remember that Java strings are immutable, meaning that they can’t be modified in place. You can still assign the return value to the same variable: this creates a new String object and makes the variable refer to it.

Ruby strings are mutable, but you still need to assign the output of sub/gsub because these methods return a new string and leave the original untouched. Some methods in Ruby have a variant that modifies the object in place. These usually end with ! (an exclamation mark), like sort! and uniq!. In this case we could use sub!/gsub!.
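Putting the replacing rules together in a small sketch. The sentence is invented, and the Java calls in the comments (replaceFirst and replaceAll) are the counterparts I would reach for, not necessarily the ones from the original example:

```ruby
text = "strings are fun, strings are useful"

# Java: text.replaceFirst("strings", "words")
first_only = text.sub("strings", "words")
p first_only  # => "words are fun, strings are useful"

# Java: text.replaceAll("strings", "words")
every_one = text.gsub("strings", "words")
p every_one   # => "words are fun, words are useful"

# sub and gsub returned new strings, so text is unchanged
p text        # => "strings are fun, strings are useful"

# The bang version changes the string in place
# (it returns nil when nothing was replaced)
copy = text.dup
copy.gsub!("strings", "words")
p copy        # => "words are fun, words are useful"
```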

And finally here are the links for the documentation for the String class for Ruby and Java:

http://ruby-doc.org/core-1.9.3/String.html

http://docs.oracle.com/javase/6/docs/api/java/lang/String.html

Parsing HTML in Ruby

If you have ever tried to write a scraping tool, you probably had to deal with parsing HTML. This task can be a bit difficult if you don’t have the right tools. Ruby has a wonderful library called Nokogiri, which makes HTML parsing a walk in the park. Let’s see some examples.

First install the nokogiri gem with:  gem install nokogiri

Extracting the title

Then create the following script, which contains a basic HTML snippet that will be parsed by Nokogiri. The output will be the page title.

Extracting anchor links

So that was pretty easy, wasn’t it? Well, it doesn’t get much harder than that. For example, if we want all the links from a page, we need to use the xpath method on the object we get back from Nokogiri. Then we can print the individual attributes of each tag or the text inside it:

And that’s it. As you may have already guessed, the xpath method uses the XPath query language. You can also use CSS selectors: just replace the xpath method with the css method.

Example:

Note: The difference between at_css & css is that the first one only returns the first matched element, but the latter returns ALL matched elements.

To find the correct CSS selector, you can use your browser’s developer tools.

Nokogiri documentation: http://www.rubydoc.info/github/sparklemotion/nokogiri


Ruby String Formatting

Let’s talk about how you can format strings in Ruby.

Why would you want to format a string? Well, you may want to do things like have a leading zero even if the number is under 10 (example: 01, 02, 03…), or have some console output nicely formatted in columns.

In other languages you can use the printf function to format strings, and if you have ever used C you are probably familiar with that. To use printf you have to define a list of format specifiers and a list of variables or values.

Getting Started with Ruby String Formatting

While sprintf is also available in Ruby, in this post we will use a more idiomatic way (for some reason the community style guide doesn’t seem to agree on this, but I think that’s ok).

Here is an example:
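A sketch using the % operator on a format string, assuming the variable time holds 5 (the value implied by the output):

```ruby
time = 5

# format string % value
message = "Processing of the data has finished in %d seconds" % time
puts message
```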

Output => "Processing of the data has finished in 5 seconds"

In this example, %d is the format specifier (here is a list of available specifiers) and time is the variable we want formatted. A %d format will give us whole numbers only.

If we want to display floating point numbers we need to use %f. We can specify the number of decimal places we want like this: %0.2f.

The 2 here indicates that we want to keep only two decimal places.

Here is an example:
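For instance, assuming a variable called score (my name, with a value picked to match the output):

```ruby
score = 78.5431

# %0.2f keeps two digits after the decimal point
message = "The average is %0.2f" % score
puts message
```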

Output => The average is 78.54

Remember that the number will be rounded to the nearest value rather than cut off. For example, if I used 78.549 in the last example, it would have printed 78.55.

Converting and Padding

You can convert a decimal number and print it as hexadecimal using the %x format:
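When the format string has more than one specifier, the values go in an array. A sketch, with number as my own variable name:

```ruby
number = 122

# %d prints the value in decimal, %x prints the same value in hexadecimal
message = "%d in HEX is %x" % [number, number]
puts message
```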

Output => 122 in HEX is 7a

To pad a string:

Use this format for padding a number with leading 0’s: %0<width>d, where the width is the total number of digits you want, zeros included.
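For example, a width of 4 pads the number 20 with two zeros:

```ruby
# %04d means: at least 4 characters wide, filled with zeros on the left
message = "The number is %04d" % 20
puts message
```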

Output => The number is 0020

You can also use this ruby string format trick to create aligned columns of text. Replace the 0 with a dash to get this effect:
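A sketch with invented rows: %-10s left-aligns a string in a 10 character column, and %3d right-aligns a number in a 3 character column:

```ruby
# Hypothetical data for the two columns
rows = [["apple", 3], ["banana", 12], ["kiwi", 7]]

lines = rows.map { |name, count| "%-10s%3d" % [name, count] }
puts lines

# Every line has the same width, so the columns line up
p lines.map(&:length)  # => [13, 13, 13]
```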


Alternatively, you can use the .ljust and .rjust methods from the String class to do the same.

Example:
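A small sketch with values of my own choosing. Both methods take the total width, plus an optional padding string:

```ruby
# ljust pads on the right, so the text stays on the left
left = "apple".ljust(10)
p left   # => "apple     "

# rjust pads on the left; the second argument is the padding character
right = "42".rjust(5, "0")
p right  # => "00042"

# Combined, they give the same kind of aligned columns
puts "apple".ljust(10) + "3".rjust(3)
puts "banana".ljust(10) + "12".rjust(3)
```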

Conclusion

As you have seen, Ruby & Rails string formatting is really easy. It all comes down to understanding the different format specifiers available to you.

I hope you enjoyed this fast trip into the world of output formatting!
