
Building Your Own Linux Tools with Ruby: A Practical Guide

7 months ago /
By Jesus Castello /

Tools like ps, top & netstat are great: they give you a lot of information about what’s going on with your system.

But how do they work? Where do they get all their information from?

In this post we will recreate three popular Linux tools together. You are going to get a 2×1 meal: learning Ruby & Linux at the same time! 🙂

Finding Status Information

So let’s try answering the question of where all these tools find their info. The answer is in the /proc filesystem.

If you look inside the /proc directory you will see a bunch of directories & files, just like in any other directory on your computer. But these aren’t real files on disk; they are just a way for the Linux kernel to expose data to users.

This is very convenient because they can be treated like normal files, which means that you can read them without any special tools. A lot of things in the Linux world work like this; for another example, take a look at the /dev directory.
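As a quick illustration of "read them like normal files", here is a minimal sketch. It assumes a Linux system where /proc/uptime exists, and the helper name uptime_seconds is made up for this example.

```ruby
# /proc/uptime holds two numbers: seconds since boot and total idle seconds.
# The helper name is hypothetical; it only parses the text it is given.
def uptime_seconds(text)
  text.split.first.to_f
end

path = "/proc/uptime"

if File.readable?(path)
  # No special API needed: File.read works just like on a regular file.
  puts "Up for #{uptime_seconds(File.read(path)).round} seconds"
else
  puts "#{path} is not available on this system"
end
```

The guard is only there so the snippet does not crash on systems without /proc (macOS, for example).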

Now that we understand what we are dealing with, let’s take a look at the contents of the /proc directory…

1
10
104
105
11
11015
11469
11474
11552
11655

This is just a small sample, but you can quickly notice a pattern. What are all those numbers? Well, it turns out these are PIDs (Process IDs). Every entry contains info about a specific process.

If you run ps you can see how every process has a PID associated with it:

  PID TTY          TIME CMD
15952 pts/5    00:00:00 ps
22698 pts/5    00:00:01 bash

From this we can deduce that what ps does is just iterate over the /proc directory & print the info it finds.

Let’s see what is inside one of those numbered directories:

attr
autogroup
auxv
cgroup
clear_refs
cmdline
comm
cpuset
cwd
environ
exe
fd

That’s just a sample to save space, but I encourage you to take a look at the full list.

Here are some important / interesting entries:

Entry	Description
comm	Name of the program
cmdline	Command used to launch this process
environ	Environment variables that this process was started with
status	Process status (running, sleeping…) & memory usage
fd	Directory that contains file descriptors (open files, sockets…)
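To get a feel for these entries, here is a small sketch that reads a few of them for the current process. It relies on /proc/self, a link to the running process's own directory that the text above does not mention, and the helper names parse_cmdline and parse_status are made up for this example.

```ruby
# Arguments in cmdline are separated by NUL bytes, not spaces.
def parse_cmdline(raw)
  raw.split("\0")
end

# status is a list of "Key:\tvalue" lines; turn it into a Hash.
def parse_status(text)
  text.each_line.map { |line| line.chomp.split(/:\s*/, 2) }
      .select { |pair| pair.size == 2 }
      .to_h
end

if File.readable?("/proc/self/status")
  status = parse_status(File.read("/proc/self/status"))
  puts "Name:  #{status['Name']}"
  puts "State: #{status['State']}"
  puts "Args:  #{parse_cmdline(File.read('/proc/self/cmdline')).inspect}"
end
```

The environ entry uses the same NUL-separated layout as cmdline, so the same split should work there too.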

Now that we know this we should be able to start writing some tools!

Process Listing

Let’s start by just getting a list of all the directories under /proc. We can do this using the Dir class.

Example:

Dir.glob("/proc/[0-9]*")

Notice how I used a number range in the pattern. There are other files under /proc that we don’t care about right now; we only want the numbered directories.

Now we can iterate over this list and print two columns, one with the PID & another with the program name.

Example:

pids = Dir.glob("/proc/[0-9]*")

puts "PID\tCMD"
puts "-" * 15

pids.each do |dir|
  # comm ends with a newline, so strip it
  cmd = File.read(File.join(dir, "comm")).chomp
  # the directory name is the PID
  pid = File.basename(dir)

  puts "#{pid}\t#{cmd}"
end

And this is the output:

PID     CMD
---------------
1       systemd
2       kthreadd
3       ksoftirqd/0
5       kworker/0
7       migration/0
8       rcu_preempt
9       rcu_bh
10      rcu_sched

Hey, it looks like we just made ps! Yeah, it doesn’t support all the fancy options from the original, but we made something work.
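One thing the simple loop does not handle is a process that exits between the directory listing and the read of its comm file. Here is a hedged sketch of one way to deal with that; the method name process_list and its proc_root parameter are my own additions, and it also sorts the PIDs numerically since newer Ruby versions return glob results in alphabetical order.

```ruby
# Hypothetical helper: returns [pid, name] pairs sorted numerically by PID.
def process_list(proc_root = "/proc")
  Dir.glob(File.join(proc_root, "[0-9]*")).map do |dir|
    begin
      [File.basename(dir).to_i, File.read(File.join(dir, "comm")).chomp]
    rescue SystemCallError
      nil # the process exited (or is unreadable) between glob and read
    end
  end.compact.sort_by(&:first)
end

puts "PID\tCMD"
puts "-" * 15
process_list.each { |pid, cmd| puts "#{pid}\t#{cmd}" }
```

Passing the root directory as a parameter is just a convenience so the method can be pointed at a fake directory tree when testing.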

Who Is Listening?

Let’s try to replicate netstat now. This is what its output looks like (with -ant as flags):

Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address           Foreign Address         State
tcp        0      0 127.0.0.1:5432          0.0.0.0:*               LISTEN
tcp        0      0 192.168.1.82:39530      182.14.172.159:22       ESTABLISHED

Where can we find this information? If you said “inside /proc” you’re right! To be more specific you can find it in /proc/net/tcp.

But there is a little problem, this doesn’t look anything like the netstat output!

   0: 0100007F:1538 00000000:0000 0A 00000000:00000000 00:00000000 00000000 1001 0 9216
   1: 2E58A8C0:9A6A 9FBB0EB9:0016 01 00000000:00000000 00:00000000 00000000 1000 0 258603

What this means is that we need to do some parsing with regular expressions. For now let’s just worry about the local address & the status.

Here is the regex I came up with:

\s+\d+: (?<local_addr>\w+):(?<local_port>\w+) \w+:\w+ (?<status>\w+)

This will give us some hexadecimal values that we need to convert into decimal. Let’s create a class that will do this for us.

class TCPInfo
  LINE_REGEX = /\s+\d+: (?<local_addr>\w+):(?<local_port>\w+) \w+:\w+ (?<status>\w+)/

  def initialize(line)
    @data = parse(line)
  end

  def parse(line)
    line.match(LINE_REGEX)
  end

  def local_port
    @data["local_port"].to_i(16)
  end

  # Convert hex to regular IP notation
  def local_addr
    decimal_to_ip(@data["local_addr"].to_i(16))
  end

  STATUSES = {
    "0A" => "LISTENING",
    "01" => "ESTABLISHED",
    "06" => "TIME_WAIT",
    "08" => "CLOSE_WAIT"
  }

  def status
    code = @data["status"]
    STATUSES.fetch(code, "UNKNOWN")
  end

  # Don't worry too much about this :)
  # On little-endian machines (x86, most ARM) /proc/net/tcp prints the
  # address bytes in reverse order, so the lowest byte is the first octet.
  def decimal_to_ip(decimal)
    ip = []
    ip << (decimal & 0xFF)
    ip << (decimal >> 8 & 0xFF)
    ip << (decimal >> 16 & 0xFF)
    ip << (decimal >> 24 & 0xFF)
    ip.join(".")
  end
end
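To make the hex conversion concrete, here is a worked sketch using the two sample lines from /proc/net/tcp shown earlier. It assumes a little-endian machine (x86, most ARM), where the four address bytes are printed in reverse order; the helper names hex_to_ip and hex_to_port are made up for this example.

```ruby
# Assumes a little-endian host, where /proc/net/tcp prints the four
# address bytes in reverse order.
def hex_to_ip(hex)
  [hex].pack("H*").unpack("C*").reverse.join(".")
end

# The port is a plain hexadecimal number.
def hex_to_port(hex)
  hex.to_i(16)
end

puts hex_to_ip("0100007F")   # 127.0.0.1
puts hex_to_port("1538")     # 5432
puts hex_to_ip("2E58A8C0")   # 192.168.88.46
puts hex_to_port("9A6A")     # 39530
```

Reading 0100007F byte by byte gives 01, 00, 00, 7F, which is 1, 0, 0, 127 in decimal; reversing that yields 127.0.0.1.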

The only thing left is to print the results in a pretty table format. Here connections is assumed to be an array holding one TCPInfo object per line of /proc/net/tcp.

require 'table_print'

tp connections

Example output:

STATUS      | LOCAL_PORT | LOCAL_ADDR
------------|------------|--------------
LISTENING   | 5432       | 127.0.0.1
ESTABLISHED | 39530      | 192.168.88.46

Yes, this gem is awesome!

I just found out about it & it looks like I won’t have to fumble around with ljust / rjust again 🙂
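If you would rather not add a dependency, the whole pipeline can also be sketched end to end with plain Ruby. This is only one possible way to do it: the Connection struct, the stricter regex and the method names are my own, the byte order assumes a little-endian machine, and only IPv4 entries from /proc/net/tcp are covered.

```ruby
Connection = Struct.new(:status, :local_port, :local_addr)

LINE = /\A\s*\d+: (?<addr>\h{8}):(?<port>\h{4}) \h+:\h+ (?<st>\h{2})/
STATES = { "01" => "ESTABLISHED", "06" => "TIME_WAIT",
           "08" => "CLOSE_WAIT", "0A" => "LISTENING" }

# Returns one Connection per data line; the header line does not match
# the regex and is skipped.
def parse_connections(text)
  text.each_line.map { |line| LINE.match(line) }.compact.map do |m|
    # Little-endian assumption: reverse the four address bytes.
    addr = m[:addr].scan(/../).reverse.map { |byte| byte.to_i(16) }.join(".")
    Connection.new(STATES.fetch(m[:st], "UNKNOWN"), m[:port].to_i(16), addr)
  end
end

def print_table(connections)
  puts "STATUS".ljust(12) + "LOCAL_PORT".ljust(12) + "LOCAL_ADDR"
  connections.each do |c|
    puts c.status.ljust(12) + c.local_port.to_s.ljust(12) + c.local_addr
  end
end

text = File.readable?("/proc/net/tcp") ? File.read("/proc/net/tcp") : ""
print_table(parse_connections(text))
```

Skipping lines that do not match matters here, because the first line of /proc/net/tcp is a column header rather than a connection.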

Stop Using My Port!

Have you ever seen this message?

Address already in use - bind(2) for "localhost" port 5000

Umm… I wonder what is using that port…

fuser -n tcp -v 5000

PORT        USER         PID    ACCESS  CMD
5000/tcp:   blackbytes   30893  F....   nc

Ah, so there is our culprit! Now we can stop this program if we don’t want it to be running & that will free our port. How did the fuser program find out who was using this port?

You guessed it! The /proc filesystem again.

In fact, it combines two things we have covered already: walking through the process list & reading active connections from /proc/net/tcp.

We just need one extra step:
Find a way to match the open port info with the PID.

If we look at the TCP data that we can get from /proc/net/tcp, the PID is not there. But we can use the inode number.

“An inode is a data structure used to represent a filesystem object.” – Wikipedia
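In /proc/net/tcp the inode is the tenth whitespace-separated field of each line (9216 in the first sample line shown earlier). Here is a minimal sketch of pulling it out for a given local port; the helper name inode_for_port is made up for this example.

```ruby
# Hypothetical helper: returns the socket inode (as a String) for the first
# entry in /proc/net/tcp-style text whose local port matches, or nil.
def inode_for_port(text, port)
  text.each_line do |line|
    fields = line.split
    # fields[1] looks like "0100007F:1538" (address:port, both in hex)
    match = /\A\h+:(\h{4})\z/.match(fields[1].to_s)
    return fields[9] if match && match[1].to_i(16) == port
  end
  nil
end

sample = "   0: 0100007F:1538 00000000:0000 0A 00000000:00000000 " \
         "00:00000000 00000000  1001        0 9216\n"
puts inode_for_port(sample, 5432) # 9216
```

On a real system you would pass File.read("/proc/net/tcp") instead of the sample string.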

How can we use the inode to find the matching process? If we look under the fd directory of a process that we know has an open port, we will find a line like this:

/proc/3295/fd/5 -> socket:[12345]

The number between brackets is the inode number. So now all we have to do is iterate over all the files & we will find the matching process.

Here is one way to do that:

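# socket_inode and hex_port are assumed to come from the matching
# line in /proc/net/tcp
fd_path = Dir.glob("/proc/[0-9]*/fd/*").find do |fd|
  # readlink fails if the descriptor was closed or we lack permission
  File.readlink(fd).include?("socket:[#{socket_inode}]") rescue false
end

abort "No process found for inode #{socket_inode}" unless fd_path

pid  = fd_path.scan(/\d+/).first
name = File.readlink("/proc/#{pid}/exe")

puts "Port #{hex_port.to_i(16)} in use by #{name} (#{pid})"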

Example output:

Port 5432 in use by /usr/bin/postgres (474)

Please note that you will need to run this code as root or as the process owner. Otherwise you won’t be able to read the process details inside /proc.
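Because of those permission limits, it can be handy to have a version of the lookup that quietly skips descriptors it cannot read and returns nil when nothing matches. The sketch below is one way to write that; both method names and the proc_root parameter are my own additions.

```ruby
# Hypothetical helpers, not taken from the code above.

# "socket:[12345]" -> "12345"; anything else (files, pipes) -> nil
def socket_inode_from_link(target)
  target[/\Asocket:\[(\d+)\]\z/, 1]
end

# Returns the PID (as a String) that holds the socket inode, or nil.
def pid_for_inode(inode, proc_root = "/proc")
  Dir.glob(File.join(proc_root, "[0-9]*", "fd", "*")).each do |fd|
    begin
      target = File.readlink(fd)
    rescue SystemCallError
      next # descriptor closed meanwhile, or owned by another user
    end
    # fd looks like ".../3295/fd/5", so the PID is third from the end
    return fd.split("/")[-3] if socket_inode_from_link(target) == inode.to_s
  end
  nil
end

puts socket_inode_from_link("socket:[12345]") # 12345
```

Matching the whole link target, rather than using include?, avoids picking up unrelated descriptors whose names merely contain the same text.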

Conclusion

In this post you learned that Linux exposes a lot of data via the virtual /proc filesystem. You also learned how to recreate popular Linux tools like ps, netstat & fuser by using the data under /proc.


