RubyGuides

Build Your Own Web Server

Have you ever built your own web server? I think this is a great learning exercise & in this post you will learn how to do this, step-by-step!

Listening For Connections

So where do we start? The first thing that we need is to listen for new connections on TCP port 80. I already wrote a post about network programming in Ruby, so I’m not going to explain how that works here.

I’m just going to give you the code:
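A minimal sketch of such a listener, built on Ruby's standard socket library, could look like this. The method names (read_request, listen), the file name server.rb and the idea of passing the port on the command line are assumptions made for this example.

```ruby
require 'socket'

# Accepts a single connection and returns whatever the client sent first.
# 2048 bytes is an arbitrary limit that is plenty for a simple GET request.
def read_request(server)
  client  = server.accept
  request = client.readpartial(2048)
  client.close
  request
end

# Prints every incoming request until the process is stopped (Ctrl+C).
def listen(port)
  server = TCPServer.new('localhost', port)
  loop { puts read_request(server) }
end

# Start with e.g. `ruby server.rb 8080`; without an argument nothing is started.
listen(Integer(ARGV[0])) if ARGV[0]
```

Because the connection is closed without an answer, a browser will report an error at this stage. That is fine for now: the goal is only to look at the request.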

When you run this code you will have a server that accepts connections on port 80. It doesn’t do much yet, but it will allow you to see what an incoming request looks like.

Note: To use port 80 on a Linux/Mac system you will need root privileges. As an alternative, you can use any port from 1024 upward. I like 8080 🙂

An easy way to generate a request is to just use your browser or something like curl.

When you do that you will see this printed in your server:
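The exact headers depend on the client, so treat the following as an illustration only: the header values are made up, and a real browser sends many more. It is written as a Ruby string so that the \r\n line endings are visible.

```ruby
# An illustrative GET request as it arrives on the socket.
# Header values are examples only, not captured output.
SAMPLE_REQUEST = "GET /index.html HTTP/1.1\r\n" \
                 "Host: localhost:8080\r\n" \
                 "User-Agent: ExampleClient/1.0\r\n" \
                 "Accept: */*\r\n" \
                 "\r\n"

# Print it the way the server would: the request line first, then one
# header per line, then the blank line that ends the header section.
puts SAMPLE_REQUEST
```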

This is an HTTP request. HTTP/1.1 is a plain-text protocol used for communication between web browsers and web servers.

The official protocol specification can be found here: https://tools.ietf.org/html/rfc7230 (RFC 7230 has since been superseded by RFC 9110 and RFC 9112).

Parsing The Request

Now we need to break down the request into smaller components that our server can understand.

To do that we can build our own parser or use one that already exists. We are going to build our own so we need to understand what the different parts of the request mean.

This image should help:

http://i.imgur.com/WEhYtyK.png

The headers are used for things like browser caching, virtual hosting and data compression, but for a basic implementation we can ignore them & still have a functional server.

To build a simple HTTP parser we can take advantage of the fact that the lines of a request are separated by line breaks (\r\n, a carriage return followed by a line feed). To keep things simple, we are not going to do any error or validity checking.

Here is the code I came up with:
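A sketch along those lines follows. The method names (parse, parse_headers, normalize) and the use of symbol keys are assumptions made for this example, and as stated above there is no error handling.

```ruby
# Splits the raw request text into method, path and headers.
def parse(request)
  lines = request.lines
  method, path, _version = lines.first.split

  { method: method, path: path, headers: parse_headers(lines.drop(1)) }
end

# Reads "Name: value" lines until the blank line that ends the header section.
def parse_headers(lines)
  headers = {}
  lines.each do |line|
    break if line.strip.empty?

    name, value = line.split(':', 2)
    headers[normalize(name)] = value.strip
  end
  headers
end

# "User-Agent" becomes :user_agent, so headers are easy to look up.
def normalize(name)
  name.strip.downcase.tr('-', '_').to_sym
end

request = parse("GET /index.html HTTP/1.1\r\nHost: localhost:8080\r\n\r\n")
p request
```

Splitting each header on the first colon only keeps values such as localhost:8080 intact.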

This will return a hash with the parsed request data. Now that we have our request in a usable format we can build our response for the client.

Preparing & Sending The Response

To build the response we need to see if the requested resource is available. In other words, we need to check if the file exists.

Here is the code I wrote for doing that:
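One way to write this step is sketched below. The location of SERVER_ROOT and the method names are assumptions, and the two send methods are simple stand-ins (returning a status code and a body) so that the example runs on its own.

```ruby
require 'tmpdir'

# Directory that holds the files to serve. The location is an assumption;
# point it at any directory you like.
SERVER_ROOT = File.join(Dir.tmpdir, 'web-server')

# Stand-ins so this sketch runs on its own: a status code plus a body.
def send_ok_response(data)
  [200, data]
end

def send_file_not_found
  [404, '']
end

# Maps the parsed request to a file under SERVER_ROOT and builds the response.
# Note: the path is used unchecked here; the security section further down
# explains why that matters.
def prepare_response(request)
  path = request.fetch(:path)
  path = '/index.html' if path == '/'
  respond_with(File.join(SERVER_ROOT, path))
end

# Sends the file contents if the file exists, otherwise a 404.
def respond_with(file)
  if File.file?(file)
    send_ok_response(File.binread(file))
  else
    send_file_not_found
  end
end
```

File.binread is used so that images and other binary files are read byte for byte.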

There are two things happening here. First, if the path is set to / we assume that the file we want is index.html. Second, if the requested file is found, we are going to send the file contents with an OK response.

But if the file is not found then we are going to send the typical 404 Not Found response.

This table contains the most common response codes:

Code  Description
200   OK
301   Moved Permanently
302   Found
304   Not Modified
400   Bad Request
401   Unauthorized
403   Forbidden
404   Not Found
500   Internal Server Error
502   Bad Gateway

Here are the “send” methods that are used in the last example:

And here is the Response class:
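Both pieces can be sketched together as follows. The names REASONS, to_s and send_to are choices made for this example (send_to avoids shadowing Ruby's built-in Object#send), and only the two status codes used so far are covered.

```ruby
# Builds the raw HTTP response text from a status code and a body.
class Response
  REASONS = { 200 => 'OK', 404 => 'Not Found' }.freeze

  attr_reader :code

  def initialize(code:, data: '')
    @code = code
    @data = data
  end

  # The template: status line, headers, blank line, then the body.
  def to_s
    "HTTP/1.1 #{code} #{REASONS.fetch(code)}\r\n" \
      "Content-Length: #{@data.bytesize}\r\n" \
      "\r\n" \
      "#{@data}"
  end

  # Writes the response to the client socket (or any IO-like object).
  def send_to(client)
    client.write(to_s)
  end
end

# 200 OK with the file contents as the body.
def send_ok_response(data)
  Response.new(code: 200, data: data)
end

# 404 Not Found with an empty body.
def send_file_not_found
  Response.new(code: 404)
end
```

Content-Length uses bytesize rather than the character count, so bodies with multi-byte characters are reported correctly.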

The response is built from a template & some string interpolation.

At this point we just need to tie everything together in our connection-accepting loop and then we should have a functional server.
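The complete flow might look like the sketch below. To keep it runnable on its own it repeats compact versions of the parsing and response steps; the method names, the SERVER_ROOT location and the log format are assumptions.

```ruby
require 'socket'
require 'tmpdir'

# Assumed document root for this sketch; change it to suit your setup.
SERVER_ROOT = File.join(Dir.tmpdir, 'web-server')

# Compact request parser: only the method and path are needed here.
def parse(request)
  method, path, _version = request.lines.first.split
  { method: method, path: path }
end

def build_response(code, reason, body = '')
  "HTTP/1.1 #{code} #{reason}\r\nContent-Length: #{body.bytesize}\r\n\r\n#{body}"
end

def prepare_response(request)
  path = request.fetch(:path)
  path = '/index.html' if path == '/'
  file = File.join(SERVER_ROOT, path)

  if File.file?(file)
    build_response(200, 'OK', File.binread(file))
  else
    build_response(404, 'Not Found')
  end
end

# Reads one request, answers it, logs a line and closes the connection.
def handle_connection(client)
  request  = parse(client.readpartial(2048))
  response = prepare_response(request)
  client.write(response)
  puts "#{request.fetch(:method)} #{request.fetch(:path)} - #{response.lines.first.strip}"
rescue EOFError
  # The client connected but closed without sending anything.
ensure
  client.close
end

# The connection-accepting loop: one client at a time, forever.
def run(port)
  server = TCPServer.new('localhost', port)
  loop { handle_connection(server.accept) }
end

# Start with e.g. `ruby server.rb 8080`; without an argument nothing is started.
run(Integer(ARGV[0])) if ARGV[0]
```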

Try adding some HTML files under the SERVER_ROOT directory and you should be able to load them from your browser. This will also serve any other static assets, including images.

Of course, a real web server has many more features that we didn’t cover here.

Here is a list of some of the missing features, so you can implement them on your own as an exercise (practice is the mother of skill!):

  • Virtual hosting
  • MIME types
  • Data compression
  • Access control
  • Multi-threading
  • Request validation
  • Query string parsing
  • POST body parsing
  • Browser caching (response code 304)
  • Redirects

A Lesson on Security

Taking input from a user & doing something with it is always dangerous. In our little web server project, the user input is the HTTP request.

We have introduced a little vulnerability known as “path traversal”. People will be able to read any files that our web server user has access to, even if they are outside of our SERVER_ROOT directory.

This is the line responsible for this issue:
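In a server that builds the file path by appending the request path to SERVER_ROOT, the culprit is the read of that unchecked path. The sketch below isolates the step; the method name read_unchecked and the directory and file names are made up for the demonstration.

```ruby
require 'tmpdir'

# Reads a requested file the naive way: the request path is trusted as-is.
def read_unchecked(server_root, request_path)
  File.binread(File.join(server_root, request_path)) # <- the dangerous line
end

Dir.mktmpdir do |dir|
  server_root = File.join(dir, 'public') # what we intend to serve
  Dir.mkdir(server_root)
  File.write(File.join(dir, 'secret.txt'), 'not meant for visitors')

  # The ".." walks out of server_root, so the file next to it is returned.
  puts read_unchecked(server_root, '/../secret.txt')
end
```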

You can try to exploit this issue yourself to see it in action. You will need to make a “manual” HTTP request, because most HTTP clients will pre-process your URL and remove the part that triggers the vulnerability (curl does this by default, although its --path-as-is option turns it off).

One tool you can use is called netcat.

Here is a possible exploit:
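With netcat, the idea is to pipe a hand-written request line containing ../ segments into nc pointed at your server. Since the examples here are in Ruby, the sketch below sends the same kind of raw request with TCPSocket, which also leaves the path untouched. The method name raw_get, the host, the port and the number of ../ segments are assumptions; adjust them to your setup, and only try this against your own server.

```ruby
require 'socket'

# Sends a hand-written GET request and returns the raw response.
# Nothing here normalizes the path, so "../" segments reach the server intact.
# Assumes the server closes the connection after replying; otherwise the
# read call would keep waiting.
def raw_get(host, port, path)
  socket = TCPSocket.new(host, port)
  socket.write("GET #{path} HTTP/1.1\r\nHost: #{host}\r\n\r\n")
  socket.read
ensure
  socket&.close
end

# Example, assuming the server from this post is listening on localhost:8080:
# puts raw_get('localhost', 8080, '/../../../../etc/passwd')
```

Extra ../ segments are harmless on a Unix-based system, because the parent of / is / itself.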

This will return the contents of the /etc/passwd file if you are on a Unix-based system. This works because a double dot (..) means “go one directory up”, so you are “escaping” the SERVER_ROOT directory.

One possible solution is to “compress” multiple dots into one:
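A small sketch of that idea, using a regular expression that matches any run of dots (the helper name compress_dots is made up for this example):

```ruby
# Collapses every run of dots into a single dot, so "../" becomes "./".
def compress_dots(path)
  path.gsub(/\.+/, '.')
end

puts compress_dots('/../../etc/passwd') # → /././etc/passwd
puts compress_dots('/.../etc/passwd')   # → /./etc/passwd
puts compress_dots('/index.html')       # → /index.html
```

A stricter alternative, not covered in this post, would be to expand the final path with File.expand_path and check that it still lies inside SERVER_ROOT before reading the file.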

When thinking about security always put your “hacker hat” on & try to find ways to break your solution. For example, if you just did path.gsub!("..", "."), you could bypass that by using triple dots (...).

Summary

In this post you learned how to listen for new connections, what an HTTP request looks like & how to parse it. You also learned how to build the response using a response code and the contents of the required file (if available).

And finally you learned about the “path traversal” vulnerability & how to avoid it.

I hope you enjoyed this post & learned something new!