Life of a web request

[Figure: web request]

Every time you click on a link or directly type an address, a number of steps have to happen for the site to show on your screen. It may be a bit intimidating because multiple protocols and servers are involved in the process. In reality it’s not that hard, so please follow me and let’s dive in!

It all starts with one packet

The first thing your browser needs to do when you click that link is resolve the domain name to an IP address. It does this using the Domain Name System (DNS), which lets the browser look up the server’s IP address. DNS queries normally travel over UDP, with TCP as the fallback, for example when a response is too large for a single UDP datagram.
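To make "one packet" a bit more concrete, here is a rough sketch of what a DNS query for an A record looks like on the wire, and how it could be sent as a single UDP datagram to port 53. The function names and the `resolver` parameter are made up for this illustration; real resolvers also handle retries, truncation, and response parsing, none of which is shown here.

```python
import secrets
import socket
import struct
from typing import Optional


def build_dns_query(hostname: str, query_id: Optional[int] = None) -> bytes:
    """Build a DNS query asking for the A (IPv4 address) record of hostname."""
    if query_id is None:
        query_id = secrets.randbelow(1 << 16)  # random 16-bit transaction ID
    # Header: ID, flags (0x0100 = recursion desired), 1 question, 0 other records.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # The name is encoded as length-prefixed labels and ends with a zero byte.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in hostname.rstrip(".").split(".")
    ) + b"\x00"
    # QTYPE 1 = A record, QCLASS 1 = Internet.
    return header + qname + struct.pack("!HH", 1, 1)


def send_dns_query(hostname: str, resolver: str, timeout: float = 3.0) -> bytes:
    """Send the query as one UDP datagram to port 53 and return the raw reply."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_dns_query(hostname), (resolver, 53))
        reply, _ = sock.recvfrom(512)  # classic DNS-over-UDP size limit
        return reply
```

Calling `send_dns_query` with a host name and the address of the DNS server your machine uses should produce exactly the kind of request/response pair you can then look for in a packet capture.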

[Figure: DNS request]

Most operating systems have tools to manually resolve a host name. Here is an example using the host command in Linux.

[Figure: host name resolution with the host command]
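If you would rather do the same lookup from a script, the sketch below asks the operating system's resolver for the addresses of a host name using Python's standard `socket.getaddrinfo`. The `resolve` helper is just a name chosen for this example.

```python
import socket
from typing import List


def resolve(hostname: str) -> List[str]:
    """Return the unique IP addresses the system resolver reports for hostname."""
    infos = socket.getaddrinfo(hostname, None, proto=socket.IPPROTO_TCP)
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return sorted({info[4][0] for info in infos})
```

For example, `resolve("example.com")` returns a list of IPv4 and/or IPv6 addresses, depending on what your resolver hands back.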

In Wireshark you can use the ‘dns’ display filter to find DNS requests.

The initial connection

Once the browser has the IP address it can open a connection to the web server. To do this it uses TCP, and for plain HTTP it connects to port 80 by default (HTTPS uses port 443). A TCP connection is established using the “three-way handshake”.

[Figure: SYN-ACK]

This is a sequence of three TCP packets: the client sends a SYN, the server answers with a SYN-ACK, and the client completes the exchange with an ACK. We can see this in Wireshark using the ‘tcp’ display filter.

[Figure: three-way handshake]
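Applications never build the SYN and ACK packets themselves; the operating system does that when a program asks for a connection. A minimal sketch, with `open_connection` being a name picked for this example:

```python
import socket


def open_connection(host: str, port: int = 80, timeout: float = 5.0) -> socket.socket:
    """Open a TCP connection; the OS performs SYN / SYN-ACK / ACK inside connect()."""
    # create_connection resolves the host and returns once the handshake completes.
    return socket.create_connection((host, port), timeout=timeout)
```

If you run a capture while calling `open_connection` against a web server, the three handshake packets should show up right before any data is exchanged. Remember to close the returned socket when you are done with it.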

The HTTP request

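The next step in our journey is to tell the web server what page we want. The browser prepares an HTTP GET request asking for that page. A minimal HTTP/1.1 request consists of a request line (the method, the path, and the protocol version), a Host header, and an empty line that marks the end of the headers.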

In fact, you can try this at home if you have netcat or a similar program: connect to the web server on port 80, then copy and paste this request. Hit Enter twice and you will get a bunch of HTML.

[Figure: HTTP request]
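The netcat experiment can also be reproduced in a few lines of Python. The sketch below builds a minimal HTTP/1.1 GET request by hand and sends it over a plain TCP socket. The helper names are invented for this example, and the `Connection: close` header is an addition that simply makes the server close the connection once the response is complete, so the read loop knows when to stop.

```python
import socket


def build_request(host: str, path: str = "/") -> bytes:
    """Build a minimal HTTP/1.1 GET request."""
    lines = [
        f"GET {path} HTTP/1.1",  # request line: method, path, protocol version
        f"Host: {host}",         # required by HTTP/1.1
        "Connection: close",     # ask the server to close when it is done
        "",                      # empty line marks the end of the headers
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


def fetch(host: str, path: str = "/", port: int = 80, timeout: float = 5.0) -> bytes:
    """Send the request over a plain TCP socket and return the raw response."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(build_request(host, path))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # the server closed the connection
                break
            chunks.append(data)
    return b"".join(chunks)
```

Calling `fetch("example.com")` returns the status line, the response headers, and the HTML body as raw bytes, which is the same "bunch of HTML" netcat would print.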

Of course, there is an easier way, as building an HTTP request by hand every time can be a bit tiresome. You can use the curl tool to make the HTTP request for you. curl has some useful options; for example, with -I it sends a HEAD request and shows you only the HTTP response headers. This can be very helpful in a troubleshooting scenario.

[Figure: HTTP request using curl]
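For scripted checks, something close to what curl does with -I can be sketched with Python's standard `http.client` module. The `head` function is a name chosen for this example, and it only covers plain HTTP on the given port.

```python
import http.client


def head(host: str, path: str = "/", port: int = 80, timeout: float = 5.0):
    """Send a HEAD request and return (status, reason, headers)."""
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request("HEAD", path)
        response = conn.getresponse()
        # getheaders() returns a list of (name, value) tuples.
        return response.status, response.reason, response.getheaders()
    finally:
        conn.close()
```

Printing the three returned values gives you the status line and headers without downloading the page body.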

You can use the ‘http’ display filter in Wireshark to show only HTTP traffic. There are also a number of tools that act as a proxy and allow you to see and even modify requests as they are being made by your browser. Examples of this are Fiddler and Burp Proxy.

Redirected

If the HTTP response code is not ‘200 OK’, something other than a plain successful fetch happened, and the code helps us and the browser determine what. A 4xx or 5xx code signals an error on the client or server side, while a 301 or 302 means the resource is somewhere else. In that case the browser has to send a new request, this time for the URL given in the ‘Location’ header. Thankfully it won’t need to restart the whole process if Keep-Alive is enabled and the new location is on the same server, because the existing connection can be reused.
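The decision the browser makes here can be sketched as a small function: given the URL it just requested, the status code, and the response headers, work out which URL to ask for next. The function name is invented for this example, and treating 303, 307, and 308 as redirects alongside 301 and 302 is an addition that goes beyond the two codes discussed above.

```python
from typing import Mapping, Optional
from urllib.parse import urljoin

# 301/302 are the codes discussed in the text; the others are added for completeness.
REDIRECT_CODES = {301, 302, 303, 307, 308}


def next_url(current_url: str, status: int, headers: Mapping[str, str]) -> Optional[str]:
    """Return the URL to request next, or None if the response is not a redirect."""
    if status not in REDIRECT_CODES:
        return None
    # Header names are case-insensitive, so normalise before looking up Location.
    location = {name.lower(): value for name, value in headers.items()}.get("location")
    if location is None:
        return None
    # Location may be relative; resolve it against the URL we just requested.
    return urljoin(current_url, location)
```

A real browser also caps the number of redirects it follows so that a redirect loop cannot keep it busy forever.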

The last steps

There are a few more steps that the browser doesn’t have to worry about. For example, there might be a load balancer between the server you are connecting to and the actual web servers. Once the response is received, the browser starts parsing the HTML and opens a few more connections. Modern browsers typically open up to 6 connections per host to download assets (like images or CSS files) in parallel.
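The idea of a capped number of parallel downloads can be imitated with a thread pool. This is only a rough analogy for what a browser does internally: `fetch_assets` and its `fetch` callback are hypothetical names, and the default of 6 workers simply mirrors the connection limit mentioned above.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable


def fetch_assets(
    urls: Iterable[str],
    fetch: Callable[[str], bytes],
    max_connections: int = 6,
) -> Dict[str, bytes]:
    """Fetch every URL with at most max_connections requests in flight at once."""
    urls = list(urls)
    with ThreadPoolExecutor(max_workers=max_connections) as pool:
        # pool.map keeps results in the same order as the input URLs.
        bodies = pool.map(fetch, urls)
        return dict(zip(urls, bodies))
```

Any function that takes a URL and returns the downloaded bytes can be passed as `fetch`; the pool makes sure no more than six of them run at the same time.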
