The Hyperlink Trailing Slash



If you’ve spent any considerable amount of time running a web server you may have heard that adding a trailing slash to your site’s hyperlinks speeds up how quickly the pages will come up when clicked. So:

https://danielmiessler.com/study (slower)

vs.

https://danielmiessler.com/study/ (faster)

Some consider this gospel, others think it’s crap. I’ve never taken a position because I was too lazy to test it myself. But no more. I’ve done the testing and I now have a well-informed position.

For directories, it absolutely does matter. A lot.

Proof is in the Logs

This is best shown in the following way:

1.1.1.1 - - [15/Jun/2008:12:34:25 -0000] "GET /study HTTP/1.1" 301 298
1.1.1.1 - - [15/Jun/2008:12:34:26 -0000] "GET /study/ HTTP/1.1" 200 22029

The first log entry was me asking for danielmiessler.com/study without the trailing slash. Notice the HTTP code returned: a 301, which is a redirect. The second line is the second request that Apache forced my browser to make. In short, I ended up making double the GET requests because I asked for the “study” resource without the trailing slash.
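If you want to check your own logs for the same pattern, here is a minimal Python sketch. It assumes access-log lines shaped like the two above, and the function name `find_slash_redirects` is just something I made up for illustration. It flags any 301 that is immediately followed by a request for the same path plus a slash.

```python
import re

# Pulls the path and status code out of a log line like the ones above.
LOG_PATTERN = re.compile(r'"GET (?P<path>\S+) HTTP/[\d.]+" (?P<status>\d{3})')

def find_slash_redirects(lines):
    """Return (requested, followed) pairs where a 301 is immediately
    followed by a request for the same path with a trailing slash."""
    entries = []
    for line in lines:
        match = LOG_PATTERN.search(line)
        if match:
            entries.append((match.group("path"), int(match.group("status"))))
    pairs = []
    # Compare each entry with the one right after it.
    for (path, status), (next_path, _) in zip(entries, entries[1:]):
        if status == 301 and next_path == path + "/":
            pairs.append((path, next_path))
    return pairs

sample = [
    '1.1.1.1 - - [15/Jun/2008:12:34:25 -0000] "GET /study HTTP/1.1" 301 298',
    '1.1.1.1 - - [15/Jun/2008:12:34:26 -0000] "GET /study/ HTTP/1.1" 200 22029',
]
print(find_slash_redirects(sample))  # [('/study', '/study/')]
```

On a busy server you would also want to match on client IP, since consecutive lines can come from different visitors; this sketch skips that.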

Some may say, “So what? It was a tiny little request and redirect; no harm done, right?” Wrong.

That redirect actually creates a ton of overhead at the TCP/IP level. How much? Well, if you make the request using study/ (the proper way) your fourth (4) packet is the actual proper request. But if you make the request for study by itself you don’t end up at study/ until the fifteenth (15) packet!


Correct

  1. Client SYNs

  2. Server SYN-ACKs

  3. Client ACKs

  4. Client makes proper request for study/

Suboptimal

  1. Client SYNs

  2. Server SYN-ACKs

  3. Client ACKs

  4. Improper request for study

  5. Server ACKs improper request

  6. Server PUSHes redirect

  7. Server FIN-ACKs previous connection

  8. Client ACKs

  9. Client ACKs

  10. Client FIN-ACKs

  11. Server ACKs

  12. Client SYNs

  13. Server SYN-ACKs

  14. Client ACKs

  15. Client makes proper request for study/
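The two packet sequences above can be written down as plain lists to make the difference concrete. This is only a tally of the steps as I listed them, not a packet capture; the labels and the helper name `packets_until` are mine.

```python
# The two sequences listed above, one string per packet.
CORRECT = [
    "Client SYN",
    "Server SYN-ACK",
    "Client ACK",
    "Client GET /study/",
]

SUBOPTIMAL = [
    "Client SYN",
    "Server SYN-ACK",
    "Client ACK",
    "Client GET /study",
    "Server ACK",
    "Server PUSH redirect",
    "Server FIN-ACK",
    "Client ACK",
    "Client ACK",
    "Client FIN-ACK",
    "Server ACK",
    "Client SYN",
    "Server SYN-ACK",
    "Client ACK",
    "Client GET /study/",
]

def packets_until(packets, marker):
    """Return the 1-based position of the first packet equal to marker."""
    return packets.index(marker) + 1

fast = packets_until(CORRECT, "Client GET /study/")
slow = packets_until(SUBOPTIMAL, "Client GET /study/")
print(fast, slow, slow - fast)  # 4 15 11
```

So the missing slash costs eleven extra packets before the real request even goes out.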

Sure, maybe for most sites it’s pretty minor, especially if you’re not getting much traffic, but it’s just inefficient. If you can help it (which you can), why not have your pages come back as a result of only one GET request instead of two? And why not use only four packets instead of fifteen?

The Explanation

Let’s take note of the reason for this behavior. When you ask Apache for “foo”, Apache basically says, “Huh? What the hell does ‘foo’ mean?” Then it has to go and start taking guesses at what you meant. And that’s what it does when it sends you off to foo/: it’s taking a guess that foo is a directory that you’re trying to reach. That takes time and energy, both at the HTTP and the TCP/IP level.
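Here is a toy model of that decision in Python. To be clear, this is not Apache's code, just a simplified sketch of the behavior I saw in the logs; the function `respond` and the example sets of directories and files are hypothetical.

```python
def respond(path, directories, files):
    """Return (status, redirect_target) for a requested path.

    directories holds directory paths without a trailing slash;
    files holds exact file paths.
    """
    if path in files:
        return 200, None
    # A directory asked for with its trailing slash is served directly.
    if path.endswith("/") and path.rstrip("/") in directories:
        return 200, None
    # A directory asked for without the slash gets bounced to the slash form.
    if path in directories:
        return 301, path + "/"
    return 404, None

directories = {"/study"}
files = {"/study/index.php"}

print(respond("/study", directories, files))   # (301, '/study/')
print(respond("/study/", directories, files))  # (200, None)
```

The only branch that costs a second request is the one where the path names a directory but lacks the slash.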

Other Types of Content

Remember that we’re not always requesting directory resources. So, danielmiessler.com is a directory and should have a trailing slash appended, but my blog articles are not. My WordPress URL structure, for example, looks like this:

https://danielmiessler.com/blog/socialism-anarchy-and-ideal-government

Notice the lack of trailing slash. And if you add one, guess what?

"GET /blog/socialism-anarchy-and-ideal-government/ HTTP/1.1" 301 -
"GET /socialism-anarchy-and-ideal-government HTTP/1.1" 200 27627

The opposite happens! It redirects you from the URL with the slash to the URL without it.

So the theme here is simple: know what types of resources you’re linking to when you hyperlink and build your links accordingly. If it’s a directory, be sure to use a trailing slash. And if it’s a WordPress blog URL or a direct file (such as foo/index.php) leave the slash off.
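If you generate links in code, that rule is easy to bake in. Below is a minimal sketch, assuming you already know whether each target is a directory; the helper name `build_link` is made up, and it ignores query strings and fragments to stay short.

```python
def build_link(url, is_directory):
    """Return url with a trailing slash for directories and without
    one for files and permalink-style URLs."""
    stripped = url.rstrip("/")
    return stripped + "/" if is_directory else stripped

# Directory: the slash gets added.
print(build_link("https://danielmiessler.com/study", True))
# https://danielmiessler.com/study/

# Blog permalink: the slash gets removed.
print(build_link(
    "https://danielmiessler.com/blog/socialism-anarchy-and-ideal-government/",
    False,
))
# https://danielmiessler.com/blog/socialism-anarchy-and-ideal-government
```

Whether a given URL counts as a directory depends on your server and CMS setup, so check the logs for a 301 before trusting the flag.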

As always, ping me if I missed something.
