The Hyperlink Trailing Slash
If you’ve spent much time running a web server, you may have heard that adding a trailing slash to your site’s hyperlinks makes the pages come up faster when clicked. So:
https://danielmiessler.com/study (slower)
vs.
https://danielmiessler.com/study/ (faster)
Some consider this gospel, others think it’s crap. I’ve never taken a position because I was too lazy to test it myself. But no more. I’ve done the testing and I now have a well-informed position.
For directories, it absolutely does matter. A lot.
Proof is in the Logs
This is best shown in the following way:
1.1.1.1 - - [15/Jun/2008:12:34:25 -0000] "GET /study HTTP/1.1" 301 298
1.1.1.1 - - [15/Jun/2008:12:34:26 -0000] "GET /study/ HTTP/1.1" 200 22029
The first log entry was me asking for danielmiessler.com/study, without the trailing slash. Notice the HTTP code returned: a 301, which is a redirect. The second entry is the second request that Apache forced my browser to make. In short, I ended up making double the GET requests because I asked for the “study” resource without the trailing slash.
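If you want to look for this pattern in your own logs, here is a minimal Python sketch. It assumes Common Log Format lines like the two above; the function names (`parse_line`, `find_slash_redirects`) are made up for illustration, and it only checks consecutive entries rather than matching them up by client.

```python
import re

# Pulls the method, path, and status out of a Common Log Format line.
LOG_PATTERN = re.compile(
    r'"(?P<method>[A-Z]+) (?P<path>\S+) HTTP/[\d.]+" (?P<status>\d{3})'
)

def parse_line(line):
    """Return (method, path, status) or None if the line does not match."""
    m = LOG_PATTERN.search(line)
    if m is None:
        return None
    return m.group("method"), m.group("path"), int(m.group("status"))

def find_slash_redirects(lines):
    """Return paths that got a 301 and were then re-requested with a trailing slash."""
    entries = [e for e in map(parse_line, lines) if e is not None]
    hits = []
    # Compare each entry with the one that follows it.
    for (_, path, status), (_, next_path, next_status) in zip(entries, entries[1:]):
        if status == 301 and next_status == 200 and next_path == path + "/":
            hits.append(path)
    return hits

# The two log entries from this post.
LOG_LINES = [
    '1.1.1.1 - - [15/Jun/2008:12:34:25 -0000] "GET /study HTTP/1.1" 301 298',
    '1.1.1.1 - - [15/Jun/2008:12:34:26 -0000] "GET /study/ HTTP/1.1" 200 22029',
]

print(find_slash_redirects(LOG_LINES))  # ['/study']
```

Any path it reports is one where a link somewhere is probably missing its trailing slash.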
Some may say, “So what? It was a tiny little request and redirect; no harm done, right?” Wrong.
That redirect actually creates a ton of overhead at the TCP/IP level. How much? Well, if you make the request using study/ (the proper way) your fourth (4) packet is the actual proper request. But if you make the request for study by itself you don’t end up at study/ until the fifteenth (15) packet!
Correct
Client SYNs
Server SYN ACKs
Client ACKs
Client makes proper request for study/
Suboptimal
Client SYNs
Server SYN-ACKs
Client ACKs
Improper request for study
Server ACKs improper request
Server PUSHes redirect
Server FIN ACKs previous connection
Client ACKs
Client ACKs
Client FIN ACKs
Server ACKs
Client SYNs
Server SYN ACKs
Client ACKs
Client makes proper request for study/
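The 4-versus-15 figure is just a count of the two packet sequences above. A small Python sketch of that tally, with the packet descriptions copied from the lists (the helper name `packets_until` is made up for illustration):

```python
CORRECT = [
    "Client SYNs",
    "Server SYN ACKs",
    "Client ACKs",
    "Client makes proper request for study/",
]

SUBOPTIMAL = [
    "Client SYNs",
    "Server SYN ACKs",
    "Client ACKs",
    "Improper request for study",
    "Server ACKs improper request",
    "Server PUSHes redirect",
    "Server FIN ACKs previous connection",
    "Client ACKs",
    "Client ACKs",
    "Client FIN ACKs",
    "Server ACKs",
    "Client SYNs",
    "Server SYN ACKs",
    "Client ACKs",
    "Client makes proper request for study/",
]

TARGET = "Client makes proper request for study/"

def packets_until(sequence, target):
    """1-based position of the first packet matching target."""
    return sequence.index(target) + 1

print(packets_until(CORRECT, TARGET))     # 4
print(packets_until(SUBOPTIMAL, TARGET))  # 15
```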
Sure, maybe for most sites it’s pretty minor, especially if you’re not getting much traffic, but it’s just inefficient. If you can help it (which you can), why not have your pages come back as a result of only one GET request instead of two? And why not use only four packets instead of fifteen?
The Explanation
Let’s take note of the reason for this behavior. When you ask Apache for “foo”, Apache basically says, “Huh? What the hell does ‘foo’ mean?” Then it has to go and start taking guesses at what you meant. And that’s what it does when it sends you off to foo/: it’s taking a guess that foo is a directory that you’re trying to reach. That takes time and energy, both at the HTTP and the TCP/IP level.
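To make the cost concrete, here is a toy Python model of that decision. It is not Apache’s code, just a sketch under the assumption that the server knows which paths are directories; `respond`, `requests_needed`, and the `DIRECTORIES` set are all made up for illustration.

```python
# Hypothetical set of directory paths the server knows about (no trailing slash).
DIRECTORIES = {"/", "/study"}

def respond(path, directories):
    """Return (status, location) for a requested path."""
    if path.endswith("/"):
        bare = path.rstrip("/") or "/"
        return (200, None) if bare in directories else (404, None)
    if path in directories:
        # A directory asked for without the slash: redirect to the slash form.
        return (301, path + "/")
    return (404, None)

def requests_needed(path, directories, limit=5):
    """Follow redirects the way a browser would; return (request count, final status)."""
    count = 0
    status = None
    while count < limit:
        count += 1
        status, location = respond(path, directories)
        if status != 301:
            break
        path = location
    return count, status

print(requests_needed("/study", DIRECTORIES))   # (2, 200)
print(requests_needed("/study/", DIRECTORIES))  # (1, 200)
```

The link without the slash costs two round trips in this model, the link with it costs one.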
Other Types of Content
Remember that we’re not always requesting directory resources. So, danielmiessler.com is a directory and should have a trailing slash appended, but my blog articles are not. My WordPress URL structure, for example, looks like this:
https://danielmiessler.com/blog/socialism-anarchy-and-ideal-government
Notice the lack of trailing slash. And if you add one, guess what?
"GET /blog/socialism-anarchy-and-ideal-government/ HTTP/1.1" 301 -
"GET /socialism-anarchy-and-ideal-government HTTP/1.1" 200 27627
The opposite happens! It redirects you from the URL with the slash to the URL without it.
So the theme here is simple: know what types of resources you’re linking to when you hyperlink, and build your links accordingly. If it’s a directory, be sure to use a trailing slash. And if it’s a WordPress blog URL or a direct file (such as foo/index.php), leave the slash off.
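If you generate links in code, that rule is easy to apply automatically. A minimal Python sketch follows; `canonical_href` is a made-up helper, and it assumes you already know whether the target is a directory, because the URL alone can’t tell you.

```python
from urllib.parse import urlsplit, urlunsplit

def canonical_href(url, is_directory):
    """Add a trailing slash for directories, strip it for everything else."""
    parts = urlsplit(url)
    path = parts.path
    if is_directory:
        if not path.endswith("/"):
            path += "/"
    elif len(path) > 1:
        # Leave a bare "/" alone; strip the slash from anything longer.
        path = path.rstrip("/")
    return urlunsplit((parts.scheme, parts.netloc, path, parts.query, parts.fragment))

print(canonical_href("https://danielmiessler.com/study", is_directory=True))
# https://danielmiessler.com/study/
print(canonical_href(
    "https://danielmiessler.com/blog/socialism-anarchy-and-ideal-government/",
    is_directory=False,
))
# https://danielmiessler.com/blog/socialism-anarchy-and-ideal-government
```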
As always, ping me if I missed something.