Link Sanitation

Sometimes when working on a client’s site, you might be tasked with capturing all outbound links, or what we call “external links”, and displaying a confirmation message to the user before letting them leave the site.

Sounds simple enough, right? In a perfect world it would be, but more often than not you will be dealing with a whole host of ways that page links have been authored.

Take this made-up domain, for instance:
http://www.agreatdomain.com

On any given page, our site could link internally in any of the following ways:
http://www.agreatdomain.com/test-page.html
//www.agreatdomain.com/test-page.html
/test-page.html
http://agreatdomain.com/test-page.html
//agreatdomain.com/test-page.html
agreatdomain.com/test-page.html (authored without a scheme or leading //, so browsers resolve this one as a relative path)

Now, let's throw some external links into the mix and make it real fun:
http://www.agreatdomain.com/test-page.html
www.google.com
//www.agreatdomain.com/test-page.html
/test-page.html
http://www.google.com
http://agreatdomain.com/test-page.html
//agreatdomain.com/test-page.html
//www.google.com

Now go ahead and try to identify which links are external and which are internal with JS. Obviously, the first thing we would want to check is whether a link contains our domain. However, you might quickly realize that not every link on a given page contains our domain to check against, so what's a developer to do? In steps link sanitation to save the day!

If you simply do the following at the initial load of your pages:

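// Select only anchors that have an href; assigning to href on a bare <a> would add an empty one.
var links = document.querySelectorAll('a[href]');
var i = links.length;
while(i--){
    // Reading .href returns the resolved absolute URL; writing it back updates the attribute.
    links[i].href = links[i].href;
}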

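Lo and behold, every link's href attribute is rewritten as a fully qualified URL (shown here for a page served from http://www.agreatdomain.com/):
http://www.agreatdomain.com/test-page.html = http://www.agreatdomain.com/test-page.html
www.google.com = http://www.agreatdomain.com/www.google.com
//www.agreatdomain.com/test-page.html = http://www.agreatdomain.com/test-page.html
/test-page.html = http://www.agreatdomain.com/test-page.html
http://www.google.com = http://www.google.com/
http://agreatdomain.com/test-page.html = http://agreatdomain.com/test-page.html
//agreatdomain.com/test-page.html = http://agreatdomain.com/test-page.html
//www.google.com = http://www.google.com/

Two of these results deserve a note. A bare www.google.com has no scheme or leading //, so the browser treats it as a relative path on our own site; that is an authoring mistake to fix at the source. And the browser never adds or strips www., so agreatdomain.com and www.agreatdomain.com remain different hostnames.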
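
If you want to check a mapping like this without opening a page, here is a minimal sketch using the standard URL constructor, which follows the same resolution rules the browser applies to an anchor's href. The base URL and the resolveHref helper name are assumptions made for illustration. It is worth running for the scheme-less www.google.com entry in particular, since that one resolves as a relative path on our own domain rather than to Google.

```javascript
// Assumed page URL for the made-up domain used in this post.
var base = 'http://www.agreatdomain.com/';

// The mixed list of authored links from above.
var authored = [
    'http://www.agreatdomain.com/test-page.html',
    'www.google.com',
    '//www.agreatdomain.com/test-page.html',
    '/test-page.html',
    'http://www.google.com',
    'http://agreatdomain.com/test-page.html',
    '//agreatdomain.com/test-page.html',
    '//www.google.com'
];

// Hypothetical helper: resolve an authored href against the page URL.
function resolveHref(href, baseUrl) {
    return new URL(href, baseUrl).href;
}

var resolved = authored.map(function (href) {
    return resolveHref(href, base);
});

// Print each authored link next to its resolved form.
authored.forEach(function (href, n) {
    console.log(href + ' = ' + resolved[n]);
});
```

The URL constructor is available in current browsers and in Node.js, so the same snippet can run in either.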

So now when you check for links that do not contain our site's domain, you should get a true representation of which links are external and which are not.

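var links = document.querySelectorAll('a[href]');
// Compare against the bare domain so www. and non-www. links both count as internal.
var domain = window.location.hostname.replace(/^www\./, '');
var i = links.length;
while(i--){
    links[i].href = links[i].href;
    // indexOf is a literal substring test; match() would treat the dots as regex wildcards.
    if(links[i].href.indexOf(domain) === -1){
       // classList.add keeps any existing classes intact.
       links[i].classList.add('external-link');
    }
}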
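
With the class in place, the confirm message mentioned at the start can be wired up with a single delegated click handler. This is only a sketch under a few assumptions: external anchors carry the external-link class, the browser supports Element.closest, and the confirmExternalClick name and the wording of the message are made up for illustration.

```javascript
// Hypothetical helper: returns true when the click may proceed.
// confirmFn is injected so the prompt can be swapped out (or tested).
function confirmExternalClick(event, confirmFn) {
    // Walk up from the clicked element to the nearest tagged anchor, if any.
    var anchor = event.target.closest('a.external-link');
    if (!anchor) {
        return true; // not an external link, nothing to confirm
    }
    var ok = confirmFn('You are about to leave this site for ' + anchor.hostname + '. Continue?');
    if (!ok) {
        event.preventDefault(); // the user chose to stay
    }
    return ok;
}

// Browser wiring: one listener on the document covers every link on the page.
if (typeof document !== 'undefined') {
    document.addEventListener('click', function (event) {
        confirmExternalClick(event, function (message) {
            return window.confirm(message);
        });
    });
}
```

Delegating from the document means links added to the page later are covered too, as long as they get the class.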

Now I can hear the senior developers in the back of the room screaming “Why not use hostname?”, and I agree: all of this is moot if you truly trust the link’s hostname property.

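var links = document.querySelectorAll('a[href]');
var i = links.length;
while(i--){
    // hostname is parsed from the resolved URL, so href does not need rewriting first.
    // This is an exact comparison: www.agreatdomain.com and agreatdomain.com differ.
    if(links[i].hostname !== window.location.hostname){
       links[i].classList.add('external-link');
    }
}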
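
For anyone who would rather keep the decision in one testable place, here is a rough sketch of the hostname comparison as a standalone function built on the standard URL constructor. The isExternalLink and stripWww names are hypothetical, and two choices in it are my assumptions rather than rules: a leading www. is ignored on both sides, and non-web schemes such as mailto: are not counted as leaving the site.

```javascript
// Hypothetical helper: drop a leading "www." so both spellings of a domain compare equal.
function stripWww(hostname) {
    return hostname.replace(/^www\./, '');
}

// Hypothetical helper: true when href points at a different site than pageUrl.
function isExternalLink(href, pageUrl) {
    var target = new URL(href, pageUrl); // resolves relative links the way the browser does
    var page = new URL(pageUrl);

    // Assumption: mailto:, tel: and similar schemes do not count as external.
    if (target.protocol !== 'http:' && target.protocol !== 'https:') {
        return false;
    }
    return stripWww(target.hostname) !== stripWww(page.hostname);
}

var page = 'http://www.agreatdomain.com/';
console.log(isExternalLink('/test-page.html', page));
console.log(isExternalLink('http://agreatdomain.com/test-page.html', page));
console.log(isExternalLink('//www.google.com', page));
```

Because only the hostname is compared, a foreign URL that merely mentions our domain in its path or query string is still reported as external.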

You don’t need to sanitize any links in this last case. However, if you don’t trust the hostname property, the sanitize method works just as well.

So to conclude: if you sanitize your links, you get another option for detecting external links based on the href attribute. It might not be necessary, though, if you can trust the link’s hostname property instead.

To each their own, but options are nice, no?
Enjoy!
Devin R. Olsen

Located in Portland Oregon. I like to teach, share and dabble deep into the digital dark arts of web and game development.
