In today’s world almost every website is built on a powerful CMS, whose administrator can edit the site’s main content on the fly. This often leads to email addresses being published carelessly, which brings a lot of spam to the website’s owners.
Therefore, to fight spam, I suggest hiding all the email addresses on every page of a website in a clean and tidy way. On the one hand, this helps us avoid spiders, which harvest every publicly visible email address. On the other hand, this technique still delivers our email address to real visitors.
Playing on the server side
As is well known, most email spiders (robots) crawl every website’s source code, looking for email addresses and saving them. That is why I suggest finding and replacing every email address, whether it is published as plain text or as a mailto hyperlink, before a spider sees it. The fastest way is to use PHP server-side scripting.
First of all, we are going to write a function to use all over our website. Its main purpose is to find all the published email addresses and rewrite them into a specific but very useful format (example AT example DOT com). In this function we’ll use PHP regular expressions to find all the necessary formats and a basic PHP function to rewrite them properly. This way we hide all our emails on the server side, and web spiders and bots that look for plain addresses will find nothing.
Start coding the function
So, let’s start coding the function:
// function which replaces mailto links into specific format
function tep_rewrite_email($content) {
// function rules will go here
}
Include the email pattern in the right format
Inside this function we are going to define some variables that will help us do the job. The first one is the regex email pattern, which is written with three capture groups so we can reuse its parts later. Take a look at the NetTuts+ Regular Expressions Complete Guide and then move on to the code below:
// regex email address pattern, format (\\1)@(\\2).(\\3)
$email_pattern = '([A-Za-z0-9._%-]+)\@([A-Za-z0-9._%-]+)\.([A-Za-z0-9._%-]+)';
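To see how the three groups split an address, here is a small sketch of the same pattern in JavaScript, for experimenting only. The helper name splitAddress and the sample addresses are made up for this illustration and are not part of the PHP function.

```javascript
// JavaScript version of the article's email pattern (same character classes).
const emailPattern = /([A-Za-z0-9._%-]+)@([A-Za-z0-9._%-]+)\.([A-Za-z0-9._%-]+)/;

// splitAddress is a hypothetical helper, used here only to inspect the groups.
function splitAddress(text) {
  const match = emailPattern.exec(text);
  if (match === null) {
    return null; // no address found in the text
  }
  // group 1: part before "@", group 2: domain up to its last dot,
  // group 3: whatever follows that last dot
  return { local: match[1], domain: match[2], tld: match[3] };
}

console.log(splitAddress('Write to lisa@company.com'));
// local "lisa", domain "company", tld "com"
console.log(splitAddress('sales@company.co.uk'));
// local "sales", domain "company.co", tld "uk"
console.log(splitAddress('no address here'));
// null
```

Because the second group is greedy, an address with several dots keeps all but the last dot inside group 2, so only the final dot becomes DOT in the rewritten text.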
Add a standard mailto link pattern
Then we add a standard HTML mailto link pattern, which will match all the mailto links. Moreover, this pattern is flexible, so it will not skip any real link that contains the ‘mailto:’ expression:
// pattern for html links: <a href="mailto: example@example.com">Some other text</a>
// attributes before and after 'href' do not interfere
$mailto_pattern = '#<a[^>]*?href="mailto:\s?' . $email_pattern . '[^>]*?>[^>]*?</a>#';
Set up the result you need
The last variable to add is the result we would finally like to get. It’s specific (example AT example DOT com), but very useful for avoiding spiders. Have a look:
// rewrite result, wrapped in <span class="mailme"> so the jQuery code can find it later
$rewrite_result = '<span class="mailme">\\1 AT \\2 DOT \\3</span>';
Find and replace is simple
Once the variables are in place, we are going to use a simple PHP function to find and replace the necessary content. Remember that emails can be published in two ways: as a mailto link and as a plain email address. Therefore we check the webpage’s content twice. Let’s see the code below:
// firstly, look for html mailto links and replace them
$content = preg_replace($mailto_pattern, $rewrite_result, $content);
// secondly, find standalone emails without links and replace them too
$content = preg_replace('#' . $email_pattern . '#', $rewrite_result, $content);
As you can see, the PHP preg_replace function looks for our mailto link and email patterns and replaces each match completely with the result we need.
I will repeat that we need such a specific format (example AT example DOT com) for two reasons:
- To avoid spiders looking for fresh emails to harvest from your page’s source code
- To be able to turn it back into a clear, valid and visible format for real visitors, not bots
A PHP function that is ready to work for you
Now that every component of the function is ready, let’s put them all together:
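function tep_rewrite_email($content) {
    // regex email address pattern, format (\\1)@(\\2).(\\3)
    $email_pattern = '([A-Za-z0-9._%-]+)\@([A-Za-z0-9._%-]+)\.([A-Za-z0-9._%-]+)';
    // pattern for html mailto links, with any attributes around 'href'
    $mailto_pattern = '#<a[^>]*?href="mailto:\s?' . $email_pattern . '[^>]*?>[^>]*?</a>#';
    // rewrite result, wrapped in <span class="mailme"> for the jQuery code
    $rewrite_result = '<span class="mailme">\\1 AT \\2 DOT \\3</span>';
    // firstly, replace html mailto links
    $content = preg_replace($mailto_pattern, $rewrite_result, $content);
    // secondly, replace standalone emails without links
    $content = preg_replace('#' . $email_pattern . '#', $rewrite_result, $content);
    // remember to return the rewritten content
    return $content;
}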
Finally, we get the result we wanted in our source code: every address now appears in the AT/DOT format instead of as a plain email address.
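To try the effect without a PHP server, here is a rough JavaScript port of the function above. It is only a sketch: the name rewriteEmail and the sample text are made up for illustration, and the span with the class mailme is an assumption based on the class the jQuery code further down looks for.

```javascript
// Rough JavaScript port of tep_rewrite_email, for experimenting only.
// Assumption: the placeholder is wrapped in <span class="mailme">.
const EMAIL = '([A-Za-z0-9._%-]+)@([A-Za-z0-9._%-]+)\\.([A-Za-z0-9._%-]+)';
const MAILTO_LINK = new RegExp(
  '<a[^>]*?href="mailto:\\s?' + EMAIL + '[^>]*?>[^>]*?</a>', 'g');
const PLAIN_EMAIL = new RegExp(EMAIL, 'g');
const PLACEHOLDER = '<span class="mailme">$1 AT $2 DOT $3</span>';

function rewriteEmail(content) {
  // first pass: replace whole mailto links
  const withoutLinks = content.replace(MAILTO_LINK, PLACEHOLDER);
  // second pass: replace addresses published as plain text
  return withoutLinks.replace(PLAIN_EMAIL, PLACEHOLDER);
}

const sample =
  'Sales: <a href="mailto: sales@company.com" title="Sales">sales@company.com</a>, ' +
  'Lisa: lisa@company.com';

console.log(rewriteEmail(sample));
// Sales: <span class="mailme">sales AT company DOT com</span>, Lisa: <span class="mailme">lisa AT company DOT com</span>
```

The order of the two passes matters: if plain addresses were replaced first, the address inside the href attribute would be rewritten too and the mailto link pattern would no longer match.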

jQuery, or what could be simpler?!
Now that all the plain-text and mailto email addresses have been rewritten into the new format, we are going to turn them back into valid links using jQuery, which provides client-side scripting and works in all the major browsers. I think the best way here is to use the HTML-Advisor method. However, there are different ways to do it.
But first of all, you should know the fastest way to add jQuery to your page. Read about 10 Ways to Instantly Increase Your jQuery Performance.
I’d like to note that JavaScript is a client-side language, so it lets us modify the code directly in the visitor’s browser. Our aim here is to work out a simple function that finds the prepared email format and turns it back into visible, understandable links for the audience.
So, let’s get down to the code and go through it piece by piece.
Check if the necessary element exists
// check if our pre-made class exists on a page
if ( $("span.mailme").length ) {
// variables and the function will go here
}
The jQuery code above checks whether there are any <span class="mailme"> elements on the page.
Set up variables
var at = / AT /; // pattern for the AT letters
var dot = / DOT /g; // pattern for the DOT letters
Replace them all on-the-fly
Then we write a function that adds a mailto link to our prepared content and replaces the AT and DOT letters with ‘@’ and ‘.’ respectively:
// function, which replaces pre-made class
$('span.mailme').each(function () {
var addr = $(this).text().replace(at, '@').replace(dot, '.');
$(this).after('<a href="mailto:' + addr + '">' + addr + '</a>');
$(this).remove();
});
The final jQuery function
Put together, our jQuery code should look like this:
if ( $("span.mailme").length ) {
    // patterns which will be replaced
    var at = / AT /;
    var dot = / DOT /g;
    // function, which replaces pre-made class
    $('span.mailme').each(function () {
        var addr = $(this).text().replace(at, '@').replace(dot, '.');
        $(this).after('<a href="mailto:' + addr + '">' + addr + '</a>');
        $(this).remove();
    });
}
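The string handling inside the loop can also be checked on its own, without a browser or jQuery. The sketch below assumes nothing beyond plain JavaScript; decodeAddress and buildMailtoLink are hypothetical helper names introduced only for this illustration.

```javascript
// decodeAddress is a hypothetical helper: it undoes the AT/DOT rewriting.
function decodeAddress(text) {
  // " AT " appears once; " DOT " is replaced globally to mirror the /g flag above
  return text.replace(/ AT /, '@').replace(/ DOT /g, '.');
}

// buildMailtoLink is also hypothetical: it returns the markup that the
// jQuery code inserts after each span.
function buildMailtoLink(text) {
  const addr = decodeAddress(text);
  return '<a href="mailto:' + addr + '">' + addr + '</a>';
}

console.log(decodeAddress('sales AT company DOT com'));
// sales@company.com
console.log(decodeAddress('sales AT company.co DOT uk'));
// sales@company.co.uk
console.log(buildMailtoLink('lisa AT company DOT com'));
// <a href="mailto:lisa@company.com">lisa@company.com</a>
```

Keeping the decoding in a small function like this makes it easy to verify the replacement rules before wiring them into the page.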
As you can see, we have made some modifications on the client side. As a result, every website visitor will see the published emails as normal links, but email spiders won’t have dinner tonight.
Modifications to the output
The last thing to do is to modify our webpage’s output. That means finding where our CMS outputs each page’s body text (which may contain email addresses) and passing it through our PHP function:
// find a string which outputs every page body text from your database
$string = 'Our company is based in London and we bring strong metallic structures to the world. Our experience are wide and stable. To get more information about products we offer, contact sales department by email: <a href="mailto: sales@company.com" title="Sales">sales@company.com</a>. And to offer sponsorship email directly to Lisa: lisa@company.com';
$string = tep_rewrite_email($string);
What we get is what we have made
All in all, we get quite a simple result, which email spiders looking for plain addresses will not catch, while our website’s real audience sees all the published email addresses as normal. By implementing this technique on each new page you build, you can be sure that email addresses are hidden and spam is in the past.
I find this technique very useful for every website that is updated by people who don’t know how to keep an email address safe. It goes very well with different CMS systems and can be implemented in a simple way.
The “free” distribution of unwelcome or misleading messages to thousands of people is an annoying and sometimes destructive use of the Internet’s unprecedented efficiency.
Bill Gates, New York Times, 1998