
You are surfing Manakor Coding Map™. Hope you are keen on these useful tips & tricks. Thank you for staying in touch.
Just continue to collaborate and enjoy!

Portfolio of Nikita Sumeiko, Professional Front End Web Developer

The best way to hide email addresses from spiders and bots

Today almost every website runs on a powerful CMS, where the administrator can edit the site’s main content on the fly. This ability often leads to email addresses being published carelessly, which brings a lot of spam to the site’s owners.
Therefore, to fight spam, I suggest hiding every email address on every page of a website in a neat and clear way. On the one hand, this keeps the addresses away from spiders that harvest every publicly visible email. On the other hand, real visitors still see the addresses and can use them.

Playing on the server side

Most email spiders (robots) crawl a website’s source code looking for email addresses and save whatever they find. That is why I suggest finding and replacing every email address, whether it appears as plain text or as a mailto hyperlink, before a spider can see it. The fastest way to do this is with server-side PHP.

First of all, we are going to write a function to use all over our website. Its main purpose is to find all published email addresses and rewrite them into a specific but very useful format (example AT example DOT com). The function uses PHP regular expressions to find the necessary formats and a few basic PHP functions to rewrite them properly. This way we hide all our emails on the server side, and spiders or bots that harvest emails from the page source find no ready-to-use address at all.

Start coding the function

So, let’s start with the function skeleton:

// function which rewrites mailto links and plain emails into a specific format
function tep_rewrite_email($content) {

  // function rules will go here

}
Include the email address pattern

Inside this function we are going to define a few variables that will do the job. The first one is the regex email pattern, whose three capture groups (user name, domain and domain ending) we reuse later. Have a look at the NetTuts+ Regular Expressions Complete Guide and then move on to the code below:

// regex email address pattern, format (\\1)@(\\2).(\\3)
$email_patt = '([A-Za-z0-9._%-]+)\@([A-Za-z0-9._%-]+)\.([A-Za-z0-9._%-]+)';
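If you want to check what each capture group holds before going further, the same character classes work in JavaScript regular expressions. The sketch below is a quick port for Node or the browser console; `EMAIL_PATT` and `emailParts` are illustrative names, not part of the original function:

```javascript
// JavaScript port of $email_patt (EMAIL_PATT is an illustrative name).
// Group 1: user name, group 2: domain, group 3: ending after a dot.
const EMAIL_PATT = /([A-Za-z0-9._%-]+)@([A-Za-z0-9._%-]+)\.([A-Za-z0-9._%-]+)/;

// Returns the three captured parts of the first address, or null if none is found.
function emailParts(text) {
  const m = text.match(EMAIL_PATT);
  return m ? [m[1], m[2], m[3]] : null;
}

console.log(emailParts("sales@company.com"));                 // [ 'sales', 'company', 'com' ]
console.log(emailParts("info@mail.company.co.uk"));           // [ 'info', 'mail.company.co', 'uk' ]
console.log(emailParts("Write to lisa@company.com. Thanks")); // [ 'lisa', 'company', 'com.' ]
```

Because group 2 is greedy and may contain dots, a subdomain stays in group 2 and only the last part lands in group 3. That is harmless, since the client-side step turns the DOT back into a dot anyway. The last call shows a caveat of this pattern: a period that ends a sentence is captured into group 3, so such an address comes back with a trailing dot.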
Add a standard mailto link pattern

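Then we add a standard HTML mailto link pattern, which matches a whole mailto link from the opening <a> tag to the closing </a>. The pattern is flexible: attributes before and after href don’t interfere, and an optional space after ‘mailto:’ is allowed. The only requirement is that the link text contains no nested tags: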

  // pattern for html links: <a href="mailto: example@example.com">Some other text</a>
  // attributes before and after 'href' do not interfere
  $mailto_pattern = '#\<a[^>]*?href=\"mailto:\s?' . $email_patt . '[^>]*?\>[^>]*?<\/a\>#';
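Here is the same mailto pattern ported to JavaScript, as a sketch for checking what a real link matches. It is built from the email pattern’s source string, just like the PHP concatenation; `EMAIL_SRC`, `MAILTO_RE` and `mailtoParts` are hypothetical names:

```javascript
// JavaScript port of $mailto_pattern, built from the email pattern source
// (EMAIL_SRC, MAILTO_RE and mailtoParts are hypothetical names).
const EMAIL_SRC = "([A-Za-z0-9._%-]+)@([A-Za-z0-9._%-]+)\\.([A-Za-z0-9._%-]+)";
// <a [attributes] href="mailto:[optional space]address[attributes]>link text</a>
const MAILTO_RE = new RegExp('<a[^>]*?href="mailto:\\s?' + EMAIL_SRC + "[^>]*?>[^>]*?</a>");

// Returns the matched link and its three address parts, or null.
function mailtoParts(html) {
  const m = html.match(MAILTO_RE);
  return m ? { link: m[0], parts: [m[1], m[2], m[3]] } : null;
}

const sample = '<p>Mail <a class="mail" href="mailto: sales@company.com" title="Sales">sales@company.com</a> today</p>';
console.log(mailtoParts(sample).link);
// <a class="mail" href="mailto: sales@company.com" title="Sales">sales@company.com</a>
console.log(mailtoParts(sample).parts); // [ 'sales', 'company', 'com' ]
console.log(mailtoParts('<a href="mailto:a@b.com"><b>a@b.com</b></a>')); // null
```

The last call shows the limitation of `[^>]*?` before `</a>`: when the link text contains a nested tag such as `<b>`, the link is not matched at all and is left to the plain-text pass, which then rewrites the address inside the href attribute as well.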
Set up the result you need to get

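The last variable to add is the result we finally want to get. The format (example AT example DOT com) is unusual, but very useful against spiders that look for a literal @ sign. The \\1, \\2 and \\3 backreferences are filled with the three parts captured by the email pattern. Have a look: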

  // rewrite result
  $rewrite_result = '<span class="mailme">\\1 AT \\2 DOT \\3</span>';
Find and replace is simple

With the variables in place, we use simple PHP functions to find and replace the necessary content. Remember that emails can be published in two ways: as a mailto link and as a plain email address. Therefore we check the page content twice, and we look for mailto links first, because the plain-text pass would otherwise rewrite the address inside the link’s href attribute and break the link. Let’s see the code below:

  // firstly, look for html mailto links and replace them
  $content = preg_replace($mailto_pattern, $rewrite_result, $content);

  // secondly, find plain-text emails without links and replace them too
  $content = preg_replace('#' . $email_patt . '#', $rewrite_result, $content);

As you can see, PHP’s preg_replace function looks for our mailto link and email patterns and replaces every match with the rewrite result (by default, preg_replace has no limit on the number of replacements).
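Why does the order of the two passes matter? The JavaScript sketch below runs both passes on a link, and then only the plain-text pass. It is a port, not the PHP code itself: `rewriteEmail` and `plainPassOnly` are hypothetical names, `$1`, `$2`, `$3` play the role of PHP’s `\\1`, `\\2`, `\\3`, and the `g` flag mimics preg_replace replacing every match:

```javascript
// Sketch of the two preg_replace passes in JavaScript (hypothetical names).
// The g flag mimics preg_replace, which replaces every match by default;
// $1, $2, $3 play the role of PHP's \\1, \\2, \\3 backreferences.
const EMAIL_SRC = "([A-Za-z0-9._%-]+)@([A-Za-z0-9._%-]+)\\.([A-Za-z0-9._%-]+)";
const MAILTO_RE = new RegExp('<a[^>]*?href="mailto:\\s?' + EMAIL_SRC + "[^>]*?>[^>]*?</a>", "g");
const EMAIL_RE = new RegExp(EMAIL_SRC, "g");
const REWRITE = '<span class="mailme">$1 AT $2 DOT $3</span>';

// Same order as tep_rewrite_email: whole mailto links first, then plain addresses.
function rewriteEmail(content) {
  content = content.replace(MAILTO_RE, REWRITE);
  return content.replace(EMAIL_RE, REWRITE);
}

// Only the plain-text pass, to show what happens without the mailto pass.
function plainPassOnly(content) {
  return content.replace(EMAIL_RE, REWRITE);
}

const link = '<a href="mailto:sales@company.com">sales@company.com</a>';
console.log(rewriteEmail(link));
// <span class="mailme">sales AT company DOT com</span>
console.log(plainPassOnly(link));
// <a href="mailto:<span class="mailme">sales AT company DOT com</span>"><span class="mailme">sales AT company DOT com</span></a>
```

Running the plain-text pass alone rewrites the address inside the href attribute and leaves broken markup behind, which is why tep_rewrite_email looks for whole mailto links first.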

To repeat, we need this specific format (example AT example DOT com) for two reasons:

  1. To hide addresses from spiders that look for fresh emails in your page’s source code
  2. To convert it back into a clear, valid and visible link for real visitors, not bots
A PHP function that is ready to work for you

Now that every component of the function is ready, let’s put them all together:

  function tep_rewrite_email($content) {
    $email_patt = '([A-Za-z0-9._%-]+)\@([A-Za-z0-9._%-]+)\.([A-Za-z0-9._%-]+)';
    $mailto_pattern = '#\<a[^>]*?href=\"mailto:\s?' . $email_patt . '[^>]*?\>[^>]*?<\/a\>#';
    $rewrite_result = '<span class="mailme">\\1 AT \\2 DOT \\3</span>';

    $content = preg_replace($mailto_pattern, $rewrite_result, $content);
    $content = preg_replace('#' . $email_patt . '#', $rewrite_result, $content);

    // remember to add return here
    return $content;
  }

Finally, we get exactly the result we need in our page’s source code. The image below shows what the code looks like:

jQuery or MooTools, it’s your choice

Once all the plain-text and mailto email addresses have been rewritten into the new format, we turn them back into valid links with jQuery, which runs on the client side and works in all the major browsers. I think the best way here is the HTML-Advisor method. However, there are other ways to do it, for example Oskar’s MooTools conversion.

Note that JavaScript is a client-side language, so it lets us modify the code directly in the visitor’s browser. Our aim here is a simple function that finds the prepared email format and turns it back into visible, understandable links for the audience.
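Stripped of jQuery, the conversion is plain string replacement. The sketch below isolates it so it can be tested without a browser; `decodeMailme` and `mailmeToAnchorHtml` are illustrative helpers, not part of jQuery or MooTools, and they reuse the same two regular expressions as the jQuery and MooTools code in this section:

```javascript
// DOM-free sketch of the text conversion done for each span.mailme element
// (decodeMailme and mailmeToAnchorHtml are illustrative helper names).
const at = / AT /;    // no g flag: only the first " AT " becomes "@"
const dot = / DOT /g; // g flag: every " DOT " becomes "."

function decodeMailme(text) {
  return text.replace(at, "@").replace(dot, ".");
}

// Builds the same markup that the jQuery code inserts after each span.
function mailmeToAnchorHtml(text) {
  const addr = decodeMailme(text);
  return '<a href="mailto:' + addr + '">' + addr + "</a>";
}

console.log(mailmeToAnchorHtml("sales AT company DOT com"));
// <a href="mailto:sales@company.com">sales@company.com</a>
```

Literal dots that the server left inside the user name or domain (for example `info AT mail.company.co DOT uk`) are untouched, so the decoded address still comes out whole.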

jQuery function that works

The complete jQuery code looks like this:

$(function () {
  // run once the DOM is ready, so every span.mailme already exists
  if ( $("span.mailme").length ) {
    // patterns for the text which will be replaced
    var at = / AT /;
    var dot = / DOT /g;

    // replace each pre-made span with a real mailto link
    $('span.mailme').each(function () {
      var addr = $(this).text().replace(at, '@').replace(dot, '.');
      $(this).after('<a href="mailto:' + addr + '">' + addr + '</a>');
      $(this).remove();
    });
  }
});
MooTools conversion
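window.addEvent('domready', function () {
  // run once the DOM is ready, so every .mailme element already exists
  var mailme = $$('.mailme'), at = / AT /, dot = / DOT /g;

  mailme.each(function (el) {
    var addr = el.get('text').replace(at, '@').replace(dot, '.');

    // insert a real mailto link right after the span, then drop the span
    new Element('a', {
      href: 'mailto:' + addr,
      html: addr
    }).inject(el, 'after');

    el.destroy();
  });
});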

As you can see, all these modifications happen on the client side. As a result, every website visitor sees the published emails as normal links, but email spiders won’t get any dinner tonight.

Modifications to the output

The last step is to modify our web pages’ output. Find where your CMS outputs each page’s body text (which may contain email addresses) and pass that text through our ready-made PHP function:


  // find the string that holds each page's body text from your database
  $string = 'Our company is based in London and we bring strong metallic structures to the world. Our experience is wide and stable. To get more information about products we offer, contact sales department by email: <a href="mailto: sales@company.com" title="Sales">sales@company.com</a>. And to offer sponsorship email directly to Lisa: lisa@company.com';
  $string = tep_rewrite_email($string);
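To see the whole trip in one place, the sketch below ports tep_rewrite_email to JavaScript, applies it to part of the sample text above, and then decodes the spans the way the client-side code does. It is an illustration under assumptions: `rewriteEmail` and `decodedAddresses` are hypothetical names, and a small regex in `decodedAddresses` stands in for the `span.mailme` DOM lookup so the example runs in Node:

```javascript
// End-to-end sketch: a JavaScript port of tep_rewrite_email (server side)
// followed by the client-side decoding, on part of the sample text above.
const EMAIL_SRC = "([A-Za-z0-9._%-]+)@([A-Za-z0-9._%-]+)\\.([A-Za-z0-9._%-]+)";
const MAILTO_RE = new RegExp('<a[^>]*?href="mailto:\\s?' + EMAIL_SRC + "[^>]*?>[^>]*?</a>", "g");
const EMAIL_RE = new RegExp(EMAIL_SRC, "g");
const REWRITE = '<span class="mailme">$1 AT $2 DOT $3</span>';

// Server side: mailto links first, then plain-text addresses.
function rewriteEmail(content) {
  return content.replace(MAILTO_RE, REWRITE).replace(EMAIL_RE, REWRITE);
}

// Hypothetical stand-in for $('span.mailme'): pull each span's text with a regex
// and decode it with the same AT / DOT replacements as the jQuery code.
function decodedAddresses(html) {
  const spans = [...html.matchAll(/<span class="mailme">([^<]*)<\/span>/g)];
  return spans.map((m) => m[1].replace(/ AT /, "@").replace(/ DOT /g, "."));
}

const body =
  'Contact sales department by email: <a href="mailto: sales@company.com" title="Sales">sales@company.com</a>. ' +
  "And to offer sponsorship email directly to Lisa: lisa@company.com";

const served = rewriteEmail(body);
console.log(served.includes("@"));     // false
console.log(decodedAddresses(served)); // [ 'sales@company.com', 'lisa@company.com' ]
```

The served markup contains no @ sign at all, which is what a harvester scanning the raw source looks for, while the decoded addresses match the originals.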

What we get is what we have made

All in all, we get a fairly simple result that email spiders scanning the raw page source won’t catch, while our real audience sees all the published email addresses as normal. By applying this technique to each new page you build, you can be confident that email addresses stay hidden and much less spam reaches your inbox.

I find this technique very useful for every website that is updated by people who don’t know how to keep an email address safe. It works well with different CMS systems and is simple to implement.

The “free” distribution of unwelcome or misleading messages to thousands of people is an annoying and sometimes destructive use of the Internet’s unprecedented efficiency.
Bill Gates, New York Times, 1998

Comments (11)

  6. Just updated email address pattern and rewrite result pattern.
    Also added a solution for Mootools fans.
    Now suits us much better, I guess. Enjoy!

    • John, posted on 27 June, 2010

    Good work, Thanks for the information. It was worth reading and thinking.

  7. Nice post! I’ve written a php function myself and described it in detail at:
    http://www.maurits.vdschee.nl/php_hide_email/

    What do you think?

    • Johnny, posted on 12 January, 2011

    Just what I needed. My client has been nagging me for non-spider-readable emails for a couple of months now. In case some spiders actually search for something AT something DOT something, this could still be encoded and decoded in the JavaScript, which would make it harder for the spider.


    • Doug, posted on 20 November, 2011

    I made a jquery script to hide email addresses: http://www.dougnorfolk.com.au/.....addresses/
